Compares matrix multiplication implementations with timeit

numpy's matrix multiplication is very fast, and a hand-written version can easily be much slower. This example uses timeit to compare several implementations on the same pair of matrices.

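Compared implementations: numpy's @ operator, multiply_matrix (pure python), c_multiply_matrix (cython), c_multiply_matrix_parallel (cython, parallelized) and c_multiply_matrix_parallel_transposed (cython, parallelized, AVX + transposed).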

Preparation

import timeit
import numpy

from teachcompute.validation.cython.td_mul_cython import (
    multiply_matrix,
    c_multiply_matrix,
    c_multiply_matrix_parallel,
    c_multiply_matrix_parallel_transposed as cmulparamtr,
)

va = numpy.random.randn(150, 100).astype(numpy.float64)
vb = numpy.random.randn(100, 100).astype(numpy.float64)
ctx = {
    "va": va,
    "vb": vb,
    "c_multiply_matrix": c_multiply_matrix,
    "multiply_matrix": multiply_matrix,
    "c_multiply_matrix_parallel": c_multiply_matrix_parallel,
    "c_multiply_matrix_parallel_transposed": cmulparamtr,
}

Measures

numpy

res0 = timeit.timeit("va @ vb", number=100, globals=ctx)
print("numpy time", res0)
numpy time 0.023710604999905627
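
timeit.timeit returns the total time in seconds for all `number` executions, so the value printed for numpy covers 100 multiplications, not one. A minimal sketch of a per-call measurement follows. The helper name per_call_time and the timed statement are assumptions made for illustration and are not part of the original script.

```python
import timeit


def per_call_time(stmt, number=100, repeat=5, namespace=None):
    """Return the best observed time per execution of stmt, in seconds.

    Hypothetical helper for illustration; not part of the original script.
    """
    # timeit.repeat returns one total per repeat, each covering `number` runs.
    totals = timeit.repeat(stmt, number=number, repeat=repeat, globals=namespace)
    # The smallest total is the one least disturbed by other processes.
    return min(totals) / number


# Usage with a placeholder statement that needs no extra context.
best = per_call_time("sum(range(1000))", number=1000)
print(f"best time per call: {best:.3e} s")
```

Dividing by `number` makes measures taken with different values of `number` directly comparable.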

python implementation

res1 = timeit.timeit("multiply_matrix(va, vb)", number=10, globals=ctx)
print("python implementation", res1)
python implementation 5.634933213000295
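
The body of multiply_matrix is not shown on this page. To illustrate why an interpreted implementation is this slow, here is a minimal sketch of a triple-loop product on nested lists. The name multiply_matrix_sketch and the list-based storage are assumptions, not the library's code.

```python
def multiply_matrix_sketch(a, b):
    """Multiply two matrices stored as lists of rows with three nested loops.

    Hypothetical illustration; not the implementation timed above.
    """
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            # Every multiply-add below goes through the interpreter.
            for k in range(n_inner):
                total += a[i][k] * b[k][j]
            result[i][j] = total
    return result


product = multiply_matrix_sketch([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(product)
```

For the shapes used here, (150, 100) times (100, 100), the innermost statement of such a loop would run 150 × 100 × 100 = 1,500,000 times per call.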

cython implementation

res2 = timeit.timeit("c_multiply_matrix(va, vb)", number=100, globals=ctx)
print("cython implementation", res2)
cython implementation 0.10908522800036735

cython implementation parallelized

res3 = timeit.timeit("c_multiply_matrix_parallel(va, vb)", number=100, globals=ctx)
print("cython implementation parallelized", res3)
cython implementation parallelized 0.11675325099986367

cython implementation parallelized, AVX + transposed

res4 = timeit.timeit(
    "c_multiply_matrix_parallel_transposed(va, vb)", number=100, globals=ctx
)
print("cython implementation parallelized avx", res4)
cython implementation parallelized avx 0.00955157299995335
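
The cython source is not shown on this page, so the following is only a sketch of the general idea behind multiplying against a transposed right-hand matrix. Once vb is transposed, each output coefficient is a dot product of two rows, so the inner loop reads both operands in storage order, which is what lets compiled code use vector instructions such as AVX. The function name and the list-based storage are assumptions, and in pure python this rearrangement does not bring the speed-up measured above.

```python
def multiply_transposed_sketch(a, b):
    """Multiply a by b after transposing b, using row-by-row dot products.

    Hypothetical illustration; not the cython implementation timed above.
    """
    # Transpose b once so that column j of b becomes row j of b_t.
    b_t = [list(col) for col in zip(*b)]
    # Each coefficient is the dot product of a row of a and a row of b_t.
    return [
        [sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in b_t]
        for row_a in a
    ]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
result = multiply_transposed_sketch(a, b)
print(result)
```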

Speed up…

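# timeit returns the total time for `number` runs. The pure python measure
# used number=10 while numpy used number=100, so both are divided by their
# number of runs before the ratio is taken.
# A ratio below 1 means numpy is the slower of the two.
print(f"numpy is {(res1 / 10) / (res0 / 100):f} times faster than pure python.")
print(f"numpy is {res2 / res0:f} times faster than cython.")
print(f"numpy is {res3 / res0:f} times faster than parallelized cython.")
print(f"numpy is {res4 / res0:f} times faster than avx parallelized cython.")
numpy is 2376.545522 times faster than pure python.
numpy is 4.600694 times faster than cython.
numpy is 4.924094 times faster than parallelized cython.
numpy is 0.402840 times faster than avx parallelized cython.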

Total running time of the script: (0 minutes 5.980 seconds)