Compares matrix multiplication implementations with timeit
numpy has a very fast implementation of matrix multiplication, and there are many ways to be slower. This example uses timeit to compare several implementations on the same pair of matrices.
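Compared implementations: numpy (va @ vb), multiply_matrix (python implementation), c_multiply_matrix (cython implementation), c_multiply_matrix_parallel (cython implementation parallelized), c_multiply_matrix_parallel_transposed (cython implementation parallelized, AVX + transposed).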
Preparation
import timeit
import numpy
from teachcompute.validation.cython.td_mul_cython import (
    multiply_matrix,
    c_multiply_matrix,
    c_multiply_matrix_parallel,
    c_multiply_matrix_parallel_transposed as cmulparamtr,
)
va = numpy.random.randn(150, 100).astype(numpy.float64)
vb = numpy.random.randn(100, 100).astype(numpy.float64)
ctx = {
    "va": va,
    "vb": vb,
    "c_multiply_matrix": c_multiply_matrix,
    "multiply_matrix": multiply_matrix,
    "c_multiply_matrix_parallel": c_multiply_matrix_parallel,
    "c_multiply_matrix_parallel_transposed": cmulparamtr,
}
Measures
numpy
res0 = timeit.timeit("va @ vb", number=100, globals=ctx)
print("numpy time", res0)
numpy time 0.027860324999892327
python implementation
res1 = timeit.timeit("multiply_matrix(va, vb)", number=10, globals=ctx)
print("python implementation", res1)
python implementation 5.810410273999878
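The source of multiply_matrix is not shown on this page. As a rough illustration of why a pure Python version is so slow, here is a minimal sketch of a triple-loop product. The name naive_matmul is hypothetical and this is not the teachcompute implementation; every scalar multiply-add goes through the interpreter, which is where the time goes.

```python
import numpy


def naive_matmul(a, b):
    """Triple-loop matrix product (hypothetical helper, for illustration only)."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = numpy.zeros((n, m), dtype=numpy.float64)
    for i in range(n):
        for j in range(m):
            # Accumulate the dot product of row i of a and column j of b.
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            c[i, j] = s
    return c
```

On small inputs the result can be checked against a @ b with numpy.allclose.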
cython implementation
res2 = timeit.timeit("c_multiply_matrix(va, vb)", number=100, globals=ctx)
print("cython implementation", res2)
cython implementation 0.10331855200001883
cython implementation parallelized
res3 = timeit.timeit("c_multiply_matrix_parallel(va, vb)", number=100, globals=ctx)
print("cython implementation parallelized", res3)
cython implementation parallelized 0.04946396800005459
cython implementation parallelized, AVX + transposed
res4 = timeit.timeit(
    "c_multiply_matrix_parallel_transposed(va, vb)", number=100, globals=ctx
)
print("cython implementation parallelized avx", res4)
cython implementation parallelized avx 0.012492371999996976
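The page does not show how the transposed variant works internally. One plausible reading, stated here as an assumption, is that the right-hand matrix is transposed first so that the inner loop reads two contiguous rows, which suits vector instructions such as AVX. The sketch below illustrates that memory-layout idea in Python; matmul_transposed is a hypothetical name and the code is not the Cython implementation.

```python
import numpy


def matmul_transposed(a, b):
    """Matrix product using a contiguous transposed copy of b (illustrative sketch)."""
    # After this copy, column j of b is stored as the contiguous row bt[j].
    bt = numpy.ascontiguousarray(b.T)
    n, m = a.shape[0], b.shape[1]
    c = numpy.empty((n, m), dtype=numpy.result_type(a, b))
    for i in range(n):
        for j in range(m):
            # a[i] and bt[j] are rows of equal length; both are contiguous
            # when a is C-contiguous, so the dot product scans memory linearly.
            c[i, j] = numpy.dot(a[i], bt[j])
    return c
```

The extra copy of b costs time and memory once, and the hope is that the faster inner loop pays it back over the whole product.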
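Speed up…
numpy is 208.555007 times faster than pure python (ratio of total times; the pure python run used number=10 against number=100 for numpy, so the per-call ratio is about 2085.55).
numpy is 3.708447 times faster than cython.
numpy is 1.775427 times faster than parallelized cython.
numpy is 0.448393 times as fast as avx parallelized cython, that is, the avx version is about 2.23 times faster than numpy.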
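The code that printed these ratios is not shown on this page. The figures are consistent with dividing the total times reported earlier, although the pure python measurement ran 10 times and the others 100 times. A minimal sketch that normalizes each total by its number of runs before comparing is given below; the dictionary layout and the per_call helper are assumptions made for illustration, and the timings are copied from the output above.

```python
def per_call(total, number):
    """Average time of one call, given the total over `number` runs."""
    return total / number


# (total time in seconds, number of runs), copied from the measures above.
numpy_total, numpy_number = 0.027860324999892327, 100
timings = {
    "pure python": (5.810410273999878, 10),
    "cython": (0.10331855200001883, 100),
    "parallelized cython": (0.04946396800005459, 100),
    "avx parallelized cython": (0.012492371999996976, 100),
}

numpy_per_call = per_call(numpy_total, numpy_number)
# A ratio above 1 means numpy is faster; below 1 means numpy is slower.
ratios = {
    name: per_call(total, number) / numpy_per_call
    for name, (total, number) in timings.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times numpy's time per call")
```

With this normalization the three Cython ratios are unchanged, since they share number=100 with numpy, and only the pure python ratio grows by a factor of ten.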
Total running time of the script: (0 minutes 6.022 seconds)