Examples Gallery
Parallelization of a dot product with processes (joblib)
Associativity and matrix multiplication
Compares matrix multiplication implementations with timeit
Parallelization of a dot product with processes (concurrent.futures)
Compares dot implementations (numpy, python, blas)
Measuring CPU performance with a vector sum
Compares filtering implementations (numpy, cython)
Measuring CUDA performance with a vector addition
Measuring CPU performance with a parallelized vector sum
Measuring CUDA performance with a vector sum
Compares dot implementations (numpy, c++, sse, openmp)
Measuring CUDA performance with a vector addition with streams
Compares implementations for a Piecewise Linear
Measuring CPU performance with a parallelized vector sum and AVX
Export a LLAMA model into ONNX
Compares dot implementations (numpy, cython, c++, sse)
Compares matrix multiplication implementations