Examples Gallery
Parallelization of a dot product with processes (joblib)
Compares matrix multiplication implementations with timeit
Associativity and matrix multiplication
Parallelization of a dot product with processes (concurrent.futures)
Compares dot implementations (numpy, python, blas)
Measuring CPU performance with a vector sum
Compares filtering implementations (numpy, cython)
Measuring CUDA performance with a vector addition
Compares dot implementations (numpy, c++, sse, openmp)
Measuring CUDA performance with a vector sum
Measuring CPU performance with a parallelized vector sum
Compares implementations for a Piecewise Linear function
Measuring CUDA performance with a vector addition with streams
Export a LLAMA model into ONNX
Compares dot implementations (numpy, cython, c++, sse)
Measuring CPU performance with a parallelized vector sum and AVX
Compares matrix multiplication implementations