ONNX Benchmarks#
Lists the benchmarks implemented in the Examples Gallery.
CPU#
plot_optim_tree_ensemble#
See TreeEnsemble optimization.
This package implements a custom kernel for
TreeEnsembleRegressor and TreeEnsembleClassifier
and lets users choose the parallelization parameters.
The script tries many values to select the best one
for trees trained with scikit-learn, using a
sklearn.ensemble.RandomForestRegressor.
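The selection loop described above can be sketched in a few lines. This is only an illustration of the idea of timing one run per candidate value and keeping the fastest. The names `measure`, `select_best`, and `fake_runner` are hypothetical, and the stand-in workload replaces a real inference call on a tree ensemble.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def measure(fn: Callable[[], Any], repeat: int = 5) -> float:
    """Return the best wall-clock time (seconds) over `repeat` calls of fn."""
    best = float("inf")
    for _ in range(repeat):
        begin = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - begin)
    return best


def select_best(
    candidates: Iterable[Any],
    make_runner: Callable[[Any], Callable[[], Any]],
    repeat: int = 5,
) -> Tuple[Any, Dict[Any, float]]:
    """Time one runner per candidate value; return the fastest value and all timings."""
    timings = {value: measure(make_runner(value), repeat) for value in candidates}
    best_value = min(timings, key=timings.get)
    return best_value, timings


def fake_runner(value: int) -> Callable[[], None]:
    """Hypothetical stand-in for an inference call configured with `value`."""
    delay = {1: 0.05, 8: 0.001, 64: 0.03}[value]
    return lambda: time.sleep(delay)


best, timings = select_best([1, 8, 64], fake_runner, repeat=2)
print("fastest candidate:", best)
```

Taking the minimum over several repetitions reduces the influence of noise from other processes. A real benchmark would replace `fake_runner` with a function that builds an inference session for the given parallelization setting.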
CUDA#
These tests only work if they are run on a computer with CUDA enabled.
plot_bench_gemm_f8#
See Measuring Gemm performance with different input and output tests.
The script checks the speed of cublasLtMatmul for various types and dimensions on square matrices. The code is implemented in C++ and does not involve onnxruntime. It checks the configurations implemented in cuda_gemm.cu.
- onnx_extended.validation.cuda.cuda_example_py.gemm_benchmark_test(test: int = 0, N: int = 10, m: int = 16, n: int = 16, k: int = 16, lda: int = 16, ldb: int = 16, ldd: int = 16) -> Dict[str, float]#
Benchmark Gemm on CUDA.
Parameters:
  - test: a test configuration (int)
  - N: number of repetitions
  - m, n, k: dimensions of the matrices
  - lda: leading dimension of A
  - ldb: leading dimension of B
  - ldd: leading dimension of the result
Returns: metrics in a dictionary
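A minimal sketch of how the function might be called for square matrices follows. The helpers `square_gemm_kwargs` and `run_square_benchmarks` are hypothetical, and the choice of setting every leading dimension equal to the matrix dimension is an assumption (dense, unpadded storage). The import is attempted lazily because the extension is presumably only present in builds with CUDA enabled.

```python
from typing import Dict, Iterable, List, Optional


def square_gemm_kwargs(dim: int, test: int = 0, N: int = 10) -> Dict[str, int]:
    """Build keyword arguments for a square dim x dim Gemm.

    Assumption: every leading dimension equals dim (dense, unpadded storage).
    """
    return dict(test=test, N=N, m=dim, n=dim, k=dim, lda=dim, ldb=dim, ldd=dim)


def run_square_benchmarks(
    dims: Iterable[int], test: int = 0
) -> Optional[List[Dict[str, float]]]:
    """Run gemm_benchmark_test for each dimension.

    Returns None when the CUDA extension cannot be imported.
    """
    try:
        from onnx_extended.validation.cuda.cuda_example_py import (
            gemm_benchmark_test,
        )
    except ImportError:
        return None
    return [gemm_benchmark_test(**square_gemm_kwargs(d, test=test)) for d in dims]
```

Calling `run_square_benchmarks([16, 64, 256])` on a CUDA machine would then return one metrics dictionary per dimension. The keys of those dictionaries are not listed here and are left to the function itself.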
plot_bench_gemm_ort#
See Measuring performance about Gemm with onnxruntime.
The script checks the speed of cublasLtMatmul called through a custom operator for onnxruntime, implemented in custom_gemm.cu.
plot_profile_gemm_ort#
See Profiles a simple onnx graph including a single Gemm.
The benchmark profiles the execution of Gemm for different types and configurations. That includes a custom operator, only available on CUDA, that calls cublasLtMatmul.
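Once a profile is available, a common step is to aggregate the time spent per operator type. The sketch below assumes the profile is a list of trace events in which node events carry `cat == "Node"`, a duration `dur` (assumed to be in microseconds), and the operator type under `args["op_name"]`. The helper `total_time_per_op`, the sample events, and the operator name `CustomGemm` are made up for illustration and are not measured data.

```python
from collections import defaultdict
from typing import Dict, Iterable


def total_time_per_op(events: Iterable[dict]) -> Dict[str, int]:
    """Sum the 'dur' field of node events, grouped by operator type."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        if event.get("cat") != "Node":
            continue  # skip session-level events such as model loading
        op_name = event.get("args", {}).get("op_name", "unknown")
        totals[op_name] += event.get("dur", 0)
    return dict(totals)


# Synthetic events for illustration only.
sample_events = [
    {"cat": "Session", "name": "model_run", "dur": 500},
    {"cat": "Node", "name": "gemm_1_kernel_time", "dur": 100, "args": {"op_name": "Gemm"}},
    {"cat": "Node", "name": "gemm_2_kernel_time", "dur": 50, "args": {"op_name": "Gemm"}},
    {"cat": "Node", "name": "custom_kernel_time", "dur": 40, "args": {"op_name": "CustomGemm"}},
]

totals = total_time_per_op(sample_events)
print(totals)
```

With onnxruntime, the events would typically be loaded with `json.load` from the file whose name is returned by `InferenceSession.end_profiling()`, after profiling was enabled through `SessionOptions.enable_profiling`.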
No specific provider#
plot_bench_cypy_ort#
See Measuring onnxruntime performance against a cython binding.
The Python package for onnxruntime is implemented with pybind11. It is less efficient than cython, which makes direct calls to the Python C API. The benchmark evaluates that cost.