onnx-extended 0.4.0 documentation
Examples Gallery

  • Measuring CPU performance
  • Using C implementation of operator Conv
  • Measuring onnxruntime performance against a cython binding
  • Evaluating random access for sparse
  • Measuring performance of TfIdfVectorizer
  • Measuring Gemm performance with different input and output tests
  • Gemm Exploration with CUDA
  • Fuse Transpose and Cast on CUDA
  • Compares implementations of Einsum
  • Fusing multiplication operators on CUDA
  • How the float format impacts computation speed
  • TreeEnsemble optimization
  • Optimizing ScatterND operator on CUDA
  • Optimizing Masked ScatterND operator on CUDA
  • TreeEnsemble, dense, and sparse
  • Profiles a simple onnx graph including a single Gemm
  • Measuring Gemm performance with onnxruntime
  • Evaluate different implementations of TreeEnsemble
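Several of the entries above (Gemm with different inputs and outputs, float formats, Gemm with onnxruntime) revolve around timing a matrix product. The sketch below is a minimal, self-contained illustration of that kind of measurement, using only NumPy; the helper name `bench_gemm` is illustrative and not part of onnx-extended.

```python
# Miniature version of a Gemm timing experiment: compare float32 and
# float64 wall-clock times for an n x n matrix product.
import time
import numpy as np


def bench_gemm(dtype, n=256, repeat=10):
    """Return the best wall-clock time over `repeat` runs of an n x n Gemm."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        a @ b  # the Gemm being measured
        times.append(time.perf_counter() - t0)
    return min(times)  # best-of-repeat reduces scheduling noise


if __name__ == "__main__":
    t32 = bench_gemm(np.float32)
    t64 = bench_gemm(np.float64)
    print(f"float32: {t32:.6f}s  float64: {t64:.6f}s")
```

On most CPUs, float32 comes out measurably faster than float64 for large enough `n`; the float-format example in the gallery studies that effect in depth across more types and backends.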

Download all examples in Python source code: auto_examples_python.zip

Download all examples in Jupyter notebooks: auto_examples_jupyter.zip

Gallery generated by Sphinx-Gallery

Copyright © 2023-2024, Xavier Dupré