Custom Kernels for onnxruntime

onnxruntime implements a C API that lets users add a custom implementation for any new operator. This mechanism is described in the onnxruntime documentation, Custom operators. This package implements a few custom operators for CPU and GPU (NVIDIA). The first step is to register an assembly so that onnxruntime can use them.

from onnxruntime import InferenceSession, SessionOptions
from onnx_extended.ortops.optim.cpu import get_ort_ext_libs

opts = SessionOptions()
opts.register_custom_ops_library(get_ort_ext_libs()[0])

sess = InferenceSession(
    "<model_name_or_bytes>", opts, providers=[..., "CPUExecutionProvider"]
)
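When a session needs several of the assemblies listed on this page, the registration step can be repeated for each one. The following is a minimal sketch, not a helper shipped by the package: `collect_ort_ext_libs` and `register_all` are hypothetical names, and the sketch assumes that each submodule exposes `get_ort_ext_libs` as shown on this page and that an unavailable build raises `ImportError` or `AssertionError`.

```python
import importlib

# Submodules named on this page; each one exposes get_ort_ext_libs().
MODULES = (
    "onnx_extended.ortops.tutorial.cpu",
    "onnx_extended.ortops.tutorial.cuda",
    "onnx_extended.ortops.optim.cpu",
)


def collect_ort_ext_libs(modules=MODULES):
    """Returns the paths of every custom-op library that can be located."""
    libs = []
    for name in modules:
        try:
            mod = importlib.import_module(name)
            libs.extend(mod.get_ort_ext_libs())
        except (ImportError, AssertionError):
            # Package not installed, or built without this provider (e.g. CUDA).
            continue
    return libs


def register_all(opts, libs):
    """Registers every library on `opts` (an onnxruntime SessionOptions)."""
    for lib in libs:
        opts.register_custom_ops_library(lib)
    return opts
```

In practice `opts` would be the `SessionOptions` instance created above, and the result would be passed to `InferenceSession` unchanged.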

It supports any onnxruntime C API greater than version:

<<<

from onnx_extended.ortcy.wrap.ortinf import get_ort_c_api_supported_version

print(get_ort_c_api_supported_version())

>>>

    16
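The printed value can be compared with the installed runtime before a library is loaded. The sketch below rests on two assumptions that this page does not state: that an onnxruntime release `1.N` exposes C API version `N`, and that the bound is inclusive. `ort_api_version_from_release` and `is_supported` are hypothetical helpers.

```python
def ort_api_version_from_release(release: str) -> int:
    """Extracts the minor number of a release string, e.g. '1.16.3' -> 16.

    Assumption: for 1.x releases the minor number equals the C API version.
    """
    parts = release.split(".")
    if len(parts) < 2 or parts[0] != "1":
        raise ValueError(f"unexpected onnxruntime release string: {release!r}")
    return int(parts[1])


def is_supported(release: str, min_api_version: int) -> bool:
    """True when the runtime's C API version is at least the required one."""
    return ort_api_version_from_release(release) >= min_api_version


print(is_supported("1.16.3", 16))  # True
print(is_supported("1.15.1", 16))  # False
```

The first argument would typically be `onnxruntime.__version__` and the second the value returned by `get_ort_c_api_supported_version()`.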

The next sections list the operators and assemblies this package implements.

onnx_extended.ortops.tutorial.cpu

<<<

from onnx_extended.ortops.tutorial.cpu import get_ort_ext_libs

print(get_ort_ext_libs())

>>>

    ['/home/xadupre/github/onnx-extended/onnx_extended/ortops/tutorial/cpu/libortops_tutorial_cpu.so']

onnx_extended.ortops.tutorial.cpu.CustomGemmFloat

Implements operator Gemm for the float type. Operator CustomGemmFloat16 implements the same for float 16. CustomGemmFloat8E4M3FN takes Float8E4M3FN inputs and produces float outputs.

Provider

CPUExecutionProvider

Attributes

  • to: quantized type

Inputs

  • A (T1): tensor of type T1

  • B (T1): tensor of type T1

  • C (T2): tensor of type T2 (optional)

  • scaleA (TF): scale for A (optional)

  • scaleB (TF): scale for B (optional)

  • scale (TF): scale for the result (optional)

Outputs

  • Y (T2): result of Gemm

Constraints

  • T1: float, float 16 or Float8E4M3FN

  • T2: float or float 16

  • TF: float

onnx_extended.ortops.tutorial.cpu.DynamicQuantizeLinear

Implements DynamicQuantizeLinear opset 20.

Provider

CPUExecutionProvider

Attributes

  • to: quantized type

Inputs

  • X (T1): tensor of type T1

Outputs

  • Y (T2): quantized X

  • scale (TS): scale

  • zero_point (T2): zero point

Constraints

  • T1: float, float 16

  • TS: float

  • T2: int8, uint8, float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz
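For the uint8 case, the computation defined by the standard ONNX DynamicQuantizeLinear operator can be sketched in plain Python. This is a reference sketch only: it does not cover int8 or the float 8 types selected through `to`, the function name is illustrative, and the handling of an all-zero input is an assumption.

```python
def dynamic_quantize_linear_uint8(x):
    """Reference sketch of uint8 dynamic quantization for a flat list of floats.

    Returns (y, scale, zero_point).
    """
    qmin, qmax = 0, 255
    # The range always includes 0 so that 0.0 is exactly representable.
    x_min = min(0.0, min(x))
    x_max = max(0.0, max(x))
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        # Assumption: an all-zero input maps to zeros with a unit scale.
        scale = 1.0

    def saturate(v):
        return int(min(qmax, max(qmin, v)))

    # Python's round() rounds half to even, like the ONNX definition.
    zero_point = saturate(round(qmin - x_min / scale))
    y = [saturate(round(v / scale) + zero_point) for v in x]
    return y, scale, zero_point


y, scale, zero_point = dynamic_quantize_linear_uint8([-1.0, 0.0, 1.0, 3.0])
print(y, zero_point)  # [0, 64, 128, 255] 64
```

The input range [-1, 3] is mapped onto [0, 255], so 0.0 lands on the zero point 64 and the extremes land on 0 and 255.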

onnx_extended.ortops.tutorial.cpu.MyCustomOp

Computes the sum of two tensors.

Provider

CPUExecutionProvider

Inputs

  • X (T): tensor of type T

  • Y (T): tensor of type T

Outputs

  • Z (T): addition of X, Y

Constraints

  • T: float

onnx_extended.ortops.tutorial.cpu.MyCustomOpWithAttributes

Computes the sum of two tensors plus a constant cst = att_float + att_int64 + att_string[0] + att_tensor[0].

Provider

CPUExecutionProvider

Attributes

  • att_float: a float

  • att_int64: an integer

  • att_tensor: a tensor of any type and shape

  • att_string: a string

Inputs

  • X (T): tensor of type T

  • Y (T): tensor of type T

Outputs

  • Z (T): addition of X, Y + cst

Constraints

  • T: float
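The formula for cst can be made concrete with a small plain-Python sketch. It reads the formula literally and therefore makes two assumptions that this page does not confirm: the first character of att_string contributes its character code, and att_tensor contributes its first element. The function name is illustrative and the tensors are flat lists.

```python
def my_custom_op_with_attributes(x, y, att_float, att_int64, att_string, att_tensor):
    """Adds two flat lists element-wise, plus a constant built from the attributes."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Assumption: the first character of att_string contributes its code point,
    # and att_tensor contributes its first element.
    cst = att_float + att_int64 + ord(att_string[0]) + att_tensor[0]
    return [a + b + cst for a, b in zip(x, y)]


z = my_custom_op_with_attributes(
    [1.0, 2.0],
    [10.0, 20.0],
    att_float=0.5,
    att_int64=2,
    att_string="a",
    att_tensor=[3.0],
)
print(z)  # [113.5, 124.5]
```

Here cst = 0.5 + 2 + 97 + 3.0 = 102.5, since the code point of "a" is 97.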

onnx_extended.ortops.tutorial.cuda

<<<

from onnx_extended.ortops.tutorial.cuda import get_ort_ext_libs

try:
    print(get_ort_ext_libs())
except AssertionError as e:
    print(f"CUDA is not enabled: {e}")

>>>

    ['/home/xadupre/github/onnx-extended/onnx_extended/ortops/tutorial/cuda/libortops_tutorial_cuda.so']

onnx_extended.ortops.tutorial.cuda.CustomGemm

Calls the CUDA library to compute Gemm: \alpha A B + \beta C.

Provider

CUDAExecutionProvider

Inputs

  • A (T): tensor of type T

  • B (T): tensor of type T

  • C (T): tensor of type T

  • D (T): tensor of type T

  • E (T): tensor of type T

Outputs

  • Z (T): \alpha A B + \beta C

Constraints

  • T: float, float16, bfloat16
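A plain-Python reference for the expression \alpha A B + \beta C is useful when checking the output of the operator on small inputs. This sketch only fixes the expected values; it says nothing about how the CUDA kernel is written, it ignores inputs D and E, and it treats alpha and beta as plain function parameters, which is an assumption since this page does not say how they are supplied.

```python
def gemm_reference(a, b, c, alpha=1.0, beta=1.0):
    """Computes alpha * (a @ b) + beta * c for row-major lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != n or any(len(row) != m for row in c):
        raise ValueError("c must have the shape of a @ b")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(m)
        ]
        for i in range(n)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(gemm_reference(a, b, c, alpha=2.0, beta=0.5))
# [[38.5, 44.5], [86.5, 100.5]]
```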

onnx_extended.ortops.optim.cpu

<<<

from onnx_extended.ortops.optim.cpu import get_ort_ext_libs

print(get_ort_ext_libs())

>>>

    ['/home/xadupre/github/onnx-extended/onnx_extended/ortops/optim/cpu/libortops_optim_cpu.so']

onnx_extended.ortops.optim.cpu.DenseToSparse

Converts a dense tensor into a sparse one. All null values are skipped.

Provider

CPUExecutionProvider

Inputs

  • X (T): 2D tensor

Outputs

  • Y (T): 1D tensor

Constraints

  • T: float

onnx_extended.ortops.optim.cpu.SparseToDense

Converts a sparse tensor into a dense one. All missing values are replaced by 0.

Provider

CPUExecutionProvider

Inputs

  • X (T): 1D tensor

Outputs

  • Y (T): 2D tensor

Constraints

  • T: float
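The round trip between a 2D dense tensor and a 1D sparse one can be illustrated in plain Python. This page does not describe how the 1D tensor is laid out, so the layout below (shape first, then one `row, column, value` triple per non-zero entry) is purely hypothetical and is not the package's format. The function names are illustrative as well.

```python
def dense_to_sparse(x):
    """Encodes a 2D list as a flat list: [rows, cols, i0, j0, v0, i1, j1, v1, ...].

    Hypothetical layout for illustration; zeros are skipped.
    """
    rows = len(x)
    cols = len(x[0]) if x else 0
    out = [float(rows), float(cols)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            if v != 0.0:
                out.extend([float(i), float(j), float(v)])
    return out


def sparse_to_dense(s):
    """Rebuilds the 2D list; every entry not listed in `s` becomes 0."""
    rows, cols = int(s[0]), int(s[1])
    y = [[0.0] * cols for _ in range(rows)]
    for p in range(2, len(s), 3):
        y[int(s[p])][int(s[p + 1])] = s[p + 2]
    return y


dense = [[0.0, 1.5, 0.0], [2.0, 0.0, 0.0]]
sparse = dense_to_sparse(dense)
print(sparse)  # [2.0, 3.0, 0.0, 1.0, 1.5, 1.0, 0.0, 2.0]
print(sparse_to_dense(sparse) == dense)  # True
```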

onnx_extended.ortops.optim.cpu.TfIdfVectorizer

Implements TfIdfVectorizer.

Provider

CPUExecutionProvider

Attributes

See onnx TfIdfVectorizer. The implementation does not support string labels. It adds one attribute:

  • sparse: INT64, default is 0, the output and the computation are sparse, see

Inputs

  • X (T1): tensor of type T1

Outputs

  • label (T3): labels of type T3

  • Y (T2): probabilities of type T2

Constraints

  • T1: float, double

  • T2: float, double

  • T3: int64
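As a reminder of what the standard operator computes, the term-frequency part of TfIdfVectorizer can be sketched in plain Python. This is a deliberately reduced version: it handles a single 1D sequence, contiguous n-grams only, no weights, and it replaces the operator's pool and index attributes with one dictionary. `tf_vectorize` and its parameters are illustrative names, and the sketch does not model the `sparse` attribute.

```python
def tf_vectorize(tokens, ngram_to_index, output_size):
    """Counts occurrences of the known n-grams in a 1D sequence of tokens.

    ngram_to_index maps a tuple of tokens to its column in the output.
    """
    y = [0.0] * output_size
    lengths = sorted({len(g) for g in ngram_to_index})
    for n in lengths:
        # Slide a window of size n over the sequence.
        for start in range(len(tokens) - n + 1):
            gram = tuple(tokens[start:start + n])
            col = ngram_to_index.get(gram)
            if col is not None:
                y[col] += 1.0
    return y


pool = {(2,): 0, (3,): 1, (5, 6): 2}
print(tf_vectorize([1, 2, 3, 2, 5, 6, 5, 6], pool, 3))  # [2.0, 1.0, 2.0]
```

Token 2 occurs twice, token 3 once, and the bigram (5, 6) twice, which gives the three counts.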

onnx_extended.ortops.optim.cpu.TreeEnsembleClassifier

Implements TreeEnsembleClassifier.

Provider

CPUExecutionProvider

Attributes

See onnx TreeEnsembleClassifier. The implementation does not support string labels. The only change:

nodes_modes: a single string, the concatenation of all node modes separated by ,
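The change to nodes_modes amounts to joining the list of strings used by the standard operator into one string. A minimal sketch, assuming the usual ONNX mode names and a small hypothetical tree with two branch nodes and three leaves:

```python
# Per-node modes as the standard ONNX operator receives them (a list of strings).
nodes_modes = ["BRANCH_LEQ", "BRANCH_LEQ", "LEAF", "LEAF", "LEAF"]

# Single-string form: the modes joined with a comma.
nodes_modes_joined = ",".join(nodes_modes)
print(nodes_modes_joined)  # BRANCH_LEQ,BRANCH_LEQ,LEAF,LEAF,LEAF

# The original list can be recovered by splitting on the same separator.
assert nodes_modes_joined.split(",") == nodes_modes
```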

Inputs

  • X (T1): tensor of type T1

Outputs

  • label (T3): labels of type T3

  • Y (T2): probabilities of type T2

Constraints

  • T1: float, double

  • T2: float, double

  • T3: int64

onnx_extended.ortops.optim.cpu.TreeEnsembleClassifierSparse

Implements TreeEnsembleClassifier for sparse inputs.

Provider

CPUExecutionProvider

Attributes

See onnx TreeEnsembleClassifier. The implementation does not support string labels. The only change:

nodes_modes: a single string, the concatenation of all node modes separated by ,

Inputs

  • X (T1): tensor of type T1 (sparse)

Outputs

  • label (T3): labels of type T3

  • Y (T2): probabilities of type T2

Constraints

  • T1: float, double

  • T2: float, double

  • T3: int64

onnx_extended.ortops.optim.cpu.TreeEnsembleRegressor

Implements TreeEnsembleRegressor.

Provider

CPUExecutionProvider

Attributes

See onnx TreeEnsembleRegressor. The only change:

nodes_modes: a single string, the concatenation of all node modes separated by ,

Inputs

  • X (T1): tensor of type T1

Outputs

  • Y (T2): prediction of type T2

Constraints

  • T1: float, double

  • T2: float, double

onnx_extended.ortops.optim.cpu.TreeEnsembleRegressorSparse

Implements TreeEnsembleRegressor for sparse inputs.

Provider

CPUExecutionProvider

Attributes

See onnx TreeEnsembleRegressor. The only change:

nodes_modes: a single string, the concatenation of all node modes separated by ,

Inputs

  • X (T1): tensor of type T1 (sparse)

Outputs

  • Y (T2): prediction of type T2

Constraints

  • T1: float, double

  • T2: float, double