Custom Kernels for onnxruntime¶
onnxruntime implements a C API which allows the user to add custom implementations for any new operator. This mechanism is described in the onnxruntime documentation Custom operators. This package implements a couple of custom operators for CPU and GPU (NVIDIA). The first step is to register an assembly so that onnxruntime can use them.
from onnxruntime import InferenceSession, SessionOptions
from onnx_extended.ortops.optim.cpu import get_ort_ext_libs

opts = SessionOptions()
opts.register_custom_ops_library(get_ort_ext_libs()[0])

sess = InferenceSession(
    "<model_name_or_bytes>", opts, providers=[..., "CPUExecutionProvider"]
)
It supports any onnxruntime whose C API version is greater than or equal to:
<<<
from onnx_extended.ortcy.wrap.ortinf import get_ort_c_api_supported_version
print(get_ort_c_api_supported_version())
>>>
16
The following sections list the operators and assemblies this package implements.
onnx_extended.ortops.tutorial.cpu¶
<<<
from onnx_extended.ortops.tutorial.cpu import get_ort_ext_libs
print(get_ort_ext_libs())
>>>
['/home/xadupre/github/onnx-extended/onnx_extended/ortops/tutorial/cpu/libortops_tutorial_cpu.so']
onnx_extended.ortops.tutorial.cpu.CustomGemmFloat¶
Implements operator Gemm for the float type. Operator CustomGemmFloat16 implements the same for float16. CustomGemmFloat8E4M3FN takes Float8E4M3FN inputs and produces float outputs.
Provider
CPUExecutionProvider
Attributes
to: quantized type
Inputs
A (T1): tensor of type T1
B (T1): tensor of type T1
C (T2): tensor of type T2 (optional)
scaleA (TF): scale for A (optional)
scaleB (TF): scale for B (optional)
scale (TF): scale for the result (optional)
Outputs
Y (T2): result of Gemm
Constraints
T1: float, float 16 or Float8E4M3FN
T2: float or float 16
TF: float
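As a minimal sketch (not an executed example), the operator can be placed in a model built with onnx.helper. The custom domain is assumed here to match the module name onnx_extended.ortops.tutorial.cpu; verify it against the registered operators, and note that optional inputs and attributes such as the scales are omitted.

import numpy as np
from onnx import TensorProto
from onnx.helper import (
    make_graph,
    make_model,
    make_node,
    make_opsetid,
    make_tensor_value_info,
)
from onnxruntime import InferenceSession, SessionOptions
from onnx_extended.ortops.tutorial.cpu import get_ort_ext_libs

# Assumption: the custom domain matches the module name.
domain = "onnx_extended.ortops.tutorial.cpu"

A = make_tensor_value_info("A", TensorProto.FLOAT, [None, None])
B = make_tensor_value_info("B", TensorProto.FLOAT, [None, None])
Y = make_tensor_value_info("Y", TensorProto.FLOAT, [None, None])
node = make_node("CustomGemmFloat", ["A", "B"], ["Y"], domain=domain)
model = make_model(
    make_graph([node], "custom_gemm", [A, B], [Y]),
    opset_imports=[make_opsetid(domain, 1), make_opsetid("", 18)],
)

opts = SessionOptions()
opts.register_custom_ops_library(get_ort_ext_libs()[0])
sess = InferenceSession(
    model.SerializeToString(), opts, providers=["CPUExecutionProvider"]
)
a = np.random.rand(4, 4).astype(np.float32)
b = np.random.rand(4, 4).astype(np.float32)
got = sess.run(None, {"A": a, "B": b})[0]  # expected: a @ b (up to float precision)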
onnx_extended.ortops.tutorial.cpu.DynamicQuantizeLinear¶
Implements operator DynamicQuantizeLinear from opset 20.
Provider
CPUExecutionProvider
Attributes
to: quantized type
Inputs
X (T1): tensor of type T1
Outputs
Y (T2): quantized X
scale (TS): scale
zero_point (T2): zero point
Constraints
T1: float, float 16
TS: float
T2: int8, uint8, float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz
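A sketch of how the node might be declared, assuming the same custom domain as above and that the to attribute takes an onnx.TensorProto enum value:

from onnx import TensorProto
from onnx.helper import make_node

# Assumptions: the domain name and the attribute encoding are not confirmed here.
node = make_node(
    "DynamicQuantizeLinear",
    ["X"],
    ["Y", "scale", "zero_point"],
    domain="onnx_extended.ortops.tutorial.cpu",
    to=TensorProto.FLOAT8E4M3FN,  # quantized type
)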
onnx_extended.ortops.tutorial.cpu.MyCustomOp¶
It computes the sum of two tensors.
Provider
CPUExecutionProvider
Inputs
X (T): tensor of type T
Y (T): tensor of type T
Outputs
Z (T): addition of X, Y
Constraints
T: float
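A minimal sketch running the operator end to end, under the same domain assumption as the other tutorial operators:

import numpy as np
from onnx import TensorProto
from onnx.helper import (
    make_graph,
    make_model,
    make_node,
    make_opsetid,
    make_tensor_value_info,
)
from onnxruntime import InferenceSession, SessionOptions
from onnx_extended.ortops.tutorial.cpu import get_ort_ext_libs

# Assumption: the custom domain matches the module name.
domain = "onnx_extended.ortops.tutorial.cpu"

X = make_tensor_value_info("X", TensorProto.FLOAT, [None])
Y = make_tensor_value_info("Y", TensorProto.FLOAT, [None])
Z = make_tensor_value_info("Z", TensorProto.FLOAT, [None])
node = make_node("MyCustomOp", ["X", "Y"], ["Z"], domain=domain)
model = make_model(
    make_graph([node], "add", [X, Y], [Z]),
    opset_imports=[make_opsetid(domain, 1), make_opsetid("", 18)],
)

opts = SessionOptions()
opts.register_custom_ops_library(get_ort_ext_libs()[0])
sess = InferenceSession(
    model.SerializeToString(), opts, providers=["CPUExecutionProvider"]
)
x = np.arange(4).astype(np.float32)
y = np.arange(4).astype(np.float32)
print(sess.run(None, {"X": x, "Y": y})[0])  # expected: x + y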
onnx_extended.ortops.tutorial.cpu.MyCustomOpWithAttributes¶
It computes the sum of two tensors plus a constant equal to cst = att_float + att_int64 + att_string[0] + att_tensor[0].
Provider
CPUExecutionProvider
Attributes
att_float: a float
att_int64: an integer
att_tensor: a tensor of any type and shape
att_string: a string
Inputs
X (T): tensor of type T
Y (T): tensor of type T
Outputs
Z (T): addition of X, Y + cst
Constraints
T: float
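A sketch of the node declaration with the four attributes; the domain name is assumed, and the exact contribution of att_string[0] (presumably the numeric value of the first character) is not confirmed here:

from onnx import TensorProto
from onnx.helper import make_node, make_tensor

# Assumption: the custom domain matches the module name.
node = make_node(
    "MyCustomOpWithAttributes",
    ["X", "Y"],
    ["Z"],
    domain="onnx_extended.ortops.tutorial.cpu",
    att_float=2.5,
    att_int64=7,
    att_string="a",
    att_tensor=make_tensor("att_tensor", TensorProto.FLOAT, [1], [3.0]),
)
# Z = X + Y + cst with cst = att_float + att_int64 + att_string[0] + att_tensor[0]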
onnx_extended.ortops.tutorial.cuda¶
<<<
from onnx_extended.ortops.tutorial.cuda import get_ort_ext_libs

try:
    print(get_ort_ext_libs())
except AssertionError as e:
    print(f"CUDA is not enabled: {e}")
>>>
['/home/xadupre/github/onnx-extended/onnx_extended/ortops/tutorial/cuda/libortops_tutorial_cuda.so']
onnx_extended.ortops.tutorial.cuda.CustomGemm¶
It calls the CUDA library to compute Gemm.
Provider
CUDAExecutionProvider
Inputs
A (T): tensor of type T
B (T): tensor of type T
C (T): tensor of type T
D (T): tensor of type T
E (T): tensor of type T
Outputs
Z (T): result of Gemm
Constraints
T: float, float16, bfloat16
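Registering the CUDA assembly follows the same pattern as the CPU one; a sketch, assuming a CUDA-enabled build (the model placeholder is hypothetical):

from onnxruntime import InferenceSession, SessionOptions
from onnx_extended.ortops.tutorial.cuda import get_ort_ext_libs

opts = SessionOptions()
opts.register_custom_ops_library(get_ort_ext_libs()[0])
# "<model_using_CustomGemm>" stands for a model containing the node.
sess = InferenceSession(
    "<model_using_CustomGemm>",
    opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)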
onnx_extended.ortops.optim.cpu¶
<<<
from onnx_extended.ortops.optim.cpu import get_ort_ext_libs
print(get_ort_ext_libs())
>>>
['/home/xadupre/github/onnx-extended/onnx_extended/ortops/optim/cpu/libortops_optim_cpu.so']
onnx_extended.ortops.optim.cpu.DenseToSparse¶
Converts a dense tensor into a sparse one. All null values are skipped. A round-trip sketch follows the next operator.
Provider
CPUExecutionProvider
Inputs
X (T): 2D tensor
Outputs
Y (T): 1D tensor
Constraints
T: float
onnx_extended.ortops.optim.cpu.SparseToDense¶
Converts a sparse tensor into a dense one. All missing values are replaced by 0.
Provider
CPUExecutionProvider
Inputs
X (T): 1D tensor
Outputs
Y (T): 2D tensor
Constraints
T: float
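A sketch of a round trip through DenseToSparse and SparseToDense, assuming the custom domain matches the module name onnx_extended.ortops.optim.cpu and that the 1D sparse representation carries the original shape:

import numpy as np
from onnx import TensorProto
from onnx.helper import (
    make_graph,
    make_model,
    make_node,
    make_opsetid,
    make_tensor_value_info,
)
from onnxruntime import InferenceSession, SessionOptions
from onnx_extended.ortops.optim.cpu import get_ort_ext_libs

# Assumption: the custom domain matches the module name.
domain = "onnx_extended.ortops.optim.cpu"

X = make_tensor_value_info("X", TensorProto.FLOAT, [None, None])
Y = make_tensor_value_info("Y", TensorProto.FLOAT, [None, None])
nodes = [
    make_node("DenseToSparse", ["X"], ["sparse"], domain=domain),
    make_node("SparseToDense", ["sparse"], ["Y"], domain=domain),
]
model = make_model(
    make_graph(nodes, "roundtrip", [X], [Y]),
    opset_imports=[make_opsetid(domain, 1), make_opsetid("", 18)],
)

opts = SessionOptions()
opts.register_custom_ops_library(get_ort_ext_libs()[0])
sess = InferenceSession(
    model.SerializeToString(), opts, providers=["CPUExecutionProvider"]
)
x = np.array([[0, 1], [2, 0]], dtype=np.float32)
print(sess.run(None, {"X": x})[0])  # expected to match x, zeros restored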
onnx_extended.ortops.optim.cpu.TfIdfVectorizer¶
Implements TfIdfVectorizer.
Provider
CPUExecutionProvider
Attributes
See onnx TfIdfVectorizer. The implementation does not support string labels. It adds one attribute.
sparse: INT64, default is 0; if set to 1, the computation and the output are sparse.
Inputs
X (T1): tensor of type T1
Outputs
Y (T2): output of type T2
Constraints
T1: float, double
T2: float, double
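A typical use is to take a model that already contains a standard TfIdfVectorizer node and move it into the custom domain. The sketch below relies on change_onnx_operator_domain from onnx_extended.ortops.optim.optimize; the domain name is assumed to match the module name, and sparse=1 is assumed to be forwarded as an attribute of the new node:

from onnx_extended.ortops.optim.optimize import change_onnx_operator_domain

# onx is an existing ModelProto containing a TfIdfVectorizer node.
new_onx = change_onnx_operator_domain(
    onx,
    op_type="TfIdfVectorizer",
    op_domain="",  # TfIdfVectorizer lives in the default onnx domain
    new_op_domain="onnx_extended.ortops.optim.cpu",  # assumed domain name
    sparse=1,  # the extra attribute: sparse computation and output
)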
onnx_extended.ortops.optim.cpu.TreeEnsembleClassifier¶
Implements a custom version of operator TreeEnsembleClassifier from ai.onnx.ml.
Provider
CPUExecutionProvider
Attributes
See onnx TreeEnsembleClassifier. The implementation does not support string labels. The only change:
nodes_modes: string concatenation with ,
Inputs
X (T1): tensor of type T1
Outputs
label (T3): labels of type T3
Y (T2): probabilities of type T2
Constraints
T1: float, double
T2: float, double
T3: int64
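The usual workflow takes a model converted with skl2onnx and moves the tree node into the custom domain; the package provides change_onnx_operator_domain in onnx_extended.ortops.optim.optimize for this. A sketch, assuming the helper rewrites nodes_modes into the comma-separated form and that the domain matches the module name; the same pattern applies to the sparse and regressor variants below by changing op_type:

from onnx_extended.ortops.optim.optimize import change_onnx_operator_domain

# onx is a ModelProto holding a TreeEnsembleClassifier node (ai.onnx.ml).
new_onx = change_onnx_operator_domain(
    onx,
    op_type="TreeEnsembleClassifier",
    op_domain="ai.onnx.ml",
    new_op_domain="onnx_extended.ortops.optim.cpu",  # assumed domain name
)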
onnx_extended.ortops.optim.cpu.TreeEnsembleClassifierSparse¶
Implements a custom version of operator TreeEnsembleClassifier for sparse inputs.
Provider
CPUExecutionProvider
Attributes
See onnx TreeEnsembleClassifier. The implementation does not support string labels. The only change:
nodes_modes: string concatenation with ,
Inputs
X (T1): tensor of type T1 (sparse)
Outputs
label (T3): labels of type T3
Y (T2): probabilities of type T2
Constraints
T1: float, double
T2: float, double
T3: int64
onnx_extended.ortops.optim.cpu.TreeEnsembleRegressor¶
Implements a custom version of operator TreeEnsembleRegressor from ai.onnx.ml (the conversion sketch shown after TreeEnsembleClassifier applies with op_type="TreeEnsembleRegressor").
Provider
CPUExecutionProvider
Attributes
See onnx TreeEnsembleRegressor. The only change:
nodes_modes: string concatenation with ,
Inputs
X (T1): tensor of type T1
Outputs
Y (T2): prediction of type T2
Constraints
T1: float, double
T2: float, double
onnx_extended.ortops.optim.cpu.TreeEnsembleRegressorSparse¶
Implements a custom version of operator TreeEnsembleRegressor for sparse inputs.
Provider
CPUExecutionProvider
Attributes
See onnx TreeEnsembleRegressor. The only change:
nodes_modes: string concatenation with ,
Inputs
X (T1): tensor of type T1 (sparse)
Outputs
Y (T2): prediction of type T2
Constraints
T1: float, double
T2: float, double