onnx_diagnostic.reference

ExtendedReferenceEvaluator

class onnx_diagnostic.reference.ExtendedReferenceEvaluator(proto: Any, opsets: Dict[str, int] | None = None, functions: List[ReferenceEvaluator | FunctionProto] | None = None, verbose: int = 0, new_ops: List[type[OpRun]] | None = None, **kwargs)[source]

This class replaces the default Python implementations with custom ones. The evaluator makes it possible to test scenarios that an ONNX backend bound to the official ONNX operator definitions cannot handle, such as optimization patterns involving onnxruntime contrib operators.

from onnx_diagnostic.reference import ExtendedReferenceEvaluator
ref = ExtendedReferenceEvaluator(...)
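
The evaluator exposes the same run API as onnx.reference.ReferenceEvaluator. A minimal sketch with a toy Add model built for illustration:

import numpy as np
import onnx.helper as oh
from onnx import TensorProto
from onnx_diagnostic.reference import ExtendedReferenceEvaluator

# a toy model computing Y = X + X
model = oh.make_model(
    oh.make_graph(
        [oh.make_node("Add", ["X", "X"], ["Y"])],
        "demo",
        [oh.make_tensor_value_info("X", TensorProto.FLOAT, [None])],
        [oh.make_tensor_value_info("Y", TensorProto.FLOAT, [None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
)

ref = ExtendedReferenceEvaluator(model)
got = ref.run(None, {"X": np.array([1.0, 2.0], dtype=np.float32)})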

The class overloads or adds the following operators by default:

<<<

import pprint
from onnx_diagnostic.reference import ExtendedReferenceEvaluator

pprint.pprint(ExtendedReferenceEvaluator.default_ops)

>>>

    [<class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.AddAdd'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.AddMul'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.AddSharedInput'>,
     <class 'onnx_diagnostic.reference.ops.op_attention.Attention'>,
     <class 'onnx_diagnostic.reference.ops.op_average_pool_grad.AveragePoolGrad'>,
     <class 'onnx_diagnostic.reference.ops.op_bias_softmax.BiasSoftmax'>,
     <class 'onnx_diagnostic.reference.ops.op_concat.Concat'>,
     <class 'onnx_diagnostic.reference.ops.op_cast_like.CastLike_15'>,
     <class 'onnx_diagnostic.reference.ops.op_cast_like.CastLike_19'>,
     <class 'onnx_diagnostic.reference.ops.op_complex.ComplexModule'>,
     <class 'onnx_diagnostic.reference.ops.op_constant_of_shape.ConstantOfShape'>,
     <class 'onnx_diagnostic.reference.ops.op_fused_matmul.FusedMatMul'>,
     <class 'onnx_diagnostic.reference.ops.op_gather.Gather'>,
     <class 'onnx_diagnostic.reference.ops.op_gather_elements.GatherElements'>,
     <class 'onnx_diagnostic.reference.ops.op_gather_grad.GatherGrad'>,
     <class 'onnx_diagnostic.reference.ops.op_scatternd_of_shape.MaskedScatterNDOfShape'>,
     <class 'onnx_diagnostic.reference.ops.op_memcpy_host.MemcpyFromHost'>,
     <class 'onnx_diagnostic.reference.ops.op_memcpy_host.MemcpyToHost'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulAdd'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulMul'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulSharedInput'>,
     <class 'onnx_diagnostic.reference.ops.op_mul_sigmoid.MulSigmoid'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulSub'>,
     <class 'onnx_diagnostic.reference.ops.op_negxplus1.NegXplus1'>,
     <class 'onnx_diagnostic.reference.ops.op_qlinear_conv.QLinearConv'>,
     <class 'onnx_diagnostic.reference.ops.op_qlinear_average_pool.QLinearAveragePool'>,
     <class 'onnx_diagnostic.reference.ops.op_quick_gelu.QuickGelu'>,
     <class 'onnx_diagnostic.reference.ops.op_replace_zero.ReplaceZero'>,
     <class 'onnx_diagnostic.reference.ops.op_rotary.Rotary'>,
     <class 'onnx_diagnostic.reference.ops.op_scan.Scan'>,
     <class 'onnx_diagnostic.reference.ops.op_scatter_elements.ScatterElements'>,
     <class 'onnx_diagnostic.reference.ops.op_scatternd_of_shape.ScatterNDOfShape'>,
     <class 'onnx_diagnostic.reference.ops.op_simplified_layer_normalization.SimplifiedLayerNormalization'>,
     <class 'onnx_diagnostic.reference.ops.op_skip_layer_normalization.SkipLayerNormalization'>,
     <class 'onnx_diagnostic.reference.ops.op_slice.Slice_1'>,
     <class 'onnx_diagnostic.reference.ops.op_slice.Slice_10'>,
     <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.SubMul'>,
     <class 'onnx_diagnostic.reference.ops.op_complex.ToComplex'>,
     <class 'onnx_diagnostic.reference.ops.op_transpose_cast.Transpose2DCastFP16'>,
     <class 'onnx_diagnostic.reference.ops.op_transpose_cast.Transpose2DCastFP32'>,
     <class 'onnx_diagnostic.reference.ops.op_tri_matrix.TriMatrix'>]
run(*args, **kwargs)[source]

See onnx.reference.ReferenceEvaluator.run().

OnnxruntimeEvaluator

class onnx_diagnostic.reference.OnnxruntimeEvaluator(proto: str | FunctionProto | ModelProto | GraphProto | NodeProto | OnnxruntimeEvaluator, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool = False, verbose: int = 0, local_functions: Dict[Tuple[str, str], FunctionProto | ModelProto | GraphProto | NodeProto | OnnxruntimeEvaluator] | None = None, ir_version: int = 10, opsets: int | Dict[str, int] | None = None, whole: bool = False)[source]

This class loads an onnx model and then executes its nodes one by one with onnxruntime. It is mostly meant for debugging. A short usage sketch follows the parameter list below.

Parameters:
  • proto – proto or filename

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable nvidia events

  • enable_profiling – see onnxruntime.SessionOptions

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

  • verbose – verbosity

  • local_functions – additional local functions

  • ir_version – ir version to use when unknown

  • opsets – opsets to use when unknown

  • whole – if True, do not split node by node
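
A minimal sketch of a typical debugging session; the model path and input name below are hypothetical:

import numpy as np
from onnx_diagnostic.reference import OnnxruntimeEvaluator

# "model.onnx" and input name "X" are placeholders
ev = OnnxruntimeEvaluator("model.onnx", providers="CPU", verbose=1)
feeds = {"X": np.random.rand(4, 5).astype(np.float32)}
outputs = ev.run(None, feeds)  # list with the model outputs
everything = ev.run(None, feeds, intermediate=True)  # dict with every intermediate result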

property input_names: List[str]

Returns input names.

property input_types: List[TypeProto]

Returns input types.

property output_names: List[str]

Returns output names.

property output_types: List[TypeProto]

Returns output types.

run(outputs: List[str] | None, feed_inputs: Dict[str, Any], intermediate: bool = False) → Dict[str, Any] | List[Any][source]

Runs the model. It only works with numpy arrays.

Parameters:
  • outputs – required outputs or None for all

  • feed_inputs – inputs

  • intermediate – returns all intermediate results instead of only the model outputs

Returns:

outputs, as a list if intermediate is False, as a dictionary if intermediate is True

TorchOnnxEvaluator

class onnx_diagnostic.reference.TorchOnnxEvaluator(proto: FunctionProto | GraphProto | ModelProto, providers: Tuple[str, ...] = ('CPUExecutionProvider',), opsets: Dict[str, int] | None = None, local_functions: Dict[Tuple[str, str], TorchOnnxEvaluator] | None = None, verbose: int = 0)[source]

Torch evaluator for onnx models. The evaluator does not store the original proto it evaluates, to avoid keeping it in memory.

Parameters:
  • proto – a proto

  • providers – where to run the model

  • opsets – needed if proto is a graph

  • local_functions – known local functions

  • verbose – verbosity level

The class holds the following attributes:

  • providers: providers

  • default_device: default torch device

  • constants: all initializers or constants

  • kernels: kernels

  • runtime_info: produced by first_used_last_used

  • last_used: contains the list of intermediate results to remove after every node execution; this keeps the memory from growing too much

  • functions: local functions

The class is not multithreaded. runtime_info gets updated by the class. The list of available kernels is returned by function onnx_diagnostic.reference.torch_evaluator.get_kernels(). Example:

<<<

import onnx
import onnx.helper as oh
import torch
from onnx_diagnostic.helpers import string_type
from onnx_diagnostic.reference import TorchOnnxEvaluator

TFLOAT = onnx.TensorProto.FLOAT

proto = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Sigmoid", ["Y"], ["sy"]),
            oh.make_node("Mul", ["Y", "sy"], ["ysy"]),
            oh.make_node("Mul", ["X", "ysy"], ["final"]),
        ],
        "-nd-",
        [
            oh.make_tensor_value_info("X", TFLOAT, [1, "b", "c"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a", "b", "c"]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, ["a", "b", "c"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

sess = TorchOnnxEvaluator(proto)
feeds = dict(X=torch.rand((4, 5)), Y=torch.rand((4, 5)))
result = sess.run(None, feeds)
print(string_type(result, with_shape=True, with_min_max=True))

>>>

    #1[T1s4x5[0.003964880481362343,0.3176541030406952:A0.12865978828631341]]

Adding verbose=1 shows which kernel is executed:

<<<

import onnx
import onnx.helper as oh
import torch
from onnx_diagnostic.helpers import string_type
from onnx_diagnostic.reference import TorchOnnxEvaluator

TFLOAT = onnx.TensorProto.FLOAT

proto = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Sigmoid", ["Y"], ["sy"]),
            oh.make_node("Mul", ["Y", "sy"], ["ysy"]),
            oh.make_node("Mul", ["X", "ysy"], ["final"]),
        ],
        "-nd-",
        [
            oh.make_tensor_value_info("X", TFLOAT, [1, "b", "c"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a", "b", "c"]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, ["a", "b", "c"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

sess = TorchOnnxEvaluator(proto, verbose=1)
feeds = dict(X=torch.rand((4, 5)), Y=torch.rand((4, 5)))
result = sess.run(None, feeds)
print(string_type(result, with_shape=True, with_min_max=True))

>>>

    +I X: RuntimeValue(name='X', kind=5, shape=(4, 5), value=CT1s4x5[0.0803823471069336,0.9955793023109436:A0.44844584465026854])
    +I Y: RuntimeValue(name='Y', kind=5, shape=(4, 5), value=CT1s4x5[0.06577038764953613,0.9594824314117432:A0.4405914843082428])
    Sigmoid_6(Y) -> sy
    +R sy: RuntimeValue(name='sy', kind=1, shape=(4, 5), is_shape=False, value=CT1s4x5[0.5164366960525513,0.7230181694030762:A0.6060050100088119])
    Mul_1(Y, sy) -> ysy
    +R ysy: RuntimeValue(name='ysy', kind=1, shape=(4, 5), is_shape=False, value=CT1s4x5[0.033966243267059326,0.693723201751709:A0.2866501869633794])
    - clean Y
    - clean sy
    Mul_1(X, ysy) -> final
    +R final: RuntimeValue(name='final', kind=9, shape=(4, 5), is_shape=False, value=CT1s4x5[0.010385666973888874,0.30979418754577637:A0.10924520436674356])
    - clean X
    - clean ysy
    ++ outputs final
    - clean X
    - clean Y
    - clean final
    #1[T1s4x5[0.010385666973888874,0.30979418754577637:A0.10924520436674356]]

It also shows when a result is not needed anymore. In that case, it is deleted to free the memory it takes. The runtime can also execute the kernels of the onnx model on CUDA. It follows the same logic as onnxruntime.InferenceSession: providers=["CUDAExecutionProvider"]. In that case, it is better to move the inputs to CUDA as well. The class tries to move every weight to CUDA but keeps any tensor identified as a shape on CPU. Some bugs may remain, as torch raises an exception when tensors expected to be on the same device are not. The runtime was validated with model arnir0/Tiny-LLM.
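
The CUDA path can be exercised as in the following sketch, assuming a CUDA device is available and reusing the proto built in the examples above:

import torch
from onnx_diagnostic.reference import TorchOnnxEvaluator

# proto is the ModelProto built in the previous examples
sess = TorchOnnxEvaluator(proto, providers=["CUDAExecutionProvider"])
feeds = dict(
    X=torch.rand((1, 5, 6), device="cuda"),
    Y=torch.rand((4, 5, 6), device="cuda"),
)
result = sess.run(None, feeds)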

class IO(name: str, type: int, shape: Tuple[int | str, ...])[source]
get_inputs()[source][source]

Same API as onnxruntime.

get_outputs()[source][source]

Same API as onnxruntime.
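
Input and output metadata can be inspected the same way as with an onnxruntime.InferenceSession. A minimal sketch, assuming sess is the evaluator created in the examples above:

# each element is an IO carrying name, type and shape
for i in sess.get_inputs():
    print(i.name, i.type, i.shape)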

property on_cuda: bool

Tells if the default device is CUDA.

run(outputs: List[str] | None, feeds: Dict[str, Tensor] | Dict[str, ndarray]) → List[Tensor | None] | List[ndarray | None][source]

Runs the ONNX model.

Parameters:
  • outputs – required outputs or None for all

  • feeds – inputs

Returns:

output tensors.

run_with_values(*args: OpRunTensor | None, context: Dict[str, RuntimeValue] | None = None) → OpRunValue | Tuple[OpRunValue, ...][source]

Runs the ONNX model.

Parameters:
  • args – inputs

  • context – local context for the execution of subgraphs

Returns:

output OpRunTensor

Other functions