onnx_diagnostic.reference¶
ExtendedReferenceEvaluator¶
- class onnx_diagnostic.reference.ExtendedReferenceEvaluator(proto: Any, opsets: Dict[str, int] | None = None, functions: List[ReferenceEvaluator | FunctionProto] | None = None, verbose: int = 0, new_ops: List[type[OpRun]] | None = None, **kwargs)[source]¶
This class replaces some python operator implementations with custom ones and adds new operators. The evaluator makes it possible to test scenarios beyond what an onnx backend bound to the official onnx operator definitions can do, such as optimization patterns involving onnxruntime contrib operators.
from onnx_diagnostic.reference import ExtendedReferenceEvaluator

ref = ExtendedReferenceEvaluator(...)
The class overloads or adds the following operators by default:
<<<
import pprint
from onnx_diagnostic.reference import ExtendedReferenceEvaluator

pprint.pprint(ExtendedReferenceEvaluator.default_ops)
>>>
[<class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.AddAdd'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.AddMul'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.AddSharedInput'>,
 <class 'onnx_diagnostic.reference.ops.op_attention.Attention'>,
 <class 'onnx_diagnostic.reference.ops.op_average_pool_grad.AveragePoolGrad'>,
 <class 'onnx_diagnostic.reference.ops.op_bias_softmax.BiasSoftmax'>,
 <class 'onnx_diagnostic.reference.ops.op_concat.Concat'>,
 <class 'onnx_diagnostic.reference.ops.op_cast_like.CastLike_15'>,
 <class 'onnx_diagnostic.reference.ops.op_cast_like.CastLike_19'>,
 <class 'onnx_diagnostic.reference.ops.op_complex.ComplexModule'>,
 <class 'onnx_diagnostic.reference.ops.op_constant_of_shape.ConstantOfShape'>,
 <class 'onnx_diagnostic.reference.ops.op_fused_matmul.FusedMatMul'>,
 <class 'onnx_diagnostic.reference.ops.op_gather.Gather'>,
 <class 'onnx_diagnostic.reference.ops.op_gather_elements.GatherElements'>,
 <class 'onnx_diagnostic.reference.ops.op_gather_grad.GatherGrad'>,
 <class 'onnx_diagnostic.reference.ops.op_scatternd_of_shape.MaskedScatterNDOfShape'>,
 <class 'onnx_diagnostic.reference.ops.op_memcpy_host.MemcpyFromHost'>,
 <class 'onnx_diagnostic.reference.ops.op_memcpy_host.MemcpyToHost'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulAdd'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulMul'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulSharedInput'>,
 <class 'onnx_diagnostic.reference.ops.op_mul_sigmoid.MulSigmoid'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.MulSub'>,
 <class 'onnx_diagnostic.reference.ops.op_negxplus1.NegXplus1'>,
 <class 'onnx_diagnostic.reference.ops.op_qlinear_conv.QLinearConv'>,
 <class 'onnx_diagnostic.reference.ops.op_qlinear_average_pool.QLinearAveragePool'>,
 <class 'onnx_diagnostic.reference.ops.op_quick_gelu.QuickGelu'>,
 <class 'onnx_diagnostic.reference.ops.op_replace_zero.ReplaceZero'>,
 <class 'onnx_diagnostic.reference.ops.op_rotary.Rotary'>,
 <class 'onnx_diagnostic.reference.ops.op_scan.Scan'>,
 <class 'onnx_diagnostic.reference.ops.op_scatter_elements.ScatterElements'>,
 <class 'onnx_diagnostic.reference.ops.op_scatternd_of_shape.ScatterNDOfShape'>,
 <class 'onnx_diagnostic.reference.ops.op_simplified_layer_normalization.SimplifiedLayerNormalization'>,
 <class 'onnx_diagnostic.reference.ops.op_skip_layer_normalization.SkipLayerNormalization'>,
 <class 'onnx_diagnostic.reference.ops.op_slice.Slice_1'>,
 <class 'onnx_diagnostic.reference.ops.op_slice.Slice_10'>,
 <class 'onnx_diagnostic.reference.ops.op_add_add_mul_mul.SubMul'>,
 <class 'onnx_diagnostic.reference.ops.op_complex.ToComplex'>,
 <class 'onnx_diagnostic.reference.ops.op_transpose_cast.Transpose2DCastFP16'>,
 <class 'onnx_diagnostic.reference.ops.op_transpose_cast.Transpose2DCastFP32'>,
 <class 'onnx_diagnostic.reference.ops.op_tri_matrix.TriMatrix'>]
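The evaluator is then used like onnx.reference.ReferenceEvaluator. A minimal sketch, using only an official onnx operator so the result can be checked by hand (the toy Add graph below is made up for illustration):
<<<
import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT

# A toy graph computing Z = X + Y.
proto = oh.make_model(
    oh.make_graph(
        [oh.make_node("Add", ["X", "Y"], ["Z"])],
        "example",
        [
            oh.make_tensor_value_info("X", TFLOAT, ["a"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a"]),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, ["a"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
)

ref = ExtendedReferenceEvaluator(proto)
feeds = dict(X=np.arange(4, dtype=np.float32), Y=np.ones(4, dtype=np.float32))
# run follows the ReferenceEvaluator API: output names (or None for all), then the feeds.
print(ref.run(None, feeds))
>>>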
OnnxruntimeEvaluator¶
- class onnx_diagnostic.reference.OnnxruntimeEvaluator(proto: str | FunctionProto | ModelProto | GraphProto | NodeProto | OnnxruntimeEvaluator, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool = False, verbose: int = 0, local_functions: Dict[Tuple[str, str], FunctionProto | ModelProto | GraphProto | NodeProto | OnnxruntimeEvaluator] | None = None, ir_version: int = 10, opsets: int | Dict[str, int] | None = None, whole: bool = False)[source]¶
This class loads an onnx model and then executes its nodes one by one with onnxruntime. It is mostly meant for debugging.
- Parameters:
proto – proto or filename
session_options – session options
providers – None, “CPU”, “CUDA” or a list of providers
nvtx – enable nvidia NVTX events
enable_profiling – enable profiling
graph_optimization_level – see onnxruntime.SessionOptions
log_severity_level – see onnxruntime.SessionOptions
log_verbosity_level – see onnxruntime.SessionOptions
optimized_model_filepath – see onnxruntime.SessionOptions
disable_aot_function_inlining – see onnxruntime.SessionOptions
use_training_api – use the onnxruntime-training API
verbose – verbosity level
local_functions – additional local functions
ir_version – ir version to use when unknown
opsets – opsets to use when unknown
whole – if True, do not split the model node by node
- run(outputs: List[str] | None, feed_inputs: Dict[str, Any], intermediate: bool = False) → Dict[str, Any] | List[Any][source]¶
Runs the model. It only works with numpy arrays.
- Parameters:
outputs – required outputs or None for all
feed_inputs – inputs
intermediate – if True, returns all intermediate results instead of only the final outputs
- Returns:
outputs, as a list if intermediate is False, as a dictionary if intermediate is True
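A minimal debugging sketch, assuming onnxruntime is installed; the toy two-node graph below is made up for illustration, and intermediate=True returns every intermediate result as documented above:
<<<
import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.reference import OnnxruntimeEvaluator

TFLOAT = onnx.TensorProto.FLOAT

# Two chained nodes so that there is an intermediate result (XY) to inspect.
proto = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["X", "Y"], ["XY"]),
            oh.make_node("Abs", ["XY"], ["Z"]),
        ],
        "example",
        [
            oh.make_tensor_value_info("X", TFLOAT, ["a"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a"]),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, ["a"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
)

sess = OnnxruntimeEvaluator(proto)
feeds = dict(
    X=np.array([-1, 2], dtype=np.float32),
    Y=np.array([3, -4], dtype=np.float32),
)
# intermediate=True returns a dictionary holding every intermediate result.
for name, value in sess.run(None, feeds, intermediate=True).items():
    print(name, value)
>>>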
TorchOnnxEvaluator¶
- class onnx_diagnostic.reference.TorchOnnxEvaluator(proto: FunctionProto | GraphProto | ModelProto, providers: Tuple[str, ...] = ('CPUExecutionProvider',), opsets: Dict[str, int] | None = None, local_functions: Dict[Tuple[str, str], TorchOnnxEvaluator] | None = None, verbose: int = 0)[source]¶
Torch evaluator for onnx models. The evaluator does not store the original proto it executes, to avoid keeping it in memory.
- Parameters:
proto – a proto
providers – where to run the model
opsets – needed if proto is a graph
local_functions – known local functions
verbose – verbosity level
The class holds the following attributes:
providers: the execution providers
default_device: default torch device
constants: all initializers or constants
kernels: the operator kernels, one per node
runtime_info: produced by first_used_last_used
last_used: the list of intermediate results to remove after every node execution; this prevents the memory from growing too much
functions: local functions
The class is not multithreaded. runtime_info gets updated by the class. The list of available kernels is returned by the function onnx_diagnostic.reference.torch_evaluator.get_kernels(). Example:
<<<
import onnx
import onnx.helper as oh
import torch
from onnx_diagnostic.helpers import string_type
from onnx_diagnostic.reference import TorchOnnxEvaluator

TFLOAT = onnx.TensorProto.FLOAT

proto = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Sigmoid", ["Y"], ["sy"]),
            oh.make_node("Mul", ["Y", "sy"], ["ysy"]),
            oh.make_node("Mul", ["X", "ysy"], ["final"]),
        ],
        "-nd-",
        [
            oh.make_tensor_value_info("X", TFLOAT, [1, "b", "c"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a", "b", "c"]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, ["a", "b", "c"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

sess = TorchOnnxEvaluator(proto)
feeds = dict(X=torch.rand((4, 5)), Y=torch.rand((4, 5)))
result = sess.run(None, feeds)
print(string_type(result, with_shape=True, with_min_max=True))
>>>
#1[T1s4x5[0.003964880481362343,0.3176541030406952:A0.12865978828631341]]
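The available kernels mentioned above can also be listed directly; a minimal sketch, assuming get_kernels() takes no arguments:
<<<
import pprint
from onnx_diagnostic.reference.torch_evaluator import get_kernels

# Prints the registry of kernels TorchOnnxEvaluator can execute.
pprint.pprint(get_kernels())
>>>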
Adding verbose=1 shows which kernel is executed:
<<<
import onnx
import onnx.helper as oh
import torch
from onnx_diagnostic.helpers import string_type
from onnx_diagnostic.reference import TorchOnnxEvaluator

TFLOAT = onnx.TensorProto.FLOAT

proto = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Sigmoid", ["Y"], ["sy"]),
            oh.make_node("Mul", ["Y", "sy"], ["ysy"]),
            oh.make_node("Mul", ["X", "ysy"], ["final"]),
        ],
        "-nd-",
        [
            oh.make_tensor_value_info("X", TFLOAT, [1, "b", "c"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a", "b", "c"]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, ["a", "b", "c"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

sess = TorchOnnxEvaluator(proto, verbose=1)
feeds = dict(X=torch.rand((4, 5)), Y=torch.rand((4, 5)))
result = sess.run(None, feeds)
print(string_type(result, with_shape=True, with_min_max=True))
>>>
+I X: RuntimeValue(name='X', kind=5, shape=(4, 5), value=CT1s4x5[0.0803823471069336,0.9955793023109436:A0.44844584465026854])
+I Y: RuntimeValue(name='Y', kind=5, shape=(4, 5), value=CT1s4x5[0.06577038764953613,0.9594824314117432:A0.4405914843082428])
Sigmoid_6(Y) -> sy
+R sy: RuntimeValue(name='sy', kind=1, shape=(4, 5), is_shape=False, value=CT1s4x5[0.5164366960525513,0.7230181694030762:A0.6060050100088119])
Mul_1(Y, sy) -> ysy
+R ysy: RuntimeValue(name='ysy', kind=1, shape=(4, 5), is_shape=False, value=CT1s4x5[0.033966243267059326,0.693723201751709:A0.2866501869633794])
- clean Y
- clean sy
Mul_1(X, ysy) -> final
+R final: RuntimeValue(name='final', kind=9, shape=(4, 5), is_shape=False, value=CT1s4x5[0.010385666973888874,0.30979418754577637:A0.10924520436674356])
- clean X
- clean ysy
++ outputs final
- clean X
- clean Y
- clean final
#1[T1s4x5[0.010385666973888874,0.30979418754577637:A0.10924520436674356]]
It also shows when a result is not needed anymore. In that case, it is deleted to free the memory it takes. The runtime can also execute the kernels of the onnx model on CUDA. It follows the same logic as onnxruntime.InferenceSession: providers=["CUDAExecutionProvider"]. In that case, it is better to move the inputs to CUDA. The class tries to move every weight to CUDA but keeps any tensor identified as a shape on CPU. Some bugs may remain, as torch raises an exception when devices are expected to be the same. The runtime was validated with the model arnir0/Tiny-LLM.
- run(outputs: List[str] | None, feeds: Dict[str, Tensor] | Dict[str, ndarray]) → List[Tensor | None] | List[ndarray | None][source]¶
Runs the ONNX model.
- Parameters:
outputs – required outputs, or None for all of them
feeds – inputs
- Returns:
output tensors.
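A minimal sketch of the CUDA path described above, assuming torch reports an available CUDA device; the toy model mirrors the earlier Sigmoid/Mul example:
<<<
import onnx
import onnx.helper as oh
import torch
from onnx_diagnostic.reference import TorchOnnxEvaluator

TFLOAT = onnx.TensorProto.FLOAT

proto = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Sigmoid", ["Y"], ["sy"]),
            oh.make_node("Mul", ["X", "sy"], ["final"]),
        ],
        "-nd-",
        [
            oh.make_tensor_value_info("X", TFLOAT, ["a", "b"]),
            oh.make_tensor_value_info("Y", TFLOAT, ["a", "b"]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, ["a", "b"])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

# providers follows the onnxruntime.InferenceSession convention.
sess = TorchOnnxEvaluator(proto, providers=("CUDAExecutionProvider",))
# Moving the inputs to CUDA as well avoids device mismatches, as noted above.
feeds = dict(
    X=torch.rand((4, 5), device="cuda"),
    Y=torch.rand((4, 5), device="cuda"),
)
result = sess.run(None, feeds)
print(result[0].device)
>>>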
- run_with_values(*args: OpRunTensor | None, context: Dict[str, RuntimeValue] | None = None) → OpRunValue | Tuple[OpRunValue, ...][source]¶
Runs the ONNX model.
- Parameters:
args – inputs
context – local context for the execution of subgraphs
- Returns:
output OpRunTensor