onnx_diagnostic.helpers.ort_session¶
- class onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]¶
Wraps an onnxruntime.InferenceSession to overload the run method so it accepts and returns numpy.ndarray.
- Parameters:
sess – model or inference session
session_options – options
providers – None, “CPU”, “CUDA” or a list of providers
nvtx – enable nvidia events
graph_optimization_level – see onnxruntime.SessionOptions
log_severity_level – see onnxruntime.SessionOptions
log_verbosity_level – see onnxruntime.SessionOptions
optimized_model_filepath – see onnxruntime.SessionOptions
disable_aot_function_inlining – see onnxruntime.SessionOptions
use_training_api – use onnxruntime-training API
- run(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) List[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None] [source]¶
Calls onnxruntime.InferenceSession.run().
- run_dlpack(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) Tuple[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None, ...] [source]¶
Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of np.ndarray. The output device is CPU even if the outputs are on CUDA.
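A minimal sketch of how the class might be used, assuming a tiny Add model built with onnx.helper; the model, input names and shapes are illustrative only:

import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.helpers.ort_session import InferenceSessionForNumpy

TFLOAT = onnx.TensorProto.FLOAT

# Tiny model computing z = x + y, only used to illustrate the wrapper.
model = oh.make_model(
    oh.make_graph(
        [oh.make_node("Add", ["x", "y"], ["z"])],
        "dummy",
        [
            oh.make_tensor_value_info("x", TFLOAT, [None, None]),
            oh.make_tensor_value_info("y", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("z", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

sess = InferenceSessionForNumpy(model)
feeds = {
    "x": np.random.rand(2, 3).astype(np.float32),
    "y": np.random.rand(2, 3).astype(np.float32),
}
# run accepts and returns numpy arrays directly.
results = sess.run(None, feeds)
print(results[0].shape)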
- class onnx_diagnostic.helpers.ort_session.InferenceSessionForTorch(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]¶
Wraps an onnxruntime.InferenceSession to overload the run method so it accepts and returns torch.Tensor.
- Parameters:
sess – model or inference session
session_options – options
providers – None, “CPU”, “CUDA” or a list of providers
nvtx – enable nvidia events
graph_optimization_level – see onnxruntime.SessionOptions
log_severity_level – see onnxruntime.SessionOptions
log_verbosity_level – see onnxruntime.SessionOptions
optimized_model_filepath – see onnxruntime.SessionOptions
disable_aot_function_inlining – see onnxruntime.SessionOptions
use_training_api – use onnxruntime-training API
- run(output_names: List[str] | None, feeds: Dict[str, Tensor]) Tuple[Tensor, ...] [source]¶
Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor.
- run_dlpack(output_names: List[str] | None, feeds: Dict[str, Tensor]) Tuple[Tensor, ...] [source]¶
Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor. The output device is CPU even if the outputs are on CUDA.
- run_training_api(*inputs, output_names: List[str] | None = None) Tuple[Tensor, ...] [source]¶
Calls the former training API, now also implemented in onnxruntime.
- Parameters:
inputs – list of torch.Tensor
output_names – requested outputs or None for all
- Returns:
tuple of torch.Tensor
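A minimal sketch of the torch wrapper, reusing the same kind of tiny Add model; the model, input names and shapes are illustrative only:

import onnx
import onnx.helper as oh
import torch
from onnx_diagnostic.helpers.ort_session import InferenceSessionForTorch

TFLOAT = onnx.TensorProto.FLOAT

# Tiny model computing z = x + y, only used to illustrate the wrapper.
model = oh.make_model(
    oh.make_graph(
        [oh.make_node("Add", ["x", "y"], ["z"])],
        "dummy",
        [
            oh.make_tensor_value_info("x", TFLOAT, [None, None]),
            oh.make_tensor_value_info("y", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("z", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)

sess = InferenceSessionForTorch(model)
feeds = {"x": torch.rand(2, 3), "y": torch.rand(2, 3)}
# run and run_dlpack accept and return torch.Tensor.
outputs = sess.run(None, feeds)
print(outputs[0].shape)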
- onnx_diagnostic.helpers.ort_session.investigate_onnxruntime_issue(proto: ModelProto | str, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None, onnx_to_session: str | Callable[[ModelProto], InferenceSession] | None = None, feeds: Dict[str, Tensor] | Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]] | None = None, verbose: int = 0, dump_filename: str | None = None, infer_shapes: bool = True, quiet: bool = False)[source]¶
Investigates a crashing model. It adds the nodes one by one to the model and runs it after each addition until it crashes.
- Parameters:
proto – model or inference session
session_options – options
providers – None, “CPU”, “CUDA” or a list of providers
nvtx – enable nvidia events
graph_optimization_level – see onnxruntime.SessionOptions
log_severity_level – see onnxruntime.SessionOptions
log_verbosity_level – see onnxruntime.SessionOptions
optimized_model_filepath – see onnxruntime.SessionOptions
disable_aot_function_inlining – see onnxruntime.SessionOptions
use_training_api – use onnxruntime-training API
onnx_to_session – function used to load a model into an inference session when the automated way implemented in this function is not enough; if it is equal to cpu_session, the callable becomes: lambda model: onnxruntime.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
feeds – if not None, runs onnxruntime with these inputs as well
verbose – verbosity level
dump_filename – if not None, the function dumps the last model run
infer_shapes – run shape inference
quiet – if True, raises an exception; if False, just stops and returns the failing node
The simplest use:
investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=10,
    dump_filename="test_investigate_onnxruntime_issue_callable.onnx",
    onnx_to_session="cpu_session",
)
Full example:
<<<
import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.helpers.ort_session import investigate_onnxruntime_issue

TFLOAT = onnx.TensorProto.FLOAT

model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["x", "y"], ["gggg"]),
            oh.make_node("Add", ["gggg", "z"], ["final"]),
        ],
        "dummy",
        [
            oh.make_tensor_value_info("x", TFLOAT, [None, None]),
            oh.make_tensor_value_info("y", TFLOAT, [None, None]),
            oh.make_tensor_value_info("z", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)
onnx.checker.check_model(model)

feeds = {
    "x": np.random.rand(5, 6).astype(np.float32),
    "y": np.random.rand(5, 6).astype(np.float32),
    "z": np.random.rand(5, 6).astype(np.float32),
}

investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=1,
    graph_optimization_level=False,
    dump_filename="last_issue.onnx",
)
>>>
[investigate_onnxruntime_issue] found 2 nodes and 3 inputs
[investigate_onnxruntime_issue] run shape inference
[investigate_onnxruntime_issue] cls=<class 'onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy'>
[investigate_onnxruntime_issue] + node 0: Add(x, y) -> gggg
[investigate_onnxruntime_issue] + node 1: Add(gggg, z) -> final
[investigate_onnxruntime_issue] done.
- onnx_diagnostic.helpers.ort_session.make_feeds(proto: ModelProto | List[str], inputs: Any, use_numpy: bool = False, copy: bool = False) Dict[str, Tensor | ndarray] [source]¶
Serializes the inputs to produce feeds expected by onnxruntime.InferenceSession.
- Parameters:
proto – onnx model or list of names
inputs – any kind of inputs
use_numpy – if True, converts torch tensors into numpy arrays
copy – a copy is made; this should be the case if the inputs are ingested by OrtValue
- Returns:
feeds dictionary
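A short sketch of how make_feeds might be called; passing a list of input names for proto is documented above, while feeding a tuple of torch tensors as inputs is an assumption used here for illustration:

import torch
from onnx_diagnostic.helpers.ort_session import make_feeds

# Input names can come from an onnx model (proto) or be given as a list.
inputs = (torch.rand(2, 3), torch.rand(2, 3))
feeds = make_feeds(["x", "y"], inputs, use_numpy=True)
# feeds maps "x" and "y" to numpy arrays ready for an InferenceSession.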