onnx_diagnostic.helpers.ort_session

class onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]

Wraps an onnxruntime.InferenceSession and overloads the run method to support numpy.ndarray.

Parameters:
  • sess – model or inference session

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable NVIDIA events

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

run(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) List[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None][source]

Calls onnxruntime.InferenceSession.run().

run_dlpack(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) Tuple[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None, ...][source]

Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of np.ndarray. The output device is CPU even if the outputs are on CUDA.
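
A minimal usage sketch; the tiny Add model (written with the onnx text syntax for brevity) and the random inputs are only illustrative:

import numpy as np
import onnx.parser
from onnx_diagnostic.helpers.ort_session import InferenceSessionForNumpy

# a tiny model with a single Add node
model = onnx.parser.parse_model(
    """
    <ir_version: 9, opset_import: ["" : 18]>
    dummy (float[N, M] x, float[N, M] y) => (float[N, M] z) {
        z = Add(x, y)
    }
    """
)

sess = InferenceSessionForNumpy(model, providers="CPU")
feeds = {
    "x": np.random.rand(2, 3).astype(np.float32),
    "y": np.random.rand(2, 3).astype(np.float32),
}
got = sess.run(None, feeds)  # list of numpy arrays, here [x + y]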

class onnx_diagnostic.helpers.ort_session.InferenceSessionForTorch(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]

Wraps an onnxruntime.InferenceSession and overloads the run method to support torch.Tensor.

Parameters:
  • sess – model or inference session

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable NVIDIA events

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

run(output_names: List[str] | None, feeds: Dict[str, Tensor]) Tuple[Tensor, ...][source]

Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor.

run_dlpack(output_names: List[str] | None, feeds: Dict[str, Tensor]) Tuple[Tensor, ...][source]

Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor. The output device is CPU even if the outputs are on CUDA.
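
A minimal sketch with torch tensors, reusing the same kind of tiny Add model (illustrative only):

import torch
import onnx.parser
from onnx_diagnostic.helpers.ort_session import InferenceSessionForTorch

model = onnx.parser.parse_model(
    """
    <ir_version: 9, opset_import: ["" : 18]>
    dummy (float[N, M] x, float[N, M] y) => (float[N, M] z) {
        z = Add(x, y)
    }
    """
)

sess = InferenceSessionForTorch(model, providers="CPU")
feeds = {"x": torch.rand(2, 3), "y": torch.rand(2, 3)}
outputs = sess.run(None, feeds)             # tuple of torch.Tensor
cpu_outputs = sess.run_dlpack(None, feeds)  # outputs are on CPU even when computed on CUDA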

run_training_api(*inputs, output_names: List[str] | None = None) Tuple[Tensor, ...][source]

Calls the former training API, which is now implemented in onnxruntime as well.

Parameters:
  • inputs – list of torch.Tensor

  • output_names – requested outputs or None for all

Returns:

tuple of torch.Tensor
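
A hedged sketch of the call shape only, assuming a session built with use_training_api=True (whether this setup is sufficient depends on the model):

import torch
from onnx_diagnostic.helpers.ort_session import InferenceSessionForTorch

# assumption: `model` is a ModelProto with two float inputs x, y, as in the sketches above
sess = InferenceSessionForTorch(model, use_training_api=True)
outputs = sess.run_training_api(torch.rand(2, 3), torch.rand(2, 3))                       # all outputs
outputs = sess.run_training_api(torch.rand(2, 3), torch.rand(2, 3), output_names=["z"])   # requested outputs only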

onnx_diagnostic.helpers.ort_session.investigate_onnxruntime_issue(proto: ModelProto | str, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None, onnx_to_session: str | Callable[[ModelProto], InferenceSession] | None = None, feeds: Dict[str, Tensor] | Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]] | None = None, verbose: int = 0, dump_filename: str | None = None, infer_shapes: bool = True, quiet: bool = False)[source]

Investigates a crashing model. It adds the nodes to the model one by one and runs it until it crashes, in order to identify the failing node.

Parameters:
  • proto – model or inference session

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable NVIDIA events

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

  • onnx_to_session – function loading a model into an inference session when the automated way implemented in this function is not enough; if it is equal to cpu_session, the callable becomes: lambda model: onnxruntime.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])

  • feeds – run onnxruntime as well

  • verbose – verbosity level

  • dump_filename – if not None, the function dumps the last model run

  • infer_shapes – run shape inference

  • quiet – if True, raises an exception; if False, just stops and returns the failing node

The simplest use:

investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=10,
    dump_filename="test_investigate_onnxruntime_issue_callable.onnx",
    onnx_to_session="cpu_session",
)

Full example:

<<<

import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.helpers.ort_session import investigate_onnxruntime_issue

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["x", "y"], ["gggg"]),
            oh.make_node("Add", ["gggg", "z"], ["final"]),
        ],
        "dummy",
        [
            oh.make_tensor_value_info("x", TFLOAT, [None, None]),
            oh.make_tensor_value_info("y", TFLOAT, [None, None]),
            oh.make_tensor_value_info("z", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)
onnx.checker.check_model(model)
feeds = {
    "x": np.random.rand(5, 6).astype(np.float32),
    "y": np.random.rand(5, 6).astype(np.float32),
    "z": np.random.rand(5, 6).astype(np.float32),
}
investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=1,
    graph_optimization_level=False,
    dump_filename="last_issue.onnx",
)

>>>

    [investigate_onnxruntime_issue] found 2 nodes and 3 inputs
    [investigate_onnxruntime_issue] run shape inference
    [investigate_onnxruntime_issue] cls=<class 'onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy'>
    [investigate_onnxruntime_issue] + node 0: Add(x, y) -> gggg
    [investigate_onnxruntime_issue] + node 1: Add(gggg, z) -> final
    [investigate_onnxruntime_issue] done.

onnx_diagnostic.helpers.ort_session.make_feeds(proto: ModelProto | List[str], inputs: Any, use_numpy: bool = False, copy: bool = False) Dict[str, Tensor | ndarray][source]

Serializes the inputs to produce feeds expected by onnxruntime.InferenceSession.

Parameters:
  • proto – onnx model or list of names

  • inputs – any kind of inputs

  • use_numpy – if True, converts torch tensors into numpy arrays

  • copy – if True, a copy is made; this should be the case if the inputs are ingested by OrtValue

Returns:

feeds dictionary
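
A minimal sketch, assuming a flat tuple of tensors is matched positionally against the given input names (the names and shapes below are only illustrative):

import torch
from onnx_diagnostic.helpers.ort_session import make_feeds

# proto may be a ModelProto or, as here, simply the list of input names
feeds = make_feeds(["x", "y"], (torch.rand(5, 6), torch.rand(5, 6)), use_numpy=True)
# feeds is expected to be {"x": <numpy array>, "y": <numpy array>},
# ready to be passed to onnxruntime.InferenceSession.run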