onnx_diagnostic.helpers.ort_session

class onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]

Wraps an onnxruntime.InferenceSession and overloads the run method to support numpy.ndarray.

Parameters:
  • sess – model or inference session

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable NVIDIA events

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

run(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) List[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None][source]

Calls onnxruntime.InferenceSession.run().

run_dlpack(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]) Tuple[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None, ...][source]

Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of np.ndarray. The output device is CPU even if the outputs are on CUDA.
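
A minimal usage sketch; the tiny Add model (written with the onnx text syntax for brevity) and the random inputs are only illustrative:

import numpy as np
import onnx.parser
from onnx_diagnostic.helpers.ort_session import InferenceSessionForNumpy

# a tiny model with a single Add node
model = onnx.parser.parse_model(
    """
    <ir_version: 9, opset_import: ["" : 18]>
    dummy (float[N, M] x, float[N, M] y) => (float[N, M] z) {
        z = Add(x, y)
    }
    """
)

sess = InferenceSessionForNumpy(model, providers="CPU")
feeds = {
    "x": np.random.rand(2, 3).astype(np.float32),
    "y": np.random.rand(2, 3).astype(np.float32),
}
got = sess.run(None, feeds)  # list of numpy arrays, here [x + y]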

class onnx_diagnostic.helpers.ort_session.InferenceSessionForTorch(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]

Wraps an onnxruntime.InferenceSession and overloads the run method to support torch.Tensor.

Parameters:
  • sess – model or inference session

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable NVIDIA events

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

run(output_names: List[str] | None, feeds: Dict[str, Tensor]) Tuple[Tensor, ...][source]

Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor.

run_dlpack(output_names: List[str] | None, feeds: Dict[str, Tensor]) Tuple[Tensor, ...][source]

Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor. The output device is CPU even if the outputs are on CUDA.
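
A minimal sketch with torch tensors, reusing the same kind of tiny Add model (illustrative only):

import torch
import onnx.parser
from onnx_diagnostic.helpers.ort_session import InferenceSessionForTorch

model = onnx.parser.parse_model(
    """
    <ir_version: 9, opset_import: ["" : 18]>
    dummy (float[N, M] x, float[N, M] y) => (float[N, M] z) {
        z = Add(x, y)
    }
    """
)

sess = InferenceSessionForTorch(model, providers="CPU")
feeds = {"x": torch.rand(2, 3), "y": torch.rand(2, 3)}
outputs = sess.run(None, feeds)             # tuple of torch.Tensor
cpu_outputs = sess.run_dlpack(None, feeds)  # outputs are on CPU even when computed on CUDA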

run_training_api(*inputs, output_names: List[str] | None = None) Tuple[Tensor, ...][source]

Calls the former training API, which is now implemented in onnxruntime as well.

Parameters:
  • inputs – list of torch.Tensor

  • output_names – requested outputs or None for all

Returns:

tuple of torch.Tensor
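
A hedged sketch of the call shape only, assuming a session built with use_training_api=True (whether this setup is sufficient depends on the model):

import torch
from onnx_diagnostic.helpers.ort_session import InferenceSessionForTorch

# assumption: `model` is a ModelProto with two float inputs x, y, as in the sketches above
sess = InferenceSessionForTorch(model, use_training_api=True)
outputs = sess.run_training_api(torch.rand(2, 3), torch.rand(2, 3))                       # all outputs
outputs = sess.run_training_api(torch.rand(2, 3), torch.rand(2, 3), output_names=["z"])   # requested outputs only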

onnx_diagnostic.helpers.ort_session.investigate_onnxruntime_issue(proto: ModelProto | str, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None, onnx_to_session: str | Callable[[ModelProto], InferenceSession] | None = None, feeds: Dict[str, Tensor] | Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]] | None = None, verbose: int = 0, dump_filename: str | None = None, infer_shapes: bool = True, quiet: bool = False)[source]

Investigates a crashing model. It adds the nodes to the model one by one and runs it until it crashes, in order to identify the failing node.

Parameters:
  • proto – model or inference session

  • session_options – options

  • providers – None, “CPU”, “CUDA” or a list of providers

  • nvtx – enable NVIDIA events

  • graph_optimization_level – see onnxruntime.SessionOptions

  • log_severity_level – see onnxruntime.SessionOptions

  • log_verbosity_level – see onnxruntime.SessionOptions

  • optimized_model_filepath – see onnxruntime.SessionOptions

  • disable_aot_function_inlining – see onnxruntime.SessionOptions

  • use_training_api – use the onnxruntime-training API

  • onnx_to_session – function loading a model into an inference session when the automated way implemented in this function is not enough; if it is equal to cpu_session, the callable becomes: lambda model: onnxruntime.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])

  • feeds – run onnxruntime as well

  • verbose – verbosity level

  • dump_filename – if not None, the function dumps the last model run

  • infer_shapes – run shape inference

  • quiet – if True, raises an exception; if False, just stops and returns the failing node

The simplest use:

investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=10,
    dump_filename="test_investigate_onnxruntime_issue_callable.onnx",
    onnx_to_session="cpu_session",
)

Full example:

<<<

import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.helpers.ort_session import investigate_onnxruntime_issue

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["x", "y"], ["gggg"]),
            oh.make_node("Add", ["gggg", "z"], ["final"]),
        ],
        "dummy",
        [
            oh.make_tensor_value_info("x", TFLOAT, [None, None]),
            oh.make_tensor_value_info("y", TFLOAT, [None, None]),
            oh.make_tensor_value_info("z", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)
onnx.checker.check_model(model)
feeds = {
    "x": np.random.rand(5, 6).astype(np.float32),
    "y": np.random.rand(5, 6).astype(np.float32),
    "z": np.random.rand(5, 6).astype(np.float32),
}
investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=1,
    graph_optimization_level=False,
    dump_filename="last_issue.onnx",
)

>>>

    [investigate_onnxruntime_issue] found 2 nodes and 3 inputs
    [investigate_onnxruntime_issue] run shape inference
    [investigate_onnxruntime_issue] cls=<class 'onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy'>
    [investigate_onnxruntime_issue] + node 0: Add(x, y) -> gggg
    [investigate_onnxruntime_issue] + node 1: Add(gggg, z) -> final
    [investigate_onnxruntime_issue] done.

onnx_diagnostic.helpers.ort_session.make_feeds(proto: ModelProto | List[str], inputs: Any, use_numpy: bool = False, copy: bool = False) Dict[str, Tensor | ndarray][source]

Serializes the inputs to produce feeds expected by onnxruntime.InferenceSession.

Parameters:
  • proto – onnx model or list of names

  • inputs – any kind of inputs

  • use_numpy – if True, converts torch tensors into numpy arrays

  • copy – if True, a copy is made; this should be the case if the inputs are ingested by OrtValue

Returns:

feeds dictionary
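
A minimal sketch, assuming a flat tuple of tensors is matched positionally against the given input names (the names and shapes below are only illustrative):

import torch
from onnx_diagnostic.helpers.ort_session import make_feeds

# proto may be a ModelProto or, as here, simply the list of input names
feeds = make_feeds(["x", "y"], (torch.rand(5, 6), torch.rand(5, 6)), use_numpy=True)
# feeds is expected to be {"x": <numpy array>, "y": <numpy array>},
# ready to be passed to onnxruntime.InferenceSession.run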