onnx_diagnostic.helpers.ort_session¶
- class onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]¶
- Wraps an onnxruntime.InferenceSession to overload the method run so that it supports numpy.ndarray.
- Parameters:
- sess – model or inference session
- session_options – session options
- providers – None, “CPU”, “CUDA” or a list of providers
- nvtx – enable NVIDIA NVTX events
- enable_profiling – enable profiling (see onnxruntime.SessionOptions)
- graph_optimization_level – see onnxruntime.SessionOptions
- log_severity_level – see onnxruntime.SessionOptions
- log_verbosity_level – see onnxruntime.SessionOptions
- optimized_model_filepath – see onnxruntime.SessionOptions
- disable_aot_function_inlining – see onnxruntime.SessionOptions
- use_training_api – use the onnxruntime-training API
 
 - run(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]]) → List[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str] | None][source]¶
 - run_dlpack(output_names: List[str] | None, feeds: Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]]) → Tuple[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str] | None, ...][source]¶
- Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of np.ndarray. The output device is CPU even if the outputs are on CUDA.
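- A minimal usage sketch (the model path "model.onnx" and the input name "x" are illustrative assumptions, not part of the API):

```python
import numpy as np
from onnx_diagnostic.helpers.ort_session import InferenceSessionForNumpy

# hypothetical model file with a single float input named "x"
sess = InferenceSessionForNumpy("model.onnx", providers="CPU")
feeds = {"x": np.random.rand(2, 3).astype(np.float32)}
results = sess.run(None, feeds)         # outputs as numpy.ndarray
results = sess.run_dlpack(None, feeds)  # run_dlpack returns outputs on CPU even when computed on CUDA
```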
 
- class onnx_diagnostic.helpers.ort_session.InferenceSessionForTorch(sess: ModelProto | str | InferenceSession, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None)[source]¶
- Wraps an onnxruntime.InferenceSession to overload the method run so that it supports torch.Tensor.
- Parameters:
- sess – model or inference session
- session_options – session options
- providers – None, “CPU”, “CUDA” or a list of providers
- nvtx – enable NVIDIA NVTX events
- enable_profiling – enable profiling (see onnxruntime.SessionOptions)
- graph_optimization_level – see onnxruntime.SessionOptions
- log_severity_level – see onnxruntime.SessionOptions
- log_verbosity_level – see onnxruntime.SessionOptions
- optimized_model_filepath – see onnxruntime.SessionOptions
- disable_aot_function_inlining – see onnxruntime.SessionOptions
- use_training_api – use the onnxruntime-training API
 
 - run(output_names: List[str] | None, feeds: Dict[str, Tensor]) → Tuple[Tensor, ...][source]¶
- Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor.
 - run_dlpack(output_names: List[str] | None, feeds: Dict[str, Tensor]) → Tuple[Tensor, ...][source]¶
- Same as onnxruntime.InferenceSession.run() except that feeds is a dictionary of torch.Tensor. The output device is CPU even if the outputs are on CUDA.
 - run_training_api(*inputs, output_names: List[str] | None = None) → Tuple[Tensor, ...][source]¶
- Calls the former training API, now implemented in onnxruntime as well.
- Parameters:
- inputs – list of torch.Tensor
- output_names – requested outputs or None for all

- Returns:
- tuple of torch.Tensor
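- A minimal usage sketch (the model path "model.onnx", the input name "x", and the CUDA device are illustrative assumptions, not part of the API):

```python
import torch
from onnx_diagnostic.helpers.ort_session import InferenceSessionForTorch

# hypothetical model file with a single float input named "x"
sess = InferenceSessionForTorch("model.onnx", providers="CUDA")
feeds = {"x": torch.rand(2, 3, dtype=torch.float32, device="cuda")}
results = sess.run(None, feeds)         # tuple of torch.Tensor
results = sess.run_dlpack(None, feeds)  # outputs come back on CPU even if computed on CUDA
```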
 
 
- onnx_diagnostic.helpers.ort_session.investigate_onnxruntime_issue(proto: ModelProto | str, session_options: SessionOptions | None = None, providers: str | List[str] | None = None, nvtx: bool = False, enable_profiling: bool = False, graph_optimization_level: GraphOptimizationLevel | bool = None, log_severity_level: int | None = None, log_verbosity_level: int | None = None, optimized_model_filepath: str | None = None, disable_aot_function_inlining: bool | None = None, use_training_api: bool | None = None, onnx_to_session: str | Callable[[ModelProto], InferenceSession] | None = None, feeds: Dict[str, Tensor] | Dict[str, Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]] | None = None, verbose: int = 0, dump_filename: str | None = None, infer_shapes: bool = True, quiet: bool = False)[source]¶
- Investigates a crashing model by adding its nodes back one by one and running the partial model after each addition until it crashes.
- Parameters:
- proto – model or inference session
- session_options – session options
- providers – None, “CPU”, “CUDA” or a list of providers
- nvtx – enable NVIDIA NVTX events
- enable_profiling – enable profiling (see onnxruntime.SessionOptions)
- graph_optimization_level – see onnxruntime.SessionOptions
- log_severity_level – see onnxruntime.SessionOptions
- log_verbosity_level – see onnxruntime.SessionOptions
- optimized_model_filepath – see onnxruntime.SessionOptions
- disable_aot_function_inlining – see onnxruntime.SessionOptions
- use_training_api – use the onnxruntime-training API
- onnx_to_session – function loading a model into an inference session, used when the automated way implemented in this function is not enough; if it equals cpu_session, the callable becomes: lambda model: onnxruntime.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
- feeds – if provided, runs onnxruntime on these inputs as well
- verbose – verbosity level
- dump_filename – if not None, the function dumps the last model run
- infer_shapes – run shape inference
- quiet – if True, raises an exception; if False, just stops and returns the failing node
 
 - The simplest use:

```python
investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=10,
    dump_filename="test_investigate_onnxruntime_issue_callable.onnx",
    onnx_to_session="cpu_session",
)
```

 - Full example:

```python
import numpy as np
import onnx
import onnx.helper as oh
from onnx_diagnostic.helpers.ort_session import investigate_onnxruntime_issue

TFLOAT = onnx.TensorProto.FLOAT

model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["x", "y"], ["gggg"]),
            oh.make_node("Add", ["gggg", "z"], ["final"]),
        ],
        "dummy",
        [
            oh.make_tensor_value_info("x", TFLOAT, [None, None]),
            oh.make_tensor_value_info("y", TFLOAT, [None, None]),
            oh.make_tensor_value_info("z", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("final", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)
onnx.checker.check_model(model)

feeds = {
    "x": np.random.rand(5, 6).astype(np.float32),
    "y": np.random.rand(5, 6).astype(np.float32),
    "z": np.random.rand(5, 6).astype(np.float32),
}
investigate_onnxruntime_issue(
    model,
    feeds=feeds,
    verbose=1,
    graph_optimization_level=False,
    dump_filename="last_issue.onnx",
)
```

 - Output:

```text
[investigate_onnxruntime_issue] found 2 nodes and 3 inputs
[investigate_onnxruntime_issue] run shape inference
[investigate_onnxruntime_issue] cls=<class 'onnx_diagnostic.helpers.ort_session.InferenceSessionForNumpy'>
[investigate_onnxruntime_issue] + node 0: Add(x, y) -> gggg
[investigate_onnxruntime_issue] + node 1: Add(gggg, z) -> final
[investigate_onnxruntime_issue] done.
```