onnx_diagnostic.torch_onnx.sbs
- onnx_diagnostic.torch_onnx.sbs.run_aligned(ep: ExportedProgram, onx: ModelProto | FunctionProto, run_cls: Callable[[ModelProto | FunctionProto | GraphProto | NodeProto], List[ndarray | Tensor]], args: Tuple[Tensor, ...] | None = None, kwargs: Dict[str, Any] | None = None, use_tensor: bool = False, atol: float | None = None, rtol: float | None = None, verbose: int = 0, exc: bool = True, reset_names: List[str] | None = None, replay_configuration: ReplayConfiguration | None = None, run_onnx_with_torch_inputs: bool = False) → Iterator[RunAlignedRecord]
Runs the exported program and the onnx proto side by side and looks for discrepancies. The function matches results by name, so it assumes the exported program and the onnx model use the same names for equivalent results.
- Parameters:
ep – exported program
onx – model or function proto
run_cls – defines the runtime to use for this task
args – input args
kwargs – input kwargs
use_tensor – use torch tensors instead of numpy arrays for the onnx runtime
atol – absolute tolerance
rtol – relative tolerance
verbose – verbosity level
exc – stops if an exception is raised
reset_names – list of names; if a result name falls into that set, the onnx execution takes the torch output instead of its own result
replay_configuration – configuration to let the user dump any problematic piece of the onnx graph to replay and investigate later, see ReplayConfiguration (onnx_diagnostic.torch_onnx.sbs.ReplayConfiguration)
run_onnx_with_torch_inputs – run an onnx operator with torch results if they are available
- Returns:
an iterator of RunAlignedRecord
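The name-based matching described above can be pictured with a small sketch. It is not the implementation of run_aligned: the helper align_by_name and the sample values are hypothetical, and plain Python lists stand in for tensors. It only shows why a result whose name exists on one side only cannot be compared.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def align_by_name(
    torch_results: Dict[str, List[float]],
    onnx_results: Dict[str, List[float]],
) -> Iterator[Tuple[str, Optional[float]]]:
    """Yields (name, max absolute difference) for every torch result.

    The difference is None when the onnx side has no result with that name.
    Hypothetical helper, for illustration only.
    """
    for name, expected in torch_results.items():
        got = onnx_results.get(name)
        if got is None:
            # No counterpart with the same name: nothing to compare.
            yield name, None
            continue
        yield name, max((abs(a - b) for a, b in zip(expected, got)), default=0.0)


# Made-up intermediate results, keyed by result name.
torch_side = {"abs_1": [1.0, 2.0], "exp": [2.718, 7.389], "add": [3.718, 8.389]}
onnx_side = {"abs_1": [1.0, 2.0], "exp": [2.718, 7.390]}

report = dict(align_by_name(torch_side, onnx_side))
for name, diff in report.items():
    print(name, "no match" if diff is None else f"maxabs={diff:.3f}")
```

In this made-up data the result add has no onnx counterpart, so it is reported without a discrepancy, which is the situation the naming assumption is meant to avoid.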
Example:
<<<
import pandas
import torch
from onnx_diagnostic.reference import (
    # This can be replaced by any runtime taking NodeProto as an input.
    ExtendedReferenceEvaluator as ReferenceEvaluator,
)
from onnx_diagnostic.torch_onnx.sbs import run_aligned


class Model(torch.nn.Module):
    def forward(self, x):
        ry = x.abs()
        rz = ry.exp()
        rw = rz + 1
        ru = rw.log() + rw
        return ru


x = torch.randn((5, 4))
Model()(x)  # to make sure the model is running
ep = torch.export.export(
    Model(), (x,), dynamic_shapes=({0: torch.export.Dim("batch")},)
)
onx = torch.onnx.export(
    Model(), (x,), dynamic_shapes=({0: torch.export.Dim("batch")},)
).model_proto
results = list(
    run_aligned(ep, onx, ReferenceEvaluator, (x,), atol=1e-5, rtol=1e-5, verbose=1)
)
print("------------")
print("final results")
df = pandas.DataFrame(results)
df = df.apply(lambda col: col.fillna("") if col.dtype == "object" else col)
print(df)
>>>
[torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
/usr/lib/python3.12/copyreg.py:99: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
  return cls.__new__(cls, *args)
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
[run_aligned] run_cls=<class 'onnx_diagnostic.reference.evaluator.ExtendedReferenceEvaluator'>
[run_aligned] run_cls_kwargs={'opsets': {'': 20}, 'verbose': 0}
[run_aligned] ep: model has 0 torch constants or weights.
[run_aligned] ep: walks through 7 nodes from torch
[run_aligned] ep: found 0 torch constants or weights.
[run_aligned] ep: found inputs ['x']
[run_aligned] ep: found outputs ['add_1']
[run_aligned] nx: walks through 5 nodes from onnx
[run_aligned] args: (CT1s5x4,)
[run_aligned] kwargs: None
[run_aligned] onnx: #1[CT1s5x4]
[run_aligned] nx: walks through 1 onnx inputs
[run_aligned-nx] +inp: x: CT1s5x4
[run_aligned] nx: handles 1 initializers from onnx
[run_aligned] nx: handled 2 initializers from onnx
[run_aligned] nx: memory cpu 0.000 Mb
[run_aligned] nx: memory cuda 0.000 Mb
[run_aligned] nx: 2 constants
[run_aligned] nx: 1 inputs
[run_aligned] nx: 1 outputs
[run_aligned] bo: 1 outputs
[run_aligned] run_cls_kwargs={'opsets': {'': 20}, 'verbose': 0}
[run_aligned] ep: starts side-by-side with 7 fx nodes and 5 onnx nodes
  0%|          | 0/12 [00:00<?, ?it/s]
ep 0/7 nx 0/5 yielded=0 maxabs=0.000 #inf=0 #nan=0:   0%|          | 0/12 [00:00<?, ?it/s]
ep 1/7 nx 0/5 yielded=0 maxabs=0.000 #inf=0 #nan=0:   8%|8         | 1/12 [00:00<00:00, 7557.30it/s]
ep 1/7 nx 0/5 yielded=0 maxabs=0.000 #inf=0 #nan=0:  17%|#6        | 2/12 [00:00<00:00, 6052.39it/s]
ep 2/7 nx 1/5 yielded=1 maxabs=0.000 #inf=0 #nan=0:  25%|##5       | 3/12 [00:00<00:00, 2433.36it/s]
ep 2/7 nx 1/5 yielded=1 maxabs=0.000 #inf=0 #nan=0:  33%|###3      | 4/12 [00:00<00:00, 2883.67it/s]
ep 3/7 nx 2/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  42%|####1     | 5/12 [00:00<00:00, 2433.74it/s]
ep 4/7 nx 2/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  50%|#####     | 6/12 [00:00<00:00, 2717.69it/s]
ep 4/7 nx 2/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  58%|#####8    | 7/12 [00:00<00:00, 3054.53it/s]
ep 4/7 nx 3/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  67%|######6   | 8/12 [00:00<00:00, 3104.88it/s]
ep 5/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0:  75%|#######5  | 9/12 [00:00<00:00, 2981.03it/s]
ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0:  83%|########3 | 10/12 [00:00<00:00, 3198.83it/s]
ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0:  92%|#########1| 11/12 [00:00<00:00, 3455.46it/s]
ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0: 100%|##########| 12/12 [00:00<00:00, 3269.56it/s]
[run_aligned] done with status=yielded=4 maxabs=0.000 #inf=0 #nan=0
------------
final results
   ep_id_node  onnx_id_node  ep_name              onnx_name         ep_target  onnx_op_type  onnx_id_output  ep_shape_type  ...  onnx_time_run  err_abs2  err_rel2  err_dev2  err_nan2  err_h012  err_h0012  comment
0         NaN            -1           scalar_tensor_default                     initializer             NaN                 ...            NaN
1         0.0            -1        x                      x             input         input             NaN        CT1s5x4  ...            NaN
2         1.0             0    abs_1                  abs_1  aten.abs.default           Abs             0.0        CT1s5x4  ...       0.000069
3         2.0             1      exp                    exp  aten.exp.default           Exp             0.0        CT1s5x4  ...       0.000135
4         4.0             3      log                    log  aten.log.default           Log             0.0        CT1s5x4  ...       0.000038
5         5.0             4    add_1                 add_13   aten.add.Tensor           Add             0.0        CT1s5x4  ...       0.000039

[6 rows x 24 columns]

This example uses onnx_diagnostic.reference.ExtendedReferenceEvaluator to run the onnx model but onnxruntime can also be used through onnx_diagnostic.helpers.ort_session.InferenceSessionForTorch. It relies on onnxruntime and selects CPU or CUDA depending on the device where the inputs are located.

The torch.export.ExportedProgram can be saved on disk with torch.export.save(ep, "<filename>.pt2") and restored with torch.export.load("<filename>.pt2"). That leaves the inputs to save, which torch.save can handle. The export and the alignment can then be decoupled.
<<<
import onnx
import torch
from onnx_diagnostic.torch_export_patches.patch_inputs import use_dyn_not_str


class Model(torch.nn.Module):
    def forward(self, x):
        ry = x.abs()
        rz = ry.exp()
        rw = rz + 1
        ru = rw.log() + rw
        return ru


x = torch.randn((5, 4))
dynamic_shapes = ({0: "batch"},)
Model()(x)  # to make sure the model is running
ep = torch.export.export(Model(), (x,), dynamic_shapes=use_dyn_not_str(dynamic_shapes))
onx = torch.onnx.export(Model(), (x,), dynamic_shapes=dynamic_shapes).model_proto

torch.export.save(ep, "test_doc_sbs_example.pt2")
onnx.save(onx, "test_doc_sbs_example.onnx")
torch.save((x,), "test_doc_sbs_example.pt")
>>>
[torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
/usr/lib/python3.12/copyreg.py:99: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
  return cls.__new__(cls, *args)
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅

Then we can restore all of them and run it.
<<<
import pandas
import onnx
import torch
from onnx_diagnostic.torch_onnx.sbs import run_aligned
from onnx_diagnostic.reference import OnnxruntimeEvaluator

ep = torch.export.load("test_doc_sbs_example.pt2")
onx = onnx.load("test_doc_sbs_example.onnx")
inputs = torch.load("test_doc_sbs_example.pt")
results = list(
    run_aligned(
        ep,
        onx,
        OnnxruntimeEvaluator,
        inputs,
        atol=1e-5,
        rtol=1e-5,
        verbose=1,
        use_tensor=True,
    )
)
print("------------")
print("final results")
df = pandas.DataFrame(results)
df = df.apply(lambda col: col.fillna("") if col.dtype == "object" else col)
print(df)
>>>
[run_aligned] run_cls=<class 'onnx_diagnostic.reference.ort_evaluator.OnnxruntimeEvaluator'>
[run_aligned] run_cls_kwargs={'ir_version': 10, 'opsets': {'': 20}, 'verbose': 0, 'providers': ['CPUExecutionProvider']}
[run_aligned] ep: model has 0 torch constants or weights.
[run_aligned] ep: walks through 7 nodes from torch
[run_aligned] ep: found 0 torch constants or weights.
[run_aligned] ep: found inputs ['x']
[run_aligned] ep: found outputs ['add_1']
[run_aligned] nx: walks through 5 nodes from onnx
[run_aligned] args: (CT1s5x4,)
[run_aligned] kwargs: None
[run_aligned] onnx: #1[CT1s5x4]
[run_aligned] nx: walks through 1 onnx inputs
[run_aligned-nx] +inp: x: CT1s5x4
[run_aligned] nx: handles 1 initializers from onnx
[run_aligned] nx: handled 2 initializers from onnx
[run_aligned] nx: memory cpu 0.000 Mb
[run_aligned] nx: memory cuda 0.000 Mb
[run_aligned] nx: 2 constants
[run_aligned] nx: 1 inputs
[run_aligned] nx: 1 outputs
[run_aligned] bo: 1 outputs
[run_aligned] run_cls_kwargs={'ir_version': 10, 'opsets': {'': 20}, 'verbose': 0, 'providers': ['CPUExecutionProvider']}
[run_aligned] ep: starts side-by-side with 7 fx nodes and 5 onnx nodes
  0%|          | 0/12 [00:00<?, ?it/s]
ep 0/7 nx 0/5 yielded=0 maxabs=0.000 #inf=0 #nan=0:   0%|          | 0/12 [00:00<?, ?it/s]
ep 1/7 nx 0/5 yielded=0 maxabs=0.000 #inf=0 #nan=0:   8%|8         | 1/12 [00:00<00:00, 12228.29it/s]
ep 1/7 nx 0/5 yielded=0 maxabs=0.000 #inf=0 #nan=0:  17%|#6        | 2/12 [00:00<00:00, 8216.07it/s]
ep 2/7 nx 1/5 yielded=1 maxabs=0.000 #inf=0 #nan=0:  25%|##5       | 3/12 [00:00<00:00, 863.74it/s]
ep 2/7 nx 1/5 yielded=1 maxabs=0.000 #inf=0 #nan=0:  33%|###3      | 4/12 [00:00<00:00, 1086.33it/s]
ep 3/7 nx 2/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  42%|####1     | 5/12 [00:00<00:00, 365.78it/s]
ep 4/7 nx 2/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  50%|#####     | 6/12 [00:00<00:00, 430.19it/s]
ep 4/7 nx 2/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  58%|#####8    | 7/12 [00:00<00:00, 496.33it/s]
ep 4/7 nx 3/5 yielded=2 maxabs=0.000 #inf=0 #nan=0:  67%|######6   | 8/12 [00:00<00:00, 467.04it/s]
ep 5/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0:  75%|#######5  | 9/12 [00:00<00:00, 435.06it/s]
ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0:  83%|########3 | 10/12 [00:00<00:00, 477.36it/s]
ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0:  92%|#########1| 11/12 [00:00<00:00, 520.95it/s]
ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0: 100%|##########| 12/12 [00:00<00:00, 496.95it/s]
[run_aligned] done with status=yielded=4 maxabs=0.000 #inf=0 #nan=0
------------
final results
   ep_id_node  onnx_id_node  ep_name              onnx_name         ep_target  onnx_op_type  onnx_id_output  ep_shape_type  ...  onnx_time_run  err_abs2  err_rel2  err_dev2  err_nan2  err_h012  err_h0012  comment
0         NaN            -1           scalar_tensor_default                     initializer             NaN                 ...            NaN
1         0.0            -1        x                      x             input         input             NaN        CT1s5x4  ...            NaN
2         1.0             0    abs_1                  abs_1  aten.abs.default           Abs             0.0        CT1s5x4  ...       0.001861
3         2.0             1      exp                    exp  aten.exp.default           Exp             0.0        CT1s5x4  ...       0.007897
4         4.0             3      log                    log  aten.log.default           Log             0.0        CT1s5x4  ...       0.001889
5         5.0             4    add_1                 add_13   aten.add.Tensor           Add             0.0        CT1s5x4  ...       0.001665

[6 rows x 24 columns]

A command line can also be run:
python -m onnx_diagnostic sbs -i <tensors>.input.pt \
    --ep <exported_program>.pt2 \
    -m <model>.onnx \
    -o results.xlsx \
    -v 1 --atol=0.1 --rtol=1
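The --atol and --rtol options carry the same names as the atol and rtol parameters of run_aligned. How the function combines them internally is not described here. The sketch below assumes the convention used by numpy.isclose, |got - expected| <= atol + rtol * |expected|, and the helper name within_tolerance is hypothetical. It shows how much looser the command-line values above are than the 1e-5 used in the Python examples.

```python
from typing import Sequence


def within_tolerance(
    expected: Sequence[float],
    got: Sequence[float],
    atol: float = 1e-5,
    rtol: float = 1e-5,
) -> bool:
    """Checks |got - expected| <= atol + rtol * |expected| element-wise.

    Assumed convention (the one of numpy.isclose), not necessarily
    the rule applied by run_aligned.
    """
    return all(abs(g - e) <= atol + rtol * abs(e) for e, g in zip(expected, got))


# Made-up values: the second element is off by 0.5.
expected = [1.0, 2.0]
got = [1.05, 2.5]

loose = within_tolerance(expected, got, atol=0.1, rtol=1)  # command-line values
strict = within_tolerance(expected, got)  # 1e-5 for both tolerances
print("loose:", loose, "strict:", strict)
```

With these made-up values the loose setting accepts the result and the strict one rejects it, so a tolerance as wide as --rtol=1 is mostly useful to surface only large discrepancies.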