onnx_diagnostic.torch_onnx.sbs

onnx_diagnostic.torch_onnx.sbs.run_aligned(ep: ExportedProgram, onx: ModelProto | FunctionProto, run_cls: Callable[[ModelProto | FunctionProto | GraphProto | NodeProto], List[ndarray | Tensor]], args: Tuple[Tensor, ...] | None = None, kwargs: Dict[str, Any] | None = None, use_tensor: bool = False, atol: float | None = None, rtol: float | None = None, verbose: int = 0, exc: bool = True, reset_names: List[str] | None = None, replay_configuration: ReplayConfiguration | None = None, run_onnx_with_torch_inputs: bool = False) → Iterator[RunAlignedRecord]

Runs the exported program and the onnx proto side by side and looks for discrepancies. Results are matched by name, so the function assumes the exported program and the onnx model use the same names for equivalent results.
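Matching by name can be pictured with a small sketch. It is not the library's implementation: `align_by_name` and the result dictionaries are hypothetical and only show why a result is compared when both sides use the same name and skipped otherwise.

```python
from typing import Dict, Iterator, List, Tuple


def align_by_name(
    torch_results: Dict[str, List[float]],
    onnx_results: Dict[str, List[float]],
) -> Iterator[Tuple[str, float]]:
    """Yields (name, max absolute difference) for names present on both sides."""
    for name, expected in torch_results.items():
        if name not in onnx_results:
            # No onnx result carries this name: nothing to compare.
            continue
        got = onnx_results[name]
        yield name, max(abs(e - g) for e, g in zip(expected, got))


# Hypothetical intermediate results keyed by result name.
torch_side = {"x": [1.0, -2.0], "abs_1": [1.0, 2.0], "exp": [2.718, 7.389]}
onnx_side = {"x": [1.0, -2.0], "abs_1": [1.0, 2.0], "val_0": [0.5, 0.5]}

matched = dict(align_by_name(torch_side, onnx_side))
print(matched)
```

Here `exp` is never compared because the onnx side stores the equivalent value under a different name, which is the situation the naming assumption is meant to rule out.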

Parameters:
  • ep – exported program

  • onx – model or function proto

  • run_cls – defines the runtime to use for this task

  • args – input args

  • kwargs – input kwargs

  • use_tensor – use torch tensors instead of numpy arrays for the onnx runtime

  • atol – absolute tolerance

  • rtol – relative tolerance

  • verbose – verbosity level

  • exc – stops if an exception is raised

  • reset_names – list of names; the onnx execution takes the torch output instead of its own result if the name is in that list

  • replay_configuration – configuration that lets the user dump any problematic piece of the onnx graph in order to replay and investigate it later, see ReplayConfiguration (onnx_diagnostic.torch_onnx.sbs.ReplayConfiguration)

  • run_onnx_with_torch_inputs – runs an onnx operator with torch results as inputs if they are available
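The effect of reset_names can be pictured with a toy pipeline. The sketch below is not the library's implementation: `run_chain`, the step names and the injected error are hypothetical. It only illustrates why continuing from the torch value for a chosen result keeps one discrepancy from propagating to later nodes.

```python
import math
from typing import Dict, Iterable, Optional


def run_chain(
    x: float,
    error: float = 0.0,
    reset_names: Iterable[str] = (),
    reference: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Evaluates abs -> exp -> add; `error` perturbs the exp step."""
    reset = set(reset_names)
    results = {"x": x}
    results["abs_1"] = abs(results["x"])
    results["exp"] = math.exp(results["abs_1"]) + error
    if "exp" in reset and reference is not None:
        # Continue from the reference value instead of the perturbed one.
        results["exp"] = reference["exp"]
    results["add"] = results["exp"] + 1.0
    return results


reference = run_chain(-2.0)  # plays the role of the torch results
plain = run_chain(-2.0, error=1e-3)
patched = run_chain(-2.0, error=1e-3, reset_names=["exp"], reference=reference)

print(abs(plain["add"] - reference["add"]))    # the error reaches the last step
print(abs(patched["add"] - reference["add"]))  # the error stays local
```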

Returns:

an iterator over RunAlignedRecord
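Because the signature declares an iterator, records can be consumed one at a time and the loop can stop at the first suspicious result. The sketch below stands in for run_aligned with a hypothetical generator `fake_run_aligned` and a hypothetical `Record` class; its fields (`name`, `err_abs`) are assumptions, not the actual attributes of RunAlignedRecord.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Record:
    # Hypothetical stand-in for RunAlignedRecord.
    name: str
    err_abs: float


computed: List[str] = []


def fake_run_aligned() -> Iterator[Record]:
    """Yields one record per compared result, doing the work lazily."""
    for name, err in [("abs_1", 0.0), ("exp", 2e-3), ("log", 0.0), ("add_1", 0.0)]:
        computed.append(name)  # tracks which steps were actually evaluated
        yield Record(name, err)


def first_failure(records: Iterator[Record], atol: float) -> Optional[Record]:
    """Returns the first record whose absolute error exceeds atol."""
    for record in records:
        if record.err_abs > atol:
            return record
    return None


bad = first_failure(fake_run_aligned(), atol=1e-5)
print(bad, computed)
```

Wrapping the call in list(...) instead forces the whole comparison to run before anything is inspected.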

Example:

<<<

import pandas
import torch
from onnx_diagnostic.reference import (
    # This can be replaced by any runtime taking NodeProto as an input.
    ExtendedReferenceEvaluator as ReferenceEvaluator,
)
from onnx_diagnostic.torch_onnx.sbs import run_aligned


class Model(torch.nn.Module):
    def forward(self, x):
        ry = x.abs()
        rz = ry.exp()
        rw = rz + 1
        ru = rw.log() + rw
        return ru


x = torch.randn((5, 4))
Model()(x)  # to make sure the model is running
ep = torch.export.export(
    Model(), (x,), dynamic_shapes=({0: torch.export.Dim("batch")},)
)
onx = torch.onnx.export(
    Model(), (x,), dynamic_shapes=({0: torch.export.Dim("batch")},)
).model_proto
results = list(
    run_aligned(ep, onx, ReferenceEvaluator, (x,), atol=1e-5, rtol=1e-5, verbose=1)
)
print("------------")
print("final results")
df = pandas.DataFrame(results)
df = df.apply(lambda col: col.fillna("") if col.dtype == "object" else col)
print(df)

>>>

    [torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`...
    [torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`... ✅
    [torch.onnx] Run decomposition...
    /usr/lib/python3.12/copyreg.py:99: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
      return cls.__new__(cls, *args)
    [torch.onnx] Run decomposition... ✅
    [torch.onnx] Translate the graph into ONNX...
    [torch.onnx] Translate the graph into ONNX... ✅
    [run_aligned] run_cls=<class 'onnx_diagnostic.reference.evaluator.ExtendedReferenceEvaluator'>
    [run_aligned] run_cls_kwargs={'opsets': {'': 20}, 'verbose': 0}
    [run_aligned] ep: model has 0 torch constants or weights.
    [run_aligned] ep: walks through 7 nodes from torch
    [run_aligned] ep: found 0 torch constants or weights.
    [run_aligned] ep: found inputs  ['x']
    [run_aligned] ep: found outputs ['add_1']
    [run_aligned] nx: walks through 5 nodes from onnx
    [run_aligned]   args: (CT1s5x4,)
    [run_aligned] kwargs: None
    [run_aligned]   onnx: #1[CT1s5x4]
    [run_aligned] nx: walks through 1 onnx inputs
    [run_aligned-nx] +inp: x: CT1s5x4
    [run_aligned] nx: handles 1 initializers from onnx
    [run_aligned] nx: handled 2 initializers from onnx
    [run_aligned] nx: memory cpu 0.000 Mb
    [run_aligned] nx: memory cuda 0.000 Mb
    [run_aligned] nx: 2 constants
    [run_aligned] nx: 1 inputs
    [run_aligned] nx: 1 outputs
    [run_aligned] bo: 1 outputs
    [run_aligned] run_cls_kwargs={'opsets': {'': 20}, 'verbose': 0}
    [run_aligned] ep: starts side-by-side with 7 fx nodes and 5 onnx nodes
    
    ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0: 100%|##########| 12/12 [00:00<00:00, 3287.50it/s]
    [run_aligned] done with status=yielded=4 maxabs=0.000 #inf=0 #nan=0
    ------------
    final results
       ep_id_node  onnx_id_node ep_name  ... err_h012 err_h0012 comment
    0         NaN            -1          ...                           
    1         0.0            -1       x  ...                           
    2         1.0             0   abs_1  ...                           
    3         2.0             1     exp  ...                           
    4         4.0             3     log  ...                           
    5         5.0             4   add_1  ...                           
    
    [6 rows x 24 columns]

This example uses ExtendedReferenceEvaluator (from onnx_diagnostic.reference) to run the onnx model, but onnxruntime can also be used through onnx_diagnostic.helpers.ort_session.InferenceSessionForTorch, which selects CPU or CUDA depending on the device where the inputs are located.

The torch.export.ExportedProgram can be saved on disk with torch.export.save(ep, "<filename>.pt2") and restored with torch.export.load("<filename>.pt2"). That leaves the inputs to save, which torch.save can do. The export and the alignment can then be decoupled.

<<<

import onnx
import torch
from onnx_diagnostic.torch_export_patches.patch_inputs import use_dyn_not_str


class Model(torch.nn.Module):
    def forward(self, x):
        ry = x.abs()
        rz = ry.exp()
        rw = rz + 1
        ru = rw.log() + rw
        return ru


x = torch.randn((5, 4))
dynamic_shapes = ({0: "batch"},)
Model()(x)  # to make sure the model is running
ep = torch.export.export(Model(), (x,), dynamic_shapes=use_dyn_not_str(dynamic_shapes))
onx = torch.onnx.export(Model(), (x,), dynamic_shapes=dynamic_shapes).model_proto

torch.export.save(ep, "test_doc_sbs_example.pt2")
onnx.save(onx, "test_doc_sbs_example.onnx")
torch.save((x,), "test_doc_sbs_example.pt")

>>>

    [torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`...
    [torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`... ✅
    [torch.onnx] Run decomposition...
    /usr/lib/python3.12/copyreg.py:99: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
      return cls.__new__(cls, *args)
    [torch.onnx] Run decomposition... ✅
    [torch.onnx] Translate the graph into ONNX...
    [torch.onnx] Translate the graph into ONNX... ✅

Then we can restore all of them and run the alignment.

<<<

import pandas
import onnx
import torch
from onnx_diagnostic.torch_onnx.sbs import run_aligned
from onnx_diagnostic.reference import OnnxruntimeEvaluator


ep = torch.export.load("test_doc_sbs_example.pt2")
onx = onnx.load("test_doc_sbs_example.onnx")
inputs = torch.load("test_doc_sbs_example.pt")


results = list(
    run_aligned(
        ep,
        onx,
        OnnxruntimeEvaluator,
        inputs,
        atol=1e-5,
        rtol=1e-5,
        verbose=1,
        use_tensor=True,
    )
)
print("------------")
print("final results")
df = pandas.DataFrame(results)
df = df.apply(lambda col: col.fillna("") if col.dtype == "object" else col)
print(df)

>>>

    [run_aligned] run_cls=<class 'onnx_diagnostic.reference.ort_evaluator.OnnxruntimeEvaluator'>
    [run_aligned] run_cls_kwargs={'ir_version': 10, 'opsets': {'': 20}, 'verbose': 0, 'providers': ['CPUExecutionProvider']}
    [run_aligned] ep: model has 0 torch constants or weights.
    [run_aligned] ep: walks through 7 nodes from torch
    [run_aligned] ep: found 0 torch constants or weights.
    [run_aligned] ep: found inputs  ['x']
    [run_aligned] ep: found outputs ['add_1']
    [run_aligned] nx: walks through 5 nodes from onnx
    [run_aligned]   args: (CT1s5x4,)
    [run_aligned] kwargs: None
    [run_aligned]   onnx: #1[CT1s5x4]
    [run_aligned] nx: walks through 1 onnx inputs
    [run_aligned-nx] +inp: x: CT1s5x4
    [run_aligned] nx: handles 1 initializers from onnx
    [run_aligned] nx: handled 2 initializers from onnx
    [run_aligned] nx: memory cpu 0.000 Mb
    [run_aligned] nx: memory cuda 0.000 Mb
    [run_aligned] nx: 2 constants
    [run_aligned] nx: 1 inputs
    [run_aligned] nx: 1 outputs
    [run_aligned] bo: 1 outputs
    [run_aligned] run_cls_kwargs={'ir_version': 10, 'opsets': {'': 20}, 'verbose': 0, 'providers': ['CPUExecutionProvider']}
    [run_aligned] ep: starts side-by-side with 7 fx nodes and 5 onnx nodes
    
    ep 6/7 nx 4/5 yielded=3 maxabs=0.000 #inf=0 #nan=0: 100%|##########| 12/12 [00:00<00:00, 539.46it/s]
    [run_aligned] done with status=yielded=4 maxabs=0.000 #inf=0 #nan=0
    ------------
    final results
       ep_id_node  onnx_id_node ep_name  ... err_h012 err_h0012 comment
    0         NaN            -1          ...                           
    1         0.0            -1       x  ...                           
    2         1.0             0   abs_1  ...                           
    3         2.0             1     exp  ...                           
    4         4.0             3     log  ...                           
    5         5.0             4   add_1  ...                           
    
    [6 rows x 24 columns]

The comparison can also be run from the command line:

python -m onnx_diagnostic sbs -i <tensors>.input.pt \
                              --ep <exported_program>.pt2 \
                              -m <model>.onnx  \
                              -o results.xlsx \
                              -v 1 --atol=0.1 --rtol=1