onnx_diagnostic.torch_onnx.sbs_dataclasses

class onnx_diagnostic.torch_onnx.sbs_dataclasses.ReplayConfiguration(dump_folder: str, selected_names: Set[str] | None = None, selected_op_types: Set[str] | None = None, threshold: float = 0.1)

Configuration specifying which pieces of an onnx graph to dump so that they can be replayed later to investigate possible sources of discrepancies.

Parameters:
  • dump_folder – where to dump the onnx model corresponding to the pieces to investigate

  • selected_names – set of result names to dump

  • selected_op_types – set of onnx operator types to dump

  • threshold – only keep the results whose discrepancy is greater than or equal to this threshold

dump(name: str, onnx_id_node: int, model: ModelProto, onnx_results: Dict[str, Any], torch_results: Dict[str, Tensor], onnx_name_to_ep_name: Dict[str, str], verbose: int = 0) → str | None

Dumps the minimal graph which can be replayed outside the model.

Parameters:
  • name – name of the result to look into

  • onnx_id_node – index of the node which produces it in model

  • model – onnx model

  • onnx_results – all known onnx results

  • torch_results – all known torch results

  • onnx_name_to_ep_name – correspondence between onnx names and exported program names

  • verbose – verbosity level

Returns:

the folder created to dump everything

get_replay_code() → str

Returns code letting the user replay the dumped onnx model. The generated script looks like the following and may have to be adapted.

<<<

from onnx_diagnostic.torch_onnx.sbs_dataclasses import ReplayConfiguration

rc = ReplayConfiguration(dump_folder="unused")
print(rc.get_replay_code())

>>>

    
    import onnx
    import torch
    from onnx_diagnostic.helpers import max_diff, string_diff, string_type
    from onnx_diagnostic.helpers.torch_helper import study_discrepancies
    from onnx_diagnostic.helpers.onnx_helper import pretty_onnx
    from onnx_diagnostic.reference import OnnxruntimeEvaluator
    
    skws = dict(with_shape=True, with_device=True)
    
    torch_inputs = torch.load("torch_inputs.pt")
    onnx_inputs = torch.load("onnx_inputs.pt")
    expected_outputs_and_mapping = torch.load("torch_outputs_and_mapping.pt")
    expected = expected_outputs_and_mapping["expected"]
    mapping = expected_outputs_and_mapping["mapping"]
    
    print(f"-- torch_inputs={string_type(torch_inputs, **skws)}")
    print(f"-- onnx_inputs={string_type(onnx_inputs, **skws)}")
    print(f"-- expected={string_type(expected, **skws)}")
    print(f"-- mapping={mapping}")
    
    print()
    print("-- model.onnx")
    print()
    
    model = onnx.load("model.onnx")
    print(pretty_onnx(model))
    
    print()
    print("-- range of inputs --")
    print()
    
    for k, v in onnx_inputs.items():
        print(f"--   {k}: {string_type(v, **skws, with_min_max=True)}")
    
    print()
    print("-- discrepancies of inputs --")
    print()
    
    ep_feeds = {}
    for k, v in onnx_inputs.items():
        tk = mapping.get(k, k)
        tkv = torch_inputs[k] if k in torch_inputs else torch_inputs[tk]
        ep_feeds[k] = tkv
        diff = max_diff(v, tkv)
        print(
            f"--   {k} -> {tk} ep:{string_type(tkv, **skws)} "
            f"nx:{string_type(v, **skws)} / diff {string_diff(diff)}"
        )
    
    print()
    print("-- SVD --")
    print()
    
    for k, v in onnx_inputs.items():
        if len(v.shape) == 2:
            U, S, Vt = torch.linalg.svd(v.to(torch.float32))
            print(f" -- {k}: {S[:5]}")
    
    print()
    print("-- run with onnx_inputs --")
    print()
    
    sess = OnnxruntimeEvaluator(model, whole=True)
    feeds = onnx_inputs
    obtained = sess.run(None, feeds)
    print(f"-- obtained={string_type(obtained, **skws)}")
    diff = max_diff(expected, tuple(obtained), hist=[0.1, 0.01])
    print(f"-- diff: {string_diff(diff)}")
    print()
    print("-- plots --")
    
    for i in range(len(expected)):
        study_discrepancies(
            expected[i],
            obtained[i],
            title=f"study output {i}",
            name=f"disc{i}.png",
            bins=50,
        )
    
    print()
    print("-- run with torch_inputs --")
    print()
    
    obtained = sess.run(None, ep_feeds)
    print(f"-- obtained={string_type(obtained, **skws)}")
    diff = max_diff(expected, tuple(obtained), hist=[0.1, 0.01])
    print(f"-- diff: {string_diff(diff)}")
    
    print()
    print("-- end --")
    print()
    
    if False:
        # CUDA profiling
        with torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CUDA],
            record_shapes=True,
            with_stack=True,
        ) as prof:
            sess.run(None, ep_feeds)
        obj = prof.key_averages()
        print(obj.table())

select(name: str | None = None, op_type: str | None = None, err_abs: float | None = None) → bool

Tells whether a piece of the onnx model around a particular node should be dumped. The result is True if at least one of the following conditions is true:

  • name in self.selected_names

  • op_type in self.selected_op_types

  • err_abs >= self.threshold

Parameters:
  • name – result name

  • op_type – operator type

  • err_abs – measured discrepancy

Returns:

True if this should be dumped
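The rule described above can be sketched without the library. The class `SelectionRule` below is a hypothetical stand-in written for illustration, not the actual ReplayConfiguration code; in particular, ignoring arguments left to None is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SelectionRule:
    """Hypothetical stand-in mirroring the documented rule of select."""

    selected_names: Optional[Set[str]] = None
    selected_op_types: Optional[Set[str]] = None
    threshold: float = 0.1

    def select(
        self,
        name: Optional[str] = None,
        op_type: Optional[str] = None,
        err_abs: Optional[float] = None,
    ) -> bool:
        # Condition 1: the result name was explicitly requested.
        if name is not None and self.selected_names and name in self.selected_names:
            return True
        # Condition 2: the operator type was explicitly requested.
        if (
            op_type is not None
            and self.selected_op_types
            and op_type in self.selected_op_types
        ):
            return True
        # Condition 3: the measured discrepancy reaches the threshold.
        # Assumption: a missing measure (None) never triggers a dump.
        return err_abs is not None and err_abs >= self.threshold


rule = SelectionRule(selected_names={"linear_1"}, selected_op_types={"MatMul"})
picked_by_name = rule.select(name="linear_1", op_type="Add", err_abs=0.0)
picked_by_type = rule.select(name="x", op_type="MatMul", err_abs=0.0)
picked_by_error = rule.select(name="x", op_type="Add", err_abs=0.5)
not_picked = rule.select(name="x", op_type="Add", err_abs=0.01)
```

Only the last call returns False: the name and operator type are not selected and 0.01 is below the default threshold of 0.1.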

class onnx_diagnostic.torch_onnx.sbs_dataclasses.RunAlignedRecord(ep_id_node: int | None = None, onnx_id_node: int | None = None, ep_name: str | None = None, onnx_name: str | None = None, ep_target: str | None = None, onnx_op_type: str | None = None, onnx_id_output: int | None = None, ep_shape_type: str | None = None, onnx_shape_type: str | None = None, err_abs: float | None = None, err_rel: float | None = None, err_dev: float | None = None, err_nan: float | None = None, err_h01: float | None = None, err_h001: float | None = None, ep_time_run: float | None = None, onnx_time_run: float | None = None, err_abs2: float | None = None, err_rel2: float | None = None, err_dev2: float | None = None, err_nan2: float | None = None, err_h012: float | None = None, err_h0012: float | None = None, comment: str | None = None)

The side-by-side comparison run by function run_aligned yields instances of this type. If both ep_name and onnx_name are specified, the result appears in both the exported program (torch) and the onnx model.

Parameters:
  • ep_id_node – node index in the exported program

  • onnx_id_node – node index in the onnx model, -1 for an initializer

  • ep_name – result name in the exported program

  • onnx_name – result name in the onnx model, usually the same as ep_name except for initializers

  • ep_target – target name in the exported program producing the result

  • onnx_op_type – operator type in the onnx model producing the result

  • onnx_id_output – usually 0 unless this node has multiple outputs, in which case it is the output index

  • ep_shape_type – shape and type of the results in the exported program

  • onnx_shape_type – shape and type of the results in the onnx model, it should be the same as ep_shape_type, anything different probably means a bug

  • err_abs – maximum absolute error for the considered result between the exported program and the onnx model

  • err_rel – maximum relative error

  • err_dev – 0 if the device is the same, 1 if not

  • err_nan – number of nan values disagreeing

  • err_h01 – number of values for which the discrepancy is above 0.1

  • err_h001 – number of values for which the discrepancy is above 0.01

  • ep_time_run – execution time for the exported program

  • onnx_time_run – execution time for the onnx model, this includes the creation of the onnx model, so it is probably not very usable

  • err_abs2 – same as err_abs when the onnx kernel is run with torch results

  • err_rel2 – same as err_rel when the onnx kernel is run with torch results

  • err_dev2 – same as err_dev when the onnx kernel is run with torch results

  • err_nan2 – same as err_nan when the onnx kernel is run with torch results

  • err_h012 – same as err_h01 when the onnx kernel is run with torch results

  • err_h0012 – same as err_h001 when the onnx kernel is run with torch results

  • comment – any additional information
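One possible way to read the two families of error fields is to compare err_abs with err_abs2, the latter being measured when the onnx kernel receives the torch results as inputs. The sketch below uses a hypothetical helper, `classify_record`, and plain dictionaries with made-up values in place of RunAlignedRecord instances; the labels and the 0.1 threshold are illustrative choices, not part of the library.

```python
from typing import Any, Dict, Optional


def classify_record(record: Dict[str, Any], threshold: float = 0.1) -> str:
    """Hypothetical helper: labels a record from its two absolute errors."""
    err_abs: Optional[float] = record.get("err_abs")
    err_abs2: Optional[float] = record.get("err_abs2")
    if err_abs is None or err_abs < threshold:
        return "ok"
    if err_abs2 is None:
        return "discrepancy (no second measure)"
    if err_abs2 >= threshold:
        # Still wrong when fed with torch results: this node is a likely source.
        return "discrepancy likely introduced by this node"
    # Fine when fed with torch results: the error probably comes from upstream.
    return "discrepancy likely inherited from inputs"


# Made-up records, only the fields needed by the helper are filled.
records = [
    {"onnx_name": "add_1", "onnx_op_type": "Add", "err_abs": 1e-6, "err_abs2": 1e-6},
    {"onnx_name": "mm_1", "onnx_op_type": "MatMul", "err_abs": 0.4, "err_abs2": 0.3},
    {"onnx_name": "relu_1", "onnx_op_type": "Relu", "err_abs": 0.4, "err_abs2": 0.0},
]
labels = {r["onnx_name"]: classify_record(r) for r in records}
```

With these values, mm_1 is flagged as a likely source while relu_1 only appears to propagate an error it received.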

check(already_yielded: Dict[Tuple[int | None, int | None, int | None, str | None, str | None], int]) → Self

Checks a record was not already yielded.

property key: Tuple[int | None, int | None, int | None, str | None, str | None]

Creates a unique identifier.
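How key and check could work together is sketched below with a hypothetical, reduced class `Record`. Two points are assumptions made for illustration and are not confirmed by this page: the fields composing the key (three indices and two names, chosen to match the tuple type) and the behavior on duplicates (an exception is raised here).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[Optional[int], Optional[int], Optional[int], Optional[str], Optional[str]]


@dataclass
class Record:
    """Hypothetical, reduced stand-in for RunAlignedRecord."""

    ep_id_node: Optional[int] = None
    onnx_id_node: Optional[int] = None
    onnx_id_output: Optional[int] = None
    ep_name: Optional[str] = None
    onnx_name: Optional[str] = None

    @property
    def key(self) -> Key:
        # Assumed composition of the identifier: three indices, two names.
        return (
            self.ep_id_node,
            self.onnx_id_node,
            self.onnx_id_output,
            self.ep_name,
            self.onnx_name,
        )

    def check(self, already_yielded: Dict[Key, int]) -> "Record":
        # Assumed behavior: refuse duplicates, otherwise register the key.
        if self.key in already_yielded:
            raise ValueError(f"record {self.key} was already yielded")
        already_yielded[self.key] = len(already_yielded)
        return self


seen: Dict[Key, int] = {}
first = Record(ep_id_node=3, onnx_id_node=5, onnx_id_output=0, ep_name="x", onnx_name="x")
first.check(seen)
try:
    Record(3, 5, 0, "x", "x").check(seen)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

Returning the record from check allows chaining, which is consistent with the Self return type in the signature.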

set_diff(diff: Dict[str, Any]) → Self

Sets error.

set_diff2(diff: Dict[str, Any]) → Self

Sets error.

class onnx_diagnostic.torch_onnx.sbs_dataclasses.StatusRunAligned(max_abs: float = 0.0, n_inf: int = 0, n_nan: int = 0, yielded_nodes: int = 0, last_replay: str = '')

Information to display while running the side-by-side comparison.

Parameters:
  • max_abs – maximum absolute error seen so far

  • n_inf – number of infinite values seen so far

  • n_nan – number of nan values seen so far

  • yielded_nodes – number of yielded pairs of nodes seen so far

  • last_replay – last result dumped on disk for later replay

to_str() → str

Nice display.

update(err_abs: float)

Updates all attributes with the latest measure.
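A running status of this kind can be sketched as a small accumulator. The class `RunningStatus` below is hypothetical: it reuses the documented field names, but the way each counter is updated and the display format are assumptions, not the library's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStatus:
    """Hypothetical stand-in for StatusRunAligned (same field names)."""

    max_abs: float = 0.0
    n_inf: int = 0
    n_nan: int = 0
    yielded_nodes: int = 0

    def update(self, err_abs: float) -> None:
        # Assumption: every call corresponds to one more yielded pair of nodes.
        self.yielded_nodes += 1
        if math.isnan(err_abs):
            self.n_nan += 1
        elif math.isinf(err_abs):
            self.n_inf += 1
        else:
            # Only finite measures contribute to the running maximum.
            self.max_abs = max(self.max_abs, err_abs)

    def to_str(self) -> str:
        return (
            f"yielded={self.yielded_nodes} max_abs={self.max_abs:.3g} "
            f"inf={self.n_inf} nan={self.n_nan}"
        )


status = RunningStatus()
for measure in [0.001, 0.25, float("nan"), float("inf"), 0.02]:
    status.update(measure)
summary = status.to_str()
```

Keeping nan and infinite measures out of the maximum keeps max_abs readable in a progress display; whether the library does the same is not stated here.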

onnx_diagnostic.torch_onnx.sbs_dataclasses.make_torch_inputs(input_names: List[str], onnx_name_to_ep_name: Dict[str, str], onnx_results: Dict[str, Tensor], torch_results: Dict[str, Tensor], submodel: ModelProto | None) → Tuple[Dict[str, Tensor], Set[str]]

Gathers torch tensors instead of onnx tensors (tensors produced by the onnx model).

Parameters:
  • input_names – tensors to gather

  • onnx_name_to_ep_name – mapping from onnx names to names in the exported program

  • onnx_results – all onnx results (produced by the onnx model)

  • torch_results – all tensors produced by the exported program

  • submodel – onnx model, any tensor missing in torch_results is added as an initializer to this model

Returns:

the dictionary of gathered tensors, the set of inputs for which there was no tensor coming from the exported program
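The gathering step can be sketched with plain dictionaries. The function `gather_torch_inputs` below is a hypothetical simplification: it ignores onnx_results and submodel, so it does not reproduce the initializer handling described above, and falling back to the onnx name when the mapping has no entry is an assumption.

```python
from typing import Any, Dict, List, Set, Tuple


def gather_torch_inputs(
    input_names: List[str],
    onnx_name_to_ep_name: Dict[str, str],
    torch_results: Dict[str, Any],
) -> Tuple[Dict[str, Any], Set[str]]:
    """Hypothetical, simplified version: no submodel, no initializer handling."""
    gathered: Dict[str, Any] = {}
    missing: Set[str] = set()
    for name in input_names:
        # Translate the onnx name, falling back to the name itself.
        ep_name = onnx_name_to_ep_name.get(name, name)
        if ep_name in torch_results:
            gathered[name] = torch_results[ep_name]
        else:
            # No tensor from the exported program for this input.
            missing.add(name)
    return gathered, missing


# Plain lists stand in for tensors to keep the sketch dependency-free.
torch_results = {"x": [1.0, 2.0], "linear": [0.5]}
mapping = {"onnx::x": "x"}
inputs, missing = gather_torch_inputs(
    ["onnx::x", "linear", "weight_0"], mapping, torch_results
)
```

Here weight_0 ends up in the set of missing inputs because neither the mapping nor torch_results knows about it.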