onnx_diagnostic.torch_onnx.sbs_dataclasses

class onnx_diagnostic.torch_onnx.sbs_dataclasses.ReplayConfiguration(dump_folder: str, selected_names: Set[str] | None = None, selected_op_types: Set[str] | None = None, threshold: float = 0.1, dump_prefix_model: bool = False)

Configuration specifying which pieces of an onnx graph to dump so that they can be replayed later to investigate possible sources of discrepancies.

Parameters:

dump_folder – where to dump the onnx model corresponding to the pieces to investigate
selected_names – names of the results to dump
selected_op_types – onnx operator types to dump
threshold – only keep the results whose discrepancy is greater than or equal to that threshold
dump_prefix_model – if True, after dumping the smallest model able to replicate one given output, also dumps the models producing the inputs and the outputs, truncated from the big one

dump(name: str, onnx_id_node: int, model: ModelProto, onnx_results: Dict[str, Any], torch_results: Dict[str, Tensor], onnx_name_to_ep_name: Dict[str, str], verbose: int = 0) → str | None

Dumps the minimal graph which can be replayed outside the model.

Parameters:

name – name of the result to look into
onnx_id_node – index of the node which produces it in the model
model – onnx model
onnx_results – all known onnx results
torch_results – all known torch results
onnx_name_to_ep_name – correspondence between onnx result names and exported program names
verbose – verbosity level

Returns:

the folder created to dump everything

get_replay_code() → str

Returns code letting the user replay the onnx model. It looks like the following and may have to be adapted.

<<<

from onnx_diagnostic.torch_onnx.sbs_dataclasses import ReplayConfiguration

rc = ReplayConfiguration(dump_folder="unused")
print(rc.get_replay_code())

>>>

    
    import onnx
    import torch
    from onnx_diagnostic.helpers import max_diff, string_diff, string_type
    from onnx_diagnostic.helpers.torch_helper import study_discrepancies
    from onnx_diagnostic.helpers.onnx_helper import pretty_onnx
    from onnx_diagnostic.reference import OnnxruntimeEvaluator
    
    skws = dict(with_shape=True, with_device=True)
    
    torch_inputs = torch.load("torch_inputs.pt")
    onnx_inputs = torch.load("onnx_inputs.pt")
    expected_outputs_and_mapping = torch.load("torch_outputs_and_mapping.pt")
    expected = expected_outputs_and_mapping["expected"]
    mapping = expected_outputs_and_mapping["mapping"]
    
    print(f"-- torch_inputs={string_type(torch_inputs, **skws)}")
    print(f"-- onnx_inputs={string_type(onnx_inputs, **skws)}")
    print(f"-- expected={string_type(expected, **skws)}")
    print(f"-- mapping={mapping}")
    
    print()
    print("-- model.onnx")
    print()
    
    model = onnx.load("model.onnx")
    print(pretty_onnx(model))
    
    print()
    print("-- range of inputs --")
    print()
    
    for k, v in onnx_inputs.items():
        print(f"--   {k}: {string_type(v, **skws, with_min_max=True)}")
    
    print()
    print("-- discrepancies of inputs --")
    print()
    
    ep_feeds = {}
    for k, v in onnx_inputs.items():
        tk = mapping.get(k, k)
        tkv = torch_inputs[k] if k in torch_inputs else torch_inputs[tk]
        ep_feeds[k] = tkv
        diff = max_diff(v, tkv)
        print(
            f"--   {k} -> {tk} ep:{string_type(tkv, **skws)} "
            f"nx:{string_type(v, **skws)} / diff {string_diff(diff)}"
        )
    
    print()
    print("-- SVD --")
    print()
    
    for k, v in onnx_inputs.items():
        if len(v.shape) == 2:
            U, S, Vt = torch.linalg.svd(v.to(torch.float32))
            print(f" -- {k}: {S[:5]}")
    
    print()
    print("-- run with onnx_inputs --")
    print()
    
    sess = OnnxruntimeEvaluator(model, whole=True)
    feeds = onnx_inputs
    obtained = sess.run(None, feeds)
    print(f"-- obtained={string_type(obtained, **skws)}")
    diff = max_diff(expected, tuple(obtained), hist=[0.1, 0.01])
    print(f"-- diff: {string_diff(diff)}")
    print()
    print("-- plots --")
    
    for i in range(len(expected)):
        study_discrepancies(
            expected[i],
            obtained[i],
            title=f"study output {i}",
            name=f"disc{i}.png",
            bins=50,
        )
    
    print()
    print("-- run with torch_inputs --")
    print()
    
    obtained = sess.run(None, ep_feeds)
    print(f"-- obtained={string_type(obtained, **skws)}")
    diff = max_diff(expected, tuple(obtained), hist=[0.1, 0.01])
    print(f"-- diff: {string_diff(diff)}")
    
    print()
    print("-- end --")
    print()
    
    if False:
        # CUDA profiling
        with torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CUDA],
            record_shapes=True,
            with_stack=True,
        ) as prof:
            sess.run(None, ep_feeds)
        obj = prof.key_averages()
        print(obj.table())

select(name: str | None = None, op_type: str | None = None, err_abs: float | None = None) → bool

Tells whether a piece of the onnx model around a particular node should be dumped. The result is True if at least one of the following conditions holds:

name in self.selected_names
op_type in self.selected_op_types
err_abs >= self.threshold

Parameters:

name – result name
op_type – operator type
err_abs – measured discrepancy

Returns:

True if this should be dumped
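
The selection rule above can be sketched without the library. The function below is a hypothetical stand-in (its name and its handling of None values are assumptions, not the library code); it only illustrates that a single satisfied condition is enough to trigger a dump.

```python
from typing import Optional, Set


def should_dump(
    name: Optional[str] = None,
    op_type: Optional[str] = None,
    err_abs: Optional[float] = None,
    selected_names: Optional[Set[str]] = None,
    selected_op_types: Optional[Set[str]] = None,
    threshold: float = 0.1,
) -> bool:
    """Hypothetical stand-in for the documented rule, not the library code."""
    # Condition 1: the result name was explicitly selected.
    if name is not None and selected_names and name in selected_names:
        return True
    # Condition 2: the operator type was explicitly selected.
    if op_type is not None and selected_op_types and op_type in selected_op_types:
        return True
    # Condition 3: the measured discrepancy reaches the threshold.
    # Assumption: a missing measurement never triggers a dump.
    return err_abs is not None and err_abs >= threshold


print(should_dump(name="add_1", selected_names={"add_1"}))
print(should_dump(op_type="MatMul", err_abs=0.05))
print(should_dump(err_abs=0.1))
```

Because the conditions are combined with a logical or, selecting an operator type dumps every node of that type, whatever its discrepancy.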

The side-by-side comparison run by function run_aligned yields instances of this type. If both ep_name and onnx_name are specified, then the result appears in both the exported program (torch) and the onnx model.

Parameters:

ep_id_node – node index in the exported program
onnx_id_node – node index in the onnx model, -1 for an initializer
ep_name – result name in the exported program
onnx_name – result name in the onnx model, usually the same as ep_name except for initializers
ep_target – target name in the exported program producing the result
onnx_op_type – operator type in the onnx model producing the result
onnx_id_output – usually 0 unless this node has multiple outputs, in which case it is the output index
ep_shape_type – shape and type of the results in the exported program
onnx_shape_type – shape and type of the results in the onnx model, it should be the same as ep_shape_type, anything different probably means a bug
err_abs – maximum absolute error for the considered result between the exported program and the onnx model
err_rel – maximum relative error
err_dev – 0 if the device is the same, 1 if not
err_nan – number of nan values disagreeing
err_h01 – number of values for which the discrepancy is above 0.1
err_h001 – number of values for which the discrepancy is above 0.01
ep_time_run – execution time for the exported program
onnx_time_run – execution time for the onnx model; it includes the creation of the onnx model, so it is probably not very usable
err_abs2 – same as err_abs, but measured when the onnx kernel is run with torch results
err_rel2 – same as err_rel, but measured when the onnx kernel is run with torch results
err_dev2 – same as err_dev, but measured when the onnx kernel is run with torch results
err_nan2 – same as err_nan, but measured when the onnx kernel is run with torch results
err_h012 – same as err_h01, but measured when the onnx kernel is run with torch results
err_h0012 – same as err_h001, but measured when the onnx kernel is run with torch results
comment – any additional information
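
To make the error fields more concrete, here is a minimal sketch computing comparable statistics on two flat sequences of floats. The helper name error_summary and the exact definition of the relative error are assumptions made for illustration; the library relies on its own helpers, which may define these quantities differently.

```python
import math
from typing import Dict, Sequence


def error_summary(expected: Sequence[float], obtained: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: element-wise error statistics between two sequences."""
    abs_errors = []
    rel_errors = []
    nan_mismatch = 0
    for e, o in zip(expected, obtained):
        if math.isnan(e) or math.isnan(o):
            # Count the positions where only one side is nan.
            nan_mismatch += int(math.isnan(e) != math.isnan(o))
            continue
        d = abs(e - o)
        abs_errors.append(d)
        # Assumption: the relative error is scaled by |expected| plus a small constant.
        rel_errors.append(d / (abs(e) + 1e-5))
    return {
        "err_abs": max(abs_errors, default=0.0),
        "err_rel": max(rel_errors, default=0.0),
        "err_nan": nan_mismatch,
        "err_h01": sum(d > 0.1 for d in abs_errors),
        "err_h001": sum(d > 0.01 for d in abs_errors),
    }


summary = error_summary([1.0, 2.0, float("nan"), 4.0], [1.0, 2.5, 3.0, 4.02])
print(summary)
```

The two histogram counters complement the maximum error: a large err_abs with a small err_h01 points to a few outliers, whereas a large err_h01 suggests the whole result drifted.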

check(already_yielded: Dict[Tuple[int | None, int | None, int | None, str | None, str | None], int]) → Self: Checks that a record was not already yielded.

property key: Tuple[int | None, int | None, int | None, str | None, str | None]: Creates a unique identifier.

set_diff(diff: Dict[str, Any]) → Self: Sets error.

set_diff2(diff: Dict[str, Any]) → Self: Sets error.

class onnx_diagnostic.torch_onnx.sbs_dataclasses.StatusRunAligned(max_abs: float = 0.0, n_inf: int = 0, n_nan: int = 0, yielded_nodes: int = 0, last_replay: str = '')

Information to display while running the side-by-side comparison.

Parameters:

max_abs – maximum absolute error seen so far
n_inf – number of infinite values seen so far
n_nan – number of nan values seen so far
yielded_nodes – number of yielded pairs of nodes seen so far
last_replay – last result dumped on disk for later replay

to_str() → str: Nice display.

update(err_abs: float): Updates all attributes with the latest measure.
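
The running status can be pictured with the following sketch. The class RunningStatus is a hypothetical stand-in mirroring the documented fields; how the real update treats nan and infinite values, and whether it increments yielded_nodes, are assumptions made here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStatus:
    """Hypothetical stand-in mirroring the documented fields."""

    max_abs: float = 0.0
    n_inf: int = 0
    n_nan: int = 0
    yielded_nodes: int = 0

    def update(self, err_abs: float) -> None:
        # Assumption: every measure counts as one yielded pair of nodes.
        self.yielded_nodes += 1
        if math.isnan(err_abs):
            self.n_nan += 1
        elif math.isinf(err_abs):
            self.n_inf += 1
        else:
            # Assumption: only finite measures contribute to the maximum.
            self.max_abs = max(self.max_abs, err_abs)

    def to_str(self) -> str:
        return (
            f"yielded={self.yielded_nodes} max_abs={self.max_abs:.3g} "
            f"inf={self.n_inf} nan={self.n_nan}"
        )


status = RunningStatus()
for err in (0.01, float("nan"), 0.3, float("inf")):
    status.update(err)
print(status.to_str())
```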

onnx_diagnostic.torch_onnx.sbs_dataclasses.make_torch_inputs(input_names: List[str], onnx_name_to_ep_name: Dict[str, str], onnx_results: Dict[str, Tensor], torch_results: Dict[str, Tensor], submodel: ModelProto | None) → Tuple[Dict[str, Tensor], Set[str]]

Gathers torch tensors instead of onnx tensors (tensors produced by the onnx model)

Parameters:

input_names – tensors to gather
onnx_name_to_ep_name – mapping from onnx names to names in the exported program
onnx_results – all onnx results (produced by the onnx model)
torch_results – all tensors produced by the exported program
submodel – onnx model, any tensor missing in torch_results is added as an initializer to this model

Returns:

the gathered tensors, indexed by name, and the set of inputs for which no tensor came from the exported program
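
The gathering step can be sketched with plain dictionaries. The function gather_torch_inputs and its fallback argument are hypothetical: fallback stands in for the initializers the documented function adds to submodel, and keying the output by onnx name is an assumption. Plain lists replace tensors so the sketch runs without torch or onnx.

```python
from typing import Any, Dict, List, Set, Tuple


def gather_torch_inputs(
    input_names: List[str],
    onnx_name_to_ep_name: Dict[str, str],
    onnx_results: Dict[str, Any],
    torch_results: Dict[str, Any],
    fallback: Dict[str, Any],
) -> Tuple[Dict[str, Any], Set[str]]:
    """Hypothetical stand-in; `fallback` plays the role of submodel initializers."""
    gathered: Dict[str, Any] = {}
    missing: Set[str] = set()
    for name in input_names:
        # Translate the onnx name into the exported program name when a mapping exists.
        ep_name = onnx_name_to_ep_name.get(name, name)
        if ep_name in torch_results:
            gathered[name] = torch_results[ep_name]
        else:
            # No torch counterpart: keep the onnx value aside and remember the name.
            fallback[name] = onnx_results[name]
            missing.add(name)
    return gathered, missing


fallback: Dict[str, Any] = {}
gathered, missing = gather_torch_inputs(
    ["x", "w"],
    {"x": "arg0"},
    {"x": [1.0, 2.0], "w": [0.5]},
    {"arg0": [1.0, 2.0001]},
    fallback,
)
print(gathered, missing, fallback)
```

Here x is taken from the exported program through the mapping, while w has no torch counterpart and ends up in the returned set.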