onnx_diagnostic.torch_onnx.sbs_dataclasses
- class onnx_diagnostic.torch_onnx.sbs_dataclasses.ReplayConfiguration(dump_folder: str, selected_names: Set[str] | None = None, selected_op_types: Set[str] | None = None, threshold: float = 0.1)
Configuration specifying which pieces of an onnx graph to dump so that they can be replayed later to investigate possible sources of discrepancies.
- Parameters:
dump_folder – where to dump the onnx model corresponding to the pieces to investigate
selected_names – set of result names to dump
selected_op_types – set of onnx operator types to dump
threshold – only keep results whose discrepancy reaches or exceeds this threshold
- dump(name: str, onnx_id_node: int, model: ModelProto, onnx_results: Dict[str, Any], torch_results: Dict[str, Tensor], onnx_name_to_ep_name: Dict[str, str], verbose: int = 0) → str | None
Dumps the minimal graph which can be replayed outside the model.
- Parameters:
name – name of the result to look into
onnx_id_node – index of the node which produces it in the model
model – onnx model
onnx_results – all known onnx results
torch_results – all known torch results
onnx_name_to_ep_name – correspondence between onnx_node name and exported program name
verbose – verbosity level
- Returns:
the folder created to dump everything
- get_replay_code() → str
Returns Python code that lets the user replay the onnx model. It looks like the following and may need to be adapted.
<<<
from onnx_diagnostic.torch_onnx.sbs_dataclasses import ReplayConfiguration

rc = ReplayConfiguration(dump_folder="unused")
print(rc.get_replay_code())
>>>
import onnx
import torch
from onnx_diagnostic.helpers import max_diff, string_diff, string_type
from onnx_diagnostic.helpers.torch_helper import study_discrepancies
from onnx_diagnostic.helpers.onnx_helper import pretty_onnx
from onnx_diagnostic.reference import OnnxruntimeEvaluator

skws = dict(with_shape=True, with_device=True)

torch_inputs = torch.load("torch_inputs.pt")
onnx_inputs = torch.load("onnx_inputs.pt")
expected_outputs_and_mapping = torch.load("torch_outputs_and_mapping.pt")
expected = expected_outputs_and_mapping["expected"]
mapping = expected_outputs_and_mapping["mapping"]

print(f"-- torch_inputs={string_type(torch_inputs, **skws)}")
print(f"-- onnx_inputs={string_type(onnx_inputs, **skws)}")
print(f"-- expected={string_type(expected, **skws)}")
print(f"-- mapping={mapping}")

print()
print("-- model.onnx")
print()

model = onnx.load("model.onnx")
print(pretty_onnx(model))

print()
print("-- range of inputs --")
print()

for k, v in onnx_inputs.items():
    print(f"-- {k}: {string_type(v, **skws, with_min_max=True)}")

print()
print("-- discrepancies of inputs --")
print()

ep_feeds = {}
for k, v in onnx_inputs.items():
    tk = mapping.get(k, k)
    tkv = torch_inputs[k] if k in torch_inputs else torch_inputs[tk]
    ep_feeds[k] = tkv
    diff = max_diff(v, tkv)
    print(
        f"-- {k} -> {tk} ep:{string_type(tkv, **skws)} "
        f"nx:{string_type(v, **skws)} / diff {string_diff(diff)}"
    )

print()
print("-- SVD --")
print()

for k, v in onnx_inputs.items():
    if len(v.shape) == 2:
        U, S, Vt = torch.linalg.svd(v.to(torch.float32))
        print(f" -- {k}: {S[:5]}")

print()
print("-- run with onnx_inputs --")
print()

sess = OnnxruntimeEvaluator(model, whole=True)
feeds = onnx_inputs
obtained = sess.run(None, feeds)
print(f"-- obtained={string_type(obtained, **skws)}")
diff = max_diff(expected, tuple(obtained), hist=[0.1, 0.01])
print(f"-- diff: {string_diff(diff)}")
print()
print("-- plots --")

for i in range(len(expected)):
    study_discrepancies(
        expected[i],
        obtained[i],
        title=f"study output {i}",
        name=f"disc{i}.png",
        bins=50,
    )

print()
print("-- run with torch_inputs --")
print()

obtained = sess.run(None, ep_feeds)
print(f"-- obtained={string_type(obtained, **skws)}")
diff = max_diff(expected, tuple(obtained), hist=[0.1, 0.01])
print(f"-- diff: {string_diff(diff)}")

print()
print("-- end --")
print()

if False:
    # CUDA profiling
    with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CUDA],
        record_shapes=True,
        with_stack=True,
    ) as prof:
        sess.run(None, ep_feeds)
    obj = prof.key_averages()
    print(obj.table())
- select(name: str | None = None, op_type: str | None = None, err_abs: float | None = None) → bool
Tells whether a piece of the onnx model around a particular node should be dumped. The result is True if at least one of the following conditions holds:
name in self.selected_names
op_type in self.selected_op_types
err_abs >= self.threshold
- Parameters:
name – result name
op_type – operator type
err_abs – measured discrepancy
- Returns:
True if this should be dumped
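The rule documented for select can be sketched in plain Python. This is a minimal stand-in, not the library's implementation: the function should_dump is hypothetical, and treating a None argument or an empty selection set as "condition not met" is an assumption.

```python
from typing import Optional, Set


def should_dump(
    name: Optional[str] = None,
    op_type: Optional[str] = None,
    err_abs: Optional[float] = None,
    selected_names: Optional[Set[str]] = None,
    selected_op_types: Optional[Set[str]] = None,
    threshold: float = 0.1,
) -> bool:
    """Hypothetical stand-in for the rule documented for ``select``."""
    # Condition 1: the result name was explicitly selected.
    if name is not None and selected_names and name in selected_names:
        return True
    # Condition 2: the operator type was explicitly selected.
    if op_type is not None and selected_op_types and op_type in selected_op_types:
        return True
    # Condition 3: the measured discrepancy reaches the threshold.
    # Assumption: a missing measurement (None) never triggers a dump.
    return err_abs is not None and err_abs >= threshold


print(should_dump(name="x", selected_names={"x"}))
print(should_dump(op_type="MatMul", err_abs=0.5))
print(should_dump(op_type="MatMul", err_abs=0.01))
```

With the default threshold of 0.1, a result is kept either because it was named explicitly or because its error is large enough, so the two selection sets can stay empty when only large discrepancies matter.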
- class onnx_diagnostic.torch_onnx.sbs_dataclasses.RunAlignedRecord(ep_id_node: int | None = None, onnx_id_node: int | None = None, ep_name: str | None = None, onnx_name: str | None = None, ep_target: str | None = None, onnx_op_type: str | None = None, onnx_id_output: int | None = None, ep_shape_type: str | None = None, onnx_shape_type: str | None = None, err_abs: float | None = None, err_rel: float | None = None, err_dev: float | None = None, err_nan: float | None = None, err_h01: float | None = None, err_h001: float | None = None, ep_time_run: float | None = None, onnx_time_run: float | None = None, err_abs2: float | None = None, err_rel2: float | None = None, err_dev2: float | None = None, err_nan2: float | None = None, err_h012: float | None = None, err_h0012: float | None = None, comment: str | None = None)
The side-by-side comparison run by function run_aligned yields instances of this type. If both ep_name and onnx_name are specified, then the result appears in both the exported program (torch) and the onnx model.
- Parameters:
ep_id_node – node index in the exported program
onnx_id_node – node index in the onnx model, -1 for an initializer
ep_name – result name in the exported program
onnx_name – result name in the onnx model, usually the same as ep_name except for initializers
ep_target – target name in the exported program producing the result
onnx_op_type – operator type in the onnx model producing the result
onnx_id_output – usually 0; if the node has multiple outputs, it is the output index
ep_shape_type – shape and type of the results in the exported program
onnx_shape_type – shape and type of the results in the onnx model, it should be the same as ep_shape_type, anything different probably means a bug
err_abs – maximum absolute error for the considered result between the exported program and the onnx model
err_rel – maximum relative error
err_dev – 0 if the device is the same, 1 if not
err_nan – number of nan values disagreeing
err_h01 – number of values for which the discrepancy is above 0.1
err_h001 – number of values for which the discrepancy is above 0.01
ep_time_run – execution time for the exported program
onnx_time_run – execution time for the onnx model; it includes the creation of the onnx model, so it is probably not very usable
err_abs2 – same as err_abs, measured when the onnx kernel is run with torch results
err_rel2 – same as err_rel, measured when the onnx kernel is run with torch results
err_dev2 – same as err_dev, measured when the onnx kernel is run with torch results
err_nan2 – same as err_nan, measured when the onnx kernel is run with torch results
err_h012 – same as err_h01, measured when the onnx kernel is run with torch results
err_h0012 – same as err_h001, measured when the onnx kernel is run with torch results
comment – any additional information
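To make the error fields above concrete, here is a minimal sketch that computes quantities named after err_abs, err_nan, err_h01 and err_h001 on two flat lists of floats. The helper error_summary is hypothetical and is not how the library computes them; in particular, counting a NaN only when exactly one side is NaN is an assumption about what "disagreeing" means.

```python
import math
from typing import Dict, Sequence


def error_summary(expected: Sequence[float], obtained: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper computing metrics named after the record fields."""
    if len(expected) != len(obtained):
        raise ValueError("both sequences must have the same length")
    err_abs = 0.0
    err_nan = err_h01 = err_h001 = 0
    for e, o in zip(expected, obtained):
        if math.isnan(e) or math.isnan(o):
            # Assumption: a NaN counts only when one side is NaN and not the other.
            if math.isnan(e) != math.isnan(o):
                err_nan += 1
            continue
        d = abs(e - o)
        err_abs = max(err_abs, d)   # maximum absolute error
        err_h01 += d > 0.1          # values with a discrepancy above 0.1
        err_h001 += d > 0.01        # values with a discrepancy above 0.01
    return dict(err_abs=err_abs, err_nan=err_nan, err_h01=err_h01, err_h001=err_h001)


summary = error_summary([1.0, 2.0, float("nan"), 4.0], [1.0, 2.05, 3.0, 4.5])
print(summary)
```

The two histogram counters give a quick sense of whether a large err_abs comes from a single outlier or from many values.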
- check(already_yielded: Dict[Tuple[int | None, int | None, int | None, str | None, str | None], int]) → Self
Checks that a record was not already yielded.
- class onnx_diagnostic.torch_onnx.sbs_dataclasses.StatusRunAligned(max_abs: float = 0.0, n_inf: int = 0, n_nan: int = 0, yielded_nodes: int = 0, last_replay: str = '')
Information to display while running the side-by-side comparison.
- Parameters:
max_abs – maximum absolute error seen so far
n_inf – number of infinite values seen so far
n_nan – number of nan values seen so far
yielded_nodes – number of yielded pairs of nodes seen so far
last_replay – last result dumped on disk for later replay
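As an illustration of how such running counters might be maintained, the sketch below folds compared results into a status object one at a time. Both the Status dataclass (a stand-in with the same field names and defaults as the signature above) and the update function are hypothetical; the actual update logic in the library is not documented here and may differ.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Status:
    """Hypothetical stand-in with the same fields as StatusRunAligned."""

    max_abs: float = 0.0
    n_inf: int = 0
    n_nan: int = 0
    yielded_nodes: int = 0
    last_replay: str = ""


def update(status: Status, err_abs: float, values: Iterable[float]) -> Status:
    """Folds one compared result into the running status (assumed logic)."""
    status.yielded_nodes += 1
    # Ignore a NaN error so that it does not poison the running maximum.
    if not math.isnan(err_abs):
        status.max_abs = max(status.max_abs, err_abs)
    for v in values:
        status.n_inf += math.isinf(v)
        status.n_nan += math.isnan(v)
    return status


status = Status()
update(status, 0.02, [1.0, float("inf")])
update(status, 0.3, [float("nan"), 2.0])
print(status)
```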
- onnx_diagnostic.torch_onnx.sbs_dataclasses.make_torch_inputs(input_names: List[str], onnx_name_to_ep_name: Dict[str, str], onnx_results: Dict[str, Tensor], torch_results: Dict[str, Tensor], submodel: ModelProto | None) → Tuple[Dict[str, Tensor], Set[str]]
Gathers torch tensors instead of onnx tensors (tensors produced by the onnx model)
- Parameters:
input_names – tensors to gather
onnx_name_to_ep_name – mapping from onnx names to names in the exported program
onnx_results – all onnx results (produced by the onnx model)
torch_results – all tensors produced by the exported program
submodel – onnx model, any tensor missing in torch_results is added as an initializer to this model
- Returns:
the dictionary of gathered tensors, and the set of inputs for which there was no tensor coming from the exported program
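The gathering step can be sketched with plain dictionaries. This is a simplified, hypothetical stand-in (gather_torch_inputs is not a library function): it uses arbitrary Python values instead of tensors, assumes an unmapped onnx name is identical to the exported program name, and does not model the submodel argument, so inputs without a torch counterpart are only reported and left out of the returned dictionary.

```python
from typing import Any, Dict, List, Set, Tuple


def gather_torch_inputs(
    input_names: List[str],
    onnx_name_to_ep_name: Dict[str, str],
    torch_results: Dict[str, Any],
) -> Tuple[Dict[str, Any], Set[str]]:
    """Hypothetical, simplified stand-in for the gathering step."""
    gathered: Dict[str, Any] = {}
    missing: Set[str] = set()
    for name in input_names:
        # Translate the onnx name; assume identical names when unmapped.
        ep_name = onnx_name_to_ep_name.get(name, name)
        if ep_name in torch_results:
            gathered[name] = torch_results[ep_name]
        else:
            # No tensor from the exported program: report the input name.
            # (The sketch does not add anything to an onnx submodel.)
            missing.add(name)
    return gathered, missing


gathered, missing = gather_torch_inputs(
    ["x", "w"],
    {"x": "x_ep"},
    {"x_ep": 1.5},
)
print(gathered, missing)
```

The returned dictionary is keyed by onnx input names, which is what makes it usable as a feed for the onnx side.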