-m onnx_diagnostic sbs … runs a side-by-side torch/onnx comparison¶
Description¶
It compares the intermediate results of an exported program saved with
torch.export.save() and an exported ONNX model, on inputs saved
with torch.save(). It assumes intermediate results share the same
names.
usage: side-by-side (sbs) [-h] -i INPUTS -e EP -m ONNX -o OUTPUT [--atol ATOL] [--rtol RTOL] [-v VERBOSE] [-r RATIO] [--first | --no-first] [--sbs | --no-sbs] [-2 | --second-run | --no-second-run]
[--reset RESET] [-s REPLAY_THRESHOLD] [-n REPLAY_NAMES] [-t REPLAY_OP_TYPES] [-f REPLAY_FOLDER] [-p | --replay-prefix-model | --no-replay-prefix-model]
Compares the intermediate outputs of the exported program and the exported onnx model. It assumes some names are common. The execution of the exported program and of the onnx model is done in
parallel. The device is the one used to store the model and the inputs. Where do discrepancies start? This function tries to answer that question.
options:
-h, --help show this help message and exit
-i INPUTS, --inputs INPUTS
model inputs saved with torch.save
  -e EP, --ep EP        exported program saved with torch.export.save
-m ONNX, --onnx ONNX exported model in onnx format
  -o OUTPUT, --output OUTPUT
                        output name used to store what the command line produces; it should be an Excel file
--atol ATOL absolute tolerance
--rtol RTOL relative tolerance
-v VERBOSE, --verbose VERBOSE
verbosity
-r RATIO, --ratio RATIO
Saves the result in an excel file every <ratio> nodes, default is 100.
--first, --no-first First runs the whole model (default is False).
--sbs, --no-sbs Runs the side-by-side (default is True).
  -2, --second-run, --no-second-run
                        Tries to run all onnx nodes with the torch results produced by the exported program. It then measures the discrepancies again. It can be used to distinguish kernels that introduce
                        discrepancies from those that merely propagate them.
--reset RESET List of result names separated by a comma. For those results, the side-by-side will take torch results instead of onnx results to compute the rest of the onnx model.
-s REPLAY_THRESHOLD, --replay-threshold REPLAY_THRESHOLD
Triggers the replay if the discrepancies are higher than this value.
-n REPLAY_NAMES, --replay-names REPLAY_NAMES
Triggers the replay if a result name is in this set of values (comma separated)
-t REPLAY_OP_TYPES, --replay-op-types REPLAY_OP_TYPES
Triggers the replay if an onnx type is in this set of values (comma separated)
-f REPLAY_FOLDER, --replay-folder REPLAY_FOLDER
If the replay is triggered, this defines the folder where everything is dumped.
  -p, --replay-prefix-model, --no-replay-prefix-model
                        There are two ways to recompute an intermediate output: the first one is to produce the minimal model between torch and onnx; the second one is to dump onnx models from the
                        inputs to the considered intermediate results. This flag enables the second one.
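The help text does not spell out how --atol and --rtol are combined into a single pass/fail criterion. A common convention (an assumption here, not documented behavior of the tool) is the numpy-style check, sketched below:

```python
def within_tolerance(ref: float, got: float, atol: float, rtol: float) -> bool:
    """Numpy-style tolerance check: |got - ref| <= atol + rtol * |ref|.

    Assumption: the sbs command combines --atol and --rtol this way;
    only the two flags themselves appear in the help text.
    """
    return abs(got - ref) <= atol + rtol * abs(ref)
```

With the values from the example below (--atol=0.1 --rtol=1), an absolute error of 0.05 on a reference value of 0.0 passes, while an error of 2.5 on a reference value of 1.0 does not.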
The command line expects the following files to be saved with the following functions, where inputs is a dictionary of the model inputs:
- torch.export.save(ep: torch.export.ExportedProgram)
- torch.save(inputs, ...)
- onnx.save(...)
The Replay functionality is just a way to investigate a part of a model. It saves the torch and onnx inputs, the torch outputs, and the minimal onnx model which shares
its inputs with the exported program. This is used to investigate the discrepancies between the torch model (through the exported program) and its onnx conversion. This functionality dumps everything it
can to disk so that it can be replayed in a separate process.
CPU, CUDA¶
Inputs are saved with torch.save(). The execution runs on CUDA
if the inputs are stored on a CUDA device; the same goes for CPU.
Example¶
python -m onnx_diagnostic sbs \
-i qwen25_vli_visual.inputs.pt \
--ep test_qwen25_vli_visual.cuda.float16.custom.graph.ep.pt2 \
-m test_qwen25_vli_visual.cuda.float16.custom.onnx \
-o results.dynamo.float16.xlsx \
-v 1 --atol=0.1 --rtol=1 \
--replay-names conv3d,rsqrt,to_4,mul_48,linear,linear_2,linear_84,linear_89,mul_172,linear_156,linear_159 \
-2 --reset conv3d
A snippet of the table it produces:
ep_name onnx_name ep_target onnx_op_type onnx_id_output ep_shape_type onnx_shape_type err_abs
transpose_18 transpose_18 aten.transpose.int Transpose 0 GT10s16x1292x80 GT10s16x1292x80 0.0083
unsqueeze_50 unsqueeze_50 aten.unsqueeze.default Unsqueeze 0 GT10s1x16x1292x80 GT10s1x16x1292x80 0.0083
eq_20 eq_20 aten.eq.Scalar Equal 0 GT9s1292x1292 GT9s1292x1292 0
unsqueeze_56 unsqueeze_56 aten.unsqueeze.default Unsqueeze 0 GT9s1x1x1292x1292 GT9s1x1x1292x1292 0
slice_29 slice_29 aten.slice.Tensor Slice 0 GT9s1x1x1292x1292 GT9s1x1x1292x1292 0
transpose_19 transpose_19 aten.transpose.int Transpose 0 GT10s1x1292x16x80 GT10s1x1292x16x80 0.0071
reshape_20 reshape_20 aten.reshape.default Reshape 0 GT10s1292x1280 GT10s1292x1280 0.0071
linear_21 linear_21 aten.linear.default Gemm 0 GT10s1292x1280 GT10s1292x1280 0.0015
mul_54 mul_54 aten.mul.Tensor SkipSimplifiedLayerNormalization 0 GT10s1292x1280 GT10s1292x1280 0.0098
add_32 add_32 aten.add.Tensor SkipSimplifiedLayerNormalization 3 GT10s1292x1280 GT10s1292x1280 0.0313
linear_22 linear_22 aten.linear.default Gemm 0 GT10s1292x3420 GT10s1292x3420 0.0078
silu_4 silu_4 aten.silu.default QuickGelu 0 GT10s1292x3420 GT10s1292x3420 0.0059
The available columns are described by
RunAlignedRecord.
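The ep_shape_type and onnx_shape_type cells pack a type and a shape into a compact code. Inferring from the table above (an assumption, not documented behavior), "GT&lt;n&gt;" looks like an ONNX element type (10 = FLOAT16, 9 = BOOL) followed by "s&lt;d0&gt;x&lt;d1&gt;…" for the shape. A small decoder sketch under that assumption:

```python
import re

def parse_shape_type(code: str) -> tuple[int, tuple[int, ...]]:
    """Decode a compact cell such as 'GT10s1292x1280'.

    Assumption inferred from the table: 'GT<n>' is the ONNX element
    type (10 = FLOAT16, 9 = BOOL, ...) and 's<d0>x<d1>...' the shape.
    """
    m = re.fullmatch(r"GT(\d+)s(.+)", code)
    if m is None:
        raise ValueError(f"unexpected shape/type code {code!r}")
    elem_type = int(m.group(1))
    shape = tuple(int(d) for d in m.group(2).split("x"))
    return elem_type, shape
```

For instance, the eq_20 row above decodes to element type 9 (BOOL) with shape (1292, 1292), consistent with the output of an Equal node.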
It is possible to dump pieces of the model to study a particular input
with ReplayConfiguration.