-m onnx_diagnostic sbs … runs a side-by-side torch/onnx
Description
It compares the intermediate results of an exported program saved with
torch.export.save() and an exported onnx model, run on inputs saved
with torch.save(). It assumes intermediate results share the same
names.
usage: side-by-side (sbs) [-h] -i INPUTS -e EP -m ONNX -o OUTPUT [--atol ATOL]
[--rtol RTOL] [-v VERBOSE] [-r RATIO]
[--first | --no-first]
[-2 | --second-run | --no-second-run]
[--reset RESET] [-s REPLAY_THRESHOLD]
[-n REPLAY_NAMES] [-t REPLAY_OP_TYPES]
[-f REPLAY_FOLDER]
Compares the intermediate outputs of the exported program and the
exported onnx model. It assumes some result names are common to both. The
exported program and the onnx model are executed in parallel. The device is the
one used to store the model and the inputs. Where do discrepancies start? This
command tries to answer that question.
options:
-h, --help show this help message and exit
-i INPUTS, --inputs INPUTS
model inputs saved with torch.save
-e EP, --ep EP exported program saved with torch.export.save
-m ONNX, --onnx ONNX exported model in onnx format
-o OUTPUT, --output OUTPUT
output name used to store what the command line produces,
it should be an Excel file
--atol ATOL absolute tolerance
--rtol RTOL relative tolerance
-v VERBOSE, --verbose VERBOSE
verbosity
-r RATIO, --ratio RATIO
Saves the results in an Excel file every <ratio> nodes,
default is 100.
--first, --no-first Runs the whole model first.
-2, --second-run, --no-second-run
Tries to run all onnx nodes with torch results
produced by the exported program. It then measures the
discrepancies again. It can be used to distinguish
kernels that introduce discrepancies from those that
merely propagate them.
--reset RESET List of result names separated by a comma. For those
results, the side-by-side will take torch results
instead of onnx results to compute the rest of the
onnx model.
-s REPLAY_THRESHOLD, --replay-threshold REPLAY_THRESHOLD
Triggers the replay if the discrepancies are higher
than this value.
-n REPLAY_NAMES, --replay-names REPLAY_NAMES
Triggers the replay if a result name is in this set of
values (comma separated)
-t REPLAY_OP_TYPES, --replay-op-types REPLAY_OP_TYPES
Triggers the replay if an onnx operator type is in this
set of values (comma separated)
-f REPLAY_FOLDER, --replay-folder REPLAY_FOLDER
If the replay is triggered, this defines the folder
where everything is dumped.
The command line expects the following files to be saved with the following
functions, where inputs is a dictionary holding the model inputs:
- torch.export.save(ep: torch.export.ExportedProgram)
- torch.save(inputs)
- onnx.save(...)
The Replay functionality is a way to investigate a part of a model. It saves
the torch and onnx inputs, the torch outputs, and the minimal onnx model which
shares its inputs with the exported program. This is used to investigate the
discrepancies between the torch model (through the exported program) and its
onnx conversion. This functionality dumps everything it can to disk so that it
can be replayed in a separate process.
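A minimal sketch of producing those three files; the model, tensor shapes and
file names below are hypothetical, and torch.onnx.export is only one possible
way to obtain the onnx file (the example further down uses another exporter):

import torch

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x) + 1

model = TinyModel()
x = torch.randn(2, 4)

# exported program (-e), saved with torch.export.save
ep = torch.export.export(model, (x,))
torch.export.save(ep, "model.ep.pt2")

# inputs (-i), a dictionary saved with torch.save
torch.save(dict(x=x), "model.inputs.pt")

# onnx model (-m), any exporter producing an onnx file works
torch.onnx.export(model, (x,), "model.onnx", dynamo=True)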
CPU, CUDA
Inputs are saved with torch.save(). The execution runs on CUDA
if the device of the inputs is CUDA; the same goes for CPU.
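To run the comparison on CUDA, the model and the inputs simply need to live on
a CUDA device when they are exported and saved. A short sketch reusing the
hypothetical TinyModel above (file names are made up):

model = TinyModel().to("cuda")
x = torch.randn(2, 4, device="cuda", dtype=torch.float16)

ep = torch.export.export(model, (x,))
torch.export.save(ep, "model.cuda.ep.pt2")
torch.save(dict(x=x), "model.cuda.inputs.pt")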
Example
python -m onnx_diagnostic sbs \
-i qwen25_vli_visual.inputs.pt \
--ep test_qwen25_vli_visual.cuda.float16.custom.graph.ep.pt2 \
-m test_qwen25_vli_visual.cuda.float16.custom.onnx \
-o results.dynamo.float16.xlsx \
-v 1 --atol=0.1 --rtol=1 \
--replay-names conv3d,rsqrt,to_4,mul_48,linear,linear_2,linear_84,linear_89,mul_172,linear_156,linear_159 \
-2 --reset conv3d
A snippet of the table it produces:
ep_name onnx_name ep_target onnx_op_type onnx_id_output ep_shape_type onnx_shape_type err_abs
transpose_18 transpose_18 aten.transpose.int Transpose 0 GT10s16x1292x80 GT10s16x1292x80 0.0083
unsqueeze_50 unsqueeze_50 aten.unsqueeze.default Unsqueeze 0 GT10s1x16x1292x80 GT10s1x16x1292x80 0.0083
eq_20 eq_20 aten.eq.Scalar Equal 0 GT9s1292x1292 GT9s1292x1292 0
unsqueeze_56 unsqueeze_56 aten.unsqueeze.default Unsqueeze 0 GT9s1x1x1292x1292 GT9s1x1x1292x1292 0
slice_29 slice_29 aten.slice.Tensor Slice 0 GT9s1x1x1292x1292 GT9s1x1x1292x1292 0
transpose_19 transpose_19 aten.transpose.int Transpose 0 GT10s1x1292x16x80 GT10s1x1292x16x80 0.0071
reshape_20 reshape_20 aten.reshape.default Reshape 0 GT10s1292x1280 GT10s1292x1280 0.0071
linear_21 linear_21 aten.linear.default Gemm 0 GT10s1292x1280 GT10s1292x1280 0.0015
mul_54 mul_54 aten.mul.Tensor SkipSimplifiedLayerNormalization 0 GT10s1292x1280 GT10s1292x1280 0.0098
add_32 add_32 aten.add.Tensor SkipSimplifiedLayerNormalization 3 GT10s1292x1280 GT10s1292x1280 0.0313
linear_22 linear_22 aten.linear.default Gemm 0 GT10s1292x3420 GT10s1292x3420 0.0078
silu_4 silu_4 aten.silu.default QuickGelu 0 GT10s1292x3420 GT10s1292x3420 0.0059
The available columns are described by
RunAlignedRecord.
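Once the command line has finished, the spreadsheet can be inspected like any
Excel file; a minimal sketch with pandas (assuming openpyxl is installed,
reusing the output file name from the example and the err_abs column visible
in the snippet above):

import pandas as pd

df = pd.read_excel("results.dynamo.float16.xlsx")
# keep the aligned results with the largest absolute error
worst = df.sort_values("err_abs", ascending=False).head(20)
print(worst[["ep_name", "onnx_op_type", "err_abs"]])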
It is possible to dump pieces of the model to study some particular input
with ReplayConfiguration.