Measuring exporters on several sets of models¶
This benchmark measures a couple of exporters, or more generally ways to run a pytorch model, on various sets of models to check which one runs better under which conditions. It can be triggered on each set of models through a dedicated script:
explicit:
python -m experimental_experiment.torch_bench.bash_bench_explicit
huggingface:
python -m experimental_experiment.torch_bench.bash_bench_huggingface
huggingface_big:
python -m experimental_experiment.torch_bench.bash_bench_huggingface_big
issues:
python -m experimental_experiment.torch_bench.bash_bench_issues
timm:
python -m experimental_experiment.torch_bench.bash_bench_timm
torchbench:
python -m experimental_experiment.torch_bench.bash_bench_torchbench
torchbench_ado:
python -m experimental_experiment.torch_bench.bash_bench_torchbench_ado
untrained:
python -m experimental_experiment.torch_bench.bash_bench_untrained
huggingface is a set of models coming from transformers. huggingface_big is another set of models coming from transformers, with bigger models. timm is a set of models coming from timm. torchbench and torchbench_ado models come from torchbench. explicit is a set of custom models. issues is a set of models to track after they failed. untrained is a set similar to huggingface_big, but it bypasses the downloading part, which can take several minutes.
These scripts are usually used in two ways:
a single run: to investigate a failure or a slow model
a batch run: to benchmark many models on many exporters
The examples below use bash_bench_huggingface,
but any of the other scripts can be used.
List of models¶
The list of supported models can be obtained by running:
python -m experimental_experiment.torch_bench.bash_bench_huggingface --model ""
0 - 101Dummy
1 - 101Dummy16
2 - 101DummyTuple
3 - AlbertForMaskedLM
4 - AlbertForQuestionAnswering
5 - AllenaiLongformerBase
6 - BartForCausalLM
7 - BartForConditionalGeneration
8 - BertForMaskedLM
9 - BertForQuestionAnswering
10 - BlenderbotForCausalLM
11 - BlenderbotForConditionalGeneration
12 - BlenderbotSmallForCausalLM
13 - BlenderbotSmallForConditionalGeneration
...
Single Run¶
The script loads a model, ElectraForQuestionAnswering in this case,
warms it up 10 times, and measures the time to run inference 30 times;
both counts can be changed with -w and -r. Then it converts the model
into ONNX and repeats the same measure. The script is usually run with --quiet 0
to make sure it stops as soon as an exception is raised.
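The measurement logic roughly looks like the following. This is a minimal sketch only, with a toy model, torch.onnx.export standing in for the script exporter, and hardcoded warmup and repeat counts; the actual implementation differs.

import time
import torch
import onnxruntime

model = torch.nn.Linear(10, 10).eval()  # stand-in for a transformers model
inputs = (torch.randn(4, 10),)

def measure(fn, warmup=10, repeat=30):
    # warmup runs are not measured, then the average latency is reported
    for _ in range(warmup):
        fn()
    begin = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - begin) / repeat

with torch.no_grad():
    time_latency_eager = measure(lambda: model(*inputs))

# the 'script' exporter maps to torch.onnx.export,
# the exported model is then timed with onnxruntime
torch.onnx.export(model, inputs, "model.onnx", opset_version=18)
sess = onnxruntime.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)
feeds = {sess.get_inputs()[0].name: inputs[0].numpy()}
time_latency = measure(lambda: sess.run(None, feeds))

print(f"speedup={time_latency_eager / time_latency}")

One example with the actual script: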
python -m experimental_experiment.torch_bench.bash_bench_huggingface --model ElectraForQuestionAnswering --device cpu --exporter script --verbose 3 --quiet 0 -w 1 -r 3
[bash_bench_huggingface] start
device=cpu
dtype=
dump_folder=dump_bash_bench
dynamic=0
exporter=script
model=ElectraForQuestionAnswering
opt_patterns=
output_data=output_data_bash_bench_huggingface.py.csv
process=0
quiet=0
repeat=3
target_opset=18
verbose=3
warmup=1
Running model 'ElectraForQuestionAnswering'
[BenchmarkRunner.benchmark] test model 'ElectraForQuestionAnswering' with exporter='script'
[BenchmarkRunner.benchmark] load model 'ElectraForQuestionAnswering'
[benchmarkrunner.benchmark] model wrapped with class <class 'experimental_experiment.torch_bench._bash_bench_model_runner.WrappedModelToTuple'>
[BenchmarkRunner.benchmark] model size and dtype 13483522, float32
[BenchmarkRunner.benchmark] warmup model 'ElectraForQuestionAnswering' - 1 times
[benchmarkrunner.benchmark] output_size=65537.0
[BenchmarkRunner.benchmark] repeat model 'ElectraForQuestionAnswering' - 3 times
[BenchmarkRunner.benchmark] export model 'ElectraForQuestionAnswering'
[BenchmarkRunner.benchmark] inference model 'ElectraForQuestionAnswering'
[BenchmarkRunner.benchmark] warmup script - 'ElectraForQuestionAnswering'
[benchmarkrunner.benchmark] no_grad=True torch.is_grad_enabled()=False before warmup
[benchmarkrunner.benchmark] torch.is_grad_enabled()=False after warmup
[BenchmarkRunner.benchmark] repeat ort 'ElectraForQuestionAnswering'
[BenchmarkRunner.benchmark] done model with 46 metrics
[BenchmarkRunner.benchmark] done model 'ElectraForQuestionAnswering' with exporter='script' in 116.15856191800003
:_index,ElectraForQuestionAnswering-script;
:capability,6.1;
:cpu,8;
:date_start,2024-07-09;
:device,cpu;
:device_name,NVIDIA GeForce GTX 1060;
:discrepancies_abs,1.3709068298339844e-06;
:discrepancies_rel,0.03255894407629967;
:executable,/usr/bin/python;
:exporter,script;
:filename,dump_test_models/ElectraForQuestionAnswering-script-cpu-/model.onnx;
:flag_fake_tensor,False;
:flag_no_grad,True;
:flag_training,False;
:has_cuda,True;
:input_size,32896;
:machine,x86_64;
:model_name,ElectraForQuestionAnswering;
:onnx_filesize,55613972;
:onnx_input_names,input.1|onnx::Clip_1|onnx::Clip_2;
:onnx_model,1;
:onnx_n_inputs,3;
:onnx_n_outputs,3;
:onnx_optimized,0;
:onnx_output_names,1300|onnx::SoftmaxCrossEntropyLoss_1286|onnx::SoftmaxCrossEntropyLoss_1288;
:opt_patterns,;
:output_size,65537.0;
:params_dtype,float32;
:params_size,13483522;
:processor,x86_64;
:providers,CPUExecutionProvider;
:repeat,3;
:speedup,1.3189447836001065;
:speedup_increase,0.3189447836001065;
:time_export,19.68962045799981;
:time_latency,10.412437652000031;
:time_latency_eager,13.733430325666783;
:time_load,0.3337397940003939;
:time_session,0.22385592099999485;
:time_total,116.15856191800003;
:time_warmup,10.869273103000069;
:time_warmup_eager,12.341592189000039;
:version,3.10.12;
:version_onnxruntime,1.18.0+cu118;
:version_torch,2.5.0.dev20240705+cu118;
:version_transformers,4.42.3;
:warmup,1;
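The reported speedup can be checked against the latency metrics above: it is the ratio of the eager latency to the ONNX latency.

time_latency = 10.412437652000031        # :time_latency, onnxruntime
time_latency_eager = 13.733430325666783  # :time_latency_eager, torch eager mode

print(time_latency_eager / time_latency)      # 1.3189447836001065 -> :speedup
print(time_latency_eager / time_latency - 1)  # 0.3189447836001065 -> :speedup_increase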
Multiple Runs¶
--model all
runs the same command as above in a new process each time,
--model All
runs the same command as above in a new process each time,
--model Head
runs the same command as above in a new process each time
with the first ten models of the benchmark,
--model Tail
runs the same command as above in a new process each time
with the last ten models of the benchmark.
Any value containing a , (comma)
means the command line needs to be run multiple times,
once per value. For example, the following command line:
python -m experimental_experiment.torch_bench.bash_bench_huggingface --model ElectraForQuestionAnswering --device cpu --exporter script,dynamo_export --verbose 3 --quiet 1 -w 1 -r 3
will run:
python -m experimental_experiment.torch_bench.bash_bench_huggingface --model ElectraForQuestionAnswering --device cpu --exporter script --verbose 3 --quiet 1 -w 1 -r 3
python -m experimental_experiment.torch_bench.bash_bench_huggingface --model ElectraForQuestionAnswering --device cpu --exporter dynamo_export --verbose 3 --quiet 1 -w 1 -r 3
Multiple fields may have multiple values.
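A sketch of how such a command line presumably expands, one process per combination of values; the module name is real, the expansion logic here is illustrative only:

import itertools
import sys

multi = {"--exporter": "script,dynamo_export", "--device": "cpu"}
keys = list(multi)
for combo in itertools.product(*(multi[k].split(",") for k in keys)):
    cmd = [sys.executable, "-m",
           "experimental_experiment.torch_bench.bash_bench_huggingface",
           "--model", "ElectraForQuestionAnswering", "--quiet", "1"]
    for k, v in zip(keys, combo):
        cmd += [k, v]
    print(" ".join(cmd))  # subprocess.run(cmd) would launch one benchmark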
Every run outputs its metrics following the format
:<name>,<value>;
. All of these expressions are collected
and aggregated into a CSV file.
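Collecting them can be as simple as a regular expression over the captured output, a minimal sketch assuming the format above:

import re

output = """
:exporter,script;
:speedup,1.3189447836001065;
"""
metrics = dict(re.findall(r":([A-Za-z0-9_]+),(.*?);", output))
print(metrics)  # {'exporter': 'script', 'speedup': '1.3189447836001065'}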
Aggregated Report¶
An aggregated report can be produced with the following command line:
python -m experimental_experiment.torch_bench.bash_bench_agg summary.xlsx bench1.csv bench2.csv ...
Other options of this command line allow the user to filter in or out some data
(see --filter_in, --filter_out). The aggregator assumes every difference
in the versions is a tested difference. If that's not the case, different versions
can be ignored by using --skip_keys=version,version_torch
or any other key column not meant
to be used in the report.
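For example, to aggregate two benchmarks while ignoring version differences:

python -m experimental_experiment.torch_bench.bash_bench_agg summary.xlsx bench1.csv bench2.csv --skip_keys=version,version_torch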