-m onnx_diagnostic validate … validate a model id¶
Description¶
The command line validates a model id, usually one available on HuggingFace, though not exclusively. It creates dummy inputs, runs the model on them, exports the model, and measures the discrepancies.
usage: test [-h] [-m MID] [-t TASK] [-e EXPORT] [--opt OPT]
[-r | --run | --no-run] [-q | --quiet | --no-quiet]
[-p | --patch | --no-patch] [--stop-if-static STOP_IF_STATIC]
[--trained | --no-trained] [-o DUMP_FOLDER] [--drop DROP]
[--ortfusiontype ORTFUSIONTYPE] [-v VERBOSE] [--dtype DTYPE]
[--device DEVICE] [--iop [KEY=VALUE ...]] [--mop [KEY=VALUE ...]]
Prints out dummy inputs for a particular task or a model id. If both mid and
task are empty, the command line displays the list of supported tasks.
options:
-h, --help show this help message and exit
-m MID, --mid MID model id, usually <author>/<name>
-t TASK, --task TASK force the task to use
-e EXPORT, --export EXPORT
export the model with this exporter
--opt OPT optimization to apply after the export
-r, --run, --no-run runs the model to check it runs
-q, --quiet, --no-quiet
catches exception, report them in the summary
-p, --patch, --no-patch
applies patches before exporting
--stop-if-static STOP_IF_STATIC
raises an exception if a dynamic dimension becomes
static
--trained, --no-trained
validate the trained model (requires downloading)
-o DUMP_FOLDER, --dump-folder DUMP_FOLDER
if not empty, a folder is created to dumps statistics,
exported program, onnx...
--drop DROP drops the following inputs names, it should be a list
with comma separated values
--ortfusiontype ORTFUSIONTYPE
applies onnxruntime fusion, this parameter should
contain the model type or multiple values separated by
`|`. `ALL` can be used to run them all
-v VERBOSE, --verbose VERBOSE
verbosity
--dtype DTYPE changes dtype if necessary
--device DEVICE changes the device if necessary
--iop [KEY=VALUE ...]
Additional input options, use to change the default
inputs use to export, example: --iop
cls_cache=SlidingWindowCache
--mop [KEY=VALUE ...]
Additional model options, use to change some
parameters of the model, example: --mop
attn_implementation=eager
If the model id is specified, one untrained version of it is instantiated.
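The usage text above shows two recurring option shapes: paired boolean flags such as --run / --no-run, and repeated KEY=VALUE items for --iop and --mop. The sketch below shows one way such options can be declared and parsed with the standard argparse module. It is a hypothetical reconstruction based only on the help text, not the tool's actual parser; the helper names build_parser and pairs_to_dict are made up for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of a few options from the usage text,
    # not the parser used by onnx_diagnostic itself.
    parser = argparse.ArgumentParser(prog="test")
    parser.add_argument("-m", "--mid", default="", help="model id")
    # BooleanOptionalAction generates both --run and --no-run.
    parser.add_argument(
        "-r", "--run", action=argparse.BooleanOptionalAction, default=False
    )
    # nargs="*" collects zero or more KEY=VALUE items after the flag.
    parser.add_argument("--iop", nargs="*", metavar="KEY=VALUE", default=[])
    parser.add_argument("--mop", nargs="*", metavar="KEY=VALUE", default=[])
    return parser


def pairs_to_dict(pairs: list[str]) -> dict[str, str]:
    # Split each item on the first '=' only; values are kept as strings.
    result = {}
    for item in pairs:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        result[key] = value
    return result


args = build_parser().parse_args(
    [
        "-m", "arnir0/Tiny-LLM", "--run",
        "--iop", "cls_cache=SlidingWindowCache",
        "--mop", "attn_implementation=eager",
    ]
)
input_options = pairs_to_dict(args.iop)
model_options = pairs_to_dict(args.mop)
print(args.run, input_options, model_options)
```

Values stay as strings in this sketch; any conversion to integers, booleans or classes would be a separate step.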
Get the list of supported tasks¶
The tasks are the same as those defined by HuggingFace. The tool only supports a subset of them.
python -m onnx_diagnostic validate
-- list of supported tasks:
MoE
automatic-speech-recognition
feature-extraction
fill-mask
image-classification
image-text-to-text
sentence-similarity
text-classification
text-generation
text2text-generation
zero-shot-image-classification
Get the default inputs for a specific task¶
This returns the dummy inputs for a specific task. A task may define more inputs than a given model accepts; only those defined by the model's forward method are kept.
python -m onnx_diagnostic validate -t text-generation
-- inputs
+ input_ids : T7s2x3
+ attention_mask : T7s2x33
+ position_ids : T7s2x3
+ past_key_values : DynamicCache(key_cache=#4[T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16], value_cache=#4[T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16])
-- dynamic_shapes
+ input_ids : {0:Dim(batch),1:DYN(seq_length)}
+ attention_mask : {0:Dim(batch),1:DYN(cache+seq)}
+ position_ids : {0:Dim(batch),1:DYN(cache+seq)}
+ past_key_values : #2[#4[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}],#4[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}]]
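The output above describes every tensor with a compact string such as T7s2x3. The page does not define this notation, so the sketch below rests on an assumption: the letter is the container (T for a torch tensor, A for a numpy array as seen later in the onnxruntime feeds), the integer is the ONNX element type code (1 is float32, 7 is int64), and the part after s is the shape. The function parse_tensor_string is a hypothetical helper written for this page, not part of the package.

```python
import re

# Assumption: the integer after the letter is an ONNX TensorProto element type.
# Only a few common codes are listed here.
ONNX_ELEM_TYPES = {1: "float32", 7: "int64", 10: "float16", 16: "bfloat16"}

# Letter, element type code, then 's' followed by dimensions separated by 'x'.
_PATTERN = re.compile(r"^([TA])(\d+)s(\d+(?:x\d+)*)$")


def parse_tensor_string(text: str) -> dict:
    """Decode a string such as 'T7s2x3' into kind, dtype and shape."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported tensor description: {text!r}")
    kind, code, dims = match.groups()
    return {
        "kind": "tensor" if kind == "T" else "array",
        "dtype": ONNX_ELEM_TYPES.get(int(code), f"onnx_type_{code}"),
        "shape": tuple(int(d) for d in dims.split("x")),
    }


for description in ["T7s2x3", "T7s2x33", "T1s2x24x30x16"]:
    print(description, parse_tensor_string(description))
```

Under that reading, input_ids is an int64 tensor of shape (2, 3) and every cache entry is a float32 tensor of shape (2, 24, 30, 16).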
Validate dummy inputs for a model¶
The dummy inputs may not work for a given model and task. The following command line checks that they do. There is no point in exporting the model if this step fails.
python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1
[validate_model] validate model id 'arnir0/Tiny-LLM'
[validate_model] get dummy inputs with input_options=None...
[get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] architecture='LlamaForCausalLM'
[get_untrained_model_with_inputs] cls='LlamaConfig'
[get_untrained_model_with_inputs] task='text-generation'
[get_untrained_model_with_inputs] use fct=<function get_inputs at 0x7fcff7892160>
[validate_model] --
[validate_model] task=text-generation
[validate_model] size=49.549072265625 Mb
[validate_model] n_weights=12.988992 millions parameters
[validate_model] +INPUT input_ids=T7s2x3
[validate_model] +INPUT attention_mask=T7s2x33
[validate_model] +INPUT position_ids=T7s2x3
[validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
[validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
[validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
[validate_model] --
[validate_model] -- run the model...
[validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (run)
[validate_model] -- done (final)
-- summary --
:model_class,LlamaForCausalLM;
:model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'output_attentions':False,'torchscript':False,'torch_dtype':'float32','use_bfloat16':False,'tf_legacy_loss':False,'pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'task_specific_params':None,'problem_type':None,'_name_or_path':'','_attn_implementation_autoset':True,'transformers_version':'4.52.0.dev0','model_type':'llama'};
:model_config_class,LlamaConfig;
:model_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
:model_id,arnir0/Tiny-LLM;
:model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:model_inputs_opionts,;
:model_nweights,12988992;
:model_shapes,str;
:model_size,51955968;
:model_task,text-generation;
:time_create,0.15915065000081086;
:time_run,0.014513059002638329;
:version_date,2025-04-25T15:02:23;
:version_device,;
:version_do_run,True;
:version_drop_inputs,[];
:version_dtype,;
:version_dump_folder,;
:version_exporter,;
:version_model_id,arnir0/Tiny-LLM;
:version_numpy,2.2.5;
:version_onnx,1.19.0;
:version_onnx_diagnostic,0.4.2;
:version_onnxruntime,1.22.0+cu126;
:version_onnxscript,0.3.0.dev20250301;
:version_optimization,;
:version_ortfusiontype,;
:version_patch,True;
:version_quiet,False;
:version_stop_if_static,0;
:version_torch,2.8.0.dev20250423+cu126;
:version_trained,False;
:version_transformers,4.52.0.dev0;
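The summary printed above uses one :key,value; entry per line, which makes it easy to post-process. The sketch below turns such text into a dictionary, assuming every entry starts with a colon, ends with a semicolon, and separates key and value with the first comma. The function parse_summary is a hypothetical helper, and the sample only copies a few entries from the output above.

```python
def parse_summary(text: str) -> dict[str, str]:
    """Turn ':key,value;' lines into a dictionary of strings."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip log lines and the '-- summary --' banner.
        if not (line.startswith(":") and line.endswith(";")):
            continue
        # Split on the first comma only: values may contain commas themselves.
        key, _, value = line[1:-1].partition(",")
        summary[key] = value
    return summary


sample = """-- summary --
:model_class,LlamaForCausalLM;
:model_nweights,12988992;
:model_size,51955968;
:time_run,0.014513059002638329;
:version_device,;
"""

summary = parse_summary(sample)
# model_size is in bytes; dividing by 2**20 gives the size printed in the log.
size_mb = int(summary["model_size"]) / 2**20
print(summary["model_class"], size_mb)
```

Empty values such as version_device come back as empty strings, so callers need to convert numeric fields explicitly.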
Validate and export a model¶
Exports a model for the given task and checks for discrepancies as well. The reported latencies come from a single run. They indicate how long the benchmark takes, but they are far from the latency one would measure by running the same model multiple times.
python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1 --export export-nostrict -o dump_models --patch
[validate_model] dump into 'arnir0_Tiny-LLM-export-nostrict'
[validate_model] validate model id 'arnir0/Tiny-LLM'
[validate_model] get dummy inputs with input_options=None...
[get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] architecture='LlamaForCausalLM'
[get_untrained_model_with_inputs] cls='LlamaConfig'
[get_untrained_model_with_inputs] task='text-generation'
[get_untrained_model_with_inputs] use fct=<function get_inputs at 0x7fcff7892160>
[validate_model] --
[validate_model] task=text-generation
[validate_model] size=49.549072265625 Mb
[validate_model] n_weights=12.988992 millions parameters
[validate_model] +INPUT input_ids=T7s2x3
[validate_model] +INPUT attention_mask=T7s2x33
[validate_model] +INPUT position_ids=T7s2x3
[validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
[validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
[validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
[validate_model] --
[validate_model] -- run the model...
[validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (run)
[validate_model] -- export the model with 'export-nostrict', optimization=None
[validate_model] applies patches before exporting stop_if_static=0
[validate_model] run patched model...
[validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
/home/xadupre/github/onnx-diagnostic/onnx_diagnostic/helpers/helper.py:1295: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
Consider using tensor.detach() first. (Triggered internally at /pytorch/aten/src/ATen/native/Scalar.cpp:22.)
float(diff.max()),
[validate_model] done (patched run)
[validate_model] patched discrepancies=abs=0, rel=0
[call_torch_export_export] exporter='export-nostrict', strict=False, optimization=None
[call_torch_export_export] args=()
[call_torch_export_export] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[call_torch_export_export] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]])
[call_torch_export_export] dynamic_shapes_export_export=dict(input_ids:{0:Dim(batch),1:DYNAMIC},attention_mask:{0:Dim(batch),1:DYNAMIC},position_ids:{0:Dim(batch),1:DYNAMIC},past_key_values:#2[#1[{0:Dim(batch),2:DYNAMIC}],#1[{0:Dim(batch),2:DYNAMIC}]])
[call_torch_export_export] export...
[call_torch_export_export] done (export) with 158 nodes
[validate_model] run exported model...
[validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (exported run)
[validate_model] exported discrepancies=abs=0, rel=0
[validate_model] -- dumps exported program in 'dump_models/arnir0_Tiny-LLM-export-nostrict'...
[validate_model] done (dump ep)
[validate_model] dumps statistics in 'dump_models/arnir0_Tiny-LLM-export-nostrict'...
[validate_model] done (dump)
[validate_model] -- done (final)
-- summary --
:disc_exported_abs,0;
:disc_exported_dnan,0;
:disc_exported_n,204672.0;
:disc_exported_rel,0;
:disc_exported_sum,0.0;
:disc_patched_abs,0;
:disc_patched_dnan,0;
:disc_patched_n,204672.0;
:disc_patched_rel,0;
:disc_patched_sum,0.0;
:dump_folder,dump_models/arnir0_Tiny-LLM-export-nostrict;
:dump_folder_name,arnir0_Tiny-LLM-export-nostrict;
:export_args,();
:export_dynamic_shapes,dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]);
:export_dynamic_shapes_export_export,dict(input_ids:{0:Dim(batch),1:DYNAMIC},attention_mask:{0:Dim(batch),1:DYNAMIC},position_ids:{0:Dim(batch),1:DYNAMIC},past_key_values:#2[#1[{0:Dim(batch),2:DYNAMIC}],#1[{0:Dim(batch),2:DYNAMIC}]]);
:export_exporter,export-nostrict;
:export_graph_nodes,158;
:export_kwargs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:export_optimization,;
:export_strict,False;
:model_class,LlamaForCausalLM;
:model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'output_attentions':False,'torchscript':False,'torch_dtype':'float32','use_bfloat16':False,'tf_legacy_loss':False,'pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'task_specific_params':None,'problem_type':None,'_name_or_path':'','_attn_implementation_autoset':True,'transformers_version':'4.52.0.dev0','model_type':'llama'};
:model_config_class,LlamaConfig;
:model_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
:model_id,arnir0/Tiny-LLM;
:model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:model_inputs_opionts,;
:model_nweights,12988992;
:model_shapes,str;
:model_size,51955968;
:model_task,text-generation;
:time_create,0.1318674469985126;
:time_export_export,0.8615079670016712;
:time_run,0.006955013999686344;
:time_run_exported,0.02028118300222559;
:time_run_patched,0.0042310229982831515;
:version_date,2025-04-25T15:02:23;
:version_device,;
:version_do_run,True;
:version_drop_inputs,[];
:version_dtype,;
:version_dump_folder,dump_models;
:version_exporter,export-nostrict;
:version_model_id,arnir0/Tiny-LLM;
:version_numpy,2.2.5;
:version_onnx,1.19.0;
:version_onnx_diagnostic,0.4.2;
:version_onnxruntime,1.22.0+cu126;
:version_onnxscript,0.3.0.dev20250301;
:version_optimization,;
:version_ortfusiontype,;
:version_patch,True;
:version_quiet,False;
:version_stop_if_static,0;
:version_torch,2.8.0.dev20250423+cu126;
:version_trained,False;
:version_transformers,4.52.0.dev0;
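The time_run values in the summary above come from a single call, so they include one-off costs and are noisy. A more representative latency usually needs a few warmup calls followed by repeated measurements. The sketch below illustrates that idea with the standard library only; measure_latency and fake_model are hypothetical names, and fake_model is a stand-in workload rather than a real model.

```python
import statistics
import time


def measure_latency(fn, warmup: int = 2, repeat: int = 10) -> dict[str, float]:
    """Call fn several times and report min, median and max duration in seconds."""
    # Warmup calls absorb one-off costs (caches, lazy initialization).
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeat):
        begin = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - begin)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_model():
    # Stand-in workload; replace with a call to the model being measured.
    return sum(i * i for i in range(10_000))


stats = measure_latency(fake_model)
print(stats)
```

The median is generally less sensitive to outliers than the mean, which is why it is reported here alongside the extremes.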
Validate ONNX discrepancies¶
Let’s export to ONNX this time and check for discrepancies.
python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1 --export onnx-dynamo -o dump_models --patch --opt ir
[validate_model] dump into 'arnir0_Tiny-LLM-onnx-dynamo-ir'
[validate_model] validate model id 'arnir0/Tiny-LLM'
[validate_model] get dummy inputs with input_options=None...
[get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] architecture='LlamaForCausalLM'
[get_untrained_model_with_inputs] cls='LlamaConfig'
[get_untrained_model_with_inputs] task='text-generation'
[get_untrained_model_with_inputs] use fct=<function get_inputs at 0x7fcff7892160>
[validate_model] --
[validate_model] task=text-generation
[validate_model] size=49.549072265625 Mb
[validate_model] n_weights=12.988992 millions parameters
[validate_model] +INPUT input_ids=T7s2x3
[validate_model] +INPUT attention_mask=T7s2x33
[validate_model] +INPUT position_ids=T7s2x3
[validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
[validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
[validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
[validate_model] --
[validate_model] -- run the model...
[validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (run)
[validate_model] -- export the model with 'onnx-dynamo', optimization='ir'
[validate_model] applies patches before exporting stop_if_static=0
[validate_model] run patched model...
[validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (patched run)
[validate_model] patched discrepancies=abs=0, rel=0
[call_torch_export_onnx] exporter='onnx-dynamo', optimization='ir'
[call_torch_export_onnx] args=()
[call_torch_export_onnx] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[call_torch_export_onnx] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]])
[call_torch_export_onnx] export...
[call_torch_export_onnx] export_export_kwargs=dict(dynamo:bool,dynamic_shapes:dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]))
[torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: cache+seq will not be used, since it shares the same shape constraints with another axis: seq_length.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
Applied 54 of general pattern rewrite rules.
[call_torch_export_onnx] done (export)
[call_torch_export_onnx] starts optimization='ir'...
[call_torch_export_onnx] done (optimization)
[validate_model] dumps onnx program in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
[validate_model] done (dump onnx) in 0.07412190300237853
[validate_model] dumps statistics in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
[validate_model] done (dump)
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour=None
[validate_onnx_model] done (ort_session) flavour=None
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.152557373046875e-07, rel=0.00025080741642596183, n=204672.0
[validate_model] -- done (final)
-- summary --
:disc_onnx_ort_run_abs,7.152557373046875e-07;
:disc_onnx_ort_run_dnan,0;
:disc_onnx_ort_run_n,204672.0;
:disc_onnx_ort_run_rel,0.00025080741642596183;
:disc_onnx_ort_run_sum,0.01808798987008231;
:disc_patched_abs,0;
:disc_patched_dnan,0;
:disc_patched_n,204672.0;
:disc_patched_rel,0;
:disc_patched_sum,0.0;
:dump_folder,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir;
:dump_folder_name,arnir0_Tiny-LLM-onnx-dynamo-ir;
:export_args,();
:export_dynamo,True;
:export_exporter,onnx-dynamo;
:export_kwargs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:export_optimization,ir;
:model_class,LlamaForCausalLM;
:model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'output_attentions':False,'torchscript':False,'torch_dtype':'float32','use_bfloat16':False,'tf_legacy_loss':False,'pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'task_specific_params':None,'problem_type':None,'_name_or_path':'','_attn_implementation_autoset':True,'transformers_version':'4.52.0.dev0','model_type':'llama'};
:model_config_class,LlamaConfig;
:model_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
:model_id,arnir0/Tiny-LLM;
:model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:model_inputs_opionts,;
:model_nweights,12988992;
:model_shapes,str;
:model_size,51955968;
:model_task,text-generation;
:onnx_filename,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.onnx;
:onnx_ort_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_size,294999;
:time_create,0.135851458999241;
:time_export_onnx,3.120271321000473;
:time_export_onnx_opt_ir,0.04257779399995343;
:time_onnx_save,0.07412190300237853;
:time_run,0.015620837999449577;
:time_run_patched,0.01565436199962278;
:time_time_onnx_ort_create,0.16686298700005864;
:time_time_onnx_ort_run,0.013971129999845289;
:version_date,2025-04-25T15:02:24;
:version_device,;
:version_do_run,True;
:version_drop_inputs,[];
:version_dtype,;
:version_dump_folder,dump_models;
:version_exporter,onnx-dynamo;
:version_model_id,arnir0/Tiny-LLM;
:version_numpy,2.2.5;
:version_onnx,1.19.0;
:version_onnx_diagnostic,0.4.2;
:version_onnxruntime,1.22.0+cu126;
:version_onnxscript,0.3.0.dev20250301;
:version_optimization,ir;
:version_ortfusiontype,;
:version_patch,True;
:version_quiet,False;
:version_stop_if_static,0;
:version_torch,2.8.0.dev20250423+cu126;
:version_trained,False;
:version_transformers,4.52.0.dev0;
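The disc_* entries in the summary above report abs, rel, sum, n and dnan. The page does not give their exact definitions, so the sketch below is only one plausible reading: abs is the largest absolute difference, rel the largest difference relative to the expected value, sum the total absolute difference, n the number of compared elements, and dnan the number of positions where only one side is NaN. The function max_discrepancies and the eps term are assumptions made for this illustration, and the package may compute these values differently.

```python
import math


def max_discrepancies(expected, got, eps: float = 1e-5) -> dict[str, float]:
    """Compare two flat sequences of floats (assumed definitions, see text)."""
    if len(expected) != len(got):
        raise ValueError("both sequences must have the same length")
    max_abs, max_rel, total, n_nan = 0.0, 0.0, 0.0, 0
    for e, g in zip(expected, got):
        e_nan, g_nan = math.isnan(e), math.isnan(g)
        if e_nan or g_nan:
            # Count positions where only one of the two values is NaN.
            n_nan += int(e_nan != g_nan)
            continue
        diff = abs(e - g)
        max_abs = max(max_abs, diff)
        # eps avoids a division by zero when the expected value is 0.
        max_rel = max(max_rel, diff / (abs(e) + eps))
        total += diff
    return {
        "abs": max_abs,
        "rel": max_rel,
        "sum": total,
        "n": float(len(expected)),
        "dnan": n_nan,
    }


result = max_discrepancies([1.0, 2.0, 4.0], [1.0, 2.5, 4.0])
print(result)
```

With real tensors, the same quantities would be computed on flattened outputs of the original model and of the ONNX model.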
Run onnxruntime fusions¶
This option runs the transformers optimizations implemented in onnxruntime. The list of supported values for model_type can be found in the documentation of the function onnx_diagnostic.torch_models.test_helper.run_ort_fusion().
python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1 --export onnx-dynamo -o dump_models --patch --opt ir --ortfusiontype ALL
[validate_model] dump into 'arnir0_Tiny-LLM-onnx-dynamo-ir'
[validate_model] validate model id 'arnir0/Tiny-LLM'
[validate_model] get dummy inputs with input_options=None...
[get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] architecture='LlamaForCausalLM'
[get_untrained_model_with_inputs] cls='LlamaConfig'
[get_untrained_model_with_inputs] task='text-generation'
[get_untrained_model_with_inputs] use fct=<function get_inputs at 0x7fcff7892160>
[validate_model] --
[validate_model] task=text-generation
[validate_model] size=49.549072265625 Mb
[validate_model] n_weights=12.988992 millions parameters
[validate_model] +INPUT input_ids=T7s2x3
[validate_model] +INPUT attention_mask=T7s2x33
[validate_model] +INPUT position_ids=T7s2x3
[validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
[validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
[validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
[validate_model] --
[validate_model] -- run the model...
[validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (run)
[validate_model] -- export the model with 'onnx-dynamo', optimization='ir'
[validate_model] applies patches before exporting stop_if_static=0
[validate_model] run patched model...
[validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_model] done (patched run)
[validate_model] patched discrepancies=abs=0, rel=0
[call_torch_export_onnx] exporter='onnx-dynamo', optimization='ir'
[call_torch_export_onnx] args=()
[call_torch_export_onnx] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[call_torch_export_onnx] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]])
[call_torch_export_onnx] export...
[call_torch_export_onnx] export_export_kwargs=dict(dynamo:bool,dynamic_shapes:dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]))
[torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: cache+seq will not be used, since it shares the same shape constraints with another axis: seq_length.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
/home/xadupre/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
Applied 54 of general pattern rewrite rules.
[call_torch_export_onnx] done (export)
[call_torch_export_onnx] starts optimization='ir'...
[call_torch_export_onnx] done (optimization)
[validate_model] dumps onnx program in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
[validate_model] done (dump onnx) in 0.10028175900151837
[validate_model] dumps statistics in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
[validate_model] done (dump)
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour=None
[validate_onnx_model] done (ort_session) flavour=None
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002678153179787634, n=204672.0
[validate_model] run onnxruntime fusion for 'bart'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'bart' in 0.40075301400065655, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bart.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbart'
[validate_onnx_model] done (ort_session) flavour='ortbart'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'bert'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'bert' in 0.28629150800043135, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbert'
[validate_onnx_model] done (ort_session) flavour='ortbert'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'bert_keras'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'bert_keras' in 0.352392017000966, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_keras.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbert_keras'
[validate_onnx_model] done (ort_session) flavour='ortbert_keras'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'bert_tf'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'bert_tf' in 0.30486665300122695, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_tf.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbert_tf'
[validate_onnx_model] done (ort_session) flavour='ortbert_tf'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'clip'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'clip' in 0.34893476299839676, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.clip.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortclip'
[validate_onnx_model] done (ort_session) flavour='ortclip'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'conformer'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'conformer' in 0.3070983799989335, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.conformer.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortconformer'
[validate_onnx_model] done (ort_session) flavour='ortconformer'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'gpt2'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'gpt2' in 0.3660418559993559, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortgpt2'
[validate_onnx_model] done (ort_session) flavour='ortgpt2'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'gpt2_tf'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'gpt2_tf' in 0.26004131299850997, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2_tf.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortgpt2_tf'
[validate_onnx_model] done (ort_session) flavour='ortgpt2_tf'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'gpt_neox'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'gpt_neox' in 0.1955189580003207, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt_neox.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortgpt_neox'
[validate_onnx_model] done (ort_session) flavour='ortgpt_neox'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'mmdit'
fusion: 0%| | 0/5 [00:00<?, ?it/s]
failed in shape inference <class 'AssertionError'>
The optimized model requires LayerNormalization with broadcast support. Please use onnxruntime-gpu>=1.21 for inference.
fusion: 20%|## | 1/5 [00:00<00:00, 80.27it/s]
fusion: 100%|##########| 5/5 [00:00<00:00, 183.64it/s]
[validate_model] done 'mmdit' in 0.1283201480000571, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.mmdit.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortmmdit'
[validate_onnx_model] done (ort_session) flavour='ortmmdit'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'phi'
[validate_model] done 'phi' in 0.018583577999379486, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.phi.onnx'
[validate_model] run onnxruntime fusion for 'sam2'
sam2 fusion: 0%| | 0/12 [00:00<?, ?it/s]
failed in shape inference <class 'AssertionError'>
symbolic shape inference disabled or failed.
sam2 fusion: 50%|##### | 6/12 [00:00<00:00, 296.41it/s]
sam2 fusion: 100%|##########| 12/12 [00:00<00:00, 470.97it/s]
[validate_model] done 'sam2' in 0.10133608999967691, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.sam2.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortsam2'
[validate_onnx_model] done (ort_session) flavour='ortsam2'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002678153179787634, n=204672.0
[validate_model] run onnxruntime fusion for 'swin'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'swin' in 0.2537386920012068, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.swin.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortswin'
[validate_onnx_model] done (ort_session) flavour='ortswin'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 't5'
failed in shape inference <class 'AssertionError'>
[validate_model] done 't5' in 0.47997708399998373, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.t5.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortt5'
[validate_onnx_model] done (ort_session) flavour='ortt5'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'tnlr'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'tnlr' in 0.18051545100024668, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.tnlr.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='orttnlr'
[validate_onnx_model] done (ort_session) flavour='orttnlr'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] run onnxruntime fusion for 'unet'
fusion: 0%| | 0/18 [00:00<?, ?it/s]
failed in shape inference <class 'AssertionError'>
symbolic shape inference disabled or failed.
fusion: 50%|##### | 9/18 [00:00<00:00, 338.49it/s]
SkipGroupNorm fusion will be skipped since symbolic shape inference disabled or failed.
fusion: 67%|######6 | 12/18 [00:00<00:00, 424.07it/s]
fusion: 100%|##########| 18/18 [00:00<00:00, 515.90it/s]
[validate_model] done 'unet' in 0.14335201600260916, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.unet.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortunet'
[validate_onnx_model] done (ort_session) flavour='ortunet'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002678153179787634, n=204672.0
[validate_model] run onnxruntime fusion for 'vae'
fusion: 0%| | 0/18 [00:00<?, ?it/s]
failed in shape inference <class 'AssertionError'>
symbolic shape inference disabled or failed.
fusion: 50%|##### | 9/18 [00:00<00:00, 386.49it/s]
SkipGroupNorm fusion will be skipped since symbolic shape inference disabled or failed.
fusion: 67%|######6 | 12/18 [00:00<00:00, 480.75it/s]
fusion: 100%|##########| 18/18 [00:00<00:00, 568.84it/s]
[validate_model] done 'vae' in 0.10861333000138984, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vae.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortvae'
[validate_onnx_model] done (ort_session) flavour='ortvae'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002678153179787634, n=204672.0
[validate_model] run onnxruntime fusion for 'vit'
failed in shape inference <class 'AssertionError'>
[validate_model] done 'vit' in 0.4301966810016893, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vit.onnx'
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortvit'
[validate_onnx_model] done (ort_session) flavour='ortvit'
[validate_onnx_model] -- make_feeds...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
[validate_onnx_model] discrepancies=abs=7.748603820800781e-07, rel=0.0002945900909882278, n=204672.0
[validate_model] -- done (final)
-- summary --
:ERR_onnx_missing_ortphi,FileNotFoundError('dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.phi.onnx');
:ERR_opt_ort_phi,'method' object is not iterable;
:disc_onnx_ort_run_abs,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortbart,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortbert,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortbert_keras,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortbert_tf,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortclip,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortconformer,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortgpt2,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortgpt2_tf,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortgpt_neox,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortmmdit,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortsam2,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortswin,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortt5,7.748603820800781e-07;
:disc_onnx_ort_run_abs_orttnlr,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortunet,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortvae,7.748603820800781e-07;
:disc_onnx_ort_run_abs_ortvit,7.748603820800781e-07;
:disc_onnx_ort_run_dnan,0;
:disc_onnx_ort_run_dnan_ortbart,0;
:disc_onnx_ort_run_dnan_ortbert,0;
:disc_onnx_ort_run_dnan_ortbert_keras,0;
:disc_onnx_ort_run_dnan_ortbert_tf,0;
:disc_onnx_ort_run_dnan_ortclip,0;
:disc_onnx_ort_run_dnan_ortconformer,0;
:disc_onnx_ort_run_dnan_ortgpt2,0;
:disc_onnx_ort_run_dnan_ortgpt2_tf,0;
:disc_onnx_ort_run_dnan_ortgpt_neox,0;
:disc_onnx_ort_run_dnan_ortmmdit,0;
:disc_onnx_ort_run_dnan_ortsam2,0;
:disc_onnx_ort_run_dnan_ortswin,0;
:disc_onnx_ort_run_dnan_ortt5,0;
:disc_onnx_ort_run_dnan_orttnlr,0;
:disc_onnx_ort_run_dnan_ortunet,0;
:disc_onnx_ort_run_dnan_ortvae,0;
:disc_onnx_ort_run_dnan_ortvit,0;
:disc_onnx_ort_run_n,204672.0;
:disc_onnx_ort_run_n_ortbart,204672.0;
:disc_onnx_ort_run_n_ortbert,204672.0;
:disc_onnx_ort_run_n_ortbert_keras,204672.0;
:disc_onnx_ort_run_n_ortbert_tf,204672.0;
:disc_onnx_ort_run_n_ortclip,204672.0;
:disc_onnx_ort_run_n_ortconformer,204672.0;
:disc_onnx_ort_run_n_ortgpt2,204672.0;
:disc_onnx_ort_run_n_ortgpt2_tf,204672.0;
:disc_onnx_ort_run_n_ortgpt_neox,204672.0;
:disc_onnx_ort_run_n_ortmmdit,204672.0;
:disc_onnx_ort_run_n_ortsam2,204672.0;
:disc_onnx_ort_run_n_ortswin,204672.0;
:disc_onnx_ort_run_n_ortt5,204672.0;
:disc_onnx_ort_run_n_orttnlr,204672.0;
:disc_onnx_ort_run_n_ortunet,204672.0;
:disc_onnx_ort_run_n_ortvae,204672.0;
:disc_onnx_ort_run_n_ortvit,204672.0;
:disc_onnx_ort_run_rel,0.0002678153179787634;
:disc_onnx_ort_run_rel_ortbart,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortbert,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortbert_keras,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortbert_tf,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortclip,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortconformer,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortgpt2,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortgpt2_tf,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortgpt_neox,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortmmdit,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortsam2,0.0002678153179787634;
:disc_onnx_ort_run_rel_ortswin,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortt5,0.0002945900909882278;
:disc_onnx_ort_run_rel_orttnlr,0.0002945900909882278;
:disc_onnx_ort_run_rel_ortunet,0.0002678153179787634;
:disc_onnx_ort_run_rel_ortvae,0.0002678153179787634;
:disc_onnx_ort_run_rel_ortvit,0.0002945900909882278;
:disc_onnx_ort_run_sum,0.01905394762297874;
:disc_onnx_ort_run_sum_ortbart,0.019863042089980354;
:disc_onnx_ort_run_sum_ortbert,0.019863042089980354;
:disc_onnx_ort_run_sum_ortbert_keras,0.019863042089980354;
:disc_onnx_ort_run_sum_ortbert_tf,0.019863042089980354;
:disc_onnx_ort_run_sum_ortclip,0.019863042089980354;
:disc_onnx_ort_run_sum_ortconformer,0.019863042089980354;
:disc_onnx_ort_run_sum_ortgpt2,0.019863042089980354;
:disc_onnx_ort_run_sum_ortgpt2_tf,0.019863042089980354;
:disc_onnx_ort_run_sum_ortgpt_neox,0.019863042089980354;
:disc_onnx_ort_run_sum_ortmmdit,0.019863042089980354;
:disc_onnx_ort_run_sum_ortsam2,0.01905394762297874;
:disc_onnx_ort_run_sum_ortswin,0.019863042089980354;
:disc_onnx_ort_run_sum_ortt5,0.019863042089980354;
:disc_onnx_ort_run_sum_orttnlr,0.019863042089980354;
:disc_onnx_ort_run_sum_ortunet,0.01905394762297874;
:disc_onnx_ort_run_sum_ortvae,0.01905394762297874;
:disc_onnx_ort_run_sum_ortvit,0.019863042089980354;
:disc_patched_abs,0;
:disc_patched_dnan,0;
:disc_patched_n,204672.0;
:disc_patched_rel,0;
:disc_patched_sum,0.0;
:dump_folder,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir;
:dump_folder_name,arnir0_Tiny-LLM-onnx-dynamo-ir;
:export_args,();
:export_dynamo,True;
:export_exporter,onnx-dynamo;
:export_kwargs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:export_optimization,ir;
:model_class,LlamaForCausalLM;
:model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'output_attentions':False,'torchscript':False,'torch_dtype':'float32','use_bfloat16':False,'tf_legacy_loss':False,'pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'task_specific_params':None,'problem_type':None,'_name_or_path':'','_attn_implementation_autoset':True,'transformers_version':'4.52.0.dev0','model_type':'llama'};
:model_config_class,LlamaConfig;
:model_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
:model_id,arnir0/Tiny-LLM;
:model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
:model_inputs_opionts,;
:model_nweights,12988992;
:model_shapes,str;
:model_size,51955968;
:model_task,text-generation;
:onnx_filename,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.onnx;
:onnx_filename_ortbart,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bart.onnx;
:onnx_filename_ortbert,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert.onnx;
:onnx_filename_ortbert_keras,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_keras.onnx;
:onnx_filename_ortbert_tf,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_tf.onnx;
:onnx_filename_ortclip,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.clip.onnx;
:onnx_filename_ortconformer,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.conformer.onnx;
:onnx_filename_ortgpt2,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2.onnx;
:onnx_filename_ortgpt2_tf,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2_tf.onnx;
:onnx_filename_ortgpt_neox,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt_neox.onnx;
:onnx_filename_ortmmdit,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.mmdit.onnx;
:onnx_filename_ortsam2,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.sam2.onnx;
:onnx_filename_ortswin,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.swin.onnx;
:onnx_filename_ortt5,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.t5.onnx;
:onnx_filename_orttnlr,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.tnlr.onnx;
:onnx_filename_ortunet,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.unet.onnx;
:onnx_filename_ortvae,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vae.onnx;
:onnx_filename_ortvit,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vit.onnx;
:onnx_ort_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortbart,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortbert,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortbert_keras,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortbert_tf,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortclip,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortconformer,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortgpt2,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortgpt2_tf,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortgpt_neox,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortmmdit,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortsam2,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortswin,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortt5,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_orttnlr,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortunet,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortvae,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_ort_inputs_ortvit,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
:onnx_size,295014;
:onnx_size_ortbart,249371;
:onnx_size_ortbert,249371;
:onnx_size_ortbert_keras,249427;
:onnx_size_ortbert_tf,249400;
:onnx_size_ortclip,249371;
:onnx_size_ortconformer,249416;
:onnx_size_ortgpt2,249371;
:onnx_size_ortgpt2_tf,249398;
:onnx_size_ortgpt_neox,249407;
:onnx_size_ortmmdit,263981;
:onnx_size_ortsam2,297174;
:onnx_size_ortswin,249371;
:onnx_size_ortt5,249353;
:onnx_size_orttnlr,249371;
:onnx_size_ortunet,297174;
:onnx_size_ortvae,297165;
:onnx_size_ortvit,249362;
:opt_ort_bart_delta_node,-30;
:opt_ort_bart_duration,0.16702407599950675;
:opt_ort_bart_duration_save,0.19048016699889558;
:opt_ort_bart_n_nodes1,228;
:opt_ort_bart_n_nodes2,198;
:opt_ort_bert_delta_node,-30;
:opt_ort_bert_duration,0.20450454000092577;
:opt_ort_bert_duration_save,0.051647533000505064;
:opt_ort_bert_keras_delta_node,-30;
:opt_ort_bert_keras_duration,0.23843554799896083;
:opt_ort_bert_keras_duration_save,0.06444993600234739;
:opt_ort_bert_keras_n_nodes1,228;
:opt_ort_bert_keras_n_nodes2,198;
:opt_ort_bert_n_nodes1,228;
:opt_ort_bert_n_nodes2,198;
:opt_ort_bert_tf_delta_node,-30;
:opt_ort_bert_tf_duration,0.19772093600113294;
:opt_ort_bert_tf_duration_save,0.05476687400005176;
:opt_ort_bert_tf_n_nodes1,228;
:opt_ort_bert_tf_n_nodes2,198;
:opt_ort_clip_delta_node,-30;
:opt_ort_clip_duration,0.23550658599924645;
:opt_ort_clip_duration_save,0.057842898000671994;
:opt_ort_clip_n_nodes1,228;
:opt_ort_clip_n_nodes2,198;
:opt_ort_conformer_delta_node,-30;
:opt_ort_conformer_duration,0.2312637140021252;
:opt_ort_conformer_duration_save,0.050621952999790665;
:opt_ort_conformer_n_nodes1,228;
:opt_ort_conformer_n_nodes2,198;
:opt_ort_gpt2_delta_node,-30;
:opt_ort_gpt2_duration,0.2571823569996923;
:opt_ort_gpt2_duration_save,0.0552203629995347;
:opt_ort_gpt2_n_nodes1,228;
:opt_ort_gpt2_n_nodes2,198;
:opt_ort_gpt2_tf_delta_node,-30;
:opt_ort_gpt2_tf_duration,0.15394551400095224;
:opt_ort_gpt2_tf_duration_save,0.053941004000080284;
:opt_ort_gpt2_tf_n_nodes1,228;
:opt_ort_gpt2_tf_n_nodes2,198;
:opt_ort_gpt_neox_delta_node,-30;
:opt_ort_gpt_neox_duration,0.10887350600023638;
:opt_ort_gpt_neox_duration_save,0.05389379099869984;
:opt_ort_gpt_neox_n_nodes1,228;
:opt_ort_gpt_neox_n_nodes2,198;
:opt_ort_mmdit_delta_node,-22;
:opt_ort_mmdit_duration,0.03512645000228076;
:opt_ort_mmdit_duration_save,0.054776773002231494;
:opt_ort_mmdit_n_nodes1,228;
:opt_ort_mmdit_n_nodes2,206;
:opt_ort_phi_duration,0.00011379900024621747;
:opt_ort_sam2_delta_node,0;
:opt_ort_sam2_duration,0.031599996000295505;
:opt_ort_sam2_duration_save,0.0550883280011476;
:opt_ort_sam2_n_nodes1,228;
:opt_ort_sam2_n_nodes2,228;
:opt_ort_swin_delta_node,-30;
:opt_ort_swin_duration,0.16732090000004973;
:opt_ort_swin_duration_save,0.05281629400269594;
:opt_ort_swin_n_nodes1,228;
:opt_ort_swin_n_nodes2,198;
:opt_ort_t5_delta_node,-30;
:opt_ort_t5_duration,0.2676118029994541;
:opt_ort_t5_duration_save,0.1826407940025092;
:opt_ort_t5_n_nodes1,228;
:opt_ort_t5_n_nodes2,198;
:opt_ort_tnlr_delta_node,-30;
:opt_ort_tnlr_duration,0.10796501000004355;
:opt_ort_tnlr_duration_save,0.05322882600012235;
:opt_ort_tnlr_n_nodes1,228;
:opt_ort_tnlr_n_nodes2,198;
:opt_ort_unet_delta_node,0;
:opt_ort_unet_duration,0.04400993300077971;
:opt_ort_unet_duration_save,0.0556437109989929;
:opt_ort_unet_n_nodes1,228;
:opt_ort_unet_n_nodes2,228;
:opt_ort_vae_delta_node,0;
:opt_ort_vae_duration,0.038967807002336485;
:opt_ort_vae_duration_save,0.053308243997889804;
:opt_ort_vae_n_nodes1,228;
:opt_ort_vae_n_nodes2,228;
:opt_ort_vit_delta_node,-30;
:opt_ort_vit_duration,0.2683015680013341;
:opt_ort_vit_duration_save,0.08969260700177983;
:opt_ort_vit_n_nodes1,228;
:opt_ort_vit_n_nodes2,198;
:time_create,0.11956799000108731;
:time_export_onnx,3.5259304530009103;
:time_export_onnx_opt_ir,0.04180493899912108;
:time_onnx_save,0.10028175900151837;
:time_ortfusion_ortbart,0.40075301400065655;
:time_ortfusion_ortbert,0.28629150800043135;
:time_ortfusion_ortbert_keras,0.352392017000966;
:time_ortfusion_ortbert_tf,0.30486665300122695;
:time_ortfusion_ortclip,0.34893476299839676;
:time_ortfusion_ortconformer,0.3070983799989335;
:time_ortfusion_ortgpt2,0.3660418559993559;
:time_ortfusion_ortgpt2_tf,0.26004131299850997;
:time_ortfusion_ortgpt_neox,0.1955189580003207;
:time_ortfusion_ortmmdit,0.1283201480000571;
:time_ortfusion_ortphi,0.018583577999379486;
:time_ortfusion_ortsam2,0.10133608999967691;
:time_ortfusion_ortswin,0.2537386920012068;
:time_ortfusion_ortt5,0.47997708399998373;
:time_ortfusion_orttnlr,0.18051545100024668;
:time_ortfusion_ortunet,0.14335201600260916;
:time_ortfusion_ortvae,0.10861333000138984;
:time_ortfusion_ortvit,0.4301966810016893;
:time_run,0.019737878999876557;
:time_run_patched,0.005853747999935877;
:time_time_onnx_ort_create,0.18777374400087865;
:time_time_onnx_ort_create_ortbart,0.07551231200341135;
:time_time_onnx_ort_create_ortbert,0.12260419499943964;
:time_time_onnx_ort_create_ortbert_keras,0.09592613199856714;
:time_time_onnx_ort_create_ortbert_tf,0.07475993999833008;
:time_time_onnx_ort_create_ortclip,0.04419616099767154;
:time_time_onnx_ort_create_ortconformer,0.061266297998372465;
:time_time_onnx_ort_create_ortgpt2,0.08176588200149126;
:time_time_onnx_ort_create_ortgpt2_tf,0.06141125900103361;
:time_time_onnx_ort_create_ortgpt_neox,0.09537127799922018;
:time_time_onnx_ort_create_ortmmdit,0.05915626299974974;
:time_time_onnx_ort_create_ortsam2,0.14806341499934206;
:time_time_onnx_ort_create_ortswin,0.040857841002434725;
:time_time_onnx_ort_create_ortt5,0.08266376599931391;
:time_time_onnx_ort_create_orttnlr,0.09833833800075809;
:time_time_onnx_ort_create_ortunet,0.06496018699908745;
:time_time_onnx_ort_create_ortvae,0.11905257900070865;
:time_time_onnx_ort_create_ortvit,0.09996090200002072;
:time_time_onnx_ort_run,0.002378155000769766;
:time_time_onnx_ort_run_ortbart,0.004704349001258379;
:time_time_onnx_ort_run_ortbert,0.005000416000257246;
:time_time_onnx_ort_run_ortbert_keras,0.005339641000318807;
:time_time_onnx_ort_run_ortbert_tf,0.004371618997538462;
:time_time_onnx_ort_run_ortclip,0.0016990139993140474;
:time_time_onnx_ort_run_ortconformer,0.005347147998691071;
:time_time_onnx_ort_run_ortgpt2,0.005582612000580411;
:time_time_onnx_ort_run_ortgpt2_tf,0.0032799399996292777;
:time_time_onnx_ort_run_ortgpt_neox,0.005223287000262644;
:time_time_onnx_ort_run_ortmmdit,0.0027631069970084354;
:time_time_onnx_ort_run_ortsam2,0.0020690250021289103;
:time_time_onnx_ort_run_ortswin,0.0027659929983201437;
:time_time_onnx_ort_run_ortt5,0.004460136999114184;
:time_time_onnx_ort_run_orttnlr,0.006693765997624723;
:time_time_onnx_ort_run_ortunet,0.001990652999666054;
:time_time_onnx_ort_run_ortvae,0.0025483979989076033;
:time_time_onnx_ort_run_ortvit,0.005597908002528129;
:version_date,2025-04-25T15:02:25;
:version_device,;
:version_do_run,True;
:version_drop_inputs,[];
:version_dtype,;
:version_dump_folder,dump_models;
:version_exporter,onnx-dynamo;
:version_model_id,arnir0/Tiny-LLM;
:version_numpy,2.2.5;
:version_onnx,1.19.0;
:version_onnx_diagnostic,0.4.2;
:version_onnxruntime,1.22.0+cu126;
:version_onnxscript,0.3.0.dev20250301;
:version_optimization,ir;
:version_ortbart_hidden_size,192;
:version_ortbart_num_attention_heads,2;
:version_ortbert_hidden_size,192;
:version_ortbert_keras_hidden_size,192;
:version_ortbert_keras_num_attention_heads,2;
:version_ortbert_num_attention_heads,2;
:version_ortbert_tf_hidden_size,192;
:version_ortbert_tf_num_attention_heads,2;
:version_ortclip_hidden_size,192;
:version_ortclip_num_attention_heads,2;
:version_ortconformer_hidden_size,192;
:version_ortconformer_num_attention_heads,2;
:version_ortfusiontype,ALL;
:version_ortgpt2_hidden_size,192;
:version_ortgpt2_num_attention_heads,2;
:version_ortgpt2_tf_hidden_size,192;
:version_ortgpt2_tf_num_attention_heads,2;
:version_ortgpt_neox_hidden_size,192;
:version_ortgpt_neox_num_attention_heads,2;
:version_ortmmdit_hidden_size,192;
:version_ortmmdit_num_attention_heads,2;
:version_ortphi_hidden_size,192;
:version_ortphi_num_attention_heads,2;
:version_ortsam2_hidden_size,192;
:version_ortsam2_num_attention_heads,2;
:version_ortswin_hidden_size,192;
:version_ortswin_num_attention_heads,2;
:version_ortt5_hidden_size,192;
:version_ortt5_num_attention_heads,2;
:version_orttnlr_hidden_size,192;
:version_orttnlr_num_attention_heads,2;
:version_ortunet_hidden_size,192;
:version_ortunet_num_attention_heads,2;
:version_ortvae_hidden_size,192;
:version_ortvae_num_attention_heads,2;
:version_ortvit_hidden_size,192;
:version_ortvit_num_attention_heads,2;
:version_patch,True;
:version_quiet,False;
:version_stop_if_static,0;
:version_torch,2.8.0.dev20250423+cu126;
:version_trained,False;
:version_transformers,4.52.0.dev0;
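
The summary above is printed as one `:key,value;` entry per line, which makes it easy to load into a dictionary for comparison across runs. The snippet below is a minimal sketch, not part of onnx_diagnostic: `parse_summary` is a hypothetical helper, and it assumes every entry starts with `:`, ends with `;`, and that the key runs up to the first comma (values such as `onnx_ort_inputs` contain commas themselves). The sample reuses a few lines from the output above, with the `onnx_ort_inputs` value shortened for readability.

```python
def parse_summary(text):
    """Parse ':key,value;' lines into a dict mapping key -> value (as strings)."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip anything that does not look like a summary entry.
        if not (line.startswith(":") and line.endswith(";")):
            continue
        # Split on the first comma only: values may contain commas.
        key, _, value = line[1:-1].partition(",")
        summary[key] = value
    return summary


# A few entries copied from the summary (onnx_ort_inputs is shortened here).
sample = """\
:onnx_size,295014;
:onnx_size_ortbert,249371;
:opt_ort_bert_n_nodes1,228;
:opt_ort_bert_n_nodes2,198;
:opt_ort_bert_delta_node,-30;
:version_device,;
:onnx_ort_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33);
"""

summary = parse_summary(sample)

# Node counts before and after the onnxruntime 'bert' fusion.
n1 = int(summary["opt_ort_bert_n_nodes1"])
n2 = int(summary["opt_ort_bert_n_nodes2"])
print("nodes removed by fusion:", n1 - n2)

# File size of the fused model relative to the exported model.
ratio = int(summary["onnx_size_ortbert"]) / int(summary["onnx_size"])
print(f"size ratio after fusion: {ratio:.3f}")
```

Values are kept as strings because the summary mixes integers, floats, booleans, and empty fields (for example `version_device`); converting them is left to the caller, who knows which keys are numeric.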