-m onnx_diagnostic validate … validate a model id

The command line is a wrapper around the function onnx_diagnostic.torch_models.validate.validate_model().

Description

The command line validates a model id, usually one available on HuggingFace but not only. It creates dummy inputs, runs the model on them, exports the model, and measures the discrepancies…
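
The same validation can be run from Python by calling the wrapped function directly. A minimal sketch, assuming the keyword names mirror the command-line flags and the return value is a pair of dictionaries (check the function's documentation for the exact signature):

    from onnx_diagnostic.torch_models.validate import validate_model

    # keyword names below mirror --run and -v and are assumptions
    summary, data = validate_model(
        "arnir0/Tiny-LLM",  # model id, as with -m
        do_run=True,        # run the model on dummy inputs, as with --run
        verbose=1,          # as with -v 1
    )
    print(summary)          # same key/value pairs as the "-- summary --" blocks below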

    usage: validate [-h] [-m MID] [-t TASK] [-e EXPORT] [--opt OPT] [-r | --run | --no-run] [-q | --quiet | --no-quiet] [--patch [PATCH ...]] [--rewrite | --no-rewrite]
                    [--stop-if-static STOP_IF_STATIC] [--same-as-trained | --no-same-as-trained] [--trained | --no-trained] [--inputs2 INPUTS2] [--runtime {onnxruntime,torch,ref}]
                    [-o DUMP_FOLDER] [--drop DROP] [--opset OPSET] [--subfolder SUBFOLDER] [--ortfusiontype ORTFUSIONTYPE] [-v VERBOSE] [--dtype DTYPE] [--device DEVICE] [--iop [KEY=VALUE ...]]
                    [--mop [KEY=VALUE ...]] [--repeat REPEAT] [--warmup WARMUP] [--outnames OUTNAMES]
    
    Prints out dummy inputs for a particular task or a model id.
    If both mid and task are empty, the command line displays the list
    of supported tasks.
    
    options:
      -h, --help            show this help message and exit
      -m MID, --mid MID     model id, usually <author>/<name>
      -t TASK, --task TASK  force the task to use
      -e EXPORT, --export EXPORT
                            export the model with this exporter
      --opt OPT             optimization to apply after the export
      -r, --run, --no-run   Runs the model to check it executes.
      -q, --quiet, --no-quiet
                            Catches exceptions and reports them in the summary.
      --patch [PATCH ...]   Applies patches before exporting. It can be a boolean to enable or disable all patches, or be more fine-grained: the patches for torch can be disabled by adding --patch "patch_sympy=False" --patch "patch_torch=False". Default is True.
      --rewrite, --no-rewrite
                            Applies rewrite before exporting.
      --stop-if-static STOP_IF_STATIC
                            Raises an exception if a dynamic dimension becomes static.
      --same-as-trained, --no-same-as-trained
                            Validates a model identical to the trained model but with untrained weights.
      --trained, --no-trained
                            Validates the trained model (requires downloading).
      --inputs2 INPUTS2     Validates the model on a second set of inputs
                            to check the exported model supports dynamism. The value is used as an increment to the first set of inputs. A high value may trigger a different behavior in the model, one the exporter may have missed.
      --runtime {onnxruntime,torch,ref}
                            onnx runtime to use, `onnxruntime` by default
      -o DUMP_FOLDER, --dump-folder DUMP_FOLDER
                            A folder is created to dump statistics,
                            the exported program, the onnx model...
      --drop DROP           Drops the given input names; it should be a list
                            with comma-separated values, example:
                            --drop position_ids
      --opset OPSET         onnx opset to use, 18 by default
      --subfolder SUBFOLDER
                            Subfolder where the model and its configuration are located.
      --ortfusiontype ORTFUSIONTYPE
                            Applies onnxruntime fusions; this parameter should contain the
                            model type or multiple values separated by `|`. `ALL` can be used
                            to run them all.
      -v VERBOSE, --verbose VERBOSE
                            verbosity
      --dtype DTYPE         Changes dtype if necessary.
      --device DEVICE       Changes the device if necessary.
      --iop [KEY=VALUE ...]
                            Additional input options, used to change the default inputs used to export, example:
                              --iop cls_cache=SlidingWindowCache
                              --iop cls_cache=StaticCache
      --mop [KEY=VALUE ...]
                            Additional model options, used to change some parameters of the model, example:
                              --mop attn_implementation=sdpa --mop attn_implementation=eager
                              --mop "rope_scaling={'rope_type': 'dynamic', 'factor': 10.0}"
      --repeat REPEAT       number of times to run the model to measure inference time
      --warmup WARMUP       number of times to run the model to do warmup
      --outnames OUTNAMES   This comma-separated list defines the output names the onnx exporter should use.
    
    If the model id is specified, an untrained version of it is instantiated.
    Examples:
    
    python -m onnx_diagnostic validate -m microsoft/Phi-4-mini-reasoning \
        --run -v 1 -o dump_test --no-quiet --repeat 2 --warmup 2 \
        --dtype float16 --device cuda --patch --export onnx-dynamo --opt ir
    
    python -m onnx_diagnostic validate -m microsoft/Phi-4-mini-reasoning \
        --run -v 1 -o dump_test --no-quiet --repeat 2 --warmup 2 \
        --dtype float16 --device cuda --patch --export custom --opt default
    
    python -m onnx_diagnostic validate -m microsoft/Phi-4-mini-reasoning \
        --run -v 1 -o dump_test --no-quiet --repeat 2 --warmup 2 \
        --dtype float16 --device cuda --export modelbuilder
    
    position_ids is usually not needed; it can be removed by adding:
    
    --drop position_ids
    
    The behaviour may be modified compared to the original configuration;
    the following argument sets rope_scaling to dynamic:
    
    --mop "rope_scaling={'rope_type': 'dynamic', 'factor': 10.0}"

Get the list of supported tasks

The tasks are the same as those defined by HuggingFace. The tool only supports a subset of them.

python -m onnx_diagnostic validate
    -- list of supported tasks:
    MoE
    automatic-speech-recognition
    feature-extraction
    fill-mask
    image-classification
    image-text-to-text
    mask-generation
    object-detection
    sentence-similarity
    summarization
    text-classification
    text-generation
    text-to-image
    text2text-generation
    zero-shot-image-classification

Get the default inputs for a specific task

This returns the dummy inputs for a specific task. There may be more inputs than necessary; only those the forward method defines are kept.
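
The same output can be captured programmatically by invoking the command line, for instance with subprocess (a small sketch using only the documented CLI):

    import subprocess
    import sys

    proc = subprocess.run(
        [sys.executable, "-m", "onnx_diagnostic", "validate", "-t", "text-generation"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(proc.stdout)  # prints the inputs and dynamic shapes shown below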

python -m onnx_diagnostic validate -t text-generation
    -- inputs
      + input_ids       : T7s2x3
      + attention_mask  : T7s2x33
      + position_ids    : T7s2x3
      + past_key_values : DynamicCache(key_cache=#4[T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16], value_cache=#4[T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16,T1s2x24x30x16])
    -- dynamic_shapes
      + input_ids       : {0:Dim(batch),1:DYN(seq_length)}
      + attention_mask  : {0:Dim(batch),1:DYN(cache+seq)}
      + position_ids    : {0:Dim(batch),1:DYN(cache+seq)}
      + past_key_values : #2[#4[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}],#4[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}]]

Validate dummy inputs for a model

The dummy inputs may not work for this model and this task. The following command line checks that. There is no point exporting if this fails.
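
In Python, this roughly amounts to instantiating the untrained model with its dummy inputs and calling it, as the logs below show. A minimal sketch, assuming get_untrained_model_with_inputs (the function appearing in the logs) is importable from onnx_diagnostic.torch_models.hghub and returns a dictionary holding the model and its inputs (both assumptions):

    from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

    data = get_untrained_model_with_inputs("arnir0/Tiny-LLM")
    model, inputs = data["model"], data["inputs"]  # assumed keys
    expected = model(**inputs)  # fails here if the dummy inputs do not fit the model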

python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1
    [validate_model] validate model id 'arnir0/Tiny-LLM'
    [validate_model] patch=True
    [validate_model] get dummy inputs with input_options=None...
    [validate_model] rewrite=True, patch_kwargs={'patch_transformers': True, 'patch_diffusers': True, 'patch': True}, stop_if_static=0
    [validate_model] exporter=None, optimization=None
    [validate_model] dump_folder=None
    [validate_model] output_names=None
    [get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] architectures=['LlamaForCausalLM']
    [get_untrained_model_with_inputs] cls='LlamaConfig'
    [get_untrained_model_with_inputs] task='text-generation'
    [get_untrained_model_with_inputs] default config._attn_implementation=None
    [get_untrained_model_with_inputs] use fct=<function get_inputs at 0x726e5d185b20>
    [validate_model] --
    [validate_model] task=text-generation
    [validate_model] size=49.549072265625 Mb
    [validate_model] n_weights=12.988992 millions parameters
    [validate_model] +INPUT input_ids=T7s2x3
    [validate_model] +INPUT attention_mask=T7s2x33
    [validate_model] +INPUT position_ids=T7s2x3
    [validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
    [validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
    [validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
    [validate_model] --
    [validate_model] -- run the model inputs='inputs'...
    [validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done ([run])
    [validate_model] -- run the model inputs='inputs2'...
    [validate_model] inputs2=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_model] done ([run2])
    [validate_model] -- done (final)
    
    -- summary --
    :model_class,LlamaForCausalLM;
    :model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'torchscript':False,'dtype':'float32','pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'task_specific_params':None,'problem_type':None,'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'_name_or_path':'','transformers_version':'4.56.0.dev0','model_type':'llama','tf_legacy_loss':False,'use_bfloat16':False,'subfolder':None,'output_attentions':False};
    :model_config_class,LlamaConfig;
    :model_file,/home/xadupre/github/transformers/src/transformers/models/llama/modeling_llama.py;
    :model_id,arnir0/Tiny-LLM;
    :model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :model_inputs_options,;
    :model_module,transformers.models.llama.modeling_llama;
    :model_nweights,12988992;
    :model_shapes,dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]);
    :model_size,51955968;
    :model_subfolder,;
    :model_task,text-generation;
    :run_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
    :run_expected2,CausalLMOutputWithPast(logits:T1s3x4x32000,past_key_values:DynamicCache(key_cache=#1[T1s3x1x35x96], value_cache=#1[T1s3x1x35x96]));
    :time_create,0.16615501700107416;
    :time_run,0.04875091000030807;
    :time_run2,0.026418173998536076;
    :version_date,2025-08-27T17:08:23;
    :version_device,;
    :version_do_run,True;
    :version_drop_inputs,[];
    :version_dtype,;
    :version_dump_folder,;
    :version_exporter,;
    :version_inputs2,1;
    :version_model_id,arnir0/Tiny-LLM;
    :version_numpy,2.3.2;
    :version_onnx,1.20.0;
    :version_onnx_diagnostic,0.7.7;
    :version_onnx_ir,0.1.8;
    :version_onnxruntime,1.23.0;
    :version_onnxscript,0.3.0.dev20250301;
    :version_opset,18;
    :version_optimization,;
    :version_ortfusiontype,;
    :version_patch,True;
    :version_patch_kwargs,{'patch_transformers':True,'patch_diffusers':True,'patch':True};
    :version_quiet,False;
    :version_rewrite,True;
    :version_runtime,onnxruntime;
    :version_same_as_pretrained,False;
    :version_scipy,1.16.1;
    :version_stop_if_static,0;
    :version_torch,2.9.0.dev20250820+cu126;
    :version_transformers,4.56.0.dev0;
    :version_use_pretrained,False;

Validate and export a model

Exports a model given the task and checks for discrepancies as well. The latency given is just for one run: it tells how long the benchmark takes, but it is far from the latency measure obtained by running the same model multiple times.
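
The --warmup and --repeat options behave like a classic timing loop: warmup runs are discarded, timed runs are averaged. A generic sketch of that logic (not the tool's internal code):

    import time

    def measure(fn, warmup=2, repeat=2):
        for _ in range(warmup):      # warmup runs, not measured
            fn()
        times = []
        for _ in range(repeat):      # measured runs
            begin = time.perf_counter()
            fn()
            times.append(time.perf_counter() - begin)
        return sum(times) / len(times)  # average latency per run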

python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1 --export export-nostrict -o dump_models --patch
    [validate_model] dump into 'arnir0_Tiny-LLM-export-nostrict'
    [validate_model] validate model id 'arnir0/Tiny-LLM'
    [validate_model] patch=True
    [validate_model] get dummy inputs with input_options=None...
    [validate_model] rewrite=True, patch_kwargs={'patch_transformers': True, 'patch_diffusers': True, 'patch': True}, stop_if_static=0
    [validate_model] exporter='export-nostrict', optimization=None
    [validate_model] dump_folder='dump_models/arnir0_Tiny-LLM-export-nostrict'
    [validate_model] output_names=None
    [get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] architectures=['LlamaForCausalLM']
    [get_untrained_model_with_inputs] cls='LlamaConfig'
    [get_untrained_model_with_inputs] task='text-generation'
    [get_untrained_model_with_inputs] default config._attn_implementation=None
    [get_untrained_model_with_inputs] use fct=<function get_inputs at 0x726e5d185b20>
    [validate_model] --
    [validate_model] task=text-generation
    [validate_model] size=49.549072265625 Mb
    [validate_model] n_weights=12.988992 millions parameters
    [validate_model] +INPUT input_ids=T7s2x3
    [validate_model] +INPUT attention_mask=T7s2x33
    [validate_model] +INPUT position_ids=T7s2x3
    [validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
    [validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
    [validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
    [validate_model] --
    [validate_model] -- run the model inputs='inputs'...
    [validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done ([run])
    [validate_model] -- run the model inputs='inputs2'...
    [validate_model] inputs2=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_model] done ([run2])
    [validate_model] -- export the model with 'export-nostrict', optimization=None
    [validate_model] applies patches before exporting stop_if_static=0
    [validate_model] run patched model...
    [validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done (patched run)
    [validate_model] patched discrepancies=abs=0, rel=0
    [call_torch_export_export] exporter='export-nostrict', strict=False, optimization=None
    [call_torch_export_export] args=()
    [call_torch_export_export] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [call_torch_export_export] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]])
    [call_torch_export_export] dynamic_shapes_export_export=dict(input_ids:{0:Dim(batch),1:DYNAMIC},attention_mask:{0:Dim(batch),1:DYNAMIC},position_ids:{0:Dim(batch),1:DYNAMIC},past_key_values:#2[#1[{0:Dim(batch),2:DYNAMIC}],#1[{0:Dim(batch),2:DYNAMIC}]])
    [call_torch_export_export] export...
    [call_torch_export_export] done (export) with 140 nodes
    [validate_model] run exported model...
    [validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done (exported run)
    [validate_model] exported discrepancies=abs=0, rel=0
    [validate_model] -- dumps exported program in 'dump_models/arnir0_Tiny-LLM-export-nostrict'...
    [validate_model] done (dump ep)
    [validate_model] dumps statistics in 'dump_models/arnir0_Tiny-LLM-export-nostrict'...
    [validate_model] done (dump)
    [validate_model] -- done (final)
    
    -- summary --
    :disc_exported_abs,0;
    :disc_exported_dnan,0;
    :disc_exported_n,204672.0;
    :disc_exported_rel,0;
    :disc_exported_sum,0.0;
    :disc_patched_abs,0;
    :disc_patched_dnan,0;
    :disc_patched_n,204672.0;
    :disc_patched_rel,0;
    :disc_patched_sum,0.0;
    :dump_folder,dump_models/arnir0_Tiny-LLM-export-nostrict;
    :dump_folder_name,arnir0_Tiny-LLM-export-nostrict;
    :export_args,();
    :export_dynamic_shapes,dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]);
    :export_dynamic_shapes_export_export,dict(input_ids:{0:Dim(batch),1:DYNAMIC},attention_mask:{0:Dim(batch),1:DYNAMIC},position_ids:{0:Dim(batch),1:DYNAMIC},past_key_values:#2[#1[{0:Dim(batch),2:DYNAMIC}],#1[{0:Dim(batch),2:DYNAMIC}]]);
    :export_exporter,export-nostrict;
    :export_graph_nodes,140;
    :export_kwargs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :export_optimization,;
    :export_strict,False;
    :model_class,LlamaForCausalLM;
    :model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'torchscript':False,'dtype':'float32','pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'task_specific_params':None,'problem_type':None,'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'_name_or_path':'','transformers_version':'4.56.0.dev0','model_type':'llama','tf_legacy_loss':False,'use_bfloat16':False,'subfolder':None,'output_attentions':False};
    :model_config_class,LlamaConfig;
    :model_file,/home/xadupre/github/transformers/src/transformers/models/llama/modeling_llama.py;
    :model_id,arnir0/Tiny-LLM;
    :model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :model_inputs_options,;
    :model_module,transformers.models.llama.modeling_llama;
    :model_nweights,12988992;
    :model_shapes,dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]);
    :model_size,51955968;
    :model_subfolder,;
    :model_task,text-generation;
    :run_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
    :run_expected2,CausalLMOutputWithPast(logits:T1s3x4x32000,past_key_values:DynamicCache(key_cache=#1[T1s3x1x35x96], value_cache=#1[T1s3x1x35x96]));
    :time_create,0.20015871599935053;
    :time_export_export,1.790458003999447;
    :time_run,0.009992467999836663;
    :time_run2,0.009186837000015657;
    :time_run_exported,0.014633524000601028;
    :time_run_patched,0.003768320000745007;
    :version_date,2025-08-27T17:08:23;
    :version_device,;
    :version_do_run,True;
    :version_drop_inputs,[];
    :version_dtype,;
    :version_dump_folder,dump_models;
    :version_exporter,export-nostrict;
    :version_inputs2,1;
    :version_model_id,arnir0/Tiny-LLM;
    :version_numpy,2.3.2;
    :version_onnx,1.20.0;
    :version_onnx_diagnostic,0.7.7;
    :version_onnx_ir,0.1.8;
    :version_onnxruntime,1.23.0;
    :version_onnxscript,0.3.0.dev20250301;
    :version_opset,18;
    :version_optimization,;
    :version_ortfusiontype,;
    :version_patch,True;
    :version_patch_kwargs,{'patch_transformers':True,'patch_diffusers':True,'patch':True};
    :version_quiet,False;
    :version_rewrite,True;
    :version_runtime,onnxruntime;
    :version_same_as_pretrained,False;
    :version_scipy,1.16.1;
    :version_stop_if_static,0;
    :version_torch,2.9.0.dev20250820+cu126;
    :version_transformers,4.56.0.dev0;
    :version_use_pretrained,False;

Validate ONNX discrepancies

Let’s export to ONNX this time and check for discrepancies.
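
The disc_* fields reported in the summary compare the torch outputs with the onnxruntime outputs. A sketch of how such absolute and relative discrepancies can be computed (the tool's exact formula may differ):

    import numpy as np

    def discrepancies(expected: np.ndarray, got: np.ndarray) -> dict:
        """Max absolute/relative error between two arrays (a sketch)."""
        expected = expected.astype(np.float64)
        got = got.astype(np.float64)
        diff = np.abs(expected - got)
        return {
            "abs": float(diff.max()),                                # max absolute error
            "rel": float((diff / (np.abs(expected) + 1e-6)).max()),  # max relative error
            "n": float(expected.size),                               # number of compared values
        }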

python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1 --export onnx-dynamo -o dump_models --patch --opt ir
    [validate_model] dump into 'arnir0_Tiny-LLM-onnx-dynamo-ir'
    [validate_model] validate model id 'arnir0/Tiny-LLM'
    [validate_model] patch=True
    [validate_model] get dummy inputs with input_options=None...
    [validate_model] rewrite=True, patch_kwargs={'patch_transformers': True, 'patch_diffusers': True, 'patch': True}, stop_if_static=0
    [validate_model] exporter='onnx-dynamo', optimization='ir'
    [validate_model] dump_folder='dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'
    [validate_model] output_names=None
    [get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] architectures=['LlamaForCausalLM']
    [get_untrained_model_with_inputs] cls='LlamaConfig'
    [get_untrained_model_with_inputs] task='text-generation'
    [get_untrained_model_with_inputs] default config._attn_implementation=None
    [get_untrained_model_with_inputs] use fct=<function get_inputs at 0x726e5d185b20>
    [validate_model] --
    [validate_model] task=text-generation
    [validate_model] size=49.549072265625 Mb
    [validate_model] n_weights=12.988992 millions parameters
    [validate_model] +INPUT input_ids=T7s2x3
    [validate_model] +INPUT attention_mask=T7s2x33
    [validate_model] +INPUT position_ids=T7s2x3
    [validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
    [validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
    [validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
    [validate_model] --
    [validate_model] -- run the model inputs='inputs'...
    [validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done ([run])
    [validate_model] -- run the model inputs='inputs2'...
    [validate_model] inputs2=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_model] done ([run2])
    [validate_model] -- export the model with 'onnx-dynamo', optimization='ir'
    [validate_model] applies patches before exporting stop_if_static=0
    [validate_model] run patched model...
    [validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done (patched run)
    [validate_model] patched discrepancies=abs=0, rel=0
    [call_torch_export_onnx] exporter='onnx-dynamo', optimization='ir'
    [call_torch_export_onnx] args=()
    [call_torch_export_onnx] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [call_torch_export_onnx] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]])
    [call_torch_export_onnx] export...
    [call_torch_export_onnx] export_export_kwargs=dict(dynamo:bool,dynamic_shapes:dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]),opset_version:int)
    [torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`...
    [_catch_produce_guards_and_solve_constraints] ERROR: produce_guards_and_solve_constraints failed, use SKIP_SOLVE_CONSTRAINTS=0 to avoid skipping
    fake_mode=<torch._subclasses.fake_tensor.FakeTensorMode object at 0x726e2f288dd0>
    dynamic_shapes={'input_ids': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'attention_mask': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'position_ids': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'past_key_values': [[{0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}], [{0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}]]}
    equalities_inputs=EqualityConstraint(warn_only=False, source_pairs=[(TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0))], derived_equalities=[], phantom_symbols=[], relaxed_sources={TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2), TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2)}, _parents={TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), 
TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)}, _defs={})
    original_signature=(input_ids: Optional[torch.LongTensor] = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_values: Optional[transformers.cache_utils.Cache] = None, inputs_embeds: Optional[torch.FloatTensor] = None, labels: Optional[torch.LongTensor] = None, use_cache: Optional[bool] = None, cache_position: Optional[torch.LongTensor] = None, logits_to_keep: Union[int, torch.Tensor] = 0, **kwargs: Unpack[transformers.utils.generic.TransformersKwargs]) -> transformers.modeling_outputs.CausalLMOutputWithPast
    kwargs={}
    exc=Constraints violated (batch)! For more information, run with TORCH_LOGS="+dynamic".
      - Not all values of batch = L['input_ids'].size()[0] in the specified range batch <= 1024 satisfy the generated guard L['input_ids'].size()[0] != 1.
    gm=<lambda>()
    
    
    
    def forward(self, arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1):
        sym_size_int = torch.ops.aten.sym_size.int(arg16_1, 0)
        empty = torch.ops.aten.empty.memory_format([sym_size_int, 1, 0, 96], dtype = torch.float32, device = device(type='cpu'), pin_memory = False)
        empty_1 = torch.ops.aten.empty.memory_format([sym_size_int, 1, 0, 96], dtype = torch.float32, device = device(type='cpu'), pin_memory = False)
        cat = torch.ops.aten.cat.default([empty, arg16_1], -2);  empty = arg16_1 = None
        cat_1 = torch.ops.aten.cat.default([empty_1, arg17_1], -2);  empty_1 = arg17_1 = None
        embedding = torch.ops.aten.embedding.default(arg0_1, arg13_1);  arg0_1 = None
        sym_size_int_1 = torch.ops.aten.sym_size.int(cat, 2)
        sym_size_int_2 = torch.ops.aten.sym_size.int(arg13_1, 1)
        add = sym_size_int_1 + sym_size_int_2
        arange = torch.ops.aten.arange.start(sym_size_int_1, add, device = device(type='cpu'), pin_memory = False);  add = None
        to = torch.ops.aten.to.device(arg14_1, device(type='cpu'), torch.bool);  arg14_1 = None
        sym_size_int_3 = torch.ops.aten.sym_size.int(arange, 0)
        add_1 = sym_size_int_3 + sym_size_int_1;  sym_size_int_1 = None
        arange_1 = torch.ops.aten.arange.default(add_1, device = device(type='cpu'), pin_memory = False);  add_1 = None
        add_ = torch.ops.aten.add_.Tensor(arange_1, 0)
        sym_size_int_5 = torch.ops.aten.sym_size.int(arg13_1, 0);  arg13_1 = None
        arange_2 = torch.ops.aten.arange.default(sym_size_int_5, device = device(type='cpu'), pin_memory = False)
        arange_3 = torch.ops.aten.arange.default(1, device = device(type='cpu'), pin_memory = False)
        sym_size_int_6 = torch.ops.aten.sym_size.int(arange_2, 0)
        sym_size_int_7 = torch.ops.aten.sym_size.int(arange_1, 0);  arange_1 = None
        reshape = torch.ops.aten.reshape.default(arange_2, [-1, 1, 1, 1]);  arange_2 = None
        reshape_1 = torch.ops.aten.reshape.default(arange_3, [1, -1, 1, 1]);  arange_3 = None
        reshape_2 = torch.ops.aten.reshape.default(arange, [1, 1, -1, 1]);  arange = None
        reshape_3 = torch.ops.aten.reshape.default(add_, [1, 1, 1, -1]);  add_ = None
        expand = torch.ops.aten.expand.default(reshape, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape = None
        expand_1 = torch.ops.aten.expand.default(reshape_1, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape_1 = expand_1 = None
        expand_2 = torch.ops.aten.expand.default(reshape_2, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape_2 = None
        expand_3 = torch.ops.aten.expand.default(reshape_3, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape_3 = sym_size_int_6 = sym_size_int_3 = sym_size_int_7 = None
        new_ones = torch.ops.aten.new_ones.default(expand_2, [], dtype = torch.bool, pin_memory = False)
        le = torch.ops.aten.le.Tensor(expand_3, expand_2);  expand_2 = None
        to_1 = torch.ops.aten.to.dtype_layout(le, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'));  le = None
        and_1 = torch.ops.aten.__and__.Tensor(new_ones, to_1);  new_ones = to_1 = None
        index = torch.ops.aten.index.Tensor(to, [expand, expand_3]);  to = expand = expand_3 = None
        to_2 = torch.ops.aten.to.dtype_layout(index, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'));  index = None
        and_2 = torch.ops.aten.__and__.Tensor(and_1, to_2);  and_1 = to_2 = None
        _set_grad_enabled = torch._C._set_grad_enabled(False);  _set_grad_enabled = None
        unsqueeze = torch.ops.aten.unsqueeze.default(arg12_1, 0);  arg12_1 = None
        unsqueeze_1 = torch.ops.aten.unsqueeze.default(unsqueeze, 2);  unsqueeze = None
        to_3 = torch.ops.aten.to.dtype(unsqueeze_1, torch.float32);  unsqueeze_1 = None
        sym_size_int_8 = torch.ops.aten.sym_size.int(arg15_1, 0)
        expand_4 = torch.ops.aten.expand.default(to_3, [sym_size_int_8, -1, 1]);  to_3 = sym_size_int_8 = None
        to_4 = torch.ops.aten.to.dtype_layout(expand_4, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'));  expand_4 = None
        unsqueeze_2 = torch.ops.aten.unsqueeze.default(arg15_1, 1);  arg15_1 = None
        slice_1 = torch.ops.aten.slice.Tensor(unsqueeze_2, 2, 0, 9223372036854775807);  unsqueeze_2 = None
        to_5 = torch.ops.aten.to.dtype(slice_1, torch.float32);  slice_1 = None
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cpu', torch.bfloat16, False, False)
        to_6 = torch.ops.aten.to.dtype(to_4, torch.float32);  to_4 = None
        to_7 = torch.ops.aten.to.dtype(to_5, torch.float32);  to_5 = None
        matmul = torch.ops.aten.matmul.default(to_6, to_7);  to_6 = to_7 = None
        transpose = torch.ops.aten.transpose.int(matmul, 1, 2);  matmul = None
        cat_2 = torch.ops.aten.cat.default([transpose, transpose], -1);  transpose = None
        cos = torch.ops.aten.cos.default(cat_2)
        mul = torch.ops.aten.mul.Tensor(cos, 1.0);  cos = None
        sin = torch.ops.aten.sin.default(cat_2);  cat_2 = None
        mul_1 = torch.ops.aten.mul.Tensor(sin, 1.0);  sin = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = _exit_autocast = None
        to_8 = torch.ops.aten.to.dtype(mul, torch.float32);  mul = None
        to_9 = torch.ops.aten.to.dtype(mul_1, torch.float32);  mul_1 = None
        _set_grad_enabled_1 = torch._C._set_grad_enabled(True);  _set_grad_enabled_1 = None
        to_10 = torch.ops.aten.to.dtype(embedding, torch.float32);  embedding = None
        pow_1 = torch.ops.aten.pow.Tensor_Scalar(to_10, 2)
        mean = torch.ops.aten.mean.dim(pow_1, [-1], True);  pow_1 = None
        add_3 = torch.ops.aten.add.Tensor(mean, 1e-05);  mean = None
        rsqrt = torch.ops.aten.rsqrt.default(add_3);  add_3 = None
        mul_2 = torch.ops.aten.mul.Tensor(to_10, rsqrt);  rsqrt = None
        to_11 = torch.ops.aten.to.dtype(mul_2, torch.float32);  mul_2 = None
        mul_3 = torch.ops.aten.mul.Tensor(arg8_1, to_11);  arg8_1 = to_11 = None
        linear = torch.ops.aten.linear.default(mul_3, arg1_1);  arg1_1 = None
        view = torch.ops.aten.view.default(linear, [sym_size_int_5, sym_size_int_2, -1, 96]);  linear = None
        transpose_1 = torch.ops.aten.transpose.int(view, 1, 2);  view = None
        linear_1 = torch.ops.aten.linear.default(mul_3, arg2_1);  arg2_1 = None
        view_1 = torch.ops.aten.view.default(linear_1, [sym_size_int_5, sym_size_int_2, -1, 96]);  linear_1 = None
        transpose_2 = torch.ops.aten.transpose.int(view_1, 1, 2);  view_1 = None
        linear_2 = torch.ops.aten.linear.default(mul_3, arg3_1);  mul_3 = arg3_1 = None
        view_2 = torch.ops.aten.view.default(linear_2, [sym_size_int_5, sym_size_int_2, -1, 96]);  linear_2 = None
        transpose_3 = torch.ops.aten.transpose.int(view_2, 1, 2);  view_2 = None
        unsqueeze_3 = torch.ops.aten.unsqueeze.default(to_8, 1);  to_8 = None
        unsqueeze_4 = torch.ops.aten.unsqueeze.default(to_9, 1);  to_9 = None
        mul_4 = torch.ops.aten.mul.Tensor(transpose_1, unsqueeze_3)
        slice_2 = torch.ops.aten.slice.Tensor(transpose_1, 3, 0, 48)
        slice_3 = torch.ops.aten.slice.Tensor(transpose_1, 3, 48, 9223372036854775807);  transpose_1 = None
        neg = torch.ops.aten.neg.default(slice_3);  slice_3 = None
        cat_3 = torch.ops.aten.cat.default([neg, slice_2], -1);  neg = slice_2 = None
        mul_5 = torch.ops.aten.mul.Tensor(cat_3, unsqueeze_4);  cat_3 = None
        add_4 = torch.ops.aten.add.Tensor(mul_4, mul_5);  mul_4 = mul_5 = None
        mul_6 = torch.ops.aten.mul.Tensor(transpose_2, unsqueeze_3);  unsqueeze_3 = None
        slice_4 = torch.ops.aten.slice.Tensor(transpose_2, 3, 0, 48)
        slice_5 = torch.ops.aten.slice.Tensor(transpose_2, 3, 48, 9223372036854775807);  transpose_2 = None
        neg_1 = torch.ops.aten.neg.default(slice_5);  slice_5 = None
        cat_4 = torch.ops.aten.cat.default([neg_1, slice_4], -1);  neg_1 = slice_4 = None
        mul_7 = torch.ops.aten.mul.Tensor(cat_4, unsqueeze_4);  cat_4 = unsqueeze_4 = None
        add_5 = torch.ops.aten.add.Tensor(mul_6, mul_7);  mul_6 = mul_7 = None
        cat_5 = torch.ops.aten.cat.default([cat, add_5], -2);  cat = add_5 = None
        cat_6 = torch.ops.aten.cat.default([cat_1, transpose_3], -2);  cat_1 = transpose_3 = None
        unsqueeze_5 = torch.ops.aten.unsqueeze.default(cat_5, 2)
        sym_size_int_10 = torch.ops.aten.sym_size.int(cat_5, 2)
        slice_6 = torch.ops.aten.slice.Tensor(unsqueeze_5, 3, 0, 9223372036854775807);  unsqueeze_5 = None
        expand_5 = torch.ops.aten.expand.default(slice_6, [sym_size_int, 1, 2, sym_size_int_10, 96]);  slice_6 = None
        reshape_4 = torch.ops.aten.reshape.default(expand_5, [sym_size_int, 2, sym_size_int_10, 96]);  expand_5 = None
        unsqueeze_6 = torch.ops.aten.unsqueeze.default(cat_6, 2)
        sym_size_int_11 = torch.ops.aten.sym_size.int(cat_6, 2)
        slice_7 = torch.ops.aten.slice.Tensor(unsqueeze_6, 3, 0, 9223372036854775807);  unsqueeze_6 = None
        expand_6 = torch.ops.aten.expand.default(slice_7, [sym_size_int, 1, 2, sym_size_int_11, 96]);  slice_7 = None
        reshape_5 = torch.ops.aten.reshape.default(expand_6, [sym_size_int, 2, sym_size_int_11, 96]);  expand_6 = sym_size_int = sym_size_int_11 = None
        slice_8 = torch.ops.aten.slice.Tensor(and_2, 3, None, sym_size_int_10);  and_2 = sym_size_int_10 = None
        scaled_dot_product_attention = torch.ops.aten.scaled_dot_product_attention.default(add_4, reshape_4, reshape_5, slice_8, scale = 0.10206207261596575);  add_4 = reshape_4 = reshape_5 = slice_8 = None
        transpose_4 = torch.ops.aten.transpose.int(scaled_dot_product_attention, 1, 2);  scaled_dot_product_attention = None
        reshape_6 = torch.ops.aten.reshape.default(transpose_4, [sym_size_int_5, sym_size_int_2, -1]);  transpose_4 = sym_size_int_5 = sym_size_int_2 = None
        linear_3 = torch.ops.aten.linear.default(reshape_6, arg4_1);  reshape_6 = arg4_1 = None
        add_6 = torch.ops.aten.add.Tensor(to_10, linear_3);  to_10 = linear_3 = None
        to_12 = torch.ops.aten.to.dtype(add_6, torch.float32);  add_6 = None
        pow_2 = torch.ops.aten.pow.Tensor_Scalar(to_12, 2)
        mean_1 = torch.ops.aten.mean.dim(pow_2, [-1], True);  pow_2 = None
        add_7 = torch.ops.aten.add.Tensor(mean_1, 1e-05);  mean_1 = None
        rsqrt_1 = torch.ops.aten.rsqrt.default(add_7);  add_7 = None
        mul_16 = torch.ops.aten.mul.Tensor(to_12, rsqrt_1);  rsqrt_1 = None
        to_13 = torch.ops.aten.to.dtype(mul_16, torch.float32);  mul_16 = None
        mul_17 = torch.ops.aten.mul.Tensor(arg9_1, to_13);  arg9_1 = to_13 = None
        linear_4 = torch.ops.aten.linear.default(mul_17, arg5_1);  arg5_1 = None
        silu = torch.ops.aten.silu.default(linear_4);  linear_4 = None
        linear_5 = torch.ops.aten.linear.default(mul_17, arg6_1);  mul_17 = arg6_1 = None
        mul_18 = torch.ops.aten.mul.Tensor(silu, linear_5);  silu = linear_5 = None
        linear_6 = torch.ops.aten.linear.default(mul_18, arg7_1);  mul_18 = arg7_1 = None
        add_8 = torch.ops.aten.add.Tensor(to_12, linear_6);  to_12 = linear_6 = None
        to_14 = torch.ops.aten.to.dtype(add_8, torch.float32);  add_8 = None
        pow_3 = torch.ops.aten.pow.Tensor_Scalar(to_14, 2)
        mean_2 = torch.ops.aten.mean.dim(pow_3, [-1], True);  pow_3 = None
        add_9 = torch.ops.aten.add.Tensor(mean_2, 1e-05);  mean_2 = None
        rsqrt_2 = torch.ops.aten.rsqrt.default(add_9);  add_9 = None
        mul_19 = torch.ops.aten.mul.Tensor(to_14, rsqrt_2);  to_14 = rsqrt_2 = None
        to_15 = torch.ops.aten.to.dtype(mul_19, torch.float32);  mul_19 = None
        mul_20 = torch.ops.aten.mul.Tensor(arg10_1, to_15);  arg10_1 = to_15 = None
        slice_9 = torch.ops.aten.slice.Tensor(mul_20, 1, 0, 9223372036854775807);  mul_20 = None
        linear_7 = torch.ops.aten.linear.default(slice_9, arg11_1);  slice_9 = arg11_1 = None
        return (linear_7, cat_5, cat_6)
        
    # To see more debug info, please use `graph_module.print_readable()`
    [torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`... ✅
    [torch.onnx] Run decomposition...
    [torch.onnx] Run decomposition... ✅
    [torch.onnx] Translate the graph into ONNX...
    [torch.onnx] Translate the graph into ONNX... ✅
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: cache+seq will not be used, since it shares the same shape constraints with another axis: seq_length.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    Applied 35 of general pattern rewrite rules.
    [call_torch_export_onnx] done (export)
    [call_torch_export_onnx] starts optimization='ir'...
    [call_torch_export_onnx] done (optimization)
    [validate_model] dumps onnx program in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
    [validate_model] done (dump onnx) in 0.15125025599991204
    [validate_model] dumps statistics in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
    [validate_model] done (dump)
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour=None
    [validate_onnx_model] done (ort_session) flavour=None
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.046627044677734e-07, rel=0.00029397114747102115, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0003824267790890117, n=404160.0
    [validate_model] -- done (final)
    
    -- summary --
    :disc_onnx_ort_run2_abs,8.344650268554688e-07;
    :disc_onnx_ort_run2_dnan,0;
    :disc_onnx_ort_run2_n,404160.0;
    :disc_onnx_ort_run2_rel,0.0003824267790890117;
    :disc_onnx_ort_run2_sum,0.03882712602421634;
    :disc_onnx_ort_run_abs,8.046627044677734e-07;
    :disc_onnx_ort_run_dnan,0;
    :disc_onnx_ort_run_n,204672.0;
    :disc_onnx_ort_run_rel,0.00029397114747102115;
    :disc_onnx_ort_run_sum,0.019302217995800675;
    :disc_patched_abs,0;
    :disc_patched_dnan,0;
    :disc_patched_n,204672.0;
    :disc_patched_rel,0;
    :disc_patched_sum,0.0;
    :dump_folder,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir;
    :dump_folder_name,arnir0_Tiny-LLM-onnx-dynamo-ir;
    :export_args,();
    :export_dynamo,True;
    :export_exporter,onnx-dynamo;
    :export_kwargs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :export_opset,18;
    :export_optimization,ir;
    :model_class,LlamaForCausalLM;
    :model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'torchscript':False,'dtype':'float32','pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'task_specific_params':None,'problem_type':None,'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'_name_or_path':'','transformers_version':'4.56.0.dev0','model_type':'llama','tf_legacy_loss':False,'use_bfloat16':False,'subfolder':None,'output_attentions':False};
    :model_config_class,LlamaConfig;
    :model_file,~/github/transformers/src/transformers/models/llama/modeling_llama.py;
    :model_id,arnir0/Tiny-LLM;
    :model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :model_inputs_options,;
    :model_module,transformers.models.llama.modeling_llama;
    :model_nweights,12988992;
    :model_shapes,dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]);
    :model_size,51955968;
    :model_subfolder,;
    :model_task,text-generation;
    :onnx_filename,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.onnx;
    :onnx_ort_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs2,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_size,204345;
    :run_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
    :run_expected2,CausalLMOutputWithPast(logits:T1s3x4x32000,past_key_values:DynamicCache(key_cache=#1[T1s3x1x35x96], value_cache=#1[T1s3x1x35x96]));
    :run_feeds_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :run_feeds_inputs2,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :run_output_inputs,#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96];
    :run_output_inputs2,#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96];
    :time_create,0.15212773500024923;
    :time_create_onnx_ort,0.050103256000511465;
    :time_export_onnx,7.234299549001662;
    :time_export_onnx_opt_ir,0.04470971100090537;
    :time_onnx_save,0.15125025599991204;
    :time_run,0.012062527999660233;
    :time_run2,0.009683688998848083;
    :time_run_onnx_ort,0.008159883000189438;
    :time_run_onnx_ort2,0.002132023000740446;
    :time_run_patched,0.008518315999026527;
    :version_date,2025-08-27T17:08:26;
    :version_device,;
    :version_do_run,True;
    :version_drop_inputs,[];
    :version_dtype,;
    :version_dump_folder,dump_models;
    :version_exporter,onnx-dynamo;
    :version_inputs2,1;
    :version_model_id,arnir0/Tiny-LLM;
    :version_numpy,2.3.2;
    :version_onnx,1.20.0;
    :version_onnx_diagnostic,0.7.7;
    :version_onnx_ir,0.1.8;
    :version_onnxruntime,1.23.0;
    :version_onnxscript,0.3.0.dev20250301;
    :version_opset,18;
    :version_optimization,ir;
    :version_ortfusiontype,;
    :version_patch,True;
    :version_patch_kwargs,{'patch_transformers':True,'patch_diffusers':True,'patch':True};
    :version_quiet,False;
    :version_rewrite,True;
    :version_runtime,onnxruntime;
    :version_same_as_pretrained,False;
    :version_scipy,1.16.1;
    :version_stop_if_static,0;
    :version_torch,2.9.0.dev20250820+cu126;
    :version_transformers,4.56.0.dev0;
    :version_use_pretrained,False;

Run onnxruntime fusions

This option runs the transformers optimizations implemented in onnxruntime. The list of supported model_type values can be found in the documentation of the function onnx_diagnostic.torch_models.validate.run_ort_fusion().
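
Under the hood, these fusions come from onnxruntime's transformers optimizer. A minimal sketch of a direct call; the model path matches the dump folder produced above, while model_type, num_heads and hidden_size are placeholders to adapt to the model (here taken from the Tiny-LLM config):

    from onnxruntime.transformers import optimizer

    optimized = optimizer.optimize_model(
        "dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.onnx",
        model_type="gpt2",  # one model type per call; the CLI accepts several separated by `|`
        num_heads=2,        # placeholder: number of attention heads
        hidden_size=192,    # placeholder: hidden dimension
    )
    optimized.save_model_to_file("arnir0_Tiny-LLM.ortfused.onnx")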

python -m onnx_diagnostic validate -m arnir0/Tiny-LLM --run -v 1 --export onnx-dynamo -o dump_models --patch --opt ir --ortfusiontype ALL
    [validate_model] dump into 'arnir0_Tiny-LLM-onnx-dynamo-ir'
    [validate_model] validate model id 'arnir0/Tiny-LLM'
    [validate_model] patch=True
    [validate_model] get dummy inputs with input_options=None...
    [validate_model] rewrite=True, patch_kwargs={'patch_transformers': True, 'patch_diffusers': True, 'patch': True}, stop_if_static=0
    [validate_model] exporter='onnx-dynamo', optimization='ir'
    [validate_model] dump_folder='dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'
    [validate_model] output_names=None
    [get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] use preinstalled 'arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] architectures=['LlamaForCausalLM']
    [get_untrained_model_with_inputs] cls='LlamaConfig'
    [get_untrained_model_with_inputs] task='text-generation'
    [get_untrained_model_with_inputs] default config._attn_implementation=None
    [get_untrained_model_with_inputs] use fct=<function get_inputs at 0x726e5d185b20>
    [validate_model] --
    [validate_model] task=text-generation
    [validate_model] size=49.549072265625 Mb
    [validate_model] n_weights=12.988992 millions parameters
    [validate_model] +INPUT input_ids=T7s2x3
    [validate_model] +INPUT attention_mask=T7s2x33
    [validate_model] +INPUT position_ids=T7s2x3
    [validate_model] +INPUT past_key_values=DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96])
    [validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
    [validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
    [validate_model] +SHAPE past_key_values=#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]
    [validate_model] --
    [validate_model] -- run the model inputs='inputs'...
    [validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done ([run])
    [validate_model] -- run the model inputs='inputs2'...
    [validate_model] inputs2=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_model] done ([run2])
    [validate_model] -- export the model with 'onnx-dynamo', optimization='ir'
    [validate_model] applies patches before exporting stop_if_static=0
    [validate_model] run patched model...
    [validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_model] done (patched run)
    [validate_model] patched discrepancies=abs=0, rel=0
    [call_torch_export_onnx] exporter='onnx-dynamo', optimization='ir'
    [call_torch_export_onnx] args=()
    [call_torch_export_onnx] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [call_torch_export_onnx] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]])
    [call_torch_export_onnx] export...
    [call_torch_export_onnx] export_export_kwargs=dict(dynamo:bool,dynamic_shapes:dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]),opset_version:int)
    [torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`...
    [_catch_produce_guards_and_solve_constraints] ERROR: produce_guards_and_solve_constraints failed, use SKIP_SOLVE_CONSTRAINTS=0 to avoid skipping
    fake_mode=<torch._subclasses.fake_tensor.FakeTensorMode object at 0x726e2d7765d0>
    dynamic_shapes={'input_ids': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'attention_mask': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'position_ids': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'past_key_values': [[{0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}], [{0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}]]}
    equalities_inputs=EqualityConstraint(warn_only=False, source_pairs=[(TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0))], derived_equalities=[], phantom_symbols=[], relaxed_sources={TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2), TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2)}, _parents={TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), 
TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)}, _defs={})
    original_signature=(input_ids: Optional[torch.LongTensor] = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_values: Optional[transformers.cache_utils.Cache] = None, inputs_embeds: Optional[torch.FloatTensor] = None, labels: Optional[torch.LongTensor] = None, use_cache: Optional[bool] = None, cache_position: Optional[torch.LongTensor] = None, logits_to_keep: Union[int, torch.Tensor] = 0, **kwargs: Unpack[transformers.utils.generic.TransformersKwargs]) -> transformers.modeling_outputs.CausalLMOutputWithPast
    kwargs={}
    exc=Constraints violated (batch)! For more information, run with TORCH_LOGS="+dynamic".
      - Not all values of batch = L['input_ids'].size()[0] in the specified range batch <= 1024 satisfy the generated guard L['input_ids'].size()[0] != 1.
    gm=<lambda>()
    
    
    
    def forward(self, arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1):
        sym_size_int = torch.ops.aten.sym_size.int(arg16_1, 0)
        empty = torch.ops.aten.empty.memory_format([sym_size_int, 1, 0, 96], dtype = torch.float32, device = device(type='cpu'), pin_memory = False)
        empty_1 = torch.ops.aten.empty.memory_format([sym_size_int, 1, 0, 96], dtype = torch.float32, device = device(type='cpu'), pin_memory = False)
        cat = torch.ops.aten.cat.default([empty, arg16_1], -2);  empty = arg16_1 = None
        cat_1 = torch.ops.aten.cat.default([empty_1, arg17_1], -2);  empty_1 = arg17_1 = None
        embedding = torch.ops.aten.embedding.default(arg0_1, arg13_1);  arg0_1 = None
        sym_size_int_1 = torch.ops.aten.sym_size.int(cat, 2)
        sym_size_int_2 = torch.ops.aten.sym_size.int(arg13_1, 1)
        add = sym_size_int_1 + sym_size_int_2
        arange = torch.ops.aten.arange.start(sym_size_int_1, add, device = device(type='cpu'), pin_memory = False);  add = None
        to = torch.ops.aten.to.device(arg14_1, device(type='cpu'), torch.bool);  arg14_1 = None
        sym_size_int_3 = torch.ops.aten.sym_size.int(arange, 0)
        add_1 = sym_size_int_3 + sym_size_int_1;  sym_size_int_1 = None
        arange_1 = torch.ops.aten.arange.default(add_1, device = device(type='cpu'), pin_memory = False);  add_1 = None
        add_ = torch.ops.aten.add_.Tensor(arange_1, 0)
        sym_size_int_5 = torch.ops.aten.sym_size.int(arg13_1, 0);  arg13_1 = None
        arange_2 = torch.ops.aten.arange.default(sym_size_int_5, device = device(type='cpu'), pin_memory = False)
        arange_3 = torch.ops.aten.arange.default(1, device = device(type='cpu'), pin_memory = False)
        sym_size_int_6 = torch.ops.aten.sym_size.int(arange_2, 0)
        sym_size_int_7 = torch.ops.aten.sym_size.int(arange_1, 0);  arange_1 = None
        reshape = torch.ops.aten.reshape.default(arange_2, [-1, 1, 1, 1]);  arange_2 = None
        reshape_1 = torch.ops.aten.reshape.default(arange_3, [1, -1, 1, 1]);  arange_3 = None
        reshape_2 = torch.ops.aten.reshape.default(arange, [1, 1, -1, 1]);  arange = None
        reshape_3 = torch.ops.aten.reshape.default(add_, [1, 1, 1, -1]);  add_ = None
        expand = torch.ops.aten.expand.default(reshape, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape = None
        expand_1 = torch.ops.aten.expand.default(reshape_1, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape_1 = expand_1 = None
        expand_2 = torch.ops.aten.expand.default(reshape_2, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape_2 = None
        expand_3 = torch.ops.aten.expand.default(reshape_3, [sym_size_int_6, 1, sym_size_int_3, sym_size_int_7]);  reshape_3 = sym_size_int_6 = sym_size_int_3 = sym_size_int_7 = None
        new_ones = torch.ops.aten.new_ones.default(expand_2, [], dtype = torch.bool, pin_memory = False)
        le = torch.ops.aten.le.Tensor(expand_3, expand_2);  expand_2 = None
        to_1 = torch.ops.aten.to.dtype_layout(le, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'));  le = None
        and_1 = torch.ops.aten.__and__.Tensor(new_ones, to_1);  new_ones = to_1 = None
        index = torch.ops.aten.index.Tensor(to, [expand, expand_3]);  to = expand = expand_3 = None
        to_2 = torch.ops.aten.to.dtype_layout(index, dtype = torch.bool, layout = torch.strided, device = device(type='cpu'));  index = None
        and_2 = torch.ops.aten.__and__.Tensor(and_1, to_2);  and_1 = to_2 = None
        _set_grad_enabled = torch._C._set_grad_enabled(False);  _set_grad_enabled = None
        unsqueeze = torch.ops.aten.unsqueeze.default(arg12_1, 0);  arg12_1 = None
        unsqueeze_1 = torch.ops.aten.unsqueeze.default(unsqueeze, 2);  unsqueeze = None
        to_3 = torch.ops.aten.to.dtype(unsqueeze_1, torch.float32);  unsqueeze_1 = None
        sym_size_int_8 = torch.ops.aten.sym_size.int(arg15_1, 0)
        expand_4 = torch.ops.aten.expand.default(to_3, [sym_size_int_8, -1, 1]);  to_3 = sym_size_int_8 = None
        to_4 = torch.ops.aten.to.dtype_layout(expand_4, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'));  expand_4 = None
        unsqueeze_2 = torch.ops.aten.unsqueeze.default(arg15_1, 1);  arg15_1 = None
        slice_1 = torch.ops.aten.slice.Tensor(unsqueeze_2, 2, 0, 9223372036854775807);  unsqueeze_2 = None
        to_5 = torch.ops.aten.to.dtype(slice_1, torch.float32);  slice_1 = None
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cpu', torch.bfloat16, False, False)
        to_6 = torch.ops.aten.to.dtype(to_4, torch.float32);  to_4 = None
        to_7 = torch.ops.aten.to.dtype(to_5, torch.float32);  to_5 = None
        matmul = torch.ops.aten.matmul.default(to_6, to_7);  to_6 = to_7 = None
        transpose = torch.ops.aten.transpose.int(matmul, 1, 2);  matmul = None
        cat_2 = torch.ops.aten.cat.default([transpose, transpose], -1);  transpose = None
        cos = torch.ops.aten.cos.default(cat_2)
        mul = torch.ops.aten.mul.Tensor(cos, 1.0);  cos = None
        sin = torch.ops.aten.sin.default(cat_2);  cat_2 = None
        mul_1 = torch.ops.aten.mul.Tensor(sin, 1.0);  sin = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = _exit_autocast = None
        to_8 = torch.ops.aten.to.dtype(mul, torch.float32);  mul = None
        to_9 = torch.ops.aten.to.dtype(mul_1, torch.float32);  mul_1 = None
        _set_grad_enabled_1 = torch._C._set_grad_enabled(True);  _set_grad_enabled_1 = None
        to_10 = torch.ops.aten.to.dtype(embedding, torch.float32);  embedding = None
        pow_1 = torch.ops.aten.pow.Tensor_Scalar(to_10, 2)
        mean = torch.ops.aten.mean.dim(pow_1, [-1], True);  pow_1 = None
        add_3 = torch.ops.aten.add.Tensor(mean, 1e-05);  mean = None
        rsqrt = torch.ops.aten.rsqrt.default(add_3);  add_3 = None
        mul_2 = torch.ops.aten.mul.Tensor(to_10, rsqrt);  rsqrt = None
        to_11 = torch.ops.aten.to.dtype(mul_2, torch.float32);  mul_2 = None
        mul_3 = torch.ops.aten.mul.Tensor(arg8_1, to_11);  arg8_1 = to_11 = None
        linear = torch.ops.aten.linear.default(mul_3, arg1_1);  arg1_1 = None
        view = torch.ops.aten.view.default(linear, [sym_size_int_5, sym_size_int_2, -1, 96]);  linear = None
        transpose_1 = torch.ops.aten.transpose.int(view, 1, 2);  view = None
        linear_1 = torch.ops.aten.linear.default(mul_3, arg2_1);  arg2_1 = None
        view_1 = torch.ops.aten.view.default(linear_1, [sym_size_int_5, sym_size_int_2, -1, 96]);  linear_1 = None
        transpose_2 = torch.ops.aten.transpose.int(view_1, 1, 2);  view_1 = None
        linear_2 = torch.ops.aten.linear.default(mul_3, arg3_1);  mul_3 = arg3_1 = None
        view_2 = torch.ops.aten.view.default(linear_2, [sym_size_int_5, sym_size_int_2, -1, 96]);  linear_2 = None
        transpose_3 = torch.ops.aten.transpose.int(view_2, 1, 2);  view_2 = None
        unsqueeze_3 = torch.ops.aten.unsqueeze.default(to_8, 1);  to_8 = None
        unsqueeze_4 = torch.ops.aten.unsqueeze.default(to_9, 1);  to_9 = None
        mul_4 = torch.ops.aten.mul.Tensor(transpose_1, unsqueeze_3)
        slice_2 = torch.ops.aten.slice.Tensor(transpose_1, 3, 0, 48)
        slice_3 = torch.ops.aten.slice.Tensor(transpose_1, 3, 48, 9223372036854775807);  transpose_1 = None
        neg = torch.ops.aten.neg.default(slice_3);  slice_3 = None
        cat_3 = torch.ops.aten.cat.default([neg, slice_2], -1);  neg = slice_2 = None
        mul_5 = torch.ops.aten.mul.Tensor(cat_3, unsqueeze_4);  cat_3 = None
        add_4 = torch.ops.aten.add.Tensor(mul_4, mul_5);  mul_4 = mul_5 = None
        mul_6 = torch.ops.aten.mul.Tensor(transpose_2, unsqueeze_3);  unsqueeze_3 = None
        slice_4 = torch.ops.aten.slice.Tensor(transpose_2, 3, 0, 48)
        slice_5 = torch.ops.aten.slice.Tensor(transpose_2, 3, 48, 9223372036854775807);  transpose_2 = None
        neg_1 = torch.ops.aten.neg.default(slice_5);  slice_5 = None
        cat_4 = torch.ops.aten.cat.default([neg_1, slice_4], -1);  neg_1 = slice_4 = None
        mul_7 = torch.ops.aten.mul.Tensor(cat_4, unsqueeze_4);  cat_4 = unsqueeze_4 = None
        add_5 = torch.ops.aten.add.Tensor(mul_6, mul_7);  mul_6 = mul_7 = None
        cat_5 = torch.ops.aten.cat.default([cat, add_5], -2);  cat = add_5 = None
        cat_6 = torch.ops.aten.cat.default([cat_1, transpose_3], -2);  cat_1 = transpose_3 = None
        unsqueeze_5 = torch.ops.aten.unsqueeze.default(cat_5, 2)
        sym_size_int_10 = torch.ops.aten.sym_size.int(cat_5, 2)
        slice_6 = torch.ops.aten.slice.Tensor(unsqueeze_5, 3, 0, 9223372036854775807);  unsqueeze_5 = None
        expand_5 = torch.ops.aten.expand.default(slice_6, [sym_size_int, 1, 2, sym_size_int_10, 96]);  slice_6 = None
        reshape_4 = torch.ops.aten.reshape.default(expand_5, [sym_size_int, 2, sym_size_int_10, 96]);  expand_5 = None
        unsqueeze_6 = torch.ops.aten.unsqueeze.default(cat_6, 2)
        sym_size_int_11 = torch.ops.aten.sym_size.int(cat_6, 2)
        slice_7 = torch.ops.aten.slice.Tensor(unsqueeze_6, 3, 0, 9223372036854775807);  unsqueeze_6 = None
        expand_6 = torch.ops.aten.expand.default(slice_7, [sym_size_int, 1, 2, sym_size_int_11, 96]);  slice_7 = None
        reshape_5 = torch.ops.aten.reshape.default(expand_6, [sym_size_int, 2, sym_size_int_11, 96]);  expand_6 = sym_size_int = sym_size_int_11 = None
        slice_8 = torch.ops.aten.slice.Tensor(and_2, 3, None, sym_size_int_10);  and_2 = sym_size_int_10 = None
        scaled_dot_product_attention = torch.ops.aten.scaled_dot_product_attention.default(add_4, reshape_4, reshape_5, slice_8, scale = 0.10206207261596575);  add_4 = reshape_4 = reshape_5 = slice_8 = None
        transpose_4 = torch.ops.aten.transpose.int(scaled_dot_product_attention, 1, 2);  scaled_dot_product_attention = None
        reshape_6 = torch.ops.aten.reshape.default(transpose_4, [sym_size_int_5, sym_size_int_2, -1]);  transpose_4 = sym_size_int_5 = sym_size_int_2 = None
        linear_3 = torch.ops.aten.linear.default(reshape_6, arg4_1);  reshape_6 = arg4_1 = None
        add_6 = torch.ops.aten.add.Tensor(to_10, linear_3);  to_10 = linear_3 = None
        to_12 = torch.ops.aten.to.dtype(add_6, torch.float32);  add_6 = None
        pow_2 = torch.ops.aten.pow.Tensor_Scalar(to_12, 2)
        mean_1 = torch.ops.aten.mean.dim(pow_2, [-1], True);  pow_2 = None
        add_7 = torch.ops.aten.add.Tensor(mean_1, 1e-05);  mean_1 = None
        rsqrt_1 = torch.ops.aten.rsqrt.default(add_7);  add_7 = None
        mul_16 = torch.ops.aten.mul.Tensor(to_12, rsqrt_1);  rsqrt_1 = None
        to_13 = torch.ops.aten.to.dtype(mul_16, torch.float32);  mul_16 = None
        mul_17 = torch.ops.aten.mul.Tensor(arg9_1, to_13);  arg9_1 = to_13 = None
        linear_4 = torch.ops.aten.linear.default(mul_17, arg5_1);  arg5_1 = None
        silu = torch.ops.aten.silu.default(linear_4);  linear_4 = None
        linear_5 = torch.ops.aten.linear.default(mul_17, arg6_1);  mul_17 = arg6_1 = None
        mul_18 = torch.ops.aten.mul.Tensor(silu, linear_5);  silu = linear_5 = None
        linear_6 = torch.ops.aten.linear.default(mul_18, arg7_1);  mul_18 = arg7_1 = None
        add_8 = torch.ops.aten.add.Tensor(to_12, linear_6);  to_12 = linear_6 = None
        to_14 = torch.ops.aten.to.dtype(add_8, torch.float32);  add_8 = None
        pow_3 = torch.ops.aten.pow.Tensor_Scalar(to_14, 2)
        mean_2 = torch.ops.aten.mean.dim(pow_3, [-1], True);  pow_3 = None
        add_9 = torch.ops.aten.add.Tensor(mean_2, 1e-05);  mean_2 = None
        rsqrt_2 = torch.ops.aten.rsqrt.default(add_9);  add_9 = None
        mul_19 = torch.ops.aten.mul.Tensor(to_14, rsqrt_2);  to_14 = rsqrt_2 = None
        to_15 = torch.ops.aten.to.dtype(mul_19, torch.float32);  mul_19 = None
        mul_20 = torch.ops.aten.mul.Tensor(arg10_1, to_15);  arg10_1 = to_15 = None
        slice_9 = torch.ops.aten.slice.Tensor(mul_20, 1, 0, 9223372036854775807);  mul_20 = None
        linear_7 = torch.ops.aten.linear.default(slice_9, arg11_1);  slice_9 = arg11_1 = None
        return (linear_7, cat_5, cat_6)
        
    # To see more debug info, please use `graph_module.print_readable()`
    [torch.onnx] Obtain model graph for `LlamaForCausalLM([...]` with `torch.export.export(..., strict=False)`... ✅
    [torch.onnx] Run decomposition...
    [torch.onnx] Run decomposition... ✅
    [torch.onnx] Translate the graph into ONNX...
    [torch.onnx] Translate the graph into ONNX... ✅
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: cache+seq will not be used, since it shares the same shape constraints with another axis: seq_length.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    ~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
      warnings.warn(
    Applied 35 of general pattern rewrite rules.
    [call_torch_export_onnx] done (export)
    [call_torch_export_onnx] starts optimization='ir'...
    [call_torch_export_onnx] done (optimization)
    [validate_model] dumps onnx program in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
    [validate_model] done (dump onnx) in 0.26968359500096994
    [validate_model] dumps statistics in 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir'...
    [validate_model] done (dump)
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour=None
    [validate_onnx_model] done (ort_session) flavour=None
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=7.450580596923828e-07, rel=0.00027354017262003796, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0003063333051545333, n=404160.0
    [validate_model] run onnxruntime fusion for 'bart'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'bart' in 0.20830882399968687, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bart.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbart'
    [validate_onnx_model] done (ort_session) flavour='ortbart'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'bert'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'bert' in 0.26940306600045005, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbert'
    [validate_onnx_model] done (ort_session) flavour='ortbert'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'bert_keras'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'bert_keras' in 0.27563871500024106, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_keras.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbert_keras'
    [validate_onnx_model] done (ort_session) flavour='ortbert_keras'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'bert_tf'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'bert_tf' in 0.2212624780004262, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_tf.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortbert_tf'
    [validate_onnx_model] done (ort_session) flavour='ortbert_tf'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'clip'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'clip' in 0.3054245700004685, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.clip.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortclip'
    [validate_onnx_model] done (ort_session) flavour='ortclip'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'conformer'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'conformer' in 0.2323375810010475, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.conformer.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortconformer'
    [validate_onnx_model] done (ort_session) flavour='ortconformer'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'gpt2'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'gpt2' in 0.24990340099975583, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortgpt2'
    [validate_onnx_model] done (ort_session) flavour='ortgpt2'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'gpt2_tf'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'gpt2_tf' in 0.30343909399925906, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2_tf.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortgpt2_tf'
    [validate_onnx_model] done (ort_session) flavour='ortgpt2_tf'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'gpt_neox'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'gpt_neox' in 0.2219563679991552, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt_neox.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortgpt_neox'
    [validate_onnx_model] done (ort_session) flavour='ortgpt_neox'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'mmdit'
    
    failed in shape inference <class 'AssertionError'>
    The optimized model requires LayerNormalization with broadcast support. Please use onnxruntime-gpu>=1.21 for inference.
    Fused SimplifiedLayerNormalization: 3
    opset version: 18
    [validate_model] done 'mmdit' in 0.2987801479994232, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.mmdit.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortmmdit'
    [validate_onnx_model] done (ort_session) flavour='ortmmdit'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'phi'
    [validate_model] done 'phi' in 0.022397999000531854, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.phi.onnx'
    [validate_onnx_model] missing 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.phi.onnx'
    [validate_model] run onnxruntime fusion for 'sam2'
    
    failed in shape inference <class 'AssertionError'>
    symbolic shape inference disabled or failed.
    opset version: 18
    [validate_model] done 'sam2' in 0.14555706299870508, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.sam2.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortsam2'
    [validate_onnx_model] done (ort_session) flavour='ortsam2'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=7.450580596923828e-07, rel=0.00027354017262003796, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0003063333051545333, n=404160.0
    [validate_model] run onnxruntime fusion for 'swin'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'swin' in 0.15037079700050526, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.swin.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortswin'
    [validate_onnx_model] done (ort_session) flavour='ortswin'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 't5'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 't5' in 0.1526649110001017, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.t5.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortt5'
    [validate_onnx_model] done (ort_session) flavour='ortt5'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'tnlr'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'tnlr' in 0.13871947199913848, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.tnlr.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='orttnlr'
    [validate_onnx_model] done (ort_session) flavour='orttnlr'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] run onnxruntime fusion for 'unet'
    
    failed in shape inference <class 'AssertionError'>
    symbolic shape inference disabled or failed.
    SkipGroupNorm fusion will be skipped since symbolic shape inference disabled or failed.
    opset version: 18
    [validate_model] done 'unet' in 0.13775665299908724, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.unet.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortunet'
    [validate_onnx_model] done (ort_session) flavour='ortunet'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=7.450580596923828e-07, rel=0.00027354017262003796, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0003063333051545333, n=404160.0
    [validate_model] run onnxruntime fusion for 'vae'
    
    failed in shape inference <class 'AssertionError'>
    symbolic shape inference disabled or failed.
    SkipGroupNorm fusion will be skipped since symbolic shape inference disabled or failed.
    opset version: 18
    [validate_model] done 'vae' in 0.15772798799844168, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vae.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortvae'
    [validate_onnx_model] done (ort_session) flavour='ortvae'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=7.450580596923828e-07, rel=0.00027354017262003796, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0003063333051545333, n=404160.0
    [validate_model] run onnxruntime fusion for 'vit'
    failed in shape inference <class 'AssertionError'>
    [validate_model] done 'vit' in 0.2868474709994189, saved into 'dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vit.onnx'
    [validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour='ortvit'
    [validate_onnx_model] done (ort_session) flavour='ortvit'
    [validate_onnx_model] -- make_feeds for 'inputs'...
    [validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.0004247389736828931, n=204672.0
    [validate_onnx_model] -- make_feeds for 'inputs2'...
    [validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#1[T1s3x1x31x96], value_cache=#1[T1s3x1x31x96]))
    [validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96)
    [validate_onnx_model] done (make_feeds)
    [validate_onnx_model] run session...
    [validate_onnx_model] done (run)
    [validate_onnx_model] got=#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96]
    [validate_onnx_model] discrepancies=abs=8.344650268554688e-07, rel=0.00036102609604184355, n=404160.0
    [validate_model] -- done (final)
    
    -- summary --
    :ERR_onnx_missing_ortphi,FileNotFoundError('dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.phi.onnx');
    :ERR_opt_ort_phi,'method' object is not iterable;
    :disc_onnx_ort_run2_abs,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortbart,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortbert,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortbert_keras,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortbert_tf,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortclip,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortconformer,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortgpt2,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortgpt2_tf,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortgpt_neox,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortmmdit,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortsam2,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortswin,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortt5,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_orttnlr,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortunet,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortvae,8.344650268554688e-07;
    :disc_onnx_ort_run2_abs_ortvit,8.344650268554688e-07;
    :disc_onnx_ort_run2_dnan,0;
    :disc_onnx_ort_run2_dnan_ortbart,0;
    :disc_onnx_ort_run2_dnan_ortbert,0;
    :disc_onnx_ort_run2_dnan_ortbert_keras,0;
    :disc_onnx_ort_run2_dnan_ortbert_tf,0;
    :disc_onnx_ort_run2_dnan_ortclip,0;
    :disc_onnx_ort_run2_dnan_ortconformer,0;
    :disc_onnx_ort_run2_dnan_ortgpt2,0;
    :disc_onnx_ort_run2_dnan_ortgpt2_tf,0;
    :disc_onnx_ort_run2_dnan_ortgpt_neox,0;
    :disc_onnx_ort_run2_dnan_ortmmdit,0;
    :disc_onnx_ort_run2_dnan_ortsam2,0;
    :disc_onnx_ort_run2_dnan_ortswin,0;
    :disc_onnx_ort_run2_dnan_ortt5,0;
    :disc_onnx_ort_run2_dnan_orttnlr,0;
    :disc_onnx_ort_run2_dnan_ortunet,0;
    :disc_onnx_ort_run2_dnan_ortvae,0;
    :disc_onnx_ort_run2_dnan_ortvit,0;
    :disc_onnx_ort_run2_n,404160.0;
    :disc_onnx_ort_run2_n_ortbart,404160.0;
    :disc_onnx_ort_run2_n_ortbert,404160.0;
    :disc_onnx_ort_run2_n_ortbert_keras,404160.0;
    :disc_onnx_ort_run2_n_ortbert_tf,404160.0;
    :disc_onnx_ort_run2_n_ortclip,404160.0;
    :disc_onnx_ort_run2_n_ortconformer,404160.0;
    :disc_onnx_ort_run2_n_ortgpt2,404160.0;
    :disc_onnx_ort_run2_n_ortgpt2_tf,404160.0;
    :disc_onnx_ort_run2_n_ortgpt_neox,404160.0;
    :disc_onnx_ort_run2_n_ortmmdit,404160.0;
    :disc_onnx_ort_run2_n_ortsam2,404160.0;
    :disc_onnx_ort_run2_n_ortswin,404160.0;
    :disc_onnx_ort_run2_n_ortt5,404160.0;
    :disc_onnx_ort_run2_n_orttnlr,404160.0;
    :disc_onnx_ort_run2_n_ortunet,404160.0;
    :disc_onnx_ort_run2_n_ortvae,404160.0;
    :disc_onnx_ort_run2_n_ortvit,404160.0;
    :disc_onnx_ort_run2_rel,0.0003063333051545333;
    :disc_onnx_ort_run2_rel_ortbart,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortbert,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortbert_keras,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortbert_tf,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortclip,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortconformer,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortgpt2,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortgpt2_tf,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortgpt_neox,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortmmdit,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortsam2,0.0003063333051545333;
    :disc_onnx_ort_run2_rel_ortswin,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortt5,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_orttnlr,0.00036102609604184355;
    :disc_onnx_ort_run2_rel_ortunet,0.0003063333051545333;
    :disc_onnx_ort_run2_rel_ortvae,0.0003063333051545333;
    :disc_onnx_ort_run2_rel_ortvit,0.00036102609604184355;
    :disc_onnx_ort_run2_sum,0.037360749839990604;
    :disc_onnx_ort_run2_sum_ortbart,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortbert,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortbert_keras,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortbert_tf,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortclip,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortconformer,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortgpt2,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortgpt2_tf,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortgpt_neox,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortmmdit,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortsam2,0.037360749839990604;
    :disc_onnx_ort_run2_sum_ortswin,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortt5,0.040791127634633995;
    :disc_onnx_ort_run2_sum_orttnlr,0.040791127634633995;
    :disc_onnx_ort_run2_sum_ortunet,0.037360749839990604;
    :disc_onnx_ort_run2_sum_ortvae,0.037360749839990604;
    :disc_onnx_ort_run2_sum_ortvit,0.040791127634633995;
    :disc_onnx_ort_run_abs,7.450580596923828e-07;
    :disc_onnx_ort_run_abs_ortbart,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortbert,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortbert_keras,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortbert_tf,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortclip,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortconformer,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortgpt2,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortgpt2_tf,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortgpt_neox,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortmmdit,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortsam2,7.450580596923828e-07;
    :disc_onnx_ort_run_abs_ortswin,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortt5,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_orttnlr,8.344650268554688e-07;
    :disc_onnx_ort_run_abs_ortunet,7.450580596923828e-07;
    :disc_onnx_ort_run_abs_ortvae,7.450580596923828e-07;
    :disc_onnx_ort_run_abs_ortvit,8.344650268554688e-07;
    :disc_onnx_ort_run_dnan,0;
    :disc_onnx_ort_run_dnan_ortbart,0;
    :disc_onnx_ort_run_dnan_ortbert,0;
    :disc_onnx_ort_run_dnan_ortbert_keras,0;
    :disc_onnx_ort_run_dnan_ortbert_tf,0;
    :disc_onnx_ort_run_dnan_ortclip,0;
    :disc_onnx_ort_run_dnan_ortconformer,0;
    :disc_onnx_ort_run_dnan_ortgpt2,0;
    :disc_onnx_ort_run_dnan_ortgpt2_tf,0;
    :disc_onnx_ort_run_dnan_ortgpt_neox,0;
    :disc_onnx_ort_run_dnan_ortmmdit,0;
    :disc_onnx_ort_run_dnan_ortsam2,0;
    :disc_onnx_ort_run_dnan_ortswin,0;
    :disc_onnx_ort_run_dnan_ortt5,0;
    :disc_onnx_ort_run_dnan_orttnlr,0;
    :disc_onnx_ort_run_dnan_ortunet,0;
    :disc_onnx_ort_run_dnan_ortvae,0;
    :disc_onnx_ort_run_dnan_ortvit,0;
    :disc_onnx_ort_run_n,204672.0;
    :disc_onnx_ort_run_n_ortbart,204672.0;
    :disc_onnx_ort_run_n_ortbert,204672.0;
    :disc_onnx_ort_run_n_ortbert_keras,204672.0;
    :disc_onnx_ort_run_n_ortbert_tf,204672.0;
    :disc_onnx_ort_run_n_ortclip,204672.0;
    :disc_onnx_ort_run_n_ortconformer,204672.0;
    :disc_onnx_ort_run_n_ortgpt2,204672.0;
    :disc_onnx_ort_run_n_ortgpt2_tf,204672.0;
    :disc_onnx_ort_run_n_ortgpt_neox,204672.0;
    :disc_onnx_ort_run_n_ortmmdit,204672.0;
    :disc_onnx_ort_run_n_ortsam2,204672.0;
    :disc_onnx_ort_run_n_ortswin,204672.0;
    :disc_onnx_ort_run_n_ortt5,204672.0;
    :disc_onnx_ort_run_n_orttnlr,204672.0;
    :disc_onnx_ort_run_n_ortunet,204672.0;
    :disc_onnx_ort_run_n_ortvae,204672.0;
    :disc_onnx_ort_run_n_ortvit,204672.0;
    :disc_onnx_ort_run_rel,0.00027354017262003796;
    :disc_onnx_ort_run_rel_ortbart,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortbert,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortbert_keras,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortbert_tf,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortclip,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortconformer,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortgpt2,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortgpt2_tf,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortgpt_neox,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortmmdit,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortsam2,0.00027354017262003796;
    :disc_onnx_ort_run_rel_ortswin,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortt5,0.0004247389736828931;
    :disc_onnx_ort_run_rel_orttnlr,0.0004247389736828931;
    :disc_onnx_ort_run_rel_ortunet,0.00027354017262003796;
    :disc_onnx_ort_run_rel_ortvae,0.00027354017262003796;
    :disc_onnx_ort_run_rel_ortvit,0.0004247389736828931;
    :disc_onnx_ort_run_sum,0.018449280611321228;
    :disc_onnx_ort_run_sum_ortbart,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortbert,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortbert_keras,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortbert_tf,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortclip,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortconformer,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortgpt2,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortgpt2_tf,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortgpt_neox,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortmmdit,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortsam2,0.018449280611321228;
    :disc_onnx_ort_run_sum_ortswin,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortt5,0.01927947461263102;
    :disc_onnx_ort_run_sum_orttnlr,0.01927947461263102;
    :disc_onnx_ort_run_sum_ortunet,0.018449280611321228;
    :disc_onnx_ort_run_sum_ortvae,0.018449280611321228;
    :disc_onnx_ort_run_sum_ortvit,0.01927947461263102;
    :disc_patched_abs,0;
    :disc_patched_dnan,0;
    :disc_patched_n,204672.0;
    :disc_patched_rel,0;
    :disc_patched_sum,0.0;
    :dump_folder,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir;
    :dump_folder_name,arnir0_Tiny-LLM-onnx-dynamo-ir;
    :export_args,();
    :export_dynamo,True;
    :export_exporter,onnx-dynamo;
    :export_kwargs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :export_opset,18;
    :export_optimization,ir;
    :model_class,LlamaForCausalLM;
    :model_config,{'vocab_size':32000,'max_position_embeddings':1024,'hidden_size':192,'intermediate_size':1024,'num_hidden_layers':1,'num_attention_heads':2,'num_key_value_heads':1,'hidden_act':'silu','initializer_range':0.02,'rms_norm_eps':1e-05,'pretraining_tp':1,'use_cache':True,'rope_theta':10000.0,'rope_scaling':None,'attention_bias':False,'attention_dropout':0.0,'mlp_bias':False,'head_dim':96,'return_dict':True,'output_hidden_states':False,'torchscript':False,'dtype':'float32','pruned_heads':{},'tie_word_embeddings':False,'chunk_size_feed_forward':0,'is_encoder_decoder':False,'is_decoder':False,'cross_attention_hidden_size':None,'add_cross_attention':False,'tie_encoder_decoder':False,'architectures':['LlamaForCausalLM'],'finetuning_task':None,'id2label':{0:'LABEL_0',1:'LABEL_1'},'label2id':{'LABEL_0':0,'LABEL_1':1},'task_specific_params':None,'problem_type':None,'tokenizer_class':None,'prefix':None,'bos_token_id':1,'pad_token_id':None,'eos_token_id':2,'sep_token_id':None,'decoder_start_token_id':None,'max_length':20,'min_length':0,'do_sample':False,'early_stopping':False,'num_beams':1,'num_beam_groups':1,'diversity_penalty':0.0,'temperature':1.0,'top_k':50,'top_p':1.0,'typical_p':1.0,'repetition_penalty':1.0,'length_penalty':1.0,'no_repeat_ngram_size':0,'encoder_no_repeat_ngram_size':0,'bad_words_ids':None,'num_return_sequences':1,'output_scores':False,'return_dict_in_generate':False,'forced_bos_token_id':None,'forced_eos_token_id':None,'remove_invalid_values':False,'exponential_decay_length_penalty':None,'suppress_tokens':None,'begin_suppress_tokens':None,'_name_or_path':'','transformers_version':'4.56.0.dev0','model_type':'llama','tf_legacy_loss':False,'use_bfloat16':False,'subfolder':None,'output_attentions':False};
    :model_config_class,LlamaConfig;
    :model_file,~/github/transformers/src/transformers/models/llama/modeling_llama.py;
    :model_id,arnir0/Tiny-LLM;
    :model_inputs,dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]));
    :model_inputs_options,;
    :model_module,transformers.models.llama.modeling_llama;
    :model_nweights,12988992;
    :model_shapes,dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#1[{0:Dim(batch),2:DYN(cache_length)}],#1[{0:Dim(batch),2:DYN(cache_length)}]]);
    :model_size,51955968;
    :model_subfolder,;
    :model_task,text-generation;
    :onnx_filename,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.onnx;
    :onnx_filename_ortbart,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bart.onnx;
    :onnx_filename_ortbert,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert.onnx;
    :onnx_filename_ortbert_keras,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_keras.onnx;
    :onnx_filename_ortbert_tf,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.bert_tf.onnx;
    :onnx_filename_ortclip,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.clip.onnx;
    :onnx_filename_ortconformer,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.conformer.onnx;
    :onnx_filename_ortgpt2,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2.onnx;
    :onnx_filename_ortgpt2_tf,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt2_tf.onnx;
    :onnx_filename_ortgpt_neox,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.gpt_neox.onnx;
    :onnx_filename_ortmmdit,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.mmdit.onnx;
    :onnx_filename_ortsam2,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.sam2.onnx;
    :onnx_filename_ortswin,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.swin.onnx;
    :onnx_filename_ortt5,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.t5.onnx;
    :onnx_filename_orttnlr,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.tnlr.onnx;
    :onnx_filename_ortunet,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.unet.onnx;
    :onnx_filename_ortvae,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vae.onnx;
    :onnx_filename_ortvit,dump_models/arnir0_Tiny-LLM-onnx-dynamo-ir/arnir0_Tiny-LLM-onnx-dynamo-ir.ort.vit.onnx;
    :onnx_ort_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs2,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortbart,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortbert,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortbert_keras,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortbert_tf,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortclip,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortconformer,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortgpt2,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortgpt2_tf,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortgpt_neox,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortmmdit,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortsam2,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortswin,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortt5,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_orttnlr,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortunet,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortvae,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs2_ortvit,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :onnx_ort_inputs_ortbart,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortbert,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortbert_keras,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortbert_tf,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortclip,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortconformer,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortgpt2,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortgpt2_tf,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortgpt_neox,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortmmdit,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortsam2,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortswin,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortt5,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_orttnlr,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortunet,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortvae,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_ort_inputs_ortvit,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :onnx_size,204345;
    :onnx_size_ortbart,173811;
    :onnx_size_ortbert,173811;
    :onnx_size_ortbert_keras,173868;
    :onnx_size_ortbert_tf,173840;
    :onnx_size_ortclip,173811;
    :onnx_size_ortconformer,173857;
    :onnx_size_ortgpt2,173811;
    :onnx_size_ortgpt2_tf,173838;
    :onnx_size_ortgpt_neox,173847;
    :onnx_size_ortmmdit,173820;
    :onnx_size_ortsam2,205125;
    :onnx_size_ortswin,173811;
    :onnx_size_ortt5,173793;
    :onnx_size_orttnlr,173811;
    :onnx_size_ortunet,205125;
    :onnx_size_ortvae,205116;
    :onnx_size_ortvit,173802;
    :opt_ort_bart_delta_node,-18;
    :opt_ort_bart_duration,0.09899726600087888;
    :opt_ort_bart_duration_save,0.06224587200085807;
    :opt_ort_bart_n_nodes1,134;
    :opt_ort_bart_n_nodes2,116;
    :opt_ort_bert_delta_node,-18;
    :opt_ort_bert_duration,0.11427381799876457;
    :opt_ort_bert_duration_save,0.06705120300102863;
    :opt_ort_bert_keras_delta_node,-18;
    :opt_ort_bert_keras_duration,0.07921109400012938;
    :opt_ort_bert_keras_duration_save,0.08345457399991574;
    :opt_ort_bert_keras_n_nodes1,134;
    :opt_ort_bert_keras_n_nodes2,116;
    :opt_ort_bert_n_nodes1,134;
    :opt_ort_bert_n_nodes2,116;
    :opt_ort_bert_tf_delta_node,-18;
    :opt_ort_bert_tf_duration,0.08564825300163648;
    :opt_ort_bert_tf_duration_save,0.05464550899887399;
    :opt_ort_bert_tf_n_nodes1,134;
    :opt_ort_bert_tf_n_nodes2,116;
    :opt_ort_clip_delta_node,-18;
    :opt_ort_clip_duration,0.11919729299916071;
    :opt_ort_clip_duration_save,0.07214596300036646;
    :opt_ort_clip_n_nodes1,134;
    :opt_ort_clip_n_nodes2,116;
    :opt_ort_conformer_delta_node,-18;
    :opt_ort_conformer_duration,0.08376180199957162;
    :opt_ort_conformer_duration_save,0.06496977399910975;
    :opt_ort_conformer_n_nodes1,134;
    :opt_ort_conformer_n_nodes2,116;
    :opt_ort_gpt2_delta_node,-18;
    :opt_ort_gpt2_duration,0.11758374000055483;
    :opt_ort_gpt2_duration_save,0.06024150299890607;
    :opt_ort_gpt2_n_nodes1,134;
    :opt_ort_gpt2_n_nodes2,116;
    :opt_ort_gpt2_tf_delta_node,-18;
    :opt_ort_gpt2_tf_duration,0.1268735680005193;
    :opt_ort_gpt2_tf_duration_save,0.07472950100054732;
    :opt_ort_gpt2_tf_n_nodes1,134;
    :opt_ort_gpt2_tf_n_nodes2,116;
    :opt_ort_gpt_neox_delta_node,-18;
    :opt_ort_gpt_neox_duration,0.09460733400010213;
    :opt_ort_gpt_neox_duration_save,0.060930384999664966;
    :opt_ort_gpt_neox_n_nodes1,134;
    :opt_ort_gpt_neox_n_nodes2,116;
    :opt_ort_mmdit_delta_node,-18;
    :opt_ort_mmdit_duration,0.12561429100060195;
    :opt_ort_mmdit_duration_save,0.057056035000641714;
    :opt_ort_mmdit_n_nodes1,134;
    :opt_ort_mmdit_n_nodes2,116;
    :opt_ort_phi_duration,0.000299528999676113;
    :opt_ort_sam2_delta_node,0;
    :opt_ort_sam2_duration,0.04566408000027877;
    :opt_ort_sam2_duration_save,0.07459294200089062;
    :opt_ort_sam2_n_nodes1,134;
    :opt_ort_sam2_n_nodes2,134;
    :opt_ort_swin_delta_node,-18;
    :opt_ort_swin_duration,0.049752971999623696;
    :opt_ort_swin_duration_save,0.08567862199925003;
    :opt_ort_swin_n_nodes1,134;
    :opt_ort_swin_n_nodes2,116;
    :opt_ort_t5_delta_node,-18;
    :opt_ort_t5_duration,0.04692302599869436;
    :opt_ort_t5_duration_save,0.07905687899983604;
    :opt_ort_t5_n_nodes1,134;
    :opt_ort_t5_n_nodes2,116;
    :opt_ort_tnlr_delta_node,-18;
    :opt_ort_tnlr_duration,0.03601181400154019;
    :opt_ort_tnlr_duration_save,0.08779363000030571;
    :opt_ort_tnlr_n_nodes1,134;
    :opt_ort_tnlr_n_nodes2,116;
    :opt_ort_unet_delta_node,0;
    :opt_ort_unet_duration,0.051656841998919845;
    :opt_ort_unet_duration_save,0.07134037400101079;
    :opt_ort_unet_n_nodes1,134;
    :opt_ort_unet_n_nodes2,134;
    :opt_ort_vae_delta_node,0;
    :opt_ort_vae_duration,0.0537087539996719;
    :opt_ort_vae_duration_save,0.08020333400054369;
    :opt_ort_vae_n_nodes1,134;
    :opt_ort_vae_n_nodes2,134;
    :opt_ort_vit_delta_node,-18;
    :opt_ort_vit_duration,0.1162545749994024;
    :opt_ort_vit_duration_save,0.0759659430004831;
    :opt_ort_vit_n_nodes1,134;
    :opt_ort_vit_n_nodes2,116;
    :run_expected,CausalLMOutputWithPast(logits:T1s2x3x32000,past_key_values:DynamicCache(key_cache=#1[T1s2x1x33x96], value_cache=#1[T1s2x1x33x96]));
    :run_expected2,CausalLMOutputWithPast(logits:T1s3x4x32000,past_key_values:DynamicCache(key_cache=#1[T1s3x1x35x96], value_cache=#1[T1s3x1x35x96]));
    :run_feeds_inputs,dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x1x30x96,past_key_values_value_cache_0:A1s2x1x30x96);
    :run_feeds_inputs2,dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x1x31x96,past_key_values_value_cache_0:A1s3x1x31x96);
    :run_output_inputs,#3[A1s2x3x32000,A1s2x1x33x96,A1s2x1x33x96];
    :run_output_inputs2,#3[A1s3x4x32000,A1s3x1x35x96,A1s3x1x35x96];
    :time_create,0.13043155599916645;
    :time_create_onnx_ort,0.03687587800050096;
    :time_create_onnx_ort_ortbart,0.029774127000564476;
    :time_create_onnx_ort_ortbert,0.03293975900123769;
    :time_create_onnx_ort_ortbert_keras,0.030709955000929767;
    :time_create_onnx_ort_ortbert_tf,0.05855054600033327;
    :time_create_onnx_ort_ortclip,0.039513326999440324;
    :time_create_onnx_ort_ortconformer,0.04696819599848823;
    :time_create_onnx_ort_ortgpt2,0.03718896899954416;
    :time_create_onnx_ort_ortgpt2_tf,0.0383205829984945;
    :time_create_onnx_ort_ortgpt_neox,0.06802177899953676;
    :time_create_onnx_ort_ortmmdit,0.050380042999677244;
    :time_create_onnx_ort_ortsam2,0.054810667999845464;
    :time_create_onnx_ort_ortswin,0.06564367600003607;
    :time_create_onnx_ort_ortt5,0.03528055199967639;
    :time_create_onnx_ort_orttnlr,0.02894865099915478;
    :time_create_onnx_ort_ortunet,0.030569902999559417;
    :time_create_onnx_ort_ortvae,0.06063975399956689;
    :time_create_onnx_ort_ortvit,0.033138623999548145;
    :time_export_onnx,6.9328553219984315;
    :time_export_onnx_opt_ir,0.053772010000102455;
    :time_onnx_save,0.26968359500096994;
    :time_ortfusion_ortbart,0.20830882399968687;
    :time_ortfusion_ortbert,0.26940306600045005;
    :time_ortfusion_ortbert_keras,0.27563871500024106;
    :time_ortfusion_ortbert_tf,0.2212624780004262;
    :time_ortfusion_ortclip,0.3054245700004685;
    :time_ortfusion_ortconformer,0.2323375810010475;
    :time_ortfusion_ortgpt2,0.24990340099975583;
    :time_ortfusion_ortgpt2_tf,0.30343909399925906;
    :time_ortfusion_ortgpt_neox,0.2219563679991552;
    :time_ortfusion_ortmmdit,0.2987801479994232;
    :time_ortfusion_ortphi,0.022397999000531854;
    :time_ortfusion_ortsam2,0.14555706299870508;
    :time_ortfusion_ortswin,0.15037079700050526;
    :time_ortfusion_ortt5,0.1526649110001017;
    :time_ortfusion_orttnlr,0.13871947199913848;
    :time_ortfusion_ortunet,0.13775665299908724;
    :time_ortfusion_ortvae,0.15772798799844168;
    :time_ortfusion_ortvit,0.2868474709994189;
    :time_run,0.015697179000198958;
    :time_run2,0.025493414999800734;
    :time_run_onnx_ort,0.0019326829988131067;
    :time_run_onnx_ort2,0.0021378870005719364;
    :time_run_onnx_ort2_ortbart,0.009951333000572049;
    :time_run_onnx_ort2_ortbert,0.002560373999585863;
    :time_run_onnx_ort2_ortbert_keras,0.0027044340004067635;
    :time_run_onnx_ort2_ortbert_tf,0.0034478590005164733;
    :time_run_onnx_ort2_ortclip,0.002298619001521729;
    :time_run_onnx_ort2_ortconformer,0.00813833200118097;
    :time_run_onnx_ort2_ortgpt2,0.002527812001062557;
    :time_run_onnx_ort2_ortgpt2_tf,0.0021786209999845596;
    :time_run_onnx_ort2_ortgpt_neox,0.002349907001189422;
    :time_run_onnx_ort2_ortmmdit,0.005631908999930602;
    :time_run_onnx_ort2_ortsam2,0.005076429000837379;
    :time_run_onnx_ort2_ortswin,0.015455577000466292;
    :time_run_onnx_ort2_ortt5,0.002844454000296537;
    :time_run_onnx_ort2_orttnlr,0.003999888000180363;
    :time_run_onnx_ort2_ortunet,0.004999580998628517;
    :time_run_onnx_ort2_ortvae,0.004442308001671336;
    :time_run_onnx_ort2_ortvit,0.0030644479993497953;
    :time_run_onnx_ort_ortbart,0.003036991000044509;
    :time_run_onnx_ort_ortbert,0.0024984219999169;
    :time_run_onnx_ort_ortbert_keras,0.001992705998418387;
    :time_run_onnx_ort_ortbert_tf,0.0028049930006091017;
    :time_run_onnx_ort_ortclip,0.0025185779995808844;
    :time_run_onnx_ort_ortconformer,0.0034222040012537036;
    :time_run_onnx_ort_ortgpt2,0.002344845999687095;
    :time_run_onnx_ort_ortgpt2_tf,0.0018748250004136935;
    :time_run_onnx_ort_ortgpt_neox,0.003187795000485494;
    :time_run_onnx_ort_ortmmdit,0.011193055999683565;
    :time_run_onnx_ort_ortsam2,0.005388905001382227;
    :time_run_onnx_ort_ortswin,0.010517578000872163;
    :time_run_onnx_ort_ortt5,0.003151558999888948;
    :time_run_onnx_ort_orttnlr,0.0029032330003246898;
    :time_run_onnx_ort_ortunet,0.005167968000023393;
    :time_run_onnx_ort_ortvae,0.002564708998761489;
    :time_run_onnx_ort_ortvit,0.00192625800082169;
    :time_run_patched,0.01793804299995827;
    :version_date,2025-08-27T17:08:34;
    :version_device,;
    :version_do_run,True;
    :version_drop_inputs,[];
    :version_dtype,;
    :version_dump_folder,dump_models;
    :version_exporter,onnx-dynamo;
    :version_inputs2,1;
    :version_model_id,arnir0/Tiny-LLM;
    :version_numpy,2.3.2;
    :version_onnx,1.20.0;
    :version_onnx_diagnostic,0.7.7;
    :version_onnx_ir,0.1.8;
    :version_onnxruntime,1.23.0;
    :version_onnxscript,0.3.0.dev20250301;
    :version_opset,18;
    :version_optimization,ir;
    :version_ortbart_hidden_size,192;
    :version_ortbart_num_attention_heads,2;
    :version_ortbert_hidden_size,192;
    :version_ortbert_keras_hidden_size,192;
    :version_ortbert_keras_num_attention_heads,2;
    :version_ortbert_num_attention_heads,2;
    :version_ortbert_tf_hidden_size,192;
    :version_ortbert_tf_num_attention_heads,2;
    :version_ortclip_hidden_size,192;
    :version_ortclip_num_attention_heads,2;
    :version_ortconformer_hidden_size,192;
    :version_ortconformer_num_attention_heads,2;
    :version_ortfusiontype,ALL;
    :version_ortgpt2_hidden_size,192;
    :version_ortgpt2_num_attention_heads,2;
    :version_ortgpt2_tf_hidden_size,192;
    :version_ortgpt2_tf_num_attention_heads,2;
    :version_ortgpt_neox_hidden_size,192;
    :version_ortgpt_neox_num_attention_heads,2;
    :version_ortmmdit_hidden_size,192;
    :version_ortmmdit_num_attention_heads,2;
    :version_ortphi_hidden_size,192;
    :version_ortphi_num_attention_heads,2;
    :version_ortsam2_hidden_size,192;
    :version_ortsam2_num_attention_heads,2;
    :version_ortswin_hidden_size,192;
    :version_ortswin_num_attention_heads,2;
    :version_ortt5_hidden_size,192;
    :version_ortt5_num_attention_heads,2;
    :version_orttnlr_hidden_size,192;
    :version_orttnlr_num_attention_heads,2;
    :version_ortunet_hidden_size,192;
    :version_ortunet_num_attention_heads,2;
    :version_ortvae_hidden_size,192;
    :version_ortvae_num_attention_heads,2;
    :version_ortvit_hidden_size,192;
    :version_ortvit_num_attention_heads,2;
    :version_patch,True;
    :version_patch_kwargs,{'patch_transformers':True,'patch_diffusers':True,'patch':True};
    :version_quiet,False;
    :version_rewrite,True;
    :version_runtime,onnxruntime;
    :version_same_as_pretrained,False;
    :version_scipy,1.16.1;
    :version_stop_if_static,0;
    :version_torch,2.9.0.dev20250820+cu126;
    :version_transformers,4.56.0.dev0;
    :version_use_pretrained,False;
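
The summary's disc_* entries aggregate the comparison between the torch outputs and the onnx outputs for each flavour: abs is the maximum absolute difference, rel the maximum relative difference, sum the accumulated absolute difference, n the number of compared values, and dnan the number of nan mismatches. A minimal sketch of how such metrics can be computed from two flattened outputs (illustrative only, the library's exact formulas may differ):

    import numpy as np

    def discrepancy_summary(expected: np.ndarray, got: np.ndarray) -> dict:
        """Illustrative discrepancy metrics, not the library's exact implementation."""
        e = expected.ravel().astype(np.float64)
        g = got.ravel().astype(np.float64)
        diff = np.abs(e - g)
        return dict(
            abs=float(diff.max()),                         # maximum absolute difference
            rel=float((diff / (np.abs(e) + 1e-6)).max()),  # maximum relative difference
            sum=float(diff.sum()),                         # accumulated absolute difference
            n=float(e.size),                               # number of compared values
            dnan=int((np.isnan(e) != np.isnan(g)).sum()),  # nan mismatches
        )

    expected = np.random.rand(3, 4, 32000).astype(np.float32)
    got = expected + 1e-6 * np.random.rand(3, 4, 32000).astype(np.float32)
    print(discrepancy_summary(expected, got))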

SDPA or eager attention implementation, or a StaticCache

Add --mop cache_implementation=static --iop cls_cache=StaticCache to use a StaticCache instead of the default DynamicCache. Add --mop attn_implementation=eager to explicitly select the eager attention implementation, as in the command below.

python -m onnx_diagnostic validate \
            -m google/gemma-2b \
            --run \
            -v 1 \
            --export custom \
            -o dump_test \
            --dtype float16 \
            --device cpu \
            --patch \
            --no-quiet \
            --opt default \
            --rewrite \
            --mop attn_implementation=eager \
            --mop cache_implementation=static \
            --iop cls_cache=StaticCache
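
The same validation can also be driven directly from Python through onnx_diagnostic.torch_models.validate.validate_model(). A minimal sketch, assuming keyword arguments that mirror the command-line flags (these names, and the returned (summary, data) pair, are assumptions to verify against the installed version's signature):

    from onnx_diagnostic.torch_models.validate import validate_model

    # Keyword names below mirror the CLI flags and are assumptions;
    # check them against the function signature of your installed version.
    summary, data = validate_model(
        "arnir0/Tiny-LLM",          # model id (-m/--mid)
        do_run=True,                # --run
        exporter="onnx-dynamo",     # -e/--export
        optimization="ir",          # --opt
        dump_folder="dump_models",  # -o/--dump-folder
        patch=True,                 # --patch
        verbose=1,                  # -v
    )
    print("\n".join(f"{k}={v}" for k, v in sorted(summary.items())))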