Measures loading and saving time for an ONNX model in Python
import os
import time
import numpy as np
import onnx
import onnx_extended.onnx2 as onnx2
onnx_file = (
    "dump_test/microsoft_Phi-4-mini-reasoning-onnx-dynamo-ir/"
    "microsoft_Phi-4-mini-reasoning-onnx-dynamo-ir.onnx"
)

if not os.path.exists(onnx_file):
    from onnx_diagnostic.torch_models.validate import validate_model

    print("Creates the model...")
    validate_model(
        "microsoft/Phi-4-mini-reasoning",
        do_run=True,
        verbose=2,
        exporter="onnx-dynamo",
        do_same=True,
        patch=True,
        rewrite=True,
        optimization="ir",
        dump_folder="dump_test",
    )
    print("done.")
Creates the model...
[validate_model] dump into 'microsoft_Phi-4-mini-reasoning-onnx-dynamo-ir'
[validate_model] validate model id 'microsoft/Phi-4-mini-reasoning'
[validate_model] get dummy inputs with input_options=None...
[validate_model] rewrite=True, patch_kwargs={'patch_transformers': True, 'patch_diffusers': True, 'patch': True}, stop_if_static=1
[validate_model] exporter='onnx-dynamo', optimization='ir'
[validate_model] dump_folder='dump_test/microsoft_Phi-4-mini-reasoning-onnx-dynamo-ir'
[validate_model] output_names=None
[get_untrained_model_with_inputs] model_id='microsoft/Phi-4-mini-reasoning'
[get_untrained_model_with_inputs] use preinstalled 'microsoft/Phi-4-mini-reasoning'
[get_untrained_model_with_inputs] architectures=['Phi3ForCausalLM']
[get_untrained_model_with_inputs] cls='Phi3Config'
[get_untrained_model_with_inputs] task='text-generation'
[get_untrained_model_with_inputs] default config._attn_implementation=None
[get_untrained_model_with_inputs] use fct=<function get_inputs at 0x737cbd9eb880>
[validate_model] --
[validate_model] task=text-generation
[validate_model] size=989.51953125 Mb
[validate_model] n_weights=259.396608 millions parameters
[validate_model] +INPUT input_ids=T7s2x3
[validate_model] +INPUT attention_mask=T7s2x33
[validate_model] +INPUT position_ids=T7s2x3
[validate_model] +INPUT past_key_values=DynamicCache(key_cache=#2[T1s2x8x30x128,T1s2x8x30x128], value_cache=#2[T1s2x8x30x128,T1s2x8x30x128])
[validate_model] +SHAPE input_ids={0:Dim(batch),1:DYN(seq_length)}
[validate_model] +SHAPE attention_mask={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE position_ids={0:Dim(batch),1:DYN(cache+seq)}
[validate_model] +SHAPE past_key_values=#2[#2[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}],#2[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}]]
[validate_model] --
[validate_model] -- run the model inputs='inputs'...
[validate_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#2[T1s2x8x30x128,T1s2x8x30x128], value_cache=#2[T1s2x8x30x128,T1s2x8x30x128]))
[validate_model] done ([run])
[validate_model] -- run the model inputs='inputs2'...
[validate_model] inputs2=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#2[T1s3x8x31x128,T1s3x8x31x128], value_cache=#2[T1s3x8x31x128,T1s3x8x31x128]))
[validate_model] done ([run2])
[validate_model] -- export the model with 'onnx-dynamo', optimization='ir'
[validate_model] applies patches before exporting stop_if_static=1
[torch_export_patches] replace torch.jit.isinstance, torch._dynamo.mark_static_address
[_fix_registration] DynamicCache is unregistered and registered first
[unregister_cache_serialization] unregistered DynamicCache
[register_class_serialization] ---------- register DynamicCache
[_fix_registration] DynamicCache done.
[_fix_registration] BaseModelOutput is unregistered and registered first
[unregister_cache_serialization] unregistered BaseModelOutput
[register_class_serialization] ---------- register BaseModelOutput
[_fix_registration] BaseModelOutput done.
[_fix_registration] UNet2DConditionOutput is unregistered and registered first
[unregister_cache_serialization] unregistered UNet2DConditionOutput
[register_class_serialization] ---------- register UNet2DConditionOutput
[_fix_registration] UNet2DConditionOutput done.
[register_class_serialization] already registered DynamicCache
[register_class_serialization] ---------- register HybridCache
[register_class_serialization] ---------- register MambaCache
[register_class_serialization] ---------- register EncoderDecoderCache
[register_class_serialization] ---------- register SlidingWindowCache
[register_class_serialization] ---------- register StaticCache
[register_class_serialization] already registered UNet2DConditionOutput
[register_class_serialization] already registered BaseModelOutput
[torch_export_patches] sympy.__version__='1.13.3'
[torch_export_patches] patch sympy
[torch_export_patches] torch.__version__='2.9.0.dev20250727+cu126'
[torch_export_patches] stop_if_static=1
[torch_export_patches] patch pytorch
[torch_export_patches] modifies shape constraints
[torch_export_patches] assert when a dynamic dimension turns static
[torch_export_patches] replaces ShapeEnv._set_replacement
[torch_export_patches] replaces ShapeEnv._log_guard
[torch_export_patches] transformers.__version__='4.55.0.dev0'
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_AttentionMaskConverter:
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma2RotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma3RotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_GemmaRotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_GenerationMixin: _cache_dependant_input_preparation, _cache_dependant_input_preparation_exporting, prepare_inputs_for_generation
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_IdeficsAttention: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_IdeficsEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_LlamaRotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_MistralRotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_MixtralRotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Phi3RotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Phi4MultimodalRotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_PhiRotaryEmbedding: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_SamMaskDecoder: forward
[patch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_SmolLM3RotaryEmbedding: forward
[patch_module_or_classes] function: transformers.models.bart.modeling_bart.eager_attention_forward
[patch_module_or_classes] function: transformers.models.marian.modeling_marian.eager_attention_forward
[patch_module_or_classes] function: transformers.cache_utils.parse_processor_args
[torch_export_patches] patches transformers.masking_utils._vmap_for_bhqkv
[torch_export_patches] patches transformers.masking_utils.eager_mask
[torch_export_patches] done patching
[validate_model] run patched model...
[validate_model] patched inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#2[T1s2x8x30x128,T1s2x8x30x128], value_cache=#2[T1s2x8x30x128,T1s2x8x30x128]))
[validate_model] done (patched run)
[validate_model] patched discrepancies=abs=0, rel=0
[call_torch_export_onnx] exporter='onnx-dynamo', optimization='ir'
[call_torch_export_onnx] args=()
[call_torch_export_onnx] kwargs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#2[T1s2x8x30x128,T1s2x8x30x128], value_cache=#2[T1s2x8x30x128,T1s2x8x30x128]))
[call_torch_export_onnx] dynamic_shapes=dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#2[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}],#2[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}]])
[call_torch_export_onnx] export...
[call_torch_export_onnx] export_export_kwargs=dict(dynamo:bool,dynamic_shapes:dict(input_ids:{0:Dim(batch),1:DYN(seq_length)},attention_mask:{0:Dim(batch),1:DYN(cache+seq)},position_ids:{0:Dim(batch),1:DYN(cache+seq)},past_key_values:#2[#2[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}],#2[{0:Dim(batch),2:DYN(cache_length)},{0:Dim(batch),2:DYN(cache_length)}]]))
[torch.onnx] Obtain model graph for `Phi3ForCausalLM([...]` with `torch.export.export(..., strict=False)`...
[_catch_produce_guards_and_solve_constraints] ERROR: produce_guards_and_solve_constraints failed, use SKIP_SOLVE_CONSTRAINTS=0 to avoid skipping
fake_mode=<torch._subclasses.fake_tensor.FakeTensorMode object at 0x737c6caaed20>
dynamic_shapes={'input_ids': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'attention_mask': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'position_ids': {0: Dim('batch', min=1, max=1024), 1: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, 'past_key_values': [[{0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, {0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}], [{0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}, {0: Dim('batch', min=1, max=1024), 2: _DimHint(type=<_DimHintType.DYNAMIC: 3>, min=None, max=None, _factory=True)}]]}
equalities_inputs=EqualityConstraint(warn_only=False, source_pairs=[(TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=1, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)), (TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=1, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0))], derived_equalities=[], phantom_symbols=[], relaxed_sources={TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=1, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=1, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2), TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), 
TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=1), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=2)}, _parents={TensorPropertySource(base=LocalSource(local_name='attention_mask', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=LocalSource(local_name='position_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='key_cache', index_is_slice=False), index=1, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=0, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0), TensorPropertySource(base=GetItemSource(base=GetItemSource(base=LocalSource(local_name='past_key_values', is_input=False, dynamism=None, is_derefed_cell_contents=False), index='value_cache', index_is_slice=False), index=1, index_is_slice=False), prop=<TensorProperty.SIZE: 0>, idx=0): TensorPropertySource(base=LocalSource(local_name='input_ids', is_input=False, dynamism=None, is_derefed_cell_contents=False), prop=<TensorProperty.SIZE: 0>, idx=0)}, _defs={})
original_signature=(input_ids: Optional[torch.LongTensor] = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_values: Optional[transformers.cache_utils.Cache] = None, inputs_embeds: Optional[torch.FloatTensor] = None, labels: Optional[torch.LongTensor] = None, use_cache: Optional[bool] = None, cache_position: Optional[torch.LongTensor] = None, logits_to_keep: Union[int, torch.Tensor] = 0, **kwargs: Unpack[transformers.utils.generic.TransformersKwargs]) -> transformers.modeling_outputs.CausalLMOutputWithPast
_is_torch_jit_trace=False
exc=produce_guards_and_solve_constraints() got an unexpected keyword argument '_is_torch_jit_trace'
gm=<lambda>(
(true_graph_0): <lambda>()
(false_graph_0): <lambda>()
)
def forward(self, arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1):
embedding = torch.ops.aten.embedding.default(arg14_1, arg16_1, 199999)
sym_size_int = torch.ops.aten.sym_size.int(arg19_1, 2)
sym_size_int_1 = torch.ops.aten.sym_size.int(arg16_1, 1)
add = sym_size_int + sym_size_int_1
arange = torch.ops.aten.arange.start(sym_size_int, add, device = device(type='cpu'), pin_memory = False); add = None
to = torch.ops.aten.to.device(arg17_1, device(type='cpu'), torch.bool); arg17_1 = None
sym_size_int_2 = torch.ops.aten.sym_size.int(arange, 0)
add_1 = sym_size_int_2 + sym_size_int; sym_size_int = None
arange_1 = torch.ops.aten.arange.default(add_1, device = device(type='cpu'), pin_memory = False); add_1 = None
add_ = torch.ops.aten.add_.Tensor(arange_1, 0)
sym_size_int_4 = torch.ops.aten.sym_size.int(arg16_1, 0); arg16_1 = None
arange_2 = torch.ops.aten.arange.default(sym_size_int_4, device = device(type='cpu'), pin_memory = False)
arange_3 = torch.ops.aten.arange.default(1, device = device(type='cpu'), pin_memory = False)
sym_size_int_5 = torch.ops.aten.sym_size.int(arange_2, 0)
sym_size_int_6 = torch.ops.aten.sym_size.int(arange_1, 0); arange_1 = None
reshape = torch.ops.aten.reshape.default(arange_2, [-1, 1, 1, 1]); arange_2 = None
reshape_1 = torch.ops.aten.reshape.default(arange_3, [1, -1, 1, 1]); arange_3 = None
reshape_2 = torch.ops.aten.reshape.default(arange, [1, 1, -1, 1]); arange = None
reshape_3 = torch.ops.aten.reshape.default(add_, [1, 1, 1, -1]); add_ = None
expand = torch.ops.aten.expand.default(reshape, [sym_size_int_5, 1, sym_size_int_2, sym_size_int_6]); reshape = None
expand_1 = torch.ops.aten.expand.default(reshape_1, [sym_size_int_5, 1, sym_size_int_2, sym_size_int_6]); reshape_1 = expand_1 = None
expand_2 = torch.ops.aten.expand.default(reshape_2, [sym_size_int_5, 1, sym_size_int_2, sym_size_int_6]); reshape_2 = None
expand_3 = torch.ops.aten.expand.default(reshape_3, [sym_size_int_5, 1, sym_size_int_2, sym_size_int_6]); reshape_3 = sym_size_int_5 = sym_size_int_2 = sym_size_int_6 = None
new_ones = torch.ops.aten.new_ones.default(expand_2, [], dtype = torch.bool, pin_memory = False)
new_ones_1 = torch.ops.aten.new_ones.default(expand_2, [], dtype = torch.bool, pin_memory = False)
sub_1 = torch.ops.aten.sub.Tensor(expand_2, 262144)
gt_5 = torch.ops.aten.gt.Tensor(expand_3, sub_1); sub_1 = None
and_1 = torch.ops.aten.__and__.Tensor(new_ones_1, gt_5); new_ones_1 = gt_5 = None
le = torch.ops.aten.le.Tensor(expand_3, expand_2); expand_2 = None
and_2 = torch.ops.aten.__and__.Tensor(and_1, le); and_1 = le = None
and_3 = torch.ops.aten.__and__.Tensor(new_ones, and_2); new_ones = and_2 = None
index = torch.ops.aten.index.Tensor(to, [expand, expand_3]); to = expand = expand_3 = None
and_4 = torch.ops.aten.__and__.Tensor(and_3, index); and_3 = index = None
_set_grad_enabled = torch._C._set_grad_enabled(False); _set_grad_enabled = None
max_1 = torch.ops.aten.max.default(arg18_1)
add_3 = torch.ops.aten.add.Tensor(max_1, 1); max_1 = None
_tensor_constant0 = self._tensor_constant0
lift_fresh_copy = torch.ops.aten.lift_fresh_copy.default(_tensor_constant0); _tensor_constant0 = None
detach_ = torch.ops.aten.detach_.default(lift_fresh_copy); lift_fresh_copy = None
arange_4 = torch.ops.aten.arange.start_step(0, 96, 2, dtype = torch.int64, device = device(type='cpu'), pin_memory = False)
to_1 = torch.ops.aten.to.dtype(arange_4, torch.float32); arange_4 = None
div = torch.ops.aten.div.Tensor(to_1, 96); to_1 = None
pow_1 = torch.ops.aten.pow.Scalar(10000.0, div); div = None
mul = torch.ops.aten.mul.Tensor(detach_, pow_1); detach_ = pow_1 = None
reciprocal = torch.ops.aten.reciprocal.default(mul); mul = None
mul_1 = torch.ops.aten.mul.Tensor(reciprocal, 1.0); reciprocal = None
_tensor_constant1 = self._tensor_constant1
to_2 = torch.ops.aten.to.dtype_layout(_tensor_constant1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu')); _tensor_constant1 = None
gt_6 = torch.ops.aten.gt.Scalar(add_3, 4096); add_3 = None
item = torch.ops.aten.item.default(gt_6); gt_6 = None
true_graph_0 = self.true_graph_0
false_graph_0 = self.false_graph_0
cond = torch.ops.higher_order.cond(item, true_graph_0, false_graph_0, (mul_1, to_2)); item = true_graph_0 = false_graph_0 = mul_1 = to_2 = None
getitem = cond[0]; cond = None
unsqueeze = torch.ops.aten.unsqueeze.default(getitem, 0); getitem = None
unsqueeze_1 = torch.ops.aten.unsqueeze.default(unsqueeze, 2); unsqueeze = None
to_3 = torch.ops.aten.to.dtype(unsqueeze_1, torch.float32); unsqueeze_1 = None
sym_size_int_7 = torch.ops.aten.sym_size.int(arg18_1, 0)
expand_4 = torch.ops.aten.expand.default(to_3, [sym_size_int_7, -1, 1]); to_3 = sym_size_int_7 = None
to_4 = torch.ops.aten.to.dtype_layout(expand_4, dtype = torch.float32, layout = torch.strided, device = device(type='cpu')); expand_4 = None
unsqueeze_2 = torch.ops.aten.unsqueeze.default(arg18_1, 1); arg18_1 = None
slice_1 = torch.ops.aten.slice.Tensor(unsqueeze_2, 2, 0, 9223372036854775807); unsqueeze_2 = None
to_5 = torch.ops.aten.to.dtype(slice_1, torch.float32); slice_1 = None
_enter_autocast = torch.amp.autocast_mode._enter_autocast('cpu', torch.bfloat16, False, False)
to_6 = torch.ops.aten.to.dtype(to_4, torch.float32); to_4 = None
to_7 = torch.ops.aten.to.dtype(to_5, torch.float32); to_5 = None
matmul = torch.ops.aten.matmul.default(to_6, to_7); to_6 = to_7 = None
transpose = torch.ops.aten.transpose.int(matmul, 1, 2); matmul = None
cat = torch.ops.aten.cat.default([transpose, transpose], -1); transpose = None
cos = torch.ops.aten.cos.default(cat)
mul_2 = torch.ops.aten.mul.Tensor(cos, 1.1902380714238083); cos = None
sin = torch.ops.aten.sin.default(cat); cat = None
mul_3 = torch.ops.aten.mul.Tensor(sin, 1.1902380714238083); sin = None
_exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = _exit_autocast = None
to_8 = torch.ops.aten.to.dtype(mul_2, torch.float32); mul_2 = None
to_9 = torch.ops.aten.to.dtype(mul_3, torch.float32); mul_3 = None
_set_grad_enabled_1 = torch._C._set_grad_enabled(True); _set_grad_enabled_1 = None
to_10 = torch.ops.aten.to.dtype(embedding, torch.float32); embedding = None
pow_2 = torch.ops.aten.pow.Tensor_Scalar(to_10, 2)
mean = torch.ops.aten.mean.dim(pow_2, [-1], True); pow_2 = None
add_4 = torch.ops.aten.add.Tensor(mean, 1e-05); mean = None
rsqrt = torch.ops.aten.rsqrt.default(add_4); add_4 = None
mul_4 = torch.ops.aten.mul.Tensor(to_10, rsqrt); rsqrt = None
to_11 = torch.ops.aten.to.dtype(mul_4, torch.float32); mul_4 = None
mul_5 = torch.ops.aten.mul.Tensor(arg5_1, to_11); arg5_1 = to_11 = None
linear = torch.ops.aten.linear.default(mul_5, arg2_1); mul_5 = arg2_1 = None
slice_2 = torch.ops.aten.slice.Tensor(linear, 2, 0, 3072)
slice_3 = torch.ops.aten.slice.Tensor(linear, 2, 3072, 4096)
slice_4 = torch.ops.aten.slice.Tensor(linear, 2, 4096, 9223372036854775807); linear = None
view = torch.ops.aten.view.default(slice_2, [sym_size_int_4, sym_size_int_1, -1, 128]); slice_2 = None
transpose_1 = torch.ops.aten.transpose.int(view, 1, 2); view = None
view_1 = torch.ops.aten.view.default(slice_3, [sym_size_int_4, sym_size_int_1, -1, 128]); slice_3 = None
transpose_2 = torch.ops.aten.transpose.int(view_1, 1, 2); view_1 = None
view_2 = torch.ops.aten.view.default(slice_4, [sym_size_int_4, sym_size_int_1, -1, 128]); slice_4 = None
transpose_3 = torch.ops.aten.transpose.int(view_2, 1, 2); view_2 = None
unsqueeze_3 = torch.ops.aten.unsqueeze.default(to_8, 1)
unsqueeze_4 = torch.ops.aten.unsqueeze.default(to_9, 1)
slice_5 = torch.ops.aten.slice.Tensor(transpose_1, 3, 0, 96)
slice_6 = torch.ops.aten.slice.Tensor(transpose_1, 3, 96, 9223372036854775807); transpose_1 = None
slice_7 = torch.ops.aten.slice.Tensor(transpose_2, 3, 0, 96)
slice_8 = torch.ops.aten.slice.Tensor(transpose_2, 3, 96, 9223372036854775807); transpose_2 = None
mul_6 = torch.ops.aten.mul.Tensor(slice_5, unsqueeze_3)
slice_9 = torch.ops.aten.slice.Tensor(slice_5, 3, 0, 48)
slice_10 = torch.ops.aten.slice.Tensor(slice_5, 3, 48, 9223372036854775807); slice_5 = None
neg = torch.ops.aten.neg.default(slice_10); slice_10 = None
cat_1 = torch.ops.aten.cat.default([neg, slice_9], -1); neg = slice_9 = None
mul_7 = torch.ops.aten.mul.Tensor(cat_1, unsqueeze_4); cat_1 = None
add_5 = torch.ops.aten.add.Tensor(mul_6, mul_7); mul_6 = mul_7 = None
cat_2 = torch.ops.aten.cat.default([add_5, slice_6], -1); add_5 = slice_6 = None
mul_8 = torch.ops.aten.mul.Tensor(slice_7, unsqueeze_3); unsqueeze_3 = None
slice_11 = torch.ops.aten.slice.Tensor(slice_7, 3, 0, 48)
slice_12 = torch.ops.aten.slice.Tensor(slice_7, 3, 48, 9223372036854775807); slice_7 = None
neg_1 = torch.ops.aten.neg.default(slice_12); slice_12 = None
cat_3 = torch.ops.aten.cat.default([neg_1, slice_11], -1); neg_1 = slice_11 = None
mul_9 = torch.ops.aten.mul.Tensor(cat_3, unsqueeze_4); cat_3 = unsqueeze_4 = None
add_6 = torch.ops.aten.add.Tensor(mul_8, mul_9); mul_8 = mul_9 = None
cat_4 = torch.ops.aten.cat.default([add_6, slice_8], -1); add_6 = slice_8 = None
cat_5 = torch.ops.aten.cat.default([arg19_1, cat_4], -2); cat_4 = None
cat_6 = torch.ops.aten.cat.default([arg21_1, transpose_3], -2); transpose_3 = None
sym_size_int_9 = torch.ops.aten.sym_size.int(arg19_1, 0); arg19_1 = None
unsqueeze_5 = torch.ops.aten.unsqueeze.default(cat_5, 2)
sym_size_int_10 = torch.ops.aten.sym_size.int(cat_5, 2)
slice_13 = torch.ops.aten.slice.Tensor(unsqueeze_5, 3, 0, 9223372036854775807); unsqueeze_5 = None
expand_5 = torch.ops.aten.expand.default(slice_13, [sym_size_int_9, 8, 3, sym_size_int_10, 128]); slice_13 = None
reshape_4 = torch.ops.aten.reshape.default(expand_5, [sym_size_int_9, 24, sym_size_int_10, 128]); expand_5 = sym_size_int_9 = None
sym_size_int_11 = torch.ops.aten.sym_size.int(arg21_1, 0); arg21_1 = None
unsqueeze_6 = torch.ops.aten.unsqueeze.default(cat_6, 2)
sym_size_int_12 = torch.ops.aten.sym_size.int(cat_6, 2)
slice_14 = torch.ops.aten.slice.Tensor(unsqueeze_6, 3, 0, 9223372036854775807); unsqueeze_6 = None
expand_6 = torch.ops.aten.expand.default(slice_14, [sym_size_int_11, 8, 3, sym_size_int_12, 128]); slice_14 = None
reshape_5 = torch.ops.aten.reshape.default(expand_6, [sym_size_int_11, 24, sym_size_int_12, 128]); expand_6 = sym_size_int_11 = sym_size_int_12 = None
slice_15 = torch.ops.aten.slice.Tensor(and_4, 3, None, sym_size_int_10); sym_size_int_10 = None
scaled_dot_product_attention = torch.ops.aten.scaled_dot_product_attention.default(cat_2, reshape_4, reshape_5, slice_15, scale = 0.08838834764831845); cat_2 = reshape_4 = reshape_5 = slice_15 = None
transpose_4 = torch.ops.aten.transpose.int(scaled_dot_product_attention, 1, 2); scaled_dot_product_attention = None
contiguous = torch.ops.aten.contiguous.default(transpose_4); transpose_4 = None
reshape_6 = torch.ops.aten.reshape.default(contiguous, [sym_size_int_4, sym_size_int_1, -1]); contiguous = None
linear_1 = torch.ops.aten.linear.default(reshape_6, arg1_1); reshape_6 = arg1_1 = None
dropout = torch.ops.aten.dropout.default(linear_1, 0.0, False); linear_1 = None
add_7 = torch.ops.aten.add.Tensor(to_10, dropout); to_10 = dropout = None
to_12 = torch.ops.aten.to.dtype(add_7, torch.float32); add_7 = None
pow_3 = torch.ops.aten.pow.Tensor_Scalar(to_12, 2)
mean_1 = torch.ops.aten.mean.dim(pow_3, [-1], True); pow_3 = None
add_8 = torch.ops.aten.add.Tensor(mean_1, 1e-05); mean_1 = None
rsqrt_1 = torch.ops.aten.rsqrt.default(add_8); add_8 = None
mul_28 = torch.ops.aten.mul.Tensor(to_12, rsqrt_1); rsqrt_1 = None
to_13 = torch.ops.aten.to.dtype(mul_28, torch.float32); mul_28 = None
mul_29 = torch.ops.aten.mul.Tensor(arg6_1, to_13); arg6_1 = to_13 = None
linear_2 = torch.ops.aten.linear.default(mul_29, arg3_1); mul_29 = arg3_1 = None
chunk = torch.ops.aten.chunk.default(linear_2, 2, -1); linear_2 = None
getitem_1 = chunk[0]
getitem_2 = chunk[1]; chunk = None
silu = torch.ops.aten.silu.default(getitem_1); getitem_1 = None
mul_30 = torch.ops.aten.mul.Tensor(getitem_2, silu); getitem_2 = silu = None
linear_3 = torch.ops.aten.linear.default(mul_30, arg4_1); mul_30 = arg4_1 = None
dropout_1 = torch.ops.aten.dropout.default(linear_3, 0.0, False); linear_3 = None
add_9 = torch.ops.aten.add.Tensor(to_12, dropout_1); to_12 = dropout_1 = None
to_14 = torch.ops.aten.to.dtype(add_9, torch.float32); add_9 = None
pow_4 = torch.ops.aten.pow.Tensor_Scalar(to_14, 2)
mean_2 = torch.ops.aten.mean.dim(pow_4, [-1], True); pow_4 = None
add_10 = torch.ops.aten.add.Tensor(mean_2, 1e-05); mean_2 = None
rsqrt_2 = torch.ops.aten.rsqrt.default(add_10); add_10 = None
mul_31 = torch.ops.aten.mul.Tensor(to_14, rsqrt_2); rsqrt_2 = None
to_15 = torch.ops.aten.to.dtype(mul_31, torch.float32); mul_31 = None
mul_32 = torch.ops.aten.mul.Tensor(arg11_1, to_15); arg11_1 = to_15 = None
linear_4 = torch.ops.aten.linear.default(mul_32, arg8_1); mul_32 = arg8_1 = None
slice_16 = torch.ops.aten.slice.Tensor(linear_4, 2, 0, 3072)
slice_17 = torch.ops.aten.slice.Tensor(linear_4, 2, 3072, 4096)
slice_18 = torch.ops.aten.slice.Tensor(linear_4, 2, 4096, 9223372036854775807); linear_4 = None
view_3 = torch.ops.aten.view.default(slice_16, [sym_size_int_4, sym_size_int_1, -1, 128]); slice_16 = None
transpose_5 = torch.ops.aten.transpose.int(view_3, 1, 2); view_3 = None
view_4 = torch.ops.aten.view.default(slice_17, [sym_size_int_4, sym_size_int_1, -1, 128]); slice_17 = None
transpose_6 = torch.ops.aten.transpose.int(view_4, 1, 2); view_4 = None
view_5 = torch.ops.aten.view.default(slice_18, [sym_size_int_4, sym_size_int_1, -1, 128]); slice_18 = None
transpose_7 = torch.ops.aten.transpose.int(view_5, 1, 2); view_5 = None
unsqueeze_7 = torch.ops.aten.unsqueeze.default(to_8, 1); to_8 = None
unsqueeze_8 = torch.ops.aten.unsqueeze.default(to_9, 1); to_9 = None
slice_19 = torch.ops.aten.slice.Tensor(transpose_5, 3, 0, 96)
slice_20 = torch.ops.aten.slice.Tensor(transpose_5, 3, 96, 9223372036854775807); transpose_5 = None
slice_21 = torch.ops.aten.slice.Tensor(transpose_6, 3, 0, 96)
slice_22 = torch.ops.aten.slice.Tensor(transpose_6, 3, 96, 9223372036854775807); transpose_6 = None
mul_33 = torch.ops.aten.mul.Tensor(slice_19, unsqueeze_7)
slice_23 = torch.ops.aten.slice.Tensor(slice_19, 3, 0, 48)
slice_24 = torch.ops.aten.slice.Tensor(slice_19, 3, 48, 9223372036854775807); slice_19 = None
neg_2 = torch.ops.aten.neg.default(slice_24); slice_24 = None
cat_7 = torch.ops.aten.cat.default([neg_2, slice_23], -1); neg_2 = slice_23 = None
mul_34 = torch.ops.aten.mul.Tensor(cat_7, unsqueeze_8); cat_7 = None
add_11 = torch.ops.aten.add.Tensor(mul_33, mul_34); mul_33 = mul_34 = None
cat_8 = torch.ops.aten.cat.default([add_11, slice_20], -1); add_11 = slice_20 = None
mul_35 = torch.ops.aten.mul.Tensor(slice_21, unsqueeze_7); unsqueeze_7 = None
slice_25 = torch.ops.aten.slice.Tensor(slice_21, 3, 0, 48)
slice_26 = torch.ops.aten.slice.Tensor(slice_21, 3, 48, 9223372036854775807); slice_21 = None
neg_3 = torch.ops.aten.neg.default(slice_26); slice_26 = None
cat_9 = torch.ops.aten.cat.default([neg_3, slice_25], -1); neg_3 = slice_25 = None
mul_36 = torch.ops.aten.mul.Tensor(cat_9, unsqueeze_8); cat_9 = unsqueeze_8 = None
add_12 = torch.ops.aten.add.Tensor(mul_35, mul_36); mul_35 = mul_36 = None
cat_10 = torch.ops.aten.cat.default([add_12, slice_22], -1); add_12 = slice_22 = None
cat_11 = torch.ops.aten.cat.default([arg20_1, cat_10], -2); cat_10 = None
cat_12 = torch.ops.aten.cat.default([arg22_1, transpose_7], -2); transpose_7 = None
sym_size_int_13 = torch.ops.aten.sym_size.int(arg20_1, 0); arg20_1 = None
unsqueeze_9 = torch.ops.aten.unsqueeze.default(cat_11, 2)
sym_size_int_14 = torch.ops.aten.sym_size.int(cat_11, 2)
slice_27 = torch.ops.aten.slice.Tensor(unsqueeze_9, 3, 0, 9223372036854775807); unsqueeze_9 = None
expand_7 = torch.ops.aten.expand.default(slice_27, [sym_size_int_13, 8, 3, sym_size_int_14, 128]); slice_27 = None
reshape_7 = torch.ops.aten.reshape.default(expand_7, [sym_size_int_13, 24, sym_size_int_14, 128]); expand_7 = sym_size_int_13 = None
sym_size_int_15 = torch.ops.aten.sym_size.int(arg22_1, 0); arg22_1 = None
unsqueeze_10 = torch.ops.aten.unsqueeze.default(cat_12, 2)
sym_size_int_16 = torch.ops.aten.sym_size.int(cat_12, 2)
slice_28 = torch.ops.aten.slice.Tensor(unsqueeze_10, 3, 0, 9223372036854775807); unsqueeze_10 = None
expand_8 = torch.ops.aten.expand.default(slice_28, [sym_size_int_15, 8, 3, sym_size_int_16, 128]); slice_28 = None
reshape_8 = torch.ops.aten.reshape.default(expand_8, [sym_size_int_15, 24, sym_size_int_16, 128]); expand_8 = sym_size_int_15 = sym_size_int_16 = None
slice_29 = torch.ops.aten.slice.Tensor(and_4, 3, None, sym_size_int_14); and_4 = sym_size_int_14 = None
scaled_dot_product_attention_1 = torch.ops.aten.scaled_dot_product_attention.default(cat_8, reshape_7, reshape_8, slice_29, scale = 0.08838834764831845); cat_8 = reshape_7 = reshape_8 = slice_29 = None
transpose_8 = torch.ops.aten.transpose.int(scaled_dot_product_attention_1, 1, 2); scaled_dot_product_attention_1 = None
contiguous_1 = torch.ops.aten.contiguous.default(transpose_8); transpose_8 = None
reshape_9 = torch.ops.aten.reshape.default(contiguous_1, [sym_size_int_4, sym_size_int_1, -1]); contiguous_1 = sym_size_int_4 = sym_size_int_1 = None
linear_5 = torch.ops.aten.linear.default(reshape_9, arg7_1); reshape_9 = arg7_1 = None
dropout_2 = torch.ops.aten.dropout.default(linear_5, 0.0, False); linear_5 = None
add_13 = torch.ops.aten.add.Tensor(to_14, dropout_2); to_14 = dropout_2 = None
to_16 = torch.ops.aten.to.dtype(add_13, torch.float32); add_13 = None
pow_5 = torch.ops.aten.pow.Tensor_Scalar(to_16, 2)
mean_3 = torch.ops.aten.mean.dim(pow_5, [-1], True); pow_5 = None
add_14 = torch.ops.aten.add.Tensor(mean_3, 1e-05); mean_3 = None
rsqrt_3 = torch.ops.aten.rsqrt.default(add_14); add_14 = None
mul_59 = torch.ops.aten.mul.Tensor(to_16, rsqrt_3); rsqrt_3 = None
to_17 = torch.ops.aten.to.dtype(mul_59, torch.float32); mul_59 = None
mul_60 = torch.ops.aten.mul.Tensor(arg12_1, to_17); arg12_1 = to_17 = None
linear_6 = torch.ops.aten.linear.default(mul_60, arg9_1); mul_60 = arg9_1 = None
chunk_1 = torch.ops.aten.chunk.default(linear_6, 2, -1); linear_6 = None
getitem_3 = chunk_1[0]
getitem_4 = chunk_1[1]; chunk_1 = None
silu_1 = torch.ops.aten.silu.default(getitem_3); getitem_3 = None
mul_61 = torch.ops.aten.mul.Tensor(getitem_4, silu_1); getitem_4 = silu_1 = None
linear_7 = torch.ops.aten.linear.default(mul_61, arg10_1); mul_61 = arg10_1 = None
dropout_3 = torch.ops.aten.dropout.default(linear_7, 0.0, False); linear_7 = None
add_15 = torch.ops.aten.add.Tensor(to_16, dropout_3); to_16 = dropout_3 = None
to_18 = torch.ops.aten.to.dtype(add_15, torch.float32); add_15 = None
pow_6 = torch.ops.aten.pow.Tensor_Scalar(to_18, 2)
mean_4 = torch.ops.aten.mean.dim(pow_6, [-1], True); pow_6 = None
add_16 = torch.ops.aten.add.Tensor(mean_4, 1e-05); mean_4 = None
rsqrt_4 = torch.ops.aten.rsqrt.default(add_16); add_16 = None
mul_62 = torch.ops.aten.mul.Tensor(to_18, rsqrt_4); to_18 = rsqrt_4 = None
to_19 = torch.ops.aten.to.dtype(mul_62, torch.float32); mul_62 = None
mul_63 = torch.ops.aten.mul.Tensor(arg13_1, to_19); arg13_1 = to_19 = None
slice_30 = torch.ops.aten.slice.Tensor(mul_63, 1, 0, 9223372036854775807); mul_63 = None
linear_8 = torch.ops.aten.linear.default(slice_30, arg14_1); slice_30 = arg14_1 = None
return (linear_8, cat_5, cat_11, cat_6, cat_12)
# To see more debug info, please use `graph_module.print_readable()`
[torch.onnx] Obtain model graph for `Phi3ForCausalLM([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: batch will not be used, since it shares the same shape constraints with another axis: batch.
warnings.warn(
~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: cache+seq will not be used, since it shares the same shape constraints with another axis: seq_length.
warnings.warn(
~/vv/this312/lib/python3.12/site-packages/torch/onnx/_internal/exporter/_dynamic_shapes.py:264: UserWarning: # The axis name: cache_length will not be used, since it shares the same shape constraints with another axis: cache_length.
warnings.warn(
Applied 43 of general pattern rewrite rules.
[call_torch_export_onnx] done (export)
[call_torch_export_onnx] starts optimization='ir'...
[call_torch_export_onnx] done (optimization)
[torch_export_patches] remove patches
[torch_export_patches] restored sympy functions
[torch_export_patches] restored pytorch functions
[torch_export_patches] restored ShapeEnv._set_replacement
[torch_export_patches] restored ShapeEnv._log_guard
[torch_export_patches] restored shape constraints
[torch_export_patches] unpatches transformers
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_AttentionMaskConverter:
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma2RotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma3RotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_GemmaRotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_GenerationMixin: _cache_dependant_input_preparation, _cache_dependant_input_preparation_exporting, prepare_inputs_for_generation
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_IdeficsAttention: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_IdeficsEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_LlamaRotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_MistralRotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_MixtralRotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Phi3RotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Phi4MultimodalRotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_PhiRotaryEmbedding: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_SamMaskDecoder: forward
[unpatch_module_or_classes] onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_SmolLM3RotaryEmbedding: forward
[unpatch_module_or_classes] function transformers.models.bart.modeling_bart.eager_attention_forward
[unpatch_module_or_classes] function transformers.models.marian.modeling_marian.eager_attention_forward
[unpatch_module_or_classes] function transformers.cache_utils.parse_processor_args
[torch_export_patches] restored transformers.masking_utils._vmap_for_bhqkv
[torch_export_patches] restored transformers.masking_utils.eager_mask
[validate_model] dumps onnx program in 'dump_test/microsoft_Phi-4-mini-reasoning-onnx-dynamo-ir'...
[validate_model] done (dump onnx) in 2.781305350996263
[validate_model] dumps statistics in 'dump_test/microsoft_Phi-4-mini-reasoning-onnx-dynamo-ir'...
[validate_model] done (dump)
[validate_onnx_model] verify onnx model with providers ['CPUExecutionProvider']..., flavour=None
[validate_onnx_model] done (ort_session) flavour=None
[validate_onnx_model] -- make_feeds for 'inputs'...
[validate_onnx_model] inputs=dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#2[T1s2x8x30x128,T1s2x8x30x128], value_cache=#2[T1s2x8x30x128,T1s2x8x30x128]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s2x3,attention_mask:A7s2x33,position_ids:A7s2x3,past_key_values_key_cache_0:A1s2x8x30x128,past_key_values_key_cache_1:A1s2x8x30x128,past_key_values_value_cache_0:A1s2x8x30x128,past_key_values_value_cache_1:A1s2x8x30x128)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#5[A1s2x3x200064,A1s2x8x33x128,A1s2x8x33x128,A1s2x8x33x128,A1s2x8x33x128]
[validate_onnx_model] discrepancies=abs=3.606081008911133e-06, rel=0.0017793676259306645, n=1470720.0
[validate_onnx_model] -- make_feeds for 'inputs2'...
[validate_onnx_model] inputs=dict(input_ids:T7s3x4,attention_mask:T7s3x35,position_ids:T7s3x4,past_key_values:DynamicCache(key_cache=#2[T1s3x8x31x128,T1s3x8x31x128], value_cache=#2[T1s3x8x31x128,T1s3x8x31x128]))
[validate_onnx_model] ort inputs=dict(input_ids:A7s3x4,attention_mask:A7s3x35,position_ids:A7s3x4,past_key_values_key_cache_0:A1s3x8x31x128,past_key_values_key_cache_1:A1s3x8x31x128,past_key_values_value_cache_0:A1s3x8x31x128,past_key_values_value_cache_1:A1s3x8x31x128)
[validate_onnx_model] done (make_feeds)
[validate_onnx_model] run session...
[validate_onnx_model] done (run)
[validate_onnx_model] got=#5[A1s3x4x200064,A1s3x8x35x128,A1s3x8x35x128,A1s3x8x35x128,A1s3x8x35x128]
[validate_onnx_model] discrepancies=abs=3.2782554626464844e-06, rel=0.00204118731285426, n=2830848.0
[validate_model] -- done (final)
done.
Let’s load and save the model to get one unique file.
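The code cell itself was dropped when the page was converted to text; what follows is a minimal sketch of what it likely does, assuming the exported model keeps its weights in an external data file and using a hypothetical model_file name.

# Hypothetical reconstruction of the elided cell: load the model together
# with its external data, then save everything into a single file.
model_file = "dump_test/phi4_mini_reasoning.single.onnx"  # hypothetical name
onx = onnx.load(onnx_file, load_external_data=True)
onnx.save(onx, model_file, save_as_external_data=False)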
Let’s get the size.
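A sketch of the size check, with the formatting chosen to match the output below (the 2**20 bytes-per-Mb convention is an assumption).

# Hypothetical sketch: size of the single file in Mb.
size = os.stat(model_file).st_size
print(f"model size {size / 2**20:1.3f} Mb")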
model size 989.852 Mb
Measures the loading time
def measure(f, N=3):
    times = []
    for _ in range(N):
        begin = time.perf_counter()
        onx = f()
        end = time.perf_counter()
        times.append(end - begin)
    return onx, {"avg": np.mean(times), "times": times}
Let’s do it with onnx2.
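The measurement cell is missing from the extraction; here is a plausible sketch, assuming onnx_extended.onnx2 exposes a load function mirroring onnx.load (an assumption, not a confirmed API).

# Keep the loaded model in onx2 so the saving benchmark below can reuse it.
onx2, load_times_onnx2 = measure(lambda: onnx2.load(model_file))
print("Load time with onnx2.")
print(load_times_onnx2)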
Load time with onnx2.
{'avg': np.float64(1.9109536773321452), 'times': [1.866385472996626, 2.2022045769990655, 1.664270982000744]}
Then with onnx.
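The equivalent sketch with the reference onnx package, under the same assumptions about the elided cell.

onx, load_times_onnx = measure(lambda: onnx.load(model_file))
print("Load time with onnx.")
print(load_times_onnx)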
Load time with onnx.
{'avg': np.float64(1.9776364249992184), 'times': [2.3620774800001527, 2.3361349849947146, 1.234696810002788]}
Measures the saving time
Let’s do it with onnx2.
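Again the cell is elided; a sketch, assuming onnx2.save mirrors onnx.save and writing to a hypothetical path.

# measure returns f()'s result first; save returns None, so drop it.
_, save_times_onnx2 = measure(lambda: onnx2.save(onx2, "dump_test/resaved.onnx2.onnx"))
print("Save time with onnx2.")
print(save_times_onnx2)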
Save time with onnx2.
{'avg': np.float64(4.362325190665918), 'times': [3.1312978290006868, 4.771354374999646, 5.184323367997422]}
Then with onnx.
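And the same hypothetical measurement with onnx.

_, save_times_onnx = measure(lambda: onnx.save(onx, "dump_test/resaved.onnx"))
print("Save time with onnx.")
print(save_times_onnx)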
Save time with onnx.
{'avg': np.float64(3.8619260056633116), 'times': [4.143729067996901, 3.8224597579974215, 3.6195891909956117]}
Total running time of the script: (1 minute 40.482 seconds)