onnx_diagnostic.torch_export_patches.patches.patch_transformers¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.common_RotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- forward(x, position_ids, layer_type=None)[source][source]¶
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for forward pass needs to be defined within this function, one should call the - Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.common_eager_attention_forward(module: Module, query: Tensor, key: Tensor, value: Tensor, attention_mask: Tensor | None, scaling: float | None = None, dropout: float = 0.0, head_mask: Tensor | None = None, **kwargs)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_AttentionMaskConverter[source][source]¶
- Patches - transformers.modeling_attn_mask_utils.AttentionMaskConverter._make_causal_mask.
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_DynamicLayer[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma2RotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma3Model(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Gemma3RotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_GemmaRotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_GenerationMixin[source][source]¶
- Applies modifications implemented in PR transformers/#36652. - prepare_inputs_for_generation(input_ids: LongTensor, past_key_values: Cache | None = None, attention_mask: LongTensor | None = None, inputs_embeds: FloatTensor | None = None, cache_position: LongTensor | None = None, **kwargs)[source][source]¶
- Prepare the model inputs for generation. In includes operations like computing the 4D attention mask or slicing inputs given the existing cache. - See the forward pass in the model documentation for expected arguments (different models might have different requirements for e.g. past_key_values). This function should work as is for most LLMs. 
 
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_IdeficsAttention(*args: Any, **kwargs: Any)[source][source]¶
- forward(hidden_states: Tensor, key_value_states: Tensor | None = None, attention_mask: Tensor | None = None, position_ids: LongTensor | None = None, past_key_value: Tuple[Tensor] | None = None, output_attentions: bool = False, use_cache: bool = False, cache_position: LongTensor | None = None, **kwargs) Tuple[Tensor, Tensor | None, Tuple[Tensor] | None][source][source]¶
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for forward pass needs to be defined within this function, one should call the - Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_IdeficsEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- forward(x, seq_len=None)[source][source]¶
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for forward pass needs to be defined within this function, one should call the - Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_LlamaRotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_MistralRotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_MixtralRotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Phi3RotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Phi4MultimodalRotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_PhiRotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_Qwen3MoeSparseMoeBlock(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_SamMaskDecoder(*args: Any, **kwargs: Any)[source][source]¶
- forward(image_embeddings: Tensor, image_positional_embeddings: Tensor, sparse_prompt_embeddings: Tensor, dense_prompt_embeddings: Tensor, multimask_output: bool, output_attentions: bool | None = None, attention_similarity: Tensor | None = None, target_embedding: Tensor | None = None) tuple[Tensor, Tensor][source][source]¶
- Predict masks given image and prompt embeddings. - Args:
- image_embeddings (torch.Tensor):
- the embeddings from the image encoder 
- image_positional_embedding (torch.Tensor):
- positional encoding with the shape of image_embeddings 
- sparse_prompt_embeddings (torch.Tensor):
- The embeddings of the points and boxes 
- dense_prompt_embeddings (torch.Tensor):
- the embeddings of the mask inputs 
- multimask_output (bool):
- Whether to return multiple masks or a single mask. 
- output_attentions (bool, optional):
- Whether or not to return the attentions tensors of all attention layers. 
 
 
 
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_SmolLM3RotaryEmbedding(*args: Any, **kwargs: Any)[source][source]¶
- class onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_VisionAttention(*args: Any, **kwargs: Any)[source][source]¶
- forward(hidden_states: Tensor, cu_seqlens: Tensor, rotary_pos_emb: Tensor | None = None, position_embeddings: Tuple[Tensor, Tensor] | None = None) Tensor[source][source]¶
- Define the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for forward pass needs to be defined within this function, one should call the - Moduleinstance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched__compute_dynamic_ntk_parameters(config: PreTrainedConfig | None = None, device: device | None = None, seq_len: int | None = None, **rope_kwargs) Tuple[Tensor, float][source][source]¶
- manual patch: - [patch:transformers.modeling_rope_utils._compute_dynamic_ntk_parameters]- Computes the inverse frequencies with NTK scaling. Credits to the Reddit users /u/bloc97 and /u/emozilla - Args:
- config ([~transformers.PretrainedConfig]):
- The model configuration. 
- device (torch.device):
- The device to use for initialization of the inverse frequencies. 
- seq_len (int, optional):
- The current sequence length, used to update the dynamic RoPE at inference time. 
- rope_kwargs (Dict, optional):
- BC compatibility with the previous RoPE class instantiation, will be removed in v4.45. 
 
- Returns:
- Tuple of (torch.Tensor, float), containing the inverse frequencies for the RoPE embeddings and the post-processing scaling factor applied to the omputed cos/sin (unused in this type of RoPE). 
 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched__vmap_for_bhqkv(mask_function: Callable, bh_indices: bool = True) Callable[source][source]¶
- manual patch for function - transformers.masking_utils._vmap_for_bhqkv.
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_dynamic_rope_update(rope_forward)[source][source]¶
- manual patch: - [patch:transformers.modeling_rope_utils.dynamic_rope_update]- rope_typeis determined in the constructor of class- transformers.models.phi3.modeling_phi3.Phi3RotaryEmbedding.- if hasattr(config, "rope_scaling") and config.rope_scaling is not None: self.rope_type = config.rope_scaling.get( "rope_type", config.rope_scaling.get("type")) else: self.rope_type = "default" - The original code of the patched function: - def dynamic_rope_update(rope_forward): def longrope_frequency_update(self, position_ids, device): seq_len = torch.max(position_ids) + 1 if hasattr(self.config, "original_max_position_embeddings"): original_max_position_embeddings = self.config.original_max_position_embeddings else: original_max_position_embeddings = self.config.max_position_embeddings if seq_len > original_max_position_embeddings: if not hasattr(self, "long_inv_freq"): self.long_inv_freq, _ = self.rope_init_fn( self.config, device, seq_len=original_max_position_embeddings + 1 ) self.register_buffer("inv_freq", self.long_inv_freq, persistent=False) else: self.original_inv_freq = self.original_inv_freq.to(device) self.register_buffer("inv_freq", self.original_inv_freq, persistent=False) def dynamic_frequency_update(self, position_ids, device): seq_len = torch.max(position_ids) + 1 if seq_len > self.max_seq_len_cached: # growth inv_freq, self.attention_scaling = self.rope_init_fn( self.config, device, seq_len=seq_len) self.register_buffer("inv_freq", inv_freq, persistent=False) self.max_seq_len_cached = seq_len if seq_len < self.original_max_seq_len and self.max_seq_len_cached > self.original_max_seq_len: self.original_inv_freq = self.original_inv_freq.to(device) self.register_buffer("inv_freq", self.original_inv_freq, persistent=False) self.max_seq_len_cached = self.original_max_seq_len @wraps(rope_forward) def wrapper(self, x, position_ids): if "dynamic" in self.rope_type: dynamic_frequency_update(self, position_ids, device=x.device) elif self.rope_type == "longrope": longrope_frequency_update(self, position_ids, device=x.device) return rope_forward(self, x, position_ids) return wrapper 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_eager_mask(batch_size: int, cache_position: ~torch.Tensor, kv_length: int, kv_offset: int = 0, mask_function: ~typing.Callable = <function causal_mask_function>, attention_mask: ~torch.Tensor | None = None, dtype: ~torch.dtype = torch.float32, **kwargs) Tensor[source][source]¶
- manual patch for function - transformers.masking_utils.eager_mask.
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_model_bart_eager_attention_forward(module: Module, query: Tensor, key: Tensor, value: Tensor, attention_mask: Tensor | None, scaling: float | None = None, dropout: float = 0.0, head_mask: Tensor | None = None, **kwargs)[source][source]¶
- [patch:transformers.models.bart.modeling_bart.eager_attention_forward] 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_modeling_marian_eager_attention_forward(module: Module, query: Tensor, key: Tensor, value: Tensor, attention_mask: Tensor | None, scaling: float | None = None, dropout: float = 0.0, head_mask: Tensor | None = None, **kwargs)[source][source]¶
- [patch:transformers.models.marian.modeling_marian.eager_attention_forward] 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_sdpa_attention_forward(module: Module, query: Tensor, key: Tensor, value: Tensor, attention_mask: Tensor | None, dropout: float = 0.0, scaling: float | None = None, is_causal: bool | None = None, **kwargs) tuple[Tensor, None][source][source]¶
- [patch:transformers.integrations.sdpa_attention.sdpa_attention_forward] 
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.patched_sdpa_mask_recent_torch(batch_size: int, cache_position: ~torch.Tensor, kv_length: int, kv_offset: int = 0, mask_function: ~typing.Callable = <function causal_mask_function>, attention_mask: ~torch.Tensor | None = None, local_size: int | None = None, allow_is_causal_skip: bool = True, **kwargs) Tensor | None[source][source]¶
- manual patch for function - transformers.masking_utils.sdpa_mask_recent_torch.
- onnx_diagnostic.torch_export_patches.patches.patch_transformers.rewrite_loop_for_square_mask(mask: Tensor, seq: Tensor)[source][source]¶
- Rewrites the loop in: - attention_mask = torch.full( [1, seq_length, seq_length], torch.finfo(q.dtype).min, dtype=q.dtype ) for i in range(1, len(seq)): attention_mask[..., seq[i - 1] : seq[i], seq[i - 1] : seq[i]] = 0