experimental_experiment.torch_models.llm_model_setup¶
- class experimental_experiment.torch_models.llm_model_setup.LLMInputKind(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Defines the dummy inputs which can be generated for an LLM vision model.
Example:
    K = LLMInputKind
    K.input_ids
    K.input_ids | K.position_ids | K.attention_mask
    K.input_ids | K.position_ids | K.attention_mask | K.images | K.past_key_values
Remarks for Phi 3.5:
images adds two extra inputs: pixel_values, of shape I x 5 x 3 x 336 x 336 where I is the number of images, and image_sizes, of shape I x 2, which contains the image sizes.
min(input_ids) = -I, where I is still the number of images: the image placeholders appear as negative token ids.
The number of caches equals the number of hidden layers.
What does batch size mean? Multiple prompts? The image embedding does not seem to support that.
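The values combine like flags, as the example above suggests. Below is a minimal sketch of how a caller could test which inputs are requested, assuming LLMInputKind behaves like a standard enum.Flag; the print statements are illustrative only.

    from experimental_experiment.torch_models.llm_model_setup import LLMInputKind

    K = LLMInputKind
    # Request the text inputs plus the image inputs, but no cache.
    kind = K.input_ids | K.position_ids | K.attention_mask | K.images

    # Each flag can be tested independently when generating the dummy inputs
    # (assuming LLMInputKind is a flag enumeration, as the | operator implies).
    if K.images in kind:
        print("generate pixel_values and image_sizes")
    if K.past_key_values not in kind:
        print("no cache among the inputs")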
- experimental_experiment.torch_models.llm_model_setup.finalize_llm_setup(model: Any, batch_size: int, max_token_id: int = 50285, cache_last_dim: int = 80, common_dynamic_shapes: bool = True, inputs_as_tuple: bool = False, num_hidden_layers: int = 2, num_attention_heads: int = 32, input_cache: bool = True, device: str = 'cpu', seq_length_multiple: int = 1, input_cache_class: type | None = None) Dict[str, Any] [source]¶
Creates dummy inputs for a model run as if it were on its second iteration, so the inputs contain a cache.
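A possible call, sketched under assumptions: the checkpoint, the truncation to two layers, and the cache_last_dim value are illustrative choices, not part of the documented API; only the parameter names come from the signature above. The keys of the returned dictionary are not documented here, so the sketch only prints them.

    import torch
    from transformers import AutoModelForCausalLM
    from experimental_experiment.torch_models.llm_model_setup import finalize_llm_setup

    # Hypothetical model choice: a small decoder-only LLM, truncated to two
    # layers to keep the example light.
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-mini-instruct",
        num_hidden_layers=2,
        torch_dtype=torch.float32,
    )

    setup = finalize_llm_setup(
        model,
        batch_size=2,
        num_hidden_layers=2,    # must match the model configuration
        num_attention_heads=32,
        cache_last_dim=96,      # assumed head dimension (hidden_size / num_attention_heads)
        input_cache=True,       # the dummy inputs include a cache (second iteration)
        device="cpu",
    )
    print(sorted(setup))        # inspect the keys of the returned dictionary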
- experimental_experiment.torch_models.llm_model_setup.finalize_llm_vision_setup(model: Any, input_kind: LLMInputKind, batch_size: int, max_token_id: int = 50285, cache_last_dim: int = 80, common_dynamic_shapes: bool = True, inputs_as_tuple: bool = False, num_hidden_layers: int = 2, device: str = 'cpu') Dict[str, Any] [source]¶
Creates dummy inputs for a model run as if it were on its second iteration, so the inputs contain a cache.
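The vision variant takes the additional input_kind argument. A hedged sketch in the same spirit as the previous one; the checkpoint, the trust_remote_code flag, and the chosen combination of flags are assumptions.

    from transformers import AutoModelForCausalLM
    from experimental_experiment.torch_models.llm_model_setup import (
        LLMInputKind,
        finalize_llm_vision_setup,
    )

    K = LLMInputKind
    # Text inputs, image inputs, and cache, as in the class example above.
    kind = K.input_ids | K.position_ids | K.attention_mask | K.images | K.past_key_values

    # Hypothetical vision checkpoint; such models usually require trust_remote_code.
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-vision-instruct",
        num_hidden_layers=2,
        trust_remote_code=True,
    )

    setup = finalize_llm_vision_setup(
        model,
        kind,
        batch_size=1,           # see the remark above about batch size and images
        num_hidden_layers=2,
        device="cpu",
    )
    print(sorted(setup))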