experimental_experiment.torch_models.llm_model_setup

class experimental_experiment.torch_models.llm_model_setup.LLMInputKind(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Defines the dummy inputs which can be generated for an LLM vision model.

Example:

from experimental_experiment.torch_models.llm_model_setup import LLMInputKind

K = LLMInputKind

K.input_ids
K.input_ids | K.position_ids | K.attention_mask
K.input_ids | K.position_ids | K.attention_mask | K.images | K.past_key_values
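
LLMInputKind behaves like a flag enumeration, so combinations can be built and tested with bitwise operators. A minimal sketch (the kind variable and the chosen combination are illustrative):

    from experimental_experiment.torch_models.llm_model_setup import LLMInputKind

    K = LLMInputKind
    kind = K.input_ids | K.position_ids | K.attention_mask

    # a non-empty intersection means the attention mask was requested
    assert kind & K.attention_mask
    # images was not requested in this combination
    assert not (kind & K.images)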

Remarks for Phi 3.5:

  • images adds two new inputs: pixel_value with shape I x 5 x 3 x 336 x 336, where I is the number of images, and image_size with shape I x 2, which holds the image sizes

  • min(input_ids) = -I, where I is again the number of images: image placeholders appear in input_ids as negative token ids

  • the number of caches is equal to the number of hidden layers

What does batch size mean? Multiple prompts? The image embedding does not seem to support that.

experimental_experiment.torch_models.llm_model_setup.finalize_llm_setup(model: Any, batch_size: int, max_token_id: int = 50285, cache_last_dim: int = 80, common_dynamic_shapes: bool = True, inputs_as_tuple: bool = False, num_hidden_layers: int = 2, num_attention_heads: int = 32, input_cache: bool = True, device: str = 'cpu', seq_length_multiple: int = 1, input_cache_class: type | None = None) → Dict[str, Any][source]

Creates dummy inputs for a model run as if it were on its second iteration. The inputs contain a cache.
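
A minimal usage sketch; the model below is a hypothetical placeholder and must be replaced by an actual decoder-only module before this runs:

    from experimental_experiment.torch_models.llm_model_setup import finalize_llm_setup

    model = ...  # hypothetical placeholder: a decoder-only LLM (torch.nn.Module)

    setup = finalize_llm_setup(
        model,
        batch_size=2,
        num_hidden_layers=2,
        num_attention_heads=32,
        cache_last_dim=80,
        device="cpu",
    )
    print(sorted(setup))  # inspect the entries of the returned dictionary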

experimental_experiment.torch_models.llm_model_setup.finalize_llm_vision_setup(model: Any, input_kind: LLMInputKind, batch_size: int, max_token_id: int = 50285, cache_last_dim: int = 80, common_dynamic_shapes: bool = True, inputs_as_tuple: bool = False, num_hidden_layers: int = 2, device: str = 'cpu') → Dict[str, Any][source]

Creates dummy inputs for a model run as if it were on its second iteration. The inputs contain a cache.
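
A sketch along the same lines, selecting the dummy inputs through LLMInputKind; the model is again a hypothetical placeholder:

    from experimental_experiment.torch_models.llm_model_setup import (
        LLMInputKind,
        finalize_llm_vision_setup,
    )

    K = LLMInputKind
    model = ...  # hypothetical placeholder: a vision LLM such as Phi 3.5 vision

    setup = finalize_llm_vision_setup(
        model,
        input_kind=K.input_ids | K.position_ids | K.attention_mask | K.images,
        batch_size=1,
        num_hidden_layers=2,
        device="cpu",
    )
    print(sorted(setup))  # inspect the entries of the returned dictionary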

experimental_experiment.torch_models.llm_model_setup.get_input_cache(num_hidden_layers: int, batch_size: int, num_attention_heads: int, sequence_length: int, cache_last_dim: int, device: str, input_cache_class: type | None = None)[source]

Creates a random cache.
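
This helper can also be called on its own; a minimal sketch using only parameters from the documented signature:

    from experimental_experiment.torch_models.llm_model_setup import get_input_cache

    # random cache for a 2-layer model: one sequence of 30 tokens,
    # 32 attention heads, last dimension of the cache tensors of 80
    cache = get_input_cache(
        num_hidden_layers=2,
        batch_size=1,
        num_attention_heads=32,
        sequence_length=30,
        cache_last_dim=80,
        device="cpu",
    )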