onnx_diagnostic.torch_models.hghub¶
Submodules
- onnx_diagnostic.torch_models.hghub.get_untrained_model_with_inputs(model_id: str, config: Any | None = None, task: str | None = '', inputs_kwargs: Dict[str, Any] | None = None, model_kwargs: Dict[str, Any] | None = None, verbose: int = 0, dynamic_rope: bool | None = None, same_as_pretrained: bool = False) Dict[str, Any] [source]¶
Gets an uninitialized model similar to the original model identified by the model id given to the function. The model size is reduced compared to the original model. No weights are downloaded; at most the configuration file is.
- Parameters:
model_id – model id, e.g. arnir0/Tiny-LLM
config – configuration to use instead of the one fetched for the model id
task – model task; it can be overwritten, otherwise it is determined automatically
inputs_kwargs – parameters sent to the input generation
model_kwargs – keyword arguments used to change how the model is created
verbose – verbosity level, prints out the information it finds
dynamic_rope – use dynamic rope (see transformers.LlamaConfig)
same_as_pretrained – if True, do not change the default values to get a smaller model (see the sketch after this list)
- Returns:
dictionary with a model, inputs, dynamic shapes, and the configuration
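The defaults can be kept or tweaked through the keyword arguments. Below is a minimal sketch; it assumes model_kwargs accepts configuration overrides such as num_hidden_layers, which the signature alone does not guarantee:

from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

# Keep the pretrained configuration values: same architecture and size,
# still no weights downloaded.
data_full = get_untrained_model_with_inputs(
    "arnir0/Tiny-LLM", same_as_pretrained=True
)

# Assumption: model_kwargs carries configuration overrides
# (num_hidden_layers is an illustrative LlamaConfig field).
data_small = get_untrained_model_with_inputs(
    "arnir0/Tiny-LLM",
    model_kwargs={"num_hidden_layers": 1},
    dynamic_rope=False,
)

print("full:", data_full["size"], "small:", data_small["size"])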
Example:
<<<
import pprint
from onnx_diagnostic.helpers import string_type
from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

data = get_untrained_model_with_inputs("arnir0/Tiny-LLM", verbose=1)

print("-- model size:", data["size"])
print("-- number of parameters:", data["n_weights"])
print("-- inputs:", string_type(data["inputs"], with_shape=True))
print("-- dynamic shapes:", pprint.pformat(data["dynamic_shapes"]))
print("-- configuration:", pprint.pformat(data["configuration"]))
>>>
[get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
[get_untrained_model_with_inputs] architecture='LlamaForCausalLM'
[get_untrained_model_with_inputs] cls='LlamaConfig'
[get_untrained_model_with_inputs] task='text-generation'
-- model size: 51955968
-- number of parameters: 12988992
-- inputs: dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
-- dynamic shapes: {'attention_mask': {0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                    1: _DimHint(type=<_DimHintType.DYNAMIC: 3>)},
 'input_ids': {0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
               1: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.seq_length'>},
 'past_key_values': [[{0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                       2: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.cache_length'>}],
                     [{0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                       2: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.cache_length'>}]],
 'position_ids': {0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                  1: _DimHint(type=<_DimHintType.DYNAMIC: 3>)}}
-- configuration: LlamaConfig {
  "_attn_implementation_autoset": true,
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 96,
  "hidden_act": "silu",
  "hidden_size": 192,
  "initializer_range": 0.02,
  "intermediate_size": 1024,
  "max_position_embeddings": 1024,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 2,
  "num_hidden_layers": 1,
  "num_key_value_heads": 1,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}
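The pieces of the returned dictionary fit together. A minimal sketch follows; it assumes the model is stored under the "model" key (the "inputs" and "dynamic_shapes" keys are confirmed by the example above) and that torch.export handles the DynamicCache input with the installed transformers version:

import torch
from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

data = get_untrained_model_with_inputs("arnir0/Tiny-LLM")
model, inputs = data["model"], data["inputs"]  # "model" key assumed from the Returns description

# A forward pass checks that the generated inputs match the model signature.
with torch.no_grad():
    outputs = model(**inputs)
print(type(outputs))

# The dynamic shapes are meant for torch.export; whether the DynamicCache
# input exports without patches depends on the transformers version.
ep = torch.export.export(
    model, (), kwargs=inputs, dynamic_shapes=data["dynamic_shapes"]
)
print(ep)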