onnx_diagnostic.torch_models.hghub

onnx_diagnostic.torch_models.hghub.get_untrained_model_with_inputs(model_id: str, config: Any | None = None, task: str | None = '', inputs_kwargs: Dict[str, Any] | None = None, model_kwargs: Dict[str, Any] | None = None, verbose: int = 0, dynamic_rope: bool | None = None, same_as_pretrained: bool = False) → Dict[str, Any]

Gets an uninitialized model similar to the original one, based on the model id given to the function. The model size is reduced compared to the original model. No weights are downloaded; at most, the configuration file is fetched.

Parameters:
  • model_id – model id, e.g. arnir0/Tiny-LLM

  • config – to overwrite the configuration

  • task – model task; it can be overwritten, otherwise it is determined automatically

  • inputs_kwargs – parameters sent to input generation

  • model_kwargs – parameters used to change how the model is created

  • verbose – verbosity level; prints out the information found

  • dynamic_rope – use dynamic rope (see transformers.LlamaConfig)

  • same_as_pretrained – if True, do not change the default values to get a smaller model

Returns:

dictionary with a model, inputs, dynamic shapes, and the configuration
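
The returned dictionary can then be fed to torch.export.export. Below is a minimal sketch, assuming the keys "model", "inputs", and "dynamic_shapes" (consistent with the description above); note that exporting inputs containing a DynamicCache may additionally require the patches provided elsewhere in onnx_diagnostic:

    import torch
    from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

    data = get_untrained_model_with_inputs("arnir0/Tiny-LLM")

    # "model", "inputs", "dynamic_shapes" are assumed key names,
    # consistent with the keys printed in the example below.
    ep = torch.export.export(
        data["model"],
        (),  # all inputs are passed as keyword arguments
        kwargs=data["inputs"],
        dynamic_shapes=data["dynamic_shapes"],
    )
    print(ep)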

Example:

<<<

import pprint
from onnx_diagnostic.helpers import string_type
from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

data = get_untrained_model_with_inputs("arnir0/Tiny-LLM", verbose=1)

print("-- model size:", data["size"])
print("-- number of parameters:", data["n_weights"])
print("-- inputs:", string_type(data["inputs"], with_shape=True))
print("-- dynamic shapes:", pprint.pformat(data["dynamic_shapes"]))
print("-- configuration:", pprint.pformat(data["configuration"]))

>>>

    [get_untrained_model_with_inputs] model_id='arnir0/Tiny-LLM'
    [get_untrained_model_with_inputs] architecture='LlamaForCausalLM'
    [get_untrained_model_with_inputs] cls='LlamaConfig'
    [get_untrained_model_with_inputs] task='text-generation'
    -- model size: 51955968
    -- number of parameters: 12988992
    -- inputs: dict(input_ids:T7s2x3,attention_mask:T7s2x33,position_ids:T7s2x3,past_key_values:DynamicCache(key_cache=#1[T1s2x1x30x96], value_cache=#1[T1s2x1x30x96]))
    -- dynamic shapes: {'attention_mask': {0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                        1: _DimHint(type=<_DimHintType.DYNAMIC: 3>)},
     'input_ids': {0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                   1: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.seq_length'>},
     'past_key_values': [[{0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                           2: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.cache_length'>}],
                         [{0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                           2: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.cache_length'>}]],
     'position_ids': {0: <class 'onnx_diagnostic.torch_models.hghub.model_inputs.batch'>,
                      1: _DimHint(type=<_DimHintType.DYNAMIC: 3>)}}
    -- configuration: LlamaConfig {
      "_attn_implementation_autoset": true,
      "architectures": [
        "LlamaForCausalLM"
      ],
      "attention_bias": false,
      "attention_dropout": 0.0,
      "bos_token_id": 1,
      "eos_token_id": 2,
      "head_dim": 96,
      "hidden_act": "silu",
      "hidden_size": 192,
      "initializer_range": 0.02,
      "intermediate_size": 1024,
      "max_position_embeddings": 1024,
      "mlp_bias": false,
      "model_type": "llama",
      "num_attention_heads": 2,
      "num_hidden_layers": 1,
      "num_key_value_heads": 1,
      "pretraining_tp": 1,
      "rms_norm_eps": 1e-05,
      "rope_scaling": null,
      "rope_theta": 10000.0,
      "tie_word_embeddings": false,
      "torch_dtype": "float32",
      "transformers_version": "4.51.0.dev0",
      "use_cache": true,
      "vocab_size": 32000
    }
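
A short variation of the example above relies on the documented parameters same_as_pretrained and dynamic_rope. It keeps the pretrained hyperparameters instead of shrinking them, so the untrained model matches the original size; the weights remain random and nothing but the configuration is downloaded:

    from onnx_diagnostic.helpers import string_type
    from onnx_diagnostic.torch_models.hghub import get_untrained_model_with_inputs

    data = get_untrained_model_with_inputs(
        "arnir0/Tiny-LLM",
        same_as_pretrained=True,  # keep the original default values
        dynamic_rope=True,        # use dynamic rope, see transformers.LlamaConfig
    )
    print("-- model size:", data["size"])
    print("-- inputs:", string_type(data["inputs"], with_shape=True))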