torch_models

llama_helper

Creates small LLaMA models to check that the conversion works, and to run some benchmarks.

get_llama_attention

experimental_experiment.torch_models.llama_helper.get_llama_attention(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size=16, num_hidden_layers=1, vocab_size=1024, intermediate_size=16, max_position_embeddings=1024, num_attention_heads=2, _attn_implementation='eager')[source]

Returns the attention part. See experimental_experiment.torch_models.llama_helper.get_llama_model().

get_llama_decoder

experimental_experiment.torch_models.llama_helper.get_llama_decoder(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size=16, num_hidden_layers=1, vocab_size=1024, intermediate_size=16, max_position_embeddings=1024, num_attention_heads=2, _attn_implementation='eager')[source]

Returns the decoder part. See experimental_experiment.torch_models.llama_helper.get_llama_model().

get_llama_model

experimental_experiment.torch_models.llama_helper.get_llama_model(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size: int = 16, num_hidden_layers: int = 1, vocab_size: int = 1024, intermediate_size: int = 16, max_position_embeddings: int = 1024, num_attention_heads: int = 2, _attn_implementation: str = 'eager', with_mask: bool = True)[source]

Returns a model. See LlamaConfig. The parameters are chosen for a unit test configuration.
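
A minimal usage sketch (assuming, as the input_dims parameter suggests, that the helper returns the model together with one set of example inputs per (batch size, sequence length) pair):

    import torch
    from experimental_experiment.torch_models.llama_helper import get_llama_model

    # assumption: the helper returns (model, input_sets), one input set
    # per (batch_size, sequence_length) pair listed in input_dims
    model, input_sets = get_llama_model(input_dims=[(2, 8)], num_hidden_layers=1)
    with torch.no_grad():
        expected = model(*input_sets[0])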

mistral_helper

get_mistral_model

experimental_experiment.torch_models.mistral_helper.get_mistral_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=2, num_key_value_heads=2, sliding_window=4096, _attn_implementation='eager', with_mask: bool = True)[source]

Returns a model. See MistralConfig. The parameters are chosen for a unit test configuration.
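
The calling convention presumably mirrors get_llama_model(); the extra parameters expose Mistral-specific features (a hypothetical sketch):

    from experimental_experiment.torch_models.mistral_helper import get_mistral_model

    # assumption: same (model, input_sets) return convention as get_llama_model
    model, input_sets = get_mistral_model(
        num_attention_heads=2,
        num_key_value_heads=2,  # grouped-query attention: kv heads <= attention heads
        sliding_window=4096,    # Mistral's sliding-window attention span
    )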

phi_helper

get_phi_model

experimental_experiment.torch_models.phi_helper.get_phi_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=4, num_key_value_heads=2, _attn_implementation='eager', with_mask: bool = True)[source]

Returns a model. See PhiConfig. The parameters are chosen for a unit test configuration from test_modeling_phi.py.

dump_helper

assert_all_close

experimental_experiment.torch_models.dump_helper.assert_all_close(v1: Any, v2: Any, atol: float | Tuple[float, float] = 1e-05, rtol: float = 1e-05, msg: str = '')[source]

Checks that the expected outputs and new outputs are the same.

Parameters:
  • v1 – tensor or tuple of tensors

  • v2 – tensor or tuple of tensors

  • atol – absolute tolerance, or a pair (absolute tolerance, quantile); if a quantile is given, the function checks that the error is below atol for that quantile of the values

  • rtol – relative tolerance

  • msg – additional message displayed when the check fails

See 301: Compares LLAMA exporters for onnxrt backend for an example.
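
A hypothetical sketch showing both forms of atol (the function is assumed to raise when the check fails):

    import torch
    from experimental_experiment.torch_models.dump_helper import assert_all_close

    v1 = torch.randn(1024)
    v2 = v1 + 1e-6 * torch.randn(1024)

    # plain absolute and relative tolerances
    assert_all_close(v1, v2, atol=1e-5, rtol=1e-5)
    # quantile form: only that fraction of the values must satisfy the
    # tolerance (assumption: the quantile is given as a fraction in [0, 1])
    assert_all_close(v1, v2, atol=(1e-5, 0.99))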

build_matching_inputs

experimental_experiment.torch_models.dump_helper.build_matching_inputs(model1: str | ModelProto, feeds: Dict[str, Any], model2: str | ModelProto) → Dict[str, Any][source]

Builds the inputs for a model based on the inputs made for another one. It assumes both models need the same inputs.

Parameters:
  • model1 – first model

  • feeds – inputs for the first model

  • model2 – second model, the one we need the inputs for

Returns:

new inputs

See 301: Compares LLAMA exporters for onnxrt backend for an example.
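
A sketch of the intended workflow (file and input names are placeholders):

    import numpy as np
    from experimental_experiment.torch_models.dump_helper import build_matching_inputs

    # inputs prepared for the first model
    feeds1 = {"input_ids": np.zeros((2, 8), dtype=np.int64)}
    # builds the inputs for the second model, assuming it needs the same
    # inputs as the first one, possibly under different names
    feeds2 = build_matching_inputs("model1.onnx", feeds1, "model2.onnx")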

dump_onnx

experimental_experiment.torch_models.dump_helper.dump_onnx(prefix: str, folder: str | None = None, clean: bool = False)[source]

Context manager enabling the dump of models generated by the onnxrt backend.

Parameters:
  • prefix – prefix for all files

  • folder – subfolder (created if it does not exist)

  • clean – if True, cleans the folder

See 301: Compares LLAMA exporters for onnxrt backend for an example.
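
A sketch of how the context manager is meant to be used (the model is a placeholder):

    import torch
    from experimental_experiment.torch_models.dump_helper import dump_onnx

    model = torch.nn.Linear(4, 4)
    inputs = (torch.randn(2, 4),)

    # every model generated by the onnxrt backend inside the context is
    # dumped into folder "dump", with file names starting with "linear"
    with dump_onnx("linear", folder="dump", clean=True):
        compiled = torch.compile(model, backend="onnxrt")
        compiled(*inputs)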

inputs_from_onnx_model

experimental_experiment.torch_models.dump_helper.inputs_from_onnx_model(model: str | ModelProto, init: bool = False) → List[Tuple[str, int, Tuple[int, ...]]][source]

Returns the inputs for a model.

Parameters:
  • model – model or filename

  • init – include the initializers as well

Returns:

list of inputs and initializers

See 301: Compares LLAMA exporters for onnxrt backend for an example.
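
For example (the file name is a placeholder; the return annotation suggests each tuple holds the input name, its element type, and its shape):

    from experimental_experiment.torch_models.dump_helper import inputs_from_onnx_model

    # init=True also includes the initializers in the returned list
    for name, elem_type, shape in inputs_from_onnx_model("model.onnx", init=True):
        print(name, elem_type, shape)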

reorder_functions_in_proto

experimental_experiment.torch_models.dump_helper.reorder_functions_in_proto(proto: str | ModelProto) → str | ModelProto[source]

The reference implementation expects a function to be defined before it is used, so the Rank function has to be placed in the first position.

Parameters:

proto – a model

Returns:

the model modified in place

See 301: Compares LLAMA exporters for onnxrt backend for an example.
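
A sketch (the file name is a placeholder; the proto is modified in place and returned):

    import onnx
    from experimental_experiment.torch_models.dump_helper import reorder_functions_in_proto

    proto = onnx.load("model.onnx")
    # moves the Rank function to the first position so the reference
    # implementation finds it before it is used
    reorder_functions_in_proto(proto)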