torch_models

llama_helper

Creates small LLaMA models to check that the conversion works, and to run some benchmarks.

get_llama_attention

experimental_experiment.torch_models.llama_helper.get_llama_attention(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size=16, num_hidden_layers=1, vocab_size=1024, intermediate_size=16, max_position_embeddings=1024, num_attention_heads=2, _attn_implementation='eager')[source]

Returns the attention part. See experimental_experiment.torch_models.llama_helper.get_llama_model().

get_llama_decoder

experimental_experiment.torch_models.llama_helper.get_llama_decoder(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size=16, num_hidden_layers=1, vocab_size=1024, intermediate_size=16, max_position_embeddings=1024, num_attention_heads=2, _attn_implementation='eager')[source]

Returns the decoder part. See experimental_experiment.torch_models.llama_helper.get_llama_model().

get_llama_model

experimental_experiment.torch_models.llama_helper.get_llama_model(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size: int = 16, num_hidden_layers: int = 1, vocab_size: int = 1024, intermediate_size: int = 16, max_position_embeddings: int = 1024, num_attention_heads: int = 2, _attn_implementation: str = 'eager', with_mask: bool = True)[source]

Returns a model. See LlamaConfig. The parameters are chosen for a unit test configuration.
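
A minimal usage sketch (assuming, as the input_dims parameter suggests, that the helper returns the model together with one set of example inputs per (batch size, sequence length) pair):

    import torch
    from experimental_experiment.torch_models.llama_helper import get_llama_model

    # assumption: the helper returns (model, input_sets), one input set
    # per (batch_size, sequence_length) pair listed in input_dims
    model, input_sets = get_llama_model(input_dims=[(2, 8)], num_hidden_layers=1)
    with torch.no_grad():
        expected = model(*input_sets[0])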

mistral_helper

get_mistral_model

experimental_experiment.torch_models.mistral_helper.get_mistral_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=2, num_key_value_heads=2, sliding_window=4096, _attn_implementation='eager', with_mask: bool = True)[source]

Returns a model. See MistralConfig. The parameters are chosen for a unit test configuration.
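
The calling convention presumably mirrors get_llama_model(); the extra parameters expose Mistral-specific features (a hypothetical sketch):

    from experimental_experiment.torch_models.mistral_helper import get_mistral_model

    # assumption: same (model, input_sets) return convention as get_llama_model
    model, input_sets = get_mistral_model(
        num_attention_heads=2,
        num_key_value_heads=2,  # grouped-query attention: kv heads <= attention heads
        sliding_window=4096,    # Mistral's sliding-window attention span
    )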

phi_helper

get_phi_model

experimental_experiment.torch_models.phi_helper.get_phi_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=4, num_key_value_heads=2, _attn_implementation='eager', with_mask: bool = True)[source]

Returns a model. See PhiConfig. The parameters are chosen for a unit test configuration from test_modeling_phi.py.

dump_helper

assert_all_close

experimental_experiment.torch_models.dump_helper.assert_all_close(v1: Any, v2: Any, atol: float | Tuple[float, float] = 1e-05, rtol: float = 1e-05, msg: str = '')[source]

Checks that the expected outputs and new outputs are the same.

Parameters:
  • v1 – tensor or tuple of tensors

  • v2 – tensor or tuple of tensors

  • atol – absolute tolerance, or a pair (absolute tolerance, quantile); if a quantile is given, the function checks that the error is below atol for that quantile of the values

  • rtol – relative tolerance

  • msg – additional message displayed when the check fails

See 301: Compares LLAMA exporters for onnxrt backend for an example.
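
A hypothetical sketch showing both forms of atol (the function is assumed to raise when the check fails):

    import torch
    from experimental_experiment.torch_models.dump_helper import assert_all_close

    v1 = torch.randn(1024)
    v2 = v1 + 1e-6 * torch.randn(1024)

    # plain absolute and relative tolerances
    assert_all_close(v1, v2, atol=1e-5, rtol=1e-5)
    # quantile form: only that fraction of the values must satisfy the
    # tolerance (assumption: the quantile is given as a fraction in [0, 1])
    assert_all_close(v1, v2, atol=(1e-5, 0.99))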

build_matching_inputs

experimental_experiment.torch_models.dump_helper.build_matching_inputs(model1: str | ModelProto, feeds: Dict[str, Any], model2: str | ModelProto) → Dict[str, Any][source]

Builds the inputs for a model based on the inputs made for another one. It assumes both models need the same inputs.

Parameters:
  • model1 – first model

  • feeds – inputs for the first model

  • model2 – second model, the one we need the inputs for

Returns:

new inputs

See 301: Compares LLAMA exporters for onnxrt backend for an example.
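
A sketch of the intended workflow (file and input names are placeholders):

    import numpy as np
    from experimental_experiment.torch_models.dump_helper import build_matching_inputs

    # inputs prepared for the first model
    feeds1 = {"input_ids": np.zeros((2, 8), dtype=np.int64)}
    # builds the inputs for the second model, assuming it needs the same
    # inputs as the first one, possibly under different names
    feeds2 = build_matching_inputs("model1.onnx", feeds1, "model2.onnx")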

dump_onnx

experimental_experiment.torch_models.dump_helper.dump_onnx(prefix: str, folder: str | None = None, clean: bool = False)[source]

Context manager enabling the dump of models generated by the onnxrt backend.

Parameters:
  • prefix – prefix for all files

  • folder – subfolder (created if it does not exist)

  • clean – if True, cleans the folder

See 301: Compares LLAMA exporters for onnxrt backend for an example.
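
A sketch of how the context manager is meant to be used (the model is a placeholder):

    import torch
    from experimental_experiment.torch_models.dump_helper import dump_onnx

    model = torch.nn.Linear(4, 4)
    inputs = (torch.randn(2, 4),)

    # every model generated by the onnxrt backend inside the context is
    # dumped into folder "dump", with file names starting with "linear"
    with dump_onnx("linear", folder="dump", clean=True):
        compiled = torch.compile(model, backend="onnxrt")
        compiled(*inputs)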

inputs_from_onnx_model

experimental_experiment.torch_models.dump_helper.inputs_from_onnx_model(model: str | ModelProto, init: bool = False) → List[Tuple[str, int, Tuple[int, ...]]][source]

Returns the inputs for a model.

Parameters:
  • model – model or filename

  • init – include the initializers as well

Returns:

list of inputs and initializers

See 301: Compares LLAMA exporters for onnxrt backend for an example.
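
For example (the file name is a placeholder; the return annotation suggests each tuple holds the input name, its element type, and its shape):

    from experimental_experiment.torch_models.dump_helper import inputs_from_onnx_model

    # init=True also includes the initializers in the returned list
    for name, elem_type, shape in inputs_from_onnx_model("model.onnx", init=True):
        print(name, elem_type, shape)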

reorder_functions_in_proto

experimental_experiment.torch_models.dump_helper.reorder_functions_in_proto(proto: str | ModelProto) → str | ModelProto[source]

The reference implementation expects a function to be defined before it is used, so the Rank function has to be placed in the first position.

Parameters:

proto – a model

Returns:

the model modified in place

See 301: Compares LLAMA exporters for onnxrt backend for an example.
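
A sketch (the file name is a placeholder; the proto is modified in place and returned):

    import onnx
    from experimental_experiment.torch_models.dump_helper import reorder_functions_in_proto

    proto = onnx.load("model.onnx")
    # moves the Rank function to the first position so the reference
    # implementation finds it before it is used
    reorder_functions_in_proto(proto)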