torch_models¶
llama_helper¶
Creates small Llama models to check that the conversion works, as well as for benchmarking.
get_llama_attention¶
- experimental_experiment.torch_models.llama_helper.get_llama_attention(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size=16, num_hidden_layers=1, vocab_size=1024, intermediate_size=16, max_position_embeddings=1024, num_attention_heads=2, _attn_implementation='eager')[source]¶
Returns the attention part. See experimental_experiment.torch_models.llama_helper.get_llama_model().
get_llama_decoder¶
- experimental_experiment.torch_models.llama_helper.get_llama_decoder(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size=16, num_hidden_layers=1, vocab_size=1024, intermediate_size=16, max_position_embeddings=1024, num_attention_heads=2, _attn_implementation='eager')[source]¶
Returns the decoder part. See experimental_experiment.torch_models.llama_helper.get_llama_model().
get_llama_model¶
- experimental_experiment.torch_models.llama_helper.get_llama_model(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size: int = 16, num_hidden_layers: int = 1, vocab_size: int = 1024, intermediate_size: int = 16, max_position_embeddings: int = 1024, num_attention_heads: int = 2, _attn_implementation: str = 'eager', with_mask: bool = True)[source]¶
Returns a model. See LlamaConfig. The parameters are chosen for a unit test configuration.
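The `input_dims` argument appears to list (batch size, sequence length) pairs, one per set of test inputs. A minimal sketch of generating token-id batches of those shapes with numpy (the helper itself works with torch tensors; the function name and return convention below are illustrative, not the library's API):

```python
import numpy as np

# Assumption: each pair in input_dims is (batch_size, sequence_length) and one
# batch of token ids is generated per pair; vocab_size bounds the token ids.
def make_inputs(input_dims=((2, 8), (4, 7), (9, 15)), vocab_size=1024):
    rng = np.random.default_rng(0)
    return [rng.integers(0, vocab_size, size=dims) for dims in input_dims]

batches = make_inputs()
print([b.shape for b in batches])  # -> [(2, 8), (4, 7), (9, 15)]
```

Several shapes are useful to exercise dynamic axes during export rather than a single fixed shape.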
mistral_helper¶
get_mistral_model¶
- experimental_experiment.torch_models.mistral_helper.get_mistral_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=2, num_key_value_heads=2, sliding_window=4096, _attn_implementation='eager', with_mask: bool = True)[source]¶
Returns a model. See MistralConfig. The parameters are chosen for a unit test configuration.
phi_helper¶
get_phi_model¶
- experimental_experiment.torch_models.phi_helper.get_phi_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=4, num_key_value_heads=2, _attn_implementation='eager', with_mask: bool = True)[source]¶
Returns a model. See PhiConfig. The parameters are chosen for a unit test configuration from test_modeling_phi.py.
dump_helper¶
assert_all_close¶
- experimental_experiment.torch_models.dump_helper.assert_all_close(v1: Any, v2: Any, atol: float | Tuple[float, float] = 1e-05, rtol: float = 1e-05, msg: str = '')[source]¶
Checks that the expected outputs and new outputs are the same.
- Parameters:
v1 – tensor or tuple of tensors
v2 – tensor or tuple of tensors
atol – absolute error or (absolute error, quantile); if a quantile is specified, the function only checks that the error is < atol for that fraction of the values
rtol – relative error
msg – additional message to include in the error
See 301: Compares LLAMA exporters for onnxrt backend for an example.
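A minimal sketch of the (absolute error, quantile) semantics described above, reimplemented with numpy (the function name is illustrative; the real helper also handles tuples of tensors):

```python
import numpy as np

# Assumption: when atol is a pair (tol, quantile), the check only requires the
# given fraction of elements to be within the absolute tolerance.
def check_close(v1, v2, atol=1e-5, rtol=1e-5):
    err = np.abs(v1 - v2) - rtol * np.abs(v2)
    if isinstance(atol, tuple):
        tol, quantile = atol
        # fraction of elements whose error is below the tolerance
        ratio = (err < tol).mean()
        assert ratio >= quantile, f"only {ratio:.2%} of elements are close"
    else:
        assert (err < atol).all(), f"max error {err.max()}"

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 2.0, 3.0, 5.0])  # one outlier out of four elements
check_close(a, b, atol=(1e-5, 0.7))  # passes: 75% of elements are close
```

The quantile form is convenient when comparing exporters whose outputs differ on a handful of elements (e.g. padding positions) while being numerically identical elsewhere.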
build_matching_inputs¶
- experimental_experiment.torch_models.dump_helper.build_matching_inputs(model1: str | ModelProto, feeds: Dict[str, Any], model2: str | ModelProto) Dict[str, Any] [source]¶
Builds a list of inputs for a model based on the inputs made for another. We assume both models need the same inputs.
- Parameters:
model1 – first model
feeds – inputs for the first model
model2 – second model, the one we need the inputs for
- Returns:
new inputs
See 301: Compares LLAMA exporters for onnxrt backend for an example.
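A sketch of the matching idea, assuming feeds are matched to the second model's inputs by name; plain lists of input names stand in for the ONNX models, and the function name is illustrative:

```python
# Hypothetical sketch: for each input of the second model, reuse the feed of
# the first model carrying the same name (both models need the same inputs).
def match_feeds(model1_inputs, feeds, model2_inputs):
    return {name: feeds[name] for name in model2_inputs if name in feeds}

feeds = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}
new_feeds = match_feeds(
    ["input_ids", "attention_mask"], feeds,
    ["attention_mask", "input_ids"])  # same inputs, different order
```

The real helper works on `ModelProto` objects and therefore also has access to types and shapes to validate the match.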
dump_onnx¶
- experimental_experiment.torch_models.dump_helper.dump_onnx(prefix: str, folder: str | None = None, clean: bool = False)[source]¶
Context manager enabling the dump of models generated by the onnxrt backend.
- Parameters:
prefix – prefix for all files
folder – subfolder (created if it does not exist)
clean – if True, cleans the folder
See 301: Compares LLAMA exporters for onnxrt backend for an example.
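A sketch of the folder handling the parameters describe, as a generic context manager (names are illustrative; the real helper additionally wires the dump into the onnxrt backend):

```python
import os
import shutil
from contextlib import contextmanager

# Hypothetical sketch: the folder is created if missing, emptied first when
# clean=True, and the prefix is used for every file written inside it.
@contextmanager
def dump_context(prefix, folder=None, clean=False):
    folder = folder or "."
    if clean and os.path.exists(folder):
        shutil.rmtree(folder)  # clean=True empties the folder
    os.makedirs(folder, exist_ok=True)
    yield os.path.join(folder, prefix)

with dump_context("llama", folder="dump_llama", clean=True) as stem:
    with open(stem + ".0.onnx", "wb") as f:
        f.write(b"")  # a backend would write serialized models here
```

Dumping every generated model to disk makes it possible to compare exporters offline, as done in the example referenced above.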
inputs_from_onnx_model¶
- experimental_experiment.torch_models.dump_helper.inputs_from_onnx_model(model: str | ModelProto, init: bool = False) List[Tuple[str, int, Tuple[int, ...]]] [source]¶
Returns the inputs for a model.
- Parameters:
model – model or filename
init – include the initializer as well
- Returns:
list of inputs and initializers
See 301: Compares LLAMA exporters for onnxrt backend for an example.
reorder_functions_in_proto¶
- experimental_experiment.torch_models.dump_helper.reorder_functions_in_proto(proto: str | ModelProto) str | ModelProto [source]¶
The reference implementation expects functions to be defined before they are used, so the Rank function has to be placed in the first position.
- Parameters:
proto – a model
- Returns:
modified model inplace
See 301: Compares LLAMA exporters for onnxrt backend for an example.
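The reordering can be sketched as a stable sort that moves the Rank function to the front; plain dictionaries stand in for `FunctionProto` objects and the function name is illustrative:

```python
# Hypothetical sketch: a stable sort keyed on "is not Rank" moves the Rank
# function first while preserving the relative order of the others.
def put_rank_first(functions):
    return sorted(functions, key=lambda f: f["name"] != "Rank")

funcs = [{"name": "Attention"}, {"name": "Rank"}, {"name": "MLP"}]
print([f["name"] for f in put_rank_first(funcs)])
# -> ['Rank', 'Attention', 'MLP']
```

A stable sort is enough here because only one function needs to move; a full topological sort of the function call graph would be required in the general case.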