teachcompute.torch_models

It mostly relies on the transformers package.

llama

teachcompute.torch_models.llama_helper.get_llama_model(input_dims: Sequence[Tuple[int, int]] = ((2, 8), (4, 7), (9, 15)), hidden_size: int = 16, num_hidden_layers: int = 1, vocab_size: int = 1024, intermediate_size: int = 16, max_position_embeddings: int = 1024, num_attention_heads: int = 2, _attn_implementation: str = 'eager', with_mask: bool = True)

Returns a model. See LlamaConfig. The default parameters define a small configuration suitable for unit tests.
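
A minimal sketch of how this helper can be called. It assumes only what the signature above documents; since the exact return structure is not spelled out here, the sketch handles both a bare model and a tuple whose first element is the model.

    from teachcompute.torch_models.llama_helper import get_llama_model

    # Build a tiny LLaMA model with the documented defaults, overriding a few sizes.
    res = get_llama_model(hidden_size=16, num_hidden_layers=1, num_attention_heads=2)

    # Assumption: the helper may return either the model alone or a tuple
    # also containing example inputs built from input_dims.
    model = res[0] if isinstance(res, tuple) else res
    n_params = sum(p.numel() for p in model.parameters())
    print(f"tiny llama model with {n_params} parameters")
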

mistral

teachcompute.torch_models.mistral_helper.get_mistral_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=2, num_key_value_heads=2, sliding_window=4096, _attn_implementation='eager', with_mask: bool = True)

Returns a model. See MistralConfig. The default parameters define a small configuration suitable for unit tests.
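
A hedged sketch of a forward pass on the tiny Mistral model. It assumes the returned object behaves like a transformers-style model taking input_ids (with_mask=False is used to avoid guessing the mask handling); the return value is unpacked as in the previous example.

    import torch
    from teachcompute.torch_models.mistral_helper import get_mistral_model

    res = get_mistral_model(with_mask=False)
    model = res[0] if isinstance(res, tuple) else res

    # vocab_size defaults to 99, so token ids are drawn in [0, 99).
    input_ids = torch.randint(0, 99, (2, 8))
    with torch.no_grad():
        output = model(input_ids)
    print(type(output))
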

phi

teachcompute.torch_models.phi_helper.get_phi_model(input_dims: Sequence[Tuple[int, int]] = ((13, 7), (14, 7), (15, 8)), hidden_size=32, num_hidden_layers=2, vocab_size=99, intermediate_size=16, max_position_embeddings=512, num_attention_heads=4, num_key_value_heads=2, _attn_implementation='eager', with_mask: bool = True)

Returns a model. See PhiConfig. The default parameters come from the unit test configuration in test_modeling_phi.py.
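
A sketch showing how the default unit-test configuration can be overridden; parameter names follow the signature above, and the return value is handled as in the previous examples.

    from teachcompute.torch_models.phi_helper import get_phi_model

    res = get_phi_model(
        num_hidden_layers=1,     # shrink the model further than the default
        num_attention_heads=4,
        num_key_value_heads=2,   # grouped-query attention: two query heads per key/value head
        with_mask=True,
    )
    model = res[0] if isinstance(res, tuple) else res
    print(model)
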