onnx_diagnostic.tasks.summarization
- onnx_diagnostic.tasks.summarization.get_inputs(model: Module, config: Any | None, dummy_max_token_id: int, num_key_value_heads_encoder: int, num_key_value_heads_decoder: int, num_hidden_layers: int, head_dim_encoder: int, head_dim_decoder: int, batch_size: int = 2, sequence_length: int = 30, sequence_length2: int = 3, add_second_input: int = 1, **kwargs)
- Generates inputs for the task summarization.
- Parameters:
- model – model used to get any missing information
- config – configuration used to generate the model
- dummy_max_token_id – maximum token id to use when generating dummy input ids
- num_key_value_heads_encoder – number of key/value heads for the encoder
- num_key_value_heads_decoder – number of key/value heads for the decoder
- num_hidden_layers – number of hidden layers (number of entries in the cache)
- head_dim_encoder – last dimension of the cache for the encoder
- head_dim_decoder – last dimension of the cache for the decoder
- batch_size – batch size
- sequence_length – sequence length
- sequence_length2 – new sequence length for the second set of inputs
- add_second_input – if not null, a second set of inputs with different shapes is added to exercise dynamic shapes
 
- Returns:
- dictionary with the dummy inputs
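
Taken on its own, the signature suggests a call pattern like the minimal sketch below. Everything in it is illustrative rather than taken from this page: the tiny BartConfig, the chosen dimensions, and the assumption that the returned dictionary exposes at least an "inputs" entry.

    # A minimal sketch (not from this page): builds a tiny encoder-decoder
    # model with transformers and asks get_inputs for matching dummy inputs.
    # The returned keys (e.g. "inputs") are an assumption here.
    from transformers import BartConfig, BartForConditionalGeneration
    from onnx_diagnostic.tasks.summarization import get_inputs

    config = BartConfig(
        vocab_size=1024,
        d_model=64,
        encoder_layers=2,
        decoder_layers=2,
        encoder_attention_heads=4,
        decoder_attention_heads=4,
    )
    model = BartForConditionalGeneration(config)

    res = get_inputs(
        model,
        config,
        dummy_max_token_id=config.vocab_size - 1,
        num_key_value_heads_encoder=4,
        num_key_value_heads_decoder=4,
        num_hidden_layers=2,
        head_dim_encoder=16,   # d_model / encoder_attention_heads
        head_dim_decoder=16,   # d_model / decoder_attention_heads
        batch_size=2,
        sequence_length=30,
        sequence_length2=3,
    )
    print(sorted(res))  # expected to contain at least "inputs"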
- Stolen inputs for one model:

      cache_position: T7s1
      past_key_values: EncoderDecoderCache(
          self_attention_cache=DynamicCache(
              key_cache=#6[T1s1x8x1x64,...],
              value_cache=#6[T1s1x8x1x64,...]),
          cross_attention_cache=DynamicCache(
              key_cache=#6[T1s1x8x16x64,...],
              value_cache=#6[T1s1x8x16x64,...]))
      decoder_input_ids: T7s1x1
      encoder_outputs: dict(last_hidden_state: T1s1x16x512)
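
The dump above uses onnx_diagnostic's compact tensor notation. Read as an assumption here: T7s1 would denote an int64 tensor of shape (1,) (ONNX element type 7), T1s1x16x512 a float32 tensor of shape (1, 16, 512) (ONNX type 1), and #6[...] a list of six per-layer tensors. A hedged sketch rebuilding that structure with the public transformers cache API, shapes copied from the dump:

    # Reconstructs the printed inputs under the notation reading stated above.
    import torch
    from transformers.cache_utils import DynamicCache, EncoderDecoderCache

    num_layers = 6  # "#6[...]" reads as a list of six per-layer tensors

    self_cache = DynamicCache()
    cross_cache = DynamicCache()
    for layer in range(num_layers):
        # self-attention cache: (batch=1, heads=8, past_len=1, head_dim=64)
        self_cache.update(torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64), layer)
        # cross-attention cache: (batch=1, heads=8, encoder_len=16, head_dim=64)
        cross_cache.update(torch.randn(1, 8, 16, 64), torch.randn(1, 8, 16, 64), layer)

    inputs = dict(
        cache_position=torch.arange(1, dtype=torch.int64),           # T7s1
        past_key_values=EncoderDecoderCache(self_cache, cross_cache),
        decoder_input_ids=torch.randint(0, 1024, (1, 1)),            # T7s1x1
        encoder_outputs=dict(
            last_hidden_state=torch.randn(1, 16, 512)),              # T1s1x16x512
    )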