onnx_diagnostic.tasks.automatic_speech_recognition¶
- onnx_diagnostic.tasks.automatic_speech_recognition.get_inputs(model: Module, config: Any | None, dummy_max_token_id: int, max_source_positions: int, d_model: int, num_hidden_layers: int, encoder_attention_heads: int, encoder_layers: int, decoder_layers: int, head_dim: int, batch_size: int = 2, sequence_length: int = 30, **kwargs)[source]¶
Generates inputs for the task text2text-generation. Example:

    dict(
        cache_position:T7s4,
        past_key_values:EncoderDecoderCache(
            self_attention_cache=DynamicCache[serialized](#2[#0[],#0[]]),
            cross_attention_cache=DynamicCache[serialized](#2[#0[],#0[]])
        ),
        decoder_input_ids:T7s1x4,
        encoder_outputs:BaseModelOutput(last_hidden_state:T1s1x1500x384),
        use_cache:bool,
        return_dict:bool
    )

    dict(
        cache_position:T7s1,
        past_key_values:EncoderDecoderCache(
            self_attention_cache=DynamicCache[serialized](#2[
                #4[T1s1x6x4x64,T1s1x6x4x64,T1s1x6x4x64,T1s1x6x4x64],
                #4[T1s1x6x4x64,T1s1x6x4x64,T1s1x6x4x64,T1s1x6x4x64]
            ]),
            cross_attention_cache=DynamicCache[serialized](#2[
                #4[T1s1x6x1500x64,T1s1x6x1500x64,T1s1x6x1500x64,T1s1x6x1500x64],
                #4[T1s1x6x1500x64,T1s1x6x1500x64,T1s1x6x1500x64,T1s1x6x1500x64]
            ])
        ),
        decoder_input_ids:T7s1x1,
        encoder_outputs:BaseModelOutput(last_hidden_state:T1s1x1500x384),
        use_cache:bool,
        return_dict:bool
    )
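The compact notation above encodes a tensor's ONNX element type and shape: assuming the usual TensorProto numbering, `T7s1x4` reads as element type 7 (int64) with shape (1, 4), and `T1s1x1500x384` as float32 with shape (1, 1500, 384). The helper below is a hypothetical sketch (not part of onnx_diagnostic's API) that decodes such a string:

```python
import re

# Hypothetical parser for the compact tensor notation shown in the example.
# "T7s1x4" -> element type 7 (int64 under ONNX TensorProto numbering),
# shape (1, 4). Type 1 is float32.
def parse_tensor_spec(spec: str):
    """Return (elem_type, shape) for a string like 'T7s1x4'."""
    m = re.fullmatch(r"T(\d+)s(\d+(?:x\d+)*)", spec)
    if m is None:
        raise ValueError(f"not a tensor spec: {spec!r}")
    elem_type = int(m.group(1))
    shape = tuple(int(d) for d in m.group(2).split("x"))
    return elem_type, shape

print(parse_tensor_spec("T7s1x4"))         # (7, (1, 4))
print(parse_tensor_spec("T1s1x1500x384"))  # (1, (1, 1500, 384))
```

With this reading, the second example dict describes a decoding step: one new token (`decoder_input_ids:T7s1x1`), a self-attention cache of four layers of 1x6x4x64 key/value tensors, and a cross-attention cache over the 1500 encoder frames.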