experimental_experiment.torch_bench._bash_bench_model_runner¶

class experimental_experiment.torch_bench._bash_bench_model_runner.MakeConfig(**kwargs)[source]¶: Creates a dictionary where keys are attributes.
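The description above can be read as "keyword arguments become both dictionary keys and attributes". The sketch below illustrates that reading only. The class name `AttrDict` is hypothetical and the real `MakeConfig` may be implemented differently.

```python
class AttrDict(dict):
    """Hypothetical stand-in, not the library class: a dict whose keys
    can also be read and written as attributes."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict
        # methods such as .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment is stored as a dictionary entry.
        self[name] = value


config = AttrDict(hidden_size=64, num_layers=2)
config.vocab_size = 1000
```

Here `config.hidden_size` and `config["hidden_size"]` refer to the same value.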

class experimental_experiment.torch_bench._bash_bench_model_runner.ModelRunner(model: Any, inputs: Any, kw_inputs: Dict[str, Any] | None, device: str, dtype: dtype, warmup: int, repeat: int, suite: str, autocast: bool = False, wrap_kind: None = None, nvtx: bool = False, model_name: str | None = None, export_options: Dict[str, Any] | None = None, patch_options: Dict[str, Any] | None = None, dynamic_shapes: Dict[str, Any] | Tuple[Any] | List[Any] | None = None, inputs2: Any | None = None, kw_inputs2: Dict[str, Any] | None = None, task: str = '', attn_impl: str = 'eager', config: Any | None = None)[source]¶

Wrapper around a model. Makes it easier to load the model and run inference.

Parameters:

model – torch model
inputs – example of inputs
kw_inputs – example of keyword inputs
device – device
dtype – if the model needs to be converted
warmup – number of iterations to warm up the model
repeat – number of iterations to repeat the model
suite – model suite
wrap_kind – how to wrap the model so that it works with tuples as much as possible; None is the default behavior, 'nowrap' explicitly avoids wrapping
nvtx – enable nvtx events
model_name – model name
export_options – additional options to use when exporting if the default options do not work
patch_options – patching options, applied before exporting a model
dynamic_shapes – dynamic shapes to use instead of using automated ones
inputs2 – second set of inputs to check that the model handles different shapes when they are dynamic
task – task associated with the model if known

classmethod allowed_configuration(exporter: str, optimization: str | None = None) → bool[source]¶: Defines the allowed configurations.

compute_weight_size() → int[source]¶: Returns the weight size.

dump_std(filename: str)[source]¶: Dumps some information in the given filename.

export_as(exporter: str, name: str, dynamic: bool, fake_tensor: bool, no_grad: bool, optimization: str, verbose: int, target_opset: int, patch: bool) → Tuple[ModelProto, Dict[str, Any] | None][source]¶

Converts a model into ONNX.

Parameters:

exporter – exporter
name – filename
dynamic – use dynamic shape
fake_tensor – use fake_tensor
no_grad – use no_grad
optimization – defines the optimizations
verbose – verbosity
target_opset – target opset
patch – apply patches before exporting, not a valid option for all exporters

Returns:

the model proto with or without weights, statistics

get_devices()[source]¶: Returns the devices.

get_dynamic_shapes(dynamic: bool = False, input_names: List[str] | None = None) → Dict[str, Any] | Tuple[Any] | List[Any] | None[source]¶

Returns dynamic shapes specifying the first dimension as dynamic.

Parameters:

dynamic – make it dynamic or not
input_names – to overwrite the input names (not used)
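A specification that marks only the first dimension as dynamic can be pictured as a mapping from input name to `{0: <dimension name>}`. The sketch below is an illustration under assumptions: the helper `first_dim_dynamic` and the name `"batch"` are hypothetical, plain strings stand in for whatever dimension objects the real method returns, and returning None for the static case is an assumption.

```python
from typing import Any, Dict, List, Optional


def first_dim_dynamic(
    input_names: List[str], dynamic: bool = False, dim_name: str = "batch"
) -> Optional[Dict[str, Dict[int, Any]]]:
    """Hypothetical helper: marks axis 0 of every input as dynamic."""
    if not dynamic:
        # Assumption: no specification is needed for static shapes.
        return None
    # Every input shares the same symbolic name for its first axis.
    return {name: {0: dim_name} for name in input_names}


shapes = first_dim_dynamic(["input_ids", "attention_mask"], dynamic=True)
```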

get_input_shapes(dynamic: bool = False, export: bool = False, inputs: Any | None = None) → Any[source]¶

Returns the input shapes.

Parameters:

dynamic – whether to use dynamic shapes
inputs – existing inputs or None to use self.inputs
export – returns the shapes for the inputs used for export

Returns:

new inputs

get_inputs_with_copied_dynamic_cache(inputs: Any | None = None) → Any[source]¶: An LLM modifies its cache during inference, so the cache needs to be copied first.
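The reason for copying can be shown with a toy example: if a model mutates a cache passed as input, reusing the same inputs for a second run gives a different result. This is a minimal sketch in which a plain list stands in for a real cache object; `call_with_copied_inputs` and `toy_model` are hypothetical names, not part of this module.

```python
import copy
from typing import Any, Callable, Tuple


def call_with_copied_inputs(model: Callable[..., Any], inputs: Tuple[Any, ...]) -> Any:
    """Hypothetical helper: runs the model on a deep copy of the inputs
    so the caller's inputs are left untouched."""
    return model(*copy.deepcopy(inputs))


def toy_model(token: int, cache: list) -> int:
    # Mutates the cache in place, as the text says an LLM does.
    cache.append(token)
    return len(cache)


original_inputs = (7, [1, 2, 3])
first = call_with_copied_inputs(toy_model, original_inputs)
second = call_with_copied_inputs(toy_model, original_inputs)
```

Both calls see a cache of the same initial length because each one works on its own copy.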

is_lm() → bool[source]¶: Returns True if the model is a language model. In that case, the dynamic dimensions are the first two. This test relies on the model name.

make_dynamic_inputs()[source]¶: Creates dynamic inputs from the static ones by changing the dynamic dimensions according to the definition in dynamic_shapes.

make_export_inputs(dynamic: bool = False, inputs: Tuple[Any, ...] | None = None, kw_inputs: Dict[str, Any] | None = None, int_to_tensor: bool = False) → Tuple[Tuple[Any, ...], Dict[str, Any]][source]¶

Creates the new inputs for the benchmarks. torch.export.export() fails when a dimension is dynamic and the value for this dimension is 1. This function expands the input on that dimension to make it 2 if it is 1. These inputs should only be used at export time.

Parameters:

dynamic – whether to use dynamic shapes
inputs – existing inputs or None to use self.inputs
int_to_tensor – converts integers or floats to tensors

Returns:

new inputs
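The expansion rule described above (a dynamic dimension of size 1 becomes 2) can be sketched on shapes alone, without tensors. The helper `export_shape` is hypothetical and only computes the target shape. The real method works on the actual inputs.

```python
from typing import Iterable, Tuple


def export_shape(shape: Tuple[int, ...], dynamic_dims: Iterable[int]) -> Tuple[int, ...]:
    """Hypothetical helper: returns the shape to use at export time."""
    dyn = set(dynamic_dims)
    # Only dimensions that are both dynamic and equal to 1 are expanded.
    return tuple(2 if (i in dyn and d == 1) else d for i, d in enumerate(shape))


image_shape = export_shape((1, 3, 224, 224), dynamic_dims=[0])
```

Static dimensions equal to 1 and dynamic dimensions larger than 1 are left unchanged.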

make_feeds(exporter: str, filename: str | None = None, dynamic: bool = False, remove_int: bool = False, remove_position_ids: bool = False, batch_size_one: bool = False)[source]¶: Creates feed inputs.

parameters_dtype() → str[source]¶: Returns the unique dtypes of all parameters.

class experimental_experiment.torch_bench._bash_bench_model_runner.UseDefaultValue(*values)[source]¶

Defines if the exporter may use the default value.

FALSE: no default value
TRUE: there is a default value and the input is not specified
BOTH: there is a default and one input
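The three cases can be sketched as an enumeration plus a small classification helper. This is an illustration only: the class name `UseDefault`, the integer values, and the function `classify` are assumptions, not the library's definitions.

```python
from enum import Enum


class UseDefault(Enum):
    """Hypothetical mirror of the three documented cases."""
    FALSE = 1  # no default value
    TRUE = 2   # a default value exists and the input is not specified
    BOTH = 3   # a default value exists and an input is given


def classify(has_default: bool, input_given: bool) -> UseDefault:
    """Hypothetical helper mapping the two facts to one of the cases."""
    if not has_default:
        return UseDefault.FALSE
    return UseDefault.BOTH if input_given else UseDefault.TRUE


case = classify(has_default=True, input_given=False)
```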

class experimental_experiment.torch_bench._bash_bench_model_runner.WrappedModelBase(model)[source]¶

Wrapper around a module.

forward(*args, **kwargs)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

parameters()[source]¶

Return an iterator over module parameters.

This is typically passed to an optimizer.

Args:

recurse (bool): if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields:

Parameter: module parameter

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)

class experimental_experiment.torch_bench._bash_bench_model_runner.WrappedModelToTuple(model)[source]¶

Wrapper around a module, flattens inputs and outputs so that every exporter can use it.
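Flattening can be pictured as walking nested containers and collecting the leaves into one tuple. The function below is a minimal sketch of that idea on plain Python containers. `flatten_to_tuple` is a hypothetical name, and the real wrapper presumably deals with tensors and model-specific output classes, which are not covered here.

```python
from typing import Any, Tuple


def flatten_to_tuple(obj: Any) -> Tuple[Any, ...]:
    """Hypothetical helper: collects the leaves of nested dicts, lists
    and tuples into a single flat tuple, in traversal order."""
    if isinstance(obj, dict):
        # Dictionary values are visited in insertion order.
        return tuple(x for v in obj.values() for x in flatten_to_tuple(v))
    if isinstance(obj, (list, tuple)):
        return tuple(x for v in obj for x in flatten_to_tuple(v))
    # Anything else is treated as a leaf.
    return (obj,)


flat = flatten_to_tuple((1, [2, 3], {"a": 4, "b": (5,)}))
```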

forward(*args, **kwargs)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

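Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.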

experimental_experiment.torch_bench._bash_bench_model_runner.download_retry_decorator(retry: int = 5) → Callable[source]¶

Decorator function for applying retry logic to a download function.

The wrapped function is called up to retry times (5 by default) and raises an exception if every attempt fails. After each unsuccessful attempt, there is a delay before the next attempt, which increases linearly with the number of tries.

Parameters:

retry – number of times to retry

Usage:

@download_retry_decorator(retry=5)
def download_function(model_name: str):
    # download logic goes here
    # ...

experimental_experiment.torch_bench._bash_bench_model_runner.get_dynamo_stats() → Dict[str, float][source]¶: Returns statistics on memory as a dictionary.

experimental_experiment.torch_bench._bash_bench_model_runner.get_peak_memory()[source]¶: Returns the memory peak.