experimental_experiment.torch_dynamo.fast_backend

class experimental_experiment.torch_dynamo.fast_backend.OrtBackend(sess: onnxruntime.InferenceSession, run_options: onnxruntime.RunOptions | None = None, devices: Dict[int, Any] | None = None, input_names: List[str] | None = None, output_names: List[str] | None = None, is_dimension_in: List[Tuple[bool, int, str, int]] | None = None, is_dimension_out: List[Tuple[bool, int, str | None, int]] | None = None, dump_first_inputs: str | None = None, stor: Dict[str, Any] | None = None, onnx_model: ModelProto | None = None)[source]

Wraps the run_with_ortvaluevector method of onnxruntime.InferenceSession to implement a backend for torch.dynamo.
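Instances are usually created by onnx_custom_backend below, but one can also be built by hand. A minimal sketch, assuming a serialized model is available on disk (model.onnx is a placeholder path, not a file shipped with the package):

    import onnx
    import onnxruntime
    from experimental_experiment.torch_dynamo.fast_backend import OrtBackend

    proto = onnx.load("model.onnx")  # hypothetical path
    sess = onnxruntime.InferenceSession(
        proto.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    backend = OrtBackend(sess, onnx_model=proto)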

dump_for_debug(folder: str, *inputs, test_case: int = 0)[source]

Dumps everything needed to replay the execution into a folder.

classmethod replay_dumped_data(folder: str, test_case: int = 0, providers: List[str] | None = None, impl: str = 'ort', ort_optimization_level: str | None = None) Tuple[OrtBackend, List[Any]][source]

Loads the data saved by dump_for_debug().
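A sketch of the debug round-trip, assuming backend is an OrtBackend instance and inputs holds the tensors it was called with; the folder name and the test case index are arbitrary:

    # dump the session, the model and the inputs into the folder "debug"
    backend.dump_for_debug("debug", *inputs, test_case=0)

    # later, possibly in another process: rebuild a backend and reload the inputs
    replayed_backend, replayed_inputs = OrtBackend.replay_dumped_data(
        "debug", test_case=0, providers=["CPUExecutionProvider"]
    )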

experimental_experiment.torch_dynamo.fast_backend.onnx_custom_backend(graph_module: torch.fx.GraphModule, args: List[torch.Tensor], target_opset: int | None = None, backend: str = 'ort', verbose: int | Tuple[int, int] = 0, dump_prefix: None = None, dump_patterns: str | None = None, providers: Tuple[str] | None = None, raise_exc: bool = True, storage: Dict[str, Any] | None = None, enable_pattern: str | List[str | type] | None = 'default', disable_pattern: str | List[str | type] | None = None, pre_ort_model_transforms: Callable[[ModelProto], ModelProto] | List[Callable[[ModelProto], ModelProto]] | None = None, ort_optimization_level: str | None = None, dispatcher: Dispatcher | None = None, rename_inputs: bool = True, optimize: bool = True, exporter: str | None = None, processor: str = 'CPU', order_algorithm: str | None = None, options: OptimizationOptions | None = None, export_options: str | ExportOptions | None = None) Callable[source]

Custom backend to export torch models into onnx (see torch.compiler). This backend relies on onnxruntime and tries to be as efficient as possible.

Parameters:
  • graph_module – graph to export

  • args – arguments

  • target_opset – opset to use for the conversion

  • backend – only ‘ort’ is allowed

  • verbose – adjusts verbosity; if a tuple, it gives different verbosity levels to the exporter and the runtime

  • dump_prefix – prefix used to dump the models and the inputs

  • dump_patterns – dump the optimization patterns as well

  • providers – execution providers used to run the model, CPU by default

  • raise_exc – raise an exception whenever something goes wrong

  • storage – dictionary used to store any interesting objects created during the process

  • enable_pattern – optimization patterns to enable

  • disable_pattern – optimization patterns to disable

  • pre_ort_model_transforms – list of transformations applied to the final ModelProto

  • ort_optimization_level – graph optimization level for onnxruntime; defaults to onnxruntime's own default value

  • dispatcher – see experimental_experiment.torch_interpreter.Dispatcher

  • rename_inputs – rename the inputs into input0, input1, …

  • optimize – enable or disable the optimization

  • exporter – use a different exporter

  • processor – the processor, or comma-separated list of processors, the optimization is made for

  • order_algorithm – algorithm used to optimize the order of the onnx nodes, none by default

  • options – custom OptimizationOptions; if set, any other optimization parameter is ignored

  • export_options – see ExportOptions

Returns:

Callable

See 301: Compares LLAMA exporters for onnxrt backend or 101: A custom backend for torch for examples. If not empty, storage keeps track of the data generated along the way: the onnx models, the graph module, as well as the inputs and outputs whenever the model is run.

The following example shows how to use the custom backend (based on onnxruntime).

<<<

import torch
from experimental_experiment.torch_dynamo import onnx_custom_backend


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.Sigmoid(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)


x = torch.randn(3, 10, dtype=torch.float32)

mlp = MLP()
expected = mlp(x)

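# compile with the custom onnxruntime-based backend;
# verbose=1 prints the conversion steps shown below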
compiled_model = torch.compile(
    mlp,
    backend=lambda *args, **kwargs: onnx_custom_backend(*args, verbose=1, **kwargs),
    dynamic=False,
    fullgraph=True,
)

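# the backend relies on onnxruntime-training, hence the try/except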
try:
    got = compiled_model(x)
    diff = (expected - got).abs().max()
    print(f"discrepancies: {diff}")
except (ImportError, AttributeError) as e:
    print("onnxruntime-training is not installed", e)

>>>

    [onnx_custom_backend] starts conversion to onnx.
    [to_onnx] build the graph module from <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>, type(args)=<class 'tuple'>
    [to_onnx] build the graph module with input_names=['input0', 'input1', 'input2', 'input3', 'input4']
    [_make_builder_interpreter] use existing <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>
    [to_onnx] graph module done in 0.008583097995142452 s
    [to_onnx] start creating the onnx nodes
    [to_onnx] interpreter.function_options=FunctionOptions(export_as_function=True, name='*', domain='*', external_threshold=256, move_initializer_to_constant=True, return_initializer=True, merge_allowed=True, rename_allowed=True)
    [to_onnx] 13 onnx nodes done in 0.0016884470023796894 s
    [to_onnx] start conversion to onnx (before optimization) mask_outputs=None
    [GraphBuilder.optimize] start with 13 nodes
    [GraphBuilder.optimize] #patterns=44
    [GraphBuilderPatternOptimization.optimize] start with 7 nodes, 0 initializers, 44 patterns, priorities=[0, 1]
    [GraphBuilderPatternOptimization.optimize] iteration 0: 7 nodes, priority=0
    [GraphBuilderPatternOptimization.optimize] increase priority to 1
    [GraphBuilderPatternOptimization.optimize] iteration 1: 7 nodes, priority=1
    [GraphBuilderPatternOptimization.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.001 | max_time=MatMulAddPattern:0.000
    [GraphBuilderPatternOptimization.optimize] iteration 2: 5 nodes, priority=1
    [GraphBuilderPatternOptimization.optimize] applies 2 matches, 2*TransposeMatMulPattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
    [GraphBuilderPatternOptimization.optimize] iteration 3: 3 nodes, priority=1
    [GraphBuilderPatternOptimization.optimize] done after 4 iterations with 3 nodes in 0.005
    [GraphBuilder.optimize] done with 3 nodes in 0.006
    [GraphBuilder-STK.to_onnx] make_model 0 inits 0 params
    [GraphBuilder-STK.time_evaluation_constants_] 0
    [GraphBuilder-STK._build_initializers] start with 0 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-STK._build_initializers] switch low/high order
    [GraphBuilder-STK._build_initializers] done in 1.6429985407739878e-06s with 0 initializers, 0 large initializers
    [to_onnx] to_onnx done in 0.006493446002423298s and 3 nodes, 0 initializers, 5 inputs, 1 outputs
    [onnx_custom_backend] to_onnx done in 0.01903762499568984 with 3 nodes and 0 local functions.
    [onnx_custom_backend] starts creating InferenceSession
    [onnx_custom_backend] InferenceSession done in 0.1192963459980092
    discrepancies: 5.960464477539063e-08
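Passing a dictionary as storage keeps the generated artifacts around for later inspection. A sketch reusing mlp and x from the example above; the exact keys stored are implementation details, so the snippet only lists whatever was collected:

    storage = {}
    compiled_model = torch.compile(
        mlp,
        backend=lambda *args, **kwargs: onnx_custom_backend(
            *args, storage=storage, **kwargs
        ),
        dynamic=False,
        fullgraph=True,
    )
    compiled_model(x)
    # the keys are implementation details; the onnx models, the graph module
    # and the recorded inputs/outputs are expected among the stored objects
    print(list(storage))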