.torch_dynamo.fast_backend¶

class experimental_experiment.torch_dynamo.fast_backend.OrtBackend(sess: onnxruntime.InferenceSession, run_options: onnxruntime.RunOptions | None = None, devices: Dict[int, Any] | None = None, input_names: List[str] | None = None, output_names: List[str] | None = None, is_dimension_in: List[Tuple[bool, int, str, int]] | None = None, is_dimension_out: List[Tuple[bool, int, str | None, int]] | None = None, dump_first_inputs: str | None = None, stor: Dict[str, Any] | None = None, onnx_model: ModelProto | None = None)[source]¶

Wraps method run_with_ortvaluevector from InferenceSession to implement a backend for torch.dynamo.

dump_for_debug(folder: str, *inputs, test_case: int = 0)[source]¶: Dumps everything in a folder.

classmethod replay_dumped_data(folder: str, test_case: int = 0, providers: List[str] | None = None, impl: str = 'ort', ort_optimization_level: str | None = None) → Tuple[OrtBackend, List[Any]][source]¶: Loads the data save by dump_for_debug().

experimental_experiment.torch_dynamo.fast_backend.onnx_custom_backend(graph_module: torch.fx.GraphModule, args: List[torch.Tensor], target_opset: int | None = None, backend: str = 'ort', verbose: int | Tuple[int, int] = 0, dump_prefix: None = None, dump_patterns: str | None = None, providers: Tuple[str] | None = None, raise_exc: bool = True, storage: Dict[str, Any] | None = None, enable_pattern: str | List[str | type] | None = 'default', disable_pattern: str | List[str | type] | None = None, pre_ort_model_transforms: Callable[[ModelProto], ModelProto] | List[Callable[[ModelProto], ModelProto]] | None = None, ort_optimization_level: str | None = None, dispatcher: Dispatcher | None = None, rename_inputs: bool = True, optimize: bool = True, exporter: str | None = None, processor: str = 'CPU', order_algorithm: str | None = None, options: OptimizationOptions | None = None, export_options: str | ExportOptions | None = None) → Callable[source]¶

Custom backend to export torch models into onnx (see torch.compiler). This backend relies on onnxruntime and tries to be as efficient as possible.

Parameters:

graph_module – graph to export
args – arguments
target_opset – opset to use for the conversion
backend – only ‘ort’ is allowed
verbose – adjust verbosity, if tuple, if gives different verbosity level to the exporter and the runtime
dump_prefix – to dump the models and the inputs
dump_patterns – dump the patterns as well
providers – where to run the model, by default
raise_exc – raise an exception whenever something goes wrong
storage – to store any interesting objects during the process
enable_pattern – optimization patterns to enable
disable_pattern – optimization patterns to disable
pre_ort_model_transforms – list of transformations applied on the final ModelProto
ort_optimization_level – graph optimization level for onnxruntime, the default value is the same as what onnxruntime defines
dispatcher – see experimental_experiment.torch_interpreter.Dispatcher
rename_inputs – rename the inputs
optimize – enable or disable the optimization
exporter – use a different exporter
processor – optimization should be made for this processor or this list of processors (comma separated value)
order_algorithm – algorithm optimizing the order the onnx node, none by default
options – to define custom Optimization options, in that case, any other optimization parameter is ignored
export_options – see ExportOptions

Returns:

Callable

See 101: A custom backend for torch for examples. If not empty, storage keeps the memory of the data generated, onnx models, graph module as well the inputs and outputs when the model is run.

The following example shows how to use the custom backend (based on onnxruntime).

<<<

import torch
from experimental_experiment.torch_dynamo import onnx_custom_backend


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.Sigmoid(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)


x = torch.randn(3, 10, dtype=torch.float32)

mlp = MLP()
expected = mlp(x)

compiled_model = torch.compile(
    mlp,
    backend=lambda *args, **kwargs: onnx_custom_backend(*args, verbose=1, **kwargs),
    dynamic=False,
    fullgraph=True,
)

try:
    got = compiled_model(x)
    diff = (expected - got).max()
    print(f"discrepancies: {diff}")
except (ImportError, AttributeError) as e:
    print("onnxruntime-training is not installed", e)

>>>

    [onnx_custom_backend] starts conversion to onnx.
    [to_onnx] build the graph module from <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>, type(args)=<class 'tuple'>
    [to_onnx] build the graph module with input_names=['input0', 'input1', 'input2', 'input3', 'input4']
    [_make_builder_interpreter] use existing <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>
    [to_onnx] graph module done in 0.008472286999676726 s
    [to_onnx] start creating the onnx nodes
    [to_onnx] interpreter.function_options=FunctionOptions(export_as_function=True, name='*', domain='*', external_threshold=256, move_initializer_to_constant=True, return_initializer=True, merge_allowed=True, rename_allowed=True)
    [to_onnx] 13 onnx nodes done in 0.001892174999738927 s
    [to_onnx] start conversion to onnx (before optimization) mask_outputs=None
    [GraphBuilder-PWA._add_shape_information] dynamic shapes replacements={}
    [GraphBuilder-PWA.optimize] start with 13 nodes
    [GraphBuilder-PWA.optimize] #patterns=78
    [GraphBuilder-PWA.optimize] start with subgraphs
    [GraphBuilder-PWA.optimize] done with subgraphs
    [GraphBuilderPatternOptimization-PWA.optimize] start with 7 nodes, 0 initializers, 78 patterns, priorities=[0, 1, 3], max_iter=30
    [GraphBuilderPatternOptimization-PWA.optimize] same children={'SameChildrenFromInputPattern', 'SameChildrenPattern'}
    [GraphBuilderPatternOptimization-PWA.optimize] iteration 0: 7 nodes, priority=0
    [GraphBuilderPatternOptimization-PWA.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-PWA.optimize] iteration 1: 7 nodes, priority=1
    [GraphBuilderPatternOptimization-PWA.optimize] applies 2 matches, 1*TransposeEqualReshapePattern, 1*TransposeMatMulPattern - time=0.001 | max_time=TransposeMatMulPattern:0.000
    [GraphBuilderPatternOptimization-PWA.optimize] iteration 2: 6 nodes, priority=1
    [GraphBuilderPatternOptimization-PWA.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-PWA.optimize] iteration 3: 6 nodes, priority=3
    [GraphBuilderPatternOptimization-PWA.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.001 | max_time=ReshapePattern:0.000
    [GraphBuilderPatternOptimization-PWA.optimize] iteration 4: 4 nodes, priority=3
    [GraphBuilderPatternOptimization-PWA.optimize] stops current_priority_index=3, priorities=[0, 1, 3]
    [GraphBuilderPatternOptimization-PWA.optimize] done after 5 iterations with 4 nodes in 0.007
    [GraphBuilder-PWA.optimize] done with 4 nodes in 0.007
    [GraphBuilder-PWA.to_onnx] make_model 1 inits 0 params
    [GraphBuilder-PWA.time_evaluation_constants_] 0
    [GraphBuilder-PWA._build_initializers] start with 1 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-PWA._build_initializers] switch low/high order
    [GraphBuilder-PWA._build_initializers] done in 9.1600031737471e-07s with 1 initializers, 0 large initializers
    [GraphBuilder-PWA._add_shape_information] dynamic shapes replacements={}
    [to_onnx] to_onnx done in 0.008540412000002107s and 4 nodes, 1 initializers, 5 inputs, 1 outputs
    [onnx_custom_backend] to_onnx done in 0.02021880600022996 with 4 nodes and 0 local functions.
    [onnx_custom_backend] starts creating InferenceSession
    [onnx_custom_backend] InferenceSession done in 0.010499592000087432
    discrepancies: 2.9802322387695312e-08