experimental_experiment.torch_dynamo.fast_backend¶
- class experimental_experiment.torch_dynamo.fast_backend.OrtBackend(sess: onnxruntime.InferenceSession, run_options: onnxruntime.RunOptions | None = None, devices: Dict[int, Any] | None = None, input_names: List[str] | None = None, output_names: List[str] | None = None, is_dimension_in: List[Tuple[bool, int, str, int]] | None = None, is_dimension_out: List[Tuple[bool, int, str | None, int]] | None = None, dump_first_inputs: str | None = None, stor: Dict[str, Any] | None = None, onnx_model: ModelProto | None = None)[source]¶
Wraps method run_with_ortvaluevector from InferenceSession to implement a backend for torch.dynamo.
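This class is normally created for you by onnx_custom_backend below, but the idea can be sketched in isolation. A minimal, hypothetical sketch, assuming a model saved as model.onnx with a single input x and a single output y (the file name and tensor names are placeholders, and all other optional arguments are left to their defaults):

<<<

import onnxruntime
from experimental_experiment.torch_dynamo.fast_backend import OrtBackend

# "model.onnx", "x" and "y" are placeholders for this sketch only
sess = onnxruntime.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)
# the resulting object is the callable handed back to torch.dynamo
backend = OrtBackend(sess, input_names=["x"], output_names=["y"])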
- experimental_experiment.torch_dynamo.fast_backend.onnx_custom_backend(graph_module: torch.fx.GraphModule, args: List[torch.Tensor], target_opset: int | None = None, backend: str = 'ort', verbose: int | Tuple[int, int] = 0, dump_prefix: None = None, dump_patterns: str | None = None, providers: Tuple[str] | None = None, raise_exc: bool = True, storage: Dict[str, Any] | None = None, enable_pattern: str | List[str | type] | None = 'default', disable_pattern: str | List[str | type] | None = None, pre_ort_model_transforms: Callable[[ModelProto], ModelProto] | List[Callable[[ModelProto], ModelProto]] | None = None, ort_optimization_level: str | None = None, dispatcher: Dispatcher | None = None, rename_inputs: bool = True, optimize: bool = True, exporter: str | None = None, processor: str = 'CPU', order_algorithm: str | None = None, options: OptimizationOptions | None = None, export_options: str | ExportOptions | None = None) Callable [source]¶
Custom backend to export torch models into onnx (see torch.compiler). This backend relies on onnxruntime and tries to be as efficient as possible.
- Parameters:
graph_module – graph to export
args – arguments
target_opset – opset to use for the conversion
backend – only ‘ort’ is allowed
verbose – verbosity level; if a tuple, it gives different verbosity levels to the exporter and the runtime
dump_prefix – to dump the models and the inputs
dump_patterns – dump the patterns as well
providers – where to run the model; if not specified, a default list of providers is used
raise_exc – raise an exception whenever something goes wrong
storage – to store any interesting objects during the process
enable_pattern – optimization patterns to enable
disable_pattern – optimization patterns to disable (see the configuration sketch below the parameter list)
pre_ort_model_transforms – list of transformations applied to the final ModelProto
ort_optimization_level – graph optimization level for onnxruntime, the default value is the same as what onnxruntime defines
dispatcher – see experimental_experiment.torch_interpreter.Dispatcher
rename_inputs – rename the inputs
optimize – enable or disable the optimization
exporter – use a different exporter
processor – the optimization is made for this processor or this list of processors (comma-separated values)
order_algorithm – algorithm optimizing the order of the onnx nodes, none by default
options – defines custom optimization options; in that case, any other optimization parameter is ignored
export_options – see ExportOptions
- Returns:
Callable
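The keyword arguments above are usually fixed before the backend is handed to torch.compile, which only calls it with (graph_module, args). A hedged sketch using functools.partial; target_opset=18 is an arbitrary choice for this sketch and the disabled pattern name is taken from the log below:

<<<

from functools import partial

import torch
from experimental_experiment.torch_dynamo import onnx_custom_backend

model = torch.nn.Linear(10, 1)

# torch.compile calls the backend with (graph_module, args);
# partial fixes the remaining keyword arguments ahead of time
backend = partial(
    onnx_custom_backend,
    target_opset=18,  # arbitrary opset for this sketch
    providers=("CPUExecutionProvider",),
    disable_pattern=["MatMulAddPattern"],  # name taken from the log below
)
compiled = torch.compile(model, backend=backend, dynamic=False, fullgraph=True)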
See 301: Compares LLAMA exporters for onnxrt backend or 101: A custom backend for torch for examples. If not empty, storage keeps the data generated during the process: the onnx models, the graph module, as well as the inputs and outputs when the model is run.
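A short sketch of how storage can be inspected after the first call; the exact keys it receives are not documented here, so the last line only prints them:

<<<

import torch
from experimental_experiment.torch_dynamo import onnx_custom_backend

model = torch.nn.Linear(10, 1)
x = torch.randn(3, 10)

storage = {}  # filled by the backend with onnx models, inputs, outputs, ...
compiled = torch.compile(
    model,
    backend=lambda *args, **kwargs: onnx_custom_backend(
        *args, storage=storage, **kwargs
    ),
    dynamic=False,
    fullgraph=True,
)
compiled(x)
print(list(storage))  # which artifacts were recorded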
The following example shows how to use the custom backend (based on onnxruntime).
<<<
import torch
from experimental_experiment.torch_dynamo import onnx_custom_backend


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.Sigmoid(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)


x = torch.randn(3, 10, dtype=torch.float32)
mlp = MLP()
expected = mlp(x)

compiled_model = torch.compile(
    mlp,
    backend=lambda *args, **kwargs: onnx_custom_backend(*args, verbose=1, **kwargs),
    dynamic=False,
    fullgraph=True,
)

try:
    got = compiled_model(x)
    diff = (expected - got).max()
    print(f"discrepancies: {diff}")
except (ImportError, AttributeError) as e:
    print("onnxruntime-training is not installed", e)
>>>
[onnx_custom_backend] starts conversion to onnx.
[to_onnx] build the graph module from <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>, type(args)=<class 'tuple'>
[to_onnx] build the graph module with input_names=['input0', 'input1', 'input2', 'input3', 'input4']
[_make_builder_interpreter] use existing <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>
[to_onnx] graph module done in 0.0002588479983387515 s
[to_onnx] start creating the onnx nodes
[to_onnx] interpreter.function_options=FunctionOptions(export_as_function=True, name='*', domain='*', external_threshold=256, move_initializer_to_constant=True, return_initializer=True, merge_allowed=True, rename_allowed=True)
[to_onnx] 13 onnx nodes done in 0.002746648002357688 s
[to_onnx] start conversion to onnx (before optimization)
[GraphBuilder.optimize] start with 13 nodes
[GraphBuilder.optimize] #patterns=41
[GraphBuilderPatternOptimization.optimize] start with 7 nodes, 0 initializers, 41 patterns, priorities=[0, 1]
[GraphBuilderPatternOptimization.optimize] iteration 0: 7 nodes, priority=0
[GraphBuilderPatternOptimization.optimize] increase priority to 1
[GraphBuilderPatternOptimization.optimize] iteration 1: 7 nodes, priority=1
[GraphBuilderPatternOptimization.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.001 | max_time=Reshape2Of3Pattern:0.000
[GraphBuilderPatternOptimization.optimize] iteration 2: 5 nodes, priority=1
[GraphBuilderPatternOptimization.optimize] applies 2 matches, 2*TransposeMatMulPattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
[GraphBuilderPatternOptimization.optimize] iteration 3: 3 nodes, priority=1
[GraphBuilderPatternOptimization.optimize] done after 4 iterations with 3 nodes in 0.007
[GraphBuilder.optimize] done with 3 nodes in 0.008
[GraphBuilder-VRK.to_onnx] make_model 0 inits 0 params
[GraphBuilder-VRK.time_evaluation_constants_] 0
[GraphBuilder-VRK._build_initializers] start with 0 initializers, large_model=False, external_threshold=1024
[GraphBuilder-VRK._build_initializers] switch low/high order
[GraphBuilder-VRK._build_initializers] done in 9.00996383279562e-07s with 0 initializers, 0 large initializers
[to_onnx] to_onnx done in 0.008545954995497596s and 3 nodes, 0 initializers, 5 inputs, 1 outputs
[onnx_custom_backend] to_onnx done in 0.011863695006468333 with 3 nodes and 0 local functions.
[onnx_custom_backend] starts creating InferenceSession
[onnx_custom_backend] InferenceSession done in 0.14823250700283097
discrepancies: 0.0