onnx_diagnostic.export.api¶
- class onnx_diagnostic.export.api.WrapperToExportMethodToOnnx(mod: Module, method_name: str = 'forward', input_names: Sequence[str] | None = None, target_opset: int | Dict[str, int] | None = None, verbose: int = 0, filename: str | None = None, output_names: List[str] | None = None, output_dynamic_shapes: Dict[str, Any] | Tuple[Any] | None = None, exporter: str = 'onnx-dynamo', exporter_kwargs: Dict[str, Any] | None = None, save_ep: str | None = None, optimize: bool = True, optimizer_for_ort: bool = True, use_control_flow_dispatcher: bool = False, onnx_plugs: List[EagerDirectReplacementWithOnnx] | None = None, inline: bool = True, convert_after_n_calls: int = 2, patch_kwargs: Dict[str, Any] | None = None, skip_kwargs_names: Set[str] | None = None, dynamic_shapes: Dict[str, Any] | Tuple[Any] | None = None, dynamic_batch_for: Sequence[int | str] | None = None, expand_batch_for: Sequence[int | str] | None = None)[source][source]¶
Wraps an existing model in order to spy on its inputs. This is used by
onnx_diagnostic.export.api.method_to_onnx(). See Export a LLM through method generate (with Tiny-LLM) for an example.
- classmethod add_empty_cache_if_needed(inputs: List[Any]) List[Any][source][source]¶
Adds an empty cache if needed, as onnxruntime requires an empty cache rather than a missing one. This only works if the inputs are given as a dictionary.
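A minimal sketch of the idea in pure Python (the cache key `past_key_values` and the empty-cache value below are illustrative, not the library's actual representation):

```python
from typing import Any, Dict


def add_empty_cache_if_needed(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # onnxruntime expects every declared input to be present, so a missing
    # cache entry is replaced by an explicit empty cache (here, an empty list).
    if "past_key_values" not in inputs:
        inputs = {**inputs, "past_key_values": []}
    return inputs


filled = add_empty_cache_if_needed({"input_ids": [1, 2, 3]})
```

This is also why the operation requires dictionary inputs: with positional arguments there is no name to key the missing cache on.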
- check_discrepancies(atol: float = 0.0001, rtol: float = 0.1, hist=(0.1, 0.01), verbose: int = 0) List[Dict[str, str | int | float]][source][source]¶
Computes the discrepancies between the saved inputs and outputs with the saved onnx model.
- Parameters:
atol – absolute tolerance; recommended values: 1e-4 for float, 1e-2 for float16
rtol – relative tolerance
hist – thresholds; for each threshold, the function counts the number of discrepancies above it.
verbose – verbosity
- Returns:
results, a list of dictionaries, ready to be consumed by a dataframe
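As a rough sketch of what such a summary could look like in pure Python (not the library's actual implementation; the field names are illustrative):

```python
from typing import Dict, List, Union


def discrepancy_summary(
    expected: List[float],
    got: List[float],
    hist=(0.1, 0.01),
) -> List[Dict[str, Union[str, int, float]]]:
    # Absolute differences between reference outputs and candidate outputs.
    diffs = [abs(e - g) for e, g in zip(expected, got)]
    rows = []
    for threshold in hist:
        rows.append(
            {
                "threshold": threshold,
                "count_above": sum(d > threshold for d in diffs),
                "max_abs_diff": max(diffs),
            }
        )
    # Each row is a flat dictionary, ready for pandas.DataFrame(rows).
    return rows


rows = discrepancy_summary([1.0, 2.0, 3.0], [1.05, 2.0, 3.2])
```

Returning flat dictionaries keeps the result directly consumable by a dataframe constructor, which matches the documented return value.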
- forward(*args, **kwargs)[source][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- classmethod get_dynamic_shape_patterns() Dict[str, Any][source][source]¶
Returns the known patterns for the dynamic shapes.
<<<
import pprint

from onnx_diagnostic.export.api import WrapperToExportMethodToOnnx

pprint.pprint(WrapperToExportMethodToOnnx.get_dynamic_shape_patterns())
>>>
{'LLM.text': {'attention_mask': {0: 'batch', 1: 'totallength'},
              'cache_position': {0: 'seqlength'},
              'input_ids': {0: 'batch', 1: 'seqlength'},
              'past_key_values': {0: 'batch', 2: 'pastlength'}}}
- classmethod make_empty_cache_from_others(examples: List[Any]) Any[source][source]¶
Builds an empty cache based on existing one.
- classmethod rename_dynamic_shapes(ds: Dict[str, Any], verbose: int = 0) Dict[str, Any][source][source]¶
Renames the dynamic shapes with meaningful names. Tries to rename every dynamic dimension before export. It is not very clever: it just tries to recognize a known configuration based on input names. Dimension names in dynamic shapes are renamed if ds has the same named arguments as one of the patterns returned by function
get_dynamic_shape_patterns.
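A minimal sketch of the matching idea in pure Python (the pattern table and the exact-match rule below are simplified for illustration and are not the library's implementation):

```python
from typing import Any, Dict

# Simplified pattern table: one known configuration, keyed by input names.
PATTERNS: Dict[str, Dict[str, Any]] = {
    "LLM.text": {
        "input_ids": {0: "batch", 1: "seqlength"},
        "attention_mask": {0: "batch", 1: "totallength"},
    }
}


def rename_dynamic_shapes(ds: Dict[str, Any]) -> Dict[str, Any]:
    # A pattern applies only when the set of input names matches exactly;
    # otherwise the dynamic shapes are returned unchanged.
    for pattern in PATTERNS.values():
        if set(pattern) == set(ds):
            return {name: dict(pattern[name]) for name in ds}
    return ds


renamed = rename_dynamic_shapes(
    {"input_ids": {0: "d0", 1: "d1"}, "attention_mask": {0: "d0", 1: "d2"}}
)
```

Matching on the set of input names is what makes the function "not very clever": it recognizes a known configuration rather than inferring dimension roles.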
- onnx_diagnostic.export.api.get_main_dispatcher(use_control_flow_dispatcher: bool = False, onnx_plugs: List[EagerDirectReplacementWithOnnx] | None = None) Any[source][source]¶
Creates a custom dispatcher for the custom exporter.
- onnx_diagnostic.export.api.method_to_onnx(mod: Module, method_name: str = 'forward', input_names: Sequence[str] | None = None, target_opset: int | Dict[str, int] | None = None, verbose: int = 0, filename: str | None = None, output_names: List[str] | None = None, output_dynamic_shapes: Dict[str, Any] | Tuple[Any] | None = None, exporter: str = 'onnx-dynamo', exporter_kwargs: Dict[str, Any] | None = None, save_ep: str | None = None, optimize: bool = True, optimizer_for_ort: bool = True, use_control_flow_dispatcher: bool = False, onnx_plugs: List[EagerDirectReplacementWithOnnx] | None = None, inline: bool = True, convert_after_n_calls: int = 2, patch_kwargs: Dict[str, Any] | None = None, skip_kwargs_names: Set[str] | None = None, dynamic_shapes: Dict[str, Any] | Tuple[Any] | None = None, dynamic_batch_for: Sequence[int | str] | None = None, expand_batch_for: Sequence[int | str] | None = None) Callable[source][source]¶
Exports one method of a module into ONNX. It returns a new method which must be called by the user at least twice with different values for the dynamic dimensions before the conversion into ONNX is triggered.
- Parameters:
mod – module whose method is exported into ONNX
method_name – name of the method to export
input_names – input names for the onnx model (optional)
target_opset – opset to target, if not specified, each converter keeps its default value
verbose – verbosity level
filename – output filename (mandatory); the onnx model is saved to disk
output_names – to change the output of the onnx model
output_dynamic_shapes – to overwrite the dynamic shapes names
exporter – exporter to use (onnx-dynamo, modelbuilder, custom)
exporter_kwargs – additional parameters sent to the exporter
save_ep – saves the exported program
optimize – optimizes the model
optimizer_for_ort – optimizes the model for onnxruntime
use_control_flow_dispatcher – use the dispatcher created to support custom loops (see
onnx_diagnostic.export.control_flow_onnx.loop_for_onnx())
onnx_plugs – replacements applied to the code so that some parts are exported through their onnx translation
inline – inline local functions
convert_after_n_calls – converts the model after this number of calls.
patch_kwargs – patch arguments
skip_kwargs_names – use default values for these parameters of the signature of the method to export
dynamic_shapes – dynamic shapes to use if the guessed ones are not right
dynamic_batch_for – LLMs are usually called with a batch size of 1, but the export may benefit from a dynamic batch size; this parameter forces the inputs specified in this set to have a dynamic first dimension
expand_batch_for – LLMs are usually called with a batch size of 1, but the export may benefit from another value for the batch size; this parameter forces the inputs specified in this set to be expanded to 2 when the batch size is 1
- Returns:
the output of the selected exporter, usually a structure including an onnx model
See Export a LLM through method generate (with Tiny-LLM) for an example.
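The returned method records its inputs on each call and only triggers the conversion once enough calls were made. A minimal sketch of that calling pattern in pure Python (the wrapper below is illustrative, not the library's implementation):

```python
from typing import Any, Callable, List


def spy_then_convert(method: Callable, convert_after_n_calls: int = 2):
    # Records inputs on each call; once convert_after_n_calls calls with
    # (ideally) different dynamic dimensions were seen, the conversion
    # step would be triggered.
    recorded: List[Any] = []

    def wrapped(*args, **kwargs):
        recorded.append((args, kwargs))
        if len(recorded) == convert_after_n_calls:
            wrapped.converted = True  # stand-in for the actual ONNX export
        return method(*args, **kwargs)

    wrapped.converted = False
    wrapped.recorded = recorded
    return wrapped


f = spy_then_convert(lambda x: x + 1)
f(3)  # first call: only recorded
f(7)  # second call: conversion is triggered
```

Calling the method with different values for the dynamic dimensions matters because the recorded inputs are what the exporter uses to tell static dimensions from dynamic ones.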
- onnx_diagnostic.export.api.to_onnx(mod: Module | GraphModule, args: Sequence[Tensor] | None = None, kwargs: Dict[str, Tensor] | None = None, input_names: Sequence[str] | None = None, target_opset: int | Dict[str, int] | None = None, verbose: int = 0, dynamic_shapes: Dict[str, Any] | Tuple[Any] | None = None, filename: str | None = None, output_names: List[str] | None = None, output_dynamic_shapes: Dict[str, Any] | Tuple[Any] | None = None, exporter: str = 'onnx-dynamo', exporter_kwargs: Dict[str, Any] | None = None, save_ep: str | None = None, optimize: bool = True, optimizer_for_ort: bool = True, use_control_flow_dispatcher: bool = False, onnx_plugs: List[EagerDirectReplacementWithOnnx] | None = None, inline: bool = True) Any[source][source]¶
Exports one model into ONNX. Common API for exporters. By default, the models are optimized to use the most efficient kernels implemented in onnxruntime.
- Parameters:
mod – torch model
args – unnamed arguments
kwargs – named arguments
input_names – input names for the onnx model (optional)
target_opset – opset to target, if not specified, each converter keeps its default value
verbose – verbosity level
dynamic_shapes – dynamic shapes, usually a nested structure including a dictionary for each tensor
filename – output filename
output_names – to change the output of the onnx model
output_dynamic_shapes – to overwrite the dynamic shapes names
exporter – exporter to use (onnx-dynamo, modelbuilder, custom)
exporter_kwargs – additional parameters sent to the exporter
save_ep – saves the exported program
optimize – optimizes the model
optimizer_for_ort – optimizes the model for onnxruntime
use_control_flow_dispatcher – use the dispatcher created to support custom loops (see
onnx_diagnostic.export.control_flow_onnx.loop_for_onnx())
onnx_plugs – replacements applied to the code so that some parts are exported through their onnx translation
inline – inline local functions
- Returns:
the output of the selected exporter, usually a structure including an onnx model
A simple example:
to_onnx(
    model,
    kwargs=inputs,
    dynamic_shapes=ds,
    exporter=exporter,
    filename=filename,
)
Some examples using control flows are available in
onnx_diagnostic.export.control_flow_onnx.loop_for_onnx() or onnx_diagnostic.export.onnx_plug.EagerDirectReplacementWithOnnx.