yobx.sklearn.to_onnx#

yobx.sklearn.to_onnx(estimator: ~sklearn.base.BaseEstimator, args: ~typing.Tuple[~typing.Any], input_names: ~typing.Sequence[str] | None = None, dynamic_shapes: ~typing.Tuple[~typing.Dict[int, str]] | None = None, target_opset: int | ~typing.Dict[str, int] = 21, verbose: int = 0, builder_cls: type | ~typing.Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, extra_converters: ~typing.Dict[type, ~typing.Callable] | None = None, large_model: bool = False, external_threshold: int = 1024, function_options: ~yobx.xbuilder.function_options.FunctionOptions | None = None, convert_options: ~yobx.typing.ConvertOptionsProtocol | None = None) ExportArtifact[source]#

Converts a scikit-learn estimator into ONNX. By default, the first dimension of every input is considered dynamic and the others static.

Parameters:
  • estimator – the fitted scikit-learn estimator to convert

  • args – dummy inputs; each element may be a numpy array or an onnx.ValueInfoProto that explicitly describes the input tensor’s name, element type and shape. When a ValueInfoProto is provided no actual data is required, and the dynamic_shapes parameter is ignored for that input (the shape is taken directly from the proto).

  • dynamic_shapes – dynamic shapes; if not specified, the first dimension of each input is dynamic and the others are static

  • target_opset – opset to use; either an integer for the default domain (""), or a dictionary mapping domain names to opset versions, e.g. {"": 20, "ai.onnx.ml": 5}. When "ai.onnx.ml" is set to 5 the converter emits the unified TreeEnsemble operator introduced in that opset instead of the older per-task operators. If it includes {"com.microsoft": 1}, the converted model may include optimized kernels specific to onnxruntime.

  • verbose – verbosity level; higher values print more information during conversion

  • builder_cls – by default the graph builder is yobx.xbuilder.GraphBuilder, but any builder can be used as long as it implements the APIs described in Shape and type tracking and Building a graph from scratch

  • extra_converters – optional mapping from estimator type to converter function; entries here take priority over the built-in converters and allow converting custom estimators that are not natively supported

  • large_model – if True the returned ExportArtifact has its container attribute set to an ExtendedModelContainer, which lets the user decide later whether weights should be embedded in the model or saved as external data

  • external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data

  • function_options – when a FunctionOptions is provided, every non-container estimator is exported as a separate ONNX local function. Pipeline and ColumnTransformer are treated as orchestrators: their individual steps/sub-transformers are each wrapped as a function instead of the container itself. Function names for each step are always derived from the estimator’s class name; the name field of the provided FunctionOptions is not used by this helper to customize function naming. Pass None (the default) to disable function wrapping and produce a flat graph.

  • convert_options – see yobx.sklearn.ConvertOptions

Returns:

ExportArtifact wrapping the exported ONNX proto together with an ExportReport.

Note

scikit-learn==1.8 is stricter with computation types, which reduces the number of discrepancies. Switching to float32 in a matrix multiplication usually introduces discrepancies when the order of magnitude of the coefficients is large, which is often the case when a matrix is the inverse of another one. See Float32 vs Float64: precision loss with PLSRegression.
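The precision effect described in the note can be reproduced with plain numpy (a hedged illustration, independent of any converter): inverting a matrix with one very small row produces an inverse with large coefficients, and storing that inverse in float32, as an ONNX initializer would, amplifies the rounding error in the subsequent matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A[0] *= 1e-3                      # near-singular row -> inverse has large entries

inv64 = np.linalg.inv(A)          # float64 inverse (large coefficients)
inv32 = inv64.astype(np.float32)  # what a float32 initializer would store

x = rng.standard_normal(4)
ref = inv64 @ x
got = (inv32 @ x.astype(np.float32)).astype(np.float64)

# The absolute error scales with the magnitude of the coefficients,
# far above float64 round-off.
print(np.max(np.abs(ref - got)))
```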

Example:

import numpy as np
from sklearn.linear_model import LinearRegression
from yobx.sklearn import to_onnx

X = np.random.randn(10, 3).astype(np.float32)
y = X @ np.array([1.0, 2.0, 3.0], dtype=np.float32)
reg = LinearRegression().fit(X, y)

artifact = to_onnx(reg, (X,))
# Access the raw proto:
proto = artifact.proto
# Save to disk:
artifact.save("model.onnx")