yobx.sklearn.to_onnx#

yobx.sklearn.to_onnx(estimator: ~sklearn.base.BaseEstimator, args: ~typing.Tuple[~typing.Any], input_names: ~typing.Sequence[str] | None = None, dynamic_shapes: ~typing.Tuple[~typing.Dict[int, str]] | None = None, target_opset: int | ~typing.Dict[str, int] = 21, verbose: int = 0, builder_cls: type | ~typing.Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, extra_converters: ~typing.Dict[type, ~typing.Callable] | None = None, large_model: bool = False, external_threshold: int = 1024, function_options: ~yobx.xbuilder.function_options.FunctionOptions | None = None, convert_options: ~yobx.typing.ConvertOptionsProtocol | None = None, filename: str | None = None, return_optimize_report: bool = False) ExportArtifact[source]#

Converts a scikit-learn estimator into ONNX. By default, the first dimension of every input is considered dynamic and the others are static.

Parameters:
  • estimator – the fitted scikit-learn estimator to convert

  • args

    dummy inputs; each element may be a numpy array, a pandas.DataFrame, an onnx.ValueInfoProto that explicitly describes the input tensor’s name, element type and shape, or a (name, dtype, shape) tuple. A DataFrame is expanded column-by-column: each column is registered as a separate 1-D ONNX graph input named after the column, and an Unsqueeze + Concat node sequence assembles them back into a 2-D matrix (batch, n_cols) that is passed to the converter. When a ValueInfoProto or a (name, dtype, shape) tuple is provided no actual data is required, and the dynamic_shapes parameter is ignored for that input (the shape is taken directly from the descriptor). The (name, dtype, shape) tuple format uses a plain string for the name, a numpy dtype (or scalar-type class such as np.float32) for the element type, and a sequence of ints and/or strings for the shape (strings denote symbolic / dynamic dimensions). Example:

    to_onnx(estimator, (('x', np.float32, ('N', 4)),))
    

  • dynamic_shapes – dynamic shape specification for each input; when not specified, the first dimension is dynamic and the others are static

  • target_opset – opset to use; either an integer for the default domain (""), or a dictionary mapping domain names to opset versions, e.g. {"": 20, "ai.onnx.ml": 5}. When "ai.onnx.ml" is set to 5 the converter emits the unified TreeEnsemble operator introduced in that opset instead of the older per-task operators. If it includes {'com.microsoft': 1}, the converted model may include optimized kernels specific to onnxruntime.

  • verbose – verbosity level; higher values print more details about the conversion

  • builder_cls – by default the graph builder is a yobx.xbuilder.GraphBuilder, but any builder can be used as long as it implements the APIs described in Shape and type tracking and Building a graph from scratch

  • extra_converters – optional mapping from estimator type to converter function; entries here take priority over the built-in converters and allow converting custom estimators that are not natively supported

  • large_model – if True the returned ExportArtifact has its container attribute set to an ExtendedModelContainer, which lets the user decide later whether weights should be embedded in the model or saved as external data

  • external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data

  • function_options – when a FunctionOptions is provided, every non-container estimator is exported as a separate ONNX local function. Pipeline and ColumnTransformer are treated as orchestrators: their individual steps/sub-transformers are each wrapped as a function, not the container itself. Function names for each step are always derived from the estimator’s class name; the name field of the provided FunctionOptions is not used by this helper to customize function naming. Pass None (the default) to disable function wrapping and produce a flat graph.

  • convert_options – see yobx.sklearn.ConvertOptions

  • filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).

  • return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics
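The args formats accepted above can be illustrated in plain numpy. The first block mimics what the converter does with a DataFrame (each column becomes a 1-D graph input, then Unsqueeze + Concat rebuilds the 2-D matrix); the second shows a data-free (name, dtype, shape) descriptor. This is an illustrative sketch of the semantics, not library code:

```python
import numpy as np

# Mimic the DataFrame expansion: each column is a separate 1-D input,
# then Unsqueeze (c[:, None]) + Concat reassembles a (batch, n_cols) matrix.
cols = {
    "a": np.array([1.0, 2.0, 3.0], dtype=np.float32),
    "b": np.array([4.0, 5.0, 6.0], dtype=np.float32),
}
matrix = np.concatenate([c[:, None] for c in cols.values()], axis=1)
assert matrix.shape == (3, 2)  # (batch, n_cols)

# A data-free input descriptor in the (name, dtype, shape) format;
# strings in the shape denote symbolic (dynamic) dimensions.
descriptor = ("x", np.float32, ("N", 4))
```

With a descriptor like this, `to_onnx(estimator, (descriptor,))` needs no actual data and ignores dynamic_shapes for that input.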

Returns:

ExportArtifact wrapping the exported ONNX proto together with an ExportReport.

Note

scikit-learn==1.8 is stricter about computation types, which reduces the number of discrepancies. Switching to float32 in a matrix multiplication usually introduces discrepancies when the coefficients are large in magnitude; that is often the case when a matrix is the inverse of another one. See Float32 vs Float64: precision loss with PLSRegression.
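The precision loss described in the note can be reproduced directly in numpy: invert a matrix (whose entries may be large) and compare a float64 product against the same product computed in float32. A small standalone sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A_inv = np.linalg.inv(A)          # entries of an inverse may be large in magnitude
x = rng.standard_normal(50)

ref = A_inv @ x                                        # float64 reference
approx = A_inv.astype(np.float32) @ x.astype(np.float32)
max_abs_err = float(np.max(np.abs(ref - approx.astype(np.float64))))
# max_abs_err is small but nonzero: the float32 computation drifted
```

The same effect shows up when an ONNX graph runs a model's coefficients in float32 while scikit-learn computed the reference in float64.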

Example:

import numpy as np
from sklearn.linear_model import LinearRegression
from yobx.sklearn import to_onnx

X = np.random.randn(10, 3).astype(np.float32)
y = X @ np.array([1.0, 2.0, 3.0], dtype=np.float32)
reg = LinearRegression().fit(X, y)

artifact = to_onnx(reg, (X,))
# Access the raw proto:
proto = artifact.proto
# Save to disk:
artifact.save("model.onnx")
yobx.sklearn.wrap_skl2onnx_converter(skl2onnx_op_converter: Callable) Callable[source]#

Wrap a skl2onnx-style converter function so it can be used with yobx.sklearn.to_onnx() via the extra_converters parameter.

Note

This module contains no skl2onnx imports. Only onnx and numpy (both core yobx dependencies) are used inside the mock helper classes.
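A converter in the skl2onnx calling convention conventionally receives (scope, operator, container) and emits nodes into the container. The sketch below uses a placeholder body (it forwards the input unchanged); the wiring at the end is hypothetical, with MyEstimator standing in for a custom estimator class:

```python
# A converter written in the skl2onnx calling convention: it receives
# (scope, operator, container) and emits ONNX nodes into the container.
# The body is a minimal placeholder; a real converter would emit the
# nodes implementing the estimator's computation.
def my_estimator_converter(scope, operator, container):
    container.add_node(
        "Identity",
        [operator.inputs[0].full_name],
        [operator.outputs[0].full_name],
    )

# Hypothetical wiring through extra_converters:
#
#   from yobx.sklearn import to_onnx, wrap_skl2onnx_converter
#   artifact = to_onnx(
#       est, (X,),
#       extra_converters={MyEstimator: wrap_skl2onnx_converter(my_estimator_converter)},
#   )
```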

class yobx.sklearn.ConvertOptions(decision_leaf: bool | Set[str | type | int | Callable] = False, decision_path: bool | Set[str | type | int | Callable] = False)[source]#

Tunes the way every piece of a model is exported.

Pass an instance of this class to yobx.sklearn.to_onnx() via the convert_options keyword argument to request extra outputs from tree and ensemble estimators.

Parameters:
  • decision_leaf – when True, an extra int64 output tensor is appended containing the zero-based leaf node index reached by each input sample. The shape is (N, 1) for single trees and (N, n_estimators) for ensembles. The option applies to every estimator that implements the decision_path method.

  • decision_path – when True, an extra object (string) output tensor is appended containing the binary root-to-leaf path for each input sample. Each value is a byte-string whose i-th character is '1' if node i was visited and '0' otherwise. The shape is (N, 1) for single trees and (N, n_estimators) for ensembles. The option applies to every estimator that implements the decision_path method.

Class attributes

OPTIONS
Type:

dict[str, Callable]

Mapping from each recognised option name to the predicate implementing it. The recognised names are currently "decision_leaf" and "decision_path"; available_options() returns them as a sorted list.

Example:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from yobx.sklearn import ConvertOptions, to_onnx

X = np.random.randn(20, 4).astype(np.float32)
y = (X[:, 0] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Export with both extra outputs enabled
opts = ConvertOptions(decision_leaf=True, decision_path=True)
artifact = to_onnx(clf, (X,), convert_options=opts)
# The model now produces four outputs:
#   label (int64), probabilities (float32),
#   decision_path (object/string), decision_leaf (int64)

See also

Exporting sklearn tree models with convert options — worked examples for single trees and ensemble models.

OPTIONS = {'decision_leaf': <function ConvertOptions.<lambda>>, 'decision_path': <function ConvertOptions.<lambda>>}#
available_options() Sequence[str][source]#

Returns the list of available options.

has(option_name: str, piece: BaseEstimator, name: str | None = None) bool[source]#

Return True if option option_name is active for estimator piece.

Parameters:
  • option_name – name of the option to query. Must be one of the strings listed in OPTIONS. An AssertionError is raised when an unknown name is passed.

  • piece – the fitted BaseEstimator for which the option is being queried.

  • name – optional pipeline step name for the estimator. When the option attribute is a set, string elements in the set are compared against this name to enable an option for a specific named step inside a Pipeline. If name is None the string elements are ignored. Non-string elements (types, integer object ids, callable predicates) are always checked regardless of name.

Returns:

True when the option is enabled for piece: either the attribute value is True, or the attribute is a set and one of its elements matches piece (or the step name, see the name parameter). False when it is disabled (False or any falsy value, or when no element of the set matches).

Raises:

AssertionError – if option_name is not a member of OPTIONS.
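The matching rules described for the name parameter can be sketched as a small standalone predicate. This is a reimplementation for illustration only, not the library's code; it mirrors the documented behaviour of bool values, string step names, types, integer object ids, and callable predicates:

```python
def option_matches(value, piece, name=None):
    """Sketch of ConvertOptions.has() selector matching (illustrative only)."""
    # bool values enable or disable the option globally
    if isinstance(value, bool):
        return value
    # otherwise `value` is a set of selectors
    for selector in value:
        if isinstance(selector, str):
            # string selectors match the pipeline step name; ignored when name is None
            if name is not None and selector == name:
                return True
        elif isinstance(selector, type):
            # type selectors match by estimator class
            if isinstance(piece, selector):
                return True
        elif isinstance(selector, int):
            # integer selectors match one specific object by id()
            if id(piece) == selector:
                return True
        elif callable(selector):
            # callable selectors are predicates over the estimator
            if selector(piece):
                return True
    return False


class DummyTree:  # stand-in for a fitted estimator
    pass

tree = DummyTree()
assert option_matches(True, tree)
assert option_matches({"tree_step"}, tree, name="tree_step")
assert not option_matches({"tree_step"}, tree)      # name is None: strings ignored
assert option_matches({DummyTree}, tree)            # match by type
assert option_matches({id(tree)}, tree)             # match by object id
```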

class yobx.sklearn.NoKnownOutputMixin[source]#

Mixin for custom sklearn estimators that produce a variable or non-standard number of ONNX outputs.

By default the ONNX converter infrastructure infers the expected output names from the estimator type (classifier, regressor, transformer, …) and from get_feature_names_out(). For estimators whose outputs cannot be determined by those heuristics — for example a transformer that returns multiple named columns — this mixin instructs get_output_names() to return None so that the converter is given full control over how many outputs it registers.

Usage#

Inherit from both BaseEstimator (or any sklearn base class) and NoKnownOutputMixin when writing a custom converter that needs to emit an arbitrary set of ONNX outputs:

import numpy as np

from sklearn.base import BaseEstimator, TransformerMixin
from yobx.sklearn import NoKnownOutputMixin

class MyMultiOutputTransformer(BaseEstimator, TransformerMixin, NoKnownOutputMixin):
    def fit(self, X=None, y=None):
        self.input_dtypes_ = {"a": np.dtype("float32"), "b": np.dtype("float32")}
        return self

    def transform(self, df):
        return df[["a", "b"]].assign(total=df["a"] + df["b"])

    def get_feature_names_out(self, input_features=None):
        return ["a", "b", "total"]

The paired extra_converters entry is then free to call g.make_output(...) for each output without the framework complaining about a mismatched output count.
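The paired converter might then look like the following sketch. The converter signature and the make_node builder method are assumptions for illustration; only make_output(...) is mentioned in the text above:

```python
# Hypothetical converter paired with MyMultiOutputTransformer.
# The (g, estimator, inputs) signature and g.make_node are assumed, not
# documented API; the point is that the converter registers as many
# outputs as it needs, since the mixin reports None output names.
def convert_my_multi_output(g, estimator, inputs):
    a, b = inputs
    total = g.make_node("Add", [a, b])  # assumed builder method
    g.make_output(a)
    g.make_output(b)
    g.make_output(total)
```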

class yobx.sklearn.TraceableMixin[source]#

Marks an estimator as traceable: its transform method is traced to export it into ONNX.
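A transform is a good candidate for tracing when it uses only plain array arithmetic. The sketch below shows such a transformer; in real use the class would also inherit yobx.sklearn.TraceableMixin (and typically sklearn's BaseEstimator / TransformerMixin), both omitted here so the sketch stays self-contained. Whether tracing supports a given numpy operation is not documented here:

```python
import numpy as np

class CenterScale:
    """Transformer whose transform is plain array arithmetic (trace-friendly)."""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0) + 1e-8  # avoid division by zero
        return self

    def transform(self, X):
        # straightforward to trace into ONNX Sub / Div nodes
        return (X - self.mean_) / self.scale_

X = np.arange(12, dtype=np.float32).reshape(4, 3)
Z = CenterScale().fit(X).transform(X)
```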