yobx.sklearn.to_onnx#
- yobx.sklearn.to_onnx(estimator: ~sklearn.base.BaseEstimator, args: ~typing.Tuple[~typing.Any], input_names: ~typing.Sequence[str] | None = None, dynamic_shapes: ~typing.Tuple[~typing.Dict[int, str]] | None = None, target_opset: int | ~typing.Dict[str, int] = 21, verbose: int = 0, builder_cls: type | ~typing.Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, extra_converters: ~typing.Dict[type, ~typing.Callable] | None = None, large_model: bool = False, external_threshold: int = 1024, function_options: ~yobx.xbuilder.function_options.FunctionOptions | None = None, convert_options: ~yobx.typing.ConvertOptionsProtocol | None = None, filename: str | None = None, return_optimize_report: bool = False) ExportArtifact[source]#
Converts a scikit-learn estimator into ONNX. By default, the first dimension is considered as dynamic, the others are static.
- Parameters:
estimator – the scikit-learn estimator to convert
args – dummy inputs; each element may be a numpy array, a pandas.DataFrame, an onnx.ValueInfoProto that explicitly describes the input tensor’s name, element type and shape, or a (name, dtype, shape) tuple. A DataFrame is expanded column-by-column: each column is registered as a separate 1-D ONNX graph input named after the column, and an Unsqueeze + Concat node sequence assembles them back into a 2-D matrix (batch, n_cols) that is passed to the converter. When a ValueInfoProto or a (name, dtype, shape) tuple is provided, no actual data is required, and the dynamic_shapes parameter is ignored for that input (the shape is taken directly from the descriptor). The (name, dtype, shape) tuple format uses a plain string for the name, a numpy dtype (or scalar-type class such as np.float32) for the element type, and a sequence of ints and/or strings for the shape (strings denote symbolic / dynamic dimensions). Example: to_onnx(estimator, (('x', np.float32, ('N', 4)),))
dynamic_shapes – dynamic shapes; if not specified, the first dimension is dynamic, the others are static
target_opset – opset to use; either an integer for the default domain (""), or a dictionary mapping domain names to opset versions, e.g. {"": 20, "ai.onnx.ml": 5}. When "ai.onnx.ml" is set to 5, the converter emits the unified TreeEnsemble operator introduced in that opset instead of the older per-task operators. If it includes {'com.microsoft': 1}, the converted model may include optimized kernels specific to onnxruntime.
verbose – verbosity level
builder_cls – by default the graph builder is a yobx.xbuilder.GraphBuilder, but any builder can be used as long as it implements the APIs described in Shape and type tracking and Building a graph from scratch
extra_converters – optional mapping from estimator type to converter function; entries here take priority over the built-in converters and allow converting custom estimators that are not natively supported
large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer, which lets the user decide later whether weights should be embedded in the model or saved as external data
external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data
function_options – when a FunctionOptions is provided, every non-container estimator is exported as a separate ONNX local function. Pipeline and ColumnTransformer are treated as orchestrators — their individual steps/sub-transformers are each wrapped as a function instead of the container itself. Function names for each step are always derived from the estimator’s class name; the name field of the provided FunctionOptions is not used by this helper to customize function naming. Pass None (the default) to disable function wrapping and produce a flat graph.
convert_options – see yobx.sklearn.ConvertOptions
filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension)
return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics
- Returns:
ExportArtifact wrapping the exported ONNX proto together with an ExportReport.
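The DataFrame expansion described for args can be mimicked in plain numpy: each 1-D column input gets a trailing axis added (the Unsqueeze step) and the results are concatenated along that axis (the Concat step). A minimal sketch of that reassembly, using numpy only — the column names and values are illustrative, not part of the API:

```python
import numpy as np

# Two 1-D "graph inputs", one per DataFrame column.
col_a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
col_b = np.array([4.0, 5.0, 6.0], dtype=np.float32)

# Unsqueeze: add a trailing axis, turning each (batch,) column into (batch, 1).
unsqueezed = [np.expand_dims(c, axis=1) for c in (col_a, col_b)]

# Concat: assemble the (batch, n_cols) matrix handed to the converter.
matrix = np.concatenate(unsqueezed, axis=1)
print(matrix.shape)  # (3, 2)
```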
Note
scikit-learn==1.8 is stricter with computation types, which reduces the number of discrepancies. Switching to float32 in a matrix multiplication usually introduces discrepancies when the coefficients’ order of magnitude is large. That is often the case when a matrix is the inverse of another one. See Float32 vs Float64: precision loss with PLSRegression.
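The precision issue mentioned in the note can be reproduced without any model: multiplying by large-magnitude coefficients in float32 yields results that measurably diverge from the float64 reference. A small illustration — the magnitudes and shapes here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))

# Coefficients with a large order of magnitude (e.g. rows of a matrix
# that is the inverse of a nearly singular one) amplify rounding errors
# once the computation is carried out in float32.
coef = rng.standard_normal(10) * 1e6

y64 = X @ coef                                            # float64 reference
y32 = X.astype(np.float32) @ coef.astype(np.float32)      # float32 computation

# The absolute discrepancy is far above what float64 alone would produce.
max_abs_err = np.abs(y64 - y32.astype(np.float64)).max()
print(max_abs_err > 0.0)  # True
```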
Example:
import numpy as np
from sklearn.linear_model import LinearRegression
from yobx.sklearn import to_onnx

X = np.random.randn(10, 3).astype(np.float32)
y = X @ np.array([1.0, 2.0, 3.0], dtype=np.float32)
reg = LinearRegression().fit(X, y)
artifact = to_onnx(reg, (X,))

# Access the raw proto:
proto = artifact.proto
# Save to disk:
artifact.save("model.onnx")
- yobx.sklearn.wrap_skl2onnx_converter(skl2onnx_op_converter: Callable) Callable[source]#
Wrap a skl2onnx-style converter function so it can be used with yobx.sklearn.to_onnx() via the extra_converters parameter.
Note
This module contains no skl2onnx imports. Only onnx and numpy (both core yobx dependencies) are used inside the mock helper classes.
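Under the hood this kind of wrapper is a plain adapter pattern: it returns a closure that translates one converter calling convention into another, building mock stand-ins for the objects the legacy converter expects. The sketch below only illustrates that pattern — every signature and object shape in it is made up and does not reproduce the real skl2onnx or yobx interfaces:

```python
def wrap_converter(legacy_converter):
    """Adapt a legacy-style converter (scope, operator, container) into a
    new-style converter (graph, estimator, inputs).

    All signatures here are hypothetical; they stand in for the two calling
    conventions being bridged.
    """
    def adapted(graph, estimator, inputs):
        # Build the mock objects the legacy converter expects.
        scope = {"graph": graph}
        operator = {"raw_operator": estimator, "inputs": inputs}
        container = []  # the legacy converter appends its nodes here
        legacy_converter(scope, operator, container)
        return container
    return adapted

# A toy legacy converter that records a single "node".
def legacy(scope, operator, container):
    container.append(("Identity", operator["inputs"]))

new_style = wrap_converter(legacy)
nodes = new_style(graph=None, estimator=None, inputs=["x"])
print(nodes)  # [('Identity', ['x'])]
```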
- class yobx.sklearn.ConvertOptions(decision_leaf: bool | Set[str | type | int | Callable] = False, decision_path: bool | Set[str | type | int | Callable] = False)[source]#
Tunes the way every piece of a model is exported.
Pass an instance of this class to yobx.sklearn.to_onnx() via the convert_options keyword argument to request extra outputs from tree and ensemble estimators.
- Parameters:
decision_leaf – when True, an extra int64 output tensor is appended containing the zero-based leaf node index reached by each input sample. The shape is (N, 1) for single trees and (N, n_estimators) for ensembles. The option triggers for every estimator which implements the decision_path method.
decision_path – when True, an extra object (string) output tensor is appended containing the binary root-to-leaf path for each input sample. Each value is a byte-string whose i-th character is '1' if node i was visited and '0' otherwise. The shape is (N, 1) for single trees and (N, n_estimators) for ensembles. The option triggers for every estimator which implements the decision_path method.
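The decision_path string format described above can be sketched independently of any tree library: given the set of node indices a sample visits, the output is a string whose i-th character marks whether node i was on the root-to-leaf path. A minimal illustration — the node count and visited indices are made up:

```python
def path_string(visited_nodes, n_nodes):
    # '1' if node i was visited, '0' otherwise: the per-sample format
    # used by the decision_path extra output.
    return "".join("1" if i in visited_nodes else "0" for i in range(n_nodes))

# A sample that walks root (0) -> node 2 -> leaf 5 in a 7-node tree.
print(path_string({0, 2, 5}, 7))  # 1010010
```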
Class attributes
- OPTIONS – Sorted list of all recognised option names. Currently ["decision_leaf", "decision_path"].
Example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from yobx.sklearn import ConvertOptions, to_onnx

X = np.random.randn(20, 4).astype(np.float32)
y = (X[:, 0] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Export with both extra outputs enabled
opts = ConvertOptions(decision_leaf=True, decision_path=True)
artifact = to_onnx(clf, (X,), convert_options=opts)

# The model now produces four outputs:
# label (int64), probabilities (float32),
# decision_path (object/string), decision_leaf (int64)
See also
Exporting sklearn tree models with convert options — worked examples for single trees and ensemble models.
- OPTIONS = {'decision_leaf': <function ConvertOptions.<lambda>>, 'decision_path': <function ConvertOptions.<lambda>>}#
- has(option_name: str, piece: BaseEstimator, name: str | None = None) bool[source]#
Return True if option option_name is active for estimator piece.
- Parameters:
option_name – name of the option to query. Must be one of the strings listed in OPTIONS. An AssertionError is raised when an unknown name is passed.
piece – the fitted BaseEstimator for which the option is being queried.
name – optional pipeline step name for the estimator. When the option attribute is a set, string elements in the set are compared against this name to enable an option for a specific named step inside a Pipeline. If name is None, the string elements are ignored. Non-string elements (types, integer object ids, callable predicates) are always checked regardless of name.
- Returns:
True when the option is enabled globally (the attribute value is True), False when it is disabled (False or any falsy value).
- Raises:
AssertionError – if option_name is not a member of OPTIONS.
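The matching rules for set-valued options documented under has() can be restated in a few lines of plain Python. This is only an illustration of the described semantics — not the library's actual implementation — and the Dummy class is a stand-in for a fitted estimator:

```python
def option_matches(option_value, piece, name=None):
    """Illustrative re-statement of the documented has() semantics."""
    if option_value is True:
        return True           # enabled globally
    if not option_value:
        return False          # False or any falsy value: disabled
    for element in option_value:          # a set of selectors
        if isinstance(element, str):
            # String selectors match the pipeline step name, when given.
            if name is not None and element == name:
                return True
        elif isinstance(element, type):
            # Type selectors match the estimator class.
            if isinstance(piece, element):
                return True
        elif isinstance(element, int):
            # Integer selectors match one specific object by id().
            if id(piece) == element:
                return True
        elif callable(element):
            # Callable selectors act as predicates over the estimator.
            if element(piece):
                return True
    return False

class Dummy:
    pass

d = Dummy()
print(option_matches(True, d))                     # True  (global)
print(option_matches({"step1"}, d, name="step1"))  # True  (named step)
print(option_matches({Dummy}, d))                  # True  (by type)
print(option_matches({"step1"}, d, name=None))     # False (name ignored)
```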
- class yobx.sklearn.NoKnownOutputMixin[source]#
Mixin for custom sklearn estimators that produce a variable or non-standard number of ONNX outputs.
By default the ONNX converter infrastructure infers the expected output names from the estimator type (classifier, regressor, transformer, …) and from get_feature_names_out(). For estimators whose outputs cannot be determined by those heuristics — for example a transformer that returns multiple named columns — this mixin instructs get_output_names() to return None so that the converter is given full control over how many outputs it registers.
Usage#
Inherit from both BaseEstimator (or any sklearn base class) and NoKnownOutputMixin when writing a custom converter that needs to emit an arbitrary set of ONNX outputs:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from yobx.sklearn import NoKnownOutputMixin

class MyMultiOutputTransformer(BaseEstimator, TransformerMixin, NoKnownOutputMixin):
    def fit(self, X=None, y=None):
        self.input_dtypes_ = {"a": np.dtype("float32"), "b": np.dtype("float32")}
        return self

    def transform(self, df):
        return df[["a", "b"]].assign(total=df["a"] + df["b"])

    def get_feature_names_out(self, input_features=None):
        return ["a", "b", "total"]
The paired extra_converters entry is then free to call g.make_output(...) for each output without the framework complaining about a mismatched output count.