yobx.sklearn.to_onnx
- yobx.sklearn.to_onnx(estimator: sklearn.base.BaseEstimator, args: Tuple[Any], input_names: Sequence[str] | None = None, dynamic_shapes: Tuple[Dict[int, str]] | None = None, target_opset: int | Dict[str, int] = 21, verbose: int = 0, builder_cls: type | Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, extra_converters: Dict[type, Callable] | None = None, large_model: bool = False, external_threshold: int = 1024, function_options: yobx.xbuilder.function_options.FunctionOptions | None = None, convert_options: yobx.typing.ConvertOptionsProtocol | None = None) → ExportArtifact
Converts a scikit-learn estimator into ONNX. By default, the first dimension is considered dynamic and the others static.
- Parameters:
estimator – the estimator to convert
args – dummy inputs; each element may be a numpy array or an onnx.ValueInfoProto that explicitly describes the input tensor’s name, element type and shape. When a ValueInfoProto is provided, no actual data is required and the dynamic_shapes parameter is ignored for that input (the shape is taken directly from the proto).
dynamic_shapes – dynamic shapes; if not specified, the first dimension is dynamic and the others are static
target_opset – opset to use; either an integer for the default domain (""), or a dictionary mapping domain names to opset versions, e.g. {"": 20, "ai.onnx.ml": 5}. When "ai.onnx.ml" is set to 5, the converter emits the unified TreeEnsemble operator introduced in that opset instead of the older per-task operators. If it includes {'com.microsoft': 1}, the converted model may include optimized kernels specific to onnxruntime.
verbose – verbosity level
builder_cls – by default the graph builder is yobx.xbuilder.GraphBuilder, but any builder can be used as long as it implements the APIs described in Shape and type tracking and Building a graph from scratch
extra_converters – optional mapping from estimator type to converter function; entries here take priority over the built-in converters and allow converting custom estimators that are not natively supported
large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer, which lets the user decide later whether weights should be embedded in the model or saved as external data
external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data
function_options – when a FunctionOptions is provided, every non-container estimator is exported as a separate ONNX local function. Pipeline and ColumnTransformer are treated as orchestrators: their individual steps/sub-transformers are each wrapped as a function instead of the container itself. Function names for each step are always derived from the estimator’s class name; the name field of the provided FunctionOptions is not used by this helper to customize function naming. Pass None (the default) to disable function wrapping and produce a flat graph.
convert_options – see yobx.sklearn.ConvertOptions
- Returns:
ExportArtifact wrapping the exported ONNX proto together with an ExportReport.
Note
scikit-learn==1.8 is stricter with computation types, which reduces the number of discrepancies. Switching to float32 in a matrix multiplication usually introduces discrepancies when the order of magnitude of the coefficients is large; that is often the case when a matrix is the inverse of another one. See Float32 vs Float64: precision loss with PLSRegression.
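The precision loss mentioned in the note can be reproduced with numpy alone. The sketch below (independent of yobx) inverts an ill-conditioned Hilbert matrix, whose inverse has very large coefficients, and compares the round trip H @ inv(H) in float64 and in float32:

```python
import numpy as np

# Hilbert matrix: notoriously ill-conditioned, so its inverse has
# coefficients with a very large order of magnitude (~1e9 for n=8).
n = 8
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
H_inv = np.linalg.inv(H)

# Round trip in float64: close to the identity.
err64 = np.abs(H @ H_inv - np.eye(n)).max()

# Same product after casting to float32: the rounding applied to the
# large coefficients of H_inv is amplified by the matrix multiplication.
err32 = np.abs(
    H.astype(np.float32) @ H_inv.astype(np.float32) - np.eye(n, dtype=np.float32)
).max()

# err32 exceeds err64 by several orders of magnitude.
```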
Example:
import numpy as np
from sklearn.linear_model import LinearRegression
from yobx.sklearn import to_onnx

X = np.random.randn(10, 3).astype(np.float32)
y = X @ np.array([1.0, 2.0, 3.0], dtype=np.float32)
reg = LinearRegression().fit(X, y)

artifact = to_onnx(reg, (X,))

# Access the raw proto:
proto = artifact.proto
# Save to disk:
artifact.save("model.onnx")