# Sklearn Converter
`yobx.sklearn.to_onnx()` converts a fitted scikit-learn estimator into an `onnx.ModelProto`. The conversion is powered by `yobx.xbuilder.GraphBuilder` and follows a registry-based design: each estimator class maps to a dedicated converter function that emits the required ONNX nodes.
## High-level workflow

```text
fitted estimator
        │
        ▼
to_onnx()              ← builds GraphBuilder, looks up converter
        │
        ▼
converter function     ← adds ONNX nodes via GraphBuilder.op.*
        │
        ▼
GraphBuilder.to_onnx() ← validates and returns ModelProto
```
1. `to_onnx` accepts the fitted estimator, representative dummy inputs (used to infer dtype and shape), and optional `input_names` / `dynamic_shapes`.
2. It calls `register_sklearn_converters` (idempotent) to populate the global registry on first use.
3. It constructs a `GraphBuilder` and declares one graph input per dummy array via `make_tensor_input`.
4. It looks up the converter for `type(estimator)` and calls it.
5. Each graph output is declared with `make_tensor_output`.
6. `GraphBuilder.to_onnx` finalises and returns the model.
```python
import numpy as np
from sklearn.preprocessing import StandardScaler

from yobx.sklearn import to_onnx
from yobx.helpers.onnx_helper import pretty_onnx

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)
scaler = StandardScaler().fit(X)

model = to_onnx(scaler, (X,))
print(pretty_onnx(model))
```

```text
opset: domain='' version=21
opset: domain='ai.onnx.ml' version=5
input: name='X' type=dtype('float32') shape=['batch', 4]
init: name='init1_s4_' type=float32 shape=(4,) -- array([-0.448, 0.052, -0.093, 0.247], dtype=float32)-- Opset.make_node.0
init: name='init1_s4_2' type=float32 shape=(4,) -- array([0.774, 0.641, 0.825, 0.728], dtype=float32)-- Opset.make_node.0
Sub(X, init1_s4_) -> _onx_sub_X
Div(_onx_sub_X, init1_s4_2) -> x
output: name='x' type='NOTENSOR' shape=None
```
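The two emitted nodes implement the standardization performed by `StandardScaler.transform`: subtract the stored mean, then divide by the stored scale. A quick standalone numpy check of that arithmetic (it does not use yobx):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)

# StandardScaler stores a per-column mean and scale; the ONNX graph bakes
# them into two initializers and emits Sub then Div.
mean = X.mean(axis=0)
scale = X.std(axis=0)  # StandardScaler uses the biased std by default

transformed = (X - mean) / scale  # Sub(X, mean) then Div(., scale)

# Each column is now centred with unit variance.
assert np.allclose(transformed.mean(axis=0), 0.0, atol=1e-5)
assert np.allclose(transformed.std(axis=0), 1.0, atol=1e-5)
```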
## Converter registry
The registry is a plain module-level dictionary `SKLEARN_CONVERTERS: Dict[type, Callable]` defined in `yobx.sklearn.register`.
### Registering a converter
Use the `register_sklearn_converter` decorator. Pass a single class or a tuple of classes as the first argument:
```python
from yobx.sklearn.register import register_sklearn_converter
from yobx.typing import GraphBuilderExtendedProtocol
from yobx.xbuilder import GraphBuilder


@register_sklearn_converter(MyEstimator)
def convert_my_estimator(
    g: GraphBuilderExtendedProtocol,
    sts: dict,
    outputs: list[str],
    estimator: MyEstimator,
    X: str,
    name: str = "my_estimator",
) -> str:
    ...
```
The decorator raises `TypeError` if a converter is already registered for the same class, preventing accidental double registration.
### Looking up a converter
`get_sklearn_converter` takes a class and returns the registered callable, raising `ValueError` if none is found.
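The registration and lookup behaviour can be sketched in a few lines of plain Python (an illustration of the pattern, not yobx's actual implementation):

```python
from typing import Callable, Dict

SKLEARN_CONVERTERS: Dict[type, Callable] = {}


def register_sklearn_converter(cls):
    """Register a converter for one class (or a tuple of classes)."""
    def wrap(fn: Callable) -> Callable:
        classes = cls if isinstance(cls, tuple) else (cls,)
        for c in classes:
            if c in SKLEARN_CONVERTERS:
                # Double registration is almost always a bug: fail loudly.
                raise TypeError(f"A converter is already registered for {c.__name__!r}.")
            SKLEARN_CONVERTERS[c] = fn
        return fn
    return wrap


def get_sklearn_converter(cls) -> Callable:
    """Return the converter registered for `cls` or raise ValueError."""
    try:
        return SKLEARN_CONVERTERS[cls]
    except KeyError:
        raise ValueError(f"No converter registered for {cls.__name__!r}.") from None


class MyEstimator:  # stand-in for a scikit-learn class
    pass


@register_sklearn_converter(MyEstimator)
def convert_my_estimator(g, sts, outputs, estimator, X, name="my_estimator"):
    return outputs[0]


assert get_sklearn_converter(MyEstimator) is convert_my_estimator
```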
## Converter function signature
Every converter follows the same contract:

```text
(g, sts, outputs, estimator, *input_names, name) → output_name(s)
```
| Parameter | Description |
|---|---|
| `g` | The `GraphBuilder` used to emit ONNX nodes. |
| `sts` | Unused. |
| `outputs` | The expected output tensor names. |
| `estimator` | The fitted scikit-learn object. |
| `*input_names` | One positional `str` per graph input. |
| `name` | String prefix used when generating unique node names. |
The function must return the output tensor name (`str`) for single-output estimators, or a tuple of names for multi-output ones (e.g. classifiers that produce both a label and probabilities).
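Because a converter may return either a single name or a tuple, callers have to normalize the result before declaring outputs. A minimal sketch of that handling (a hypothetical helper, not part of yobx):

```python
def as_output_tuple(result):
    """Normalize a converter's return value to a tuple of tensor names."""
    if isinstance(result, str):
        return (result,)
    return tuple(result)


# Single-output transformer vs. classifier with label + probabilities.
assert as_output_tuple("x") == ("x",)
assert as_output_tuple(("label", "probabilities")) == ("label", "probabilities")
```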
## Output naming
`get_output_names` determines the list of output tensor names for an estimator:

- Transformers that expose `get_feature_names_out()` use those names (collapsed to a common prefix via `longest_prefix` when more than one output is expected).
- Classifiers default to `["label", "probabilities"]`.
- Regressors default to `["predictions"]`.
- Everything else defaults to `["Y"]`.
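A simplified sketch of those rules (standalone Python: `longest_prefix` is approximated with `os.path.commonprefix`, and the fake estimator classes are stand-ins, not yobx's actual code):

```python
import os


def longest_prefix(names):
    """Longest common prefix of the given names (simplified stand-in)."""
    return os.path.commonprefix(list(names))


def get_output_names(estimator, n_outputs=1):
    """Illustrative re-implementation of the naming rules above."""
    if hasattr(estimator, "get_feature_names_out"):
        names = list(estimator.get_feature_names_out())
        if n_outputs == 1 and len(names) > 1:
            # More feature names than expected outputs: collapse to a prefix.
            return [longest_prefix(names)]
        return names[:n_outputs]
    kind = getattr(estimator, "_estimator_type", None)
    if kind == "classifier":
        return ["label", "probabilities"]
    if kind == "regressor":
        return ["predictions"]
    return ["Y"]


class FakeScaler:
    def get_feature_names_out(self):
        return ["standardscaler0", "standardscaler1"]


class FakeClassifier:
    _estimator_type = "classifier"


assert get_output_names(FakeScaler()) == ["standardscaler"]
assert get_output_names(FakeClassifier()) == ["label", "probabilities"]
assert get_output_names(object()) == ["Y"]
```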
## Adding a new converter
To support a new scikit-learn estimator:

1. Create a new file (e.g. `yobx/sklearn/ensemble/random_forest.py`).
2. Implement a converter function following the signature described above.
3. Decorate it with `@register_sklearn_converter(MyEstimator)`.
4. Add an import in the matching `register()` function so the converter is loaded when `register_sklearn_converters` is called.
```python
# yobx/sklearn/ensemble/random_forest.py
from sklearn.ensemble import RandomForestClassifier

from ...typing import GraphBuilderExtendedProtocol
from ..register import register_sklearn_converter


@register_sklearn_converter(RandomForestClassifier)
def convert_random_forest_classifier(
    g: GraphBuilderExtendedProtocol,
    sts: dict,
    outputs: list[str],
    estimator: RandomForestClassifier,
    X: str,
    name: str = "random_forest",
):
    # ... emit ONNX nodes via g.op.*
    ...
```
## Converting Options
Users sometimes need extra outputs from the converted model. This is driven by the class `yobx.sklearn.ConvertOptions`, which exposes all the supported ways to change the default behaviour of the converters.
See Exporting sklearn tree models with convert options for a worked example.
## yobx VS skl2onnx
Both libraries convert scikit-learn models to ONNX in a similar way, but yobx implements a unified way to convert models across different packages and scales more easily than sklearn-onnx:

- yobx can easily be extended to other libraries such as sktorch, since both exporters can use the same `GraphBuilder`.
- It enables onnxruntime optimizations whenever possible.
- It allows the user to export an estimator inside a pipeline as a local function (see Exporting sklearn estimators as ONNX local functions).
- Because the builder is defined through a protocol, the default `GraphBuilder` can be replaced by another one based on packages such as onnxscript or spox.