Sklearn Converter#

yobx.sklearn.to_onnx() converts a fitted scikit-learn estimator into an onnx.ModelProto. The conversion is powered by yobx.xbuilder.GraphBuilder and follows a registry-based design: each estimator class maps to a dedicated converter function that emits the required ONNX nodes.

High-level workflow#

fitted estimator
      │
      ▼
  to_onnx()          ← builds GraphBuilder, looks up converter
      │
      ▼
converter function   ← adds ONNX nodes via GraphBuilder.op.*
      │
      ▼
  GraphBuilder.to_onnx()   ← validates and returns ModelProto

  1. to_onnx accepts the fitted estimator, representative dummy inputs (used to infer dtype and shape), and optional input_names / dynamic_shapes.

  2. It calls register_sklearn_converters (idempotent) to populate the global registry on first use.

  3. It constructs a GraphBuilder and declares one graph input per dummy array via make_tensor_input.

  4. It looks up the converter for type(estimator) and calls it.

  5. Each graph output is declared with make_tensor_output.

  6. GraphBuilder.to_onnx finalises and returns the model.

<<<

import numpy as np
from sklearn.preprocessing import StandardScaler
from yobx.sklearn import to_onnx
from yobx.helpers.onnx_helper import pretty_onnx

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)

scaler = StandardScaler().fit(X)
model = to_onnx(scaler, (X,))
print(pretty_onnx(model))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=5
    input: name='X' type=dtype('float32') shape=['batch', 4]
    init: name='init1_s4_' type=float32 shape=(4,) -- array([-0.448,  0.052, -0.093,  0.247], dtype=float32)-- Opset.make_node.0
    init: name='init1_s4_2' type=float32 shape=(4,) -- array([0.774, 0.641, 0.825, 0.728], dtype=float32)-- Opset.make_node.0
    Sub(X, init1_s4_) -> _onx_sub_X
      Div(_onx_sub_X, init1_s4_2) -> x
    output: name='x' type='NOTENSOR' shape=None

Converter registry#

The registry is a plain module-level dictionary SKLEARN_CONVERTERS: Dict[type, Callable] defined in yobx.sklearn.register.

Registering a converter#

Use the register_sklearn_converter decorator. Pass a single class or a tuple of classes as the first argument:

from yobx.sklearn.register import register_sklearn_converter
from yobx.typing import GraphBuilderExtendedProtocol
from yobx.xbuilder import GraphBuilder

@register_sklearn_converter(MyEstimator)
def convert_my_estimator(
    g: GraphBuilderExtendedProtocol,
    sts: dict,
    outputs: list[str],
    estimator: MyEstimator,
    X: str,
    name: str = "my_estimator",
) -> str:
    ...

The decorator raises TypeError if a converter is already registered for the same class, preventing accidental double-registration.

Looking up a converter#

get_sklearn_converter takes a class and returns the registered callable, raising ValueError if none is found.
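The registration and lookup behaviour described above follows a common Python pattern. The sketch below is a standalone illustration of that pattern (not yobx's actual implementation); it mirrors the documented behaviour: a module-level dictionary, a decorator accepting a class or a tuple of classes, TypeError on double registration, and ValueError on a failed lookup.

```python
from typing import Callable, Dict, Tuple, Union

# Module-level registry mapping estimator classes to converter functions.
SKLEARN_CONVERTERS: Dict[type, Callable] = {}


def register_sklearn_converter(cls: Union[type, Tuple[type, ...]]):
    """Decorator registering a converter for one class or a tuple of classes."""
    classes = cls if isinstance(cls, tuple) else (cls,)

    def wrapper(fct: Callable) -> Callable:
        for c in classes:
            if c in SKLEARN_CONVERTERS:
                # Guard against accidental double registration.
                raise TypeError(f"A converter is already registered for {c}.")
            SKLEARN_CONVERTERS[c] = fct
        return fct

    return wrapper


def get_sklearn_converter(cls: type) -> Callable:
    """Return the converter registered for cls, or raise ValueError."""
    if cls not in SKLEARN_CONVERTERS:
        raise ValueError(f"No converter registered for {cls}.")
    return SKLEARN_CONVERTERS[cls]
```

Keeping the registry a plain dictionary makes the design easy to inspect and extend: supporting a new estimator never requires touching the dispatch code, only adding one decorated function.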

Converter function signature#

Every converter follows the same contract:

(g, sts, outputs, estimator, *input_names, name) → output_name(s)

| Parameter | Description |
| --- | --- |
| g | GraphBuilder — call g.op.<OpType>(…) to emit ONNX nodes. |
| sts | Unused. |
| outputs | List[str] of pre-allocated output tensor names that the converter must write to. |
| estimator | The fitted scikit-learn object. |
| *input_names | One positional str argument per graph input (the tensor name in the graph). |
| name | String prefix used when generating unique node names via g.op. |

The function must return the output tensor name (str) for single-output estimators, or a tuple of names for multi-output ones (e.g. classifiers that produce both a label and probabilities).

Output naming#

get_output_names determines the list of output tensor names for an estimator:

  • Transformers that expose get_feature_names_out() use those names (collapsed to a common prefix via longest_prefix when more than one output is expected).

  • Classifiers default to ["label", "probabilities"].

  • Regressors default to ["predictions"].

  • Everything else defaults to ["Y"].
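These rules amount to simple duck typing on the estimator. The following is a simplified sketch of that selection logic (not yobx's actual code; the common-prefix collapsing step is omitted, and classifier/regressor detection via predict_proba/predict is an assumption):

```python
def get_output_names_sketch(estimator) -> list:
    """Pick default output tensor names from the estimator's interface."""
    if hasattr(estimator, "get_feature_names_out"):
        # Transformer: reuse its feature names. The real implementation
        # collapses multiple names to a common prefix via longest_prefix.
        return list(estimator.get_feature_names_out())
    if hasattr(estimator, "predict_proba"):  # classifier-like
        return ["label", "probabilities"]
    if hasattr(estimator, "predict"):  # regressor-like
        return ["predictions"]
    return ["Y"]
```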

Adding a new converter#

To support a new scikit-learn estimator:

  1. Create a new file (e.g. yobx/sklearn/ensemble/random_forest.py).

  2. Implement a converter function following the signature described above.

  3. Decorate it with @register_sklearn_converter(MyEstimator).

  4. Add an import in the matching register() function so the converter is loaded when register_sklearn_converters is called.

# yobx/sklearn/ensemble/random_forest.py
from sklearn.ensemble import RandomForestClassifier
from ...typing import GraphBuilderExtendedProtocol
from ..register import register_sklearn_converter


@register_sklearn_converter(RandomForestClassifier)
def convert_random_forest_classifier(
    g: GraphBuilderExtendedProtocol,
    sts: dict,
    outputs: list[str],
    estimator: RandomForestClassifier,
    X: str,
    name: str = "random_forest",
):
    # ... emit ONNX nodes via g.op.*
    ...

Converting Options#

The user may need extra outputs from the model. This is controlled by the class yobx.sklearn.ConvertOptions, which exposes all the supported ways to change the default behaviour of the converters. See Exporting sklearn tree models with convert options for a worked example.

yobx vs skl2onnx#

Both libraries convert scikit-learn models to ONNX in a similar way, but yobx implements a unified conversion mechanism across packages and is designed to scale better than sklearn-onnx.

  • yobx can easily be extended to other libraries such as sktorch, since both exporters can use the same GraphBuilder.

  • It enables onnxruntime optimizations whenever possible.

  • It allows the user to export estimators in a pipeline as local functions (see Exporting sklearn estimators as ONNX local functions).

  • Because converters target a protocol, the default GraphBuilder can be replaced by another builder based on other packages such as onnxscript or spox.