.. _l-design-sklearn-converter:

=================
Sklearn Converter
=================

:func:`yobx.sklearn.to_onnx` converts a fitted :epkg:`scikit-learn` estimator
into an :class:`onnx.ModelProto`. The conversion is powered by
:class:`yobx.xbuilder.GraphBuilder` and follows a **registry-based** design:
each estimator class maps to a dedicated converter function that emits the
required ONNX nodes.

High-level workflow
===================

.. code-block:: text

    fitted estimator
          │
          ▼
    to_onnx()               ← builds GraphBuilder, looks up converter
          │
          ▼
    converter function      ← adds ONNX nodes via GraphBuilder.op.*
          │
          ▼
    GraphBuilder.to_onnx()  ← validates and returns ModelProto

1. :func:`to_onnx` accepts the following arguments:

   * the fitted estimator;
   * representative dummy inputs, which are used to infer dtype and shape;
   * the optional ``input_names`` and ``dynamic_shapes``.
2. It calls :func:`register_sklearn_converters` (idempotent) to populate the
   global registry on first use.
3. It constructs a :class:`GraphBuilder` and declares one graph input per
   dummy array via :meth:`make_tensor_input`.
4. It looks up the converter for ``type(estimator)`` and calls it.
5. Each graph output is declared with :meth:`make_tensor_output`.
6. :meth:`GraphBuilder.to_onnx` finalises and returns the model.

The following example converts a fitted ``StandardScaler`` and prints the
resulting graph:

.. runpython::
    :showcode:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from yobx.sklearn import to_onnx
    from yobx.helpers.onnx_helper import pretty_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 4)).astype(np.float32)

    scaler = StandardScaler().fit(X)
    model = to_onnx(scaler, (X,))
    print(pretty_onnx(model))

Converter registry
==================

The registry is a plain module-level dictionary,
``SKLEARN_CONVERTERS: Dict[type, Callable]``, defined in
:mod:`yobx.sklearn.register`.

Registering a converter
-----------------------

Use the :func:`register_sklearn_converter` decorator. Pass a single class or a
tuple of classes as its first argument:

.. code-block:: python

    from sklearn.base import BaseEstimator, TransformerMixin

    from yobx.sklearn.register import register_sklearn_converter
    from yobx.typing import GraphBuilderExtendedProtocol


    class MyEstimator(TransformerMixin, BaseEstimator):
        """Placeholder for the custom estimator to convert."""

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X


    @register_sklearn_converter(MyEstimator)
    def convert_my_estimator(
        g: GraphBuilderExtendedProtocol,
        sts: dict,
        outputs: list[str],
        estimator: MyEstimator,
        X: str,
        name: str = "my_estimator",
    ) -> str:
        # Emit the ONNX nodes with g.op.* and write the result to outputs[0].
        ...

The decorator raises :class:`TypeError` if a converter is already registered
for the same class. This prevents accidental double registration.

Looking up a converter
----------------------

:func:`get_sklearn_converter` takes a class and returns the registered
callable. It raises :class:`ValueError` if no converter is found.

Converter function signature
============================

Every converter follows the same contract:

``(g, sts, outputs, estimator, *input_names, name) → output_name(s)``

================= =========================================================
Parameter         Description
================= =========================================================
``g``             The :class:`GraphBuilder` instance; call the ``g.op.*``
                  methods to emit ONNX nodes.
``sts``           Unused.
``outputs``       ``List[str]`` of pre-allocated output tensor names that
                  the converter **must** write to.
``estimator``     The fitted :epkg:`scikit-learn` object.
``*input_names``  One positional ``str`` argument per graph input (the
                  tensor name in the graph).
``name``          String prefix used when generating unique node names
                  via ``g.op``.
================= =========================================================

The function returns one of the following:

* the output tensor name (``str``) for single-output estimators;
* a tuple of names for multi-output estimators, for example classifiers that
  produce both a label and probabilities.

Output naming
=============

:func:`get_output_names` determines the list of output tensor names for an
estimator:

* **Transformers** that expose ``get_feature_names_out()`` use those names.
  When more than one output is expected, the names are collapsed to a common
  prefix via :func:`longest_prefix`.
* **Classifiers** default to ``["label", "probabilities"]``.
* **Regressors** default to ``["predictions"]``.
* Everything else defaults to ``["Y"]``.

Adding a new converter
======================

To support a new :epkg:`scikit-learn` estimator:

1. Create a new file (e.g. ``yobx/sklearn/ensemble/random_forest.py``).
2. Implement a converter function that follows the signature described above.
3. Decorate it with ``@register_sklearn_converter(MyEstimator)``.
4. Add an import in the matching ``register()`` function.
   The converter is then loaded when :func:`register_sklearn_converters` is
   called.

.. code-block:: python

    # yobx/sklearn/ensemble/random_forest.py
    from sklearn.ensemble import RandomForestClassifier

    from ...typing import GraphBuilderExtendedProtocol
    from ..register import register_sklearn_converter


    @register_sklearn_converter(RandomForestClassifier)
    def convert_random_forest_classifier(
        g: GraphBuilderExtendedProtocol,
        sts: dict,
        outputs: list[str],
        estimator: RandomForestClassifier,
        X: str,
        name: str = "random_forest",
    ) -> tuple[str, str]:
        # Emit the ONNX nodes via g.op.* here. Write the predicted labels to
        # outputs[0] and the class probabilities to outputs[1].
        ...
        return outputs[0], outputs[1]

Conversion options
==================

Users may need extra outputs from the model. These are controlled by
:class:`yobx.sklearn.ConvertOptions`, which exposes every option that changes
the default behaviour of the converters. See
:ref:`l-plot-sklearn-convert-options` for a worked example.

yobx vs. skl2onnx
=================

Both libraries convert :epkg:`scikit-learn` models to ONNX in a similar way.
However, `yobx` provides a single conversion framework shared across packages,
which makes it easier to extend than `sklearn-onnx`:

* `yobx` can be extended to libraries such as :epkg:`sktorch`, because both
  exporters can use the same `GraphBuilder`.
* It enables :epkg:`onnxruntime` optimizations whenever possible.
* It can export the estimators of a pipeline as ONNX local functions (see
  :ref:`l-plot-sklearn-function-options`).
* Converters are typed against a protocol. The default `GraphBuilder` can
  therefore be replaced by another builder based on other packages such as
  :epkg:`onnxscript` or `spox`.
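
A toy, self-contained sketch can illustrate the protocol argument in the last point. It is not yobx code. ``ToyBuilderProtocol``, ``RecordingBuilder``, ``TextBuilder``, ``ToyScaler`` and ``convert_toy_scaler`` are hypothetical names. The protocol's two methods are assumptions chosen for brevity and do not reproduce the real ``GraphBuilderExtendedProtocol``. The converter is written once against the protocol. Two unrelated builder classes then satisfy that protocol structurally, without inheriting from it.

.. code-block:: python

    from typing import Dict, List, Protocol, Tuple, runtime_checkable


    @runtime_checkable
    class ToyBuilderProtocol(Protocol):
        """Hypothetical two-method builder protocol (NOT the yobx protocol)."""

        def make_initializer(self, name: str, value: float) -> str: ...

        def make_node(self, op_type: str, inputs: List[str], outputs: List[str]) -> None: ...


    class RecordingBuilder:
        """Backend 1: keeps initializers and nodes as plain Python data."""

        def __init__(self) -> None:
            self.initializers: Dict[str, float] = {}
            self.nodes: List[Tuple[str, Tuple[str, ...], Tuple[str, ...]]] = []

        def make_initializer(self, name: str, value: float) -> str:
            self.initializers[name] = value
            return name

        def make_node(self, op_type: str, inputs: List[str], outputs: List[str]) -> None:
            self.nodes.append((op_type, tuple(inputs), tuple(outputs)))


    class TextBuilder:
        """Backend 2: renders the same calls as human-readable lines."""

        def __init__(self) -> None:
            self.lines: List[str] = []

        def make_initializer(self, name: str, value: float) -> str:
            self.lines.append(f"{name} = {value}")
            return name

        def make_node(self, op_type: str, inputs: List[str], outputs: List[str]) -> None:
            self.lines.append(f"{', '.join(outputs)} = {op_type}({', '.join(inputs)})")


    class ToyScaler:
        """Hypothetical stand-in for a fitted scaler with mean_ and scale_."""

        def __init__(self, mean: float, scale: float) -> None:
            self.mean_ = mean
            self.scale_ = scale


    def convert_toy_scaler(
        g: ToyBuilderProtocol,
        outputs: List[str],
        estimator: ToyScaler,
        X: str,
        name: str = "scaler",
    ) -> str:
        """Written once against the protocol; emits (X - mean_) / scale_."""
        mean = g.make_initializer(f"{name}_mean", estimator.mean_)
        scale = g.make_initializer(f"{name}_scale", estimator.scale_)
        centered = f"{name}_centered"
        g.make_node("Sub", [X, mean], [centered])
        g.make_node("Div", [centered, scale], outputs)
        return outputs[0]


    # Neither backend inherits from ToyBuilderProtocol. runtime_checkable only
    # verifies that the protocol's methods exist, not their signatures.
    text = TextBuilder()
    assert isinstance(text, ToyBuilderProtocol)
    convert_toy_scaler(text, ["Y"], ToyScaler(1.0, 2.0), "X")
    print("\n".join(text.lines))
    # scaler_mean = 1.0
    # scaler_scale = 2.0
    # scaler_centered = Sub(X, scaler_mean)
    # Y = Div(scaler_centered, scaler_scale)

Passing a ``RecordingBuilder`` instead of a ``TextBuilder`` requires no change to ``convert_toy_scaler``, because the converter depends only on the methods named in the protocol.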