.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples_sklearn/plot_sklearn_function_options.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_sklearn_plot_sklearn_function_options.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_sklearn_plot_sklearn_function_options.py:

.. _l-plot-sklearn-function-options:

Exporting sklearn estimators as ONNX local functions
====================================================

By default :func:`yobx.sklearn.to_onnx` produces a **flat** ONNX graph where every
operator from every estimator is inlined directly in the ``graph`` proto. This is
fine for most use cases, but sometimes you want to keep the high-level structure
visible in the model, for example to make the graph easier to inspect, to share
weights between identical sub-models, or to target a runtime that supports ONNX
local functions natively.

The ``function_options`` argument of :func:`yobx.sklearn.to_onnx` lets you wrap
each estimator's conversion as a separate **ONNX local function** inside the model
proto. Pass a :class:`~yobx.xbuilder.FunctionOptions` instance to enable the
feature:

* Every *leaf* estimator becomes an ONNX ``FunctionProto`` whose name is the
  estimator's Python class name and whose domain is the one you specify.
* :class:`~sklearn.pipeline.Pipeline` and
  :class:`~sklearn.compose.ColumnTransformer` are treated as **orchestrators**:
  the container itself is *not* turned into a function; instead each of its
  steps / sub-transformers is wrapped individually.
* The main graph only contains function-call nodes and the orchestration logic
  (e.g. ``Concat`` for ``ColumnTransformer``).

Passing ``function_options=False`` (the default) reverts to the flat graph.

.. GENERATED FROM PYTHON SOURCE LINES 30-42

.. code-block:: Python

    import numpy as np
    import onnxruntime
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    from yobx.doc import plot_dot
    from yobx.sklearn import to_onnx
    from yobx.xbuilder import FunctionOptions

.. GENERATED FROM PYTHON SOURCE LINES 43-51

1. Build and fit the models
---------------------------

We will demonstrate four scenarios:

* a **standalone** estimator (``StandardScaler``),
* a **Pipeline** with two steps,
* a **ColumnTransformer** with two sub-transformers,
* a **Pipeline** whose first step is a ``ColumnTransformer``.

.. GENERATED FROM PYTHON SOURCE LINES 51-79

.. code-block:: Python

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 4)).astype(np.float32)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    scaler = StandardScaler().fit(X)

    pipe = Pipeline(
        [("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=200))]
    ).fit(X, y)

    ct = ColumnTransformer(
        [("std", StandardScaler(), [0, 1]), ("mms", MinMaxScaler(), [2, 3])]
    ).fit(X)

    pipe_ct = Pipeline(
        [
            (
                "ct",
                ColumnTransformer(
                    [("std", StandardScaler(), [0, 1]), ("mms", MinMaxScaler(), [2, 3])]
                ),
            ),
            ("clf", LogisticRegression(max_iter=200)),
        ]
    ).fit(X, y)

.. GENERATED FROM PYTHON SOURCE LINES 80-91

2. Create FunctionOptions
-------------------------

:class:`~yobx.xbuilder.FunctionOptions` controls how functions are created:

* ``name`` — a placeholder name that is required by the class but overridden per
  estimator (each function gets the estimator's class name).
* ``domain`` — the ONNX domain under which all local functions are registered.
* ``move_initializer_to_constant`` — when ``True`` every weight tensor is embedded
  inside the function body as a ``Constant`` node instead of being threaded
  through as an extra input (recommended for portability).
* ``export_as_function`` — set to ``True`` to activate the export of each
  estimator as a local function, as done in the code below.

.. GENERATED FROM PYTHON SOURCE LINES 91-96

.. code-block:: Python

    fopts = FunctionOptions(
        name="sklearn_op",
        domain="myapp",
        move_initializer_to_constant=True,
        export_as_function=True,
    )

.. GENERATED FROM PYTHON SOURCE LINES 97-103

3. Standalone estimator as a local function
-------------------------------------------

The converted model contains a single ``FunctionProto`` called
``StandardScaler`` in domain ``myapp``. The main graph has only one node — a
call to that function — instead of the usual ``Sub``/``Div`` operators.

.. GENERATED FROM PYTHON SOURCE LINES 103-119

.. code-block:: Python

    onx_scaler = to_onnx(scaler, (X[:1],), function_options=fopts)

    print("=== Standalone StandardScaler ===")
    print(f"Local functions : {[(f.name, f.domain) for f in onx_scaler.functions]}")
    print(f"Main graph nodes: {[(n.op_type, n.domain) for n in onx_scaler.graph.node]}")

    # Verify numerical correctness
    sess = onnxruntime.InferenceSession(
        onx_scaler.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    result = sess.run(None, {"X": X})[0]
    expected = scaler.transform(X).astype(np.float32)
    assert np.allclose(expected, result, atol=1e-5), "Standalone scaler mismatch!"
    print("Numerical output matches sklearn ✓")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === Standalone StandardScaler ===
    Local functions : [('StandardScaler', 'myapp')]
    Main graph nodes: [('StandardScaler', 'myapp')]
    Numerical output matches sklearn ✓

.. GENERATED FROM PYTHON SOURCE LINES 120-125

4. Pipeline: each step becomes a separate function
--------------------------------------------------

The ``Pipeline`` container itself is **not** wrapped; each step gets its own
``FunctionProto``. The main graph chains two function-call nodes.

.. GENERATED FROM PYTHON SOURCE LINES 125-146

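Conceptually, this chained structure is plain function composition. Here is a
runtime-free numpy sketch of a scaler feeding a logistic regression — the
parameter values are made up stand-ins, not the fitted ones from this example:

```python
import numpy as np

# Hypothetical stand-ins for fitted parameters.
mean, scale = np.array([0.1, -0.2]), np.array([1.5, 0.8])
coef, intercept = np.array([[0.7, -1.1]]), np.array([0.05])

# Each "local function" is a self-contained callable...
def standard_scaler(X):
    return (X - mean) / scale

def logistic_regression(X):
    logits = X @ coef.T + intercept
    proba_1 = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    return np.hstack([1.0 - proba_1, proba_1])   # [P(y=0), P(y=1)]

# ...and the "main graph" is just the chain of calls.
X = np.array([[0.3, 0.9], [-1.2, 0.4]])
proba = logistic_regression(standard_scaler(X))
print(proba.shape)  # (2, 2), rows sum to 1
```

The ONNX model built next has exactly this shape: two function definitions and a
main graph that wires the output of one call into the input of the other.
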
.. code-block:: Python

    onx_pipe = to_onnx(pipe, (X[:1],), function_options=fopts)

    print("\n=== Pipeline ===")
    print(f"Local functions : {[f.name for f in onx_pipe.functions]}")
    main_ops = [n.op_type for n in onx_pipe.graph.node]
    print(f"Main graph nodes: {main_ops}")
    assert "Sub" not in main_ops, "Raw scaler ops should not be in the main graph"
    assert "Gemm" not in main_ops, "Raw LR ops should not be in the main graph"

    sess_pipe = onnxruntime.InferenceSession(
        onx_pipe.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    X_test = rng.standard_normal((20, 4)).astype(np.float32)
    label_onnx, proba_onnx = sess_pipe.run(None, {"X": X_test})
    assert np.array_equal(pipe.predict(X_test), label_onnx), "Label mismatch!"
    assert np.allclose(
        pipe.predict_proba(X_test).astype(np.float32), proba_onnx, atol=1e-5
    ), "Proba mismatch!"
    print("Pipeline labels and probabilities match sklearn ✓")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === Pipeline ===
    Local functions : ['StandardScaler', 'LogisticRegression']
    Main graph nodes: ['StandardScaler', 'LogisticRegression']
    Pipeline labels and probabilities match sklearn ✓

.. GENERATED FROM PYTHON SOURCE LINES 147-152

5. ColumnTransformer: each sub-transformer becomes a function
-------------------------------------------------------------

The orchestration logic (``Gather`` + ``Concat``) stays in the main graph; only
the two leaf transformers become functions.

.. GENERATED FROM PYTHON SOURCE LINES 152-171

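The ``Gather``/apply/``Concat`` orchestration can be sketched in numpy alone.
The two per-block transforms below are hypothetical stand-ins for the fitted
``StandardScaler`` and ``MinMaxScaler`` function calls:

```python
import numpy as np

X = np.arange(12, dtype=np.float32).reshape(3, 4)

# Stand-ins for the two local-function calls.
std = lambda A: (A - A.mean(axis=0)) / A.std(axis=0)
mms = lambda A: (A - A.min(axis=0)) / (A.max(axis=0) - A.min(axis=0))

# The orchestration that stays in the main graph:
# select columns [0, 1] and [2, 3], call each transform, then concatenate.
block_std = std(np.take(X, [0, 1], axis=1))           # ~ Gather + StandardScaler
block_mms = mms(np.take(X, [2, 3], axis=1))           # ~ Gather + MinMaxScaler
out = np.concatenate([block_std, block_mms], axis=1)  # ~ Concat
print(out.shape)  # (3, 4)
```

Only the column selection and the final concatenation are "owned" by the
container; everything estimator-specific lives inside the function bodies.
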
.. code-block:: Python

    onx_ct = to_onnx(ct, (X[:1],), function_options=fopts)

    print("\n=== ColumnTransformer ===")
    print(f"Local functions : {[f.name for f in onx_ct.functions]}")
    ct_ops = [n.op_type for n in onx_ct.graph.node]
    print(f"Main graph nodes: {ct_ops}")
    assert "Concat" in ct_ops, "Concat must remain in main graph for CT orchestration"
    assert "Sub" not in ct_ops, "Raw scaler ops should not be in the main graph"

    X_ct_test = rng.standard_normal((15, 4)).astype(np.float32)
    sess_ct = onnxruntime.InferenceSession(
        onx_ct.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    result_ct = sess_ct.run(None, {"X": X_ct_test})[0]
    expected_ct = ct.transform(X_ct_test).astype(np.float32)
    assert np.allclose(expected_ct, result_ct, atol=1e-5), "CT output mismatch!"
    print("ColumnTransformer output matches sklearn ✓")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === ColumnTransformer ===
    Local functions : ['StandardScaler', 'MinMaxScaler']
    Main graph nodes: ['Gather', 'StandardScaler', 'Gather', 'MinMaxScaler', 'Concat']
    ColumnTransformer output matches sklearn ✓

.. GENERATED FROM PYTHON SOURCE LINES 172-177

6. Pipeline and ColumnTransformer
---------------------------------

Nesting both containers combines the two behaviours: every leaf estimator
becomes a function, while all orchestration logic stays in the main graph. The
flat graph (default) would inline all operators; the function graph keeps the
high-level structure visible in the main graph proto.

.. GENERATED FROM PYTHON SOURCE LINES 177-193

.. code-block:: Python

    onx_pipe_ct = to_onnx(pipe_ct, (X[:1],), function_options=fopts)

    print("\n=== Pipeline and ColumnTransformer ===")
    print(f"Local functions : {[f.name for f in onx_pipe_ct.functions]}")
    ct_ops = [n.op_type for n in onx_pipe_ct.graph.node]
    print(f"Main graph nodes: {ct_ops}")

    X_ct_test = rng.standard_normal((15, 4)).astype(np.float32)
    sess_ct = onnxruntime.InferenceSession(
        onx_pipe_ct.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    result_ct = sess_ct.run(None, {"X": X_ct_test})[1]
    expected_ct = pipe_ct.predict_proba(X_ct_test).astype(np.float32)
    assert np.allclose(expected_ct, result_ct, atol=1e-5), "Pipeline+CT output mismatch!"
    print("Pipeline+ColumnTransformer output matches sklearn ✓")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === Pipeline and ColumnTransformer ===
    Local functions : ['StandardScaler', 'MinMaxScaler', 'LogisticRegression']
    Main graph nodes: ['Gather', 'StandardScaler', 'Gather', 'MinMaxScaler', 'Concat', 'LogisticRegression']
    Pipeline+ColumnTransformer output matches sklearn ✓

.. GENERATED FROM PYTHON SOURCE LINES 194-198

7. Visualize the function graph
-------------------------------

The main graph of the combined model shows the three function-call nodes
together with the ``Gather``/``Concat`` orchestration.

.. GENERATED FROM PYTHON SOURCE LINES 198-200

.. code-block:: Python

    plot_dot(onx_pipe_ct)

.. image-sg:: /auto_examples_sklearn/images/sphx_glr_plot_sklearn_function_options_001.png
   :alt: plot sklearn function options
   :srcset: /auto_examples_sklearn/images/sphx_glr_plot_sklearn_function_options_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.395 seconds)

.. _sphx_glr_download_auto_examples_sklearn_plot_sklearn_function_options.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sklearn_function_options.ipynb <plot_sklearn_function_options.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sklearn_function_options.py <plot_sklearn_function_options.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_sklearn_function_options.zip <plot_sklearn_function_options.zip>`

.. include:: plot_sklearn_function_options.recommendations

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_