.. _l-design-expected-api:

============
Expected API
============

:func:`yobx.sklearn.to_onnx` accepts a ``builder_cls`` parameter that defaults
to :class:`yobx.xbuilder.GraphBuilder`. Any object can be substituted as long
as it exposes the two-part API described on this page. The API is split into
two groups that mirror the cross-references used in the source code:

* **Construction API** (:ref:`builder-api-make`) — methods to declare inputs,
  outputs, initializers, and nodes, and to export the finished graph.
* **Shape / type API** (:ref:`builder-api`) — methods to attach and query
  shape and type metadata on intermediate tensors.

An alternative bridge implementation, :class:`OnnxScriptGraphBuilder`, shows
how the same API can be satisfied on top of ``onnxscript``'s IR.

When any ``ONNXSTOP*`` variable triggers an exception, the resulting
**stack trace points to the exact line of converter code** that first
assigned a type or shape to that result.

**Why use strings to refer to intermediate results?**
A user usually only sees the final model and can only investigate an issue
based on the names they read. Keeping **explicit, stable names** for
intermediate results in converter code makes it easy to locate the code
where a given name appears. With that in mind, a protocol object wrapping
each value seems unnecessary, and the creation of the final name should not
be delayed. That makes it easier to investigate issues such as those exposed
in :ref:`l-design-sklearn-debug-env-vars`.

Construction API
================

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Method / attribute
     - Description
   * - ``__init__(target_opset or existing ModelProto or FunctionProto)``
     - Constructor. *target_opset* is either an ``int`` (main domain) or a
       ``Dict[str, int]`` mapping domain names to versions.
   * - ``make_tensor_input(name, elem_type, shape, device=-1)``
     - Declare a graph input tensor. *elem_type* is an
       ``onnx.TensorProto.*`` integer constant.
       *shape* is a tuple whose elements are integers (static) or strings
       (symbolic/dynamic). *device* is ``-1`` for CPU.
   * - ``make_tensor_output(name, indexed=False, allow_untyped_output=True)``
     - Declare a graph output. When ``indexed=False`` the name is used
       verbatim; set ``True`` when the output name is generated by the
       builder (e.g. ``"output_0"``, ``"output_1"`` …).
   * - ``make_initializer(name, value, source="")``
     - Add a constant tensor to the graph. *value* can be a
       :class:`numpy.ndarray`, a scalar, or an :class:`onnx.TensorProto`.
       Returns the name that was assigned (it may differ from *name* if the
       builder deduplicates identical constants).
   * - ``make_node(op_type, inputs, num_outputs, *, domain="", name="", **attrs)``
     - Low-level node creation. Returns a sequence of output tensor name(s).
   * - ``op.<OperatorName>(*inputs, **attrs)``
     - Convenience shorthand: ``g.op.Relu("X")`` is equivalent to
       ``g.make_node("Relu", ["X"], 1)``. Inline numpy arrays are
       automatically promoted to initializers.
   * - ``to_onnx(...)``
     - Finalise and return an :class:`onnx.ModelProto`.

Minimal example
---------------

The snippet below builds the same ``Sub`` / ``Div`` graph emitted by the
``StandardScaler`` converter, using the default :class:`GraphBuilder`:

.. runpython::
    :showcode:

    import numpy as np
    import onnx
    from yobx.xbuilder import GraphBuilder, OptimizationOptions
    from yobx.helpers.onnx_helper import pretty_onnx

    TFLOAT = onnx.TensorProto.FLOAT

    opts = OptimizationOptions(constant_folding=False)
    g = GraphBuilder(20, ir_version=10, optimization_options=opts)
    g.make_tensor_input("X", TFLOAT, ("batch", 4))

    mean = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
    scale = np.array([0.5, 1.0, 2.0, 4.0], dtype=np.float32)
    mean_name = g.make_initializer("mean", mean)
    scale_name = g.make_initializer("scale", scale)

    centered = g.op.Sub("X", mean_name)
    g.set_type(centered, TFLOAT)
    g.set_shape(centered, ("batch", 4))

    result = g.op.Div(centered, scale_name)
    g.set_type(result, TFLOAT)
    g.set_shape(result, ("batch", 4))

    g.make_tensor_output(result, indexed=False, allow_untyped_output=True)
    model = g.to_onnx()
    print(pretty_onnx(model))

Opset API
=========

Converters frequently need to know which opset versions are active so they
can choose the right operator variant or register an additional domain
(e.g. ``"ai.onnx.ml"`` for scikit-learn models).

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Method / attribute
     - Description
   * - ``main_opset``
     - Read-only property. Returns the opset version for the main ONNX
       domain (``""``). Equivalent to ``g.opsets[""]``.
   * - ``has_opset(domain)``
     - Returns the opset version (an ``int``) for *domain*, or ``0`` if the
       domain is not registered. Because ``0`` is falsy and any valid
       version is truthy, the return value can be used directly in a
       boolean context: ``if g.has_opset("ai.onnx.ml"): ...``.
   * - ``get_opset(domain, exc=True)``
     - Returns the opset version for *domain*. When ``exc=True`` (default)
       an ``AssertionError`` is raised if the domain is not registered; set
       ``exc=False`` to get ``0`` instead.
   * - ``set_opset(domain, version=1)``
     - Registers *domain* with the given *version*.
       If the domain is already registered with the same version the call
       is a no-op; a version mismatch raises an ``AssertionError``.
   * - ``add_domain(domain, version=1)``
     - Deprecated alias for ``set_opset``.

A converter that targets the main ONNX domain only needs to read
``g.main_opset``. A converter that also emits nodes from a secondary domain
(e.g. ``"ai.onnx.ml"``) should first call ``g.set_opset(domain, version)``
to ensure the domain is recorded in the exported model, then query its
version with ``g.get_opset(domain)``.

.. code-block:: python

    from yobx.typing import GraphBuilderExtendedProtocol
    from yobx.xbuilder import GraphBuilder


    def convert_my_estimator(
        g: GraphBuilderExtendedProtocol, sts, outputs, estimator, X
    ):
        # Read the main opset to pick the right operator variant.
        opset = g.main_opset

        # Register and query the ai.onnx.ml domain when needed.
        g.set_opset("ai.onnx.ml", 3)
        ml_opset = g.get_opset("ai.onnx.ml")

        # Check whether an optional domain is already registered.
        if g.has_opset("com.microsoft"):
            result = g.op.MicrosoftOp(X)
        elif opset >= 20:
            result = g.op.SomeNewOp(X)
        else:
            result = g.op.SomeLegacyOp(X)
        ...
        return result

Shape and type API
==================

Converters are expected to propagate shape and type information after each
node so that downstream converters (e.g. pipeline steps) can query them
without re-running inference. The generated model may differ depending on
that information. The required methods are listed below, where ``g`` is the
``GraphBuilder`` instance implementing the expected API.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Method
     - Description
   * - ``g.set_type(name, itype)``
     - Register the element type (an ``onnx.TensorProto.*`` integer) for
       tensor ``name``.
   * - ``g.get_type(name)``
     - Return the previously registered element type.
   * - ``g.has_type(name)``
     - Return ``True`` if the element type is known.
   * - ``g.set_shape(name, shape)``
     - Register the shape for tensor ``name``. Dimensions may be integers
       (static) or strings (symbolic).
   * - ``g.get_shape(name)``
     - Return the shape as a tuple of integers or strings.
   * - ``g.has_shape(name)``
     - Return ``True`` if the shape is known.
   * - ``g.set_device(name, device)``
     - Register the device for tensor ``name`` (``-1`` = CPU).
   * - ``g.get_device(name)``
     - Return the device.
   * - ``g.has_device(name)``
     - Return ``True`` if a device is registered for the tensor.

In addition, it is usually useful to implement the following methods.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Method
     - Description
   * - ``g.unique_name(prefix)``
     - Return a name that starts with *prefix* and is not yet in use
       anywhere in the graph.
   * - ``g.set_type_shape_unary_op(name, input_name, itype: int = None)``
     - Register for ``name`` the same shape, type, and device as the ones
       registered for ``input_name``; ``itype`` can be used to override the
       type.

The current API does not include common operations on shapes
(``+``, ``-``, ``//``, ``*``, ``%``, ``min``, ``max``) or their
simplification. Those are usually needed to optimize models but are not
mandatory to write the model itself, so they are left to the builder.
A converter usually needs to know the type, the device, the rank, and
sometimes whether a dimension is static or dynamic.

Shape and type representation
-----------------------------

This API follows the ONNX standard.

* A name is a string: it is a unique identifier.
* A type is an integer: one of the element types supported by
  ``onnx.TensorProto``.
* A shape is a tuple, empty or filled with integers (static dimensions) or
  strings (dynamic dimensions).

Additionally:

* A device is an integer: ``-1`` for CPU, a value ``>= 0`` for a CUDA
  device.
* A rank is an integer equal to ``len(shape)``.

Propagating shape and type in a converter
-----------------------------------------

The canonical pattern at the end of every converter is:

.. code-block:: python

    result = g.op.Relu(X, name=name)
    g.set_type_shape_unary_op(result, X)
    return result

The helper :meth:`set_type_shape_unary_op` combines the separate ``set_*``
calls into one.
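For illustration, the shape / type bookkeeping above can be reduced to a
small dict-backed sketch. This is *not* the real ``GraphBuilder``
implementation, only a hypothetical stand-in showing how the
``set_* / get_* / has_*`` triplets and ``set_type_shape_unary_op`` relate
to each other:

```python
TFLOAT = 1  # value of onnx.TensorProto.FLOAT


class ShapeTypeStore:
    """Hypothetical dict-backed sketch of the shape / type part of the API."""

    def __init__(self):
        self._types = {}    # name -> onnx.TensorProto.* integer
        self._shapes = {}   # name -> tuple of int (static) or str (dynamic)
        self._devices = {}  # name -> int, -1 for CPU

    def set_type(self, name, itype):
        self._types[name] = itype

    def get_type(self, name):
        return self._types[name]

    def has_type(self, name):
        return name in self._types

    def set_shape(self, name, shape):
        self._shapes[name] = tuple(shape)

    def get_shape(self, name):
        return self._shapes[name]

    def has_shape(self, name):
        return name in self._shapes

    def set_device(self, name, device):
        self._devices[name] = device

    def get_device(self, name):
        return self._devices[name]

    def has_device(self, name):
        return name in self._devices

    def set_type_shape_unary_op(self, name, input_name, itype=None):
        # Copy the metadata registered for input_name onto name,
        # optionally overriding the element type with itype.
        self.set_type(name, itype if itype is not None else self.get_type(input_name))
        if self.has_shape(input_name):
            self.set_shape(name, self.get_shape(input_name))
        if self.has_device(input_name):
            self.set_device(name, self.get_device(input_name))


g = ShapeTypeStore()
g.set_type("X", TFLOAT)
g.set_shape("X", ("batch", 4))
g.set_device("X", -1)
g.set_type_shape_unary_op("Y", "X")
print(g.get_shape("Y"))  # prints ('batch', 4)
```

Note how the dynamic dimension ``"batch"`` and the rank survive the copy;
this is exactly the information converters are expected to preserve.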
Convert Options
===============

:class:`~yobx.typing.ConvertOptionsProtocol` is a lightweight protocol that
lets callers **opt in to extra outputs** on a per-estimator basis without
changing the core converter signatures.

Protocol contract
-----------------

Any object that implements the single method below satisfies the protocol
and can be passed to :func:`~yobx.sklearn.to_onnx` as the
``convert_options`` argument:

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Method
     - Description
   * - ``has(option_name: str, piece: object) -> bool``
     - Return ``True`` when the option identified by *option_name* should
       be activated for the estimator *piece*. The second argument is the
       fitted scikit-learn estimator currently being converted, which lets
       callers enable an option only for a specific step inside a
       :class:`~sklearn.pipeline.Pipeline`.

Inside a converter, the options object is accessible via the graph
builder's ``convert_options`` property (``g.convert_options``):

.. code-block:: python

    # Inside a converter function:
    if g.convert_options.has("decision_path", estimator):
        # emit the extra decision-path output
        ...

Built-in options: ``ConvertOptions``
------------------------------------

:class:`~yobx.sklearn.ConvertOptions` is the default implementation shipped
with the package. It currently exposes two boolean flags:

.. list-table::
   :header-rows: 1
   :widths: 25 20 55

   * - Option name
     - Type
     - Description
   * - ``decision_path``
     - ``bool``
     - When ``True``, an extra output tensor is appended for each
       tree/ensemble estimator. For a single
       :class:`~sklearn.tree.DecisionTreeClassifier` or
       :class:`~sklearn.tree.DecisionTreeRegressor` the shape is
       ``(N, 1)``; for ensemble models
       (:class:`~sklearn.ensemble.RandomForestClassifier`,
       :class:`~sklearn.ensemble.RandomForestRegressor`,
       :class:`~sklearn.ensemble.ExtraTreesClassifier`,
       :class:`~sklearn.ensemble.ExtraTreesRegressor`) the shape is
       ``(N, n_estimators)``.
       Each value is a binary string encoding the root-to-leaf path through
       the tree.
   * - ``decision_leaf``
     - ``bool``
     - When ``True``, an extra output tensor (``int64``) is appended
       containing the zero-based leaf node index reached by each sample.
       Shapes follow the same convention as ``decision_path``.

Passing options to ``to_onnx``
------------------------------

.. code-block:: python

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from yobx.sklearn import to_onnx, ConvertOptions

    X = np.random.default_rng(0).standard_normal((20, 4)).astype(np.float32)
    y = (X[:, 0] > 0).astype(int)
    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

    opts = ConvertOptions(decision_path=True)
    model_onnx = to_onnx(clf, (X,), convert_options=opts)

    # The exported model now has three outputs:
    #   output_0 – label (int64, shape [N])
    #   output_1 – probabilities (float32, shape [N, 2])
    #   output_2 – decision path (object/string, shape [N, 1])

Implementing a custom protocol
------------------------------

You can supply any object with a ``has`` method. The simplest way is to
subclass :class:`~yobx.typing.DefaultConvertOptions` and override ``has``:

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier

    from yobx.typing import DefaultConvertOptions


    class MyOptions(DefaultConvertOptions):
        def has(self, option_name: str, piece: object) -> bool:
            # Only enable decision_leaf for RandomForestClassifier.
            if option_name == "decision_leaf":
                return isinstance(piece, RandomForestClassifier)
            return False

Alternatively, any object whose class implements the single-method
:class:`~yobx.typing.ConvertOptionsProtocol` is accepted directly.

See :ref:`l-plot-sklearn-convert-options` for a full runnable example of
``decision_path`` and ``decision_leaf`` on single trees and ensembles.

Alternative implementations
===========================

Any class that satisfies the two-part API above can be passed as
``builder_cls``.
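To get a feel for how little the construction side of the contract requires,
the sketch below is a hypothetical toy builder (names and record formats are
illustrative, not the package's internals). It only records declared inputs,
initializers, and nodes, and hands out unique result names:

```python
class ToyBuilder:
    """Hypothetical toy builder recording construction API calls."""

    def __init__(self, target_opset: int):
        self.opsets = {"": target_opset}
        self.inputs = []        # (name, elem_type, shape, device)
        self.outputs = []       # declared output names
        self.initializers = {}  # name -> value
        self.nodes = []         # (op_type, inputs, outputs, domain, attrs)
        self._used = set()

    def unique_name(self, prefix: str) -> str:
        # Append a counter until the name is unused anywhere in the graph.
        name, i = prefix, 0
        while name in self._used:
            i += 1
            name = f"{prefix}_{i}"
        self._used.add(name)
        return name

    def make_tensor_input(self, name, elem_type, shape, device=-1):
        self._used.add(name)
        self.inputs.append((name, elem_type, shape, device))
        return name

    def make_initializer(self, name, value, source=""):
        name = self.unique_name(name)
        self.initializers[name] = value
        return name

    def make_node(self, op_type, inputs, num_outputs, *, domain="", name="", **attrs):
        outputs = [
            self.unique_name(f"{op_type.lower()}_out{i}") for i in range(num_outputs)
        ]
        self.nodes.append((op_type, list(inputs), outputs, domain, attrs))
        return outputs

    def make_tensor_output(self, name, indexed=False, allow_untyped_output=True):
        self.outputs.append(name)


g = ToyBuilder(20)
g.make_tensor_input("X", 1, ("batch", 4))  # 1 == onnx.TensorProto.FLOAT
m = g.make_initializer("mean", [1.0, 2.0, 3.0, 4.0])
(sub,) = g.make_node("Sub", ["X", m], 1)
g.make_tensor_output(sub)
```

A real builder would additionally serialize these records into an
:class:`onnx.ModelProto` in ``to_onnx`` and implement the shape / type API,
but this is the entire surface the converters call.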
The package ships with:

* :class:`GraphBuilder` — the default; builds graphs using onnx protobuf
  objects with built-in optimization passes.
* :class:`OnnxScriptGraphBuilder` — a bridge that satisfies the same API
  while using the ``onnxscript`` IR internally. Useful when the rest of the
  pipeline already works with onnxscript.

.. runpython::
    :showcode:

    import numpy as np
    import onnx
    from sklearn.preprocessing import StandardScaler

    from yobx.sklearn import to_onnx
    from yobx.builder.onnxscript import OnnxScriptGraphBuilder
    from yobx.helpers.onnx_helper import pretty_onnx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 4)).astype(np.float32)
    scaler = StandardScaler().fit(X)

    model = to_onnx(scaler, (X,), builder_cls=OnnxScriptGraphBuilder)
    print(pretty_onnx(model))

.. seealso::

    :ref:`l-design-sklearn-converter` — overview of the built-in converters.

    :ref:`l-design-sklearn-custom-converter` — how to write and register a
    custom converter.

    :ref:`l-design-graph-builder` — the full :class:`GraphBuilder` reference,
    including optimization passes and dynamic shapes.