Numpy-Tracing and FunctionTransformer#

yobx can export a FunctionTransformer to ONNX by tracing its func attribute: the function is re-executed with lightweight proxy objects instead of real numpy arrays. Every numpy operation performed on those proxies is recorded as an ONNX node, so the resulting ONNX graph exactly mirrors the Python code — without any manual operator mapping.

Overview#

The mechanism consists of two layers:

  1. :class:`~yobx.xtracing.NumpyArray` — a proxy that wraps an ONNX tensor name and an object following the GraphBuilderExtendedProtocol. It overloads all Python arithmetic operators and registers itself as an implementation for both the __array_ufunc__ and __array_function__ numpy protocols. Whenever numpy (or user code) calls an operation on a NumpyArray, the proxy emits the equivalent ONNX node into the graph and returns a new NumpyArray wrapping the result tensor name.

  2. :func:`~yobx.xtracing.trace_numpy_function` — the converter-API function. It receives an object following the GraphBuilderExtendedProtocol, the desired output names, the callable to trace, and the names of the input tensors already registered in that graph. It wraps those tensors as NumpyArray proxies, calls the function, and collects the resulting output tensors.

The high-level helper trace_numpy_to_onnx() creates a standalone onnx.ModelProto by building a fresh graph, registering sample-array-derived inputs, and delegating to trace_numpy_function().
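The core idea — a proxy that records operations instead of computing them — can be shown with a minimal, self-contained sketch. This is an illustration of the mechanism, not the real yobx NumpyArray: the class name, the tensor-naming scheme, and the tiny ufunc→op table are all invented for the example.

```python
import numpy as np


class TracingProxy:
    """Minimal sketch of a tracing proxy (NOT the real yobx NumpyArray).

    Each numpy operation on the proxy is recorded as a
    (op_type, input_names, output_name) triple instead of being computed.
    """

    # Illustrative subset of a ufunc-name -> ONNX-op-type mapping.
    _UFUNC_TO_OP = {"absolute": "Abs", "add": "Add", "log": "Log"}

    def __init__(self, name, graph):
        self.name = name    # tensor name this proxy stands for
        self.graph = graph  # shared list acting as the node recorder

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        op = self._UFUNC_TO_OP.get(ufunc.__name__, ufunc.__name__)
        out = f"{op.lower()}_{len(self.graph)}"
        names = [i.name if isinstance(i, TracingProxy) else repr(i) for i in inputs]
        self.graph.append((op, names, out))
        return TracingProxy(out, self.graph)

    def __add__(self, other):
        # Route the Python "+" operator through the ufunc protocol.
        return np.add(self, other)


graph = []
x = TracingProxy("X", graph)
np.log(np.abs(x) + 1.0)  # traced, not computed
for node in graph:
    print(node)
# ('Abs', ['X'], 'abs_0')
# ('Add', ['abs_0', '1.0'], 'add_1')
# ('Log', ['add_1'], 'log_2')
```

The real proxy does the same thing, except that each recorded triple becomes an actual ONNX node emitted into the graph builder.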

Converter API signature#

trace_numpy_function() follows the same convention as every other converter in this package:

def trace_numpy_function(
    g: GraphBuilderExtendedProtocol,
    sts: Dict,
    outputs: List[str],
    func: Callable,
    inputs: List[str],
    name: str = "trace",
    kw_args: Optional[Dict[str, Any]] = None,
) -> str: ...

Parameter   Description

g           GraphBuilderExtendedProtocol; call g.op.<Op>(…) to emit ONNX nodes.
sts         Dict of metadata (empty {} in most call sites).
outputs     Pre-allocated output tensor names the tracer must write to.
func        Python callable that uses numpy operations.
inputs      Names of tensors already registered in g.
name        Node-name prefix.
kw_args     Optional keyword arguments forwarded to func.

FunctionTransformer converter#

The built-in converter for FunctionTransformer lives in yobx.sklearn.preprocessing.function_transformer. It delegates directly to trace_numpy_function(), so all numpy ops land in the surrounding graph with no sub-model inlining:

<<<

from sklearn.preprocessing import FunctionTransformer
import numpy as np
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.sklearn import to_onnx


def my_func(X):
    return np.log1p(np.abs(X))


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)

transformer = FunctionTransformer(func=my_func).fit(X)
onx = to_onnx(transformer, (X,))
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=5
    input: name='X' type=dtype('float32') shape=['batch', 4]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Log(_onx_add_abs_X) -> Y
    output: name='Y' type='NOTENSOR' shape=None

When func=None (the identity transformer), a single Identity node is emitted instead of tracing.
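The shape of that fallback branch can be sketched as follows. FakeBuilder is a hypothetical stand-in for the real graph builder, kept just rich enough to record one node; the converter function name and its body are illustrative, not the actual yobx converter.

```python
class FakeOps:
    """Hypothetical op namespace: records nodes instead of emitting ONNX."""

    def __init__(self, nodes):
        self.nodes = nodes

    def Identity(self, x, outputs=None):
        self.nodes.append(("Identity", [x], list(outputs)))
        return outputs[0]


class FakeBuilder:
    """Hypothetical stand-in for GraphBuilderExtendedProtocol."""

    def __init__(self):
        self.nodes = []
        self.op = FakeOps(self.nodes)


def convert_function_transformer(g, sts, outputs, func, inputs):
    if func is None:
        # Identity transformer: emit a single Identity node, no tracing.
        return g.op.Identity(inputs[0], outputs=outputs)
    raise NotImplementedError("tracing path omitted in this sketch")


g = FakeBuilder()
convert_function_transformer(g, {}, ["Y"], None, ["X"])
print(g.nodes)  # [('Identity', ['X'], ['Y'])]
```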

Supported numpy operations#

In addition to the numpy protocols, the NumpyArray proxy supports the Python arithmetic and comparison operators (+, -, *, /, //, **, %, @, ==, !=, <, <=, >, >=, and unary -).
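Operator support of this kind is typically derived from the ufunc protocol: implement __array_ufunc__ once, and every Python operator maps to its corresponding ufunc. numpy ships numpy.lib.mixins.NDArrayOperatorsMixin for exactly that purpose. The snippet below is an illustration of the dispatch, not yobx's implementation; it logs which ufunc each operator resolves to.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class OpRecorder(NDArrayOperatorsMixin):
    """Logs the name of the ufunc each Python operator dispatches to."""

    def __init__(self):
        self.log = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        self.log.append(ufunc.__name__)
        return self  # keep chaining on the same recorder


r = OpRecorder()
_ = r + 1           # dispatches np.add
_ = r ** 2          # dispatches np.power
_ = r @ np.ones(2)  # dispatches np.matmul
_ = -r              # dispatches np.negative
_ = r <= 0          # dispatches np.less_equal
print(r.log)  # ['add', 'power', 'matmul', 'negative', 'less_equal']
```

This is why the ufunc table below also covers the operators: `a + b` on a proxy is just np.add, `a @ b` is np.matmul, and so on.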

The full list of supported ufuncs and array functions is generated below directly from the live dispatch tables in yobx.xtracing.numpy_array.

<<<

import numpy as np
from yobx.xtracing.numpy_array import _UFUNC_TO_ONNX, _HANDLED_FUNCTIONS

rows_ufunc = []
for k in sorted(_UFUNC_TO_ONNX.keys(), key=lambda x: x.__name__):
    v = _UFUNC_TO_ONNX[k]
    onnx_op = v[0] if isinstance(v, tuple) else v
    rows_ufunc.append(f"* ``np.{k.__name__}`` → ``{onnx_op}``")

rows_func = []
for k in sorted(_HANDLED_FUNCTIONS.keys(), key=lambda x: x.__name__):
    rows_func.append(f"* ``np.{k.__name__}``")

print("**Ufuncs** (via ``__array_ufunc__``)\n")
print("\n".join(rows_ufunc))
print()
print("**Array functions** (via ``__array_function__``)\n")
print("\n".join(rows_func))

>>>

Ufuncs (via __array_ufunc__)

  • np.absolute → Abs

  • np.add → Add

  • np.arccos → Acos

  • np.arcsin → Asin

  • np.arctan → Atan

  • np.bitwise_and → And

  • np.bitwise_or → Or

  • np.bitwise_xor → Xor

  • np.ceil → Ceil

  • np.cos → Cos

  • np.cosh → Cosh

  • np.divide → Div

  • np.equal → Equal

  • np.exp → Exp

  • np.expm1 → expm1

  • np.floor → Floor

  • np.floor_divide → floor_divide

  • np.fmod → Mod

  • np.greater → Greater

  • np.greater_equal → GreaterOrEqual

  • np.invert → Not

  • np.isnan → IsNaN

  • np.less → Less

  • np.less_equal → LessOrEqual

  • np.log → Log

  • np.log1p → log1p

  • np.logical_and → And

  • np.logical_not → Not

  • np.logical_or → Or

  • np.logical_xor → Xor

  • np.matmul → MatMul

  • np.maximum → maximum

  • np.minimum → minimum

  • np.multiply → Mul

  • np.negative → Neg

  • np.not_equal → not_equal

  • np.power → Pow

  • np.reciprocal → Reciprocal

  • np.remainder → Mod

  • np.sign → Sign

  • np.sin → Sin

  • np.sinh → Sinh

  • np.sqrt → Sqrt

  • np.subtract → Sub

  • np.tan → Tan

  • np.tanh → Tanh

Array functions (via __array_function__)

  • np.absolute

  • np.amax

  • np.amin

  • np.clip

  • np.concatenate

  • np.dot

  • np.exp

  • np.expand_dims

  • np.expm1

  • np.log

  • np.log1p

  • np.matmul

  • np.max

  • np.mean

  • np.min

  • np.prod

  • np.reshape

  • np.sqrt

  • np.squeeze

  • np.stack

  • np.sum

  • np.transpose

  • np.where

Standalone usage#

You can also use the tracing machinery outside of scikit-learn pipelines via trace_numpy_to_onnx():

<<<

import numpy as np
from yobx.xtracing import trace_numpy_to_onnx
from yobx.helpers.onnx_helper import pretty_onnx


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


X_sample = np.zeros((4, 3), dtype=np.float32)
onx = trace_numpy_to_onnx(my_func, X_sample)
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=1
    input: name='X' type=dtype('float32') shape=['batch', 3]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Sqrt(_onx_add_abs_X) -> output_0
    output: name='output_0' type='NOTENSOR' shape=None

Or embedded into a larger graph using trace_numpy_function() directly:

<<<

import numpy as np
from onnx import TensorProto
from yobx.xbuilder import GraphBuilder
from yobx.xtracing import trace_numpy_function
from yobx.helpers.onnx_helper import pretty_onnx

g = GraphBuilder({"": 21, "ai.onnx.ml": 1})
g.make_tensor_input("X", TensorProto.FLOAT, ("batch", 3))


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


trace_numpy_function(g, {}, ["output_0"], my_func, ["X"])
g.make_tensor_output("output_0", indexed=False, allow_untyped_output=True)
onx = g.to_onnx()
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=1
    input: name='X' type=dtype('float32') shape=['batch', 3]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Sqrt(_onx_add_abs_X) -> output_0
    output: name='output_0' type='NOTENSOR' shape=None

See also

Custom Converter — how to write and register a custom converter for any scikit-learn estimator.