Numpy-Tracing and FunctionTransformer#

yobx can export a FunctionTransformer to ONNX by tracing its func attribute: the function is re-executed with lightweight proxy objects instead of real numpy arrays. Every numpy operation performed on those proxies is recorded as an ONNX node, so the resulting ONNX graph exactly mirrors the Python code — without any manual operator mapping.

Overview#

The mechanism consists of two layers:

  1. :class:`~yobx.xtracing.NumpyArray` — a proxy that wraps an ONNX tensor name and an object following the GraphBuilderExtendedProtocol. It overloads all Python arithmetic operators and registers itself as an implementation for both the __array_ufunc__ and __array_function__ numpy protocols. Whenever numpy (or user code) calls an operation on a NumpyArray, the proxy emits the equivalent ONNX node into the graph and returns a new NumpyArray wrapping the result tensor name.

  2. :func:`~yobx.xtracing.trace_numpy_function` — the converter-API function. It receives an object following the GraphBuilderExtendedProtocol, the desired output names, the callable to trace, and the names of the input tensors already registered in that graph. It wraps those tensors as NumpyArray proxies, calls the function, and collects the resulting output tensors.

The high-level helper trace_numpy_to_onnx() creates a standalone onnx.ModelProto by building a fresh graph, registering sample-array-derived inputs, and delegating to trace_numpy_function().
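The core idea — a proxy that records operations instead of computing them — can be shown with a minimal, self-contained sketch. This is an illustration of the mechanism, not the real yobx NumpyArray: the class name, the tensor-naming scheme, and the tiny ufunc→op table are all invented for the example.

```python
import numpy as np


class TracingProxy:
    """Minimal sketch of a tracing proxy (NOT the real yobx NumpyArray).

    Each numpy operation on the proxy is recorded as a
    (op_type, input_names, output_name) triple instead of being computed.
    """

    # Illustrative subset of a ufunc-name -> ONNX-op-type mapping.
    _UFUNC_TO_OP = {"absolute": "Abs", "add": "Add", "log": "Log"}

    def __init__(self, name, graph):
        self.name = name    # tensor name this proxy stands for
        self.graph = graph  # shared list acting as the node recorder

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        op = self._UFUNC_TO_OP.get(ufunc.__name__, ufunc.__name__)
        out = f"{op.lower()}_{len(self.graph)}"
        names = [i.name if isinstance(i, TracingProxy) else repr(i) for i in inputs]
        self.graph.append((op, names, out))
        return TracingProxy(out, self.graph)

    def __add__(self, other):
        # Route the Python "+" operator through the ufunc protocol.
        return np.add(self, other)


graph = []
x = TracingProxy("X", graph)
np.log(np.abs(x) + 1.0)  # traced, not computed
for node in graph:
    print(node)
# ('Abs', ['X'], 'abs_0')
# ('Add', ['abs_0', '1.0'], 'add_1')
# ('Log', ['add_1'], 'log_2')
```

The real proxy does the same thing, except that each recorded triple becomes an actual ONNX node emitted into the graph builder.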

Converter API signature#

trace_numpy_function() follows the same convention as every other converter in this package:

def trace_numpy_function(
    g: GraphBuilderExtendedProtocol,
    sts: Dict,
    outputs: List[str],
    func: Callable,
    inputs: List[str],
    name: str = "trace",
    kw_args: Optional[Dict[str, Any]] = None,
) -> str: ...

Parameter   Description

g           GraphBuilderExtendedProtocol; call g.op.<Op>(…) to emit ONNX nodes.
sts         Dict of metadata (empty {} in most call sites).
outputs     Pre-allocated output tensor names the tracer must write to.
func        Python callable that uses numpy operations.
inputs      Names of tensors already registered in g.
name        Node-name prefix.
kw_args     Optional keyword arguments forwarded to func.

FunctionTransformer converter#

The built-in converter for FunctionTransformer lives in yobx.sklearn.preprocessing.function_transformer. It delegates directly to trace_numpy_function(), so all numpy ops land in the surrounding graph with no sub-model inlining:

<<<

from sklearn.preprocessing import FunctionTransformer
import numpy as np
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.sklearn import to_onnx


def my_func(X):
    return np.log1p(np.abs(X))


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)

transformer = FunctionTransformer(func=my_func).fit(X)
onx = to_onnx(transformer, (X,))
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=5
    input: name='X' type=dtype('float32') shape=['batch', 4]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Log(_onx_add_abs_X) -> Y
    output: name='Y' type='NOTENSOR' shape=None

When func=None (the identity transformer), a single Identity node is emitted instead of tracing.
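The shape of that fallback branch can be sketched as follows. FakeBuilder is a hypothetical stand-in for the real graph builder, kept just rich enough to record one node; the converter function name and its body are illustrative, not the actual yobx converter.

```python
class FakeOps:
    """Hypothetical op namespace: records nodes instead of emitting ONNX."""

    def __init__(self, nodes):
        self.nodes = nodes

    def Identity(self, x, outputs=None):
        self.nodes.append(("Identity", [x], list(outputs)))
        return outputs[0]


class FakeBuilder:
    """Hypothetical stand-in for GraphBuilderExtendedProtocol."""

    def __init__(self):
        self.nodes = []
        self.op = FakeOps(self.nodes)


def convert_function_transformer(g, sts, outputs, func, inputs):
    if func is None:
        # Identity transformer: emit a single Identity node, no tracing.
        return g.op.Identity(inputs[0], outputs=outputs)
    raise NotImplementedError("tracing path omitted in this sketch")


g = FakeBuilder()
convert_function_transformer(g, {}, ["Y"], None, ["X"])
print(g.nodes)  # [('Identity', ['X'], ['Y'])]
```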

Supported numpy operations#

In addition to the numpy protocols, the NumpyArray proxy supports the Python arithmetic and comparison operators (+, -, *, /, //, **, %, @, ==, !=, <, <=, >, >=, and unary -).
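Operator support of this kind is typically derived from the ufunc protocol: implement __array_ufunc__ once, and every Python operator maps to its corresponding ufunc. numpy ships numpy.lib.mixins.NDArrayOperatorsMixin for exactly that purpose. The snippet below is an illustration of the dispatch, not yobx's implementation; it logs which ufunc each operator resolves to.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class OpRecorder(NDArrayOperatorsMixin):
    """Logs the name of the ufunc each Python operator dispatches to."""

    def __init__(self):
        self.log = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        self.log.append(ufunc.__name__)
        return self  # keep chaining on the same recorder


r = OpRecorder()
_ = r + 1           # dispatches np.add
_ = r ** 2          # dispatches np.power
_ = r @ np.ones(2)  # dispatches np.matmul
_ = -r              # dispatches np.negative
_ = r <= 0          # dispatches np.less_equal
print(r.log)  # ['add', 'power', 'matmul', 'negative', 'less_equal']
```

This is why the ufunc table below also covers the operators: `a + b` on a proxy is just np.add, `a @ b` is np.matmul, and so on.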

The full list of supported ufuncs and array functions is generated below directly from the live dispatch tables in yobx.xtracing.numpy_array.

<<<

import numpy as np
from yobx.xtracing.numpy_array import _UFUNC_TO_ONNX, _HANDLED_FUNCTIONS

rows_ufunc = []
for k in sorted(_UFUNC_TO_ONNX.keys(), key=lambda x: x.__name__):
    v = _UFUNC_TO_ONNX[k]
    onnx_op = v[0] if isinstance(v, tuple) else v
    rows_ufunc.append(f"* ``np.{k.__name__}`` → ``{onnx_op}``")

rows_func = []
for k in sorted(_HANDLED_FUNCTIONS.keys(), key=lambda x: x.__name__):
    rows_func.append(f"* ``np.{k.__name__}``")

print("**Ufuncs** (via ``__array_ufunc__``)\n")
print("\n".join(rows_ufunc))
print()
print("**Array functions** (via ``__array_function__``)\n")
print("\n".join(rows_func))

>>>

Ufuncs (via __array_ufunc__)

  • np.absolute → Abs

  • np.add → Add

  • np.arccos → Acos

  • np.arcsin → Asin

  • np.arctan → Atan

  • np.bitwise_and → And

  • np.bitwise_or → Or

  • np.bitwise_xor → Xor

  • np.ceil → Ceil

  • np.cos → Cos

  • np.cosh → Cosh

  • np.divide → Div

  • np.equal → Equal

  • np.exp → Exp

  • np.expm1 → expm1

  • np.floor → Floor

  • np.floor_divide → floor_divide

  • np.fmod → Mod

  • np.greater → Greater

  • np.greater_equal → GreaterOrEqual

  • np.invert → Not

  • np.isnan → IsNaN

  • np.less → Less

  • np.less_equal → LessOrEqual

  • np.log → Log

  • np.log1p → log1p

  • np.logical_and → And

  • np.logical_not → Not

  • np.logical_or → Or

  • np.logical_xor → Xor

  • np.matmul → MatMul

  • np.maximum → maximum

  • np.minimum → minimum

  • np.multiply → Mul

  • np.negative → Neg

  • np.not_equal → not_equal

  • np.power → Pow

  • np.reciprocal → Reciprocal

  • np.remainder → Mod

  • np.sign → Sign

  • np.sin → Sin

  • np.sinh → Sinh

  • np.sqrt → Sqrt

  • np.subtract → Sub

  • np.tan → Tan

  • np.tanh → Tanh

Array functions (via __array_function__)

  • np.absolute

  • np.amax

  • np.amin

  • np.clip

  • np.concatenate

  • np.dot

  • np.exp

  • np.expand_dims

  • np.expm1

  • np.log

  • np.log1p

  • np.matmul

  • np.max

  • np.mean

  • np.min

  • np.prod

  • np.reshape

  • np.sqrt

  • np.squeeze

  • np.stack

  • np.sum

  • np.transpose

  • np.where

Standalone usage#

You can also use the tracing machinery outside of scikit-learn pipelines via trace_numpy_to_onnx():

<<<

import numpy as np
from yobx.xtracing import trace_numpy_to_onnx
from yobx.helpers.onnx_helper import pretty_onnx


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


X_sample = np.zeros((4, 3), dtype=np.float32)
onx = trace_numpy_to_onnx(my_func, X_sample)
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=1
    input: name='X' type=dtype('float32') shape=['batch', 3]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Sqrt(_onx_add_abs_X) -> output_0
    output: name='output_0' type='NOTENSOR' shape=None

Or embedded into a larger graph using trace_numpy_function() directly:

<<<

import numpy as np
from onnx import TensorProto
from yobx.xbuilder import GraphBuilder
from yobx.xtracing import trace_numpy_function
from yobx.helpers.onnx_helper import pretty_onnx

g = GraphBuilder({"": 21, "ai.onnx.ml": 1})
g.make_tensor_input("X", TensorProto.FLOAT, ("batch", 3))


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


trace_numpy_function(g, {}, ["output_0"], my_func, ["X"])
g.make_tensor_output("output_0", indexed=False, allow_untyped_output=True)
onx = g.to_onnx()
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=1
    input: name='X' type=dtype('float32') shape=['batch', 3]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Sqrt(_onx_add_abs_X) -> output_0
    output: name='output_0' type='NOTENSOR' shape=None

See also

Custom Converter — how to write and register a custom converter for any scikit-learn estimator.