Numpy-Tracing and FunctionTransformer

yobx can export a ``FunctionTransformer`` to ONNX by tracing its ``func``
attribute: the function is re-executed with lightweight proxy objects instead
of real numpy arrays. Every numpy operation performed on those proxies is
recorded as an ONNX node, so the resulting ONNX graph exactly mirrors the
Python code — without any manual operator mapping.
Overview

The mechanism consists of two layers:

* :class:`~yobx.xtracing.NumpyArray` — a proxy that wraps an ONNX tensor name
  and an object following the ``GraphBuilderExtendedProtocol``. It overloads
  all Python arithmetic operators and registers itself as an implementation
  for both the ``__array_ufunc__`` and ``__array_function__`` numpy protocols.
  Whenever numpy (or user code) calls an operation on a ``NumpyArray``, the
  proxy emits the equivalent ONNX node into the graph and returns a new
  ``NumpyArray`` wrapping the result tensor name.
* :func:`~yobx.xtracing.trace_numpy_function` — the converter-API function. It
  receives an object following the ``GraphBuilderExtendedProtocol``, the
  desired output names, the callable to trace, and the names of the input
  tensors already registered in that graph. It wraps those tensors as
  ``NumpyArray`` proxies, calls the function, and collects the resulting
  output tensors.
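The recording idea can be illustrated with a minimal, self-contained sketch of
the ``__array_ufunc__`` dispatch (NEP 13). This is an illustration only, not
yobx's actual code: the class name ``TracedArray``, the toy dispatch table,
and the tuple-based "graph" are all invented for the example.

```python
import numpy as np

# Toy dispatch table, standing in for the real _UFUNC_TO_ONNX mapping.
_TOY_UFUNC_TO_ONNX = {np.absolute: "Abs", np.add: "Add", np.sqrt: "Sqrt"}


class TracedArray:
    """Hypothetical mini-proxy: records ufunc calls instead of computing."""

    def __init__(self, name, graph):
        self.name = name    # the ONNX tensor name this proxy stands for
        self.graph = graph  # shared list of recorded (op, inputs, output)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # numpy hands us every ufunc applied to a TracedArray (NEP 13).
        if method != "__call__" or ufunc not in _TOY_UFUNC_TO_ONNX:
            return NotImplemented
        op = _TOY_UFUNC_TO_ONNX[ufunc]
        in_names = [x.name if isinstance(x, TracedArray) else repr(x)
                    for x in inputs]
        out = TracedArray(f"out_{len(self.graph)}", self.graph)
        self.graph.append((op, in_names, out.name))  # "emit" one node
        return out


graph = []
x = TracedArray("X", graph)
y = np.sqrt(np.absolute(x))  # numpy dispatches both calls to the proxy
print(graph)
# [('Abs', ['X'], 'out_0'), ('Sqrt', ['out_0'], 'out_1')]
```

The real ``NumpyArray`` does the same thing against a graph builder instead of
a plain list, and additionally overloads the Python operators and
``__array_function__``.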
The high-level helper ``trace_numpy_to_onnx()`` creates a standalone
``onnx.ModelProto`` by building a fresh graph, registering
sample-array-derived inputs, and delegating to ``trace_numpy_function()``.
Converter API signature

``trace_numpy_function()`` follows the same convention as every other
converter in this package:
def trace_numpy_function(
    g: GraphBuilderExtendedProtocol,
    sts: Dict,
    outputs: List[str],
    func: Callable,
    inputs: List[str],
    name: str = "trace",
    kw_args: Optional[Dict[str, Any]] = None,
) -> str: ...
| Parameter | Description |
|---|---|
| ``outputs`` | Pre-allocated output tensor names the tracer must write to. |
| ``func`` | Python callable that uses numpy operations. |
| ``inputs`` | Names of tensors already registered in ``g``. |
| ``name`` | Node-name prefix. |
| ``kw_args`` | Optional keyword arguments forwarded to ``func``. |
FunctionTransformer converter

The built-in converter for ``FunctionTransformer`` lives in
``yobx.sklearn.preprocessing.function_transformer``. It delegates directly to
``trace_numpy_function()``, so all numpy ops land in the surrounding graph
with no sub-model inlining:
<<<
from sklearn.preprocessing import FunctionTransformer
import numpy as np
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.sklearn import to_onnx
def my_func(X):
    return np.log1p(np.abs(X))
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)
transformer = FunctionTransformer(func=my_func).fit(X)
onx = to_onnx(transformer, (X,))
print(pretty_onnx(onx))
>>>
opset: domain='' version=21
opset: domain='ai.onnx.ml' version=5
input: name='X' type=dtype('float32') shape=['batch', 4]
init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
Abs(X) -> _onx_abs_X
Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
Log(_onx_add_abs_X) -> Y
output: name='Y' type='NOTENSOR' shape=None
When ``func=None`` (the identity transformer) a single ``Identity`` node is
emitted instead of tracing.
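This matches scikit-learn's own semantics, which can be checked with plain
numpy and scikit-learn, no yobx needed: with ``func=None``,
``FunctionTransformer`` passes its input through unchanged, so there is
nothing to trace.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.arange(6, dtype=np.float32).reshape(3, 2)

# func=None makes FunctionTransformer the identity transform, which is
# why one Identity node is enough on the ONNX side.
identity = FunctionTransformer(func=None).fit(X)
assert np.array_equal(identity.transform(X), X)
```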
Supported numpy operations

The ``NumpyArray`` proxy also supports all Python arithmetic and comparison
operators (``+``, ``-``, ``*``, ``/``, ``//``, ``**``, ``%``, ``@``, ``==``,
``!=``, ``<``, ``<=``, ``>``, ``>=``, unary ``-``).
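Because tracing re-executes the callable, a function restricted to these
operators and the supported ufuncs also runs unchanged on real numpy arrays.
That makes it easy to sanity-check a candidate function with plain numpy
before exporting it (``my_func`` below is an invented example):

```python
import numpy as np

def my_func(X):
    # uses only overloaded operators (+, **, /) and supported ufuncs
    return (np.abs(X) + 1.0) ** 0.5 / np.maximum(X, 0.5)

X = np.array([[-3.0, 0.0], [1.0, 4.0]], dtype=np.float32)
out = my_func(X)
assert out.shape == X.shape  # behaves like any elementwise numpy function
```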
The full list of supported ufuncs and array functions is generated below
directly from the live dispatch tables in ``yobx.xtracing.numpy_array``.
<<<
import numpy as np

from yobx.xtracing.numpy_array import _UFUNC_TO_ONNX, _HANDLED_FUNCTIONS

rows_ufunc = []
for k in sorted(_UFUNC_TO_ONNX.keys(), key=lambda x: x.__name__):
    v = _UFUNC_TO_ONNX[k]
    onnx_op = v[0] if isinstance(v, tuple) else v
    rows_ufunc.append(f"* ``np.{k.__name__}`` → ``{onnx_op}``")

rows_func = []
for k in sorted(_HANDLED_FUNCTIONS.keys(), key=lambda x: x.__name__):
    rows_func.append(f"* ``np.{k.__name__}``")

print("**Ufuncs** (via ``__array_ufunc__``)\n")
print("\n".join(rows_ufunc))
print()
print("**Array functions** (via ``__array_function__``)\n")
print("\n".join(rows_func))
>>>
**Ufuncs** (via ``__array_ufunc__``)

* ``np.absolute`` → ``Abs``
* ``np.add`` → ``Add``
* ``np.arccos`` → ``Acos``
* ``np.arcsin`` → ``Asin``
* ``np.arctan`` → ``Atan``
* ``np.bitwise_and`` → ``And``
* ``np.bitwise_or`` → ``Or``
* ``np.bitwise_xor`` → ``Xor``
* ``np.ceil`` → ``Ceil``
* ``np.cos`` → ``Cos``
* ``np.cosh`` → ``Cosh``
* ``np.divide`` → ``Div``
* ``np.equal`` → ``Equal``
* ``np.exp`` → ``Exp``
* ``np.expm1`` → ``expm1``
* ``np.floor`` → ``Floor``
* ``np.floor_divide`` → ``floor_divide``
* ``np.fmod`` → ``Mod``
* ``np.greater`` → ``Greater``
* ``np.greater_equal`` → ``GreaterOrEqual``
* ``np.invert`` → ``Not``
* ``np.isnan`` → ``IsNaN``
* ``np.less`` → ``Less``
* ``np.less_equal`` → ``LessOrEqual``
* ``np.log`` → ``Log``
* ``np.log1p`` → ``log1p``
* ``np.logical_and`` → ``And``
* ``np.logical_not`` → ``Not``
* ``np.logical_or`` → ``Or``
* ``np.logical_xor`` → ``Xor``
* ``np.matmul`` → ``MatMul``
* ``np.maximum`` → ``maximum``
* ``np.minimum`` → ``minimum``
* ``np.multiply`` → ``Mul``
* ``np.negative`` → ``Neg``
* ``np.not_equal`` → ``not_equal``
* ``np.power`` → ``Pow``
* ``np.reciprocal`` → ``Reciprocal``
* ``np.remainder`` → ``Mod``
* ``np.sign`` → ``Sign``
* ``np.sin`` → ``Sin``
* ``np.sinh`` → ``Sinh``
* ``np.sqrt`` → ``Sqrt``
* ``np.subtract`` → ``Sub``
* ``np.tan`` → ``Tan``
* ``np.tanh`` → ``Tanh``

**Array functions** (via ``__array_function__``)

* ``np.absolute``
* ``np.amax``
* ``np.amin``
* ``np.clip``
* ``np.concatenate``
* ``np.dot``
* ``np.exp``
* ``np.expand_dims``
* ``np.expm1``
* ``np.log``
* ``np.log1p``
* ``np.matmul``
* ``np.max``
* ``np.mean``
* ``np.min``
* ``np.prod``
* ``np.reshape``
* ``np.sqrt``
* ``np.squeeze``
* ``np.stack``
* ``np.sum``
* ``np.transpose``
* ``np.where``
Standalone usage

You can also use the tracing machinery outside of scikit-learn pipelines via
``trace_numpy_to_onnx()``:
<<<
import numpy as np
from yobx.xtracing import trace_numpy_to_onnx
from yobx.helpers.onnx_helper import pretty_onnx
def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))
X_sample = np.zeros((4, 3), dtype=np.float32)
onx = trace_numpy_to_onnx(my_func, X_sample)
print(pretty_onnx(onx))
>>>
opset: domain='' version=21
opset: domain='ai.onnx.ml' version=1
input: name='X' type=dtype('float32') shape=['batch', 3]
init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
Abs(X) -> _onx_abs_X
Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
Sqrt(_onx_add_abs_X) -> output_0
output: name='output_0' type='NOTENSOR' shape=None
Or embedded into a larger graph using ``trace_numpy_function()`` directly:
<<<
import numpy as np
from onnx import TensorProto
from yobx.xbuilder import GraphBuilder
from yobx.xtracing import trace_numpy_function
from yobx.helpers.onnx_helper import pretty_onnx
g = GraphBuilder({"": 21, "ai.onnx.ml": 1})
g.make_tensor_input("X", TensorProto.FLOAT, ("batch", 3))
def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))
trace_numpy_function(g, {}, ["output_0"], my_func, ["X"])
g.make_tensor_output("output_0", indexed=False, allow_untyped_output=True)
onx = g.to_onnx()
print(pretty_onnx(onx))
>>>
opset: domain='' version=21
opset: domain='ai.onnx.ml' version=1
input: name='X' type=dtype('float32') shape=['batch', 3]
init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
Abs(X) -> _onx_abs_X
Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
Sqrt(_onx_add_abs_X) -> output_0
output: name='output_0' type='NOTENSOR' shape=None
See also
Custom Converter — how to write and register a custom converter for any scikit-learn estimator.