yobx.xtracing

Lightweight mechanism for tracing numpy and DataFrame functions and exporting them to ONNX. See Numpy-Tracing and FunctionTransformer for a full walkthrough.

NumpyArray

class yobx.xtracing.NumpyArray(name: str, graph_builder, dtype=None, shape=None)

Proxy for an ONNX tensor that traces numpy operations as ONNX graph nodes.

Instances are produced by trace_numpy_to_onnx() or directly by the FunctionTransformer converter when the input array is replaced by a symbolic placeholder. Every arithmetic operation, ufunc call, or reduction performed on a NumpyArray is recorded as an ONNX node in the underlying GraphBuilder.

The class follows the Python Array API standard and the numpy __array_ufunc__ / __array_function__ dispatch protocols so that plain numpy code can be traced without modification.

Parameters:
  • name – ONNX tensor name (a string handle in the graph).

  • graph_builder – the GraphBuilder that owns the graph being built.

  • dtype – optional numpy dtype for the tensor; used when creating scalar constants from Python literals.

  • shape – optional tensor shape.
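The dispatch protocol described above can be illustrated with a self-contained toy. This is a minimal sketch, not the yobx implementation: ToyGraph and ToyArray are hypothetical stand-ins for GraphBuilder and NumpyArray, and only three ufuncs are mapped. It shows that numpy hands control to __array_ufunc__ whenever an operand defines it, so a plain function such as my_func records Abs, Add and Sqrt nodes instead of computing values.

```python
import numpy as np

# Hypothetical toy tracer (not the yobx source): it only illustrates the
# __array_ufunc__ protocol that lets plain numpy calls be recorded.
UFUNC_TO_ONNX = {np.absolute: "Abs", np.add: "Add", np.sqrt: "Sqrt"}


class ToyGraph:
    """Collects recorded nodes and constants, standing in for GraphBuilder."""

    def __init__(self):
        self.nodes = []          # (op_type, input_names, output_name)
        self.initializers = {}   # constant name -> numpy value
        self._counter = 0

    def unique_name(self, prefix):
        self._counter += 1
        return f"{prefix}_{self._counter}"


class ToyArray:
    """Symbolic tensor: a name plus the graph it lives in, no data."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _input_name(self, value):
        # Proxies are referenced by name; scalars become named constants.
        if isinstance(value, ToyArray):
            return value.name
        const = self.graph.unique_name("cst")
        self.graph.initializers[const] = np.asarray(value)
        return const

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # numpy calls this hook instead of computing because an operand defines it.
        if method != "__call__" or kwargs or ufunc not in UFUNC_TO_ONNX:
            return NotImplemented
        op_type = UFUNC_TO_ONNX[ufunc]
        names = [self._input_name(x) for x in inputs]
        output = self.graph.unique_name(op_type.lower())
        self.graph.nodes.append((op_type, names, output))
        return ToyArray(output, self.graph)

    def __add__(self, other):
        # Route the + operator through the same ufunc dispatch.
        return np.add(self, other)


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


g = ToyGraph()
result = my_func(ToyArray("X", g))
for node in g.nodes:
    print(node)
# ('Abs', ['X'], 'abs_1')
# ('Add', ['abs_1', 'cst_2'], 'add_3')
# ('Sqrt', ['add_3'], 'sqrt_4')
```

The Python literal np.float32(1) ends up as a named constant, which is the role the dtype parameter plays when NumpyArray turns literals into scalar constants.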

property T: NumpyArray

Transpose (reverses all axes).

astype(dtype) → NumpyArray

Cast to dtype.

clip(a_min=None, a_max=None) → NumpyArray

Clip values to [a_min, a_max].

property dtype: dtype | None

Numpy dtype if known.

expand_dims(axis) → NumpyArray

Insert a size-1 dimension at position axis (like numpy.expand_dims).

flatten() → NumpyArray

Flatten to a 1-D tensor.

max(axis=None, keepdims: bool = False) → NumpyArray

Maximum along axis.

mean(axis=None, keepdims: bool = False) → NumpyArray

Mean of elements along axis.

min(axis=None, keepdims: bool = False) → NumpyArray

Minimum along axis.

property name: str

ONNX tensor name.

prod(axis=None, keepdims: bool = False) → NumpyArray

Product of elements along axis.

reshape(*shape) → NumpyArray

Reshape the tensor.

property shape

Tensor shape if known.

squeeze(axis=None) → NumpyArray

Remove size-1 dimensions: all of them when axis is None, otherwise only those given by axis.

sum(axis=None, keepdims: bool = False) → NumpyArray

Sum elements along axis.
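The reduction methods above (max, mean, min, prod, sum) share numpy's axis / keepdims signature. The sketch below shows one plausible way such a call could be lowered to an ONNX Reduce* operator; REDUCE_OPS and lower_reduction are hypothetical names, not part of yobx. Two details matter: from opset 18 onward (the trace_numpy_function example below targets opset 21) every Reduce* operator takes axes as an optional input rather than an attribute, and ONNX's keepdims defaults to 1 whereas numpy's defaults to False.

```python
# Hypothetical lowering table (not taken from the yobx source):
# numpy-style reduction method name -> ONNX operator.
REDUCE_OPS = {
    "sum": "ReduceSum",
    "mean": "ReduceMean",
    "max": "ReduceMax",
    "min": "ReduceMin",
    "prod": "ReduceProd",
}


def lower_reduction(method, axis=None, keepdims=False):
    """Return (op_type, axes, attributes) for a call such as x.sum(axis=1).

    axes is None when numpy would reduce over every axis; the ONNX node is
    then emitted without its optional 'axes' input, which also reduces all axes.
    """
    op_type = REDUCE_OPS[method]
    if axis is None:
        axes = None
    elif isinstance(axis, int):
        axes = [axis]
    else:
        axes = [int(a) for a in axis]
    # numpy defaults to keepdims=False, ONNX Reduce* to keepdims=1,
    # so the attribute is always written explicitly.
    return op_type, axes, {"keepdims": int(keepdims)}


print(lower_reduction("sum", axis=1))                       # ('ReduceSum', [1], {'keepdims': 0})
print(lower_reduction("mean", axis=(0, 2), keepdims=True))  # ('ReduceMean', [0, 2], {'keepdims': 1})
print(lower_reduction("max"))                               # ('ReduceMax', None, {'keepdims': 0})
```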

transpose(*axes) → NumpyArray

Permute the axes in the order given by axes; with no arguments, reverse all axes (same as T).
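The shape methods (T, transpose, reshape, flatten, expand_dims, squeeze) each have a direct ONNX counterpart. The mapping below is a hypothetical sketch, not the yobx source: lower_shape_method, _axes and _shape_args are illustrative names, and the returned dictionaries only describe node parameters (in ONNX, Reshape's target shape and the axes of Squeeze/Unsqueeze are passed as input tensors, not attributes). Note that flatten() cannot use ONNX Flatten directly, since that operator always returns a 2-D tensor.

```python
def _axes(axis):
    # Normalize an int or a sequence of ints to a list.
    return [axis] if isinstance(axis, int) else [int(a) for a in axis]


def _shape_args(args):
    # numpy accepts both f(2, 3) and f((2, 3)).
    if len(args) == 1 and isinstance(args[0], (tuple, list)):
        return list(args[0])
    return list(args)


def lower_shape_method(method, rank, *args):
    """Return (op_type, params) for a shape method called on a rank-`rank` tensor."""
    if method == "T" or (method == "transpose" and not args):
        # .T and .transpose() with no arguments reverse all axes.
        return "Transpose", {"perm": list(reversed(range(rank)))}
    if method == "transpose":
        return "Transpose", {"perm": _shape_args(args)}
    if method == "reshape":
        return "Reshape", {"shape": _shape_args(args)}
    if method == "flatten":
        # ONNX Flatten always produces a 2-D tensor, so numpy's 1-D flatten
        # maps more naturally to Reshape with target shape [-1].
        return "Reshape", {"shape": [-1]}
    if method == "expand_dims":
        return "Unsqueeze", {"axes": _axes(args[0])}
    if method == "squeeze":
        axis = args[0] if args else None
        # No axes: ONNX Squeeze removes every size-1 dimension, like numpy.
        return "Squeeze", {"axes": None if axis is None else _axes(axis)}
    raise ValueError(f"unsupported method: {method!r}")


print(lower_shape_method("T", 3))                   # ('Transpose', {'perm': [2, 1, 0]})
print(lower_shape_method("transpose", 3, 0, 2, 1))  # ('Transpose', {'perm': [0, 2, 1]})
print(lower_shape_method("flatten", 2))             # ('Reshape', {'shape': [-1]})
print(lower_shape_method("squeeze", 3))             # ('Squeeze', {'axes': None})
```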

trace_numpy_function

yobx.xtracing.trace_numpy_function(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str] | None, func: Callable, inputs: List[str], name: str = 'trace', kw_args: Dict[str, Any] | None = None) → str | Tuple[str, ...]

Trace a numpy function by wrapping named tensors in g as NumpyArray proxies, then recording all numpy operations as ONNX nodes in g.

This function follows the same API convention as other converters in this package: it takes an existing GraphBuilder, the scikit-learn shape/type dictionary sts, the desired output tensor names, and then the callable and input tensor names.

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes defined by scikit-learn (may be empty; forwarded to g.set_type_shape_unary_op when non-empty)

  • outputs – desired output tensor names (one per output returned by func)

  • func – the numpy function to trace; must accept len(inputs) positional arguments and return a NumpyArray or a tuple/list of NumpyArray objects

  • inputs – existing input tensor names already present in g

  • name – node name prefix used when emitting Identity rename nodes

  • kw_args – optional keyword arguments forwarded to func

Returns:

the output tensor name (outputs[0]) when func returns a single array, or a tuple of output tensor names when it returns several
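Once func returns, its result has to be matched against outputs. The helper below is a hypothetical sketch of that step, not the yobx code: it normalizes a single value or a tuple/list, checks that the counts agree, and describes one Identity rename node per output using name as the node-name prefix, as the parameter list suggests. Returning a tuple of names for several outputs is an assumption drawn from the str | Tuple[str, ...] return annotation. In the example that follows, Sqrt writes straight into output_0 with no Identity node, so the real builder may rename tensors in place or remove such nodes during optimization.

```python
def finalize_outputs(returned, outputs, name="trace"):
    """Pair the tensor names produced by func with the requested output names.

    returned: one name or a tuple/list of names (plain strings stand in for
    NumpyArray proxies here). Returns (identity_nodes, return_value).
    """
    names = list(returned) if isinstance(returned, (tuple, list)) else [returned]
    if len(names) != len(outputs):
        raise ValueError(
            f"func returned {len(names)} value(s) but {len(outputs)} output name(s) were given"
        )
    nodes = []
    for i, (src, dst) in enumerate(zip(names, outputs)):
        if src != dst:
            # Identity renames the traced tensor to the requested name;
            # `name` is used as the node-name prefix.
            nodes.append(("Identity", [src], [dst], f"{name}_{i}"))
    # Assumption based on the return annotation str | Tuple[str, ...].
    result = outputs[0] if len(outputs) == 1 else tuple(outputs)
    return nodes, result


nodes, result = finalize_outputs("_onx_sqrt", ["output_0"])
print(nodes)   # [('Identity', ['_onx_sqrt'], ['output_0'], 'trace_0')]
print(result)  # output_0
```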

Example:

<<<

from yobx.helpers.onnx_helper import pretty_onnx
from yobx.xbuilder import GraphBuilder
from yobx.xtracing import trace_numpy_function
import numpy as np
from onnx import TensorProto

g = GraphBuilder({"": 21, "ai.onnx.ml": 1})
g.make_tensor_input("X", TensorProto.FLOAT, ("batch", 3))


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


trace_numpy_function(g, {}, ["output_0"], my_func, ["X"])
g.make_tensor_output("output_0", indexed=False, allow_untyped_output=True)
art = g.to_onnx()
print(pretty_onnx(art))

>>>

    opset: domain='' version=21
    opset: domain='ai.onnx.ml' version=1
    input: name='X' type=dtype('float32') shape=['batch', 3]
    init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
    Abs(X) -> _onx_abs_X
      Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
        Sqrt(_onx_add_abs_X) -> output_0
    output: name='output_0' type='NOTENSOR' shape=None

trace_dataframe

yobx.xtracing.trace_dataframe(func: Callable, input_dtypes: Dict[str, dtype | type | str] | List[Dict[str, dtype | type | str]]) → ParsedQuery | List[ParsedQuery]

Trace func and return the equivalent ParsedQuery.

Constructs one or more TracedDataFrame objects whose columns correspond to input_dtypes, calls func with them, and converts the resulting frame’s recorded operations into a ParsedQuery.

Parameters:
  • func – a callable that accepts one or more TracedDataFrame objects and returns a TracedDataFrame or a tuple/list of TracedDataFrame objects. The function may apply any combination of filter, select, arithmetic on columns, join, and aggregations.

  • input_dtypes – either:

    • a single {column: dtype} mapping — func is called with one TracedDataFrame; or

    • a list of {column: dtype} mappings — func is called with one TracedDataFrame per mapping, in order.

    The dtypes are not used during tracing itself; they are used only when the returned query is subsequently compiled to ONNX.

Returns:

a ParsedQuery representing the operations performed by func, or a list of ParsedQuery objects when func returns multiple dataframes (as a tuple or list).
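The overall flow (build one traced frame per input mapping, call func, record rather than execute each operation, and return one result per output frame) can be sketched without yobx. Everything below is hypothetical: Expr, ToyFrame and toy_trace_dataframe are illustrative stand-ins for column expressions, TracedDataFrame and trace_dataframe, and a plain list of recorded steps plays the role of ParsedQuery.

```python
class Expr:
    """Hypothetical column expression that records text instead of values."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr(f"({self.text} + {other.text})")

    def __gt__(self, value):
        return Expr(f"({self.text} > {value!r})")

    def alias(self, name):
        return Expr(f"{self.text} AS {name}")


class ToyFrame:
    """Hypothetical stand-in for TracedDataFrame: each call appends a step."""

    def __init__(self, columns, operations=()):
        self.columns = list(columns)
        self.operations = list(operations)

    def __getitem__(self, column):
        if column not in self.columns:
            raise KeyError(column)
        return Expr(column)

    def filter(self, condition):
        # Record the step and return a new frame; no data is touched.
        return ToyFrame(self.columns, self.operations + [("filter", condition.text)])

    def select(self, items):
        texts = [item.text for item in items]
        columns = [t.rsplit(" AS ", 1)[-1] for t in texts]
        return ToyFrame(columns, self.operations + [("select", texts)])


def toy_trace_dataframe(func, input_dtypes):
    # A single mapping means one input frame; a list means one frame per mapping.
    specs = input_dtypes if isinstance(input_dtypes, list) else [input_dtypes]
    # Only column names are needed while tracing; dtypes are ignored here.
    frames = [ToyFrame(spec.keys()) for spec in specs]
    result = func(*frames)
    if isinstance(result, (tuple, list)):
        return [frame.operations for frame in result]
    return result.operations


def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])


print(toy_trace_dataframe(transform, {"a": "float32", "b": "float32"}))
# [('filter', '(a > 0)'), ('select', ['(a + b) AS total'])]
```

Ignoring the dtypes during tracing matches the note above that they only matter once the query is compiled to ONNX.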

Example — single dataframe:

import numpy as np
from yobx.xtracing.dataframe_trace import trace_dataframe

def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])

pq = trace_dataframe(transform, {"a": np.float32, "b": np.float32})
for op in pq.operations:
    print(type(op).__name__, "—", op)
# FilterOp — FilterOp(condition=Condition(left=ColumnRef(column='a', ...
# SelectOp — SelectOp(items=[SelectItem(expr=BinaryExpr(...), alias='total')], ...

Example — two dataframes:

import numpy as np
from yobx.xtracing.dataframe_trace import trace_dataframe

def transform(df1, df2):
    return df1.select([(df1["a"] + df2["b"]).alias("total")])

pq = trace_dataframe(transform, [{"a": np.float32}, {"b": np.float32}])

Example — multiple output dataframes:

import numpy as np
from yobx.xtracing.dataframe_trace import trace_dataframe

def transform(df):
    out1 = df.select([(df["a"] + df["b"]).alias("sum_ab")])
    out2 = df.select([(df["a"] - df["b"]).alias("diff_ab")])
    return out1, out2

pqs = trace_dataframe(transform, {"a": np.float32, "b": np.float32})
# pqs is a list of two ParsedQuery objects