yobx.xtracing
Lightweight mechanism for tracing numpy and DataFrame functions and exporting them to ONNX. See Numpy-Tracing and FunctionTransformer for a full walkthrough.
NumpyArray
- class yobx.xtracing.NumpyArray(name: str, graph_builder, dtype=None, shape=None)
Proxy for an ONNX tensor that traces numpy operations as ONNX graph nodes.
Instances are produced by trace_numpy_to_onnx() or directly by the FunctionTransformer converter when the input array is replaced by a symbolic placeholder. Every arithmetic operation, ufunc call, or reduction performed on a NumpyArray is recorded as an ONNX node in the underlying GraphBuilder.
The class follows the Python Array API standard and the numpy __array_ufunc__ / __array_function__ dispatch protocols so that plain numpy code can be traced without modification.
- Parameters:
name – ONNX tensor name (a string handle in the graph).
graph_builder – the GraphBuilder that owns the graph being built.
dtype – optional numpy dtype for the tensor; used when creating scalar constants from Python literals.
shape – optional tensor shape.
- property T: NumpyArray
Transpose (reverses all axes).
- astype(dtype) -> NumpyArray
Cast to dtype.
- clip(a_min=None, a_max=None) -> NumpyArray
Clip values to [a_min, a_max].
- expand_dims(axis) -> NumpyArray
Add a size-1 dimension (numpy expand_dims).
- flatten() -> NumpyArray
Flatten to a 1-D tensor.
- max(axis=None, keepdims: bool = False) -> NumpyArray
Maximum along axis.
- mean(axis=None, keepdims: bool = False) -> NumpyArray
Mean of elements along axis.
- min(axis=None, keepdims: bool = False) -> NumpyArray
Minimum along axis.
- prod(axis=None, keepdims: bool = False) -> NumpyArray
Product of elements along axis.
- reshape(*shape) -> NumpyArray
Reshape the tensor.
- property shape
Tensor shape if known.
- squeeze(axis=None) -> NumpyArray
Remove size-1 dimensions.
- sum(axis=None, keepdims: bool = False) -> NumpyArray
Sum elements along axis.
- transpose(*axes) -> NumpyArray
Transpose with optional axes permutation.
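The recording idea behind a proxy such as NumpyArray can be sketched in plain Python. The block below is a minimal, hypothetical illustration and not the yobx implementation: ToyGraph and ToyArray are invented names, only +, * and sum are handled, and the numpy dispatch protocols (__array_ufunc__, __array_function__) are not involved at all.

```python
import itertools


class ToyGraph:
    """Hypothetical stand-in for a graph builder: stores (op, inputs, attrs, output)."""

    def __init__(self):
        self.nodes = []
        self._ids = itertools.count()

    def add_node(self, op_type, inputs, **attrs):
        # Create a fresh result name and record the node.
        output = f"r{next(self._ids)}"
        self.nodes.append((op_type, tuple(inputs), attrs, output))
        return output


class ToyArray:
    """Hypothetical proxy: holds a tensor name, never any data."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _binary(self, op_type, other):
        # Python scalars are recorded as constants; proxies contribute their name.
        if isinstance(other, ToyArray):
            other_name = other.name
        else:
            other_name = self.graph.add_node("Constant", [], value=other)
        out = self.graph.add_node(op_type, [self.name, other_name])
        return ToyArray(out, self.graph)

    def __add__(self, other):
        return self._binary("Add", other)

    def __mul__(self, other):
        return self._binary("Mul", other)

    def sum(self, axis=None, keepdims=False):
        # A reduction is one more node; its arguments become node attributes.
        out = self.graph.add_node("ReduceSum", [self.name], axis=axis, keepdims=keepdims)
        return ToyArray(out, self.graph)


graph = ToyGraph()
x = ToyArray("X", graph)
y = ((x + 1.0) * x).sum(axis=0)

for op_type, inputs, attrs, output in graph.nodes:
    print(op_type, inputs, attrs, "->", output)
```

The user-facing expression looks like ordinary array code, yet nothing is computed: each operator call appends a node and returns a new proxy naming that node's result.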
trace_numpy_function
- yobx.xtracing.trace_numpy_function(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str] | None, func: Callable, inputs: List[str], name: str = 'trace', kw_args: Dict[str, Any] | None = None) -> str | Tuple[str, ...]
Trace a numpy function by wrapping named tensors in g as NumpyArray proxies, then recording all numpy operations as ONNX nodes in g.
This function follows the same API convention as other converters in this package: it takes an existing GraphBuilder, the scikit-learn shape/type dictionary sts, the desired output tensor names, and then the callable and input tensor names.
- Parameters:
g – the graph builder to add nodes to
sts – shapes defined by scikit-learn (may be empty; forwarded to g.set_type_shape_unary_op when non-empty)
outputs – desired output tensor names (one per output returned by func)
func – the numpy function to trace; must accept len(inputs) positional arguments and return a NumpyArray or a tuple/list of NumpyArray objects
inputs – existing input tensor names already present in g
name – node name prefix used when emitting Identity rename nodes
kw_args – optional keyword arguments forwarded to func
- Returns:
the first output tensor name (outputs[0])
Example:
<<<
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.xbuilder import GraphBuilder
from yobx.xtracing import trace_numpy_function
import numpy as np
from onnx import TensorProto

g = GraphBuilder({"": 21, "ai.onnx.ml": 1})
g.make_tensor_input("X", TensorProto.FLOAT, ("batch", 3))


def my_func(X):
    return np.sqrt(np.abs(X) + np.float32(1))


trace_numpy_function(g, {}, ["output_0"], my_func, ["X"])
g.make_tensor_output("output_0", indexed=False, allow_untyped_output=True)
art = g.to_onnx()
print(pretty_onnx(art))
>>>
opset: domain='' version=21
opset: domain='ai.onnx.ml' version=1
input: name='X' type=dtype('float32') shape=['batch', 3]
init: name='init1_s_' type=float32 shape=() -- array([1.], dtype=float32)-- Opset.make_node.1/Small
Abs(X) -> _onx_abs_X
Add(_onx_abs_X, init1_s_) -> _onx_add_abs_X
Sqrt(_onx_add_abs_X) -> output_0
output: name='output_0' type='NOTENSOR' shape=None
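The calling convention of trace_numpy_function (wrap the named inputs in proxies, call func, bind the results to the requested output names) can be imitated without yobx. Everything in the sketch below is hypothetical: toy_trace, Sym and the list-of-tuples graph are invented for illustration, and the handling of sts, types and shapes is not modeled. The toy always appends an Identity node to rename a result, whereas the documented output shows Sqrt writing to output_0 directly, so the library evidently avoids the extra node in at least that case.

```python
class Sym:
    """Hypothetical symbolic value: a tensor name bound to a shared node list."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = nodes

    def _emit(self, op_type, *inputs):
        # Name each result after its position in the node list.
        out = f"t{len(self.nodes)}"
        self.nodes.append((op_type, inputs, out))
        return Sym(out, self.nodes)

    def __abs__(self):
        return self._emit("Abs", self.name)

    def __neg__(self):
        return self._emit("Neg", self.name)


def toy_trace(nodes, outputs, func, inputs, kw_args=None):
    """Toy analogue of the convention: trace func, then rename its results."""
    proxies = [Sym(n, nodes) for n in inputs]
    result = func(*proxies, **(kw_args or {}))
    results = list(result) if isinstance(result, (tuple, list)) else [result]
    if len(outputs) != len(results):
        raise ValueError("one output name is required per returned value")
    for wanted, res in zip(outputs, results):
        # An Identity node binds the traced result to the requested name.
        nodes.append(("Identity", (res.name,), wanted))
    return tuple(outputs)


nodes = []
names = toy_trace(nodes, ["abs_x", "neg_x"], lambda x: (abs(x), -x), ["X"])
for node in nodes:
    print(node)
```

Because func receives proxies rather than arrays, it runs exactly once at trace time, and the caller decides the final tensor names independently of whatever intermediate names the tracing produced.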
trace_dataframe
- yobx.xtracing.trace_dataframe(func: Callable, input_dtypes: Dict[str, dtype | type | str] | List[Dict[str, dtype | type | str]]) -> ParsedQuery | List[ParsedQuery]
Trace func and return the equivalent ParsedQuery.
Constructs one or more TracedDataFrame objects whose columns correspond to input_dtypes, calls func with them, and converts the resulting frame's recorded operations into a ParsedQuery.
- Parameters:
func – a callable that accepts one or more TracedDataFrame objects and returns a TracedDataFrame or a tuple/list of TracedDataFrame objects. The function may apply any combination of filter, select, arithmetic on columns, join, and aggregations.
input_dtypes – either
  - a single {column: dtype} mapping: func is called with one TracedDataFrame; or
  - a list of {column: dtype} mappings: func is called with one TracedDataFrame per mapping, in order.
The dtypes are not used during tracing itself; they are used only when the returned query is subsequently compiled to ONNX.
- Returns:
a ParsedQuery representing the operations performed by func, or a list of ParsedQuery objects when func returns multiple dataframes (as a tuple or list).
Example — single dataframe:
import numpy as np
from yobx.xtracing.dataframe_trace import trace_dataframe


def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])


pq = trace_dataframe(transform, {"a": np.float32, "b": np.float32})
for op in pq.operations:
    print(type(op).__name__, "—", op)
# FilterOp — FilterOp(condition=Condition(left=ColumnRef(column='a', ...
# SelectOp — SelectOp(items=[SelectItem(expr=BinaryExpr(...), alias='total')], ...
Example — two dataframes:
import numpy as np
from yobx.xtracing.dataframe_trace import trace_dataframe


def transform(df1, df2):
    return df1.select([(df1["a"] + df2["b"]).alias("total")])


pq = trace_dataframe(transform, [{"a": np.float32}, {"b": np.float32}])
Example — multiple output dataframes:
import numpy as np
from yobx.xtracing.dataframe_trace import trace_dataframe


def transform(df):
    out1 = df.select([(df["a"] + df["b"]).alias("sum_ab")])
    out2 = df.select([(df["a"] - df["b"]).alias("diff_ab")])
    return out1, out2


pqs = trace_dataframe(transform, {"a": np.float32, "b": np.float32})
# pqs is a list of two ParsedQuery objects
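For readers curious how a traced frame can record filter and select calls without holding any data, here is a minimal sketch in plain Python. It is a hypothetical toy, not the yobx implementation: ToyExpr, ToyFrame and toy_trace_dataframe are invented names, operations are stored as nested tuples instead of ParsedQuery objects, and joins and aggregations are left out. The dispatch on input_dtypes (one mapping or a list of mappings) follows the parameter description of trace_dataframe.

```python
class ToyExpr:
    """Hypothetical column expression stored as a nested tuple."""

    def __init__(self, tree, alias_name=None):
        self.tree = tree
        self.alias_name = alias_name

    def __add__(self, other):
        return ToyExpr(("add", self.tree, other.tree))

    def __gt__(self, value):
        return ToyExpr(("gt", self.tree, value))

    def alias(self, name):
        return ToyExpr(self.tree, name)


class ToyFrame:
    """Hypothetical traced frame: known columns plus the operations applied so far."""

    def __init__(self, columns, operations=()):
        self.columns = list(columns)
        self.operations = list(operations)

    def __getitem__(self, column):
        if column not in self.columns:
            raise KeyError(column)
        return ToyExpr(("col", column))

    def filter(self, condition):
        # Return a new frame so earlier references stay unchanged.
        return ToyFrame(self.columns, self.operations + [("filter", condition.tree)])

    def select(self, exprs):
        items = [(e.alias_name, e.tree) for e in exprs]
        new_columns = [alias for alias, _ in items]
        return ToyFrame(new_columns, self.operations + [("select", items)])


def toy_trace_dataframe(func, input_dtypes):
    # A single mapping means one frame; a list means one frame per mapping.
    mappings = [input_dtypes] if isinstance(input_dtypes, dict) else list(input_dtypes)
    result = func(*[ToyFrame(m) for m in mappings])
    if isinstance(result, (tuple, list)):
        return [r.operations for r in result]
    return result.operations


def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])


ops = toy_trace_dataframe(transform, {"a": "float32", "b": "float32"})
for op in ops:
    print(op)
```

As in the documented behavior, the dtype values play no role while tracing in this toy; only the column names matter until a later compilation step would need the types.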