yobx.sql.dataframe_to_onnx

yobx.sql.dataframe_to_onnx(func: typing.Callable, input_dtypes: typing.Dict[str, numpy.dtype | type | str] | typing.List[typing.Dict[str, numpy.dtype | type | str]], target_opset: int = 21, custom_functions: typing.Dict[str, typing.Callable] | None = None, builder_cls: type | typing.Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, filename: str | None = None, verbose: int = 0, large_model: bool = False, external_threshold: int = 1024, return_optimize_report: bool = False) → ExportArtifact

Trace func and convert the resulting computation to ONNX.

Combines trace_dataframe() and parsed_query_to_onnx() into a single call.

Parameters:

func – a callable that accepts one or more TracedDataFrame objects and returns a TracedDataFrame or a tuple/list of TracedDataFrame objects. When func returns multiple dataframes, all their outputs are collected into a single ONNX graph with shared inputs and multiple output tensors.
input_dtypes –
either
- a single {column: dtype} mapping — func is called with one TracedDataFrame; or
- a list of {column: dtype} mappings — func is called with one TracedDataFrame per mapping, in order. When the traced function contains a JoinOp, the first mapping is used as the left-table dtypes and the second as the right-table dtypes. For functions that simply share columns across multiple frames without a join, all mappings are merged into a single input-dtype dict.
A pandas DataFrame (or a list of DataFrames) is also accepted; the column names and per-column dtypes are extracted automatically from each DataFrame.
target_opset – ONNX opset version to target. The default is yobx.DEFAULT_TARGET_OPSET, which appears as 21 in the signature above.
custom_functions – optional mapping from function name to Python callable. Functions registered here can be called inside the traced body via FuncCallExpr nodes if the traced function constructs them directly (advanced usage).
builder_cls – graph-builder class or factory callable. Defaults to GraphBuilder.
filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).
verbose – verbosity level (0 = silent).
large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.
external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.
return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.
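
The two accepted shapes of input_dtypes (a single mapping, or a list of mappings) can be illustrated with a small sketch that does not depend on yobx. The helper names normalize_input_dtypes and merge_dtypes are hypothetical and not part of the library. The conflict check is an assumption, since the description above does not say how clashing column dtypes are handled. Dtypes are written as strings, which the signature permits.

```python
from typing import Dict, List, Union

DtypeMap = Dict[str, str]  # {column: dtype}; strings stand in for numpy dtypes


def normalize_input_dtypes(
    input_dtypes: Union[DtypeMap, List[DtypeMap]]
) -> List[DtypeMap]:
    """Return one mapping per traced dataframe (hypothetical helper)."""
    if isinstance(input_dtypes, dict):
        # A single mapping means func receives exactly one dataframe.
        return [dict(input_dtypes)]
    # A list means one dataframe per mapping, in order.
    return [dict(mapping) for mapping in input_dtypes]


def merge_dtypes(mappings: List[DtypeMap]) -> DtypeMap:
    """Merge per-frame mappings into one dict, as described for the no-join case.

    Assumption: a column declared twice with different dtypes is an error.
    """
    merged: DtypeMap = {}
    for mapping in mappings:
        for column, dtype in mapping.items():
            if merged.setdefault(column, dtype) != dtype:
                raise ValueError(f"conflicting dtypes for column {column!r}")
    return merged


frames = normalize_input_dtypes([{"a": "float32"}, {"b": "float32"}])
print(len(frames))           # number of dataframes passed to func
print(merge_dtypes(frames))  # single input-dtype dict for the no-join case
```

In the join case the mappings would stay separate (left table first, right table second) rather than being merged.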

Returns:

ExportArtifact wrapping the exported ONNX model together with an ExportReport.

Example — single dataframe:

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])

dtypes = {"a": np.float32, "b": np.float32}
artifact = dataframe_to_onnx(transform, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0,  5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
# total == array([5., 9.], dtype=float32)  (rows where a > 0)
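
As a cross-check of the expected output, the same filter-then-select logic can be written directly in NumPy. This is only a reference computation, assuming that filter keeps the rows where the predicate holds and that select evaluates its expression on the remaining rows. It does not call yobx, and reference_transform is a hypothetical name.

```python
import numpy as np


def reference_transform(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # filter: keep only the rows where a > 0
    keep = a > 0
    # select: compute a + b on the kept rows
    return a[keep] + b[keep]


a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
total = reference_transform(a, b)
print(total)  # [5. 9.]
```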

Example — two independent dataframes (no join):

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df1, df2):
    return df1.select([(df1["a"] + df2["b"]).alias("total")])

artifact = dataframe_to_onnx(transform, [{"a": np.float32}, {"b": np.float32}])

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
# total == array([5., 7., 9.], dtype=float32)

Example — two dataframes joined on a key column:

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df1, df2):
    return df1.join(df2, left_key="cid", right_key="id")

dtypes1 = {"cid": np.int64, "a": np.float32}
dtypes2 = {"id": np.int64, "b": np.float32}
artifact = dataframe_to_onnx(transform, [dtypes1, dtypes2])
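
The join example above stops at the export step. The sketch below shows, in plain NumPy, what a join on cid == id computes under the assumption of inner-join semantics, which the description does not state. The function name reference_inner_join, the sample data, and the choice of output columns are all hypothetical; the columns actually produced by the exported graph may differ.

```python
import numpy as np


def reference_inner_join(cid, a, id_, b):
    """Pair every left row with every right row whose key matches.

    Assumption: inner-join semantics (unmatched rows are dropped).
    """
    # match[i, j] is True when the left key cid[i] equals the right key id_[j].
    match = cid[:, None] == id_[None, :]
    left_idx, right_idx = np.nonzero(match)
    return {"cid": cid[left_idx], "a": a[left_idx], "b": b[right_idx]}


# Hypothetical sample data matching dtypes1 and dtypes2 above.
cid = np.array([1, 2, 3], dtype=np.int64)
a = np.array([10.0, 20.0, 30.0], dtype=np.float32)
id_ = np.array([3, 1, 4], dtype=np.int64)
b = np.array([0.5, 0.25, 0.75], dtype=np.float32)

joined = reference_inner_join(cid, a, id_, b)
print(joined["cid"])  # keys present in both tables
print(joined["a"], joined["b"])
```

The pairwise comparison matrix costs O(n × m) memory, so this form is only suitable for checking small inputs.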

Example — multiple output dataframes:

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df):
    out1 = df.select([(df["a"] + df["b"]).alias("sum_ab")])
    out2 = df.select([(df["a"] - df["b"]).alias("diff_ab")])
    return out1, out2

dtypes = {"a": np.float32, "b": np.float32}
artifact = dataframe_to_onnx(transform, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)
sum_ab, diff_ab = ref.run(None, {"a": a, "b": b})
# sum_ab == array([4., 6.], dtype=float32)
# diff_ab == array([-2., -2.], dtype=float32)