yobx.sql.dataframe_to_onnx

yobx.sql.dataframe_to_onnx(func: Callable, input_dtypes: Dict[str, numpy.dtype | type | str] | List[Dict[str, numpy.dtype | type | str]], target_opset: int = 21, custom_functions: Dict[str, Callable] | None = None, builder_cls: type | Callable = yobx.xbuilder.graph_builder.GraphBuilder, filename: str | None = None, verbose: int = 0, large_model: bool = False, external_threshold: int = 1024, return_optimize_report: bool = False) -> ExportArtifact

Trace func and convert the resulting computation to ONNX.

Combines trace_dataframe() and parsed_query_to_onnx() into a single call.

Parameters:
  • func – a callable that accepts one or more TracedDataFrame objects and returns a TracedDataFrame or a tuple/list of TracedDataFrame objects. When func returns multiple dataframes, all their outputs are collected into a single ONNX graph with shared inputs and multiple output tensors.

  • input_dtypes

    either

    • a single {column: dtype} mapping — func is called with one TracedDataFrame; or

    • a list of {column: dtype} mappings — func is called with one TracedDataFrame per mapping, in order. When the traced function contains a JoinOp, the first mapping is used as the left-table dtypes and the second as the right-table dtypes. For functions that simply share columns across multiple frames without a join, all mappings are merged into a single input-dtype dict.

    A pandas DataFrame (or a list of DataFrames) is also accepted; the column names and per-column dtypes are extracted automatically from each DataFrame.

  • target_opset – ONNX opset version to target (default: yobx.DEFAULT_TARGET_OPSET, shown as 21 in the signature above).

  • custom_functions – optional mapping from function name to Python callable. Functions registered here can be called inside the traced body through FuncCallExpr nodes, provided the traced function constructs those nodes directly (advanced usage).

  • builder_cls – graph-builder class or factory callable. Defaults to GraphBuilder.

  • filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).

  • verbose – verbosity level (0 = silent).

  • large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.

  • external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.

  • return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.
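
The two shapes accepted by input_dtypes, and the merge described for the no-join case, can be sketched in plain Python. The helpers as_mapping_list and merge_mappings are hypothetical names used only for illustration; they are not part of yobx and do not claim to reproduce its internals. Dtypes are written as strings (which the signature allows) to keep the sketch dependency-free, and the conflict check is an assumption about what a sensible merge would do.

```python
from typing import Dict, List, Union

DtypeMap = Dict[str, str]  # column name -> dtype, written as a string here


def as_mapping_list(input_dtypes: Union[DtypeMap, List[DtypeMap]]) -> List[DtypeMap]:
    """Return one mapping per traced frame (hypothetical helper, not yobx API)."""
    if isinstance(input_dtypes, dict):
        # A single mapping means func receives exactly one traced frame.
        return [input_dtypes]
    # A list means func receives one traced frame per mapping, in order.
    return list(input_dtypes)


def merge_mappings(mappings: List[DtypeMap]) -> DtypeMap:
    """Merge per-frame mappings into one dict (the no-join case).

    Assumption: a column appearing in several frames must keep the same dtype.
    """
    merged: DtypeMap = {}
    for mapping in mappings:
        for column, dtype in mapping.items():
            if merged.setdefault(column, dtype) != dtype:
                raise ValueError(f"conflicting dtypes for column {column!r}")
    return merged


single = as_mapping_list({"a": "float32", "b": "float32"})
pair = as_mapping_list([{"a": "float32"}, {"b": "float32"}])
merged = merge_mappings(pair)
print(len(single), len(pair), merged)
# 1 2 {'a': 'float32', 'b': 'float32'}
```

In the join case the mappings are not merged: the first one describes the left table and the second the right table.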

Returns:

ExportArtifact wrapping the exported ONNX model together with an ExportReport.
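
The rules stated for filename and external_threshold can be made concrete with a small sketch. Both helpers below are hypothetical and only restate the parameter descriptions: the report shares the model's base name with an .xlsx extension, and a tensor is treated as external when its element count is strictly above the threshold (reading "exceeds" literally, which is an assumption).

```python
from math import prod
from pathlib import Path
from typing import Sequence


def companion_report_path(filename: str) -> Path:
    """Same base name as the model file, with an .xlsx extension (illustrative)."""
    return Path(filename).with_suffix(".xlsx")


def exceeds_external_threshold(shape: Sequence[int], external_threshold: int = 1024) -> bool:
    """True when the tensor's element count is strictly above the threshold."""
    return prod(shape) > external_threshold


print(companion_report_path("model.onnx"))   # model.xlsx
print(exceeds_external_threshold((32, 32)))  # False: exactly 1024 elements
print(exceeds_external_threshold((32, 33)))  # True: 1056 elements
```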

Example — single dataframe:

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df):
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])

dtypes = {"a": np.float32, "b": np.float32}
artifact = dataframe_to_onnx(transform, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0,  5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
# total == array([5., 9.], dtype=float32)  (rows where a > 0)
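
As a cross-check on the expected output, the same filter-then-select logic can be written in plain Python without yobx. This is only an illustration of the row semantics of the example above, not of how the ONNX graph is built.

```python
a = [1.0, -2.0, 3.0]
b = [4.0, 5.0, 6.0]

# filter(df["a"] > 0): keep only the rows where a is positive.
rows = [(x, y) for x, y in zip(a, b) if x > 0]

# select((a + b).alias("total")): compute one output column from the kept rows.
total = [x + y for x, y in rows]
print(total)  # [5.0, 9.0]
```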

Example — two independent dataframes (no join):

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df1, df2):
    return df1.select([(df1["a"] + df2["b"]).alias("total")])

artifact = dataframe_to_onnx(transform, [{"a": np.float32}, {"b": np.float32}])

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
# total == array([5., 7., 9.], dtype=float32)

Example — two dataframes joined on a key column:

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df1, df2):
    return df1.join(df2, left_key="cid", right_key="id")

dtypes1 = {"cid": np.int64, "a": np.float32}
dtypes2 = {"id": np.int64, "b": np.float32}
artifact = dataframe_to_onnx(transform, [dtypes1, dtypes2])
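
The join example above does not show an output, and this page does not state which join type is used or how the output columns are named. The plain-Python sketch below assumes an inner equi-join that keeps every column from both sides, purely to illustrate what matching cid against id means; inner_join is a hypothetical helper and the exported model may behave differently.

```python
def inner_join(left_rows, right_rows, left_key, right_key):
    """Inner equi-join over lists of row dicts (illustrative assumption only)."""
    # Index the right table by its key so each left row is matched in O(1).
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for left_row in left_rows:
        # Left rows without a matching key are dropped (inner join).
        for right_row in index.get(left_row[left_key], []):
            joined.append({**left_row, **right_row})
    return joined


left = [{"cid": 1, "a": 10.0}, {"cid": 2, "a": 20.0}, {"cid": 3, "a": 30.0}]
right = [{"id": 2, "b": 0.5}, {"id": 3, "b": 1.5}, {"id": 4, "b": 2.5}]

result = inner_join(left, right, "cid", "id")
print(result)
# [{'cid': 2, 'a': 20.0, 'id': 2, 'b': 0.5}, {'cid': 3, 'a': 30.0, 'id': 3, 'b': 1.5}]
```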

Example — multiple output dataframes:

import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

def transform(df):
    out1 = df.select([(df["a"] + df["b"]).alias("sum_ab")])
    out2 = df.select([(df["a"] - df["b"]).alias("diff_ab")])
    return out1, out2

dtypes = {"a": np.float32, "b": np.float32}
artifact = dataframe_to_onnx(transform, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)
sum_ab, diff_ab = ref.run(None, {"a": a, "b": b})
# sum_ab == array([4., 6.], dtype=float32)
# diff_ab == array([-2., -2.], dtype=float32)