yobx.sql.dataframe_to_onnx

- yobx.sql.dataframe_to_onnx(func: Callable, input_dtypes: Dict[str, numpy.dtype | type | str] | List[Dict[str, numpy.dtype | type | str]], target_opset: int = 21, custom_functions: Dict[str, Callable] | None = None, builder_cls: type | Callable = yobx.xbuilder.graph_builder.GraphBuilder, filename: str | None = None, verbose: int = 0, large_model: bool = False, external_threshold: int = 1024, return_optimize_report: bool = False) → ExportArtifact
Trace func and convert the resulting computation to ONNX.

Combines trace_dataframe() and parsed_query_to_onnx() into a single call.

- Parameters:
  - func – a callable that accepts one or more TracedDataFrame objects and returns a TracedDataFrame or a tuple/list of TracedDataFrame objects. When func returns multiple dataframes, all their outputs are collected into a single ONNX graph with shared inputs and multiple output tensors.
  - input_dtypes – either
    - a single {column: dtype} mapping, in which case func is called with one TracedDataFrame; or
    - a list of {column: dtype} mappings, in which case func is called with one TracedDataFrame per mapping, in order. When the traced function contains a JoinOp, the first mapping is used as the left-table dtypes and the second as the right-table dtypes. For functions that simply share columns across multiple frames without a join, all mappings are merged into a single input-dtype dict.

    A pandas DataFrame (or a list of DataFrames) is also accepted; the column names and per-column dtypes are extracted automatically from each DataFrame.
  - target_opset – ONNX opset version to target (default: yobx.DEFAULT_TARGET_OPSET).
  - custom_functions – optional mapping from function name to Python callable. Functions registered here can be called inside the traced body via FuncCallExpr nodes if the traced function constructs them directly (advanced usage).
  - builder_cls – graph-builder class or factory callable. Defaults to GraphBuilder.
  - filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with a .xlsx extension).
  - verbose – verbosity level (0 = silent).
  - large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.
  - external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.
  - return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.
- Returns:
  ExportArtifact wrapping the exported ONNX model together with an ExportReport.
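The accepted forms of input_dtypes can be summarized with a small normalization sketch. The helpers `to_dtype_map`, `normalize_input_dtypes` and `merge_dtype_maps` are hypothetical and are not part of yobx; they only restate the rules from the parameter description (one mapping per traced dataframe, DataFrames reduced to column/dtype pairs, mappings merged when there is no join), assuming every column is backed by a plain NumPy dtype.

```python
from typing import Dict, List

import numpy as np
import pandas as pd

DtypeMap = Dict[str, np.dtype]


def to_dtype_map(spec) -> DtypeMap:
    """Hypothetical helper: reduce one mapping or DataFrame to {column: np.dtype}."""
    if isinstance(spec, pd.DataFrame):
        # Column names and per-column dtypes are read off the DataFrame itself
        # (assumes NumPy-backed columns, not pandas extension dtypes).
        return {str(col): np.dtype(dt) for col, dt in spec.dtypes.items()}
    # np.dtype accepts numpy dtypes, Python/numpy types and dtype strings alike.
    return {col: np.dtype(dt) for col, dt in spec.items()}


def normalize_input_dtypes(input_dtypes) -> List[DtypeMap]:
    """Hypothetical helper: return one mapping per traced dataframe, in order."""
    if isinstance(input_dtypes, (list, tuple)):
        return [to_dtype_map(spec) for spec in input_dtypes]
    return [to_dtype_map(input_dtypes)]


def merge_dtype_maps(maps: List[DtypeMap]) -> DtypeMap:
    """Hypothetical helper for the no-join case: fold all mappings into one dict."""
    merged: DtypeMap = {}
    for mapping in maps:
        merged.update(mapping)  # a later mapping wins if a column repeats (assumption)
    return merged


# Mixed specification: a plain mapping with a dtype string, plus a DataFrame.
frame = pd.DataFrame({"b": np.array([2.0, 3.0], dtype=np.float32)})
maps = normalize_input_dtypes([{"a": "float32"}, frame])
print(maps)                    # one mapping per traced dataframe
print(merge_dtype_maps(maps))  # single input-dtype dict with columns a and b
```

With a JoinOp the list would instead stay split into left and right mappings, so `merge_dtype_maps` only models the no-join path.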
Example (single dataframe):

```python
import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator


def transform(df):
    # Keep rows where a > 0, then compute a + b as "total".
    df = df.filter(df["a"] > 0)
    return df.select([(df["a"] + df["b"]).alias("total")])


dtypes = {"a": np.float32, "b": np.float32}
artifact = dataframe_to_onnx(transform, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
# total == array([5., 9.], dtype=float32)  (only rows where a > 0)
```
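When testing an exported model it helps to have an independent oracle. The sketch below is a hand-written NumPy counterpart of `transform` from the single-dataframe example; the name `transform_reference` is ours, not part of yobx, and it assumes `filter` keeps exactly the rows where the predicate is true, which is what the expected output `[5., 9.]` shows.

```python
import numpy as np


def transform_reference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """NumPy counterpart of ``transform``: keep rows with a > 0, return a + b."""
    mask = a > 0              # df.filter(df["a"] > 0)
    return a[mask] + b[mask]  # (df["a"] + df["b"]).alias("total") on kept rows


a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
expected = transform_reference(a, b)
print(expected)  # [5. 9.]
```

Given `total` from the evaluator run, `np.testing.assert_allclose(total, expected)` checks that the exported graph agrees with the NumPy version.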
Example (two independent dataframes, no join):

```python
import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator


def transform(df1, df2):
    # No join: a column from each frame is combined element-wise.
    return df1.select([(df1["a"] + df2["b"]).alias("total")])


artifact = dataframe_to_onnx(transform, [{"a": np.float32}, {"b": np.float32}])

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
# total == array([5., 7., 9.], dtype=float32)
```
Example (two dataframes joined on a key column):

```python
import numpy as np
from yobx.sql import dataframe_to_onnx


def transform(df1, df2):
    # Join the left frame's "cid" column against the right frame's "id" column.
    return df1.join(df2, left_key="cid", right_key="id")


dtypes1 = {"cid": np.int64, "a": np.float32}  # left-table dtypes
dtypes2 = {"id": np.int64, "b": np.float32}   # right-table dtypes
artifact = dataframe_to_onnx(transform, [dtypes1, dtypes2])
```
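The join example stops at the export. The docstring does not state which join type `join` performs or how the output columns are named, so the sketch below only assumes an inner join on `cid == id` and uses pandas `DataFrame.merge` as a reference result to compare against; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data matching dtypes1 (left) and dtypes2 (right).
left = pd.DataFrame({
    "cid": np.array([1, 2, 3], dtype=np.int64),
    "a": np.array([0.5, 1.5, 2.5], dtype=np.float32),
})
right = pd.DataFrame({
    "id": np.array([3, 1], dtype=np.int64),
    "b": np.array([30.0, 10.0], dtype=np.float32),
})

# Assumed inner-join semantics: keep only rows whose cid matches some id.
expected = left.merge(right, how="inner", left_on="cid", right_on="id")
print(expected)
```

If the exported graph turns out to implement a different join type, adjusting `how` keeps the reference aligned with it.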
Example (multiple output dataframes):

```python
import numpy as np
from yobx.sql import dataframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator


def transform(df):
    # Two outputs computed from the same input frame.
    out1 = df.select([(df["a"] + df["b"]).alias("sum_ab")])
    out2 = df.select([(df["a"] - df["b"]).alias("diff_ab")])
    return out1, out2


dtypes = {"a": np.float32, "b": np.float32}
artifact = dataframe_to_onnx(transform, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)
sum_ab, diff_ab = ref.run(None, {"a": a, "b": b})
# sum_ab == array([4., 6.], dtype=float32)
# diff_ab == array([-2., -2.], dtype=float32)
```