yobx.sql.lazyframe_to_onnx

yobx.sql.lazyframe_to_onnx(lf: polars.LazyFrame, input_dtypes: Dict[str, Union[np.dtype, type, str]], target_opset: int = 21, builder_cls: Union[type, Callable] = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, filename: Optional[str] = None, verbose: int = 0, large_model: bool = False, external_threshold: int = 1024, return_optimize_report: bool = False) -> ExportArtifact

Convert a polars.LazyFrame into a self-contained ONNX model.

The function extracts the logical execution plan from the LazyFrame via polars.LazyFrame.explain(), translates it into a SQL query understood by sql_to_onnx(), and returns an ExportArtifact containing the ONNX model.

Each source column of the plan is represented as a separate 1-D ONNX input tensor. The ONNX model outputs correspond to the columns or expressions in the select (or agg) step of the plan.
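To make the column-wise layout concrete, the NumPy sketch below mimics what an exported graph for a filter followed by a select is expected to compute. It does not call yobx, and the helper name filter_then_add is hypothetical; it only illustrates the semantics described above, with one 1-D array per source column.

```python
import numpy as np

def filter_then_add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy equivalent of filter(a > 0).select(a + b)."""
    mask = a > 0                # filter step: one boolean per row
    return a[mask] + b[mask]    # select step: arithmetic on the kept rows

# Each source column is a separate 1-D array, like the ONNX inputs.
a = np.array([1.0, -2.0, 3.0], dtype=np.float64)
b = np.array([4.0, 5.0, 6.0], dtype=np.float64)
total = filter_then_add(a, b)
print(total)  # rows where a > 0: [5. 9.]
```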

Supported LazyFrame operations

  • select — column pass-through and arithmetic expressions

  • filter — row filtering with comparison and boolean predicates

  • group_by + agg — aggregations (sum, mean, min, max, count)

param lf:

a polars.LazyFrame. The execution plan returned by lf.explain() is parsed and converted.

param input_dtypes:

a mapping from source column name to numpy dtype (e.g. {"a": np.float32, "b": np.float64}). Only the columns that actually appear in the plan need to be listed.
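The signature allows each value to be an np.dtype, a type, or a string. As a minimal sketch, np.dtype() accepts all three spellings and can be used to check a mapping before passing it in. Whether lazyframe_to_onnx normalises its input this way internally is an assumption, and the column names here are illustrative.

```python
import numpy as np

# Three spellings allowed by the signature: a type, a string, an np.dtype.
input_dtypes = {"a": np.float32, "b": "float64", "c": np.dtype("int64")}

# np.dtype() turns every spelling into a canonical np.dtype instance.
normalized = {name: np.dtype(dt) for name, dt in input_dtypes.items()}

for name, dt in normalized.items():
    print(name, dt.name)
```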

param target_opset:

ONNX opset version to target (default: yobx.DEFAULT_TARGET_OPSET, shown as 21 in the signature above).

param builder_cls:

the graph-builder class (or factory callable) to use. Defaults to GraphBuilder.

param filename:

if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).
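The companion file name can be predicted with pathlib. This is a sketch that assumes "same base name" means replacing the final extension; the helper companion_report_path and the example path are hypothetical and not part of yobx.

```python
from pathlib import Path

def companion_report_path(filename: str) -> Path:
    """Hypothetical helper: same base name, .xlsx extension."""
    return Path(filename).with_suffix(".xlsx")

report_path = companion_report_path("exports/model.onnx")
print(report_path.name)
```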

param verbose:

verbosity level (0 = silent).

param large_model:

if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.

param external_threshold:

if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.
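The rule can be read as a simple predicate on the number of elements. The sketch below is an illustration of the documented condition, not yobx code; the function stored_externally is hypothetical, and the strict comparison follows the word "exceeds".

```python
from math import prod

def stored_externally(shape, large_model=False, external_threshold=1024):
    """Hypothetical predicate mirroring the documented rule."""
    # Element count is the product of the dimensions (1 for a scalar).
    return large_model and prod(shape) > external_threshold

print(stored_externally((32, 32), large_model=True))   # 1024 elements
print(stored_externally((33, 32), large_model=True))   # 1056 elements
print(stored_externally((33, 32), large_model=False))  # large_model is off
```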

param return_optimize_report:

if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.

return:

ExportArtifact wrapping the exported ONNX model together with an ExportReport.

Example:

import numpy as np
import polars as pl
from yobx.sql import lazyframe_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

lf = pl.LazyFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
lf = lf.filter(pl.col("a") > 0).select(
    [(pl.col("a") + pl.col("b")).alias("total")]
)

dtypes = {"a": np.float64, "b": np.float64}
artifact = lazyframe_to_onnx(lf, dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, -2.0, 3.0], dtype=np.float64)
b = np.array([4.0,  5.0, 6.0], dtype=np.float64)
(total,) = ref.run(None, {"a": a, "b": b})
# total contains rows where a > 0: [5.0, 9.0]

Note

GROUP BY aggregations are computed over the whole filtered dataset (same limitation as sql_to_onnx()). True SQL group-by semantics (one output row per unique key) would require an ONNX Loop or custom kernel and are not yet supported.
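The difference between the two behaviours can be seen on a small dataset. The pure-Python sketch below does not call yobx; the column contents are made up for illustration, and it only contrasts a whole-dataset sum with a per-key sum.

```python
from collections import defaultdict

keys = ["x", "y", "x", "y"]
values = [1.0, 2.0, 3.0, 4.0]

# Documented behaviour: the aggregation spans the whole (filtered) dataset.
whole_dataset_sum = sum(values)

# True SQL GROUP BY semantics: one output row per unique key.
grouped = defaultdict(float)
for key, value in zip(keys, values):
    grouped[key] += value
per_key_sum = dict(grouped)

print(whole_dataset_sum)  # 10.0
print(per_key_sum)        # {'x': 4.0, 'y': 6.0}
```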