yobx.sql.to_onnx
- yobx.sql.to_onnx(
      dataframe_or_query: Union[str, Callable[[TracedDataFrame], TracedDataFrame], 'polars.LazyFrame'],
      args: Optional[Union[np.ndarray, Tuple[np.ndarray, ...], 'pandas.DataFrame', Tuple['pandas.DataFrame', ...], Dict[str, Union[np.dtype, type, str]], List[Dict[str, Union[np.dtype, type, str]]]]] = None,
      target_opset: int = 21,
      custom_functions: Optional[Dict[str, Callable]] = None,
      builder_cls: Union[type, Callable] = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>,
      filename: Optional[str] = None,
      verbose: int = 0,
      input_names: Optional[Sequence[str]] = None,
      dynamic_shapes: Optional[Tuple[Dict[int, str], ...]] = None,
      large_model: bool = False,
      external_threshold: int = 1024,
      return_optimize_report: bool = False
  ) → ExportArtifact
Convert a SQL string, a DataFrame-tracing function, or a polars LazyFrame to ONNX.
This is the unified entry point that dispatches to:
  - sql_to_onnx(): when dataframe_or_query is a string.
  - dataframe_to_onnx(): when dataframe_or_query is a callable (a Python function that accepts a TracedDataFrame and returns one) and args does not hold numpy arrays.
  - trace_numpy_to_onnx(): when dataframe_or_query is a callable and args is a numpy.ndarray or a tuple/list of numpy.ndarray objects (sample inputs for numpy-function tracing).
  - lazyframe_to_onnx(): for any other value (expected to be a polars.LazyFrame).
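The dispatch order described above can be sketched in plain Python. This is only an illustration of the documented rules, not the library's implementation: `pick_converter` and its helpers are hypothetical names, arrays are detected by duck typing so the sketch runs without numpy installed, and counting only a non-empty tuple/list made up entirely of arrays as "numpy args" is an assumption.

```python
from typing import Any


def _is_ndarray(obj: Any) -> bool:
    # Duck-typed stand-in for isinstance(obj, numpy.ndarray) (assumption:
    # used here only so the sketch has no numpy dependency).
    t = type(obj)
    return t.__name__ == "ndarray" and t.__module__.split(".")[0] == "numpy"


def _has_numpy_args(args: Any) -> bool:
    # A single array, or a non-empty tuple/list containing only arrays.
    if _is_ndarray(args):
        return True
    return (
        isinstance(args, (tuple, list))
        and len(args) > 0
        and all(_is_ndarray(a) for a in args)
    )


def pick_converter(dataframe_or_query: Any, args: Any = None) -> str:
    """Return the name of the converter the documented rules would select."""
    if isinstance(dataframe_or_query, str):
        return "sql_to_onnx"
    if callable(dataframe_or_query):
        if _has_numpy_args(args):
            return "trace_numpy_to_onnx"
        return "dataframe_to_onnx"
    # Any other value is expected to be a polars.LazyFrame.
    return "lazyframe_to_onnx"


print(pick_converter("SELECT a FROM t", {"a": "float32"}))  # sql_to_onnx
print(pick_converter(lambda df: df, {"a": "float32"}))      # dataframe_to_onnx
print(pick_converter(object()))                             # lazyframe_to_onnx
```

The string check comes first, so a SQL query is always routed to sql_to_onnx() regardless of what args holds.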
Each source column is represented as a separate 1-D ONNX input tensor. The model outputs correspond to the SELECT expressions (SQL / callable) or the select / agg step of the LazyFrame plan.
- Parameters:
  - dataframe_or_query – one of:
    - SQL string: supported clauses are SELECT, FROM, [INNER|LEFT|RIGHT|FULL] JOIN … ON, WHERE, GROUP BY. Custom Python functions can be called by name in the SELECT and WHERE clauses when registered via custom_functions.
    - callable: a Python function (df: TracedDataFrame) -> TracedDataFrame, or a function that accepts multiple TracedDataFrame arguments. The function is traced to capture all filter, select, aggregation, and join operations it performs, which are then compiled to ONNX. When args contains numpy arrays the function is treated as a numpy function and traced via trace_numpy_to_onnx() instead.
    - polars.LazyFrame: the execution plan is extracted via polars.LazyFrame.explain() and translated into SQL before conversion. See lazyframe_to_onnx() for details of supported operations.
  - args – one of:
    - A single {column: dtype} mapping or a list of such mappings (one per TracedDataFrame argument) for DataFrame-tracing callables or SQL queries.
    - A numpy.ndarray or a tuple/list of numpy.ndarray objects: when dataframe_or_query is a numpy function, these sample arrays determine the element types and shapes of the ONNX graph inputs; the function is then traced via trace_numpy_to_onnx().
    - A pandas DataFrame (or a tuple/list of DataFrames): column names and per-column dtypes are extracted automatically.

    For SQL queries this maps left-table columns; for a LazyFrame it maps the source DataFrame columns referenced in the plan. Only columns that actually appear in the query / plan need to be listed. Supported numpy dtypes: float16, float32, float64, int8, int16, int32, int64, uint8, uint16, uint32, uint64, bool, object (string).
  - target_opset – ONNX opset version to target (default: yobx.DEFAULT_TARGET_OPSET, shown as 21 in the signature above).
  - custom_functions – an optional mapping from function name (as it appears in the SQL string) to a Python callable. Each callable must accept one or more numpy arrays and return a numpy array. The function body is traced with trace_numpy_function() so that numpy arithmetic is translated into ONNX nodes. Ignored when dataframe_or_query is a polars.LazyFrame or when args contains numpy arrays.

    Example:
        import numpy as np
        from yobx.sql import to_onnx

        artifact = to_onnx(
            "SELECT my_sqrt(a) AS r FROM t",
            {"a": np.float32},
            custom_functions={"my_sqrt": np.sqrt},
        )
  - builder_cls – the graph-builder class (or factory callable) to use. Defaults to GraphBuilder. Any class that implements shape and type tracking can be supplied here, e.g. a custom subclass that adds extra optimisation passes.
  - filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).
  - verbose – verbosity level (0 = silent).
  - input_names – optional list of tensor names for the ONNX graph inputs. Only used when dataframe_or_query is a numpy function (i.e. args contains numpy.ndarray objects); ignored for SQL strings and DataFrame-tracing callables.
  - dynamic_shapes – optional per-input axis-to-dimension-name mappings. Only used when dataframe_or_query is a numpy function; ignored for SQL strings and DataFrame-tracing callables.
  - large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.
  - external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.
  - return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.
- Returns:
  An ExportArtifact wrapping the exported ONNX model together with an ExportReport.
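Two of the parameter rules above, the external_threshold comparison and the companion report name derived from filename, can be illustrated with a small standard-library sketch. Both helpers are hypothetical and not part of yobx. The sketch assumes the element count is the product of the tensor's dimensions, and that "same base name with .xlsx extension" means the model file's extension is replaced rather than appended to.

```python
import math
from pathlib import Path


def stored_externally(shape, external_threshold=1024):
    # Element count = product of the dimensions (an empty shape counts as 1).
    # "Exceeds" is read as a strict comparison.
    return math.prod(shape) > external_threshold


def companion_report_path(filename):
    # Assumption: the report keeps the base name and swaps the extension.
    return Path(filename).with_suffix(".xlsx")


print(stored_externally((32, 32)))          # 1024 elements, not above 1024: False
print(stored_externally((32, 33)))          # 1056 elements: True
print(companion_report_path("model.onnx"))  # model.xlsx
```

With the default threshold of 1024, a tensor of exactly 1024 elements stays inline under this reading; only larger tensors would be moved to external data.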
Example — from a SQL string:
    import numpy as np
    from yobx.sql import to_onnx
    from yobx.reference import ExtendedReferenceEvaluator

    dtypes = {"a": np.float32, "b": np.float32}
    artifact = to_onnx("SELECT a + b AS total FROM t WHERE a > 0", dtypes)

    ref = ExtendedReferenceEvaluator(artifact)
    a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
    b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
    (total,) = ref.run(None, {"a": a, "b": b})
    # total == array([5., 9.], dtype=float32)  (rows where a > 0)
Example — from a DataFrame-tracing callable:
    import numpy as np
    from yobx.sql import to_onnx
    from yobx.reference import ExtendedReferenceEvaluator

    def transform(df):
        df = df.filter(df["a"] > 0)
        return df.select([(df["a"] + df["b"]).alias("total")])

    dtypes = {"a": np.float32, "b": np.float32}
    artifact = to_onnx(transform, dtypes)

    ref = ExtendedReferenceEvaluator(artifact)
    a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
    b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
    (total,) = ref.run(None, {"a": a, "b": b})
    # total == array([5., 9.], dtype=float32)  (rows where a > 0)
Example — from a numpy-function callable with sample inputs:
    import numpy as np
    from yobx.sql import to_onnx
    from yobx.reference import ExtendedReferenceEvaluator

    def my_func(x):
        return np.sqrt(np.abs(x) + 1)

    x = np.array([1.0, -2.0, 3.0], dtype=np.float32)
    artifact = to_onnx(my_func, (x,))

    ref = ExtendedReferenceEvaluator(artifact)
    (result,) = ref.run(None, {"X": x})
Example — from a polars LazyFrame:
    import numpy as np
    import polars as pl
    from yobx.sql import to_onnx
    from yobx.reference import ExtendedReferenceEvaluator

    lf = pl.LazyFrame({"a": [1.0, -2.0, 3.0], "b": [4.0, 5.0, 6.0]})
    lf = lf.filter(pl.col("a") > 0).select(
        [(pl.col("a") + pl.col("b")).alias("total")]
    )

    dtypes = {"a": np.float64, "b": np.float64}
    artifact = to_onnx(lf, dtypes)

    ref = ExtendedReferenceEvaluator(artifact)
    a = np.array([1.0, -2.0, 3.0], dtype=np.float64)
    b = np.array([4.0, 5.0, 6.0], dtype=np.float64)
    (total,) = ref.run(None, {"a": a, "b": b})
    # total == array([5., 9.])  (rows where a > 0)
Note
GROUP BY produces one output row per unique key combination. Supported aggregations: SUM, AVG, MIN, MAX, COUNT(*). For multi-column GROUP BY the grouping keys are cast to float64 internally, which may lose precision for integers larger than 2^53.
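The note can be made concrete with a pure-Python reference for the grouping semantics and a check of the float64 limit. This is a sketch of the behaviour described, not the ONNX graph the converter emits; `group_by_sum` is a hypothetical helper, and the first-seen ordering of its output is a property of this sketch only.

```python
from collections import defaultdict


def group_by_sum(keys, values):
    """One output row per unique key (tuples for multi-column keys), with SUM."""
    totals = defaultdict(float)
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)


# Two grouping columns: each distinct (k1, k2) pair yields one row.
rows = group_by_sum([(1, 10), (2, 10), (1, 10)], [1.0, 2.0, 3.0])
print(rows)  # {(1, 10): 4.0, (2, 10): 2.0}

# float64 has a 53-bit significand, so 2**53 and 2**53 + 1 become the same float.
big = 2**53
print(float(big) == float(big + 1))  # True

# After a float64 cast, two distinct integer keys therefore fall into one group.
merged = group_by_sum([float(big), float(big + 1)], [1.0, 1.0])
print(len(merged))  # 1
```

If integer keys above 2^53 matter, one possible workaround is to group on a single column so that the multi-column cast described in the note is not involved.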