yobx.sql.sql_to_onnx
- yobx.sql.sql_to_onnx(query: str, input_dtypes: Dict[str, numpy.dtype | type | str], right_input_dtypes: Dict[str, numpy.dtype | type | str] | List[Dict[str, numpy.dtype | type | str]] | None = None, target_opset: int = 21, custom_functions: Dict[str, Callable] | None = None, builder_cls: type | Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, filename: str | None = None, verbose: int = 0, large_model: bool = False, external_threshold: int = 1024, return_optimize_report: bool = False) → ExportArtifact
Convert a SQL query to a self-contained ONNX model.
Each column in the query is represented as a separate 1-D ONNX input tensor, allowing the caller to feed column vectors independently. The resulting model's outputs correspond to the columns (or expressions) in the SELECT clause, in order.

Internally this function creates a fresh GraphBuilder (or the class supplied via builder_cls), delegates to sql_to_onnx_graph() to populate it, and then calls to_onnx() to finalise the model. Use sql_to_onnx_graph() directly when you need to embed the SQL subgraph inside a larger ONNX model you are already building.

- Parameters:
query – a SQL string. Supported clauses: SELECT, FROM, [INNER|LEFT|RIGHT|FULL] JOIN … ON, WHERE, GROUP BY. Custom Python functions can be called by name in the SELECT and WHERE clauses when registered via custom_functions.

input_dtypes – a mapping from left-table column name to numpy dtype (np.float32, np.int64, etc.). Only columns actually referenced in the query need to be listed. A pandas DataFrame is also accepted; column names and dtypes are extracted automatically.

right_input_dtypes – for queries with a single JOIN, a mapping from right-table column name to numpy dtype. For queries with multiple JOINs, pass a list of such mappings where the i-th dict covers the i-th right table (in JOIN order). A single dict may also be used with multiple JOINs when all right tables share the same column schema (backward compatible). Defaults to input_dtypes when None. A pandas DataFrame (or a list thereof) is also accepted; column names and dtypes are extracted automatically.

target_opset – ONNX opset version to target (default: yobx.DEFAULT_TARGET_OPSET).

custom_functions – an optional mapping from function name (as it appears in the SQL string) to a Python callable. Each callable must accept one or more numpy arrays and return a numpy array. The function body is traced with trace_numpy_function() so that numpy arithmetic is translated into ONNX nodes.

Example:

```python
import numpy as np
from yobx.sql import sql_to_onnx

dtypes = {"a": np.float32}
artifact = sql_to_onnx(
    "SELECT my_sqrt(a) AS r FROM t",
    dtypes,
    custom_functions={"my_sqrt": np.sqrt},
)
```

builder_cls – the graph-builder class (or factory callable) to instantiate when creating the internal GraphBuilder. Defaults to GraphBuilder. Any class that implements the Shape and type tracking can be supplied here, e.g. a custom subclass that adds extra optimisation passes.

filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).

verbose – verbosity level (0 = silent).

large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.

external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.

return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.
- Returns:
ExportArtifact wrapping the exported ONNX proto together with an ExportReport.
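The three accepted forms of right_input_dtypes (None, a single dict, or a list with one dict per JOIN) can be summarised with a small stand-alone sketch. The helper name resolve_right_dtypes, the n_joins argument and the length check are hypothetical and only restate the rules given in the parameter list; this is not how yobx implements them. Dtypes are written as strings to keep the sketch dependency-free.

```python
from typing import Dict, List, Optional, Union

DtypeMap = Dict[str, str]


def resolve_right_dtypes(
    input_dtypes: DtypeMap,
    right_input_dtypes: Optional[Union[DtypeMap, List[DtypeMap]]],
    n_joins: int,
) -> List[DtypeMap]:
    """Return one dtype mapping per right table, in JOIN order (hypothetical helper)."""
    if right_input_dtypes is None:
        # Documented default: fall back to the left-table mapping.
        return [dict(input_dtypes) for _ in range(n_joins)]
    if isinstance(right_input_dtypes, dict):
        # A single dict is shared by every right table.
        return [dict(right_input_dtypes) for _ in range(n_joins)]
    if len(right_input_dtypes) != n_joins:
        # Assumption of this sketch: a list must provide one mapping per JOIN.
        raise ValueError("expected one dtype mapping per JOIN")
    return [dict(d) for d in right_input_dtypes]


left = {"id": "int64", "a": "float32"}
orders = {"id": "int64", "amount": "float32"}
users = {"id": "int64", "age": "int64"}

default_case = resolve_right_dtypes(left, None, 1)
shared_case = resolve_right_dtypes(left, orders, 2)
per_join_case = resolve_right_dtypes(left, [orders, users], 2)
```

Here default_case contains a copy of left, shared_case repeats orders for both joins, and per_join_case keeps orders and users in JOIN order.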
Example:
```python
import numpy as np
from yobx.sql import sql_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

dtypes = {"a": np.float32, "b": np.float32}
artifact = sql_to_onnx("SELECT a + b AS total FROM t WHERE a > 0", dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
```
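As a sanity check on the example, the result that standard SQL semantics imply for this query can be worked out by hand. The sketch below uses plain Python lists so that it runs without yobx or numpy; it illustrates what the query means and says nothing about the ONNX graph that is actually generated.

```python
# Column vectors, one list per column, mirroring the one-input-per-column layout.
a = [1.0, -2.0, 3.0]
b = [4.0, 5.0, 6.0]

# WHERE a > 0 keeps rows 0 and 2; SELECT a + b is evaluated on the kept rows.
expected_total = [x + y for x, y in zip(a, b) if x > 0]

print(expected_total)  # [5.0, 9.0]
```

The row with a = -2.0 is filtered out, so the output column has two elements rather than three.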
Note
GROUP BY produces one output row per unique key combination. Supported aggregations: SUM, AVG, MIN, MAX, COUNT(*). For multi-column GROUP BY the grouping keys are cast to float64 internally, which may lose precision for integers larger than 2^53.
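The precision caveat can be reproduced without any ONNX tooling. The following sketch is a pure-Python stand-in, not the yobx implementation: it groups rows by a two-column key after casting each key component to float (a Python float is an IEEE 754 double, i.e. float64) and sums a value column. The function name group_sum_float_keys is hypothetical.

```python
from collections import defaultdict


def group_sum_float_keys(keys, values):
    """SUM(values) grouped by keys, with every key component cast to float64."""
    totals = defaultdict(float)
    for key, value in zip(keys, values):
        totals[tuple(float(k) for k in key)] += value
    return dict(totals)


big = 2**53  # largest power of two below which every integer is exact in float64

# Three distinct integer keys; the first two differ only above 2^53.
keys = [(big, 1), (big + 1, 1), (5, 2)]
values = [10.0, 20.0, 30.0]

result = group_sum_float_keys(keys, values)
```

Because float(2**53 + 1) rounds to the same double as float(2**53), the first two rows fall into one group: result holds two groups instead of three, and the merged group sums to 30.0. Keeping integer keys within ±2^53 avoids this kind of collision.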