yobx.sql.sql_to_onnx#

yobx.sql.sql_to_onnx(query: str, input_dtypes: ~typing.Dict[str, ~numpy.dtype | type | str], right_input_dtypes: ~typing.Dict[str, ~numpy.dtype | type | str] | ~typing.List[~typing.Dict[str, ~numpy.dtype | type | str]] | None = None, target_opset: int = 21, custom_functions: ~typing.Dict[str, ~typing.Callable] | None = None, builder_cls: type | ~typing.Callable = <class 'yobx.xbuilder.graph_builder.GraphBuilder'>, filename: str | None = None, verbose: int = 0, large_model: bool = False, external_threshold: int = 1024, return_optimize_report: bool = False) ExportArtifact[source]#

Convert a SQL query to a self-contained ONNX model.

Each column in the query is represented as a separate 1-D ONNX input tensor, allowing the caller to feed column vectors independently. The resulting model’s outputs correspond to the columns (or expressions) in the SELECT clause, in order.

Internally this function creates a fresh GraphBuilder (or the class supplied via builder_cls), delegates to sql_to_onnx_graph() to populate it, and then calls to_onnx() to finalise the model. Use sql_to_onnx_graph() directly when you need to embed the SQL subgraph inside a larger ONNX model you are already building.

Parameters:
  • query – a SQL string. Supported clauses: SELECT, FROM, [INNER|LEFT|RIGHT|FULL] JOIN ON, WHERE, GROUP BY. Custom Python functions can be called by name in the SELECT and WHERE clauses when registered via custom_functions.

  • input_dtypes – a mapping from left-table column name to numpy dtype (np.float32, np.int64, etc.). Only columns actually referenced in the query need to be listed. A pandas DataFrame is also accepted; column names and dtypes are extracted automatically.

  • right_input_dtypes – for queries with a single JOIN, a mapping from right-table column name to numpy dtype. For queries with multiple JOINs, pass a list of such mappings where the i-th dict covers the i-th right table (in JOIN order). A single dict may also be used with multiple JOINs when all right tables share the same column schema (backward compatible). Defaults to input_dtypes when None. A pandas DataFrame (or a list thereof) is also accepted; column names and dtypes are extracted automatically.
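    For instance, a query with two JOINs could be set up as follows. This is a sketch of the parameter shapes only; the table and column names are illustrative, and the actual call is shown in a comment rather than executed:

    ```python
    import numpy as np

    # Left-table schema: only columns referenced in the query are needed.
    left_dtypes = {"id": np.int64, "amount": np.float32}

    # One mapping per right table, in JOIN order (hypothetical schemas).
    right_dtypes = [
        {"id": np.int64, "rate": np.float32},      # first JOIN
        {"id": np.int64, "discount": np.float32},  # second JOIN
    ]

    # The call would then look like:
    # artifact = sql_to_onnx(
    #     "SELECT t.amount * r1.rate - r2.discount AS net "
    #     "FROM t JOIN r1 ON t.id = r1.id JOIN r2 ON t.id = r2.id",
    #     left_dtypes,
    #     right_input_dtypes=right_dtypes,
    # )
    ```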

  • target_opset – ONNX opset version to target (default: yobx.DEFAULT_TARGET_OPSET).

  • custom_functions

    an optional mapping from function name (as it appears in the SQL string) to a Python callable. Each callable must accept one or more numpy arrays and return a numpy array. The function body is traced with trace_numpy_function() so that numpy arithmetic is translated into ONNX nodes.

    Example:

    import numpy as np
    from yobx.sql import sql_to_onnx
    
    dtypes = {"a": np.float32}
    artifact = sql_to_onnx(
        "SELECT my_sqrt(a) AS r FROM t",
        dtypes,
        custom_functions={"my_sqrt": np.sqrt},
    )
    

  • builder_cls – the graph-builder class (or factory callable) to instantiate when creating the internal GraphBuilder. Defaults to GraphBuilder. Any class that implements the same shape- and type-tracking interface can be supplied here, e.g. a custom subclass that adds extra optimisation passes.

  • filename – if set, the exported ONNX model is saved to this path and the ExportReport is written as a companion Excel file (same base name with .xlsx extension).

  • verbose – verbosity level (0 = silent).

  • large_model – if True, the returned ExportArtifact has its container attribute set to an ExtendedModelContainer.

  • external_threshold – if large_model is True, every tensor whose element count exceeds this threshold is stored as external data.

  • return_optimize_report – if True, the returned ExportArtifact has its report attribute populated with per-pattern optimization statistics.

Returns:

ExportArtifact wrapping the exported ONNX proto together with an ExportReport.

Example:

import numpy as np
from yobx.sql import sql_to_onnx
from yobx.reference import ExtendedReferenceEvaluator

dtypes = {"a": np.float32, "b": np.float32}
artifact = sql_to_onnx("SELECT a + b AS total FROM t WHERE a > 0", dtypes)

ref = ExtendedReferenceEvaluator(artifact)
a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0,  5.0, 6.0], dtype=np.float32)
(total,) = ref.run(None, {"a": a, "b": b})
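The expected output matches the plain numpy evaluation of the same query. The snippet below is a sketch of the query semantics only, not of how the graph is built:

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)

# WHERE a > 0 filters the rows, then SELECT a + b is computed
# over the surviving rows only.
mask = a > 0
total = (a + b)[mask]
# → array([5., 9.], dtype=float32)
```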

Note

GROUP BY produces one output row per unique key combination. Supported aggregations: SUM, AVG, MIN, MAX, COUNT(*). For multi-column GROUP BY the grouping keys are cast to float64 internally, which may lose precision for integers larger than 2^53.
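The precision caveat can be checked directly in numpy: float64 has a 53-bit significand, so consecutive integers above 2^53 collapse to the same value.

```python
import numpy as np

# 2**53 is the last point before the gap between adjacent
# float64 values exceeds 1.
assert np.float64(2**53 - 1) != np.float64(2**53)  # still distinct
assert np.float64(2**53) == np.float64(2**53 + 1)  # collapses: precision lost
```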