yobx.xtracing — parse module

SQL parser — translates a SQL string into a structured list of SqlOperation objects.

The parser handles a small but useful subset of SQL:

SELECT — column selection and simple arithmetic expressions (+, -, *, /).
WHERE / filter — comparison predicates (=, <, >, <=, >=, <>/!=) combined with AND / OR.
GROUP BY — column grouping (aggregations SUM, COUNT, AVG, MIN, MAX in the SELECT list are recognised).
JOIN — [INNER] JOIN … ON col1 = col2 linking two input tables.
Subqueries in the FROM clause: SELECT … FROM (SELECT … FROM table) [AS alias].

All names are case-insensitive; they are normalised to lower-case by parse_sql().

class yobx.xtracing.parse.AggExpr(func: str, arg: object)

An aggregation expression: SUM(col), COUNT(*), etc.

class yobx.xtracing.parse.BinaryExpr(left: object, op: str, right: object)

A binary expression: left op right.

class yobx.xtracing.parse.ColumnRef(column: str, table: str | None = None, dtype: int = 0)

A bare column reference, optionally qualified: table.column.

Parameters:

column – the column name (lower-cased by the parser).
table – optional table qualifier (lower-cased by the parser).
dtype – ONNX element type for the column expressed as an onnx.TensorProto integer constant (e.g. onnx.TensorProto.FLOAT, onnx.TensorProto.INT64). Set by trace_dataframe() when dtype information is available at tracing time; defaults to 0 (onnx.TensorProto.UNDEFINED) otherwise (e.g. when the reference is produced by the SQL string parser).

class yobx.xtracing.parse.Condition(left: object, op: str, right: object)

A WHERE predicate, either a leaf comparison or a compound AND / OR.
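To make the tree shape concrete, here is a minimal sketch of how a nested Condition could be evaluated against a single row. The dataclasses below are stand-ins that only mirror the documented field names; they are not the real yobx classes. The sketch also assumes that compound nodes carry 'AND' or 'OR' (in either case) in op, which this page does not confirm.

```python
import operator
from dataclasses import dataclass
from typing import Optional


# Stand-in classes mirroring the documented fields (NOT the real yobx classes).
@dataclass
class ColumnRef:
    column: str
    table: Optional[str] = None
    dtype: int = 0


@dataclass
class Literal:
    value: object


@dataclass
class Condition:
    left: object
    op: str
    right: object


# Comparison operators listed in the module overview.
COMPARATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "<>": operator.ne,
    "!=": operator.ne,
}


def evaluate(node, row):
    """Evaluate a predicate tree against one row (a dict of column values)."""
    if isinstance(node, ColumnRef):
        return row[node.column]
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Condition):
        op = node.op.upper()
        # Assumption: compound nodes store AND / OR in the op field.
        if op == "AND":
            return bool(evaluate(node.left, row)) and bool(evaluate(node.right, row))
        if op == "OR":
            return bool(evaluate(node.left, row)) or bool(evaluate(node.right, row))
        # Otherwise the node is a leaf comparison.
        return COMPARATORS[node.op](evaluate(node.left, row), evaluate(node.right, row))
    raise TypeError(f"unsupported node: {node!r}")


# a > 0 AND b <> 'x'
tree = Condition(
    Condition(ColumnRef("a"), ">", Literal(0)),
    "AND",
    Condition(ColumnRef("b"), "<>", Literal("x")),
)
print(evaluate(tree, {"a": 1, "b": "y"}))
```

The same recursive walk applies to any depth of nesting, since both children of a compound node are themselves predicates.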

class yobx.xtracing.parse.FilterOp(condition: Condition = <factory>)

Represents a WHERE clause.

Parameters: condition – the parsed predicate tree.

class yobx.xtracing.parse.FuncCallExpr(func: str, args: List[object])

A call to a user-defined (custom) function: func_name(arg1, arg2, …).

class yobx.xtracing.parse.GroupByOp(columns: List[str] = <factory>)

Represents a GROUP BY clause.

Parameters: columns – the column names to group by.

class yobx.xtracing.parse.JoinOp(right_table: str = '', left_keys: List[str] = <factory>, right_keys: List[str] = <factory>, join_type: str = 'inner', left_columns: List[ColumnRef] = <factory>, right_columns: List[ColumnRef] = <factory>)

Represents a JOIN clause.

Parameters:

right_table – the name of the right-hand table being joined.
left_keys – list of column names from the left table used in the equi-join predicate. A single-element list is a single-column join; a multi-element list produces col1 = col2 AND col3 = col4 … semantics.
right_keys – list of column names from the right table used in the equi-join predicate. Must have the same length as left_keys.
join_type – 'inner' (default), 'left', 'right', or 'full'.
left_columns – columns belonging to the left-hand table, each as a ColumnRef carrying the column name and ONNX element type. Populated by join(). Left empty when the join was produced by the SQL string parser.
right_columns – columns belonging to the right-hand table, each as a ColumnRef carrying the column name and ONNX element type. Populated by join() so that _populate_graph() can classify columns as left- or right-side and obtain their dtype without requiring a separate right_input_dtypes argument. Left empty when the join was produced by the SQL string parser (in that case the caller must supply right_input_dtypes to sql_to_onnx_graph()).

property left_key: str

Return the first left join key (kept for backward compatibility with single-key joins).

property right_key: str

Return the first right join key (kept for backward compatibility with single-key joins).
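The pairwise meaning of left_keys and right_keys can be illustrated with a plain-Python inner join over lists of dicts. This is only a sketch of the equi-join semantics described above, not the ONNX graph that yobx builds; the function name inner_join and the rule that left-hand values win on a column-name clash are assumptions made for the example.

```python
def inner_join(left_rows, right_rows, left_keys, right_keys):
    """Inner equi-join: left_keys[i] = right_keys[i] must hold for every i."""
    if len(left_keys) != len(right_keys):
        raise ValueError("left_keys and right_keys must have the same length")

    # Index the right-hand rows by their tuple of key values.
    index = {}
    for right in right_rows:
        key = tuple(right[k] for k in right_keys)
        index.setdefault(key, []).append(right)

    joined = []
    for left in left_rows:
        key = tuple(left[k] for k in left_keys)
        for right in index.get(key, []):
            # Assumption: on a name clash the left-hand value is kept.
            joined.append({**right, **left})
    return joined


left = [
    {"id": 1, "day": "mon", "x": 10},
    {"id": 2, "day": "tue", "x": 20},
]
right = [
    {"rid": 1, "rday": "mon", "y": 5},
    {"rid": 2, "rday": "wed", "y": 7},
]

# Equivalent to: ON id = rid AND day = rday
rows = inner_join(left, right, ["id", "day"], ["rid", "rday"])
print(rows)
```

Only the first left row survives here: the second pair of rows agrees on id = rid but not on day = rday, and every key pair has to match.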

class yobx.xtracing.parse.Literal(value: object)

A scalar literal value (number or quoted string).

class yobx.xtracing.parse.ParsedQuery(operations: List[SqlOperation] = <factory>, from_table: str = '', columns: List[ColumnRef] = <factory>, subquery: ParsedQuery | None = None)

The result of parse_sql().

Parameters:

operations – ordered list of SqlOperation objects derived from the SQL string. The order reflects the logical execution sequence: JoinOp (if any) → FilterOp (if any) → GroupByOp (if any) → SelectOp.
from_table – the primary (left) table name from the FROM clause, or the alias of the subquery when subquery is set.
columns – all column references in the query, in the order they appear (deduped by column name). Each entry is a ColumnRef whose dtype field is populated when the query was produced by trace_dataframe() (i.e. the dtype is known at tracing time); it is 0 (onnx.TensorProto.UNDEFINED) when the query was produced by the SQL string parser.
subquery – when the FROM clause contains a sub-select (FROM (SELECT …)), this holds the parsed inner query; otherwise None.
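Since subquery is itself a ParsedQuery, nested FROM (SELECT …) clauses form a chain that can be followed from the outermost query inwards. The sketch below uses a stand-in dataclass with the documented field names (not the real yobx class) and a hypothetical helper, table_chain, to collect from_table at each level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-in mirroring the documented fields (NOT the real yobx class).
@dataclass
class ParsedQuery:
    operations: List[object] = field(default_factory=list)
    from_table: str = ""
    columns: List[object] = field(default_factory=list)
    subquery: Optional["ParsedQuery"] = None


def table_chain(query):
    """Return from_table for each nesting level, outermost first."""
    chain = []
    while query is not None:
        chain.append(query.from_table)
        query = query.subquery
    return chain


# Hand-built stand-in for: SELECT … FROM (SELECT … FROM t) AS s
inner = ParsedQuery(from_table="t")
outer = ParsedQuery(from_table="s", subquery=inner)
print(table_chain(outer))
```

In this hand-built example the last entry of the chain is the real source table, while the earlier entries are subquery aliases.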

class yobx.xtracing.parse.PivotTableOp(index_refs: List[ColumnRef] = <factory>, columns_refs: List[ColumnRef] = <factory>, values_refs: List[ColumnRef] = <factory>, aggfunc: str = 'sum', column_values: List[Any] = <factory>, fill_value: float = 0.0)

Represents a pivot_table operation (similar to pandas.DataFrame.pivot_table()).

The index column(s) define the row grouping; the columns column(s) provide the category values that become output column headers; the values column(s) are aggregated for each (index, column) combination.

Parameters:

index_refs – ColumnRef objects (with dtype) for the row-grouping columns. One or more columns are supported.
columns_refs – ColumnRef objects (with dtype) for the pivot-header column(s). A single-element list produces a single category column; a multi-element list creates a compound category key — in that case each entry in column_values must be a tuple/list of scalars with one value per column in columns_refs.
values_refs – ColumnRef objects (with dtype) for the column(s) to aggregate. Each values column independently produces one output tensor per column_values entry.
aggfunc – aggregation function — 'sum', 'mean', 'min', 'max', or 'count'. Defaults to 'sum'.
column_values – the known distinct values that the pivot-header column(s) may take. Each entry yields one output column per values column, named "<values>_<cv>" (one pivot-header column) or "<values>_<cv1>_<cv2>…" (several pivot-header columns). Must be provided because an ONNX graph has a static structure, so the set of output columns cannot depend on the data.
fill_value – value inserted for (index, column) combinations that have no matching rows. Defaults to 0.0.
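The naming rule and the role of fill_value can be sketched in plain Python. Neither function below is part of yobx. The output ordering (values columns outermost) and the choice to ignore rows whose category is not listed in column_values are assumptions made for illustration only.

```python
def pivot_output_names(values, column_values):
    """Build '<values>_<cv>' names; tuple entries give '<values>_<cv1>_<cv2>'."""
    names = []
    for value_col in values:
        for cv in column_values:
            parts = cv if isinstance(cv, (tuple, list)) else (cv,)
            names.append("_".join([value_col, *(str(p) for p in parts)]))
    return names


def pivot_sum(rows, index, columns, values, column_values, fill_value=0.0):
    """Sum `values` per (index, category); missing combinations get fill_value."""
    table = {}
    for row in rows:
        cells = table.setdefault(row[index], {})
        category = row[columns]
        # Assumption: categories not listed in column_values are ignored.
        if category in column_values:
            name = f"{values}_{category}"
            cells[name] = cells.get(name, 0) + row[values]
    names = pivot_output_names([values], column_values)
    return {key: [cells.get(n, fill_value) for n in names] for key, cells in table.items()}


rows = [
    {"k": "a", "c": "x", "v": 1},
    {"k": "a", "c": "x", "v": 2},
    {"k": "b", "c": "y", "v": 5},
]

print(pivot_output_names(["v"], ["x", "y"]))
print(pivot_sum(rows, "k", "c", "v", ["x", "y"]))
```

Because the output names are derived from column_values alone, they are fixed before any data is seen, which is what a statically shaped graph needs.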

property columns: str | List[str]

Name(s) of the pivot-header column(s).

Returns a bare str when there is exactly one category column (the common case), or a List[str] when multiple category columns were specified.

property index: List[str]

Column names used as the row grouping keys.

property values: str | List[str]

Name(s) of the column(s) to aggregate.

Returns a bare str when there is exactly one values column (the common case), or a List[str] when multiple values columns were specified.
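Because columns and values return either a bare str or a List[str], code that iterates over the result may want to normalise it first; iterating over a bare string would yield its characters. A small helper for that (hypothetical, not part of yobx) could look like this:

```python
from typing import List, Union


def as_list(names: Union[str, List[str]]) -> List[str]:
    """Wrap a bare string in a list; return a copy of a list unchanged."""
    return [names] if isinstance(names, str) else list(names)


print(as_list("region"))
print(as_list(["region", "year"]))
```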

class yobx.xtracing.parse.SelectItem(expr: object, alias: str | None = None)

One item in the SELECT list: an expression with an optional alias.

output_name() → str

Return the alias, or derive a name from the expression.

class yobx.xtracing.parse.SelectOp(items: List[SelectItem] = <factory>, distinct: bool = False)

Represents the SELECT clause.

Parameters:

items – the list of SelectItem objects to compute.
distinct – True when the query contains SELECT DISTINCT.

class yobx.xtracing.parse.SqlOperation

Base class for all SQL operations produced by parse_sql().

yobx.xtracing.parse.parse_sql(query: str) → ParsedQuery

Parse a SQL query string and return a ParsedQuery.

The parser handles:

SELECT [DISTINCT] expr [AS alias], …
FROM table
FROM (SELECT …) [AS alias] — subquery in the FROM clause
[INNER|LEFT|RIGHT|FULL [OUTER]] JOIN table ON col = col
WHERE condition [AND|OR condition] …
GROUP BY col, …

Column names in the returned operations are normalised to lower-case.

Parameters: query – the SQL query string to parse.
Returns: a ParsedQuery with an operations list and a columns list of all referenced columns (as ColumnRef objects).

<<<

from yobx.xtracing.parse import parse_sql

pq = parse_sql("SELECT a, b FROM t WHERE a > 0")
for op in pq.operations:
    print(type(op).__name__, op)

>>>

    FilterOp FilterOp(condition=Condition(left=ColumnRef(column='a', table=None, dtype=0), op='>', right=Literal(value=0)))
    SelectOp SelectOp(items=[SelectItem(expr=ColumnRef(column='a', table=None, dtype=0), alias=None), SelectItem(expr=ColumnRef(column='b', table=None, dtype=0), alias=None)], distinct=False)