yobx.xtracing - parse module
SQL parser — translates a SQL string into a structured list of
SqlOperation objects.
The parser handles a small but useful subset of SQL:
- SELECT: column selection and simple arithmetic expressions (+, -, *, /).
- WHERE / filter: comparison predicates (=, <, >, <=, >=, <> / !=) combined with AND / OR.
- GROUP BY: column grouping (aggregations SUM, COUNT, AVG, MIN, MAX in the SELECT list are recognised).
- JOIN: [INNER] JOIN … ON col1 = col2 linking two input tables.
- Subqueries in the FROM clause: SELECT … FROM (SELECT … FROM table) [AS alias].
All names are case-insensitive; they are normalised to lower-case by
parse_sql().
- class yobx.xtracing.parse.AggExpr(func: str, arg: object)
An aggregation expression: SUM(col), COUNT(*), etc.
- class yobx.xtracing.parse.BinaryExpr(left: object, op: str, right: object)
A binary expression: left op right.
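To make the left op right structure concrete, here is a minimal sketch. It uses stand-in dataclasses that mirror the documented signatures of BinaryExpr, ColumnRef and Literal (they are not the library's classes), and evaluate is a hypothetical helper that is not part of yobx. It walks the tree for a + b * 2 against one row of data.

```python
from dataclasses import dataclass
from typing import Optional
import operator


# Stand-ins mirroring the documented constructor signatures (assumption:
# the real classes may carry extra behaviour not shown here).
@dataclass
class ColumnRef:
    column: str
    table: Optional[str] = None
    dtype: int = 0


@dataclass
class Literal:
    value: object


@dataclass
class BinaryExpr:
    left: object
    op: str
    right: object


# The four arithmetic operators listed for the SELECT clause.
_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(expr, row):
    """Hypothetical helper: evaluate an expression tree against one row (a dict)."""
    if isinstance(expr, ColumnRef):
        return row[expr.column]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, BinaryExpr):
        return _OPS[expr.op](evaluate(expr.left, row), evaluate(expr.right, row))
    raise TypeError(f"unsupported node: {type(expr).__name__}")


# a + b * 2, with the multiplication nested as the right operand
tree = BinaryExpr(ColumnRef("a"), "+", BinaryExpr(ColumnRef("b"), "*", Literal(2)))
result = evaluate(tree, {"a": 1, "b": 3})
print(result)  # 7
```

Operator precedence is encoded by the nesting of the tree, so the evaluator needs no precedence rules of its own.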
- class yobx.xtracing.parse.ColumnRef(column: str, table: str | None = None, dtype: int = 0)
A bare column reference, optionally qualified: table.column.
- Parameters:
column – the column name (lower-cased by the parser).
table – optional table qualifier (lower-cased by the parser).
dtype – ONNX element type for the column expressed as an onnx.TensorProto integer constant (e.g. onnx.TensorProto.FLOAT, onnx.TensorProto.INT64). Set by trace_dataframe() when dtype information is available at tracing time; defaults to 0 (onnx.TensorProto.UNDEFINED) otherwise (e.g. when the reference is produced by the SQL string parser).
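The difference between a typed and an untyped reference can be sketched as follows. The ColumnRef below is a stand-in dataclass with the documented fields, not the library class. The integer constants are written out so the snippet runs without onnx installed, and qualified_name and has_known_dtype are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Integer values of onnx.TensorProto.UNDEFINED, FLOAT and INT64,
# hard-coded here so that the sketch does not need onnx installed.
TENSOR_UNDEFINED = 0
TENSOR_FLOAT = 1
TENSOR_INT64 = 7


@dataclass
class ColumnRef:
    """Stand-in with the documented fields (not the library class)."""
    column: str
    table: Optional[str] = None
    dtype: int = TENSOR_UNDEFINED


def qualified_name(ref):
    """Hypothetical helper: render 'table.column' when a qualifier is present."""
    return f"{ref.table}.{ref.column}" if ref.table else ref.column


def has_known_dtype(ref):
    """Hypothetical helper: True when the dtype is anything other than UNDEFINED."""
    return ref.dtype != TENSOR_UNDEFINED


# As the SQL string parser would produce it: qualifier known, dtype unknown.
parsed = ColumnRef("price", table="orders")
# As tracing would produce it: dtype known.
traced = ColumnRef("price", dtype=TENSOR_FLOAT)

print(qualified_name(parsed), has_known_dtype(parsed))  # orders.price False
print(qualified_name(traced), has_known_dtype(traced))  # price True
```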
- class yobx.xtracing.parse.Condition(left: object, op: str, right: object)
A WHERE predicate, either a leaf comparison or a compound AND / OR.
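The leaf and compound forms can be illustrated with a small recursive evaluator. This is a sketch under assumptions: Condition is a stand-in dataclass with the documented fields; compound nodes are assumed to store "and" / "or" in op with nested Condition objects on both sides (the source does not specify the exact encoding); and leaves are simplified to a column name on the left and a plain value on the right. matches is a hypothetical helper, not part of yobx.

```python
from dataclasses import dataclass
import operator


@dataclass
class Condition:
    """Stand-in with the documented fields (not the library class)."""
    left: object
    op: str
    right: object


# Comparison operators listed for the WHERE clause.
_COMPARE = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "<>": operator.ne,
    "!=": operator.ne,
}


def matches(cond, row):
    """Hypothetical helper: evaluate a predicate tree against one row (a dict)."""
    op = cond.op.lower()
    # Assumed encoding of compound nodes: op is 'and' / 'or'.
    if op == "and":
        return matches(cond.left, row) and matches(cond.right, row)
    if op == "or":
        return matches(cond.left, row) or matches(cond.right, row)
    # Leaf comparison (simplified): left is a column name, right a value.
    return _COMPARE[op](row[cond.left], cond.right)


# a > 0 AND b <> 5
pred = Condition(Condition("a", ">", 0), "and", Condition("b", "<>", 5))
print(matches(pred, {"a": 1, "b": 2}))  # True
print(matches(pred, {"a": 1, "b": 5}))  # False
```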
- class yobx.xtracing.parse.FilterOp(condition: Condition = <factory>)
Represents a WHERE clause.
- Parameters:
condition – the parsed predicate tree.
- class yobx.xtracing.parse.FuncCallExpr(func: str, args: List[object])
A call to a user-defined (custom) function: func_name(arg1, arg2, …).
- class yobx.xtracing.parse.GroupByOp(columns: List[str] = <factory>)
Represents a GROUP BY clause.
- Parameters:
columns – the column names to group by.
- class yobx.xtracing.parse.JoinOp(right_table: str = '', left_keys: List[str] = <factory>, right_keys: List[str] = <factory>, join_type: str = 'inner', left_columns: List[ColumnRef] = <factory>, right_columns: List[ColumnRef] = <factory>)
Represents a JOIN clause.
- Parameters:
right_table – the name of the right-hand table being joined.
left_keys – list of column names from the left table used in the equi-join predicate. A single-element list is a single-column join; a multi-element list produces col1 = col2 AND col3 = col4 … semantics.
right_keys – list of column names from the right table used in the equi-join predicate. Must have the same length as left_keys.
join_type – 'inner' (default), 'left', 'right', or 'full'.
left_columns – columns belonging to the left-hand table, each as a ColumnRef carrying the column name and ONNX element type. Populated by join(). Left empty when the join was produced by the SQL string parser.
right_columns – columns belonging to the right-hand table, each as a ColumnRef carrying the column name and ONNX element type. Populated by join() so that _populate_graph() can classify columns as left- or right-side and obtain their dtype without requiring a separate right_input_dtypes argument. Left empty when the join was produced by the SQL string parser (in that case the caller must supply right_input_dtypes to sql_to_onnx_graph()).
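The pairing of left_keys with right_keys can be sketched by rendering the equi-join predicate back to text. JoinOp here is a stand-in dataclass limited to the key-related fields (left_columns and right_columns are omitted), and on_clause is a hypothetical helper, not part of yobx. The table and column names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JoinOp:
    """Stand-in with a subset of the documented fields (not the library class)."""
    right_table: str = ""
    left_keys: List[str] = field(default_factory=list)
    right_keys: List[str] = field(default_factory=list)
    join_type: str = "inner"


def on_clause(join):
    """Hypothetical helper: render the equi-join predicate as text."""
    if len(join.left_keys) != len(join.right_keys):
        raise ValueError("left_keys and right_keys must have the same length")
    # Keys are paired by position: left_keys[i] = right_keys[i].
    pairs = zip(join.left_keys, join.right_keys)
    return " AND ".join(f"{left} = {right}" for left, right in pairs)


join = JoinOp(
    right_table="orders",
    left_keys=["id", "region"],
    right_keys=["customer_id", "region_code"],
)
print(on_clause(join))  # id = customer_id AND region = region_code
```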
- class yobx.xtracing.parse.Literal(value: object)
A scalar literal value (number or quoted string).
- class yobx.xtracing.parse.ParsedQuery(operations: List[SqlOperation] = <factory>, from_table: str = '', columns: List[ColumnRef] = <factory>, subquery: ParsedQuery | None = None)
The result of parse_sql().
- Parameters:
operations – ordered list of SqlOperation objects derived from the SQL string. The order reflects the logical execution sequence: JoinOp (if any) → FilterOp (if any) → GroupByOp (if any) → SelectOp.
from_table – the primary (left) table name from the FROM clause, or the alias of the subquery when subquery is set.
columns – all column references in the query, in the order they appear (deduped by column name). Each entry is a ColumnRef whose dtype field is populated when the query was produced by trace_dataframe() (i.e. the dtype is known at tracing time); it is 0 (onnx.TensorProto.UNDEFINED) when the query was produced by the SQL string parser.
subquery – when the FROM clause contains a sub-select (FROM (SELECT …)), this holds the parsed inner query; otherwise None.
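The nesting through subquery and the ordering of operations can be sketched with stand-in classes. Nothing below imports yobx: the dataclasses only mirror a subset of the documented fields, the query objects are built by hand rather than by parse_sql(), and describe is a hypothetical helper. The hand-built objects correspond to SELECT a FROM (SELECT a FROM t WHERE a > 0) AS s.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class SqlOperation:
    """Stand-in base class."""


@dataclass
class FilterOp(SqlOperation):
    condition: object = None


@dataclass
class SelectOp(SqlOperation):
    items: list = field(default_factory=list)


@dataclass
class ParsedQuery:
    """Stand-in with a subset of the documented fields (not the library class)."""
    operations: List[SqlOperation] = field(default_factory=list)
    from_table: str = ""
    subquery: Optional["ParsedQuery"] = None


def describe(pq, depth=0):
    """Hypothetical helper: list queries innermost first, one line per level."""
    lines = []
    if pq.subquery is not None:
        # The inner query is listed first because the outer query reads from it.
        lines.extend(describe(pq.subquery, depth + 1))
    names = " -> ".join(type(op).__name__ for op in pq.operations)
    lines.append(f"depth {depth}: from {pq.from_table}: {names}")
    return lines


inner = ParsedQuery(operations=[FilterOp(), SelectOp()], from_table="t")
# When subquery is set, from_table holds the alias of the subquery.
outer = ParsedQuery(operations=[SelectOp()], from_table="s", subquery=inner)

for line in describe(outer):
    print(line)
# depth 1: from t: FilterOp -> SelectOp
# depth 0: from s: SelectOp
```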
- class yobx.xtracing.parse.PivotTableOp(index_refs: List[ColumnRef] = <factory>, columns_refs: List[ColumnRef] = <factory>, values_refs: List[ColumnRef] = <factory>, aggfunc: str = 'sum', column_values: List[Any] = <factory>, fill_value: float = 0.0)
Represents a pivot_table operation (similar to pandas.DataFrame.pivot_table()).
The index column(s) define the row grouping; the columns column(s) provide the category values that become output column headers; the values column(s) are aggregated for each (index, column) combination.
- Parameters:
index_refs – ColumnRef objects (with dtype) for the row-grouping columns. One or more columns are supported.
columns_refs – ColumnRef objects (with dtype) for the pivot-header column(s). A single-element list produces a single category column; a multi-element list creates a compound category key, in which case each entry in column_values must be a tuple/list of scalars with one value per column in columns_refs.
values_refs – ColumnRef objects (with dtype) for the column(s) to aggregate. Each values column independently produces one output tensor per column_values entry.
aggfunc – aggregation function: 'sum', 'mean', 'min', 'max', or 'count'. Defaults to 'sum'.
column_values – the known distinct values that the columns column(s) may take. Each entry yields one output column per values column, named "<values>_<cv>" (single columns column) or "<values>_<cv1>_<cv2>…" (multiple columns columns). Must be provided since ONNX graphs have a static structure.
fill_value – value inserted for (index, column) combinations that have no matching rows. Defaults to 0.0.
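The output naming rule described for column_values can be sketched in a few lines. pivot_output_names is a hypothetical helper, not part of yobx; it takes plain column-name strings rather than ColumnRef objects. The ordering of the returned names (grouped by column_values entry, then by values column) is an assumption, since the text above specifies only the name format.

```python
def pivot_output_names(values, column_values):
    """Hypothetical helper: build names following "<values>_<cv>" or
    "<values>_<cv1>_<cv2>…" for every (column_values entry, values column) pair.
    """
    names = []
    for cv in column_values:
        # A compound key is a tuple/list with one scalar per pivot-header column;
        # a single pivot-header column uses a bare scalar.
        parts = cv if isinstance(cv, (tuple, list)) else (cv,)
        suffix = "_".join(str(part) for part in parts)
        for value_column in values:
            names.append(f"{value_column}_{suffix}")
    return names


# Single pivot-header column
print(pivot_output_names(["sales"], ["north", "south"]))
# ['sales_north', 'sales_south']

# Two pivot-header columns: each entry is a (region, year) pair
print(pivot_output_names(["sales", "qty"], [("north", 2023), ("south", 2024)]))
# ['sales_north_2023', 'qty_north_2023', 'sales_south_2024', 'qty_south_2024']
```

Because the names depend only on values and column_values, the full set of outputs is known before any data is seen, which is consistent with the requirement that column_values be provided up front.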
- class yobx.xtracing.parse.SelectItem(expr: object, alias: str | None = None)
One item in the SELECT list: an expression with an optional alias.
- class yobx.xtracing.parse.SelectOp(items: List[SelectItem] = <factory>, distinct: bool = False)
Represents the SELECT clause.
- Parameters:
items – the list of SelectItem objects to compute.
distinct – True when the query contains SELECT DISTINCT.
- class yobx.xtracing.parse.SqlOperation
Base class for all SQL operations produced by parse_sql().
- yobx.xtracing.parse.parse_sql(query: str) → ParsedQuery
Parse a SQL query string and return a ParsedQuery.
The parser handles:
- SELECT [DISTINCT] expr [AS alias], …
- FROM table
- FROM (SELECT …) [AS alias]: subquery in the FROM clause
- [INNER|LEFT|RIGHT|FULL [OUTER]] JOIN table ON col = col
- WHERE condition [AND|OR condition] …
- GROUP BY col, …
Column names in the returned operations are normalised to lower-case.
- Parameters:
query – the SQL query string to parse.
- Returns:
a ParsedQuery with an operations list and a columns list of all referenced column names.
<<<
from yobx.xtracing.parse import parse_sql

pq = parse_sql("SELECT a, b FROM t WHERE a > 0")
for op in pq.operations:
    print(type(op).__name__, op)
>>>
FilterOp FilterOp(condition=Condition(left=ColumnRef(column='a', table=None, dtype=0), op='>', right=Literal(value=0)))
SelectOp SelectOp(items=[SelectItem(expr=ColumnRef(column='a', table=None, dtype=0), alias=None), SelectItem(expr=ColumnRef(column='b', table=None, dtype=0), alias=None)], distinct=False)