Expected API#
yobx.sklearn.to_onnx() accepts a builder_cls parameter that
defaults to yobx.xbuilder.GraphBuilder. Any object can be
substituted as long as it exposes the two-part API described on this page.
The API is split into two groups that mirror the cross-references used in the source code:
Construction API (Building a graph from scratch) — methods to declare inputs, outputs, initializers, and nodes, and to export the finished graph.
Shape / type API (Shape and type tracking) — methods to attach and query shape and type metadata on intermediate tensors.
An alternative bridge implementation,
OnnxScriptGraphBuilder,
shows how the same API can be satisfied on top of onnxscript’s IR.
When any ONNXSTOP* variable triggers an exception, the resulting
stack trace points to the exact line of converter code that first
assigned a type or shape to that result.
Why use strings to refer to intermediate results?
A user usually only sees the final model and can only investigate an issue based on the names it contains. Keeping explicit, stable names for intermediate results in converter code makes it easy to locate the code where a given name appears. With that in mind, a protocol object wrapping a value seems unnecessary, and the creation of the final name should not be delayed. This makes it easier to investigate issues such as those exposed in Debugging with Environment Variables.
Construction API#
| Method / attribute | Description |
|---|---|
| `GraphBuilder(target_opset, ...)` | Constructor. `target_opset` is either an integer or a mapping from domain name to opset version. |
| `make_tensor_input(name, elem_type, shape)` | Declare a graph input tensor. `elem_type` is an ONNX element type such as `onnx.TensorProto.FLOAT`. |
| `make_tensor_output(name, ...)` | Declare a graph output. |
| `make_initializer(name, value)` | Add a constant tensor to the graph. `value` can be a `numpy` array. |
| `make_node(op_type, inputs, ...)` | Low-level node creation. Returns a sequence of output tensor name(s). |
| `op.<OpType>(*inputs)` | Convenience short-hand for `make_node`. |
| `to_onnx()` | Finalise and return an `onnx.ModelProto`. |
Minimal example#
The snippet below builds the same Sub / Div graph emitted by the
StandardScaler converter, using the default
GraphBuilder:
<<<
import numpy as np
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.helpers.onnx_helper import pretty_onnx
TFLOAT = onnx.TensorProto.FLOAT
opts = OptimizationOptions(constant_folding=False)
g = GraphBuilder(20, ir_version=10, optimization_options=opts)
g.make_tensor_input("X", TFLOAT, ("batch", 4))
mean = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
scale = np.array([0.5, 1.0, 2.0, 4.0], dtype=np.float32)
mean_name = g.make_initializer("mean", mean)
scale_name = g.make_initializer("scale", scale)
centered = g.op.Sub("X", mean_name)
g.set_type(centered, TFLOAT)
g.set_shape(centered, ("batch", 4))
result = g.op.Div(centered, scale_name)
g.set_type(result, TFLOAT)
g.set_shape(result, ("batch", 4))
g.make_tensor_output(result, indexed=False, allow_untyped_output=True)
model = g.to_onnx()
print(pretty_onnx(model))
>>>
opset: domain='' version=20
input: name='X' type=dtype('float32') shape=['batch', 4]
init: name='mean' type=float32 shape=(4,) -- array([1., 2., 3., 4.], dtype=float32)
init: name='scale' type=float32 shape=(4,) -- array([0.5, 1. , 2. , 4. ], dtype=float32)
Sub(X, mean) -> _onx_sub_X
Div(_onx_sub_X, scale) -> _onx_div_sub_X
output: name='_onx_div_sub_X' type='NOTENSOR' shape=None
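Because builder_cls is duck-typed, the construction API can also be illustrated without the package itself. The class below is a purely hypothetical recorder, not part of yobx; it only mimics the call shapes used in the minimal example and stores what it was told instead of emitting real ONNX:

```python
# Illustrative stand-in for the construction API: it records calls
# instead of building an onnx.ModelProto. Method names follow the
# minimal example above; everything else is an assumption.
class RecordingBuilder:
    def __init__(self, target_opset):
        self.target_opset = target_opset
        self.inputs, self.outputs, self.inits, self.nodes = [], [], [], []
        self._counter = 0

    def make_tensor_input(self, name, elem_type, shape):
        self.inputs.append((name, elem_type, shape))
        return name

    def make_initializer(self, name, value):
        self.inits.append((name, value))
        return name

    def make_node(self, op_type, inputs, n_outputs=1):
        # Generate deterministic output names, one per requested output.
        outs = [f"_onx_{op_type.lower()}_{self._counter + i}" for i in range(n_outputs)]
        self._counter += n_outputs
        self.nodes.append((op_type, list(inputs), outs))
        return outs

    def make_tensor_output(self, name):
        self.outputs.append(name)
        return name

    def to_onnx(self):
        # A real builder returns an onnx.ModelProto here.
        return {"opset": self.target_opset, "nodes": self.nodes}


g = RecordingBuilder(20)
g.make_tensor_input("X", 1, ("batch", 4))  # 1 == onnx.TensorProto.FLOAT
g.make_initializer("mean", [1.0, 2.0, 3.0, 4.0])
(sub,) = g.make_node("Sub", ["X", "mean"])
g.make_tensor_output(sub)
model = g.to_onnx()
print(model["nodes"])  # [('Sub', ['X', 'mean'], ['_onx_sub_0'])]
```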
Opset API#
Converters frequently need to know which opset versions are active so they
can choose the right operator variant or register an additional domain
(e.g. "ai.onnx.ml" for scikit-learn models).
| Method / attribute | Description |
|---|---|
| `main_opset` | Read-only property. Returns the opset version for the main ONNX domain (`""`). |
| `get_opset(domain)` | Returns the opset version (an `int`) registered for `domain`. |
| `has_opset(domain)` | Returns whether an opset version is registered for `domain`. |
| `set_opset(domain, version)` | Registers `domain` with the given version. If the domain is already registered with the same version the call is a no-op; a version mismatch raises an exception. |
| *(deprecated alias)* | Deprecated alias for `set_opset`. |
A converter that targets the main ONNX domain only needs to read
g.main_opset. A converter that also emits nodes from a secondary domain
(e.g. "ai.onnx.ml") should first call g.set_opset(domain, version)
to ensure the domain is recorded in the exported model, then query its
version with g.get_opset(domain).
from yobx.typing import GraphBuilderExtendedProtocol
from yobx.xbuilder import GraphBuilder

def convert_my_estimator(g: GraphBuilderExtendedProtocol, sts, outputs, estimator, X):
    # Read the main opset to pick the right operator variant.
    opset = g.main_opset
    # Register and query the ai.onnx.ml domain when needed.
    g.set_opset("ai.onnx.ml", 3)
    ml_opset = g.get_opset("ai.onnx.ml")
    # Check whether an optional domain is already registered.
    if g.has_opset("com.microsoft"):
        result = g.op.MicrosoftOp(X)
    elif opset >= 20:
        result = g.op.SomeNewOp(X)
    else:
        result = g.op.SomeLegacyOp(X)
    ...
    return result
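The bookkeeping a builder must provide behind these four members is small. A minimal, hypothetical sketch (the dict storage and the exact error type are assumptions of this sketch, not part of the package):

```python
# Minimal sketch of the opset bookkeeping described above.
class OpsetRegistry:
    def __init__(self, main_version):
        self._opsets = {"": main_version}  # "" is the main ONNX domain

    @property
    def main_opset(self):
        # Read-only access to the main domain's version.
        return self._opsets[""]

    def has_opset(self, domain):
        return domain in self._opsets

    def get_opset(self, domain=""):
        return self._opsets[domain]

    def set_opset(self, domain, version):
        # No-op when re-registered with the same version, error on mismatch.
        if domain in self._opsets and self._opsets[domain] != version:
            raise RuntimeError(
                f"domain {domain!r} already registered with version "
                f"{self._opsets[domain]}, got {version}"
            )
        self._opsets[domain] = version


reg = OpsetRegistry(20)
reg.set_opset("ai.onnx.ml", 3)
reg.set_opset("ai.onnx.ml", 3)  # same version: no-op
print(reg.main_opset, reg.get_opset("ai.onnx.ml"), reg.has_opset("com.microsoft"))
# 20 3 False
```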
Shape and type API#
Converters are expected to propagate shape and type information after each
node so that downstream converters (e.g. later pipeline steps) can query them
without re-running shape inference. The generated model may differ depending on that information.
In the tables below, g denotes a GraphBuilder (or any substitute) implementing the expected API.
| Method | Description |
|---|---|
| `set_type(name, itype)` | Register the element type (an integer, see supported types) for tensor `name`. |
| `get_type(name)` | Return the previously registered element type. |
| `has_type(name)` | Return whether a type is registered for `name`. |
| `set_shape(name, shape)` | Register the shape for tensor `name`. |
| `get_shape(name)` | Return the shape as a tuple of integers or strings. |
| `has_shape(name)` | Return whether a shape is registered for `name`. |
| `set_device(name, device)` | Register the device for tensor `name`. |
| `get_device(name)` | Return the device. |
| `has_device(name)` | Return whether a device is registered for `name`. |
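A builder is free to store this metadata however it likes. A minimal dict-backed sketch (method names follow the set_*/get_*/has_* pattern described above; the storage layout is an assumption of this sketch):

```python
# Sketch of the shape/type/device triad backed by plain dicts.
# A real builder may validate values; this sketch only stores them.
class MetadataStore:
    def __init__(self):
        self._types, self._shapes, self._devices = {}, {}, {}

    def set_type(self, name, itype):
        self._types[name] = itype

    def get_type(self, name):
        return self._types[name]

    def has_type(self, name):
        return name in self._types

    def set_shape(self, name, shape):
        self._shapes[name] = tuple(shape)

    def get_shape(self, name):
        return self._shapes[name]

    def has_shape(self, name):
        return name in self._shapes

    def set_device(self, name, device):
        self._devices[name] = device

    def get_device(self, name):
        return self._devices[name]

    def has_device(self, name):
        return name in self._devices


meta = MetadataStore()
meta.set_type("x", 1)              # 1 == onnx.TensorProto.FLOAT
meta.set_shape("x", ("batch", 4))  # dynamic first dimension
print(meta.has_shape("x"), meta.get_shape("x"), meta.has_device("x"))
# True ('batch', 4) False
```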
In addition, it is usually useful to implement the following methods.
| Method | Description |
|---|---|
| `unique_name(prefix)` | Return a name that starts with `prefix` and is not yet in use anywhere in the graph. |
| `set_type_shape_unary_op(name, input_name, itype=None)` | Define shape, type, and device for `name` equal to those registered for `input_name`; `itype` can be used to change the type. |
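One possible implementation of the unique-name helper appends an increasing suffix until the name is free. This is only a sketch (the real builder may use a different scheme), though it reproduces the `init_`, `init_2` style visible in the bridge example later on:

```python
# Hypothetical unique-name helper: return the prefix itself when free,
# otherwise append the first unused integer suffix starting at 2.
def make_unique_name(prefix: str, used: set) -> str:
    if prefix not in used:
        used.add(prefix)
        return prefix
    i = 2
    while f"{prefix}{i}" in used:
        i += 1
    name = f"{prefix}{i}"
    used.add(name)
    return name


used = set()
print(make_unique_name("init_", used))  # init_
print(make_unique_name("init_", used))  # init_2
```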
The current API does not include common operations on shapes (`+`, `-`, `//`, `*`, `%`, `min`, `max`)
or their simplification. Such operations are usually needed to optimize models
but are not mandatory to build the model itself; they are left to the builder.
Every converter usually needs to know the type, the device, the rank,
and sometimes whether a dimension is static or dynamic.
Shape and Type representation#
This API follows the ONNX standard.
A name is a string: it is a unique identifier.
A type is an integer: see supported types.
A shape is a tuple, empty or filled with integers (static dimension) or strings (dynamic dimension).
Additionally:
A device is an integer, -1 for CPU, a value >= 0 for a CUDA device.
A rank is an integer and equal to
len(shape).
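In code, these conventions reduce to plain Python values:

```python
# Shape/type conventions from the list above, as plain Python values.
itype = 1                     # element type is an integer; onnx.TensorProto.FLOAT is 1
shape = ("batch", 4)          # dynamic dimension "batch", static dimension 4
rank = len(shape)             # 2
device = -1                   # -1 means CPU, >= 0 a CUDA device
is_dynamic = [isinstance(d, str) for d in shape]  # [True, False]
print(rank, device, is_dynamic)
```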
Propagating shape and type in a converter#
The canonical pattern at the end of every converter is:
result = g.op.Relu(X, name=name)
g.set_type_shape_unary_op(result, X)
return result
The helper set_type_shape_unary_op combines the three
set_* calls into one call.
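A hypothetical sketch of such a helper, written against the shape/type API above. The `_Stub` class exists only to make the sketch self-contained and runnable; it is not part of the package:

```python
# Stub builder: just enough of the shape/type API for the sketch.
class _Stub:
    def __init__(self):
        self.types, self.shapes, self.devices = {}, {}, {}
    def set_type(self, n, t): self.types[n] = t
    def get_type(self, n): return self.types[n]
    def has_shape(self, n): return n in self.shapes
    def set_shape(self, n, s): self.shapes[n] = s
    def get_shape(self, n): return self.shapes[n]
    def has_device(self, n): return n in self.devices
    def set_device(self, n, d): self.devices[n] = d
    def get_device(self, n): return self.devices[n]


def set_type_shape_unary_op(g, name, input_name, itype=None):
    # A unary op keeps the input's type (unless itype overrides it),
    # and copies its shape and device when they are known.
    g.set_type(name, itype if itype is not None else g.get_type(input_name))
    if g.has_shape(input_name):
        g.set_shape(name, g.get_shape(input_name))
    if g.has_device(input_name):
        g.set_device(name, g.get_device(input_name))


gb = _Stub()
gb.set_type("X", 1)             # 1 == onnx.TensorProto.FLOAT
gb.set_shape("X", ("batch", 4))
gb.set_device("X", -1)          # CPU
set_type_shape_unary_op(gb, "relu_X", "X")
print(gb.get_type("relu_X"), gb.get_shape("relu_X"), gb.get_device("relu_X"))
# 1 ('batch', 4) -1
```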
Convert Options#
ConvertOptionsProtocol is a lightweight protocol that
lets callers opt-in to extra outputs on a per-estimator basis without
changing the core converter signatures.
Protocol contract#
Any object that implements the single method below satisfies the protocol and
can be passed to to_onnx() as the convert_options
argument:
| Method | Description |
|---|---|
| `has(option_name, piece)` | Return `True` when option `option_name` is enabled for `piece` (typically an estimator). |
Inside a converter, the options object is accessible via the graph builder’s
convert_options property (g.convert_options):
# Inside a converter function:
if g.convert_options.has("decision_path", estimator):
# emit the extra decision-path output
...
Built-in options: ConvertOptions#
ConvertOptions is the default implementation shipped
with the package. It currently exposes two boolean flags:
| Option name | Type | Description |
|---|---|---|
| `decision_path` | `bool` | When `True`, tree converters emit an extra output containing the decision path. |
| `decision_leaf` | `bool` | When `True`, tree converters emit an extra output containing the decision leaf. |
Passing options to to_onnx#
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from yobx.sklearn import to_onnx, ConvertOptions
X = np.random.default_rng(0).standard_normal((20, 4)).astype(np.float32)
y = (X[:, 0] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
opts = ConvertOptions(decision_path=True)
model_onnx = to_onnx(clf, (X,), convert_options=opts)
# The exported model now has three outputs:
# output_0 – label (int64, shape [N])
# output_1 – probabilities (float32, shape [N, 2])
# output_2 – decision path (object/string, shape [N, 1])
Implementing a custom protocol#
You can supply any object with a has method. The simplest way is to
subclass DefaultConvertOptions and override has:
from yobx.typing import DefaultConvertOptions

class MyOptions(DefaultConvertOptions):
    def has(self, option_name: str, piece: object) -> bool:
        # Only enable decision_leaf for RandomForestClassifier:
        from sklearn.ensemble import RandomForestClassifier
        if option_name == "decision_leaf":
            return isinstance(piece, RandomForestClassifier)
        return False
Alternatively, any object whose class implements the single-method
ConvertOptionsProtocol is accepted directly.
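For instance, a minimal duck-typed object (purely hypothetical, enabling every option unconditionally) already satisfies the contract:

```python
# Duck typing suffices: any object whose class defines a compatible
# `has` method can serve as convert options; no base class required.
class EnableAll:
    def has(self, option_name: str, piece: object) -> bool:
        # Enable every option for every estimator.
        return True


opts = EnableAll()
print(opts.has("decision_path", object()))  # True
```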
See Exporting sklearn tree models with convert options for a full runnable example
of decision_path and decision_leaf on single trees and ensembles.
Alternative implementations#
Any class that satisfies the two-part API above can be passed as
builder_cls. The package ships with:
- `GraphBuilder` — the default; builds graphs using onnx protobuf objects with built-in optimization passes.
- `OnnxScriptGraphBuilder` — a bridge that satisfies the same API while using the `onnxscript` IR internally. Useful when the rest of the pipeline already works with onnxscript.
<<<
import numpy as np
import onnx
from sklearn.preprocessing import StandardScaler
from yobx.sklearn import to_onnx
from yobx.builder.onnxscript import OnnxScriptGraphBuilder
from yobx.helpers.onnx_helper import pretty_onnx
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4)).astype(np.float32)
scaler = StandardScaler().fit(X)
model = to_onnx(scaler, (X,), builder_cls=OnnxScriptGraphBuilder)
print(pretty_onnx(model))
>>>
opset: domain='' version=21
opset: domain='ai.onnx.ml' version=5
input: name='X' type=dtype('float32') shape=['batch', 4]
init: name='init_' type=float32 shape=(4,) -- array([-0.448, 0.052, -0.093, 0.247], dtype=float32)
init: name='init_2' type=float32 shape=(4,) -- array([0.774, 0.641, 0.825, 0.728], dtype=float32)
Sub(X, init_) -> Sub
Div(Sub, init_2) -> x
output: name='x' type=dtype('float32') shape=['batch', 4]
See also
Sklearn Converter — overview of the built-in converters.
Custom Converter — how to write and register a custom converter.
GraphBuilder — the full GraphBuilder reference, including optimization
passes and dynamic shapes.