ExtendedReferenceEvaluator

yobx.reference.ExtendedReferenceEvaluator extends onnx.reference.ReferenceEvaluator with additional operator kernels for non-standard domains such as com.microsoft and ai.onnx.complex.

The standard onnx.reference.ReferenceEvaluator only knows about operators defined in the ONNX standard. ONNX Runtime ships many contrib operators (domain com.microsoft) that are widely used in production models — for example FusedMatMul, QuickGelu and Attention. ExtendedReferenceEvaluator makes it possible to run and unit-test such models with pure Python, without requiring a full ONNX Runtime installation.

Built-in operators

The following table lists the operator implementations that are registered automatically. They are exposed as the class attribute default_ops.

Class                   Domain           Description
----------------------  ---------------  --------------------------------------------------------------------------------
Attention               com.microsoft    Multi-head self-attention with optional mask
BiasSoftmax             com.microsoft    Softmax with an additive bias term
ComplexModule           ai.onnx.complex  Element-wise modulus of a complex tensor
FusedMatMul             com.microsoft    Matrix multiplication with optional transpositions (transA/transB) and alpha scaling
MemcpyFromHost          (default)        Identity copy (device ↔ host no-op)
MemcpyToHost            (default)        Identity copy (device ↔ host no-op)
QLinearAveragePool      com.microsoft    Quantized average pooling
QLinearConv             com.microsoft    Quantized 2-D convolution
QuickGelu               com.microsoft    Gated sigmoid activation x·σ(α·x)
SkipLayerNormalization  com.microsoft    Residual add followed by layer normalisation
ToComplex               ai.onnx.complex  Converts a real tensor (..., 2) to complex

The full list can be printed at runtime with:

<<<

import pprint
from yobx.reference import ExtendedReferenceEvaluator

pprint.pprint(ExtendedReferenceEvaluator.default_ops)

>>>

    [<class 'yobx.reference.ops.op_attention.Attention'>,
     <class 'yobx.reference.ops.op_bias_softmax.BiasSoftmax'>,
     <class 'yobx.reference.ops.op_complex.ComplexModule'>,
     <class 'yobx.reference.ops.op_fused_matmul.FusedMatMul'>,
     <class 'yobx.reference.ops.op_memcpy_host.MemcpyFromHost'>,
     <class 'yobx.reference.ops.op_memcpy_host.MemcpyToHost'>,
     <class 'yobx.reference.ops.op_qlinear_conv.QLinearConv'>,
     <class 'yobx.reference.ops.op_qlinear_average_pool.QLinearAveragePool'>,
     <class 'yobx.reference.ops.op_quick_gelu.QuickGelu'>,
     <class 'yobx.reference.ops.op_skip_layer_normalization.SkipLayerNormalization'>,
     <class 'yobx.reference.ops.op_complex.ToComplex'>]
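
As an illustration of one of these kernels, QuickGelu computes the gated sigmoid activation x·σ(α·x) listed in the table. A minimal NumPy sketch of that formula follows; the default α = 1.702 is the value commonly used by ONNX Runtime's contrib kernel and is an assumption here, not taken from this document.

```python
import numpy as np


def quick_gelu(x, alpha=1.702):
    # Sketch of the QuickGelu formula: x * sigmoid(alpha * x).
    return x * (1.0 / (1.0 + np.exp(-alpha * x)))


x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
print(quick_gelu(x))
```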

Basic usage

ExtendedReferenceEvaluator is a drop-in replacement for onnx.reference.ReferenceEvaluator. Any model that runs with the standard evaluator also runs here.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [oh.make_node("Add", ["X", "Y"], ["Z"])],
        "add_graph",
        [
            oh.make_tensor_value_info("X", TFLOAT, [None, None]),
            oh.make_tensor_value_info("Y", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model)
x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
(result,) = ref.run(None, {"X": x, "Y": x})
print(result)

>>>

    [[2. 4.]
     [6. 8.]]

Contrib operators

Models that use ONNX Runtime contrib operators can be run directly. The example below uses FusedMatMul — a com.microsoft operator that fuses matrix multiplication with optional transposition of either operand and an optional alpha scaling factor.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node(
                "FusedMatMul", ["X", "Y"], ["Z"], domain="com.microsoft", transA=1
            )
        ],
        "fused_mm",
        [
            oh.make_tensor_value_info("X", TFLOAT, None),
            oh.make_tensor_value_info("Y", TFLOAT, None),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, None)],
    ),
    opset_imports=[oh.make_opsetid("", 18), oh.make_opsetid("com.microsoft", 1)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model)
a = np.arange(4, dtype=np.float32).reshape(2, 2)
(result,) = ref.run(None, {"X": a, "Y": a})
print(result)  # a.T @ a

>>>

    [[ 4.  6.]
     [ 6. 10.]]
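
For reference, the 2-D semantics of FusedMatMul can be sketched in plain NumPy. This simplified version ignores batched inputs (where only the last two axes are transposed); the helper name fused_matmul_2d is illustrative, not part of any API.

```python
import numpy as np


def fused_matmul_2d(a, b, trans_a=False, trans_b=False, alpha=1.0):
    # Simplified 2-D sketch of com.microsoft FusedMatMul:
    # optionally transpose either operand, multiply, scale by alpha.
    if trans_a:
        a = a.T
    if trans_b:
        b = b.T
    return alpha * (a @ b)


a = np.arange(4, dtype=np.float32).reshape(2, 2)
print(fused_matmul_2d(a, a, trans_a=True))  # matches the evaluator output above
```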

Adding custom operators

Pass extra OpRun subclasses through the new_ops argument. They are merged with default_ops; you do not need to re-list the built-in contrib operators.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from onnx.reference.op_run import OpRun
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT


class MyCustomOp(OpRun):
    op_domain = "my.domain"

    def _run(self, X):
        return (X * 2,)


model = oh.make_model(
    oh.make_graph(
        [oh.make_node("MyCustomOp", ["X"], ["Z"], domain="my.domain")],
        "custom_graph",
        [oh.make_tensor_value_info("X", TFLOAT, [None])],
        [oh.make_tensor_value_info("Z", TFLOAT, [None])],
    ),
    opset_imports=[oh.make_opsetid("", 18), oh.make_opsetid("my.domain", 1)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model, new_ops=[MyCustomOp])
x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
(result,) = ref.run(None, {"X": x})
print(result)  # [2. 4. 6.]

>>>

    [2. 4. 6.]

Inspecting intermediate results

Pass verbose=10 to ExtendedReferenceEvaluator to print every input, every intermediate result, and every output as the model executes. This is useful for debugging incorrect outputs or understanding how values flow through the graph.

The verbose parameter maps to the logging levels used internally by onnx.reference.ReferenceEvaluator:

  • verbose=0 (default) — silent

  • verbose=2 — prints each node as it executes (NodeOp(inputs) -> outputs)

  • verbose=3 or higher — also prints the value of every input, initializer constant (+C), and intermediate/final result (+, +I)

<<<

import numpy as np
import onnx
import onnx.helper as oh
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["X", "Y"], ["T"]),
            oh.make_node("Relu", ["T"], ["Z"]),
        ],
        "add_relu",
        [
            oh.make_tensor_value_info("X", TFLOAT, [None, None]),
            oh.make_tensor_value_info("Y", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model, verbose=10)
x = np.array([[1.0, -2.0], [3.0, -4.0]], dtype=np.float32)
(result,) = ref.run(None, {"X": x, "Y": x})
print("result:", result)

>>>

     +I X: float32:(2, 2):[1.0, -2.0, 3.0, -4.0]
     +I Y: float32:(2, 2):[1.0, -2.0, 3.0, -4.0]
    Add(X, Y) -> T
     + T: float32:(2, 2):[2.0, -4.0, 6.0, -8.0]
    Relu(T) -> Z
     + Z: float32:(2, 2):[2.0, 0.0, 6.0, 0.0]
    result: [[2. 0.]
     [6. 0.]]

The lines prefixed with +I are model inputs; lines with +C are initializer constants; and lines with + (after a node execution line) are the intermediate or final outputs produced by that node.

Operator versioning

When a model imports multiple versions of a domain (e.g. opset 13 and 17), filter_ops selects, from the new_ops list, the implementation with the highest version that does not exceed the model's imported opset.

This mirrors the versioning convention used by onnx.reference.ReferenceEvaluator itself: operator classes whose names end in _<version> (e.g. MyOp_13, MyOp_17) are treated as versioned alternatives and the most appropriate one is chosen automatically.
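
That selection rule can be sketched in pure Python. The helper name pick_versioned and its list-of-classes interface are illustrative, under the assumption that versioned classes carry a _<version> suffix as described above; this is not the actual filter_ops signature.

```python
def pick_versioned(candidates, model_opset):
    # Pick the class whose "_<version>" suffix is the highest value
    # not exceeding the model's imported opset (illustrative sketch).
    best, best_version = None, -1
    for cls in candidates:
        version = int(cls.__name__.rsplit("_", 1)[1])
        if best_version < version <= model_opset:
            best, best_version = cls, version
    return best


class MyOp_13: ...
class MyOp_17: ...


print(pick_versioned([MyOp_13, MyOp_17], 15).__name__)  # MyOp_13
print(pick_versioned([MyOp_13, MyOp_17], 18).__name__)  # MyOp_17
```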

See also

ExtendedReferenceEvaluator: running models with contrib operators — sphinx-gallery example demonstrating standard operators, FusedMatMul, QuickGelu, and custom operator injection.