ExtendedReferenceEvaluator

yobx.reference.ExtendedReferenceEvaluator extends onnx.reference.ReferenceEvaluator with additional operator kernels for non-standard domains such as com.microsoft and ai.onnx.complex.

The standard onnx.reference.ReferenceEvaluator only knows about operators defined in the ONNX standard. ONNX Runtime ships many contrib operators (domain com.microsoft) that are widely used in production models — for example FusedMatMul, QuickGelu and Attention. ExtendedReferenceEvaluator makes it possible to run and unit-test such models with pure Python, without requiring a full ONNX Runtime installation.

Built-in operators

The following table lists the operator implementations that are registered automatically. They are exposed as the class attribute default_ops.

Class                   Domain           Description
----------------------  ---------------  --------------------------------------------------------------------------------
Attention               com.microsoft    Multi-head self-attention with optional mask
BiasSoftmax             com.microsoft    Softmax with an additive bias term
ComplexModule           ai.onnx.complex  Element-wise modulus of a complex tensor
FusedMatMul             com.microsoft    Matrix multiplication with optional transpositions (transA/transB) and alpha scaling
MemcpyFromHost          (default)        Identity copy (device ↔ host no-op)
MemcpyToHost            (default)        Identity copy (device ↔ host no-op)
QLinearAveragePool      com.microsoft    Quantized average pooling
QLinearConv             com.microsoft    Quantized 2-D convolution
QuickGelu               com.microsoft    Gated sigmoid activation x·σ(α·x)
SkipLayerNormalization  com.microsoft    Residual add followed by layer normalisation
ToComplex               ai.onnx.complex  Converts a real tensor (..., 2) to complex

The full list can be printed at runtime with:

<<<

import pprint
from yobx.reference import ExtendedReferenceEvaluator

pprint.pprint(ExtendedReferenceEvaluator.default_ops)

>>>

    [<class 'yobx.reference.ops.op_attention.Attention'>,
     <class 'yobx.reference.ops.op_bias_softmax.BiasSoftmax'>,
     <class 'yobx.reference.ops.op_complex.ComplexModule'>,
     <class 'yobx.reference.ops.op_fused_matmul.FusedMatMul'>,
     <class 'yobx.reference.ops.op_memcpy_host.MemcpyFromHost'>,
     <class 'yobx.reference.ops.op_memcpy_host.MemcpyToHost'>,
     <class 'yobx.reference.ops.op_qlinear_conv.QLinearConv'>,
     <class 'yobx.reference.ops.op_qlinear_average_pool.QLinearAveragePool'>,
     <class 'yobx.reference.ops.op_quick_gelu.QuickGelu'>,
     <class 'yobx.reference.ops.op_skip_layer_normalization.SkipLayerNormalization'>,
     <class 'yobx.reference.ops.op_complex.ToComplex'>]
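
As an illustration of one of these kernels, QuickGelu computes the gated sigmoid activation x·σ(α·x) listed in the table. A minimal NumPy sketch of that formula follows; the default α = 1.702 is the value commonly used by ONNX Runtime's contrib kernel and is an assumption here, not taken from this document.

```python
import numpy as np


def quick_gelu(x, alpha=1.702):
    # Sketch of the QuickGelu formula: x * sigmoid(alpha * x).
    return x * (1.0 / (1.0 + np.exp(-alpha * x)))


x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
print(quick_gelu(x))
```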

Basic usage

ExtendedReferenceEvaluator is a drop-in replacement for onnx.reference.ReferenceEvaluator. Any model that runs with the standard evaluator also runs here.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [oh.make_node("Add", ["X", "Y"], ["Z"])],
        "add_graph",
        [
            oh.make_tensor_value_info("X", TFLOAT, [None, None]),
            oh.make_tensor_value_info("Y", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model)
x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
(result,) = ref.run(None, {"X": x, "Y": x})
print(result)

>>>

    [[2. 4.]
     [6. 8.]]

Contrib operators

Models that use ONNX Runtime contrib operators can be run directly. The example below uses FusedMatMul — a com.microsoft operator that fuses matrix multiplication with optional transposition of either operand and an optional alpha scaling factor.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node(
                "FusedMatMul", ["X", "Y"], ["Z"], domain="com.microsoft", transA=1
            )
        ],
        "fused_mm",
        [
            oh.make_tensor_value_info("X", TFLOAT, None),
            oh.make_tensor_value_info("Y", TFLOAT, None),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, None)],
    ),
    opset_imports=[oh.make_opsetid("", 18), oh.make_opsetid("com.microsoft", 1)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model)
a = np.arange(4, dtype=np.float32).reshape(2, 2)
(result,) = ref.run(None, {"X": a, "Y": a})
print(result)  # a.T @ a

>>>

    [[ 4.  6.]
     [ 6. 10.]]
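
For reference, the 2-D semantics of FusedMatMul can be sketched in plain NumPy. This simplified version ignores batched inputs (where only the last two axes are transposed); the helper name fused_matmul_2d is illustrative, not part of any API.

```python
import numpy as np


def fused_matmul_2d(a, b, trans_a=False, trans_b=False, alpha=1.0):
    # Simplified 2-D sketch of com.microsoft FusedMatMul:
    # optionally transpose either operand, multiply, scale by alpha.
    if trans_a:
        a = a.T
    if trans_b:
        b = b.T
    return alpha * (a @ b)


a = np.arange(4, dtype=np.float32).reshape(2, 2)
print(fused_matmul_2d(a, a, trans_a=True))  # matches the evaluator output above
```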

Adding custom operators

Pass extra OpRun subclasses through the new_ops argument. They are merged with default_ops; you do not need to re-list the built-in contrib operators.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from onnx.reference.op_run import OpRun
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT


class MyCustomOp(OpRun):
    op_domain = "my.domain"

    def _run(self, X):
        return (X * 2,)


model = oh.make_model(
    oh.make_graph(
        [oh.make_node("MyCustomOp", ["X"], ["Z"], domain="my.domain")],
        "custom_graph",
        [oh.make_tensor_value_info("X", TFLOAT, [None])],
        [oh.make_tensor_value_info("Z", TFLOAT, [None])],
    ),
    opset_imports=[oh.make_opsetid("", 18), oh.make_opsetid("my.domain", 1)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model, new_ops=[MyCustomOp])
x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
(result,) = ref.run(None, {"X": x})
print(result)  # [2. 4. 6.]

>>>

    [2. 4. 6.]

Inspecting intermediate results

Pass verbose=10 to ExtendedReferenceEvaluator to print every input, every intermediate result, and every output as the model executes. This is useful for debugging incorrect outputs or understanding how values flow through the graph.

The verbose parameter maps to the logging levels used internally by onnx.reference.ReferenceEvaluator:

  • verbose=0 (default) — silent

  • verbose=2 — prints each node as it executes (NodeOp(inputs) -> outputs)

  • verbose=3 or higher — also prints the value of every input, initializer constant (+C), and intermediate/final result (+, +I)

<<<

import numpy as np
import onnx
import onnx.helper as oh
from yobx.reference import ExtendedReferenceEvaluator

TFLOAT = onnx.TensorProto.FLOAT
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Add", ["X", "Y"], ["T"]),
            oh.make_node("Relu", ["T"], ["Z"]),
        ],
        "add_relu",
        [
            oh.make_tensor_value_info("X", TFLOAT, [None, None]),
            oh.make_tensor_value_info("Y", TFLOAT, [None, None]),
        ],
        [oh.make_tensor_value_info("Z", TFLOAT, [None, None])],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=10,
)
ref = ExtendedReferenceEvaluator(model, verbose=10)
x = np.array([[1.0, -2.0], [3.0, -4.0]], dtype=np.float32)
(result,) = ref.run(None, {"X": x, "Y": x})
print("result:", result)

>>>

     +I X: float32:(2, 2):[1.0, -2.0, 3.0, -4.0]
     +I Y: float32:(2, 2):[1.0, -2.0, 3.0, -4.0]
    Add(X, Y) -> T
     + T: float32:(2, 2):[2.0, -4.0, 6.0, -8.0]
    Relu(T) -> Z
     + Z: float32:(2, 2):[2.0, 0.0, 6.0, 0.0]
    result: [[2. 0.]
     [6. 0.]]

The lines prefixed with +I are model inputs; lines with +C are initializer constants; and lines with + (after a node execution line) are the intermediate or final outputs produced by that node.

Operator versioning

When a model imports multiple versions of a domain (e.g. opset 13 and 17), filter_ops selects, from the new_ops list, the implementation with the highest version that does not exceed the model's imported opset.

This mirrors the versioning convention used by onnx.reference.ReferenceEvaluator itself: operator classes whose names end in _<version> (e.g. MyOp_13, MyOp_17) are treated as versioned alternatives and the most appropriate one is chosen automatically.
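
That selection rule can be sketched in pure Python. The helper name pick_versioned and its list-of-classes interface are illustrative, under the assumption that versioned classes carry a _<version> suffix as described above; this is not the actual filter_ops signature.

```python
def pick_versioned(candidates, model_opset):
    # Pick the class whose "_<version>" suffix is the highest value
    # not exceeding the model's imported opset (illustrative sketch).
    best, best_version = None, -1
    for cls in candidates:
        version = int(cls.__name__.rsplit("_", 1)[1])
        if best_version < version <= model_opset:
            best, best_version = cls, version
    return best


class MyOp_13: ...
class MyOp_17: ...


print(pick_versioned([MyOp_13, MyOp_17], 15).__name__)  # MyOp_13
print(pick_versioned([MyOp_13, MyOp_17], 18).__name__)  # MyOp_17
```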

See also

ExtendedReferenceEvaluator: running models with contrib operators — sphinx-gallery example demonstrating standard operators, FusedMatMul, QuickGelu, and custom operator injection.