Intermediate results with onnxruntime
The example Intermediate results with (ONNX) ReferenceEvaluator demonstrated
how to run a python runtime on a model, but it may sometimes be very slow
and it can show discrepancies when the intended provider is not CPU.
Let's use OnnxruntimeEvaluator instead.
It splits the model into nodes and runs them independently until the whole
execution succeeds or a node fails. This class converts every node into a
model based on the types discovered during the execution. It relies on
InferenceSessionForTorch or InferenceSessionForNumpy for the execution.
This example uses torch tensors and bfloat16.
A failing model
The issue here is an operator Cast trying to convert a result
into a non-existing type.
import onnx
import onnx.helper as oh
import torch
import onnxruntime
from onnx_diagnostic import doc
from onnx_diagnostic.ext_test_case import has_cuda
from onnx_diagnostic.helpers.onnx_helper import from_array_extended
from onnx_diagnostic.reference import OnnxruntimeEvaluator
TBFLOAT16 = onnx.TensorProto.BFLOAT16
model = oh.make_model(
    oh.make_graph(
        [
            oh.make_node("Mul", ["X", "Y"], ["xy"], name="n0"),
            oh.make_node("Sigmoid", ["xy"], ["sy"], name="n1"),
            oh.make_node("Add", ["sy", "one"], ["C"], name="n2"),
            oh.make_node("Cast", ["C"], ["X999"], to=999, name="failing"),
            oh.make_node("CastLike", ["X999", "Y"], ["Z"], name="n4"),
        ],
        "-nd-",
        [
            oh.make_tensor_value_info("X", TBFLOAT16, ["a", "b", "c"]),
            oh.make_tensor_value_info("Y", TBFLOAT16, ["a", "b", "c"]),
        ],
        [oh.make_tensor_value_info("Z", TBFLOAT16, ["a", "b", "c"])],
        [from_array_extended(torch.tensor([1], dtype=torch.bfloat16), name="one")],
    ),
    opset_imports=[oh.make_opsetid("", 18)],
    ir_version=9,
)
We check that it fails.
try:
    onnxruntime.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
except onnxruntime.capi.onnxruntime_pybind11_state.Fail as e:
    print(e)
[ONNXRuntimeError] : 1 : FAIL : Node (failing) Op (Cast) [TypeInferenceError] Attribute to does not specify a valid type in .
OnnxruntimeEvaluator
This class extends onnx.reference.ReferenceEvaluator
with operators outside the standard but defined by onnxruntime.
verbose=10 tells the class to print as much as possible,
verbose=0 prints nothing; intermediate values give more or less verbosity.
ref = OnnxruntimeEvaluator(model, verbose=10)
feeds = dict(
    X=torch.rand((3, 4), dtype=torch.bfloat16), Y=torch.rand((3, 4), dtype=torch.bfloat16)
)
try:
    ref.run(None, feeds)
except Exception as e:
    print("ERROR", type(e), e)
+C one: bfloat16:(1,):[1.0]
+I X: D-1:torch.bfloat16:torch.Size([3, 4]):0.9609375,0.765625,0.2265625,0.75390625,0.85546875,0.33203125,0.90625,0.51953125,0.1875,0.41796875...
+I Y: D-1:torch.bfloat16:torch.Size([3, 4]):0.33984375,0.71875,0.6796875,0.0234375,0.23828125,0.44921875,0.2578125,0.20703125,0.44921875,0.33203125...
Mul(X, Y) -> xy
ERROR <class 'onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented'> [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Mul(14) node with name 'n0'
onnxruntime may not support bfloat16 on CPU. See onnxruntime kernels.
if has_cuda():
    ref = OnnxruntimeEvaluator(model, providers="cuda", verbose=10)
    feeds = dict(
        X=torch.rand((3, 4), dtype=torch.bfloat16), Y=torch.rand((3, 4), dtype=torch.bfloat16)
    )
    try:
        ref.run(None, feeds)
    except Exception as e:
        print("ERROR", type(e), e)
+C one: bfloat16:(1,):[1.0]
+I X: D-1:torch.bfloat16:torch.Size([3, 4]):0.52734375,0.19921875,0.91796875,0.109375,0.69140625,0.9921875,0.30859375,0.0390625,0.18359375,0.28515625...
+I Y: D-1:torch.bfloat16:torch.Size([3, 4]):0.6328125,0.70703125,0.6171875,0.1953125,0.13671875,0.68359375,0.8359375,0.421875,0.35546875,0.125...
Mul(X, Y) -> xy
+ xy: D-1:torch.bfloat16:torch.Size([3, 4]):0.333984375,0.140625,0.56640625,0.0213623046875,0.0947265625,0.6796875,0.2578125,0.0164794921875,0.0654296875,0.03564453125...
Sigmoid(xy) -> sy
+ sy: D-1:torch.bfloat16:torch.Size([3, 4]):0.58203125,0.53515625,0.640625,0.50390625,0.5234375,0.6640625,0.5625,0.50390625,0.515625,0.5078125...
Add(sy, one) -> C
+ C: bfloat16:(3, 4):1.578125,1.53125,1.640625,1.5,1.5234375,1.6640625,1.5625,1.5,1.515625,1.5078125...
Cast(C) -> X999
ERROR <class 'RuntimeError'> Unable to infer a session with inputs
#1[A16r2]
due to [ONNXRuntimeError] : 1 : FAIL : Node (failing) Op (Cast) [TypeInferenceError] Attribute to does not specify a valid type in .
opset: domain='' version=18
input: name='C' type=dtype('float32') shape=[3, 4]
Cast(C, to=999) -> X999
output: name='X999' type='NOTENSOR' shape=None
We can see it runs until it reaches the Cast node and stops there. The error message is not always obvious to interpret; it is improved from time to time. This runtime is useful when a model fails for a numerical reason: it is possible to insert prints in the python code to display more information or to debug if needed.
doc.plot_legend("onnxruntime\nrunning\nstep by step", "OnnxruntimeEvaluator", "lightgrey")

Total running time of the script: (0 minutes 7.580 seconds)
Related examples

Intermediate results with (ONNX) ReferenceEvaluator

Find where a model is failing by running submodels