Running inference with onnxruntime
This example shows how to build a small ONNX model and run it with
onnxruntime. The model is a two-layer MLP created with the
demo_mlp_model() helper.
import numpy as np
import onnxruntime
from yaourt.doc import demo_mlp_model
1. Build the model
demo_mlp_model returns an
onnx.ModelProto representing a small fully-connected network:
x (3×10) → MatMul → Add → Relu → MatMul → Add → output (3×1).
The input shape is fixed to a batch size of 3.
# Assumption: the helper is called with no arguments, as written in the introduction.
model = demo_mlp_model()
print("Opset:", model.opset_import[0].version)
print("Inputs :", [i.name for i in model.graph.input])
print("Outputs:", [o.name for o in model.graph.output])
Opset: 18
Inputs : ['x']
Outputs: ['output_0']
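To make the MatMul → Add → Relu → MatMul → Add chain concrete, here is a minimal NumPy sketch of the same computation. The hidden width of 8 and the random weights are assumptions made for illustration; the weights actually stored in the model returned by demo_mlp_model are not shown on this page, so the numbers will not match the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: hidden width of 8; the real model's hidden size is not stated.
hidden = 8
w1 = rng.standard_normal((10, hidden)).astype(np.float32)
b1 = rng.standard_normal(hidden).astype(np.float32)
w2 = rng.standard_normal((hidden, 1)).astype(np.float32)
b2 = rng.standard_normal(1).astype(np.float32)


def mlp_reference(x):
    """Two-layer MLP: MatMul -> Add -> Relu -> MatMul -> Add."""
    h = np.maximum(x @ w1 + b1, 0.0)  # first layer followed by Relu
    return h @ w2 + b2                # second layer, no activation


x = rng.standard_normal((3, 10)).astype(np.float32)
y = mlp_reference(x)
print("Reference output shape:", y.shape)
```

Whatever the hidden width, a (3, 10) input always produces a (3, 1) output, which is the shape the onnxruntime session is expected to return.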
2. Run inference
We create a random input tensor matching the model’s fixed batch size
and call run().
x = np.random.randn(3, 10).astype(np.float32)
sess = onnxruntime.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
(output,) = sess.run(None, {"x": x})
print("Input shape:", x.shape)
print("Output shape:", output.shape)
assert output.shape == (3, 1), f"Unexpected output shape: {output.shape}"
Input shape: (3, 10)
Output shape: (3, 1)
3. Run multiple times
The same session can be reused for multiple calls with different data.
for i in range(3):
    xi = np.random.randn(3, 10).astype(np.float32)
    (yi,) = sess.run(None, {"x": xi})
    print(f" run={i} output shape: {yi.shape}")
    assert yi.shape == (3, 1)
print("All runs passed.")
run=0 output shape: (3, 1)
run=1 output shape: (3, 1)
run=2 output shape: (3, 1)
All runs passed.
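Because the batch size is fixed to 3, a dataset with a different number of rows has to be fed in chunks of exactly three rows. The sketch below shows one possible way to do that with zero padding for the last chunk. The helper run_in_fixed_batches and the stand-in fake_run are hypothetical names, not part of onnxruntime or yaourt; with a real session, run_batch could be something like lambda b: sess.run(None, {"x": b})[0].

```python
import numpy as np


def run_in_fixed_batches(run_batch, data, batch_size=3):
    """Apply run_batch to consecutive chunks of exactly batch_size rows.

    The last chunk is zero-padded up to batch_size, and the outputs for
    the padded rows are discarded. Expects at least one row in data.
    """
    outputs = []
    for start in range(0, data.shape[0], batch_size):
        chunk = data[start:start + batch_size]
        valid = chunk.shape[0]
        if valid < batch_size:
            pad_shape = (batch_size - valid,) + chunk.shape[1:]
            chunk = np.concatenate([chunk, np.zeros(pad_shape, dtype=chunk.dtype)])
        outputs.append(run_batch(chunk)[:valid])
    return np.concatenate(outputs, axis=0)


def fake_run(batch):
    # Hypothetical stand-in for a session call: requires the fixed shape
    # and returns one value per row.
    assert batch.shape == (3, 10)
    return batch.sum(axis=1, keepdims=True)


data = np.arange(70, dtype=np.float32).reshape(7, 10)
result = run_in_fixed_batches(fake_run, data)
print("Result shape:", result.shape)
```

Padding is only safe here because each output row depends on its own input row alone, which holds for an MLP like this one.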