Many ways to implement a custom graph in ONNX#
ONNX defines a long list of operators used in machine learning models. They are used to implement functions. This step is usually taken care of by converting libraries: sklearn-onnx for scikit-learn, torch.onnx for pytorch, tensorflow-onnx for tensorflow. Both torch.onnx and tensorflow-onnx convert any function expressed with the functions available in those packages, and that works because there is usually no need to mix packages. But on some occasions, there is a need to write functions directly with the onnx syntax: scikit-learn is implemented with numpy and there is no converter from numpy to onnx; sometimes an existing onnx model must be extended or merged with models coming from different packages; sometimes the original packages are simply not available and only onnx is. Let's see what that looks like with a very simple example.
Euclidean distance#
For example, the well-known Euclidean distance (squared here, the square root is omitted) can be expressed with numpy as follows:
import numpy as np

def euclidean(X: np.ndarray, Y: np.ndarray) -> float:
    return ((X - Y) ** 2).sum()
The mathematical function must first be translated into ONNX Operators or primitives. It is usually easy because these primitives are very close to what numpy defines. For illustration, it could be expressed with the following made-up syntax.
import onnx

onnx-def euclidean(X: onnx.TensorProto[FLOAT], Y: onnx.TensorProto[FLOAT]) -> onnx.FLOAT:
    dxy = onnx.Sub(X, Y)
    sxy = onnx.Pow(dxy, 2)
    d = onnx.ReduceSum(sxy)
    return d
This example is short but does not work as it is. The inner API defined in onnx.helper is quite verbose, and a real implementation looks like the following.
<<<
import onnx
import onnx.helper as oh

def make_euclidean(
    input_names: tuple[str] = ("X", "Y"),
    output_name: str = "Z",
    elem_type: int = onnx.TensorProto.FLOAT,
    opset: int | None = None,
) -> onnx.ModelProto:
    if opset is None:
        opset = onnx.defs.onnx_opset_version()
    X = oh.make_tensor_value_info(input_names[0], elem_type, None)
    Y = oh.make_tensor_value_info(input_names[1], elem_type, None)
    Z = oh.make_tensor_value_info(output_name, elem_type, None)
    two = oh.make_tensor("two", onnx.TensorProto.INT64, [1], [2])
    # the node inputs reuse input_names so that custom names work too
    n1 = oh.make_node("Sub", [input_names[0], input_names[1]], ["dxy"])
    n2 = oh.make_node("Pow", ["dxy", "two"], ["dxy2"])
    n3 = oh.make_node("ReduceSum", ["dxy2"], [output_name])
    graph = oh.make_graph([n1, n2, n3], "euclidian", [X, Y], [Z], [two])
    model = oh.make_model(graph, opset_imports=[oh.make_opsetid("", opset)])
    return model

model = make_euclidean()
print(model)
>>>
ir_version: 9
opset_import {
  domain: ""
  version: 21
}
graph {
  node {
    input: "X"
    input: "Y"
    output: "dxy"
    op_type: "Sub"
  }
  node {
    input: "dxy"
    input: "two"
    output: "dxy2"
    op_type: "Pow"
  }
  node {
    input: "dxy2"
    output: "Z"
    op_type: "ReduceSum"
  }
  name: "euclidian"
  initializer {
    dims: 1
    data_type: 7
    int64_data: 2
    name: "two"
  }
  input {
    name: "X"
    type {
      tensor_type {
        elem_type: 1
      }
    }
  }
  input {
    name: "Y"
    type {
      tensor_type {
        elem_type: 1
      }
    }
  }
  output {
    name: "Z"
    type {
      tensor_type {
        elem_type: 1
      }
    }
  }
}
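Before comparing numerical results, the structure of such a model can be quickly validated with the checker shipped with onnx:

import onnx

from onnx_array_api.validation.docs import make_euclidean

# check_model raises onnx.checker.ValidationError when the graph is
# malformed: dangling input, unknown operator, inconsistent types...
onnx.checker.check_model(make_euclidean())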
Since it is a second implementation of an existing function, it is necessary to check that the outputs are the same.
<<<
import numpy as np
from numpy.testing import assert_allclose
from onnx.reference import ReferenceEvaluator

# This is the same function as above.
from onnx_array_api.validation.docs import make_euclidean

def test_make_euclidean():
    model = make_euclidean()
    ref = ReferenceEvaluator(model)
    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=True)
    got = ref.run(None, {"X": X, "Y": Y})[0]
    assert_allclose(expected, got, atol=1e-6)

test_make_euclidean()
>>>
But the reference implementation in onnx is not the runtime used to deploy the model. A second unit test must be added to check that runtime as well.
<<<
import numpy as np
from numpy.testing import assert_allclose

# This is the same function as above.
from onnx_array_api.validation.docs import make_euclidean

def test_make_euclidean_ort():
    from onnxruntime import InferenceSession

    model = make_euclidean()
    ref = InferenceSession(
        model.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=True)
    got = ref.run(None, {"X": X, "Y": Y})[0]
    assert_allclose(expected, got, atol=1e-6)

try:
    test_make_euclidean_ort()
except Exception as e:
    print(e)
>>>
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /home/xadupre/github/onnxruntime/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 21 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx is till opset 20.
The list of operators is constantly evolving: onnx is versioned. The call fails because the model declares an opset version the installed runtime does not support yet. Let's change it.
<<<
import numpy as np
from numpy.testing import assert_allclose

# This is the same function as above.
from onnx_array_api.validation.docs import make_euclidean

def test_make_euclidean_ort():
    from onnxruntime import InferenceSession

    # opset=18: the model uses opset version 18; this number
    # is incremented at every minor release of onnx.
    model = make_euclidean(opset=18)
    ref = InferenceSession(
        model.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=True)
    got = ref.run(None, {"X": X, "Y": Y})[0]
    assert_allclose(expected, got, atol=1e-6)

test_make_euclidean_ort()
>>>
But a runtime must support many opset versions, and the unit test may end up looking like the following:
<<<
import numpy as np
from numpy.testing import assert_allclose
import onnx.defs

# This is the same function as above.
from onnx_array_api.validation.docs import make_euclidean

def test_make_euclidean_ort():
    from onnxruntime import InferenceSession

    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=True)

    # Loop over all opset versions the model could be stamped with.
    for opset in range(6, onnx.defs.onnx_opset_version() - 1):
        model = make_euclidean(opset=opset)
        try:
            ref = InferenceSession(
                model.SerializeToString(), providers=["CPUExecutionProvider"]
            )
            got = ref.run(None, {"X": X, "Y": Y})[0]
        except Exception as e:
            print(f"fail opset={opset}", e)
            if opset < 18:
                # expected failure with old opsets
                continue
            raise
        assert_allclose(expected, got, atol=1e-6)

test_make_euclidean_ort()
>>>
fail opset=6 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
fail opset=7 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
fail opset=8 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
fail opset=9 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
fail opset=10 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
fail opset=11 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
This work is already quite long for a simple function. For a longer one, due to the verbosity of the inner API, it is quite difficult to write a correct implementation on the first try, and the unit tests cannot be avoided. The inner API is usually enough when the translation from python to onnx does not happen often. When it does, almost every library implements its own simplified way to create onnx graphs, and because creating such an API is not difficult, the decision is often made to build a new one rather than reuse an existing one.
Existing APIs#
Many options are already available for writing custom onnx graphs. Their development is usually driven by what they are used for. Each one may not fully support all your needs, and it is not always easy to understand the error messages it produces when something goes wrong. It is better to understand your own needs before choosing one. Here are some of the questions that may need to be answered:
ability to easily write loops and tests (control flow), as illustrated by the sketch after this list
ability to debug (eager mode)
ability to use the same code to produce different implementations depending on the opset version
ability to interact with other frameworks
ability to merge existing onnx graphs
ability to describe an existing graph with this API
ability to easily define constants
ability to handle multiple domains
ability to support local functions
clear error messages
is it actively maintained?
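To illustrate the first point, here is a minimal sketch of a test written with the raw onnx.helper API (names and opset are chosen for the example): each branch of an If node is a complete subgraph, which explains why easier control flow is high on the list.

import onnx
import onnx.helper as oh

# Z = X if flag else -X: each branch of If is a full GraphProto.
# The subgraphs have no inputs; they capture X from the main graph.
then_graph = oh.make_graph(
    [oh.make_node("Identity", ["X"], ["then_out"])], "then_branch", [],
    [oh.make_tensor_value_info("then_out", onnx.TensorProto.FLOAT, None)])
else_graph = oh.make_graph(
    [oh.make_node("Neg", ["X"], ["else_out"])], "else_branch", [],
    [oh.make_tensor_value_info("else_out", onnx.TensorProto.FLOAT, None)])
node = oh.make_node("If", ["flag"], ["Z"],
                    then_branch=then_graph, else_branch=else_graph)
graph = oh.make_graph(
    [node], "control_flow",
    [oh.make_tensor_value_info("flag", onnx.TensorProto.BOOL, []),
     oh.make_tensor_value_info("X", onnx.TensorProto.FLOAT, None)],
    [oh.make_tensor_value_info("Z", onnx.TensorProto.FLOAT, None)])
model = oh.make_model(graph, opset_imports=[oh.make_opsetid("", 18)])
onnx.checker.check_model(model)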
Use torch or tensorflow#
pytorch offers the possibility to convert any function implemented with pytorch functions into onnx with torch.onnx. Here are a couple of examples.
import torch
import torch.nn

class MyModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    def forward(self, x, bias=None):
        out = self.linear(x)
        out = out + bias
        return out

model = MyModel()
kwargs = {"bias": 3.0}
inputs = (torch.randn(2, 2, 2),)

export_output = torch.onnx.dynamo_export(model, *inputs, **kwargs)
export_output.save("my_simple_model.onnx")
from typing import Dict, Tuple
import torch
import torch.onnx

def func_with_nested_input_structure(
    x_dict: Dict[str, torch.Tensor],
    y_tuple: Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
):
    if "a" in x_dict:
        x = x_dict["a"]
    elif "b" in x_dict:
        x = x_dict["b"]
    else:
        x = torch.randn(3)
    y1, (y2, y3) = y_tuple
    return x + y1 + y2 + y3

x_dict = {"a": torch.tensor(1.0)}
y_tuple = (torch.tensor(2.0), (torch.tensor(3.0), torch.tensor(4.0)))

export_output = torch.onnx.dynamo_export(func_with_nested_input_structure, x_dict, y_tuple)

print(export_output.adapt_torch_inputs_to_onnx(x_dict, y_tuple))
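On the tensorflow side, tensorflow-onnx exposes similar entry points. Below is a minimal sketch, assuming tf2onnx.convert.from_function is available (the exact signature may differ across versions):

import tensorflow as tf
import tf2onnx

# The same squared euclidean distance, expressed with tensorflow.
@tf.function
def euclidean(x, y):
    return tf.reduce_sum(tf.square(x - y))

# from_function returns the ModelProto and the external tensor storage.
model_proto, _ = tf2onnx.convert.from_function(
    euclidean,
    input_signature=[
        tf.TensorSpec([None], tf.float32),
        tf.TensorSpec([None], tf.float32),
    ],
)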
onnxscript#
onnxscript is used in Torch Export to ONNX. It converts python code to onnx code by analyzing the python code (through the ast module). The package makes it very easy to use loops and tests in onnx. It stays very close to the onnx syntax. However, it is not easy to produce multiple implementations depending on the opset version required by the user.
Example taken from the documentation:
import onnx

# We use ONNX opset 15 to define the function below.
from onnxscript import FLOAT
from onnxscript import opset15 as op
from onnxscript import script

# We use the script decorator to indicate that
# this is meant to be translated to ONNX.
@script()
def onnx_hardmax(X, axis: int):
    """Hardmax is similar to ArgMax, with the result being encoded OneHot style."""

    # The type annotation on X indicates that it is a float tensor of
    # unknown rank. The type annotation on axis indicates that it will
    # be treated as an int attribute in ONNX.
    #
    # Invoke ONNX opset 15 op ArgMax.
    # Use unnamed arguments for ONNX input parameters, and named
    # arguments for ONNX attribute parameters.
    argmax = op.ArgMax(X, axis=axis, keepdims=False)
    xshape = op.Shape(X, start=axis)
    # use the Constant operator to create constant tensors
    zero = op.Constant(value_ints=[0])
    depth = op.GatherElements(xshape, zero)
    empty_shape = op.Constant(value_ints=[0])
    depth = op.Reshape(depth, empty_shape)
    values = op.Constant(value_ints=[0, 1])
    cast_values = op.CastLike(values, X)
    return op.OneHot(argmax, depth, cast_values, axis=axis)

# We use the script decorator to indicate that
# this is meant to be translated to ONNX.
@script()
def sample_model(X: FLOAT[64, 128], Wt: FLOAT[128, 10], Bias: FLOAT[10]) -> FLOAT[64, 10]:
    matmul = op.MatMul(X, Wt) + Bias
    return onnx_hardmax(matmul, axis=1)

# onnx_model is an in-memory ModelProto
onnx_model = sample_model.to_model_proto()

# Save the ONNX model at a given path
onnx.save(onnx_model, "sample_model.onnx")

# Check the model
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print(f"The model is invalid: {e}")
else:
    print("The model is valid!")
An eager mode is available to debug what the code does.
import numpy as np

# A scripted function called with numpy arrays executes eagerly.
v = np.array([[0, 1], [2, 3]], dtype=np.float32)
result = onnx_hardmax(v, axis=1)
spox#
The syntax of spox is similar but it does not rely on ast. Therefore, loops and tests are expressed in a very different way. The tricky part is the handling of the local context: a variable created in the main graph is known by any of its subgraphs (see the sketch after the example below).
Example taken from the documentation:
import onnx
from spox import argument, build, Tensor, Var

# Import operators from the ai.onnx domain at version 17
from spox.opset.ai.onnx import v17 as op

def geometric_mean(x: Var, y: Var) -> Var:
    # use the standard Sqrt and Mul
    return op.sqrt(op.mul(x, y))

# Create typed model inputs. Each tensor is of rank 1
# and has the runtime-determined length 'N'.
a = argument(Tensor(float, ('N',)))
b = argument(Tensor(float, ('N',)))

# Perform operations on `Var`s
c = geometric_mean(a, b)

# Build an `onnx.ModelProto` for the given inputs and outputs.
model: onnx.ModelProto = build(inputs={'a': a, 'b': b}, outputs={'c': c})
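Subgraphs illustrate the scoping rule mentioned above. The following sketch is based on spox's documented if_ operator, which takes Python callables for both branches; the exact signature is an assumption and may vary between versions. Note how both branches reuse the variable x created in the main graph.

import numpy as np
from spox import argument, build, Tensor
from spox.opset.ai.onnx import v17 as op

x = argument(Tensor(np.float32, ('N',)))

# Both branch callables capture `x` from the enclosing scope.
(y,) = op.if_(
    op.greater(op.reduce_sum(x), op.const(np.float32(0.0))),
    then_branch=lambda: [x],
    else_branch=lambda: [op.neg(x)],
)
model = build(inputs={'x': x}, outputs={'y': y})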
The function can be tested with a mechanism called value propagation.
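A minimal sketch of what that might look like, assuming constants are created with op.const and the propagated value is surfaced by the repr of a Var, as described in the spox documentation:

import numpy as np
from spox.opset.ai.onnx import v17 as op

# With constant inputs, spox runs the operators on the fly
# (value propagation) and the result is visible on the Var.
x = op.const(np.array([1.0, 4.0], dtype=np.float64))
y = op.const(np.array([4.0, 1.0], dtype=np.float64))
print(op.sqrt(op.mul(x, y)))  # the repr should show the propagated value [2. 2.]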
sklearn-onnx#
sklearn-onnx also implements its own API to add custom graphs. It was designed to shorten the time spent reimplementing scikit-learn code as onnx code. It can be used to implement a new converter mapped to a custom model, as described in this example: Implement a new converter. But it can also be used to build standalone models.
<<<
import numpy as np
import onnx
import onnx.helper as oh
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot

def make_euclidean_skl2onnx(
    input_names: tuple[str] = ("X", "Y"),
    output_name: str = "Z",
    elem_type: int = onnx.TensorProto.FLOAT,
    opset: int | None = None,
) -> onnx.ModelProto:
    if opset is None:
        opset = onnx.defs.onnx_opset_version()

    from skl2onnx.algebra.onnx_ops import OnnxSub, OnnxPow, OnnxReduceSum

    dxy = OnnxSub(input_names[0], input_names[1], op_version=opset)
    dxy2 = OnnxPow(dxy, np.array([2], dtype=np.int64), op_version=opset)
    final = OnnxReduceSum(dxy2, op_version=opset, output_names=[output_name])

    np_type = oh.tensor_dtype_to_np_dtype(elem_type)
    dummy = np.empty([1], np_type)
    return final.to_onnx({"X": dummy, "Y": dummy})

model = make_euclidean_skl2onnx()
print(onnx_simple_text_plot(model))
>>>
opset: domain='' version=15
input: name='X' type=dtype('float32') shape=['']
input: name='Y' type=dtype('float32') shape=['']
init: name='Po_Powcst' type=dtype('int64') shape=(1,) -- array([2])
Sub(X, Y) -> Su_C0
Pow(Su_C0, Po_Powcst) -> Po_Z0
ReduceSum(Po_Z0) -> Z
output: name='Z' type=dtype('float32') shape=[1]
onnxblocks#
onnxblocks was introduced in onnxruntime to define custom losses in order to train a model with onnxruntime-training. It is mostly used for that purpose. The syntax is similar to pytorch.
import onnx
import onnxruntime.training.onnxblock as onnxblock
from onnxruntime.training import artifacts

# Define a custom loss block that takes in two inputs
# and performs a weighted average of the losses from these
# two inputs.
class WeightedAverageLoss(onnxblock.Block):
    def __init__(self):
        super().__init__()
        self._loss1 = onnxblock.loss.MSELoss()
        self._loss2 = onnxblock.loss.MSELoss()
        self._w1 = onnxblock.blocks.Constant(0.4)
        self._w2 = onnxblock.blocks.Constant(0.6)
        self._add = onnxblock.blocks.Add()
        self._mul = onnxblock.blocks.Mul()

    def build(self, loss_input_name1, loss_input_name2):
        # The build method defines how the block should be stacked on top of
        # loss_input_name1 and loss_input_name2.
        # It returns the weighted average of the two losses.
        return self._add(
            self._mul(self._w1(), self._loss1(loss_input_name1, target_name="target1")),
            self._mul(self._w2(), self._loss2(loss_input_name2, target_name="target2")),
        )

my_custom_loss = WeightedAverageLoss()

# Load the onnx model
model_path = "model.onnx"
base_model = onnx.load(model_path)

# Define the parameters that need their gradient computed
requires_grad = ["weight1", "bias1", "weight2", "bias2"]
frozen_params = ["weight3", "bias3"]

# Now, we can invoke generate_artifacts with this custom loss function.
artifacts.generate_artifacts(
    base_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=my_custom_loss,
    optimizer=artifacts.OptimType.AdamW,
)

# Successful completion of the above call will generate 4 files in the current
# working directory, one for each of the artifacts mentioned above
# (training_model.onnx, eval_model.onnx, checkpoint, op)
ONNX GraphSurgeon#
onnx-graphsurgeon implements the main class Graph, which provides all the necessary methods to add nodes and import existing onnx files. The following example is taken from onnx-graphsurgeon/examples. The first part generates a graph.
import onnx_graphsurgeon as gs
import numpy as np
import onnx

# Computes Y = x0 + (a * x1 + b)
shape = (1, 3, 224, 224)

# Inputs
x0 = gs.Variable(name="x0", dtype=np.float32, shape=shape)
x1 = gs.Variable(name="x1", dtype=np.float32, shape=shape)

# Intermediate tensors
a = gs.Constant("a", values=np.ones(shape=shape, dtype=np.float32))
b = gs.Constant("b", values=np.ones(shape=shape, dtype=np.float32))
mul_out = gs.Variable(name="mul_out")
add_out = gs.Variable(name="add_out")

# Outputs
Y = gs.Variable(name="Y", dtype=np.float32, shape=shape)

nodes = [
    # mul_out = a * x1
    gs.Node(op="Mul", inputs=[a, x1], outputs=[mul_out]),
    # add_out = mul_out + b
    gs.Node(op="Add", inputs=[mul_out, b], outputs=[add_out]),
    # Y = x0 + add_out
    gs.Node(op="Add", inputs=[x0, add_out], outputs=[Y]),
]

graph = gs.Graph(nodes=nodes, inputs=[x0, x1], outputs=[Y])
onnx.save(gs.export_onnx(graph), "model.onnx")
The second part modifies it.
import onnx_graphsurgeon as gs
import numpy as np
import onnx
graph = gs.import_onnx(onnx.load("model.onnx"))
# 1. Remove the `b` input of the add node
first_add = [node for node in graph.nodes if node.op == "Add"][0]
first_add.inputs = [inp for inp in first_add.inputs if inp.name != "b"]
# 2. Change the Add to a LeakyRelu
first_add.op = "LeakyRelu"
first_add.attrs["alpha"] = 0.02
# 3. Add an identity after the add node
identity_out = gs.Variable("identity_out", dtype=np.float32)
identity = gs.Node(op="Identity", inputs=first_add.outputs, outputs=[identity_out])
graph.nodes.append(identity)
# 4. Modify the graph output to be the identity output
graph.outputs = [identity_out]
# 5. Remove unused nodes/tensors, and topologically sort the graph
# ONNX requires nodes to be topologically sorted to be considered valid.
# Therefore, you should only need to sort the graph when you have added new nodes out-of-order.
# In this case, the identity node is already in the correct spot (it is the last node,
# and was appended to the end of the list), but to be on the safer side, we can sort anyway.
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "modified.onnx")
numpy API for onnx#
See Numpy API for ONNX. This API was introduced to create graphs by using the numpy API. If a function is defined only with numpy, it should be possible to use the exact same code to create the corresponding onnx graph. That's what this API tries to achieve. It works with the exception of control flow: in that case, the function produces different onnx graphs depending on the execution path.
<<<
import numpy as np
from onnx_array_api.npx import jit_onnx
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot

def l2_loss(x, y):
    return ((x - y) ** 2).sum(keepdims=1)

jitted_myloss = jit_onnx(l2_loss)
dummy = np.array([0], dtype=np.float32)

# The function is executed. Only then is an onnx graph created,
# one for every input type.
jitted_myloss(dummy, dummy)

# get_onnx only works if the function was executed at least once
# with the same input type.
model = jitted_myloss.get_onnx()
print(onnx_simple_text_plot(model))
>>>
opset: domain='' version=18
input: name='x0' type=dtype('float32') shape=['']
input: name='x1' type=dtype('float32') shape=['']
Constant(value=2) -> r__1
Sub(x0, x1) -> r__0
CastLike(r__1, r__0) -> r__2
Pow(r__0, r__2) -> r__3
ReduceSum(r__3, keepdims=1) -> r__4
output: name='r__4' type=dtype('float32') shape=[1]
Light API#
See Light API for ONNX: everything in one line. This API was created to be able to write an onnx graph in one instruction. It is inspired by reverse Polish notation. There is no eager mode.
<<<
import numpy as np
from onnx_array_api.light_api import start
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot

model = (
    start()
    .vin("X")
    .vin("Y")
    .bring("X", "Y")
    .Sub()
    .rename("dxy")
    .cst(np.array([2], dtype=np.int64), "two")
    .bring("dxy", "two")
    .Pow()
    .ReduceSum()
    .rename("Z")
    .vout()
    .to_onnx()
)
print(onnx_simple_text_plot(model))
>>>
opset: domain='' version=20
input: name='X' type=dtype('float32') shape=None
input: name='Y' type=dtype('float32') shape=None
init: name='two' type=dtype('int64') shape=(1,) -- array([2])
Sub(X, Y) -> dxy
Pow(dxy, two) -> r1_0
ReduceSum(r1_0, keepdims=1, noop_with_empty_axes=0) -> Z
output: name='Z' type=dtype('float32') shape=None