Many ways to implement a custom graph in ONNX#

ONNX defines a long list of operators used in machine learning models, and these operators are combined to implement functions. This step is usually taken care of by converter libraries: sklearn-onnx for scikit-learn, torch.onnx for pytorch, tensorflow-onnx for tensorflow. Both torch.onnx and tensorflow-onnx convert any function expressed with the functions available in those packages, and that works because there is usually no need to mix packages. But on some occasions, there is a need to write functions directly with the onnx syntax. scikit-learn is implemented with numpy and there is no converter from numpy to onnx. Sometimes an existing onnx model must be extended, or models coming from different packages must be merged. Sometimes the original packages are just not available, only onnx is. Let's see what it looks like with a very simple example.

Euclidean distance#

For example, the well-known squared Euclidean distance f(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 can be expressed with numpy as follows:

import numpy as np

def euclidean(X: np.ndarray, Y: np.ndarray) -> float:
    return ((X - Y) ** 2).sum()

The mathematical function must first be translated into ONNX operators or primitives. That is usually easy because the primitives are very close to what numpy defines. It could be expressed as follows (the syntax is just for illustration).

import onnx

onnx-def euclidean(X: onnx.TensorProto[FLOAT], Y: onnx.TensorProto[FLOAT]) -> onnx.FLOAT:
    dxy = onnx.Sub(X, Y)
    sxy = onnx.Pow(dxy, 2)
    d = onnx.ReduceSum(sxy)
    return d

This example is short but does not work as it is. The inner API defined in onnx.helper is quite verbose and the true implementation would be the following.

<<<

import onnx
import onnx.helper as oh


def make_euclidean(
    input_names: tuple[str] = ("X", "Y"),
    output_name: str = "Z",
    elem_type: int = onnx.TensorProto.FLOAT,
    opset: int | None = None,
) -> onnx.ModelProto:
    if opset is None:
        opset = onnx.defs.onnx_opset_version()

    X = oh.make_tensor_value_info(input_names[0], elem_type, None)
    Y = oh.make_tensor_value_info(input_names[1], elem_type, None)
    Z = oh.make_tensor_value_info(output_name, elem_type, None)
    two = oh.make_tensor("two", onnx.TensorProto.INT64, [1], [2])
    n1 = oh.make_node("Sub", ["X", "Y"], ["dxy"])
    n2 = oh.make_node("Pow", ["dxy", "two"], ["dxy2"])
    n3 = oh.make_node("ReduceSum", ["dxy2"], [output_name])
    graph = oh.make_graph([n1, n2, n3], "euclidian", [X, Y], [Z], [two])
    model = oh.make_model(
        graph,
        opset_imports=[oh.make_opsetid("", opset)],
        ir_version=9,
    )
    return model


model = make_euclidean()
print(model)

>>>

    ir_version: 9
    opset_import {
      domain: ""
      version: 22
    }
    graph {
      node {
        input: "X"
        input: "Y"
        output: "dxy"
        op_type: "Sub"
      }
      node {
        input: "dxy"
        input: "two"
        output: "dxy2"
        op_type: "Pow"
      }
      node {
        input: "dxy2"
        output: "Z"
        op_type: "ReduceSum"
      }
      name: "euclidian"
      initializer {
        dims: 1
        data_type: 7
        int64_data: 2
        name: "two"
      }
      input {
        name: "X"
        type {
          tensor_type {
            elem_type: 1
          }
        }
      }
      input {
        name: "Y"
        type {
          tensor_type {
            elem_type: 1
          }
        }
      }
      output {
        name: "Z"
        type {
          tensor_type {
            elem_type: 1
          }
        }
      }
    }

Since it is a second implementation of an existing function, it is necessary to check that the output is the same.

<<<

import numpy as np
from numpy.testing import assert_allclose
from onnx.reference import ReferenceEvaluator
from onnx_array_api.ext_test_case import ExtTestCase

# This is the same function.
from onnx_array_api.validation.docs import make_euclidean


def test_make_euclidean():
    model = make_euclidean()

    ref = ReferenceEvaluator(model)
    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=1)
    got = ref.run(None, {"X": X, "Y": Y})[0]
    assert_allclose(expected, got, atol=1e-6)


test_make_euclidean()

>>>

    

But the reference implementation in onnx is not the runtime used to deploy the model. A second unit test must be added to check that runtime as well.

<<<

import numpy as np
from numpy.testing import assert_allclose
from onnx_array_api.ext_test_case import ExtTestCase

# This is the same function.
from onnx_array_api.validation.docs import make_euclidean


def test_make_euclidean_ort():
    from onnxruntime import InferenceSession

    model = make_euclidean()

    ref = InferenceSession(
        model.SerializeToString(), providers=["CPUExecutionProvider"]
    )

    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=1)
    got = ref.run(None, {"X": X, "Y": Y})[0]
    assert_allclose(expected, got, atol=1e-6)


try:
    test_make_euclidean_ort()
except Exception as e:
    print(e)

>>>

    [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /home/xadupre/github/onnxruntime/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 22 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx is till opset 20.

The list of operators is constantly evolving: onnx is versioned. The test may fail because the model declares an opset version the runtime does not support yet. Let's change it.

<<<

import numpy as np
from numpy.testing import assert_allclose
from onnx_array_api.ext_test_case import ExtTestCase

# This is the same function.
from onnx_array_api.validation.docs import make_euclidean


def test_make_euclidean_ort():
    from onnxruntime import InferenceSession

    # opset=18: it uses the opset version 18, this number
    # is incremented at every minor release.
    model = make_euclidean(opset=18)

    ref = InferenceSession(
        model.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=1)
    got = ref.run(None, {"X": X, "Y": Y})[0]
    assert_allclose(expected, got, atol=1e-6)


test_make_euclidean_ort()

>>>

    

But the runtime must support many versions and the unit tests may look like the following:

<<<

import numpy as np
from numpy.testing import assert_allclose
import onnx.defs
from onnx_array_api.ext_test_case import ExtTestCase

# This is the same function.
from onnx_array_api.validation.docs import make_euclidean


def test_make_euclidean_ort():
    from onnxruntime import InferenceSession

    # Try every opset up to the current one; this number is
    # incremented at every minor release of onnx.
    X = np.random.rand(3, 4).astype(np.float32)
    Y = np.random.rand(3, 4).astype(np.float32)
    expected = ((X - Y) ** 2).sum(keepdims=1)

    for opset in range(6, onnx.defs.onnx_opset_version() - 1):
        model = make_euclidean(opset=opset)

        try:
            ref = InferenceSession(
                model.SerializeToString(), providers=["CPUExecutionProvider"]
            )
            got = ref.run(None, {"X": X, "Y": Y})[0]
        except Exception as e:
            print(f"fail opset={opset}", e)
            if opset < 18:
                continue
            raise e
        assert_allclose(expected, got, atol=1e-6)


test_make_euclidean_ort()

>>>

    fail opset=6 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
    fail opset=7 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
    fail opset=8 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
    fail opset=9 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
    fail opset=10 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.
    fail opset=11 [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (two) of operator (Pow) in node () is invalid.

This work is quite long even for a simple function. For a longer one, the verbosity of the inner API makes it difficult to write a correct implementation on the first try, and the unit tests cannot be avoided. The inner API is usually enough when the translation from python to onnx does not happen often. When it does happen often, almost every library ends up implementing its own simplified way to create onnx graphs, and because writing such an API is not difficult, the decision is often made to create a new one rather than to reuse an existing one.

Existing API#

Many options already exist to write custom onnx graphs. Their development is usually driven by what they are used for. Any one of them may not fully support all your needs, and it is not always easy to understand the error messages they produce when something goes wrong. It is better to understand your own needs before choosing one. Here are some of the questions which may need to be answered.

  • ability to easily write loops and tests (control flow)

  • ability to debug (eager mode)

  • ability to use the same code to produce different implementations depending on the opset version

  • ability to interact with other frameworks

  • ability to merge existing onnx graph

  • ability to describe an existing graph with this API

  • ability to easily define constants

  • ability to handle multiple domains

  • ability to support local functions

  • easy error messages

  • is it actively maintained?

Use torch or tensorflow#

pytorch offers the possibility to convert any function implemented with pytorch functions into onnx with torch.onnx. Here are a couple of examples. The first one exports a module and saves it as an onnx file:

import torch
import torch.nn


class MyModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    def forward(self, x, bias=None):
        out = self.linear(x)
        out = out + bias
        return out

model = MyModel()
kwargs = {"bias": 3.}
inputs = (torch.randn(2, 2, 2),)

export_output = torch.onnx.dynamo_export(model, *inputs, **kwargs)
export_output.save("my_simple_model.onnx")

The second example exports a function with a nested input structure:

from typing import Dict, Tuple
import torch
import torch.onnx


def func_with_nested_input_structure(
    x_dict: Dict[str, torch.Tensor],
    y_tuple: Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
):
    if "a" in x_dict:
        x = x_dict["a"]
    elif "b" in x_dict:
        x = x_dict["b"]
    else:
        x = torch.randn(3)

    y1, (y2, y3) = y_tuple

    return x + y1 + y2 + y3

x_dict = {"a": torch.tensor(1.)}
y_tuple = (torch.tensor(2.), (torch.tensor(3.), torch.tensor(4.)))
export_output = torch.onnx.dynamo_export(func_with_nested_input_structure, x_dict, y_tuple)

print(export_output.adapt_torch_inputs_to_onnx(x_dict, y_tuple))
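As with the handwritten graph earlier, the exported model should be verified against the original function. A minimal sketch with onnxruntime, assuming the file saved by the first example above:

import onnxruntime

# Load the model saved by the first export above.
sess = onnxruntime.InferenceSession(
    "my_simple_model.onnx", providers=["CPUExecutionProvider"]
)
# The exporter chooses the input names; list them before feeding the model.
print([i.name for i in sess.get_inputs()])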

onnxscript#

onnxscript is used in Torch Export to ONNX. It converts python code to onnx code by analyzing the python code (through ast). The package makes it very easy to use loops and tests in onnx, and it stays very close to the onnx syntax. It is not easy to support multiple implementations depending on the opset version required by the user.

Example taken from the documentation:

import onnx

# We use ONNX opset 15 to define the function below.
from onnxscript import FLOAT
from onnxscript import opset15 as op
from onnxscript import script


# We use the script decorator to indicate that
# this is meant to be translated to ONNX.
@script()
def onnx_hardmax(X, axis: int):
    """Hardmax is similar to ArgMax, with the result being encoded OneHot style."""

    # The type annotation on X indicates that it is a float tensor of
    # unknown rank. The type annotation on axis indicates that it will
    # be treated as an int attribute in ONNX.
    #
    # Invoke ONNX opset 15 op ArgMax.
    # Use unnamed arguments for ONNX input parameters, and named
    # arguments for ONNX attribute parameters.
    argmax = op.ArgMax(X, axis=axis, keepdims=False)
    xshape = op.Shape(X, start=axis)
    # use the Constant operator to create constant tensors
    zero = op.Constant(value_ints=[0])
    depth = op.GatherElements(xshape, zero)
    empty_shape = op.Constant(value_ints=[0])
    depth = op.Reshape(depth, empty_shape)
    values = op.Constant(value_ints=[0, 1])
    cast_values = op.CastLike(values, X)
    return op.OneHot(argmax, depth, cast_values, axis=axis)


# We use the script decorator to indicate that
# this is meant to be translated to ONNX.
@script()
def sample_model(X: FLOAT[64, 128], Wt: FLOAT[128, 10], Bias: FLOAT[10]) -> FLOAT[64, 10]:
    matmul = op.MatMul(X, Wt) + Bias
    return onnx_hardmax(matmul, axis=1)


# onnx_model is an in-memory ModelProto
onnx_model = sample_model.to_model_proto()

# Save the ONNX model at a given path
onnx.save(onnx_model, "sample_model.onnx")

# Check the model
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print(f"The model is invalid: {e}")
else:
    print("The model is valid!")

An Eager mode is available to debug what the code does.

import numpy as np

v = np.array([[0, 1], [2, 3]], dtype=np.float32)
result = onnx_hardmax(v, axis=1)

spox#

The syntax of spox is similar, but it does not use ast. Therefore, loops and tests are expressed in a very different way. The tricky part is handling the local context: a variable created in the main graph is visible from any of its subgraphs.

Example taken from the documentation:

import onnx

from spox import argument, build, Tensor, Var
# Import operators from the ai.onnx domain at version 17
from spox.opset.ai.onnx import v17 as op

def geometric_mean(x: Var, y: Var) -> Var:
    # use the standard Sqrt and Mul
    return op.sqrt(op.mul(x, y))

# Create typed model inputs. Each tensor is of rank 1
# and has the runtime-determined length 'N'.
a = argument(Tensor(float, ('N',)))
b = argument(Tensor(float, ('N',)))

# Perform operations on `Var`s
c = geometric_mean(a, b)

# Build an `onnx.ModelProto` for the given inputs and outputs.
model: onnx.ModelProto = build(inputs={'a': a, 'b': b}, outputs={'c': c})

The function can be tested with a mechanism called value propagation.
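Value propagation eagerly computes the value of a Var as soon as all of its inputs are constant, which makes debugging easier. Independently of that mechanism, the built ModelProto can also be checked with the reference evaluator shipped with onnx; a minimal sketch reusing the model built above (the inputs are float64 because Tensor(float, ...) maps to double):

import numpy as np
from onnx.reference import ReferenceEvaluator

# Run the ModelProto built by spox with the onnx reference implementation.
ref = ReferenceEvaluator(model)
a = np.array([1.0, 4.0], dtype=np.float64)
b = np.array([4.0, 9.0], dtype=np.float64)
print(ref.run(None, {"a": a, "b": b}))  # geometric mean: [array([2., 6.])]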

sklearn-onnx#

sklearn-onnx also implements its own API to add custom graphs. It was designed to shorten the time spent reimplementing scikit-learn code in onnx. It can be used to implement a new converter mapped to a custom model, as described in this example: Implement a new converter. But it can also be used to build standalone models.

<<<

import numpy as np
import onnx
import onnx.helper as oh
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot


def make_euclidean_skl2onnx(
    input_names: tuple[str] = ("X", "Y"),
    output_name: str = "Z",
    elem_type: int = onnx.TensorProto.FLOAT,
    opset: int | None = None,
) -> onnx.ModelProto:
    if opset is None:
        opset = onnx.defs.onnx_opset_version()

    from skl2onnx.algebra.onnx_ops import OnnxSub, OnnxPow, OnnxReduceSum

    dxy = OnnxSub(input_names[0], input_names[1], op_version=opset)
    dxy2 = OnnxPow(dxy, np.array([2], dtype=np.int64), op_version=opset)
    final = OnnxReduceSum(dxy2, op_version=opset, output_names=[output_name])

    np_type = oh.tensor_dtype_to_np_dtype(elem_type)
    dummy = np.empty([1], np_type)
    return final.to_onnx({"X": dummy, "Y": dummy})


model = make_euclidean_skl2onnx()
print(onnx_simple_text_plot(model))

>>>

    opset: domain='' version=15
    input: name='X' type=dtype('float32') shape=['']
    input: name='Y' type=dtype('float32') shape=['']
    init: name='Po_Powcst' type=dtype('int64') shape=(1,) -- array([2])
    Sub(X, Y) -> Su_C0
      Pow(Su_C0, Po_Powcst) -> Po_Z0
        ReduceSum(Po_Z0) -> Z
    output: name='Z' type=dtype('float32') shape=[1]

onnxblocks#

onnxblocks was introduced in onnxruntime to define custom losses in order to train a model with onnxruntime-training, and it is mostly used for that purpose. The syntax is similar to pytorch.

import onnx
import onnxruntime.training.onnxblock as onnxblock
from onnxruntime.training import artifacts

# Define a custom loss block that takes in two inputs
# and performs a weighted average of the losses from these
# two inputs.
class WeightedAverageLoss(onnxblock.Block):
    def __init__(self):
        self._loss1 = onnxblock.loss.MSELoss()
        self._loss2 = onnxblock.loss.MSELoss()
        self._w1 = onnxblock.blocks.Constant(0.4)
        self._w2 = onnxblock.blocks.Constant(0.6)
        self._add = onnxblock.blocks.Add()
        self._mul = onnxblock.blocks.Mul()

    def build(self, loss_input_name1, loss_input_name2):
        # The build method defines how the block should be stacked on top of
        # loss_input_name1 and loss_input_name2

        # Returns weighted average of the two losses
        return self._add(
            self._mul(self._w1(), self._loss1(loss_input_name1, target_name="target1")),
            self._mul(self._w2(), self._loss2(loss_input_name2, target_name="target2"))
        )

my_custom_loss = WeightedAverageLoss()

# Load the onnx model
model_path = "model.onnx"
base_model = onnx.load(model_path)

# Define the parameters that need their gradient computed
requires_grad = ["weight1", "bias1", "weight2", "bias2"]
frozen_params = ["weight3", "bias3"]

# Now, we can invoke generate_artifacts with this custom loss function
artifacts.generate_artifacts(base_model, requires_grad = requires_grad, frozen_params = frozen_params,
                            loss = my_custom_loss, optimizer = artifacts.OptimType.AdamW)

# Successful completion of the above call will generate 4 files in the current working directory,
# one for each of the artifacts mentioned above (training_model.onnx, eval_model.onnx, checkpoint, op)

ONNX GraphSurgeon#

onnx-graphsurgeon implements the main class Graph, which provides all the necessary methods to add nodes or import existing onnx files. The following example is taken from onnx-graphsurgeon/examples. The first part generates a graph.

import onnx_graphsurgeon as gs
import numpy as np
import onnx

# Computes Y = x0 + (a * x1 + b)

shape = (1, 3, 224, 224)
# Inputs
x0 = gs.Variable(name="x0", dtype=np.float32, shape=shape)
x1 = gs.Variable(name="x1", dtype=np.float32, shape=shape)

# Intermediate tensors
a = gs.Constant("a", values=np.ones(shape=shape, dtype=np.float32))
b = gs.Constant("b", values=np.ones(shape=shape, dtype=np.float32))
mul_out = gs.Variable(name="mul_out")
add_out = gs.Variable(name="add_out")

# Outputs
Y = gs.Variable(name="Y", dtype=np.float32, shape=shape)

nodes = [
    # mul_out = a * x1
    gs.Node(op="Mul", inputs=[a, x1], outputs=[mul_out]),
    # add_out = mul_out + b
    gs.Node(op="Add", inputs=[mul_out, b], outputs=[add_out]),
    # Y = x0 + add_out
    gs.Node(op="Add", inputs=[x0, add_out], outputs=[Y]),
]

graph = gs.Graph(nodes=nodes, inputs=[x0, x1], outputs=[Y])
onnx.save(gs.export_onnx(graph), "model.onnx")

The second part modifies it.

import onnx_graphsurgeon as gs
import numpy as np
import onnx

graph = gs.import_onnx(onnx.load("model.onnx"))

# 1. Remove the `b` input of the add node
first_add = [node for node in graph.nodes if node.op == "Add"][0]
first_add.inputs = [inp for inp in first_add.inputs if inp.name != "b"]

# 2. Change the Add to a LeakyRelu
first_add.op = "LeakyRelu"
first_add.attrs["alpha"] = 0.02

# 3. Add an identity after the add node
identity_out = gs.Variable("identity_out", dtype=np.float32)
identity = gs.Node(op="Identity", inputs=first_add.outputs, outputs=[identity_out])
graph.nodes.append(identity)

# 4. Modify the graph output to be the identity output
graph.outputs = [identity_out]

# 5. Remove unused nodes/tensors, and topologically sort the graph
# ONNX requires nodes to be topologically sorted to be considered valid.
# Therefore, you should only need to sort the graph when you have added new nodes out-of-order.
# In this case, the identity node is already in the correct spot (it is the last node,
# and was appended to the end of the list), but to be on the safer side, we can sort anyway.
graph.cleanup().toposort()

onnx.save(gs.export_onnx(graph), "modified.onnx")

Graph Builder API#

See GraphBuilder: common API for ONNX. This API is very similar to what skl2onnx implements. It is still about adding nodes to a graph, but some tasks are automated, such as naming intermediate results or converting constants to onnx classes.

<<<

import numpy as np
from onnx_array_api.graph_api import GraphBuilder
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot

g = GraphBuilder()
g.make_tensor_input("X", np.float32, (None, None))
g.make_tensor_input("Y", np.float32, (None, None))
r1 = g.op.Sub("X", "Y")
r2 = g.op.Pow(r1, np.array([2], dtype=np.int64))
g.op.ReduceSum(r2, outputs=["Z"])
g.make_tensor_output("Z", np.float32, (None, None))

onx = g.to_onnx()

print(onnx_simple_text_plot(onx))

>>>

    opset: domain='' version=21
    input: name='X' type=dtype('float32') shape=['', '']
    input: name='Y' type=dtype('float32') shape=['', '']
    init: name='cst' type=dtype('int64') shape=(1,) -- array([2])
    Sub(X, Y) -> _onx_sub0
      Pow(_onx_sub0, cst) -> _onx_pow0
        ReduceSum(_onx_pow0) -> Z
    output: name='Z' type=dtype('float32') shape=['', '']

Light API#

See Light API for ONNX: everything in one line. This API was created to make it possible to write an onnx graph in one instruction. It is inspired by reverse Polish notation. There is no eager mode.

<<<

import numpy as np
from onnx_array_api.light_api import start
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot

model = (
    start()
    .vin("X")
    .vin("Y")
    .bring("X", "Y")
    .Sub()
    .rename("dxy")
    .cst(np.array([2], dtype=np.int64), "two")
    .bring("dxy", "two")
    .Pow()
    .ReduceSum()
    .rename("Z")
    .vout()
    .to_onnx()
)

print(onnx_simple_text_plot(model))

>>>

    opset: domain='' version=21
    input: name='X' type=dtype('float32') shape=None
    input: name='Y' type=dtype('float32') shape=None
    init: name='two' type=dtype('int64') shape=(1,) -- array([2])
    Sub(X, Y) -> dxy
      Pow(dxy, two) -> r1_0
        ReduceSum(r1_0, keepdims=1, noop_with_empty_axes=0) -> Z
    output: name='Z' type=dtype('float32') shape=None

numpy API for onnx#

See Numpy API for ONNX. This API was introduced to create graphs by using the numpy API. If a function is defined only with numpy, it should be possible to use the exact same code to create the corresponding onnx graph. That is what this API tries to achieve. It works with the exception of control flow: in that case, the function produces different onnx graphs depending on the execution path.

<<<

import numpy as np
from onnx_array_api.npx import jit_onnx
from onnx_array_api.plotting.text_plot import onnx_simple_text_plot


def l2_loss(x, y):
    return ((x - y) ** 2).sum(keepdims=1)


jitted_myloss = jit_onnx(l2_loss)
dummy = np.array([0], dtype=np.float32)

# The function is executed. Only then a onnx graph is created.
# One is created depending on the input type.
jitted_myloss(dummy, dummy)

# get_onnx only works if it was executed once or at least with
# the same input type
model = jitted_myloss.get_onnx()
print(onnx_simple_text_plot(model))

>>>

    opset: domain='' version=18
    input: name='x0' type=dtype('float32') shape=['']
    input: name='x1' type=dtype('float32') shape=['']
    Constant(value=2) -> r__1
    Sub(x0, x1) -> r__0
      CastLike(r__1, r__0) -> r__2
      Pow(r__0, r__2) -> r__3
        ReduceSum(r__3, keepdims=1) -> r__4
    output: name='r__4' type=dtype('float32') shape=[1]
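Since the jitted function is executed with numpy inputs just like the original one, both implementations can be compared directly. A short check following the call pattern above, assuming jitted_myloss and l2_loss are still in scope:

x = np.random.rand(3, 4).astype(np.float32)
y = np.random.rand(3, 4).astype(np.float32)

# Both calls should return the same value up to float precision.
print(jitted_myloss(x, y), l2_loss(x, y))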