Pattern Optimizer¶

The pattern optimizer is implemented by class GraphBuilderPatternOptimization. It searches for a specific sequence of nodes in the graph and replaces it with another one without changing the inputs or the outputs of the graph. The goal of the optimizer is to make the whole computation graph more efficient. The goal of this implementation is to make this optimization as fast as possible. Assuming the nodes in an onnx graph are ordered so that every input of a node is produced by a previous node, the optimizer must not require any global reordering. The cost should be in $O(N P I)$ in the worst case, where N is the number of nodes, P is the number of patterns, and I is the number of iterations.

It is difficult to foresee what a pattern needs in order to rewrite a part of the graph. This API tries to give as much freedom as it can without leaving too much work to the developer who adds a new pattern.

Patterns¶

Patterns must inherit from PatternOptimization. This class defines two methods.

PatternOptimization.match¶

def match(
    self,
    g: "GraphBuilderPatternOptimization",
    node: NodeProto,
    matched: List[MatchResult],
) -> Optional[MatchResult]:

g is a GraphBuilderPatternOptimization. It holds all the existing nodes and can return any information about types and shapes, as well as the nodes before or after a given node.
node: the node the matching starts from. The method must determine whether some nodes around this one form a set of nodes this pattern can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched: usually unused, the list of matches already found

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult, which contains:

A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be modified by another pattern.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
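As an illustration of this contract, here is a minimal, self-contained sketch of a match method. Node, Match and ToyGraph are hypothetical stand-ins for NodeProto, MatchResult and GraphBuilderPatternOptimization. The real classes expose a richer API, so the sketch only shows the shape of the logic: inspect the neighbourhood of a node, return None on failure, and otherwise return the nodes, the rewriting function and an insertion point.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class Match:
    """Hypothetical stand-in for MatchResult."""
    nodes: List[Node]
    apply: Callable
    insert_at: Optional[Node] = None


class ToyGraph:
    """Hypothetical stand-in for the graph object given to match."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def next_nodes(self, name: str) -> List[Node]:
        # Every node consuming the result called `name`.
        return [n for n in self.nodes if name in n.inputs]


class MatMulAddToy:
    """Toy pattern: a MatMul whose only consumer is an Add."""

    def match(self, g: ToyGraph, node: Node, matched: List[Match]) -> Optional[Match]:
        if node.op_type != "MatMul":
            return None
        consumers = g.next_nodes(node.outputs[0])
        if len(consumers) != 1 or consumers[0].op_type != "Add":
            return None
        # The graph is left untouched: only a description of the rewriting is returned.
        return Match([node, consumers[0]], self.apply, insert_at=consumers[0])

    @staticmethod
    def apply(g: ToyGraph, matmul: Node, add: Node) -> List[Node]:
        bias = [i for i in add.inputs if i != matmul.outputs[0]]
        return [Node("Gemm", matmul.inputs + bias, add.outputs)]


graph = ToyGraph([
    Node("MatMul", ["x", "w"], ["xw"]),
    Node("Add", ["xw", "b"], ["y"]),
    Node("Relu", ["y"], ["z"]),
])
pattern = MatMulAddToy()
results = [pattern.match(graph, n, []) for n in graph.nodes]
```

Only the first node produces a match here; the two other calls return None, and the graph still holds its three original nodes after matching.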

Debugging: method none

def none(
    self,
    node: Optional[NodeProto] = None,
    lineno: Optional[int] = None,
    msg: str = "",
):

It may be useful to know why a pattern failed to match. Instead of returning None, method match can return the following expression:

return self.none(node, inspect.currentframe().f_lineno)

By setting the verbosity (see section Verbosity), the user can then see which line in the code returned None and therefore which condition failed.
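The mechanism can be sketched as follows. TracingPattern is a hypothetical class written for this illustration, not the library's implementation, and nodes are represented by plain strings. It only shows how inspect.currentframe().f_lineno lets a rejected match report the line of the failing condition while still behaving like `return None`.

```python
import inspect
from typing import List, Optional, Tuple


class TracingPattern:
    """Hypothetical pattern base recording why a match was rejected."""

    def __init__(self, verbose: int = 0):
        self.verbose = verbose
        self.rejections: List[Tuple[str, Optional[int], str]] = []

    def none(self, node: Optional[str] = None, lineno: Optional[int] = None, msg: str = ""):
        # Record the rejection only when verbosity is enabled, then return None
        # so that `return self.none(...)` behaves like `return None`.
        if self.verbose and node is not None:
            self.rejections.append((node, lineno, msg))
        return None

    def match(self, node: str) -> Optional[str]:
        if not node.startswith("MatMul"):
            return self.none(node, inspect.currentframe().f_lineno, "not a MatMul")
        return node


pattern = TracingPattern(verbose=1)
outcomes = [pattern.match("Relu_1"), pattern.match("MatMul_2")]
for node, lineno, msg in pattern.rejections:
    print(f"rejected {node} at line {lineno}: {msg}")
```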

PatternOptimization.apply¶

@classmethod
def apply(
    cls, g: "GraphBuilder", *nodes: Sequence[NodeProto]
) -> List[NodeProto]:

The method does the rewriting. It assumes the rewriting can happen. It receives the nodes returned by method match, passed as positional arguments, and assumes no other pattern modified them or will modify them. Since they are positional arguments, the list built by method match can include None values. The method returns the new nodes. The optimizer considers that any node given to this function is removed from the graph and any node returned by it is added. If a received node must be kept, it must be added to the list of returned nodes.
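A minimal sketch of this contract, under stated assumptions: Node is a hypothetical stand-in for NodeProto, the graph is a plain list, and replace is a toy version of what the optimizer does with the result (its insertion rule is a simplification chosen for this example). The Add node is optional to show a None argument, and the Transpose node is returned again when another node still consumes its output.

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: tuple
    outputs: tuple


def apply(graph: List[Node], transpose: Node, matmul: Node, add: Optional[Node]) -> List[Node]:
    # Toy rewriting: MatMul(x, Transpose(w)) [+ Add(b)] -> Gemm(x, w[, b]), transB=1 implied.
    last = matmul if add is None else add
    inputs = [matmul.inputs[0], transpose.inputs[0]]
    if add is not None:
        inputs.append(add.inputs[1])
    gemm = Node("Gemm", tuple(inputs), last.outputs)
    # A received node which is still needed must be returned again,
    # otherwise it is considered as removed.
    users = [n for n in graph if transpose.outputs[0] in n.inputs and n is not matmul]
    return [transpose, gemm] if users else [gemm]


def replace(graph: List[Node], matched: List[Optional[Node]], new_nodes: List[Node]) -> List[Node]:
    # Toy optimizer step: drop every matched node and insert the new nodes
    # at the position of the first matched node.
    removed = [n for n in matched if n is not None]
    first = min(i for i, n in enumerate(graph) if any(n is r for r in removed))
    after = [n for n in graph[first:] if all(n is not r for r in removed)]
    return graph[:first] + new_nodes + after


t = Node("Transpose", ("w",), ("wt",))
m = Node("MatMul", ("x", "wt"), ("xw",))
a = Node("Add", ("xw", "b"), ("y",))
other = Node("Relu", ("wt",), ("r",))
graph = [t, m, a, other]
rewritten = replace(graph, [t, m, a], apply(graph, t, m, a))

# Same pattern without the optional Add and without another consumer of the Transpose.
small = [t, m]
rewritten_small = replace(small, [t, m, None], apply(small, t, m, None))
```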

Optimization Algorithm¶

It is implemented in method optimize:

def optimize(
    self, max_iter=-1, remove_identity: bool = True
) -> List[Dict[str, Any]]:

The algorithm runs multiple iterations until the graph stops changing or max_iter is reached. By default (max_iter=-1), the maximum number of iterations is equal to the number of nodes. An iteration is:

matches = []

build all successors and predecessors

# Step 1: match

for all patterns p:
    for all nodes n:
        r = p.match(n)
        if r:
            if no node of r is already scheduled to be rewritten by another match:
                matches.append(r)

# Step 2: apply

for all matches r:
    apply the match r

# Step 3: clean

remove unused nodes
remove identity nodes

This algorithm may apply more than one rewriting at each iteration, but it guarantees that the local structure a rewriting relies on was not altered by another one.
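The three steps can be sketched in plain Python as below. This is a toy version written for illustration, not the library's code: Node is a hypothetical stand-in for NodeProto, a pattern is a simple function, new nodes are inserted at the position of the last matched node, and the cleaning step only drops nodes whose outputs are unused (removing identity nodes is left out).

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: tuple
    outputs: tuple


Match = Tuple[List[Node], Callable[..., List[Node]]]
Pattern = Callable[[List[Node], Node], Optional[Match]]


def _to_identity(first: Node, second: Node) -> List[Node]:
    return [Node("Identity", first.inputs, second.outputs)]


def double_neg(graph: List[Node], node: Node) -> Optional[Match]:
    # Toy pattern: Neg(Neg(x)) -> Identity(x) when the inner result has one consumer.
    if node.op_type != "Neg":
        return None
    users = [n for n in graph if node.outputs[0] in n.inputs]
    if len(users) != 1 or users[0].op_type != "Neg":
        return None
    return [node, users[0]], _to_identity


def optimize(graph: List[Node], patterns: List[Pattern], outputs: List[str], max_iter: int = -1):
    graph = list(graph)
    if max_iter < 0:
        max_iter = len(graph)
    stats: List[Dict[str, int]] = []
    for iteration in range(max_iter):
        # Step 1: match, the graph is not modified.
        matches: List[Match] = []
        marked = set()
        for pattern in patterns:
            for node in graph:
                r = pattern(graph, node)
                if r is None:
                    continue
                ids = {id(n) for n in r[0]}
                if ids & marked:
                    continue  # a node is already scheduled by another match
                marked |= ids
                matches.append(r)
        if not matches:
            break
        # Step 2: apply, matched nodes are removed and returned nodes are added.
        for nodes, apply in matches:
            ids = {id(n) for n in nodes}
            pos = max(i for i, n in enumerate(graph) if id(n) in ids)
            kept = [n for n in graph[:pos] if id(n) not in ids]
            graph = kept + apply(*nodes) + graph[pos + 1:]
        # Step 3: clean, drop nodes whose outputs nobody uses.
        used = set(outputs) | {i for n in graph for i in n.inputs}
        graph = [n for n in graph if set(n.outputs) & used]
        stats.append({"iteration": iteration, "applied": len(matches)})
    return graph, stats


chain = [
    Node("Neg", ("x",), ("a",)),
    Node("Neg", ("a",), ("b",)),
    Node("Neg", ("b",), ("c",)),
    Node("Neg", ("c",), ("d",)),
]
optimized, stats = optimize(chain, [double_neg], outputs=["d"])
```

On this chain of four Neg nodes, the match starting at the second node is skipped because that node is already scheduled by the first match, so the first iteration applies two non-overlapping rewritings and the second iteration finds nothing left to match.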

Adding a pattern¶

See #80 about the addition of a new pattern.

Example¶

Simple API¶

We consider the following simple model:

<<<

import torch
from experimental_experiment.helpers import pretty_onnx
from experimental_experiment.xbuilder import OptimizationOptions
from experimental_experiment.torch_interpreter import to_onnx


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)


x = torch.rand(3, 10)
onx = to_onnx(
    MLP(), (x,), input_names=["x"], options=OptimizationOptions(patterns=None)
)
with open("temp_doc_mlp.onnx", "wb") as f:
    f.write(onx.SerializeToString())
print(pretty_onnx(onx))

>>>

    opset: domain='' version=18
    input: name='x' type=dtype('float32') shape=[3, 10]
    init: name='layers.0.weight' type=float32 shape=(32, 10)              -- DynamoInterpret.placeholder.1/P(layers.0.weight)
    init: name='layers.0.bias' type=float32 shape=(32,)                   -- DynamoInterpret.placeholder.1/P(layers.0.bias)
    init: name='layers.2.weight' type=float32 shape=(1, 32)               -- DynamoInterpret.placeholder.1/P(layers.2.weight)
    init: name='layers.2.bias' type=float32 shape=(1,) -- array([0.027], dtype=float32)-- DynamoInterpret.placeholder.1/P(layers.2.bias)
    Transpose(layers.0.weight, perm=[1,0]) -> _onx_transpose_p_layers_0_weight0
      MatMul(x, _onx_transpose_p_layers_0_weight0) -> _onx_matmul_x0
        Add(_onx_matmul_x0, layers.0.bias) -> linear
          Relu(linear) -> relu
    Transpose(layers.2.weight, perm=[1,0]) -> _onx_transpose_p_layers_2_weight0
      MatMul(relu, _onx_transpose_p_layers_2_weight0) -> _onx_matmul_relu0
        Add(_onx_matmul_relu0, layers.2.bias) -> output_0
    output: name='output_0' type=dtype('float32') shape=[3, 1]

Which renders as follows:

[Figure: graph of the exported model. x -> MatMul (with Transpose of layers_0_weight) -> Add (layers_0_bias) -> Relu -> MatMul (with Transpose of layers_2_weight) -> Add (layers_2_bias) -> output_0.]

We then apply the optimizations by writing the following code:

<<<

import onnx
from experimental_experiment.helpers import pretty_onnx
from experimental_experiment.xbuilder import GraphBuilder

onx = onnx.load("temp_doc_mlp.onnx")

# The model is placed in a GraphBuilder.
# It creates dictionaries to store shapes, ranks, types
# to make it easier for the optimizers to find the information
# they need. It still uses NodeProto to store nodes.
gr = GraphBuilder(onx, infer_shapes_options=True)

# Let's optimize.
opt_onx = gr.to_onnx(optimize=True)
with open("temp_doc_mlp_opt.onnx", "wb") as f:
    f.write(opt_onx.SerializeToString())
print(pretty_onnx(opt_onx))

>>>

    opset: domain='' version=18
    input: name='x' type=dtype('float32') shape=[3, 10]
    init: name='layers.0.weight' type=float32 shape=(32, 10)              -- DynamoInterpret.placeholder.1/P(layers.0.weight)GraphBuilder._update_structures_with_proto.1/from(layers.0.weight)
    init: name='layers.0.bias' type=float32 shape=(32,)                   -- DynamoInterpret.placeholder.1/P(layers.0.bias)GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: name='layers.2.weight' type=float32 shape=(1, 32)               -- DynamoInterpret.placeholder.1/P(layers.2.weight)GraphBuilder._update_structures_with_proto.1/from(layers.2.weight)
    init: name='layers.2.bias' type=float32 shape=(1,) -- array([0.027], dtype=float32)-- DynamoInterpret.placeholder.1/P(layers.2.bias)GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    init: name='init7_s2_-1_1' type=int64 shape=(2,) -- array([-1,  1])   -- TransposeEqualReshapePattern.apply.new_shape
    init: name='init7_s2_1_-1' type=int64 shape=(2,) -- array([ 1, -1])   -- TransposeEqualReshapePattern.apply.new_shape
    Gemm(x, layers.0.weight, layers.0.bias, transB=1) -> linear
      Relu(linear) -> relu
    Reshape(layers.2.weight, init7_s2_-1_1) -> _onx_transpose_p_layers_2_weight0
      Reshape(_onx_transpose_p_layers_2_weight0, init7_s2_1_-1) -> GemmTransposePattern--_onx_transpose_p_layers_2_weight0
        Gemm(relu, GemmTransposePattern--_onx_transpose_p_layers_2_weight0, layers.2.bias, transB=1) -> output_0
    output: name='output_0' type=dtype('float32') shape=[3, 1]

Which renders as follows:

[Figure: graph of the optimized model. x -> Gemm (layers_0_weight, layers_0_bias, transB=1) -> Relu -> Gemm (layers_2_weight after two Reshape nodes, layers_2_bias, transB=1) -> output_0.]
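The two rewritings visible in the optimized model can be checked by hand on small matrices. The sketch below uses plain Python lists and helper functions written for this example (none of them come from the library). It verifies that MatMul followed by Add gives the same result as Gemm with transB=1, and that transposing a matrix with a dimension equal to 1, such as layers.2.weight of shape (1, 32), is the same as reshaping it, which is presumably why a Transpose could become a Reshape.

```python
from typing import List

Matrix = List[List[int]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def reshape(m: Matrix, rows: int, cols: int) -> Matrix:
    # Row-major reshape: flatten, then cut into `rows` chunks of `cols` values.
    flat = [v for row in m for v in row]
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m: Matrix, bias: List[int]) -> Matrix:
    return [[v + c for v, c in zip(row, bias)] for row in m]


def gemm_trans_b(x: Matrix, w: Matrix, bias: List[int]) -> Matrix:
    # Gemm with transB=1 computes x @ w^T + bias, each row of w gives one output column.
    return [[sum(p * q for p, q in zip(row, w_row)) + c for w_row, c in zip(w, bias)] for row in x]


x = [[1, 2, 3], [4, 5, 6]]   # shape (2, 3)
w = [[1, 0, 2]]              # shape (1, 3), like a weight with a single output
bias = [10]

before = add_bias(matmul(x, transpose(w)), bias)   # Transpose + MatMul + Add
after = gemm_trans_b(x, w, bias)                   # single Gemm, transB=1

# With a dimension equal to 1, Transpose and Reshape produce the same matrix.
same_as_reshape = transpose(w) == reshape(w, 3, 1)
# It is not true for a general matrix.
general = [[1, 2, 3], [4, 5, 6]]
differs_in_general = transpose(general) != reshape(general, 3, 2)
```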

Verbosity¶

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(onx, infer_shapes_options=True, verbose=1)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-IWE.optimize] start with 7 nodes
    [GraphBuilder-IWE.optimize] #patterns=48
    [GraphBuilder-IWE.optimize] start with subgraphs
    [GraphBuilder-IWE.optimize] done with subgraphs
    [GraphBuilderPatternOptimization-IWE.optimize] start with 7 nodes, 4 initializers, 48 patterns, priorities=[0, 1]
    [GraphBuilderPatternOptimization-IWE.optimize] iteration 0: 7 nodes, priority=0
    [GraphBuilderPatternOptimization-IWE.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-IWE.optimize] iteration 1: 7 nodes, priority=1
    [GraphBuilderPatternOptimization-IWE.optimize] applies 3 matches, 2*MatMulAddPattern, 1*TransposeEqualReshapePattern - time=0.001 | max_time=IdentityPattern:0.000
    [GraphBuilderPatternOptimization-IWE.optimize] iteration 2: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-IWE.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.000 | max_time=GemmTransposePattern:0.000
    [GraphBuilderPatternOptimization-IWE.optimize] iteration 3: 7 nodes, priority=1
    [GraphBuilderPatternOptimization-IWE.optimize] applies 2 matches, 1*TransposeEqualReshapePattern, 1*TransposeTransposePattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
    [GraphBuilderPatternOptimization-IWE.optimize] iteration 4: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-IWE.optimize] stops current_priority_index=2, priorities=[0, 1]
    [GraphBuilderPatternOptimization-IWE.optimize] done after 5 iterations with 5 nodes in 0.005
    [GraphBuilder-IWE.optimize] done with 5 nodes in 0.006
    [GraphBuilder-IWE.to_onnx] make_model 6 inits 0 params
    [GraphBuilder-IWE.time_evaluation_constants_] 0
    [GraphBuilder-IWE._build_initializers] start with 6 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-IWE._build_initializers] switch low/high order
    [GraphBuilder-IWE._build_initializers] done in 6.590016710106283e-07s with 6 initializers, 0 large initializers
    [GraphBuilder-IWE._add_shape_information] dynamic shapes replacements={}
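The lines containing `applies ... matches` tell which patterns were applied at each iteration. A small script can summarize them, for example to count how often each pattern was applied. The sketch below parses three lines copied from the output above; the regular expression is an assumption based on this particular output and may need adjusting if the log format differs.

```python
import re
from collections import Counter

log = """\
[GraphBuilderPatternOptimization-IWE.optimize] applies 3 matches, 2*MatMulAddPattern, 1*TransposeEqualReshapePattern - time=0.001 | max_time=IdentityPattern:0.000
[GraphBuilderPatternOptimization-IWE.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.000 | max_time=GemmTransposePattern:0.000
[GraphBuilderPatternOptimization-IWE.optimize] applies 2 matches, 1*TransposeEqualReshapePattern, 1*TransposeTransposePattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
"""

counts = Counter()
for line in log.splitlines():
    found = re.search(r"applies \d+ matches, (.+?) - time=", line)
    if not found:
        continue
    # Each item looks like "2*MatMulAddPattern".
    for item in found.group(1).split(", "):
        number, name = item.split("*")
        counts[name] += int(number)

for name, number in sorted(counts.items()):
    print(f"{name}: {number}")
```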

With more verbosity:

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(onx, infer_shapes_options=True, verbose=11)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-LYU._update_structures_with_proto] -- starts with 7 nodes
    [GraphBuilder-LYU.set_shape] layers.0.weight:(32, 10)
    [GraphBuilder-LYU.set_rank] layers.0.weight:2
    [GraphBuilder-LYU.set_type] layers.0.weight:1
    [GraphBuilder-LYU.make_initializer] layers.0.weight[1:(32, 10)]
    [GraphBuilder-LYU.update_node_constant] new constant 'layers.0.weight', node=None
    [GraphBuilder-LYU.set_shape] layers.0.bias:(32,)
    [GraphBuilder-LYU.set_rank] layers.0.bias:1
    [GraphBuilder-LYU.set_type] layers.0.bias:1
    [GraphBuilder-LYU.make_initializer] layers.0.bias[1:(32,)]
    [GraphBuilder-LYU.update_node_constant] new constant 'layers.0.bias', node=None
    [GraphBuilder-LYU.set_shape] layers.2.weight:(1, 32)
    [GraphBuilder-LYU.set_rank] layers.2.weight:2
    [GraphBuilder-LYU.set_type] layers.2.weight:1
    [GraphBuilder-LYU.make_initializer] layers.2.weight[1:(1, 32)]
    [GraphBuilder-LYU.update_node_constant] new constant 'layers.2.weight', node=None
    [GraphBuilder-LYU.set_shape] layers.2.bias:(1,)
    [GraphBuilder-LYU.set_rank] layers.2.bias:1
    [GraphBuilder-LYU.set_type] layers.2.bias:1
    [GraphBuilder-LYU.make_initializer] layers.2.bias[1:(1,)]
    [GraphBuilder-LYU.update_node_constant] new constant 'layers.2.bias', node=None
    [GraphBuilder-LYU.set_type] x:1
    [GraphBuilder-LYU.set_shape] x:(3, 10)
    [GraphBuilder-LYU.set_rank] x:2
    [GraphBuilder-LYU.set_type] output_0:1
    [GraphBuilder-LYU.set_shape] output_0:(3, 1)
    [GraphBuilder-LYU.set_rank] output_0:2
    [GraphBuilder-LYU.update_node_constant] new constant '_onx_transpose_p_layers_0_weight0', node=Transpose
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_0_weight0:1
    [GraphBuilder-LYU.set_shape] _onx_transpose_p_layers_0_weight0:(10, 32)
    [GraphBuilder-LYU.set_rank] _onx_transpose_p_layers_0_weight0:2
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_0_weight0:1
    [GraphBuilder-LYU.set_type] _onx_matmul_x0:1
    [GraphBuilder-LYU.set_shape] _onx_matmul_x0:(3, 32)
    [GraphBuilder-LYU.set_rank] _onx_matmul_x0:2
    [GraphBuilder-LYU.set_type] _onx_matmul_x0:1
    [GraphBuilder-LYU.set_type] linear:1
    [GraphBuilder-LYU.set_shape] linear:(3, 32)
    [GraphBuilder-LYU.set_rank] linear:2
    [GraphBuilder-LYU.set_type] linear:1
    [GraphBuilder-LYU.set_type] relu:1
    [GraphBuilder-LYU.set_shape] relu:(3, 32)
    [GraphBuilder-LYU.set_rank] relu:2
    [GraphBuilder-LYU.set_type] relu:1
    [GraphBuilder-LYU.update_node_constant] new constant '_onx_transpose_p_layers_2_weight0', node=Transpose
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_2_weight0:1
    [GraphBuilder-LYU.set_shape] _onx_transpose_p_layers_2_weight0:(32, 1)
    [GraphBuilder-LYU.set_rank] _onx_transpose_p_layers_2_weight0:2
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_2_weight0:1
    [GraphBuilder-LYU.set_type] _onx_matmul_relu0:1
    [GraphBuilder-LYU.set_shape] _onx_matmul_relu0:(3, 1)
    [GraphBuilder-LYU.set_rank] _onx_matmul_relu0:2
    [GraphBuilder-LYU.set_type] _onx_matmul_relu0:1
    [GraphBuilder-LYU.set_type] output_0:1
    [GraphBuilder-LYU._update_structures_with_proto] ends with 7 nodes in 0.0005002030011382885
    [GraphBuilder-LYU.constant_folding] -- starts with 6 constants and 7 nodes.
    [GraphBuilder-LYU.constant_folding] cst:: . :: output_0
    [GraphBuilder-LYU.constant_folding] cst:: . :: relu
    [GraphBuilder-LYU.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-LYU.constant_folding] cst:: 1 :: layers.0.weight
    [GraphBuilder-LYU.constant_folding] cst:: . :: _onx_matmul_x0
    [GraphBuilder-LYU.constant_folding] cst:: 1 :: layers.2.weight
    [GraphBuilder-LYU.constant_folding] cst:: . :: x
    [GraphBuilder-LYU.constant_folding] cst:: . :: _onx_matmul_relu0
    [GraphBuilder-LYU.constant_folding] cst:: 1 :: _onx_transpose_p_layers_2_weight0
    [GraphBuilder-LYU.constant_folding] cst:: 1 :: _onx_transpose_p_layers_0_weight0
    [GraphBuilder-LYU.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-LYU.constant_folding] cst:: . :: linear
    [GraphBuilder-LYU.constant_folding] initializer: layers.0.weight
    [GraphBuilder-LYU.constant_folding] initializer: layers.0.bias
    [GraphBuilder-LYU.constant_folding] initializer: layers.2.weight
    [GraphBuilder-LYU.constant_folding] initializer: layers.2.bias
    [GraphBuilder-LYU.constant_folding] from: Transpose(_onx_transpose_p_layers_0_weight0)
    [GraphBuilder-LYU.constant_folding] fold_constant:Transpose:_onx_transpose_p_layers_0_weight0[torch.float32:torch.Size([10, 32])]:from:layers.0.weight
    [GraphBuilder-LYU.constant_folding] from: Transpose(_onx_transpose_p_layers_2_weight0)
    [GraphBuilder-LYU.constant_folding] fold_constant:Transpose:_onx_transpose_p_layers_2_weight0[torch.float32:torch.Size([32, 1])]:from:layers.2.weight
    [GraphBuilder-LYU.update_node_constant] new constant '_onx_transpose_p_layers_0_weight0', node=Transpose
    [GraphBuilder-LYU.update_node_constant] new constant '_onx_transpose_p_layers_2_weight0', node=Transpose
    [GraphBuilder-LYU.constant_folding] ends with 6 constants and 7 nodes in 0.00025388499852851965 seconds
    [GraphBuilder-LYU._update_shape_types_with_proto] -- starts with 7 nodes and 11 shapes.
    [GraphBuilder._update_shape_types_with_proto] infer shapes
    [GraphBuilder._update_shape_types_with_proto] infer shapes done 0.00021583199850283563 seconds
    [GraphBuilder._update_shape_types_with_proto] _clean_shapes after 0.0002485999975760933 seconds
    [GraphBuilder-LYU._update_shape_types_with_proto] walk through 11 shapes.
    [GraphBuilder-LYU.set_type] relu:1
    [GraphBuilder-LYU.set_type] p_layers_0_bias:1
    [GraphBuilder-LYU.set_shape] p_layers_0_bias:(32,)
    [GraphBuilder-LYU.set_rank] p_layers_0_bias:1
    [GraphBuilder-LYU.set_type] linear_1:1
    [GraphBuilder-LYU.set_shape] linear_1:(3, 1)
    [GraphBuilder-LYU.set_rank] linear_1:2
    [GraphBuilder-LYU.set_type] p_layers_2_weight:1
    [GraphBuilder-LYU.set_shape] p_layers_2_weight:(1, 32)
    [GraphBuilder-LYU.set_rank] p_layers_2_weight:2
    [GraphBuilder-LYU.set_type] p_layers_2_bias:1
    [GraphBuilder-LYU.set_shape] p_layers_2_bias:(1,)
    [GraphBuilder-LYU.set_rank] p_layers_2_bias:1
    [GraphBuilder-LYU.set_type] p_layers_0_weight:1
    [GraphBuilder-LYU.set_shape] p_layers_0_weight:(32, 10)
    [GraphBuilder-LYU.set_rank] p_layers_0_weight:2
    [GraphBuilder-LYU.set_type] _onx_matmul_x0:1
    [GraphBuilder-LYU.set_type] _onx_matmul_relu0:1
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_2_weight0:1
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_0_weight0:1
    [GraphBuilder-LYU.set_type] linear:1
    [GraphBuilder-LYU._update_shape_types_with_proto] ends in 0.00010940100037259981 seconds.
    [GraphBuilder-LYU.optimize] start with 7 nodes
    [GraphBuilder-LYU.optimize] options=OptimizationOptions(patterns=[BatchNormalizationPattern(), BatchNormalizationTrainingPattern(), CastLayerNormalizationCastPattern(), CastPattern(), CastCastBinaryPattern(), CastOpCastPattern(), ClipClipPattern(), ComputationCastOpCastPattern(), ConvBiasNullPattern(), DropoutPattern(), ExpandPattern(), ExpandBroadcastPattern(), ExpandSwapPattern(), GeluPattern(), IdentityPattern(), LayerNormalizationPattern(), LayerNormalizationScalePattern(), LeakyReluPattern(), MulMulMulScalarPattern(), ReduceReshapePattern(), ReduceSumNormalizePattern(), ReshapePattern(), ReshapeMatMulReshapePattern(), Reshape2Of3Pattern(), ReshapeReshapeBinaryPattern(), MatMulAddPattern(), GemmTransposePattern(), MatMulReshape2Of3Pattern(), MulMulMatMulPattern(), ReshapeReshapePattern(), RotaryConcatPartPattern(), SameChildrenPattern(), SequenceConstructAtPattern(), SliceSlicePattern(), SlicesSplitPattern(), SoftmaxCrossEntropyLossCastPattern(), SplitConcatPattern(), SqueezeUnsqueezePattern(), Sub1MulPattern(), SwitchOrderBinaryPattern(), SwitchReshapeActivationPattern(), TransposeEqualReshapePattern(), TransposeMatMulPattern(), TransposeReshapeMatMulPattern(), TransposeReshapeTransposePattern(), TransposeTransposePattern(), UnsqueezeEqualPattern(), UnsqueezeUnsqueezePattern()], verbose=11)
    -- GRAPH BEFORE OPTIMIZATON --
    
    opset: : 18
    init: layers.0.weight: ?: ?                                            -- GraphBuilder._update_structures_with_proto.1/from(layers.0.weight)
    init: layers.0.bias: ?: ?                                              -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: layers.2.weight: ?: ?                                            -- GraphBuilder._update_structures_with_proto.1/from(layers.2.weight)
    init: layers.2.bias: ?: ?                                              -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    input:: x                                                                       |T1: 3 x 10
    Transpose: layers.0.weight -> _onx_transpose_p_layers_0_weight0                 |T1: 10 x 32                  - linear
    MatMul: x, _onx_transpose_p_layers_0_weight0 -> _onx_matmul_x0                  |T1: 3 x 32                   - Opset
    Add: _onx_matmul_x0, layers.0.bias -> linear                                    |T1: 3 x 32                   - Opset2
    Relu: linear -> relu                                                            |T1: 3 x 32                   - relu
    Transpose: layers.2.weight -> _onx_transpose_p_layers_2_weight0                 |T1: 32 x 1                   - linear2
    MatMul: relu, _onx_transpose_p_layers_2_weight0 -> _onx_matmul_relu0            |T1: 3 x 1                    - Opset3
    Add: _onx_matmul_relu0, layers.2.bias -> output_0                               |T1: 3 x 1                    - Opset4
    output:: output_0                                                               |T1: 3 x 1
    -- END --
    [GraphBuilder-LYU.optimize] start with subgraphs
    [GraphBuilder-LYU.optimize] done with subgraphs
    [GraphBuilder-LYU.remove_identity_nodes] -- starts with 7
    [GraphBuilder-LYU.remove_identity_nodes] found 0 replacements
    [GraphBuilder-LYU.remove_identity_nodes] kept 7 nodes
    [GraphBuilder-LYU.remove_identity_nodes] ends with 7 nodes in 2.625900015118532e-05 seconds
    [GraphBuilderPatternOptimization-LYU.optimize] start with 7 nodes, 4 initializers, 48 patterns, priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   1/48 - P0 - BatchNormalizationPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   2/48 - P0 - BatchNormalizationTrainingPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   3/48 - P0 - CastPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   4/48 - P0 - ConvBiasNullPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   5/48 - P0 - ExpandPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   6/48 - P0 - GeluPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   7/48 - P0 - IdentityPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   8/48 - P0 - LeakyReluPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern   9/48 - P0 - ReshapePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  10/48 - P0 - ReshapeReshapePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  11/48 - P0 - SameChildrenPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  12/48 - P0 - SoftmaxCrossEntropyLossCastPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  13/48 - P0 - SqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  14/48 - P0 - TransposeReshapeTransposePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  15/48 - P0 - TransposeTransposePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  16/48 - P0 - UnsqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  17/48 - P1 - CastCastBinaryPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  18/48 - P1 - CastLayerNormalizationCastPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  19/48 - P1 - CastOpCastPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  20/48 - P1 - ClipClipPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  21/48 - P1 - ComputationCastOpCastPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  22/48 - P1 - DropoutPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  23/48 - P1 - ExpandBroadcastPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  24/48 - P1 - ExpandSwapPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  25/48 - P1 - GemmTransposePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  26/48 - P1 - LayerNormalizationPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  27/48 - P1 - LayerNormalizationScalePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  28/48 - P1 - MatMulAddPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  29/48 - P1 - MatMulReshape2Of3Pattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  30/48 - P1 - MulMulMatMulPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  31/48 - P1 - MulMulMulScalarPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  32/48 - P1 - ReduceReshapePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  33/48 - P1 - ReduceSumNormalizePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  34/48 - P1 - Reshape2Of3Pattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  35/48 - P1 - ReshapeMatMulReshapePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  36/48 - P1 - ReshapeReshapeBinaryPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  37/48 - P1 - RotaryConcatPartPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  38/48 - P1 - SequenceConstructAtPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  39/48 - P1 - SliceSlicePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  40/48 - P1 - SlicesSplitPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  41/48 - P1 - SplitConcatPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  42/48 - P1 - Sub1MulPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  43/48 - P1 - SwitchOrderBinaryPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  44/48 - P1 - SwitchReshapeActivationPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  45/48 - P1 - TransposeEqualReshapePattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  46/48 - P1 - TransposeMatMulPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  47/48 - P1 - TransposeReshapeMatMulPattern()
    [GraphBuilderPatternOptimization-LYU.optimize] use pattern  48/48 - P1 - UnsqueezeEqualPattern()
    --
    
    opset: : 18
    init: layers.0.weight: ?: ?                                            -- GraphBuilder._update_structures_with_proto.1/from(layers.0.weight)
    init: layers.0.bias: ?: ?                                              -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: layers.2.weight: ?: ?                                            -- GraphBuilder._update_structures_with_proto.1/from(layers.2.weight)
    init: layers.2.bias: ?: ?                                              -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    input:: x                                                                       |T1: 3 x 10
    Transpose: layers.0.weight -> _onx_transpose_p_layers_0_weight0                 |T1: 10 x 32                  - linear
    MatMul: x, _onx_transpose_p_layers_0_weight0 -> _onx_matmul_x0                  |T1: 3 x 32                   - Opset
    Add: _onx_matmul_x0, layers.0.bias -> linear                                    |T1: 3 x 32                   - Opset2
    Relu: linear -> relu                                                            |T1: 3 x 32                   - relu
    Transpose: layers.2.weight -> _onx_transpose_p_layers_2_weight0                 |T1: 32 x 1                   - linear2
    MatMul: relu, _onx_transpose_p_layers_2_weight0 -> _onx_matmul_relu0            |T1: 3 x 1                    - Opset3
    Add: _onx_matmul_relu0, layers.2.bias -> output_0                               |T1: 3 x 1                    - Opset4
    output:: output_0                                                               |T1: 3 x 1
    --
    [GraphBuilderPatternOptimization-LYU.optimize] iteration 0: 7 nodes, priority=0
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips CastLayerNormalizationCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips CastCastBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips CastOpCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips ClipClipPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips ComputationCastOpCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips DropoutPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips ExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips ExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=linear
    [IdentityPattern.match] NONE - line: 197:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset2
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=linear2
    [IdentityPattern.match] NONE - line: 210:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset4
    [GraphBuilderPatternOptimization-LYU.optimize] skips LayerNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips LayerNormalizationScalePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [GraphBuilder-XRK.make_tensor_input] x[0:None] -- marker=_build_pattern1_x
    [GraphBuilder-XRK.set_type] x:0
    [GraphBuilder-XRK.set_type] x:-1
    [GraphBuilder-XRK.make_tensor_input] zero[0:None] -- marker=_build_pattern1_zero
    [GraphBuilder-XRK.set_type] zero:0
    [GraphBuilder-XRK.set_type] zero:-1
    [GraphBuilder-XRK.make_tensor_input] slope[0:None] -- marker=_build_pattern1_slope
    [GraphBuilder-XRK.set_type] slope:0
    [GraphBuilder-XRK.set_type] slope:-1
    [GraphBuilder-XRK.make_node] [TT:-] Greater: ['x', 'zero']->['_onx_greater_x0']
    [GraphBuilder-XRK.set_type] _onx_greater_x0:9
    [GraphBuilder-XRK.make_node] [TT:-] Mul: ['x', 'slope']->['_onx_mul_x0']
    [GraphBuilder-XRK.set_type] _onx_mul_x0:-1
    [GraphBuilder-XRK.make_node] [TTT:-] Where: ['_onx_greater_x0', 'x', '_onx_mul_x0']->['_onx_where_greater_x00']
    [GraphBuilder-XRK.set_type] _onx_where_greater_x00:-1
    [GraphBuilder-XRK.make_tensor_output] _onx_where_greater_x00[0: None]
    [GraphBuilderPatternOptimization-LYU.optimize] skips MulMulMulScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips ReduceReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips ReduceSumNormalizePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips ReshapeMatMulReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips Reshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips ReshapeReshapeBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips MatMulAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips GemmTransposePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips MatMulReshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips MulMulMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips RotaryConcatPartPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips SequenceConstructAtPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips SliceSlicePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips SlicesSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [GraphBuilder-OKE.make_tensor_input] X[0:None] -- marker=_build_pattern1_X
    [GraphBuilder-OKE.set_type] X:0
    [GraphBuilder-OKE.set_type] X:-1
    [GraphBuilder-OKE.make_tensor_input] indices[0:None] -- marker=_build_pattern1_indices
    [GraphBuilder-OKE.set_type] indices:0
    [GraphBuilder-OKE.set_type] indices:-1
    [GraphBuilder-OKE.make_tensor_input] axis[0:None] -- marker=_build_pattern1_axis
    [GraphBuilder-OKE.set_type] axis:0
    [GraphBuilder-OKE.set_type] axis:-1
    [GraphBuilder-OKE.make_tensor_input] zerof[0:None] -- marker=_build_pattern1_zerof
    [GraphBuilder-OKE.set_type] zerof:0
    [GraphBuilder-OKE.set_type] zerof:-1
    [GraphBuilder-OKE.make_tensor_input] zeroi[0:None] -- marker=_build_pattern1_zeroi
    [GraphBuilder-OKE.set_type] zeroi:0
    [GraphBuilder-OKE.set_type] zeroi:-1
    [GraphBuilder-OKE.make_tensor_input] b[0:None] -- marker=_build_pattern1_b
    [GraphBuilder-OKE.set_type] b:0
    [GraphBuilder-OKE.set_type] b:-1
    [GraphBuilder-OKE.make_node] [TT:-] Equal: ['indices', 'b']->['_onx_equal_indices0']
    [GraphBuilder-OKE.set_type] _onx_equal_indices0:9
    [GraphBuilder-OKE.make_node] [T:-] Not: ['_onx_equal_indices0']->['_onx_not_equal_indices00']
    [GraphBuilder-OKE.set_type] _onx_not_equal_indices00:9
    [GraphBuilder-OKE.make_node] [TTT:-] Where: ['_onx_not_equal_indices00', 'indices', 'zeroi']->['_onx_where_not_equal_indices000']
    [GraphBuilder-OKE.set_type] _onx_where_not_equal_indices000:-1
    [GraphBuilder-OKE.make_node] [TT:-] Unsqueeze: ['_onx_where_not_equal_indices000', 'axis']->['_onx_unsqueeze_where_not_equal_indices0000']
    [GraphBuilder-OKE.set_type] _onx_unsqueeze_where_not_equal_indices0000:-1
    [GraphBuilder-OKE.make_node] [T:-] LogSoftmax: ['X']->['_onx_logsoftmax_X0']
    [GraphBuilder-OKE.set_type] _onx_logsoftmax_X0:-1
    [GraphBuilder-OKE.set_type] _onx_gatherelements_logsoftmax_X00:-1
    [GraphBuilder-OKE.make_node] [TT:T] GatherElements: ['_onx_logsoftmax_X0', '_onx_unsqueeze_where_not_equal_indices0000']->['_onx_gatherelements_logsoftmax_X00']
    [GraphBuilder-OKE.set_type] _onx_gatherelements_logsoftmax_X00:-1
    [GraphBuilder-OKE.make_node] [TT:-] Squeeze: ['_onx_gatherelements_logsoftmax_X00', 'axis']->['_onx_squeeze_gatherelements_logsoftmax_X000']
    [GraphBuilder-OKE.set_type] _onx_squeeze_gatherelements_logsoftmax_X000:-1
    [GraphBuilder-OKE.make_node] [T:-] Neg: ['_onx_squeeze_gatherelements_logsoftmax_X000']->['_onx_neg_squeeze_gatherelements_logsoftmax_X0000']
    [GraphBuilder-OKE.set_type] _onx_neg_squeeze_gatherelements_logsoftmax_X0000:-1
    [GraphBuilder-OKE.make_node] [TTT:-] Where: ['_onx_not_equal_indices00', '_onx_neg_squeeze_gatherelements_logsoftmax_X0000', 'zerof']->['_onx_where_not_equal_indices0002']
    [GraphBuilder-OKE.set_type] _onx_where_not_equal_indices0002:-1
    [GraphBuilder-OKE.make_node] [T:-] Cast: ['_onx_not_equal_indices00']->['_onx_cast_not_equal_indices000']
    [GraphBuilder-OKE.set_type] _onx_cast_not_equal_indices000:1
    [GraphBuilder-OKE.make_node] [T:-] ReduceSum: ['_onx_cast_not_equal_indices000']->['_onx_reducesum_cast_not_equal_indices0000']
    [GraphBuilder-OKE.set_type] _onx_reducesum_cast_not_equal_indices0000:1
    [GraphBuilder-OKE.set_shape] _onx_reducesum_cast_not_equal_indices0000:()
    [GraphBuilder-OKE.set_rank] _onx_reducesum_cast_not_equal_indices0000:0
    [GraphBuilder-OKE.make_node] [#:-] Cast: ['_onx_reducesum_cast_not_equal_indices0000']->['_onx_cast_reducesum_cast_not_equal_indices00000']
    [GraphBuilder-OKE.set_type] _onx_cast_reducesum_cast_not_equal_indices00000:10
    [GraphBuilder-OKE.set_shape] _onx_cast_reducesum_cast_not_equal_indices00000:()
    [GraphBuilder-OKE.set_rank] _onx_cast_reducesum_cast_not_equal_indices00000:0
    [GraphBuilder-OKE.make_node] [T:-] Cast: ['_onx_where_not_equal_indices0002']->['_onx_cast_where_not_equal_indices00020']
    [GraphBuilder-OKE.set_type] _onx_cast_where_not_equal_indices00020:1
    [GraphBuilder-OKE.make_node] [T:-] ReduceSum: ['_onx_cast_where_not_equal_indices00020']->['_onx_reducesum_cast_where_not_equal_indices000200']
    [GraphBuilder-OKE.set_type] _onx_reducesum_cast_where_not_equal_indices000200:1
    [GraphBuilder-OKE.set_shape] _onx_reducesum_cast_where_not_equal_indices000200:()
    [GraphBuilder-OKE.set_rank] _onx_reducesum_cast_where_not_equal_indices000200:0
    [GraphBuilder-OKE.make_node] [#:-] Cast: ['_onx_reducesum_cast_where_not_equal_indices000200']->['_onx_cast_reducesum_cast_where_not_equal_indices0002000']
    [GraphBuilder-OKE.set_type] _onx_cast_reducesum_cast_where_not_equal_indices0002000:10
    [GraphBuilder-OKE.set_shape] _onx_cast_reducesum_cast_where_not_equal_indices0002000:()
    [GraphBuilder-OKE.set_rank] _onx_cast_reducesum_cast_where_not_equal_indices0002000:0
    [GraphBuilder-OKE.make_node] [##:-] Div: ['_onx_cast_reducesum_cast_where_not_equal_indices0002000', '_onx_cast_reducesum_cast_not_equal_indices00000']->['_onx_div_cast_reducesum_cast_where_not_equal_indices00020000']
    [GraphBuilder-OKE.set_type] _onx_div_cast_reducesum_cast_where_not_equal_indices00020000:10
    [GraphBuilder-OKE.set_shape] _onx_div_cast_reducesum_cast_where_not_equal_indices00020000:()
    [GraphBuilder-OKE.set_rank] _onx_div_cast_reducesum_cast_where_not_equal_indices00020000:0
    [GraphBuilder-OKE.make_tensor_output] _onx_div_cast_reducesum_cast_where_not_equal_indices00020000[0: None]
    [GraphBuilderPatternOptimization-LYU.optimize] skips SplitConcatPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] skips Sub1MulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips SwitchOrderBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips SwitchReshapeActivationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips TransposeEqualReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips TransposeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] skips TransposeReshapeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear2
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear2
    [GraphBuilderPatternOptimization-LYU.optimize] skips UnsqueezeEqualPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1]
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] done all: -0 +0 nodes
    [GraphBuilder-LYU.remove_identity_nodes] -- starts with 7
    [GraphBuilder-LYU.remove_identity_nodes] found 0 replacements
    [GraphBuilder-LYU.remove_identity_nodes] kept 7 nodes
    [GraphBuilder-LYU.remove_identity_nodes] ends with 7 nodes in 4.395500218379311e-05 seconds
    [GraphBuilderPatternOptimization-LYU.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-LYU.optimize] iteration 1: 7 nodes, priority=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 86:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2
    [CastCastBinaryPattern.match] NONE - line: 86:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 162:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2
    [CastOpCastPattern.match] NONE - line: 159:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ComputationCastOpCastPattern with main_opset=18 and min_opset=1
    [ComputationCastOpCastPattern.match] NONE - line: 303:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2
    [ComputationCastOpCastPattern.match] NONE - line: 303:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=linear
    [IdentityPattern.match] NONE - line: 197:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset2
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=linear2
    [IdentityPattern.match] NONE - line: 210:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset4
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 770:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset
    [ReshapeMatMulReshapePattern.match] NONE - line: 770:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 227:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2
    [Reshape2Of3Pattern.match] NONE - line: 227:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 389:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2
    [ReshapeReshapeBinaryPattern.match] NONE - line: 389:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 390:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset
    [MatMulReshape2Of3Pattern.match] NONE - line: 390:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 706:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset
    [MulMulMatMulPattern.match] NONE - line: 709:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1168:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 342:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization.match] OVERLAP match=MatchResult: TransposeMatMulPattern replaces ['Transpose', 'MatMul'] #marked: 5)
    [GraphBuilderPatternOptimization.match] OVERLAP match=MatchResult: TransposeMatMulPattern replaces ['Transpose', 'MatMul'] #marked: 5)
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1023:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset
    [TransposeReshapeMatMulPattern.match] NONE - line: 1023:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear2
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear2
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] applies 3 matches, 2*MatMulAddPattern, 1*TransposeEqualReshapePattern - time=0.001 | max_time=IdentityPattern:0.000
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: {'layers.0.bias', '_onx_transpose_p_layers_0_weight0', '_onx_matmul_x0', 'x'}, outputs: {'_onx_matmul_x0', 'linear'}
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
      - MatMul: ['x', '_onx_transpose_p_layers_0_weight0'] -> ['_onx_matmul_x0']
      - Add: ['_onx_matmul_x0', 'layers.0.bias'] -> ['linear']
      + Gemm: ['x', '_onx_transpose_p_layers_0_weight0', 'layers.0.bias'] -> ['linear']
    [GraphBuilder-LYU.set_type] linear:1
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Gemm']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] removed outputs {'_onx_matmul_x0'}
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: {'relu', '_onx_transpose_p_layers_2_weight0', '_onx_matmul_relu0', 'layers.2.bias'}, outputs: {'output_0', '_onx_matmul_relu0'}
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
      - MatMul: ['relu', '_onx_transpose_p_layers_2_weight0'] -> ['_onx_matmul_relu0']
      - Add: ['_onx_matmul_relu0', 'layers.2.bias'] -> ['output_0']
      + Gemm: ['relu', '_onx_transpose_p_layers_2_weight0', 'layers.2.bias'] -> ['output_0']
    [GraphBuilder-LYU.set_type] output_0:1
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Gemm']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] removed outputs {'_onx_matmul_relu0'}
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: TransposeEqualReshapePattern replaces ['Transpose'], inputs: {'layers.2.weight'}, outputs: {'_onx_transpose_p_layers_2_weight0'}
    [GraphBuilder-LYU.set_shape] init7_s2_-1_1:(2,)
    [GraphBuilder-LYU.set_rank] init7_s2_-1_1:1
    [GraphBuilder-LYU.set_type] init7_s2_-1_1:7
    [GraphBuilder-LYU.make_initializer] init7_s2_-1_1[7:(2,)]
    [GraphBuilder-LYU.update_node_constant] new constant 'init7_s2_-1_1', node=None
    [GraphBuilder-LYU.update_node_constant] new constant '_onx_transpose_p_layers_2_weight0', node=Reshape
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
      - Transpose: ['layers.2.weight'] -> ['_onx_transpose_p_layers_2_weight0']
      + Reshape: ['layers.2.weight', 'init7_s2_-1_1'] -> ['_onx_transpose_p_layers_2_weight0']
    [GraphBuilder-LYU.update_node_constant] new constant '_onx_transpose_p_layers_2_weight0', node=Reshape
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_2_weight0:1
    [GraphBuilder-LYU.set_type] _onx_transpose_p_layers_2_weight0:1
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Reshape']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: TransposeEqualReshapePattern replaces ['Transpose']: -1 +1 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] done all: -5 +3 nodes
    [GraphBuilder-LYU.remove_identity_nodes] -- starts with 5
    [GraphBuilder-LYU.remove_identity_nodes] found 0 replacements
    [GraphBuilder-LYU.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-LYU.remove_identity_nodes] ends with 5 nodes in 1.908899866975844e-05 seconds
    [GraphBuilderPatternOptimization-LYU.optimize] iteration 2: 5 nodes, priority=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ComputationCastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=linear
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [ReshapePattern.match] NONE - line: 38:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--linear2
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 50:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset
    [MatMulAddPattern.match] NONE - line: 47:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [ReshapeReshapePattern.match] NONE - line: 167:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--linear2
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1168:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 342:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization.match] OVERLAP match=MatchResult: TransposeMatMulPattern replaces ['Transpose', 'Gemm'] #marked: 2)
    [TransposeMatMulPattern.match] NONE - line: 880:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.000 | max_time=LeakyReluPattern:0.000
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: {'layers.0.bias', '_onx_transpose_p_layers_0_weight0', 'x'}, outputs: {'linear'}
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', node=Transpose
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
      - Gemm: ['x', '_onx_transpose_p_layers_0_weight0', 'layers.0.bias'] -> ['linear']
      + Transpose: ['_onx_transpose_p_layers_0_weight0'] -> ['GemmTransposePattern--_onx_transpose_p_layers_0_weight0']
      + Gemm: ['x', 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', 'layers.0.bias'] -> ['linear']
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', node=Transpose
    [GraphBuilder-LYU.set_type] GemmTransposePattern--_onx_transpose_p_layers_0_weight0:1
    [GraphBuilder-LYU.set_shape] GemmTransposePattern--_onx_transpose_p_layers_0_weight0:(32, 10)
    [GraphBuilder-LYU.set_rank] GemmTransposePattern--_onx_transpose_p_layers_0_weight0:2
    [GraphBuilder-LYU.set_type] linear:1
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Transpose', 'Gemm']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: {'relu', '_onx_transpose_p_layers_2_weight0', 'layers.2.bias'}, outputs: {'output_0'}
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_2_weight0', node=Transpose
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
      - Gemm: ['relu', '_onx_transpose_p_layers_2_weight0', 'layers.2.bias'] -> ['output_0']
      + Transpose: ['_onx_transpose_p_layers_2_weight0'] -> ['GemmTransposePattern--_onx_transpose_p_layers_2_weight0']
      + Gemm: ['relu', 'GemmTransposePattern--_onx_transpose_p_layers_2_weight0', 'layers.2.bias'] -> ['output_0']
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_2_weight0', node=Transpose
    [GraphBuilder-LYU.set_type] GemmTransposePattern--_onx_transpose_p_layers_2_weight0:1
    [GraphBuilder-LYU.set_shape] GemmTransposePattern--_onx_transpose_p_layers_2_weight0:(1, 32)
    [GraphBuilder-LYU.set_rank] GemmTransposePattern--_onx_transpose_p_layers_2_weight0:2
    [GraphBuilder-LYU.set_type] output_0:1
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Transpose', 'Gemm']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] done all: -2 +4 nodes
    [GraphBuilder-LYU.remove_identity_nodes] -- starts with 7
    [GraphBuilder-LYU.remove_identity_nodes] found 0 replacements
    [GraphBuilder-LYU.remove_identity_nodes] kept 7 nodes
    [GraphBuilder-LYU.remove_identity_nodes] ends with 7 nodes in 2.2805001208325848e-05 seconds
    [GraphBuilderPatternOptimization-LYU.optimize] iteration 3: 7 nodes, priority=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ComputationCastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=linear
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset
    [IdentityPattern.match] NONE - line: 165:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [ReshapePattern.match] NONE - line: 38:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--linear2
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 50:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2
    [MatMulAddPattern.match] NONE - line: 47:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GemmTransposePattern.match] NONE - line: 297:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2
    [GemmTransposePattern.match] NONE - line: 297:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [ReshapeReshapePattern.match] NONE - line: 167:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--linear2
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1168:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 342:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [TransposeEqualReshapePattern.match] NONE - line: 342:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 918:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2
    [TransposeMatMulPattern.match] NONE - line: 918:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=linear
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset
    [TransposeReshapeTransposePattern.match] NONE - line: 140:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] match=MatchResult: TransposeTransposePattern replaces ['Transpose', 'Transpose']
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset
    [TransposeTransposePattern.match] NONE - line: 51:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] applies 2 matches, 1*TransposeEqualReshapePattern, 1*TransposeTransposePattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: TransposeEqualReshapePattern replaces ['Transpose'], inputs: {'_onx_transpose_p_layers_2_weight0'}, outputs: {'GemmTransposePattern--_onx_transpose_p_layers_2_weight0'}
    [GraphBuilder-LYU.set_shape] init7_s2_1_-1:(2,)
    [GraphBuilder-LYU.set_rank] init7_s2_1_-1:1
    [GraphBuilder-LYU.set_type] init7_s2_1_-1:7
    [GraphBuilder-LYU.make_initializer] init7_s2_1_-1[7:(2,)]
    [GraphBuilder-LYU.update_node_constant] new constant 'init7_s2_1_-1', node=None
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_2_weight0', node=Reshape
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
      - Transpose: ['_onx_transpose_p_layers_2_weight0'] -> ['GemmTransposePattern--_onx_transpose_p_layers_2_weight0']
      + Reshape: ['_onx_transpose_p_layers_2_weight0', 'init7_s2_1_-1'] -> ['GemmTransposePattern--_onx_transpose_p_layers_2_weight0']
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_2_weight0', node=Reshape
    [GraphBuilder-LYU.set_type] GemmTransposePattern--_onx_transpose_p_layers_2_weight0:1
    [GraphBuilder-LYU.set_type] GemmTransposePattern--_onx_transpose_p_layers_2_weight0:1
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Reshape']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: TransposeEqualReshapePattern replaces ['Transpose']: -1 +1 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] apply MatchResult: TransposeTransposePattern replaces ['Transpose', 'Transpose'], inputs: {'_onx_transpose_p_layers_0_weight0', 'layers.0.weight'}, outputs: {'_onx_transpose_p_layers_0_weight0', 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0'}
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', node=Identity
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: TransposeTransposePattern replaces ['Transpose', 'Transpose']
      - Transpose: ['layers.0.weight'] -> ['_onx_transpose_p_layers_0_weight0']
      - Transpose: ['_onx_transpose_p_layers_0_weight0'] -> ['GemmTransposePattern--_onx_transpose_p_layers_0_weight0']
      + Identity: ['layers.0.weight'] -> ['GemmTransposePattern--_onx_transpose_p_layers_0_weight0']
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', node=Identity
    [GraphBuilder-LYU.set_type] GemmTransposePattern--_onx_transpose_p_layers_0_weight0:1
    [GraphBuilder-LYU.update_node_constant] new constant 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', node=Identity
    [GraphBuilderPatternOptimization-LYU.apply_match] MatchResult: TransposeTransposePattern replaces ['Transpose', 'Transpose'] applied.
    [GraphBuilderPatternOptimization-LYU.optimize] - add ['Identity']
    [GraphBuilderPatternOptimization-LYU.optimize] done MatchResult: TransposeTransposePattern replaces ['Transpose', 'Transpose']: -2 +1 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] removed outputs {'_onx_transpose_p_layers_0_weight0'}
    [GraphBuilderPatternOptimization-LYU.optimize] done all: -3 +2 nodes
    [GraphBuilder-LYU.remove_identity_nodes] -- starts with 6
    [GraphBuilder-LYU.remove_identity_nodes] found 1 replacements
    [GraphBuilder-LYU.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-LYU.remove_identity_nodes] node Gemm-GemmTransposePattern--MatMulAddPattern--Opset2:['x', 'GemmTransposePattern--_onx_transpose_p_layers_0_weight0', 'layers.0.bias']->['x', 'layers.0.weight', 'layers.0.bias']:['linear']->['linear']
    [GraphBuilder-LYU.remove_identity_nodes] ends with 5 nodes in 4.711200017482042e-05 seconds
    [GraphBuilderPatternOptimization-LYU.optimize] iteration 4: 5 nodes, priority=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ComputationCastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [ReshapePattern.match] NONE - line: 38:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--linear2
    [ReshapePattern.match] NONE - line: 38:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 50:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2
    [MatMulAddPattern.match] NONE - line: 47:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GemmTransposePattern.match] NONE - line: 297:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2
    [GemmTransposePattern.match] NONE - line: 297:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [ReshapeReshapePattern.match] NONE - line: 178:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--linear2
    [ReshapeReshapePattern.match] NONE - line: 167:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1168:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 880:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2
    [TransposeMatMulPattern.match] NONE - line: 880:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-LYU.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-LYU.optimize] stops current_priority_index=2, priorities=[0, 1]
    [GraphBuilderPatternOptimization-LYU.optimize] done after 5 iterations with 5 nodes in 0.006
        STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0002693349997571204
        STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.00017815400133258663
        STAT apply_TransposeEqualReshapePattern +2 -2 #it=2 maxmatch=2 i=2 - time=0.0004642809981305618
        STAT apply_TransposeTransposePattern +1 -2 #it=1 maxmatch=1 i=1 - time=0.00014603599993279204
        STAT build_graph_for_pattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.00014401800217456184
        STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=1.7701000615488738e-05
        STAT check_pattern_A0 +0 -0 #it=3 maxmatch=0 i=0 - time=0.00010430700785946101
        STAT check_pattern_B0 +0 -0 #it=4 maxmatch=0 i=0 - time=5.523499567061663e-05
        STAT match_BatchNormalizationPattern +0 -0 #it=5 maxmatch=0 i=0 - time=4.106500273337588e-05
        STAT match_BatchNormalizationTrainingPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.7318001230014488e-05
        STAT match_CastCastBinaryPattern +0 -0 #it=4 maxmatch=0 i=0 - time=4.295500184525736e-05
        STAT match_CastLayerNormalizationCastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.1321997337508947e-05
        STAT match_CastOpCastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=3.930100137949921e-05
        STAT match_CastPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.9682993044843897e-05
        STAT match_ClipClipPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.9863997295033187e-05
        STAT match_ComputationCastOpCastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.7710000722436234e-05
        STAT match_ConvBiasNullPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.3543001589132473e-05
        STAT match_DropoutPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.7659996956354007e-05
        STAT match_ExpandBroadcastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.848499960033223e-05
        STAT match_ExpandPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.205899727414362e-05
        STAT match_ExpandSwapPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.7235008272109553e-05
        STAT match_GeluPattern +0 -0 #it=5 maxmatch=0 i=0 - time=6.1920036387164146e-06
        STAT match_GemmTransposePattern +0 -0 #it=4 maxmatch=2 i=2 - time=8.446499850833789e-05
        STAT match_IdentityPattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.0003636820038082078
        STAT match_LayerNormalizationPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.163300450774841e-05
        STAT match_LayerNormalizationScalePattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.9529998098732904e-05
        STAT match_LeakyReluPattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.0007512060001317877
        STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.00010589600060484372
        STAT match_MatMulReshape2Of3Pattern +0 -0 #it=4 maxmatch=2 i=0 - time=5.434400372905657e-05
        STAT match_MulMulMatMulPattern +0 -0 #it=4 maxmatch=2 i=0 - time=3.6158999137114733e-05
        STAT match_MulMulMulScalarPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.648199733812362e-05
        STAT match_ReduceReshapePattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.3526001314166933e-05
        STAT match_ReduceSumNormalizePattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.008199953706935e-05
        STAT match_Reshape2Of3Pattern +0 -0 #it=4 maxmatch=0 i=0 - time=4.304800313548185e-05
        STAT match_ReshapeMatMulReshapePattern +0 -0 #it=4 maxmatch=0 i=0 - time=3.811900023720227e-05
        STAT match_ReshapePattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.00011494900172692724
        STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=4 maxmatch=0 i=0 - time=3.5535998904379085e-05
        STAT match_ReshapeReshapePattern +0 -0 #it=5 maxmatch=2 i=0 - time=6.318899249890819e-05
        STAT match_RotaryConcatPartPattern +0 -0 #it=4 maxmatch=2 i=0 - time=2.2735996026312932e-05
        STAT match_SameChildrenPattern +0 -0 #it=5 maxmatch=2 i=0 - time=4.180800533504225e-05
        STAT match_SequenceConstructAtPattern +0 -0 #it=4 maxmatch=2 i=0 - time=2.0724997739307582e-05
        STAT match_SliceSlicePattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.927100311149843e-05
        STAT match_SlicesSplitPattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.9360999431228265e-05
        STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=5 maxmatch=2 i=0 - time=0.0014468930021394044
        STAT match_SplitConcatPattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.869600237114355e-05
        STAT match_SqueezeUnsqueezePattern +0 -0 #it=5 maxmatch=2 i=0 - time=3.007699706358835e-05
        STAT match_Sub1MulPattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.7219994333572686e-05
        STAT match_SwitchOrderBinaryPattern +0 -0 #it=4 maxmatch=2 i=0 - time=2.5965004169847816e-05
        STAT match_SwitchReshapeActivationPattern +0 -0 #it=4 maxmatch=2 i=0 - time=4.460799755179323e-05
        STAT match_TransposeEqualReshapePattern +0 -0 #it=4 maxmatch=3 i=2 - time=8.380600047530606e-05
        STAT match_TransposeMatMulPattern +0 -0 #it=4 maxmatch=3 i=0 - time=0.00011956700109294616
        STAT match_TransposeReshapeMatMulPattern +0 -0 #it=4 maxmatch=3 i=0 - time=3.4086002415278926e-05
        STAT match_TransposeReshapeTransposePattern +0 -0 #it=5 maxmatch=3 i=0 - time=7.191600161604583e-05
        STAT match_TransposeTransposePattern +0 -0 #it=5 maxmatch=3 i=1 - time=7.079400529619306e-05
        STAT match_UnsqueezeEqualPattern +0 -0 #it=4 maxmatch=3 i=0 - time=1.8144000932807103e-05
        STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=5 maxmatch=3 i=0 - time=2.5593002646928653e-05
        STAT remove_identity_nodes +1 -2 #it=4 maxmatch=0 i=0 - time=0.00019592000171542168
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 6 initializers--
             INPUT:   1 x 1t
         INPUT-SEQ:   1 x Falset
            OUTPUT:   1 x 1t
        OUTPUT-SEQ:   1 x Falset
              INIT:   4 x 1t
              INIT:   2 x 7t
              NODE:   2 x Gemm
              NODE:   1 x Relu
              NODE:   2 x Reshape
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 6 initializers--DETAILED--
         INPUT:   1 x 1t[3x10]
        OUTPUT:   1 x 1t[3x1]
          INIT:   1 x 1t[1]
          INIT:   1 x 1t[1x32]
          INIT:   1 x 1t[32]
          INIT:   1 x 1t[32x10]
          INIT:   2 x 7t[2]
          NODE:   1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
          NODE:   1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
          NODE:   1 x Relu -SIG- 1t[3x32]
          NODE:   1 x Reshape -SIG- 1t[1x32], 7t[2]
          NODE:   1 x Reshape -SIG- 1t[32x1], 7t[2]
    [GraphBuilder-LYU.optimize] done with 5 nodes in 0.008
        STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0002693349997571204
        STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.00017815400133258663
        STAT apply_TransposeEqualReshapePattern +2 -2 #it=2 maxmatch=2 i=2 - time=0.0004642809981305618
        STAT apply_TransposeTransposePattern +1 -2 #it=1 maxmatch=1 i=1 - time=0.00014603599993279204
        STAT build_graph_for_pattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.00014401800217456184
        STAT check_A +0 -0 #it=0 maxmatch=0 i=0 - time=1.7382000805810094e-05
        STAT check_B +0 -0 #it=0 maxmatch=0 i=0 - time=1.4330998965306208e-05
        STAT check_C +0 -0 #it=0 maxmatch=0 i=0 - time=1.3492997823050246e-05
        STAT check_F +0 -0 #it=0 maxmatch=0 i=0 - time=1.8820999684976414e-05
        STAT check_G +0 -0 #it=0 maxmatch=0 i=0 - time=1.298200004384853e-05
        STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=1.7701000615488738e-05
        STAT check_pattern_A0 +0 -0 #it=3 maxmatch=0 i=0 - time=0.00010430700785946101
        STAT check_pattern_B0 +0 -0 #it=4 maxmatch=0 i=0 - time=5.523499567061663e-05
        STAT match_BatchNormalizationPattern +0 -0 #it=5 maxmatch=0 i=0 - time=4.106500273337588e-05
        STAT match_BatchNormalizationTrainingPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.7318001230014488e-05
        STAT match_CastCastBinaryPattern +0 -0 #it=4 maxmatch=0 i=0 - time=4.295500184525736e-05
        STAT match_CastLayerNormalizationCastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.1321997337508947e-05
        STAT match_CastOpCastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=3.930100137949921e-05
        STAT match_CastPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.9682993044843897e-05
        STAT match_ClipClipPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.9863997295033187e-05
        STAT match_ComputationCastOpCastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.7710000722436234e-05
        STAT match_ConvBiasNullPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.3543001589132473e-05
        STAT match_DropoutPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.7659996956354007e-05
        STAT match_ExpandBroadcastPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.848499960033223e-05
        STAT match_ExpandPattern +0 -0 #it=5 maxmatch=0 i=0 - time=2.205899727414362e-05
        STAT match_ExpandSwapPattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.7235008272109553e-05
        STAT match_GeluPattern +0 -0 #it=5 maxmatch=0 i=0 - time=6.1920036387164146e-06
        STAT match_GemmTransposePattern +0 -0 #it=4 maxmatch=2 i=2 - time=8.446499850833789e-05
        STAT match_IdentityPattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.0003636820038082078
        STAT match_LayerNormalizationPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.163300450774841e-05
        STAT match_LayerNormalizationScalePattern +0 -0 #it=4 maxmatch=0 i=0 - time=1.9529998098732904e-05
        STAT match_LeakyReluPattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.0007512060001317877
        STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.00010589600060484372
        STAT match_MatMulReshape2Of3Pattern +0 -0 #it=4 maxmatch=2 i=0 - time=5.434400372905657e-05
        STAT match_MulMulMatMulPattern +0 -0 #it=4 maxmatch=2 i=0 - time=3.6158999137114733e-05
        STAT match_MulMulMulScalarPattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.648199733812362e-05
        STAT match_ReduceReshapePattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.3526001314166933e-05
        STAT match_ReduceSumNormalizePattern +0 -0 #it=4 maxmatch=0 i=0 - time=2.008199953706935e-05
        STAT match_Reshape2Of3Pattern +0 -0 #it=4 maxmatch=0 i=0 - time=4.304800313548185e-05
        STAT match_ReshapeMatMulReshapePattern +0 -0 #it=4 maxmatch=0 i=0 - time=3.811900023720227e-05
        STAT match_ReshapePattern +0 -0 #it=5 maxmatch=0 i=0 - time=0.00011494900172692724
        STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=4 maxmatch=0 i=0 - time=3.5535998904379085e-05
        STAT match_ReshapeReshapePattern +0 -0 #it=5 maxmatch=2 i=0 - time=6.318899249890819e-05
        STAT match_RotaryConcatPartPattern +0 -0 #it=4 maxmatch=2 i=0 - time=2.2735996026312932e-05
        STAT match_SameChildrenPattern +0 -0 #it=5 maxmatch=2 i=0 - time=4.180800533504225e-05
        STAT match_SequenceConstructAtPattern +0 -0 #it=4 maxmatch=2 i=0 - time=2.0724997739307582e-05
        STAT match_SliceSlicePattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.927100311149843e-05
        STAT match_SlicesSplitPattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.9360999431228265e-05
        STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=5 maxmatch=2 i=0 - time=0.0014468930021394044
        STAT match_SplitConcatPattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.869600237114355e-05
        STAT match_SqueezeUnsqueezePattern +0 -0 #it=5 maxmatch=2 i=0 - time=3.007699706358835e-05
        STAT match_Sub1MulPattern +0 -0 #it=4 maxmatch=2 i=0 - time=1.7219994333572686e-05
        STAT match_SwitchOrderBinaryPattern +0 -0 #it=4 maxmatch=2 i=0 - time=2.5965004169847816e-05
        STAT match_SwitchReshapeActivationPattern +0 -0 #it=4 maxmatch=2 i=0 - time=4.460799755179323e-05
        STAT match_TransposeEqualReshapePattern +0 -0 #it=4 maxmatch=3 i=2 - time=8.380600047530606e-05
        STAT match_TransposeMatMulPattern +0 -0 #it=4 maxmatch=3 i=0 - time=0.00011956700109294616
        STAT match_TransposeReshapeMatMulPattern +0 -0 #it=4 maxmatch=3 i=0 - time=3.4086002415278926e-05
        STAT match_TransposeReshapeTransposePattern +0 -0 #it=5 maxmatch=3 i=0 - time=7.191600161604583e-05
        STAT match_TransposeTransposePattern +0 -0 #it=5 maxmatch=3 i=1 - time=7.079400529619306e-05
        STAT match_UnsqueezeEqualPattern +0 -0 #it=4 maxmatch=3 i=0 - time=1.8144000932807103e-05
        STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=5 maxmatch=3 i=0 - time=2.5593002646928653e-05
        STAT pattern_optimization +0 -2 #it=0 maxmatch=0 i=0 - time=0.007170977998612216
        STAT remove_identity_nodes +1 -2 #it=4 maxmatch=0 i=0 - time=0.00023520200193161145
        STAT remove_unused +0 -0 #it=0 maxmatch=0 i=0 - time=7.206200098153204e-05
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 6 initializers--
             INPUT:   1 x 1t
         INPUT-SEQ:   1 x Falset
            OUTPUT:   1 x 1t
        OUTPUT-SEQ:   1 x Falset
              INIT:   4 x 1t
              INIT:   2 x 7t
              NODE:   2 x Gemm
              NODE:   1 x Relu
              NODE:   2 x Reshape
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 6 initializers--DETAILED--
         INPUT:   1 x 1t[3x10]
        OUTPUT:   1 x 1t[3x1]
          INIT:   1 x 1t[1]
          INIT:   1 x 1t[1x32]
          INIT:   1 x 1t[32]
          INIT:   1 x 1t[32x10]
          INIT:   2 x 7t[2]
          NODE:   1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
          NODE:   1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
          NODE:   1 x Relu -SIG- 1t[3x32]
          NODE:   1 x Reshape -SIG- 1t[1x32], 7t[2]
          NODE:   1 x Reshape -SIG- 1t[32x1], 7t[2]
    [GraphBuilder-LYU.to_onnx] make_model 6 inits 0 params
    [GraphBuilder-LYU.time_evaluation_constants_] 0
    [GraphBuilder-LYU._build_initializers] start with 6 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-LYU._build_initializers] switch low/high order
    [GraphBuilder-LYU._build_initializers] TensorProto-layers.0.weight:1[(32, 10)]
    [GraphBuilder-LYU._build_initializers] TensorProto-layers.0.bias:1[(32,)]
    [GraphBuilder-LYU._build_initializers] TensorProto-layers.2.weight:1[(1, 32)]
    [GraphBuilder-LYU._build_initializers] TensorProto-layers.2.bias:1[(1,)]
    [GraphBuilder-LYU._build_initializers] <ndarray>-init7_s2_-1_1:int64[(2,)]
    [GraphBuilder-LYU._build_initializers] <ndarray>-init7_s2_1_-1:int64[(2,)]
    [GraphBuilder-LYU._build_initializers] done in 1.1749980330932885e-06s with 6 initializers, 0 large initializers
    [GraphBuilder-LYU._add_shape_information] dynamic shapes replacements={}

Select the pattern to use¶

Class OptimizationOptions is used to enable or disable patterns.

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(
        patterns="TransposeTranspose,TransposeMatMul", verbose=1
    ),
)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-SOM.optimize] start with 7 nodes
    [GraphBuilder-SOM.optimize] #patterns=2
    [GraphBuilderPatternOptimization-SOM.optimize] start with 7 nodes, 4 initializers, 2 patterns, priorities=[0, 1]
    [GraphBuilderPatternOptimization-SOM.optimize] iteration 0: 7 nodes, priority=0
    [GraphBuilderPatternOptimization-SOM.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-SOM.optimize] iteration 1: 7 nodes, priority=1
    [GraphBuilderPatternOptimization-SOM.optimize] applies 2 matches, 2*TransposeMatMulPattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
    [GraphBuilderPatternOptimization-SOM.optimize] iteration 2: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-SOM.optimize] stops current_priority_index=2, priorities=[0, 1]
    [GraphBuilderPatternOptimization-SOM.optimize] done after 3 iterations with 5 nodes in 0.001
    [GraphBuilder-SOM.optimize] done with 5 nodes in 0.001

There exist some predefined lists of patterns:

default: includes all patterns using only standard onnx operators.
onnxruntime: patterns specific to onnxruntime. The final model may be executed by onnxruntime, and possibly only by onnxruntime, as these patterns may introduce operators from Supported Operators and Data Types.

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default+onnxruntime", verbose=1),
)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-UDC.optimize] start with 7 nodes
    [GraphBuilder-UDC.optimize] #patterns=63
    [GraphBuilderPatternOptimization-UDC.optimize] start with 7 nodes, 4 initializers, 63 patterns, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 0: 7 nodes, priority=0
    [GraphBuilderPatternOptimization-UDC.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 1: 7 nodes, priority=1
    [GraphBuilderPatternOptimization-UDC.optimize] applies 3 matches, 2*MatMulAddPattern, 1*TransposeEqualReshapePattern - time=0.001 | max_time=IdentityPattern:0.000
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 2: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-UDC.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.001 | max_time=ReshapePattern:0.000
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 3: 7 nodes, priority=1
    [GraphBuilderPatternOptimization-UDC.optimize] applies 2 matches, 1*TransposeEqualReshapePattern, 1*TransposeTransposePattern - time=0.000 | max_time=TransposeMatMulPattern:0.000
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 4: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-UDC.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 5: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-UDC.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-UDC.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-UDC.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UDC.optimize] done after 7 iterations with 5 nodes in 0.009
    [GraphBuilder-UDC.optimize] done with 5 nodes in 0.010

Statistics¶

Method optimize returns statistics that show when a pattern is applied and how long it takes.

<<<

import pandas
import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()

print(pandas.DataFrame(stat))

>>>

                         pattern   time_in  removed  added  iteration  instances  match_index
                  check_A  0.000055      NaN    NaN        NaN        NaN          NaN
    remove_identity_nodes  0.000095      0.0    0.0        NaN        NaN          NaN
                  check_B  0.000041      NaN    NaN        NaN        NaN          NaN
            remove_unused  0.000105      0.0    NaN        NaN        NaN          NaN
                  check_C  0.000038      NaN    NaN        NaN        NaN          NaN
    ..                       ...       ...      ...    ...        ...        ...          ...
build_graph_for_pattern  0.000324      NaN    NaN        4.0        NaN          NaN
   pattern_optimization  0.020576      2.0    NaN        NaN        NaN          NaN
                check_F  0.000083      NaN    NaN        NaN        NaN          NaN
          remove_unused  0.000163      0.0    NaN        NaN        NaN          NaN
                check_G  0.000067      NaN    NaN        NaN        NaN          NaN
    
    [245 rows x 7 columns]

The statistics can be aggregated by pattern:

<<<

import pandas
import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()

df = pandas.DataFrame(stat)
for c in df.columns:
    if "time" not in c and "pattern" not in c:
        df[c] = df[c].fillna(0).astype(int)
aggs = {
    "time_in": "sum",
    "added": "sum",
    "removed": "sum",
    "iteration": "max",
    "match_index": "max",
    "instances": "sum",
}
print(df.groupby("pattern").agg(aggs))

>>>

                                         time_in  added  ...  match_index  instances
    pattern                                              ...                        
    apply_GemmTransposePattern          0.000958      4  ...            1          2
    apply_MatMulAddPattern              0.000873      2  ...            1          2
    apply_TransposeEqualReshapePattern  0.002046      2  ...            2          2
    apply_TransposeTransposePattern     0.000437      1  ...            1          1
    build_graph_for_pattern             0.000616      0  ...            0          0
    ...                                      ...    ...  ...          ...        ...
    match_UnsqueezeEqualPattern         0.000063      0  ...            3          0
    match_UnsqueezeUnsqueezePattern     0.000068      0  ...            3          0
    pattern_optimization                0.021206      0  ...            0          0
    remove_identity_nodes               0.000721      1  ...            0          0
    remove_unused                       0.000225      0  ...            0          0
    
    [64 rows x 6 columns]
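
The same aggregation can be sketched without pandas. The snippet below assumes stat is a list of dictionaries keyed by the column names printed above; this is inferred from the DataFrame output and is not a documented contract. The records are made-up sample values, not measurements, and the helper names aggregate and slowest are hypothetical.

```python
from collections import defaultdict

# Made-up records mimicking the columns printed above; real values would
# come from gr.optimize(). A missing key plays the role of NaN.
sample_stat = [
    {"pattern": "match_IdentityPattern", "time_in": 0.0002},
    {"pattern": "apply_MatMulAddPattern", "time_in": 0.0004, "added": 1, "removed": 2},
    {"pattern": "match_IdentityPattern", "time_in": 0.0001},
    {"pattern": "apply_MatMulAddPattern", "time_in": 0.0005, "added": 1, "removed": 2},
]


def aggregate(stat):
    """Sums time_in, added and removed for every pattern name."""
    totals = defaultdict(lambda: {"time_in": 0.0, "added": 0, "removed": 0})
    for row in stat:
        agg = totals[row["pattern"]]
        for key in agg:
            agg[key] += row.get(key, 0)
    return dict(totals)


def slowest(stat, n=3):
    """Returns the n pattern names with the largest cumulated time."""
    totals = aggregate(stat)
    return sorted(totals, key=lambda name: totals[name]["time_in"], reverse=True)[:n]


totals = aggregate(sample_stat)
print(slowest(sample_stat, n=1))
```

Sorting by cumulated time is usually the first step to find which pattern dominates the optimization cost.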

Shape inference¶

The optimizers need to know the shapes to make sure a rewriting is valid and does not produce a model returning different results. If a shape is missing, the patterns relying on it cannot prove the rewriting is safe and they do not match.

This information can be built by running shape inference on the onnx model. This is what is done in the previous examples. However, the best case is when this information comes from torch.

Function to_onnx converts a torch model into ONNX. While doing so, it stores the shape information coming from torch. There is no need to run shape inference on the onnx model it generates before optimizing it.
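
To illustrate why a missing shape prevents a match, here is a toy check, written for this page and not taken from the library, that decides whether a Transpose can be replaced by a Reshape. The function name transpose_is_reshape is hypothetical. It assumes static integer dimensions and returns None when the shape is unknown, in which case a pattern could only decline to match.

```python
from typing import Optional, Sequence, Tuple


def transpose_is_reshape(
    shape: Optional[Tuple[int, ...]], perm: Sequence[int]
) -> Optional[bool]:
    """Tells if transposing a tensor of this shape with perm only moves
    dimensions of size 1, in which case a Reshape gives the same result.
    Returns None when the shape is unknown: nothing can be decided."""
    if shape is None:
        return None
    # Axes of size 1 do not change the order of the data in memory,
    # only the relative order of the other axes matters.
    kept = [axis for axis in perm if shape[axis] != 1]
    return kept == sorted(kept)


print(transpose_is_reshape((1, 32), (1, 0)))  # True
print(transpose_is_reshape((3, 32), (1, 0)))  # False
print(transpose_is_reshape(None, (1, 0)))     # None
```

The first case corresponds to the Reshape nodes applied to the 1x32 weight in the optimized model shown earlier; the last case shows that nothing can be concluded without shape information.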

Available Patterns and API¶

All patterns may be found in experimental_experiment.xoptim.patterns and experimental_experiment.xoptim.patterns_ort.

When writing a pattern, walking along the graph or checking the shape is very common. Class GraphBuilderPatternOptimization provides the following methods.

Opsets¶

Patterns must rewrite using the nodes of the opset defined in the model.

main_opset: returns the opset

Shapes, Types¶

has_type: tells if a result type is known
get_type: returns a result type, fails if not known
has_shape: tells if a result shape is known
get_shape: returns a result shape, fails if not known
has_rank: tells if a result rank is known
get_rank: returns a result rank, fails if not known
try_infer_type: returns a type if it can be guessed
try_infer_shape: returns a shape if it can be guessed

Constants¶

is_constant: tells if a result is a constant (it may be a constant, an initializer or any value built on other constants)
is_constant_scalar: checks that a constant is a scalar and compares its value to a number
get_computed_constant: returns the constant, computes it if it is a constant built from other constants
get_attribute: returns an attribute of a node
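
The idea that a value built on other constants is itself a constant can be sketched with a toy graph. This is an illustration only: the dictionaries and result names below are hypothetical, and the function does not reproduce the library's implementation of is_constant.

```python
from typing import Dict, List, Set

# Toy graph: each result name maps to the inputs of the node producing it.
# All names are hypothetical.
producers: Dict[str, List[str]] = {
    "shape": ["init_a", "init_b"],  # e.g. a Concat of two initializers
    "scaled": ["shape", "init_c"],  # built on another constant
    "y": ["x", "scaled"],           # depends on the graph input x
}
initializers: Set[str] = {"init_a", "init_b", "init_c"}


def is_constant(name: str) -> bool:
    """A result is constant if it is an initializer or if every input
    of the node producing it is itself constant."""
    if name in initializers:
        return True
    inputs = producers.get(name)
    if not inputs:
        # Graph input, unknown result, or node without any input
        # (the latter is not handled by this toy).
        return False
    return all(is_constant(i) for i in inputs)


print(is_constant("scaled"))  # True
print(is_constant("y"))       # False
```

A result such as scaled could be precomputed once, whereas y depends on the graph input x and must be computed at runtime.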

Graph¶

next_node: returns the next node only if there is only one
next_nodes: returns the nodes consuming this result
node_before: returns the node producing the result
is_output: tells if a result is an output
is_used_by_subgraph: tells if a result is used by a subgraph
is_used_more_than_once: tells if a result is used more than once
is_used_only_by: tells if a result is only used by specific nodes
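
The way these helpers combine in a match method can be sketched with a stand-in graph. ToyGraph and Node below are hypothetical classes written for this page: they only borrow the method names listed above and do not reproduce the signatures or the behavior of GraphBuilderPatternOptimization. The function looks for two consecutive Transpose nodes whose intermediate result is not used anywhere else.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    op_type: str
    input: List[str]
    output: List[str]


class ToyGraph:
    """Minimal stand-in exposing a few navigation helpers by name."""

    def __init__(self, nodes: List[Node], outputs: List[str]):
        self.nodes = nodes
        self.outputs = set(outputs)
        self._producer: Dict[str, Node] = {o: n for n in nodes for o in n.output}
        self._consumers: Dict[str, List[Node]] = {}
        for n in nodes:
            for i in n.input:
                self._consumers.setdefault(i, []).append(n)

    def node_before(self, name: str) -> Optional[Node]:
        return self._producer.get(name)

    def next_nodes(self, name: str) -> List[Node]:
        return self._consumers.get(name, [])

    def is_output(self, name: str) -> bool:
        return name in self.outputs

    def is_used_more_than_once(self, name: str) -> bool:
        # Being a graph output counts as one more use.
        return len(self.next_nodes(name)) + int(self.is_output(name)) > 1


def match_transpose_transpose(g: ToyGraph, node: Node) -> Optional[List[Node]]:
    """Returns the two nodes to rewrite, or None if there is no match."""
    if node.op_type != "Transpose":
        return None
    if g.is_used_more_than_once(node.output[0]):
        return None  # the intermediate result is needed elsewhere
    consumers = g.next_nodes(node.output[0])
    if len(consumers) != 1 or consumers[0].op_type != "Transpose":
        return None
    return [node, consumers[0]]


t1 = Node("Transpose", ["x"], ["t1"])
t2 = Node("Transpose", ["t1"], ["t2"])
relu = Node("Relu", ["t2"], ["y"])
g = ToyGraph([t1, t2, relu], outputs=["y"])
print(match_transpose_transpose(g, t1) == [t1, t2])
print(match_transpose_transpose(g, t2))
```

The matching function only reads the graph and never modifies it, which mirrors the constraint put on PatternOptimization.match.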

Nodes¶

make_node: creates a node without adding it to the graph
make_node_check_opset: creates a node without adding it to the graph, deals with some constraints related to opset version