Pattern Optimizer#

The pattern optimizer is implemented by class GraphBuilderPatternOptimization. It searches for a specific sequence of nodes in the graph and replaces it with another one without changing the inputs or the outputs of the graph. The goal of the optimizer is to make the whole computation graph more efficient. The goal of this implementation is to make the optimization itself as fast as possible. Assuming the nodes of an ONNX graph are ordered so that every input of a node is produced by a previous node, the optimizer must not require any global reordering. The cost should be in O(N P I) in the worst case, where N is the number of nodes, P the number of patterns and I the number of iterations.

It is difficult to foresee what a pattern needs in order to rewrite a part of the graph. This API tries to give as much freedom as it can without leaving too much work to the developer who adds a new pattern.

Patterns#

Patterns must inherit from PatternOptimization. This class defines two main methods, match and apply, a debugging helper, none, and an optional performance hint, fast_op_type.

PatternOptimization.match#

def match(
    self,
    g: "GraphBuilderPatternOptimization",
    node: NodeProto,
    matched: List[MatchResult],
) -> Optional[MatchResult]:
  • g is a GraphBuilderPatternOptimization. It holds all the existing nodes and can return any information about types and shapes, as well as the nodes before or after a given node.

  • node: the matching must determine whether some nodes around this one are part of a set of nodes this pattern can rewrite. From there, the function explores the graph wherever it needs to, checking any condition it needs.

  • matched: usually unused, it contains the list of matches (instances of MatchResult) already found.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. That instance must contain:

  • A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be modified by another pattern.

  • A function doing the rewriting (usually method apply of the pattern class).

  • An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
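
As a rough illustration of this contract, the sketch below matches a MatMul followed by an Add on a toy graph. ToyNode, ToyMatch and ToyGraph are hypothetical stand-ins for NodeProto, MatchResult and GraphBuilderPatternOptimization. They are not part of the library and only mimic the three pieces of information listed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class ToyMatch:
    """Hypothetical stand-in for MatchResult."""
    nodes: List[ToyNode]                 # nodes involved in the rewriting
    apply: Callable[..., List[ToyNode]]  # function doing the rewriting
    insert_at: Optional[ToyNode] = None  # where the new nodes can be inserted


class ToyGraph:
    """Hypothetical stand-in for the graph: it only knows who consumes a result."""

    def __init__(self, nodes: List[ToyNode]):
        self.nodes = nodes
        self.consumers: Dict[str, List[ToyNode]] = {}
        for n in nodes:
            for name in n.inputs:
                self.consumers.setdefault(name, []).append(n)

    def next_nodes(self, name: str) -> List[ToyNode]:
        return self.consumers.get(name, [])


def rewrite(matmul: ToyNode, add: ToyNode) -> List[ToyNode]:
    """Builds the replacement node; it is only called later, never during matching."""
    bias = [i for i in add.inputs if i != matmul.outputs[0]]
    return [ToyNode("Gemm", matmul.inputs + bias, add.outputs)]


def match_matmul_add(g: ToyGraph, node: ToyNode) -> Optional[ToyMatch]:
    """Returns a match if node is a MatMul whose only consumer is an Add."""
    if node.op_type != "MatMul":
        return None
    consumers = g.next_nodes(node.outputs[0])
    if len(consumers) != 1 or consumers[0].op_type != "Add":
        return None
    # The graph is left untouched: the function only reports what it found.
    return ToyMatch(nodes=[node, consumers[0]], apply=rewrite, insert_at=consumers[0])


graph = ToyGraph(
    [
        ToyNode("MatMul", ["x", "w"], ["xw"]),
        ToyNode("Add", ["xw", "b"], ["y"]),
        ToyNode("Relu", ["y"], ["z"]),
    ]
)
found = [match_matmul_add(graph, n) for n in graph.nodes]
```

Only the first node produces a match. The Add and Relu nodes return None because the toy pattern always starts from a MatMul.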

Debugging: method none

def none(
    self,
    node: Optional[NodeProto] = None,
    lineno: Optional[int] = None,
    msg: Optional[Union[Callable[[], str], str]] = None,
):

It may be useful to know why a pattern failed to match. Instead of returning None, method match can return the following expression:

return self.none(node, inspect.currentframe().f_lineno)

By setting the verbosity (see section Verbosity), the user can then see which line of the code returned None and therefore which condition failed. The last parameter, msg, is used to print a more detailed message about the reason why the match failed.
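
The idiom can be sketched in isolation as follows. ToyPattern is a hypothetical class, not the library's implementation. It assumes that the helper records the failure only when verbosity is enabled, which is one plausible reason for msg to accept a callable: the message is built only when it is actually needed.

```python
import inspect
from typing import Callable, Optional, Union


class ToyPattern:
    """Hypothetical pattern showing how a `none` helper can report failures."""

    def __init__(self, verbose: int = 0):
        self.verbose = verbose
        self.failures = []

    def none(
        self,
        node=None,
        lineno: Optional[int] = None,
        msg: Optional[Union[Callable[[], str], str]] = None,
    ):
        if self.verbose:
            # A callable message is evaluated only here, when it is displayed.
            text = msg() if callable(msg) else (msg or "")
            self.failures.append((self.__class__.__name__, lineno, text))
        return None

    def match(self, op_type: str, n_inputs: int):
        if op_type != "Add":
            return self.none(op_type, inspect.currentframe().f_lineno)
        if n_inputs != 2:
            return self.none(
                op_type,
                inspect.currentframe().f_lineno,
                msg=lambda: f"expected 2 inputs, got {n_inputs}",
            )
        return "match"


pattern = ToyPattern(verbose=1)
pattern.match("Mul", 2)
pattern.match("Add", 3)
for name, lineno, text in pattern.failures:
    print(f"[{name}] no match at line {lineno} {text}")
```

Each recorded line number points at the condition that rejected the node, which is usually enough to understand why a pattern did not fire.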

PatternOptimization.apply#

@classmethod
def apply(
    cls, g: "GraphBuilder", *nodes: Sequence[NodeProto]
) -> List[NodeProto]:

The method does the rewriting and assumes it is possible. It receives the list of nodes returned by method match, passed as positional arguments, and assumes no other pattern modified them or will modify them. Because the nodes are positional, method match may include None values, for instance to keep an optional node at a fixed position. The method returns the new nodes. The optimizer considers that every node given to this function is removed from the graph and every node returned by it is added. If a received node must be kept, it must be included in the list of returned nodes.
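
The removal and insertion rule can be illustrated on a toy graph. The sketch below is not the optimizer's code: nodes are plain tuples, and apply_match and fuse are hypothetical helpers which assume the new nodes take the place of the last matched node.

```python
from typing import Callable, List, Optional, Tuple

# (op_type, inputs, outputs), a simplified stand-in for NodeProto
Node = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def apply_match(
    graph: List[Node],
    matched: List[Optional[Node]],
    apply: Callable[..., List[Node]],
) -> List[Node]:
    """Removes every matched node and adds every node returned by apply."""
    new_nodes = apply(*matched)
    removed = [n for n in matched if n is not None]
    # Assumption: the new nodes are inserted where the last matched node was.
    position = max(graph.index(n) for n in removed)
    result = []
    for i, n in enumerate(graph):
        if i == position:
            result.extend(new_nodes)
        elif n not in removed:
            result.append(n)
    return result


def fuse(matmul: Node, add: Node, optional: Optional[Node] = None) -> List[Node]:
    """Replaces MatMul + Add with one Gemm; the third argument may be None."""
    bias = tuple(i for i in add[1] if i not in matmul[2])
    return [("Gemm", matmul[1] + bias, add[2])]


graph = [
    ("MatMul", ("x", "w"), ("xw",)),
    ("Add", ("xw", "b"), ("y",)),
    ("Relu", ("y",), ("z",)),
]
optimized = apply_match(graph, [graph[0], graph[1], None], fuse)
```

Had fuse needed to keep the MatMul, for instance because another node consumed its output, it would have to return it alongside the Gemm node.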

PatternOptimization.fast_op_type#

@classmethod
def fast_op_type(cls) -> Set[str]:

The base class returns an empty set. Overriding this method is an optional performance hint: when the returned set contains exactly one op_type string, the optimizer builds an op-type → nodes index over the graph once per matching step and restricts enumerate_matches to only the nodes of that type. This avoids iterating over the entire graph for patterns whose entry point is always a specific operator.

When the method returns an empty set (the default) or a set with more than one element, the full node list is used and no pre-filtering takes place.

from yobx.xoptim import PatternOptimization

class ReshapePattern(PatternOptimization):
    """Base class for patterns whose entry node is always a Reshape."""

    @classmethod
    def fast_op_type(cls):
        return {"Reshape"}

Subclasses that always start matching from the same inherited entry point do not need to override fast_op_type; the inherited implementation is already correct.
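
The pre-filtering described above can be sketched as follows. build_index and nodes_to_visit are hypothetical helper names and nodes are reduced to (op_type, name) pairs. Only the rule itself, which filters when the set holds exactly one op_type, comes from the description above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (op_type, name), a simplified stand-in for NodeProto


def build_index(nodes: List[Node]) -> Dict[str, List[Node]]:
    """Builds the op_type -> nodes index, once per matching step."""
    index: Dict[str, List[Node]] = defaultdict(list)
    for node in nodes:
        index[node[0]].append(node)
    return index


def nodes_to_visit(
    nodes: List[Node], index: Dict[str, List[Node]], fast_op_type: Set[str]
) -> List[Node]:
    """Restricts the search only when the pattern declares exactly one op_type."""
    if len(fast_op_type) == 1:
        (op_type,) = fast_op_type
        return index.get(op_type, [])
    # Empty set or several op types: the full node list is used.
    return nodes


nodes = [("MatMul", "m1"), ("Reshape", "r1"), ("Add", "a1"), ("Reshape", "r2")]
index = build_index(nodes)
only_reshape = nodes_to_visit(nodes, index, {"Reshape"})
everything = nodes_to_visit(nodes, index, set())
```

With this rule, a pattern declaring {"Reshape"} visits two nodes instead of four, while a pattern keeping the default empty set still sees the whole graph.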

Optimization Algorithm#

It is implemented in method optimize:

def optimize(
    self, max_iter=-1, remove_identity: bool = True
) -> List[Dict[str, Any]]:

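The algorithm runs iterations until the graph stops changing or max_iter is reached. When max_iter is left to its default value -1, the limit is derived from the number of nodes (the log in section Verbosity shows max_iter=40 for a graph of 5 nodes). An iteration is: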

matches = []

build all successors and predecessors

# Step 1: match

build op_type → nodes index (fast_nodes)

for all patterns P:

    # pre-filtered when P declares exactly one entry op_type
    if len(P.fast_op_type()) == 1:
        nodes_to_visit = fast_nodes[the only op_type in P.fast_op_type()]
    else:
        nodes_to_visit = all nodes

    for all nodes n in nodes_to_visit:

        r = P.match(n)
        if r:
            if no node already scheduled to be rewritten by another match:
                matches.append(r)

# Step 2: apply

for all matches r:
    apply the match r

# Step 3: clean

remove unused nodes
remove identity nodes

This algorithm may apply more than one rewriting per iteration, but it guarantees that the local structure a rewriting relies on was not altered by another rewriting applied in the same iteration.
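
The three steps can be turned into a small runnable sketch. It is a simplification under stated assumptions and not the library's implementation: nodes are tuples, patterns are plain functions, identity removal is left out, and every helper name is hypothetical. The consumer lookup scans all nodes for brevity, whereas the real optimizer builds successors and predecessors once per iteration to stay within the O(N P I) target.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (op_type, inputs, outputs), a simplified stand-in for NodeProto
Node = Tuple[str, Tuple[str, ...], Tuple[str, ...]]
Match = Tuple[List[Node], Callable[..., List[Node]]]


def fuse(matmul: Node, add: Node) -> List[Node]:
    bias = tuple(i for i in add[1] if i != matmul[2][0])
    return [("Gemm", matmul[1] + bias, add[2])]


def match_matmul_add(nodes: List[Node], node: Node) -> Optional[Match]:
    if node[0] != "MatMul":
        return None
    consumers = [n for n in nodes if node[2][0] in n[1]]
    if len(consumers) != 1 or consumers[0][0] != "Add":
        return None
    return [node, consumers[0]], fuse


def replace(nodes: List[Node], matched: List[Node], new_nodes: List[Node]) -> List[Node]:
    position = max(nodes.index(n) for n in matched)
    result = []
    for i, n in enumerate(nodes):
        if i == position:
            result.extend(new_nodes)
        elif n not in matched:
            result.append(n)
    return result


def remove_unused(nodes: List[Node], graph_outputs: Sequence[str]) -> List[Node]:
    used, kept = set(graph_outputs), []
    for n in reversed(nodes):  # nodes are assumed to be topologically sorted
        if any(o in used for o in n[2]):
            kept.append(n)
            used.update(n[1])
    return kept[::-1]


def optimize(nodes, patterns, graph_outputs, max_iter=10):
    for _ in range(max_iter):
        matches, scheduled = [], set()
        # Step 1: match
        for pattern in patterns:
            for node in nodes:
                r = pattern(nodes, node)
                if r is None or any(n in scheduled for n in r[0]):
                    continue  # no match, or a node is already claimed
                scheduled.update(r[0])
                matches.append(r)
        if not matches:
            break  # the graph is not evolving anymore
        # Step 2: apply
        for matched, apply in matches:
            nodes = replace(nodes, matched, apply(*matched))
        # Step 3: clean
        nodes = remove_unused(nodes, graph_outputs)
    return nodes


mlp = [
    ("MatMul", ("x", "w0"), ("mm0",)),
    ("Add", ("mm0", "b0"), ("linear",)),
    ("Relu", ("linear",), ("relu",)),
    ("MatMul", ("relu", "w2"), ("mm1",)),
    ("Add", ("mm1", "b2"), ("output_0",)),
]
optimized = optimize(mlp, [match_matmul_add], graph_outputs=["output_0"])
```

Both MatMul + Add pairs are rewritten in the same iteration because their node sets do not overlap. A second iteration finds nothing left to match and the loop stops.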

Adding a pattern#

Simple API#

We consider the following simple model:

<<<

import torch
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.xbuilder import OptimizationOptions
from yobx.torch import to_onnx


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)


x = torch.rand(3, 10)
onx = to_onnx(
    MLP(), (x,), input_names=["x"], options=OptimizationOptions(patterns=None)
)
with open("temp_doc_mlp.onnx", "wb") as f:
    f.write(onx.SerializeToString())
print(pretty_onnx(onx))

>>>

    opset: domain='' version=21
    input: name='x' type=dtype('float32') shape=[3, 10]
    init: name='p_layers_0_weight::T10' type=float32 shape=(10, 32)       -- GraphBuilder.constant_folding.from/fold(p_layers_0_weight)##p_layers_0_weight/DynamoInterpret.placeholder.1/P(layers.0.weight)
    init: name='p_layers_2_weight::T10' type=float32 shape=(32, 1)        -- GraphBuilder.constant_folding.from/fold(p_layers_2_weight)##p_layers_2_weight/DynamoInterpret.placeholder.1/P(layers.2.weight)
    init: name='layers.0.bias' type=float32 shape=(32,)                   -- DynamoInterpret.placeholder.1/P(layers.0.bias)
    init: name='layers.2.bias' type=float32 shape=(1,) -- array([-0.12150986], dtype=float32)-- DynamoInterpret.placeholder.1/P(layers.2.bias)
    MatMul(x, p_layers_0_weight::T10) -> _onx_matmul_x
      Add(_onx_matmul_x, layers.0.bias) -> linear
        Relu(linear) -> relu
          MatMul(relu, p_layers_2_weight::T10) -> _onx_matmul_relu
            Add(_onx_matmul_relu, layers.2.bias) -> output_0
    output: name='output_0' type=dtype('float32') shape=[3, 1]

Which we can render as follows:

digraph {
    graph [rankdir=TB, splines=true, overlap=false, nodesep=0.2, ranksep=0.2, fontsize=8];
    node [style="rounded,filled", color="#888888", fontcolor="#222222", shape=box];
    edge [arrowhead=vee, fontsize=7, labeldistance=-5, labelangle=0];
    I_0 [label="x\nFLOAT(3,10)", fillcolor="#aaeeaa"];
    i_1 [label="p_layers_0_weight::T10\nFLOAT(10, 32)", fillcolor="#cccc00"];
    i_2 [label="p_layers_2_weight::T10\nFLOAT(32, 1)", fillcolor="#cccc00"];
    i_3 [label="layers.0.bias\nFLOAT(32)", fillcolor="#cccc00"];
    MatMul_4 [label="MatMul(., .)", fillcolor="#ee9999"];
    Add_5 [label="Add(., .)", fillcolor="#cccccc"];
    Relu_6 [label="Relu(.)", fillcolor="#cccccc"];
    MatMul_7 [label="MatMul(., .)", fillcolor="#ee9999"];
    Add_8 [label="Add(., [-0.14222133])", fillcolor="#cccccc"];
    I_0 -> MatMul_4 [label="FLOAT(3,10)"];
    i_1 -> MatMul_4 [label="FLOAT(10, 32)"];
    MatMul_4 -> Add_5 [label="FLOAT(3,32)"];
    i_3 -> Add_5 [label="FLOAT(32)"];
    Add_5 -> Relu_6 [label="FLOAT(3,32)"];
    Relu_6 -> MatMul_7 [label="FLOAT(3,32)"];
    i_2 -> MatMul_7 [label="FLOAT(32, 1)"];
    MatMul_7 -> Add_8 [label="FLOAT(3,1)"];
    O_9 [label="output_0\nFLOAT(3,1)", fillcolor="#aaaaee"];
    Add_8 -> O_9;
}

We then apply the optimizations by writing the following code:

<<<

import onnx
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.xbuilder import GraphBuilder
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

# The model is loaded into a GraphBuilder.
# It creates dictionaries to store shapes, ranks and types
# to make it easier for the optimizers to find the information
# they need. Nodes are still stored as NodeProto.
gr = GraphBuilder(onx, infer_shapes_options=True)

# Let's optimize.
opt_onx = gr.to_onnx(optimize=True)
with open("temp_doc_mlp_opt.onnx", "wb") as f:
    f.write(opt_onx.SerializeToString())
print(pretty_onnx(opt_onx))

>>>

    opset: domain='' version=18
    input: name='x' type=dtype('float32') shape=[3, 10]
    init: name='layers.0.bias' type=float32 shape=(32,)                   -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: name='layers.2.bias' type=float32 shape=(1,) -- array([-0.14222133], dtype=float32)-- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    init: name='GemmTransposePattern--p_layers_0_weight::T10' type=float32 shape=(32, 10)-- GraphBuilder.constant_folding.from/fold(p_layers_0_weight::T10)##p_layers_0_weight::T10/GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
    init: name='GemmTransposePattern--p_layers_2_weight::T10' type=float32 shape=(1, 32)-- GraphBuilder.constant_folding.from/fold(init7_s2_1_32,p_layers_2_weight::T10)##p_layers_2_weight::T10/GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)##init7_s2_1_32/TransposeEqualReshapePattern.apply.new_shape
    Gemm(x, GemmTransposePattern--p_layers_0_weight::T10, layers.0.bias, transB=1) -> linear
      Relu(linear) -> relu
        Gemm(relu, GemmTransposePattern--p_layers_2_weight::T10, layers.2.bias, transB=1) -> output_0
    output: name='output_0' type=dtype('float32') shape=[3, 1]

Which renders as follows:

digraph {
    graph [rankdir=TB, splines=true, overlap=false, nodesep=0.2, ranksep=0.2, fontsize=8];
    node [style="rounded,filled", color="#888888", fontcolor="#222222", shape=box];
    edge [arrowhead=vee, fontsize=7, labeldistance=-5, labelangle=0];
    I_0 [label="x\nFLOAT(3,10)", fillcolor="#aaeeaa"];
    i_1 [label="layers.0.bias\nFLOAT(32)", fillcolor="#cccc00"];
    i_2 [label="GemmTransposePattern--p_layers_0_weight::T10\nFLOAT(32, 10)", fillcolor="#cccc00"];
    i_3 [label="GemmTransposePattern--p_layers_2_weight::T10\nFLOAT(1, 32)", fillcolor="#cccc00"];
    Gemm_4 [label="Gemm(., ., .)", fillcolor="#cccccc"];
    Relu_5 [label="Relu(.)", fillcolor="#cccccc"];
    Gemm_6 [label="Gemm(., ., [-0.14222133])", fillcolor="#cccccc"];
    I_0 -> Gemm_4 [label="FLOAT(3,10)"];
    i_2 -> Gemm_4 [label="FLOAT(32, 10)"];
    i_1 -> Gemm_4 [label="FLOAT(32)"];
    Gemm_4 -> Relu_5 [label="FLOAT(3,32)"];
    Relu_5 -> Gemm_6 [label="FLOAT(3,32)"];
    i_3 -> Gemm_6 [label="FLOAT(1, 32)"];
    O_7 [label="output_0\nFLOAT(3,1)", fillcolor="#aaaaee"];
    Gemm_6 -> O_7;
}
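
The optimized graph computes the same function because MatMul(x, W) + b equals Gemm(x, Wᵀ, b, transB=1), where Wᵀ is the transposed weight stored in the new initializer. The sketch below checks this identity on small integer matrices with plain Python lists. The helpers matmul, transpose and gemm are written for this illustration only and follow the ONNX definition of Gemm with alpha and beta left to 1.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def gemm(a: Matrix, b: Matrix, c: List[int], trans_b: int = 0) -> Matrix:
    """Y = A @ B' + C, where B' is B transposed when trans_b is 1."""
    if trans_b:
        b = transpose(b)
    return [[v + c[j] for j, v in enumerate(row)] for row in matmul(a, b)]


x = [[1, 2], [3, 4], [5, 6]]   # shape (3, 2)
w = [[1, 0, 2], [0, 1, 3]]     # shape (2, 3)
bias = [10, 20, 30]

# Original graph: MatMul followed by Add (the bias is broadcast on every row).
before = [[v + bias[j] for j, v in enumerate(row)] for row in matmul(x, w)]

# Optimized graph: one Gemm using the transposed weight and transB=1.
after = gemm(x, transpose(w), bias, trans_b=1)
```

Both computations return the same matrix, which is why the rewriting leaves the inputs and outputs of the graph unchanged.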

Verbosity#

<<<

import onnx
from yobx.xbuilder import GraphBuilder
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

gr = GraphBuilder(onx, infer_shapes_options=True, verbose=1)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-XKY._add_shape_information] dynamic shapes replacements={}
    [GraphBuilder-XKY.optimize] start with 5 nodes
    [GraphBuilder-XKY.optimize] #patterns=98
    [GraphBuilder-XKY.optimize] start with subgraphs
    [GraphBuilder-XKY.optimize] done with subgraphs
    [GraphBuilderPatternOptimization-XKY.optimize] start with 5 nodes, 4 initializers, 98 patterns, priorities=[0, 1, 2, 3], max_iter=40
    [GraphBuilderPatternOptimization-XKY.optimize] same children={'SameChildrenPattern', 'SameChildrenFromInputPattern'}
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-XKY.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-XKY.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 2: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-XKY.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 3: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-XKY.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.004 | max_time=IdentityPattern:0.000
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 4: 3 nodes, priority=3
    [GraphBuilderPatternOptimization-XKY.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.002 | max_time=GemmTransposePattern:0.000
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 5: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-XKY.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.004 | max_time=PadConvPattern:0.000
    [GraphBuilderPatternOptimization-XKY.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-XKY.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XKY.optimize] done after 7 iterations with 5 nodes in 0.067
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.shape_order] -- starts with 3 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 0.00023618200066266581s with changed=0 scale=0
    [GraphBuilder-XKY.optimize] done with 3 nodes in 0.077
    [GraphBuilder-XKY.to_onnx] make_model 4 inits 0 params
    [GraphBuilder-XKY.time_evaluation_constants_] 0
    [GraphBuilder-XKY._build_initializers] start with 4 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-XKY._build_initializers] switch low/high order
    [GraphBuilder-XKY._build_initializers] done in 9.733999831951223e-06s with 4 initializers, 0 large initializers
    [GraphBuilder-XKY._add_shape_information] dynamic shapes replacements={}

With more verbosity:

<<<

import onnx
from yobx.xbuilder import GraphBuilder
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

gr = GraphBuilder(onx, infer_shapes_options=True, verbose=11)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-UUQ._update_structures_with_proto] -- starts with 5 nodes
    [GraphBuilder-UUQ.set_shape] p_layers_0_weight::T10:(10, 32)
    [GraphBuilder-UUQ.set_rank] p_layers_0_weight::T10:2
    [GraphBuilder-UUQ.set_type] p_layers_0_weight::T10:1
    [GraphBuilder-UUQ.make_initializer] p_layers_0_weight::T10[1:(10, 32)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'p_layers_0_weight::T10', node=None
    [GraphBuilder-UUQ.set_shape] p_layers_2_weight::T10:(32, 1)
    [GraphBuilder-UUQ.set_rank] p_layers_2_weight::T10:2
    [GraphBuilder-UUQ.set_type] p_layers_2_weight::T10:1
    [GraphBuilder-UUQ.make_initializer] p_layers_2_weight::T10[1:(32, 1)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'p_layers_2_weight::T10', node=None
    [GraphBuilder-UUQ.set_shape] layers.0.bias:(32,)
    [GraphBuilder-UUQ.set_rank] layers.0.bias:1
    [GraphBuilder-UUQ.set_type] layers.0.bias:1
    [GraphBuilder-UUQ.make_initializer] layers.0.bias[1:(32,)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'layers.0.bias', node=None
    [GraphBuilder-UUQ.set_shape] layers.2.bias:(1,)
    [GraphBuilder-UUQ.set_rank] layers.2.bias:1
    [GraphBuilder-UUQ.set_type] layers.2.bias:1
    [GraphBuilder-UUQ.make_initializer] layers.2.bias[1:(1,)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'layers.2.bias', node=None
    [GraphBuilder-UUQ.set_type] x:1
    [GraphBuilder-UUQ.set_shape] x:(3, 10)
    [GraphBuilder-UUQ.set_rank] x:2
    [GraphBuilder-UUQ.set_type] output_0:1
    [GraphBuilder-UUQ.set_shape] output_0:(3, 1)
    [GraphBuilder-UUQ.set_rank] output_0:2
    [GraphBuilder-UUQ.set_type] _onx_matmul_x:1
    [GraphBuilder-UUQ.set_shape] _onx_matmul_x:(3, 32)
    [GraphBuilder-UUQ.set_rank] _onx_matmul_x:2
    [GraphBuilder-UUQ.set_type] linear:1
    [GraphBuilder-UUQ.set_shape] linear:(3, 32)
    [GraphBuilder-UUQ.set_rank] linear:2
    [GraphBuilder-UUQ.set_type] relu:1
    [GraphBuilder-UUQ.set_shape] relu:(3, 32)
    [GraphBuilder-UUQ.set_rank] relu:2
    [GraphBuilder-UUQ.set_type] _onx_matmul_relu:1
    [GraphBuilder-UUQ.set_shape] _onx_matmul_relu:(3, 1)
    [GraphBuilder-UUQ.set_rank] _onx_matmul_relu:2
    [GraphBuilder-UUQ.set_type] output_0:1
    [GraphBuilder-UUQ._update_structures_with_proto] ends with 5 nodes in 0.0022887389995958074
    [GraphBuilder-UUQ.constant_folding] -- starts with 4 constants and 5 nodes.
    [GraphBuilder-UUQ.constant_folding] cst:: . :: x
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-UUQ.constant_folding] cst:: . :: output_0
    [GraphBuilder-UUQ.constant_folding] cst:: . :: _onx_matmul_x
    [GraphBuilder-UUQ.constant_folding] cst:: . :: _onx_matmul_relu
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: . :: linear
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-UUQ.constant_folding] cst:: . :: relu
    [GraphBuilder-UUQ.constant_folding] initializer: p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: layers.0.bias
    [GraphBuilder-UUQ.constant_folding] initializer: layers.2.bias
    [GraphBuilder-UUQ.constant_folding] ends with 4 constants and 5 nodes in 0.0002557380003054277 seconds
    [GraphBuilder-UUQ._update_shape_types_with_proto] -- starts with 5 nodes and 0 shapes.
    [GraphBuilder._update_shape_types_with_proto] infer shapes
    [GraphBuilder._update_shape_types_with_proto] infer shapes done 0.0005857159994775429 seconds
    [GraphBuilder._update_shape_types_with_proto] _clean_shapes after 0.0007166739997046534 seconds
    [GraphBuilder-UUQ._update_shape_types_with_proto] walk through 0 shapes.
    [GraphBuilder-UUQ.set_type] _onx_matmul_x:1
    [_update_shape_types_with_proto_one_result] update shape(_onx_matmul_x) with (3, 32)
    [GraphBuilder-UUQ.set_type] linear:1
    [_update_shape_types_with_proto_one_result] update shape(linear) with (3, 32)
    [GraphBuilder-UUQ.set_type] relu:1
    [_update_shape_types_with_proto_one_result] update shape(relu) with (3, 32)
    [GraphBuilder-UUQ.set_type] _onx_matmul_relu:1
    [_update_shape_types_with_proto_one_result] update shape(_onx_matmul_relu) with (3, 1)
    [GraphBuilder-UUQ._update_shape_types_with_proto] ends in 0.0005067239999334561 seconds.
    [GraphBuilder-UUQ._add_shape_information] dynamic shapes replacements={}
    [GraphBuilder-UUQ.optimize] start with 5 nodes
    [GraphBuilder-UUQ.optimize] options=OptimizationOptions(constant_folding={'Reciprocal', 'Concat', 'Sub', 'Squeeze', 'Add', 'Transpose', 'Unsqueeze', 'Div', 'Cast', 'Reshape', 'Exp', 'Sqrt', 'Mul'}, patterns=[BatchNormalizationPattern(), BatchNormalizationTrainingPattern(), CastLayerNormalizationCastPattern(), CastPattern(), CastCastBinaryPattern(), CastCastPattern(), CastOpCastPattern(), ClipClipPattern(), ConcatEmptyPattern(), ConcatGatherPattern(), ConcatReshapePattern(), ConcatTwiceUnaryPattern(), ConstantToInitializerPattern(), ConvBiasNullPattern(), PadConvPattern(), DropoutPattern(), ExpandPattern(), ExpandBroadcastPattern(), ExpandSwapPattern(), ExpandUnsqueezeExpandPattern(), GathersSplitPattern(), GeluPattern(), IdentityPattern(), LayerNormalizationPattern(), LayerNormalizationScalePattern(), LeakyReluPattern(), MaxReluPattern(), MulMulMulScalarPattern(), MulUnsqueezeUnsqueezePattern(), NotNotPattern(), NotWherePattern(), ReduceArgTopKPattern(), ReduceReshapePattern(), ReduceSumNormalizePattern(), ReshapePattern(), ReshapeMatMulReshapePattern(), Reshape2Of3Pattern(), ReshapeReshapeBinaryPattern(), MatMulAddPattern(), GemmTransposePattern(), MatMulReshape2Of3Pattern(), MulMulMatMulPattern(), ShapeBasedReshapeIsSqueezePattern(), ShapeBasedStaticExpandPattern(), ShapeBasedConcatExpandPattern(), ShapeBasedEditDistanceReshapePattern(), ShapeBasedIdentityPattern(), ShapeBasedExpandBroadcastPattern(), ShapeBasedExpandBroadcastMatMulPattern(), ShapeBasedExpandCastWhereSwapPattern(), ShapeBasedExpandSwapPattern(), ShapeBasedMatMulToMulPattern(), ShapedBasedReshapePattern(), ShapeBasedSameChildrenPattern(), ShapeBasedShapeShapeAddPattern(), ReshapeReshapePattern(), RotaryEmbeddingPattern(), SameChildrenPattern(), SameChildrenFromInputPattern(), SequenceConstructAtPattern(), SplitToSequenceSequenceAtPattern(), SliceSlicePattern(), SlicesSplitPattern(), SoftmaxCrossEntropyLossCastPattern(), SplitConcatPattern(), SqueezeAddPattern(), 
SqueezeBinaryUnsqueezePattern(), SqueezeUnsqueezePattern(), StaticConcatReshapePattern(), Sub1MulPattern(), SwapExpandReshapePattern(), SwapExpandUnsqueezePattern(), SwapRangeAddScalarPattern(), SwapUnaryPattern(), SwapUnsqueezeTransposePattern(), SwitchOrderBinaryPattern(), SwitchReshapeActivationPattern(), TransposeEqualReshapePattern(), TransposeGatherPattern(), TransposeMatMulPattern(), TransposeReshapeMatMulPattern(), TransposeReshapeTransposePattern(), TransposeTransposePattern(), UnsqueezeEqualPattern(), UnsqueezeOrSqueezeReshapePattern(), UnsqueezeReshapePattern(), UnsqueezeUnsqueezePattern(), WhereAddPattern(), RotaryConcatPartPattern(), FunctionAttentionPattern(), FunctionAttentionGQAPattern(), FunctionCausalMaskPattern(), FunctionCausalMaskMulAddPattern(), FunctionCosSinCachePattern(), FunctionHalfRotaryEmbeddingPattern(), RMSNormalizationPattern(), RMSNormalizationMulPattern(), AttentionGQAPattern()], verbose=11, order=SHAPE)
    -- GRAPH BEFORE OPTIMIZATION --
    
    opset: : 18
    init: p_layers_0_weight::T10: CP1: (10, 32)                            -- GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
    init: p_layers_2_weight::T10: CP1: (32, 1)                             -- GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)
    init: layers.0.bias: CP1: (32,)                                        -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: layers.2.bias: CP1: (1,)                                         -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    input:: x                                                                       |T1: 3 x 10
    MatMul: x, p_layers_0_weight::T10 -> _onx_matmul_x                              |T1: 3 x 32
    Add: _onx_matmul_x, layers.0.bias -> linear                                     |T1: 3 x 32
    Relu: linear -> relu                                                            |T1: 3 x 32
    MatMul: relu, p_layers_2_weight::T10 -> _onx_matmul_relu                        |T1: 3 x 1
    Add: _onx_matmul_relu, layers.2.bias -> output_0                                |T1: 3 x 1
    output:: output_0                                                               |T1: 3 x 1
    -- END --
    [GraphBuilder-UUQ.optimize] start with subgraphs
    [GraphBuilder-UUQ.optimize] done with subgraphs
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00019540899938874645 seconds
    [GraphBuilder-UUQ.constant_folding] -- starts with 4 constants and 5 nodes.
    [GraphBuilder-UUQ.constant_folding] cst:: . :: x
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-UUQ.constant_folding] cst:: . :: output_0
    [GraphBuilder-UUQ.constant_folding] cst:: . :: _onx_matmul_x
    [GraphBuilder-UUQ.constant_folding] cst:: . :: _onx_matmul_relu
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: . :: linear
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-UUQ.constant_folding] cst:: . :: relu
    [GraphBuilder-UUQ.constant_folding] initializer: p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: layers.0.bias
    [GraphBuilder-UUQ.constant_folding] initializer: layers.2.bias
    [GraphBuilder-UUQ.constant_folding] ends with 4 constants and 5 nodes in 0.00015339699984906474 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] start with 5 nodes, 4 initializers, 98 patterns, priorities=[0, 1, 2, 3], max_iter=40
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   1/98 - P0 - BatchNormalizationPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   2/98 - P0 - BatchNormalizationTrainingPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   3/98 - P0 - CastCastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   4/98 - P0 - CastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   5/98 - P0 - ConcatGatherPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   6/98 - P0 - ConcatReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   7/98 - P0 - ConvBiasNullPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   8/98 - P0 - ExpandPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern   9/98 - P0 - ExpandUnsqueezeExpandPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  10/98 - P0 - FunctionAttentionGQAPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  11/98 - P0 - FunctionAttentionPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  12/98 - P0 - GeluPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  13/98 - P0 - IdentityPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  14/98 - P0 - LeakyReluPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  15/98 - P0 - MulUnsqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  16/98 - P0 - PadConvPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  17/98 - P0 - ReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  18/98 - P0 - ReshapeReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  19/98 - P0 - SameChildrenFromInputPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  20/98 - P0 - SameChildrenPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  21/98 - P0 - ShapeBasedEditDistanceReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  22/98 - P0 - ShapeBasedIdentityPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  23/98 - P0 - ShapeBasedReshapeIsSqueezePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  24/98 - P0 - ShapeBasedSameChildrenPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  25/98 - P0 - ShapeBasedShapeShapeAddPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  26/98 - P0 - ShapeBasedStaticExpandPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  27/98 - P0 - ShapedBasedReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  28/98 - P0 - SoftmaxCrossEntropyLossCastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  29/98 - P0 - SqueezeAddPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  30/98 - P0 - SqueezeBinaryUnsqueezePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  31/98 - P0 - SqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  32/98 - P0 - StaticConcatReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  33/98 - P0 - SwapExpandReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  34/98 - P0 - SwapExpandUnsqueezePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  35/98 - P0 - SwapUnaryPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  36/98 - P0 - SwapUnsqueezeTransposePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  37/98 - P0 - TransposeGatherPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  38/98 - P0 - TransposeReshapeTransposePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  39/98 - P0 - TransposeTransposePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  40/98 - P0 - UnsqueezeOrSqueezeReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  41/98 - P0 - UnsqueezeReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  42/98 - P0 - UnsqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  43/98 - P1 - CastCastBinaryPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  44/98 - P1 - CastLayerNormalizationCastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  45/98 - P1 - CastOpCastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  46/98 - P1 - ClipClipPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  47/98 - P1 - ConcatEmptyPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  48/98 - P1 - ConcatTwiceUnaryPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  49/98 - P1 - ConstantToInitializerPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  50/98 - P1 - DropoutPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  51/98 - P1 - ExpandBroadcastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  52/98 - P1 - ExpandSwapPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  53/98 - P1 - FunctionCausalMaskMulAddPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  54/98 - P1 - FunctionCausalMaskPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  55/98 - P1 - FunctionCosSinCachePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  56/98 - P1 - FunctionHalfRotaryEmbeddingPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  57/98 - P1 - GathersSplitPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  58/98 - P1 - GemmTransposePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  59/98 - P1 - LayerNormalizationPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  60/98 - P1 - LayerNormalizationScalePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  61/98 - P1 - MatMulReshape2Of3Pattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  62/98 - P1 - MaxReluPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  63/98 - P1 - MulMulMatMulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  64/98 - P1 - MulMulMulScalarPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  65/98 - P1 - NotNotPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  66/98 - P1 - NotWherePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  67/98 - P1 - RMSNormalizationMulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  68/98 - P1 - RMSNormalizationPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  69/98 - P1 - ReduceArgTopKPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  70/98 - P1 - ReduceReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  71/98 - P1 - ReduceSumNormalizePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  72/98 - P1 - Reshape2Of3Pattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  73/98 - P1 - ReshapeMatMulReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  74/98 - P1 - ReshapeReshapeBinaryPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  75/98 - P1 - RotaryConcatPartPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  76/98 - P1 - RotaryEmbeddingPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  77/98 - P1 - SequenceConstructAtPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  78/98 - P1 - ShapeBasedConcatExpandPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  79/98 - P1 - ShapeBasedExpandBroadcastMatMulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  80/98 - P1 - ShapeBasedExpandBroadcastPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  81/98 - P1 - ShapeBasedExpandCastWhereSwapPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  82/98 - P1 - ShapeBasedExpandSwapPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  83/98 - P1 - ShapeBasedMatMulToMulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  84/98 - P1 - SliceSlicePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  85/98 - P1 - SlicesSplitPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  86/98 - P1 - SplitConcatPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  87/98 - P1 - SplitToSequenceSequenceAtPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  88/98 - P1 - Sub1MulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  89/98 - P1 - SwapRangeAddScalarPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  90/98 - P1 - SwitchOrderBinaryPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  91/98 - P1 - SwitchReshapeActivationPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  92/98 - P1 - TransposeEqualReshapePattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  93/98 - P1 - TransposeMatMulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  94/98 - P1 - TransposeReshapeMatMulPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  95/98 - P1 - UnsqueezeEqualPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  96/98 - P1 - WhereAddPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  97/98 - P2 - AttentionGQAPattern()
    [GraphBuilderPatternOptimization-UUQ.optimize] use pattern  98/98 - P3 - MatMulAddPattern()
    -- optimize starts with...
    
    opset: : 18
    init: p_layers_0_weight::T10: CP1: (10, 32)                            -- GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
    init: p_layers_2_weight::T10: CP1: (32, 1)                             -- GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)
    init: layers.0.bias: CP1: (32,)                                        -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: layers.2.bias: CP1: (1,)                                         -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    input:: x                                                                       |T1: 3 x 10
    MatMul: x, p_layers_0_weight::T10 -> _onx_matmul_x                              |T1: 3 x 32
    Add: _onx_matmul_x, layers.0.bias -> linear                                     |T1: 3 x 32
    Relu: linear -> relu                                                            |T1: 3 x 32
    MatMul: relu, p_layers_2_weight::T10 -> _onx_matmul_relu                        |T1: 3 x 1
    Add: _onx_matmul_relu, layers.2.bias -> output_0                                |T1: 3 x 1
    output:: output_0                                                               |T1: 3 x 1
    -- starts optimization
    [GraphBuilderPatternOptimization-UUQ.optimize] same children={'SameChildrenPattern', 'SameChildrenFromInputPattern'}
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips CastLayerNormalizationCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips CastCastBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips CastOpCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ClipClipPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ConcatEmptyPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ConcatTwiceUnaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ConstantToInitializerPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips DropoutPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips GathersSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [GraphBuilderPatternOptimization-UUQ.optimize] skips LayerNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips LayerNormalizationScalePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [GraphBuilder-ZVK.make_tensor_input] x[0:None] -- marker=_build_pattern1_x
    [GraphBuilder-ZVK.set_type] x:0
    [GraphBuilder-ZVK.set_type] x:-1
    [GraphBuilder-ZVK.make_tensor_input] zero[0:None] -- marker=_build_pattern1_zero
    [GraphBuilder-ZVK.set_type] zero:0
    [GraphBuilder-ZVK.set_type] zero:-1
    [GraphBuilder-ZVK.make_tensor_input] slope[0:None] -- marker=_build_pattern1_slope
    [GraphBuilder-ZVK.set_type] slope:0
    [GraphBuilder-ZVK.set_type] slope:-1
    [GraphBuilder-ZVK.3.make_node] [tt:-] Greater: ['x', 'zero']->['_onx_greater_x']
    [GraphBuilder-ZVK.set_type] _onx_greater_x:9
    [GraphBuilder-ZVK.3.make_node] [tt:-] Mul: ['x', 'slope']->['_onx_mul_x']
    [GraphBuilder-ZVK.set_type] _onx_mul_x:-1
    [GraphBuilder-ZVK.3.make_node] [ttt:-] Where: ['_onx_greater_x', 'x', '_onx_mul_x']->['_onx_where_greater_x']
    [GraphBuilder-ZVK.set_type] _onx_where_greater_x:-1
    [GraphBuilder-ZVK.make_tensor_output] _onx_where_greater_x[0: None]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MaxReluPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MulMulMulScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips NotNotPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips NotWherePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ReduceArgTopKPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ReduceReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ReduceSumNormalizePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ReshapeMatMulReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips Reshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ReshapeReshapeBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips GemmTransposePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MatMulReshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MulMulMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ShapeBasedConcatExpandPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ShapeBasedExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ShapeBasedExpandBroadcastMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ShapeBasedExpandCastWhereSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ShapeBasedExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips ShapeBasedMatMulToMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips RotaryEmbeddingPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SequenceConstructAtPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SplitToSequenceSequenceAtPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SliceSlicePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SlicesSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [GraphBuilder-ZXG.make_tensor_input] X[0:None] -- marker=_build_pattern1_X
    [GraphBuilder-ZXG.set_type] X:0
    [GraphBuilder-ZXG.set_type] X:-1
    [GraphBuilder-ZXG.make_tensor_input] indices[0:None] -- marker=_build_pattern1_indices
    [GraphBuilder-ZXG.set_type] indices:0
    [GraphBuilder-ZXG.set_type] indices:-1
    [GraphBuilder-ZXG.make_tensor_input] axis[0:None] -- marker=_build_pattern1_axis
    [GraphBuilder-ZXG.set_type] axis:0
    [GraphBuilder-ZXG.set_type] axis:-1
    [GraphBuilder-ZXG.make_tensor_input] zerof[0:None] -- marker=_build_pattern1_zerof
    [GraphBuilder-ZXG.set_type] zerof:0
    [GraphBuilder-ZXG.set_type] zerof:-1
    [GraphBuilder-ZXG.make_tensor_input] zeroi[0:None] -- marker=_build_pattern1_zeroi
    [GraphBuilder-ZXG.set_type] zeroi:0
    [GraphBuilder-ZXG.set_type] zeroi:-1
    [GraphBuilder-ZXG.make_tensor_input] b[0:None] -- marker=_build_pattern1_b
    [GraphBuilder-ZXG.set_type] b:0
    [GraphBuilder-ZXG.set_type] b:-1
    [GraphBuilder-ZXG.3.make_node] [tt:-] Equal: ['indices', 'b']->['_onx_equal_indices']
    [GraphBuilder-ZXG.set_type] _onx_equal_indices:9
    [GraphBuilder-ZXG.3.make_node] [t:-] Not: ['_onx_equal_indices']->['_onx_not_equal_indices']
    [GraphBuilder-ZXG.set_type] _onx_not_equal_indices:9
    [GraphBuilder-ZXG.3.make_node] [ttt:-] Where: ['_onx_not_equal_indices', 'indices', 'zeroi']->['_onx_where_not_equal_indices']
    [GraphBuilder-ZXG.set_type] _onx_where_not_equal_indices:-1
    [GraphBuilder-ZXG.3.make_node] [tt:-] Unsqueeze: ['_onx_where_not_equal_indices', 'axis']->['_onx_where_not_equal_indices::UnSq']
    [GraphBuilder-ZXG.set_type] _onx_where_not_equal_indices::UnSq:-1
    [GraphBuilder-ZXG.3.make_node] [t:-] LogSoftmax: ['X']->['_onx_logsoftmax_X']
    [GraphBuilder-ZXG.set_type] _onx_logsoftmax_X:-1
    [GraphBuilder-ZXG.set_type] _onx_gatherelements_logsoftmax_X:-1
    [GraphBuilder-ZXG.3.make_node] [tt:t] GatherElements: ['_onx_logsoftmax_X', '_onx_where_not_equal_indices::UnSq']->['_onx_gatherelements_logsoftmax_X']
    [GraphBuilder-ZXG.set_type] _onx_gatherelements_logsoftmax_X:-1
    [GraphBuilder-ZXG.3.make_node] [tt:-] Squeeze: ['_onx_gatherelements_logsoftmax_X', 'axis']->['_onx_gatherelements_logsoftmax_X::Sq']
    [GraphBuilder-ZXG.set_type] _onx_gatherelements_logsoftmax_X::Sq:-1
    [GraphBuilder-ZXG.3.make_node] [t:-] Neg: ['_onx_gatherelements_logsoftmax_X::Sq']->['_onx_neg_gatherelements_logsoftmax_X::Sq']
    [GraphBuilder-ZXG.set_type] _onx_neg_gatherelements_logsoftmax_X::Sq:-1
    [GraphBuilder-ZXG.3.make_node] [ttt:-] Where: ['_onx_not_equal_indices', '_onx_neg_gatherelements_logsoftmax_X::Sq', 'zerof']->['_onx_where_not_equal_indices2']
    [GraphBuilder-ZXG.set_type] _onx_where_not_equal_indices2:-1
    [GraphBuilder-ZXG.3.make_node] [t:-] Cast: ['_onx_not_equal_indices']->['_onx_not_equal_indices::C1']
    [GraphBuilder-ZXG.set_type] _onx_not_equal_indices::C1:1
    [GraphBuilder-ZXG.3.make_node] [t:-] ReduceSum: ['_onx_not_equal_indices::C1']->['_onx_reducesum_not_equal_indices::C1']
    [GraphBuilder-ZXG.set_shape] _onx_reducesum_not_equal_indices::C1:()
    [GraphBuilder-ZXG.set_rank] _onx_reducesum_not_equal_indices::C1:0
    [GraphBuilder-ZXG.set_type] _onx_reducesum_not_equal_indices::C1:1
    [GraphBuilder-ZXG.3.make_node] [#:-] Cast: ['_onx_reducesum_not_equal_indices::C1']->['_onx_reducesum_not_equal_indices::C1::C10']
    [GraphBuilder-ZXG.set_type] _onx_reducesum_not_equal_indices::C1::C10:10
    [GraphBuilder-ZXG.set_shape] _onx_reducesum_not_equal_indices::C1::C10:()
    [GraphBuilder-ZXG.set_rank] _onx_reducesum_not_equal_indices::C1::C10:0
    [GraphBuilder-ZXG.3.make_node] [t:-] Cast: ['_onx_where_not_equal_indices2']->['_onx_where_not_equal_indices2::C1']
    [GraphBuilder-ZXG.set_type] _onx_where_not_equal_indices2::C1:1
    [GraphBuilder-ZXG.3.make_node] [t:-] ReduceSum: ['_onx_where_not_equal_indices2::C1']->['_onx_reducesum_where_not_equal_indices2::C1']
    [GraphBuilder-ZXG.set_shape] _onx_reducesum_where_not_equal_indices2::C1:()
    [GraphBuilder-ZXG.set_rank] _onx_reducesum_where_not_equal_indices2::C1:0
    [GraphBuilder-ZXG.set_type] _onx_reducesum_where_not_equal_indices2::C1:1
    [GraphBuilder-ZXG.3.make_node] [#:-] Cast: ['_onx_reducesum_where_not_equal_indices2::C1']->['_onx_reducesum_where_not_equal_indices2::C1::C10']
    [GraphBuilder-ZXG.set_type] _onx_reducesum_where_not_equal_indices2::C1::C10:10
    [GraphBuilder-ZXG.set_shape] _onx_reducesum_where_not_equal_indices2::C1::C10:()
    [GraphBuilder-ZXG.set_rank] _onx_reducesum_where_not_equal_indices2::C1::C10:0
    [GraphBuilder-ZXG.3.make_node] [##:-] Div: ['_onx_reducesum_where_not_equal_indices2::C1::C10', '_onx_reducesum_not_equal_indices::C1::C10']->['_onx_div_reducesum_where_not_equal_indices2::C1::C10']
    [GraphBuilder-ZXG.set_type] _onx_div_reducesum_where_not_equal_indices2::C1::C10:10
    [GraphBuilder-ZXG.set_shape] _onx_div_reducesum_where_not_equal_indices2::C1::C10:()
    [GraphBuilder-ZXG.set_rank] _onx_div_reducesum_where_not_equal_indices2::C1::C10:0
    [GraphBuilder-ZXG.make_tensor_output] _onx_div_reducesum_where_not_equal_indices2::C1::C10[0: None]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SplitConcatPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips Sub1MulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SwapRangeAddScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SwitchOrderBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips SwitchReshapeActivationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips TransposeEqualReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips TransposeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips TransposeReshapeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips UnsqueezeEqualPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips WhereAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips RotaryConcatPartPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips FunctionCausalMaskPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips FunctionCausalMaskMulAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips FunctionCosSinCachePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips FunctionHalfRotaryEmbeddingPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips RMSNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips RMSNormalizationMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] skips AttentionGQAPattern, pattern.priority=2, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0 - matching_step done 0
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00015922600050544133 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=0C1F0 - next
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 454:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [CastOpCastPattern.match] NONE - line: 451:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=1, priorities[current_priority_index]=1 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] skips AttentionGQAPattern, pattern.priority=2, current_priority_index=1, priorities[current_priority_index]=1 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0 - matching_step done 0
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00017109899999923073 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-UUQ.optimize] it=1C1F0 - next
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 2: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 454:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [CastOpCastPattern.match] NONE - line: 451:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [GraphBuilderPatternOptimization-UUQ.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=2, priorities[current_priority_index]=2 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0 - matching_step done 0
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.0001176429996121442 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-UUQ.optimize] it=2C1F0 - next
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 3: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 454:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [CastOpCastPattern.match] NONE - line: 451:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatchResult.match] MATCH MatMulAddPattern with 2 nodes and types ['MatMul', 'Add'] - []
    [GraphBuilderPatternOptimization-UUQ.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
    [MatchResult.match] MATCH MatMulAddPattern with 2 nodes and types ['MatMul', 'Add'] - []
    [GraphBuilderPatternOptimization-UUQ.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
    [TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C0 - matching_step done 2
    [GraphBuilderPatternOptimization-UUQ.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.011 | max_time=FunctionAttentionPattern:0.004
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C0F1 - apply_step with 2 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: ['x', 'p_layers_0_weight::T10', '_onx_matmul_x', 'layers.0.bias'], outputs: ['_onx_matmul_x', 'linear']
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
      - MatMul: ['x', 'p_layers_0_weight::T10'] -> ['_onx_matmul_x']
      - Add: ['_onx_matmul_x', 'layers.0.bias'] -> ['linear']
      + Gemm: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
    [GraphBuilder-UUQ.set_type] linear:1
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
    [GraphBuilderPatternOptimization-UUQ.optimize] - add ['Gemm']
    [GraphBuilderPatternOptimization-UUQ.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] removed outputs {'_onx_matmul_x'}
    [GraphBuilderPatternOptimization-UUQ.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: ['relu', 'p_layers_2_weight::T10', '_onx_matmul_relu', 'layers.2.bias'], outputs: ['_onx_matmul_relu', 'output_0']
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
      - MatMul: ['relu', 'p_layers_2_weight::T10'] -> ['_onx_matmul_relu']
      - Add: ['_onx_matmul_relu', 'layers.2.bias'] -> ['output_0']
      + Gemm: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
    [GraphBuilder-UUQ.set_type] output_0:1
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
    [GraphBuilderPatternOptimization-UUQ.optimize] - add ['Gemm']
    [GraphBuilderPatternOptimization-UUQ.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] removed outputs {'_onx_matmul_relu'}
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - done with 2 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -4 +2 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - remove_duplicated_shape done -4 +2 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 3
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 3 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 3 nodes in 0.00011734300005628029 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - remove_identity done -4 +2 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - remove_unused done -4 +2 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=3C1F1 - next
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 4: 3 nodes, priority=3
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 130:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--, inputs=x,p_layers_0_weight::T10,layers.0.bias
    [MatMulAddPattern.match] NONE - line: 127:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--2, inputs=relu,p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [MatchResult.match] MATCH GemmTransposePattern with 1 nodes and types ['Gemm'] - []
    [GraphBuilderPatternOptimization-UUQ.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
    [MatchResult.match] MATCH GemmTransposePattern with 1 nodes and types ['Gemm'] - []
    [GraphBuilderPatternOptimization-UUQ.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--, inputs=x,p_layers_0_weight::T10,layers.0.bias
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--2, inputs=relu,p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C0 - matching_step done 2
    [GraphBuilderPatternOptimization-UUQ.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.003 | max_time=MatMulAddPattern:0.000
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C0F1 - apply_step with 2 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'], outputs: ['linear']
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=Transpose
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
      - Gemm: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
      + Transpose: ['p_layers_0_weight::T10'] -> ['GemmTransposePattern--p_layers_0_weight::T10']
      + Gemm: ['x', 'GemmTransposePattern--p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=Transpose
    [GraphBuilder-UUQ.set_type] GemmTransposePattern--p_layers_0_weight::T10:1
    [GraphBuilder-UUQ.set_shape] GemmTransposePattern--p_layers_0_weight::T10:(32, 10)
    [GraphBuilder-UUQ.set_rank] GemmTransposePattern--p_layers_0_weight::T10:2
    [GraphBuilder-UUQ.set_type] linear:1
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
    [GraphBuilderPatternOptimization-UUQ.optimize] - add ['Transpose', 'Gemm']
    [GraphBuilderPatternOptimization-UUQ.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'], outputs: ['output_0']
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Transpose
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
      - Gemm: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
      + Transpose: ['p_layers_2_weight::T10'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
      + Gemm: ['relu', 'GemmTransposePattern--p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Transpose
    [GraphBuilder-UUQ.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilder-UUQ.set_shape] GemmTransposePattern--p_layers_2_weight::T10:(1, 32)
    [GraphBuilder-UUQ.set_rank] GemmTransposePattern--p_layers_2_weight::T10:2
    [GraphBuilder-UUQ.set_type] output_0:1
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
    [GraphBuilderPatternOptimization-UUQ.optimize] - add ['Transpose', 'Gemm']
    [GraphBuilderPatternOptimization-UUQ.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - done with 2 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -2 +4 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - remove_duplicated_shape done -2 +4 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00014761400052520912 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - remove_identity done -2 +4 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - remove_unused done -2 +4 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=4C1F1 - next
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 5: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 649:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [IdentityPattern.match] NONE - line: 649:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 130:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [MatMulAddPattern.match] NONE - line: 127:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [ShapeBasedIdentityPattern.match] NONE - line: 880:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [ShapeBasedIdentityPattern.match] NONE - line: 880:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [SwapUnsqueezeTransposePattern.match] NONE - line: 715:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [SwapUnsqueezeTransposePattern.match] NONE - line: 715:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 493:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [MatchResult.match] MATCH TransposeEqualReshapePattern with 1 nodes and types ['Transpose'] - []
    [GraphBuilderPatternOptimization-UUQ.optimize] match=MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1231:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [TransposeMatMulPattern.match] NONE - line: 1231:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 245:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [TransposeReshapeTransposePattern.match] NONE - line: 245:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 99:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [TransposeTransposePattern.match] NONE - line: 99:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C0 - matching_step done 1
    [GraphBuilderPatternOptimization-UUQ.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.005 | max_time=SoftmaxCrossEntropyLossCastPattern:0.000
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C0F1 - apply_step with 1 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] apply MatchResult: TransposeEqualReshapePattern replaces ['Transpose'], inputs: ['p_layers_2_weight::T10'], outputs: ['GemmTransposePattern--p_layers_2_weight::T10']
    [GraphBuilder-UUQ.set_shape] init7_s2_1_32:(2,)
    [GraphBuilder-UUQ.set_rank] init7_s2_1_32:1
    [GraphBuilder-UUQ.set_type] init7_s2_1_32:7
    [GraphBuilder-UUQ.make_initializer] init7_s2_1_32[7:(2,)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'init7_s2_1_32', node=None
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Reshape
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
      - Transpose: ['p_layers_2_weight::T10'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
      + Reshape: ['p_layers_2_weight::T10', 'init7_s2_1_32'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Reshape
    [GraphBuilder-UUQ.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilder-UUQ.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilderPatternOptimization-UUQ.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] applied.
    [GraphBuilderPatternOptimization-UUQ.optimize] - add ['Reshape']
    [GraphBuilderPatternOptimization-UUQ.optimize] done MatchResult: TransposeEqualReshapePattern replaces ['Transpose']: -1 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - done with 1 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -1 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - remove_duplicated_shape done -1 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00010745700001280056 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - remove_identity done -1 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - remove_unused done -1 +1 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=5C1F1 - next
    [GraphBuilderPatternOptimization-UUQ.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [ConcatReshapePattern.match] NONE - line: 1079:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 649:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [ReshapePattern.match] NONE - line: 42:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 130:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [MatMulAddPattern.match] NONE - line: 127:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [ShapeBasedReshapeIsSqueezePattern.match] NONE - line: 1689:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [ShapeBasedEditDistanceReshapePattern.match] NONE - line: 1538:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [ShapeBasedIdentityPattern.match] NONE - line: 880:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [ShapedBasedReshapePattern.match] NONE - line: 121:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [ReshapeReshapePattern.match] NONE - line: 352:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [StaticConcatReshapePattern.match] NONE - line: 1256:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [SwapExpandReshapePattern.match] NONE - line: 1724:yobx.xoptim.patterns.onnx_expand, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [SwapUnsqueezeTransposePattern.match] NONE - line: 715:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 493:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1231:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 245:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 99:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [UnsqueezeOrSqueezeReshapePattern.match] NONE - line: 1923:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [UnsqueezeReshapePattern.match] NONE - line: 1796:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0 - matching_step done 0
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-UUQ.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - remove_identity
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00014310699953057338 seconds
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - remove_unused
    [GraphBuilderPatternOptimization-UUQ.optimize] it=6C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-UUQ.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-UUQ.optimize] done after 7 iterations with 5 nodes in 0.086
        STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0025873079994198633
        STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.001986579000003985
        STAT apply_TransposeEqualReshapePattern +1 -1 #it=1 maxmatch=0 i=1 - time=0.0011033079999833717
        STAT build_graph_for_pattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0010290670006725122
        STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=9.849100024439394e-05
        STAT check_pattern_A10 +0 -0 #it=3 maxmatch=0 i=0 - time=1.9268999494670425e-05
        STAT check_pattern_A20 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0009831639999902109
        STAT check_pattern_BD0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0008387780007979018
        STAT check_pattern_BI0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0006436960011342308
        STAT check_pattern_BUS0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0006156519993965048
        STAT insert_and_remove_nodes +0 -0 #it=0 maxmatch=0 i=0 - time=0.002682608000213804
        STAT iteration_0 +0 -0 #it=1 maxmatch=0 i=0 - time=0.023111839000193868
        STAT iteration_1 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007610972000293259
        STAT iteration_2 +0 -0 #it=1 maxmatch=0 i=0 - time=0.010470103999978164
        STAT iteration_3 +0 -0 #it=1 maxmatch=0 i=0 - time=0.017805355999371386
        STAT iteration_4 +0 -0 #it=1 maxmatch=0 i=0 - time=0.009131378999882145
        STAT iteration_5 +0 -0 #it=1 maxmatch=0 i=0 - time=0.009869566999441304
        STAT match_AttentionGQAPattern +0 -0 #it=5 maxmatch=2 i=0 - time=0.000118566998935421
        STAT match_BatchNormalizationPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0004154540010858909
        STAT match_BatchNormalizationTrainingPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00020817299991904292
        STAT match_CastCastBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0006397309998646961
        STAT match_CastCastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002071959988825256
        STAT match_CastLayerNormalizationCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00020809100169572048
        STAT match_CastOpCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005632290003632079
        STAT match_CastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00019877599970641313
        STAT match_ClipClipPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00016012000014598016
        STAT match_ConcatEmptyPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0002266100009364891
        STAT match_ConcatGatherPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00019352100025571417
        STAT match_ConcatReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00028873199971712893
        STAT match_ConcatTwiceUnaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001793769988580607
        STAT match_ConstantToInitializerPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001906300012706197
        STAT match_ConvBiasNullPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00023267199867404997
        STAT match_DropoutPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001427449997208896
        STAT match_ExpandBroadcastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00022996400002739392
        STAT match_ExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0003182799991918728
        STAT match_ExpandSwapPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00014828499934083084
        STAT match_ExpandUnsqueezeExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00016460100050608162
        STAT match_FunctionAttentionGQAPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00029265099965414265
        STAT match_FunctionAttentionPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.004754586998387822
        STAT match_FunctionCausalMaskMulAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0004079489981450024
        STAT match_FunctionCausalMaskPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018986400027642958
        STAT match_FunctionCosSinCachePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014747500244993716
        STAT match_FunctionHalfRotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016214599872910185
        STAT match_GathersSplitPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00021581799865089124
        STAT match_GeluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=8.492000051774085e-05
        STAT match_GemmTransposePattern +0 -0 #it=6 maxmatch=2 i=2 - time=0.0006385269998645526
        STAT match_IdentityPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0031763790002514725
        STAT match_LayerNormalizationPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00017838500025391113
        STAT match_LayerNormalizationScalePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00013619199944514548
        STAT match_LeakyReluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.006065037998268963
        STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.0011170280013175216
        STAT match_MatMulReshape2Of3Pattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00048357699961343314
        STAT match_MaxReluPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00014370400185725885
        STAT match_MulMulMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00032005800039769383
        STAT match_MulMulMulScalarPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00019407100080570672
        STAT match_MulUnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00019753099877561908
        STAT match_NotNotPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00013650300024892204
        STAT match_NotWherePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001320589990427834
        STAT match_PadConvPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00017515599938633386
        STAT match_RMSNormalizationMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013277700054459274
        STAT match_RMSNormalizationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019401100053073606
        STAT match_ReduceArgTopKPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00022764400000596652
        STAT match_ReduceReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00020375300118757877
        STAT match_ReduceSumNormalizePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00017468399892095476
        STAT match_Reshape2Of3Pattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005864000013389159
        STAT match_ReshapeMatMulReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00036803300008614315
        STAT match_ReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00029851900035282597
        STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005683910012521665
        STAT match_ReshapeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00020679600038420176
        STAT match_RotaryConcatPartPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00017679800021142
        STAT match_RotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00028224100151419407
        STAT match_SameChildrenFromInputPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00034861300082411617
        STAT match_SameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0006026319988450268
        STAT match_SequenceConstructAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016123800014611334
        STAT match_ShapeBasedConcatExpandPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001780280008460977
        STAT match_ShapeBasedEditDistanceReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004987520005670376
        STAT match_ShapeBasedExpandBroadcastMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00035460600065562176
        STAT match_ShapeBasedExpandBroadcastPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0007411999977193773
        STAT match_ShapeBasedExpandCastWhereSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00023009199867374264
        STAT match_ShapeBasedExpandSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0005243860005066381
        STAT match_ShapeBasedIdentityPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002803660008794395
        STAT match_ShapeBasedMatMulToMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00032428100166725926
        STAT match_ShapeBasedReshapeIsSqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00027061099990532966
        STAT match_ShapeBasedSameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019254400012869155
        STAT match_ShapeBasedShapeShapeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004916360012430232
        STAT match_ShapeBasedStaticExpandPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00028270100119698327
        STAT match_ShapedBasedReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0003365399988979334
        STAT match_SliceSlicePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001340180006081937
        STAT match_SlicesSplitPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001340650014753919
        STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.011647823999737739
        STAT match_SplitConcatPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019596200036176015
        STAT match_SplitToSequenceSequenceAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013796799794363324
        STAT match_SqueezeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0005424200007837499
        STAT match_SqueezeBinaryUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002515889991627773
        STAT match_SqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019516399970598286
        STAT match_StaticConcatReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00021540699981414946
        STAT match_Sub1MulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013909499921282986
        STAT match_SwapExpandReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00032802399891807
        STAT match_SwapExpandUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00032560399904468795
        STAT match_SwapRangeAddScalarPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001414420012224582
        STAT match_SwapUnaryPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004152530018473044
        STAT match_SwapUnsqueezeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002728850013227202
        STAT match_SwitchOrderBinaryPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002557639991209726
        STAT match_SwitchReshapeActivationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0005496920020959806
        STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.0004341889998613624
        STAT match_TransposeGatherPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00017745300010574283
        STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.001072580999789352
        STAT match_TransposeReshapeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00031847200079937465
        STAT match_TransposeReshapeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002599059989734087
        STAT match_TransposeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00036651300069934223
        STAT match_UnsqueezeEqualPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016100099946925184
        STAT match_UnsqueezeOrSqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00026689500009524636
        STAT match_UnsqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00021884499892621534
        STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00023759499890729785
        STAT match_WhereAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016358900029445067
        STAT remove_duplicated_shape +0 -0 #it=7 maxmatch=0 i=0 - time=0.00011730899950634921
        STAT remove_identity_nodes +0 -0 #it=7 maxmatch=0 i=0 - time=0.006442560000323283
        STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.00598498900035338
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 5 initializers--
             INPUT:   1 x 1t
         INPUT-SEQ:   1 x Falset
            OUTPUT:   1 x 1t
        OUTPUT-SEQ:   1 x Falset
              INIT:   4 x 1t
              INIT:   1 x 7t
              NODE:   2 x Gemm
              NODE:   1 x Relu
              NODE:   1 x Reshape
              NODE:   1 x Transpose
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 5 initializers--DETAILED--
         INPUT:   1 x 1t[3x10]
        OUTPUT:   1 x 1t[3x1]
          INIT:   1 x 1t[10x32]
          INIT:   1 x 1t[1]
          INIT:   1 x 1t[32]
          INIT:   1 x 1t[32x1]
          INIT:   1 x 7t[2]
          NODE:   1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
          NODE:   1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
          NODE:   1 x Relu -SIG- 1t[3x32]
          NODE:   1 x Reshape -SIG- 1t[32x1], 7t[2]
          NODE:   1 x Transpose -SIG- 1t[10x32]-perm=1;0
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 5
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 5 nodes in 0.00012231900018377928 seconds
    [GraphBuilder-UUQ.constant_folding] -- starts with 7 constants and 5 nodes.
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: init7_s2_1_32
    [GraphBuilder-UUQ.constant_folding] cst:: . :: x
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: GemmTransposePattern--p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: . :: output_0
    [GraphBuilder-UUQ.constant_folding] cst:: . :: _onx_matmul_x
    [GraphBuilder-UUQ.constant_folding] cst:: . :: _onx_matmul_relu
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: . :: linear
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-UUQ.constant_folding] cst:: . :: relu
    [GraphBuilder-UUQ.constant_folding] cst:: 1 :: GemmTransposePattern--p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: layers.0.bias
    [GraphBuilder-UUQ.constant_folding] initializer: layers.2.bias
    [GraphBuilder-UUQ.constant_folding] from: Transpose(GemmTransposePattern--p_layers_0_weight::T10)
    [GraphBuilder-UUQ.set_type] GemmTransposePattern--p_layers_0_weight::T10:1
    [GraphBuilder-UUQ.make_initializer] GemmTransposePattern--p_layers_0_weight::T10[1:(32, 10)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=None
    [GraphBuilder-UUQ.constant_folding] fold_constant:Transpose:GemmTransposePattern--p_layers_0_weight::T10[float32:(32, 10)]:from:p_layers_0_weight::T10
    [GraphBuilder-UUQ.constant_folding] from: Reshape(GemmTransposePattern--p_layers_2_weight::T10)
    [GraphBuilder-UUQ.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilder-UUQ.make_initializer] GemmTransposePattern--p_layers_2_weight::T10[1:(1, 32)]
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=None
    [GraphBuilder-UUQ.constant_folding] fold_constant:Reshape:GemmTransposePattern--p_layers_2_weight::T10[float32:(1, 32)]:from:init7_s2_1_32,p_layers_2_weight::T10
    [GraphBuilder-UUQ.constant_folding] initializer: init7_s2_1_32
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=None
    [GraphBuilder-UUQ.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=None
    [GraphBuilder-UUQ.constant_folding] ends with 7 constants and 3 nodes in 0.0017203289999088156 seconds
    [GraphBuilder-UUQ.remove_unused] remove_initializer 1:0/7:p_layers_0_weight::T10
    [GraphBuilder-UUQ.remove_unused] remove_initializer 2:1/7:p_layers_2_weight::T10
    [GraphBuilder-UUQ.remove_unused] remove_initializer 3:4/7:init7_s2_1_32:int64[(2,)]
    [GraphBuilder-UUQ.remove_identity_nodes] -- starts with 3
    [GraphBuilder-UUQ.remove_identity_nodes] found 0 replacements
    [GraphBuilder-UUQ.remove_identity_nodes] kept 3 nodes
    [GraphBuilder-UUQ.remove_identity_nodes] ends with 3 nodes in 0.0001828479998948751 seconds
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.shape_order] -- starts with 3 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 0.00018567100050859153s with changed=0 scale=0
    [GraphBuilder-UUQ.optimize] done with 3 nodes in 0.109
        STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0025873079994198633
        STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.001986579000003985
        STAT apply_TransposeEqualReshapePattern +1 -1 #it=1 maxmatch=0 i=1 - time=0.0011033079999833717
        STAT apply_constant_folding__Reshape +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
        STAT apply_constant_folding__Transpose +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
        STAT apply_constant_folding_new_inits +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
        STAT build_graph_for_pattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0010290670006725122
        STAT check_A-dynamic_dimension_naming +0 -0 #it=0 maxmatch=0 i=0 - time=9.445100022276165e-05
        STAT check_A-opt-sub +0 -0 #it=0 maxmatch=0 i=0 - time=0.00010527199992793612
        STAT check_constant_folding-2 +0 -0 #it=0 maxmatch=0 i=0 - time=0.000107326000033936
        STAT check_constant_folding-7 +0 -0 #it=0 maxmatch=0 i=0 - time=0.0001433890001862892
        STAT check_order-12 +0 -0 #it=0 maxmatch=0 i=0 - time=5.988100019749254e-05
        STAT check_orderA +0 -0 #it=0 maxmatch=0 i=0 - time=7.732299945928389e-05
        STAT check_orderL +0 -0 #it=0 maxmatch=0 i=0 - time=5.052800042903982e-05
        STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=9.849100024439394e-05
        STAT check_pattern_A10 +0 -0 #it=3 maxmatch=0 i=0 - time=1.9268999494670425e-05
        STAT check_pattern_A20 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0009831639999902109
        STAT check_pattern_BD0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0008387780007979018
        STAT check_pattern_BI0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0006436960011342308
        STAT check_pattern_BUS0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0006156519993965048
        STAT check_patterns-4 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00011930100026802393
        STAT check_remove_duplicated_initializer-9 +0 -0 #it=0 maxmatch=0 i=0 - time=8.737799998925766e-05
        STAT check_remove_identity-0 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00011407600050006295
        STAT check_remove_identity-10 +0 -0 #it=0 maxmatch=0 i=0 - time=7.857399941713084e-05
        STAT check_remove_identity-6 +0 -0 #it=0 maxmatch=0 i=0 - time=7.775999984005466e-05
        STAT check_remove_unused-1 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00011053400066884933
        STAT check_remove_unused-11 +0 -0 #it=0 maxmatch=0 i=0 - time=7.387700043182122e-05
        STAT check_remove_unused-3 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00012841099942306755
        STAT check_remove_unused-5 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00038548600059584714
        STAT check_remove_unused-8 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00010204599948338
        STAT constant_folding +0 -2 #it=0 maxmatch=0 i=0 - time=0.003540284999871801
        STAT dynamic_dimension_naming +0 -0 #it=0 maxmatch=0 i=0 - time=0.00014117099999566562
        STAT insert_and_remove_nodes +0 -0 #it=0 maxmatch=0 i=0 - time=0.002682608000213804
        STAT iteration_0 +0 -0 #it=1 maxmatch=0 i=0 - time=0.023111839000193868
        STAT iteration_1 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007610972000293259
        STAT iteration_2 +0 -0 #it=1 maxmatch=0 i=0 - time=0.010470103999978164
        STAT iteration_3 +0 -0 #it=1 maxmatch=0 i=0 - time=0.017805355999371386
        STAT iteration_4 +0 -0 #it=1 maxmatch=0 i=0 - time=0.009131378999882145
        STAT iteration_5 +0 -0 #it=1 maxmatch=0 i=0 - time=0.009869566999441304
        STAT match_AttentionGQAPattern +0 -0 #it=5 maxmatch=2 i=0 - time=0.000118566998935421
        STAT match_BatchNormalizationPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0004154540010858909
        STAT match_BatchNormalizationTrainingPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00020817299991904292
        STAT match_CastCastBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0006397309998646961
        STAT match_CastCastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002071959988825256
        STAT match_CastLayerNormalizationCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00020809100169572048
        STAT match_CastOpCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005632290003632079
        STAT match_CastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00019877599970641313
        STAT match_ClipClipPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00016012000014598016
        STAT match_ConcatEmptyPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0002266100009364891
        STAT match_ConcatGatherPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00019352100025571417
        STAT match_ConcatReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00028873199971712893
        STAT match_ConcatTwiceUnaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001793769988580607
        STAT match_ConstantToInitializerPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001906300012706197
        STAT match_ConvBiasNullPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00023267199867404997
        STAT match_DropoutPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001427449997208896
        STAT match_ExpandBroadcastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00022996400002739392
        STAT match_ExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0003182799991918728
        STAT match_ExpandSwapPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00014828499934083084
        STAT match_ExpandUnsqueezeExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00016460100050608162
        STAT match_FunctionAttentionGQAPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00029265099965414265
        STAT match_FunctionAttentionPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.004754586998387822
        STAT match_FunctionCausalMaskMulAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0004079489981450024
        STAT match_FunctionCausalMaskPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018986400027642958
        STAT match_FunctionCosSinCachePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014747500244993716
        STAT match_FunctionHalfRotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016214599872910185
        STAT match_GathersSplitPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00021581799865089124
        STAT match_GeluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=8.492000051774085e-05
        STAT match_GemmTransposePattern +0 -0 #it=6 maxmatch=2 i=2 - time=0.0006385269998645526
        STAT match_IdentityPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0031763790002514725
        STAT match_LayerNormalizationPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00017838500025391113
        STAT match_LayerNormalizationScalePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00013619199944514548
        STAT match_LeakyReluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.006065037998268963
        STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.0011170280013175216
        STAT match_MatMulReshape2Of3Pattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00048357699961343314
        STAT match_MaxReluPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00014370400185725885
        STAT match_MulMulMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00032005800039769383
        STAT match_MulMulMulScalarPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00019407100080570672
        STAT match_MulUnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00019753099877561908
        STAT match_NotNotPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00013650300024892204
        STAT match_NotWherePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001320589990427834
        STAT match_PadConvPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00017515599938633386
        STAT match_RMSNormalizationMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013277700054459274
        STAT match_RMSNormalizationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019401100053073606
        STAT match_ReduceArgTopKPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00022764400000596652
        STAT match_ReduceReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00020375300118757877
        STAT match_ReduceSumNormalizePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00017468399892095476
        STAT match_Reshape2Of3Pattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005864000013389159
        STAT match_ReshapeMatMulReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00036803300008614315
        STAT match_ReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00029851900035282597
        STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005683910012521665
        STAT match_ReshapeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00020679600038420176
        STAT match_RotaryConcatPartPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00017679800021142
        STAT match_RotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00028224100151419407
        STAT match_SameChildrenFromInputPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00034861300082411617
        STAT match_SameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0006026319988450268
        STAT match_SequenceConstructAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016123800014611334
        STAT match_ShapeBasedConcatExpandPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001780280008460977
        STAT match_ShapeBasedEditDistanceReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004987520005670376
        STAT match_ShapeBasedExpandBroadcastMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00035460600065562176
        STAT match_ShapeBasedExpandBroadcastPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0007411999977193773
        STAT match_ShapeBasedExpandCastWhereSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00023009199867374264
        STAT match_ShapeBasedExpandSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0005243860005066381
        STAT match_ShapeBasedIdentityPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002803660008794395
        STAT match_ShapeBasedMatMulToMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00032428100166725926
        STAT match_ShapeBasedReshapeIsSqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00027061099990532966
        STAT match_ShapeBasedSameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019254400012869155
        STAT match_ShapeBasedShapeShapeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004916360012430232
        STAT match_ShapeBasedStaticExpandPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00028270100119698327
        STAT match_ShapedBasedReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0003365399988979334
        STAT match_SliceSlicePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001340180006081937
        STAT match_SlicesSplitPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001340650014753919
        STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.011647823999737739
        STAT match_SplitConcatPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019596200036176015
        STAT match_SplitToSequenceSequenceAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013796799794363324
        STAT match_SqueezeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0005424200007837499
        STAT match_SqueezeBinaryUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002515889991627773
        STAT match_SqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019516399970598286
        STAT match_StaticConcatReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00021540699981414946
        STAT match_Sub1MulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013909499921282986
        STAT match_SwapExpandReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00032802399891807
        STAT match_SwapExpandUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00032560399904468795
        STAT match_SwapRangeAddScalarPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001414420012224582
        STAT match_SwapUnaryPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004152530018473044
        STAT match_SwapUnsqueezeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002728850013227202
        STAT match_SwitchOrderBinaryPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002557639991209726
        STAT match_SwitchReshapeActivationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0005496920020959806
        STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.0004341889998613624
        STAT match_TransposeGatherPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00017745300010574283
        STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.001072580999789352
        STAT match_TransposeReshapeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00031847200079937465
        STAT match_TransposeReshapeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0002599059989734087
        STAT match_TransposeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00036651300069934223
        STAT match_UnsqueezeEqualPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016100099946925184
        STAT match_UnsqueezeOrSqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00026689500009524636
        STAT match_UnsqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00021884499892621534
        STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00023759499890729785
        STAT match_WhereAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016358900029445067
        STAT order +0 -0 #it=0 maxmatch=0 i=0 - time=0.0003080930000578519
        STAT patterns +0 -0 #it=0 maxmatch=0 i=0 - time=0.09456365799997002
        STAT remove_duplicated_initializer +0 -0 #it=0 maxmatch=0 i=0 - time=0.0005161030003364431
        STAT remove_duplicated_shape +0 -0 #it=7 maxmatch=0 i=0 - time=0.00011730899950634921
        STAT remove_identity +0 -0 #it=0 maxmatch=0 i=0 - time=0.002335402999960934
        STAT remove_identity_nodes +0 -0 #it=7 maxmatch=0 i=0 - time=0.006442560000323283
        STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.010252653999486938
        STAT shape_order +0 -0 #it=0 maxmatch=0 i=0 - time=0.00020477100042626262
    --MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--
             INPUT:   1 x 1t
         INPUT-SEQ:   1 x Falset
            OUTPUT:   1 x 1t
        OUTPUT-SEQ:   1 x Falset
              INIT:   4 x 1t
              NODE:   2 x Gemm
              NODE:   1 x Relu
    --MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--DETAILED--
         INPUT:   1 x 1t[3x10]
        OUTPUT:   1 x 1t[3x1]
          INIT:   1 x 1t[1]
          INIT:   1 x 1t[1x32]
          INIT:   1 x 1t[32]
          INIT:   1 x 1t[32x10]
          NODE:   1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
          NODE:   1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
          NODE:   1 x Relu -SIG- 1t[3x32]
    [GraphBuilder-UUQ.to_onnx] make_model 4 inits 0 params
    [GraphBuilder-UUQ.time_evaluation_constants_] 0
    [GraphBuilder-UUQ._build_initializers] start with 4 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-UUQ._build_initializers] switch low/high order
    [GraphBuilder-UUQ._build_initializers] TensorProto-layers.0.bias:1[(32,)]
    [GraphBuilder-UUQ._build_initializers] TensorProto-layers.2.bias:1[(1,)]
    [GraphBuilder-UUQ._build_initializers] <ndarray>-GemmTransposePattern--p_layers_0_weight::T10:float32[(32, 10)]
    [GraphBuilder-UUQ._build_initializers] <ndarray>-GemmTransposePattern--p_layers_2_weight::T10:float32[(1, 32)]
    [GraphBuilder-UUQ._build_initializers] done in 5.670000064128544e-06s with 4 initializers, 0 large initializers
    [GraphBuilder-UUQ._add_shape_information] dynamic shapes replacements={}

Select the pattern to use#

Class OptimizationOptions is used to enable or disable patterns.

<<<

import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(
        patterns="TransposeTranspose,TransposeMatMul", verbose=1
    ),
)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-NCA.optimize] start with 5 nodes
    [GraphBuilder-NCA.optimize] #patterns=2
    [GraphBuilderPatternOptimization-NCA.optimize] start with 5 nodes, 4 initializers, 2 patterns, priorities=[0, 1], max_iter=20
    [GraphBuilderPatternOptimization-NCA.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-NCA.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-NCA.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-NCA.optimize] stops current_priority_index=2, priorities=[0, 1]
    [GraphBuilderPatternOptimization-NCA.optimize] done after 2 iterations with 5 nodes in 0.003
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.shape_order] -- starts with 5 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 0.00022476000049209688s with changed=0 scale=0
    [GraphBuilder-NCA.optimize] done with 5 nodes in 0.011

There are some predefined lists of patterns:

  • default: includes all patterns that rely only on standard onnx operators.

  • onnxruntime: patterns specific to onnxruntime. The final model may be executable by onnxruntime only, as these patterns may introduce operators listed in Supported Operators and Data Types.

<<<

import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default+onnxruntime", verbose=1),
)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-ELS.optimize] start with 5 nodes
    [GraphBuilder-ELS.optimize] #patterns=126
    [GraphBuilderPatternOptimization-ELS.optimize] start with 5 nodes, 4 initializers, 126 patterns, priorities=[0, 1, 2, 3], max_iter=40
    [GraphBuilderPatternOptimization-ELS.optimize] same children={'SameChildrenPattern', 'SameChildrenFromInputPattern'}
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-ELS.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-ELS.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 2: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-ELS.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 3: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-ELS.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.005 | max_time=IdentityPattern:0.000
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 4: 3 nodes, priority=3
    [GraphBuilderPatternOptimization-ELS.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.003 | max_time=GemmTransposePattern:0.000
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 5: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-ELS.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.005 | max_time=ContribRotaryEmbeddingPattern:0.001
    [GraphBuilderPatternOptimization-ELS.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-ELS.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-ELS.optimize] done after 7 iterations with 5 nodes in 0.078
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.shape_order] -- starts with 3 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 0.00013417500031209784s with changed=0 scale=0
    [GraphBuilder-ELS.optimize] done with 3 nodes in 0.087

Statistics#

The statistics returned by optimize show when a pattern is applied and how long it takes.

<<<

import pandas
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()

print(pandas.DataFrame(stat))

>>>

                                  pattern  removed  added   time_in  value  iteration  instances  match_index  n_nodes exit_point  changed  scale algo
    0            dynamic_dimension_naming      0.0    0.0  0.000073    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    1    check_A-dynamic_dimension_naming      NaN    NaN  0.000088    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    2                     check_A-opt-sub      NaN    NaN  0.000062    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    3                     remove_identity      0.0    0.0  0.000431    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    4             check_remove_identity-0      NaN    NaN  0.000068    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    ..                                ...      ...    ...       ...    ...        ...        ...          ...      ...        ...      ...    ...  ...
    738                      check_orderL      NaN    NaN  0.000049    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    739                       shape_order      NaN    NaN  0.000191    NaN        NaN        NaN          NaN      NaN        NaN      0.0    0.0  NaN
    740                             order      NaN    NaN       NaN    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN    2
    741                    check_order-12      NaN    NaN  0.000059    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    742                      optimization      2.0    0.0  0.060156    NaN        NaN        NaN          NaN      NaN        NaN      NaN    NaN  NaN
    
    [743 rows x 13 columns]

These statistics can be aggregated by pattern:

<<<

import pandas
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model

onx = demo_mlp_model("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()

df = pandas.DataFrame(stat)
for c in df.columns:
    if "time" not in c and "pattern" not in c and "exit_point" not in c:
        df[c] = df[c].fillna(0).astype(int)
aggs = {
    "time_in": "sum",
    "added": "sum",
    "removed": "sum",
    "iteration": "max",
    "match_index": "max",
    "instances": "sum",
}
print(df.groupby("pattern").agg(aggs))

>>>

                                         time_in  added  removed  iteration  match_index  instances
    pattern                                                                                        
    apply_GemmTransposePattern          0.001959      4        2          4            1          2
    apply_MatMulAddPattern              0.000761      2        4          3            1          2
    apply_TransposeEqualReshapePattern  0.001198      1        1          5            0          1
    apply_constant_folding__Reshape     0.000000      0        0          0            0          0
    apply_constant_folding__Transpose   0.000000      0        0          0            0          0
    ...                                      ...    ...      ...        ...          ...        ...
    remove_duplicated_shape             0.000099      0        0          6            0          0
    remove_identity                     0.000977      0        0          0            0          0
    remove_identity_nodes               0.005151      0        0          6            0          0
    remove_unused                       0.006293      0        0          6            0          0
    shape_order                         0.000085      0        0          0            0          0
    
    [146 rows x 6 columns]
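
The same aggregation can be done without pandas. The sketch below assumes that stat is a list of dictionaries whose keys vary from row to row, which is what the NaN cells in the DataFrame above suggest. The four rows are copied from the table printed earlier, and total_time_per_pattern is an illustrative helper, not part of the library.

```python
from collections import defaultdict

# A few rows copied from the printed table (the real list has 743 rows).
stat = [
    {"pattern": "dynamic_dimension_naming", "removed": 0, "added": 0, "time_in": 0.000073},
    {"pattern": "remove_identity", "removed": 0, "added": 0, "time_in": 0.000431},
    {"pattern": "shape_order", "time_in": 0.000191, "changed": 0, "scale": 0},
    {"pattern": "optimization", "removed": 2, "added": 0, "time_in": 0.060156},
]


def total_time_per_pattern(rows):
    """Sums time_in per pattern and sorts from slowest to fastest."""
    totals = defaultdict(float)
    for row in rows:
        # Rows without a time_in key contribute nothing.
        totals[row["pattern"]] += row.get("time_in", 0.0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


ranking = total_time_per_pattern(stat)
for name, seconds in ranking:
    print(f"{name:30s} {seconds:.6f}")
```

Sorting by total time is a quick way to spot the steps that dominate the optimization time.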

Matching Algorithm#

EasyPatternOptimization implements a bidirectional subgraph-matching algorithm that avoids a full enumeration of all possible node assignments. Rather than writing a custom match method, the user only has to declare the subgraph to look for (match_pattern) and the replacement (apply_pattern) using the same builder API that is used to build ONNX graphs.

Pattern definition#

Both match_pattern and apply_pattern are written as regular Python functions that call g.op.<OpType>(...) to create nodes. Each positional argument becomes a symbolic input to the subgraph. The function returns the name(s) of the symbolic output(s).

class TransposeTransposePattern(EasyPatternOptimization):

    def match_pattern(self, g: "GraphBuilder", x):
        t1 = g.op.Transpose(x)
        return g.op.Transpose(t1)

    def apply_pattern(self, g: "GraphBuilder", x):
        return x   # two transposes cancel each other

At build time the framework converts each function into a small GraphBuilderPatternOptimization that stores the nodes in topological order. The last node of the match pattern is used as the anchor: the matching loop only fires when a graph node has the same op_type as that anchor.

Bidirectional matching#

Given a candidate graph node with the same type as the anchor, the algorithm expands the match iteratively with a stack-based approach:

marked  = {anchor_pattern_key: (graph_node, anchor_pattern_node)}
stacked = [anchor_pattern_key]

while stacked:
    (graph_node, pattern_node) = pop(stacked)

    # --- backward pass ---
    # Walk up the predecessors of pattern_node.
    # For each predecessor in the pattern, find the corresponding
    # predecessor in the graph. Fail if types or arities differ.
    backward_match(graph_node, pattern_node)

    # --- forward pass ---
    # Walk down the successors of pattern_node.
    # For each successor in the pattern, find the corresponding
    # successor in the graph. Fail if types or arities differ.
    forward_match(graph_node, pattern_node)

    # New matched pairs are pushed onto stacked.

The two sub-routines are implemented in _match_backward and _match_forward.
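
The loop above can be made concrete on a toy graph representation. The sketch below is a simplified illustration and does not claim to mirror _match_backward or _match_forward: the Node class, the helper names and the rule "take the first compatible successor, no backtracking" are all assumptions made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical minimal node: a name, an operator type, result names."""
    name: str
    op_type: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def producers(nodes: List[Node]) -> Dict[str, Node]:
    # result name -> node producing it
    return {out: n for n in nodes for out in n.outputs}


def consumers(nodes: List[Node]) -> Dict[str, List[Node]]:
    # result name -> nodes consuming it
    cons: Dict[str, List[Node]] = {}
    for n in nodes:
        for name in n.inputs:
            cons.setdefault(name, []).append(n)
    return cons


def match_from_anchor(
    graph: List[Node], pattern: List[Node], candidate: Node
) -> Optional[Dict[str, str]]:
    """Returns {pattern node name: graph node name} or None if no match."""
    anchor = pattern[-1]  # the last pattern node is the anchor
    g_prod, g_cons = producers(graph), consumers(graph)
    p_prod, p_cons = producers(pattern), consumers(pattern)
    marked: Dict[str, Tuple[Node, Node]] = {}
    stacked: List[str] = []

    def pair(g_node: Optional[Node], p_node: Node) -> bool:
        # Fail if the graph node is missing or if types or arities differ.
        if g_node is None or g_node.op_type != p_node.op_type:
            return False
        if (len(g_node.inputs), len(g_node.outputs)) != (
            len(p_node.inputs), len(p_node.outputs)
        ):
            return False
        if p_node.name in marked:
            # Already paired: it must be with the same graph node.
            return marked[p_node.name][0].name == g_node.name
        marked[p_node.name] = (g_node, p_node)
        stacked.append(p_node.name)
        return True

    if not pair(candidate, anchor):
        return None
    while stacked:
        g_node, p_node = marked[stacked.pop()]
        # Backward pass: pair the producers of the inputs.
        for g_in, p_in in zip(g_node.inputs, p_node.inputs):
            if p_in in p_prod and not pair(g_prod.get(g_in), p_prod[p_in]):
                return None
        # Forward pass: pair the consumers of the outputs.
        for g_out, p_out in zip(g_node.outputs, p_node.outputs):
            for p_next in p_cons.get(p_out, []):
                idx = p_next.inputs.index(p_out)
                found = [
                    n for n in g_cons.get(g_out, [])
                    if n.op_type == p_next.op_type
                    and len(n.inputs) > idx and n.inputs[idx] == g_out
                ]
                if not found or not pair(found[0], p_next):
                    return None
    if len(marked) != len(pattern):
        return None
    return {p_name: g.name for p_name, (g, _) in marked.items()}


graph = [
    Node("n1", "Not", ("x",), ("t",)),
    Node("n2", "Not", ("t",), ("y",)),
    Node("n3", "Identity", ("y",), ("z",)),
]
pattern = [
    Node("p1", "Not", ("X",), ("T",)),
    Node("p2", "Not", ("T",), ("Y",)),
]
for node in graph:
    print(node.name, match_from_anchor(graph, pattern, node))
```

Only n2 yields a mapping: n1 has no Not producer behind it, and n3 does not have the anchor's type.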

Ambiguity detection#

A dictionary pair_results_names maps every pattern result name to the graph result name it has been paired with. Before recording a new pair the algorithm checks that neither name already points to a different name (ambiguity). An ambiguity means the same pattern result would have to correspond to two different graph results simultaneously, which would be inconsistent; the match is rejected in that case.
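
A minimal sketch of such a check is shown below. It uses two plain dictionaries, one per direction, which is an assumption made for clarity: the text above only mentions pair_results_names, and record_pair is a hypothetical helper name.

```python
from typing import Dict


def record_pair(
    p2g: Dict[str, str], g2p: Dict[str, str], pattern_name: str, graph_name: str
) -> bool:
    """Records pattern_name <-> graph_name, returns False on ambiguity."""
    # The pattern result is already paired with another graph result.
    if p2g.get(pattern_name, graph_name) != graph_name:
        return False
    # The graph result is already paired with another pattern result.
    if g2p.get(graph_name, pattern_name) != pattern_name:
        return False
    p2g[pattern_name] = graph_name
    g2p[graph_name] = pattern_name
    return True


p2g: Dict[str, str] = {}
g2p: Dict[str, str] = {}
print(record_pair(p2g, g2p, "x", "input_0"))   # first pairing
print(record_pair(p2g, g2p, "x", "input_0"))   # same pairing again, still consistent
print(record_pair(p2g, g2p, "x", "input_1"))   # "x" is already paired with "input_0"
print(record_pair(p2g, g2p, "t1", "input_0"))  # "input_0" is already paired with "x"
```

The first two calls succeed and the last two are rejected, which corresponds to the two directions of the ambiguity check.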

Validation#

After all pattern nodes have been matched the algorithm performs two additional checks:

  • validate_attribute_mapping – verifies that the attributes of the matched graph nodes are consistent with those declared in the pattern (e.g. same axis value).

  • validate_mapping – an optional hook for subclasses to add arbitrary semantic checks (e.g. verify that a constant operand has a specific numerical value).

Only when both validations succeed does the method return a MatchResult that schedules the matched nodes for replacement.

Overlap prevention#

The outer loop (see Optimization Algorithm above) maintains a marked set of all node identifiers that have already been claimed by an earlier MatchResult. A candidate match is discarded if any of its nodes appears in that set, so no two rewrites ever touch the same node during the same pass.
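
The filtering step can be sketched as follows. Node identifiers are plain strings here and keep_non_overlapping is an illustrative name; the library may identify nodes differently.

```python
from typing import Hashable, List, Sequence


def keep_non_overlapping(
    matches: Sequence[Sequence[Hashable]],
) -> List[Sequence[Hashable]]:
    """Keeps a match only if none of its nodes was claimed by an earlier one."""
    marked = set()
    kept = []
    for nodes in matches:
        if any(n in marked for n in nodes):
            # At least one node is already claimed: discard this match.
            continue
        marked.update(nodes)
        kept.append(nodes)
    return kept


candidates = [
    ("not_1", "not_2"),
    ("not_2", "not_3"),       # shares not_2 with the first match
    ("reshape_1", "reshape_2"),
]
print(keep_non_overlapping(candidates))
```

The second candidate is dropped for this pass only. Under this scheme it may still be found in a later iteration, once the graph has been rewritten.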

Worked examples#

The two classes, PatternOptimization and EasyPatternOptimization, cover the same use cases at different levels of abstraction. Both examples below implement a Not + Not → Identity fusion so that they are easy to compare.

PatternOptimization (manual match / apply)

The developer writes the matching logic by hand, navigating the graph with the helpers provided by GraphBuilderPatternOptimization.

import inspect
from typing import List, Optional
from onnx import NodeProto
from yobx.xoptim import PatternOptimization, MatchResult


class NotNotPattern(PatternOptimization):
    """Fuses ``Not(Not(x))`` into ``Identity(x)``."""

    def match(
        self,
        g: "GraphBuilderPatternOptimization",
        node: NodeProto,
        matched: List[MatchResult],
    ) -> Optional[MatchResult]:
        # Only consider Not nodes.
        if node.op_type != "Not" or node.domain != "":
            return self.none()

        # Walk one step backward: the producer of node's input must also be Not.
        not_before = g.node_before(node.input[0])
        if not_before is None or not_before.op_type != "Not" or not_before.domain != "":
            return self.none(node, inspect.currentframe().f_lineno)

        # Return both nodes as the rewrite target.
        return MatchResult(self, [not_before, node], self.apply, insert_at=node)

    def apply(
        self,
        g: "GraphBuilder",
        not_before: NodeProto,
        not_after: NodeProto,
    ) -> List[NodeProto]:
        pre_nodes = []
        # Keep the first Not if its output is consumed elsewhere.
        if g.is_used_more_than_once(not_before.output[0]):
            pre_nodes.append(not_before)
        return [
            *pre_nodes,
            g.make_node(
                "Identity",
                [not_before.input[0]],
                [not_after.output[0]],
                name=f"{self.__class__.__name__}--{not_after.name}",
            ),
        ]

EasyPatternOptimization (declarative match_pattern / apply_pattern)

The developer declares the subgraph to look for and the replacement as builder calls. The framework takes care of matching and result renaming automatically.

from yobx.xoptim import EasyPatternOptimization


class NotNotEasyPattern(EasyPatternOptimization):
    """Fuses ``Not(Not(x))`` into ``Identity(x)`` using the easy API."""

    def match_pattern(self, g: "GraphBuilder", x):
        t = g.op.Not(x)      # first Not
        return g.op.Not(t)   # second Not  <-- anchor node

    def apply_pattern(self, g: "GraphBuilder", x):
        return g.op.Identity(x)

Key differences#

  • Matching logic

      • PatternOptimization: written by hand in match(). The developer calls graph-navigation helpers such as node_before, next_nodes, get_attribute, …

      • EasyPatternOptimization: declared as a Python function match_pattern() using g.op.* calls. The bidirectional matching is run automatically by the framework.

  • Replacement logic

      • PatternOptimization: written by hand in apply(). The developer calls g.make_node and explicitly manages which nodes are kept or removed.

      • EasyPatternOptimization: declared as a Python function apply_pattern() using g.op.* calls. The framework renames results and assembles the replacement nodes automatically.

  • Flexibility

      • PatternOptimization: full control. It can inspect any attribute, handle optional inputs, cope with multi-output rewrites, or make graph-wide checks.

      • EasyPatternOptimization: more constrained. The subgraph must have a fixed topology with no branching within the pattern. Attribute checks require overriding validate_mapping or validate_attribute_mapping.

  • Typical use-case

      • PatternOptimization: complex rewrites (e.g. Attention fusion) where the matching involves many conditional checks that are hard to express as a fixed topology.

      • EasyPatternOptimization: simple structural fusions (e.g. double-Not, LeakyRelu decomposition, Gelu decomposition) where the topology is fixed and self-describing.

Shape inference#

The optimizers need to know the shapes to make sure that a rewrite is valid and does not produce a model returning different results. If this information is missing, some patterns cannot verify that the rewrite is safe and therefore do not match.

This information can be built by running shape inference on the onnx model, which is what the previous examples do. However, the best case is when this information comes from torch.

Function to_onnx converts a torch model into ONNX. While doing so, it stores the shape information coming from torch. There is no need to run shape inference on the onnx model it generates before optimizing it.

Available Patterns and API#

All patterns are documented in Available Patterns.

When writing a pattern, walking along the graph or checking the shape is very common. Class GraphBuilderPatternOptimization provides the following methods.

Opsets#

Patterns must rewrite the graph using only operators available in the opset the model defines.

Shapes, Types#

  • has_type: tells if a result type is known

  • get_type: returns a result type, fails if not known

  • has_shape: tells if a result shape is known

  • get_shape: returns a result shape, fails if not known

  • has_rank: tells if a result rank is known

  • get_rank: returns a result rank, fails if not known

  • try_infer_type: returns a type if it can be guessed

  • try_infer_shape: returns a shape if it can be guessed

  • has_device: tells if a result device is known

  • get_device: returns a result device, fails if not known
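
Since the get_* accessors fail when the information is unknown, a match method usually guards each call with the corresponding has_* check. The sketch below illustrates that idiom. FakeGraph is a hypothetical stand-in exposing only has_shape and get_shape so that the snippet runs on its own, and same_known_shape is an illustrative helper, not part of the library.

```python
from typing import Dict, Tuple


class FakeGraph:
    """Hypothetical stand-in for the graph object, for illustration only."""

    def __init__(self, shapes: Dict[str, Tuple[int, ...]]):
        self._shapes = shapes

    def has_shape(self, name: str) -> bool:
        return name in self._shapes

    def get_shape(self, name: str) -> Tuple[int, ...]:
        # Mimics "fails if not known".
        assert name in self._shapes, f"shape of {name!r} is unknown"
        return self._shapes[name]


def same_known_shape(g, a: str, b: str) -> bool:
    """True only if both shapes are known and equal."""
    if not g.has_shape(a) or not g.has_shape(b):
        # Unknown shape: the pattern declines to match rather than guess.
        return False
    return g.get_shape(a) == g.get_shape(b)


g = FakeGraph({"x": (3, 10), "y": (3, 10), "z": (3, 32)})
print(same_known_shape(g, "x", "y"))
print(same_known_shape(g, "x", "z"))
print(same_known_shape(g, "x", "unknown"))
```

Returning False when a shape is missing follows the rule stated in the Shape inference section: without the information, the pattern does not match.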

Constants#

  • is_constant: tells if a node is a constant (it may be a constant, an initializer or any value built on other constants)

  • is_constant_scalar: checks a constant is a scalar and compares its value to a number

  • get_computed_constant: returns the constant, computing it if it is a constant built from other constants

  • get_attribute: returns an attribute of a node

Graph#

Nodes#

  • make_node: creates a node without adding it to the graph

  • make_node_check_opset: creates a node without adding it to the graph, deals with some constraints related to opset version

Debugging Optimization with Environment Variables#

Several environment variables can be set to help debug the pattern optimizer.

  • LOG_PATTERN_OPTIMIZE: sets the verbosity level for all patterns. Setting it to 10 produces the most detailed output. Example:

    LOG_PATTERN_OPTIMIZE=10 python my_script.py
    
  • PATTERN: increases the verbosity to 10 for one or more specific patterns (comma-separated class names or class names with the Pattern suffix removed). This is useful to focus on a single pattern without flooding the output with information from all the others. Example:

    PATTERN=ReshapeReshapePattern python my_script.py
    
  • <ClassName>: setting an environment variable whose name matches the class name of a pattern (e.g. ReshapeReshapePattern=10) sets the verbosity for that individual pattern. This is equivalent to using PATTERN but more explicit.

  • DROPPATTERN: comma-separated list of pattern class names to exclude from the optimizer. Useful to bisect which pattern is causing a wrong result or an unexpected error. Example:

    DROPPATTERN=ReshapeReshapePattern,CastPattern python my_script.py
    
  • DUMPPATTERNS: when set to a folder path, the optimizer writes the matched nodes and their replacements to that folder for every successful pattern application. Useful for inspecting what the optimizer is actually doing. Example:

    DUMPPATTERNS=/tmp/dump_patterns python my_script.py
    
  • PATTERNNOREMOVE: when set to a result name, the optimizer raises an exception if an optimization step removes that name from the graph. Useful to track down which pattern is eliminating a particular node or result. Example:

    PATTERNNOREMOVE=output_0 python my_script.py
    
  • PATTERNSTEP: when set to 1, True, or true, the optimizer runs one optimization step at a time, which can help narrow down which step introduces a problem. Example:

    PATTERNSTEP=1 python my_script.py
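
When the script is launched as a subprocess, these variables can also be prepared from Python, which is convenient for bisecting with DROPPATTERN. The sketch below only builds the environment of each run. The helper name drop_env is hypothetical, my_script.py is the placeholder used above, and splitting the list of suspects in two halves is just one possible strategy.

```python
import os
from typing import Dict, List, Optional


def drop_env(
    patterns: List[str], base: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Returns a copy of the environment with DROPPATTERN set."""
    env = dict(os.environ if base is None else base)
    env["DROPPATTERN"] = ",".join(patterns)
    return env


suspects = [
    "ReshapeReshapePattern",
    "CastPattern",
    "TransposeTransposePattern",
    "NotNotPattern",
]
half = len(suspects) // 2
# base={} keeps the example deterministic; omit it to inherit os.environ.
runs = [drop_env(suspects[:half], base={}), drop_env(suspects[half:], base={})]

for env in runs:
    print("DROPPATTERN=" + env["DROPPATTERN"])
    # To launch the script (needs `import subprocess, sys`):
    # subprocess.run([sys.executable, "my_script.py"], env={**os.environ, **env})
```

If the wrong result disappears when one half is dropped, the faulty pattern is in that half, and the same split can be repeated on it.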