Pattern Optimizer

The pattern optimizer is implemented by class GraphBuilderPatternOptimization. It searches for a specific sequence of nodes in the graph and replaces it with another one without changing the inputs or the outputs of the graph. The goal of the optimizer is to make the whole computation graph more efficient. The goal of this implementation is to make this optimization as fast as possible. Assuming the nodes of an ONNX graph are ordered so that every input of a node is produced by a previous node, the optimizer must not require any global reordering. The cost should be in O(N P I) in the worst case, where N is the number of nodes, P is the number of patterns, and I is the number of iterations.

It is difficult to foresee what a pattern needs in order to rewrite a part of the graph. This API tries to give as much freedom as it can without leaving too much work to the developer who wants to add a new pattern.

Patterns

Patterns must inherit from PatternOptimization. This class defines two methods.

PatternOptimization.match

def match(
    self,
    g: "GraphBuilderPatternOptimization",
    node: NodeProto,
    matched: List[MatchResult],
) -> Optional[MatchResult]:
  • g is a GraphBuilderPatternOptimization. It holds all the existing nodes and can return any information about types and shapes, as well as the node before or the nodes after a given one.

  • node: the starting point of the match. The method must determine whether some nodes around this one form a set of nodes this pattern can rewrite. From there, the function explores wherever it needs to, checking any condition it needs.

  • matched: usually unused, it holds the matches already found for other nodes.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. That instance must contain:

  • A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be modified by another pattern.

  • A function doing the rewriting (usually method apply of the pattern class).

  • An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
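The contract of match can be illustrated with a small self-contained sketch. It does not use the real library: Node, ToyGraph, ToyMatchResult and ToyMatMulAddPattern are hypothetical stand-ins invented for this illustration, and the real MatchResult constructor and graph API may differ. A real pattern would also check ranks and shapes before fusing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    # Stand-in for onnx.NodeProto (hypothetical, illustration only).
    op_type: str
    input: List[str]
    output: List[str]


class ToyGraph:
    # Stand-in for the graph holder: it only indexes the consumers of each result.
    def __init__(self, nodes: List[Node]):
        self.nodes = nodes
        self.consumers: Dict[str, List[Node]] = {}
        for n in nodes:
            for name in n.input:
                self.consumers.setdefault(name, []).append(n)

    def next_nodes(self, name: str) -> List[Node]:
        return self.consumers.get(name, [])


@dataclass
class ToyMatchResult:
    nodes: List[Node]                 # nodes involved in the rewriting
    apply: Callable                   # function doing the rewriting
    insert_at: Optional[Node] = None  # where the new nodes can be inserted


class ToyMatMulAddPattern:
    def match(self, g, node, matched):
        # Read-only exploration: the graph is never modified here.
        if node.op_type != "MatMul":
            return None
        users = g.next_nodes(node.output[0])
        if len(users) != 1 or users[0].op_type != "Add":
            return None
        return ToyMatchResult([node, users[0]], self.apply, insert_at=users[0])

    @classmethod
    def apply(cls, g, matmul, add):
        # The bias is the Add input that is not the MatMul output.
        bias = [i for i in add.input if i != matmul.output[0]]
        return [Node("Gemm", [*matmul.input, *bias], list(add.output))]


graph = ToyGraph([
    Node("MatMul", ["x", "w"], ["xw"]),
    Node("Add", ["xw", "b"], ["y"]),
])
result = ToyMatMulAddPattern().match(graph, graph.nodes[0], [])
no_match = ToyMatMulAddPattern().match(graph, graph.nodes[1], [])
```

Starting from the MatMul node, the sketch returns a match holding both nodes; starting from the Add node, it returns None.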

Debugging: method none

def none(
    self,
    node: Optional[NodeProto] = None,
    lineno: Optional[int] = None,
    msg: Optional[Union[Callable[[], str], str]] = None,
):

It may be useful to know why a pattern failed to match. Instead of returning None, method match can return the following expression:

return self.none(node, inspect.currentframe().f_lineno)

By setting the verbosity (see section Verbosity below), the user can then find out which line in the code returned None and which condition failed. The last parameter is used to print a more comprehensive message explaining why the match failed.
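The mechanism can be sketched as follows. This is a toy reimplementation under stated assumptions, not the library's code: ToyPattern, its verbose attribute, the failures list and the printed message format are all invented for illustration, and a node is reduced to its operator name.

```python
import inspect
from typing import Callable, Optional, Union


class ToyPattern:
    # Hypothetical stand-in for a pattern class; only the debugging helper is sketched.
    def __init__(self, verbose: int = 0):
        self.verbose = verbose
        self.failures = []

    def none(
        self,
        node=None,
        lineno: Optional[int] = None,
        msg: Optional[Union[Callable[[], str], str]] = None,
    ):
        # Always returns None, so `return self.none(...)` still means "no match".
        if node is not None and self.verbose:
            # msg may be a callable so the string is only built when it is printed.
            text = msg() if callable(msg) else (msg or "")
            self.failures.append((lineno, text))
            print(f"[{type(self).__name__}.match] no match at line {lineno}: {text}")
        return None

    def match(self, g, node, matched):
        # In this toy, a node is just its operator name.
        if node != "MatMul":
            return self.none(
                node,
                inspect.currentframe().f_lineno,
                lambda: f"unexpected operator {node!r}",
            )
        return "match"


verbose_pattern = ToyPattern(verbose=1)
quiet_pattern = ToyPattern()
verbose_result = verbose_pattern.match(None, "Add", [])
quiet_result = quiet_pattern.match(None, "Add", [])
```

Both calls return None, but only the verbose instance records the line number and the message, which is what makes a failed condition easy to locate.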

PatternOptimization.apply

@classmethod
def apply(
    cls, g: "GraphBuilder", *nodes: Sequence[NodeProto]
) -> List[NodeProto]:

The method does the rewriting and assumes it is possible. It receives the nodes returned by method match, and it assumes no other pattern has modified them or will modify them. Since they are passed as positional arguments, method match may include None values among them. The method returns the new nodes. The optimizer considers that every node given to this function is removed from the graph and that every node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
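The removal and addition rule can be made concrete with a small sketch. Everything here is hypothetical and independent of the library: Node stands in for NodeProto, relu_relu_apply is an invented rewriting, and replace_nodes is a simplified picture of what the optimizer does with the result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Stand-in for onnx.NodeProto (hypothetical, illustration only).
    op_type: str
    input: List[str]
    output: List[str]


def relu_relu_apply(
    g, first: Node, second: Node, extra: Optional[Node] = None
) -> List[Node]:
    # Relu(Relu(x)) == Relu(x): the second Relu becomes an Identity.
    # `first` is still needed, so it is returned in order to stay in the graph.
    # `extra` illustrates an optional slot that match may fill with None.
    new_nodes = [first, Node("Identity", [first.output[0]], list(second.output))]
    if extra is not None:
        new_nodes.append(extra)
    return new_nodes


def replace_nodes(
    all_nodes: List[Node], matched: List[Optional[Node]], new_nodes: List[Node]
) -> List[Node]:
    # Optimizer-side bookkeeping (simplified): every non-None matched node is
    # removed and the returned nodes are inserted where the first one was.
    removed = {id(n) for n in matched if n is not None}
    position = next(i for i, n in enumerate(all_nodes) if id(n) in removed)
    kept = [n for n in all_nodes if id(n) not in removed]
    return kept[:position] + new_nodes + kept[position:]


relu1 = Node("Relu", ["x"], ["a"])
relu2 = Node("Relu", ["a"], ["b"])
other = Node("Neg", ["a"], ["c"])  # second consumer of "a": relu1 must survive
graph_nodes = [relu1, relu2, other]

matched = [relu1, relu2, None]
new_nodes = relu_relu_apply(None, *matched)
graph_nodes = replace_nodes(graph_nodes, matched, new_nodes)
print([n.op_type for n in graph_nodes])  # ['Relu', 'Identity', 'Neg']
```

The first Relu is passed to the rewriting and therefore considered removed; it only stays in the graph because the function returns it again.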

Optimization Algorithm

It is implemented in method optimize:

def optimize(
    self, max_iter=-1, remove_identity: bool = True
) -> List[Dict[str, Any]]:

The algorithm runs multiple iterations until the graph stops evolving or max_iter is reached. By default, max_iter is equal to the number of nodes. An iteration is:

matches = []

builds all successors and predecessors

# Step 1: match

for all patterns p:
    for all nodes n:
        r = p.match(n)
        if r:
            if no node already scheduled to be rewritten by another match:
                matches.append(r)

# Step 2: apply

for all matches r:
    apply the match r

# Step 3: clean

remove unused nodes
remove identity nodes

This algorithm may apply more than one rewriting at each iteration, but it guarantees that the local structure a rewriting relies on was not altered by another one.
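The three steps above can be turned into a runnable toy. It is only a sketch under simplifying assumptions and not the library's implementation: Node, DoubleNegPattern, build_successors, remove_unused and this optimize function are hypothetical, a match is reduced to a plain list of nodes, new nodes are inserted at the position of the first removed node, and identity nodes are not removed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    # Stand-in for onnx.NodeProto (hypothetical, illustration only).
    op_type: str
    input: List[str]
    output: List[str]


def build_successors(nodes: List[Node]) -> Dict[str, List[Node]]:
    # Maps every result name to the nodes consuming it.
    successors: Dict[str, List[Node]] = {}
    for n in nodes:
        for name in n.input:
            successors.setdefault(name, []).append(n)
    return successors


class DoubleNegPattern:
    # Toy pattern: Neg(Neg(x)) is replaced by Identity(x).
    def match(self, successors, node):
        if node.op_type != "Neg":
            return None
        users = successors.get(node.output[0], [])
        if len(users) != 1 or users[0].op_type != "Neg":
            return None
        return [node, users[0]]

    def apply(self, first, second):
        return [Node("Identity", [first.input[0]], [second.output[0]])]


def remove_unused(nodes: List[Node], outputs: List[str]) -> List[Node]:
    # Walks backwards and keeps only the nodes contributing to a graph output.
    needed, kept = set(outputs), []
    for n in reversed(nodes):
        if any(o in needed for o in n.output):
            needed.update(n.input)
            kept.append(n)
    return kept[::-1]


def optimize(nodes, patterns, outputs, max_iter=-1):
    if max_iter < 0:
        max_iter = len(nodes)
    for _ in range(max_iter):
        successors = build_successors(nodes)
        # Step 1: match, skipping any match touching an already scheduled node.
        matches, scheduled = [], set()
        for p in patterns:
            for n in nodes:
                r = p.match(successors, n)
                if r and not any(id(m) in scheduled for m in r):
                    scheduled.update(id(m) for m in r)
                    matches.append((p, r))
        if not matches:
            break  # the graph no longer evolves
        # Step 2: apply, the new nodes take the place of the first matched node.
        for p, r in matches:
            removed = {id(m) for m in r}
            pos = next(i for i, n in enumerate(nodes) if id(n) in removed)
            nodes = [n for n in nodes if id(n) not in removed]
            nodes[pos:pos] = p.apply(*r)
        # Step 3: clean.
        nodes = remove_unused(nodes, outputs)
    return nodes


chain = [
    Node("Neg", ["x"], ["a"]),
    Node("Neg", ["a"], ["b"]),
    Node("Neg", ["b"], ["c"]),
    Node("Neg", ["c"], ["y"]),
]
optimized = optimize(chain, [DoubleNegPattern()], outputs=["y"])
print([n.op_type for n in optimized])  # ['Identity', 'Identity']
```

On this chain of four Neg nodes, the first iteration finds three candidate matches, but the middle one shares a node with an already scheduled match and is skipped. The two remaining rewritings are applied in the same iteration without interfering with each other.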

Adding a pattern

See #80 about the addition of a new pattern.

Example

Simple API

We consider the following simple model:

<<<

import torch
from experimental_experiment.helpers import pretty_onnx
from experimental_experiment.xbuilder import OptimizationOptions
from experimental_experiment.torch_interpreter import to_onnx


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)


x = torch.rand(3, 10)
onx = to_onnx(
    MLP(), (x,), input_names=["x"], options=OptimizationOptions(patterns=None)
)
with open("temp_doc_mlp.onnx", "wb") as f:
    f.write(onx.SerializeToString())
print(pretty_onnx(onx))

>>>

    opset: domain='' version=18
    input: name='x' type=dtype('float32') shape=[3, 10]
    init: name='p_layers_0_weight::T10' type=float32 shape=(10, 32)       -- GraphBuilder.constant_folding.from/fold(p_layers_0_weight)##p_layers_0_weight/DynamoInterpret.placeholder.1/P(layers.0.weight)
    init: name='p_layers_2_weight::T10' type=float32 shape=(32, 1)        -- GraphBuilder.constant_folding.from/fold(p_layers_2_weight)##p_layers_2_weight/DynamoInterpret.placeholder.1/P(layers.2.weight)
    init: name='layers.0.bias' type=float32 shape=(32,)                   -- DynamoInterpret.placeholder.1/P(layers.0.bias)
    init: name='layers.2.bias' type=float32 shape=(1,) -- array([0.105], dtype=float32)-- DynamoInterpret.placeholder.1/P(layers.2.bias)
    MatMul(x, p_layers_0_weight::T10) -> _onx_matmul_x
      Add(_onx_matmul_x, layers.0.bias) -> linear
        Relu(linear) -> relu
          MatMul(relu, p_layers_2_weight::T10) -> _onx_matmul_relu
            Add(_onx_matmul_relu, layers.2.bias) -> output_0
    output: name='output_0' type=dtype('float32') shape=[3, 1]

Which we can render as follows:

digraph {
  graph [rankdir=TB, splines=true, overlap=false, nodesep=0.2, ranksep=0.2, fontsize=8];
  node [style="rounded,filled", color="#888888", fontcolor="#222222", shape=box];
  edge [arrowhead=vee, fontsize=7, labeldistance=-5, labelangle=0];
  I_0 [label="x\nFLOAT(3,10)", fillcolor="#aaeeaa"];
  i_1 [label="p_layers_0_weight::T10\nFLOAT(10, 32)", fillcolor="#cccc00"];
  i_2 [label="p_layers_2_weight::T10\nFLOAT(32, 1)", fillcolor="#cccc00"];
  i_3 [label="layers.0.bias\nFLOAT(32)", fillcolor="#cccc00"];
  MatMul_4 [label="MatMul(., .)", fillcolor="#ee9999"];
  Add_5 [label="Add(., .)", fillcolor="#cccccc"];
  Relu_6 [label="Relu(.)", fillcolor="#cccccc"];
  MatMul_7 [label="MatMul(., .)", fillcolor="#ee9999"];
  Add_8 [label="Add(., [0.10470577])", fillcolor="#cccccc"];
  I_0 -> MatMul_4 [label="FLOAT(3,10)"];
  i_1 -> MatMul_4 [label="FLOAT(10, 32)"];
  MatMul_4 -> Add_5 [label="FLOAT(3,32)"];
  i_3 -> Add_5 [label="FLOAT(32)"];
  Add_5 -> Relu_6 [label="FLOAT(3,32)"];
  Relu_6 -> MatMul_7 [label="FLOAT(3,32)"];
  i_2 -> MatMul_7 [label="FLOAT(32, 1)"];
  MatMul_7 -> Add_8 [label="FLOAT(3,1)"];
  O_9 [label="output_0\nFLOAT(3,1)", fillcolor="#aaaaee"];
  Add_8 -> O_9;
}

We then apply the optimizations by writing the following code:

<<<

import onnx
from experimental_experiment.helpers import pretty_onnx
from experimental_experiment.xbuilder import GraphBuilder

onx = onnx.load("temp_doc_mlp.onnx")

# The model is placed in a GraphBuilder.
# It creates dictionaries to store shapes, ranks, types
# to make it easier for the optimizers to find the information
# they need. It still uses NodeProto to store nodes.
gr = GraphBuilder(onx, infer_shapes_options=True)

# Let's optimize.
opt_onx = gr.to_onnx(optimize=True)
with open("temp_doc_mlp_opt.onnx", "wb") as f:
    f.write(opt_onx.SerializeToString())
print(pretty_onnx(opt_onx))

>>>

    opset: domain='' version=18
    input: name='x' type=dtype('float32') shape=[3, 10]
    init: name='layers.0.bias' type=float32 shape=(32,)                   -- DynamoInterpret.placeholder.1/P(layers.0.bias)GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: name='layers.2.bias' type=float32 shape=(1,) -- array([0.105], dtype=float32)-- DynamoInterpret.placeholder.1/P(layers.2.bias)GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    init: name='GemmTransposePattern--p_layers_0_weight::T10' type=float32 shape=(32, 10)-- GraphBuilder.constant_folding.from/fold(p_layers_0_weight::T10)##p_layers_0_weight::T10/GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
    init: name='GemmTransposePattern--p_layers_2_weight::T10' type=float32 shape=(1, 32)-- GraphBuilder.constant_folding.from/fold(init7_s2_1_32,p_layers_2_weight::T10)##p_layers_2_weight::T10/GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)##init7_s2_1_32/TransposeEqualReshapePattern.apply.new_shape
    Gemm(x, GemmTransposePattern--p_layers_0_weight::T10, layers.0.bias, transB=1) -> linear
      Relu(linear) -> relu
        Gemm(relu, GemmTransposePattern--p_layers_2_weight::T10, layers.2.bias, transB=1) -> output_0
    output: name='output_0' type=dtype('float32') shape=[3, 1]

Which renders as follows:

digraph {
  graph [rankdir=TB, splines=true, overlap=false, nodesep=0.2, ranksep=0.2, fontsize=8];
  node [style="rounded,filled", color="#888888", fontcolor="#222222", shape=box];
  edge [arrowhead=vee, fontsize=7, labeldistance=-5, labelangle=0];
  I_0 [label="x\nFLOAT(3,10)", fillcolor="#aaeeaa"];
  i_1 [label="layers.0.bias\nFLOAT(32)", fillcolor="#cccc00"];
  i_2 [label="GemmTransposePattern--p_layers_0_weight::T10\nFLOAT(32, 10)", fillcolor="#cccc00"];
  i_3 [label="GemmTransposePattern--p_layers_2_weight::T10\nFLOAT(1, 32)", fillcolor="#cccc00"];
  Gemm_4 [label="Gemm(., ., .)", fillcolor="#cccccc"];
  Relu_5 [label="Relu(.)", fillcolor="#cccccc"];
  Gemm_6 [label="Gemm(., ., [0.10470577])", fillcolor="#cccccc"];
  I_0 -> Gemm_4 [label="FLOAT(3,10)"];
  i_2 -> Gemm_4 [label="FLOAT(32, 10)"];
  i_1 -> Gemm_4 [label="FLOAT(32)"];
  Gemm_4 -> Relu_5 [label="FLOAT(3,32)"];
  Relu_5 -> Gemm_6 [label="FLOAT(3,32)"];
  i_3 -> Gemm_6 [label="FLOAT(1, 32)"];
  O_7 [label="output_0\nFLOAT(3,1)", fillcolor="#aaaaee"];
  Gemm_6 -> O_7;
}
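The optimized graph computes the same function as the original one: each MatMul followed by Add became a single Gemm whose weight is stored transposed and read with transB=1. The following pure Python sketch checks this equivalence on small made-up matrices. The helper functions are written for this illustration only, and gemm follows the ONNX definition of Gemm with alpha and beta equal to 1.

```python
def matmul(a, b):
    # Plain matrix product on lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    # Adds a 1D bias to every row, as broadcasting does in Add.
    return [[v + b for v, b in zip(row, bias)] for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def gemm(a, b, c, transB=0):
    # Gemm with alpha = beta = 1: a @ (b.T if transB else b) + c
    if transB:
        b = transpose(b)
    return add_bias(matmul(a, b), c)


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2
w = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]    # 2 x 3
bias = [0.5, -0.5, 1.0]

before = add_bias(matmul(x, w), bias)          # MatMul followed by Add
after = gemm(x, transpose(w), bias, transB=1)  # Gemm on the transposed weight
assert before == after
```

Transposing the weight once, ahead of time, and letting Gemm undo the transposition with transB=1 leaves the result unchanged.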

Verbosity

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(onx, infer_shapes_options=True, verbose=1)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-TPG._add_shape_information] dynamic shapes replacements={}
    [GraphBuilder-TPG.optimize] start with 5 nodes
    [GraphBuilder-TPG.optimize] #patterns=92
    [GraphBuilder-TPG.optimize] start with subgraphs
    [GraphBuilder-TPG.optimize] done with subgraphs
    [GraphBuilderPatternOptimization-TPG.optimize] start with 5 nodes, 4 initializers, 92 patterns, priorities=[0, 1, 2, 3], max_iter=40
    [GraphBuilderPatternOptimization-TPG.optimize] same children={'SameChildrenFromInputPattern', 'SameChildrenPattern'}
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-TPG.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-TPG.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 2: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-TPG.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 3: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-TPG.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.002 | max_time=IdentityPattern:0.000
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 4: 3 nodes, priority=3
    [GraphBuilderPatternOptimization-TPG.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.001 | max_time=GemmTransposePattern:0.000
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 5: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-TPG.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.002 | max_time=Sub1MulPattern:0.000
    [GraphBuilderPatternOptimization-TPG.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-TPG.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-TPG.optimize] done after 7 iterations with 5 nodes in 0.042
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.random_order] -- starts with 3 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 7.373800326604396e-05s with changed=0 scale=0
    [GraphBuilder-TPG.optimize] done with 3 nodes in 0.046
    [GraphBuilder-TPG.to_onnx] make_model 4 inits 0 params
    [GraphBuilder-TPG.time_evaluation_constants_] 0
    [GraphBuilder-TPG._build_initializers] start with 4 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-TPG._build_initializers] switch low/high order
    [GraphBuilder-TPG._build_initializers] done in 6.090995157137513e-06s with 4 initializers, 0 large initializers
    [GraphBuilder-TPG._add_shape_information] dynamic shapes replacements={}

With more verbosity:

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(onx, infer_shapes_options=True, verbose=11)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-XCY._update_structures_with_proto] -- starts with 5 nodes
    [GraphBuilder-XCY.set_shape] p_layers_0_weight::T10:(10, 32)
    [GraphBuilder-XCY.set_rank] p_layers_0_weight::T10:2
    [GraphBuilder-XCY.set_type] p_layers_0_weight::T10:1
    [GraphBuilder-XCY.make_initializer] p_layers_0_weight::T10[1:(10, 32)]
    [GraphBuilder-XCY.update_node_constant] new constant 'p_layers_0_weight::T10', node=None
    [GraphBuilder-XCY.set_shape] p_layers_2_weight::T10:(32, 1)
    [GraphBuilder-XCY.set_rank] p_layers_2_weight::T10:2
    [GraphBuilder-XCY.set_type] p_layers_2_weight::T10:1
    [GraphBuilder-XCY.make_initializer] p_layers_2_weight::T10[1:(32, 1)]
    [GraphBuilder-XCY.update_node_constant] new constant 'p_layers_2_weight::T10', node=None
    [GraphBuilder-XCY.set_shape] layers.0.bias:(32,)
    [GraphBuilder-XCY.set_rank] layers.0.bias:1
    [GraphBuilder-XCY.set_type] layers.0.bias:1
    [GraphBuilder-XCY.make_initializer] layers.0.bias[1:(32,)]
    [GraphBuilder-XCY.update_node_constant] new constant 'layers.0.bias', node=None
    [GraphBuilder-XCY.set_shape] layers.2.bias:(1,)
    [GraphBuilder-XCY.set_rank] layers.2.bias:1
    [GraphBuilder-XCY.set_type] layers.2.bias:1
    [GraphBuilder-XCY.make_initializer] layers.2.bias[1:(1,)]
    [GraphBuilder-XCY.update_node_constant] new constant 'layers.2.bias', node=None
    [GraphBuilder-XCY.set_type] x:1
    [GraphBuilder-XCY.set_shape] x:(3, 10)
    [GraphBuilder-XCY.set_rank] x:2
    [GraphBuilder-XCY.set_type] output_0:1
    [GraphBuilder-XCY.set_shape] output_0:(3, 1)
    [GraphBuilder-XCY.set_rank] output_0:2
    [GraphBuilder-XCY.set_type] _onx_matmul_x:1
    [GraphBuilder-XCY.set_shape] _onx_matmul_x:(3, 32)
    [GraphBuilder-XCY.set_rank] _onx_matmul_x:2
    [GraphBuilder-XCY.set_type] linear:1
    [GraphBuilder-XCY.set_shape] linear:(3, 32)
    [GraphBuilder-XCY.set_rank] linear:2
    [GraphBuilder-XCY.set_type] relu:1
    [GraphBuilder-XCY.set_shape] relu:(3, 32)
    [GraphBuilder-XCY.set_rank] relu:2
    [GraphBuilder-XCY.set_type] _onx_matmul_relu:1
    [GraphBuilder-XCY.set_shape] _onx_matmul_relu:(3, 1)
    [GraphBuilder-XCY.set_rank] _onx_matmul_relu:2
    [GraphBuilder-XCY.set_type] output_0:1
    [GraphBuilder-XCY._update_structures_with_proto] ends with 5 nodes in 0.001443892004317604
    [GraphBuilder-XCY.constant_folding] -- starts with 4 constants and 5 nodes.
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: _onx_matmul_relu
    [GraphBuilder-XCY.constant_folding] cst:: . :: x
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: relu
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-XCY.constant_folding] cst:: . :: output_0
    [GraphBuilder-XCY.constant_folding] cst:: . :: linear
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-XCY.constant_folding] cst:: . :: _onx_matmul_x
    [GraphBuilder-XCY.constant_folding] initializer: p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: layers.0.bias
    [GraphBuilder-XCY.constant_folding] initializer: layers.2.bias
    [GraphBuilder-XCY.constant_folding] ends with 4 constants and 5 nodes in 0.00010349799413233995 seconds
    [GraphBuilder-XCY._update_shape_types_with_proto] -- starts with 5 nodes and 0 shapes.
    [GraphBuilder._update_shape_types_with_proto] infer shapes
    [GraphBuilder._update_shape_types_with_proto] infer shapes done 0.0004092410090379417 seconds
    [GraphBuilder._update_shape_types_with_proto] _clean_shapes after 0.00046036399726290256 seconds
    [GraphBuilder-XCY._update_shape_types_with_proto] walk through 0 shapes.
    [GraphBuilder-XCY.set_type] _onx_matmul_x:1
    [_update_shape_types_with_proto_one_result] update shape(_onx_matmul_x) with (3, 32)
    [GraphBuilder-XCY.set_type] linear:1
    [_update_shape_types_with_proto_one_result] update shape(linear) with (3, 32)
    [GraphBuilder-XCY.set_type] relu:1
    [_update_shape_types_with_proto_one_result] update shape(relu) with (3, 32)
    [GraphBuilder-XCY.set_type] _onx_matmul_relu:1
    [_update_shape_types_with_proto_one_result] update shape(_onx_matmul_relu) with (3, 1)
    [GraphBuilder-XCY._update_shape_types_with_proto] ends in 0.00017665998893789947 seconds.
    [GraphBuilder-XCY._add_shape_information] dynamic shapes replacements={}
    [GraphBuilder-XCY.optimize] start with 5 nodes
    [GraphBuilder-XCY.optimize] options=OptimizationOptions(constant_folding={'Sqrt', 'Unsqueeze', 'Reciprocal', 'Cast', 'Exp', 'Transpose', 'Concat', 'Mul', 'Add', 'Squeeze', 'Sub', 'Reshape', 'Div'}, patterns=[BatchNormalizationPattern(), BatchNormalizationTrainingPattern(), CastLayerNormalizationCastPattern(), CastPattern(), CastCastBinaryPattern(), CastCastPattern(), CastOpCastPattern(), ClipClipPattern(), ConcatEmptyPattern(), ConcatGatherPattern(), ConcatReshapePattern(), ConcatTwiceUnaryPattern(), ConstantToInitializerPattern(), ConvBiasNullPattern(), DropoutPattern(), ExpandPattern(), ExpandBroadcastPattern(), ExpandSwapPattern(), GathersSplitPattern(), GeluPattern(), IdentityPattern(), LayerNormalizationPattern(), LayerNormalizationScalePattern(), LeakyReluPattern(), MulMulMulScalarPattern(), NotNotPattern(), NotWherePattern(), ReduceArgTopKPattern(), ReduceReshapePattern(), ReduceSumNormalizePattern(), ReshapePattern(), ReshapeMatMulReshapePattern(), Reshape2Of3Pattern(), ReshapeReshapeBinaryPattern(), MatMulAddPattern(), GemmTransposePattern(), MatMulReshape2Of3Pattern(), MulMulMatMulPattern(), ShapeBasedReshapeIsSqueezePattern(), ShapeBasedStaticExpandPattern(), ShapeBasedConcatExpandPattern(), ShapeBasedEditDistanceReshapePattern(), ShapeBasedIdentityPattern(), ShapeBasedExpandBroadcastPattern(), ShapeBasedExpandBroadcastMatMulPattern(), ShapeBasedExpandCastWhereSwapPattern(), ShapeBasedExpandSwapPattern(), ShapeBasedMatMulToMulPattern(), ShapedBasedReshapePattern(), ShapeBasedSameChildrenPattern(), ShapeBasedShapeShapeAddPattern(), ReshapeReshapePattern(), RotaryEmbeddingPattern(), SameChildrenPattern(), SameChildrenFromInputPattern(), SequenceConstructAtPattern(), SliceSlicePattern(), SlicesSplitPattern(), SoftmaxCrossEntropyLossCastPattern(), SplitConcatPattern(), SqueezeAddPattern(), SqueezeBinaryUnsqueezePattern(), SqueezeUnsqueezePattern(), StaticConcatReshapePattern(), Sub1MulPattern(), SwapExpandReshapePattern(), SwapRangeAddScalarPattern(), 
SwapUnaryPattern(), SwapUnsqueezeTransposePattern(), SwitchOrderBinaryPattern(), SwitchReshapeActivationPattern(), TransposeEqualReshapePattern(), TransposeGatherPattern(), TransposeMatMulPattern(), TransposeReshapeMatMulPattern(), TransposeReshapeTransposePattern(), TransposeTransposePattern(), UnsqueezeEqualPattern(), UnsqueezeOrSqueezeReshapePattern(), UnsqueezeReshapePattern(), UnsqueezeUnsqueezePattern(), WhereAddPattern(), RotaryConcatPartPattern(), FunctionAttentionPattern(), FunctionAttentionGQAPattern(), FunctionCausalMaskPattern(), FunctionCausalMaskMulAddPattern(), FunctionCosSinCachePattern(), FunctionHalfRotaryEmbeddingPattern(), RMSNormalizationPattern(), RMSNormalizationMulPattern(), AttentionGQAPattern()], verbose=11)
    -- GRAPH BEFORE OPTIMIZATON --
    
    opset: : 18
    init: p_layers_0_weight::T10: CP1: (10, 32)                            -- GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
    init: p_layers_2_weight::T10: CP1: (32, 1)                             -- GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)
    init: layers.0.bias: CP1: (32,)                                        -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: layers.2.bias: CP1: (1,)                                         -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    input:: x                                                                       |T1: 3 x 10
    MatMul: x, p_layers_0_weight::T10 -> _onx_matmul_x                              |T1: 3 x 32                   - Opset
    Add: _onx_matmul_x, layers.0.bias -> linear                                     |T1: 3 x 32                   - Opset2
    Relu: linear -> relu                                                            |T1: 3 x 32                   - relu
    MatMul: relu, p_layers_2_weight::T10 -> _onx_matmul_relu                        |T1: 3 x 1                    - Opset3
    Add: _onx_matmul_relu, layers.2.bias -> output_0                                |T1: 3 x 1                    - Opset4
    output:: output_0                                                               |T1: 3 x 1
    -- END --
    [GraphBuilder-XCY.optimize] start with subgraphs
    [GraphBuilder-XCY.optimize] done with subgraphs
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 5.8995006838813424e-05 seconds
    [GraphBuilder-XCY.constant_folding] -- starts with 4 constants and 5 nodes.
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: _onx_matmul_relu
    [GraphBuilder-XCY.constant_folding] cst:: . :: x
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: relu
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-XCY.constant_folding] cst:: . :: output_0
    [GraphBuilder-XCY.constant_folding] cst:: . :: linear
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-XCY.constant_folding] cst:: . :: _onx_matmul_x
    [GraphBuilder-XCY.constant_folding] initializer: p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: layers.0.bias
    [GraphBuilder-XCY.constant_folding] initializer: layers.2.bias
    [GraphBuilder-XCY.constant_folding] ends with 4 constants and 5 nodes in 5.506799789145589e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] start with 5 nodes, 4 initializers, 92 patterns, priorities=[0, 1, 2, 3], max_iter=40
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   1/92 - P0 - BatchNormalizationPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   2/92 - P0 - BatchNormalizationTrainingPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   3/92 - P0 - CastCastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   4/92 - P0 - CastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   5/92 - P0 - ConcatGatherPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   6/92 - P0 - ConcatReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   7/92 - P0 - ConvBiasNullPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   8/92 - P0 - ExpandPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern   9/92 - P0 - FunctionAttentionGQAPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  10/92 - P0 - FunctionAttentionPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  11/92 - P0 - GeluPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  12/92 - P0 - IdentityPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  13/92 - P0 - LeakyReluPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  14/92 - P0 - ReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  15/92 - P0 - ReshapeReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  16/92 - P0 - SameChildrenFromInputPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  17/92 - P0 - SameChildrenPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  18/92 - P0 - ShapeBasedEditDistanceReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  19/92 - P0 - ShapeBasedIdentityPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  20/92 - P0 - ShapeBasedReshapeIsSqueezePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  21/92 - P0 - ShapeBasedSameChildrenPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  22/92 - P0 - ShapeBasedShapeShapeAddPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  23/92 - P0 - ShapeBasedStaticExpandPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  24/92 - P0 - ShapedBasedReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  25/92 - P0 - SoftmaxCrossEntropyLossCastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  26/92 - P0 - SqueezeAddPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  27/92 - P0 - SqueezeBinaryUnsqueezePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  28/92 - P0 - SqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  29/92 - P0 - StaticConcatReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  30/92 - P0 - SwapExpandReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  31/92 - P0 - SwapUnaryPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  32/92 - P0 - SwapUnsqueezeTransposePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  33/92 - P0 - TransposeGatherPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  34/92 - P0 - TransposeReshapeTransposePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  35/92 - P0 - TransposeTransposePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  36/92 - P0 - UnsqueezeOrSqueezeReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  37/92 - P0 - UnsqueezeReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  38/92 - P0 - UnsqueezeUnsqueezePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  39/92 - P1 - CastCastBinaryPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  40/92 - P1 - CastLayerNormalizationCastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  41/92 - P1 - CastOpCastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  42/92 - P1 - ClipClipPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  43/92 - P1 - ConcatEmptyPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  44/92 - P1 - ConcatTwiceUnaryPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  45/92 - P1 - ConstantToInitializerPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  46/92 - P1 - DropoutPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  47/92 - P1 - ExpandBroadcastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  48/92 - P1 - ExpandSwapPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  49/92 - P1 - FunctionCausalMaskMulAddPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  50/92 - P1 - FunctionCausalMaskPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  51/92 - P1 - FunctionCosSinCachePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  52/92 - P1 - FunctionHalfRotaryEmbeddingPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  53/92 - P1 - GathersSplitPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  54/92 - P1 - GemmTransposePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  55/92 - P1 - LayerNormalizationPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  56/92 - P1 - LayerNormalizationScalePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  57/92 - P1 - MatMulReshape2Of3Pattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  58/92 - P1 - MulMulMatMulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  59/92 - P1 - MulMulMulScalarPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  60/92 - P1 - NotNotPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  61/92 - P1 - NotWherePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  62/92 - P1 - RMSNormalizationMulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  63/92 - P1 - RMSNormalizationPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  64/92 - P1 - ReduceArgTopKPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  65/92 - P1 - ReduceReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  66/92 - P1 - ReduceSumNormalizePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  67/92 - P1 - Reshape2Of3Pattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  68/92 - P1 - ReshapeMatMulReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  69/92 - P1 - ReshapeReshapeBinaryPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  70/92 - P1 - RotaryConcatPartPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  71/92 - P1 - RotaryEmbeddingPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  72/92 - P1 - SequenceConstructAtPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  73/92 - P1 - ShapeBasedConcatExpandPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  74/92 - P1 - ShapeBasedExpandBroadcastMatMulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  75/92 - P1 - ShapeBasedExpandBroadcastPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  76/92 - P1 - ShapeBasedExpandCastWhereSwapPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  77/92 - P1 - ShapeBasedExpandSwapPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  78/92 - P1 - ShapeBasedMatMulToMulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  79/92 - P1 - SliceSlicePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  80/92 - P1 - SlicesSplitPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  81/92 - P1 - SplitConcatPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  82/92 - P1 - Sub1MulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  83/92 - P1 - SwapRangeAddScalarPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  84/92 - P1 - SwitchOrderBinaryPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  85/92 - P1 - SwitchReshapeActivationPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  86/92 - P1 - TransposeEqualReshapePattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  87/92 - P1 - TransposeMatMulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  88/92 - P1 - TransposeReshapeMatMulPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  89/92 - P1 - UnsqueezeEqualPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  90/92 - P1 - WhereAddPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  91/92 - P2 - AttentionGQAPattern()
    [GraphBuilderPatternOptimization-XCY.optimize] use pattern  92/92 - P3 - MatMulAddPattern()
    -- optimize starts with...
    
    opset: : 18
    init: p_layers_0_weight::T10: CP1: (10, 32)                            -- GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
    init: p_layers_2_weight::T10: CP1: (32, 1)                             -- GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)
    init: layers.0.bias: CP1: (32,)                                        -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
    init: layers.2.bias: CP1: (1,)                                         -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
    input:: x                                                                       |T1: 3 x 10
    MatMul: x, p_layers_0_weight::T10 -> _onx_matmul_x                              |T1: 3 x 32                   - Opset
    Add: _onx_matmul_x, layers.0.bias -> linear                                     |T1: 3 x 32                   - Opset2
    Relu: linear -> relu                                                            |T1: 3 x 32                   - relu
    MatMul: relu, p_layers_2_weight::T10 -> _onx_matmul_relu                        |T1: 3 x 1                    - Opset3
    Add: _onx_matmul_relu, layers.2.bias -> output_0                                |T1: 3 x 1                    - Opset4
    output:: output_0                                                               |T1: 3 x 1
    -- starts optimization
    [GraphBuilderPatternOptimization-XCY.optimize] same children={'SameChildrenFromInputPattern', 'SameChildrenPattern'}
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips CastLayerNormalizationCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips CastCastBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips CastOpCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ClipClipPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ConcatEmptyPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips ConcatTwiceUnaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ConstantToInitializerPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips DropoutPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips ExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips GathersSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 884:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 926:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [GraphBuilderPatternOptimization-XCY.optimize] skips LayerNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips LayerNormalizationScalePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [GraphBuilder-GRG.make_tensor_input] x[0:None] -- marker=_build_pattern1_x
    [GraphBuilder-GRG.set_type] x:0
    [GraphBuilder-GRG.set_type] x:-1
    [GraphBuilder-GRG.make_tensor_input] zero[0:None] -- marker=_build_pattern1_zero
    [GraphBuilder-GRG.set_type] zero:0
    [GraphBuilder-GRG.set_type] zero:-1
    [GraphBuilder-GRG.make_tensor_input] slope[0:None] -- marker=_build_pattern1_slope
    [GraphBuilder-GRG.set_type] slope:0
    [GraphBuilder-GRG.set_type] slope:-1
    [GraphBuilder-GRG.3.make_node] [tt:-] Greater: ['x', 'zero']->['_onx_greater_x']
    [GraphBuilder-GRG.set_type] _onx_greater_x:9
    [GraphBuilder-GRG.3.make_node] [tt:-] Mul: ['x', 'slope']->['_onx_mul_x']
    [GraphBuilder-GRG.set_type] _onx_mul_x:-1
    [GraphBuilder-GRG.3.make_node] [ttt:-] Where: ['_onx_greater_x', 'x', '_onx_mul_x']->['_onx_where_greater_x']
    [GraphBuilder-GRG.set_type] _onx_where_greater_x:-1
    [GraphBuilder-GRG.make_tensor_output] _onx_where_greater_x[0: None]
    [GraphBuilderPatternOptimization-XCY.optimize] skips MulMulMulScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips NotNotPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips NotWherePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ReduceArgTopKPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ReduceReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ReduceSumNormalizePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips ReshapeMatMulReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips Reshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ReshapeReshapeBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips GemmTransposePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips MatMulReshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips MulMulMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips ShapeBasedConcatExpandPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips ShapeBasedExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ShapeBasedExpandBroadcastMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ShapeBasedExpandCastWhereSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ShapeBasedExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips ShapeBasedMatMulToMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips RotaryEmbeddingPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips SequenceConstructAtPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips SliceSlicePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips SlicesSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [GraphBuilder-SPK.make_tensor_input] X[0:None] -- marker=_build_pattern1_X
    [GraphBuilder-SPK.set_type] X:0
    [GraphBuilder-SPK.set_type] X:-1
    [GraphBuilder-SPK.make_tensor_input] indices[0:None] -- marker=_build_pattern1_indices
    [GraphBuilder-SPK.set_type] indices:0
    [GraphBuilder-SPK.set_type] indices:-1
    [GraphBuilder-SPK.make_tensor_input] axis[0:None] -- marker=_build_pattern1_axis
    [GraphBuilder-SPK.set_type] axis:0
    [GraphBuilder-SPK.set_type] axis:-1
    [GraphBuilder-SPK.make_tensor_input] zerof[0:None] -- marker=_build_pattern1_zerof
    [GraphBuilder-SPK.set_type] zerof:0
    [GraphBuilder-SPK.set_type] zerof:-1
    [GraphBuilder-SPK.make_tensor_input] zeroi[0:None] -- marker=_build_pattern1_zeroi
    [GraphBuilder-SPK.set_type] zeroi:0
    [GraphBuilder-SPK.set_type] zeroi:-1
    [GraphBuilder-SPK.make_tensor_input] b[0:None] -- marker=_build_pattern1_b
    [GraphBuilder-SPK.set_type] b:0
    [GraphBuilder-SPK.set_type] b:-1
    [GraphBuilder-SPK.3.make_node] [tt:-] Equal: ['indices', 'b']->['_onx_equal_indices']
    [GraphBuilder-SPK.set_type] _onx_equal_indices:9
    [GraphBuilder-SPK.3.make_node] [t:-] Not: ['_onx_equal_indices']->['_onx_not_equal_indices']
    [GraphBuilder-SPK.set_type] _onx_not_equal_indices:9
    [GraphBuilder-SPK.3.make_node] [ttt:-] Where: ['_onx_not_equal_indices', 'indices', 'zeroi']->['_onx_where_not_equal_indices']
    [GraphBuilder-SPK.set_type] _onx_where_not_equal_indices:-1
    [GraphBuilder-SPK.3.make_node] [tt:-] Unsqueeze: ['_onx_where_not_equal_indices', 'axis']->['_onx_where_not_equal_indices::UnSq']
    [GraphBuilder-SPK.set_type] _onx_where_not_equal_indices::UnSq:-1
    [GraphBuilder-SPK.3.make_node] [t:-] LogSoftmax: ['X']->['_onx_logsoftmax_X']
    [GraphBuilder-SPK.set_type] _onx_logsoftmax_X:-1
    [GraphBuilder-SPK.set_type] _onx_gatherelements_logsoftmax_X:-1
    [GraphBuilder-SPK.3.make_node] [tt:t] GatherElements: ['_onx_logsoftmax_X', '_onx_where_not_equal_indices::UnSq']->['_onx_gatherelements_logsoftmax_X']
    [GraphBuilder-SPK.set_type] _onx_gatherelements_logsoftmax_X:-1
    [GraphBuilder-SPK.3.make_node] [tt:-] Squeeze: ['_onx_gatherelements_logsoftmax_X', 'axis']->['_onx_gatherelements_logsoftmax_X::Sq']
    [GraphBuilder-SPK.set_type] _onx_gatherelements_logsoftmax_X::Sq:-1
    [GraphBuilder-SPK.3.make_node] [t:-] Neg: ['_onx_gatherelements_logsoftmax_X::Sq']->['_onx_neg_gatherelements_logsoftmax_X::Sq']
    [GraphBuilder-SPK.set_type] _onx_neg_gatherelements_logsoftmax_X::Sq:-1
    [GraphBuilder-SPK.3.make_node] [ttt:-] Where: ['_onx_not_equal_indices', '_onx_neg_gatherelements_logsoftmax_X::Sq', 'zerof']->['_onx_where_not_equal_indices2']
    [GraphBuilder-SPK.set_type] _onx_where_not_equal_indices2:-1
    [GraphBuilder-SPK.3.make_node] [t:-] Cast: ['_onx_not_equal_indices']->['_onx_not_equal_indices::C1']
    [GraphBuilder-SPK.set_type] _onx_not_equal_indices::C1:1
    [GraphBuilder-SPK.3.make_node] [t:-] ReduceSum: ['_onx_not_equal_indices::C1']->['_onx_reducesum_not_equal_indices::C1']
    [GraphBuilder-SPK.set_type] _onx_reducesum_not_equal_indices::C1:1
    [GraphBuilder-SPK.set_shape] _onx_reducesum_not_equal_indices::C1:()
    [GraphBuilder-SPK.set_rank] _onx_reducesum_not_equal_indices::C1:0
    [GraphBuilder-SPK.3.make_node] [#:-] Cast: ['_onx_reducesum_not_equal_indices::C1']->['_onx_reducesum_not_equal_indices::C1::C10']
    [GraphBuilder-SPK.set_type] _onx_reducesum_not_equal_indices::C1::C10:10
    [GraphBuilder-SPK.set_shape] _onx_reducesum_not_equal_indices::C1::C10:()
    [GraphBuilder-SPK.set_rank] _onx_reducesum_not_equal_indices::C1::C10:0
    [GraphBuilder-SPK.3.make_node] [t:-] Cast: ['_onx_where_not_equal_indices2']->['_onx_where_not_equal_indices2::C1']
    [GraphBuilder-SPK.set_type] _onx_where_not_equal_indices2::C1:1
    [GraphBuilder-SPK.3.make_node] [t:-] ReduceSum: ['_onx_where_not_equal_indices2::C1']->['_onx_reducesum_where_not_equal_indices2::C1']
    [GraphBuilder-SPK.set_type] _onx_reducesum_where_not_equal_indices2::C1:1
    [GraphBuilder-SPK.set_shape] _onx_reducesum_where_not_equal_indices2::C1:()
    [GraphBuilder-SPK.set_rank] _onx_reducesum_where_not_equal_indices2::C1:0
    [GraphBuilder-SPK.3.make_node] [#:-] Cast: ['_onx_reducesum_where_not_equal_indices2::C1']->['_onx_reducesum_where_not_equal_indices2::C1::C10']
    [GraphBuilder-SPK.set_type] _onx_reducesum_where_not_equal_indices2::C1::C10:10
    [GraphBuilder-SPK.set_shape] _onx_reducesum_where_not_equal_indices2::C1::C10:()
    [GraphBuilder-SPK.set_rank] _onx_reducesum_where_not_equal_indices2::C1::C10:0
    [GraphBuilder-SPK.3.make_node] [##:-] Div: ['_onx_reducesum_where_not_equal_indices2::C1::C10', '_onx_reducesum_not_equal_indices::C1::C10']->['_onx_div_reducesum_where_not_equal_indices2::C1::C10']
    [GraphBuilder-SPK.set_type] _onx_div_reducesum_where_not_equal_indices2::C1::C10:10
    [GraphBuilder-SPK.set_shape] _onx_div_reducesum_where_not_equal_indices2::C1::C10:()
    [GraphBuilder-SPK.set_rank] _onx_div_reducesum_where_not_equal_indices2::C1::C10:0
    [GraphBuilder-SPK.make_tensor_output] _onx_div_reducesum_where_not_equal_indices2::C1::C10[0: None]
    [GraphBuilderPatternOptimization-XCY.optimize] skips SplitConcatPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips Sub1MulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips SwapRangeAddScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips SwitchOrderBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips SwitchReshapeActivationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips TransposeEqualReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips TransposeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips TransposeReshapeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips UnsqueezeEqualPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips WhereAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips RotaryConcatPartPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips FunctionCausalMaskPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips FunctionCausalMaskMulAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips FunctionCosSinCachePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips FunctionHalfRotaryEmbeddingPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips RMSNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips RMSNormalizationMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] skips AttentionGQAPattern, pattern.priority=2, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0 - matching_step done 0
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 5.6837001466192305e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-XCY.optimize] it=0C1F0 - next
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 412:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [CastCastBinaryPattern.match] NONE - line: 412:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 592:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [CastOpCastPattern.match] NONE - line: 589:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 884:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 926:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 1352:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ReshapeMatMulReshapePattern.match] NONE - line: 1352:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 836:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [Reshape2Of3Pattern.match] NONE - line: 836:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 1117:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ReshapeReshapeBinaryPattern.match] NONE - line: 1117:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [GraphBuilderPatternOptimization-XCY.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=1, priorities[current_priority_index]=1 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 754:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [MatMulReshape2Of3Pattern.match] NONE - line: 754:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 1173:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [MulMulMatMulPattern.match] NONE - line: 1173:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 437:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 437:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1288:experimental_experiment.xoptim.patterns.onnx_expand, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1288:experimental_experiment.xoptim.patterns.onnx_expand, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandSwapPattern.match] NONE - line: 1047:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandSwapPattern.match] NONE - line: 1047:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 2235:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 2235:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 2051:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1802:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [TransposeReshapeMatMulPattern.match] NONE - line: 1802:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1804:experimental_experiment.xoptim.patterns.onnx_rotary, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1804:experimental_experiment.xoptim.patterns.onnx_rotary, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] skips AttentionGQAPattern, pattern.priority=2, current_priority_index=1, priorities[current_priority_index]=1 priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0 - matching_step done 0
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 5.731800047215074e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-XCY.optimize] it=1C1F0 - next
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 2: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 412:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [CastCastBinaryPattern.match] NONE - line: 412:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 592:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [CastOpCastPattern.match] NONE - line: 589:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 884:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 926:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 1352:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ReshapeMatMulReshapePattern.match] NONE - line: 1352:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 836:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [Reshape2Of3Pattern.match] NONE - line: 836:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 1117:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ReshapeReshapeBinaryPattern.match] NONE - line: 1117:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [GraphBuilderPatternOptimization-XCY.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=2, priorities[current_priority_index]=2 priorities=[0, 1, 2, 3]
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 754:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [MatMulReshape2Of3Pattern.match] NONE - line: 754:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 1173:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [MulMulMatMulPattern.match] NONE - line: 1173:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 437:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 437:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1288:experimental_experiment.xoptim.patterns.onnx_expand, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1288:experimental_experiment.xoptim.patterns.onnx_expand, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandSwapPattern.match] NONE - line: 1047:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandSwapPattern.match] NONE - line: 1047:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 2235:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 2235:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 2051:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1802:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [TransposeReshapeMatMulPattern.match] NONE - line: 1802:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1804:experimental_experiment.xoptim.patterns.onnx_rotary, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1804:experimental_experiment.xoptim.patterns.onnx_rotary, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0 - matching_step done 0
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 0.00019101299403700978 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-XCY.optimize] it=2C1F0 - next
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 3: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [CastCastBinaryPattern.match] NONE - line: 412:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [CastCastBinaryPattern.match] NONE - line: 412:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [CastOpCastPattern.match] NONE - line: 592:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [CastOpCastPattern.match] NONE - line: 589:experimental_experiment.xoptim.patterns.onnx_cast, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 884:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [IdentityPattern.match] NONE - line: 926:experimental_experiment.xoptim.patterns.onnx_any, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [ReshapeMatMulReshapePattern.match] NONE - line: 1352:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ReshapeMatMulReshapePattern.match] NONE - line: 1352:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [Reshape2Of3Pattern.match] NONE - line: 836:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [Reshape2Of3Pattern.match] NONE - line: 836:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [ReshapeReshapeBinaryPattern.match] NONE - line: 1117:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ReshapeReshapeBinaryPattern.match] NONE - line: 1117:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatchResult.match] MATCH MatMulAddPattern with 2 nodes and types ['MatMul', 'Add'] - []
    [GraphBuilderPatternOptimization-XCY.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
    [MatchResult.match] MATCH MatMulAddPattern with 2 nodes and types ['MatMul', 'Add'] - []
    [GraphBuilderPatternOptimization-XCY.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [MatMulReshape2Of3Pattern.match] NONE - line: 754:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [MatMulReshape2Of3Pattern.match] NONE - line: 754:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [MulMulMatMulPattern.match] NONE - line: 1173:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [MulMulMatMulPattern.match] NONE - line: 1173:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 437:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandBroadcastPattern.match] NONE - line: 437:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1288:experimental_experiment.xoptim.patterns.onnx_expand, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1288:experimental_experiment.xoptim.patterns.onnx_expand, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [ShapeBasedExpandSwapPattern.match] NONE - line: 1047:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedExpandSwapPattern.match] NONE - line: 1047:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 2235:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [ShapeBasedMatMulToMulPattern.match] NONE - line: 2235:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:experimental_experiment.xoptim.patterns.onnx_shape, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [SqueezeAddPattern.match] NONE - line: 533:experimental_experiment.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 2051:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeReshapeMatMulPattern.match] NONE - line: 1802:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset, inputs=x,p_layers_0_weight::T10
    [TransposeReshapeMatMulPattern.match] NONE - line: 1802:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=MatMul, name=Opset3, inputs=relu,p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1804:experimental_experiment.xoptim.patterns.onnx_rotary, op_type=Add, name=Opset2, inputs=_onx_matmul_x,layers.0.bias
    [FunctionCausalMaskMulAddPattern.match] NONE - line: 1804:experimental_experiment.xoptim.patterns.onnx_rotary, op_type=Add, name=Opset4, inputs=_onx_matmul_relu,layers.2.bias
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C0 - matching_step done 2
    [GraphBuilderPatternOptimization-XCY.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.005 | max_time=IdentityPattern:0.001
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C0F1 - apply_step with 2 matches
    [GraphBuilderPatternOptimization-XCY.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: ['x', 'p_layers_0_weight::T10', '_onx_matmul_x', 'layers.0.bias'], outputs: ['_onx_matmul_x', 'linear']
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
      - MatMul: ['x', 'p_layers_0_weight::T10'] -> ['_onx_matmul_x']
      - Add: ['_onx_matmul_x', 'layers.0.bias'] -> ['linear']
      + Gemm: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
    [GraphBuilder-XCY.set_type] linear:1
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
    [GraphBuilderPatternOptimization-XCY.optimize] - add ['Gemm']
    [GraphBuilderPatternOptimization-XCY.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] removed outputs {'_onx_matmul_x'}
    [GraphBuilderPatternOptimization-XCY.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: ['relu', 'p_layers_2_weight::T10', '_onx_matmul_relu', 'layers.2.bias'], outputs: ['_onx_matmul_relu', 'output_0']
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
      - MatMul: ['relu', 'p_layers_2_weight::T10'] -> ['_onx_matmul_relu']
      - Add: ['_onx_matmul_relu', 'layers.2.bias'] -> ['output_0']
      + Gemm: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
    [GraphBuilder-XCY.set_type] output_0:1
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
    [GraphBuilderPatternOptimization-XCY.optimize] - add ['Gemm']
    [GraphBuilderPatternOptimization-XCY.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] removed outputs {'_onx_matmul_relu'}
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - done with 2 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -4 +2 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - remove_duplicated_shape done -4 +2 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 3
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 3 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 3 nodes in 3.5124001442454755e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - remove_identity done -4 +2 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - remove_unused done -4 +2 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=3C1F1 - next
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 4: 3 nodes, priority=3
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 204:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset, inputs=x,p_layers_0_weight::T10,layers.0.bias
    [MatMulAddPattern.match] NONE - line: 201:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset3, inputs=relu,p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [MatchResult.match] MATCH GemmTransposePattern with 1 nodes and types ['Gemm'] - []
    [GraphBuilderPatternOptimization-XCY.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
    [MatchResult.match] MATCH GemmTransposePattern with 1 nodes and types ['Gemm'] - []
    [GraphBuilderPatternOptimization-XCY.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 2051:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset, inputs=x,p_layers_0_weight::T10,layers.0.bias
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--Opset3, inputs=relu,p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C0 - matching_step done 2
    [GraphBuilderPatternOptimization-XCY.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.001 | max_time=GemmTransposePattern:0.000
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C0F1 - apply_step with 2 matches
    [GraphBuilderPatternOptimization-XCY.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'], outputs: ['linear']
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=Transpose
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
      - Gemm: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
      + Transpose: ['p_layers_0_weight::T10'] -> ['GemmTransposePattern--p_layers_0_weight::T10']
      + Gemm: ['x', 'GemmTransposePattern--p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=Transpose
    [GraphBuilder-XCY.set_type] GemmTransposePattern--p_layers_0_weight::T10:1
    [GraphBuilder-XCY.set_shape] GemmTransposePattern--p_layers_0_weight::T10:(32, 10)
    [GraphBuilder-XCY.set_rank] GemmTransposePattern--p_layers_0_weight::T10:2
    [GraphBuilder-XCY.set_type] linear:1
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
    [GraphBuilderPatternOptimization-XCY.optimize] - add ['Transpose', 'Gemm']
    [GraphBuilderPatternOptimization-XCY.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'], outputs: ['output_0']
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Transpose
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
      - Gemm: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
      + Transpose: ['p_layers_2_weight::T10'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
      + Gemm: ['relu', 'GemmTransposePattern--p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Transpose
    [GraphBuilder-XCY.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilder-XCY.set_shape] GemmTransposePattern--p_layers_2_weight::T10:(1, 32)
    [GraphBuilder-XCY.set_rank] GemmTransposePattern--p_layers_2_weight::T10:2
    [GraphBuilder-XCY.set_type] output_0:1
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
    [GraphBuilderPatternOptimization-XCY.optimize] - add ['Transpose', 'Gemm']
    [GraphBuilderPatternOptimization-XCY.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - done with 2 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -2 +4 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - remove_duplicated_shape done -2 +4 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 4.688100307248533e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - remove_identity done -2 +4 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - remove_unused done -2 +4 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=4C1F1 - next
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 5: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 803:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [IdentityPattern.match] NONE - line: 803:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 204:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [MatMulAddPattern.match] NONE - line: 201:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GemmTransposePattern.match] NONE - line: 536:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [GemmTransposePattern.match] NONE - line: 536:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [ShapeBasedIdentityPattern.match] NONE - line: 1080:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [ShapeBasedIdentityPattern.match] NONE - line: 1080:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [SwapUnaryPattern.match] NONE - line: 1227:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [SwapUnaryPattern.match] NONE - line: 1227:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [SwapUnsqueezeTransposePattern.match] NONE - line: 970:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [SwapUnsqueezeTransposePattern.match] NONE - line: 970:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 2051:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 656:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [MatchResult.match] MATCH TransposeEqualReshapePattern with 1 nodes and types ['Transpose'] - []
    [GraphBuilderPatternOptimization-XCY.optimize] match=MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1575:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [TransposeMatMulPattern.match] NONE - line: 1575:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 364:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [TransposeReshapeTransposePattern.match] NONE - line: 364:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 134:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [TransposeTransposePattern.match] NONE - line: 134:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C0 - matching_step done 1
    [GraphBuilderPatternOptimization-XCY.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.002 | max_time=LeakyReluPattern:0.000
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C0F1 - apply_step with 1 matches
    [GraphBuilderPatternOptimization-XCY.optimize] apply MatchResult: TransposeEqualReshapePattern replaces ['Transpose'], inputs: ['p_layers_2_weight::T10'], outputs: ['GemmTransposePattern--p_layers_2_weight::T10']
    [GraphBuilder-XCY.set_shape] init7_s2_1_32:(2,)
    [GraphBuilder-XCY.set_rank] init7_s2_1_32:1
    [GraphBuilder-XCY.set_type] init7_s2_1_32:7
    [GraphBuilder-XCY.make_initializer] init7_s2_1_32[7:(2,)]
    [GraphBuilder-XCY.update_node_constant] new constant 'init7_s2_1_32', node=None
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Reshape
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
      - Transpose: ['p_layers_2_weight::T10'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
      + Reshape: ['p_layers_2_weight::T10', 'init7_s2_1_32'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Reshape
    [GraphBuilder-XCY.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilder-XCY.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilderPatternOptimization-XCY.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] applied.
    [GraphBuilderPatternOptimization-XCY.optimize] - add ['Reshape']
    [GraphBuilderPatternOptimization-XCY.optimize] done MatchResult: TransposeEqualReshapePattern replaces ['Transpose']: -1 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - done with 1 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -1 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - remove_duplicated_shape done -1 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 5.086099554318935e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - remove_identity done -1 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - remove_unused done -1 +1 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=5C1F1 - next
    [GraphBuilderPatternOptimization-XCY.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0 - matching_step
    [PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
    [ConcatReshapePattern.match] NONE - line: 1303:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
    [PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
    [IdentityPattern.match] NONE - line: 803:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
    [PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
    [ReshapePattern.match] NONE - line: 37:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
    [MatMulAddPattern.match] NONE - line: 204:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [MatMulAddPattern.match] NONE - line: 201:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
    [GemmTransposePattern.match] NONE - line: 536:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [GemmTransposePattern.match] NONE - line: 536:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
    [ShapeBasedReshapeIsSqueezePattern.match] NONE - line: 2002:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
    [ShapeBasedEditDistanceReshapePattern.match] NONE - line: 1817:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
    [ShapeBasedIdentityPattern.match] NONE - line: 1080:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
    [ShapedBasedReshapePattern.match] NONE - line: 170:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
    [ReshapeReshapePattern.match] NONE - line: 473:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
    [PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
    [StaticConcatReshapePattern.match] NONE - line: 1510:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
    [SwapExpandReshapePattern.match] NONE - line: 1876:experimental_experiment.xoptim.patterns.onnx_expand, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
    [SwapUnaryPattern.match] NONE - line: 1227:experimental_experiment.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [SwapUnaryPattern.match] NONE - line: 1227:experimental_experiment.xoptim.patterns.onnx_any, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
    [SwapUnsqueezeTransposePattern.match] NONE - line: 970:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
    [SwitchReshapeActivationPattern.match] NONE - line: 2051:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Relu, name=relu, inputs=linear
    [PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
    [TransposeEqualReshapePattern.match] NONE - line: 656:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
    [TransposeMatMulPattern.match] NONE - line: 1575:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
    [TransposeMatMulPattern.match] NONE - line: 1537:experimental_experiment.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--Opset32, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
    [PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
    [TransposeReshapeTransposePattern.match] NONE - line: 364:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
    [TransposeTransposePattern.match] NONE - line: 134:experimental_experiment.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--Opset, inputs=p_layers_0_weight::T10
    [PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
    [UnsqueezeOrSqueezeReshapePattern.match] NONE - line: 2328:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
    [UnsqueezeReshapePattern.match] NONE - line: 2159:experimental_experiment.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--Opset3, inputs=p_layers_2_weight::T10,init7_s2_1_32
    [PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
    [PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0 - matching_step done 0
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - apply_step with 0 matches
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - done with 0 applied patterns
    [GraphBuilderPatternOptimization-XCY.optimize] done all: -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - remove_duplicated_shape
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - remove_duplicated_shape done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - remove_identity
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 5.087700264994055e-05 seconds
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - remove_identity done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - remove_unused
    [GraphBuilderPatternOptimization-XCY.optimize] it=6C0F0 - remove_unused done -0 +0 nodes
    [GraphBuilderPatternOptimization-XCY.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-XCY.optimize] done after 7 iterations with 5 nodes in 0.038
        STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0008988990157376975
        STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.0006979960016906261
        STAT apply_TransposeEqualReshapePattern +1 -1 #it=1 maxmatch=0 i=1 - time=0.0007019930053502321
        STAT build_graph_for_pattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0005685810174327344
        STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=4.427200474310666e-05
        STAT check_pattern_A10 +0 -0 #it=3 maxmatch=0 i=0 - time=9.665018296800554e-06
        STAT check_pattern_A20 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00047239099512808025
        STAT check_pattern_BD0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002726509846979752
        STAT check_pattern_BI0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00041115700150839984
        STAT check_pattern_BU0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00023771902488078922
        STAT insert_and_remove_nodes +0 -0 #it=0 maxmatch=0 i=0 - time=0.0009022859885590151
        STAT iteration_0 +0 -0 #it=1 maxmatch=0 i=0 - time=0.008494958994560875
        STAT iteration_1 +0 -0 #it=1 maxmatch=0 i=0 - time=0.0033169399976031855
        STAT iteration_2 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007157069994718768
        STAT iteration_3 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007553197996458039
        STAT iteration_4 +0 -0 #it=1 maxmatch=0 i=0 - time=0.0033452989882789552
        STAT iteration_5 +0 -0 #it=1 maxmatch=0 i=0 - time=0.00407597899902612
        STAT match_AttentionGQAPattern +0 -0 #it=5 maxmatch=2 i=0 - time=5.344899545889348e-05
        STAT match_BatchNormalizationPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002000499953282997
        STAT match_BatchNormalizationTrainingPattern +0 -0 #it=7 maxmatch=0 i=0 - time=8.16040119389072e-05
        STAT match_CastCastBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005302950157783926
        STAT match_CastCastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00012154098658356816
        STAT match_CastLayerNormalizationCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.384296961594373e-05
        STAT match_CastOpCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0006375209923135117
        STAT match_CastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.669402788858861e-05
        STAT match_ClipClipPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.29780037747696e-05
        STAT match_ConcatEmptyPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.503199983853847e-05
        STAT match_ConcatGatherPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0001514919858891517
        STAT match_ConcatReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0001211280032293871
        STAT match_ConcatTwiceUnaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.50990114081651e-05
        STAT match_ConstantToInitializerPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.661000952590257e-05
        STAT match_ConvBiasNullPattern +0 -0 #it=7 maxmatch=0 i=0 - time=6.830500205978751e-05
        STAT match_DropoutPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.249600846786052e-05
        STAT match_ExpandBroadcastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.494399662595242e-05
        STAT match_ExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.779800893738866e-05
        STAT match_ExpandSwapPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00042804499389603734
        STAT match_FunctionAttentionGQAPattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.120098548009992e-05
        STAT match_FunctionAttentionPattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.171597776003182e-05
        STAT match_FunctionCausalMaskMulAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0003218140045646578
        STAT match_FunctionCausalMaskPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.48569839540869e-05
        STAT match_FunctionCosSinCachePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00022897199960425496
        STAT match_FunctionHalfRotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.767598774284124e-05
        STAT match_GathersSplitPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.622201181016862e-05
        STAT match_GeluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=3.102602204307914e-05
        STAT match_GemmTransposePattern +0 -0 #it=6 maxmatch=2 i=2 - time=0.0003217880002921447
        STAT match_IdentityPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.002710092012421228
        STAT match_LayerNormalizationPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010123699030373245
        STAT match_LayerNormalizationScalePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.443298040423542e-05
        STAT match_LeakyReluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0029463650134857744
        STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.0003688649885589257
        STAT match_MatMulReshape2Of3Pattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00022686201555188745
        STAT match_MulMulMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001494080206612125
        STAT match_MulMulMulScalarPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001477160258218646
        STAT match_NotNotPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.924900612328202e-05
        STAT match_NotWherePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.782900451682508e-05
        STAT match_RMSNormalizationMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00012878200504928827
        STAT match_RMSNormalizationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=5.682700430043042e-05
        STAT match_ReduceArgTopKPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010152101458515972
        STAT match_ReduceReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=7.813898264430463e-05
        STAT match_ReduceSumNormalizePattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.485300350002944e-05
        STAT match_Reshape2Of3Pattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0003211639850633219
        STAT match_ReshapeMatMulReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00029400299536064267
        STAT match_ReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0001954309846041724
        STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0006200089846970513
        STAT match_ReshapeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.078200673684478e-05
        STAT match_RotaryConcatPartPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.179400512948632e-05
        STAT match_RotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.150199184659868e-05
        STAT match_SameChildrenFromInputPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00015050501679070294
        STAT match_SameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016276499081868678
        STAT match_SequenceConstructAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.886600749567151e-05
        STAT match_ShapeBasedConcatExpandPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.807401950936764e-05
        STAT match_ShapeBasedEditDistanceReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004751899978145957
        STAT match_ShapeBasedExpandBroadcastMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014930400357116014
        STAT match_ShapeBasedExpandBroadcastPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016366397903766483
        STAT match_ShapeBasedExpandCastWhereSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.336199294310063e-05
        STAT match_ShapeBasedExpandSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016657100059092045
        STAT match_ShapeBasedIdentityPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0001243860024260357
        STAT match_ShapeBasedMatMulToMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014051100879441947
        STAT match_ShapeBasedReshapeIsSqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013096199836581945
        STAT match_ShapeBasedSameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.902600918896496e-05
        STAT match_ShapeBasedShapeShapeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00025802801246754825
        STAT match_ShapeBasedStaticExpandPattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.762099266983569e-05
        STAT match_ShapedBasedReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010847298835869879
        STAT match_SliceSlicePattern +0 -0 #it=6 maxmatch=2 i=0 - time=5.683998460881412e-05
        STAT match_SlicesSplitPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.264301191549748e-05
        STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.003598995986976661
        STAT match_SplitConcatPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.542301318608224e-05
        STAT match_SqueezeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00022952201834414154
        STAT match_SqueezeBinaryUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.540399383287877e-05
        STAT match_SqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.782997272443026e-05
        STAT match_StaticConcatReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.798700244165957e-05
        STAT match_Sub1MulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.173497240524739e-05
        STAT match_SwapExpandReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=8.152604277711362e-05
        STAT match_SwapRangeAddScalarPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.784900324419141e-05
        STAT match_SwapUnaryPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016839700401760638
        STAT match_SwapUnsqueezeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010207398736383766
        STAT match_SwitchOrderBinaryPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001109329896280542
        STAT match_SwitchReshapeActivationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018926500342786312
        STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.0002392939932178706
        STAT match_TransposeGatherPattern +0 -0 #it=7 maxmatch=2 i=0 - time=8.874699415173382e-05
        STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0010139839869225398
        STAT match_TransposeReshapeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019234300998505205
        STAT match_TransposeReshapeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010940598440356553
        STAT match_TransposeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00018784102576319128
        STAT match_UnsqueezeEqualPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.795401324983686e-05
        STAT match_UnsqueezeOrSqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0001690680073807016
        STAT match_UnsqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.533300180919468e-05
        STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0007005840016063303
        STAT match_WhereAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.378001464530826e-05
        STAT remove_duplicated_shape +0 -0 #it=7 maxmatch=0 i=0 - time=4.816200817003846e-05
        STAT remove_identity_nodes +0 -0 #it=7 maxmatch=0 i=0 - time=0.00332356798753608
        STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.0020281249890103936
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 5 initializers--
             INPUT:   1 x 1t
         INPUT-SEQ:   1 x Falset
            OUTPUT:   1 x 1t
        OUTPUT-SEQ:   1 x Falset
              INIT:   4 x 1t
              INIT:   1 x 7t
              NODE:   2 x Gemm
              NODE:   1 x Relu
              NODE:   1 x Reshape
              NODE:   1 x Transpose
    --MODEL: 5 nodes, 1 inputs, 1 outputs, 5 initializers--DETAILED--
         INPUT:   1 x 1t[3x10]
        OUTPUT:   1 x 1t[3x1]
          INIT:   1 x 1t[10x32]
          INIT:   1 x 1t[1]
          INIT:   1 x 1t[32]
          INIT:   1 x 1t[32x1]
          INIT:   1 x 7t[2]
          NODE:   1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
          NODE:   1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
          NODE:   1 x Relu -SIG- 1t[3x32]
          NODE:   1 x Reshape -SIG- 1t[32x1], 7t[2]
          NODE:   1 x Transpose -SIG- 1t[10x32]-perm=1;0
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 5
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 5 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 5 nodes in 0.00022484299552161247 seconds
    [GraphBuilder-XCY.constant_folding] -- starts with 7 constants and 5 nodes.
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: _onx_matmul_relu
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: init7_s2_1_32
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: GemmTransposePattern--p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: GemmTransposePattern--p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: x
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] cst:: . :: relu
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: layers.0.bias
    [GraphBuilder-XCY.constant_folding] cst:: . :: output_0
    [GraphBuilder-XCY.constant_folding] cst:: . :: linear
    [GraphBuilder-XCY.constant_folding] cst:: 1 :: layers.2.bias
    [GraphBuilder-XCY.constant_folding] cst:: . :: _onx_matmul_x
    [GraphBuilder-XCY.constant_folding] initializer: p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: layers.0.bias
    [GraphBuilder-XCY.constant_folding] initializer: layers.2.bias
    [GraphBuilder-XCY.constant_folding] from: Transpose(GemmTransposePattern--p_layers_0_weight::T10)
    [GraphBuilder-XCY.set_type] GemmTransposePattern--p_layers_0_weight::T10:1
    [GraphBuilder-XCY.make_initializer] GemmTransposePattern--p_layers_0_weight::T10[1:(32, 10)]
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=None
    [GraphBuilder-XCY.constant_folding] fold_constant:Transpose:GemmTransposePattern--p_layers_0_weight::T10[torch.float32:torch.Size([32, 10])]:from:p_layers_0_weight::T10
    [GraphBuilder-XCY.constant_folding] from: Reshape(GemmTransposePattern--p_layers_2_weight::T10)
    [GraphBuilder-XCY.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
    [GraphBuilder-XCY.make_initializer] GemmTransposePattern--p_layers_2_weight::T10[1:(1, 32)]
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=None
    [GraphBuilder-XCY.constant_folding] fold_constant:Reshape:GemmTransposePattern--p_layers_2_weight::T10[float32:(1, 32)]:from:init7_s2_1_32,p_layers_2_weight::T10
    [GraphBuilder-XCY.constant_folding] initializer: init7_s2_1_32
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=None
    [GraphBuilder-XCY.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=None
    [GraphBuilder-XCY.constant_folding] ends with 7 constants and 3 nodes in 0.0008634520054329187 seconds
    [GraphBuilder-XCY.remove_unused] remove_initializer 1:0/7:p_layers_0_weight::T10
    [GraphBuilder-XCY.remove_unused] remove_initializer 2:1/7:p_layers_2_weight::T10
    [GraphBuilder-XCY.remove_unused] remove_initializer 3:4/7:init7_s2_1_32:int64[(2,)]
    [GraphBuilder-XCY.remove_identity_nodes] -- starts with 3
    [GraphBuilder-XCY.remove_identity_nodes] found 0 replacements
    [GraphBuilder-XCY.remove_identity_nodes] kept 3 nodes
    [GraphBuilder-XCY.remove_identity_nodes] ends with 3 nodes in 4.045999958179891e-05 seconds
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.random_order] -- starts with 3 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 7.340499723795801e-05s with changed=0 scale=0
    [GraphBuilder-XCY.optimize] done with 3 nodes in 0.050
        STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0008988990157376975
        STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.0006979960016906261
        STAT apply_TransposeEqualReshapePattern +1 -1 #it=1 maxmatch=0 i=1 - time=0.0007019930053502321
        STAT apply_constant_folding__Reshape +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
        STAT apply_constant_folding__Transpose +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
        STAT apply_constant_folding_new_inits +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
        STAT build_graph_for_pattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0005685810174327344
        STAT check_A-dynamic_dimension_naming +0 -0 #it=0 maxmatch=0 i=0 - time=2.405099803581834e-05
        STAT check_A-opt-sub +0 -0 #it=0 maxmatch=0 i=0 - time=2.7734989998862147e-05
        STAT check_constant_folding-2 +0 -0 #it=0 maxmatch=0 i=0 - time=4.1328006773255765e-05
        STAT check_constant_folding-7 +0 -0 #it=0 maxmatch=0 i=0 - time=3.465599729679525e-05
        STAT check_order-12 +0 -0 #it=0 maxmatch=0 i=0 - time=2.0171995856799185e-05
        STAT check_orderA +0 -0 #it=0 maxmatch=0 i=0 - time=2.581201260909438e-05
        STAT check_orderL +0 -0 #it=0 maxmatch=0 i=0 - time=1.9495011656545103e-05
        STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=4.427200474310666e-05
        STAT check_pattern_A10 +0 -0 #it=3 maxmatch=0 i=0 - time=9.665018296800554e-06
        STAT check_pattern_A20 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00047239099512808025
        STAT check_pattern_BD0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002726509846979752
        STAT check_pattern_BI0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00041115700150839984
        STAT check_pattern_BU0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00023771902488078922
        STAT check_patterns-4 +0 -0 #it=0 maxmatch=0 i=0 - time=0.0005061419942649081
        STAT check_remove_duplicated_initializer-9 +0 -0 #it=0 maxmatch=0 i=0 - time=2.348900306969881e-05
        STAT check_remove_identity-0 +0 -0 #it=0 maxmatch=0 i=0 - time=2.8569993446581066e-05
        STAT check_remove_identity-10 +0 -0 #it=0 maxmatch=0 i=0 - time=2.190099621657282e-05
        STAT check_remove_identity-6 +0 -0 #it=0 maxmatch=0 i=0 - time=4.6200992073863745e-05
        STAT check_remove_unused-1 +0 -0 #it=0 maxmatch=0 i=0 - time=2.7139001758769155e-05
        STAT check_remove_unused-11 +0 -0 #it=0 maxmatch=0 i=0 - time=2.0122999558225274e-05
        STAT check_remove_unused-3 +0 -0 #it=0 maxmatch=0 i=0 - time=0.00015784399874974042
        STAT check_remove_unused-5 +0 -0 #it=0 maxmatch=0 i=0 - time=6.156699964776635e-05
        STAT check_remove_unused-8 +0 -0 #it=0 maxmatch=0 i=0 - time=3.3966993214562535e-05
        STAT constant_folding +0 -2 #it=0 maxmatch=0 i=0 - time=0.0013271040079416707
        STAT dynamic_dimension_naming +0 -0 #it=0 maxmatch=0 i=0 - time=3.8950995076447725e-05
        STAT insert_and_remove_nodes +0 -0 #it=0 maxmatch=0 i=0 - time=0.0009022859885590151
        STAT iteration_0 +0 -0 #it=1 maxmatch=0 i=0 - time=0.008494958994560875
        STAT iteration_1 +0 -0 #it=1 maxmatch=0 i=0 - time=0.0033169399976031855
        STAT iteration_2 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007157069994718768
        STAT iteration_3 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007553197996458039
        STAT iteration_4 +0 -0 #it=1 maxmatch=0 i=0 - time=0.0033452989882789552
        STAT iteration_5 +0 -0 #it=1 maxmatch=0 i=0 - time=0.00407597899902612
        STAT match_AttentionGQAPattern +0 -0 #it=5 maxmatch=2 i=0 - time=5.344899545889348e-05
        STAT match_BatchNormalizationPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002000499953282997
        STAT match_BatchNormalizationTrainingPattern +0 -0 #it=7 maxmatch=0 i=0 - time=8.16040119389072e-05
        STAT match_CastCastBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0005302950157783926
        STAT match_CastCastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00012154098658356816
        STAT match_CastLayerNormalizationCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.384296961594373e-05
        STAT match_CastOpCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0006375209923135117
        STAT match_CastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.669402788858861e-05
        STAT match_ClipClipPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.29780037747696e-05
        STAT match_ConcatEmptyPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.503199983853847e-05
        STAT match_ConcatGatherPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0001514919858891517
        STAT match_ConcatReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0001211280032293871
        STAT match_ConcatTwiceUnaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.50990114081651e-05
        STAT match_ConstantToInitializerPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.661000952590257e-05
        STAT match_ConvBiasNullPattern +0 -0 #it=7 maxmatch=0 i=0 - time=6.830500205978751e-05
        STAT match_DropoutPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.249600846786052e-05
        STAT match_ExpandBroadcastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.494399662595242e-05
        STAT match_ExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.779800893738866e-05
        STAT match_ExpandSwapPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00042804499389603734
        STAT match_FunctionAttentionGQAPattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.120098548009992e-05
        STAT match_FunctionAttentionPattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.171597776003182e-05
        STAT match_FunctionCausalMaskMulAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0003218140045646578
        STAT match_FunctionCausalMaskPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.48569839540869e-05
        STAT match_FunctionCosSinCachePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00022897199960425496
        STAT match_FunctionHalfRotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.767598774284124e-05
        STAT match_GathersSplitPattern +0 -0 #it=6 maxmatch=0 i=0 - time=6.622201181016862e-05
        STAT match_GeluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=3.102602204307914e-05
        STAT match_GemmTransposePattern +0 -0 #it=6 maxmatch=2 i=2 - time=0.0003217880002921447
        STAT match_IdentityPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.002710092012421228
        STAT match_LayerNormalizationPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010123699030373245
        STAT match_LayerNormalizationScalePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.443298040423542e-05
        STAT match_LeakyReluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0029463650134857744
        STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.0003688649885589257
        STAT match_MatMulReshape2Of3Pattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00022686201555188745
        STAT match_MulMulMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001494080206612125
        STAT match_MulMulMulScalarPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0001477160258218646
        STAT match_NotNotPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.924900612328202e-05
        STAT match_NotWherePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.782900451682508e-05
        STAT match_RMSNormalizationMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00012878200504928827
        STAT match_RMSNormalizationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=5.682700430043042e-05
        STAT match_ReduceArgTopKPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010152101458515972
        STAT match_ReduceReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=7.813898264430463e-05
        STAT match_ReduceSumNormalizePattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.485300350002944e-05
        STAT match_Reshape2Of3Pattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0003211639850633219
        STAT match_ReshapeMatMulReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00029400299536064267
        STAT match_ReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0001954309846041724
        STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0006200089846970513
        STAT match_ReshapeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.078200673684478e-05
        STAT match_RotaryConcatPartPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.179400512948632e-05
        STAT match_RotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.150199184659868e-05
        STAT match_SameChildrenFromInputPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00015050501679070294
        STAT match_SameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016276499081868678
        STAT match_SequenceConstructAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.886600749567151e-05
        STAT match_ShapeBasedConcatExpandPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.807401950936764e-05
        STAT match_ShapeBasedEditDistanceReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0004751899978145957
        STAT match_ShapeBasedExpandBroadcastMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014930400357116014
        STAT match_ShapeBasedExpandBroadcastPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016366397903766483
        STAT match_ShapeBasedExpandCastWhereSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.336199294310063e-05
        STAT match_ShapeBasedExpandSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00016657100059092045
        STAT match_ShapeBasedIdentityPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0001243860024260357
        STAT match_ShapeBasedMatMulToMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014051100879441947
        STAT match_ShapeBasedReshapeIsSqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013096199836581945
        STAT match_ShapeBasedSameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.902600918896496e-05
        STAT match_ShapeBasedShapeShapeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00025802801246754825
        STAT match_ShapeBasedStaticExpandPattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.762099266983569e-05
        STAT match_ShapedBasedReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010847298835869879
        STAT match_SliceSlicePattern +0 -0 #it=6 maxmatch=2 i=0 - time=5.683998460881412e-05
        STAT match_SlicesSplitPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.264301191549748e-05
        STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.003598995986976661
        STAT match_SplitConcatPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.542301318608224e-05
        STAT match_SqueezeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00022952201834414154
        STAT match_SqueezeBinaryUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.540399383287877e-05
        STAT match_SqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.782997272443026e-05
        STAT match_StaticConcatReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=7.798700244165957e-05
        STAT match_Sub1MulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.173497240524739e-05
        STAT match_SwapExpandReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=8.152604277711362e-05
        STAT match_SwapRangeAddScalarPattern +0 -0 #it=6 maxmatch=2 i=0 - time=6.784900324419141e-05
        STAT match_SwapUnaryPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016839700401760638
        STAT match_SwapUnsqueezeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010207398736383766
        STAT match_SwitchOrderBinaryPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0001109329896280542
        STAT match_SwitchReshapeActivationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018926500342786312
        STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.0002392939932178706
        STAT match_TransposeGatherPattern +0 -0 #it=7 maxmatch=2 i=0 - time=8.874699415173382e-05
        STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0010139839869225398
        STAT match_TransposeReshapeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019234300998505205
        STAT match_TransposeReshapeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010940598440356553
        STAT match_TransposeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00018784102576319128
        STAT match_UnsqueezeEqualPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.795401324983686e-05
        STAT match_UnsqueezeOrSqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0001690680073807016
        STAT match_UnsqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.533300180919468e-05
        STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0007005840016063303
        STAT match_WhereAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.378001464530826e-05
        STAT order +0 -0 #it=0 maxmatch=0 i=0 - time=0.00013709398626815528
        STAT patterns +0 -0 #it=0 maxmatch=0 i=0 - time=0.04441013500036206
        STAT remove_duplicated_initializer +0 -0 #it=0 maxmatch=0 i=0 - time=7.919099880382419e-05
        STAT remove_duplicated_shape +0 -0 #it=7 maxmatch=0 i=0 - time=4.816200817003846e-05
        STAT remove_identity +0 -0 #it=0 maxmatch=0 i=0 - time=0.0008589580102125183
        STAT remove_identity_nodes +0 -0 #it=7 maxmatch=0 i=0 - time=0.00332356798753608
        STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.0033763899991754442
        STAT shape_order +0 -0 #it=0 maxmatch=0 i=0 - time=8.243600314017385e-05
    --MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--
             INPUT:   1 x 1t
         INPUT-SEQ:   1 x Falset
            OUTPUT:   1 x 1t
        OUTPUT-SEQ:   1 x Falset
              INIT:   4 x 1t
              NODE:   2 x Gemm
              NODE:   1 x Relu
    --MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--DETAILED--
         INPUT:   1 x 1t[3x10]
        OUTPUT:   1 x 1t[3x1]
          INIT:   1 x 1t[1]
          INIT:   1 x 1t[1x32]
          INIT:   1 x 1t[32]
          INIT:   1 x 1t[32x10]
          NODE:   1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
          NODE:   1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
          NODE:   1 x Relu -SIG- 1t[3x32]
    [GraphBuilder-XCY.to_onnx] make_model 4 inits 0 params
    [GraphBuilder-XCY.time_evaluation_constants_] 0
    [GraphBuilder-XCY._build_initializers] start with 4 initializers, large_model=False, external_threshold=1024
    [GraphBuilder-XCY._build_initializers] switch low/high order
    [GraphBuilder-XCY._build_initializers] TensorProto-layers.0.bias:1[(32,)]
    [GraphBuilder-XCY._build_initializers] TensorProto-layers.2.bias:1[(1,)]
    [GraphBuilder-XCY._build_initializers] <Tensor>-GemmTransposePattern--p_layers_0_weight::T10:torch.float32[torch.Size([32, 10])]
    [proto_from_array] 1[torch.Size([32, 10])]
    [GraphBuilder-XCY._build_initializers] <ndarray>-GemmTransposePattern--p_layers_2_weight::T10:float32[(1, 32)]
    [GraphBuilder-XCY._build_initializers] done in 3.278997610323131e-06s with 4 initializers, 0 large initializers
    [GraphBuilder-XCY._add_shape_information] dynamic shapes replacements={}
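
The STAT lines printed above follow a regular layout, so they can be parsed to rank the steps by the time they took. The sketch below is only an illustration: the helper parse_stat_lines is hypothetical and the field names (added, removed, it, maxmatch, i) are a reading of the layout, not a documented format. The sample lines are copied from the output above.

```python
import re

# Layout assumed from the STAT lines shown above:
# STAT <name> +<added> -<removed> #it=<n> maxmatch=<n> i=<n> - time=<seconds>
STAT_RE = re.compile(
    r"STAT (?P<name>\S+) \+(?P<added>\d+) -(?P<removed>\d+) "
    r"#it=(?P<it>\d+) maxmatch=(?P<maxmatch>\d+) i=(?P<i>\d+) "
    r"- time=(?P<time>\S+)"
)


def parse_stat_lines(text):
    """Hypothetical helper: extracts one dictionary per STAT line."""
    rows = []
    for line in text.splitlines():
        m = STAT_RE.search(line)
        if m is None:
            # Not a STAT line (model summary, builder logs, ...).
            continue
        fields = m.groupdict()
        row = {"name": fields.pop("name"), "time": float(fields.pop("time"))}
        row.update({k: int(v) for k, v in fields.items()})
        rows.append(row)
    return rows


log = """
        STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0010139839869225398
        STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.0002392939932178706
        STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.0033763899991754442
    --MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--
"""

rows = parse_stat_lines(log)
# Slowest steps first.
slowest = sorted(rows, key=lambda r: r["time"], reverse=True)
for r in slowest:
    print(f"{r['name']:45s} {r['time']:.6f}")
```

The same information is available in a structured form through the statistics returned by the optimizer, which is usually the more robust way to get it.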

Select the pattern to use

Class OptimizationOptions is used to enable or disable patterns.

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(
        patterns="TransposeTranspose,TransposeMatMul", verbose=1
    ),
)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-LLO.optimize] start with 5 nodes
    [GraphBuilder-LLO.optimize] #patterns=2
    [GraphBuilderPatternOptimization-LLO.optimize] start with 5 nodes, 4 initializers, 2 patterns, priorities=[0, 1], max_iter=20
    [GraphBuilderPatternOptimization-LLO.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-LLO.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-LLO.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-LLO.optimize] stops current_priority_index=2, priorities=[0, 1]
    [GraphBuilderPatternOptimization-LLO.optimize] done after 2 iterations with 5 nodes in 0.001
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.random_order] -- starts with 5 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 0.00013405299978330731s with changed=0 scale=0
    [GraphBuilder-LLO.optimize] done with 5 nodes in 0.005

There exist some predefined lists of patterns:

  • default: includes all patterns that rely only on standard onnx operators.

  • onnxruntime: patterns specific to onnxruntime. The final model can be executed by onnxruntime, and possibly only by onnxruntime, as these patterns may introduce operators from Supported Operators and Data Types.

<<<

import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default+onnxruntime", verbose=1),
)
opt_onx = gr.to_onnx(optimize=True)

>>>

    [GraphBuilder-NQE.optimize] start with 5 nodes
    [GraphBuilder-NQE.optimize] #patterns=121
    [GraphBuilderPatternOptimization-NQE.optimize] start with 5 nodes, 4 initializers, 121 patterns, priorities=[0, 1, 2, 3], max_iter=40
    [GraphBuilderPatternOptimization-NQE.optimize] same children={'SameChildrenFromInputPattern', 'SameChildrenPattern'}
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 0: 5 nodes, priority=0
    [GraphBuilderPatternOptimization-NQE.optimize] increase priority to 1
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 1: 5 nodes, priority=1
    [GraphBuilderPatternOptimization-NQE.optimize] increase priority to 2
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 2: 5 nodes, priority=2
    [GraphBuilderPatternOptimization-NQE.optimize] increase priority to 3
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 3: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-NQE.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.004 | max_time=ShapeBasedConcatExpandPattern:0.001
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 4: 3 nodes, priority=3
    [GraphBuilderPatternOptimization-NQE.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.001 | max_time=GemmTransposePattern:0.000
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 5: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-NQE.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.002 | max_time=ShapeBasedSameChildrenPattern:0.000
    [GraphBuilderPatternOptimization-NQE.optimize] iteration 6: 5 nodes, priority=3
    [GraphBuilderPatternOptimization-NQE.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
    [GraphBuilderPatternOptimization-NQE.optimize] done after 7 iterations with 5 nodes in 0.034
    [OrderOptimization.optimize] ALGO-2
    [OrderOptimization.random_order] -- starts with 3 nodes, 4 initializers
    [OrderOptimization.shape_order] done after in 8.029201126191765e-05s with changed=0 scale=0
    [GraphBuilder-NQE.optimize] done with 3 nodes in 0.039

Statistics

The statistics returned by the optimizer can be used to see when a pattern is applied and how long it takes.

<<<

import pandas
import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()

print(pandas.DataFrame(stat))

>>>

                                  pattern  ...  algo
    0            dynamic_dimension_naming  ...   NaN
    1    check_A-dynamic_dimension_naming  ...   NaN
    2                     check_A-opt-sub  ...   NaN
    3                     remove_identity  ...   NaN
    4             check_remove_identity-0  ...   NaN
    ..                                ...  ...   ...
    698                      check_orderL  ...   NaN
    699                       shape_order  ...   NaN
    700                             order  ...     2
    701                    check_order-12  ...   NaN
    702                      optimization  ...   NaN
    
    [703 rows x 13 columns]

It can be aggregated:

<<<

import pandas
import onnx
from experimental_experiment.xbuilder import GraphBuilder, OptimizationOptions

onx = onnx.load("temp_doc_mlp.onnx")

gr = GraphBuilder(
    onx,
    infer_shapes_options=True,
    optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()

df = pandas.DataFrame(stat)
for c in df.columns:
    if "time" not in c and "pattern" not in c and "exit_point" not in c:
        df[c] = df[c].fillna(0).astype(int)
aggs = {
    "time_in": "sum",
    "added": "sum",
    "removed": "sum",
    "iteration": "max",
    "match_index": "max",
    "instances": "sum",
}
print(df.groupby("pattern").agg(aggs))

>>>

                                         time_in  ...  instances
    pattern                                       ...           
    apply_GemmTransposePattern          0.001798  ...          2
    apply_MatMulAddPattern              0.000698  ...          2
    apply_TransposeEqualReshapePattern  0.000576  ...          1
    apply_constant_folding__Reshape     0.000000  ...          0
    apply_constant_folding__Transpose   0.000000  ...          0
    ...                                      ...  ...        ...
    remove_duplicated_shape             0.000039  ...          0
    remove_identity                     0.000531  ...          0
    remove_identity_nodes               0.002764  ...          0
    remove_unused                       0.003405  ...          0
    shape_order                         0.000070  ...          0
    
    [140 rows x 6 columns]
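
For readers who want to see what this aggregation computes without pandas, here is a minimal sketch in plain Python. The records are made up for illustration; only the key names mirror the columns aggregated above. A missing key is treated as 0, which is close to what fillna(0) does in the previous example.

```python
from collections import defaultdict

# Hypothetical records: same keys as the columns aggregated above,
# values invented for the example.
stat = [
    {"pattern": "match_MatMulAddPattern", "iteration": 2, "time_in": 0.0002},
    {
        "pattern": "match_MatMulAddPattern",
        "iteration": 3,
        "time_in": 0.0004,
        "instances": 2,
        "match_index": 1,
    },
    {
        "pattern": "apply_MatMulAddPattern",
        "iteration": 3,
        "time_in": 0.0007,
        "added": 2,
        "removed": 4,
        "instances": 2,
    },
]

SUM_KEYS = ("time_in", "added", "removed", "instances")
MAX_KEYS = ("iteration", "match_index")


def aggregate(records):
    """Sums SUM_KEYS and takes the maximum of MAX_KEYS for every pattern."""
    out = defaultdict(lambda: {k: 0 for k in SUM_KEYS + MAX_KEYS})
    for rec in records:
        agg = out[rec["pattern"]]
        for k in SUM_KEYS:
            agg[k] += rec.get(k, 0)
        for k in MAX_KEYS:
            agg[k] = max(agg[k], rec.get(k, 0))
    return dict(out)


result = aggregate(stat)
for name, agg in sorted(result.items()):
    print(name, agg)
```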

Shape inference

The optimizers need to know the shapes to make sure a rewriting is valid and does not produce a model returning different results. If the shapes are missing, the patterns that depend on them cannot check their conditions and do not match.

This information can be built by running shape inference on the onnx models. That’s what is done in the previous examples. However, the best case is when this information comes from torch.

Function to_onnx converts a torch model into ONNX. While doing so, it stores the shape information coming from torch. There is no need to run shape inference on the onnx model it generates before optimizing it.
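
To illustrate why a missing shape prevents a match, here is a simplified check in the spirit of a pattern replacing a Transpose by a Reshape. It relies on a general property of transposition: the data order is unchanged when the axes of size greater than one keep their relative order. The function transpose_is_reshape is hypothetical and is not the library's implementation; it only shows that the answer has to be "no match" when the shape is unknown.

```python
def transpose_is_reshape(shape, perm):
    """Tells if Transpose(perm) on a tensor of this shape only moves
    axes of size 1, in which case a Reshape gives the same result.

    Hypothetical helper written for this illustration.
    """
    if shape is None:
        # No shape information: the condition cannot be verified,
        # the safe answer is to not match.
        return False
    if any(not isinstance(d, int) for d in shape):
        # Dynamic dimension: conservative choice, do not match.
        return False
    # Axes of size > 1, listed in the order the transposition puts them.
    kept = [axis for axis in perm if shape[axis] != 1]
    return kept == sorted(kept)


print(transpose_is_reshape((3, 1, 4), (1, 0, 2)))  # only a size-1 axis moves
print(transpose_is_reshape((3, 2, 4), (1, 0, 2)))  # real data movement
print(transpose_is_reshape(None, (1, 0, 2)))  # shape is missing
```

The first call returns True, the two others return False. The last one is the case described above: without a shape, the rewriting may well be valid but nothing proves it.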

Available Patterns and API

All patterns may be found at .xoptim.patterns and .xoptim.patterns_ort.

When writing a pattern, walking along the graph or checking the shape is very common. Class GraphBuilderPatternOptimization provides the following methods.

Opsets

Patterns must rewrite using the nodes of the opset defined in the model.

Shapes, Types

Constants

  • is_constant: tells if a node is a constant (it may be a constant, an initializer or any value built on other constants)

  • is_constant_scalar: checks that a constant is a scalar and compares its value to a number

  • get_computed_constant: returns the constant, computing it first if it is built from other constants

  • get_attribute: returns an attribute of a node
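
The notion of constant used in the list above (an initializer, a constant, or any value built on other constants) can be sketched with a small recursive function. This is a toy model on plain dictionaries, not the library's code: the arguments initializers and producers are assumptions made for the example, and excluding random operators is a choice of this sketch.

```python
# Operators without inputs whose result is not a constant.
NON_DETERMINISTIC = {"RandomNormal", "RandomUniform"}


def is_constant(name, initializers, producers):
    """Tells if result `name` only depends on constants.

    initializers: set of initializer names
    producers: maps a result name to (op_type, list of input names)
    Both structures are hypothetical, chosen for this illustration.
    """
    if name in initializers:
        return True
    if name not in producers:
        # A graph input or an unknown name is not a constant.
        return False
    op_type, inputs = producers[name]
    if op_type in NON_DETERMINISTIC:
        return False
    # A node Constant has no input: all([]) is True.
    return all(is_constant(i, initializers, producers) for i in inputs)


initializers = {"weight"}
producers = {
    "one": ("Constant", []),
    "wt": ("Transpose", ["weight"]),
    "scaled": ("Mul", ["wt", "one"]),
    "y": ("MatMul", ["x", "scaled"]),
}

for name in ["weight", "one", "scaled", "x", "y"]:
    print(name, is_constant(name, initializers, producers))
```

Here weight, one and scaled are constants, while x (a graph input) and y (which depends on x) are not. The recursion terminates because an onnx graph has no cycle.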

Graph

Nodes

  • make_node: creates a node without adding it to the graph

  • make_node_check_opset: creates a node without adding it to the graph, deals with some constraints related to opset version
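
One example of a constraint related to the opset version comes from the ONNX specification itself: operator Unsqueeze takes axes as an attribute up to opset 12 and as a second input from opset 13. The sketch below describes the node to create in both cases with plain dictionaries. The function describe_unsqueeze and the name given to the new initializer are hypothetical; this is not a description of what make_node_check_opset does internally.

```python
def describe_unsqueeze(opset, data, axes, output):
    """Returns a dictionary describing an Unsqueeze node valid for `opset`.

    Hypothetical helper written for this illustration.
    """
    if opset >= 13:
        # From opset 13, axes is a second input: it must be stored
        # as an initializer (or produced by another node).
        axes_name = f"{output}_axes"
        return {
            "op_type": "Unsqueeze",
            "inputs": [data, axes_name],
            "outputs": [output],
            "attributes": {},
            "new_initializers": {axes_name: list(axes)},
        }
    # Before opset 13, axes is an attribute of the node.
    return {
        "op_type": "Unsqueeze",
        "inputs": [data],
        "outputs": [output],
        "attributes": {"axes": list(axes)},
        "new_initializers": {},
    }


new = describe_unsqueeze(18, "x", [0], "y")
old = describe_unsqueeze(11, "x", [0], "y")
print(new)
print(old)
```

A pattern producing the first form in a model declared with opset 11, or the second form with opset 18, would generate an invalid model. That is why the rewriting has to follow the opset of the model.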

Examples or Tools