Pattern Optimizer
The pattern optimizer is implemented by class GraphBuilderPatternOptimization.
It searches for specific sequences of nodes in the graph and
replaces them with other ones without changing the inputs or the outputs
of the graph. The optimizer aims at making the whole computation
graph more efficient, and this implementation aims at performing that
optimization as fast as possible.
Assuming the nodes in an ONNX graph are topologically ordered, that is, every
input of a node is produced by a previous node, the optimizer must not require
any global reordering. The cost should be in O(NPI) in the worst
case, where N is the number of nodes, P is the number of patterns, and
I is the number of iterations.
It is difficult to foresee what a pattern needs in order to rewrite a part of the graph. This API tries to give as much freedom as possible without leaving too much work to the developer who adds a new pattern.
Patterns
Patterns must inherit from PatternOptimization. This class defines two methods.
PatternOptimization.match
def match(
self,
g: "GraphBuilderPatternOptimization",
node: NodeProto,
matched: List[MatchResult],
) -> Optional[MatchResult]:
g: a GraphBuilderPatternOptimization. It holds all the existing nodes and can return any information about types, shapes, or the nodes before and after a given node.
node: the matching must determine whether some nodes around this one belong to a set of nodes this pattern can rewrite. From there, the function explores whatever part of the graph it needs and checks any condition it needs.
matched: usually unused, it contains the list of nodes already matching a pattern.
The method must not modify the graph.
The method returns None if no match is found or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. This does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by other pattern optimizers.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes it faster to rewrite. If not specified, the optimizer will automatically determine the position of the new nodes.
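A minimal, self-contained sketch of the match contract follows. It does not use yobx: ToyNode, ToyGraph, ToyMatchResult and ToyReluReluPattern are hypothetical stand-ins for NodeProto, GraphBuilderPatternOptimization, MatchResult and a real pattern, and the real constructors may differ. It shows a read-only match that returns None on failure, or the nodes needed, the rewriting function and an insertion node on success.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)
class ToyNode:
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class ToyGraph:
    """Hypothetical stand-in for GraphBuilderPatternOptimization."""
    nodes: List[ToyNode]
    outputs: List[str]

    def next_nodes(self, name: str) -> List[ToyNode]:
        # Every node consuming the result `name`.
        return [n for n in self.nodes if name in n.inputs]


@dataclass
class ToyMatchResult:
    """Hypothetical stand-in for MatchResult."""
    nodes: List[Optional[ToyNode]]  # nodes needed by the rewriting
    apply: Callable[..., List[ToyNode]]  # function doing the rewriting
    insert_at: Optional[ToyNode] = None  # None: let the optimizer place new nodes


class ToyReluReluPattern:
    """Relu(Relu(x)) == Relu(x): two chained Relu nodes become one."""

    def match(self, g: ToyGraph, node: ToyNode, matched) -> Optional[ToyMatchResult]:
        # Read-only exploration: g is never modified here.
        if node.op_type != "Relu":
            return None
        if node.outputs[0] in g.outputs:
            return None  # the intermediate result must not be a graph output
        after = g.next_nodes(node.outputs[0])
        if len(after) != 1 or after[0].op_type != "Relu":
            return None  # the intermediate result must feed exactly one Relu
        return ToyMatchResult([node, after[0]], self.apply, insert_at=after[0])

    @classmethod
    def apply(cls, g: ToyGraph, first: ToyNode, second: ToyNode) -> List[ToyNode]:
        # One Relu reading the first input and producing the second output.
        return [ToyNode("Relu", list(first.inputs), list(second.outputs))]
```

Calling match on the second Relu returns None because its output is a graph output, which is the kind of condition a real pattern has to check before claiming nodes.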
Debugging: method none
def none(
self,
node: Optional[NodeProto] = None,
lineno: Optional[int] = None,
msg: Optional[Union[Callable[[], str], str]] = None,
):
It may be useful to know the reason why a pattern matching failed. Instead of returning None, method match can return the following expression:
return self.none(node, inspect.currentframe().f_lineno)
By setting the verbosity (see the Verbosity section below), the user can then see which lines in the code returned None and which condition failed. The last parameter, msg, provides a more detailed message about why the match failed; it can be a string or a callable returning a string.
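The sketch below illustrates the idea behind such a helper. It is a hypothetical re-implementation, not the yobx code: ToyDebugPattern, its verbose attribute and its failures list are assumptions. The helper always returns None, so `return self.none(...)` still means "no match", and a callable msg is only evaluated when verbosity is enabled, which keeps the failing path cheap.

```python
import inspect
from types import SimpleNamespace
from typing import Callable, List, Optional, Tuple, Union


class ToyDebugPattern:
    """Hypothetical sketch of a `none`-style helper, not the yobx implementation."""

    def __init__(self, verbose: int = 0):
        self.verbose = verbose
        self.failures: List[Tuple[Optional[str], Optional[int], Optional[str]]] = []

    def none(
        self,
        node=None,
        lineno: Optional[int] = None,
        msg: Optional[Union[Callable[[], str], str]] = None,
    ):
        # Always return None so `return self.none(...)` still means "no match".
        if self.verbose > 0:
            # A callable message is only evaluated when verbosity asks for it.
            text = msg() if callable(msg) else msg
            self.failures.append((getattr(node, "op_type", None), lineno, text))
        return None

    def match(self, node):
        if node.op_type != "Relu":
            return self.none(
                node,
                inspect.currentframe().f_lineno,
                lambda: f"expected Relu, got {node.op_type!r}",
            )
        return [node]  # stand-in for a MatchResult
```

For example, `ToyDebugPattern(verbose=1).match(SimpleNamespace(op_type="Add"))` returns None and records the op_type, the line number and the message.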
PatternOptimization.apply
@classmethod
def apply(
cls, g: "GraphBuilder", *nodes: Sequence[NodeProto]
) -> List[NodeProto]:
The method does the rewriting. It assumes the rewriting is possible and that no other pattern optimizer has modified or will modify the nodes it receives. It receives the nodes returned by method match as separate arguments; since match may include None values in that list, apply must accept them. The method returns the new nodes. The optimizer considers that every node given to this function is removed from the graph and every node it returns is added. If a received node must be kept, it must be included in the list of returned nodes.
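The bookkeeping implied by this contract can be sketched on a plain list of node names. splice_rewrite is a hypothetical helper, not part of yobx, and it assumes the new nodes go where the last removed node was, so every input the removed nodes consumed is already computed at that point; the real optimizer uses the insertion node from MatchResult or determines the position itself.

```python
from typing import List, Optional, Sequence


def splice_rewrite(
    graph_nodes: List[str],
    received: Sequence[Optional[str]],
    returned: List[str],
) -> List[str]:
    """Hypothetical helper illustrating the apply contract on node names."""
    # Every node handed to apply is considered removed; None entries are ignored.
    removed = {n for n in received if n is not None}
    # Assumption: new nodes take the position of the last removed node.
    last = max(
        (i for i, n in enumerate(graph_nodes) if n in removed),
        default=len(graph_nodes),
    )
    before = [n for n in graph_nodes[:last] if n not in removed]
    after = graph_nodes[last + 1 :]
    return before + list(returned) + after
```

Fusing MatMul and Add into Gemm, as MatMulAddPattern does in the logs further down, replaces two names with one. A node that must survive has to be returned again, otherwise it disappears.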
PatternOptimization.fast_op_type
@classmethod
def fast_op_type(cls) -> Set[str]:
The base class returns an empty set. Overriding this method is an optional
performance hint: when the returned set contains exactly one op_type
string, the optimizer builds an op-type → nodes index over the graph once per
matching step and restricts enumerate_matches to only the nodes of
that type. This avoids iterating over the entire graph for patterns whose
entry point is always a specific operator.
When the method returns an empty set (the default) or a set with more than one element, the full node list is used and no pre-filtering takes place.
from yobx.xoptim import PatternOptimization
class ReshapePattern(PatternOptimization):
    """Base class for patterns whose entry node is always a Reshape."""
    @classmethod
    def fast_op_type(cls):
        return {"Reshape"}
Subclasses that always start matching from the same inherited entry point do
not need to override fast_op_type; the inherited implementation is
already correct.
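The pre-filtering rule can be sketched as follows. build_fast_nodes and nodes_to_visit are hypothetical helpers, and nodes are simplified to (op_type, name) tuples instead of NodeProto: the index is built once per matching step, and a pattern only gets a filtered list when fast_op_type returns exactly one op_type.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical minimal node: (op_type, name).
Node = Tuple[str, str]


def build_fast_nodes(nodes: Sequence[Node]) -> Dict[str, List[Node]]:
    # Built once per matching step and shared by every pattern.
    index: Dict[str, List[Node]] = defaultdict(list)
    for node in nodes:
        index[node[0]].append(node)
    return index


def nodes_to_visit(
    fast_op_type: Set[str], nodes: Sequence[Node], fast_nodes: Dict[str, List[Node]]
) -> Sequence[Node]:
    # Pre-filter only when the pattern declares exactly one entry op_type.
    if len(fast_op_type) == 1:
        (op_type,) = fast_op_type
        return fast_nodes.get(op_type, [])
    # Empty set (default) or several op_types: visit the full node list.
    return nodes
```

A pattern returning {"Reshape"} on a graph without any Reshape node visits nothing at all, which is where the saving comes from.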
Optimization Algorithm
It is implemented in method optimize:
def optimize(
self, max_iter=-1, remove_identity: bool = True
) -> List[Dict[str, Any]]:
The algorithm runs multiple iterations until the graph stops evolving or max_iter is reached. By default, max_iter is equal to the number of nodes. An iteration is:
matches = []
build all successors and predecessors
# Step 1: match
build op_type → nodes index (fast_nodes)
for all patterns P:
    if len(P.fast_op_type()) == 1:
        nodes_to_visit = fast_nodes[the only op_type in P.fast_op_type()]  # pre-filtered
    else:
        nodes_to_visit = all nodes
    for all nodes n in nodes_to_visit:
        r = P.match(n)
        if r:
            if no node of r is already scheduled to be rewritten by another match:
                matches.append(r)
# Step 2: apply
for all matches r:
    apply the match r
# Step 3: clean
remove unused nodes
remove identity nodes
This algorithm may apply more than one rewriting at each iteration, but it guarantees that the local structure a rewriting relies on was not altered by another rewriting applied during the same iteration.
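The loop above can be reproduced on a toy model to see how conflicts are handled. This is not the yobx optimizer: the graph is assumed to be a linear chain of op_type strings, a "pattern" is a fixed run of op_types with its replacement, and cleaning only removes Identity nodes (unused-node removal is omitted).

```python
from typing import List, Sequence, Tuple

# A toy "pattern": a fixed run of op_types and its replacement.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def optimize_chain(ops: Sequence[str], rules: Sequence[Rule], max_iter: int = -1):
    """Toy match/apply/clean loop on a linear chain of op_types."""
    ops = list(ops)
    if max_iter < 0:
        max_iter = len(ops)  # default: the number of nodes
    history: List[int] = []
    for _ in range(max_iter):
        # Step 1: match. A node may belong to at most one match.
        scheduled = set()
        matches = []
        for lhs, rhs in rules:
            for start in range(len(ops) - len(lhs) + 1):
                span = range(start, start + len(lhs))
                if tuple(ops[start : start + len(lhs)]) != lhs:
                    continue
                if any(i in scheduled for i in span):
                    continue  # already scheduled by another match
                scheduled.update(span)
                matches.append((start, len(lhs), rhs))
        # Step 2: apply, right to left so earlier positions stay valid.
        for start, size, rhs in sorted(matches, key=lambda m: m[0], reverse=True):
            ops[start : start + size] = list(rhs)
        # Step 3: clean. This toy only removes Identity nodes.
        ops = [op for op in ops if op != "Identity"]
        if not matches:
            break  # the graph no longer evolves
        history.append(len(matches))
    return ops, history
```

With rules fusing ("MatMul", "Add") into ("Gemm",) and ("Relu", "Relu") into ("Relu",), a chain of three Relu nodes needs two iterations: the second overlapping match is rejected in the first iteration because its first node is already scheduled.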
Adding a pattern
Simple API
We consider the following simple model:
<<<
import torch
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.xbuilder import OptimizationOptions
from yobx.torch import to_onnx
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )
    def forward(self, x):
        return self.layers(x)
x = torch.rand(3, 10)
onx = to_onnx(
MLP(), (x,), input_names=["x"], options=OptimizationOptions(patterns=None)
)
with open("temp_doc_mlp.onnx", "wb") as f:
    f.write(onx.SerializeToString())
print(pretty_onnx(onx))
>>>
opset: domain='' version=21
input: name='x' type=dtype('float32') shape=[3, 10]
init: name='p_layers_0_weight::T10' type=float32 shape=(10, 32) -- GraphBuilder.constant_folding.from/fold(p_layers_0_weight)##p_layers_0_weight/DynamoInterpret.placeholder.1/P(layers.0.weight)
init: name='p_layers_2_weight::T10' type=float32 shape=(32, 1) -- GraphBuilder.constant_folding.from/fold(p_layers_2_weight)##p_layers_2_weight/DynamoInterpret.placeholder.1/P(layers.2.weight)
init: name='layers.0.bias' type=float32 shape=(32,) -- DynamoInterpret.placeholder.1/P(layers.0.bias)
init: name='layers.2.bias' type=float32 shape=(1,) -- array([-0.00138417], dtype=float32)-- DynamoInterpret.placeholder.1/P(layers.2.bias)
MatMul(x, p_layers_0_weight::T10) -> _onx_matmul_x
Add(_onx_matmul_x, layers.0.bias) -> _onx_add_matmul_x
Relu(_onx_add_matmul_x) -> relu
MatMul(relu, p_layers_2_weight::T10) -> _onx_matmul_relu
Add(_onx_matmul_relu, layers.2.bias) -> output_0
output: name='output_0' type=dtype('float32') shape=[3, 1]
Which we can render as follows:
We then apply the optimizations by writing the following code:
<<<
import onnx
from yobx.helpers.onnx_helper import pretty_onnx
from yobx.xbuilder import GraphBuilder
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
# The model is placed in a GraphBuilder.
# It creates dictionaries to store shapes, ranks, types
# to make it easier to the optimizers to find the information
# they need. It still uses NodeProto to store nodes
gr = GraphBuilder(onx, infer_shapes_options=True)
# Let's optimize.
opt_onx = gr.to_onnx(optimize=True)
with open("temp_doc_mlp_opt.onnx", "wb") as f:
    f.write(opt_onx.SerializeToString())
print(pretty_onnx(opt_onx))
>>>
opset: domain='' version=18
input: name='x' type=dtype('float32') shape=[3, 10]
init: name='layers.0.bias' type=float32 shape=(32,) -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
init: name='layers.2.bias' type=float32 shape=(1,) -- array([-0.142], dtype=float32)-- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
init: name='GemmTransposePattern--p_layers_0_weight::T10' type=float32 shape=(32, 10)-- GraphBuilder.constant_folding.from/fold(p_layers_0_weight::T10)##p_layers_0_weight::T10/GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
init: name='GemmTransposePattern--p_layers_2_weight::T10' type=float32 shape=(1, 32)-- GraphBuilder.constant_folding.from/fold(init7_s2_1_32,p_layers_2_weight::T10)##p_layers_2_weight::T10/GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)##init7_s2_1_32/TransposeEqualReshapePattern.apply.new_shape
Gemm(x, GemmTransposePattern--p_layers_0_weight::T10, layers.0.bias, transB=1) -> linear
Relu(linear) -> relu
Gemm(relu, GemmTransposePattern--p_layers_2_weight::T10, layers.2.bias, transB=1) -> output_0
output: name='output_0' type=dtype('float32') shape=[3, 1]
Which renders as follows:
Verbosity
<<<
import onnx
from yobx.xbuilder import GraphBuilder
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
gr = GraphBuilder(onx, infer_shapes_options=True, verbose=1)
opt_onx = gr.to_onnx(optimize=True)
>>>
[GraphBuilder-SAY._add_shape_information] dynamic shapes replacements={}
[GraphBuilder-SAY.optimize] start with 5 nodes
[GraphBuilder-SAY.optimize] #patterns=98
[GraphBuilder-SAY.optimize] start with subgraphs
[GraphBuilder-SAY.optimize] done with subgraphs
[GraphBuilderPatternOptimization-SAY.optimize] start with 5 nodes, 4 initializers, 98 patterns, priorities=[0, 1, 2, 3], max_iter=40
[GraphBuilderPatternOptimization-SAY.optimize] same children={'SameChildrenPattern', 'SameChildrenFromInputPattern'}
[GraphBuilderPatternOptimization-SAY.optimize] iteration 0: 5 nodes, priority=0
[GraphBuilderPatternOptimization-SAY.optimize] increase priority to 1
[GraphBuilderPatternOptimization-SAY.optimize] iteration 1: 5 nodes, priority=1
[GraphBuilderPatternOptimization-SAY.optimize] increase priority to 2
[GraphBuilderPatternOptimization-SAY.optimize] iteration 2: 5 nodes, priority=2
[GraphBuilderPatternOptimization-SAY.optimize] increase priority to 3
[GraphBuilderPatternOptimization-SAY.optimize] iteration 3: 5 nodes, priority=3
[GraphBuilderPatternOptimization-SAY.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.002 | max_time=IdentityPattern:0.000
[GraphBuilderPatternOptimization-SAY.optimize] iteration 4: 3 nodes, priority=3
[GraphBuilderPatternOptimization-SAY.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.001 | max_time=SequenceConstructAtPattern:0.000
[GraphBuilderPatternOptimization-SAY.optimize] iteration 5: 5 nodes, priority=3
[GraphBuilderPatternOptimization-SAY.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.002 | max_time=TransposeMatMulPattern:0.000
[GraphBuilderPatternOptimization-SAY.optimize] iteration 6: 5 nodes, priority=3
[GraphBuilderPatternOptimization-SAY.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-SAY.optimize] done after 7 iterations with 5 nodes in 0.033
[OrderOptimization.optimize] ALGO-2
[OrderOptimization.shape_order] -- starts with 3 nodes, 4 initializers
[OrderOptimization.shape_order] done after in 6.883900005050236e-05s with changed=0 scale=0
[GraphBuilder-SAY.optimize] done with 3 nodes in 0.037
[GraphBuilder-SAY.to_onnx] make_model 4 inits 0 params
[GraphBuilder-SAY.time_evaluation_constants_] 0
[GraphBuilder-SAY._build_initializers] start with 4 initializers, large_model=False, external_threshold=1024
[GraphBuilder-SAY._build_initializers] switch low/high order
[GraphBuilder-SAY._build_initializers] done in 2.2890000082043116e-06s with 4 initializers, 0 large initializers
[GraphBuilder-SAY._add_shape_information] dynamic shapes replacements={}
With more verbosity:
<<<
import onnx
from yobx.xbuilder import GraphBuilder
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
gr = GraphBuilder(onx, infer_shapes_options=True, verbose=11)
opt_onx = gr.to_onnx(optimize=True)
>>>
[GraphBuilder-HKI._update_structures_with_proto] -- starts with 5 nodes
[GraphBuilder-HKI.set_shape] p_layers_0_weight::T10:(10, 32)
[GraphBuilder-HKI.set_rank] p_layers_0_weight::T10:2
[GraphBuilder-HKI.set_type] p_layers_0_weight::T10:1
[GraphBuilder-HKI.make_initializer] p_layers_0_weight::T10[1:(10, 32)]
[GraphBuilder-HKI.update_node_constant] new constant 'p_layers_0_weight::T10', node=None
[GraphBuilder-HKI.set_shape] p_layers_2_weight::T10:(32, 1)
[GraphBuilder-HKI.set_rank] p_layers_2_weight::T10:2
[GraphBuilder-HKI.set_type] p_layers_2_weight::T10:1
[GraphBuilder-HKI.make_initializer] p_layers_2_weight::T10[1:(32, 1)]
[GraphBuilder-HKI.update_node_constant] new constant 'p_layers_2_weight::T10', node=None
[GraphBuilder-HKI.set_shape] layers.0.bias:(32,)
[GraphBuilder-HKI.set_rank] layers.0.bias:1
[GraphBuilder-HKI.set_type] layers.0.bias:1
[GraphBuilder-HKI.make_initializer] layers.0.bias[1:(32,)]
[GraphBuilder-HKI.update_node_constant] new constant 'layers.0.bias', node=None
[GraphBuilder-HKI.set_shape] layers.2.bias:(1,)
[GraphBuilder-HKI.set_rank] layers.2.bias:1
[GraphBuilder-HKI.set_type] layers.2.bias:1
[GraphBuilder-HKI.make_initializer] layers.2.bias[1:(1,)]
[GraphBuilder-HKI.update_node_constant] new constant 'layers.2.bias', node=None
[GraphBuilder-HKI.set_type] x:1
[GraphBuilder-HKI.set_shape] x:(3, 10)
[GraphBuilder-HKI.set_rank] x:2
[GraphBuilder-HKI.set_type] output_0:1
[GraphBuilder-HKI.set_shape] output_0:(3, 1)
[GraphBuilder-HKI.set_rank] output_0:2
[GraphBuilder-HKI.set_type] _onx_matmul_x:1
[GraphBuilder-HKI.set_shape] _onx_matmul_x:(3, 32)
[GraphBuilder-HKI.set_rank] _onx_matmul_x:2
[GraphBuilder-HKI.set_type] linear:1
[GraphBuilder-HKI.set_shape] linear:(3, 32)
[GraphBuilder-HKI.set_rank] linear:2
[GraphBuilder-HKI.set_type] relu:1
[GraphBuilder-HKI.set_shape] relu:(3, 32)
[GraphBuilder-HKI.set_rank] relu:2
[GraphBuilder-HKI.set_type] _onx_matmul_relu:1
[GraphBuilder-HKI.set_shape] _onx_matmul_relu:(3, 1)
[GraphBuilder-HKI.set_rank] _onx_matmul_relu:2
[GraphBuilder-HKI.set_type] output_0:1
[GraphBuilder-HKI._update_structures_with_proto] ends with 5 nodes in 0.0019173769999270007
[GraphBuilder-HKI.constant_folding] -- starts with 4 constants and 5 nodes.
[GraphBuilder-HKI.constant_folding] cst:: . :: linear
[GraphBuilder-HKI.constant_folding] cst:: 1 :: p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: . :: _onx_matmul_x
[GraphBuilder-HKI.constant_folding] cst:: . :: _onx_matmul_relu
[GraphBuilder-HKI.constant_folding] cst:: 1 :: layers.2.bias
[GraphBuilder-HKI.constant_folding] cst:: . :: x
[GraphBuilder-HKI.constant_folding] cst:: . :: relu
[GraphBuilder-HKI.constant_folding] cst:: . :: output_0
[GraphBuilder-HKI.constant_folding] cst:: 1 :: p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: 1 :: layers.0.bias
[GraphBuilder-HKI.constant_folding] initializer: p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: layers.0.bias
[GraphBuilder-HKI.constant_folding] initializer: layers.2.bias
[GraphBuilder-HKI.constant_folding] ends with 4 constants and 5 nodes in 8.067700002811762e-05 seconds
[GraphBuilder-HKI._update_shape_types_with_proto] -- starts with 5 nodes and 0 shapes.
[GraphBuilder._update_shape_types_with_proto] infer shapes
[GraphBuilder._update_shape_types_with_proto] infer shapes done 0.00027463499998248153 seconds
[GraphBuilder._update_shape_types_with_proto] _clean_shapes after 0.0003218540000489156 seconds
[GraphBuilder-HKI._update_shape_types_with_proto] walk through 0 shapes.
[GraphBuilder-HKI.set_type] _onx_matmul_x:1
[_update_shape_types_with_proto_one_result] update shape(_onx_matmul_x) with (3, 32)
[GraphBuilder-HKI.set_type] linear:1
[_update_shape_types_with_proto_one_result] update shape(linear) with (3, 32)
[GraphBuilder-HKI.set_type] relu:1
[_update_shape_types_with_proto_one_result] update shape(relu) with (3, 32)
[GraphBuilder-HKI.set_type] _onx_matmul_relu:1
[_update_shape_types_with_proto_one_result] update shape(_onx_matmul_relu) with (3, 1)
[GraphBuilder-HKI._update_shape_types_with_proto] ends in 0.00012019700000109879 seconds.
[GraphBuilder-HKI._add_shape_information] dynamic shapes replacements={}
[GraphBuilder-HKI.optimize] start with 5 nodes
[GraphBuilder-HKI.optimize] options=OptimizationOptions(constant_folding={'Squeeze', 'Transpose', 'Exp', 'Add', 'Mul', 'Sqrt', 'Div', 'Concat', 'Cast', 'Reshape', 'Reciprocal', 'Unsqueeze', 'Sub'}, patterns=[BatchNormalizationPattern(), BatchNormalizationTrainingPattern(), CastLayerNormalizationCastPattern(), CastPattern(), CastCastBinaryPattern(), CastCastPattern(), CastOpCastPattern(), ClipClipPattern(), ConcatEmptyPattern(), ConcatGatherPattern(), ConcatReshapePattern(), ConcatTwiceUnaryPattern(), ConstantToInitializerPattern(), ConvBiasNullPattern(), PadConvPattern(), DropoutPattern(), ExpandPattern(), ExpandBroadcastPattern(), ExpandSwapPattern(), ExpandUnsqueezeExpandPattern(), GathersSplitPattern(), GeluPattern(), IdentityPattern(), LayerNormalizationPattern(), LayerNormalizationScalePattern(), LeakyReluPattern(), MaxReluPattern(), MulMulMulScalarPattern(), MulUnsqueezeUnsqueezePattern(), NotNotPattern(), NotWherePattern(), ReduceArgTopKPattern(), ReduceReshapePattern(), ReduceSumNormalizePattern(), ReshapePattern(), ReshapeMatMulReshapePattern(), Reshape2Of3Pattern(), ReshapeReshapeBinaryPattern(), MatMulAddPattern(), GemmTransposePattern(), MatMulReshape2Of3Pattern(), MulMulMatMulPattern(), ShapeBasedReshapeIsSqueezePattern(), ShapeBasedStaticExpandPattern(), ShapeBasedConcatExpandPattern(), ShapeBasedEditDistanceReshapePattern(), ShapeBasedIdentityPattern(), ShapeBasedExpandBroadcastPattern(), ShapeBasedExpandBroadcastMatMulPattern(), ShapeBasedExpandCastWhereSwapPattern(), ShapeBasedExpandSwapPattern(), ShapeBasedMatMulToMulPattern(), ShapedBasedReshapePattern(), ShapeBasedSameChildrenPattern(), ShapeBasedShapeShapeAddPattern(), ReshapeReshapePattern(), RotaryEmbeddingPattern(), SameChildrenPattern(), SameChildrenFromInputPattern(), SequenceConstructAtPattern(), SplitToSequenceSequenceAtPattern(), SliceSlicePattern(), SlicesSplitPattern(), SoftmaxCrossEntropyLossCastPattern(), SplitConcatPattern(), SqueezeAddPattern(), SqueezeBinaryUnsqueezePattern(), 
SqueezeUnsqueezePattern(), StaticConcatReshapePattern(), Sub1MulPattern(), SwapExpandReshapePattern(), SwapExpandUnsqueezePattern(), SwapRangeAddScalarPattern(), SwapUnaryPattern(), SwapUnsqueezeTransposePattern(), SwitchOrderBinaryPattern(), SwitchReshapeActivationPattern(), TransposeEqualReshapePattern(), TransposeGatherPattern(), TransposeMatMulPattern(), TransposeReshapeMatMulPattern(), TransposeReshapeTransposePattern(), TransposeTransposePattern(), UnsqueezeEqualPattern(), UnsqueezeOrSqueezeReshapePattern(), UnsqueezeReshapePattern(), UnsqueezeUnsqueezePattern(), WhereAddPattern(), RotaryConcatPartPattern(), FunctionAttentionPattern(), FunctionAttentionGQAPattern(), FunctionCausalMaskPattern(), FunctionCausalMaskMulAddPattern(), FunctionCosSinCachePattern(), FunctionHalfRotaryEmbeddingPattern(), RMSNormalizationPattern(), RMSNormalizationMulPattern(), AttentionGQAPattern()], verbose=11, order=SHAPE)
-- GRAPH BEFORE OPTIMIZATION --
opset: : 18
init: p_layers_0_weight::T10: CP1: (10, 32) -- GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
init: p_layers_2_weight::T10: CP1: (32, 1) -- GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)
init: layers.0.bias: CP1: (32,) -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
init: layers.2.bias: CP1: (1,) -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
input:: x |T1: 3 x 10
MatMul: x, p_layers_0_weight::T10 -> _onx_matmul_x |T1: 3 x 32
Add: _onx_matmul_x, layers.0.bias -> linear |T1: 3 x 32
Relu: linear -> relu |T1: 3 x 32
MatMul: relu, p_layers_2_weight::T10 -> _onx_matmul_relu |T1: 3 x 1
Add: _onx_matmul_relu, layers.2.bias -> output_0 |T1: 3 x 1
output:: output_0 |T1: 3 x 1
-- END --
[GraphBuilder-HKI.optimize] start with subgraphs
[GraphBuilder-HKI.optimize] done with subgraphs
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 4.95150000006106e-05 seconds
[GraphBuilder-HKI.constant_folding] -- starts with 4 constants and 5 nodes.
[GraphBuilder-HKI.constant_folding] cst:: . :: linear
[GraphBuilder-HKI.constant_folding] cst:: 1 :: p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: . :: _onx_matmul_x
[GraphBuilder-HKI.constant_folding] cst:: . :: _onx_matmul_relu
[GraphBuilder-HKI.constant_folding] cst:: 1 :: layers.2.bias
[GraphBuilder-HKI.constant_folding] cst:: . :: x
[GraphBuilder-HKI.constant_folding] cst:: . :: relu
[GraphBuilder-HKI.constant_folding] cst:: . :: output_0
[GraphBuilder-HKI.constant_folding] cst:: 1 :: p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: 1 :: layers.0.bias
[GraphBuilder-HKI.constant_folding] initializer: p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: layers.0.bias
[GraphBuilder-HKI.constant_folding] initializer: layers.2.bias
[GraphBuilder-HKI.constant_folding] ends with 4 constants and 5 nodes in 0.00010264200000165147 seconds
[GraphBuilderPatternOptimization-HKI.optimize] start with 5 nodes, 4 initializers, 98 patterns, priorities=[0, 1, 2, 3], max_iter=40
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 1/98 - P0 - BatchNormalizationPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 2/98 - P0 - BatchNormalizationTrainingPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 3/98 - P0 - CastCastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 4/98 - P0 - CastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 5/98 - P0 - ConcatGatherPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 6/98 - P0 - ConcatReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 7/98 - P0 - ConvBiasNullPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 8/98 - P0 - ExpandPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 9/98 - P0 - ExpandUnsqueezeExpandPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 10/98 - P0 - FunctionAttentionGQAPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 11/98 - P0 - FunctionAttentionPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 12/98 - P0 - GeluPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 13/98 - P0 - IdentityPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 14/98 - P0 - LeakyReluPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 15/98 - P0 - MulUnsqueezeUnsqueezePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 16/98 - P0 - PadConvPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 17/98 - P0 - ReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 18/98 - P0 - ReshapeReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 19/98 - P0 - SameChildrenFromInputPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 20/98 - P0 - SameChildrenPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 21/98 - P0 - ShapeBasedEditDistanceReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 22/98 - P0 - ShapeBasedIdentityPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 23/98 - P0 - ShapeBasedReshapeIsSqueezePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 24/98 - P0 - ShapeBasedSameChildrenPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 25/98 - P0 - ShapeBasedShapeShapeAddPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 26/98 - P0 - ShapeBasedStaticExpandPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 27/98 - P0 - ShapedBasedReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 28/98 - P0 - SoftmaxCrossEntropyLossCastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 29/98 - P0 - SqueezeAddPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 30/98 - P0 - SqueezeBinaryUnsqueezePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 31/98 - P0 - SqueezeUnsqueezePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 32/98 - P0 - StaticConcatReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 33/98 - P0 - SwapExpandReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 34/98 - P0 - SwapExpandUnsqueezePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 35/98 - P0 - SwapUnaryPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 36/98 - P0 - SwapUnsqueezeTransposePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 37/98 - P0 - TransposeGatherPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 38/98 - P0 - TransposeReshapeTransposePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 39/98 - P0 - TransposeTransposePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 40/98 - P0 - UnsqueezeOrSqueezeReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 41/98 - P0 - UnsqueezeReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 42/98 - P0 - UnsqueezeUnsqueezePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 43/98 - P1 - CastCastBinaryPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 44/98 - P1 - CastLayerNormalizationCastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 45/98 - P1 - CastOpCastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 46/98 - P1 - ClipClipPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 47/98 - P1 - ConcatEmptyPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 48/98 - P1 - ConcatTwiceUnaryPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 49/98 - P1 - ConstantToInitializerPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 50/98 - P1 - DropoutPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 51/98 - P1 - ExpandBroadcastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 52/98 - P1 - ExpandSwapPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 53/98 - P1 - FunctionCausalMaskMulAddPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 54/98 - P1 - FunctionCausalMaskPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 55/98 - P1 - FunctionCosSinCachePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 56/98 - P1 - FunctionHalfRotaryEmbeddingPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 57/98 - P1 - GathersSplitPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 58/98 - P1 - GemmTransposePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 59/98 - P1 - LayerNormalizationPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 60/98 - P1 - LayerNormalizationScalePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 61/98 - P1 - MatMulReshape2Of3Pattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 62/98 - P1 - MaxReluPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 63/98 - P1 - MulMulMatMulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 64/98 - P1 - MulMulMulScalarPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 65/98 - P1 - NotNotPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 66/98 - P1 - NotWherePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 67/98 - P1 - RMSNormalizationMulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 68/98 - P1 - RMSNormalizationPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 69/98 - P1 - ReduceArgTopKPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 70/98 - P1 - ReduceReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 71/98 - P1 - ReduceSumNormalizePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 72/98 - P1 - Reshape2Of3Pattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 73/98 - P1 - ReshapeMatMulReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 74/98 - P1 - ReshapeReshapeBinaryPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 75/98 - P1 - RotaryConcatPartPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 76/98 - P1 - RotaryEmbeddingPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 77/98 - P1 - SequenceConstructAtPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 78/98 - P1 - ShapeBasedConcatExpandPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 79/98 - P1 - ShapeBasedExpandBroadcastMatMulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 80/98 - P1 - ShapeBasedExpandBroadcastPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 81/98 - P1 - ShapeBasedExpandCastWhereSwapPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 82/98 - P1 - ShapeBasedExpandSwapPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 83/98 - P1 - ShapeBasedMatMulToMulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 84/98 - P1 - SliceSlicePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 85/98 - P1 - SlicesSplitPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 86/98 - P1 - SplitConcatPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 87/98 - P1 - SplitToSequenceSequenceAtPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 88/98 - P1 - Sub1MulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 89/98 - P1 - SwapRangeAddScalarPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 90/98 - P1 - SwitchOrderBinaryPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 91/98 - P1 - SwitchReshapeActivationPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 92/98 - P1 - TransposeEqualReshapePattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 93/98 - P1 - TransposeMatMulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 94/98 - P1 - TransposeReshapeMatMulPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 95/98 - P1 - UnsqueezeEqualPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 96/98 - P1 - WhereAddPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 97/98 - P2 - AttentionGQAPattern()
[GraphBuilderPatternOptimization-HKI.optimize] use pattern 98/98 - P3 - MatMulAddPattern()
-- optimize starts with...
opset: : 18
init: p_layers_0_weight::T10: CP1: (10, 32) -- GraphBuilder._update_structures_with_proto.1/from(p_layers_0_weight::T10)
init: p_layers_2_weight::T10: CP1: (32, 1) -- GraphBuilder._update_structures_with_proto.1/from(p_layers_2_weight::T10)
init: layers.0.bias: CP1: (32,) -- GraphBuilder._update_structures_with_proto.1/from(layers.0.bias)
init: layers.2.bias: CP1: (1,) -- GraphBuilder._update_structures_with_proto.1/from(layers.2.bias)
input:: x |T1: 3 x 10
MatMul: x, p_layers_0_weight::T10 -> _onx_matmul_x |T1: 3 x 32
Add: _onx_matmul_x, layers.0.bias -> linear |T1: 3 x 32
Relu: linear -> relu |T1: 3 x 32
MatMul: relu, p_layers_2_weight::T10 -> _onx_matmul_relu |T1: 3 x 1
Add: _onx_matmul_relu, layers.2.bias -> output_0 |T1: 3 x 1
output:: output_0 |T1: 3 x 1
-- starts optimization
[GraphBuilderPatternOptimization-HKI.optimize] same children={'SameChildrenPattern', 'SameChildrenFromInputPattern'}
[GraphBuilderPatternOptimization-HKI.optimize] iteration 0: 5 nodes, priority=0
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips CastLayerNormalizationCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips CastCastBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips CastOpCastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ClipClipPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ConcatEmptyPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips ConcatTwiceUnaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ConstantToInitializerPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips DropoutPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips ExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips GathersSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[GraphBuilderPatternOptimization-HKI.optimize] skips LayerNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips LayerNormalizationScalePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[GraphBuilder-YLS.make_tensor_input] x[0:None] -- marker=_build_pattern1_x
[GraphBuilder-YLS.set_type] x:0
[GraphBuilder-YLS.set_type] x:-1
[GraphBuilder-YLS.make_tensor_input] zero[0:None] -- marker=_build_pattern1_zero
[GraphBuilder-YLS.set_type] zero:0
[GraphBuilder-YLS.set_type] zero:-1
[GraphBuilder-YLS.make_tensor_input] slope[0:None] -- marker=_build_pattern1_slope
[GraphBuilder-YLS.set_type] slope:0
[GraphBuilder-YLS.set_type] slope:-1
[GraphBuilder-YLS.3.make_node] [tt:-] Greater: ['x', 'zero']->['_onx_greater_x']
[GraphBuilder-YLS.set_type] _onx_greater_x:9
[GraphBuilder-YLS.3.make_node] [tt:-] Mul: ['x', 'slope']->['_onx_mul_x']
[GraphBuilder-YLS.set_type] _onx_mul_x:-1
[GraphBuilder-YLS.3.make_node] [ttt:-] Where: ['_onx_greater_x', 'x', '_onx_mul_x']->['_onx_where_greater_x']
[GraphBuilder-YLS.set_type] _onx_where_greater_x:-1
[GraphBuilder-YLS.make_tensor_output] _onx_where_greater_x[0: None]
[GraphBuilderPatternOptimization-HKI.optimize] skips MaxReluPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips MulMulMulScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips NotNotPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips NotWherePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ReduceArgTopKPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ReduceReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ReduceSumNormalizePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips ReshapeMatMulReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips Reshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ReshapeReshapeBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips GemmTransposePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips MatMulReshape2Of3Pattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips MulMulMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips ShapeBasedConcatExpandPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips ShapeBasedExpandBroadcastPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ShapeBasedExpandBroadcastMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ShapeBasedExpandCastWhereSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ShapeBasedExpandSwapPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips ShapeBasedMatMulToMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips RotaryEmbeddingPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips SequenceConstructAtPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips SplitToSequenceSequenceAtPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips SliceSlicePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips SlicesSplitPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[GraphBuilder-YIA.make_tensor_input] X[0:None] -- marker=_build_pattern1_X
[GraphBuilder-YIA.set_type] X:0
[GraphBuilder-YIA.set_type] X:-1
[GraphBuilder-YIA.make_tensor_input] indices[0:None] -- marker=_build_pattern1_indices
[GraphBuilder-YIA.set_type] indices:0
[GraphBuilder-YIA.set_type] indices:-1
[GraphBuilder-YIA.make_tensor_input] axis[0:None] -- marker=_build_pattern1_axis
[GraphBuilder-YIA.set_type] axis:0
[GraphBuilder-YIA.set_type] axis:-1
[GraphBuilder-YIA.make_tensor_input] zerof[0:None] -- marker=_build_pattern1_zerof
[GraphBuilder-YIA.set_type] zerof:0
[GraphBuilder-YIA.set_type] zerof:-1
[GraphBuilder-YIA.make_tensor_input] zeroi[0:None] -- marker=_build_pattern1_zeroi
[GraphBuilder-YIA.set_type] zeroi:0
[GraphBuilder-YIA.set_type] zeroi:-1
[GraphBuilder-YIA.make_tensor_input] b[0:None] -- marker=_build_pattern1_b
[GraphBuilder-YIA.set_type] b:0
[GraphBuilder-YIA.set_type] b:-1
[GraphBuilder-YIA.3.make_node] [tt:-] Equal: ['indices', 'b']->['_onx_equal_indices']
[GraphBuilder-YIA.set_type] _onx_equal_indices:9
[GraphBuilder-YIA.3.make_node] [t:-] Not: ['_onx_equal_indices']->['_onx_not_equal_indices']
[GraphBuilder-YIA.set_type] _onx_not_equal_indices:9
[GraphBuilder-YIA.3.make_node] [ttt:-] Where: ['_onx_not_equal_indices', 'indices', 'zeroi']->['_onx_where_not_equal_indices']
[GraphBuilder-YIA.set_type] _onx_where_not_equal_indices:-1
[GraphBuilder-YIA.3.make_node] [tt:-] Unsqueeze: ['_onx_where_not_equal_indices', 'axis']->['_onx_where_not_equal_indices::UnSq']
[GraphBuilder-YIA.set_type] _onx_where_not_equal_indices::UnSq:-1
[GraphBuilder-YIA.3.make_node] [t:-] LogSoftmax: ['X']->['_onx_logsoftmax_X']
[GraphBuilder-YIA.set_type] _onx_logsoftmax_X:-1
[GraphBuilder-YIA.set_type] _onx_gatherelements_logsoftmax_X:-1
[GraphBuilder-YIA.3.make_node] [tt:t] GatherElements: ['_onx_logsoftmax_X', '_onx_where_not_equal_indices::UnSq']->['_onx_gatherelements_logsoftmax_X']
[GraphBuilder-YIA.set_type] _onx_gatherelements_logsoftmax_X:-1
[GraphBuilder-YIA.3.make_node] [tt:-] Squeeze: ['_onx_gatherelements_logsoftmax_X', 'axis']->['_onx_gatherelements_logsoftmax_X::Sq']
[GraphBuilder-YIA.set_type] _onx_gatherelements_logsoftmax_X::Sq:-1
[GraphBuilder-YIA.3.make_node] [t:-] Neg: ['_onx_gatherelements_logsoftmax_X::Sq']->['_onx_neg_gatherelements_logsoftmax_X::Sq']
[GraphBuilder-YIA.set_type] _onx_neg_gatherelements_logsoftmax_X::Sq:-1
[GraphBuilder-YIA.3.make_node] [ttt:-] Where: ['_onx_not_equal_indices', '_onx_neg_gatherelements_logsoftmax_X::Sq', 'zerof']->['_onx_where_not_equal_indices2']
[GraphBuilder-YIA.set_type] _onx_where_not_equal_indices2:-1
[GraphBuilder-YIA.3.make_node] [t:-] Cast: ['_onx_not_equal_indices']->['_onx_not_equal_indices::C1']
[GraphBuilder-YIA.set_type] _onx_not_equal_indices::C1:1
[GraphBuilder-YIA.3.make_node] [t:-] ReduceSum: ['_onx_not_equal_indices::C1']->['_onx_reducesum_not_equal_indices::C1']
[GraphBuilder-YIA.set_shape] _onx_reducesum_not_equal_indices::C1:()
[GraphBuilder-YIA.set_rank] _onx_reducesum_not_equal_indices::C1:0
[GraphBuilder-YIA.set_type] _onx_reducesum_not_equal_indices::C1:1
[GraphBuilder-YIA.3.make_node] [#:-] Cast: ['_onx_reducesum_not_equal_indices::C1']->['_onx_reducesum_not_equal_indices::C1::C10']
[GraphBuilder-YIA.set_type] _onx_reducesum_not_equal_indices::C1::C10:10
[GraphBuilder-YIA.set_shape] _onx_reducesum_not_equal_indices::C1::C10:()
[GraphBuilder-YIA.set_rank] _onx_reducesum_not_equal_indices::C1::C10:0
[GraphBuilder-YIA.3.make_node] [t:-] Cast: ['_onx_where_not_equal_indices2']->['_onx_where_not_equal_indices2::C1']
[GraphBuilder-YIA.set_type] _onx_where_not_equal_indices2::C1:1
[GraphBuilder-YIA.3.make_node] [t:-] ReduceSum: ['_onx_where_not_equal_indices2::C1']->['_onx_reducesum_where_not_equal_indices2::C1']
[GraphBuilder-YIA.set_shape] _onx_reducesum_where_not_equal_indices2::C1:()
[GraphBuilder-YIA.set_rank] _onx_reducesum_where_not_equal_indices2::C1:0
[GraphBuilder-YIA.set_type] _onx_reducesum_where_not_equal_indices2::C1:1
[GraphBuilder-YIA.3.make_node] [#:-] Cast: ['_onx_reducesum_where_not_equal_indices2::C1']->['_onx_reducesum_where_not_equal_indices2::C1::C10']
[GraphBuilder-YIA.set_type] _onx_reducesum_where_not_equal_indices2::C1::C10:10
[GraphBuilder-YIA.set_shape] _onx_reducesum_where_not_equal_indices2::C1::C10:()
[GraphBuilder-YIA.set_rank] _onx_reducesum_where_not_equal_indices2::C1::C10:0
[GraphBuilder-YIA.3.make_node] [##:-] Div: ['_onx_reducesum_where_not_equal_indices2::C1::C10', '_onx_reducesum_not_equal_indices::C1::C10']->['_onx_div_reducesum_where_not_equal_indices2::C1::C10']
[GraphBuilder-YIA.set_type] _onx_div_reducesum_where_not_equal_indices2::C1::C10:10
[GraphBuilder-YIA.set_shape] _onx_div_reducesum_where_not_equal_indices2::C1::C10:()
[GraphBuilder-YIA.set_rank] _onx_div_reducesum_where_not_equal_indices2::C1::C10:0
[GraphBuilder-YIA.make_tensor_output] _onx_div_reducesum_where_not_equal_indices2::C1::C10[0: None]
[GraphBuilderPatternOptimization-HKI.optimize] skips SplitConcatPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips Sub1MulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips SwapRangeAddScalarPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips SwitchOrderBinaryPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips SwitchReshapeActivationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips TransposeEqualReshapePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips TransposeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips TransposeReshapeMatMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips UnsqueezeEqualPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips WhereAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips RotaryConcatPartPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips FunctionCausalMaskPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips FunctionCausalMaskMulAddPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips FunctionCosSinCachePattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips FunctionHalfRotaryEmbeddingPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips RMSNormalizationPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips RMSNormalizationMulPattern, pattern.priority=1, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] skips AttentionGQAPattern, pattern.priority=2, current_priority_index=0, priorities[current_priority_index]=0 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0 - matching_step done 0
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - apply_step with 0 matches
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - done with 0 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - remove_duplicated_shape done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 8.269299996754853e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - remove_identity done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=0C0F0 - remove_unused done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] increase priority to 1
[GraphBuilderPatternOptimization-HKI.optimize] it=0C1F0 - next
[GraphBuilderPatternOptimization-HKI.optimize] iteration 1: 5 nodes, priority=1
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
[CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
[CastOpCastPattern.match] NONE - line: 454:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[CastOpCastPattern.match] NONE - line: 451:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
[ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
[Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
[ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[GraphBuilderPatternOptimization-HKI.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=1, priorities[current_priority_index]=1 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
[MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
[MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
[ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
[SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
[PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
[TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
[FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] skips AttentionGQAPattern, pattern.priority=2, current_priority_index=1, priorities[current_priority_index]=1 priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0 - matching_step done 0
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - apply_step with 0 matches
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - done with 0 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - remove_duplicated_shape done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 7.099600009041751e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - remove_identity done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=1C0F0 - remove_unused done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] increase priority to 2
[GraphBuilderPatternOptimization-HKI.optimize] it=1C1F0 - next
[GraphBuilderPatternOptimization-HKI.optimize] iteration 2: 5 nodes, priority=2
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
[CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
[CastOpCastPattern.match] NONE - line: 454:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[CastOpCastPattern.match] NONE - line: 451:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
[ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
[Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
[ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[GraphBuilderPatternOptimization-HKI.optimize] skips MatMulAddPattern, pattern.priority=3, current_priority_index=2, priorities[current_priority_index]=2 priorities=[0, 1, 2, 3]
[PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
[MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
[MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
[ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
[SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
[PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
[TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
[FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0 - matching_step done 0
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - apply_step with 0 matches
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - done with 0 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - remove_duplicated_shape done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 6.415899997591623e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - remove_identity done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=2C0F0 - remove_unused done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] increase priority to 3
[GraphBuilderPatternOptimization-HKI.optimize] it=2C1F0 - next
[GraphBuilderPatternOptimization-HKI.optimize] iteration 3: 5 nodes, priority=3
[GraphBuilderPatternOptimization-HKI.optimize] it=3C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
[CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[CastCastBinaryPattern.match] NONE - line: 312:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
[CastOpCastPattern.match] NONE - line: 454:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[CastOpCastPattern.match] NONE - line: 451:yobx.xoptim.patterns.onnx_cast, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[IdentityPattern.match] NONE - line: 730:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[IdentityPattern.match] NONE - line: 772:yobx.xoptim.patterns.onnx_any, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
[ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ReshapeMatMulReshapePattern.match] NONE - line: 1035:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
[Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[Reshape2Of3Pattern.match] NONE - line: 684:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
[ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ReshapeReshapeBinaryPattern.match] NONE - line: 934:yobx.xoptim.patterns.onnx_reshape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
[MatchResult.match] MATCH MatMulAddPattern with 2 nodes and types ['MatMul', 'Add'] - []
[GraphBuilderPatternOptimization-HKI.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
[MatchResult.match] MATCH MatMulAddPattern with 2 nodes and types ['MatMul', 'Add'] - []
[GraphBuilderPatternOptimization-HKI.optimize] match=MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
[PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
[MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[MatMulReshape2Of3Pattern.match] NONE - line: 556:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
[MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[MulMulMatMulPattern.match] NONE - line: 922:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedExpandBroadcastPattern.match] NONE - line: 383:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ShapeBasedExpandBroadcastMatMulPattern.match] NONE - line: 1081:yobx.xoptim.patterns.onnx_expand, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
[ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedExpandSwapPattern.match] NONE - line: 874:yobx.xoptim.patterns.onnx_expand, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
[ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[ShapeBasedMatMulToMulPattern.match] NONE - line: 1734:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[ShapeBasedShapeShapeAddPattern.match] NONE - line: 25:yobx.xoptim.patterns.onnx_shape, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[SqueezeAddPattern.match] NONE - line: 396:yobx.xoptim.patterns.onnx_unsqueeze, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
[SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
[PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
[TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=x,p_layers_0_weight::T10
[TransposeReshapeMatMulPattern.match] NONE - line: 1398:yobx.xoptim.patterns.onnx_matmul, op_type=MatMul, name=, inputs=relu,p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
[FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_x,layers.0.bias
[FunctionCausalMaskMulAddPattern.match] NONE - line: 1510:yobx.xoptim.patterns.onnx_rotary, op_type=Add, name=, inputs=_onx_matmul_relu,layers.2.bias
[PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] it=3C0 - matching_step done 2
[GraphBuilderPatternOptimization-HKI.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.003 | max_time=IdentityPattern:0.000
[GraphBuilderPatternOptimization-HKI.optimize] it=3C0F1 - apply_step with 2 matches
[GraphBuilderPatternOptimization-HKI.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: ['x', 'p_layers_0_weight::T10', '_onx_matmul_x', 'layers.0.bias'], outputs: ['_onx_matmul_x', 'linear']
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
- MatMul: ['x', 'p_layers_0_weight::T10'] -> ['_onx_matmul_x']
- Add: ['_onx_matmul_x', 'layers.0.bias'] -> ['linear']
+ Gemm: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
[GraphBuilder-HKI.set_type] linear:1
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
[GraphBuilderPatternOptimization-HKI.optimize] - add ['Gemm']
[GraphBuilderPatternOptimization-HKI.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] removed outputs {'_onx_matmul_x'}
[GraphBuilderPatternOptimization-HKI.optimize] apply MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'], inputs: ['relu', 'p_layers_2_weight::T10', '_onx_matmul_relu', 'layers.2.bias'], outputs: ['_onx_matmul_relu', 'output_0']
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']
- MatMul: ['relu', 'p_layers_2_weight::T10'] -> ['_onx_matmul_relu']
- Add: ['_onx_matmul_relu', 'layers.2.bias'] -> ['output_0']
+ Gemm: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
[GraphBuilder-HKI.set_type] output_0:1
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: MatMulAddPattern replaces ['MatMul', 'Add'] applied.
[GraphBuilderPatternOptimization-HKI.optimize] - add ['Gemm']
[GraphBuilderPatternOptimization-HKI.optimize] done MatchResult: MatMulAddPattern replaces ['MatMul', 'Add']: -2 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] removed outputs {'_onx_matmul_relu'}
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - done with 2 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -4 +2 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - remove_duplicated_shape done -4 +2 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 3
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 3 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 3 nodes in 4.9554999918655085e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - remove_identity done -4 +2 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - remove_unused done -4 +2 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=3C1F1 - next
[GraphBuilderPatternOptimization-HKI.optimize] iteration 4: 3 nodes, priority=3
[GraphBuilderPatternOptimization-HKI.optimize] it=4C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
[MatMulAddPattern.match] NONE - line: 130:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--, inputs=x,p_layers_0_weight::T10,layers.0.bias
[MatMulAddPattern.match] NONE - line: 127:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--2, inputs=relu,p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
[MatchResult.match] MATCH GemmTransposePattern with 1 nodes and types ['Gemm'] - []
[GraphBuilderPatternOptimization-HKI.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
[MatchResult.match] MATCH GemmTransposePattern with 1 nodes and types ['Gemm'] - []
[GraphBuilderPatternOptimization-HKI.optimize] match=MatchResult: GemmTransposePattern replaces ['Gemm']
[PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
[SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
[PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--, inputs=x,p_layers_0_weight::T10,layers.0.bias
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=MatMulAddPattern--2, inputs=relu,p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] it=4C0 - matching_step done 2
[GraphBuilderPatternOptimization-HKI.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.004 | max_time=SameChildrenPattern:0.003
[GraphBuilderPatternOptimization-HKI.optimize] it=4C0F1 - apply_step with 2 matches
[GraphBuilderPatternOptimization-HKI.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'], outputs: ['linear']
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=Transpose
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
- Gemm: ['x', 'p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
+ Transpose: ['p_layers_0_weight::T10'] -> ['GemmTransposePattern--p_layers_0_weight::T10']
+ Gemm: ['x', 'GemmTransposePattern--p_layers_0_weight::T10', 'layers.0.bias'] -> ['linear']
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=Transpose
[GraphBuilder-HKI.set_type] GemmTransposePattern--p_layers_0_weight::T10:1
[GraphBuilder-HKI.set_shape] GemmTransposePattern--p_layers_0_weight::T10:(32, 10)
[GraphBuilder-HKI.set_rank] GemmTransposePattern--p_layers_0_weight::T10:2
[GraphBuilder-HKI.set_type] linear:1
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
[GraphBuilderPatternOptimization-HKI.optimize] - add ['Transpose', 'Gemm']
[GraphBuilderPatternOptimization-HKI.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
[GraphBuilderPatternOptimization-HKI.optimize] apply MatchResult: GemmTransposePattern replaces ['Gemm'], inputs: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'], outputs: ['output_0']
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Transpose
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm']
- Gemm: ['relu', 'p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
+ Transpose: ['p_layers_2_weight::T10'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
+ Gemm: ['relu', 'GemmTransposePattern--p_layers_2_weight::T10', 'layers.2.bias'] -> ['output_0']
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Transpose
[GraphBuilder-HKI.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
[GraphBuilder-HKI.set_shape] GemmTransposePattern--p_layers_2_weight::T10:(1, 32)
[GraphBuilder-HKI.set_rank] GemmTransposePattern--p_layers_2_weight::T10:2
[GraphBuilder-HKI.set_type] output_0:1
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: GemmTransposePattern replaces ['Gemm'] applied.
[GraphBuilderPatternOptimization-HKI.optimize] - add ['Transpose', 'Gemm']
[GraphBuilderPatternOptimization-HKI.optimize] done MatchResult: GemmTransposePattern replaces ['Gemm']: -1 +2 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - done with 2 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -2 +4 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - remove_duplicated_shape done -2 +4 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 6.239599997570622e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - remove_identity done -2 +4 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - remove_unused done -2 +4 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=4C1F1 - next
[GraphBuilderPatternOptimization-HKI.optimize] iteration 5: 5 nodes, priority=3
[GraphBuilderPatternOptimization-HKI.optimize] it=5C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[IdentityPattern.match] NONE - line: 649:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[IdentityPattern.match] NONE - line: 649:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
[MatMulAddPattern.match] NONE - line: 130:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
[MatMulAddPattern.match] NONE - line: 127:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
[GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
[GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[ShapeBasedIdentityPattern.match] NONE - line: 880:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[ShapeBasedIdentityPattern.match] NONE - line: 880:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[SwapUnsqueezeTransposePattern.match] NONE - line: 715:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[SwapUnsqueezeTransposePattern.match] NONE - line: 715:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
[SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
[PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
[TransposeEqualReshapePattern.match] NONE - line: 493:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[MatchResult.match] MATCH TransposeEqualReshapePattern with 1 nodes and types ['Transpose'] - []
[GraphBuilderPatternOptimization-HKI.optimize] match=MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
[TransposeMatMulPattern.match] NONE - line: 1231:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
[TransposeMatMulPattern.match] NONE - line: 1231:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[TransposeReshapeTransposePattern.match] NONE - line: 245:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[TransposeReshapeTransposePattern.match] NONE - line: 245:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[TransposeTransposePattern.match] NONE - line: 99:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[TransposeTransposePattern.match] NONE - line: 99:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10
[PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] it=5C0 - matching_step done 1
[GraphBuilderPatternOptimization-HKI.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.002 | max_time=TransposeMatMulPattern:0.000
[GraphBuilderPatternOptimization-HKI.optimize] it=5C0F1 - apply_step with 1 matches
[GraphBuilderPatternOptimization-HKI.optimize] apply MatchResult: TransposeEqualReshapePattern replaces ['Transpose'], inputs: ['p_layers_2_weight::T10'], outputs: ['GemmTransposePattern--p_layers_2_weight::T10']
[GraphBuilder-HKI.set_shape] init7_s2_1_32:(2,)
[GraphBuilder-HKI.set_rank] init7_s2_1_32:1
[GraphBuilder-HKI.set_type] init7_s2_1_32:7
[GraphBuilder-HKI.make_initializer] init7_s2_1_32[7:(2,)]
[GraphBuilder-HKI.update_node_constant] new constant 'init7_s2_1_32', node=None
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Reshape
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose']
- Transpose: ['p_layers_2_weight::T10'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
+ Reshape: ['p_layers_2_weight::T10', 'init7_s2_1_32'] -> ['GemmTransposePattern--p_layers_2_weight::T10']
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=Reshape
[GraphBuilder-HKI.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
[GraphBuilder-HKI.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
[GraphBuilderPatternOptimization-HKI.apply_match] MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] applied.
[GraphBuilderPatternOptimization-HKI.optimize] - add ['Reshape']
[GraphBuilderPatternOptimization-HKI.optimize] done MatchResult: TransposeEqualReshapePattern replaces ['Transpose']: -1 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - done with 1 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -1 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - remove_duplicated_shape done -1 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 6.321200010006578e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - remove_identity done -1 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - remove_unused done -1 +1 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=5C1F1 - next
[GraphBuilderPatternOptimization-HKI.optimize] iteration 6: 5 nodes, priority=3
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0 - matching_step
[PatternOptimization.enumerate_matches] start BatchNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start BatchNormalizationTrainingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastLayerNormalizationCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start CastOpCastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ClipClipPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatEmptyPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConcatReshapePattern with main_opset=18 and min_opset=1
[ConcatReshapePattern.match] NONE - line: 1079:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start ConcatTwiceUnaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConstantToInitializerPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ConvBiasNullPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start PadConvPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start DropoutPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ExpandUnsqueezeExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GathersSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start GeluPattern with main_opset=18 and min_opset=20
[PatternOptimization.enumerate_matches] start IdentityPattern with main_opset=18 and min_opset=1
[IdentityPattern.match] NONE - line: 649:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[PatternOptimization.enumerate_matches] start LayerNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LayerNormalizationScalePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start LeakyReluPattern with main_opset=18 and min_opset=6
[PatternOptimization.enumerate_matches] start MaxReluPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMulScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulUnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotNotPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start NotWherePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceArgTopKPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReduceSumNormalizePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapePattern with main_opset=18 and min_opset=1
[ReshapePattern.match] NONE - line: 42:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start ReshapeMatMulReshapePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start Reshape2Of3Pattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeReshapeBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MatMulAddPattern with main_opset=18 and min_opset=1
[MatMulAddPattern.match] NONE - line: 130:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
[MatMulAddPattern.match] NONE - line: 127:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start GemmTransposePattern with main_opset=18 and min_opset=1
[GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
[GemmTransposePattern.match] NONE - line: 405:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start MatMulReshape2Of3Pattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start MulMulMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedReshapeIsSqueezePattern with main_opset=18 and min_opset=1
[ShapeBasedReshapeIsSqueezePattern.match] NONE - line: 1689:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start ShapeBasedStaticExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedConcatExpandPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedEditDistanceReshapePattern with main_opset=18 and min_opset=1
[ShapeBasedEditDistanceReshapePattern.match] NONE - line: 1538:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start ShapeBasedIdentityPattern with main_opset=18 and min_opset=1
[ShapeBasedIdentityPattern.match] NONE - line: 880:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandBroadcastMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandCastWhereSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedExpandSwapPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedMatMulToMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapedBasedReshapePattern with main_opset=18 and min_opset=1
[ShapedBasedReshapePattern.match] NONE - line: 121:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start ShapeBasedSameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ShapeBasedShapeShapeAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start ReshapeReshapePattern with main_opset=18 and min_opset=1
[ReshapeReshapePattern.match] NONE - line: 352:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start RotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SameChildrenFromInputPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SequenceConstructAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SplitToSequenceSequenceAtPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SliceSlicePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SlicesSplitPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SoftmaxCrossEntropyLossCastPattern with main_opset=18 and min_opset=14
[PatternOptimization.enumerate_matches] start SplitConcatPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeBinaryUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start StaticConcatReshapePattern with main_opset=18 and min_opset=1
[StaticConcatReshapePattern.match] NONE - line: 1256:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start Sub1MulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapExpandReshapePattern with main_opset=18 and min_opset=1
[SwapExpandReshapePattern.match] NONE - line: 1724:yobx.xoptim.patterns.onnx_expand, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start SwapExpandUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapRangeAddScalarPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwapUnaryPattern with main_opset=18 and min_opset=1
[SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[SwapUnaryPattern.match] NONE - line: 983:yobx.xoptim.patterns.onnx_any, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start SwapUnsqueezeTransposePattern with main_opset=18 and min_opset=1
[SwapUnsqueezeTransposePattern.match] NONE - line: 715:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[PatternOptimization.enumerate_matches] start SwitchOrderBinaryPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start SwitchReshapeActivationPattern with main_opset=18 and min_opset=1
[SwitchReshapeActivationPattern.match] NONE - line: 1601:yobx.xoptim.patterns.onnx_matmul, op_type=Relu, name=, inputs=linear
[PatternOptimization.enumerate_matches] start TransposeEqualReshapePattern with main_opset=18 and min_opset=1
[TransposeEqualReshapePattern.match] NONE - line: 493:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[PatternOptimization.enumerate_matches] start TransposeGatherPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeMatMulPattern with main_opset=18 and min_opset=1
[TransposeMatMulPattern.match] NONE - line: 1231:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--2, inputs=x,GemmTransposePattern--p_layers_0_weight::T10,layers.0.bias
[TransposeMatMulPattern.match] NONE - line: 1193:yobx.xoptim.patterns.onnx_matmul, op_type=Gemm, name=GemmTransposePattern--MatMulAddPattern--23, inputs=relu,GemmTransposePattern--p_layers_2_weight::T10,layers.2.bias
[PatternOptimization.enumerate_matches] start TransposeReshapeMatMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start TransposeReshapeTransposePattern with main_opset=18 and min_opset=1
[TransposeReshapeTransposePattern.match] NONE - line: 245:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[PatternOptimization.enumerate_matches] start TransposeTransposePattern with main_opset=18 and min_opset=1
[TransposeTransposePattern.match] NONE - line: 99:yobx.xoptim.patterns.onnx_transpose, op_type=Transpose, name=GemmTransposePattern--MatMulAddPattern--, inputs=p_layers_0_weight::T10
[PatternOptimization.enumerate_matches] start UnsqueezeEqualPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start UnsqueezeOrSqueezeReshapePattern with main_opset=18 and min_opset=1
[UnsqueezeOrSqueezeReshapePattern.match] NONE - line: 1923:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start UnsqueezeReshapePattern with main_opset=18 and min_opset=1
[UnsqueezeReshapePattern.match] NONE - line: 1796:yobx.xoptim.patterns.onnx_reshape, op_type=Reshape, name=TransposeEqualReshapePattern--B--GemmTransposePattern--MatMulAddPattern--22, inputs=p_layers_2_weight::T10,init7_s2_1_32
[PatternOptimization.enumerate_matches] start UnsqueezeUnsqueezePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start WhereAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RotaryConcatPartPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionAttentionGQAPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCausalMaskMulAddPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionCosSinCachePattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start FunctionHalfRotaryEmbeddingPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start RMSNormalizationMulPattern with main_opset=18 and min_opset=1
[PatternOptimization.enumerate_matches] start AttentionGQAPattern with main_opset=18 and min_opset=1
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0 - matching_step done 0
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - apply_step with 0 matches
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - done with 0 applied patterns
[GraphBuilderPatternOptimization-HKI.optimize] done all: -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - remove_duplicated_shape
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - remove_duplicated_shape done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - remove_identity
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 6.586599999991449e-05 seconds
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - remove_identity done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - remove_unused
[GraphBuilderPatternOptimization-HKI.optimize] it=6C0F0 - remove_unused done -0 +0 nodes
[GraphBuilderPatternOptimization-HKI.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-HKI.optimize] done after 7 iterations with 5 nodes in 0.042
STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0012291799998820352
STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.0006576200000836252
STAT apply_TransposeEqualReshapePattern +1 -1 #it=1 maxmatch=0 i=1 - time=0.0009010479999460586
STAT build_graph_for_pattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0005438709998770719
STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=4.5497999963117763e-05
STAT check_pattern_A10 +0 -0 #it=3 maxmatch=0 i=0 - time=1.0028000019701722e-05
STAT check_pattern_A20 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00038403500002459623
STAT check_pattern_BD0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002653129998861914
STAT check_pattern_BI0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0003437830000621034
STAT check_pattern_BUS0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00032371599991165567
STAT insert_and_remove_nodes +0 -0 #it=0 maxmatch=0 i=0 - time=0.0011406959998794264
STAT iteration_0 +0 -0 #it=1 maxmatch=0 i=0 - time=0.011335422000001927
STAT iteration_1 +0 -0 #it=1 maxmatch=0 i=0 - time=0.004502276000039274
STAT iteration_2 +0 -0 #it=1 maxmatch=0 i=0 - time=0.0043449649999729445
STAT iteration_3 +0 -0 #it=1 maxmatch=0 i=0 - time=0.005403369000077873
STAT iteration_4 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007461611999929119
STAT iteration_5 +0 -0 #it=1 maxmatch=0 i=0 - time=0.004072901999961687
STAT match_AttentionGQAPattern +0 -0 #it=5 maxmatch=2 i=0 - time=6.093399986184522e-05
STAT match_BatchNormalizationPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00018760300019948772
STAT match_BatchNormalizationTrainingPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00011644699986845808
STAT match_CastCastBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0003632849999348764
STAT match_CastCastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00021162799987450853
STAT match_CastLayerNormalizationCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010914799986494472
STAT match_CastOpCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0002584290001550471
STAT match_CastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00010558399992532941
STAT match_ClipClipPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.925900008056487e-05
STAT match_ConcatEmptyPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.787600004074193e-05
STAT match_ConcatGatherPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00010822600006576977
STAT match_ConcatReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00018450600009600748
STAT match_ConcatTwiceUnaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010614699999678123
STAT match_ConstantToInitializerPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010193200012054149
STAT match_ConvBiasNullPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00011287300003459677
STAT match_DropoutPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00012209099986648653
STAT match_ExpandBroadcastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.702299985292484e-05
STAT match_ExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.65620000670242e-05
STAT match_ExpandSwapPattern +0 -0 #it=6 maxmatch=0 i=0 - time=7.897299997239315e-05
STAT match_ExpandUnsqueezeExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.113299995533453e-05
STAT match_FunctionAttentionGQAPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00012034700000640441
STAT match_FunctionAttentionPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00012150099996688368
STAT match_FunctionCausalMaskMulAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018914899987976241
STAT match_FunctionCausalMaskPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00012748900019232678
STAT match_FunctionCosSinCachePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013424700011910318
STAT match_FunctionHalfRotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.072299997114897e-05
STAT match_GathersSplitPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.405399989896978e-05
STAT match_GeluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=3.274699986377527e-05
STAT match_GemmTransposePattern +0 -0 #it=6 maxmatch=2 i=2 - time=0.0003180850000035207
STAT match_IdentityPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0013532210001585554
STAT match_LayerNormalizationPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00016531599999325408
STAT match_LayerNormalizationScalePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.35880002543854e-05
STAT match_LeakyReluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0030293199998823184
STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.00043326099989826616
STAT match_MatMulReshape2Of3Pattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.000263324000115972
STAT match_MaxReluPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010708800004977093
STAT match_MulMulMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018273999990015
STAT match_MulMulMulScalarPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.749700009502703e-05
STAT match_MulUnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00010485400002835377
STAT match_NotNotPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.032900007037824e-05
STAT match_NotWherePattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.011900001747563e-05
STAT match_PadConvPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.68370001146468e-05
STAT match_RMSNormalizationMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.10979998075345e-05
STAT match_RMSNormalizationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013200400007917779
STAT match_ReduceArgTopKPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.438299991870736e-05
STAT match_ReduceReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.669999985817412e-05
STAT match_ReduceSumNormalizePattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.678799997596798e-05
STAT match_Reshape2Of3Pattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00025902100003349915
STAT match_ReshapeMatMulReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0003051040000627836
STAT match_ReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.000194491999877755
STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00018244999989747157
STAT match_ReshapeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013202000013734505
STAT match_RotaryConcatPartPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00010713099982240237
STAT match_RotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.472700005768274e-05
STAT match_SameChildrenFromInputPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00020956699972884962
STAT match_SameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.002861637999899358
STAT match_SequenceConstructAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.914800011756597e-05
STAT match_ShapeBasedConcatExpandPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.455999989109841e-05
STAT match_ShapeBasedEditDistanceReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.000291691999905197
STAT match_ShapeBasedExpandBroadcastMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002063240000325095
STAT match_ShapeBasedExpandBroadcastPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002769240001043727
STAT match_ShapeBasedExpandCastWhereSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00010320600006252789
STAT match_ShapeBasedExpandSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00021911400017415872
STAT match_ShapeBasedIdentityPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013641600003211352
STAT match_ShapeBasedMatMulToMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002300610000247616
STAT match_ShapeBasedReshapeIsSqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016148399993198836
STAT match_ShapeBasedSameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019053199991958536
STAT match_ShapeBasedShapeShapeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.000266869999904884
STAT match_ShapeBasedStaticExpandPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011159700000007433
STAT match_ShapedBasedReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016658999993524048
STAT match_SliceSlicePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013132099979884515
STAT match_SlicesSplitPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.813800002371863e-05
STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0056450589997893985
STAT match_SplitConcatPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.768900008566561e-05
STAT match_SplitToSequenceSequenceAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.740699990994472e-05
STAT match_SqueezeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00033666900003481715
STAT match_SqueezeBinaryUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011543799996616144
STAT match_SqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010852399998384499
STAT match_StaticConcatReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013172399997074535
STAT match_Sub1MulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.331800006544654e-05
STAT match_SwapExpandReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013922399989496625
STAT match_SwapExpandUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013368300005822675
STAT match_SwapRangeAddScalarPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.476000004975504e-05
STAT match_SwapUnaryPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019015200007288513
STAT match_SwapUnsqueezeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00012810700013687892
STAT match_SwitchOrderBinaryPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014255499991122633
STAT match_SwitchReshapeActivationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002475159999448806
STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.00019954600008986745
STAT match_TransposeGatherPattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.950699995897594e-05
STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0004822440002953954
STAT match_TransposeReshapeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019548000000213506
STAT match_TransposeReshapeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00014657699989584216
STAT match_TransposeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013666299980741314
STAT match_UnsqueezeEqualPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.666600012929848e-05
STAT match_UnsqueezeOrSqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011844700009078224
STAT match_UnsqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011628999982349342
STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019342299992786138
STAT match_WhereAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00010244199995668168
STAT remove_duplicated_shape +0 -0 #it=7 maxmatch=0 i=0 - time=5.0262999820915866e-05
STAT remove_identity_nodes +0 -0 #it=7 maxmatch=0 i=0 - time=0.003185961000099269
STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.0021243229998617608
--MODEL: 5 nodes, 1 inputs, 1 outputs, 5 initializers--
INPUT: 1 x 1t
INPUT-SEQ: 1 x Falset
OUTPUT: 1 x 1t
OUTPUT-SEQ: 1 x Falset
INIT: 4 x 1t
INIT: 1 x 7t
NODE: 2 x Gemm
NODE: 1 x Relu
NODE: 1 x Reshape
NODE: 1 x Transpose
--MODEL: 5 nodes, 1 inputs, 1 outputs, 5 initializers--DETAILED--
INPUT: 1 x 1t[3x10]
OUTPUT: 1 x 1t[3x1]
INIT: 1 x 1t[10x32]
INIT: 1 x 1t[1]
INIT: 1 x 1t[32]
INIT: 1 x 1t[32x1]
INIT: 1 x 7t[2]
NODE: 1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
NODE: 1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
NODE: 1 x Relu -SIG- 1t[3x32]
NODE: 1 x Reshape -SIG- 1t[32x1], 7t[2]
NODE: 1 x Transpose -SIG- 1t[10x32]-perm=1;0
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 5
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 5 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 5 nodes in 6.754200001068966e-05 seconds
[GraphBuilder-HKI.constant_folding] -- starts with 7 constants and 5 nodes.
[GraphBuilder-HKI.constant_folding] cst:: . :: linear
[GraphBuilder-HKI.constant_folding] cst:: 1 :: GemmTransposePattern--p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: 1 :: GemmTransposePattern--p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: 1 :: p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: . :: _onx_matmul_x
[GraphBuilder-HKI.constant_folding] cst:: . :: _onx_matmul_relu
[GraphBuilder-HKI.constant_folding] cst:: 1 :: layers.2.bias
[GraphBuilder-HKI.constant_folding] cst:: . :: x
[GraphBuilder-HKI.constant_folding] cst:: . :: relu
[GraphBuilder-HKI.constant_folding] cst:: . :: output_0
[GraphBuilder-HKI.constant_folding] cst:: 1 :: init7_s2_1_32
[GraphBuilder-HKI.constant_folding] cst:: 1 :: p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] cst:: 1 :: layers.0.bias
[GraphBuilder-HKI.constant_folding] initializer: p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: layers.0.bias
[GraphBuilder-HKI.constant_folding] initializer: layers.2.bias
[GraphBuilder-HKI.constant_folding] from: Transpose(GemmTransposePattern--p_layers_0_weight::T10)
[GraphBuilder-HKI.set_type] GemmTransposePattern--p_layers_0_weight::T10:1
[GraphBuilder-HKI.make_initializer] GemmTransposePattern--p_layers_0_weight::T10[1:(32, 10)]
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=None
[GraphBuilder-HKI.constant_folding] fold_constant:Transpose:GemmTransposePattern--p_layers_0_weight::T10[float32:(32, 10)]:from:p_layers_0_weight::T10
[GraphBuilder-HKI.constant_folding] from: Reshape(GemmTransposePattern--p_layers_2_weight::T10)
[GraphBuilder-HKI.set_type] GemmTransposePattern--p_layers_2_weight::T10:1
[GraphBuilder-HKI.make_initializer] GemmTransposePattern--p_layers_2_weight::T10[1:(1, 32)]
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=None
[GraphBuilder-HKI.constant_folding] fold_constant:Reshape:GemmTransposePattern--p_layers_2_weight::T10[float32:(1, 32)]:from:init7_s2_1_32,p_layers_2_weight::T10
[GraphBuilder-HKI.constant_folding] initializer: init7_s2_1_32
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_0_weight::T10', node=None
[GraphBuilder-HKI.update_node_constant] new constant 'GemmTransposePattern--p_layers_2_weight::T10', node=None
[GraphBuilder-HKI.constant_folding] ends with 7 constants and 3 nodes in 0.0006276640000351108 seconds
[GraphBuilder-HKI.remove_unused] remove_initializer 1:0/7:p_layers_0_weight::T10
[GraphBuilder-HKI.remove_unused] remove_initializer 2:1/7:p_layers_2_weight::T10
[GraphBuilder-HKI.remove_unused] remove_initializer 3:4/7:init7_s2_1_32:int64[(2,)]
[GraphBuilder-HKI.remove_identity_nodes] -- starts with 3
[GraphBuilder-HKI.remove_identity_nodes] found 0 replacements
[GraphBuilder-HKI.remove_identity_nodes] kept 3 nodes
[GraphBuilder-HKI.remove_identity_nodes] ends with 3 nodes in 3.482900001472444e-05 seconds
[OrderOptimization.optimize] ALGO-2
[OrderOptimization.shape_order] -- starts with 3 nodes, 4 initializers
[OrderOptimization.shape_order] done after in 6.21889998910774e-05s with changed=0 scale=0
[GraphBuilder-HKI.optimize] done with 3 nodes in 0.050
STAT apply_GemmTransposePattern +4 -2 #it=1 maxmatch=1 i=2 - time=0.0012291799998820352
STAT apply_MatMulAddPattern +2 -4 #it=1 maxmatch=1 i=2 - time=0.0006576200000836252
STAT apply_TransposeEqualReshapePattern +1 -1 #it=1 maxmatch=0 i=1 - time=0.0009010479999460586
STAT apply_constant_folding__Reshape +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
STAT apply_constant_folding__Transpose +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
STAT apply_constant_folding_new_inits +0 -0 #it=1 maxmatch=0 i=0 - time=0.0
STAT build_graph_for_pattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0005438709998770719
STAT check_A-dynamic_dimension_naming +0 -0 #it=0 maxmatch=0 i=0 - time=2.3690000034548575e-05
STAT check_A-opt-sub +0 -0 #it=0 maxmatch=0 i=0 - time=2.646799998728966e-05
STAT check_constant_folding-2 +0 -0 #it=0 maxmatch=0 i=0 - time=2.817600000071252e-05
STAT check_constant_folding-7 +0 -0 #it=0 maxmatch=0 i=0 - time=3.5955999919679016e-05
STAT check_order-12 +0 -0 #it=0 maxmatch=0 i=0 - time=1.722100000733917e-05
STAT check_orderA +0 -0 #it=0 maxmatch=0 i=0 - time=2.3943000087456312e-05
STAT check_orderL +0 -0 #it=0 maxmatch=0 i=0 - time=1.624100002572959e-05
STAT check_pattern_00 +0 -0 #it=1 maxmatch=0 i=0 - time=4.5497999963117763e-05
STAT check_pattern_A10 +0 -0 #it=3 maxmatch=0 i=0 - time=1.0028000019701722e-05
STAT check_pattern_A20 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00038403500002459623
STAT check_pattern_BD0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0002653129998861914
STAT check_pattern_BI0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.0003437830000621034
STAT check_pattern_BUS0 +0 -0 #it=7 maxmatch=0 i=0 - time=0.00032371599991165567
STAT check_patterns-4 +0 -0 #it=0 maxmatch=0 i=0 - time=7.055399998989742e-05
STAT check_remove_duplicated_initializer-9 +0 -0 #it=0 maxmatch=0 i=0 - time=1.8834999991668155e-05
STAT check_remove_identity-0 +0 -0 #it=0 maxmatch=0 i=0 - time=4.19909999891388e-05
STAT check_remove_identity-10 +0 -0 #it=0 maxmatch=0 i=0 - time=1.9769000004998816e-05
STAT check_remove_identity-6 +0 -0 #it=0 maxmatch=0 i=0 - time=3.764699999919685e-05
STAT check_remove_unused-1 +0 -0 #it=0 maxmatch=0 i=0 - time=3.872100000990031e-05
STAT check_remove_unused-11 +0 -0 #it=0 maxmatch=0 i=0 - time=1.8206000049758586e-05
STAT check_remove_unused-3 +0 -0 #it=0 maxmatch=0 i=0 - time=4.461499997887586e-05
STAT check_remove_unused-5 +0 -0 #it=0 maxmatch=0 i=0 - time=5.365200001961057e-05
STAT check_remove_unused-8 +0 -0 #it=0 maxmatch=0 i=0 - time=2.3382999984278285e-05
STAT constant_folding +0 -2 #it=0 maxmatch=0 i=0 - time=0.0011071710000578605
STAT dynamic_dimension_naming +0 -0 #it=0 maxmatch=0 i=0 - time=3.664799999114621e-05
STAT insert_and_remove_nodes +0 -0 #it=0 maxmatch=0 i=0 - time=0.0011406959998794264
STAT iteration_0 +0 -0 #it=1 maxmatch=0 i=0 - time=0.011335422000001927
STAT iteration_1 +0 -0 #it=1 maxmatch=0 i=0 - time=0.004502276000039274
STAT iteration_2 +0 -0 #it=1 maxmatch=0 i=0 - time=0.0043449649999729445
STAT iteration_3 +0 -0 #it=1 maxmatch=0 i=0 - time=0.005403369000077873
STAT iteration_4 +0 -0 #it=1 maxmatch=0 i=0 - time=0.007461611999929119
STAT iteration_5 +0 -0 #it=1 maxmatch=0 i=0 - time=0.004072901999961687
STAT match_AttentionGQAPattern +0 -0 #it=5 maxmatch=2 i=0 - time=6.093399986184522e-05
STAT match_BatchNormalizationPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00018760300019948772
STAT match_BatchNormalizationTrainingPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00011644699986845808
STAT match_CastCastBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0003632849999348764
STAT match_CastCastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00021162799987450853
STAT match_CastLayerNormalizationCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010914799986494472
STAT match_CastOpCastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0002584290001550471
STAT match_CastPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00010558399992532941
STAT match_ClipClipPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.925900008056487e-05
STAT match_ConcatEmptyPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.787600004074193e-05
STAT match_ConcatGatherPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00010822600006576977
STAT match_ConcatReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00018450600009600748
STAT match_ConcatTwiceUnaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010614699999678123
STAT match_ConstantToInitializerPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010193200012054149
STAT match_ConvBiasNullPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00011287300003459677
STAT match_DropoutPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00012209099986648653
STAT match_ExpandBroadcastPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.702299985292484e-05
STAT match_ExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.65620000670242e-05
STAT match_ExpandSwapPattern +0 -0 #it=6 maxmatch=0 i=0 - time=7.897299997239315e-05
STAT match_ExpandUnsqueezeExpandPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.113299995533453e-05
STAT match_FunctionAttentionGQAPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00012034700000640441
STAT match_FunctionAttentionPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00012150099996688368
STAT match_FunctionCausalMaskMulAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018914899987976241
STAT match_FunctionCausalMaskPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00012748900019232678
STAT match_FunctionCosSinCachePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013424700011910318
STAT match_FunctionHalfRotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.072299997114897e-05
STAT match_GathersSplitPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.405399989896978e-05
STAT match_GeluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=3.274699986377527e-05
STAT match_GemmTransposePattern +0 -0 #it=6 maxmatch=2 i=2 - time=0.0003180850000035207
STAT match_IdentityPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0013532210001585554
STAT match_LayerNormalizationPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00016531599999325408
STAT match_LayerNormalizationScalePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.35880002543854e-05
STAT match_LeakyReluPattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.0030293199998823184
STAT match_MatMulAddPattern +0 -0 #it=4 maxmatch=2 i=2 - time=0.00043326099989826616
STAT match_MatMulReshape2Of3Pattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.000263324000115972
STAT match_MaxReluPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00010708800004977093
STAT match_MulMulMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00018273999990015
STAT match_MulMulMulScalarPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.749700009502703e-05
STAT match_MulUnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.00010485400002835377
STAT match_NotNotPattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.032900007037824e-05
STAT match_NotWherePattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.011900001747563e-05
STAT match_PadConvPattern +0 -0 #it=7 maxmatch=0 i=0 - time=9.68370001146468e-05
STAT match_RMSNormalizationMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=7.10979998075345e-05
STAT match_RMSNormalizationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013200400007917779
STAT match_ReduceArgTopKPattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.438299991870736e-05
STAT match_ReduceReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=9.669999985817412e-05
STAT match_ReduceSumNormalizePattern +0 -0 #it=6 maxmatch=0 i=0 - time=8.678799997596798e-05
STAT match_Reshape2Of3Pattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00025902100003349915
STAT match_ReshapeMatMulReshapePattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.0003051040000627836
STAT match_ReshapePattern +0 -0 #it=7 maxmatch=0 i=0 - time=0.000194491999877755
STAT match_ReshapeReshapeBinaryPattern +0 -0 #it=6 maxmatch=0 i=0 - time=0.00018244999989747157
STAT match_ReshapeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013202000013734505
STAT match_RotaryConcatPartPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00010713099982240237
STAT match_RotaryEmbeddingPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.472700005768274e-05
STAT match_SameChildrenFromInputPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00020956699972884962
STAT match_SameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.002861637999899358
STAT match_SequenceConstructAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.914800011756597e-05
STAT match_ShapeBasedConcatExpandPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.455999989109841e-05
STAT match_ShapeBasedEditDistanceReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.000291691999905197
STAT match_ShapeBasedExpandBroadcastMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002063240000325095
STAT match_ShapeBasedExpandBroadcastPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002769240001043727
STAT match_ShapeBasedExpandCastWhereSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00010320600006252789
STAT match_ShapeBasedExpandSwapPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00021911400017415872
STAT match_ShapeBasedIdentityPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013641600003211352
STAT match_ShapeBasedMatMulToMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002300610000247616
STAT match_ShapeBasedReshapeIsSqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016148399993198836
STAT match_ShapeBasedSameChildrenPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019053199991958536
STAT match_ShapeBasedShapeShapeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.000266869999904884
STAT match_ShapeBasedStaticExpandPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011159700000007433
STAT match_ShapedBasedReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00016658999993524048
STAT match_SliceSlicePattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00013132099979884515
STAT match_SlicesSplitPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.813800002371863e-05
STAT match_SoftmaxCrossEntropyLossCastPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.0056450589997893985
STAT match_SplitConcatPattern +0 -0 #it=6 maxmatch=2 i=0 - time=9.768900008566561e-05
STAT match_SplitToSequenceSequenceAtPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.740699990994472e-05
STAT match_SqueezeAddPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00033666900003481715
STAT match_SqueezeBinaryUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011543799996616144
STAT match_SqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00010852399998384499
STAT match_StaticConcatReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013172399997074535
STAT match_Sub1MulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.331800006544654e-05
STAT match_SwapExpandReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013922399989496625
STAT match_SwapExpandUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013368300005822675
STAT match_SwapRangeAddScalarPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.476000004975504e-05
STAT match_SwapUnaryPattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019015200007288513
STAT match_SwapUnsqueezeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00012810700013687892
STAT match_SwitchOrderBinaryPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00014255499991122633
STAT match_SwitchReshapeActivationPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0002475159999448806
STAT match_TransposeEqualReshapePattern +0 -0 #it=6 maxmatch=2 i=1 - time=0.00019954600008986745
STAT match_TransposeGatherPattern +0 -0 #it=7 maxmatch=2 i=0 - time=9.950699995897594e-05
STAT match_TransposeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.0004822440002953954
STAT match_TransposeReshapeMatMulPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00019548000000213506
STAT match_TransposeReshapeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00014657699989584216
STAT match_TransposeTransposePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00013666299980741314
STAT match_UnsqueezeEqualPattern +0 -0 #it=6 maxmatch=2 i=0 - time=8.666600012929848e-05
STAT match_UnsqueezeOrSqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011844700009078224
STAT match_UnsqueezeReshapePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00011628999982349342
STAT match_UnsqueezeUnsqueezePattern +0 -0 #it=7 maxmatch=2 i=0 - time=0.00019342299992786138
STAT match_WhereAddPattern +0 -0 #it=6 maxmatch=2 i=0 - time=0.00010244199995668168
STAT order +0 -0 #it=0 maxmatch=0 i=0 - time=0.00010825200001818303
STAT patterns +0 -0 #it=0 maxmatch=0 i=0 - time=0.045605396000041765
STAT remove_duplicated_initializer +0 -0 #it=0 maxmatch=0 i=0 - time=7.781099998283025e-05
STAT remove_duplicated_shape +0 -0 #it=7 maxmatch=0 i=0 - time=5.0262999820915866e-05
STAT remove_identity +0 -0 #it=0 maxmatch=0 i=0 - time=0.0007607559999769364
STAT remove_identity_nodes +0 -0 #it=7 maxmatch=0 i=0 - time=0.003185961000099269
STAT remove_unused +0 -0 #it=7 maxmatch=0 i=0 - time=0.0032281339998689873
STAT shape_order +0 -0 #it=0 maxmatch=0 i=0 - time=6.781199999750243e-05
--MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--
INPUT: 1 x 1t
INPUT-SEQ: 1 x Falset
OUTPUT: 1 x 1t
OUTPUT-SEQ: 1 x Falset
INIT: 4 x 1t
NODE: 2 x Gemm
NODE: 1 x Relu
--MODEL: 3 nodes, 1 inputs, 1 outputs, 4 initializers--DETAILED--
INPUT: 1 x 1t[3x10]
OUTPUT: 1 x 1t[3x1]
INIT: 1 x 1t[1]
INIT: 1 x 1t[1x32]
INIT: 1 x 1t[32]
INIT: 1 x 1t[32x10]
NODE: 1 x Gemm -SIG- 1t[3x10], 1t[32x10], 1t[32]
NODE: 1 x Gemm -SIG- 1t[3x32], 1t[1x32], 1t[1]
NODE: 1 x Relu -SIG- 1t[3x32]
[GraphBuilder-HKI.to_onnx] make_model 4 inits 0 params
[GraphBuilder-HKI.time_evaluation_constants_] 0
[GraphBuilder-HKI._build_initializers] start with 4 initializers, large_model=False, external_threshold=1024
[GraphBuilder-HKI._build_initializers] switch low/high order
[GraphBuilder-HKI._build_initializers] TensorProto-layers.0.bias:1[(32,)]
[GraphBuilder-HKI._build_initializers] TensorProto-layers.2.bias:1[(1,)]
[GraphBuilder-HKI._build_initializers] <ndarray>-GemmTransposePattern--p_layers_0_weight::T10:float32[(32, 10)]
[GraphBuilder-HKI._build_initializers] <ndarray>-GemmTransposePattern--p_layers_2_weight::T10:float32[(1, 32)]
[GraphBuilder-HKI._build_initializers] done in 2.8069999871149776e-06s with 4 initializers, 0 large initializers
[GraphBuilder-HKI._add_shape_information] dynamic shapes replacements={}
Select the pattern to use
Class OptimizationOptions is used to enable or disable patterns.
In the following example, only two patterns are enabled, given as a comma-separated list of names.
<<<
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
gr = GraphBuilder(
onx,
infer_shapes_options=True,
optimization_options=OptimizationOptions(
patterns="TransposeTranspose,TransposeMatMul", verbose=1
),
)
opt_onx = gr.to_onnx(optimize=True)
>>>
[GraphBuilder-NSY.optimize] start with 5 nodes
[GraphBuilder-NSY.optimize] #patterns=2
[GraphBuilderPatternOptimization-NSY.optimize] start with 5 nodes, 4 initializers, 2 patterns, priorities=[0, 1], max_iter=20
[GraphBuilderPatternOptimization-NSY.optimize] iteration 0: 5 nodes, priority=0
[GraphBuilderPatternOptimization-NSY.optimize] increase priority to 1
[GraphBuilderPatternOptimization-NSY.optimize] iteration 1: 5 nodes, priority=1
[GraphBuilderPatternOptimization-NSY.optimize] stops current_priority_index=2, priorities=[0, 1]
[GraphBuilderPatternOptimization-NSY.optimize] done after 2 iterations with 5 nodes in 0.001
[OrderOptimization.optimize] ALGO-2
[OrderOptimization.shape_order] -- starts with 5 nodes, 4 initializers
[OrderOptimization.shape_order] done after in 6.880000000819564e-05s with changed=0 scale=0
[GraphBuilder-NSY.optimize] done with 5 nodes in 0.003
There exist some predefined lists of patterns:
default: includes all patterns that use only standard ONNX operators.
onnxruntime: patterns specific to onnxruntime. The optimized model may only run with onnxruntime because these patterns may introduce operators listed in onnxruntime's Supported Operators and Data Types.
Lists can be combined with +, as in the following example.
<<<
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
gr = GraphBuilder(
onx,
infer_shapes_options=True,
optimization_options=OptimizationOptions(patterns="default+onnxruntime", verbose=1),
)
opt_onx = gr.to_onnx(optimize=True)
>>>
[GraphBuilder-GLC.optimize] start with 5 nodes
[GraphBuilder-GLC.optimize] #patterns=138
[GraphBuilderPatternOptimization-GLC.optimize] start with 5 nodes, 4 initializers, 138 patterns, priorities=[0, 1, 2, 3], max_iter=40
[GraphBuilderPatternOptimization-GLC.optimize] same children={'SameChildrenPattern', 'SameChildrenFromInputPattern'}
[GraphBuilderPatternOptimization-GLC.optimize] iteration 0: 5 nodes, priority=0
[GraphBuilderPatternOptimization-GLC.optimize] increase priority to 1
[GraphBuilderPatternOptimization-GLC.optimize] iteration 1: 5 nodes, priority=1
[GraphBuilderPatternOptimization-GLC.optimize] increase priority to 2
[GraphBuilderPatternOptimization-GLC.optimize] iteration 2: 5 nodes, priority=2
[GraphBuilderPatternOptimization-GLC.optimize] increase priority to 3
[GraphBuilderPatternOptimization-GLC.optimize] iteration 3: 5 nodes, priority=3
[GraphBuilderPatternOptimization-GLC.optimize] applies 2 matches, 2*MatMulAddPattern - time=0.002 | max_time=IdentityPattern:0.000
[GraphBuilderPatternOptimization-GLC.optimize] iteration 4: 3 nodes, priority=3
[GraphBuilderPatternOptimization-GLC.optimize] applies 2 matches, 2*GemmTransposePattern - time=0.001 | max_time=ShapeBasedConcatExpandPattern:0.000
[GraphBuilderPatternOptimization-GLC.optimize] iteration 5: 5 nodes, priority=3
[GraphBuilderPatternOptimization-GLC.optimize] applies 1 matches, [0]=MatchResult: TransposeEqualReshapePattern replaces ['Transpose'] - time=0.002 | max_time=TransposeMatMulPattern:0.000
[GraphBuilderPatternOptimization-GLC.optimize] iteration 6: 5 nodes, priority=3
[GraphBuilderPatternOptimization-GLC.optimize] stops current_priority_index=4, priorities=[0, 1, 2, 3]
[GraphBuilderPatternOptimization-GLC.optimize] done after 7 iterations with 5 nodes in 0.027
[OrderOptimization.optimize] ALGO-2
[OrderOptimization.shape_order] -- starts with 3 nodes, 4 initializers
[OrderOptimization.shape_order] done after in 8.861900005285861e-05s with changed=0 scale=0
[GraphBuilder-GLC.optimize] done with 3 nodes in 0.031
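The two examples above pass either individual pattern names ("TransposeTranspose,TransposeMatMul") or predefined lists joined with + ("default+onnxruntime"). The following sketch shows one way such a string could be expanded into a list of pattern class names. It is hypothetical and is not how OptimizationOptions is implemented. The registry PREDEFINED, its contents (including the placeholder OrtOnlyPatternA), and the rule that the Pattern suffix may be omitted are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical registry: the real lists are much longer (138 patterns above).
PREDEFINED: Dict[str, List[str]] = {
    "default": ["TransposeTransposePattern", "TransposeMatMulPattern", "MatMulAddPattern"],
    "onnxruntime": ["OrtOnlyPatternA"],  # placeholder name, not a real pattern
}


def resolve_patterns(spec: str) -> List[str]:
    """Expands a pattern string into an ordered list of unique class names.

    Assumed grammar: '+' joins groups, ',' separates tokens inside a group,
    a token is either a predefined list name or a pattern name whose
    'Pattern' suffix may be omitted.
    """
    resolved: List[str] = []
    for group in spec.split("+"):
        for token in group.split(","):
            token = token.strip()
            if not token:
                continue
            if token in PREDEFINED:
                candidates = PREDEFINED[token]
            else:
                candidates = [token if token.endswith("Pattern") else token + "Pattern"]
            for name in candidates:
                if name not in resolved:  # keep the first occurrence only
                    resolved.append(name)
    return resolved


print(resolve_patterns("TransposeTranspose,TransposeMatMul"))
print(resolve_patterns("default+onnxruntime"))
```

Keeping the first occurrence preserves the order given by the user, which matters if the optimizer applies patterns in list order.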
Statistics
Method optimize returns statistics showing when each pattern is applied and how long it takes.
<<<
import pandas
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
gr = GraphBuilder(
onx,
infer_shapes_options=True,
optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()
print(pandas.DataFrame(stat))
>>>
pattern removed added time_in value iteration instances match_index n_nodes exit_point changed scale algo
0 dynamic_dimension_naming 0.0 0.0 0.000035 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 check_A-dynamic_dimension_naming NaN NaN 0.000036 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 check_A-opt-sub NaN NaN 0.000022 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 remove_identity 0.0 0.0 0.000195 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 check_remove_identity-0 NaN NaN 0.000025 NaN NaN NaN NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
738 check_orderL NaN NaN 0.000014 NaN NaN NaN NaN NaN NaN NaN NaN NaN
739 shape_order NaN NaN 0.000055 NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN
740 order NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2
741 check_order-12 NaN NaN 0.000014 NaN NaN NaN NaN NaN NaN NaN NaN NaN
742 optimization 2.0 0.0 0.034161 NaN NaN NaN NaN NaN NaN NaN NaN NaN
[743 rows x 13 columns]
The statistics can be aggregated by pattern:
<<<
import pandas
import onnx
from yobx.xbuilder import GraphBuilder, OptimizationOptions
from yobx.doc import demo_mlp_model
onx = demo_mlp_model("temp_doc_mlp.onnx")
gr = GraphBuilder(
onx,
infer_shapes_options=True,
optimization_options=OptimizationOptions(patterns="default"),
)
stat = gr.optimize()
df = pandas.DataFrame(stat)
for c in df.columns:
    # counters become integers, missing values are replaced by 0
    if "time" not in c and "pattern" not in c and "exit_point" not in c:
        df[c] = df[c].fillna(0).astype(int)
aggs = {
"time_in": "sum",
"added": "sum",
"removed": "sum",
"iteration": "max",
"match_index": "max",
"instances": "sum",
}
print(df.groupby("pattern").agg(aggs))
>>>
time_in added removed iteration match_index instances
pattern
apply_GemmTransposePattern 0.001037 4 2 4 1 2
apply_MatMulAddPattern 0.000743 2 4 3 1 2
apply_TransposeEqualReshapePattern 0.000684 1 1 5 0 1
apply_constant_folding__Reshape 0.000000 0 0 0 0 0
apply_constant_folding__Transpose 0.000000 0 0 0 0 0
... ... ... ... ... ... ...
remove_duplicated_shape 0.000039 0 0 6 0 0
remove_identity 0.000658 0 0 0 0 0
remove_identity_nodes 0.002072 0 0 6 0 0
remove_unused 0.003203 0 0 6 0 0
shape_order 0.000088 0 0 0 0 0
[146 rows x 6 columns]
Matching Algorithm#
EasyPatternOptimization
implements a bidirectional subgraph-matching algorithm that avoids a full
enumeration of all possible node assignments. Rather than writing a custom
match method, the user only has to declare the subgraph to look for
(match_pattern) and the replacement (apply_pattern) using the same
builder API that is used to build ONNX graphs.
Pattern definition#
Both match_pattern and apply_pattern are written as regular Python
functions that call g.op.<OpType>(...) to create nodes.
Each positional argument becomes a symbolic input to the subgraph.
The function returns the name(s) of the symbolic output(s).
from yobx.xoptim import EasyPatternOptimization
class TransposeTransposePattern(EasyPatternOptimization):
    def match_pattern(self, g: "GraphBuilder", x):
        t1 = g.op.Transpose(x)
        return g.op.Transpose(t1)
    def apply_pattern(self, g: "GraphBuilder", x):
        return x  # two transposes cancel each other
At build time the framework converts each function into a small
GraphBuilderPatternOptimization that stores the nodes in
topological order. The last node of the match pattern is used as the
anchor: the matching loop only fires when a graph node has the same
op_type as that anchor.
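The benefit of the anchor can be illustrated with a small sketch that indexes the graph nodes by op_type, so that only nodes sharing the anchor's type are ever tried as starting points. The tuple-based node representation and the function candidate_anchors are hypothetical and only serve the illustration; they are not part of yobx.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical node representation: (name, op_type, inputs, outputs).
Node = Tuple[str, str, List[str], List[str]]


def candidate_anchors(graph_nodes: List[Node], pattern_nodes: List[Node]) -> List[Node]:
    """Returns the graph nodes that could start a match of the pattern.

    ``pattern_nodes`` is assumed to be in topological order,
    so its last node is the anchor.
    """
    by_type: Dict[str, List[Node]] = defaultdict(list)
    for node in graph_nodes:
        by_type[node[1]].append(node)
    anchor_type = pattern_nodes[-1][1]
    # Nodes of any other op_type are skipped without further inspection.
    return by_type.get(anchor_type, [])


pattern = [("p0", "Transpose", ["x"], ["t1"]), ("p1", "Transpose", ["t1"], ["y"])]
graph = [
    ("n0", "Relu", ["X"], ["a"]),
    ("n1", "Transpose", ["a"], ["b"]),
    ("n2", "Transpose", ["b"], ["c"]),
    ("n3", "Add", ["c", "X"], ["Y"]),
]
print([node[0] for node in candidate_anchors(graph, pattern)])  # ['n1', 'n2']
```

Only n1 and n2 reach the matching step. n1 is then rejected because its input is produced by a Relu rather than by a Transpose.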
Bidirectional matching#
Given a candidate graph node with the same type as the anchor, the algorithm expands the match iteratively with a stack-based approach:
marked = {anchor_pattern_key: (graph_node, anchor_pattern_node)}
stacked = [anchor_pattern_key]
while stacked:
    graph_node, pattern_node = marked[stacked.pop()]
    # --- backward pass ---
    # Walk up the predecessors of pattern_node.
    # For each predecessor in the pattern, find the corresponding
    # predecessor in the graph. Fail if types or arities differ.
    backward_match(graph_node, pattern_node)
    # --- forward pass ---
    # Walk down the successors of pattern_node.
    # For each successor in the pattern, find the corresponding
    # successor in the graph. Fail if types or arities differ.
    forward_match(graph_node, pattern_node)
    # New matched pairs are added to marked and their keys pushed onto stacked.
The two sub-routines are implemented in _match_backward and _match_forward.
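To make the loop above concrete, here is a small self-contained toy version of the stack-based expansion. The Node tuple, _index and match are hypothetical names chosen for the illustration; this is not the code of _match_backward or _match_forward. It pairs pattern nodes with graph nodes, follows producers backward and consumers forward, and, to stay short, simply gives up whenever a forward step has more than one candidate successor.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    name: str
    op_type: str
    inputs: List[str]
    outputs: List[str]


def _index(nodes: List[Node]) -> Tuple[Dict[str, Node], Dict[str, List[Node]]]:
    """Maps each result to its producer and to its consumers."""
    producer: Dict[str, Node] = {}
    consumers: Dict[str, List[Node]] = {}
    for node in nodes:
        for out in node.outputs:
            producer[out] = node
        for inp in node.inputs:
            consumers.setdefault(inp, []).append(node)
    return producer, consumers


def match(graph: List[Node], pattern: List[Node], anchor: Node) -> Optional[Dict[str, str]]:
    """Returns {pattern node name: graph node name} or None."""
    g_prod, g_cons = _index(graph)
    p_prod, p_cons = _index(pattern)
    by_name = {node.name: node for node in pattern}
    p_anchor = pattern[-1]  # last node in topological order
    if anchor.op_type != p_anchor.op_type:
        return None
    marked: Dict[str, Node] = {p_anchor.name: anchor}
    stacked = [p_anchor.name]

    def pair(p_node: Node, g_node: Node) -> bool:
        # A pattern node that is already paired must stay with the same graph node.
        if p_node.name in marked:
            return marked[p_node.name] == g_node
        marked[p_node.name] = g_node
        stacked.append(p_node.name)
        return True

    while stacked:
        p_node = by_name[stacked.pop()]
        g_node = marked[p_node.name]
        if (len(p_node.inputs), len(p_node.outputs)) != (len(g_node.inputs), len(g_node.outputs)):
            return None  # arities differ
        # Backward pass: the producers must match too.
        for p_in, g_in in zip(p_node.inputs, g_node.inputs):
            p_pred = p_prod.get(p_in)
            if p_pred is None:
                continue  # symbolic input of the pattern: anything matches
            g_pred = g_prod.get(g_in)
            if g_pred is None or g_pred.op_type != p_pred.op_type or not pair(p_pred, g_pred):
                return None
        # Forward pass: the consumers inside the pattern must match too.
        for p_out, g_out in zip(p_node.outputs, g_node.outputs):
            for p_succ in p_cons.get(p_out, []):
                cands = [n for n in g_cons.get(g_out, []) if n.op_type == p_succ.op_type]
                if p_succ.name in marked:
                    if marked[p_succ.name] not in cands:
                        return None
                elif len(cands) != 1 or not pair(p_succ, cands[0]):
                    return None  # no candidate, or too many for this toy version
    # Every pattern node must be paired with a distinct graph node.
    if len(marked) != len(pattern) or len({n.name for n in marked.values()}) != len(marked):
        return None
    return {p_name: g_node.name for p_name, g_node in marked.items()}


pattern = [Node("p0", "Transpose", ["x"], ["t1"]), Node("p1", "Transpose", ["t1"], ["y"])]
graph = [
    Node("n0", "Relu", ["X"], ["a"]),
    Node("n1", "Transpose", ["a"], ["b"]),
    Node("n2", "Transpose", ["b"], ["c"]),
]
print(match(graph, pattern, graph[2]))  # {'p1': 'n2', 'p0': 'n1'}
print(match(graph, pattern, graph[1]))  # None
```

Starting from n2, the backward pass reaches n1 and the forward pass from n1 confirms n2 again. Starting from n1, the backward pass finds a Relu where the pattern expects a Transpose and the match fails.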
Ambiguity detection#
A dictionary pair_results_names maps every pattern result name to the
graph result name it has been paired with. Before recording a new pair the
algorithm checks that neither name already points to a different name
(ambiguity). An ambiguity means the same pattern result would have to
correspond to two different graph results simultaneously, which would be
inconsistent; the match is rejected in that case.
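A minimal sketch of such a check, assuming "neither name" means the pairing is verified in both directions (pattern to graph and graph to pattern). The function record_pair is a hypothetical helper written for this illustration.

```python
from typing import Dict


def record_pair(pair_results_names: Dict[str, str], pattern_name: str, graph_name: str) -> bool:
    """Records pattern_name -> graph_name, returns False on an ambiguity."""
    known = pair_results_names.get(pattern_name)
    if known is not None:
        # The pattern result is already paired: it must be with the same graph result.
        return known == graph_name
    if graph_name in pair_results_names.values():
        # The graph result is already paired with another pattern result.
        return False
    pair_results_names[pattern_name] = graph_name
    return True


names: Dict[str, str] = {}
print(record_pair(names, "t1", "b"))  # True
print(record_pair(names, "t1", "b"))  # True, same pair seen twice
print(record_pair(names, "t1", "c"))  # False, t1 cannot be both b and c
print(record_pair(names, "x", "b"))   # False, b is already paired with t1
```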
Validation#
After all pattern nodes have been matched the algorithm performs two additional checks:
validate_attribute_mapping – verifies that the attributes of the matched graph nodes are consistent with those declared in the pattern (e.g. the same axis value).
validate_mapping – an optional hook for subclasses to add arbitrary semantic checks (e.g. verify that a constant operand has a specific numerical value).
Only when both validations succeed does the method return a
MatchResult that schedules the matched
nodes for replacement.
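The order of the two checks can be pictured as a chain of predicates. The sketch below assumes attributes are stored as plain dictionaries; validate and extra_check are hypothetical names and do not reproduce the signatures of validate_attribute_mapping or validate_mapping. Note that this toy version treats an attribute left to its default value on the graph node as a mismatch.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a node is (op_type, attributes).
NodeAttrs = Tuple[str, Dict[str, Any]]
Pairs = List[Tuple[NodeAttrs, NodeAttrs]]  # [(pattern_node, graph_node), ...]


def validate(pairs: Pairs, extra_check: Optional[Callable[[Pairs], bool]] = None) -> bool:
    """Attribute consistency first, then an optional semantic check."""
    for (_, p_attrs), (_, g_attrs) in pairs:
        for key, value in p_attrs.items():
            # Only attributes declared in the pattern are compared.
            if g_attrs.get(key) != value:
                return False
    return extra_check is None or extra_check(pairs)


same_axis = [(("Softmax", {"axis": -1}), ("Softmax", {"axis": -1}))]
other_axis = [(("Softmax", {"axis": -1}), ("Softmax", {"axis": 1}))]
print(validate(same_axis))  # True
print(validate(other_axis))  # False
print(validate(same_axis, extra_check=lambda pairs: False))  # False
```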
Overlap prevention#
The outer loop (see Optimization Algorithm above) maintains a marked set
of all node identifiers that have already been claimed by an earlier
MatchResult. A candidate match is
discarded if any of its nodes appears in that set, so no two rewrites ever
touch the same node during the same pass.
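The rule can be sketched as a greedy, first-come-first-served filter over candidate matches. The function select_matches and the representation of a match as a list of node names are hypothetical simplifications.

```python
from typing import Iterable, List, Set


def select_matches(candidates: Iterable[List[str]]) -> List[List[str]]:
    """Keeps the candidate matches (lists of node names) that do not overlap."""
    marked: Set[str] = set()
    accepted: List[List[str]] = []
    for nodes in candidates:
        if any(name in marked for name in nodes):
            continue  # a node was already claimed by an earlier match
        marked.update(nodes)
        accepted.append(nodes)
    return accepted


# A chain of three Not nodes yields two overlapping Not + Not candidates.
print(select_matches([["not1", "not2"], ["not2", "not3"]]))  # [['not1', 'not2']]
```

The second candidate shares not2 with the first one, so it is skipped during this pass.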
Worked examples#
PatternOptimization and EasyPatternOptimization cover the same use cases at different levels of abstraction. Both examples below implement a Not + Not → Identity fusion so that the difference is easy to compare.
PatternOptimization (manual match / apply)
The developer writes the matching logic by hand, navigating the graph with the
helpers provided by
GraphBuilderPatternOptimization.
import inspect
from typing import List, Optional
from onnx import NodeProto
from yobx.xoptim import PatternOptimization, MatchResult
class NotNotPattern(PatternOptimization):
    """Fuses ``Not(Not(x))`` into ``Identity(x)``."""
    def match(
        self,
        g: "GraphBuilderPatternOptimization",
        node: NodeProto,
        matched: List[MatchResult],
    ) -> Optional[MatchResult]:
        # Only consider Not nodes.
        if node.op_type != "Not" or node.domain != "":
            return self.none()
        # Walk one step backward: the producer of node's input must also be Not.
        not_before = g.node_before(node.input[0])
        if not_before is None or not_before.op_type != "Not" or not_before.domain != "":
            return self.none(node, inspect.currentframe().f_lineno)
        # Return both nodes as the rewrite target.
        return MatchResult(self, [not_before, node], self.apply, insert_at=node)
    def apply(
        self,
        g: "GraphBuilder",
        not_before: NodeProto,
        not_after: NodeProto,
    ) -> List[NodeProto]:
        pre_nodes = []
        # Keep the first Not if its output is consumed elsewhere.
        if g.is_used_more_than_once(not_before.output[0]):
            pre_nodes.append(not_before)
        return [
            *pre_nodes,
            g.make_node(
                "Identity",
                [not_before.input[0]],
                [not_after.output[0]],
                name=f"{self.__class__.__name__}--{not_after.name}",
            ),
        ]
EasyPatternOptimization (declarative match_pattern / apply_pattern)
The developer declares the subgraph to look for and the replacement as builder calls. The framework takes care of matching and result renaming automatically.
from typing import List, Optional
from onnx import NodeProto
from yobx.xoptim import EasyPatternOptimization, MatchResult
class NotNotEasyPattern(EasyPatternOptimization):
    """Fuses ``Not(Not(x))`` into ``Identity(x)`` using the easy API."""
    def match_pattern(self, g: "GraphBuilder", x):
        t = g.op.Not(x)  # first Not
        return g.op.Not(t)  # second Not <-- anchor node
    def apply_pattern(self, g: "GraphBuilder", x):
        return g.op.Identity(x)
Key differences#
| Aspect | PatternOptimization | EasyPatternOptimization |
|---|---|---|
| Matching logic | Written by hand in match | Declared as a Python function (match_pattern) |
| Replacement logic | Written by hand in apply | Declared as a Python function (apply_pattern) |
| Flexibility | Full control: can inspect any attribute, handle optional inputs, cope with multi-output rewrites, or make graph-wide checks. | More constrained: the subgraph must have a fixed topology with no branching within the pattern. Attribute checks require overriding the validation methods (see Validation). |
| Typical use-case | Complex rewrites (e.g. Attention fusion) where the matching involves many conditional checks that are hard to express as a fixed topology. | Simple structural fusions (e.g. double-Not, LeakyRelu decomposition, Gelu decomposition) where the topology is fixed and self-describing. |
Shape inference#
The optimizers need to know the shapes to decide whether some nodes can be rewritten without producing a model that returns different results. When this information is missing, the patterns that depend on it cannot match.
This information can be built by running shape inference on the onnx models, which is what the previous examples do (infer_shapes_options=True). However, the best case is when this information comes from torch.
Function to_onnx
converts a torch model into ONNX. While doing so, it stores the shape
information coming from torch. There is no need to run shape inference
on the onnx model it generates before optimizing it.
Available Patterns and API#
All patterns are documented in Available Patterns.
When writing a pattern, walking along the graph or checking the shape
is very common. Class GraphBuilderPatternOptimization
provides the following methods.
Opsets#
Patterns must only rewrite with operators available in the opset defined in the model.
main_opset: returns the opset
Shapes, Types#
has_type: tells if a result type is known
get_type: returns a result type, fails if not known
has_shape: tells if a result shape is known
get_shape: returns a result shape, fails if not known
has_rank: tells if a result rank is known
get_rank: returns a result rank, fails if not known
try_infer_type: returns a type if it can be guessed
try_infer_shape: returns a shape if it can be guessed
has_device: tells if a result device is known
get_device: returns a result device, fails if not known
Constants#
is_constant: tells if a node is a constant (it may be a constant, an initializer or any value built on other constants)
is_constant_scalar: checks a constant is a scalar and compares its value to a number
get_computed_constant: returns the constant, computing it if it is a constant built from other constants
get_attribute: returns an attribute of a node
Graph#
next_node: returns the node consuming a result, only if there is exactly one
next_nodes: returns the nodes consuming a result
node_before: returns the node producing a result
is_output: tells if a result is an output
is_used_by_subgraph: tells if a result is used by a subgraph
is_used_more_than_once: tells if a result is used more than once
is_used_only_by: tells if a result is only used by specific nodes
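Most of these helpers answer questions about producers and consumers. The toy class below is a hypothetical re-implementation on plain tuples, only meant to show what each question means; it is not GraphBuilderPatternOptimization, ignores subgraphs, and assumes that being a graph output counts as one use.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical node representation: (name, inputs, outputs).
Node = Tuple[str, List[str], List[str]]


class ToyGraph:
    """Answers the same kind of questions as the graph helpers, on plain tuples."""

    def __init__(self, nodes: List[Node], outputs: List[str]):
        self.outputs: Set[str] = set(outputs)
        self.producer: Dict[str, Node] = {}
        self.consumers: Dict[str, List[Node]] = {}
        for node in nodes:
            for out in node[2]:
                self.producer[out] = node
            for inp in node[1]:
                self.consumers.setdefault(inp, []).append(node)

    def node_before(self, name: str) -> Optional[Node]:
        # None for graph inputs and initializers, which have no producer.
        return self.producer.get(name)

    def next_nodes(self, name: str) -> List[Node]:
        return self.consumers.get(name, [])

    def next_node(self, name: str) -> Optional[Node]:
        # Only returns a node when there is exactly one consumer.
        nodes = self.next_nodes(name)
        return nodes[0] if len(nodes) == 1 else None

    def is_output(self, name: str) -> bool:
        return name in self.outputs

    def is_used_more_than_once(self, name: str) -> bool:
        # Assumption: being a graph output counts as one use.
        return len(self.next_nodes(name)) + int(self.is_output(name)) > 1


g = ToyGraph(
    [("not1", ["x"], ["a"]), ("not2", ["a"], ["b"]), ("relu", ["b"], ["c"])],
    outputs=["b", "c"],
)
print(g.node_before("b")[0])  # not2
print(g.next_node("a")[0])  # not2
print(g.is_used_more_than_once("a"))  # False
print(g.is_used_more_than_once("b"))  # True: consumed by relu and a graph output
```

This is the kind of question NotNotPattern.apply asks before deciding whether the first Not must be kept.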
Nodes#
make_node: creates a node without adding it to the graph
make_node_check_opset: creates a node without adding it to the graph, deals with some constraints related to opset version
Debugging Optimization with Environment Variables#
Several environment variables can be set to help debug the pattern optimizer.
LOG_PATTERN_OPTIMIZE: sets the verbosity level for all patterns. Setting it to 10 produces the most detailed output. Example: LOG_PATTERN_OPTIMIZE=10 python my_script.py
PATTERN: increases the verbosity to 10 for one or more specific patterns (comma-separated class names or class names with the Pattern suffix removed). This is useful to focus on a single pattern without flooding the output with information from all the others. Example: PATTERN=ReshapeReshapePattern python my_script.py
<ClassName>: setting an environment variable whose name matches the class name of a pattern (e.g. ReshapeReshapePattern=10) sets the verbosity for that individual pattern. This is equivalent to using PATTERN but more explicit.
DROPPATTERN: comma-separated list of pattern class names to exclude from the optimizer. Useful to bisect which pattern is causing a wrong result or an unexpected error. Example: DROPPATTERN=ReshapeReshapePattern,CastPattern python my_script.py
DUMPPATTERNS: when set to a folder path, the optimizer writes the matched nodes and their replacements to that folder for every successful pattern application. Useful for inspecting what the optimizer is actually doing. Example: DUMPPATTERNS=/tmp/dump_patterns python my_script.py
PATTERNNOREMOVE: when set to a result name, the optimizer raises an exception if an optimization step removes that name from the graph. Useful to track down which pattern is eliminating a particular node or result. Example: PATTERNNOREMOVE=output_0 python my_script.py
PATTERNSTEP: when set to 1, True, or true, the optimizer runs one optimization step at a time, which can help narrow down which step introduces a problem. Example: PATTERNSTEP=1 python my_script.py
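Bisecting with DROPPATTERN usually means running the same script once per suspected pattern. The sketch below only prepares one environment per pattern; bisect_envs is a hypothetical helper, and the pattern names are the ones used in the examples above.

```python
import os
from typing import Dict, List, Mapping, Optional


def bisect_envs(patterns: List[str], base: Optional[Mapping[str, str]] = None) -> List[Dict[str, str]]:
    """Builds one environment per pattern, each one dropping that pattern."""
    base_env = dict(os.environ if base is None else base)
    envs = []
    for name in patterns:
        env = dict(base_env)
        env["DROPPATTERN"] = name  # exclude a single pattern per run
        envs.append(env)
    return envs


envs = bisect_envs(["ReshapeReshapePattern", "CastPattern"], base={"PATH": "/usr/bin"})
print([env["DROPPATTERN"] for env in envs])  # ['ReshapeReshapePattern', 'CastPattern']
```

Each dictionary can then be passed as env= to subprocess.run([sys.executable, "my_script.py"], env=env). Setting the variable in a child process rather than in the current one avoids depending on the moment the optimizer reads it.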