yobx.xoptim.patterns_ort.llm_optim
- class yobx.xoptim.patterns_ort.llm_optim.Attention3DPattern(verbose: int = 0, priority: int = 2)
Fuses nodes into Attention from com.microsoft domain. In progress.
- apply(g: GraphBuilder, mm_q: NodeProto, re_q: NodeProto, tr_q: NodeProto, mm_k: NodeProto, re_k: NodeProto, tr_k: NodeProto, mm_v: NodeProto, re_v: NodeProto, tr_v: NodeProto, attention: NodeProto, transpose: NodeProto, reshape: NodeProto) -> List[NodeProto]
Performs the rewriting and assumes it can happen. It receives the nodes returned by method match, which are impacted by the rewriting, and assumes no other pattern optimizer will modify them. Since they are passed as a list of arguments, the list built by method match can include None values. The method returns the new nodes. The optimizer considers that any node given to this method is removed from the graph and any node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
- Parameters:
nodes – nodes returned by method match; they are then removed from the graph
- Returns:
nodes to add to the graph.
- match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) -> MatchResult | None
Determines which nodes around node can be rewritten.
- Parameters:
g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.
node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched – usually unused; the list of matches already found, that is, nodes already matching a pattern
The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be impacted by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
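The match/apply contract described above can be illustrated with a small, self-contained sketch. It does not use yobx: `Node`, `Match`, `DoubleTransposePattern` and `optimize_once` are hypothetical stand-ins for NodeProto, MatchResult, a pattern class and the optimizer loop, written only to show how matched nodes are removed and returned nodes inserted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:  # hypothetical stand-in for onnx.NodeProto
    op_type: str
    inputs: List[str]
    outputs: List[str]
    attrs: dict = field(default_factory=dict)


@dataclass
class Match:  # hypothetical stand-in for MatchResult
    nodes: List[Optional[Node]]       # nodes involved in the rewriting
    apply: Callable[..., List[Node]]  # function doing the rewriting
    insert_at: Optional[Node] = None  # existing node marking the insert position


class DoubleTransposePattern:
    """Toy pattern: two consecutive Transpose nodes whose perms cancel out."""

    def match(self, graph: List[Node], node: Node) -> Optional[Match]:
        # Read-only exploration: the graph is never modified here.
        if node.op_type != "Transpose":
            return None
        producers = [n for n in graph if node.inputs[0] in n.outputs]
        if not producers or producers[0].op_type != "Transpose":
            return None
        first = producers[0]
        consumers = [n for n in graph if first.outputs[0] in n.inputs]
        if len(consumers) != 1:  # the intermediate result is used elsewhere
            return None
        p1, p2 = first.attrs["perm"], node.attrs["perm"]
        if [p1[i] for i in p2] != list(range(len(p1))):
            return None  # the composition is not the identity
        return Match([first, node], self.apply, insert_at=node)

    def apply(self, first: Node, second: Node) -> List[Node]:
        # Every received node is dropped; only the returned nodes are added.
        return [Node("Identity", [first.inputs[0]], [second.outputs[0]])]


def optimize_once(graph: List[Node], pattern) -> List[Node]:
    """Applies the first match found and returns the new node list."""
    for node in graph:
        m = pattern.match(graph, node)
        if m is None:
            continue
        removed = {id(n) for n in m.nodes if n is not None}
        new_nodes = m.apply(*m.nodes)
        anchor = m.insert_at if m.insert_at is not None else m.nodes[-1]
        pos = graph.index(anchor)
        out: List[Node] = []
        for i, n in enumerate(graph):
            if i == pos:
                out.extend(new_nodes)
            if id(n) not in removed:
                out.append(n)
        return out
    return graph


graph = [
    Node("Transpose", ["X"], ["t1"], {"perm": [0, 2, 1, 3]}),
    Node("Transpose", ["t1"], ["t2"], {"perm": [0, 2, 1, 3]}),
    Node("Relu", ["t2"], ["Y"]),
]
optimized = optimize_once(graph, DoubleTransposePattern())
```

In this toy version the optimizer, not the pattern, edits the node list, which mirrors the rule that match must leave the graph untouched.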
- class yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbedding3DPattern(verbose: int = 0, priority: int = 1, min_opset: int = 1)
Extension to yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern, turns the operator into a 3D operator including the transpose.
Model with nodes to be fused:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    icrote_m2x2(["ContribRotaryEmbeddingPattern--m2x2 FLOAT(NEWDIM_range, 2)"])
    I_position_ids(["position_ids INT64(a, e)"])
    icrote_m1x2(["ContribRotaryEmbeddingPattern--m1x2 FLOAT(NEWDIM_range, 2)"])
    I_X(["X FLOAT(a, c, 2, d)"])
    Transpose_0[["Transpose(., perm=[0, 2, 1, 3])"]]
    RotaryEmbedding_1[["com.microsoft.RotaryEmbedding(., ., ., .)"]]
    I_X -->|"FLOAT(a, c, 2, d)"| Transpose_0
    Transpose_0 -->|"FLOAT(a, 2, c, d)"| RotaryEmbedding_1
    I_position_ids -->|"INT64(a, e)"| RotaryEmbedding_1
    icrote_m1x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    icrote_m2x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    O_Y(["Y FLOAT(a, b, c, d)"])
    RotaryEmbedding_1 --> O_Y
    class icrote_m2x2,I_position_ids,icrote_m1x2,I_X,O_Y ioNode
    class Transpose_0,RotaryEmbedding_1 opNode
Outcome of the fusion:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    icrote_m2x2(["ContribRotaryEmbeddingPattern--m2x2 FLOAT(NEWDIM_range, 2)"])
    I_position_ids(["position_ids INT64(a, e)"])
    icrote_m1x2(["ContribRotaryEmbeddingPattern--m1x2 FLOAT(NEWDIM_range, 2)"])
    I_X(["X FLOAT(a, c, 2, d)"])
    Reshape_0[["Reshape(., [0, 0, -1])"]]
    RotaryEmbedding_1[["com.microsoft.RotaryEmbedding(., ., ., .)"]]
    Shape_2[["Shape(., start=3)"]]
    Concat_3[["Concat([0, 0, -1], ., axis=0)"]]
    Reshape_4[["Reshape(., .)"]]
    Transpose_5[["Transpose(., perm=[0, 2, 1, 3])"]]
    I_X -->|"FLOAT(a, c, 2, d)"| Reshape_0
    Reshape_0 -->|"FLOAT(a, c, 2*d)"| RotaryEmbedding_1
    I_position_ids -->|"INT64(a, e)"| RotaryEmbedding_1
    icrote_m1x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    icrote_m2x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    I_X -->|"FLOAT(a, c, 2, d)"| Shape_2
    Shape_2 -->|"INT64(1)"| Concat_3
    RotaryEmbedding_1 -->|"FLOAT(a, c, 2*d)"| Reshape_4
    Concat_3 -->|"INT64(4)"| Reshape_4
    Reshape_4 -->|"FLOAT(a, c, 2, d)"| Transpose_5
    O_Y(["Y FLOAT(a, b, c, d)"])
    Transpose_5 --> O_Y
    class icrote_m2x2,I_position_ids,icrote_m1x2,I_X,O_Y ioNode
    class Reshape_0,RotaryEmbedding_1,Shape_2,Concat_3,Reshape_4,Transpose_5 opNode
- apply(g: GraphBuilder, transpose: NodeProto, rotary: NodeProto) -> List[NodeProto]
Performs the rewriting and assumes it can happen. It receives the nodes returned by method match, which are impacted by the rewriting, and assumes no other pattern optimizer will modify them. Since they are passed as a list of arguments, the list built by method match can include None values. The method returns the new nodes. The optimizer considers that any node given to this method is removed from the graph and any node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
- Parameters:
nodes – nodes returned by method match; they are then removed from the graph
- Returns:
nodes to add to the graph.
- match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) -> MatchResult | None
Determines which nodes around node can be rewritten.
- Parameters:
g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.
node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched – usually unused; the list of matches already found, that is, nodes already matching a pattern
The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be impacted by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
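The shape bookkeeping in the fused graph of ContribRotaryEmbedding3DPattern can be checked with NumPy. This is only a sketch under assumptions: `onnx_reshape` is a hypothetical helper emulating ONNX Reshape (0 copies the input dimension, -1 is inferred), and RotaryEmbedding is replaced by the identity, so the check covers layouts only, not the numerical equivalence of the rotation.

```python
import numpy as np


def onnx_reshape(x, shape):
    """Hypothetical helper emulating ONNX Reshape: 0 copies the input
    dimension at the same position, -1 is inferred by NumPy."""
    target = [x.shape[i] if s == 0 else int(s) for i, s in enumerate(shape)]
    return x.reshape(target)


a, c, d = 2, 5, 4
X = np.arange(a * c * 2 * d, dtype=np.float32).reshape(a, c, 2, d)

# Reshape_0: (a, c, 2, d) -> (a, c, 2*d), the 3D input of the fused operator.
flat = onnx_reshape(X, [0, 0, -1])

# Placeholder for com.microsoft.RotaryEmbedding (identity, an assumption
# made to isolate the layout changes).
rotated = flat

# Shape_2 + Concat_3: [0, 0, -1] followed by shape(X)[3:] gives [0, 0, -1, d].
target = np.concatenate([np.array([0, 0, -1]), np.array(X.shape[3:])])

# Reshape_4 restores (a, c, 2, d), Transpose_5 moves heads to axis 1.
Y = onnx_reshape(rotated, target.tolist()).transpose(0, 2, 1, 3)
```

With the identity placeholder, `Y` equals `X` transposed with perm=[0, 2, 1, 3], which is the layout the original Transpose node fed into the 4D operator.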
- class yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern(verbose: int = 0, priority: int = 2)
Very similar to yobx.xoptim.patterns.onnx_rotary.RotaryEmbeddingPattern.
Model with nodes to be fused:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_X(["X FLOAT(a, 2, c, 2*e)"])
    I_m1(["m1 FLOAT(1, 1, c, e)"])
    I_m2(["m2 FLOAT(1, 1, c, e)"])
    Concat_0[["Concat(., ., axis=-1)"]]
    Concat_1[["Concat(., ., axis=-1)"]]
    HalfRotaryEmbedding_2[["intermediate.HalfRotaryEmbedding(., ., .)"]]
    I_m2 -->|"FLOAT(1, 1, c, e)"| Concat_0
    I_m1 -->|"FLOAT(1, 1, c, e)"| Concat_1
    I_X -->|"FLOAT(a, 2, c, 2*e)"| HalfRotaryEmbedding_2
    Concat_0 --> HalfRotaryEmbedding_2
    Concat_1 --> HalfRotaryEmbedding_2
    O_Y(["Y FLOAT(a, b, c, 2*e)"])
    HalfRotaryEmbedding_2 --> O_Y
    class I_X,I_m1,I_m2,O_Y ioNode
    class Concat_0,Concat_1,HalfRotaryEmbedding_2 opNode
Outcome of the fusion:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_X(["X FLOAT(a, 2, c, 2*e)"])
    I_m1(["m1 FLOAT(1, 1, c, e)"])
    I_m2(["m2 FLOAT(1, 1, c, e)"])
    Squeeze_0[["Squeeze(., [0, 1])"]]
    Squeeze_1[["Squeeze(., [0, 1])"]]
    Shape_2[["Shape(., end=1, start=0)"]]
    Shape_3[["Shape(., end=3, start=2)"]]
    Squeeze_4[["Squeeze(.)"]]
    Range_5[["Range(0, ., 1)"]]
    Concat_6[["Concat(., [1], axis=0)"]]
    Expand_7[["Expand(., .)"]]
    RotaryEmbedding_8[["com.microsoft.RotaryEmbedding(., ., ., .)"]]
    I_m2 -->|"FLOAT(1, 1, c, e)"| Squeeze_0
    I_m1 -->|"FLOAT(1, 1, c, e)"| Squeeze_1
    I_X -->|"FLOAT(a, 2, c, 2*e)"| Shape_2
    I_X -->|"FLOAT(a, 2, c, 2*e)"| Shape_3
    Shape_3 -->|"INT64(1)"| Squeeze_4
    Squeeze_4 -->|"INT64()"| Range_5
    Shape_2 -->|"INT64(1)"| Concat_6
    Range_5 -->|"INT64(NEWDIM_range_0)"| Expand_7
    Concat_6 -->|"INT64(2)"| Expand_7
    I_X -->|"FLOAT(a, 2, c, 2*e)"| RotaryEmbedding_8
    Expand_7 -->|"INT64(a, NEWDIM_range_0)"| RotaryEmbedding_8
    Squeeze_0 -->|"FLOAT(c, e)"| RotaryEmbedding_8
    Squeeze_1 -->|"FLOAT(c, e)"| RotaryEmbedding_8
    O_Y(["Y FLOAT(a, b, c, 2*e)"])
    RotaryEmbedding_8 --> O_Y
    class I_X,I_m1,I_m2,O_Y ioNode
    class Squeeze_0,Squeeze_1,Shape_2,Shape_3,Squeeze_4,Range_5,Concat_6,Expand_7 opNode
    class RotaryEmbedding_8 opNode
- apply(g: GraphBuilder, expand_node: NodeProto | None, concat_cos: NodeProto, concat_sin: NodeProto, split_node: NodeProto, half_node: NodeProto, concat_node: NodeProto, *prefix_nodes: Sequence[NodeProto]) -> List[NodeProto]
Performs the rewriting and assumes it can happen. It receives the nodes returned by method match, which are impacted by the rewriting, and assumes no other pattern optimizer will modify them. Since they are passed as a list of arguments, the list built by method match can include None values. The method returns the new nodes. The optimizer considers that any node given to this method is removed from the graph and any node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
- Parameters:
nodes – nodes returned by method match; they are then removed from the graph
- Returns:
nodes to add to the graph.
- match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) -> MatchResult | None
Determines which nodes around node can be rewritten.
- Parameters:
g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.
node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched – usually unused; the list of matches already found, that is, nodes already matching a pattern
The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be impacted by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
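In the fused graph of ContribRotaryEmbeddingPattern, the second input of com.microsoft.RotaryEmbedding is built from the shape of X by the Shape, Squeeze, Range, Concat and Expand nodes. The NumPy sketch below reproduces that chain on made-up sizes; the variable names are illustrative, and ONNX Expand is emulated with NumPy broadcasting, which is an assumption about equivalent semantics rather than code taken from yobx.

```python
import numpy as np

a, heads, c, e = 3, 2, 6, 4  # illustrative sizes only
X = np.zeros((a, heads, c, 2 * e), dtype=np.float32)
m1 = np.zeros((1, 1, c, e), dtype=np.float32)
m2 = np.zeros((1, 1, c, e), dtype=np.float32)

# Squeeze_0 / Squeeze_1: drop the two leading unit axes, (1, 1, c, e) -> (c, e).
cache_from_m2 = np.squeeze(m2, axis=(0, 1))
cache_from_m1 = np.squeeze(m1, axis=(0, 1))

# Shape_2: Shape(X, start=0, end=1) -> [a]
batch = np.array(X.shape[0:1], dtype=np.int64)
# Shape_3 + Squeeze_4: Shape(X, start=2, end=3) -> [c] -> scalar c
seq = int(np.squeeze(np.array(X.shape[2:3], dtype=np.int64)))
# Range_5: 0, 1, ..., c-1
positions = np.arange(0, seq, 1, dtype=np.int64)
# Concat_6: [a, 1]
expand_shape = np.concatenate([batch, np.array([1], dtype=np.int64)])
# Expand_7: bidirectional broadcast of shape (c,) against [a, 1] -> (a, c)
out_shape = np.broadcast_shapes(positions.shape, tuple(expand_shape.tolist()))
position_ids = np.broadcast_to(positions, out_shape)
```

Every row of `position_ids` holds the same range, one row per batch entry, matching the INT64(a, NEWDIM_range_0) edge in the diagram.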
- class yobx.xoptim.patterns_ort.llm_optim.GroupQueryAttention3DPattern(verbose: int = 0, priority: int = 2)
Fuses LocalAttention into GroupQueryAttention.
bias is not supported by this kernel on CUDA.
Model with nodes to be fused:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_query(["query FLOAT(batch, 8, seq_length, 32)"])
    I_past_value(["past_value FLOAT(batch, 4, past_length, 32)"])
    I_key(["key FLOAT(batch, 4, seq_length, 32)"])
    I_value(["value FLOAT(batch, 4, seq_length, 32)"])
    I_past_key(["past_key FLOAT(batch, 4, past_length, 32)"])
    I_bitwise_not(["bitwise_not BOOL(seq_length, total_length)"])
    Concat_0[["Concat(., ., axis=2)"]]
    Concat_1[["Concat(., ., axis=2)"]]
    locatt2[["intermediate.LocalAttentionGQASW_to1( ., ., ., ., [0.4204482], [1, 1, 2, 1, 1], [0, 8, -1, 32])"]]
    I_past_key -->|"FLOAT(batch, 4, past_length, 32)"| Concat_0
    I_key -->|"FLOAT(batch, 4, seq_length, 32)"| Concat_0
    I_past_value -->|"FLOAT(batch, 4, past_length, 32)"| Concat_1
    I_value -->|"FLOAT(batch, 4, seq_length, 32)"| Concat_1
    I_query -->|"FLOAT(batch, 8, seq_length, 32)"| locatt2
    Concat_0 --> locatt2
    Concat_1 --> locatt2
    I_bitwise_not -->|"BOOL(seq_length, total_length)"| locatt2
    O_output_0(["output_0 FLOAT(batch, 8, seq_length, 32)"])
    locatt2 --> O_output_0
    O_cat_1(["cat_1 FLOAT(batch, 4, past_length+seq_length, 32)"])
    Concat_1 --> O_cat_1
    O_cat(["cat FLOAT(batch, 4, past_length+seq_length, 32)"])
    Concat_0 --> O_cat
    class I_query,I_past_value,I_key,I_value,I_past_key,I_bitwise_not ioNode
    class O_output_0,O_cat_1,O_cat ioNode
    class Concat_0,Concat_1,locatt2 opNode
Outcome of the fusion:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_query(["query FLOAT(batch, 8, seq_length, 32)"])
    I_past_value(["past_value FLOAT(batch, 4, past_length, 32)"])
    I_key(["key FLOAT(batch, 4, seq_length, 32)"])
    I_value(["value FLOAT(batch, 4, seq_length, 32)"])
    I_past_key(["past_key FLOAT(batch, 4, past_length, 32)"])
    I_bitwise_not(["bitwise_not BOOL(seq_length, total_length)"])
    Where_0[["Where(., [-3.4028235e+38], [0.0])"]]
    Shape_1[["Shape(., end=1, start=0)"]]
    Unsqueeze_2[["Unsqueeze(., [0, 1])"]]
    Shape_3[["Shape(., start=-1)"]]
    Cast_4[["Cast(., to=INT32)"]]
    Sub_5[["Sub(., [1])"]]
    Expand_6[["Expand(., .)"]]
    Transpose_7[["Transpose(., perm=[0, 2, 1, 3])"]]
    Transpose_8[["Transpose(., perm=[0, 2, 1, 3])"]]
    Transpose_9[["Transpose(., perm=[0, 2, 1, 3])"]]
    Reshape_10[["Reshape(., [0, 0, -1])"]]
    Reshape_11[["Reshape(., [0, 0, -1])"]]
    Reshape_12[["Reshape(., [0, 0, -1])"]]
    gqa13[["com.microsoft.GroupQueryAttention(., ., ., ., ., ., ., , , , .)"]]
    Reshape_14[["Reshape(., [0, 0, -1, 32])"]]
    Transpose_15[["Transpose(., perm=[0, 2, 1, 3])"]]
    I_bitwise_not -->|"BOOL(seq_length, total_length)"| Where_0
    I_query -->|"FLOAT(batch, 8, seq_length, 32)"| Shape_1
    Where_0 --> Unsqueeze_2
    Where_0 --> Shape_3
    Shape_3 --> Cast_4
    Cast_4 --> Sub_5
    Sub_5 --> Expand_6
    Shape_1 --> Expand_6
    I_query -->|"FLOAT(batch, 8, seq_length, 32)"| Transpose_7
    I_key -->|"FLOAT(batch, 4, seq_length, 32)"| Transpose_8
    I_value -->|"FLOAT(batch, 4, seq_length, 32)"| Transpose_9
    Transpose_7 --> Reshape_10
    Transpose_8 --> Reshape_11
    Transpose_9 --> Reshape_12
    Reshape_10 --> gqa13
    Reshape_11 --> gqa13
    Reshape_12 --> gqa13
    I_past_key -->|"FLOAT(batch, 4, past_length, 32)"| gqa13
    I_past_value -->|"FLOAT(batch, 4, past_length, 32)"| gqa13
    Expand_6 --> gqa13
    Cast_4 --> gqa13
    Unsqueeze_2 --> gqa13
    gqa13 --> Reshape_14
    Reshape_14 --> Transpose_15
    O_output_0(["output_0 FLOAT(batch, 8, seq_length, 32)"])
    Transpose_15 --> O_output_0
    O_cat_1(["cat_1 FLOAT(batch, 4, past_length+seq_length, 32)"])
    gqa13 --> O_cat_1
    O_cat(["cat FLOAT(batch, 4, past_length+seq_length, 32)"])
    gqa13 --> O_cat
    class I_query,I_past_value,I_key,I_value,I_past_key,I_bitwise_not ioNode
    class O_output_0,O_cat_1,O_cat ioNode
    class Where_0,Shape_1,Unsqueeze_2,Shape_3,Cast_4,Sub_5,Expand_6,Transpose_7 opNode
    class Transpose_8,Transpose_9,Reshape_10,Reshape_11,Reshape_12 opNode
    class gqa13,Reshape_14,Transpose_15 opNode
- apply(g: GraphBuilder, keys_concat_node: NodeProto, values_concat_node: NodeProto, local_attention_gqa: NodeProto) -> List[NodeProto]
Performs the rewriting and assumes it can happen. It receives the nodes returned by method match, which are impacted by the rewriting, and assumes no other pattern optimizer will modify them. Since they are passed as a list of arguments, the list built by method match can include None values. The method returns the new nodes. The optimizer considers that any node given to this method is removed from the graph and any node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
- Parameters:
nodes – nodes returned by method match; they are then removed from the graph
- Returns:
nodes to add to the graph.
- match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) -> MatchResult | None
Determines which nodes around node can be rewritten.
- Parameters:
g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.
node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched – usually unused; the list of matches already found, that is, nodes already matching a pattern
The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be impacted by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
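The auxiliary inputs that the fused GroupQueryAttention3DPattern graph derives from the boolean mask and the query can be traced with NumPy. The sketch below is an illustration on made-up sizes, not yobx code. Reading the diagram's edge order, Expand_6, Cast_4 and Unsqueeze_2 appear to feed the sequence-length, total-length and attention-bias inputs of the operator; the variable names reflect that reading and are an assumption.

```python
import numpy as np

batch, seq_length, past_length = 2, 3, 4  # illustrative sizes only
total_length = past_length + seq_length
query = np.zeros((batch, 8, seq_length, 32), dtype=np.float32)

# Boolean input named bitwise_not in the diagram: True marks a masked position.
# Here a causal mask: query i (absolute position past_length + i) cannot see
# keys located after it.
bitwise_not = np.triu(
    np.ones((seq_length, total_length), dtype=bool), k=past_length + 1
)

# Where_0: True -> lowest float32 (about -3.4028235e+38), False -> 0.0
lowest = np.finfo(np.float32).min
float_mask = np.where(bitwise_not, lowest, 0.0).astype(np.float32)

# Unsqueeze_2: (seq_length, total_length) -> (1, 1, seq_length, total_length)
attention_bias = float_mask[np.newaxis, np.newaxis]

# Shape_3 + Cast_4: last dimension of the mask as INT32 -> [total_length]
total_sequence_length = np.array(float_mask.shape[-1:], dtype=np.int32)

# Sub_5 + Expand_6: total_length - 1, repeated for every batch entry
seqlens_k = np.broadcast_to(total_sequence_length - 1, (query.shape[0],))

# Transpose_7 + Reshape_10: (batch, 8, seq, 32) -> (batch, seq, 8 * 32)
query_3d = query.transpose(0, 2, 1, 3).reshape(batch, seq_length, -1)
```

The final Reshape_14 and Transpose_15 nodes undo the last step on the attention output, restoring the (batch, 8, seq_length, 32) layout.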
- class yobx.xoptim.patterns_ort.llm_optim.MultiHeadAttention3DPattern(verbose: int = 0, priority: int = 2)
Merges multiple nodes into MultiHeadAttention. It assumes pattern yobx.xoptim.patterns.onnx_attention.FunctionAttentionPattern was triggered before.
Model with nodes to be fused:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_mask(["mask BOOL(am, 1, cm, dm)"])
    I_past_values(["past_values FLOAT(pav, 8, pcv, 64)"])
    I_values(["values FLOAT(av, bv, 8, 64)"])
    I_query(["query FLOAT(aq, bq, 8, 64)"])
    I_past_keys(["past_keys FLOAT(pak, 8, pck, 64)"])
    I_keys(["keys FLOAT(ak, bk, 8, 64)"])
    Transpose_0[["Transpose(., perm=[0, 2, 1, 3])"]]
    Transpose_1[["Transpose(., perm=[0, 2, 1, 3])"]]
    Concat_2[["Concat(., ., axis=-2)"]]
    Transpose_3[["Transpose(., perm=[0, 2, 1, 3])"]]
    Concat_4[["Concat(., ., axis=-2)"]]
    LocalAttention_to1_5[["intermediate.LocalAttention_to1(., ., ., ., [0.31622776])"]]
    Transpose_6[["Transpose(., perm=[0, 2, 1, 3])"]]
    I_query -->|"FLOAT(aq, bq, 8, 64)"| Transpose_0
    I_keys -->|"FLOAT(ak, bk, 8, 64)"| Transpose_1
    I_past_keys -->|"FLOAT(pak, 8, pck, 64)"| Concat_2
    Transpose_1 --> Concat_2
    I_values -->|"FLOAT(av, bv, 8, 64)"| Transpose_3
    I_past_values -->|"FLOAT(pav, 8, pcv, 64)"| Concat_4
    Transpose_3 --> Concat_4
    Transpose_0 --> LocalAttention_to1_5
    Concat_2 --> LocalAttention_to1_5
    Concat_4 --> LocalAttention_to1_5
    I_mask -->|"BOOL(am, 1, cm, dm)"| LocalAttention_to1_5
    LocalAttention_to1_5 --> Transpose_6
    O_ct_values(["ct_values FLOAT(pav, 8, pcv+bv, 64)"])
    Concat_4 --> O_ct_values
    O_Y(["Y FLOAT(ay, by, cy, dy)"])
    Transpose_6 --> O_Y
    O_ct_keys(["ct_keys FLOAT(pak, 8, pck+bk, 64)"])
    Concat_2 --> O_ct_keys
    class I_mask,I_past_values,I_values,I_query,I_past_keys,I_keys ioNode
    class O_ct_values,O_Y,O_ct_keys ioNode
    class Transpose_0,Transpose_1,Concat_2,Transpose_3,Concat_4 opNode
    class LocalAttention_to1_5,Transpose_6 opNode
Outcome of the fusion:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_mask(["mask BOOL(am, 1, cm, dm)"])
    I_past_values(["past_values FLOAT(pav, 8, pcv, 64)"])
    I_values(["values FLOAT(av, bv, 8, 64)"])
    I_query(["query FLOAT(aq, bq, 8, 64)"])
    I_past_keys(["past_keys FLOAT(pak, 8, pck, 64)"])
    I_keys(["keys FLOAT(ak, bk, 8, 64)"])
    Reshape_0[["Reshape(., [0, 0, -1])"]]
    Reshape_1[["Reshape(., [0, 0, -1])"]]
    Reshape_2[["Reshape(., [0, 0, -1])"]]
    Where_3[["Where(., [0.0], [-inf])"]]
    MultiHeadAttention_4[["com.microsoft.MultiHeadAttention(., ., ., , , ., ., .)"]]
    Reshape_5[["Reshape(., [0, 0, -1, 64])"]]
    I_query -->|"FLOAT(aq, bq, 8, 64)"| Reshape_0
    I_keys -->|"FLOAT(ak, bk, 8, 64)"| Reshape_1
    I_values -->|"FLOAT(av, bv, 8, 64)"| Reshape_2
    I_mask -->|"BOOL(am, 1, cm, dm)"| Where_3
    Reshape_0 -->|"FLOAT(aq, bq, 512)"| MultiHeadAttention_4
    Reshape_1 -->|"FLOAT(ak, bk, 512)"| MultiHeadAttention_4
    Reshape_2 -->|"FLOAT(av, bv, 512)"| MultiHeadAttention_4
    Where_3 -->|"FLOAT(am, 1, cm, dm)"| MultiHeadAttention_4
    I_past_keys -->|"FLOAT(pak, 8, pck, 64)"| MultiHeadAttention_4
    I_past_values -->|"FLOAT(pav, 8, pcv, 64)"| MultiHeadAttention_4
    MultiHeadAttention_4 -->|"FLOAT(aq, bq, 512)"| Reshape_5
    O_ct_values(["ct_values FLOAT(pav, 8, pcv+bv, 64)"])
    MultiHeadAttention_4 --> O_ct_values
    O_Y(["Y FLOAT(ay, by, cy, dy)"])
    Reshape_5 --> O_Y
    O_ct_keys(["ct_keys FLOAT(pak, 8, pck+bk, 64)"])
    MultiHeadAttention_4 --> O_ct_keys
    class I_mask,I_past_values,I_values,I_query,I_past_keys,I_keys ioNode
    class O_ct_values,O_Y,O_ct_keys ioNode
    class Reshape_0,Reshape_1,Reshape_2,Where_3,MultiHeadAttention_4,Reshape_5 opNode
- apply(g: GraphBuilder, q_transpose: NodeProto, k_transpose: NodeProto, k_concat: NodeProto, v_transpose: NodeProto, v_concat: NodeProto, attention: NodeProto, transpose: NodeProto) -> List[NodeProto]
Performs the rewriting and assumes it can happen. It receives the nodes returned by method match, which are impacted by the rewriting, and assumes no other pattern optimizer will modify them. Since they are passed as a list of arguments, the list built by method match can include None values. The method returns the new nodes. The optimizer considers that any node given to this method is removed from the graph and any node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
- Parameters:
nodes – nodes returned by method match; they are then removed from the graph
- Returns:
nodes to add to the graph.
- match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) -> MatchResult | None
Determines which nodes around node can be rewritten.
- Parameters:
g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.
node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched – usually unused; the list of matches already found, that is, nodes already matching a pattern
The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be impacted by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
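In the fused MultiHeadAttention3DPattern graph, the boolean mask becomes an additive float mask through Where(mask, 0.0, -inf). The NumPy sketch below, on made-up sizes, shows why this conversion preserves the attention weights: adding the float mask before the softmax gives the same probabilities as restricting the softmax to the positions where the mask is True. The `softmax` helper and all variable names are illustrative, and the sketch assumes every row keeps at least one True position.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
scores = rng.standard_normal((1, 8, 3, 5)).astype(np.float32)

# Boolean mask broadcast over heads, True = position can be attended.
# Each row keeps at least one True so no row is fully masked.
mask = np.tril(np.ones((3, 5), dtype=bool), k=2)[np.newaxis, np.newaxis]

# Where_3: True -> 0.0, False -> -inf
bias = np.where(mask, np.float32(0.0), np.float32(-np.inf))
probs_additive = softmax(scores + bias)

# Reference: softmax computed over the kept positions only.
e = np.where(mask, np.exp(scores - scores.max(axis=-1, keepdims=True)), 0.0)
probs_reference = e / e.sum(axis=-1, keepdims=True)
```

A row whose mask is entirely False would produce NaN with a -inf bias, which is one reason some graphs use the lowest finite float instead.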