yobx.xoptim.patterns_ort.llm_optim

class yobx.xoptim.patterns_ort.llm_optim.Attention3DPattern(verbose: int = 0, priority: int = 2)

Fuses nodes into the Attention operator from the com.microsoft domain. This pattern is still in progress.
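As we understand the ONNX Runtime contrib operator documentation, com.microsoft.Attention consumes the Q, K and V projection weights merged into a single matrix. The NumPy sketch below only checks the identity such a fusion relies on: three MatMul, Reshape, Transpose chains followed by attention give the same result as one MatMul against the concatenated weights. It assumes no bias and no mask, and the helper names (split_heads, attention) are hypothetical, not part of yobx.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq, hidden, heads = 2, 5, 8, 2
head_dim = hidden // heads
x = rng.standard_normal((batch, seq, hidden))
wq, wk, wv = (rng.standard_normal((hidden, hidden)) for _ in range(3))


def split_heads(t):
    # (batch, seq, hidden) -> Reshape -> Transpose -> (batch, heads, seq, head_dim)
    return t.reshape(batch, seq, heads, head_dim).transpose(0, 2, 1, 3)


def attention(q, k, v):
    # plain scaled dot-product attention, no mask, no bias
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(head_dim)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    out = w @ v  # (batch, heads, seq, head_dim)
    # trailing Transpose + Reshape back to (batch, seq, hidden)
    return out.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)


# Unfused: one MatMul -> Reshape -> Transpose chain per projection
y_unfused = attention(split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv))

# Packed: a single MatMul with the merged weights, then split into Q, K, V
qkv = x @ np.concatenate([wq, wk, wv], axis=1)  # (batch, seq, 3 * hidden)
q2, k2, v2 = (split_heads(t) for t in np.split(qkv, 3, axis=-1))
y_packed = attention(q2, k2, v2)
```

Merging the weights replaces three matrix products with one larger one, which is usually what makes the fused kernel attractive.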

apply(g: GraphBuilder, mm_q: NodeProto, re_q: NodeProto, tr_q: NodeProto, mm_k: NodeProto, re_k: NodeProto, tr_k: NodeProto, mm_v: NodeProto, re_v: NodeProto, tr_v: NodeProto, attention: NodeProto, transpose: NodeProto, reshape: NodeProto) → List[NodeProto]

The method does the rewriting. It assumes the rewriting is possible. It takes the list of nodes impacted by the rewriting and assumes no other pattern optimizer will modify them. It receives the list of nodes returned by method match. Since these nodes are passed as a list of arguments, method match may include None values. The method returns the new nodes. The optimizer considers that any node given to this function is removed from the graph and any node returned by it is added. If a received node must be kept, it must be added to the list of returned nodes.

Parameters:

nodes – nodes returned by method match; they are then removed

Returns:

nodes to add to the graph.
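The removal and insertion rules above can be illustrated with a toy driver. This is a sketch under stated assumptions, not the yobx optimizer: Node, run_apply and apply_drop_identity are hypothetical stand-ins, and new nodes are placed where the first removed node was.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: tuple
    outputs: tuple


def run_apply(graph: List[Node], matched: List[Optional[Node]],
              apply: Callable[..., List[Node]]) -> List[Node]:
    """Hypothetical driver following the contract described above."""
    removed = [n for n in matched if n is not None]  # every node given to apply is removed
    new_nodes = apply(*matched)                      # None entries are forwarded unchanged
    pos = min(graph.index(n) for n in removed)       # insert where the first removed node was
    kept = [n for n in graph if n not in removed]
    return kept[:pos] + new_nodes + kept[pos:]


def apply_drop_identity(identity: Node, relu: Node, optional: Optional[Node]) -> List[Node]:
    # Relu must survive, so it is returned (rewired to Identity's input);
    # `optional` shows that match may put None in the list.
    return [replace(relu, inputs=identity.inputs)]


graph = [Node("Identity", ("X",), ("T",)), Node("Relu", ("T",), ("Y",)), Node("Neg", ("Y",), ("Z",))]
new_graph = run_apply(graph, [graph[0], graph[1], None], apply_drop_identity)
```

Because Relu is handed to apply, it disappears unless apply returns it, which is why the toy pattern returns a rewired copy.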

match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) → MatchResult | None

Determines the nodes around node that can be rewritten.

Parameters:
  • g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.

  • node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.

  • matched – usually unused; the list of matches (MatchResult) already found.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. The result must contain:

  • a list of nodes involved in the rewriting. This does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by other pattern optimizers.

  • a function doing the rewriting (usually method apply of the pattern class).

  • an existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
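The three fields of the result can be mirrored by a toy matcher. Everything here is hypothetical (Node, ToyMatchResult, consumers, match_identity_relu) and only illustrates the contract: a read-only exploration that returns None or a result carrying the nodes, the rewriting function and an insertion point.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for onnx.NodeProto."""
    op_type: str
    inputs: tuple
    outputs: tuple


@dataclass
class ToyMatchResult:
    """Hypothetical container for the three documented fields."""
    nodes: List[Optional[Node]]       # nodes reserved for this rewriting
    apply: Callable[..., List[Node]]  # function doing the rewriting
    insert_at: Optional[Node] = None  # where new nodes go (None: optimizer decides)


def consumers(graph: List[Node], node: Node) -> List[Node]:
    # read-only query, similar in spirit to asking g for "the node after"
    return [n for n in graph if set(n.inputs) & set(node.outputs)]


def apply_drop_identity(identity: Node, relu: Node) -> List[Node]:
    # Relu is kept by returning it, rewired to read Identity's input
    return [replace(relu, inputs=identity.inputs)]


def match_identity_relu(graph: List[Node], node: Node, matched) -> Optional[ToyMatchResult]:
    # the graph is only inspected, never modified
    if node.op_type != "Identity":
        return None
    nxt = consumers(graph, node)
    if len(nxt) != 1 or nxt[0].op_type != "Relu":
        return None
    return ToyMatchResult([node, nxt[0]], apply_drop_identity, insert_at=nxt[0])


graph = [Node("Identity", ("X",), ("T",)), Node("Relu", ("T",), ("Y",))]
res = match_identity_relu(graph, graph[0], [])
```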

class yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbedding3DPattern(verbose: int = 0, priority: int = 1, min_opset: int = 1)

Extension of yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern that turns the operator into a 3D operator, folding the preceding transpose into the rewritten subgraph.

Model with nodes to be fused:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    icrote_m2x2(["ContribRotaryEmbeddingPattern--m2x2 FLOAT(NEWDIM_range, 2)"])
    I_position_ids(["position_ids INT64(a, e)"])
    icrote_m1x2(["ContribRotaryEmbeddingPattern--m1x2 FLOAT(NEWDIM_range, 2)"])
    I_X(["X FLOAT(a, c, 2, d)"])

    Transpose_0[["Transpose(., perm=[0, 2, 1, 3])"]]
    RotaryEmbedding_1[["com.microsoft.RotaryEmbedding(., ., ., .)"]]

    I_X -->|"FLOAT(a, c, 2, d)"| Transpose_0
    Transpose_0 -->|"FLOAT(a, 2, c, d)"| RotaryEmbedding_1
    I_position_ids -->|"INT64(a, e)"| RotaryEmbedding_1
    icrote_m1x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    icrote_m2x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1

    O_Y(["Y FLOAT(a, b, c, d)"])
    RotaryEmbedding_1 --> O_Y

    class icrote_m2x2,I_position_ids,icrote_m1x2,I_X,O_Y ioNode
    class Transpose_0,RotaryEmbedding_1 opNode
    

Outcome of the fusion:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    icrote_m2x2(["ContribRotaryEmbeddingPattern--m2x2 FLOAT(NEWDIM_range, 2)"])
    I_position_ids(["position_ids INT64(a, e)"])
    icrote_m1x2(["ContribRotaryEmbeddingPattern--m1x2 FLOAT(NEWDIM_range, 2)"])
    I_X(["X FLOAT(a, c, 2, d)"])

    Reshape_0[["Reshape(., [0, 0, -1])"]]
    RotaryEmbedding_1[["com.microsoft.RotaryEmbedding(., ., ., .)"]]
    Shape_2[["Shape(., start=3)"]]
    Concat_3[["Concat([0, 0, -1], ., axis=0)"]]
    Reshape_4[["Reshape(., .)"]]
    Transpose_5[["Transpose(., perm=[0, 2, 1, 3])"]]

    I_X -->|"FLOAT(a, c, 2, d)"| Reshape_0
    Reshape_0 -->|"FLOAT(a, c, 2*d)"| RotaryEmbedding_1
    I_position_ids -->|"INT64(a, e)"| RotaryEmbedding_1
    icrote_m1x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    icrote_m2x2 -->|"FLOAT(NEWDIM_range, 2)"| RotaryEmbedding_1
    I_X -->|"FLOAT(a, c, 2, d)"| Shape_2
    Shape_2 -->|"INT64(1)"| Concat_3
    RotaryEmbedding_1 -->|"FLOAT(a, c, 2*d)"| Reshape_4
    Concat_3 -->|"INT64(4)"| Reshape_4
    Reshape_4 -->|"FLOAT(a, c, 2, d)"| Transpose_5

    O_Y(["Y FLOAT(a, b, c, d)"])
    Transpose_5 --> O_Y

    class icrote_m2x2,I_position_ids,icrote_m1x2,I_X,O_Y ioNode
    class Reshape_0,RotaryEmbedding_1,Shape_2,Concat_3,Reshape_4,Transpose_5 opNode
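To see why the transpose can be moved, here is a NumPy sketch of both graphs. It assumes the non-interleaved (rotate-half) rotary variant and that a 3D RotaryEmbedding splits the hidden dimension into heads internally; cos and sin are random tables gathered by position_ids, enough to check the layout equivalence but not how real caches are built. The helpers rotate, rotary_4d and rotary_3d are hypothetical.

```python
import numpy as np


def rotate(x1, x2, cos, sin):
    # rotate-half rotary: element i is paired with element i + half
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)


def rotary_4d(x, cos, sin):
    # x: (batch, heads, seq, head_dim); cos/sin: (batch, seq, head_dim // 2)
    half = x.shape[-1] // 2
    return rotate(x[..., :half], x[..., half:], cos[:, None], sin[:, None])


def rotary_3d(x, cos, sin, num_heads):
    # x: (batch, seq, heads * head_dim); heads are split internally
    b, s, hidden = x.shape
    x4 = x.reshape(b, s, num_heads, hidden // num_heads)
    half = x4.shape[-1] // 2
    out = rotate(x4[..., :half], x4[..., half:], cos[:, :, None], sin[:, :, None])
    return out.reshape(b, s, hidden)


rng = np.random.default_rng(0)
a, c, h, d, max_pos = 2, 5, 2, 4, 16
x = rng.standard_normal((a, c, h, d))  # X FLOAT(a, c, 2, d) in the diagram
position_ids = rng.integers(0, max_pos, size=(a, c))
cos_cache = rng.standard_normal((max_pos, d // 2))
sin_cache = rng.standard_normal((max_pos, d // 2))
cos, sin = cos_cache[position_ids], sin_cache[position_ids]  # (a, c, d // 2)

# Before: Transpose, then 4D RotaryEmbedding
y_before = rotary_4d(x.transpose(0, 2, 1, 3), cos, sin)
# After: Reshape to 3D, 3D RotaryEmbedding, Reshape to 4D, then Transpose
y_after = rotary_3d(x.reshape(a, c, -1), cos, sin, h).reshape(a, c, -1, d).transpose(0, 2, 1, 3)
```

Both paths apply the same element-wise rotation; only the memory layout seen by the rotary kernel differs.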
    
apply(g: GraphBuilder, transpose: NodeProto, rotary: NodeProto) → List[NodeProto]

The method does the rewriting. It assumes the rewriting is possible. It takes the list of nodes impacted by the rewriting and assumes no other pattern optimizer will modify them. It receives the list of nodes returned by method match. Since these nodes are passed as a list of arguments, method match may include None values. The method returns the new nodes. The optimizer considers that any node given to this function is removed from the graph and any node returned by it is added. If a received node must be kept, it must be added to the list of returned nodes.

Parameters:

nodes – nodes returned by method match; they are then removed

Returns:

nodes to add to the graph.

match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) → MatchResult | None

Determines the nodes around node that can be rewritten.

Parameters:
  • g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.

  • node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.

  • matched – usually unused; the list of matches (MatchResult) already found.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. The result must contain:

  • a list of nodes involved in the rewriting. This does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by other pattern optimizers.

  • a function doing the rewriting (usually method apply of the pattern class).

  • an existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.

class yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern(verbose: int = 0, priority: int = 2)

Very similar to yobx.xoptim.patterns.onnx_rotary.RotaryEmbeddingPattern.

Model with nodes to be fused:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_X(["X FLOAT(a, 2, c, 2*e)"])
    I_m1(["m1 FLOAT(1, 1, c, e)"])
    I_m2(["m2 FLOAT(1, 1, c, e)"])

    Concat_0[["Concat(., ., axis=-1)"]]
    Concat_1[["Concat(., ., axis=-1)"]]
    HalfRotaryEmbedding_2[["intermediate.HalfRotaryEmbedding(., ., .)"]]

    I_m2 -->|"FLOAT(1, 1, c, e)"| Concat_0
    I_m1 -->|"FLOAT(1, 1, c, e)"| Concat_1
    I_X -->|"FLOAT(a, 2, c, 2*e)"| HalfRotaryEmbedding_2
    Concat_0 --> HalfRotaryEmbedding_2
    Concat_1 --> HalfRotaryEmbedding_2

    O_Y(["Y FLOAT(a, b, c, 2*e)"])
    HalfRotaryEmbedding_2 --> O_Y

    class I_X,I_m1,I_m2,O_Y ioNode
    class Concat_0,Concat_1,HalfRotaryEmbedding_2 opNode
    

Outcome of the fusion:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_X(["X FLOAT(a, 2, c, 2*e)"])
    I_m1(["m1 FLOAT(1, 1, c, e)"])
    I_m2(["m2 FLOAT(1, 1, c, e)"])

    Squeeze_0[["Squeeze(., [0, 1])"]]
    Squeeze_1[["Squeeze(., [0, 1])"]]
    Shape_2[["Shape(., end=1, start=0)"]]
    Shape_3[["Shape(., end=3, start=2)"]]
    Squeeze_4[["Squeeze(.)"]]
    Range_5[["Range(0, ., 1)"]]
    Concat_6[["Concat(., [1], axis=0)"]]
    Expand_7[["Expand(., .)"]]
    RotaryEmbedding_8[["com.microsoft.RotaryEmbedding(., ., ., .)"]]

    I_m2 -->|"FLOAT(1, 1, c, e)"| Squeeze_0
    I_m1 -->|"FLOAT(1, 1, c, e)"| Squeeze_1
    I_X -->|"FLOAT(a, 2, c, 2*e)"| Shape_2
    I_X -->|"FLOAT(a, 2, c, 2*e)"| Shape_3
    Shape_3 -->|"INT64(1)"| Squeeze_4
    Squeeze_4 -->|"INT64()"| Range_5
    Shape_2 -->|"INT64(1)"| Concat_6
    Range_5 -->|"INT64(NEWDIM_range_0)"| Expand_7
    Concat_6 -->|"INT64(2)"| Expand_7
    I_X -->|"FLOAT(a, 2, c, 2*e)"| RotaryEmbedding_8
    Expand_7 -->|"INT64(a, NEWDIM_range_0)"| RotaryEmbedding_8
    Squeeze_0 -->|"FLOAT(c, e)"| RotaryEmbedding_8
    Squeeze_1 -->|"FLOAT(c, e)"| RotaryEmbedding_8

    O_Y(["Y FLOAT(a, b, c, 2*e)"])
    RotaryEmbedding_8 --> O_Y

    class I_X,I_m1,I_m2,O_Y ioNode
    class Squeeze_0,Squeeze_1,Shape_2,Shape_3,Squeeze_4,Range_5,Concat_6,Expand_7 opNode
    class RotaryEmbedding_8 opNode
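The two graphs agree if a HalfRotaryEmbedding fed with duplicated caches (Concat(m, m, axis=-1)) equals a RotaryEmbedding fed with the squeezed half-size caches and position_ids = Expand(Range(0, seq, 1), [batch, 1]). The NumPy sketch below checks this under our assumptions: the rotate-half formulation, caches of shape (1, 1, seq, e), and hypothetical helper names (half_rotary_unfused, rotary_with_cache).

```python
import numpy as np


def half_rotary_unfused(x, cos_full, sin_full):
    # x: (batch, heads, seq, 2*e); cos_full/sin_full: (1, 1, seq, 2*e)
    half = x.shape[-1] // 2
    rotated = np.concatenate([-x[..., half:], x[..., :half]], axis=-1)  # rotate_half
    return x * cos_full + rotated * sin_full


def rotary_with_cache(x, position_ids, cos_cache, sin_cache):
    # cos_cache/sin_cache: (max_pos, e); position_ids: (batch, seq)
    cos = cos_cache[position_ids][:, None]  # (batch, 1, seq, e)
    sin = sin_cache[position_ids][:, None]
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)


rng = np.random.default_rng(0)
batch, heads, seq, e = 2, 3, 5, 4
x = rng.standard_normal((batch, heads, seq, 2 * e))
cos = rng.standard_normal((1, 1, seq, e))
sin = rng.standard_normal((1, 1, seq, e))

# Before: each half-size cache is duplicated with Concat(axis=-1)
y_before = half_rotary_unfused(x, np.concatenate([cos, cos], -1), np.concatenate([sin, sin], -1))

# After: Squeeze the caches and build position_ids as Range + Expand would
position_ids = np.broadcast_to(np.arange(seq), (batch, seq))
y_after = rotary_with_cache(x, position_ids, cos[0, 0], sin[0, 0])
```

Expanding a (seq,) range to the shape [batch, 1] broadcasts to (batch, seq), which is what np.broadcast_to reproduces here.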
    
apply(g: GraphBuilder, expand_node: NodeProto | None, concat_cos: NodeProto, concat_sin: NodeProto, split_node: NodeProto, half_node: NodeProto, concat_node: NodeProto, *prefix_nodes: Sequence[NodeProto]) → List[NodeProto]

The method does the rewriting. It assumes the rewriting is possible. It takes the list of nodes impacted by the rewriting and assumes no other pattern optimizer will modify them. It receives the list of nodes returned by method match. Since these nodes are passed as a list of arguments, method match may include None values. The method returns the new nodes. The optimizer considers that any node given to this function is removed from the graph and any node returned by it is added. If a received node must be kept, it must be added to the list of returned nodes.

Parameters:

nodes – nodes returned by method match; they are then removed

Returns:

nodes to add to the graph.

match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) → MatchResult | None

Determines the nodes around node that can be rewritten.

Parameters:
  • g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.

  • node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.

  • matched – usually unused; the list of matches (MatchResult) already found.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. The result must contain:

  • a list of nodes involved in the rewriting. This does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by other pattern optimizers.

  • a function doing the rewriting (usually method apply of the pattern class).

  • an existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.

class yobx.xoptim.patterns_ort.llm_optim.GroupQueryAttention3DPattern(verbose: int = 0, priority: int = 2)

Fuses LocalAttention into GroupQueryAttention. Bias is not supported by this kernel on CUDA.

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_query(["query FLOAT(batch, 8, seq_length, 32)"])
    I_past_value(["past_value FLOAT(batch, 4, past_length, 32)"])
    I_key(["key FLOAT(batch, 4, seq_length, 32)"])
    I_value(["value FLOAT(batch, 4, seq_length, 32)"])
    I_past_key(["past_key FLOAT(batch, 4, past_length, 32)"])
    I_bitwise_not(["bitwise_not BOOL(seq_length, total_length)"])

    Concat_0[["Concat(., ., axis=2)"]]
    Concat_1[["Concat(., ., axis=2)"]]
    locatt2[["intermediate.LocalAttentionGQASW_to1(
    ., ., ., ., [0.4204482], [1, 1, 2, 1, 1], [0, 8, -1, 32])"]]

    I_past_key -->|"FLOAT(batch, 4, past_length, 32)"| Concat_0
    I_key -->|"FLOAT(batch, 4, seq_length, 32)"| Concat_0
    I_past_value -->|"FLOAT(batch, 4, past_length, 32)"| Concat_1
    I_value -->|"FLOAT(batch, 4, seq_length, 32)"| Concat_1
    I_query -->|"FLOAT(batch, 8, seq_length, 32)"| locatt2
    Concat_0 --> locatt2
    Concat_1 --> locatt2
    I_bitwise_not -->|"BOOL(seq_length, total_length)"| locatt2

    O_output_0(["output_0 FLOAT(batch, 8, seq_length, 32)"])
    locatt2 --> O_output_0
    O_cat_1(["cat_1 FLOAT(batch, 4, past_length+seq_length, 32)"])
    Concat_1 --> O_cat_1
    O_cat(["cat FLOAT(batch, 4, past_length+seq_length, 32)"])
    Concat_0 --> O_cat

    class I_query,I_past_value,I_key,I_value,I_past_key,I_bitwise_not ioNode
    class O_output_0,O_cat_1,O_cat ioNode
    class Concat_0,Concat_1,locatt2 opNode
    

Outcome of the fusion:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_query(["query FLOAT(batch, 8, seq_length, 32)"])
    I_past_value(["past_value FLOAT(batch, 4, past_length, 32)"])
    I_key(["key FLOAT(batch, 4, seq_length, 32)"])
    I_value(["value FLOAT(batch, 4, seq_length, 32)"])
    I_past_key(["past_key FLOAT(batch, 4, past_length, 32)"])
    I_bitwise_not(["bitwise_not BOOL(seq_length, total_length)"])

    Where_0[["Where(., [-3.4028235e+38], [0.0])"]]
    Shape_1[["Shape(., end=1, start=0)"]]
    Unsqueeze_2[["Unsqueeze(., [0, 1])"]]
    Shape_3[["Shape(., start=-1)"]]
    Cast_4[["Cast(., to=INT32)"]]
    Sub_5[["Sub(., [1])"]]
    Expand_6[["Expand(., .)"]]
    Transpose_7[["Transpose(., perm=[0, 2, 1, 3])"]]
    Transpose_8[["Transpose(., perm=[0, 2, 1, 3])"]]
    Transpose_9[["Transpose(., perm=[0, 2, 1, 3])"]]
    Reshape_10[["Reshape(., [0, 0, -1])"]]
    Reshape_11[["Reshape(., [0, 0, -1])"]]
    Reshape_12[["Reshape(., [0, 0, -1])"]]
    gqa13[["com.microsoft.GroupQueryAttention(., ., ., ., ., ., ., , , , .)"]]
    Reshape_14[["Reshape(., [0, 0, -1, 32])"]]
    Transpose_15[["Transpose(., perm=[0, 2, 1, 3])"]]

    I_bitwise_not -->|"BOOL(seq_length, total_length)"| Where_0
    I_query -->|"FLOAT(batch, 8, seq_length, 32)"| Shape_1
    Where_0 --> Unsqueeze_2
    Where_0 --> Shape_3
    Shape_3 --> Cast_4
    Cast_4 --> Sub_5
    Sub_5 --> Expand_6
    Shape_1 --> Expand_6
    I_query -->|"FLOAT(batch, 8, seq_length, 32)"| Transpose_7
    I_key -->|"FLOAT(batch, 4, seq_length, 32)"| Transpose_8
    I_value -->|"FLOAT(batch, 4, seq_length, 32)"| Transpose_9
    Transpose_7 --> Reshape_10
    Transpose_8 --> Reshape_11
    Transpose_9 --> Reshape_12
    Reshape_10 --> gqa13
    Reshape_11 --> gqa13
    Reshape_12 --> gqa13
    I_past_key -->|"FLOAT(batch, 4, past_length, 32)"| gqa13
    I_past_value -->|"FLOAT(batch, 4, past_length, 32)"| gqa13
    Expand_6 --> gqa13
    Cast_4 --> gqa13
    Unsqueeze_2 --> gqa13
    gqa13 --> Reshape_14
    Reshape_14 --> Transpose_15

    O_output_0(["output_0 FLOAT(batch, 8, seq_length, 32)"])
    Transpose_15 --> O_output_0
    O_cat_1(["cat_1 FLOAT(batch, 4, past_length+seq_length, 32)"])
    gqa13 --> O_cat_1
    O_cat(["cat FLOAT(batch, 4, past_length+seq_length, 32)"])
    gqa13 --> O_cat

    class I_query,I_past_value,I_key,I_value,I_past_key,I_bitwise_not ioNode
    class O_output_0,O_cat_1,O_cat ioNode
    class Where_0,Shape_1,Unsqueeze_2,Shape_3,Cast_4,Sub_5,Expand_6,Transpose_7 opNode
    class Transpose_8,Transpose_9,Reshape_10,Reshape_11,Reshape_12 opNode
    class gqa13,Reshape_14,Transpose_15 opNode
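A NumPy reference may help when reading the two graphs. It is a sketch under our own assumptions, not the ONNX Runtime kernel: each of the 4 key/value heads serves 2 consecutive query heads (our reading of the [1, 1, 2, 1, 1] expansion), a True entry in bitwise_not marks a masked position (consistent with Where_0 mapping True to -3.4028235e+38), and the Shape, Cast, Sub, Expand chain appears to produce total_length - 1 for every batch row (what the GroupQueryAttention documentation calls seqlens_k). It also shows that scaling both q and k by d ** -0.25, which is 0.4204482 for d = 32 like the constant in the diagram, is equivalent to scaling the scores by 1/sqrt(d). The helper gqa_reference is hypothetical.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)


def gqa_reference(q, k, v, masked, scale=None):
    # q: (batch, q_heads, seq, d); k, v: (batch, kv_heads, total, d)
    # masked: (seq, total) bool, True marks positions to ignore (like bitwise_not)
    scale = 1.0 / np.sqrt(q.shape[-1]) if scale is None else scale
    n_rep = q.shape[1] // k.shape[1]
    k = np.repeat(k, n_rep, axis=1)  # query head i reads kv head i // n_rep
    v = np.repeat(v, n_rep, axis=1)
    bias = np.where(masked, -3.4028235e38, 0.0)  # same constants as Where_0
    scores = q @ k.swapaxes(-1, -2) * scale + bias
    return softmax(scores) @ v


rng = np.random.default_rng(0)
batch, q_heads, kv_heads, past, seq, d = 2, 8, 4, 3, 2, 32
total = past + seq
q = rng.standard_normal((batch, q_heads, seq, d))
past_k, past_v = (rng.standard_normal((batch, kv_heads, past, d)) for _ in range(2))
k_new, v_new = (rng.standard_normal((batch, kv_heads, seq, d)) for _ in range(2))
k = np.concatenate([past_k, k_new], axis=2)  # Concat_0 -> cat
v = np.concatenate([past_v, v_new], axis=2)  # Concat_1 -> cat_1
masked = np.arange(total)[None, :] > past + np.arange(seq)[:, None]  # causal mask

out = gqa_reference(q, k, v, masked)

# Splitting the 1/sqrt(d) factor between q and k gives the same scores
s = d ** -0.25
out_split = gqa_reference(q * s, k * s, v, masked, scale=1.0)

# Assumed meaning of Shape -> Cast(INT32) -> Sub(1) -> Expand to [batch]
seqlens_k = np.full((batch,), total - 1, dtype=np.int32)
```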
    
apply(g: GraphBuilder, keys_concat_node: NodeProto, values_concat_node: NodeProto, local_attention_gqa: NodeProto) → List[NodeProto]

The method does the rewriting. It assumes the rewriting is possible. It takes the list of nodes impacted by the rewriting and assumes no other pattern optimizer will modify them. It receives the list of nodes returned by method match. Since these nodes are passed as a list of arguments, method match may include None values. The method returns the new nodes. The optimizer considers that any node given to this function is removed from the graph and any node returned by it is added. If a received node must be kept, it must be added to the list of returned nodes.

Parameters:

nodes – nodes returned by method match; they are then removed

Returns:

nodes to add to the graph.

match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) → MatchResult | None

Determines the nodes around node that can be rewritten.

Parameters:
  • g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.

  • node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.

  • matched – usually unused; the list of matches (MatchResult) already found.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. The result must contain:

  • a list of nodes involved in the rewriting. This does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by other pattern optimizers.

  • a function doing the rewriting (usually method apply of the pattern class).

  • an existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.

class yobx.xoptim.patterns_ort.llm_optim.MultiHeadAttention3DPattern(verbose: int = 0, priority: int = 2)

Merges multiple nodes into MultiHeadAttention. It assumes that pattern yobx.xoptim.patterns.onnx_attention.FunctionAttentionPattern has already been applied.

Model with nodes to be fused:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_mask(["mask BOOL(am, 1, cm, dm)"])
    I_past_values(["past_values FLOAT(pav, 8, pcv, 64)"])
    I_values(["values FLOAT(av, bv, 8, 64)"])
    I_query(["query FLOAT(aq, bq, 8, 64)"])
    I_past_keys(["past_keys FLOAT(pak, 8, pck, 64)"])
    I_keys(["keys FLOAT(ak, bk, 8, 64)"])

    Transpose_0[["Transpose(., perm=[0, 2, 1, 3])"]]
    Transpose_1[["Transpose(., perm=[0, 2, 1, 3])"]]
    Concat_2[["Concat(., ., axis=-2)"]]
    Transpose_3[["Transpose(., perm=[0, 2, 1, 3])"]]
    Concat_4[["Concat(., ., axis=-2)"]]
    LocalAttention_to1_5[["intermediate.LocalAttention_to1(., ., ., ., [0.31622776])"]]
    Transpose_6[["Transpose(., perm=[0, 2, 1, 3])"]]

    I_query -->|"FLOAT(aq, bq, 8, 64)"| Transpose_0
    I_keys -->|"FLOAT(ak, bk, 8, 64)"| Transpose_1
    I_past_keys -->|"FLOAT(pak, 8, pck, 64)"| Concat_2
    Transpose_1 --> Concat_2
    I_values -->|"FLOAT(av, bv, 8, 64)"| Transpose_3
    I_past_values -->|"FLOAT(pav, 8, pcv, 64)"| Concat_4
    Transpose_3 --> Concat_4
    Transpose_0 --> LocalAttention_to1_5
    Concat_2 --> LocalAttention_to1_5
    Concat_4 --> LocalAttention_to1_5
    I_mask -->|"BOOL(am, 1, cm, dm)"| LocalAttention_to1_5
    LocalAttention_to1_5 --> Transpose_6

    O_ct_values(["ct_values FLOAT(pav, 8, pcv+bv, 64)"])
    Concat_4 --> O_ct_values
    O_Y(["Y FLOAT(ay, by, cy, dy)"])
    Transpose_6 --> O_Y
    O_ct_keys(["ct_keys FLOAT(pak, 8, pck+bk, 64)"])
    Concat_2 --> O_ct_keys

    class I_mask,I_past_values,I_values,I_query,I_past_keys,I_keys ioNode
    class O_ct_values,O_Y,O_ct_keys ioNode
    class Transpose_0,Transpose_1,Concat_2,Transpose_3,Concat_4 opNode
    class LocalAttention_to1_5,Transpose_6 opNode
    

Outcome of the fusion:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_mask(["mask BOOL(am, 1, cm, dm)"])
    I_past_values(["past_values FLOAT(pav, 8, pcv, 64)"])
    I_values(["values FLOAT(av, bv, 8, 64)"])
    I_query(["query FLOAT(aq, bq, 8, 64)"])
    I_past_keys(["past_keys FLOAT(pak, 8, pck, 64)"])
    I_keys(["keys FLOAT(ak, bk, 8, 64)"])

    Reshape_0[["Reshape(., [0, 0, -1])"]]
    Reshape_1[["Reshape(., [0, 0, -1])"]]
    Reshape_2[["Reshape(., [0, 0, -1])"]]
    Where_3[["Where(., [0.0], [-inf])"]]
    MultiHeadAttention_4[["com.microsoft.MultiHeadAttention(., ., ., , , ., ., .)"]]
    Reshape_5[["Reshape(., [0, 0, -1, 64])"]]

    I_query -->|"FLOAT(aq, bq, 8, 64)"| Reshape_0
    I_keys -->|"FLOAT(ak, bk, 8, 64)"| Reshape_1
    I_values -->|"FLOAT(av, bv, 8, 64)"| Reshape_2
    I_mask -->|"BOOL(am, 1, cm, dm)"| Where_3
    Reshape_0 -->|"FLOAT(aq, bq, 512)"| MultiHeadAttention_4
    Reshape_1 -->|"FLOAT(ak, bk, 512)"| MultiHeadAttention_4
    Reshape_2 -->|"FLOAT(av, bv, 512)"| MultiHeadAttention_4
    Where_3 -->|"FLOAT(am, 1, cm, dm)"| MultiHeadAttention_4
    I_past_keys -->|"FLOAT(pak, 8, pck, 64)"| MultiHeadAttention_4
    I_past_values -->|"FLOAT(pav, 8, pcv, 64)"| MultiHeadAttention_4
    MultiHeadAttention_4 -->|"FLOAT(aq, bq, 512)"| Reshape_5

    O_ct_values(["ct_values FLOAT(pav, 8, pcv+bv, 64)"])
    MultiHeadAttention_4 --> O_ct_values
    O_Y(["Y FLOAT(ay, by, cy, dy)"])
    Reshape_5 --> O_Y
    O_ct_keys(["ct_keys FLOAT(pak, 8, pck+bk, 64)"])
    MultiHeadAttention_4 --> O_ct_keys

    class I_mask,I_past_values,I_values,I_query,I_past_keys,I_keys ioNode
    class O_ct_values,O_Y,O_ct_keys ioNode
    class Reshape_0,Reshape_1,Reshape_2,Where_3,MultiHeadAttention_4,Reshape_5 opNode
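The fusion relies on two equivalences, checked below with NumPy as a reference under our own assumptions rather than ONNX Runtime's kernel. First, a 3D attention kernel that splits heads internally matches the Transpose, Concat, attention, Transpose chain, including the present key/value caches. Second, the boolean mask (True = attend) can be replaced by the additive bias Where(mask, 0.0, -inf). The sketch uses 1/sqrt(head_dim) as the scale in both paths; mha_3d and to_bnsh are hypothetical helpers.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)


def to_bnsh(x):
    # Transpose(perm=[0, 2, 1, 3]); the same permutation maps BNSH back to BSNH
    return x.transpose(0, 2, 1, 3)


def mha_3d(q3, k3, v3, bias, past_k, past_v, num_heads):
    # Hypothetical 3D kernel: inputs (batch, seq, heads * head_dim), heads split internally
    def split(t):
        return to_bnsh(t.reshape(t.shape[0], t.shape[1], num_heads, -1))

    present_k = np.concatenate([past_k, split(k3)], axis=-2)
    present_v = np.concatenate([past_v, split(v3)], axis=-2)
    qh = split(q3)
    w = softmax(qh @ present_k.swapaxes(-1, -2) / np.sqrt(qh.shape[-1]) + bias)
    out = to_bnsh(w @ present_v)  # (batch, seq, heads, head_dim)
    return out.reshape(out.shape[0], out.shape[1], -1), present_k, present_v


rng = np.random.default_rng(0)
b, s, past, heads, d = 2, 3, 4, 8, 16
q, k, v = (rng.standard_normal((b, s, heads, d)) for _ in range(3))  # BSNH inputs
past_k, past_v = (rng.standard_normal((b, heads, past, d)) for _ in range(2))  # BNSH caches
mask = rng.random((b, 1, s, past + s)) > 0.3  # True = attend
mask[..., 0] = True  # keep at least one position per row

# Unfused: Transpose, Concat with the cache, masked attention, Transpose back
ct_k = np.concatenate([past_k, to_bnsh(k)], axis=-2)
ct_v = np.concatenate([past_v, to_bnsh(v)], axis=-2)
scores = np.where(mask, to_bnsh(q) @ ct_k.swapaxes(-1, -2) / np.sqrt(d), -np.inf)
y_ref = to_bnsh(softmax(scores) @ ct_v)

# Fused-style: Reshape to 3D, additive bias from Where(mask, 0.0, -inf), Reshape back
bias = np.where(mask, 0.0, -np.inf)
y3, present_k, present_v = mha_3d(
    q.reshape(b, s, -1), k.reshape(b, s, -1), v.reshape(b, s, -1), bias, past_k, past_v, heads
)
y = y3.reshape(b, s, -1, d)
```

Reshaping (batch, seq, heads, head_dim) to (batch, seq, heads * head_dim) moves no data, which is why the fused graph needs no Transpose on the query, key and value inputs.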
    
apply(g: GraphBuilder, q_transpose: NodeProto, k_transpose: NodeProto, k_concat: NodeProto, v_transpose: NodeProto, v_concat: NodeProto, attention: NodeProto, transpose: NodeProto) → List[NodeProto]

The method does the rewriting. It assumes the rewriting is possible. It takes the list of nodes impacted by the rewriting and assumes no other pattern optimizer will modify them. It receives the list of nodes returned by method match. Since these nodes are passed as a list of arguments, method match may include None values. The method returns the new nodes. The optimizer considers that any node given to this function is removed from the graph and any node returned by it is added. If a received node must be kept, it must be added to the list of returned nodes.

Parameters:

nodes – nodes returned by method match; they are then removed

Returns:

nodes to add to the graph.

match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) → MatchResult | None

Determines the nodes around node that can be rewritten.

Parameters:
  • g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about types, shapes, and the nodes before or after a given node.

  • node – the matching must determine whether some nodes around this one belong to the set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.

  • matched – usually unused; the list of matches (MatchResult) already found.

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult otherwise. The result must contain:

  • a list of nodes involved in the rewriting. This does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by other pattern optimizers.

  • a function doing the rewriting (usually method apply of the pattern class).

  • an existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.