yobx.xoptim.patterns_ort.embed_layer_normalization

class yobx.xoptim.patterns_ort.embed_layer_normalization.EmbedLayerNormalizationPattern(verbose: int = 0, priority: int = 2)

Fuses the sequence of Gather + Add + LayerNormalization nodes into com.microsoft.EmbedLayerNormalization.

This pattern handles transformer model embedding layers where word, position, and optionally segment embeddings are looked up via Gather nodes, summed via Add nodes, and then normalized via LayerNormalization.
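As a point of reference, the unfused computation described above can be sketched in NumPy. This is only an illustration of the arithmetic (three Gathers, two Adds, one LayerNormalization over the last axis), not code from the library. The function name `embed_layer_norm_reference`, the shapes, and the `epsilon` value are assumptions chosen for the example.

```python
import numpy as np


def embed_layer_norm_reference(input_ids, position_ids, segment_ids,
                               word_table, pos_table, seg_table,
                               gamma, beta, epsilon=1e-5):
    """Unfused reference: three Gathers, two Adds, one LayerNormalization.

    Hypothetical helper for illustration; epsilon is an assumed value.
    """
    # Gather along axis 0: each id selects one row of its table -> (B, S, D).
    word = word_table[input_ids]
    pos = pos_table[position_ids]
    seg = seg_table[segment_ids]
    # Two successive Adds combine the three embeddings.
    summed = (word + pos) + seg
    # LayerNormalization over the last axis (D), then scale and shift.
    mean = summed.mean(axis=-1, keepdims=True)
    var = summed.var(axis=-1, keepdims=True)
    return (summed - mean) / np.sqrt(var + epsilon) * gamma + beta


# Small random example: B=batch, S=sequence, D=hidden size,
# V / NP / NS = number of rows in the word / position / segment tables.
rng = np.random.default_rng(0)
B, S, D, V, NP, NS = 2, 4, 8, 10, 16, 2
input_ids = rng.integers(0, V, size=(B, S))
position_ids = np.tile(np.arange(S), (B, 1))
segment_ids = rng.integers(0, NS, size=(B, S))
word_table = rng.standard_normal((V, D)).astype(np.float32)
pos_table = rng.standard_normal((NP, D)).astype(np.float32)
seg_table = rng.standard_normal((NS, D)).astype(np.float32)
gamma = np.ones(D, dtype=np.float32)
beta = np.zeros(D, dtype=np.float32)

y = embed_layer_norm_reference(input_ids, position_ids, segment_ids,
                               word_table, pos_table, seg_table, gamma, beta)
print(y.shape)
```

A fused operator is expected to produce the same values as this chain, up to floating-point differences, while avoiding the intermediate `(B, S, D)` tensors.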

Model with nodes to be fused (3-embedding BERT variant):

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_input_ids(["input_ids INT64(B, S)"])
    I_segment_ids(["segment_ids INT64(B, S)"])
    I_position_ids(["position_ids INT64(B, S)"])
    I_word_table(["word_table FLOAT(V, D)"])
    I_seg_table(["seg_table FLOAT(NS, D)"])
    I_pos_table(["pos_table FLOAT(NP, D)"])
    I_gamma(["gamma FLOAT(D)"])
    I_beta(["beta FLOAT(D)"])

    Constant_0[["Constant() -#gt; word_table"]]
    Constant_1[["Constant() -#gt; pos_table"]]
    Constant_2[["Constant() -#gt; seg_table"]]
    Gather_0[["Gather(., .)"]]
    Gather_1[["Gather(., .)"]]
    Gather_2[["Gather(., .)"]]
    Add_0[["Add(., .)"]]
    Add_1[["Add(., .)"]]
    LayerNormalization_2[["LayerNormalization(., ., .)"]]

    I_input_ids -->|"INT64(B, S)"| Gather_0
    Constant_0 -->|"FLOAT(V, D)"| Gather_0
    I_position_ids -->|"INT64(B, S)"| Gather_1
    Constant_1 -->|"FLOAT(NP, D)"| Gather_1
    I_segment_ids -->|"INT64(B, S)"| Gather_2
    Constant_2 -->|"FLOAT(NS, D)"| Gather_2
    Gather_0 -->|"FLOAT(B, S, D)"| Add_0
    Gather_1 -->|"FLOAT(B, S, D)"| Add_0
    Add_0 -->|"FLOAT(B, S, D)"| Add_1
    Gather_2 -->|"FLOAT(B, S, D)"| Add_1
    Add_1 -->|"FLOAT(B, S, D)"| LayerNormalization_2
    I_gamma -->|"FLOAT(D)"| LayerNormalization_2
    I_beta -->|"FLOAT(D)"| LayerNormalization_2

    O_Y(["Y FLOAT(B, S, D)"])
    LayerNormalization_2 --> O_Y

    class I_input_ids,I_segment_ids,I_position_ids,I_gamma,I_beta,O_Y ioNode
    class Constant_0,Constant_1,Constant_2 constNode
    class Gather_0,Gather_1,Gather_2,Add_0,Add_1,LayerNormalization_2 opNode

Outcome of the fusion:

        graph TD

    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333

    I_input_ids(["input_ids INT64(B, S)"])
    I_segment_ids(["segment_ids INT64(B, S)"])
    I_position_ids(["position_ids INT64(B, S)"])
    I_word_table(["word_table FLOAT(V, D)"])
    I_seg_table(["seg_table FLOAT(NS, D)"])
    I_pos_table(["pos_table FLOAT(NP, D)"])
    I_gamma(["gamma FLOAT(D)"])
    I_beta(["beta FLOAT(D)"])

    EmbedLayerNormalization[["com.microsoft.EmbedLayerNormalization(7 inputs)"]]
    I_input_ids -->|"INT64(B, S)"| EmbedLayerNormalization
    I_segment_ids -->|"INT64(B, S)"| EmbedLayerNormalization
    I_word_table -->|"FLOAT(V, D)"| EmbedLayerNormalization
    I_pos_table -->|"FLOAT(NP, D)"| EmbedLayerNormalization
    I_seg_table -->|"FLOAT(NS, D)"| EmbedLayerNormalization
    I_gamma -->|"FLOAT(D)"| EmbedLayerNormalization
    I_beta -->|"FLOAT(D)"| EmbedLayerNormalization

    O_Y(["Y FLOAT(B, S, D)"])
    EmbedLayerNormalization --> O_Y

    class I_input_ids,I_segment_ids,I_position_ids,I_gamma,I_beta,O_Y ioNode
    class EmbedLayerNormalization opNode

apply(g: GraphBuilder, gather_0: NodeProto, gather_1: NodeProto, gather_seg: NodeProto | None, inner_or_outer_add: NodeProto, outer_add: NodeProto | None, ln_node: NodeProto) → List[NodeProto]

Performs the rewriting, assuming it is possible. The method receives the nodes returned by method match and assumes that no other pattern optimizer will modify them. Because the nodes are passed as a list of arguments, some of them can be None (here, gather_seg and outer_add). The method returns the new nodes. The optimizer considers that every node given to this function is removed from the graph and every node returned by it is added. If a received node must be kept, it must be included in the list of returned nodes.

Parameters: nodes – nodes returned by method match; they are then removed from the graph
Returns: nodes to add to the graph.
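The remove-then-add contract can be illustrated with a toy model. Everything in this sketch (`ToyNode`, `toy_apply`, `rewrite`) is hypothetical and is not part of the yobx API. It only shows the rule stated above: matched nodes are dropped, returned nodes are added, and a matched node survives only if apply returns it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for an ONNX NodeProto."""
    op_type: str
    name: str


def toy_apply(gather: ToyNode, add: Optional[ToyNode], ln: ToyNode) -> List[ToyNode]:
    """Replaces the matched nodes with one fused node.

    ``add`` may be None, mirroring the optional slots of a matched list.
    """
    fused = ToyNode("Fused", "fused_0")
    # ``gather`` is returned as well, so it survives the rewrite.
    return [gather, fused]


def rewrite(graph: List[ToyNode],
            matched: List[Optional[ToyNode]],
            apply_fn: Callable[..., List[ToyNode]]) -> List[ToyNode]:
    """Removes every matched node, then adds whatever ``apply_fn`` returns."""
    new_nodes = apply_fn(*matched)
    to_remove = [n for n in matched if n is not None]
    # New nodes go where the first matched node was (a toy placement rule).
    position = min(graph.index(n) for n in to_remove)
    kept = [n for n in graph if n not in to_remove]
    return kept[:position] + new_nodes + kept[position:]


gather = ToyNode("Gather", "g0")
add = ToyNode("Add", "a0")
ln = ToyNode("LayerNormalization", "ln0")
softmax = ToyNode("Softmax", "s0")

graph = [gather, add, ln, softmax]
result = rewrite(graph, [gather, add, ln], toy_apply)
print([n.op_type for n in result])
```

In this sketch `add` and `ln` disappear because `toy_apply` does not return them, whereas `gather` is kept because it is part of the returned list.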

match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) → MatchResult | None

Determines which nodes around node can be rewritten.

Parameters:

g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return information about types, shapes, and the nodes before or after a given one.
node – the starting point of the match. The method determines whether nodes around this one belong to a set of nodes this pattern optimizer can rewrite, exploring the graph and checking conditions as needed.
matched – usually unused; the list of matches already found

The method must not modify the graph. It returns None if no match is found, or an instance of class MatchResult, which must contain:

A list of nodes involved in the rewriting. Not all of them are necessarily removed, but all of them are needed to do the rewriting and must not be modified by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer automatically determines the position of the new nodes.
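The three items above can be sketched with a toy matcher. The names `ToyNode`, `ToyMatch`, `toy_match`, and `toy_apply` are hypothetical and do not reproduce the real `MatchResult` or `GraphBuilderPatternOptimization` classes. The sketch also assumes, for illustration only, that matching starts from the last node of the pattern and walks backwards through a producer lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for an ONNX NodeProto."""
    op_type: str
    inputs: Tuple[str, ...]
    output: str


@dataclass
class ToyMatch:
    """Hypothetical result: matched nodes, rewrite function, insertion point."""
    nodes: List[Optional[ToyNode]]
    apply: Callable[..., List[ToyNode]]
    insert_at: Optional[ToyNode] = None


def toy_apply(add: ToyNode, ln: ToyNode) -> List[ToyNode]:
    # The fused node consumes the Add inputs plus the scale and bias.
    return [ToyNode("FusedAddLayerNorm", add.inputs + ln.inputs[1:], ln.output)]


def toy_match(producers: Dict[str, ToyNode], node: ToyNode) -> Optional[ToyMatch]:
    """Read-only check: returns a match or None, never edits the graph."""
    if node.op_type != "LayerNormalization":
        return None
    # Look at the node producing the first input of LayerNormalization.
    add = producers.get(node.inputs[0])
    if add is None or add.op_type != "Add":
        return None
    return ToyMatch(nodes=[add, node], apply=toy_apply, insert_at=node)


add = ToyNode("Add", ("a", "b"), "s")
ln = ToyNode("LayerNormalization", ("s", "gamma", "beta"), "y")
producers = {n.output: n for n in (add, ln)}

found = toy_match(producers, ln)
missed = toy_match(producers, add)
new_nodes = found.apply(*found.nodes)
print(new_nodes[0].inputs)
```

Returning None early on every failed condition keeps the matcher cheap, which matters because it runs once per candidate node.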