yobx.xoptim.patterns_ort.embed_layer_normalization
- class yobx.xoptim.patterns_ort.embed_layer_normalization.EmbedLayerNormalizationPattern(verbose: int = 0, priority: int = 2)
Fuses the sequence of Gather + Add + LayerNormalization nodes into com.microsoft.EmbedLayerNormalization.
This pattern handles transformer model embedding layers where word, position, and optionally segment embeddings are looked up via Gather nodes, summed via Add nodes, and then normalized via LayerNormalization.
Model with nodes to be fused (3-embedding BERT variant):
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_input_ids(["input_ids INT64(B, S)"])
    I_segment_ids(["segment_ids INT64(B, S)"])
    I_position_ids(["position_ids INT64(B, S)"])
    I_word_table(["word_table FLOAT(V, D)"])
    I_seg_table(["seg_table FLOAT(NS, D)"])
    I_pos_table(["pos_table FLOAT(NP, D)"])
    I_gamma(["gamma FLOAT(D)"])
    I_beta(["beta FLOAT(D)"])
    Constant_0[["Constant() -#gt; word_table"]]
    Constant_1[["Constant() -#gt; pos_table"]]
    Constant_2[["Constant() -#gt; seg_table"]]
    Gather_0[["Gather(., .)"]]
    Gather_1[["Gather(., .)"]]
    Gather_2[["Gather(., .)"]]
    Add_0[["Add(., .)"]]
    Add_1[["Add(., .)"]]
    LayerNormalization_2[["LayerNormalization(., ., .)"]]
    I_input_ids -->|"INT64(B, S)"| Gather_0
    Constant_0 -->|"FLOAT(V, D)"| Gather_0
    I_position_ids -->|"INT64(B, S)"| Gather_1
    Constant_1 -->|"FLOAT(NP, D)"| Gather_1
    I_segment_ids -->|"INT64(B, S)"| Gather_2
    Constant_2 -->|"FLOAT(NS, D)"| Gather_2
    Gather_0 -->|"FLOAT(B, S, D)"| Add_0
    Gather_1 -->|"FLOAT(B, S, D)"| Add_0
    Add_0 -->|"FLOAT(B, S, D)"| Add_1
    Gather_2 -->|"FLOAT(B, S, D)"| Add_1
    Add_1 -->|"FLOAT(B, S, D)"| LayerNormalization_2
    I_gamma -->|"FLOAT(D)"| LayerNormalization_2
    I_beta -->|"FLOAT(D)"| LayerNormalization_2
    O_Y(["Y FLOAT(B, S, D)"])
    LayerNormalization_2 --> O_Y
    class I_input_ids,I_segment_ids,I_position_ids,I_gamma,I_beta,O_Y ioNode
    class Constant_0,Constant_1,Constant_2 constNode
    class Gather_0,Gather_1,Gather_2,Add_0,Add_1,LayerNormalization_2 opNode
Outcome of the fusion:
graph TD
    classDef ioNode fill:#dfd,stroke:#333,color:#333
    classDef initNode fill:#cccc00,stroke:#333,color:#333
    classDef constNode fill:#f9f,stroke:#333,stroke-width:2px,color:#333
    classDef opNode fill:#bbf,stroke:#333,stroke-width:2px,color:#333
    I_input_ids(["input_ids INT64(B, S)"])
    I_segment_ids(["segment_ids INT64(B, S)"])
    I_position_ids(["position_ids INT64(B, S)"])
    I_word_table(["word_table FLOAT(V, D)"])
    I_seg_table(["seg_table FLOAT(NS, D)"])
    I_pos_table(["pos_table FLOAT(NP, D)"])
    I_gamma(["gamma FLOAT(D)"])
    I_beta(["beta FLOAT(D)"])
    EmbedLayerNormalization[["com.microsoft.EmbedLayerNormalization(7 inputs)"]]
    I_input_ids -->|"INT64(B, S)"| EmbedLayerNormalization
    I_segment_ids -->|"INT64(B, S)"| EmbedLayerNormalization
    I_word_table -->|"FLOAT(V, D)"| EmbedLayerNormalization
    I_pos_table -->|"FLOAT(NP, D)"| EmbedLayerNormalization
    I_seg_table -->|"FLOAT(NS, D)"| EmbedLayerNormalization
    I_gamma -->|"FLOAT(D)"| EmbedLayerNormalization
    I_beta -->|"FLOAT(D)"| EmbedLayerNormalization
    O_Y(["Y FLOAT(B, S, D)"])
    EmbedLayerNormalization --> O_Y
    class I_input_ids,I_segment_ids,I_position_ids,I_gamma,I_beta,O_Y ioNode
    class EmbedLayerNormalization opNode
- apply(g: GraphBuilder, gather_0: NodeProto, gather_1: NodeProto, gather_seg: NodeProto | None, inner_or_outer_add: NodeProto, outer_add: NodeProto | None, ln_node: NodeProto) -> List[NodeProto]
The method does the rewriting and assumes the rewriting is possible. It receives the nodes returned by method match, which are the nodes impacted by the rewriting, and assumes no other pattern optimizer will modify them. Since they are passed as a list of arguments, the list built by method match may include None values. The method returns the new nodes. The optimizer considers that every node given to this method is removed from the graph and every node it returns is added. If a received node must be kept, it must be added to the list of returned nodes.
- Parameters:
nodes – nodes returned by method match; they are then removed
- Returns:
nodes to add to graph.
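The contract described above (matched nodes are removed, returned nodes are added) can be sketched with a toy graph. This is a minimal illustration and not the library's implementation: `ToyNode`, `toy_apply`, and `rewrite` are hypothetical names, the graph is a plain Python list, and the fused input order simply mirrors the 7-input layout shown in the diagram.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for onnx.NodeProto, for illustration only."""
    op_type: str
    inputs: List[str]
    outputs: List[str]
    domain: str = ""


def toy_apply(gather_0, gather_1, gather_seg, inner_or_outer_add, outer_add, ln_node):
    """Build one fused node from the matched nodes (gather_seg may be None)."""
    has_segment = gather_seg is not None
    fused = ToyNode(
        "EmbedLayerNormalization",
        inputs=[
            gather_0.inputs[1],                            # input_ids
            gather_seg.inputs[1] if has_segment else "",   # segment_ids (optional)
            gather_0.inputs[0],                            # word table
            gather_1.inputs[0],                            # position table
            gather_seg.inputs[0] if has_segment else "",   # segment table (optional)
            ln_node.inputs[1],                             # gamma
            ln_node.inputs[2],                             # beta
        ],
        outputs=list(ln_node.outputs),
        domain="com.microsoft",
    )
    return [fused]


def rewrite(graph: List[ToyNode], matched: List[Optional[ToyNode]],
            apply_fn: Callable) -> List[ToyNode]:
    """Drop every matched node and put the returned nodes where the last one was."""
    removed = {id(n) for n in matched if n is not None}
    last = max(i for i, n in enumerate(graph) if id(n) in removed)
    new_nodes = apply_fn(*matched)
    result = []
    for i, n in enumerate(graph):
        if i == last:
            result.extend(new_nodes)
        elif id(n) not in removed:
            result.append(n)
    return result


# Gather takes (table, indices), as in the ONNX operator.
g0 = ToyNode("Gather", ["word_table", "input_ids"], ["w"])
g1 = ToyNode("Gather", ["pos_table", "position_ids"], ["p"])
g2 = ToyNode("Gather", ["seg_table", "segment_ids"], ["s"])
a0 = ToyNode("Add", ["w", "p"], ["wp"])
a1 = ToyNode("Add", ["wp", "s"], ["wps"])
ln = ToyNode("LayerNormalization", ["wps", "gamma", "beta"], ["Y"])
relu = ToyNode("Relu", ["Y"], ["Z"])  # unrelated node, must survive the rewrite

fused_graph = rewrite([g0, g1, g2, a0, a1, ln, relu], [g0, g1, g2, a0, a1, ln], toy_apply)
```

In the two-embedding case the same sketch would be called with `None` in the `gather_seg` and `outer_add` slots, and the optional inputs become empty strings, which is the ONNX convention for an omitted optional input.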
- match(g: GraphBuilderPatternOptimization, node: NodeProto, matched: List[MatchResult]) -> MatchResult | None
Determines nodes around node which can be rewritten.
- Parameters:
g – a GraphBuilderPatternOptimization; it holds all the existing nodes and can return any information about type, shape, the node before, or the node after another one.
node – the matching must determine whether some nodes around this one are part of a set of nodes this pattern optimizer can rewrite. From there, the function explores wherever it needs, checking any condition it needs.
matched – usually unused; it holds the nodes already matching a pattern
The method must not modify the graph. The method returns None if no match is found or an instance of class MatchResult. It must contain:
A list of nodes involved in the rewriting. It does not mean all of them will be removed, but all of them are needed to do the rewriting and must not be impacted by another pattern optimizer.
A function doing the rewriting (usually method apply of the pattern class).
An existing node where the rewritten nodes can be inserted. Knowing it makes the rewriting faster. If not specified, the optimizer will automatically determine the position of the new nodes.
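The matching contract can be illustrated with a read-only walk over a toy graph. This is a sketch under stated assumptions, not the pattern's real logic: `ToyNode`, `ToyMatch`, `producer_of`, and `toy_match` are hypothetical names, `ToyMatch` only mimics the three items listed above, and the walk assumes the inner Add is always the first operand of the outer Add. A real matcher would likely check more conditions (operand order, constant tables, types, single consumers).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for onnx.NodeProto, for illustration only."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class ToyMatch:
    """Hypothetical stand-in for MatchResult: nodes, rewriting function, insertion point."""
    nodes: List[Optional[ToyNode]]
    apply: Callable
    insert_at: Optional[ToyNode] = None


def producer_of(graph: List[ToyNode], name: str) -> Optional[ToyNode]:
    """Return the node producing `name`, or None for a graph input or initializer."""
    for node in graph:
        if name in node.outputs:
            return node
    return None


def toy_match(graph: List[ToyNode], node: ToyNode, apply_fn: Callable) -> Optional[ToyMatch]:
    """Anchor on a LayerNormalization node and walk backwards; never modifies the graph."""
    if node.op_type != "LayerNormalization":
        return None
    add = producer_of(graph, node.inputs[0])
    if add is None or add.op_type != "Add":
        return None
    left = producer_of(graph, add.inputs[0])
    right = producer_of(graph, add.inputs[1])
    if left is None or right is None or right.op_type != "Gather":
        return None
    if left.op_type == "Gather":
        # Two-embedding variant: no segment Gather, no outer Add.
        return ToyMatch([left, right, None, add, None, node], apply_fn, node)
    if left.op_type != "Add":
        return None
    g0 = producer_of(graph, left.inputs[0])
    g1 = producer_of(graph, left.inputs[1])
    if any(g is None or g.op_type != "Gather" for g in (g0, g1)):
        return None
    # Three-embedding variant: same slot order as the apply signature.
    return ToyMatch([g0, g1, right, left, add, node], apply_fn, node)


graph = [
    ToyNode("Gather", ["word_table", "input_ids"], ["w"]),
    ToyNode("Gather", ["pos_table", "position_ids"], ["p"]),
    ToyNode("Gather", ["seg_table", "segment_ids"], ["s"]),
    ToyNode("Add", ["w", "p"], ["wp"]),
    ToyNode("Add", ["wp", "s"], ["wps"]),
    ToyNode("LayerNormalization", ["wps", "gamma", "beta"], ["Y"]),
]
result = toy_match(graph, graph[-1], apply_fn=lambda *nodes: [])
no_match = toy_match(graph, graph[3], apply_fn=lambda *nodes: [])
```

Here `result.nodes` follows the argument order of apply, with `None` marking the slots that do not exist in the two-embedding variant, and `no_match` is `None` because the anchor node is not a LayerNormalization.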