yobx.xoptim.patterns_ort

get_onnxruntime_patterns

yobx.xoptim.patterns_ort.get_onnxruntime_patterns(verbose: int = 0) → List[PatternOptimization]

Returns a default list of optimization patterns for onnxruntime. It is equal to the following list.

<<<

from yobx.xoptim.patterns_api import pattern_table_doc
from yobx.xoptim.patterns_ort import get_onnxruntime_patterns

print(pattern_table_doc(get_onnxruntime_patterns(), as_rst=True))
print()

>>>

Columns: name, short_name, priority, doc.

0. Attention3DPattern (Attention3D, priority 2): Fuses nodes into Attention from com.microsoft domain. In progress.

1. BiasGeluPattern (BiasGelu, priority 1): Replaces the matched subgraph by y = BiasGelu(x, B).

2. BiasSoftmaxPattern (BiasSoftmax, priority 1): Replaces Softmax(Add(x, y), axis=-1) by BiasSoftmax(x, y, axis=-1). Model with nodes to be fused…
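The subgraph this pattern matches can be sketched in NumPy (a reference sketch of the decomposed form, not the ORT kernel):

```python
import numpy as np

# Reference sketch of the subgraph BiasSoftmax replaces:
# Softmax(Add(x, y), axis=-1), computed in a numerically stable way.
def bias_softmax(x, y, axis=-1):
    z = x + y                                   # the Add being fused
    z = z - z.max(axis=axis, keepdims=True)     # stability shift
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)  # the Softmax
```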

3. BiasSplitGeluPattern (BiasSplitGelu, priority 1): Replaces the matched subgraph by y = BiasSplitGelu(x, B).

4. CausalConvWithStatePattern (CausalConvWithState, priority 2): Fuses Concat + Conv (+ Slice) into com.microsoft.CausalConvWithState. The operator performs a stateful causal depthwise 1-D convolution and replaces the streaming pattern that concatenates a past-state buffer with the current input, runs a depthwise Conv, and optionally slices the last K-1 frames back out as the next state. Model with nodes to be fused…

5. ComplexMulPattern (ComplexMul, priority 2): Replaces a decomposed complex multiplication by com.microsoft.ComplexMul(A, B). Complex multiplication is defined as (a + bi)(c + di) = (ac - bd) + (ad + bc)i.

6. ComplexMulConjPattern (ComplexMulConj, priority 2): Replaces a decomposed complex multiplication with conjugate by com.microsoft.ComplexMulConj(A, B). Complex multiplication with conjugate is defined as (a + bi) * conj(c + di) = (ac + bd) + (bc - ad)i.
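The decomposed arithmetic both patterns recognize can be written out in NumPy (a sketch assuming a trailing axis of size 2 holding [real, imag]; the actual matched node sequence may differ):

```python
import numpy as np

# Decomposed complex multiplication over a trailing [real, imag] axis,
# i.e. the arithmetic fused into com.microsoft.ComplexMul / ComplexMulConj.
def complex_mul(a, b, conj_b=False):
    ar, ai = a[..., 0], a[..., 1]
    br, bi = b[..., 0], b[..., 1]
    if conj_b:
        bi = -bi  # conjugate of b: br - bi*i
    real = ar * br - ai * bi
    imag = ar * bi + ai * br
    return np.stack([real, imag], axis=-1)
```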

7. ContribRotaryEmbeddingPattern (ContribRotaryEmbedding, priority 2): Very similar to yobx.xoptim.patterns.onnx_rotary.RotaryEmbeddingPattern. Model with nodes to be fused…

8. ContribRotaryEmbedding3DPattern (ContribRotaryEmbedding3D, priority 1): Extension of yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern, turning the operator into a 3D operator including the transpose. Model with nodes to be fused…

9. ContribGemmaRotaryEmbeddingPattern (ContribGemmaRotaryEmbedding, priority 2): Fuses two intermediate.HalfRotaryEmbedding nodes that share cos/sin inputs traced back through Unsqueeze([Cast(]Cos/Sin(emb)[)]) into a single com.microsoft.GemmaRotaryEmbedding node. Model with nodes to be fused (after yobx.xoptim.patterns.onnx_rotary.FunctionHalfRotaryEmbeddingPattern)…

10. EmbedLayerNormalizationPattern (EmbedLayerNormalization, priority 2): Fuses the sequence of Gather + Add + LayerNormalization nodes into com.microsoft.EmbedLayerNormalization. This pattern handles transformer model embedding layers where word, position, and optionally segment embeddings are looked up via Gather nodes, summed via Add nodes, and then normalized via LayerNormalization. Model with nodes to be fused (3-embedding BERT variant)…

11. GeluOrtPattern (GeluOrt, priority 0): Detects the decomposed version of Gelu with Tanh.

12. GeluErfPattern (GeluErf, priority 0): Detects the decomposed version of Gelu with Erf. Model with nodes to be fused…
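The two decompositions these patterns detect correspond to the standard Gelu definitions; a scalar sketch of both forms:

```python
import math

# Exact Gelu (Erf form) and the tanh approximation: the two
# decomposed versions detected by GeluErf and GeluOrt respectively.
def gelu_erf(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```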

13. GroupQueryAttention3DPattern (GroupQueryAttention3D, priority 2): Fuses LocalAttention into GroupQueryAttention; bias is not supported by this kernel on CUDA.

14. FusedConvPattern (FusedConv, priority 2): Replaces Conv + Relu by FusedConv. Model with nodes to be fused…

15. FastGeluPattern (FastGelu, priority 1): Replaces Gelu by FastGelu. Model with nodes to be fused…

16. FusedMatMulPattern (FusedMatMul, priority 2): Replaces the sequence Transpose + MatMul by FusedMatMul. Model with nodes to be fused…

17. FusedMatMulActivationPattern (FusedMatMulActivation, priority 2): Replaces the sequence (Fused)MatMul followed by an activation function into com.microsoft.FusedMatMulActivation. Supported activations: Relu, Tanh, Sigmoid, LeakyRelu, HardSigmoid. Model with nodes to be fused…

18. FusedMatMulx2Pattern (FusedMatMulx2, priority 3): Replaces a Div by a scalar whose output is consumed by two FusedMatMul nodes. Model with nodes to be fused…

19. FusedMatMulDivPattern (FusedMatMulDiv, priority 2): Replaces MatMul + Div by FusedMatMul. Model with nodes to be fused…

20. FusedMatMulTransposePattern (FusedMatMulTranspose, priority 3): Replaces the sequence (Fused)MatMul(A, B) + Transpose by FusedMatMul(B.T, A.T). Model with nodes to be fused…
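The rewrite relies on the transpose identity (A @ B).T == B.T @ A.T, which a quick NumPy check confirms:

```python
import numpy as np

# Identity behind the fusion: Transpose(MatMul(A, B)) == MatMul(B.T, A.T),
# so the trailing Transpose can be absorbed into FusedMatMul.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
assert np.allclose((A @ B).T, B.T @ A.T)
```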

21. GemmFastGeluPattern (GemmFastGelu, priority 1): Replaces MatMul + Add(bias) + FastGelu or MatMul + FastGelu by GemmFastGelu(A, B, [bias]). Three cases are handled:
    * Case 1: MatMul(A, B), then Add(AB, bias), then FastGelu(AB + bias)
    * Case 2: MatMul(A, B), then FastGelu(AB, bias) (FastGelu with two inputs)
    * Case 3: MatMul(A, B), then FastGelu(AB) (no bias)
    Model with nodes to be fused (Case 1)…

22. GatedRelativePositionBiasPattern (GatedRelativePositionBias, priority 2): Implements the fusion of gated relative position bias computation (DeBERTa-v2/v3 style) into com.microsoft.GatedRelativePositionBias. The fused pattern corresponds to the DeBERTa disentangled self-attention gating computation, which applies a learned sigmoid gate to modulate a pre-computed relative position bias tensor. Model with nodes to be fused…

23. GreedySearchPattern (GreedySearch, priority 2): Ensures com.microsoft.GreedySearch receives INT32 integer inputs. The ORT contrib operator GreedySearch requires all integer tensors (input_ids, max_length, min_length, vocab_mask, prefix_vocab_mask, and attention_mask) to be of type INT32. PyTorch typically produces INT64 tensors, so without this pattern the node would fail at runtime. This pattern matches any com.microsoft.GreedySearch node that has at least one integer input with dtype INT64 and inserts Cast(INT64→INT32) nodes for every such input. Model with nodes to be fused…

24. MissingCosSinPattern (MissingCosSin, priority 1): Replaces Cos/Sin by Cast + Cos/Sin + Cast because of some missing kernels.

25. MissingRangePattern (MissingRange, priority 1): Replaces Range by Cast + Range + Cast because of some missing kernels.

26. MissingReduceMaxPattern (MissingReduceMax, priority 1): Replaces ReduceMax by Cast + ReduceMax + Cast because of some missing kernels.

27. MissingTopKPattern (MissingTopK, priority 1): Replaces TopK by Cast + TopK + Cast because of some missing kernels.

28. MoEPattern (MoE, priority 2): Fuses the Mixture-of-Experts (MoE) computation pattern into a single com.microsoft.MoE node. The pattern matches a standard top-k expert dispatch with two FC layers and an element-wise activation between them. The routing probabilities must already be computed (e.g. via Softmax) before the pattern. Model with nodes to be fused (k=1, relu, both biases present)…

29. MultiHeadAttention3DPattern (MultiHeadAttention3D, priority 2): Merges multiple nodes into MultiHeadAttention. It assumes pattern yobx.xoptim.patterns.onnx_attention.FunctionAttentionPattern was triggered before. Model with nodes to be fused…

30. QuickGeluPattern (QuickGelu, priority 1): Replaces Mul(x, Sigmoid(x)) by QuickGelu(x, alpha=1). Model with nodes to be fused…
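The matched subgraph is simply x times its own sigmoid; a NumPy sketch (the alpha parameter generalizes the fused operator, with alpha=1 matching this pattern):

```python
import numpy as np

# Subgraph matched by the pattern: Mul(x, Sigmoid(x)),
# i.e. QuickGelu with alpha=1 (also known as SiLU/Swish).
def quick_gelu(x, alpha=1.0):
    return x * (1.0 / (1.0 + np.exp(-alpha * x)))
```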

31. ReshapeGemmPattern (ReshapeGemm, priority 3): Replaces the sequence Reshape(-1, …) + Gemm by FusedMatMul. Model with nodes to be fused…

32. ReshapeGemmReshapePattern (ReshapeGemmReshape, priority 3): Replaces the sequence Reshape + Gemm + Reshape by FusedMatMul. Model with nodes to be fused…

33. RelativePositionBiasPattern (RelativePositionBias, priority 2): Fuses the relative position bias computation (T5-style, encoder) into com.microsoft.RelativePositionBias. The fused pattern corresponds to the T5 bidirectional relative attention bias computation, recognizable by a Gather node reading from a learnable bias table, whose indices are computed through a bucketing function of absolute relative positions. Model with nodes to be fused…

34. SimplifiedLayerNormalizationPattern (SimplifiedLayerNormalization, priority 1): Fuses the nodes equivalent to SimplifiedLayerNormalization. Model with nodes to be fused…
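SimplifiedLayerNormalization is an RMS-style normalization (no mean subtraction); a hedged NumPy sketch of the computation being fused, assuming normalization over the last axis:

```python
import numpy as np

# Hedged sketch of the fused computation: RMS normalization over the
# last axis followed by an element-wise scale, with no mean subtraction.
def simplified_layer_norm(x, scale, eps=1e-5):
    mean_sq = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(mean_sq + eps) * scale
```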

35. SimplifiedLayerNormalizationMulPattern (SimplifiedLayerNormalizationMul, priority 1): Replaces the sequence SimplifiedLayerNormalization + Mul by SimplifiedLayerNormalization. Model with nodes to be fused…

36. SkipLayerNormalizationPattern (SkipLayerNormalization, priority 1): Replaces the sequence Add + LayerNormalization by SkipLayerNormalization. Model with nodes to be fused…
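The fused pair is a residual Add feeding a standard LayerNormalization; a NumPy sketch (last-axis normalization assumed):

```python
import numpy as np

# Subgraph fused by the pattern: Add(x, skip) followed by
# LayerNormalization over the last axis.
def skip_layer_norm(x, skip, gamma, beta, eps=1e-5):
    h = x + skip                          # the residual Add being fused
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps) * gamma + beta
```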

37. SkipSimplifiedLayerNormalizationPattern (SkipSimplifiedLayerNormalization, priority 1): Replaces the sequence Add + SimplifiedLayerNormalization by SkipSimplifiedLayerNormalization. Model with nodes to be fused…

38. SkipSimplifiedLayerNormalizationMulPattern (SkipSimplifiedLayerNormalizationMul, priority 1): Replaces the sequence SkipSimplifiedLayerNormalization + Mul by SkipSimplifiedLayerNormalization. Model with nodes to be fused…

39. TransposeFusedMatMulBPattern (TransposeFusedMatMulB, priority 3): Replaces the sequence Transpose(B, [0, 2, 3, 1]) + (Fused)MatMul(A, B) by Transpose(A, [0, 2, 1, 3]) + FusedMatMul(A, B, transB=1). Model with nodes to be fused…