yobx.xoptim.patterns_ort
modules
- yobx.xoptim.patterns_ort.activation
- yobx.xoptim.patterns_ort.causal_conv
- yobx.xoptim.patterns_ort.complex_mul
- yobx.xoptim.patterns_ort.embed_layer_normalization
- yobx.xoptim.patterns_ort.fused_conv
- yobx.xoptim.patterns_ort.fused_matmul
- yobx.xoptim.patterns_ort.greedy_search
- yobx.xoptim.patterns_ort.llm_optim
- yobx.xoptim.patterns_ort.missing_kernels
- yobx.xoptim.patterns_ort.moe
- yobx.xoptim.patterns_ort.relative_position_bias
- yobx.xoptim.patterns_ort.simplified_layer_normalization
get_onnxruntime_patterns

yobx.xoptim.patterns_ort.get_onnxruntime_patterns(verbose: int = 0) -> List[PatternOptimization]

Returns the default list of optimization patterns for onnxruntime. It is equal to the following list.
<<<
from yobx.xoptim.patterns_api import pattern_table_doc
from yobx.xoptim.patterns_ort import get_onnxruntime_patterns

print(pattern_table_doc(get_onnxruntime_patterns(), as_rst=True))
print()
>>>
Each pattern below is listed as name (short name, priority), followed by a summary of its documentation.

- Attention3DPattern (Attention3D, priority 2): Fuses nodes into Attention from the com.microsoft domain. In progress.
- BiasGeluPattern (BiasGelu, priority 1): Replaces the matched nodes by y = BiasGelu(x, B).
- BiasSoftmaxPattern (BiasSoftmax, priority 1): Replaces Softmax(Add(x, y), axis=-1) by BiasSoftmax(x, y, axis=-1). Model with nodes to be fused…
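As a quick sanity check on the BiasSoftmax rewrite: the fused node computes exactly Softmax(x + y) over the last axis. Below is a minimal pure-Python sketch of the Add + Softmax subgraph it replaces; the helper names are illustrative, not part of the library.

```python
import math

def softmax(values):
    # Numerically stable softmax over a 1-D slice (the last axis).
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def decomposed(x, bias):
    # Add followed by Softmax: the subgraph BiasSoftmaxPattern matches.
    return softmax([a + b for a, b in zip(x, bias)])

row = decomposed([1.0, 2.0, 3.0], [0.5, -0.5, 0.0])
print(row)       # probabilities over the last axis
print(sum(row))  # sums to 1.0 up to rounding
```

The fused BiasSoftmax kernel performs the same arithmetic in one pass and avoids materializing the intermediate Add output.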
- BiasSplitGeluPattern (BiasSplitGelu, priority 1): Replaces the matched nodes by y = BiasSplitGelu(x, B).
- CausalConvWithStatePattern (CausalConvWithState, priority 2): Fuses Concat + Conv (+ Slice) into com.microsoft.CausalConvWithState. The operator performs a stateful causal depthwise 1-D convolution and replaces the streaming pattern that concatenates a past-state buffer with the current input, runs a depthwise Conv, and optionally slices the last K-1 frames back out as the next state. Model with nodes to be fused…
- ComplexMulPattern (ComplexMul, priority 2): Replaces a decomposed complex multiplication by com.microsoft.ComplexMul(A, B). Complex multiplication is defined as (a + ib)(c + id) = (ac - bd) + i(ad + bc).
- ComplexMulConjPattern (ComplexMulConj, priority 2): Replaces a decomposed complex multiplication with conjugate by com.microsoft.ComplexMulConj(A, B). Complex multiplication with conjugate is defined as…
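The decomposition ComplexMulPattern matches follows directly from the definition of complex multiplication; a few lines of pure Python, checked against Python's built-in complex type, make it concrete (function name is illustrative):

```python
def complex_mul(a_re, a_im, b_re, b_im):
    # Decomposed complex product: (a_re + i*a_im) * (b_re + i*b_im)
    # = (a_re*b_re - a_im*b_im) + i*(a_re*b_im + a_im*b_re)
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re

# Cross-check against Python's built-in complex arithmetic.
z = complex(1.0, 2.0) * complex(3.0, -1.0)
re, im = complex_mul(1.0, 2.0, 3.0, -1.0)
print((re, im), z)  # both represent 5 + 5j
```

The pattern recognizes exactly these four multiplications and two additions spread over Mul/Sub/Add nodes and collapses them into one contrib node.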
- ContribRotaryEmbeddingPattern (ContribRotaryEmbedding, priority 2): Very similar to yobx.xoptim.patterns.onnx_rotary.RotaryEmbeddingPattern. Model with nodes to be fused…
- ContribRotaryEmbedding3DPattern (ContribRotaryEmbedding3D, priority 1): Extension to yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern; turns the operator into a 3D operator including the transpose. Model with nodes to be fused…
- ContribGemmaRotaryEmbeddingPattern (ContribGemmaRotaryEmbedding, priority 2): Fuses two intermediate.HalfRotaryEmbedding nodes that share cos/sin inputs traced back through Unsqueeze([Cast(]Cos/Sin(emb)[)]) into a single com.microsoft.GemmaRotaryEmbedding node. Model with nodes to be fused (after yobx.xoptim.patterns.onnx_rotary.FunctionHalfRotaryEmbeddingPattern)…
- EmbedLayerNormalizationPattern (EmbedLayerNormalization, priority 2): Fuses the sequence of Gather + Add + LayerNormalization nodes into com.microsoft.EmbedLayerNormalization. This pattern handles transformer model embedding layers where word, position, and optionally segment embeddings are looked up via Gather nodes, summed via Add nodes, and then normalized via LayerNormalization. Model with nodes to be fused (3-embedding BERT variant)…
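To make the fused embedding computation concrete, here is a toy pure-Python version of the Gather + Add + LayerNormalization subgraph for word and position embeddings. The tables, sizes, and helper names are invented for illustration; the real pattern operates on ONNX nodes, not Python lists.

```python
import math

def layer_norm(x, eps=1e-5):
    # Standard LayerNormalization over the last axis (unit scale, zero bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Toy embedding tables: 4 tokens, 3 positions, hidden size 2 (made up).
word_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
pos_table = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2]]

def embed_layer_norm(input_ids):
    # Gather (word) + Gather (position) + Add + LayerNormalization:
    # the subgraph EmbedLayerNormalizationPattern fuses into one node.
    out = []
    for pos, tok in enumerate(input_ids):
        summed = [w + p for w, p in zip(word_table[tok], pos_table[pos])]
        out.append(layer_norm(summed))
    return out

print(embed_layer_norm([2, 0, 3]))
```

A segment-embedding Gather, when present, is simply a third term in the Add before normalization.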
- GeluOrtPattern (GeluOrt, priority 0): Detects the decomposed version of Gelu with Tanh.
- GeluErfPattern (GeluErf, priority 0): Detects the decomposed version of Gelu with Erf. Model with nodes to be fused…
- GroupQueryAttention3DPattern (GroupQueryAttention3D, priority 2): Fuses LocalAttention into GroupQueryAttention. bias is not supported by this kernel on CUDA.
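The two Gelu decompositions these patterns detect are numerically close; a short script comparing the exact Erf form with the Tanh approximation (formulas are the standard ones, not copied from the library):

```python
import math

def gelu_erf(x):
    # Exact Gelu, the form matched by GeluErfPattern.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation, the form matched by GeluOrtPattern.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, gelu_erf(v), gelu_tanh(v))  # the two forms agree to ~1e-3
```

Both patterns run at priority 0 because they normalize the graph early so that later fusions (FastGelu, BiasGelu, GemmFastGelu) can match a single Gelu node.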
- FusedConvPattern (FusedConv, priority 2): Replaces Conv + Relu by FusedConv. Model with nodes to be fused…
- FastGeluPattern (FastGelu, priority 1): Replaces Gelu by FastGelu. Model with nodes to be fused…
- FusedMatMulPattern (FusedMatMul, priority 2): Replaces the sequence Transpose, MatMul by FusedMatMul. Model with nodes to be fused…
- FusedMatMulActivationPattern (FusedMatMulActivation, priority 2): Replaces the sequence (Fused)MatMul followed by an activation function by com.microsoft.FusedMatMulActivation. Supported activations: Relu, Tanh, Sigmoid, LeakyRelu, HardSigmoid. Model with nodes to be fused…
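A sketch of why folding a Transpose into the MatMul is safe, assuming FusedMatMul follows the Gemm-like alpha/transA/transB convention of the com.microsoft contrib op; the helper below is a pure-Python stand-in for illustration, not the kernel:

```python
def matmul(a, b):
    # Plain row-major matrix product for lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fused_matmul(a, b, alpha=1.0, trans_a=False, trans_b=False):
    # Assumed FusedMatMul semantics: alpha * op(A) @ op(B),
    # where op applies the optional transpose flag.
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    return [[alpha * v for v in row] for row in matmul(a, b)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
# Transpose + MatMul, the subgraph FusedMatMulPattern folds into one node:
decomposed = matmul(transpose(A), B)
fused = fused_matmul(A, B, trans_a=True)
print(decomposed == fused)  # True
```

The rewrite removes the explicit Transpose node and its intermediate tensor; the flag makes the kernel read A in transposed order instead.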
- FusedMatMulx2Pattern (FusedMatMulx2, priority 3): Replaces the sequence of a Div by a scalar consumed by two FusedMatMul nodes. Model with nodes to be fused…
- FusedMatMulDivPattern (FusedMatMulDiv, priority 2): Replaces the sequence MatMul, Div by FusedMatMul. Model with nodes to be fused…
- FusedMatMulTransposePattern (FusedMatMulTranspose, priority 3): Replaces the sequence (Fused)MatMul(A, B) + Transpose by FusedMatMul(B.T, A.T). Model with nodes to be fused…
- GemmFastGeluPattern (GemmFastGelu, priority 1): Replaces MatMul + Add(bias) + FastGelu or MatMul + FastGelu by GemmFastGelu(A, B, [bias]). Three cases are handled: Case 1, MatMul(A, B) → Add(AB, bias) → FastGelu(AB + bias); Case 2, MatMul(A, B) → FastGelu(AB, bias) (FastGelu with two inputs); Case 3, MatMul(A, B) → FastGelu(AB) (no bias). Model with nodes to be fused (Case 1)…
- GatedRelativePositionBiasPattern (GatedRelativePositionBias, priority 2): Implements the fusion of gated relative position bias computation (DeBERTa-v2/v3 style) into com.microsoft.GatedRelativePositionBias. The fused pattern corresponds to the DeBERTa disentangled self-attention gating computation, which applies a learned sigmoid gate to modulate a pre-computed relative position bias tensor. Model with nodes to be fused…
- GreedySearchPattern (GreedySearch, priority 2): Ensures com.microsoft.GreedySearch receives INT32 integer inputs. The ORT contrib operator GreedySearch requires all integer tensors (input_ids, max_length, min_length, vocab_mask, prefix_vocab_mask, and attention_mask) to be of type INT32. PyTorch typically produces INT64 tensors, so without this pattern the node would fail at runtime. This pattern matches any com.microsoft.GreedySearch node that has at least one integer input with dtype INT64 and inserts Cast(INT64 → INT32) nodes for every such input. Model with nodes to be fused…
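Case 1 of the GemmFastGelu fusion can be traced end to end with a toy pure-Python version of MatMul + Add(bias) + FastGelu; the matrices and helper names below are invented for illustration:

```python
import math

def fast_gelu(x):
    # Tanh-based FastGelu approximation of Gelu.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[0.5, -1.0], [2.0, 0.25]]
B = [[1.0, 0.0], [0.5, 1.0]]
bias = [0.1, -0.2]

# Case 1: MatMul(A, B) -> Add(AB, bias) -> FastGelu(AB + bias).
ab = matmul(A, B)
out = [[fast_gelu(v + c) for v, c in zip(row, bias)] for row in ab]
print(out)  # what a single GemmFastGelu(A, B, bias) node would produce
```

Cases 2 and 3 differ only in where the bias enters (as a second FastGelu input, or not at all).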
- MissingCosSinPattern (MissingCosSin, priority 1): Replaces Cos/Sin by Cast + Cos/Sin + Cast because of some missing kernels.
- MissingRangePattern (MissingRange, priority 1): Replaces Range by Cast + Range + Cast because of some missing kernels.
- MissingReduceMaxPattern (MissingReduceMax, priority 1): Replaces ReduceMax by Cast + ReduceMax + Cast because of some missing kernels.
- MissingTopKPattern (MissingTopK, priority 1): Replaces TopK by Cast + TopK + Cast because of some missing kernels.
- MoEPattern (MoE, priority 2): Fuses the Mixture-of-Experts (MoE) computation pattern into a single com.microsoft.MoE node. The pattern matches a standard top-k expert dispatch with two FC layers and an element-wise activation between them. The routing probabilities must already be computed (e.g. via Softmax) before the pattern. Model with nodes to be fused (k=1, relu, both biases present)…
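A toy top-1 version of the dispatch the MoE pattern matches, in pure Python; the expert weights, shapes, and helper names are invented for illustration, and the real pattern works on ONNX graphs rather than Python lists:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def fc(x, w, b):
    # Fully connected layer: x @ W + b for a single vector x.
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]

def moe_top1(x, probs, experts):
    # Top-1 Mixture-of-Experts dispatch: route to the best expert,
    # FC1 -> activation -> FC2, then scale by the routing probability
    # (probs assumed already softmaxed, as the pattern requires).
    k = max(range(len(probs)), key=probs.__getitem__)
    w1, b1, w2, b2 = experts[k]
    h = relu(fc(x, w1, b1))
    y = fc(h, w2, b2)
    return [probs[k] * v for v in y]

identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    (identity, [0.0, 0.0], identity, [0.0, 0.0]),                  # expert 0
    ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0], identity, [1.0, 1.0]),  # expert 1
]
print(moe_top1([1.0, -1.0], [0.25, 0.75], experts))  # routes to expert 1
```

The fused com.microsoft.MoE node performs this dispatch for all tokens and experts at once instead of materializing each FC and activation as separate graph nodes.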
- MultiHeadAttention3DPattern (MultiHeadAttention3D, priority 2): Merges multiple nodes into MultiHeadAttention. It assumes pattern yobx.xoptim.patterns.onnx_attention.FunctionAttentionPattern was triggered before. Model with nodes to be fused…
- QuickGeluPattern (QuickGelu, priority 1): Replaces Mul(x, Sigmoid(x)) by QuickGelu(x, alpha=1). Model with nodes to be fused…
- ReshapeGemmPattern (ReshapeGemm, priority 3): Replaces the sequence Reshape(-1, …) + Gemm by FusedMatMul. Model with nodes to be fused…
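QuickGelu is simply x * Sigmoid(alpha * x); a short pure-Python version of the Mul + Sigmoid subgraph the pattern replaces (function name is illustrative):

```python
import math

def quick_gelu(x, alpha=1.0):
    # QuickGelu(x) = x * Sigmoid(alpha * x); the pattern matches alpha = 1.
    return x * (1.0 / (1.0 + math.exp(-alpha * x)))

print(quick_gelu(0.0))  # 0.0
print(quick_gelu(3.0))  # close to 3.0, since Sigmoid saturates for large x
```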
- ReshapeGemmReshapePattern (ReshapeGemmReshape, priority 3): Replaces the sequence Reshape + Gemm + Reshape by FusedMatMul. Model with nodes to be fused…
- RelativePositionBiasPattern (RelativePositionBias, priority 2): Fuses the relative position bias computation (T5-style, encoder) into com.microsoft.RelativePositionBias. The fused pattern corresponds to the T5 bidirectional relative attention bias computation, recognizable by a Gather node reading from a learnable bias table, whose indices are computed through a bucketing function of absolute relative positions. Model with nodes to be fused…
- SimplifiedLayerNormalizationPattern (SimplifiedLayerNormalization, priority 1): Fuses the nodes equivalent to SimplifiedLayerNormalization. Model with nodes to be fused…
- SimplifiedLayerNormalizationMulPattern (SimplifiedLayerNormalizationMul, priority 1): Replaces the sequence SimplifiedLayerNormalization + Mul by SimplifiedLayerNormalization. Model with nodes to be fused…
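SimplifiedLayerNormalization is the RMS-normalization variant of LayerNormalization: no mean subtraction, just division by the root mean square, followed by an elementwise scale. The trailing Mul that SimplifiedLayerNormalizationMulPattern folds away is just another elementwise scale absorbed into that weight. A pure-Python sketch, with illustrative names:

```python
import math

def simplified_layer_norm(x, scale, eps=1e-5):
    # RMS normalization: x / sqrt(mean(x^2) + eps), then elementwise scale.
    ms = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(ms + eps)
    return [v * inv * s for v, s in zip(x, scale)]

out = simplified_layer_norm([3.0, 4.0], [1.0, 1.0])
print(out)  # RMS of [3, 4] is sqrt(12.5), so roughly [0.8485, 1.1314]
```

Because the normalization only rescales, a following Mul by a constant can be merged into the scale input, which is what the Mul variant of the pattern does.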
- SkipLayerNormalizationPattern (SkipLayerNormalization, priority 1): Replaces the sequence Add + LayerNormalization by SkipLayerNormalization. Model with nodes to be fused…
- SkipSimplifiedLayerNormalizationPattern (SkipSimplifiedLayerNormalization, priority 1): Replaces the sequence Add + SimplifiedLayerNormalization by SkipSimplifiedLayerNormalization. Model with nodes to be fused…
- SkipSimplifiedLayerNormalizationMulPattern (SkipSimplifiedLayerNormalizationMul, priority 1): Replaces the sequence SkipSimplifiedLayerNormalization + Mul by SkipSimplifiedLayerNormalization. Model with nodes to be fused…
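The Skip* fusions all have the same shape: an elementwise Add of a residual (skip) connection feeding a normalization. A pure-Python sketch of Add + LayerNormalization, the subgraph SkipLayerNormalizationPattern collapses (helper names are illustrative):

```python
import math

def layer_norm(x, eps=1e-5):
    # Plain LayerNormalization over the last axis (unit scale, zero bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def skip_layer_norm(x, skip):
    # Add + LayerNormalization, fused into a single node by the pattern.
    return layer_norm([a + b for a, b in zip(x, skip)])

out = skip_layer_norm([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(out)  # zero-mean, unit-variance over the last axis
```

The simplified variants are identical except that the normalization is the RMS form without mean subtraction.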
- TransposeFusedMatMulBPattern (TransposeFusedMatMulB, priority 3): Replaces the sequence Transpose(B, [0, 2, 3, 1]) + (Fused)MatMul(A, B) by Transpose(A, [0, 2, 1, 3]) + FusedMatMul(A, B, transB=1). Model with nodes to be fused…