yobx.xoptim.patterns_ort
modules
- yobx.xoptim.patterns_ort.activation
- yobx.xoptim.patterns_ort.causal_conv
- yobx.xoptim.patterns_ort.complex_mul
- yobx.xoptim.patterns_ort.embed_layer_normalization
- yobx.xoptim.patterns_ort.fused_conv
- yobx.xoptim.patterns_ort.fused_matmul
- yobx.xoptim.patterns_ort.greedy_search
- yobx.xoptim.patterns_ort.llm_optim
- yobx.xoptim.patterns_ort.missing_kernels
- yobx.xoptim.patterns_ort.moe
- yobx.xoptim.patterns_ort.relative_position_bias
- yobx.xoptim.patterns_ort.simplified_layer_normalization
get_onnxruntime_patterns

yobx.xoptim.patterns_ort.get_onnxruntime_patterns(verbose: int = 0) -> List[PatternOptimization]

Returns the default list of optimization patterns for onnxruntime. It is equal to the following list.
<<<
from yobx.xoptim.patterns_api import pattern_table_doc
from yobx.xoptim.patterns_ort import get_onnxruntime_patterns

print(pattern_table_doc(get_onnxruntime_patterns(), as_rst=True))
print()
>>>
Each pattern below is listed as name (short name, priority), followed by a summary of its documentation.

- Attention3DPattern (Attention3D, priority 2): Fuses nodes into Attention from the com.microsoft domain. In progress.
- BiasGeluPattern (BiasGelu, priority 1): Replaces the matched nodes by y = BiasGelu(x, B).
- BiasSoftmaxPattern (BiasSoftmax, priority 1): Replaces Softmax(Add(x, y), axis=-1) by BiasSoftmax(x, y, axis=-1). Model with nodes to be fused…
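As a quick sanity check on the BiasSoftmax rewrite: the fused node computes exactly Softmax(x + y) over the last axis. Below is a minimal pure-Python sketch of the Add + Softmax subgraph it replaces; the helper names are illustrative, not part of the library.

```python
import math

def softmax(values):
    # Numerically stable softmax over a 1-D slice (the last axis).
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def decomposed(x, bias):
    # Add followed by Softmax: the subgraph BiasSoftmaxPattern matches.
    return softmax([a + b for a, b in zip(x, bias)])

row = decomposed([1.0, 2.0, 3.0], [0.5, -0.5, 0.0])
print(row)       # probabilities over the last axis
print(sum(row))  # sums to 1.0 up to rounding
```

The fused BiasSoftmax kernel performs the same arithmetic in one pass and avoids materializing the intermediate Add output.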
- BiasSplitGeluPattern (BiasSplitGelu, priority 1): Replaces the matched nodes by y = BiasSplitGelu(x, B).
- CausalConvWithStatePattern (CausalConvWithState, priority 2): Fuses Concat + Conv (+ Slice) into com.microsoft.CausalConvWithState. The operator performs a stateful causal depthwise 1-D convolution and replaces the streaming pattern that concatenates a past-state buffer with the current input, runs a depthwise Conv, and optionally slices the last K-1 frames back out as the next state. Model with nodes to be fused…
- ComplexMulPattern (ComplexMul, priority 2): Replaces a decomposed complex multiplication by com.microsoft.ComplexMul(A, B). Complex multiplication is defined as (a + ib)(c + id) = (ac - bd) + i(ad + bc).
- ComplexMulConjPattern (ComplexMulConj, priority 2): Replaces a decomposed complex multiplication with conjugate by com.microsoft.ComplexMulConj(A, B). Complex multiplication with conjugate is defined as…
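The decomposition ComplexMulPattern matches follows directly from the definition of complex multiplication; a few lines of pure Python, checked against Python's built-in complex type, make it concrete (function name is illustrative):

```python
def complex_mul(a_re, a_im, b_re, b_im):
    # Decomposed complex product: (a_re + i*a_im) * (b_re + i*b_im)
    # = (a_re*b_re - a_im*b_im) + i*(a_re*b_im + a_im*b_re)
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re

# Cross-check against Python's built-in complex arithmetic.
z = complex(1.0, 2.0) * complex(3.0, -1.0)
re, im = complex_mul(1.0, 2.0, 3.0, -1.0)
print((re, im), z)  # both represent 5 + 5j
```

The pattern recognizes exactly these four multiplications and two additions spread over Mul/Sub/Add nodes and collapses them into one contrib node.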
- ContribRotaryEmbeddingPattern (ContribRotaryEmbedding, priority 2): Very similar to yobx.xoptim.patterns.onnx_rotary.RotaryEmbeddingPattern. Model with nodes to be fused…
- ContribRotaryEmbedding3DPattern (ContribRotaryEmbedding3D, priority 1): Extension to yobx.xoptim.patterns_ort.llm_optim.ContribRotaryEmbeddingPattern; turns the operator into a 3D operator including the transpose. Model with nodes to be fused…
- ContribGemmaRotaryEmbeddingPattern (ContribGemmaRotaryEmbedding, priority 2): Fuses two intermediate.HalfRotaryEmbedding nodes that share cos/sin inputs traced back through Unsqueeze([Cast(]Cos/Sin(emb)[)]) into a single com.microsoft.GemmaRotaryEmbedding node. Model with nodes to be fused (after yobx.xoptim.patterns.onnx_rotary.FunctionHalfRotaryEmbeddingPattern)…
- EmbedLayerNormalizationPattern (EmbedLayerNormalization, priority 2): Fuses the sequence of Gather + Add + LayerNormalization nodes into com.microsoft.EmbedLayerNormalization. This pattern handles transformer model embedding layers where word, position, and optionally segment embeddings are looked up via Gather nodes, summed via Add nodes, and then normalized via LayerNormalization. Model with nodes to be fused (3-embedding BERT variant)…
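To make the fused embedding computation concrete, here is a toy pure-Python version of the Gather + Add + LayerNormalization subgraph for word and position embeddings. The tables, sizes, and helper names are invented for illustration; the real pattern operates on ONNX nodes, not Python lists.

```python
import math

def layer_norm(x, eps=1e-5):
    # Standard LayerNormalization over the last axis (unit scale, zero bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Toy embedding tables: 4 tokens, 3 positions, hidden size 2 (made up).
word_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
pos_table = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2]]

def embed_layer_norm(input_ids):
    # Gather (word) + Gather (position) + Add + LayerNormalization:
    # the subgraph EmbedLayerNormalizationPattern fuses into one node.
    out = []
    for pos, tok in enumerate(input_ids):
        summed = [w + p for w, p in zip(word_table[tok], pos_table[pos])]
        out.append(layer_norm(summed))
    return out

print(embed_layer_norm([2, 0, 3]))
```

A segment-embedding Gather, when present, is simply a third term in the Add before normalization.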
- GeluOrtPattern (GeluOrt, priority 0): Detects the decomposed version of Gelu with Tanh.
- GeluErfPattern (GeluErf, priority 0): Detects the decomposed version of Gelu with Erf. Model with nodes to be fused…
- GroupQueryAttention3DPattern (GroupQueryAttention3D, priority 2): Fuses LocalAttention into GroupQueryAttention. bias is not supported by this kernel on CUDA.
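The two Gelu decompositions these patterns detect are numerically close; a short script comparing the exact Erf form with the Tanh approximation (formulas are the standard ones, not copied from the library):

```python
import math

def gelu_erf(x):
    # Exact Gelu, the form matched by GeluErfPattern.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation, the form matched by GeluOrtPattern.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, gelu_erf(v), gelu_tanh(v))  # the two forms agree to ~1e-3
```

Both patterns run at priority 0 because they normalize the graph early so that later fusions (FastGelu, BiasGelu, GemmFastGelu) can match a single Gelu node.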
- FusedConvPattern (FusedConv, priority 2): Replaces Conv + Relu by FusedConv. Model with nodes to be fused…
- FastGeluPattern (FastGelu, priority 1): Replaces Gelu by FastGelu. Model with nodes to be fused…
- FusedMatMulPattern (FusedMatMul, priority 2): Replaces the sequence Transpose, MatMul by FusedMatMul. Model with nodes to be fused…
- FusedMatMulActivationPattern (FusedMatMulActivation, priority 2): Replaces the sequence (Fused)MatMul followed by an activation function by com.microsoft.FusedMatMulActivation. Supported activations: Relu, Tanh, Sigmoid, LeakyRelu, HardSigmoid. Model with nodes to be fused…
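A sketch of why folding a Transpose into the MatMul is safe, assuming FusedMatMul follows the Gemm-like alpha/transA/transB convention of the com.microsoft contrib op; the helper below is a pure-Python stand-in for illustration, not the kernel:

```python
def matmul(a, b):
    # Plain row-major matrix product for lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fused_matmul(a, b, alpha=1.0, trans_a=False, trans_b=False):
    # Assumed FusedMatMul semantics: alpha * op(A) @ op(B),
    # where op applies the optional transpose flag.
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    return [[alpha * v for v in row] for row in matmul(a, b)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
# Transpose + MatMul, the subgraph FusedMatMulPattern folds into one node:
decomposed = matmul(transpose(A), B)
fused = fused_matmul(A, B, trans_a=True)
print(decomposed == fused)  # True
```

The rewrite removes the explicit Transpose node and its intermediate tensor; the flag makes the kernel read A in transposed order instead.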
- FusedMatMulx2Pattern (FusedMatMulx2, priority 3): Replaces the sequence of a Div by a scalar consumed by two FusedMatMul nodes. Model with nodes to be fused…
- FusedMatMulDivPattern (FusedMatMulDiv, priority 2): Replaces the sequence MatMul, Div by FusedMatMul. Model with nodes to be fused…
- FusedMatMulTransposePattern (FusedMatMulTranspose, priority 3): Replaces the sequence (Fused)MatMul(A, B) + Transpose by FusedMatMul(B.T, A.T). Model with nodes to be fused…
- GemmFastGeluPattern (GemmFastGelu, priority 1): Replaces MatMul + Add(bias) + FastGelu or MatMul + FastGelu by GemmFastGelu(A, B, [bias]). Three cases are handled: Case 1, MatMul(A, B) → Add(AB, bias) → FastGelu(AB + bias); Case 2, MatMul(A, B) → FastGelu(AB, bias) (FastGelu with two inputs); Case 3, MatMul(A, B) → FastGelu(AB) (no bias). Model with nodes to be fused (Case 1)…
- GatedRelativePositionBiasPattern (GatedRelativePositionBias, priority 2): Implements the fusion of gated relative position bias computation (DeBERTa-v2/v3 style) into com.microsoft.GatedRelativePositionBias. The fused pattern corresponds to the DeBERTa disentangled self-attention gating computation, which applies a learned sigmoid gate to modulate a pre-computed relative position bias tensor. Model with nodes to be fused…
- GreedySearchPattern (GreedySearch, priority 2): Ensures com.microsoft.GreedySearch receives INT32 integer inputs. The ORT contrib operator GreedySearch requires all integer tensors (input_ids, max_length, min_length, vocab_mask, prefix_vocab_mask, and attention_mask) to be of type INT32. PyTorch typically produces INT64 tensors, so without this pattern the node would fail at runtime. This pattern matches any com.microsoft.GreedySearch node that has at least one integer input with dtype INT64 and inserts Cast(INT64 → INT32) nodes for every such input. Model with nodes to be fused…
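Case 1 of the GemmFastGelu fusion can be traced end to end with a toy pure-Python version of MatMul + Add(bias) + FastGelu; the matrices and helper names below are invented for illustration:

```python
import math

def fast_gelu(x):
    # Tanh-based FastGelu approximation of Gelu.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[0.5, -1.0], [2.0, 0.25]]
B = [[1.0, 0.0], [0.5, 1.0]]
bias = [0.1, -0.2]

# Case 1: MatMul(A, B) -> Add(AB, bias) -> FastGelu(AB + bias).
ab = matmul(A, B)
out = [[fast_gelu(v + c) for v, c in zip(row, bias)] for row in ab]
print(out)  # what a single GemmFastGelu(A, B, bias) node would produce
```

Cases 2 and 3 differ only in where the bias enters (as a second FastGelu input, or not at all).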
- MissingCosSinPattern (MissingCosSin, priority 1): Replaces Cos/Sin by Cast + Cos/Sin + Cast because of some missing kernels.
- MissingRangePattern (MissingRange, priority 1): Replaces Range by Cast + Range + Cast because of some missing kernels.
- MissingReduceMaxPattern (MissingReduceMax, priority 1): Replaces ReduceMax by Cast + ReduceMax + Cast because of some missing kernels.
- MissingTopKPattern (MissingTopK, priority 1): Replaces TopK by Cast + TopK + Cast because of some missing kernels.
- MoEPattern (MoE, priority 2): Fuses the Mixture-of-Experts (MoE) computation pattern into a single com.microsoft.MoE node. The pattern matches a standard top-k expert dispatch with two FC layers and an element-wise activation between them. The routing probabilities must already be computed (e.g. via Softmax) before the pattern. Model with nodes to be fused (k=1, relu, both biases present)…
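A toy top-1 version of the dispatch the MoE pattern matches, in pure Python; the expert weights, shapes, and helper names are invented for illustration, and the real pattern works on ONNX graphs rather than Python lists:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def fc(x, w, b):
    # Fully connected layer: x @ W + b for a single vector x.
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]

def moe_top1(x, probs, experts):
    # Top-1 Mixture-of-Experts dispatch: route to the best expert,
    # FC1 -> activation -> FC2, then scale by the routing probability
    # (probs assumed already softmaxed, as the pattern requires).
    k = max(range(len(probs)), key=probs.__getitem__)
    w1, b1, w2, b2 = experts[k]
    h = relu(fc(x, w1, b1))
    y = fc(h, w2, b2)
    return [probs[k] * v for v in y]

identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    (identity, [0.0, 0.0], identity, [0.0, 0.0]),                  # expert 0
    ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0], identity, [1.0, 1.0]),  # expert 1
]
print(moe_top1([1.0, -1.0], [0.25, 0.75], experts))  # routes to expert 1
```

The fused com.microsoft.MoE node performs this dispatch for all tokens and experts at once instead of materializing each FC and activation as separate graph nodes.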
- MultiHeadAttention3DPattern (MultiHeadAttention3D, priority 2): Merges multiple nodes into MultiHeadAttention. It assumes pattern yobx.xoptim.patterns.onnx_attention.FunctionAttentionPattern was triggered before. Model with nodes to be fused…
- QuickGeluPattern (QuickGelu, priority 1): Replaces Mul(x, Sigmoid(x)) by QuickGelu(x, alpha=1). Model with nodes to be fused…
- ReshapeGemmPattern (ReshapeGemm, priority 3): Replaces the sequence Reshape(-1, …) + Gemm by FusedMatMul. Model with nodes to be fused…
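QuickGelu is simply x * Sigmoid(alpha * x); a short pure-Python version of the Mul + Sigmoid subgraph the pattern replaces (function name is illustrative):

```python
import math

def quick_gelu(x, alpha=1.0):
    # QuickGelu(x) = x * Sigmoid(alpha * x); the pattern matches alpha = 1.
    return x * (1.0 / (1.0 + math.exp(-alpha * x)))

print(quick_gelu(0.0))  # 0.0
print(quick_gelu(3.0))  # close to 3.0, since Sigmoid saturates for large x
```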
- ReshapeGemmReshapePattern (ReshapeGemmReshape, priority 3): Replaces the sequence Reshape + Gemm + Reshape by FusedMatMul. Model with nodes to be fused…
- RelativePositionBiasPattern (RelativePositionBias, priority 2): Fuses the relative position bias computation (T5-style, encoder) into com.microsoft.RelativePositionBias. The fused pattern corresponds to the T5 bidirectional relative attention bias computation, recognizable by a Gather node reading from a learnable bias table, whose indices are computed through a bucketing function of absolute relative positions. Model with nodes to be fused…
- SimplifiedLayerNormalizationPattern (SimplifiedLayerNormalization, priority 1): Fuses the nodes equivalent to SimplifiedLayerNormalization. Model with nodes to be fused…
- SimplifiedLayerNormalizationMulPattern (SimplifiedLayerNormalizationMul, priority 1): Replaces the sequence SimplifiedLayerNormalization + Mul by SimplifiedLayerNormalization. Model with nodes to be fused…
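SimplifiedLayerNormalization is the RMS-normalization variant of LayerNormalization: no mean subtraction, just division by the root mean square, followed by an elementwise scale. The trailing Mul that SimplifiedLayerNormalizationMulPattern folds away is just another elementwise scale absorbed into that weight. A pure-Python sketch, with illustrative names:

```python
import math

def simplified_layer_norm(x, scale, eps=1e-5):
    # RMS normalization: x / sqrt(mean(x^2) + eps), then elementwise scale.
    ms = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(ms + eps)
    return [v * inv * s for v, s in zip(x, scale)]

out = simplified_layer_norm([3.0, 4.0], [1.0, 1.0])
print(out)  # RMS of [3, 4] is sqrt(12.5), so roughly [0.8485, 1.1314]
```

Because the normalization only rescales, a following Mul by a constant can be merged into the scale input, which is what the Mul variant of the pattern does.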
- SkipLayerNormalizationPattern (SkipLayerNormalization, priority 1): Replaces the sequence Add + LayerNormalization by SkipLayerNormalization. Model with nodes to be fused…
- SkipSimplifiedLayerNormalizationPattern (SkipSimplifiedLayerNormalization, priority 1): Replaces the sequence Add + SimplifiedLayerNormalization by SkipSimplifiedLayerNormalization. Model with nodes to be fused…
- SkipSimplifiedLayerNormalizationMulPattern (SkipSimplifiedLayerNormalizationMul, priority 1): Replaces the sequence SkipSimplifiedLayerNormalization + Mul by SkipSimplifiedLayerNormalization. Model with nodes to be fused…
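The Skip* fusions all have the same shape: an elementwise Add of a residual (skip) connection feeding a normalization. A pure-Python sketch of Add + LayerNormalization, the subgraph SkipLayerNormalizationPattern collapses (helper names are illustrative):

```python
import math

def layer_norm(x, eps=1e-5):
    # Plain LayerNormalization over the last axis (unit scale, zero bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def skip_layer_norm(x, skip):
    # Add + LayerNormalization, fused into a single node by the pattern.
    return layer_norm([a + b for a, b in zip(x, skip)])

out = skip_layer_norm([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(out)  # zero-mean, unit-variance over the last axis
```

The simplified variants are identical except that the normalization is the RMS form without mean subtraction.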
- TransposeFusedMatMulBPattern (TransposeFusedMatMulB, priority 3): Replaces the sequence Transpose(B, [0, 2, 3, 1]) + (Fused)MatMul(A, B) by Transpose(A, [0, 2, 1, 3]) + FusedMatMul(A, B, transB=1). Model with nodes to be fused…