Source code for onnx_extended.ortops.optim.cuda

import os
import textwrap
from typing import List
from ... import _get_ort_ext_libs


def get_ort_ext_libs() -> List[str]:
    """
    Returns the list of libraries implementing new simple
    :epkg:`onnxruntime` kernels for the
    :epkg:`CUDAExecutionProvider`.
    """
    libs = _get_ort_ext_libs(os.path.dirname(__file__))
    return [lib for lib in libs if "cuda_cuda" not in lib]
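

# Illustrative sketch (not part of the original module): the libraries returned
# by get_ort_ext_libs() are meant to be registered on an onnxruntime
# SessionOptions through register_custom_ops_library before creating an
# InferenceSession. The model path below is a hypothetical placeholder.
def _example_register_library(model_path: str = "model_with_custom_cuda_ops.onnx"):
    from onnxruntime import InferenceSession, SessionOptions

    opts = SessionOptions()
    for lib in get_ort_ext_libs():
        # Registers the compiled kernels so that onnxruntime can resolve the
        # custom operators found in the model.
        opts.register_custom_ops_library(lib)
    return InferenceSession(
        model_path, opts, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
    )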


def documentation() -> List[str]:
    """
    Returns a list of rst strings documenting every kernel
    implemented in this subfolder.
    """
    return list(
        map(
            textwrap.dedent,
            [
                """
    onnx_extended.ortops.optim.cuda.AddAdd
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Two consecutive element-wise additions assuming all tensors
    have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * A+B+C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.AddAddAdd
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Three consecutive element-wise additions assuming all tensors
    have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T
    * D (T): tensor of type T

    **Outputs**

    * A+B+C+D (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.AddMul
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Two consecutive element-wise operations, Add then Mul, assuming all
    tensors have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Attributes**

    * transposeMiddle: bool, if True, applies transposition [0, 2, 1, 3]
      on the result

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * (A+B)*C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.AddSharedInput
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Parallel additions with one common input. Support for broadcast
    is limited (broadcast limited to the first dimensions).
    Computes A + B, A + C.

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * A+B (T): element-wise
    * A+C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MaskedScatterNDOfShape
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    ConstantOfShape + Where + ScatterND: updates a zero matrix with
    updates only where the indices are not equal to a given value
    (usually -1).

    **Provider**

    CUDAExecutionProvider

    **Attributes**

    * maskedValue (int): updates are ignored where the indices are
      equal to this value.

    **Inputs**

    * shape (I): tensor of type I
    * indices (I): tensor of type I
    * updates (T): tensor of type T

    **Outputs**

    * Z (T): updated tensor

    **Constraints**

    * I: int64
    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MulAdd
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Two consecutive element-wise operations, Mul then Add, assuming all
    tensors have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Attributes**

    * transposeMiddle: bool, if True, applies transposition [0, 2, 1, 3]
      on the result

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * A*B+C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MulMul
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Two consecutive element-wise multiplications assuming all tensors
    have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * A*B*C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MulMulMul
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Three consecutive element-wise multiplications assuming all tensors
    have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T
    * D (T): tensor of type T

    **Outputs**

    * A*B*C*D (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MulSoftmax
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    MulSoftmax, equivalent to Mul(X, Softmax(X)).

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * X (T): tensor

    **Outputs**

    * Z (T): result

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MulSharedInput
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Parallel multiplications with one common input. Support for broadcast
    is limited (broadcast limited to the first dimensions).
    Computes A * B, A * C.

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * A*B (T): element-wise
    * A*C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.MulSub
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Two consecutive element-wise operations, Mul then Sub, assuming all
    tensors have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Attributes**

    * negative: switches the order of the subtraction

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * (A*B)-C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.NegXplus1
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    1 - X

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * X (T): tensor of type T

    **Outputs**

    * Z (T): result

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.ReplaceZero
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    ReplaceZero, equivalent to Where(X == 0, cst, X).

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * X (T): tensor of type T

    **Outputs**

    * Z (T): updated tensor

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.Rotary
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Rotary, equivalent to (with side == "RIGHT"):

    * Split(X, axis=-1) -> X1, X2
    * Concat(-X2, X1)

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * X (T): tensor
    * splits (I): split size on the last dimension,
      only splitting in half is implemented

    **Outputs**

    * Z (T): result

    **Constraints**

    * T: float, float16
    * I: int64
    """,
                """
    onnx_extended.ortops.optim.cuda.ScatterNDOfShape
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    ConstantOfShape + ScatterND

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * shape (I): tensor of type I
    * indices (I): tensor of type I
    * updates (T): tensor of type T

    **Outputs**

    * Z (T): updated tensor

    **Constraints**

    * I: int64
    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.SubMul
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Two consecutive element-wise operations, Sub then Mul, assuming all
    tensors have the same shape (broadcast limited to the first dimensions).

    **Provider**

    CUDAExecutionProvider

    **Attributes**

    * negative: switches the order of the subtraction

    **Inputs**

    * A (T): tensor of type T
    * B (T): tensor of type T
    * C (T): tensor of type T

    **Outputs**

    * (A-B)*C (T): element-wise

    **Constraints**

    * T: float, float16
    """,
                """
    onnx_extended.ortops.optim.cuda.Transpose2DCast16
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Transposes a 2D matrix, then casts it into float16.

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * X (T1): tensor

    **Outputs**

    * Z (T2): result

    **Constraints**

    * T1: float32
    * T2: float16
    """,
                """
    onnx_extended.ortops.optim.cuda.Transpose2DCast32
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Transposes a 2D matrix, then casts it into float32.

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * X (T1): tensor

    **Outputs**

    * Z (T2): result

    **Constraints**

    * T1: float16
    * T2: float32
    """,
                """
    onnx_extended.ortops.optim.cuda.TriMatrix
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Creates a matrix.

    ::

        mat[i < j] = upper
        mat[i == j] = diag
        mat[i > j] = lower

    **Provider**

    CUDAExecutionProvider

    **Inputs**

    * shape (I): tensor of type I
    * cst (T): lower, diag, upper values

    **Outputs**

    * Z (T): matrix

    **Constraints**

    * I: int64
    * T: float, float16
    """,
            ],
        )
    )
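

# Illustrative sketch (not part of the original module): building a small model
# that calls one of the kernels documented above (AddAdd) and running it with
# onnxruntime on CUDA. The custom domain name "onnx_extended.ortops.optim.cuda"
# is inferred from the kernel names above, and the domain opset version 1 is an
# assumption; both may need to be adjusted.
def _example_run_add_add():
    import numpy as np
    from onnx import TensorProto
    from onnx.helper import (
        make_graph,
        make_model,
        make_node,
        make_opsetid,
        make_tensor_value_info,
    )
    from onnxruntime import InferenceSession, SessionOptions

    # One AddAdd node computing A + B + C in a single kernel.
    node = make_node(
        "AddAdd", ["A", "B", "C"], ["Y"], domain="onnx_extended.ortops.optim.cuda"
    )
    tensors = [
        make_tensor_value_info(name, TensorProto.FLOAT, [None])
        for name in ["A", "B", "C", "Y"]
    ]
    graph = make_graph([node], "example_add_add", tensors[:3], tensors[3:])
    model = make_model(
        graph,
        opset_imports=[
            make_opsetid("", 18),
            make_opsetid("onnx_extended.ortops.optim.cuda", 1),
        ],
    )

    opts = SessionOptions()
    for lib in get_ort_ext_libs():
        opts.register_custom_ops_library(lib)
    sess = InferenceSession(
        model.SerializeToString(),
        opts,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    x = np.arange(4).astype(np.float32)
    # Expected result: 3 * x, computed element-wise by the fused kernel.
    return sess.run(None, {"A": x, "B": x, "C": x})[0]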