validation.cuda

C API

cuda_example_py

class onnx_extended.validation.cuda.cuda_example_py.FpemuMode

Available options for the parameter mode of function fpemu_cuda_forward.

Members:

E4M3_RNE

property name

onnx_extended.validation.cuda.cuda_example_py.cuda_device_count() → int: Returns the number of CUDA devices.

onnx_extended.validation.cuda.cuda_example_py.cuda_device_memory(device: int = 0) → tuple: Returns the free and total memory for a particular device.

onnx_extended.validation.cuda.cuda_example_py.cuda_devices_memory() → list: Returns the free and total memory for all devices.

onnx_extended.validation.cuda.cuda_example_py.cuda_version() → int: Returns the CUDA version the project was compiled with.
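
The device query functions above can be combined into a short report. The following is a minimal sketch, not part of the library: the helpers load_cuda_example and summarize_devices are hypothetical names introduced here, and the sketch assumes the extension is only importable when the package was built with CUDA.

```python
import importlib


def load_cuda_example():
    """Return the compiled extension module, or None if it cannot be imported."""
    try:
        return importlib.import_module(
            "onnx_extended.validation.cuda.cuda_example_py"
        )
    except ImportError:
        return None


def summarize_devices(mod):
    """Collect (device index, free memory, total memory) for every device.

    `mod` is any object exposing cuda_device_count and cuda_device_memory
    with the signatures documented above.
    """
    rows = []
    for device in range(mod.cuda_device_count()):
        free, total = mod.cuda_device_memory(device)
        rows.append((device, free, total))
    return rows


mod = load_cuda_example()
if mod is None:
    print("the CUDA extension is not available")
else:
    print("compiled with CUDA version", mod.cuda_version())
    for device, free, total in summarize_devices(mod):
        print(f"device {device}: {free} free out of {total}")
```

Passing the module as an argument keeps the helper usable with a stand-in object on machines without a GPU.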

onnx_extended.validation.cuda.cuda_example_py.fpemu_cuda_forward(input: numpy.ndarray[numpy.float32], mode: onnx_extended.validation.cuda.cuda_example_py.FpemuMode = <FpemuMode.E4M3_RNE: 1>, inplace: bool = False, scale: float = 1.0, block_norm: bool = False, block_size: int = 1, cuda_device: int = 0) → numpy.ndarray[numpy.float32]

Experimental

Parameters:

input – input array (float32)
mode – which quantization type to use
inplace – modify the input in place instead of creating a new output
scale – scale
block_norm – normalization across blocks
block_size – block size
cuda_device – device id (if there are multiple devices)

Returns:

result of the forward pass
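
A call following the signature above might look like the sketch below. It is an illustration under assumptions: the helpers as_float32 and run_fpemu are hypothetical names, and making the input C-contiguous is a precaution rather than a documented requirement. The function is marked experimental, so no expected output is shown.

```python
import numpy as np


def as_float32(values):
    """Return a C-contiguous float32 array, the dtype the signature expects."""
    return np.ascontiguousarray(values, dtype=np.float32)


def run_fpemu(values, cuda_device=0):
    """Emulate E4M3_RNE quantization on the given device (needs the CUDA build)."""
    from onnx_extended.validation.cuda.cuda_example_py import (
        FpemuMode,
        fpemu_cuda_forward,
    )

    x = as_float32(values)
    # inplace=False asks for a new output instead of modifying x.
    return fpemu_cuda_forward(
        x, mode=FpemuMode.E4M3_RNE, inplace=False, cuda_device=cuda_device
    )


try:
    print(run_fpemu([0.1, 1.3, 300.0]))
except ImportError:
    print("the CUDA extension is not available")
```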

onnx_extended.validation.cuda.cuda_example_py.gemm_benchmark_test(test_id: int = 0, N: int = 10, m: int = 16, n: int = 16, k: int = 16, lda: int = 16, ldb: int = 16, ldd: int = 16) → Dict[str, float]

Benchmarks Gemm on CUDA.

Parameters:

test_id – a test configuration (int)
N – number of repetitions
m – dimensions of the matrices
n – dimensions of the matrices
k – dimensions of the matrices
lda – leading dimension of A
ldb – leading dimension of B
ldd – leading dimension of the result

Returns:

metrics in a dictionary
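
The benchmark can be driven with keyword arguments, as in the sketch below. The helper square_gemm_args is a hypothetical name introduced here. It assumes a square problem where every leading dimension equals the matrix size, which matches the documented defaults (all 16). The names of the returned metrics are not listed above, so the sketch simply prints whatever the dictionary contains.

```python
def square_gemm_args(size, repetitions=10, test_id=0):
    """Build keyword arguments for a square size x size Gemm benchmark.

    Assumption: leading dimensions equal the matrix size, as in the defaults.
    """
    return dict(
        test_id=test_id,
        N=repetitions,
        m=size,
        n=size,
        k=size,
        lda=size,
        ldb=size,
        ldd=size,
    )


try:
    from onnx_extended.validation.cuda.cuda_example_py import gemm_benchmark_test
except ImportError:
    gemm_benchmark_test = None

if gemm_benchmark_test is None:
    print("the CUDA extension is not available")
else:
    metrics = gemm_benchmark_test(**square_gemm_args(16))
    for name, value in sorted(metrics.items()):
        print(name, value)
```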

cuda_monitor

onnx_extended.validation.cuda.cuda_monitor.cuda_version() → int: Returns the CUDA version the project was compiled with.

onnx_extended.validation.cuda.cuda_monitor.nvml_device_get_count() → int: Returns the number of GPU units.

onnx_extended.validation.cuda.cuda_monitor.nvml_device_get_memory_info(device: int = 0) → tuple: Returns the free, used, and total memory of a GPU device.

onnx_extended.validation.cuda.cuda_monitor.nvml_init() → None: Initializes memory management from the NVML library.

onnx_extended.validation.cuda.cuda_monitor.nvml_shutdown() → None: Closes memory management from the NVML library.
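
Since nvml_init and nvml_shutdown come as a pair, one way to use them is through a context manager that always calls the shutdown. The sketch below is an assumption about usage, not a documented pattern: nvml_session and memory_report are hypothetical helpers, and the module is passed as an argument so that a stand-in can replace it on machines without a GPU.

```python
from contextlib import contextmanager


@contextmanager
def nvml_session(monitor):
    """Call nvml_init on entry and nvml_shutdown on exit, even after an error."""
    monitor.nvml_init()
    try:
        yield monitor
    finally:
        monitor.nvml_shutdown()


def memory_report(monitor):
    """Return one (free, used, total) tuple per GPU reported by the monitor."""
    with nvml_session(monitor) as m:
        return [
            m.nvml_device_get_memory_info(device)
            for device in range(m.nvml_device_get_count())
        ]


try:
    from onnx_extended.validation.cuda import cuda_monitor
except ImportError:
    cuda_monitor = None

if cuda_monitor is None:
    print("the CUDA extension is not available")
else:
    for device, (free, used, total) in enumerate(memory_report(cuda_monitor)):
        print(f"GPU {device}: free={free} used={used} total={total}")
```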