validation.cuda

C API

cuda_example_py

class onnx_extended.validation.cuda.cuda_example_py.FpemuMode

Available options for the parameter mode of function fpemu_cuda_forward.

Members:

E4M3_RNE

property name

onnx_extended.validation.cuda.cuda_example_py.cuda_device_count() int

Returns the number of cuda devices.

onnx_extended.validation.cuda.cuda_example_py.cuda_device_memory(device: SupportsInt = 0) tuple

Returns the free and total memory for a particular device.

onnx_extended.validation.cuda.cuda_example_py.cuda_devices_memory() list

Returns the free and total memory for all devices.

onnx_extended.validation.cuda.cuda_example_py.cuda_version() int

Returns the CUDA version the project was compiled with.
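The device query functions above can be combined into a short report. The following is a minimal sketch, assuming the package was built with CUDA support (otherwise the import fails and the helper returns an empty list). The helper name describe_cuda_devices is hypothetical, and the memory tuple is stored as returned rather than unpacked, since only its general content (free and total memory) is documented here.

```python
def describe_cuda_devices():
    """Return [(device_id, memory_tuple), ...] or [] when CUDA is unavailable."""
    try:
        # The extension module only exists in builds compiled with CUDA.
        from onnx_extended.validation.cuda.cuda_example_py import (
            cuda_device_count,
            cuda_device_memory,
            cuda_version,
        )
    except ImportError:
        return []

    report = []
    try:
        print("compiled with CUDA version:", cuda_version())
        for device in range(cuda_device_count()):
            # Documented as the free and total memory for this device.
            report.append((device, tuple(cuda_device_memory(device))))
    except RuntimeError:
        # Assumption: a missing driver or device surfaces as RuntimeError.
        pass
    return report


print(describe_cuda_devices())
```

On a machine without CUDA the call prints an empty list, so the same script can run on CPU-only hosts.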

onnx_extended.validation.cuda.cuda_example_py.fpemu_cuda_forward(input: typing.Annotated[numpy.typing.ArrayLike, numpy.float32], mode: onnx_extended.validation.cuda.cuda_example_py.FpemuMode = <FpemuMode.E4M3_RNE: 1>, inplace: bool = False, scale: typing.SupportsFloat = 1.0, block_norm: bool = False, block_size: typing.SupportsInt = 1, cuda_device: typing.SupportsInt = 0) numpy.typing.NDArray[numpy.float32]

Experimental

Parameters:
  • input – array

  • mode – which quantization type

  • inplace – modify the input in place instead of creating a new output

  • scale – scale

  • block_norm – normalization across blocks

  • block_size – block size

  • cuda_device – device id (if there are multiple devices)

Returns:

forward pass
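A call to fpemu_cuda_forward might look like the sketch below. It uses only the parameter names shown in the signature and keeps the documented defaults explicit. The wrapper quantize_e4m3 is a hypothetical helper, and it is an assumption that a one-dimensional float32 array is accepted and that failures on machines without a GPU are raised as RuntimeError. The function is marked experimental, so its behavior may change.

```python
def quantize_e4m3(values, cuda_device=0):
    """Run the E4M3_RNE emulation on `values`; return None if unavailable."""
    try:
        import numpy as np
        from onnx_extended.validation.cuda.cuda_example_py import (
            FpemuMode,
            fpemu_cuda_forward,
        )
    except ImportError:
        # numpy or the CUDA extension is missing.
        return None

    # The signature annotates the input as a float32 array.
    x = np.ascontiguousarray(values, dtype=np.float32)
    try:
        return fpemu_cuda_forward(
            x,
            mode=FpemuMode.E4M3_RNE,
            inplace=False,       # keep x untouched, return a new array
            scale=1.0,
            block_norm=False,
            block_size=1,
            cuda_device=cuda_device,
        )
    except RuntimeError:
        # Assumption: no usable GPU is reported as RuntimeError.
        return None


print(quantize_e4m3([0.1, 1.5, 300.0]))
```

Keeping inplace=False makes it easy to compare the returned array against the original values to see the rounding effect.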

onnx_extended.validation.cuda.cuda_example_py.gemm_benchmark_test(test_id: SupportsInt = 0, N: SupportsInt = 10, m: SupportsInt = 16, n: SupportsInt = 16, k: SupportsInt = 16, lda: SupportsInt = 16, ldb: SupportsInt = 16, ldd: SupportsInt = 16) dict[str, float]

Benchmark Gemm on CUDA

Parameters:
  • test_id – a test configuration (int)

  • N – number of repetitions

  • m – dimensions of the matrices

  • n – dimensions of the matrices

  • k – dimensions of the matrices

  • lda – leading dimension of A

  • ldb – leading dimension of B

  • ldd – leading dimension of the result

Returns:

metrics in a dictionary
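The benchmark can be invoked as sketched below. This is an illustration under assumptions: the helper run_gemm_benchmark is hypothetical, the values passed are simply the defaults from the signature, and the keys of the returned dictionary are not listed in this page, so the sketch prints whatever comes back.

```python
def run_gemm_benchmark(test_id=0, repeat=10, size=16):
    """Run one Gemm benchmark configuration; return {} if CUDA is unavailable."""
    try:
        from onnx_extended.validation.cuda.cuda_example_py import (
            gemm_benchmark_test,
        )
    except ImportError:
        return {}

    try:
        # Square matrices: every dimension and leading dimension equals `size`.
        return dict(
            gemm_benchmark_test(
                test_id=test_id,
                N=repeat,
                m=size,
                n=size,
                k=size,
                lda=size,
                ldb=size,
                ldd=size,
            )
        )
    except RuntimeError:
        # Assumption: an unsupported configuration or missing GPU
        # is reported as RuntimeError.
        return {}


for name, value in sorted(run_gemm_benchmark().items()):
    print(f"{name}: {value}")
```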

cuda_monitor

onnx_extended.validation.cuda.cuda_monitor.cuda_version() int

Returns the CUDA version the project was compiled with.

onnx_extended.validation.cuda.cuda_monitor.nvml_device_get_count() int

Returns the number of GPU units.

onnx_extended.validation.cuda.cuda_monitor.nvml_device_get_memory_info(device: SupportsInt = 0) tuple

Returns the free memory, the used memory, the total memory for a GPU device.

onnx_extended.validation.cuda.cuda_monitor.nvml_init() None

Initializes memory management from the nvml library.

onnx_extended.validation.cuda.cuda_monitor.nvml_shutdown() None

Closes memory management from the nvml library.
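The cuda_monitor functions are typically used together: initialize, query, then shut down. The sketch below assumes that nvml_init must be called before any query (as NVML itself requires) and that errors are raised as RuntimeError. The helper gpu_memory_report is hypothetical, and the memory tuples are stored as returned.

```python
def gpu_memory_report():
    """Return {device_id: memory_tuple}, or {} when NVML is unavailable."""
    try:
        from onnx_extended.validation.cuda import cuda_monitor
    except ImportError:
        # The module only exists in builds compiled with CUDA.
        return {}

    report = {}
    try:
        cuda_monitor.nvml_init()
    except RuntimeError:
        # Assumption: a missing driver is reported as RuntimeError.
        return report

    try:
        for device in range(cuda_monitor.nvml_device_get_count()):
            # Documented as free, used and total memory for the device.
            report[device] = tuple(
                cuda_monitor.nvml_device_get_memory_info(device)
            )
    except RuntimeError:
        pass
    finally:
        # Always release NVML, even if a query failed.
        cuda_monitor.nvml_shutdown()
    return report


print(gpu_memory_report())
```

The try/finally block ensures nvml_shutdown runs once initialization has succeeded, whatever happens during the queries.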