validation.cuda
C API
cuda_example_py
- teachcompute.validation.cuda.cuda_example_py.cuda_device_count() → int
Returns the number of CUDA devices.
- teachcompute.validation.cuda.cuda_example_py.cuda_device_memory(device: int = 0) → tuple
Returns the free and total memory for a particular device.
- teachcompute.validation.cuda.cuda_example_py.cuda_devices_memory() → list
Returns the free and total memory for all devices.
- teachcompute.validation.cuda.cuda_example_py.cuda_version() → int
Returns the CUDA version the project was compiled with.
- teachcompute.validation.cuda.cuda_example_py.get_device_prop(device_id: int = 0) → dict
Returns the device properties.
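A minimal sketch chaining these helpers together, assuming the extension was compiled with CUDA support and at least one device is visible (the memory figures are presumably in bytes, as with cudaMemGetInfo):

    import pprint
    from teachcompute.validation.cuda.cuda_example_py import (
        cuda_device_count,
        cuda_device_memory,
        cuda_version,
        get_device_prop,
    )

    print("CUDA version:", cuda_version())
    for device_id in range(cuda_device_count()):
        # cuda_device_memory returns a (free, total) pair for one device
        free, total = cuda_device_memory(device_id)
        print(f"device {device_id}: {free} free out of {total}")
        pprint.pprint(get_device_prop(device_id))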
- teachcompute.validation.cuda.cuda_example_py.vector_add(v1: numpy.ndarray[numpy.float32], v2: numpy.ndarray[numpy.float32], cuda_device: int = 0, repeat: int = 1) → numpy.ndarray[numpy.float32]
Computes the addition of two vectors of the same size with CUDA.
- Parameters:
v1 – array
v2 – array
cuda_device – device id (if multiple devices are available)
repeat – number of times to repeat the addition
- Returns:
addition of the two arrays
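A short usage sketch, assuming device 0 is available; the result should match the numpy addition up to float32 rounding:

    import numpy
    from teachcompute.validation.cuda.cuda_example_py import vector_add

    v1 = numpy.random.rand(1024).astype(numpy.float32)
    v2 = numpy.random.rand(1024).astype(numpy.float32)

    # The kernel computes v1 + v2 on the GPU and copies the result back.
    res = vector_add(v1, v2, cuda_device=0)
    assert numpy.allclose(res, v1 + v2)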
- teachcompute.validation.cuda.cuda_example_py.vector_sum0(vect: numpy.ndarray[numpy.float32], max_threads: int = 256, cuda_device: int = 0) → float
Computes the sum of all coefficients with CUDA. Naive method.
- Parameters:
vect – array
max_threads – number of threads to use (it must be a power of 2)
cuda_device – device id (if multiple devices are available)
- Returns:
sum
- teachcompute.validation.cuda.cuda_example_py.vector_sum6(vect: numpy.ndarray[numpy.float32], max_threads: int = 256, cuda_device: int = 0) → float
Computes the sum of all coefficients with CUDA. More efficient method.
- Parameters:
vect – array
max_threads – number of threads to use (it must be a power of 2)
cuda_device – device id (if multiple devices are available)
- Returns:
sum
- teachcompute.validation.cuda.cuda_example_py.vector_sum_atomic(vect: numpy.ndarray[numpy.float32], max_threads: int = 256, cuda_device: int = 0) → float
Computes the sum of all coefficients with CUDA. Uses atomicAdd.
- Parameters:
vect – array
max_threads – number of threads to use (it must be a power of 2)
cuda_device – device id (if multiple devices are available)
- Returns:
sum
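The three reduction variants share the same signature, so they can be compared directly. A sketch, assuming device 0 and a loose tolerance to account for float32 accumulation order (the exact tolerance is an assumption):

    import numpy
    from teachcompute.validation.cuda.cuda_example_py import (
        vector_sum0,
        vector_sum6,
        vector_sum_atomic,
    )

    vect = numpy.random.rand(2**20).astype(numpy.float32)
    expected = float(vect.sum())

    # max_threads must be a power of 2 for all three implementations.
    for fct in (vector_sum0, vector_sum6, vector_sum_atomic):
        got = fct(vect, max_threads=256, cuda_device=0)
        # float32 sums depend on accumulation order, hence the tolerance
        assert abs(got - expected) / expected < 1e-3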
cuda_gemm
- teachcompute.validation.cuda.cuda_gemm.matmul_v1_cuda(n_rows1: int, n_cols1: int, A: int, n_rows2: int, n_cols2: int, B: int, C: int, transA: bool = False, transB: bool = False) → int
Naive implementation of a matrix multiplication supporting transposition on CUDA.
- Parameters:
n_rows1 – number of rows for A
n_cols1 – number of columns for A
A – pointer to CUDA memory
n_rows2 – number of rows for B
n_cols2 – number of columns for B
B – pointer to CUDA memory
C – pointer to already allocated CUDA memory
transA – True if A needs to be transposed
transB – True if B needs to be transposed
- teachcompute.validation.cuda.cuda_gemm.matmul_v2_cuda(n_rows1: int, n_cols1: int, A: int, n_rows2: int, n_cols2: int, B: int, C: int, transA: bool = False, transB: bool = False) → int
Naive implementation with tiles of a matrix multiplication supporting transposition on CUDA.
- Parameters:
n_rows1 – number of rows for A
n_cols1 – number of columns for A
A – pointer to CUDA memory
n_rows2 – number of rows for B
n_cols2 – number of columns for B
B – pointer to CUDA memory
C – pointer to already allocated CUDA memory
transA – True if A needs to be transposed
transB – True if B needs to be transposed
- teachcompute.validation.cuda.cuda_gemm.matmul_v3_cuda(n_rows1: int, n_cols1: int, A: int, n_rows2: int, n_cols2: int, B: int, C: int, transA: bool = False, transB: bool = False) → int
Implementation of a matrix multiplication supporting transposition on CUDA. It proceeds by blocks within tiles.
- Parameters:
n_rows1 – number of rows for A
n_cols1 – number of columns for A
A – pointer to CUDA memory
n_rows2 – number of rows for B
n_cols2 – number of columns for B
B – pointer to CUDA memory
C – pointer to already allocated CUDA memory
transA – True if A needs to be transposed
transB – True if B needs to be transposed
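All three kernels take raw device pointers as plain integers rather than arrays. One way to obtain such pointers from Python is torch.Tensor.data_ptr(); using torch here is an assumption for illustration (any library exposing CUDA pointers as integers would do), as is the float32 row-major layout:

    import torch
    from teachcompute.validation.cuda.cuda_gemm import matmul_v1_cuda

    n, k, m = 64, 32, 48
    # float32 row-major tensors are an assumption about what the kernels expect
    A = torch.rand(n, k, dtype=torch.float32, device="cuda")
    B = torch.rand(k, m, dtype=torch.float32, device="cuda")
    C = torch.empty(n, m, dtype=torch.float32, device="cuda")  # must be pre-allocated

    # Device pointers are passed as plain integers.
    matmul_v1_cuda(n, k, A.data_ptr(), k, m, B.data_ptr(), C.data_ptr())
    torch.cuda.synchronize()
    assert torch.allclose(C, A @ B, atol=1e-4)

matmul_v2_cuda and matmul_v3_cuda take the same arguments, so the tiled variants can be swapped in to compare them.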
cuda_monitor
- teachcompute.validation.cuda.cuda_monitor.cuda_version() → int
Returns the CUDA version the project was compiled with.
- teachcompute.validation.cuda.cuda_monitor.nvml_device_get_count() → int
Returns the number of GPU units.
- teachcompute.validation.cuda.cuda_monitor.nvml_device_get_memory_info(device: int = 0) → tuple
Returns the free memory, the used memory, and the total memory for a GPU device.
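A sketch reporting memory usage for every visible device, assuming the extension was built with NVML support:

    from teachcompute.validation.cuda.cuda_monitor import (
        cuda_version,
        nvml_device_get_count,
        nvml_device_get_memory_info,
    )

    print("CUDA version:", cuda_version())
    for device in range(nvml_device_get_count()):
        # The tuple holds (free, used, total) memory for one device.
        free, used, total = nvml_device_get_memory_info(device)
        print(f"device {device}: free={free}, used={used}, total={total}")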