Documentation teachcompute 0.2.0

2024-06-01: GEMM

Articles:

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
NVIDIA_SGEMM_PRACTICE
CUTLASS: Fast Linear Algebra in CUDA C++
Understanding Latency Hiding on GPUs


Copyright © 2023-2024, Xavier Dupré
