CUDA is an extension of C designed to let you do general-purpose computation on a graphics processor. For highly parallel workloads, GPUs often far surpass the computational throughput of even the fastest modern CPUs. If your application performs a large number of computations, CUDA may be the most practical way to get extremely high performance out of it.
- What is CUDA? An Introduction. This article gives a brief introduction to what CUDA is.
- CUDA Memory and Cache Architecture. This article gives a basic explanation of the memory and cache hierarchy of modern Fermi architecture GPUs.
- Practical Applications for CUDA. This article describes a number of applications that have already used CUDA very successfully.
- CUDA – The Basics. Learn about the basics of CUDA from a programming perspective. If you’re completely new to programming with CUDA, this is probably where you want to start.
- CUDA – Tutorial 1 – Getting Started. This tutorial shows you how to get CUDA up and running on your computer, even if you don’t have a CUDA-capable NVIDIA graphics chip.
- CUDA – Tutorial 2 – The Kernel. This tutorial explains exactly what a kernel is, and why it is so essential to CUDA programs.
- CUDA – Tutorial 3 – Thread Communication. This tutorial explains how to use shared or global memory so that different threads can communicate data with each other.
- CUDA – Tutorial 4 – Atomic operations. This tutorial explains how to use atomic operations with CUDA, and how they can affect program performance.
- CUDA – Tutorial 5 – Performance of atomic operations. This tutorial demonstrates how to use, and how not to use, atomic operations.
- CUDA – Tutorial 6 – Simple linear search with CUDA. This simple tutorial shows you how to perform a linear search with an atomic function.
- CUDA – Tutorial 7 – Image Processing with CUDA. This tutorial shows how incredibly easy it is to port CPU-only image processing code to CUDA.
- CUDA – Tutorial 8 – Advanced Image Processing with CUDA. This tutorial shows a more advanced image processing algorithm that requires substantial memory per thread.