In the previous tutorial, intro to image processing with CUDA, we examined how easy it is to port simple image processing functions over to CUDA. In this tutorial, we’ll be going over a substantially more complex algorithm, and how to port it to CUDA with incredible ease. Continue reading ‘Advanced Image Processing with CUDA’ »
Posts tagged ‘Cache’
Understanding the basic memory architecture of whatever system you’re programming for is necessary to create high performance applications. Most desktop systems consist of large amounts of system memory connected to a single CPU, which may have 2 or three levels or fully coherent cache. Before you get started with CUDA, you should read this to understand the basic memory hierarchy of modern CUDA capable compute devices. Continue reading ‘CUDA Memory and Cache Architecture’ »
High level languages such as C, C++, C#, FORTRAN, and Java all do a great job of abstracting the hardware away from the language. This means that programmers generally don’t have to worry about how the hardware goes about executing their program. However, in order to get the maximum amount of performance out of your programs, it is necessary to start thinking about how the hardware is actually going to execute your program.