Welcome to the second tutorial on writing high-performance CUDA-based applications. This tutorial covers the basics of writing a kernel and of organizing threads, blocks, and grids. We will complete the application from the previous tutorial by writing its kernel function. The goal of the application is simple: take two arrays of floating-point numbers, perform an operation on them, and store the result in a third floating-point array. We will then measure how fast the code executes on a CUDA device and compare it with a traditional CPU. The data analysis comes toward the end of the article. Continue reading ‘CUDA – Tutorial 2 – The Kernel’ »
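To give a feel for how threads, blocks, and grids map onto the two input arrays, here is a minimal sketch in plain C that emulates the indexing serially on the CPU. It is not the tutorial's kernel: the function names are hypothetical, the operation is assumed to be addition (the excerpt does not say which operation is used), and the nested loops only stand in for what a CUDA device would run in parallel.

```c
#include <stddef.h>

/* Hypothetical stand-in for the work of ONE CUDA thread.
 * Assumption: the operation is element-wise addition. */
static void add_one_element(int block_idx, int block_dim, int thread_idx,
                            const float *a, const float *b, float *c, int n)
{
    /* Global index, built the same way a kernel would combine
     * its block index, block size, and thread index. */
    int i = block_idx * block_dim + thread_idx;

    /* The last block may contain more threads than remaining elements. */
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

/* Serial emulation of a kernel launch: visits every (block, thread) pair
 * of a one-dimensional grid. On a GPU these would run concurrently. */
void launch_add_emulated(const float *a, const float *b, float *c,
                         int n, int block_dim)
{
    /* Number of blocks needed to cover n elements, rounded up. */
    int grid_dim = (n + block_dim - 1) / block_dim;

    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            add_one_element(block, block_dim, thread, a, b, c, n);
        }
    }
}
```

With five elements and a block size of two, the sketch uses three blocks, and the bounds check keeps the sixth "thread" from writing past the end of the arrays.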
Posts tagged ‘Basic’
Welcome to the first tutorial on getting started with CUDA programming. This tutorial will show you how to do calculations with your CUDA-capable GPU. Any nVidia chip from series 8 or later is CUDA-capable. This tutorial will also give you some data on how much faster the GPU can do calculations than a CPU. Continue reading ‘CUDA – Tutorial 1 – Getting Started’ »
Welcome to my tutorial on the very basics of OpenMP. OpenMP is a powerful, easy-to-use tool that simplifies multi-threaded programming. If you would like your program to run faster on dual- or quad-core computers, your project may be well suited to OpenMP. Continue reading ‘OpenMP tutorial – the basics’ »
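As a rough illustration of how little code OpenMP can require, the sketch below parallelizes a simple loop with a single pragma. The function name and the choice of array addition are assumptions made for this example, not taken from the tutorial. When compiled with OpenMP enabled (for example `-fopenmp` on GCC), the loop iterations are divided among the available cores; without it, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Hypothetical example: element-wise sum of two arrays.
 * Each iteration touches only index i, so iterations are independent
 * and can safely be split across threads. */
void add_arrays(const float *a, const float *b, float *out, int n)
{
    /* Ask OpenMP to distribute the loop iterations over a team of threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The loop qualifies for this treatment because no iteration reads a value written by another; loops with such dependencies need more care than a single pragma.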
Welcome to the first article in a series of tutorials to teach you the basics of using CUDA. These tutorials will teach you, in a user-friendly way, how CUDA works, and how to take advantage of the massive computational ability of modern GPUs.