This tutorial discusses how different threads can communicate with each other. In the previous tutorial, each thread operated without any interaction with, or data dependency on, other threads. However, most parallel algorithms require some amount of data to be communicated between threads. Continue reading ‘CUDA – Tutorial 3 – Thread Communication’ »
Posts tagged ‘HPC’
Virtually all useful programs have some sort of loop in the code, whether it is a for, do, or while loop. This is especially true of programs that take a significant amount of time to execute. Much of the time, different iterations of these loops have nothing to do with each other, which makes these loops a prime target for parallelization. OpenMP exploits this common program characteristic, so it is easy to let a program use multiple processors simply by adding a few lines of compiler directives to your source code. Continue reading ‘Tutorial – Parallel For Loops with OpenMP’ »
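As a rough illustration of the idea in this excerpt, here is a minimal sketch in C of a loop whose iterations are independent, parallelized with a single OpenMP directive. The function name `add_arrays` and the choice of element-wise addition are assumptions made for the example, not code from the tutorial itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: out[i] = a[i] + b[i].
   Each iteration reads and writes only index i, so no iteration
   depends on any other. */
void add_arrays(const float *a, const float *b, float *out, int n)
{
    assert(a != NULL && b != NULL && out != NULL);

    /* With OpenMP enabled, the iterations are divided among threads.
       Without it, the compiler ignores the pragma and the loop
       simply runs serially. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With GCC or Clang, OpenMP is typically enabled by compiling with `-fopenmp`. The result is the same either way; only the number of threads doing the work changes.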
Welcome to the second tutorial on how to write high-performance CUDA-based applications. This tutorial covers the basics of how to write a kernel and how to organize threads, blocks, and grids. We will complete the application from the previous tutorial by writing a kernel function. The goal of this application is simple: take two arrays of floating-point numbers, perform an operation on them, and store the result in a third floating-point array. We will then measure how fast the code executes on a CUDA device and compare it to a traditional CPU. The data analysis takes place toward the end of the article. Continue reading ‘CUDA – Tutorial 2 – The Kernel’ »
Welcome to my tutorial on the very basics of OpenMP. OpenMP is a powerful tool that makes multi-threaded programming very easy. If you would like your program to run faster on dual- or quad-core computers, then your project may be well suited to OpenMP. Continue reading ‘OpenMP tutorial – the basics’ »
CUDA stands for Compute Unified Device Architecture. It is an extension of the C programming language created by nVidia. CUDA lets the programmer take advantage of the massively parallel computing power of an nVidia graphics card to do general-purpose computation. Continue reading ‘What is CUDA? An Introduction’ »