Author Archive

Image warps and other distortions are significantly more complicated than simple image processing techniques such as convolution. This tutorial will cover how to twist an image in the center. This exact code can be modified to do twists or other types of image warps. Continue reading ‘Image twist and swirl algorithm’ »

Understanding the basic memory architecture of whatever system you’re programming for is necessary to create high performance applications. Most desktop systems consist of large amounts of system memory connected to a single CPU, which may have two or three levels of fully coherent cache. Before you get started with CUDA, you should read this to understand the basic memory hierarchy of modern CUDA capable compute devices. Continue reading ‘CUDA Memory and Cache Architecture’ »

Using SSE to process images or video is essential to achieving good performance. Most popular multimedia applications use SSE to greatly accelerate application performance. Unfortunately, like everything in life, if SSE is used incorrectly it can actually perform worse than non-SSE code. This article will take you through some code and discuss the performance of each. Continue reading ‘Image Processing with SSE’ »

If you use Microsoft’s Visual Studio to develop your applications, chances are you either have the express or professional editions, which are free or $549 respectively. Unfortunately, neither of these editions comes with a code profiler! Instead, if you want to use a built-in code profiler for Visual Studio out of the box, you’ll need to have either the premium or ultimate edition for $5,469 or $11,899 respectively. No joke! Luckily, you don’t need to use Visual Studio’s built-in profiler to effectively and easily profile your code.

Continue reading ‘How to profile C++ code in Visual Studio for free’ »

Searching is a common task in computer science, and fortunately, it is also perfectly suited for CUDA. For this article, we’re talking about searching through an unsorted text file for a specific word or phrase. For example, if you have a 50 megabyte text file open in Microsoft Visual Studio, you’re sure to notice that searching for a word can take several seconds, which is more than any person wants to wait just to find a word in a document. This article will demonstrate a simple kernel which can perform simple string matches.

Continue reading ‘Search algorithm with CUDA’ »

Taking an image and making it look like an oil painting is not only visually impressive, but also easy, from an algorithmic point of view. This page will show you how to write code to achieve the oil painting effect.

Continue reading ‘Oil Painting Algorithm’ »

Unlike most programming languages, CUDA is coupled very closely to the hardware implementation. While x86 processors have not changed very much over the past 10 years, CUDA hardware has changed architecture significantly several times. First came the introduction of CUDA with the 80 series, followed shortly by the 200 series, and now NVIDIA has begun selling cards in the 400 series, namely the GTX 480 and GTX 470.

Continue reading ‘Optimizing CUDA programs for GTX 400 series’ »

Taking the square root of a floating point number is essential in many engineering applications. Whether you are doing nBody simulations, simulating molecules, or linear algebra, the ability to accurately and quickly perform thousands or even millions of square root operations is essential. Unfortunately, the square root functions on most CPUs are very time consuming, even with specialized SSE instructions. Fortunately, GPUs have specialized hardware to perform such square root operations extremely fast. CUDA, NVIDIA’s solution to extremely high performance parallel computing, puts the onboard specialized hardware to full use, and easily outperforms modern Intel or AMD CPUs by a factor of over a hundred.

Continue reading ‘Performance of sqrt in CUDA’ »

Image convolution is the most vital image processing algorithm available. Using simple 2-D convolution, you can blur, sharpen, emboss, and even detect edges in an image. Not only is convolution powerful, it is also very easy to perform. Simply put, the value of a modified pixel is determined solely by its original value summed with the weighted values of its neighboring pixels. After the weighted sum is completed, a division takes place to normalize the value of the pixel, usually so that the brightness of the image remains the same. Sometimes, an offset can be added after the normalization for certain effects. Continue reading ‘Image Convolution with GDI+’ »

This tutorial shows you how to download an HTML page, or any other type of web page, using C++ or C. It applies only to Windows programs, since the method described here uses a library written for Windows only. We will call a function that reads a web page and saves it to a file. After the file is created and saved, we can read it through standard methods. At first glance, this method may seem very inefficient, since hard drive accesses take a long time. In practice, the vast majority of the time is spent downloading the web page from the internet. Since we read the file directly after creating it, the file will most likely still be in cache, so reading it back costs little.

Step 1: Include and link the appropriate library

#include <urlmon.h>

Aside from including the library header file, you will need to link urlmon.lib. To do this, right click on your project in the Solution Explorer window, and select Properties from the pop-up menu. Go to the Configuration Properties -> Linker -> Input window. In the “Additional Dependencies” field, type urlmon.lib and press enter. Apply your changes, and close the project properties window.
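As an alternative to editing the project properties, Visual C++ also accepts a linker directive written directly in the source file. The fragment below is a small sketch of that approach. The `_MSC_VER` guard is an addition here, not something from the tutorial's source, so that compilers other than Visual C++ simply skip the pragma.

```cpp
// Place near the top of the source file, next to #include <urlmon.h>.
#ifdef _MSC_VER
// Visual C++ only: asks the linker to pull in urlmon.lib for this file
#pragma comment(lib, "urlmon.lib")
#endif
```

With this line in place, the “Additional Dependencies” field can be left unchanged, which is convenient when the same source file is shared between several projects.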

Step 2: Choose Unicode or ASCII for your project

There are two character sets that can be used in an application. The first, ASCII, uses only 8 bits, or 1 byte, per character. ASCII is often considered outdated, but it is much simpler to deal with. Unicode uses 16 bits per character, which facilitates multilingual programs. The urlmon library has two sets of functions: one for ASCII and one for Unicode. I have set the project in this tutorial to compile with the ASCII character set. You may choose to use Unicode, of course, but it is important that you know which character set your project is set to compile with. To find out, open the project properties window and go to the Configuration Properties -> General window. Note what the “Character Set” field is set to. “Not Set” corresponds to using the ASCII character set.
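In practice, the two sets of functions show up as URLDownloadToFileA (narrow char strings) and URLDownloadToFileW (wide wchar_t strings), and the plain name URLDownloadToFile resolves to one or the other depending on the project's character set. If you do switch to Unicode, narrow strings must be widened before they are passed in. The sketch below shows one way to do that for plain 7-bit ASCII text only. The helper name widenAscii is hypothetical and not part of the tutorial's source. For anything beyond ASCII, the Windows function MultiByteToWideChar is the usual tool.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a 7-bit ASCII string to a wide string.
// Throws if the input contains a byte outside the ASCII range, because
// such bytes cannot be widened without knowing their encoding.
std::wstring widenAscii(const std::string& narrow)
{
    std::wstring wide;
    wide.reserve(narrow.size());
    for (unsigned char c : narrow)
    {
        if (c > 0x7F)
        {
            throw std::invalid_argument("non-ASCII byte in input");
        }
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}
```

In a Unicode build, the result of `widenAscii(address).c_str()` could then be passed wherever the download function expects a wide string.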

Step 3: Download the web page to a file

To download the web page, simply use the URLDownloadToFile function. This function returns an HRESULT error code, which is really just a long. When dealing with HRESULTs, keep in mind that zero indicates success, so treating the return value as a boolean gives the wrong answer. It is therefore best to compare explicitly against the error code definitions, such as S_OK for success.
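Separately from the download code that follows, the HRESULT convention can be illustrated with a small portable sketch. The type alias and constant names below are stand-ins chosen for this example so that it compiles without the Windows headers. In a real Windows program you would use HRESULT, S_OK, E_OUTOFMEMORY, INET_E_DOWNLOAD_FAILURE and the SUCCEEDED macro from the headers instead. The numeric values are the ones documented for those codes.

```cpp
#include <cstdint>
#include <string>

// Stand-in for the Windows HRESULT type, which is a 32-bit signed value.
using HResult = std::int32_t;

// Stand-ins for the header definitions (assumed names, documented values).
constexpr HResult kOk = 0;                                                 // S_OK
constexpr HResult kOutOfMemory = static_cast<HResult>(0x8007000EU);        // E_OUTOFMEMORY
constexpr HResult kDownloadFailure = static_cast<HResult>(0x800C0008U);    // INET_E_DOWNLOAD_FAILURE

// Mirrors the SUCCEEDED macro: failure codes have the high bit set,
// so every non-negative value counts as success.
constexpr bool succeeded(HResult hr) { return hr >= 0; }

// Turns the codes documented for URLDownloadToFile into readable text.
std::string describeDownloadResult(HResult hr)
{
    if (hr == kOk) return "success";
    if (hr == kOutOfMemory) return "out of memory";
    if (hr == kDownloadFailure) return "download failure";
    return succeeded(hr) ? "other success code" : "other failure";
}
```

Note that `if (hr)` would be true for every failure code and false for S_OK, which is exactly the opposite of what a boolean check suggests.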

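#include <windows.h>
#include <urlmon.h>	// also link urlmon.lib (see Step 1)
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
	char webAddress[256];
	char szFileName[80] = "result.html";

	cout << "Please enter web address: ";	// example: https://supercomputingblog.com
	// Bounded read: at most 255 characters are stored, so the buffer cannot overflow
	cin.getline(webAddress, sizeof(webAddress));

	// Download the page at webAddress and save it as szFileName
	HRESULT hr = URLDownloadToFile(NULL, webAddress, szFileName, 0, NULL);
	if (hr == S_OK)
	{
		cout << "Success!\n";
		// Open the file and print it to the console window
		// Since the file was just written, it should still be in cache somewhere.
		ifstream fin(szFileName);
		string line;
		// std::getline grows the string as needed, so long HTML lines are not cut off
		while (getline(fin, line))
		{
			cout << line << "\n";
		}
	}
	else
	{
		// HRESULT failure codes are conventionally shown in hexadecimal
		cout << "Operation failed with error code: 0x" << hex << static_cast<unsigned long>(hr) << "\n";
	}
	return 0;
}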
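If you want to parse the page rather than print it, it is often more convenient to load the saved file into a single string. The sketch below uses only the standard library, so it is independent of the download step. The function name readWholeFile is hypothetical and not part of the tutorial's source.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the entire contents of the file at `path`,
// or throws std::runtime_error if the file cannot be opened.
std::string readWholeFile(const std::string& path)
{
    // Binary mode keeps the bytes exactly as they were saved
    std::ifstream fin(path, std::ios::binary);
    if (!fin)
    {
        throw std::runtime_error("could not open " + path);
    }
    std::ostringstream contents;
    contents << fin.rdbuf();	// copy the whole stream in one step
    return contents.str();
}
```

After a successful download, a call such as `readWholeFile("result.html")` would give you the page as one std::string that you can search with the usual string functions.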

Download the source code

You can download the source code for this tutorial here