This tutorial shows you how to download an HTML page, or any other type of web resource, using C++ or C. It applies only to Windows programs, since the method described here relies on urlmon, a library available only on Windows. We will call a function that reads a web page and saves it to a file. Once the file has been saved, we can read it back through standard file methods. At first glance this may seem inefficient, since hard drive accesses take a long time. In practice, the vast majority of the time is spent downloading the page from the internet. And since we read the file directly after creating it, it will almost certainly still be in the operating system's file cache, so the extra read costs very little.
Step 1: Include and link the appropriate library
#include <urlmon.h>
Aside from including the library header file, you will need to link against urlmon.lib. To do this, right-click on your project in the Solution Explorer window and select Properties from the pop-up menu. Go to the Configuration Properties -> Linker -> Input window. In the “Additional Dependencies” field, type urlmon.lib and press Enter. Apply your changes, and close the project properties window.
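If you would rather not edit the project properties, Microsoft's compiler also accepts a linker directive written in the source file. This is only a minimal sketch and assumes Visual C++; the `_MSC_VER` guard is my addition so that other compilers skip these lines.

```cpp
// MSVC-only alternative to the "Additional Dependencies" project setting.
#ifdef _MSC_VER
#include <urlmon.h>
// Ask the linker to pull in urlmon.lib whenever this source file is built.
#pragma comment(lib, "urlmon.lib")
#endif
```

Other toolchains generally do not honor this pragma. With MinGW, for instance, the usual equivalent would be passing -lurlmon on the link command line.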
Step 2: Choose Unicode or ASCII for your project
There are two character sets that an application can be built with. The first, ASCII, uses 8 bits, or 1 byte, per character (strictly speaking, ASCII itself defines only 7 bits, and Windows calls its 8-bit character sets ANSI). ASCII is often considered outdated, but it is much simpler to deal with. Unicode, as Windows uses it, stores text in 16-bit units, which facilitates multi-lingual programs. The urlmon library provides two sets of functions: one for ASCII, whose names end in A, and one for Unicode, whose names end in W. I have set the project in this tutorial to compile with the ASCII character set. You may choose Unicode, of course, but it is important that you know which character set your project is set to compile with. To find out, open the project properties window and go to the Configuration Properties -> General window. Notice what the “Character Set” field is set to. “Not Set” corresponds to using the ASCII character set.
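One way to sidestep the character-set setting is to call one of the two variants by its full name. The sketch below calls URLDownloadToFileA, the narrow-character version, so it accepts plain char strings whichever Character Set the project uses. The wrapper name downloadAnsi and the non-Windows branch are my own additions for illustration; that branch exists only so the snippet compiles elsewhere, and it always reports failure.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <urlmon.h>
#endif

// Hypothetical helper: downloads `url` to the file `path`.
// Returns true only when the download reports S_OK.
bool downloadAnsi(const std::string& url, const std::string& path)
{
#ifdef _WIN32
    // The explicit "A" suffix selects the narrow-character function,
    // independent of the project's Character Set setting.
    HRESULT hr = URLDownloadToFileA(nullptr, url.c_str(), path.c_str(), 0, nullptr);
    return hr == S_OK;
#else
    // urlmon is Windows-only; this fallback just lets the sketch compile.
    (void)url;
    (void)path;
    return false;
#endif
}
```

A Unicode build would instead call URLDownloadToFileW and pass wchar_t strings.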
Step 3: Download the web page to a file
To download the web page, simply use the URLDownloadToFile function. This function returns an HRESULT error code, which is really just a long. The success code, S_OK, happens to be defined as zero, but rather than comparing the result against a literal 0, it is best to use the named error code definitions, such as S_OK for success. That keeps the intent of the check clear to anyone reading the code.
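When a download fails, the HRESULT is easier to look up if it is shown in hexadecimal, which is how the Windows headers and documentation list these codes. The sketch below is portable and makes two assumptions of my own: std::int32_t stands in for HRESULT (a 32-bit signed value on Windows), and the helper names hresultSucceeded and formatHresult are hypothetical. The sign test mirrors what the SUCCEEDED macro in the Windows headers does.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Failure codes have the top bit set, so they are negative as signed values.
// Non-negative values (S_OK = 0, S_FALSE = 1, ...) indicate success.
bool hresultSucceeded(std::int32_t hr)
{
    return hr >= 0;
}

// Formats an HRESULT-style value as "0x" followed by eight hex digits.
std::string formatHresult(std::int32_t hr)
{
    char buf[11]; // "0x" + 8 hex digits + terminating '\0'
    std::snprintf(buf, sizeof buf, "0x%08X",
                  static_cast<unsigned int>(static_cast<std::uint32_t>(hr)));
    return std::string(buf);
}
```

For example, the generic failure code E_FAIL would be reported as 0x80004005 instead of a large negative decimal number.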
// Assumes <iostream>, <fstream>, <string>, and <urlmon.h> are included,
// along with "using namespace std;".
string webAddress;   // a string avoids overflowing a fixed-size buffer on long URLs
const char szFileName[] = "result.html";

cout << "Please enter web address: ";   // example: http://supercomputingblog.com
cin >> webAddress;

HRESULT hr = URLDownloadToFile(NULL, webAddress.c_str(), szFileName, 0, NULL);
if (hr == S_OK)
{
    cout << "Success!\n";

    // Open the file and print it to the console window.
    // Since the file was just written, it should still be in cache somewhere.
    ifstream fin(szFileName);
    string line;
    // getline on a string grows as needed, so long HTML lines are not cut off.
    while (getline(fin, line))
    {
        cout << line << "\n";
    }
}
else
{
    cout << "Operation failed with error code: " << hr << "\n";
}
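If you want the page contents in memory rather than printed line by line, the saved file can be read in one go. This is a minimal sketch using only the standard library; the function name readFileToString is hypothetical, and the file name you pass would be whatever was given to the download call.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Reads the entire file at `path` into `out`.
// Returns false if the file cannot be opened.
bool readFileToString(const std::string& path, std::string& out)
{
    // Binary mode keeps the bytes exactly as they were downloaded.
    std::ifstream fin(path, std::ios::binary);
    if (!fin)
    {
        return false;
    }
    std::ostringstream contents;
    contents << fin.rdbuf(); // copy the whole stream buffer at once
    out = contents.str();
    return true;
}
```

After a successful download, calling readFileToString("result.html", page) would leave the HTML in the string page, ready for searching or parsing.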