Posts tagged ‘MPI’

As with any parallel program, there is overhead associated with the time processes spend communicating with each other and waiting for each other to finish. This means that parallel programs often do more total work than their serial counterparts, and in that sense are less efficient. However, in most applications what we care about is wall-clock time. A computer user typically cares only about how fast a program runs, not how efficiently it uses the processors. In this article, we will look at some concrete numbers for the overhead that comes with MPI programs.

It only makes sense to parallelize a program, or part of one, when the speedup is worth the effort. But when are there enough computations to justify spending your valuable time coding a parallel program? The answer, of course, depends on the nature of the computations you will be performing.

For this article, the test program works on a large array of up to 100,000,000 integer elements. We will be running two tests:

array[i] = i*i;
array[i] = i*i - (int)sqrt((double)i);

For different array sizes, the program measures how long it takes to complete all the computations. The program was run with a single process, and again with two processes on a dual-core processor. Unfortunately, I do not have access to a quad-core computer.
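The serial baseline for the first test can be sketched in plain C, without MPI. This is a minimal sketch and not the program used for the measurements: the names kernel_fn, fill_squares, and time_kernel are made up for illustration, the array is unsigned so that i*i wraps around for large i instead of overflowing, and clock() reports CPU time, which for a single-threaded run should be close to wall time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* A kernel fills array[0..n-1]; both tests from the article fit this shape. */
typedef void (*kernel_fn)(unsigned *array, size_t n);

/* Test 1: array[i] = i*i. Unsigned arithmetic wraps around for large i
   instead of overflowing a signed int. */
void fill_squares(unsigned *array, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned u = (unsigned)i;
        array[i] = u * u;
    }
}

/* Hypothetical helper: CPU seconds spent running one kernel on n elements.
   For a single-threaded run, CPU time is close to wall time. */
double time_kernel(kernel_fn kernel, unsigned *array, size_t n)
{
    assert(kernel != NULL && array != NULL);
    clock_t start = clock();
    kernel(array, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel(fill_squares, array, n) for a range of n values gives the single-process timings. The second test would be another kernel of the same shape whose loop body also subtracts (unsigned)sqrt((double)i), with math.h included.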

[Figure: mpi_overhead]

As you can see in the graph above, the results are quite interesting. When doing something as simple as a multiplication, one process actually finishes faster than two. The computation is so simple and fast that the communication overhead outweighs the time saved by splitting the work. However, the more complicated and time-consuming calculation i*i - sqrt(i) tells a different story. Because square root is a relatively slow operation, the overhead makes up a much smaller percentage of the total run time. With the more complicated calculation, you can also see that efficiency improves as the dataset gets larger, which makes sense.
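One way to put numbers on "faster" and "more efficient" is to compute speedup and parallel efficiency from a pair of timings. The sketch below uses the usual definitions (speedup is serial time divided by parallel time, efficiency is speedup divided by the number of processes). The function names are my own, and the timings in the note after the code are made up, not read off the graph.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run is than the serial run.
   A value below 1.0 means the parallel run was slower. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of processes.
   1.0 would be ideal scaling. */
double efficiency(double t_serial, double t_parallel, int n_procs)
{
    assert(n_procs > 0);
    return speedup(t_serial, t_parallel) / n_procs;
}
```

For example, with a hypothetical 3.0 seconds on one process and 2.0 seconds on two, speedup(3.0, 2.0) is 1.5 and efficiency(3.0, 2.0, 2) is 0.75. If the two-process run took longer than the single-process run, as in the multiplication test, the speedup would come out below 1.0.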

There are many sources of overhead in MPI programs, including the time spent waiting in blocking operations such as MPI_Send, MPI_Recv, and MPI_Barrier, as well as the actual communication itself. In this example, the two cores were on the same chip, so communication was very fast. However, the overhead of communication can be hundreds or even thousands of times higher when transferring data over a network between processes running on different physical computers. So before you decide to parallelize part of your program with MPI or any other parallel framework, it’s best to make sure that the benefit of parallelization outweighs the added communication overhead. As you can see above, the more complicated and time-consuming your calculations are, the more efficiently your program will run across multiple cores. Otherwise, you might need a very large dataset in order to see any benefit from parallelization. Feel free to use code similar to that below to run a sanity check before writing a full-scale program.

// Fragment: assumes <mpi.h>, <math.h>, and <stdio.h> are included, MPI_Init has been called,
// and rank / nTasks were filled in by MPI_Comm_rank / MPI_Comm_size.
for (int i=nTasks*4; i < C_MAX_ARRAY_SIZE; i += (C_MAX_ARRAY_SIZE >> 4))
{
	int numOfElems = i/nTasks;	// elements per process (any remainder is skipped)
	int startIndex = numOfElems * rank;
	double startTime = MPI_Wtime();
	for (int j=startIndex; j < startIndex + numOfElems; j++)
	{
		// do a simple computation; unsigned math wraps for large j instead of overflowing a signed int
		unsigned int u = (unsigned int)j;
		g_testArray[j] = (int)(u*u - (unsigned int)sqrt((double)j));
	}
	// each process needs to send its results to rank 0
	if (rank == 0)
	{
		// Rank 0 receives the computed slice from every other process.
		MPI_Status status;
		for (int j=1; j < nTasks; j++)
		{
			MPI_Recv(&g_testArray[j*numOfElems], numOfElems, MPI_INT, j, 3, MPI_COMM_WORLD, &status);
		}
	}
	else
	{
		MPI_Send(&g_testArray[startIndex], numOfElems, MPI_INT, 0, 3, MPI_COMM_WORLD);
	}
	double endTime = MPI_Wtime();
	if (rank == 0) printf("i = %d    time = %f\n", i, endTime - startTime);
}
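The advice above, that parallelizing only pays off when the benefit outweighs the communication overhead, can be turned into a rough back-of-the-envelope model. The sketch below is my own simplification, not something measured in this article. It assumes every element costs the same amount of time, the work splits evenly, and the overhead is one fixed cost per run. In the program above, the amount of data sent also grows with the array, so treat the result as a ballpark at best. All function and parameter names are hypothetical.

```c
#include <assert.h>

/* Modeled serial time: n elements at cost_per_elem seconds each. */
double model_serial_time(double n, double cost_per_elem)
{
    return n * cost_per_elem;
}

/* Modeled parallel time: the work is split evenly across n_procs,
   plus a fixed communication overhead (an assumption of this sketch). */
double model_parallel_time(double n, double cost_per_elem, int n_procs, double overhead)
{
    assert(n_procs > 0);
    return n * cost_per_elem / n_procs + overhead;
}

/* Element count at which the two modeled times are equal. Above this,
   the model predicts that the parallel run is faster. */
double break_even_elements(double cost_per_elem, int n_procs, double overhead)
{
    assert(n_procs > 1 && cost_per_elem > 0.0);
    return overhead * n_procs / (cost_per_elem * (n_procs - 1));
}
```

With made-up inputs of 1 nanosecond per element, 1 millisecond of overhead, and two processes, break_even_elements(1e-9, 2, 1e-3) comes out to roughly 2,000,000 elements. A slower calculation per element lowers that threshold, which matches what the sqrt test shows.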

Until now, we have only talked about synchronous, blocking communication in MPI. This tutorial will focus on asynchronous, non-blocking communication with MPI. Asynchronous communication is often the key to achieving high performance in MPI applications. Using asynchronous communication has several advantages.

Continue reading ‘MPI – Tutorial 5 – Asynchronous communication’ »

In our previous tutorial, Thread Communication with MPI, we covered the basics of how to send data between processes. In that tutorial, however, only integers were sent. Sometimes large amounts of data need to be sent between processes.

Continue reading ‘Sending large datasets in MPI’ »

Welcome to the thread communication with MPI tutorial. If you’re new to MPI, I suggest you go back and read the previous tutorials first. Otherwise, continue on to learn basic thread communication with MPI!

Continue reading ‘Thread Communication with MPI’ »

There are two ways to go about debugging MPI applications using Microsoft Visual Studio: the easy way and the hard way. Unfortunately for many, the easy way is only available in the professional versions of Microsoft Visual Studio. If you right-click on your project, select Properties, and open the Debugging page, you can simply choose to use the MPI Cluster Debugger. If you don’t have a professional version of Visual Studio, don’t panic; you can still debug your applications.

Continue reading ‘Debugging an MPI application with Microsoft Visual Studio’ »

Hello, and welcome to the first tutorial in my MPI tutorial series! This tutorial will work fine for Visual Studio 2005 Express, Visual Studio 2005 Standard, Visual Studio 2008 Express, and Visual Studio 2008 Standard edition. Much of this tutorial can also be applied to the professional versions of Visual Studio, with some minor differences.

Continue reading ‘Getting Started with MPI using Visual Studio 2008 Express’ »

MPI, or Message Passing Interface, is a library of functions for C, C++, and even Fortran. The purpose of this library is to let you easily create an application that fully utilizes multiple processors. Not only can you utilize multiple processors with MPI, but you can even run a single application across multiple computers.

Continue reading ‘What is MPI?’ »