Sunday, December 14, 2008

Supercomputing Course: OpenMP Syntax and Directives, Part I

The OpenMP standard covers the C, C++, and Fortran programming languages. There is one syntax for C and C++, and another for Fortran. In this tutorial, I will focus on the C/C++ syntax.*

The basic format of an OpenMP directive is as follows:
#pragma omp directive-name [clause] newline

(Incidentally, the word pragma is Greek for "thing.") If you have a compiler that doesn't understand OpenMP, it will simply ignore the pragma lines, making it possible to use the same code file for both OpenMP-parallelized and serial code.**
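Here is a minimal sketch of my own showing what that means in practice. It deliberately uses no runtime library routines, so the same file really does work both ways: compiled with OpenMP support (with GCC, for example, that means adding the -fopenmp flag), each thread executes the block once; compiled without it, the pragma line is simply skipped and the block runs once, serially.

#include <stdio.h>

int main(void) {
  /* With OpenMP enabled, every thread executes this block once.
     Without OpenMP, the pragma is ignored and the block runs once. */
  #pragma omp parallel
  {
    printf("hello\n");
  }
  return 0;
}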

A few more points about the syntax:
  • Note the newline at the end of the directive line. We will see some directives that require braces around the code to be parallelized. You must put the opening curly brace on the line after the directive, not on the same line as the directive the way many people do with for loops (see the short sketch after this list).
  • The directives are case sensitive, and they follow all the standard C/C++ pragma rules.
  • If your directive is really long, you can continue it on the next line by putting a backslash as the final character of the line being continued (also shown in the sketch below).
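Here is a quick sketch showing both the brace placement and the line-continuation rule at once. The variable names tid and count are just placeholders of mine:

#pragma omp parallel private(tid) \
                     shared(count)
{   /* the opening brace goes on its own line, after the directive */
  /* parallel code here */
}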
With that out of the way, let's talk about the most important directive, the parallel directive. Its basic syntax is as follows:
#pragma omp parallel private(list) shared(list)
{
/* put parallel code here */
}

The parallel directive is used to create a block of code that is executed by multiple threads. The private(list) and shared(list) clauses are optional arguments to the directive -- they declare private and shared variables, which we'll talk about later.
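We'll save the full discussion of those clauses for later, but just so you can see what they look like in place, here is a minimal sketch (the variables a and b are my own placeholders):

int a = 0;   /* shared: there is one copy of a, visible to every thread */
int b;       /* private: each thread gets its own copy of b */
#pragma omp parallel shared(a) private(b)
{
  b = a + 1;   /* each thread works with its own b; a is only read */
}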

Here is a sample code -- the OpenMP equivalent of "Hello World":

#include <stdio.h>
#include <omp.h>
int main (int argc, char *argv[]) {
  int tid;
  printf("Hello world from threads:\n");
  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    printf("(%d)\n", tid);
  }
  printf("I am sequential now\n");
  return 0;
}
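To actually build and run it with a specific number of threads, one common way -- assuming you saved the example as hello.c and are using GCC with a bash-style shell -- is to compile with OpenMP enabled and set the standard OMP_NUM_THREADS environment variable:

gcc -fopenmp hello.c -o hello      # enable OpenMP when compiling
OMP_NUM_THREADS=4 ./hello          # ask the OpenMP runtime for four threads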

Suppose we run this program using four threads. What will be the output of this example?
Hello world from threads:
(0)
(1)
(2)
(3)
I am sequential now

may be the answer you're counting on, but it may not be what you receive. The output
Hello world from threads:
(3)
(1)
(0)
(2)
I am sequential now

is just as valid and just as likely. The point is that the operating system schedules the threads, and you should not count on them executing in any particular order.
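Since you can't count on the order, it sometimes helps while experimenting to have each line also report the size of the thread team. One way to do that -- using omp_get_num_threads(), a runtime library routine this post hasn't formally introduced yet -- is to change the parallel block in the example above to:

  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    /* print "this thread (out of total threads)" */
    printf("(%d of %d)\n", tid, omp_get_num_threads());
  }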

The most common directive other than the parallel directive (which you must use in every program to enclose the parallel section of code) is the for directive, which is the workhorse of the parallel for construct. Used inside a parallel region, its syntax is as follows:

#pragma omp for schedule(type [,chunk]) \
private(list) shared(list) nowait

/* the for loop to be parallelized goes here, immediately
   after the directive -- no curly braces this time */

where type = {static, dynamic, guided, runtime}. If nowait is specified, then threads that finish their share of the loop won't wait around for the other threads to finish the loop before moving on to the next thing.
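To make the template concrete, here is a bare-bones sketch of my own (the array a, its length n, and the loop index i are just placeholders), ahead of the fuller example promised at the end of this post:

#pragma omp parallel shared(a, n) private(i)
{
  #pragma omp for schedule(static) nowait
  for (i = 0; i < n; i++) {
    a[i] = 2 * a[i];   /* each iteration is executed by exactly one thread */
  }
}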

The default scheduling is implementation dependent. The scheduling choices are somewhat intuitive in meaning. Static scheduling simply means that the ID of the thread assigned to a given iteration is a function of the iteration number and the number of threads, and is determined at the beginning of the loop. This is a simple way to assign iterations, but it could be a bad idea if different iterations involve different amounts of work. For example, if every fourth iteration has extra work, and we have four threads, then the same thread will be assigned that work-laden iteration every time, and we will have a load imbalance.

In that case, the dynamic scheduling algorithm may be a better choice. Iterations are handed out to threads over the course of the loop's execution: whenever a thread finishes the chunk it is working on, it grabs the next chunk that hasn't been claimed yet. In other words, threads are assigned work based on how quickly they can complete it, and the load evens out.
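As a sketch of how that looks in code (again, my own illustration rather than the example promised below; expensive_work and cheap_work are hypothetical functions, and a, n, and i are placeholders), the only change from the static version is the schedule clause itself:

#pragma omp parallel shared(a, n) private(i)
{
  /* schedule(dynamic, 1) hands out one iteration at a time to whichever
     thread is free, so the heavy every-fourth iteration doesn't keep
     landing on the same thread */
  #pragma omp for schedule(dynamic, 1)
  for (i = 0; i < n; i++) {
    if (i % 4 == 0)
      a[i] = expensive_work(i);   /* hypothetical heavy computation */
    else
      a[i] = cheap_work(i);       /* hypothetical light computation */
  }
}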

You are probably wondering how to use the parallel for construct, but this entry has gotten very long so I will stop here for now. Next time, we will discuss the parallel for further, and I will provide an example.



* This is because as a computer scientist, I think that C and C++ are pretty nice languages, and Fortran, which in a just world would be nothing more than a fascinating relic of the past, is an undead zombie language that refuses to die.

** This is true for OpenMP directives, but if you use any of the runtime library routines, you will need to do a little more to make things work. We will talk about this when we talk about runtime library routines.

1 comment:

Unknown said...

Darling, I love your OpenMP course. I need to know it, but I don't yet, and you are helping.

I will, however, take issue with your assessment of Fortran's black sheep status. Have you seen the newest 2003 standard? It has taken a leap past C++ in terms of object orientation. I'm not an expert on such matters, but I am starting to incorporate it, and I am having success. It looks cool too.