Loughborough University

IT Services : High Performance Computing

OpenMP Implementation


Implementation

In this simple C++ example, the work of assigning random values to an array shared by all threads is farmed out across the threads in parallel.

#include <iostream>
#include <cstdlib>
#include <vector>
#include <omp.h>

using namespace std;

int main(int argc, char **argv) {
  const int s = 100; // 100 elements over all threads.
  vector<double> v(s); // Allocate 100 elements.

  /*
   * Parallel for loop: the iterations are distributed over the
   * team of threads automatically, as each calculation is
   * independent.
   */
#pragma omp parallel for
  for (int i = 0; i < s; i++)
    v[i] = drand48();

  return 0;
}
      

Note: when writing programs you should avoid entering a #pragma omp parallel for region repeatedly, since every entry incurs thread start-up and shut-down overhead.

For example, the following is not ideal:

for (int i = 0; i < 1000; i++)
#pragma omp parallel for
  for (int j = 0; j < 1000; j++) {
    // do stuff
  }
      

In preference to the above, either restructure the algorithm or place an enclosing #pragma omp parallel outside the outer loop. This creates the team of threads once, and the team is then reused by a #pragma omp for statement on the inner loop. More care must be taken in this case over which variables are shared between threads.

#pragma omp parallel
{
  for (int i = 0; i < 1000; i++)
#pragma omp for
    for (int j = 0; j < 1000; j++) {
      // do stuff
    }
}
      

OpenMP 4.0

OpenMP 4.0 adds pragmas for vector (SIMD) computation, most notably #pragma omp simd. See Vectorisation for a discussion of vectorisation.