Is Mixing OpenMP® Runtimes Safe?

Note that Betteridge's law applies!

Oct 31, 2023

Introduction

This post is inspired by a Stack Overflow question about compiling OpenMP code with both GCC and LLVM (clang) and then linking it against both runtime libraries.

The questioner reasonably asks whether this is a safe thing to do. Here I will demonstrate that, as you may expect from my invocation of Betteridge’s law, it is not!

What is the Problem?

Suppose I have a large code that uses some libraries, and both the code and the libraries use OpenMP. It may be that the libraries are only available in binary form, and have been compiled with GCC, while I have chosen to use LLVM for my code.

As a result, I seem to need to have both the LLVM OpenMP runtime library (libomp) and the GCC one (libgomp) linked into the final executable (or loaded as dynamic libraries). That appears to be necessary because, although OpenMP is a portable application programming interface (API), it does not specify an application binary interface (ABI) for the code that the compiler generates for OpenMP constructs.

As a simple example, consider this trivial program, which we’ll use throughout [1]:-

#include <stdio.h>
#include <omp.h>

int main (int argc, char **argv)
{
  printf ("Outside parallel: omp_in_parallel() = %s, omp_max_threads() = %d\n", 
           omp_in_parallel() ? "True":"False", 
           omp_get_max_threads());
#pragma omp parallel
  {
    printf ("Inside parallel: omp_in_parallel() = %s, I am thread %d\n", 
            omp_in_parallel() ? "True":"False", omp_get_thread_num());
  }
  return 0;
}

If we compile it with gcc -c -fopenmp and look at the undefined (external) symbols in the object file, we see this:-

% nm mixedLibs.o | grep U
                 U _GOMP_parallel
                 U _omp_get_max_threads
                 U _omp_get_thread_num
                 U _omp_in_parallel
                 U _printf

whereas if we compile it with clang -c -fopenmp, we see this set of symbols:-

% nm mixedLibs.o | grep U
                 U ___kmpc_fork_call
                 U _omp_get_max_threads
                 U _omp_get_thread_num
                 U _omp_in_parallel
                 U _printf

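We can see that, as you’d expect, the named interface functions from OpenMP are the same (_omp_get_max_threads, _omp_get_thread_num, _omp_in_parallel), but that the two compilers use different interfaces to their runtimes for the call required to enter a parallel region (_GOMP_parallel for GCC, ___kmpc_fork_call for LLVM). On macOS, nm shows every C symbol with an extra leading underscore, so the underlying C functions are omp_get_max_threads, GOMP_parallel, __kmpc_fork_call and so on.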
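To make the three groups concrete, here is a small C sketch that triages symbol names as printed by nm. The names classify_omp_symbol, omp_abi_name and enum omp_abi are hypothetical, and the prefix rules (GOMP_ for GCC, kmpc_ after any leading underscores for LLVM, omp_ for the shared API) are a heuristic assumption rather than a complete list of either runtime's entry points. Note that it tells you which interface a call expects, not which library will end up satisfying it.

```c
#include <assert.h>
#include <string.h>

/* Which runtime interface an undefined symbol from `nm` belongs to. */
enum omp_abi { OMP_ABI_NONE, OMP_ABI_API, OMP_ABI_GCC, OMP_ABI_LLVM };

/* Classify a symbol name as printed by nm. Leading underscores are skipped,
   so the macOS spellings ("_GOMP_parallel", "___kmpc_fork_call") and the
   Linux spellings ("GOMP_parallel", "__kmpc_fork_call") are treated alike.
   The prefixes are a heuristic, not a complete list of entry points. */
enum omp_abi classify_omp_symbol(const char *sym)
{
    assert(sym != NULL);
    while (*sym == '_')
        sym++;
    if (strncmp(sym, "GOMP_", 5) == 0)
        return OMP_ABI_GCC;  /* e.g. GOMP_parallel, called by GCC-compiled code */
    if (strncmp(sym, "kmpc_", 5) == 0)
        return OMP_ABI_LLVM; /* e.g. __kmpc_fork_call, called by clang-compiled code */
    if (strncmp(sym, "omp_", 4) == 0)
        return OMP_ABI_API;  /* standard API: same name in both runtimes */
    return OMP_ABI_NONE;
}

/* Human-readable label for a classification. */
const char *omp_abi_name(enum omp_abi abi)
{
    switch (abi) {
    case OMP_ABI_API:  return "OpenMP API (either runtime)";
    case OMP_ABI_GCC:  return "GCC interface (libgomp)";
    case OMP_ABI_LLVM: return "LLVM interface (libomp)";
    default:           return "not OpenMP";
    }
}
```

Fed the names from the two nm listings above, it classifies _GOMP_parallel as the GCC interface, ___kmpc_fork_call as the LLVM interface, the three _omp_* functions as the shared API, and _printf as not OpenMP.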

It therefore looks as if a program that mixes OpenMP code compiled by the two compilers will need both runtimes to be available. Which brings us back to the original question of whether doing that is safe.

The Answer is “No”

Consider this example, in which I compile and link the code above with clang while explicitly forcing the GCC OpenMP runtime to be dynamically linked in (because I am pretending I have some other code that needs it), like this:-

% clang -fopenmp mixedLibs.c -L/opt/homebrew/opt/gcc/lib/gcc/current\
        -lgomp

What happens when I run the code?

% ./a.out
Outside parallel: omp_in_parallel() = False, omp_max_threads() = 8
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
Inside parallel: omp_in_parallel() = False, I am thread 0
%

That is clearly utterly broken!

What is Going On?

As we saw above, the OpenMP interface functions are not mangled or otherwise messed around with by the compiler, so they are present with the same names (and interfaces) in the runtime libraries from both compilers. However, the entry points that the compiler-generated code calls are different.

So, what’s going on here is this. When the dynamic linker looks for the omp_* functions, it finds them in the first OpenMP runtime it searches, which here is GCC’s libgomp. However, when the code calls into the runtime to enter the parallel region, libgomp has no definition for ___kmpc_fork_call, so the one in LLVM’s libomp is called instead.

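Therefore, the internal library state queried by the omp_* functions is libgomp’s, but as far as libgomp is aware the code has not gone parallel, since no call to _GOMP_parallel has occurred. libgomp knows nothing about the threads that libomp created, so when they query it, it answers as it would for a serial program: every thread is told that it is thread 0 and that it is not in a parallel region.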

The result is the utter brokenness which you can see in the execution. Although the code is executing in parallel, it doesn’t think it is.
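Because this failure is silent, it may be worth checking for it at start-up. The sketch below (omp_runtime_consistent is a hypothetical name) compares the number of threads that actually executed a parallel region, counted with a reduction whose implementation does not go through the omp_* query functions, with the team size that omp_get_num_threads() reports. In the broken build above one would expect these to disagree (eight threads ran, but each was told its team has one thread); in a consistent build they match. When compiled without OpenMP the check simply returns true.

```c
#include <assert.h>
#include <stdbool.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Check that the runtime answering omp_* queries agrees with the runtime
   that actually forked the team: the number of threads that ran the region
   must equal the team size reported by omp_get_num_threads(). */
bool omp_runtime_consistent(void)
{
#ifdef _OPENMP
    int executed = 0; /* threads that really executed the region body */
    int reported = 0; /* team size as reported through the omp_* API */
#pragma omp parallel reduction(+ : executed) reduction(max : reported)
    {
        executed += 1;
        reported = omp_get_num_threads();
    }
    assert(executed >= 1); /* the encountering thread always runs the region */
    return executed == reported;
#else
    return true; /* built without OpenMP: there is no runtime to disagree */
#endif
}
```

A program could call assert(omp_runtime_consistent()) (or print a warning) early in main, before doing any real work.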

Is There Any Way Around This?

For simple OpenMP codes, there is a chance that you can get around this, because the LLVM runtime (libomp) provides many of the GCC interfaces. So, if we repeat the test above, but use gcc to compile and link while forcing the LLVM runtime to be present, we see this:-

% gcc-13 -fopenmp mixedLibs.c -L/opt/homebrew/opt/llvm/lib -lomp
% ./a.out
Outside parallel: omp_in_parallel() = False, omp_max_threads() = 8
Inside parallel: omp_in_parallel() = True, I am thread 0
Inside parallel: omp_in_parallel() = True, I am thread 2
Inside parallel: omp_in_parallel() = True, I am thread 3
Inside parallel: omp_in_parallel() = True, I am thread 6
Inside parallel: omp_in_parallel() = True, I am thread 7
Inside parallel: omp_in_parallel() = True, I am thread 5
Inside parallel: omp_in_parallel() = True, I am thread 4
Inside parallel: omp_in_parallel() = True, I am thread 1
% otool -L a.out
a.out:
	/opt/homebrew/opt/llvm/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
	/opt/homebrew/opt/gcc/lib/gcc/current/libgomp.1.dylib (compatibility version 2.0.0, current version 2.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)

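You can see that the code ran correctly, despite both runtimes being present. Of course, this remains dangerous, since both libraries are linked in. It works here only because libomp comes first (as the otool output shows), so it supplies both the GOMP_* entry points and the omp_* functions. If the order were reversed, or if the code needed a GOMP_* function that libomp does not provide, the calls would once again be split between the two runtimes.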
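If you cannot avoid having both runtimes loaded, you can at least check at run time which library each entry point is coming from. The sketch below uses dlsym with RTLD_DEFAULT and dladdr (both extensions beyond base POSIX on glibc, hence _GNU_SOURCE); symbol_provider and same_provider are hypothetical helper names. It reports the first loaded image that exports a name, which is a reasonable proxy for, but not guaranteed to match, the binding used by a particular call site (macOS, for example, normally records the providing library for each symbol at link time). On older glibc, link with -ldl.

```c
#define _GNU_SOURCE /* needed for dladdr and RTLD_DEFAULT with glibc; must precede includes */
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Path of the first loaded image that exports `symbol`, or NULL if none does.
   Pass the C name (e.g. "omp_in_parallel"); on macOS dlsym adds the leading
   underscore itself. */
const char *symbol_provider(const char *symbol)
{
    assert(symbol != NULL);
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    Dl_info info;
    if (addr == NULL || dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}

/* True if both symbols are found in the same loaded image. */
bool same_provider(const char *a, const char *b)
{
    const char *pa = symbol_provider(a);
    const char *pb = symbol_provider(b);
    return pa != NULL && pb != NULL && strcmp(pa, pb) == 0;
}
```

In the gcc-13 build linked with -lomp, you would hope that same_provider("GOMP_parallel", "omp_in_parallel") is true and that symbol_provider("omp_in_parallel") names libomp.dylib, whereas in the broken clang build you would expect same_provider("__kmpc_fork_call", "omp_in_parallel") to be false.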

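It is better simply to link only with the LLVM runtime, since then, if the GCC-compiled code requires symbols that libomp does not provide, you’ll find out (typically as undefined-symbol errors when you link) rather than having those calls silently resolved from libgomp.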

% gcc-13 -c -fopenmp mixedLibs.c 
% clang -fopenmp mixedLibs.o
% ./a.out
Outside parallel: omp_in_parallel() = False, omp_max_threads() = 8
Inside parallel: omp_in_parallel() = True, I am thread 0
Inside parallel: omp_in_parallel() = True, I am thread 4
Inside parallel: omp_in_parallel() = True, I am thread 3
Inside parallel: omp_in_parallel() = True, I am thread 6
Inside parallel: omp_in_parallel() = True, I am thread 5
Inside parallel: omp_in_parallel() = True, I am thread 7
Inside parallel: omp_in_parallel() = True, I am thread 2
Inside parallel: omp_in_parallel() = True, I am thread 1
% otool -L a.out
a.out:
	/opt/homebrew/opt/llvm/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

Of course, if people are so foolish as to statically link an OpenMP runtime into some library, there’s nothing you can do about that, other than invoke $deity.

One somewhat unpleasant possibility, if you have a dynamic shared library that has been linked against libgomp, is to provide a symbolic link from the appropriate libgomp name to the appropriate libomp. Since the dynamic linker looks for files by name, it can be deceived by such a link. This is clearly a somewhat dangerous approach, though!

What Did We Learn?

Mixing OpenMP runtimes in the same program can easily cause fatal breakage, so you shouldn’t do it.

At least for non-offload codes, the LLVM runtime tries to provide the interfaces required by GCC-compiled OpenMP code, so that you need only link with libomp.

Environment

I am running all of these examples on an M1 Mac, using recently installed compilers from Homebrew: “gcc version 13.2.0 (Homebrew GCC 13.2.0)” and “Homebrew clang version 17.0.3”.

If you are on Linux, you will need to use ldd rather than otool to investigate which libraries are being used.

The library paths you need to find the appropriate OpenMP from the different compilers will obviously depend on the configuration of your machine.

[1] I know, it shouldn’t be doing printfs from inside a parallel region without guarding them, but that is not the point here!

CPU fun
