False sharing (OpenMP)

False sharing is a performance problem that occurs when threads running on different cores access different variables that happen to occupy the same cache line, and at least one of them writes. Cache lines are typically 64 bytes wide. The hardware cache-coherence protocol keeps data consistent at cache-line granularity, so every write by one thread invalidates the copy held by the other core, and that core's next access must fetch the line again from the writer's cache or from a shared cache level. The threads are accessing distinct variables and no data race exists, but the line bounces between cores and performance degrades as if they were contending on the same location.
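To make "occupy the same cache line" concrete, the following is a minimal sketch that tests whether two addresses fall in the same line. The names CACHE_LINE_BYTES, same_cache_line, and elems_per_line are hypothetical helpers introduced here, and the 64-byte line size is an assumption that should be checked against the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; 64 is typical but not universal. */
#define CACHE_LINE_BYTES 64

/* Returns nonzero if both addresses fall within the same cache line.
 * Relies on the usual flat mapping of pointers to integers. */
static int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_BYTES == (uintptr_t)b / CACHE_LINE_BYTES;
}

/* Number of array elements of the given size that fit in one line. */
static size_t elems_per_line(size_t elem_size)
{
    return CACHE_LINE_BYTES / elem_size;
}
```

With a 4-byte int, elems_per_line(sizeof(int)) is 16, so up to 16 neighbouring elements of an int array can sit in one line and be falsely shared by as many threads.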

#include <omp.h>

int counter[MAX_THREADS];  // adjacent ints, very likely on the same cache line

#pragma omp parallel
{
    int tid = omp_get_thread_num();
    for (int i = 0; i < N; i++)
        counter[tid]++;   // each thread writes a different element, but the same cache line
}

The fix is to avoid per-thread arrays of small values entirely, preferring private variables with a single write-back at the end:

// private accumulation; no shared memory touched inside the loop
int total = 0;
#pragma omp parallel reduction(+:total)
{
    for (int i = 0; i < N; i++)
        total++;
}
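When the individual per-thread results must be kept rather than combined, the same idea can be written by hand: accumulate in a local variable and store it to the shared array once. This is a sketch under stated assumptions, not the only way to do it; count_per_thread and its parameters are hypothetical names, and the serial fallbacks exist only so the example also builds without OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Each thread counts n iterations in a private local and writes its result
 * to counter[tid] exactly once. counter must have room for max_threads
 * entries. Returns the number of threads that actually ran. */
static int count_per_thread(long *counter, int max_threads, long n)
{
    int used = 1;
    #pragma omp parallel num_threads(max_threads)
    {
        int tid = omp_get_thread_num();
        long local = 0;            /* private: register or this thread's stack */
        for (long i = 0; i < n; i++)
            local++;
        counter[tid] = local;      /* the only write to the shared array */
        #pragma omp single
        used = omp_get_num_threads();
    }
    return used;
}
```

The elements of counter may still share a cache line, but each thread now writes that line once instead of n times, so the coherence traffic is negligible.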

When a per-thread array is genuinely needed, wrap each element in a struct aligned to the cache-line size with alignas(64) (assuming 64-byte lines), so that each element starts on, and fills, its own cache line:

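struct padded_int {
    alignas(64) int val;   // aligning the member raises the struct's alignment to 64 (valid in C11 and C++11)
    // sizeof is rounded up to 64 (60 bytes of padding with a 4-byte int); one struct per cache line
};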
 
struct padded_int counter[MAX_THREADS];
 
#pragma omp parallel
{
    int tid = omp_get_thread_num();
    for (int i = 0; i < N; i++)
        counter[tid].val++;   // each element on a separate cache line; no false sharing
}
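The layout the padding is meant to produce can be checked at compile time. The sketch below uses a C11 spelling with alignas on the member; padded_slot, CACHE_LINE_BYTES, and element_stride are hypothetical names, and the 64-byte line size is again an assumption.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumed cache-line size */

struct padded_slot {
    alignas(CACHE_LINE_BYTES) int val;  /* member alignment sets struct alignment */
};

/* Fail the build if the struct does not start on, and fill, one line. */
static_assert(alignof(struct padded_slot) == CACHE_LINE_BYTES,
              "padded_slot must be cache-line aligned");
static_assert(sizeof(struct padded_slot) == CACHE_LINE_BYTES,
              "padded_slot must occupy exactly one cache line");

/* Byte distance between consecutive array elements. */
static size_t element_stride(void)
{
    struct padded_slot pair[2];
    return (size_t)((char *)&pair[1] - (char *)&pair[0]);
}
```

If a later change adds members that push the struct past one line, the second assertion fails at build time instead of silently reintroducing sharing between neighbouring elements.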

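alignas is a keyword in C++11 and later and needs no header. In C11 it is a macro for _Alignas defined in <stdalign.h> (it becomes a keyword in C23), and C accepts it on an object or member declaration, not on the struct tag.

False sharing is silent: the program produces correct results at a fraction of the expected speed, so it tends to surface only when measured speedup falls well below the thread count.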
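One way to make "well below the thread count" measurable is to compute speedup and parallel efficiency from wall-clock timings, for example ones taken with omp_get_wtime(). The helper names below are hypothetical, and both functions assume positive timings and a positive thread count.

```c
/* Speedup of the parallel run over the serial run.
 * Assumes both timings are positive and in the same unit. */
static double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup divided by thread count.
 * A value of 1.0 means perfect scaling. */
static double efficiency(double t_serial, double t_parallel, int threads)
{
    return speedup(t_serial, t_parallel) / threads;
}
```

As an illustration with made-up timings, 8.0 s serial and 2.0 s on 8 threads give a speedup of 4.0 and an efficiency of 0.5. Low efficiency on a loop that writes per-thread slots of a shared array is a hint to check for false sharing, though other causes such as load imbalance or memory bandwidth limits produce the same symptom.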
