False sharing is a performance problem that occurs when two or more threads on different cores repeatedly access different variables that happen to occupy the same cache line, with at least one of them writing. Cache lines are typically 64 bytes wide. The hardware cache-coherence protocol keeps data consistent across cores at the granularity of a whole line, so every write by one thread invalidates the copy held by the other core, which must then re-fetch the line from the writer's cache, a shared cache level, or main memory. The threads access distinct variables and no data race exists, but performance degrades as if they were contending for the same location.
    #include <omp.h>

    /* MAX_THREADS and N are compile-time constants defined elsewhere. */
    int counter[MAX_THREADS];  // adjacent ints, very likely on the same cache line

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        for (int i = 0; i < N; i++)
            counter[tid]++;    // each thread writes a different element, but the same cache line
    }
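To make the layout concrete, the following is a minimal sketch that checks which elements of an int array fall on the same cache line. It assumes 64-byte lines and a 4-byte int, which holds on most current desktop and server platforms but is not guaranteed. CACHE_LINE, line_index, same_line, and demo are illustrative names, not part of any library.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size in bytes; check the target CPU */

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

/* Index of the cache line containing address p, under the CACHE_LINE assumption. */
static uintptr_t line_index(const void *p) {
    return (uintptr_t)p / CACHE_LINE;
}

/* True when both addresses fall on the same (assumed) cache line. */
static bool same_line(const void *a, const void *b) {
    return line_index(a) == line_index(b);
}

/* Starting the array on a line boundary makes the grouping predictable. */
alignas(CACHE_LINE) static int demo[32];
```

Under these assumptions, same_line(&demo[0], &demo[15]) is true and same_line(&demo[0], &demo[16]) is false: sixteen adjacent ints fit on one line, so up to sixteen threads incrementing neighbouring elements would all contend for it.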
The simplest fix is to avoid per-thread arrays of small values, accumulating in private variables and writing back once at the end:
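    // private accumulation; no shared memory touched inside the loop
    int total = 0;

    #pragma omp parallel reduction(+:total)
    {
        for (int i = 0; i < N; i++)
            total++;    // updates this thread's private copy of total
    }
    // the private copies are summed into the shared total when the region ends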
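As a rough illustration of the same idea written by hand (not a description of how any OpenMP runtime implements reductions), the sketch below accumulates in a variable declared inside the parallel region and adds it to the shared total once per thread. The function name count_with_private_accumulator and its parameter n are assumptions made for the example.

```c
#include <assert.h>

/* Every thread counts to n privately, then adds its count to the shared total once. */
long count_with_private_accumulator(int n) {
    assert(n >= 0);
    long total = 0;                  /* shared across the team */

    #pragma omp parallel
    {
        long local = 0;              /* declared inside the region, so private to each thread */
        for (int i = 0; i < n; i++)
            local++;                 /* no shared memory touched inside the loop */

        #pragma omp atomic
        total += local;              /* single write-back per thread */
    }
    return total;                    /* n times the number of threads that ran */
}
```

The shared variable is written once per thread rather than once per iteration, so any coherence traffic on its cache line is negligible compared with the loop itself.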
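When a per-thread array is genuinely needed, wrap each element in a struct whose member is declared with alignas(64). The struct takes on the 64-byte alignment of its member and its size is padded up to 64 bytes, so each array element occupies its own cache line (assuming 64-byte lines):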
    #include <omp.h>
    #include <stdalign.h>  // provides the alignas macro in C11

    struct padded_int {
        alignas(64) int val;  // alignas on the member is valid in both C11 and C++11
        // 60 bytes of implicit padding follow; one struct per cache line
    };

    struct padded_int counter[MAX_THREADS];

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        for (int i = 0; i < N; i++)
            counter[tid].val++;    // each element on a separate cache line; no false sharing
    }
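The layout claim can be checked at compile time. The sketch below is an example under stated assumptions rather than part of the original code: it uses a hypothetical struct name padded_counter, places alignas on the member so that it compiles as C11, and assumes 64-byte cache lines.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size in bytes */

struct padded_counter {
    alignas(CACHE_LINE) int val;  /* member alignment raises the struct's alignment */
};

/* Size and alignment both equal one assumed cache line. */
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "one element per cache line");
static_assert(alignof(struct padded_counter) == CACHE_LINE,
              "elements start on a line boundary");

/* Byte distance between consecutive array elements. */
static size_t element_stride(void) {
    static struct padded_counter pair[2];
    return (size_t)((char *)&pair[1] - (char *)&pair[0]);
}
```

If a later change adds members and pushes the struct past one line, the first static_assert fails at compile time instead of the padding assumption silently breaking.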
alignas is standardized in C11 and C++11. In C++ it is a keyword and needs no header; in C11 the keyword is _Alignas, and <stdalign.h> provides alignas as a macro for it.

False sharing is silent: the program produces correct results at a fraction of the expected speed, so it tends to surface only when measured speedup falls well below the thread count.
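One way to quantify "well below the thread count" is parallel efficiency, the measured speedup divided by the number of threads. The helpers below are a minimal sketch with hypothetical names (speedup, parallel_efficiency); the timings in the usage note are made-up inputs for illustration, not measurements.

```c
#include <assert.h>

/* Speedup: single-thread wall time divided by multi-thread wall time. */
double speedup(double t_serial, double t_parallel) {
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency: speedup per thread; 1.0 corresponds to ideal linear scaling. */
double parallel_efficiency(double t_serial, double t_parallel, int threads) {
    assert(threads > 0);
    return speedup(t_serial, t_parallel) / threads;
}
```

For example, if a loop took 8.0 s on one thread and 4.0 s on eight threads, parallel_efficiency(8.0, 4.0, 8) would return 0.25: a speedup of 2 where 8 was hoped for. Low efficiency alone does not prove false sharing, but it is the kind of result that justifies a closer look at how per-thread data is laid out.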