Thread affinity (OpenMP)

On a laptop or single-socket desktop, every core reaches the same RAM at essentially the same speed. On a multi-socket server, each socket has its own bank of RAM (NUMA, non-uniform memory access): accessing local memory is fast, but crossing the inter-socket interconnect to reach another socket's RAM can cost roughly 2–3× more, depending on the hardware. By default, the OS is free to migrate threads between cores and sockets, which can silently move a thread away from the socket whose memory holds its data. Thread affinity pins threads to specific hardware locations to prevent that migration and keep threads close to their data.
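One common way to keep data near the threads that use it can be sketched as follows. This is a minimal sketch, assuming a first-touch page placement policy (the usual default on Linux), under which a page ends up on the NUMA node of the thread that first writes to it. The function names alloc_first_touch and sum_local are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>

/* Allocate n doubles and initialise them in parallel. Under a first-touch
   policy (an assumption, see above), each page is placed on the NUMA node
   of the thread that writes it first. Returns NULL if allocation fails. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* schedule(static) splits the index range into one contiguous block
       per thread. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        a[i] = 0.0;

    return a;
}

/* Sum the array with the same static schedule, so that (with an unchanged
   thread count and pinned threads) each thread mostly reads pages it
   touched first. */
double sum_local(const double *a, size_t n)
{
    double s = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:s)
    for (long i = 0; i < (long)n; ++i)
        s += a[i];

    return s;
}
```

The pattern only pays off when threads stay where they were during initialisation, which is what thread affinity is meant to ensure. Compiled without OpenMP support the pragmas are ignored and both functions run serially.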

#pragma omp parallel proc_bind(spread) num_threads(8)
{
    // threads are spread evenly across the available places (e.g. sockets);
    // good for bandwidth-bound work
}
 
#pragma omp parallel proc_bind(close)
{
    // threads are packed near the master thread; good for latency-sensitive work
    // that shares a last-level cache
}
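To check what a policy actually did on a given machine, the binding can be queried at run time. The sketch below uses omp_get_place_num(), part of the OpenMP API since 4.5; the function name record_places is hypothetical, and the serial stand-ins exist only so the example still builds without OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch still builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_place_num(void)  { return -1; }
#endif

/* Run one parallel region with the spread policy and record the place each
   thread is bound to. places[] must have room for max_threads entries.
   Returns the number of threads that actually ran. */
int record_places(int *places, int max_threads)
{
    int used = 0;

    #pragma omp parallel proc_bind(spread) num_threads(max_threads)
    {
        int tid = omp_get_thread_num();

        /* omp_get_place_num() returns -1 when the thread is not bound
           to any place. */
        places[tid] = omp_get_place_num();

        #pragma omp atomic
        used++;
    }

    return used;
}
```

Calling record_places once per policy and printing the array gives a quick picture of how threads were distributed. A result of -1 for every thread suggests that binding is disabled in the current environment.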

The proc_bind policies are:

spread: distribute threads as evenly as possible across the available places (sockets or cores); tends to maximise aggregate memory bandwidth
close: place threads as close together as possible, starting from the master thread's place; tends to maximise shared cache reuse
master: place all threads on the same place as the master thread (OpenMP 5.1 renames this policy to primary and deprecates master)
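The difference between the policies can be illustrated with a deliberately simplified model. It assumes places numbered 0 to P-1, the master thread on place 0, no more threads than places, and a place count that is an exact multiple of the thread count; real runtimes handle the uneven cases in ways this sketch does not cover. The function names are hypothetical.

```c
/* Simplified placement model, NOT a runtime implementation.
   Assumptions: master thread on place 0, num_threads <= num_places,
   and num_places divisible by num_threads. */

/* close: consecutive threads go on consecutive places. */
int place_close(int thread, int num_places)
{
    return thread % num_places;
}

/* spread: the places are split into num_threads equal groups and each
   thread takes the first place of its group. */
int place_spread(int thread, int num_threads, int num_places)
{
    int stride = num_places / num_threads;
    return thread * stride;
}

/* master: every thread shares the master thread's place. */
int place_master(void)
{
    return 0;
}
```

With 4 threads and 8 core-sized places, where places 0 to 3 sit on one socket and 4 to 7 on the other, this model puts the close threads on places 0, 1, 2, 3 (one socket) and the spread threads on places 0, 2, 4, 6 (two per socket).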

OMP_PLACES defines what a “place” is: sockets, cores, or threads (hardware threads / hyperthreads); it also accepts an explicit list of processor IDs. When the variable is unset, the default is implementation-defined, so set it explicitly when placement matters. OMP_PROC_BIND sets the default policy for every parallel region that does not specify proc_bind itself. On a single-socket desktop the difference is usually negligible; on a dual- or quad-socket HPC node it can determine whether memory bandwidth scales with the thread count or saturates at a fraction of peak.
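The place list that the runtime derived from OMP_PLACES can be inspected from inside the program. The sketch below relies on omp_get_num_places() and omp_get_place_num_procs(), both part of the OpenMP API since 4.5; the function name print_places and the program name in the comment are hypothetical.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch still builds without -fopenmp. */
static int omp_get_num_places(void) { return 0; }
static int omp_get_place_num_procs(int place) { (void)place; return 0; }
#endif

/* Print the place list built by the runtime and return the number of
   places. Example invocation (program name is hypothetical):
       OMP_PLACES=sockets OMP_PROC_BIND=spread ./affinity_demo */
int print_places(void)
{
    int n = omp_get_num_places();

    printf("%d place(s)\n", n);
    for (int p = 0; p < n; ++p) {
        /* Number of processors (hardware threads) inside place p. */
        printf("  place %d: %d processor(s)\n",
               p, omp_get_place_num_procs(p));
    }

    return n;
}
```

Running the same binary with OMP_PLACES set to sockets, cores, and threads in turn shows how the granularity of a place changes. Some runtimes report zero places when neither variable is set.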
