Thread affinity (OpenMP)
On a laptop or single-socket desktop, every core reaches the same RAM at the same speed. On a multi-socket server, each socket has its own bank of RAM (NUMA, non-uniform memory access): accessing local memory is fast, but crossing the inter-socket interconnect to reach the other socket's RAM costs roughly 2–3× more. By default, the OS is free to migrate threads between cores and sockets, silently moving a thread away from the data it first touched (on Linux, a page is normally placed on the NUMA node of the thread that first writes to it). Thread affinity pins threads to specific hardware locations to prevent migration and keep threads close to their data.
```c
#pragma omp parallel proc_bind(spread) num_threads(8)
{
    // threads are spread evenly across sockets; good for bandwidth-bound work
}

#pragma omp parallel proc_bind(close)
{
    // threads are packed near the master thread; good for latency-sensitive work
    // that shares a last-level cache
}
```
The proc_bind policies are:
- spread: distribute threads as evenly as possible across the available places (sockets, cores, or hardware threads, as defined by OMP_PLACES); maximises total memory bandwidth.
- close: place threads on consecutive places starting from the master thread's place; maximises shared cache reuse.
- master: place all threads on the same place as the master thread. OpenMP 5.1 deprecates this name in favour of primary, which has the same meaning.
OMP_PLACES defines what a “place” is: sockets, cores, or threads (hardware threads / hyperthreads); an explicit list of processor IDs is also accepted, and the default when it is unset is implementation-defined. OMP_PROC_BIND sets the default policy for all parallel regions that do not specify proc_bind explicitly; setting it to false disables thread affinity, and proc_bind clauses are then ignored. On a single-socket desktop the difference is negligible; on a dual- or quad-socket HPC node it can determine whether memory bandwidth scales with thread count or saturates at a fraction of peak.
