# Hybrid MPI+OpenMP (MPI)

A naive parallel program launches one MPI rank per core. That works, but every rank keeps its own copy of the data (there is no shared address space between MPI processes), and the number of messages scales with the total core count.

A better model on a multi-core node is one MPI rank per node (or per socket), with OpenMP threads filling the cores within it. Inter-node communication goes through MPI; intra-node parallelism uses shared memory through OpenMP. **Hybrid MPI+OpenMP** is the dominant model on modern HPC clusters.

When threads are involved, MPI must be initialised with `MPI_Init_thread` instead of `MPI_Init`. The call requests a level of thread support and returns, in `provided`, the level the library actually grants, so the program should check it:

```c
int provided;
// Request FUNNELED; the library reports what it can actually support.
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
if (provided < MPI_THREAD_FUNNELED) {  // the levels are ordered, so < is valid
    fprintf(stderr, "insufficient MPI thread support\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
}
```

The four thread-support levels, in increasing order, are:

- `MPI_THREAD_SINGLE`: only one thread will execute; equivalent to `MPI_Init`
- `MPI_THREAD_FUNNELED`: multiple threads exist but only the main thread (the one that called `MPI_Init_thread`) makes MPI calls; the most common level for MPI+OpenMP
- `MPI_THREAD_SERIALIZED`: multiple threads make MPI calls but never concurrently; the application serialises them
- `MPI_THREAD_MULTIPLE`: multiple threads call MPI concurrently; requires a thread-safe MPI build and has higher overhead

With `MPI_THREAD_FUNNELED`, all MPI calls must happen on the master thread, either outside parallel regions or inside one guarded with `#pragma omp master`. The typical structure is to post non-blocking communication on the master thread, enter a parallel region to compute the interior while the halos travel, then wait for the communication to complete before computing the boundary.

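```c
while (!converged) {
    // Post the halo exchange on the master thread.
    // MPI_Startall expects persistent requests (MPI_Send_init / MPI_Recv_init).
    MPI_Startall(nreqs, reqs);

    // Interior computation overlaps with communication.
    #pragma omp parallel for
    for (int i = interior_lo; i < interior_hi; i++)
        update(i);

    // The halos must have arrived before the boundary is computed.
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);

    #pragma omp parallel for
    for (int i = 0; i < halo_size; i++)
        update_halo(i);

    // (updating `converged` is elided here)
}
```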
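
The overlap only works if the interior loop never reads a halo cell. As a minimal sketch of that index bookkeeping, assume a 1D three-point stencil in which `u[0]` and `u[n+1]` are halo cells filled by MPI and `u[1..n]` are owned by the rank. The layout and the names `stencil`, `update_interior`, `update_boundary` and `update_all` are hypothetical, chosen for illustration only; the MPI calls are left out so that the split can be checked on its own.

```c
/* Hypothetical layout: u[0] and u[n+1] are halo cells, u[1..n] are owned. */

/* Three-point averaging stencil for a single cell (illustrative only). */
static double stencil(const double *u, int i) {
    return 0.5 * (u[i - 1] + u[i + 1]);
}

/* Cells 2..n-1 read only owned cells, so this loop is safe to run
   while the halo exchange is still in flight. */
void update_interior(const double *u, double *unew, int n) {
    #pragma omp parallel for
    for (int i = 2; i <= n - 1; i++)
        unew[i] = stencil(u, i);
}

/* Cells 1 and n read u[0] and u[n+1], so this must run only after
   the halo exchange has completed (after MPI_Waitall in the loop above). */
void update_boundary(const double *u, double *unew, int n) {
    unew[1] = stencil(u, 1);
    if (n > 1)
        unew[n] = stencil(u, n);
}

/* Single-pass reference update, used to check that the split is equivalent. */
void update_all(const double *u, double *unew, int n) {
    for (int i = 1; i <= n; i++)
        unew[i] = stencil(u, i);
}
```

Calling `update_interior` followed by `update_boundary` writes the same values as `update_all`; the only difference is which cells may be computed before the halos arrive. With a wider stencil, the boundary band grows to the stencil radius on each side.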