Collectives (MPI)

Many parallel patterns require all processes to participate in a coordinated operation: distributing input, combining results, or synchronising before the next phase. These can always be assembled from point-to-point sends and receives, but writing them manually is verbose and misses optimisation opportunities. A broadcast, for instance, could be written as rank 0 sending the value to each other process in a loop:

if (rank == 0)
    for (int i = 1; i < size; i++)
        MPI_Send(&x, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
else
    MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

In this version the root issues N - 1 sends one after another, so the cost grows as O(N) in the number of processes N. MPI_Bcast expresses the same intent as a single collective call made by every process, which leaves the runtime free to pick a better algorithm, such as a tree-structured dissemination that completes in O(log N) steps.

int x;
if (rank == 0) x = 99;
MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
// all processes now have x == 99
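
To see where the O(log N) figure comes from, the sketch below simulates one possible schedule, a binomial tree rooted at rank 0, in plain C with no MPI calls. The function name bcast_rounds is hypothetical, and the schedule is an assumption made for illustration: real MPI libraries choose their broadcast algorithm based on message size, process count and network topology.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simulate a binomial-tree broadcast from rank 0 among n processes and
   return the number of communication rounds it needs (-1 on allocation
   failure). In each round every rank that already holds the value sends
   it to one rank that does not, so the number of holders doubles. */
int bcast_rounds(int n)
{
    assert(n >= 1);

    bool *has = calloc((size_t)n, sizeof *has);  /* has[r]: rank r holds the value */
    if (has == NULL)
        return -1;
    has[0] = true;                               /* only the root starts with it */

    int rounds = 0;
    for (int dist = 1; dist < n; dist *= 2) {
        /* Ranks 0 .. dist-1 hold the value; rank r sends to rank r + dist. */
        for (int r = 0; r < dist && r + dist < n; r++) {
            assert(has[r]);
            has[r + dist] = true;
        }
        rounds++;
    }

    /* Every rank must have received the value by now. */
    for (int r = 0; r < n; r++)
        assert(has[r]);

    free(has);
    return rounds;
}
```

For 1024 processes bcast_rounds returns 10, against the 1023 sequential sends issued by the root in the hand-written loop.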

Every process in the communicator must call a collective operation, and all processes must call their collectives in the same order. A blocking collective returns once the calling process's part in the operation is complete and its buffers are safe to reuse; apart from MPI_Barrier, this does not guarantee that the other processes have finished. MPI_Barrier is the simplest collective: it exchanges no data, and no process leaves it until every process has entered it. It is commonly placed before a timed region so that all processes start the clock at roughly the same point. MPI_Allgather behaves like a gather followed by a broadcast: each process contributes a chunk and every process receives the fully assembled result. MPI_Alltoall sends a distinct chunk from every process to every process in the communicator, which is the communication pattern at the core of distributed FFTs and matrix transposes.
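
The three calls can be combined in one small program. What follows is a minimal sketch rather than a reference implementation: the variable names and the values placed in the buffers (rank * rank and 100 * rank + j) are arbitrary choices made so that each rank can check what it received.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int mine = rank * rank;                         /* this rank's contribution */
    int *all  = malloc((size_t)size * sizeof *all);  /* Allgather result */
    int *send = malloc((size_t)size * sizeof *send); /* Alltoall input */
    int *recv = malloc((size_t)size * sizeof *recv); /* Alltoall output */
    if (!all || !send || !recv)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* Element j of send is the chunk destined for rank j. */
    for (int j = 0; j < size; j++)
        send[j] = 100 * rank + j;

    /* Line the processes up before starting the clock. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    /* Each rank contributes one int; every rank ends up with all of
       them, ordered by rank. */
    MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

    /* send[j] goes to rank j; recv[i] arrives from rank i. */
    MPI_Alltoall(send, 1, MPI_INT, recv, 1, MPI_INT, MPI_COMM_WORLD);

    double elapsed = MPI_Wtime() - t0;

    /* Check both results against the values the other ranks sent. */
    int ok = 1;
    for (int i = 0; i < size; i++)
        if (all[i] != i * i || recv[i] != 100 * i + rank)
            ok = 0;

    printf("rank %d: %s (%.6f s)\n", rank, ok ? "ok" : "MISMATCH", elapsed);

    free(all);
    free(send);
    free(recv);
    MPI_Finalize();
    return ok ? 0 : 1;
}
```

With a typical MPI installation this builds with the mpicc compiler wrapper and runs under a launcher such as mpirun or mpiexec; the exact command names depend on the installation. Viewed across all ranks, the Alltoall step transposes a size-by-size matrix whose row r is the send buffer of rank r, which is why the same call appears in distributed transposes.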
