Non-blocking collectives (MPI)

Non-blocking point-to-point communication lets a process post a send or receive and continue computing while the transfer proceeds in the background (see blocking and non-blocking). Non-blocking collectives extend the same idea to collective operations: MPI-3 introduced variants that return immediately with a request handle instead of blocking until the operation completes. MPI_Ibcast, MPI_Ireduce, MPI_Iallreduce, MPI_Iscatter, and the other "I" variants all follow the same request/wait pattern as MPI_Isend and MPI_Irecv. Until the request is completed with MPI_Wait (or a successful MPI_Test), the send buffer must not be modified and the receive buffer must not be accessed at all. As with blocking collectives, every process in the communicator must make the call.

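double local = compute_first_part();
double global;
MPI_Request req;

/* Start the reduction; the call returns without waiting for the other ranks. */
MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

/* This work must not modify local or touch global. */
do_independent_work();

/* Block until the reduction has completed; global is valid only after this. */
MPI_Wait(&req, MPI_STATUS_IGNORE);
use(global);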

Not every implementation actually overlaps the collective with the intervening computation; some make progress only inside MPI calls and end up doing most of the work in MPI_Wait, in which case little is gained. On implementations that offload collectives to the network hardware (common on high-end interconnects such as InfiniBand), the overlap can hide much of the latency in reduction-heavy algorithms.

Ivan's wiki
