MPI

MPI (Message Passing Interface) is a standard for distributed-memory parallel programming in C, C++, and Fortran. In OpenMP you add pragmas and the threads share your program's memory. In MPI you launch N independent copies of your program, each with its own private address space, and they coordinate by explicitly sending and receiving messages. Because the communication model does not assume that processes share any hardware, the same MPI source can run on anything from a laptop to a cluster with thousands of nodes without code changes.

The mental model takes some adjustment if you are coming from single-process C or even OpenMP. You are not writing one program that spawns workers. You are writing a program that will be instantiated N times simultaneously, and each copy plays a different role based on which number — the rank — it receives at launch. Rank 0 might distribute data, ranks 1 through N-1 might compute, rank 0 might collect results. Same source file, same binary, different runtime behaviour.

The execution model is SPMD (single program, multiple data): all processes launch together via mpirun or mpiexec and run until they all call MPI_Finalize. Every MPI program must call MPI_Init before any other MPI function and MPI_Finalize at the end.

#include <mpi.h>
#include <stdio.h>
 
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
 
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("process %d of %d\n", rank, size);
 
    MPI_Finalize();
    return 0;
}

Launch with mpirun -n 4 ./program to start 4 processes. Every process executes the full main, so printf is called by all four. Output order is non-deterministic. Compile with mpicc (C) or mpicxx (C++), which wrap the system compiler with the right include paths and link flags — you do not call gcc directly.

Practice

Compile and run the hello-world above:

$ mpicc -o hello hello.c
$ mpirun -n 4 ./hello
process 2 of 4
process 0 of 4
process 3 of 4
process 1 of 4

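Output order is non-deterministic: each process writes to stdout independently, so the lines appear in whatever order they reach the terminal. Run it a few times and the order changes. Try mpirun -n 1 and mpirun -n 8 (some launchers, Open MPI among them, refuse to start more processes than there are available cores unless oversubscription is explicitly enabled). The binary does not change; the number of processes does. This is SPMD in action: one executable, different runtime identity per instantiation.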

The hello-world does not communicate, so it does not show what MPI is actually for. Here is the simplest program that does: rank 0 sends an integer to rank 1, which receives and prints it.

// compile: mpicc -o ping ping.c
// run: mpirun -n 2 ./ping
// description: rank 0 sends a value to rank 1
 
#include <mpi.h>
#include <stdio.h>
 
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, value = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0: sent %d\n", value);
    } else if (rank == 1) {   // only rank 1 receives; any extra ranks skip straight to MPI_Finalize
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1: received %d\n", value);
    }
 
    MPI_Finalize();
    return 0;
}

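The rank 0 branch and the rank 1 branch are in the same source file but execute on different processes, potentially on different machines. Both calls are blocking: MPI_Recv does not return until the message has arrived, and MPI_Send does not return until the send buffer is safe to reuse. For a message this small, most implementations copy the data into an internal buffer and return straight away, but the standard also allows MPI_Send to wait until rank 1 has posted the matching MPI_Recv, so a correct program must not rely on either behaviour. That is one of the most important things to internalise early: MPI communication is not fire-and-forget. More on this in blocking and non-blocking communication.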

From here, point-to-point communication is the natural next concept, then collectives once that feels comfortable.

Concepts

Overview

Functions

// Initialisation
MPI_Init(&argc, &argv)                               // initialise MPI; must be first
MPI_Finalize()                                       // shut down MPI; must be last
MPI_Abort(comm, errorcode)                           // terminate all processes in comm
 
// Communicator queries
MPI_Comm_rank(comm, &rank)                           // rank of calling process in comm
MPI_Comm_size(comm, &size)                           // number of processes in comm
MPI_Comm_split(comm, color, key, &newcomm)           // partition into sub-communicators
MPI_Comm_free(&comm)                                 // release a communicator
 
// Point-to-point
MPI_Send(buf, count, type, dest, tag, comm)
MPI_Recv(buf, count, type, src, tag, comm, &status)
MPI_Sendrecv(sbuf, sc, st, dest, stag,
             rbuf, rc, rt, src,  rtag, comm, &status)
MPI_Isend(buf, count, type, dest, tag, comm, &req)
MPI_Irecv(buf, count, type, src,  tag, comm, &req)
MPI_Wait(&req, &status)
MPI_Test(&req, &flag, &status)
MPI_Waitall(count, reqs, statuses)
 
// Collectives
MPI_Barrier(comm)
MPI_Bcast(buf, count, type, root, comm)
MPI_Scatter(sbuf, sc, st, rbuf, rc, rt, root, comm)
MPI_Gather(sbuf, sc, st, rbuf, rc, rt, root, comm)
MPI_Scatterv(sbuf, scounts, displs, st, rbuf, rc, rt, root, comm)
MPI_Gatherv(sbuf, sc, st, rbuf, rcounts, displs, rt, root, comm)
MPI_Allgather(sbuf, sc, st, rbuf, rc, rt, comm)
MPI_Alltoall(sbuf, sc, st, rbuf, rc, rt, comm)
MPI_Reduce(sbuf, rbuf, count, type, op, root, comm)
MPI_Allreduce(sbuf, rbuf, count, type, op, comm)
MPI_Scan(sbuf, rbuf, count, type, op, comm)          // inclusive prefix reduction
 
// Derived datatypes
MPI_Type_contiguous(count, oldtype, &newtype)
MPI_Type_vector(count, blocklength, stride, oldtype, &newtype)
MPI_Type_create_struct(count, blocklengths, displs, types, &newtype)
MPI_Type_commit(&type)
MPI_Type_free(&type)
 
// One-sided
MPI_Win_create(base, size, disp_unit, info, comm, &win)
MPI_Put(obuf, oc, ot, target_rank, disp, tc, tt, win)
MPI_Get(obuf, oc, ot, target_rank, disp, tc, tt, win)
MPI_Accumulate(obuf, oc, ot, target_rank, disp, tc, tt, op, win)
MPI_Win_fence(assert, win)
MPI_Win_free(&win)
 
// Timing
MPI_Wtime()                                          // wall-clock time in seconds
MPI_Wtick()                                          // resolution of MPI_Wtime

Built-in datatypes

C type	MPI type
`int`	`MPI_INT`
`long`	`MPI_LONG`
`float`	`MPI_FLOAT`
`double`	`MPI_DOUBLE`
`char`	`MPI_CHAR`
`unsigned char`	`MPI_UNSIGNED_CHAR`
`long long`	`MPI_LONG_LONG`

Built-in reduction operators

Operator	Meaning
`MPI_SUM`	sum
`MPI_PROD`	product
`MPI_MAX`	maximum
`MPI_MIN`	minimum
`MPI_LAND`	logical and
`MPI_LOR`	logical or
`MPI_BAND`	bitwise and
`MPI_BOR`	bitwise or
`MPI_MAXLOC`	maximum value and its attached index (typically the rank that holds it)
`MPI_MINLOC`	minimum value and its attached index (typically the rank that holds it)
