MPI
MPI (Message Passing Interface) is a standard for distributed-memory parallel programming in C, C++, and Fortran. Unlike OpenMP, where you add pragmas and the threads share your program's memory, MPI launches N independent copies of your program, each with its own private address space, and they coordinate by explicitly sending and receiving messages. Because the communication model does not assume that processes share any hardware, the same MPI source runs on anything from a laptop to a cluster with thousands of nodes without code changes.
The mental model takes some adjustment if you are coming from single-process C or even OpenMP. You are not writing one program that spawns workers. You are writing a program that will be instantiated N times simultaneously, and each copy plays a different role based on which number — the rank — it receives at launch. Rank 0 might distribute data, ranks 1 through N-1 might compute, rank 0 might collect results. Same source file, same binary, different runtime behaviour.
The execution model is SPMD (single program, multiple data): all processes launch together via mpirun or mpiexec and run until they all call MPI_Finalize. Every MPI program must call MPI_Init before any other MPI function (apart from a handful of query routines such as MPI_Initialized) and MPI_Finalize at the end.
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }
Launch with mpirun -n 4 ./program to start 4 processes. Every process executes the full main, so printf is called by all four. Output order is non-deterministic. Compile with mpicc (C) or mpicxx (C++), which wrap the system compiler with the right include paths and link flags — you do not call gcc directly.
Practice
Compile and run the hello-world above:
    $ mpicc -o hello hello.c
    $ mpirun -n 4 ./hello
    process 2 of 4
    process 0 of 4
    process 3 of 4
    process 1 of 4
Run it a few times and the order changes, because the processes execute independently and nothing orders their writes to the terminal. Try mpirun -n 1 and mpirun -n 8. The binary does not change; the number of processes does. This is SPMD in action: one executable, different runtime identity per instantiation.
The hello-world does not communicate, so it does not show what MPI is actually for. Here is the simplest program that does: rank 0 sends an integer to rank 1, which receives and prints it.
    // compile: mpicc -o ping ping.c
    // run: mpirun -n 2 ./ping
    // description: rank 0 sends a value to rank 1
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, value = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            printf("rank 0: sent %d\n", value);
        } else {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1: received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }
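The if (rank == 0) branch and the else branch are in the same source file but execute on different processes, potentially on different machines. Both calls are blocking: MPI_Recv does not return until the message has arrived, and MPI_Send does not return until the send buffer is safe to reuse. For a small message like this one, many implementations copy the data into an internal buffer and let MPI_Send return early; for larger messages, rank 0 typically waits until rank 1 reaches MPI_Recv. Correct programs must not rely on that buffering, which is one of the most important things to internalise early: MPI communication is not fire-and-forget. More on this in blocking and non-blocking communication.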
From here, point-to-point communication is the natural next concept, then collectives once that feels comfortable.
Concepts
Overview
Functions
    // Initialisation
    MPI_Init(&argc, &argv)                      // initialise MPI; must be first
    MPI_Finalize()                              // shut down MPI; must be last
    MPI_Abort(comm, errorcode)                  // terminate all processes in comm

    // Communicator queries
    MPI_Comm_rank(comm, &rank)                  // rank of calling process in comm
    MPI_Comm_size(comm, &size)                  // number of processes in comm
    MPI_Comm_split(comm, color, key, &newcomm)  // partition into sub-communicators
    MPI_Comm_free(&comm)                        // release a communicator

    // Point-to-point
    MPI_Send(buf, count, type, dest, tag, comm)
    MPI_Recv(buf, count, type, src, tag, comm, &status)
    MPI_Sendrecv(sbuf, sc, st, dest, stag, rbuf, rc, rt, src, rtag, comm, &status)
    MPI_Isend(buf, count, type, dest, tag, comm, &req)
    MPI_Irecv(buf, count, type, src, tag, comm, &req)
    MPI_Wait(&req, &status)
    MPI_Test(&req, &flag, &status)
    MPI_Waitall(count, reqs, statuses)

    // Collectives
    MPI_Barrier(comm)
    MPI_Bcast(buf, count, type, root, comm)
    MPI_Scatter(sbuf, sc, st, rbuf, rc, rt, root, comm)
    MPI_Gather(sbuf, sc, st, rbuf, rc, rt, root, comm)
    MPI_Scatterv(sbuf, scounts, displs, st, rbuf, rc, rt, root, comm)
    MPI_Gatherv(sbuf, sc, st, rbuf, rcounts, displs, rt, root, comm)
    MPI_Allgather(sbuf, sc, st, rbuf, rc, rt, comm)
    MPI_Alltoall(sbuf, sc, st, rbuf, rc, rt, comm)
    MPI_Reduce(sbuf, rbuf, count, type, op, root, comm)
    MPI_Allreduce(sbuf, rbuf, count, type, op, comm)
    MPI_Scan(sbuf, rbuf, count, type, op, comm) // inclusive prefix reduction

    // Derived datatypes
    MPI_Type_contiguous(count, oldtype, &newtype)
    MPI_Type_vector(count, blocklength, stride, oldtype, &newtype)
    MPI_Type_create_struct(count, blocklengths, displs, types, &newtype)
    MPI_Type_commit(&type)
    MPI_Type_free(&type)

    // One-sided
    MPI_Win_create(base, size, disp_unit, info, comm, &win)
    MPI_Put(obuf, oc, ot, target_rank, disp, tc, tt, win)
    MPI_Get(obuf, oc, ot, target_rank, disp, tc, tt, win)
    MPI_Accumulate(obuf, oc, ot, target_rank, disp, tc, tt, op, win)
    MPI_Win_fence(assert, win)
    MPI_Win_free(&win)

    // Timing
    MPI_Wtime()                                 // wall-clock time in seconds
    MPI_Wtick()                                 // resolution of MPI_Wtime
Built-in datatypes
| C type | MPI type |
|---|---|
| int | MPI_INT |
| long | MPI_LONG |
| float | MPI_FLOAT |
| double | MPI_DOUBLE |
| char | MPI_CHAR |
| unsigned char | MPI_UNSIGNED_CHAR |
| long long | MPI_LONG_LONG |
Built-in reduction operators
| Operator | Meaning |
|---|---|
| MPI_SUM | sum |
| MPI_PROD | product |
| MPI_MAX | maximum |
| MPI_MIN | minimum |
| MPI_LAND | logical and |
| MPI_LOR | logical or |
| MPI_BAND | bitwise and |
| MPI_BOR | bitwise or |
| MPI_MAXLOC | maximum value and the index paired with it (typically the rank that holds it) |
| MPI_MINLOC | minimum value and the index paired with it (typically the rank that holds it) |
