Page `mpi`: current revision June 13, 2026 at 03:13 (external edit 127.0.0.1); previous revision June 10, 2026 at 21:55 by Ivan Janevski.
# MPI

**MPI** (Message Passing Interface) is a standard for distributed-memory parallel programming in C, C++, and Fortran. Unlike [[openmp|OpenMP]], where you add pragmas and the threads share your program's memory, MPI launches N independent copies of your program, each with its own private address space, and they coordinate by explicitly sending and receiving messages. Because the communication model does not assume that processes share any hardware, the same MPI program runs without source changes on a laptop or on a cluster with thousands of nodes.

The mental model takes some adjustment if you are coming from single-process C or even OpenMP. You are not writing one program that spawns workers. You are writing a program that will be instantiated N times simultaneously, and each copy plays a different role based on its **rank**, an integer from 0 to N-1 that it is assigned at launch. Rank 0 might distribute data, ranks 1 through N-1 might compute, and rank 0 might collect the results. Same source file, same binary, different runtime behaviour.

The execution model is **SPMD** (single program, multiple data): all processes are started together by `mpirun` or `mpiexec` and run the same `main`. Every MPI program must call `MPI_Init` before any other MPI function (a few query functions such as `MPI_Initialized` are the exception) and `MPI_Finalize` before it exits.
```c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                  // must precede other MPI calls
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's id: 0..size-1
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();                          // no MPI calls after this
    return 0;
}
```
Compile with `mpicc` (C) or `mpicxx` (C++), which wrap the system compiler and supply the MPI include paths and link flags, so you do not call `gcc` directly. Launch with `mpirun -n 4 ./program` to start 4 processes. Every process executes all of `main`, so `printf` is called four times, once per rank.

## Practice

Compile and run the hello-world above:

```bash
$ mpicc -o hello hello.c
$ mpirun -n 4 ./hello
process 2 of 4
process 0 of 4
process 3 of 4
process 1 of 4
```
Output order is non-deterministic: run it a few times and the order will usually change. Try `mpirun -n 1` and `mpirun -n 8` (with Open MPI, running more processes than the machine has cores may require `mpirun --oversubscribe`). The binary does not change; the number of processes does. This is SPMD in action: one executable, a different runtime identity for each instance.

The hello-world does not communicate, so it does not show what MPI is actually for. Here is about the simplest program that does: rank 0 sends an integer to rank 1, which receives and prints it.
```c
// compile: mpicc -o ping ping.c
// run: mpirun -n 2 ./ping   (needs at least 2 processes)
// description: rank 0 sends a value to rank 1
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, value = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        // args: buffer, count, datatype, destination rank, tag, communicator
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0: sent %d\n", value);
    } else if (rank == 1) {   // extra ranks (with -n > 2) skip straight to MPI_Finalize
        // args: buffer, count, datatype, source rank, tag, communicator, status
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1: received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```
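The two branches of the `if` are in the same source file but execute on different processes, potentially on different machines. `MPI_Recv` blocks until the message has arrived. `MPI_Send` may block as well: the standard only guarantees that it returns once `value` can safely be reused, which, depending on the implementation and the message size, can happen as soon as the message is buffered or only after rank 1 has posted the matching receive. A correct program must not depend on which of the two happens. That is one of the most important things to internalise early: MPI communication is not fire-and-forget. More on this in [[blocking-mpi|blocking and non-blocking communication]].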
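Because the standard does not promise that `MPI_Send` returns before the matching receive is posted, a program in which every rank calls `MPI_Send` before `MPI_Recv` can deadlock, typically once the messages are too large for the library to buffer. The sketch below shows the rank arithmetic for a ring exchange, where each rank sends to its right-hand neighbour and receives from its left, plus one classic way to order the calls safely. The helper names `ring_right`, `ring_left` and `sends_first` are hypothetical, and the MPI calls appear only in comments so the helpers compile without an MPI installation.

```c
#include <assert.h>
#include <stdbool.h>

/* Neighbour to the right in a ring of `size` ranks (size-1 wraps to 0). */
int ring_right(int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Neighbour to the left in the ring (0 wraps to size-1). */
int ring_left(int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank - 1 + size) % size;
}

/* Parity rule: even ranks send first, odd ranks receive first. With two or
 * more ranks, every odd rank starts by receiving, so the ranks can never all
 * be stuck in MPI_Send at once. Intended use (MPI calls shown as comments):
 *
 *   int right = ring_right(rank, size), left = ring_left(rank, size);
 *   if (sends_first(rank)) {
 *       MPI_Send(&out, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
 *       MPI_Recv(&in, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 *   } else {
 *       MPI_Recv(&in, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 *       MPI_Send(&out, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
 *   }
 */
bool sends_first(int rank) {
    return rank % 2 == 0;
}
```

`MPI_Sendrecv(&out, 1, MPI_INT, right, 0, &in, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE)` performs the same exchange in one call and leaves the ordering to the library, which is usually the simpler choice; the parity version is mainly useful for seeing why the naive order can hang.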
From here, [[point-to-point-mpi|point-to-point communication]] is the natural next concept, then [[collectives-mpi|collectives]] once that feels comfortable.

## Concepts
. [[communicators-mpi|Communicators]]
. [[point-to-point-mpi|Point-to-point communication]]
. [[blocking-mpi|Blocking and non-blocking]]
. [[send-modes-mpi|Send modes]]
. [[message-ordering-mpi|Message ordering]]
. [[probing-mpi|Probing for messages]]
. [[deadlock-mpi|Deadlock]]
. [[performance-model-mpi|Performance model]]
. [[collectives-mpi|Collectives]]
. [[reduction-mpi|Reduction]]
. [[scatter-and-gather-mpi|Scatter and gather]]
. [[prefix-reductions-mpi|Prefix reductions]]
. [[nonblocking-collectives-mpi|Non-blocking collectives]]
. [[persistent-communication-mpi|Persistent communication]]
. [[derived-datatypes-mpi|Derived datatypes]]
. [[virtual-topologies-mpi|Virtual topologies]]
. [[process-groups-mpi|Process groups]]
. [[communicator-duplication-mpi|Communicator duplication]]
. [[one-sided-mpi|One-sided communication]]
. [[shared-memory-windows-mpi|Shared memory windows]]
. [[parallel-io-mpi|Parallel I/O]]
. [[time-measurement-mpi|Time measurement]]
. [[hybrid-openmp-mpi|Hybrid MPI+OpenMP]]
## Overview
### Functions
```c
// Initialisation
MPI_Init(&argc, &argv)                               // initialise MPI; must be first
MPI_Finalize()                                       // shut down MPI; must be last
MPI_Abort(comm, errorcode)                           // attempt to terminate all processes in comm
// Communicator queries
MPI_Comm_rank(comm, &rank)                           // rank of calling process in comm
MPI_Comm_size(comm, &size)                           // number of processes in comm
MPI_Comm_split(comm, color, key, &newcomm)           // partition into sub-communicators
MPI_Comm_free(&comm)                                 // release a communicator
// Point-to-point
MPI_Send(buf, count, type, dest, tag, comm)
MPI_Recv(buf, count, type, src, tag, comm, &status)
MPI_Sendrecv(sbuf, sc, st, dest, stag,
             rbuf, rc, rt, src,  rtag, comm, &status)
MPI_Isend(buf, count, type, dest, tag, comm, &req)
MPI_Irecv(buf, count, type, src,  tag, comm, &req)
MPI_Wait(&req, &status)
MPI_Test(&req, &flag, &status)
MPI_Waitall(count, reqs, statuses)
// Collectives
MPI_Barrier(comm)
MPI_Bcast(buf, count, type, root, comm)
MPI_Scatter(sbuf, sc, st, rbuf, rc, rt, root, comm)
MPI_Gather(sbuf, sc, st, rbuf, rc, rt, root, comm)
MPI_Scatterv(sbuf, scounts, displs, st, rbuf, rc, rt, root, comm)
MPI_Gatherv(sbuf, sc, st, rbuf, rcounts, displs, rt, root, comm)
MPI_Allgather(sbuf, sc, st, rbuf, rc, rt, comm)
MPI_Alltoall(sbuf, sc, st, rbuf, rc, rt, comm)
MPI_Reduce(sbuf, rbuf, count, type, op, root, comm)
MPI_Allreduce(sbuf, rbuf, count, type, op, comm)
MPI_Scan(sbuf, rbuf, count, type, op, comm)          // inclusive prefix reduction
// Derived datatypes
MPI_Type_contiguous(count, oldtype, &newtype)
MPI_Type_vector(count, blocklength, stride, oldtype, &newtype)
MPI_Type_create_struct(count, blocklengths, displs, types, &newtype)
MPI_Type_commit(&type)
MPI_Type_free(&type)
// One-sided
MPI_Win_create(base, size, disp_unit, info, comm, &win)
MPI_Put(obuf, oc, ot, target_rank, disp, tc, tt, win)
MPI_Get(obuf, oc, ot, target_rank, disp, tc, tt, win)
MPI_Accumulate(obuf, oc, ot, target_rank, disp, tc, tt, op, win)
MPI_Win_fence(assert, win)
MPI_Win_free(&win)
// Timing
MPI_Wtime()                                          // wall-clock time in seconds
MPI_Wtick()                                          // resolution of MPI_Wtime
```
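`MPI_Scatterv` and `MPI_Gatherv` take per-rank count and `displs` arrays, which are needed whenever the data does not divide evenly among the processes. Below is a minimal sketch, assuming a contiguous block distribution in which the first `n % size` ranks receive one extra element (one common choice, not the only one); `block_counts` is a hypothetical helper name. Every rank can call it with the same arguments, since the result depends only on `n` and `size`.

```c
#include <assert.h>

/* Block distribution of n elements over `size` ranks: the first n % size
 * ranks get one extra element. Fills counts[] and displs[] (each of length
 * size) in the form MPI_Scatterv/MPI_Gatherv expect: rank r handles counts[r]
 * elements starting at element offset displs[r] of the root's buffer. */
void block_counts(int n, int size, int counts[], int displs[]) {
    assert(n >= 0 && size > 0);
    int base = n / size;
    int extra = n % size;
    int offset = 0;
    for (int r = 0; r < size; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;          /* running sum of earlier counts */
        offset += counts[r];
    }
}
```

The arrays would then be passed as, for example, `MPI_Scatterv(data, counts, displs, MPI_DOUBLE, local, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD)`, with `displs` measured in elements of the send type rather than in bytes.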
### Built-in datatypes
^ C type ^ MPI type ^
| `int` | `MPI_INT` |
| `long` | `MPI_LONG` |
| `float` | `MPI_FLOAT` |
| `double` | `MPI_DOUBLE` |
| `char` | `MPI_CHAR` |
| `unsigned char` | `MPI_UNSIGNED_CHAR` |
| `long long` | `MPI_LONG_LONG` |
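The built-in types are the `oldtype` argument of the derived-datatype constructors in the function overview. `MPI_Type_vector(count, blocklength, stride, oldtype, &newtype)` selects `count` blocks of `blocklength` elements whose starts are `stride` elements apart; one column of a row-major `rows x cols` matrix of `double`, for instance, is `MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &col)`. The sketch below is a plain C model (hypothetical helper `vector_offsets`, no MPI needed) that lists the element offsets such a type covers, which is a cheap way to check a stride before committing the type.

```c
#include <assert.h>

/* Element offsets (in units of oldtype) selected by
 * MPI_Type_vector(count, blocklength, stride, oldtype, &newtype),
 * listed block by block. out[] must hold count * blocklength ints.
 * Returns the number of offsets written. */
int vector_offsets(int count, int blocklength, int stride, int out[]) {
    assert(count >= 0 && blocklength >= 0);
    int n = 0;
    for (int b = 0; b < count; b++) {        /* block b starts at b * stride */
        for (int i = 0; i < blocklength; i++) {
            out[n++] = b * stride + i;
        }
    }
    return n;
}
```

For a 3 x 4 row-major matrix, `vector_offsets(3, 1, 4, off)` yields 0, 4 and 8, i.e. `m[0][0]`, `m[1][0]` and `m[2][0]`; using the same type with the buffer `&m[0][j]` selects column `j` instead.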
### Built-in reduction operators
^ Operator ^ Meaning ^
| `MPI_SUM` | sum |
| `MPI_PROD` | product |
| `MPI_MAX` | maximum |
| `MPI_MIN` | minimum |
| `MPI_LAND` | logical and |
| `MPI_LOR` | logical or |
| `MPI_BAND` | bitwise and |
| `MPI_BOR` | bitwise or |
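| `MPI_MAXLOC` | maximum value and its paired index (usually the rank that holds it); needs a pair type such as `MPI_DOUBLE_INT` |
| `MPI_MINLOC` | minimum value and its paired index (usually the rank that holds it); needs a pair type such as `MPI_DOUBLE_INT` |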
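`MPI_MAXLOC` and `MPI_MINLOC` operate on (value, index) pairs such as `MPI_DOUBLE_INT`, whose C layout is a `double` followed by an `int`. Each rank usually puts its own rank in the index field, which is why the result reads as "the rank that holds the maximum". The serial sketch below models the combining rule with the hypothetical names `double_int`, `maxloc_combine` and `maxloc_reference`: the larger value wins, and on a tie the smaller index is kept, as the MPI standard specifies for `MPI_MAXLOC`.

```c
#include <assert.h>

/* Same layout as the MPI_DOUBLE_INT pair type: value first, then index. */
typedef struct {
    double value;
    int index;
} double_int;

/* Combining rule of MPI_MAXLOC: keep the larger value; on a tie keep the
 * smaller index. */
double_int maxloc_combine(double_int a, double_int b) {
    if (a.value > b.value) return a;
    if (b.value > a.value) return b;
    double_int r = { a.value, a.index < b.index ? a.index : b.index };
    return r;
}

/* Serial model of the result MPI_Reduce(..., MPI_DOUBLE_INT, MPI_MAXLOC, ...)
 * delivers at the root when rank r contributes the pair {vals[r], r}. */
double_int maxloc_reference(const double vals[], int size) {
    assert(size > 0);
    double_int acc = { vals[0], 0 };
    for (int r = 1; r < size; r++) {
        double_int mine = { vals[r], r };
        acc = maxloc_combine(acc, mine);
    }
    return acc;
}
```

Because ties resolve to the smaller index, the result does not depend on the order in which the library combines the contributions.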