# Performance model (MPI)
Every `write()` syscall has a fixed overhead regardless of how many bytes you write: there is a kernel crossing, buffer management, and bookkeeping before the first byte moves. For a local file the overhead is small relative to I/O cost, but over a network the per-message overhead can be much larger than the cost of moving the data itself. The **performance model** for MPI communication captures this by estimating the time to send an $n$-byte message as $\alpha + \beta n$, where $\alpha$ is the per-message latency (a fixed overhead paid regardless of size) and $\beta$ is the per-byte transfer time (the inverse of bandwidth). Latency dominates for small messages and bandwidth dominates for large ones; the two terms are equal at $n = \alpha / \beta$. The practical implication is that fewer, larger messages are almost always better: one message of 100 elements pays $\alpha$ once, while 100 separate sends pay it 100 times.

```c
// bad: N messages, each paying alpha
for (int i = 0; i < N; i++)
    MPI_Send(&vals[i], 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);

// good: one message, alpha paid once
MPI_Send(vals, N, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
```
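To put numbers on the difference, the $\alpha + \beta n$ model can be evaluated directly. The following is a minimal sketch, not part of MPI: the `link_model` struct and both function names are hypothetical, and real values of $\alpha$ and $\beta$ would have to be measured on the target network.

```c
#include <stddef.h>

/* Illustrative alpha-beta cost model. The type and function names are
   hypothetical helpers, not part of the MPI API. */
typedef struct {
    double alpha; /* per-message latency, in seconds */
    double beta;  /* per-byte transfer time, in seconds per byte */
} link_model;

/* Predicted time to move total_bytes using n_messages sends.
   Each message pays alpha once; every byte pays beta once,
   so the split between messages does not change the beta term. */
static double predicted_time(link_model m, size_t total_bytes, size_t n_messages)
{
    return (double)n_messages * m.alpha + (double)total_bytes * m.beta;
}

/* Message size at which the latency and transfer terms are equal. */
static double crossover_bytes(link_model m)
{
    return m.alpha / m.beta;
}
```

With assumed values of $\alpha = 1\,\mu s$ and $\beta = 1$ ns per byte, `predicted_time(m, 800, 1)` gives about 1.8 µs for 100 doubles sent as one message, while `predicted_time(m, 800, 100)` gives about 100.8 µs for the same data sent one element at a time.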

This model also explains two MPI-specific behaviors. Small messages (typically below a few kilobytes, though the threshold is implementation-defined) use the **eager protocol**: the sender copies the data into a pre-allocated buffer and returns immediately; the receiver picks it up later. Large messages use the **rendezvous protocol**: sender and receiver handshake first, then data moves directly between their buffers with no intermediate copy. The handshake is why `MPI_Send` can block on large messages even when no explicit synchronization is requested: the send typically cannot complete until the receiver has posted a matching receive. It is also why deadlocks in code that relies on buffering often appear only at scale. If two ranks each send before they receive, small test data fits in the eager buffer and the code runs fine; production-size data triggers rendezvous, each send waits for a receive that is never posted, and the code hangs.
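The scale-dependent hang can be reasoned about with a toy model. The sketch below is plain C with no MPI calls; the 4096-byte limit and all names are assumptions chosen for illustration, since real eager thresholds are implementation-defined and often tunable. It covers the case where two ranks each call a blocking send to the other before posting a receive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold for illustration only; real eager limits are
   implementation-defined. */
#define ASSUMED_EAGER_LIMIT 4096

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } protocol;

/* Which protocol this toy model assumes for a message of the given size. */
static protocol protocol_for(size_t bytes)
{
    return bytes <= ASSUMED_EAGER_LIMIT ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/* Two ranks each send to the other, then receive.
   Eager: each send returns after the local copy, so both ranks go on to
   post their receive and the exchange completes.
   Rendezvous: each send waits for the peer's receive, which the peer never
   posts because it is blocked in its own send. */
static bool send_first_exchange_completes(size_t bytes)
{
    return protocol_for(bytes) == PROTO_EAGER;
}
```

Under these assumptions, `send_first_exchange_completes(100 * sizeof(double))` is true and `send_first_exchange_completes(1000000 * sizeof(double))` is false, which mirrors a test run that passes and a production run that hangs. In real MPI code the usual way to avoid depending on the eager buffer is to pair the operations with `MPI_Sendrecv` or to use nonblocking `MPI_Isend` and `MPI_Irecv`.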