Writing guide on parallel computing

This article covers conventions specific to parallel computing articles in this wiki. Read the general writing guide first; this document supplements it.

The central challenge

Parallel computing articles fail in one consistent way: they explain what a primitive does before establishing why a C programmer would care. The reader lands on reduction-mpi.txt and is told that “MPI_Reduce applies a commutative associative operator across the send buffers of all processes” before the text has connected that statement to the problem they came to solve. The result is dense, correct text that is hard to follow.

The fix is to lead with something the reader already knows, then show how the parallel concept extends or replaces it. This does not mean adding a generic intro paragraph — it means opening with a specific, concrete hook.

Types of hooks

Sequential code first

Use this when the concept has a direct serial equivalent. Show the sequential version first, then introduce the parallel version as a natural extension or replacement. The sequential snippet should be short — a loop, an array operation, an index calculation.

Examples:
- Reduction: show for (int i = 0; i < N; i++) sum += arr[i] before MPI_Allreduce or reduction(+:sum)
- Scatter/gather: show a loop of MPI_Send calls before MPI_Scatter
- Prefix reductions: show a sequential scan loop before MPI_Exscan
- Parallel loops: show a serial for loop before #pragma omp parallel for
- Derived datatypes: show the manual pack-into-buffer loop before MPI_Type_vector
- Virtual topologies: show the manual (row-1+P)%P * Q + col neighbour arithmetic before MPI_Cart_shift
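As an illustration of the sequential-first hook, a reduction article might pair two snippets along these lines. This is only a sketch: the function names sum_serial and sum_parallel are hypothetical, and the OpenMP variant is one possible "after", not a prescribed one.

```c
/* Sequential "before": the loop the reader already knows. */
double sum_serial(const double *arr, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += arr[i];
    return sum;
}

/* Parallel "after": the same loop plus a reduction clause.
 * Build with -fopenmp (GCC/Clang). Without that flag the pragma is
 * ignored and the loop runs serially, which still gives the right sum. */
double sum_parallel(const double *arr, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += arr[i];
    return sum;
}
```

The two bodies differ by a single line, which is the point of the hook: the reader sees that the parallel version extends code they could already write.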

Broken naive code first

Use this when the concept exists to fix a correctness problem that the naive parallel approach has. Show the broken version — the race condition, the deadlock, the out-of-order read — then explain what goes wrong, then introduce the fix.

Examples:
- OpenMP reduction: show #pragma omp parallel for on sum += a[i] as a broken race before reduction(+:sum)
- OpenMP atomic: show count++ as a load/increment/store race before #pragma omp atomic
- MPI deadlock: show two processes each calling MPI_Send before explaining cyclic send dependencies
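A broken-first pair for the atomic example could look roughly like this. The function names and the even-number counting task are hypothetical, chosen only to keep both snippets short.

```c
/* Broken "before": count++ is a load, an increment, and a store.
 * With -fopenmp, two threads can load the same old value, so some
 * increments are lost and the result varies from run to run. */
int count_even_racy(const int *a, int n)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0)
            count++;                /* data race on count */
    }
    return count;
}

/* Fixed "after": the atomic directive makes the update indivisible. */
int count_even_atomic(const int *a, int n)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

In an article, the explanation of what goes wrong belongs between the two snippets, so the reader understands the failure before seeing the fix.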

Syscall or POSIX API analogy

Use this when the concept maps cleanly to something from single-machine systems programming that the reader likely already knows. Name the analogy explicitly; don't expect the reader to notice the parallel themselves.

| Parallel concept | Analogy |
| --- | --- |
| MPI point-to-point | BSD sockets (`send`/`recv`), with rank replacing file descriptor |
| MPI blocking/non-blocking | `read()` vs `aio_read()` + `aio_suspend()` |
| MPI probing | `MSG_PEEK` / `ioctl(FIONREAD)` |
| MPI parallel I/O | `pwrite()` with explicit offset |
| MPI persistent communication | HTTP keep-alive vs per-request reconnect |
| OpenMP critical section | `pthread_mutex_lock` / `pthread_mutex_unlock` |
| OpenMP flush | `volatile`: prevent register caching, force memory visibility |
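To make one row of the table concrete, the `pwrite()` analogy for parallel I/O could be introduced with a single-machine sketch like the one below. The block size, the writer count, and the helper names write_block and demo_out_of_order are assumptions for illustration; the "rank" here is just an integer, not an MPI rank.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

enum { BLOCK = 4, WRITERS = 3 };    /* illustrative sizes */

/* Write one writer's block at its own offset. pwrite() neither uses nor
 * moves the shared file position, so writers do not have to take turns.
 * Returns 0 on success, -1 on error or short write. */
int write_block(int fd, int rank, const char *data)
{
    off_t offset = (off_t)rank * BLOCK;
    return pwrite(fd, data, BLOCK, offset) == BLOCK ? 0 : -1;
}

/* Writes the blocks in the order 2, 0, 1 and reads the file back into
 * out[]. Returns 0 on success, -1 on failure. */
int demo_out_of_order(char out[BLOCK * WRITERS])
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    int ok = write_block(fd, 2, "CCCC") == 0
          && write_block(fd, 0, "AAAA") == 0
          && write_block(fd, 1, "BBBB") == 0
          && pread(fd, out, BLOCK * WRITERS, 0) == BLOCK * WRITERS;
    fclose(f);
    return ok ? 0 : -1;
}
```

The blocks land in rank order even though they were written out of order. That is the property the article can then carry over to explicit-offset writes in MPI.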

Hardware or memory model analogy

Use this for concepts that are grounded in hardware behaviour the reader may not have encountered in single-threaded work.

Examples:
- SIMD: “a CPU can add four floats in a single instruction rather than one at a time” before defining SSE/AVX
- False sharing: explain cache line width (typically 64 bytes) and coherence invalidation before showing the performance collapse
- MPI one-sided: mmap(MAP_SHARED) within a machine, then extend the idea across nodes
- Shared memory windows: the messaging layer adds copy overhead even between processes on the same node that share physical memory
- Thread affinity / NUMA: laptop (uniform memory) vs. multi-socket server (each socket has its own RAM bank) before introducing OMP_PLACES
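For the false sharing hook, a short layout sketch can ground the cache line explanation before any timing numbers appear. The 64-byte line size is the typical value mentioned above, not a guarantee, and the struct and function names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

enum { CACHE_LINE = 64 };   /* assumed line size; check the target hardware */

/* Packed layout: adjacent array elements are 8 bytes apart (on a typical
 * 64-bit target), so several per-thread counters share one cache line. */
struct packed_counter {
    long value;
};

/* Padded layout: the alignment requirement rounds sizeof up to
 * CACHE_LINE, so each array element starts on its own line. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

/* Returns 1 if both addresses fall within the same cache line. */
int same_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

With an array of struct padded_counter, same_line(&c[0], &c[1]) is 0, so a write by one thread no longer invalidates the line holding its neighbour's counter. The cost is memory: each counter now occupies a full line.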

Scope or namespace analogy

Use this for concepts about isolation, scoping, and context.

Examples:
- MPI communicators: TCP port numbers scope traffic between programs; communicators scope traffic between MPI contexts
- MPI communicator duplication: a library inheriting your file descriptors can corrupt stdin; passing MPI_COMM_WORLD to a library lets it intercept your messages. MPI_Comm_dup is the dup() equivalent

What level of familiarity to assume

For basic concepts (point-to-point, parallel loops, reduction, data sharing, communicators): assume only C knowledge. The reader knows pointers, structs, loops, and POSIX. They do not know what a rank is, what a communicator is, or what fork-join means. Define these at point of first use.

For intermediate concepts (collectives, non-blocking, scheduling, tasks, send modes): assume the reader has read the parent article (mpi.txt or openmp.txt) and the basic concepts that precede this one in the numbered list. Light cross-links are preferred over re-explaining.

For advanced concepts (one-sided communication, virtual topologies, communicator duplication, process groups, hybrid MPI+OpenMP, NUMA affinity): assume MPI or OpenMP familiarity. The hook can be shorter. Cross-link to prerequisite articles rather than restating them.

Code snippets in parallel computing articles

Two short snippets in the same section are fine when contrasting a before and after (sequential vs. parallel, broken vs. correct). The sequential or broken snippet does not need a compile/run/description header. The parallel or corrected snippet is still a concept illustration, not a full MVE, so it also does not need a header unless the article is specifically a how-to.

Keep the sequential “before” snippet to a few lines. Its only job is to give the reader a footing before the parallel version. If it grows beyond a loop body, it is probably carrying too much weight and should move to its own section or article.

Correctness vs. performance articles

Parallel computing articles tend to fall into two categories. Treat them differently.

Correctness concepts (data sharing, race conditions, deadlock, atomic, critical sections, flush, message ordering) should lead with the failure mode — what goes wrong without the construct — before explaining the fix. The broken code or the pathological scenario is the hook.

Performance concepts (scheduling, false sharing, SIMD, thread affinity, performance model, persistent communication, non-blocking communication) should lead with the observable symptom — slower than expected, threads idle, bandwidth wasted — before explaining the underlying cause and remedy.

Structure of overview articles

openmp.txt and mpi.txt are overview articles, not concept articles. They follow the same four-part pattern as docker.txt and perf.txt:

- Intro: conversational and second-person. Explain what the tool is, what problem it solves, and the core execution model (fork-join, SPMD). The hello-world code block goes here. Aim for the tone of docker.txt: “if you have X problem, this is the shortest path.” Analogies comparing to something the reader already knows are welcome here too.
- Practice: “try this yourself” section. Start with compiling and running the hello-world from the intro, explain the non-deterministic output, then show the simplest example that demonstrates actual parallel communication or coordination (not just printing ranks). Write in the second person throughout. The practice section should end with a pointer to where to go next in the Concepts list.
- Concepts: numbered list of links to dedicated concept articles. No prose here; the links speak for themselves.
- Overview: quick-reference function lists, tables of constants and operators, environment variable tables. Pure reference material. Do not add prose or tutorials here.

The Intro and Practice sections should feel personal and approachable: acknowledge when something is non-obvious (“the mental model takes some adjustment”), acknowledge when something is surprisingly easy (“it genuinely feels like cheating”), and explain the rough edges (non-deterministic output, blocking semantics, OpenMP pragmas being silently ignored when -fopenmp is omitted).

Cross-links between parallel computing articles

Internal links between related parallel concepts are encouraged. The most useful cross-links are:
- From an advanced concept to the prerequisite basic concept it builds on
- From a correctness fix to the failure mode it addresses (e.g. atomic links to critical-sections, nowait links to barrier)
- Between OpenMP and MPI articles that cover equivalent concepts (e.g. reduction-openmp.txt and reduction-mpi.txt)

Use [[page-id|display text]] with a descriptive display name rather than the raw page ID.
