parallel-computing
Differences
This shows you the differences between two versions of the page.
| parallel-computing [June 10, 2026 at 21:07] – Ivan Janevski | parallel-computing [June 10, 2026 at 22:29] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| # Parallel computing | # Parallel computing | ||
| - | **Parallel computing** is a type of software engineering | + | **Parallel computing** is a style of programming |
| - | There are essentially three spheres of parallel computing: 1. CPU parallelism ([[openmp|OpenMP]]), 2. distributed parallelism ([[mpi|MPI]]), and 3. GPU parallelism ([[cuda|CUDA]]). I would carve out a fourth category, which is 4. | + | Not every program benefits equally. [[amdahls-law|Amdahl' |
| - | Parallel computing is related to performance engineering. This makes sense, because usually increasing parallelism increases performance, | + | ## Three paradigms |
| - | Generally speaking, | + | Parallel computing splits into three broad paradigms based on where the parallelism lives. |
| - | $$S(\text{N-cores}) = \frac{1}{(1 - P) + \frac{P}{N}}$$ | + | |
| + | **Shared-memory parallelism** runs multiple threads on a single machine with a common address space. [[openmp|OpenMP]] is the standard approach in C, C++, and Fortran: a few `#pragma omp` directives turn a serial loop into a parallel one. Threads communicate by reading and writing shared variables, which makes synchronization — mutexes, barriers, atomics — the main source of bugs and overhead. | ||
| + | |||
| + | **Distributed-memory parallelism** runs processes across separate machines (or separate address spaces on one machine), each with its own private memory. [[mpi|MPI]] is the dominant standard. Processes communicate explicitly by sending and receiving messages. There is no shared state to race on, but the programmer is responsible for every byte that crosses a process boundary. MPI is the backbone of large cluster workloads. | ||
| + | |||
| + | **GPU parallelism** offloads computation to a GPU, which can run thousands of lightweight threads simultaneously. CUDA is NVIDIA' | ||
| + | |||
| + | ## Performance and correctness | ||
| + | |||
| + | Parallel programs introduce failure modes that serial programs don't have: race conditions, deadlocks, false sharing, and memory ordering issues. A race condition occurs when two or more threads access the same shared data without synchronization, at least one of them writes, and the outcome depends on the order of execution. A deadlock occurs when two threads are each waiting for a lock the other holds. False sharing is a subtler hardware-level issue: two threads write to different variables that happen to sit in the same cache line, causing the cache coherence protocol to thrash. | ||
| + | |||
| + | On the performance side, the [[roofline-model|roofline model]] is a useful frame for understanding whether a kernel is compute-bound or memory-bandwidth-bound, | ||
| + | |||
| + | ## List of concepts | ||
| + | |||
| + | - [[list-of-parallel-computing-concepts]] | ||
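
The shared-memory paragraph in the new revision says that a few `#pragma omp` directives turn a serial loop into a parallel one, and that unsynchronized access to shared variables is the main source of bugs. A minimal sketch of both points in C follows. The function name `parallel_sum` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes a compiler with OpenMP support (for example GCC or Clang invoked with `-fopenmp`).

```c
#include <stddef.h>

/* Hypothetical helper: sum an array using an OpenMP reduction.
 * Each thread accumulates into its own private copy of `total`;
 * OpenMP combines the private copies when the loop finishes, so no
 * thread writes to a shared accumulator inside the loop body. */
double parallel_sum(const double *values, size_t count)
{
    double total = 0.0;

    /* Without the reduction clause, all threads would update `total`
     * concurrently with no synchronization, which is a race condition. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)count; i++) {
        total += values[i];
    }

    return total;
}
```

Calling `parallel_sum(data, n)` on an array `data` of `n` doubles returns their sum. If the compiler is run without OpenMP enabled, the pragma is ignored and the loop simply runs serially, so the same source stays correct either way. Note that a reduction may add the elements in a different order than a serial loop, so floating-point results can differ in the last bits for general inputs.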
parallel-computing.1781125676.txt.gz · Last modified: by Ivan Janevski
