Site Tools


parallel-computing-overview

**This is an old revision of the document!**

Table of Contents

Overview

Paradigms

Paradigm Memory model API Typical scale
Shared-memory Common address space OpenMP, pthreads Single node
Distributed-memory Private per process MPI Multi-node cluster
GPU Host + device CUDA, HIP, OpenCL Single GPU
Hybrid Mixed MPI + OpenMP, MPI + CUDA Multi-node + GPU

Laws and models

Name Formula What it predicts
Amdahl's law $S = \dfrac{1}{s + (1-s)/p}$ Maximum speedup given serial fraction $s$ and $p$ processors
Gustafson's law $S = p - s(p-1)$ Speedup when problem size scales with $p$
Roofline model $\min(\pi,\; I \cdot \beta)$ Whether a kernel is compute-bound or memory-bandwidth-bound

Common failure modes

Issue Cause Detection
Race condition Concurrent unsynchronized read/write Helgrind, ThreadSanitizer
Deadlock Circular lock dependency Missing progress; gdb info threads
False sharing Two variables in the same cache line perf c2c, cache miss counters
Load imbalance Unequal work distribution Per-thread time profile
Memory ordering Relaxed atomics with wrong assumptions Litmus tests; address and memory sanitizers
parallel-computing-overview.1781515952.txt.gz ยท Last modified: by Ivan Janevski