parallel-computing-overview

parallel-computing-overview [June 15, 2026 at 09:32] Ivan Janevski
parallel-computing-overview [June 15, 2026 at 09:34] (current) Ivan Janevski
Line 79:
 The parallel version runs roughly 4× faster on 4 cores. The answer is identical: the `reduction(+:sum)` clause handles synchronization. Try setting `OMP_NUM_THREADS` from 1 up to your core count and plot the speedup. It will flatten before reaching the theoretical maximum; that is Amdahl's law in action.
  
 +## Tree
  
 +```
 +Parallel Computing
 +├── Models
 +│   ├── Shared Memory
 +│   │   ├── Threads
 +│   │   ├── Locks / Mutexes
 +│   │   ├── Atomics
 +│   │   ├── Barriers
 +│   │   ├── Thread Pools
 +│   │   └── NUMA
 +│   │
 +│   ├── Distributed Memory
 +│   │   ├── Message Passing
 +│   │   ├── Collectives
 +│   │   ├── Point-to-point
 +│   │   ├── Topologies
 +│   │   └── RDMA
 +│   │
 +│   ├── Accelerator Computing
 +│   │   ├── GPUs
 +│   │   ├── FPGAs
 +│   │   ├── TPUs
 +│   │   └── Quantum Accelerators
 +│   │
 +│   └── Hybrid
 +│       ├── MPI + OpenMP
 +│       ├── MPI + CUDA
 +│       └── MPI + OpenMP + CUDA
 +│
 +├── Granularity
 +│   ├── Bit-level
 +│   ├── Instruction-level (ILP)
 +│   ├── Data-level (SIMD)
 +│   ├── Thread-level (TLP)
 +│   ├── Task-level
 +│   └── Process-level
 +│
 +├── Execution Models
 +│   ├── SIMD
 +│   ├── MISD
 +│   ├── MIMD
 +│   ├── SPMD
 +│   ├── SIMT
 +│   └── BSP
 +│
 +├── Synchronization
 +│   ├── Mutex
 +│   ├── Semaphore
 +│   ├── Condition Variable
 +│   ├── Barrier
 +│   ├── Atomic Operations
 +│   ├── Fences
 +│   └── Lock-free
 +│
 +├── Communication
 +│   ├── Shared Variables
 +│   ├── Message Passing
 +│   ├── Channels
 +│   ├── Collectives
 +│   ├── Pipelines
 +│   └── Reduction Trees
 +│
 +├── Memory Hierarchy
 +│   ├── Registers
 +│   ├── L1/L2/L3 Cache
 +│   ├── RAM
 +│   ├── NUMA Domains
 +│   ├── GPU Global Memory
 +│   ├── Shared Memory (GPU)
 +│   ├── Constant Memory
 +│   └── Distributed Memory
 +│
 +├── Scheduling
 +│   ├── Static
 +│   ├── Dynamic
 +│   ├── Guided
 +│   ├── Work Stealing
 +│   ├── Gang Scheduling
 +│   └── Batch Scheduling
 +│
 +├── Decomposition
 +│   ├── Data Parallelism
 +│   ├── Task Parallelism
 +│   ├── Pipeline Parallelism
 +│   ├── Domain Decomposition
 +│   ├── Functional Decomposition
 +│   └── Recursive Decomposition
 +│
 +├── Performance Theory
 +│   ├── Amdahl's Law
 +│   ├── Gustafson's Law
 +│   ├── Strong Scaling
 +│   ├── Weak Scaling
 +│   ├── Speedup
 +│   ├── Efficiency
 +│   ├── Latency
 +│   ├── Bandwidth
 +│   ├── Throughput
 +│   └── Roofline Model
 +│
 +├── Hardware
 +│   ├── Multicore CPUs
 +│   ├── Manycore CPUs
 +│   ├── GPUs
 +│   ├── Clusters
 +│   ├── Supercomputers
 +│   ├── Interconnects
 +│   │   ├── Ethernet
 +│   │   ├── InfiniBand
 +│   │   └── NVLink
 +│   └── Storage
 +│       ├── Parallel FS
 +│       ├── Lustre
 +│       └── GPFS
 +│
 +├── Programming Models
 +│   ├── OpenMP
 +│   ├── MPI
 +│   ├── CUDA
 +│   └── CSP / Actors
 +│
 +└── Debugging & Profiling
 +    ├── Tracing
 +    ├── Sampling
 +    ├── Race Detection
 +    ├── Deadlock Detection
 +    ├── Memory Profiling
 +    ├── Cache Profiling
 +    ├── GPU Profiling
 +    └── MPI Profiling
 +```
parallel-computing-overview.txt · Last modified: by Ivan Janevski