# SIMD (OpenMP)
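A modern CPU can add four floats with a single instruction instead of four separate ones. It loads them into one wide register and operates on all four lanes at once. This is **SIMD** (single instruction, multiple data), exposed on x86 as SSE/AVX and on ARM as NEON. A 128-bit SSE or NEON register holds four floats, and a 256-bit AVX register holds eight. The compiler tries to use these instructions automatically (auto-vectorisation), but several things can stop it: pointers that might alias, non-unit strides that make vector loads unprofitable, or conditionals it cannot prove safe to execute in every lane. `#pragma omp simd` is the programmer's explicit assertion that a loop is safe to vectorise, so the compiler can emit SIMD instructions even where its own analysis would be cautious. The compiler does not check this promise, and a false one can produce wrong results. GCC and Clang honour the pragma under `-fopenmp`, or under `-fopenmp-simd` if you want only the SIMD directives without the threading runtime.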

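```c
float dot(const float *a, const float *b, int n) {
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum) /* each lane keeps a partial sum; combined after the loop */
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```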
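The aliasing and conditional cases from the first paragraph can be sketched as follows. The helpers `scale_add` and `relu` are hypothetical names chosen for illustration. In `scale_add`, the compiler cannot rule out overlap between `dst` and `src` on its own. `#pragma omp simd` supplies that guarantee, which means the caller must not pass overlapping arrays. In `relu`, the branch is typically turned into a per-lane select (a mask or blend) once the loop is vectorised.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] += k * src[i].
 * Without `restrict`, the compiler must assume dst and src may overlap,
 * so it may refuse to vectorise or insert a runtime overlap check.
 * `omp simd` is the caller's promise that running iterations in SIMD
 * lanes is safe, so dst and src must not overlap. */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}

/* Hypothetical helper: replace negative values with zero.
 * When vectorised, the conditional is typically compiled to a per-lane
 * compare and select rather than a branch. */
void relu(float *x, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        x[i] = (x[i] > 0.0f) ? x[i] : 0.0f;
    }
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and both loops still compute the same results as scalar code.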

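`simd` can be combined with `parallel for` as `#pragma omp parallel for simd`. The iterations are first divided among threads, and each thread then vectorises its own chunk. The `aligned(ptr : 32)` clause promises the compiler that `ptr` points to a 32-byte boundary. The compiler can then use aligned AVX loads and stores (such as `vmovaps`), which require that alignment, without first checking the address or peeling iterations. The promise is not verified, so a misaligned pointer can fault or produce wrong results.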
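A minimal sketch of `parallel for simd` combined with `aligned` follows. The names `alloc_floats32` and `vec_add` are hypothetical. The sketch assumes a C11 toolchain that provides `aligned_alloc`, which is used here to make the 32-byte promise true.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n floats on a 32-byte boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Release the result with free(). */
float *alloc_floats32(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 31) / 32 * 32;
    if (bytes == 0)
        bytes = 32;
    return aligned_alloc(32, bytes);
}

/* Hypothetical helper: out[i] = x[i] + y[i].
 * Threads split the index range, and each thread's chunk is vectorised.
 * aligned(... : 32) promises that all three base pointers are 32-byte
 * aligned. The promise is unchecked, so pass only pointers that really are. */
void vec_add(float *restrict out, const float *restrict x,
             const float *restrict y, int n)
{
    #pragma omp parallel for simd aligned(out, x, y : 32)
    for (int i = 0; i < n; i++) {
        out[i] = x[i] + y[i];
    }
}
```

Allocating through `alloc_floats32` keeps the alignment promise in one place instead of relying on every caller to get it right.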