SIMD (OpenMP)

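A modern CPU can add four floats in a single instruction rather than one at a time, by loading them into a wide register and operating on all four lanes simultaneously. Wider registers hold more lanes: a 256-bit AVX register holds eight floats. This is SIMD (single instruction, multiple data), exposed on x86 as SSE/AVX and on ARM as NEON. The compiler attempts to use these instructions automatically (auto-vectorisation), but it can be blocked by possible pointer aliasing, non-unit strides, or conditionals it cannot prove safe. #pragma omp simd is an explicit assertion by the programmer that a loop is safe to vectorise, allowing the compiler to emit SIMD instructions even when it would otherwise be cautious. The compiler does not verify the assertion: if one iteration really depends on the result of another, the vectorised loop can silently produce wrong results. In the loop below, reduction(+:sum) declares sum as an accumulator, so each lane may keep its own partial sum and the partial sums are combined when the loop finishes.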

#pragma omp simd reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += a[i] * b[i];
}
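
The aliasing case mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: scale_add is a hypothetical helper name, and the claim that out does not overlap a or b is an assumption the caller must guarantee. Without restrict qualifiers the compiler has to allow for overlap, and the pragma is what asserts that vectorising is safe anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + s * b[i] for i in [0, n).
 * The pointers are not restrict-qualified, so the compiler cannot prove
 * that writing out[i] leaves a[] and b[] untouched. */
void scale_add(float *out, const float *a, const float *b, float s, int n)
{
    assert(out != NULL && a != NULL && b != NULL);

    /* Assumption: out does not overlap a or b. The pragma asserts that the
     * iterations are independent; the compiler does not check this. */
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + s * b[i];
    }
}
```

With GCC or Clang the pragma takes effect when compiling with -fopenmp or -fopenmp-simd; without either flag it is ignored and the loop still computes the same result, just without the vectorisation guarantee.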

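simd can be combined with parallel for as #pragma omp parallel for simd, which both distributes iterations across threads and vectorises the chunk of iterations each thread receives. The aligned(ptr : 32) clause tells the compiler that ptr points to memory aligned to a 32-byte boundary, which is a prerequisite for the aligned AVX load/store instructions (for example vmovaps on 256-bit registers). Like simd itself, the clause is a promise rather than a check: if the pointer is not actually aligned, the program may crash.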
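
A minimal sketch of the combined construct with the aligned clause, under stated assumptions: alloc_aligned32 and dot_aligned are hypothetical names, and the allocation relies on C11 aligned_alloc, which is not available on every platform. The assert checks the alignment promise in debug builds, since OpenMP itself does not.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n floats on a 32-byte boundary.
 * The size is rounded up to a multiple of 32 bytes, as C11 requires the
 * size passed to aligned_alloc to be a multiple of the alignment. */
float *alloc_aligned32(size_t n)
{
    size_t bytes = ((n * sizeof(float) + 31) / 32) * 32;
    return aligned_alloc(32, bytes);
}

/* Hypothetical helper: dot product of two 32-byte-aligned arrays. */
float dot_aligned(const float *a, const float *b, int n)
{
    /* The aligned clause is a promise, so verify it before relying on it. */
    assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0);

    float sum = 0.0f;
    /* Threads split the iteration range; each thread vectorises its chunk.
     * reduction(+:sum) gives every thread and lane a private partial sum. */
    #pragma omp parallel for simd reduction(+:sum) aligned(a, b : 32)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the reduction changes the order in which floating-point additions happen, the result can differ in the last bits from a strictly sequential loop. Memory returned by alloc_aligned32 is released with free.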
