This is an old revision of the document!

SIMD (OpenMP)

A modern CPU can add four floats with one instruction instead of four: it loads them into a single wide register and operates on all four lanes simultaneously. This is SIMD (single instruction, multiple data), exposed on x86 as SSE/AVX and on ARM as NEON. A 128-bit SSE or NEON register holds four floats, and a 256-bit AVX register holds eight. The compiler attempts to use these instructions automatically (auto-vectorisation), but it can be blocked by pointer aliasing, non-unit strides, or conditionals it cannot prove safe. #pragma omp simd is an explicit assertion that a loop is safe to vectorise, allowing the compiler to emit SIMD instructions even when it would otherwise be cautious. The compiler does not check this assertion: if iterations do depend on each other, the vectorised loop can silently produce wrong results.

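```c
float sum = 0.0f;
/* reduction(+:sum): each SIMD lane keeps a private partial sum; they are added together after the loop */
#pragma omp simd reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += a[i] * b[i];
}
```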
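The fragment above assumes a, b and N are declared elsewhere. Below is a minimal self-contained sketch, assuming C11 and a compiler with OpenMP 4.0 or later (for example GCC or Clang with -fopenmp, or -fopenmp-simd to enable only the simd directives). It wraps the reduction in a function and adds a second loop showing the aliasing case from the paragraph above. The names dot and axpy are illustrative choices, not part of the original page. Without the OpenMP flags the pragmas are ignored and the loops compile as ordinary C.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of a and b, both of length n.
 * reduction(+:sum) gives each SIMD lane a private partial sum that is
 * combined after the loop, so the dependency on sum across iterations
 * does not prevent vectorisation. */
float dot(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* y[i] += alpha * x[i] for i in [0, n).
 * The compiler cannot prove that x and y do not overlap, so on its own it
 * may decline to vectorise or insert a runtime overlap check. The simd
 * directive asserts that the iterations are independent, so callers must
 * not pass partially overlapping arrays (e.g. y == x + 1). */
void axpy(float *y, const float *x, float alpha, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

Called on {1, 2, 3, 4} and {5, 6, 7, 8}, dot returns 70. Because the reduction may add the partial sums in a different order than a scalar loop, float results on larger inputs can differ from the sequential sum in the last few bits.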

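simd can be combined with parallel for as #pragma omp parallel for simd, which distributes iterations across threads and then vectorises the iterations within each thread. The aligned(ptr : 32) clause tells the compiler that ptr is aligned to a 32-byte boundary (the width of a 256-bit AVX register), which is a prerequisite for some AVX load/store instructions. Like simd itself, this is a promise rather than a request: the compiler does not align the memory, so the programmer must allocate it on that boundary, and if the promise is false the behaviour is undefined.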
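A sketch of parallel for simd together with the aligned clause, under the same compiler assumptions. The helper names alloc_aligned_floats and add_arrays and the ALIGNMENT macro are hypothetical. Memory comes from C11 aligned_alloc, and C11 specifies that the requested size be a multiple of the alignment, so the helper rounds it up. The debug asserts check the alignment promise before the loop relies on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 32 /* bytes: one 256-bit AVX register; must match aligned(... : 32) below */

/* Allocate space for n floats on a 32-byte boundary (hypothetical helper).
 * The byte count is rounded up to a multiple of ALIGNMENT as C11
 * aligned_alloc requires. Release the buffer with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (bytes == 0) {
        bytes = ALIGNMENT; /* avoid the implementation-defined zero-size case */
    }
    return aligned_alloc(ALIGNMENT, bytes);
}

/* z[i] = x[i] + y[i] for i in [0, n).
 * parallel for splits the iterations across threads; simd vectorises each
 * thread's chunk. aligned(x, y, z : 32) promises 32-byte alignment, so the
 * asserts verify that promise in debug builds. */
void add_arrays(float *z, const float *x, const float *y, size_t n)
{
    assert((uintptr_t)x % ALIGNMENT == 0);
    assert((uintptr_t)y % ALIGNMENT == 0);
    assert((uintptr_t)z % ALIGNMENT == 0);
    #pragma omp parallel for simd aligned(x, y, z : 32)
    for (size_t i = 0; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

Compiled without -fopenmp the loop runs on a single thread, but it computes the same result; the pragma only changes how the work is scheduled and vectorised.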

Ivan's wiki