nimforum mirror - Nim with gcc 5.4 SIMD auto-vectorization

Stefan_Salewski (original) [2016-12-23T13:05:03+01:00]

Yesterday I did a very short test. Generally, SIMD (single instruction, multiple data) works best with 16-byte-aligned data and the restrict qualifier to indicate non-overlapping data. But even without these, some SIMD auto-vectorization is indeed available in Nim.
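
To make the alignment and restrict remark concrete, here is a minimal C sketch of the kind of loop gcc's vectorizer likes. It is not what Nim generates; the names add_arrays and demo_add are made up for illustration, and it assumes a C11 compiler (for alignas).

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define N 128

/* restrict is a promise to the compiler that dst and src never overlap,
   so it does not have to emit a runtime overlap check or a scalar fallback. */
void add_arrays(int *restrict dst, const int *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Hypothetical demo: build two 16-byte-aligned arrays and add them. */
int demo_add(void)
{
    alignas(16) int a[N];
    alignas(16) int b[N];

    /* Check the alignment we asked for. */
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0);

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    add_arrays(a, b, N);
    return a[7];   /* 7 + 2*7 */
}
```

Calling demo_add() from a main and compiling with -O3 -fopt-info-vec should let you see which of the loops gcc decides to vectorize.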

First we make gcc output visible with &> output redirection and enable vectorization:


cat nim.cfg
path:"$projectdir"
nimcache:"/tmp/$projectdir"
gcc.options.speed = "-save-temps -pipe -march=native -O3 -ftree-vectorize -fopt-info-vec -fno-strict-aliasing &> gcc.log"

We may also specify "-fopt-info-vec-missed" to see where vectorization failed, but that generates a lot of noise for all the library code. "-march=native" ensures optimization for the current CPU, and "-save-temps" keeps the intermediate files, including the assembler listings. Test with

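import random
proc test =
  var a: array[128, int]
  # random(128) returns a value in 0 .. 127, so the upper bound of the
  # loop is only known at run time (newer Nim versions call this rand)
  for i in 0 .. random(128):
    a[i] = i
  echo a[7]

test()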


cat gcc.log
gcc: warning: -pipe ignored because -save-temps specified
/tmp//home/stefan/simd/simd.c:59:8: note: loop vectorized

cat simd.s
        call    random_99297_4293377359
        testq   %rax, %rax
        js      .L13
        leaq    -3(%rax), %rcx
        leaq    1(%rax), %rdi
        shrq    $2, %rcx
        addq    $1, %rcx
        cmpq    $3, %rax
        leaq    0(,%rcx,4), %rdx
        jle     .L14
        vmovdqa .LC1(%rip), %ymm0
        xorl    %esi, %esi
        vmovdqa .LC3(%rip), %ymm1
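
"vmovdqa" is an AVX instruction (an aligned move of packed integer data), and the ymm registers it uses here are 256 bits wide, so the loop was indeed vectorized. So it works even when the upper bound of the for loop is not fixed at compile time. I don't know if there is any benefit in real life :-)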

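For anyone who wants to try this without Nim in between, a rough hand-written C counterpart of the test loop could look like the sketch below. This is my approximation, not the C code Nim emits; fill_prefix is a made-up name, and long long stands in for Nim's int on a 64-bit target.

```c
#include <assert.h>

#define N 128

/* Fill a[0 .. upper] with the index, like the Nim loop
   `for i in 0 .. upper`. The bound is only known at run time. */
long long fill_prefix(long long upper)
{
    long long a[N] = {0};   /* Nim zero-initializes the array as well */

    assert(upper >= 0 && upper < N);

    for (long long i = 0; i <= upper; i++)
        a[i] = i;

    return a[7];            /* the value the Nim version echoes */
}
```

Compiling it with something like gcc -O3 -march=native -fopt-info-vec -c loop.c (loop.c being whatever name you save it under) shows whether your gcc version vectorizes the loop.
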
Mirror of forum.nim-lang.org