Yesterday I did a very short test. Generally, SIMD (single instruction, multiple data) code works best with aligned data (16-byte alignment for SSE, 32-byte for AVX) and with the C restrict qualifier, which tells the compiler that pointers do not overlap. But even without these, gcc can vectorize some Nim code.
First, we make gcc's diagnostic output visible by redirecting it to gcc.log with &>, and we enable vectorization:
cat nim.cfg
path:"$projectdir"
nimcache:"/tmp/$projectdir"
gcc.options.speed = "-save-temps -pipe -march=native -O3 -ftree-vectorize -fopt-info-vec -fno-strict-aliasing &> gcc.log"
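We could also specify "-fopt-info-vec-missed" to see where vectorization failed, but that generates a lot of noise from all the library modules. "-march=native" optimizes for the CPU of the build machine (enabling instruction set extensions such as AVX if that CPU supports them), "-save-temps" keeps the intermediate files, including the assembler listing simd.s, and "-ftree-vectorize" is already implied by -O3, so it is redundant but harmless. Test with: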
import random

proc test =
  var a: array[128, int]
  # random(128) returns a value in 0 .. 127 (newer Nim: use rand(a.high))
  for i in 0 .. random(128):
    a[i] = i
  echo a[7]

test()
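The loop above only writes to one array. For comparison, here is a minimal sketch of the element-wise loop shape the vectorizer is usually aimed at, with two inputs and one output. The proc name addArrays is hypothetical, and switching bounds checks off for just this proc is an assumption about what helps: an index check that can raise inside the loop body gives gcc a possible early exit, which may keep it from vectorizing the loop.

```nim
# Hypothetical example: c[i] = a[i] + b[i] over fixed-size int arrays.
# boundChecks are pushed off for this proc only, so the loop body contains
# no index check (and no possible exception) that might block vectorization.
{.push boundChecks: off.}
proc addArrays(a, b: array[128, int]; c: var array[128, int]) =
  for i in 0 ..< c.len:
    c[i] = a[i] + b[i]
{.pop.}

when isMainModule:
  var a, b, c: array[128, int]
  for i in 0 ..< a.len:
    a[i] = i
    b[i] = 2 * i
  addArrays(a, b, c)
  echo c[7]   # 7 + 14 = 21
```

Compiled with the nim.cfg above, gcc.log should show whether gcc vectorized this loop too.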
cat gcc.log
gcc: warning: -pipe ignored because -save-temps specified
/tmp//home/stefan/simd/simd.c:59:8: note: loop vectorized
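The log only lists loops that were vectorized. To also see the "missed" notes without the noise from the library modules, one option is to request -fopt-info-vec-missed from inside the module under test. This assumes a Nim version that supports the localPassC pragma, which passes flags only to the C file generated from the current module; the version used here may predate it. A sketch; the proc scale is hypothetical, and the flag is gcc-specific:

```nim
# Pass the gcc-only flag just for this module's generated C file,
# so "missed" vectorization notes are limited to code in this file.
{.localPassC: "-fopt-info-vec-missed".}

proc scale(xs: var openArray[float]; k: float) =
  ## Multiplies every element by k; a simple loop gcc may vectorize.
  for i in 0 ..< xs.len:
    xs[i] = xs[i] * k

when isMainModule:
  var v = [1.0, 2.0, 3.0, 4.0]
  scale(v, 2.0)
  echo v   # [2.0, 4.0, 6.0, 8.0]
```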
cat simd.s
call random_99297_4293377359
testq %rax, %rax
js .L13
leaq -3(%rax), %rcx
leaq 1(%rax), %rdi
shrq $2, %rcx
addq $1, %rcx
cmpq $3, %rax
leaq 0(,%rcx,4), %rdx
jle .L14
vmovdqa .LC1(%rip), %ymm0
xorl %esi, %esi
vmovdqa .LC3(%rip), %ymm1
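vmovdqa (move aligned packed integers) operating on the 256-bit ymm registers is an AVX SIMD instruction: one ymm register holds four 8-byte Nim ints. So vectorization works even when the upper bound of the for loop is not known at compile time. Whether this gives any benefit in real life, I don't know :-)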
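The listing also shows how gcc splits the loop. Since a ymm register holds four Nim ints, the shrq $2 (divide by 4) computes the number of 4-wide vector iterations. The helper vectorSplit below is a hypothetical restatement of that arithmetic, based only on my reading of the instructions above, with n being the value returned by random in %rax. It is only meaningful for n > 3: for n <= 3 the cmpq/jle pair jumps to .L14, presumably a scalar-only path, which the excerpt does not show.

```nim
# Hypothetical restatement of the trip-count arithmetic in simd.s.
# n is the value returned by random(128); the loop runs for i in 0 .. n.
# Only meaningful for n > 3: for n <= 3 the listing jumps to .L14 first.
proc vectorSplit(n: int): tuple[vecIters, vecElems, scalarTail: int] =
  assert n > 3, "the listing bypasses the vector loop for n <= 3"
  let total = n + 1                   # leaq 1(%rax), %rdi
  let vecIters = ((n - 3) shr 2) + 1  # leaq -3(%rax) / shrq $2 / addq $1 -> %rcx
  let vecElems = vecIters * 4         # leaq 0(,%rcx,4), %rdx: 4 ints per ymm
  result = (vecIters: vecIters, vecElems: vecElems,
            scalarTail: total - vecElems)

when isMainModule:
  let s = vectorSplit(127)
  echo s.vecIters, " ", s.vecElems, " ", s.scalarTail   # 32 128 0
```

For n = 9, for example, the ten iterations split into two vector iterations (eight elements) plus two elements that are left over for scalar code.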