Context: I'm new to Nim, not new to SIMD intrinsics. I'm using this binding library: https://github.com/bsegovia/x86_simd.nim
I can successfully add two m128i values if I do something like:
let a = set1_epi32(1)
let b = set1_epi32(1)
let c = add_epi32(a,b)
and that works fine. What I am trying to learn now is how to use this nicely with Nim in a more realistic example, where I have arrays and wish to do SIMD operations on them:
let myArray1 = [1,2,3,4,5,6,7,8]
let myArray2 = [1,1,1,1,1,1,1,1]
let result = ???
# now how to loop over these adding 4 elements at a time by casting portions of the array to m128i?
I would be interested in SIMD examples as well...
As you may know, there is a larger and more recent SIMD lib available, unfortunately without many examples; see
https://forum.nim-lang.org/t/212#1039 (last post)
https://github.com/jcosborn/qex/tree/master/src/simd
And recently there was this nice example:
https://forum.nim-lang.org/t/3105/2
by Mr jxy
Well, I would create SIMD-aware types with overloaded operators, and then cast to these types to use SIMD.
# let's assume `vec' is the type with simd enabled
var myArray1 = [1,2,3,4,5,6,7,8]
var myArray2 = [1,1,1,1,1,1,1,1]
let view1 = cast[ptr array[2, vec[int, 4]]](myArray1.addr)
let view2 = cast[ptr array[2, vec[int, 4]]](myArray2.addr)
view1[0] += view2[0]  # elements 0..3
view1[1] += view2[1]  # elements 4..7
But then you would still need to create the simd types. No idea there.
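A minimal sketch of what such a type could look like, assuming an x86-64 target, the C backend (nim c) and gcc or clang. It does not use x86_simd.nim: it wraps the SSE2 intrinsics from emmintrin.h directly, and the names M128i, loadu, storeu, addEpi32, the `+` operator and sums are made up for this example. Two things differ from the cast-based view above. The arrays hold int32, because Nim's int is 64 bits wide on x86-64 and would not match the epi32 intrinsics. And the data goes through unaligned loads and stores instead of a pointer cast, since a plain array[8, int32] is not guaranteed to be 16-byte aligned.

```nim
# Sketch only: SSE2 through the C header, not through a binding library.
type M128i {.importc: "__m128i", header: "emmintrin.h".} = object

proc loadu(p: pointer): M128i {.importc: "_mm_loadu_si128", header: "emmintrin.h".}
proc storeu(p: pointer, v: M128i) {.importc: "_mm_storeu_si128", header: "emmintrin.h".}
proc addEpi32(a, b: M128i): M128i {.importc: "_mm_add_epi32", header: "emmintrin.h".}

# Hypothetical operator: treats the register as four int32 lanes.
proc `+`(a, b: M128i): M128i {.inline.} = addEpi32(a, b)

var
  myArray1 = [1'i32, 2, 3, 4, 5, 6, 7, 8]
  myArray2 = [1'i32, 1, 1, 1, 1, 1, 1, 1]
  sums: array[8, int32]

# Four int32 values per step; the length (8) is a multiple of 4 here.
for i in countup(0, myArray1.len - 4, 4):
  let v = loadu(addr myArray1[i]) + loadu(addr myArray2[i])
  storeu(addr sums[i], v)

echo sums  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Defining `+` directly on the register type keeps the sketch short; a real library would probably want separate types per lane width so that `+` cannot be applied to the wrong element size.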
@jackmott
Something like multiply_avr below does the trick.
It gives a 5x speedup on my computer when compiling without -d:release (the default debug build).
With -d:release it doesn't run any faster. The compiler seems to be smart enough to vectorize the very simple loop in multiply on its own.
import times, x86_avx

const
  N = 8_000     # elements per array; must be a multiple of 8 for multiply_avr
  M = 100_000   # number of timed repetitions

# Plain scalar loop, one element at a time.
proc multiply(a, b, d: var seq[float32]) =
  for ix in 0 ..< N:
    d[ix] = a[ix] * b[ix]

# AVX version: eight float32 values per iteration, unaligned loads and stores.
proc multiply_avr(a, b, d: var seq[float32]) =
  for ix in countup(0, N-1, 8):
    let
      av = loadu_ps_256(addr a[ix])
      bv = loadu_ps_256(addr b[ix])
      rv = mul_ps(av, bv)
    storeu_ps(addr d[ix], rv)

# Fills the inputs, then times M calls of f.
proc test(f: proc (a, b, d: var seq[float32])) =
  var a, b, d: seq[float32]
  newSeq(a, N)
  newSeq(b, N)
  newSeq(d, N)
  for ix in 0 ..< N:
    a[ix] = float32(ix)
    b[ix] = float32(ix)
  let t0 = cpuTime()
  for t in 1 .. M:
    f(a, b, d)
  let tt = cpuTime() - t0
  echo("Elapsed time: ", tt, " seconds")

echo "--- normal multiply ---"
test(multiply)
echo "--- avr multiply ---"
test(multiply_avr)
echo "---"
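The benchmark relies on N being a multiple of the vector width. For arbitrary lengths, one common pattern is to run the vector loop over the largest multiple that fits and finish the remainder with scalar code. A rough sketch of that, under a few assumptions: it uses 4-wide SSE from xmmintrin.h rather than AVX so that it needs no extra compiler flags on x86-64, it wraps the intrinsics directly instead of going through x86_avx, and the names M128, loadu, storeu, mul and multiplyAnyLen are made up for this example.

```nim
# Sketch only: SSE through the C header; assumes x86-64 and `nim c`.
type M128 {.importc: "__m128", header: "xmmintrin.h".} = object

proc loadu(p: ptr float32): M128 {.importc: "_mm_loadu_ps", header: "xmmintrin.h".}
proc storeu(p: ptr float32, v: M128) {.importc: "_mm_storeu_ps", header: "xmmintrin.h".}
proc mul(a, b: M128): M128 {.importc: "_mm_mul_ps", header: "xmmintrin.h".}

proc multiplyAnyLen(a, b, d: var seq[float32]) =
  let n = min(a.len, min(b.len, d.len))
  let vecEnd = n - n mod 4          # largest multiple of 4 that fits in n
  var ix = 0
  while ix < vecEnd:                # vector part: 4 floats per step
    storeu(addr d[ix], mul(loadu(addr a[ix]), loadu(addr b[ix])))
    ix += 4
  for jx in vecEnd ..< n:           # scalar tail: at most 3 elements
    d[jx] = a[jx] * b[jx]

var
  a = @[1'f32, 2'f32, 3'f32, 4'f32, 5'f32, 6'f32]
  b = @[2'f32, 2'f32, 2'f32, 2'f32, 2'f32, 2'f32]
  d = newSeq[float32](6)

multiplyAnyLen(a, b, d)
echo d
```

With six elements the vector loop handles indices 0..3 and the scalar loop handles 4 and 5.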