Heya.
For quite a while now, I've been using nimsimd for any and all SIMD acceleration I add in my projects. It does the job well, but I've always dreamed of something akin to Google's Highway library, but in Nim. Think: SIMD operations that boil down to 1-3 SIMD instructions in your code with no function calls (when appropriate) and work across x86-64 and ARM64 using nimsimd.
Hence, I've been working on Overdrive for some time now. It essentially uses compile-time dispatch to lower high-level generic SIMD operations, like adding the contents of two registers (or Vector[T], as they're known here), into the appropriate native instructions. T can be any of a select range of 8-bit, 16-bit, 32-bit and 64-bit signed or unsigned integer types. Proper support for floats will be added some day; I haven't worked on it yet, mostly because I've not required it for the work I generally do (text processing and parsing).
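To give a rough feel for the idea (a hypothetical sketch, not Overdrive's actual internals; the name addBytes and the scalar loop bodies are mine), compile-time dispatch in Nim comes down to `when` branches that are resolved entirely at compile time, so only one implementation survives in the generated code:

```nim
# Hypothetical sketch of compile-time dispatch; NOT Overdrive's real code.
# On each target, only one branch is compiled, so selecting the backend
# costs nothing at runtime.
proc addBytes(a, b: array[16, uint8]): array[16, uint8] =
  when defined(amd64):
    # A real implementation would call the SSE2 intrinsic mm_add_epi8
    # via nimsimd here; shown as a scalar loop to stay self-contained.
    for i in 0 ..< 16: result[i] = a[i] + b[i]
  elif defined(arm64):
    # On AArch64 this would map to NEON's vaddq_u8.
    for i in 0 ..< 16: result[i] = a[i] + b[i]
  else:
    # Portable scalar fallback.
    for i in 0 ..< 16: result[i] = a[i] + b[i]
```

The generic Vector[T] layer sits on top of exactly this kind of branching, so user code never mentions an instruction set.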
Here's a simple find(string, char): int implementation using Overdrive.
import std/bitops
import pkg/overdrive

func ofind*(s: string, c: char): int =
  # Fill a vector with the needle so every lane holds `c`.
  var target: Vector[char]
  target.store(c)

  var i = 0
  let cap = target.capacity # bytes per vector

  # Vectorized fast path: scan `cap` bytes of the haystack at a time.
  while i + cap <= s.len:
    var blk: Vector[char]
    blk.store(s[i].addr)
    let masked = blk.mask(target)
    if masked != 0:
      # The lowest set bit marks the first matching byte in this chunk.
      let offset = countTrailingZeroBits(masked)
      return i + offset
    i += cap

  # Scalar tail for the remaining bytes.
  while i < s.len:
    if s[i] == c:
      return i
    inc i

  return -1
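The mask-and-count trick in the fast path is worth spelling out. Here is a scalar stand-in (my own illustration, no Overdrive involved) that builds the same kind of per-byte match bitmask by hand, to show why countTrailingZeroBits recovers the index of the first hit:

```nim
import std/bitops

# Scalar illustration of the trick ofind relies on: set one bit per byte
# that matched, then countTrailingZeroBits gives the offset of the first
# match within the chunk.
proc scalarMask(chunk: openArray[char], c: char): uint32 =
  for i in 0 ..< chunk.len:
    if chunk[i] == c:
      result = result or (1'u32 shl i)

let m = scalarMask("hello world", 'o')
doAssert countTrailingZeroBits(m) == 4 # first 'o' is at index 4
```

With SIMD, the comparison and the mask extraction each compile to a single instruction (e.g. byte-compare plus movemask on x86-64), which is where the speedup comes from.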
The same code works across AMD64 (SSE2, SSE3, SSE4.1, AVX2) and AArch64 (NEON); compiler flags decide which instruction set is used.
Overdrive has fully replaced nimsimd in nim-url, a WHATWG URL parser, where the fast-path routines greatly benefit from SIMD acceleration.
Overdrive is not in the Nimble index yet, so it can be installed like this:
$ neo add gh:xTrayambak/overdrive
$ nimble install https://github.com/xTrayambak/overdrive
The source can be found at https://github.com/xTrayambak/overdrive. Enjoy! :^)
Instead of
var target: Vector[char]
target.store(c)
add some mechanism so that we can write:
let target = asVec(c)
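For what it's worth, a minimal version of that helper could be layered on the existing API. This is a sketch under the assumption that store(value) fills every lane with the value, as in the ofind example above; asVec is the name suggested here, not something Overdrive exports today:

```nim
# Hypothetical convenience constructor built on the existing Vector[T]
# API; asVec is the suggested name and is not part of Overdrive yet.
func asVec*[T](value: T): Vector[T] =
  result.store(value)
```

That would let call sites shrink to `let target = asVec(c)` without touching the dispatch machinery underneath.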