nimforum mirror - Calling C compiler builtins -- it is fast!

Stefan_Salewski (orginal) [2016-12-27T21:43:30+01:00] view original

Recently I read: http://lemire.me/blog/2014/02/14/getting-good-performance-in-go-by-rewriting-parts-in-c/

So I thought it was time to try it in Nim:

import random
import times

# https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/Other-Builtins.html
# Nim/lib/system/arithm.nim
when sizeof(clong) == 8:
  proc popcount[T: int64|int](a: T): cint {.
    importc: "__builtin_popcountl", nodecl, nosideeffect.}

proc test =
  var
    s: array[1024, int]
    c: cint
    d: int
    t: float
  
  for i in 0 .. s.high: s[i] = random(8) # = 3
  
  t = cpuTime()
  for i in s:
    c += popcount(i)
  echo "Time for popcount(): ", cpuTime() - t
  echo c
  
  t = cpuTime()
  for i in s:
    d += i
  echo "Time for add: ", cpuTime() - t
  echo d

test()


$ nim c -d:release pc.nim
$ ./pc
Time for popcount(): 3.999999999999989e-06
1492
Time for add: 1.000000000000024e-06
3449

Less than 4 ns for a popcount operation.

[EDIT]

As Prof Lemire just told me:

So on a 4GHz processor, you ought to get down to 0.25 ns/popcnt.

And indeed, I forgot gcc option -march=native on my box. With that and -O3 popcount loop is more than two times faster than above, very close to plain add operation.

Mirror of forum.nim-lang.org

2681 :: Calling C compiler builtins -- it is fast!