Hi guys. I know that all benchmarks are futile, but I would like to understand why the C counterpart of the following Nim snippet to calculate the number of primes in an interval is more than 3 times faster than Nim.
Nim version:
# bm_nim.nim
proc count_prime_numbers(n_low, n_high: int): int =
var nbr = n_low
if nbr mod 2 == 0: # make sure we start with an uneven number
nbr += 1
while nbr < n_high:
let limit = nbr div 2
var is_prim = true
for divisor in 2..<limit:
if nbr mod divisor == 0:
is_prim = false
break
if is_prim:
result += 1
nbr += 2
when isMainModule:
echo(count_prime_numbers(3, 200000))
C version:
// bm_c.c
#include <stdio.h>
int count_prime_numbers(int n_low, int n_high) {
int count_of_primes = 0;
int is_prim, zahl, divisor, limit;
zahl = n_low;
if (zahl % 2 == 0) {zahl += 1;}
while (zahl < n_high) {
is_prim = 1;
limit = zahl / 2;
for (divisor = 2; divisor < limit; divisor++) {
if (zahl % divisor == 0) {
is_prim = 0;
break;
}
}
if (is_prim == 1) {
count_of_primes++;
}
zahl += 2;
}
return count_of_primes;
}
int main(void)
{
printf("%d", count_prime_numbers(3, 200000));
return 0;
}
compiled and run with:
$ nim -d:release c bm_nim
# <compiler output omitted>
$ time ./bm_nim
17983
real 0m8.406s
user 0m8.404s
sys 0m0.000s
$ gcc bm_c.c -o bm_c -O3
$ time ./bm_c
17983
real 0m2.607s
user 0m2.604s
sys 0m0.000s
Some remarks:
In these comparable programs, where does Nim lose performance over C? Are there options I missed?
Loving Nim, keep up the great work! I tried Rust (again) over the last couple of days, but Nim is way easier to use!
Kind regards,
Axel
Note that in C an int is 32bit and in Nim 64bit on my x86-64 on Linux, so probably on your system as well. Use int32 instead of int in Nim and performance is equal to C on my system with just that change.
The reason is that general purpose division (mod) is MUCH slower for 64bit numbers than 32bit (note that division by 2 is fast, just shift the number). See http://www.agner.org/optimize/instruction_tables.pdf , which tells us that on a Skylake CPU a 32bit DIV operation takes 26 cycles while a 64bit DIV operation takes 35-88 cycles, so it can be nearly four times as slow, fitting well into what you measure.
Another common effect is that 64bit numbers fill up the caches more quickly, but this is unlikely to be the case here since you only use very few variables.
Wow, you are both right (extra kudos for the almost simultaneous answer ;-) )!
Just changing the type from int to int32 in the Nim snippet equals out the performance. I knew that Nim could go faster. Definitely something to keep in mind!
Many thanks for the fast answers.
Kind regards, Axel