Hi guys. I know that all benchmarks are futile, but I would like to understand why the C counterpart of the following Nim snippet to calculate the number of primes in an interval is more than 3 times faster than Nim.
Nim version:
# bm_nim.nim
proc count_prime_numbers(n_low, n_high: int): int =
  var nbr = n_low
  if nbr mod 2 == 0:  # make sure we start with an uneven number
    nbr += 1
  while nbr < n_high:
    let limit = nbr div 2
    var is_prim = true
    for divisor in 2..<limit:
      if nbr mod divisor == 0:
        is_prim = false
        break
    if is_prim:
      result += 1
    nbr += 2
when isMainModule:
  echo(count_prime_numbers(3, 200000))
// bm_c.c
#include <stdio.h>
int count_prime_numbers(int n_low, int n_high) {
  int count_of_primes = 0;
  int is_prim, zahl, divisor, limit;
  zahl = n_low;
  if (zahl % 2 == 0) {zahl += 1;}
  while (zahl < n_high) {
    is_prim = 1;
    limit = zahl / 2;
    for (divisor = 2; divisor < limit; divisor++) {
      if (zahl % divisor == 0) {
        is_prim = 0;
        break;
      }
    }
    if (is_prim == 1) {
      count_of_primes++;
    }
    zahl += 2;
  }
  return count_of_primes;
}
int main(void)
{
    printf("%d", count_prime_numbers(3, 200000));
        return 0;
}
$ nim -d:release c bm_nim
# <compiler output omitted>
$ time ./bm_nim
17983
real    0m8.406s
user    0m8.404s
sys     0m0.000s
$ gcc bm_c.c -o bm_c -O3
$ time ./bm_c
17983
real    0m2.607s
user    0m2.604s
sys     0m0.000s
In these comparable programs, where does Nim lose performance over C? Are there options I missed?
Loving Nim, keep up the great work! I tried Rust (again) over the last couple of days, but Nim is way easier to use!
Kind regards,
Axel
Note that in C an int is 32bit and in Nim 64bit on my x86-64 on Linux, so probably on your system as well. Use int32 instead of int in Nim and performance is equal to C on my system with just that change.
The reason is that general purpose division (mod) is MUCH slower for 64bit numbers than 32bit (note that division by 2 is fast, just shift the number). See http://www.agner.org/optimize/instruction_tables.pdf , which tells us that on a Skylake CPU a 32bit DIV operation takes 26 cycles while a 64bit DIV operation takes 35-88 cycles, so it can be nearly four times as slow, fitting well into what you measure.
Another common effect is that 64bit numbers fill up the caches more quickly, but this is unlikely to be the case here since you only use very few variables.
Wow, you are both right (extra kudos for the almost simultaneous answer ;-) )!
Just changing the type from int to int32 in the Nim snippet equals out the performance. I knew that Nim could go faster. Definitely something to keep in mind!
Many thanks for the fast answers.
Kind regards, Axel