https://github.com/frol/completely-unscientific-benchmarks
Nim is ~18x slower than the fastest (C++) solution.
That's because, according to https://github.com/frol/completely-unscientific-benchmarks/tree/master/nim, it was compiled in debug mode.
It needs to be nim compile -d:release --opt:speed --out:main-nim main.nim.
Aside from the wrong compiler settings, there is also a performance regression compared to 0.17.2.
The underlying reason appears to be that in 0.18, if you have a tuple result, genericReset() is called at the beginning of the procedure, which is pretty expensive.
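To make the tuple-result point concrete, here is a rough sketch of the two shapes such a procedure can take. It is not the benchmark's code: Node, splitTuple and splitVar are made-up names, and the var-parameter variant is only my assumption about one way to avoid a tuple result. I have not measured either.

```nim
# Hypothetical names throughout; this is not the benchmark's code.
type
  Node = ref object
    value: int
    left, right: Node

# Tuple result: this is the shape that, per the post above, gets a
# genericReset() call on entry in 0.18.
proc splitTuple(n: Node, pivot: int): tuple[lower, greater: Node] =
  if n.isNil:
    return  # result keeps its default value, (nil, nil)
  if n.value < pivot:
    let parts = splitTuple(n.right, pivot)
    n.right = parts.lower
    result.lower = n
    result.greater = parts.greater
  else:
    let parts = splitTuple(n.left, pivot)
    n.left = parts.greater
    result.lower = parts.lower
    result.greater = n

# Same split, but the outputs are var parameters, so there is no tuple result.
proc splitVar(n: Node, pivot: int, lower, greater: var Node) =
  if n.isNil:
    lower = nil
    greater = nil
  elif n.value < pivot:
    splitVar(n.right, pivot, n.right, greater)
    lower = n
  else:
    splitVar(n.left, pivot, lower, n.left)
    greater = n

# Small usage example: split a three-node tree around the value 6.
let root = Node(value: 5, left: Node(value: 2), right: Node(value: 8))
var lo, hi: Node
splitVar(root, 6, lo, hi)
echo lo.value, " ", hi.value
```

Both procedures split the same way; whether the second one is actually faster on a given Nim version would have to be measured.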
@jrenner
I just ran the C++ raw-pointer version and the main_fast example from Nim on my Linux computer. I got 0.21 for the C++ raw-pointer version with g++, and 0.18 for the main_fast Nim example.
What compiler settings did you use, and which version? I just tried it and while it's a lot faster than the first Nim version it still takes almost twice as much time as the Ada and C++ versions on my machine (.283 and .276 seconds, respectively, vs. .547 for main_fast.nim). I used
nim c -d:release main_fast.nim
with nim 0.18.0, as well as nim c -d:release --opt:speed main_fast.nim
which doesn't seem to make much difference (except perhaps slow it down a little). This was on a MacBook Pro; haven't tried it on Linux yet.
@cantanima: see the README for instructions on how to compile.
On my machine, if I use nim c -d:release main_fast.nim I get 0.42 seconds, but if I use what is recommended (nim compile -d:release --passC:-flto --passL:-s --gc:markAndSweep --out:main-nim main_fast.nim), it runs in 0.18 seconds.
Nim can be consistently faster than C++ with raw pointers.
See:
Nim is the fastest now with the C target according to the updated table!
The Nim JavaScript target is twice as slow as plain JavaScript for the fast version and 4x slower for the naive main.nim version.
If it's on Windows, antivirus is to blame.
Oh! Do you think that anti-viruses don't check executable files created by other compilers?
If so, then this is the conspiracy against Nim! :)
Seems the C++ version has gotten some further optimisation and is now back to being the fastest.
I tried to make a "cheating" version that did the entire thing at compile time and just spat out the result at runtime, but there is a strange bug in the VM that I have yet to locate, which prevented this.
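For anyone unfamiliar with the compile-time trick, a minimal sketch of the general idea follows. It is not the attempted "cheating" version: sumOfSquares is a made-up stand-in workload. Initialising a const with a procedure call makes the compiler's VM run the procedure, so the binary only has to print the stored value.

```nim
# Hypothetical stand-in workload; the real benchmark does something else.
proc sumOfSquares(n: int): int =
  for i in 1 .. n:
    result += i * i

# A const initialiser is evaluated by the compiler's VM at compile time.
const precomputed = sumOfSquares(1_000)

# At runtime only the stored constant is printed.
echo precomputed
```

This only works when everything the procedure does is supported by the VM.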