I've been benchmarking array access and noticed that the C++ version seems to run 20x-30x faster than the Nim one.
## compiled with: nim -d:r c filename
when isMainModule:
  const N = 20_000_000
  var data {.noinit.}: array[N,int]
  # custom init
  for i in 0 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 50:
    for i in 3 ..< N-1:
      data[i] = (data[i+1]-data[i])+data[i-1]
  echo "result: ",data[N-2]
// compiled with: c++ -O3 -o filename filename.cpp
#include <iostream>

int main()
{
  const int N = 20000000;
  long long data[N];
  // custom init
  for (long long i=0; i<N; i++) {
    data[i] = i;
  }
  // busy work
  for (int r=0; r<50; r++) {
    for (long long i=3; i<N-1; i++)
    {
      data[i] = (data[i+1]-data[i])+data[i-1];
    }
  }
  std::cout << "result: " << data[N-2] << std::endl;
  return 0;
}
I checked the C++ with the Godbolt service to make sure the code wasn't being optimized away for some weird reason. The time I got for the modified code was about the same:
/tmp $>time ./temp
result: 19999998
real 0m0.847s
user 0m0.792s
sys 0m0.048s
/tmp $>time ./temp_cpp
result: 19999998
real 0m0.905s
user 0m0.851s
sys 0m0.051s
And the code I used:
## compiled with: nim -d:r c filename
## nim v 0.19.9
proc main =
  const N = 20_000_000
  var data = newSeqUninitialized[int](N)
  # custom init
  for i in 0 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 50:
    for i in 3 ..< N-1:
      data[i] = (data[i+1]-data[i])+data[i-1]
  echo "result: ",data[N-2]

main()
// compiled with: c++ -O3 -o filename filename.cpp
#include <iostream>

const int N = 20000000;

int main()
{
  size_t* data = new size_t[N];
  // custom init
  for (size_t i=0; i<N; i++) {
    data[i] = i;
  }
  // busy work
  for (size_t r=0; r<50; r++) {
    for (size_t i=3; i<N-1; i++)
    {
      data[i] = (data[i+1]+data[i-1]) / 2;
    }
  }
  std::cout << "result: " << data[N-2] << std::endl;
  return 0;
}
If you can't break your habit, you can always add a few lines near the top (before all the release-dependent switching) of your nim.cfg:
@if r: # allow short alias -d:r to activate fast release mode
  define:release
@end
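In case it helps, here is an untested config.nims sketch of the same alias; it assumes the command-line -d:r define is already visible as defined(r) when the config script runs:

# config.nims sketch (assumption: -d:r from the command line is testable here)
when defined(r):
  switch("define", "release")   # same effect as passing -d:release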
Perhaps somewhere you picked up a $HOME/.config/nim.cfg that does exactly this, and then lost it somehow moving between accounts/machines, or maybe while converting nim.cfg to .nims? There's surely also some similar NimScript/.nims variant, roughly like the sketch above.

I really like that macro, because it is a nice example for explaining the power of Nim macros to new users. It is not too complicated, and the use case is easy to understand.
But note that using the int32 data type is not that hard without it, if really desired: at most 3 locations would need a fix:
proc doit =
  var a = [3i32, 3, 6]
  for i in 0.int32 ..< 3:
    echo a[i]
    doAssert i is int32
    doAssert a[i] is int32
  for i in low(a).int32 .. high(a):
    echo a[i]
    doAssert i is int32
    doAssert a[i] is int32

doit()
@mratsim is probably intending to refer to the .. including 50 in Nim while the C for with a < excludes 50, doing about 2% less work, but the terseness and style of his comment may leave the wrong impression.
for i in 0..50: echo i
indeed compiles down (in release mode) to simply:
NI res = ((NI) 0);
while (1) {
  if (!(res <= ((NI) 50))) goto LA3;
  res += ((NI) 1);
}
LA3: ;
plus some stuff only related to my choice of echo for the body (which I removed for clarity). Any decent optimizing C compiler should treat those two ways to spell the loop (@mratsim's for and the above while) the same.
TL;DR the extra time is from the bounds of the iteration, not the language or iterator overhead.
@mratsim is probably intending to refer to the .. including 50 in Nim while the C for with a < excludes 50
I doubt that because he has written 1 .. 50 ;)
With such a brief comment it's hard to know, which is why I said "probably intending". Only one person knows. ;) Maybe he did think iterators cost more.
You are right, I did misread his 1..50 as 0..50 {after looking at the first version of ggibson's Nim code, not the 2nd, where he confusingly switched to 1..50, not paralleling the C as well but correcting his amount-of-work mismatch}.
@cblake True, true. I was simply enjoying that I could write "one through fifty, so fifty times" very simply and readably in Nim, whereas I just relied on trained C eyes to interpret the C code's intended meaning of "zero up until 50, meaning 50 times". Perhaps Nim's countup() would have been even more appropriate.
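For readers less used to Nim ranges, countup(1, 50) iterates exactly the same inclusive values as 1 .. 50, fifty in total; a tiny self-check:

# 1 .. 50 and countup(1, 50) yield the same 50 values (both ends inclusive)
var a, b: seq[int]
for i in 1 .. 50: a.add i
for i in countup(1, 50): b.add i
doAssert a == b
doAssert a.len == 50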
@Stefan_Salewski You're making fun of my specific example! :) Yes, adding i32 wasn't that cumbersome in THAT example, but I'm sure you can imagine scenarios where it would get more annoying; this was only an illustration of how the macro works. Also, you have to be sure that you didn't miss a spot, not to mention that you already have to annotate any relevant var types in the signature by hand, since the macro doesn't handle that part.
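To illustrate that last point, here is a small made-up example (not from the thread, the names are mine): once parameters enter the picture, every signature and index variable has to carry the int32 annotation by hand:

# hypothetical sketch: without the macro, each parameter and each index
# variable must be spelled int32 explicitly
proc scaleAll(xs: var seq[int32]; factor: int32) =
  for i in 0'i32 ..< xs.len.int32:
    xs[i] = xs[i] * factor

var v = newSeq[int32](5)
for i in 0'i32 ..< 5'i32:
  v[i] = i
scaleAll(v, 3'i32)
doAssert v[4] == 12'i32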
nim -d:r c foo.nim
...
Hint: operation successful (12405 lines compiled; 0.251 sec total; 16.414MiB peakmem; Debug Build) [SuccessX]
That's still a "Debug Build". You need -d:release.
# nim: et
## compiled with: nim -d:release c filename
## nim v 0.19.9
proc main() =
  const N = 20_000_000
  #var data {.noinit.}: array[N,int32]
  var data {.noinit.} = newSeq[int32](N)
  # custom init
  for i in 0'i32 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 49:
    for i in 3 ..< N-1:
      data[i] = (data[i-1]+data[i+1]) div 2
  echo "result: ",data[N-2]

when isMainModule:
  main()
$ time ./speed-nim
result: 19999998
real 0m1.576s
user 0m1.527s
sys 0m0.035s
$ time ./speed-cpp.exe
result: 19999998
real 0m1.591s
user 0m1.543s
sys 0m0.035s
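One way to avoid being fooled by a silently ignored -d:r again is to let the program report its own build mode; a minimal sketch using the standard defined() check:

# prints the build mode at startup so an accidental debug build can't skew timings
when defined(release):
  echo "release build"
else:
  echo "DEBUG build: runtime checks are on, timings will be misleading"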
Does anyone know whether {.noinit.} applies to newSeq()? Just curious.
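For what it's worth, my understanding (happy to be corrected) is that {.noinit.} only skips the implicit initialization of the variable itself, while newSeq still zero-fills the elements it allocates; newSeqUninitialized, as used in the earlier post, is the stdlib way to skip that fill for number types. A minimal sketch contrasting the two, under that assumption:

# assumption: newSeq zero-fills its elements, newSeqUninitialized does not
const N = 20_000_000
var zeroed = newSeq[int32](N)             # elements start at 0
var raw = newSeqUninitialized[int32](N)   # element contents are arbitrary
for i in 0 ..< N:
  raw[i] = int32(i)
echo zeroed[0], " ", raw[N-2]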