I've been benchmarking array access and noticed that the C++ version seems to run 20x-30x faster than the Nim one.
## compiled with: nim -d:r c filename
when isMainModule:
  const N = 20_000_000
  var data {.noinit.}: array[N,int]
  # custom init
  for i in 0 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 50:
    for i in 3 ..< N-1:
      data[i] = (data[i+1]-data[i])+data[i-1]
  echo "result: ",data[N-2]
// compiled with: c++ -O3 -o filename filename.cpp
#include <iostream>

int main()
{
  const int N = 20000000;
  long long data[N];
  // custom init
  for (long long i=0; i<N; i++) {
    data[i] = i;
  }
  // busy work
  for (int r=0; r<50; r++) {
    for (long long i=3; i<N-1; i++)
    {
      data[i] = (data[i+1]-data[i])+data[i-1];
    }
  }
  std::cout << "result: " << data[N-2] << std::endl;
  return 0;
}
I checked the C++ with the Godbolt service to make sure the code wasn't being optimized away for some weird reason. The time I got for the modified code was about the same:
/tmp $>time ./temp
result: 19999998
real 0m0.847s
user 0m0.792s
sys 0m0.048s
/tmp $>time ./temp_cpp
result: 19999998
real 0m0.905s
user 0m0.851s
sys 0m0.051s
And the code I used:
## compiled with: nim -d:r c filename
## nim v 0.19.9
proc main =
  const N = 20_000_000
  var data = newSeqUninitialized[int](N)
  # custom init
  for i in 0 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 50:
    for i in 3 ..< N-1:
      data[i] = (data[i+1]-data[i])+data[i-1]
  echo "result: ",data[N-2]

main()
// compiled with: c++ -O3 -o filename filename.cpp
#include <iostream>

const int N = 20000000;

int main()
{
  size_t* data = new size_t[N];
  // custom init
  for (size_t i=0; i<N; i++) {
    data[i] = i;
  }
  // busy work
  for (size_t r=0; r<50; r++) {
    for (size_t i=3; i<N-1; i++)
    {
      data[i] = (data[i+1]+data[i-1]) / 2;
    }
  }
  std::cout << "result: " << data[N-2] << std::endl;
  return 0;
}
If you can't break your habit, you can always add a few lines near the top (before all the release-dependent switching) of your nim.cfg:
@if r: # allow short alias -d:r to activate fast release mode
  define:release
@end
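In case it helps, here is an untested config.nims sketch of the same alias; it assumes the command-line -d:r define is already visible as defined(r) when the config script runs:

# config.nims sketch (assumption: -d:r from the command line is testable here)
when defined(r):
  switch("define", "release")   # same effect as passing -d:release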
Perhaps somewhere you picked up a $HOME/.config/nim.cfg that does exactly this, and then lost it somehow moving between accounts/machines, or maybe while converting nim.cfg to .nims? There's surely also some similar NimScript/.nims variant, roughly like the sketch above.

I really like that macro, because it is a nice example for explaining the power of Nim macros to new users. It is not too complicated, and the use case is easy to understand.
But note that using the int32 data type is not that hard without it, if really desired: at most 3 locations would need a fix:
proc doit =
  var a = [3i32, 3, 6]
  for i in 0.int32 ..< 3:
    echo a[i]
    doAssert i is int32
    doAssert a[i] is int32
  for i in low(a).int32 .. high(a):
    echo a[i]
    doAssert i is int32
    doAssert a[i] is int32

doit()
@mratsim is probably intending to refer to the .. including 50 in Nim while the C for with a < excludes 50, doing about 2% less work, but the terseness and style of his comment may leave the wrong impression.
for i in 0..50: echo i
indeed compiles down (in release mode) to simply:
NI res = ((NI) 0);
while (1) {
  if (!(res <= ((NI) 50))) goto LA3;
  res += ((NI) 1);
}
LA3: ;
plus some stuff only related to my choice of echo for the body (which I removed for clarity). Any decent optimizing C compiler should treat those two ways to spell the loop (@mratsim's for and the above while) the same.
TL;DR the extra time is from the bounds of the iteration, not the language or iterator overhead.
@mratsim is probably intending to refer to the .. including 50 in Nim while the C for with a < excludes 50
I doubt that because he has written 1 .. 50 ;)
With such a brief comment it's hard to know, which is why I said "probably intending". Only one person knows. ;) Maybe he did think iterators cost more.
You are right, I did misread his 1..50 as 0..50 {after looking at the first version of ggibson's Nim code, not the 2nd, where he confusingly switched to 1..50, not paralleling the C as well but correcting his amount-of-work mismatch}.
@cblake True, true. I was simply enjoying that I could write "one through fifty, so fifty times" very simply and readably in Nim, whereas I just relied on trained C eyes to interpret the C code's intended meaning of "zero up until 50, meaning 50 times". Perhaps Nim's countup() would have been even more appropriate.
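For readers less used to Nim ranges, countup(1, 50) iterates exactly the same inclusive values as 1 .. 50, fifty in total; a tiny self-check:

# 1 .. 50 and countup(1, 50) yield the same 50 values (both ends inclusive)
var a, b: seq[int]
for i in 1 .. 50: a.add i
for i in countup(1, 50): b.add i
doAssert a == b
doAssert a.len == 50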
@Stefan_Salewski You're making fun of my specific example! :) Yes, adding i32 wasn't that cumbersome in THAT example, but I'm sure you can imagine scenarios where it would get more annoying; this was only an illustration of how the macro works. Also, you have to be sure that you didn't miss a spot, not to mention that you already have to annotate any relevant var types in the signature by hand, since the macro doesn't handle that part.
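To illustrate that last point, here is a small made-up example (not from the thread, the names are mine): once parameters enter the picture, every signature and index variable has to carry the int32 annotation by hand:

# hypothetical sketch: without the macro, each parameter and each index
# variable must be spelled int32 explicitly
proc scaleAll(xs: var seq[int32]; factor: int32) =
  for i in 0'i32 ..< xs.len.int32:
    xs[i] = xs[i] * factor

var v = newSeq[int32](5)
for i in 0'i32 ..< 5'i32:
  v[i] = i
scaleAll(v, 3'i32)
doAssert v[4] == 12'i32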
nim -d:r c foo.nim
...
Hint: operation successful (12405 lines compiled; 0.251 sec total; 16.414MiB peakmem; Debug Build) [SuccessX]
That's still a "Debug Build". You need -d:release.
# nim: et
## compiled with: nim -d:release c filename
## nim v 0.19.9
proc main() =
  const N = 20_000_000
  #var data {.noinit.}: array[N,int32]
  var data {.noinit.} = newSeq[int32](N)
  # custom init
  for i in 0'i32 ..< N:
    data[i] = i
  # busy work
  for r in 1 .. 49:
    for i in 3 ..< N-1:
      data[i] = (data[i-1]+data[i+1]) div 2
  echo "result: ",data[N-2]

when isMainModule:
  main()
$ time ./speed-nim
result: 19999998
real 0m1.576s
user 0m1.527s
sys 0m0.035s
$ time ./speed-cpp.exe
result: 19999998
real 0m1.591s
user 0m1.543s
sys 0m0.035s
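One way to avoid being fooled by a silently ignored -d:r again is to let the program report its own build mode; a minimal sketch using the standard defined() check:

# prints the build mode at startup so an accidental debug build can't skew timings
when defined(release):
  echo "release build"
else:
  echo "DEBUG build: runtime checks are on, timings will be misleading"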
Does anyone know whether {.noinit.} applies to newSeq()? Just curious.
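For what it's worth, my understanding (happy to be corrected) is that {.noinit.} only skips the implicit initialization of the variable itself, while newSeq still zero-fills the elements it allocates; newSeqUninitialized, as used in the earlier post, is the stdlib way to skip that fill for number types. A minimal sketch contrasting the two, under that assumption:

# assumption: newSeq zero-fills its elements, newSeqUninitialized does not
const N = 20_000_000
var zeroed = newSeq[int32](N)             # elements start at 0
var raw = newSeqUninitialized[int32](N)   # element contents are arbitrary
for i in 0 ..< N:
  raw[i] = int32(i)
echo zeroed[0], " ", raw[N-2]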