I compiled and ran both on my Ubuntu machine with the number 14. Both seem to run at about the same speed: 18 seconds for 14. Did you compile Tak.nim with -d:release?
gokr@yoda:~$ time ./Tak
2615606677
real 0m18.207s
user 0m18.079s
sys 0m0.003s
gokr@yoda:~$ time java Tak
2615606677
real 0m18.412s
user 0m18.248s
sys 0m0.016s
gokr@yoda:~$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
gokr@yoda:~$ nim --version
Nim Compiler Version 0.10.3 (2015-04-01) [Linux: amd64]
Copyright (c) 2006-2015 by Andreas Rumpf
active boot switches: -d:release
Clang is a bit slow here, but GCC does better (all for 14):
$ nim -d:release --cc:clang c Tak && time ./Tak
2615606677
./Tak 52.48s user 0.11s system 99% cpu 52.603 total
$ nim -d:release --cc:gcc c Tak && time ./Tak
2615606677
./Tak 33.14s user 0.08s system 99% cpu 33.235 total
$ java -version && javac Tak.java && time java Tak
java version "1.7.0_71"
OpenJDK Runtime Environment (IcedTea 2.5.3) (Gentoo package icedtea-7.2.5.3)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
2615606677
java Tak 42.31s user 0.07s system 100% cpu 42.373 total
Varriount: Is there any way we can make using -d:release more obvious?
I'd be reluctant to advertise -d:release more. If anything, I think it should be used less, not more (and I'd be in favor of having a switch that keeps at the very least memory safety in place, even -- heck, especially -- for deployed code). I understand that obsessing over microbenchmarks that have little to no relevance for the needs of most actual applications is cool these days, but you are trading away reliability and the ability to analyze failures for performance, often unnecessarily [1].

If anything, this should be documented with a clear explanation of the tradeoffs you are making. Right now, in an attempt to win benchmark games, people often advise others to use -d:release without the warning that you not only get the performance of C/C++, you also gain the ability to shoot yourself in the foot (through buffer overflows etc.) just as in C/C++. If you want to advertise -d:release, it needs to be clear that this is not and cannot be a free lunch.
[1] Even where you do need the performance, you may be better off turning checks off selectively for performance hotspots rather than globally.
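For instance, the selective approach in [1] can be done directly with push/pop pragmas around the hot loop, leaving the default checks on everywhere else (a minimal sketch; sumHot and its body are made up for illustration):

```nim
proc sumHot(data: seq[int]): int =
  # Checks stay at their defaults outside this region.
  {.push boundChecks: off, overflowChecks: off.}
  for i in 0 ..< data.len:
    result += data[i]
  {.pop.}

echo sumHot(@[1, 2, 3])
```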
I'll add to the above that templates/macros for convenient local optimization/checks might be a good addition to the standard library. E.g. something like this:
template hotspot(body: expr): expr =
  when not defined(debugging):
    {.push obj_checks: off, field_checks: off,
           bound_checks: off, overflow_checks: off,
           assertions: off, stacktrace: off,
           linetrace: off, debugger: off.}
  else:
    {.push.}
  body
  {.pop.}

template memsafe(body: expr): expr =
  {.push obj_checks: on, field_checks: on,
         bound_checks: on.}
  body
  {.pop.}

proc main {.hotspot, memsafe.} =
  var a: array[10, int]
  for i in 1..1000000000:
    a[i mod 10] = 0

main()
In this particular example, it seems the best solution would be memoization implemented at the language level. (cf: http://www.reddit.com/r/nim/comments/2w1t5l/project_euler_in_nim/)
Is there any way to plug that in quickly with the current Nim version?
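Not at the language level that I know of, but a hand-rolled memoization can be sketched with a Table as the cache. This assumes the benchmark's Tak is the usual Takeuchi recursion (the actual Tak.nim may differ, and the names tak/cache here are just for illustration):

```nim
import tables

# Cache keyed on the full argument triple.
var cache = initTable[(int, int, int), int]()

proc tak(x, y, z: int): int =
  if y >= x:
    return z
  let key = (x, y, z)
  if cache.hasKey(key):
    return cache[key]
  result = tak(tak(x - 1, y, z),
               tak(y - 1, z, x),
               tak(z - 1, x, y))
  cache[key] = result

echo tak(24, 16, 8)
```

Note that memoization only helps if the benchmark is measuring the value of the function; if it is deliberately counting recursive calls, caching would change what is being measured.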