With this proc:
    proc rfib(n: int): int =
      if n < 2:
        return n
      rfib(n - 1) + rfib(n - 2)
the compiled binary takes more than 10 minutes to compute rfib(50). With similar code in Go it takes 1 minute, and with Zig, 39 seconds. If I use a non-recursive, loop-based version, Nim's times equal Zig's and beat Go's, all within 1 to 2 milliseconds. So I wonder why Nim's recursion is orders of magnitude slower than the others'. I'm evaluating languages for developing recursive descent parsers, so recursion speed seems really important.

Language | Time in s | compiled with |
---|---|---|
Nim | 0.456 | nim cpp -d:release fib.nim |
Nim | 3.845 | nim c -d:release fib.nim |
C++ | 3.843 | g++ -O3 -o fib fib.cpp |
C | 3.902 | gcc -O3 -o fib fib.c |
Having to produce simple RST tables instead of grid table or markdown table is really annoying btw
You don't have to do it anymore on Nim devel, but the forum hasn't been updated yet.
@dude_the_builder - as explained in the thread @mratsim links to, and as mentioned here, the relevant family of gcc optimizations is partial recursion inlining. You can confirm this with call counts (gcc -pg and gprof) and closed-form formulas for how many calls to expect in the various implementations. You may find it interesting that the number of calls is very closely related to the value of the result.
The basic trick, replacing f(n)=f(n-1)+f(n-2) with f(n)=f(n-2)+2*f(n-3)+f(n-4), can be done manually in any language (though matching what gcc is likely doing takes several more levels; pretty sure you get binomial/Pascal's Triangle coefficients). When I looked at the gcc-generated assembly, it was even adding those terms up cleverly.
Since the total number of calls in the usual recursion is exponential in n (for the non-inlined implementation), but the number of terms grows only by one with each substitution/decrement of n, this can make a huge difference in total work done. (Note that this optimization is distinct from "memoization" of small arguments at the bottom of the recursion, and that any doubly recursive approach is an awful way to calculate the series, but almost everyone knows this and "the whole point" of the benchmark is the inefficiency.)
Because auto-inlining is finicky/sensitive to internal compiler code complexity metrics, you should be careful to report your gcc -v and exact compilation options when discussing this (and maybe say which CPU you are using if -march=native is in play). nim c --verbosity:2 can show you exactly how nim c invokes gcc.
You may have better luck steering gcc with gcc --param=max-inline-recursive-depth-auto and friends than when I tried. As far as I know gcc is the only compiler doing this (not clang/Intel/Microsoft). It would be interesting to hear otherwise, especially if the others are easier to control. In principle, the right compiler flags may make the speed-up almost arbitrarily larger, like 1000x faster.
I doubt even 1000x would get the message out that calls should be counted, since there are a zillion examples going back to the dawn of programming to fight against. Auto-memoization at minor space cost is about as simple a transformation as inlining and many other modern C compiler optimizations, and it gives an even bigger O(exp(n)/n) speed-up. I know of no compilers auto-memoizing yet, but the risk has always been there.
I doubt we need worry about compilers "solving the recursion" to get the closed form formula for Fibonacci or similar, but that is also possible theoretically, at least for linear recurrences with constant coefficients. (Knowing how to evaluate the formulae accurately enough to be equivalent may be hard enough to prevent any automatic optimization...).
Anyway, the big lesson here is to not just assume the work you think should happen was actually done. This is a common benchmarking mistake, sometimes even made by experts. Measuring is better, and in this case many profiling frameworks from the 1980s & 1990s measure exactly what you need to know.
Though we all write mostly in jest, the often-overlooked qualitative text comes after 3 numerical details. So, if it were instead
Hint: UNOPTIMIZED DEBUG BUILD; 88104 lines; 0.857s; 96.652MiB peakmem;..
then we might not even need blinking red. ;-) If we could reduce this mistake by even 20% in the wild, it would be worth the experiment. Hard to measure, of course. This comes up at least several times per year in the forum alone, and an unknown number of times elsewhere.
My thought was that a bunch of numbers are more likely to make people "ignore the tail", attention-wise. I do think blinking/all caps red/inverse text would absolutely make them look. That might be too annoying by making everyone look all.the.time. I always set hint[SuccessX]=off anyway. So, it doesn't even print out for me.
Even having looked/read, they may still not understand. Maybe they all come from Python/Ruby/Perl/.. where compiler options are not a thing and these people are fundamentally beyond help, but I was trying to be more constructive than dismissive since it comes up so much.
It's often hard for seasoned people to recall what it was like to be uninitiated. In that light, @dude_the_builder could maybe help us by weighing in on what would have worked on him, just a couple days ago.
So, retracing my steps, I think there are a few confusion points for newcomers with experience in other relatively "new" languages. In my case, coming from Go, then Rust, then Crystal, I am used to the language having a primary tool that handles everything related to the build, run, test, benchmark process. In Go it's the go command itself, in Rust it's cargo, and in Crystal a combination of the crystal command and shards. With this mental baggage, I looked for the same in Nim's case, and found nimble. I set up a project with nimble init and used nimble build to compile it. The output says nothing about debug or release builds:
    Verifying dependencies for fib@0.1.0
    Building fib/fib using c backend
Running nimble --help doesn't mention any release flags either, but it does point to the documentation page, which does say that nimble build produces a debug build and that nimble install builds in release mode.
As for the nim command itself; as stated above, the fact that a debug build is the default is a little buried in that last Hint line. nim --help only mentions the use of -d:release as a note on the --opt flag:
--opt:none|speed|size optimize not at all or for speed|size
Note: use -d:release for a release build!
which in my opinion is easy to overlook and kind of lessens its importance; it's mentioned almost like an afterthought. What I would have expected was a --release flag similar to Rust and Crystal. Although a debug build as default is the norm in most languages I've used, the existence of a --release flag is common too.
Although I suspect this type of experience is greatly subjective, depending on mental baggage from previous languages as in my case, I hope it helps in some way to improve usability and reduce future confusion.
I greatly appreciate all the discussion in this thread. I've learned quite a bit and that's a really important part of a programming language! Believe me, I've been in quite a few language forums where "hostile environment" for newcomers is sadly the norm. I'm glad that's not the case here. :-)
This is done this way because -d defines symbols at compile time, allowing you to do things like this:
    proc debugEcho(x: string) =
      when defined(release): discard
      else: echo x
or this:
const apiToken {.strdefine.}: string = ""
# nim c -d:apiToken="123-456-7890" example.nim
Sure, you could have something like the isMainModule variable (isReleaseBuild?) to do this, but why make something new when -d does the job just fine?
I've added your experience to the developer stories here: https://github.com/nim-lang/RFCs/issues/300#issuecomment-769410401
Nim tooling will be a focus of 2021 (https://github.com/nim-lang/RFCs/milestone/5) though the hardest part might very well be where to start.