Hi guys, I've written up a longish blog post about my recent experiments with Nim performance tuning for numerical calculations. I guess this is pretty basic stuff for people who do C/C++/assembly programming on a daily basis, but I actually learned a lot while doing this exercise and encountered some really surprising behaviour. So I think the article might be useful for other people less experienced with low-level programming too.
http://blog.johnnovak.net/2017/04/22/nim-performance-tuning-for-the-uninitiated/
(I'm hoping it's OK to announce personal blog posts related to Nim in this forum.)
Cheers
John
@JohnNovak
I have only read some parts of it so far, but I learned things from your article. Thank you very much :)
Going to reread it again for other parts ;)
Since it is basically me who took over the nim_glm maintenance: the version you fixed is very old and is no longer the version I currently maintain. I should mention that I never ran performance benchmarks on that library, only correctness tests. I only applied my knowledge of what is necessary to potentially get good performance; I never actually ensured that the performance was optimal. But it's good to see that it is not horrible.
One thing that I haven't seen you talk about is alignment. Have you tried using the Vec4 type in C++ glm instead of the Vec3 type? To my knowledge, only a self-aligned vec4 or vec2 type can be fully optimized to use SIMD instructions. That could give C++ a performance advantage that the Nim version doesn't have.
So if we want to inline functions across module boundaries, we need to explicitly tell the Nim compiler about so it can “manually” inline them into the generated C files.
Doesn't the inline pragma determine whether the function is inlined or not?
The sentence is just above this section
Isn't the inline pragma
Sure. But I was more confused about the other statement:
Many compilers do not support link time optimisations, at least not by default, and even if they do, Nim doesn’t make use of such features yet
gcc and clang support LTO well, and Nim makes use of it of course. For gcc I have
$ cat nim.cfg
path:"$projectdir"
nimcache:"/tmp/$projectdir"
gcc.options.speed = "-march=native -O3 -flto -fstrict-aliasing"
and it works out of the box. For clang we need the gold linker to make LTO work.
The inline pragma copies the C function into all the involved C files, to ensure that inlining works across module boundaries even without LTO enabled for the C compiler.
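For illustration, a minimal sketch of what marking a proc with the inline pragma looks like. The Vec3 type and the dot proc are made up for this example, and everything sits in one file so it compiles on its own; in a real project the proc would live in its own module and be exported with `*`, which is the case where the pragma matters.

```nim
# Hypothetical Vec3 type and dot proc, not taken from nim_glm or the article.
type
  Vec3 = object
    x, y, z: float

# {.inline.} asks Nim to emit this proc into every generated C file
# that calls it, so the C compiler can inline it even without LTO.
proc dot(a, b: Vec3): float {.inline.} =
  a.x * b.x + a.y * b.y + a.z * b.z

let v = Vec3(x: 1.0, y: 2.0, z: 3.0)
echo dot(v, v)  # prints 14.0
```

Whether the C compiler actually inlines the call is still its own decision; the pragma only makes the definition visible in each C file.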
But I still have to read the post more carefully, maybe I misunderstand something...
[EDIT]
I just read it.
Most surprising for me is the good performance of Java and especially JavaScript. I really had not expected such good results for numerical problems.
Your post reminds us of one important point: many inexperienced people have problems understanding the difference between ref and value types and so will use them incorrectly. This is even a problem for people with some basic programming experience in languages like Python, Ruby or Java, where everything seems to be just an object. So maybe we need a good introduction to ref and value types for beginners. Maybe Dom's book will have one after the extended Manning editing period; I remember some notes about this topic in the first preprint, but I think it was not enough for real beginners. Unfortunately such a guide is not easy to write, and maybe some pictures will be necessary...
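As a rough sketch of the distinction (the type names PointVal and PointRef are invented for this example): assigning an object copies its data, while assigning a ref object only copies the reference, so both names end up pointing at the same heap object.

```nim
type
  PointVal = object      # value type: assignment copies the data
    x: int
  PointRef = ref object  # ref type: assignment copies only the reference
    x: int

var a = PointVal(x: 1)
var b = a        # b is an independent copy of a
b.x = 2
echo a.x         # prints 1, a is unaffected

let c = PointRef(x: 1)
let d = c        # d and c refer to the same heap object
d.x = 2
echo c.x         # prints 2, the change is visible through c
```

Someone coming from Python, Ruby or Java would probably expect the second behaviour in both cases.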
The other point to mention is the inline pragma. Beginners may just forget about it. Maybe very short procs should be inlined automatically.
Please fix this typo: "while in Java I was able able to"
Hi John, very valuable article to see what's going on behind the scenes.
It should be linked within the tutorials section.
@Stefan "Java is slow" is a myth. It's very fast because of the HotSpot VM: the most frequently executed bytecode is compiled for the target platform. That's the reason you need the "warmup phase". Beware that floating-point operations in Java are not exact; if you need exact decimal precision, BigDecimal should be used.
For beginners: do not focus too much on ref or value types; the compiler should do the trick. For absolute beginners I think Nim (and C) is better to look at than Java. But first learn SQL :-) (Joe Celko's books are very good). Grab a copy of "Thinking Forth" (one of the best books to start with) and do not think about objects; think about tracking the state.