nimforum mirror - General recommendation for optimum performance?

ElAfalw (orginal) [2020-08-18T10:52:33+02:00] view original

I'm experimenting a bit with Nim and wondering if there are some general guidelines for speed.

For example, I compile with

c -d:release -d:danger --gc:arc --opt:speed --checks:off --passC:"-O3 -flto -m64" --passL:"-flto" -o:example example.nim

Does this look good?

Any other things to pay attention to? Objects vs ref objects, sequences, tuples? Specific operations that I should prefer over something else?

Thanks in advance!

Stefan_Salewski (orginal) [2020-08-18T11:07:39+02:00] view original

I think in my book I told you that -d:danger already implies -d:release. Also --opt:speed in unnecessary with d-:danger or with -d:danger. -O3 is the default for danger and release. --checks:off should be included in -d:danger. --passL:"-flto" should be not necessary, yardanico uses that, but from my experience --passC:-flto is sufficieant. No idea what -m64 makes, but you can try --passC:-march=native

Of course you have to care for all the rest also when you want maximum performance.

Araq (orginal) [2020-08-18T11:15:11+02:00] view original

Your options are a pretty good start, for --gc:arc "nim devel" usually has more optimizations implemented. Also, use --panics:on.

ElAfalw (orginal) [2020-08-18T11:40:15+02:00] view original

Thanks a lot!

Also, @araq, I'll try with the devel branch to see if I notice any difference.

Any other suggestions? (I mean unrelated to compiler options)

cblake (orginal) [2020-08-18T11:59:18+02:00] view original

I don't know if you are using nim.cfg or config.nims or neither, but once you decide on one (or a few) bundles of options then you can package them up for easy reference. E.g., you could add to your $HOME/.config/nim.cfg:


@if r: # Allow short alias -d:r to make fast executables
  define:danger
  checks:off
  panics:on
  --passC:"-flto"
@end

That is still related to compiler options (sort of). Also, on "options while building" profile-guided optimization can often help. See https://forum.nim-lang.org/t/6295

Other speed related ideas..Well, most are not Nim-specific but would also apply to C++ or something like that, but efficient data structures, not repeating work, avoiding memory allocation-free cycles by re-using objects when that's not too hard, and so on all can matter. Some of these are sort of Nim-specific. E.g., if you have a seq[T] and can estimate its size ahead of time then pre-allocating that size and using [i] will be faster than growing up by add. Similarly, pre-sizing a Table can save the grow-from zero with many intermediate resizes costs and so on.

It's much easier to give specific guidelines. So, if you can anticipate or profile some small section of code as important to your overall performance and then post that here in the Forum then you can often get good suggestions for how to speed it up.

enthus1ast (orginal) [2020-08-18T13:30:41+02:00] view original

In c++ i sometimes also got heavy performance improvments when i avoid random memory access. So looping over an array is often times much faster than accessing by key, since the calls can be vectorized better and the cpu cache can work more efficent.

gcc/gpp is also able to log the missed optimization opportunities with -fopt-info-missed and friends.

cblake (orginal) [2020-08-18T13:58:30+02:00] view original

@enthus1ast's advice is good. To add some numbers, DRAM latency is typically 60..120 ns while a CPU clock cycle is merely 0.2..0.5ns giving 120..600x possible speed ups. Server class CPUs can even do 2..6 dynamic instructions/cycle amplifying the ratio up to 240..3600x. One can not always get such boosts, but it pays to be aware of basic costs.

Those ratios are so large that modern microprocessors take every available opportunity to mask that latency. This in turn makes it all too easy to fool oneself into thinking the ratio is not there by some naive tight, hot loop benchmark. (E.g. doing nothing else allowing the CPU to perfectly predict its near future work and so prefetch memory into L1 caches perfectly.)

But this again is just very general advice/awareness. It's probably multiple whole university level courses to go into all the details while specifics can constrain the scope to something more manageable.

treeform (orginal) [2020-08-18T22:13:45+02:00] view original

My recommendation is to always profile your code. It's just not possible to guess performance issues beyond a simple program.

VTune is pretty easy to get started with. Just compile with --debuger:native (so that it adds symbols) and open VTune and run your program.

It event allows you to walk through your code looking for hot spots:

juancarlospaco (orginal) [2020-08-19T00:12:44+02:00] view original

Other interesting flags can be -ffast-math -march=native -fsingle-precision-constant.

Yardanico (orginal) [2020-08-19T03:08:32+02:00] view original

I think the fast-math and single-precision-constant can make surprising results when dealing with numbers, so I'd just add march=native if you don't need to share the binary on other computers

hugogranstrom (orginal) [2020-08-19T12:16:23+02:00] view original

I downloaded VTune and ran it on one of my programs. Is it normal that nimFrame is listed as 4/5 hotspots?

Yardanico (orginal) [2020-08-19T13:01:44+02:00] view original

You should compile with -d:danger so there are no stack traces :)

hugogranstrom (orginal) [2020-08-19T13:27:31+02:00] view original

Ohhh... I thought I had to have them so I actived them :P

Now that looks a lot better! Thank you! :D

hugogranstrom (orginal) [2020-08-19T13:43:59+02:00] view original

That feeling when you find the one thing that makes your code 50x faster, wonderful! 🤩 In this case it was caching of hashes for Nodes in a AST.

enthus1ast (orginal) [2020-08-20T14:09:31+02:00] view original

There is also KCacheGrind which can display callgrind and cachegrind output.

Mirror of forum.nim-lang.org

6693 :: General recommendation for optimum performance?