Over the past few days I've been digging through the nimcache files a lot, trying to figure out what the final generated code looks like and which optimizations I could apply to my original code.
For example, I discovered things I had not known about and fixed them, mainly by spotting crucial points where copy-assignments were happening (e.g. via genericSeqAssign or copyString calls, etc.).
All in all, the optimizations I've made have made a significant difference.
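To make the copy-assignment point concrete, here is a minimal sketch. The proc names appendCopy and appendInPlace are hypothetical, and which runtime call ends up in the generated C (genericSeqAssign, a copy hook, or something else) depends on the Nim version and memory management mode, so treat the comments as an assumption to verify in your own nimcache output.

```nim
# Hypothetical illustration: value-semantics copy vs. in-place mutation.

proc appendCopy(items: seq[int], x: int): seq[int] =
  # Assigning a seq to another variable copies all of its elements.
  # This is the kind of assignment that can show up as a seq copy call
  # in the generated C code.
  result = items
  result.add x

proc appendInPlace(items: var seq[int], x: int) =
  # A var parameter is passed by reference, so no seq copy is needed here.
  items.add x
```

If the caller does not need the original seq afterwards, the var version avoids the copy entirely.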
Are there any other things to pay attention to?
genericReset and genericResetAux.
See issues:
Note that one of the main sources of genericReset calls, assignment to result when result is a ref type, has been fixed.
Don't forget to compile with -d:danger if perf is critical, especially if you have a lot of array/seq accesses, since it turns off runtime checks such as bounds checking.
There are some cases where Nim will insert 2 zeroMem calls. This can be prevented with {.noinit.} plus an initialization function that takes an in-place (var) argument.
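A minimal sketch of the {.noinit.} pattern follows. The type Buffer and the procs initInPlace and demo are hypothetical names made up for illustration. Whether the compiler would otherwise emit one or two zeroMem calls depends on the Nim version and the surrounding code, so check the generated C for your own case.

```nim
# Hypothetical illustration of {.noinit.} plus an in-place initializer.

type
  Buffer = object
    data: array[4096, byte]
    used: int

proc initInPlace(buf: var Buffer, fill: byte) =
  # Writes every field directly into the caller's storage,
  # so no zeroed temporary has to be created and then copied.
  for i in 0 ..< buf.data.len:
    buf.data[i] = fill
  buf.used = buf.data.len

proc demo(): int =
  # {.noinit.} skips the implicit zeroing at declaration.
  # The variable holds garbage until initInPlace has run.
  var buf {.noinit.}: Buffer
  buf.initInPlace(7)
  result = int(buf.data[0]) + int(buf.data[^1]) + buf.used
```

The initializer has to set every field, because nothing else will have cleared the memory.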
Don't fill your stack with a 7MB object: https://github.com/status-im/nim-beacon-chain/issues/370.
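One common way to keep a very large object off the stack is to allocate it on the heap through a ref. The sketch below is not taken from the linked issue; HugeState, newHugeState and sumFirst are hypothetical names, and the array size is only chosen to land in the same multi-megabyte range.

```nim
# Hypothetical illustration: keep a multi-megabyte object on the heap.

type
  HugeState = object
    slots: array[1_000_000, uint64]  # roughly 8 MB of data

proc newHugeState(): ref HugeState =
  # new allocates the object on the heap and zero-initializes it,
  # so only a pointer lives on the stack.
  new(result)

proc sumFirst(state: ref HugeState, n: int): uint64 =
  # Access goes through the ref, so the big array is never copied.
  for i in 0 ..< n:
    result += state.slots[i]
```

Declaring a local variable of type HugeState directly, or returning one by value, is what risks exhausting the stack.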
And some more:
The best way to find those is to use a profiler that can show the C / Nim code (via --debugger:native) or the assembly side by side with perf counters.
Apple Instruments, Intel VTune (use the free System Studio edition) or Linux perf all allow that.