Recently someone wrote:
When querying data with the DB ODBC module in debug and release mode, the query results are not the same.
I also observed a difference in behavior between debug and release mode recently. I hesitated to report it at all, because it occurs only with gcc 5.3 combined with O3, march=native and use of the inline pragma. Debug mode, or O2, or no inline, or clang all work fine. So it seems to be a gcc bug. And that bug is really strange: making a variable volatile or global fixed it, and so did a plain echo. I would be interested in the cause of course, but it is too hard for me to track it down. Still, posting a small warning here may be OK.
Generally I wonder if O3 is really the best default for Nim with gcc. Most people seem to prefer O2 -- O3 drastically increases code size, which may have a bad impact on cache usage. I think I will use O2 and march=native as my personal default.
Most people seem to prefer O2 -- O3 drastically increases code size, which may have a bad impact on cache usage.
Most benchmarks I've seen show O3 outperforming O2, so IMO it's the best option for release to default to. Binary size is often much smaller than the data sets the program processes. I mean, if binary size is a performance factor for your app, you should try Os instead (which is basically O2 sans some optimizations that increase binary size). Release builds should always default to what is most efficient most of the time (without breaking standards like Ofast does), and I don't think (what sounds like) a bug in GCC 5.3 is a good reason to change to a slower (on average) default.
Of course, it would be a good idea to have some kind of debug vs release test suite to catch these kinds of bugs (if that doesn't already exist).
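A minimal sketch of what one such check could look like on the C side, assuming a deterministic workload whose result is printed and then compared across builds. The function name `workload_checksum` and the `CHECKSUM_MAIN` macro are made up for illustration; nothing like this is known to exist in the Nim repository.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical deterministic workload: an FNV-1a style checksum over a
   generated sequence. Only unsigned arithmetic is used, so wraparound is
   well defined and every optimization level must produce the same value. */
uint32_t workload_checksum(uint32_t n) {
    uint32_t hash = 2166136261u;            /* FNV offset basis */
    for (uint32_t i = 0; i < n; i++) {
        uint32_t value = i * 2654435761u;   /* cheap deterministic scrambling */
        hash ^= value & 0xffu;              /* mix in the low byte */
        hash *= 16777619u;                  /* FNV prime */
    }
    return hash;
}

#ifdef CHECKSUM_MAIN
/* Build the same file twice and compare the printed values, e.g.
     gcc -O0 -DCHECKSUM_MAIN check.c -o check_debug
     gcc -O3 -march=native -DCHECKSUM_MAIN check.c -o check_release
   Any difference between the two outputs points at a miscompilation
   (or at undefined behavior in the code under test). */
int main(void) {
    printf("%lu\n", (unsigned long)workload_checksum(1000000u));
    return 0;
}
#endif
```

For Nim itself the same idea would presumably mean running each test once with a plain `nim c` build and once with `-d:release`, then comparing the output of the two runs.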
Interesting. I was under the wrong impression that the Nim compiler would apply the C static keyword to procs that are not exported. So that is not the case -- now I can understand better why the inline pragma has an effect even when gcc O3 is used (O3 should already inline functions on its own, according to the manual). And I think I can understand better now why LTO drastically decreases executable size for Nim executables, while for ordinary C software LTO often has no big effect and sometimes even increases executable size.
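To illustrate the static point in plain C, here is a small sketch. The function names are hypothetical and this is not the C code that Nim actually generates; it only shows the linkage rule in question.

```c
/* External linkage: even if gcc inlines every call inside this file, it
   still has to emit a standalone copy, because some other translation
   unit might call it. Without LTO the compiler cannot know that nobody does. */
int square_extern(int x) { return x * x; }

/* Internal linkage: once all calls in this file are inlined, gcc is free
   to drop the standalone copy entirely. */
static int square_static(int x) { return x * x; }

/* Calls both helpers so that the compiler has something to inline. */
int sum_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += square_extern(i) + square_static(i);
    }
    return total;
}
```

Compiling this with `gcc -O3 -c` and listing the symbols of the object file with `nm` should show `square_extern` and `sum_squares`, but typically no `square_static`. With many small non-static procs, those leftover copies add up, which would be consistent with LTO shrinking Nim executables so much: at link time the whole program is visible and unused copies can be removed.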
I just did one more test -- the gcc 5.3 manual lists which additional flags O3 enables, so I used O2 plus these additional flags:
gcc.options.speed = "-O2 -finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-loop-vectorize -ftree-loop-distribute-patterns -ftree-slp-vectorize -fvect-cost-model -ftree-partial-pre -fipa-cp-clone -march=native -fno-strict-aliasing"
But that works fine, while O3 does not. I will do no more tests; the problem may be specific to my hardware, and gcc 5.3 is still very fresh, so I think they will fix it.