nimforum mirror - Reason for -fno-strict-aliasing?

cdome [2017-08-24T17:41:30+02:00]

I have a question that is likely addressed to Araq.

I have spent a couple of days investigating why my Nim code is somewhat slower than a similar C implementation.

I have traced it to the fact that Nim invokes gcc with the "-fno-strict-aliasing" flag. In my case, it reduces the amount of vectorization/optimization that gcc performs. I recompiled the generated C code without this flag, and now my Nim code is on par with C.

My question is whether it is safe to remove this flag, and why it is there in the first place. If it is actually required, I am willing to help get rid of it, if that is feasible.

cdome [2017-08-24T17:51:36+02:00]

Actually, I have just found this thread: https://forum.nim-lang.org/t/2921

Araq, I think what you could do to make casts comply with the strict aliasing rules is:

(target_type *)(void *) instead of (target_type *)

It is not exactly what the standard says, but it works on all the C compilers I have worked with. Would you accept such a pull request, with the -fno-strict-aliasing removal included?

Araq [2017-08-24T21:04:25+02:00]

Well, the C standard solution nowadays is that the cast must go through a union type, and the code generator can already do that. PRs are welcome.

pwernersbach [2017-08-25T17:44:52+02:00]

I read a bit about the strict aliasing rule, and it seems that the only code that would break if "-fno-strict-aliasing" were removed is code that declares two pointers of different types to the same block of memory and uses both pointers to access that memory. I'd say that most programs, especially Nim programs, do not do this.

I see two potential problem areas:

The first potential problem area would be the garbage collector and memory manager. If the garbage collector and memory manager do this, then they must be converted to casting through unions, as Araq suggests.

The second potential problem area would be how the Nim compiler generates code for refs and inheritance. If a ref is only used as one type, there is no problem. If a ref is used as both a parent type and a child type, there is a potential problem, depending on how the Nim compiler generates the cast for the ref.

After these two potential problem areas are evaluated and fixed, "-fno-strict-aliasing" can be removed, as "safe" Nim code would not be able to violate the strict aliasing rule. Nim code that uses "cast" and ptrs in certain ways would still be able to violate the rule, but I would argue that in that case it is the programmer's responsibility to pass "-fno-strict-aliasing" to the compiler.

pwernersbach [2017-08-25T17:47:53+02:00]

Also, in my opinion, violating the strict aliasing rule is a code smell, and so I think we should avoid accommodating bad code with the "-fno-strict-aliasing" flag.

cblake [2017-08-25T20:29:49+02:00]

This may be obvious, but has anyone else tried changing -fno-strict-aliasing to -fstrict-aliasing -Wstrict-aliasing in their nim.cfg and checking whether any gcc warnings arise? When I try this on Linux with gcc-6.3, gcc-7.2 and devel nim, I don't see any warnings at all. You may also need to remove the general -w gcc warning suppression (possibly in the system-level nim.cfg as well as the one in $HOME). Double-check with nim c --verbosity:2 that the compiler is invoked the way you think it is. I tried a few different garbage collector implementations and a variety of small Nim programs. That is hardly exhaustive, and I'm not sure that this particular gcc warning has a 0% false-negative rate (does anyone know?). We may be so close to this that we are almost there already.

Tiberium [2017-08-25T20:36:47+02:00]

@cblake it doesn't show any problems for me

Tiberium [2017-08-25T20:40:46+02:00]

I've even changed -fno-strict-aliasing to "-fstrict-aliasing -Wstrict-aliasing" in build.sh (in csources), and nim compiler compiled without any issues!

cblake [2017-08-25T20:48:20+02:00]

One other gotcha in this "test" of how close we already are: nim seems to capture the stderr output of the gcc invocations. In that sense, gcc as invoked by nim behaves as if it were gcc -w anyway. If you capture the commands from --verbosity:2 and re-execute them from a /bin/sh script, you do see gcc warnings. I still don't see any warnings related to aliasing, though.

cblake [2017-08-25T21:08:51+02:00]

@Tiberium, in the csources/build.sh context you should only need to remove the -w and change the -fno-strict-aliasing in COMP_FLAGS, since it's just a shell script with a zillion gcc invocations. When compiling other Nim code, I only had to set gcc.options.always = "-Wall 2>>/tmp/gcc.log" (or something similar) in my nim.cfg to catch the warning output. I have yet to see any warnings related to aliasing.

Jehan [2017-08-25T22:29:41+02:00]

> Also, in my opinion, violating the strict aliasing rule is a code smell, and so I think we should avoid accommodating bad code with the "-fno-strict-aliasing" flag.

All the major OS kernels (Linux, FreeBSD, and OpenBSD) have strict aliasing disabled, and for good reasons. So do a plethora of other C applications and libraries.

One, it is insanely easy to violate the strict aliasing rule accidentally when writing normal C code by hand, and there is no reliable way to guard against such occurrences statically. It's an easy source of Heisenbugs, with basically no performance payoff outside of rare circumstances (where aliasing does have a performance impact, the pointers usually point to entities of the same type).

This also goes for hand-written C code that is included in Nim directly, such as static inline functions in the header files of external libraries.

Two, even for generated code (such as code emitted by the Nim compiler), it is difficult to avoid undefined behavior in low-level code unless the code generator was designed from the ground up for this. There are a number of reasons for that:

The gcc developers long held that casting through a union is not sufficient to work around the pointer aliasing rule, and in fact defended that claim fairly aggressively (see this thread, for example). I think they've finally backed off from that, because programmers expressed a need for that use case. Still, there are probably gcc versions out there for which this isn't safe, or where casting through a union merely silences warnings that you'd otherwise get. And I'm not actually 100% sure that we're safe now.

For each cast, the union through which to cast would have to be the transitive closure of all non-char types that the pointer may ever have pointed to, for example:

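proc main =
  var a: int64 = 1
  # two pointers of different types alias the same 8 bytes of `a`
  var p: ptr float = cast[ptr float](addr a)
  var q: ptr int32 = cast[ptr int32](addr a)
  p[] = 1.0      # store through a float (64-bit) pointer
  q[] = 2'i32    # store through an int32 pointer
  echo a         # read back through the original int64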

main()

This example would require a union of int64, int32, and float, which may be impossible to construct if the code is spread over several functions.

> I've even changed -fno-strict-aliasing to "-fstrict-aliasing -Wstrict-aliasing" in build.sh (in csources), and nim compiler compiled without any issues!

"Absence of evidence is not evidence of absence", as they say. The compiler may simply not have triggered the relevant optimizations in this case, or you may not have executed the relevant code. You also have no guarantee that future versions of gcc/clang won't introduce breakage.

cblake [2017-08-25T23:06:02+02:00]

I did a little searching and compiling, and I'd have to say Jehan is right here about "absence of evidence". See this Stack Overflow thread for an example that looks very simple to people but is (still, in 2017) too complex for the -Wstrict-aliasing heuristics of (at least) gcc. It sure seems like strict aliasing is a real morass.

@cdome - perhaps a better solution would be some kind of emit and/or macro machinery that turns on -fstrict-aliasing just for the procs where you need to recover performance. gcc has had a #pragma and an __attribute__ way to do that for quite a few years now (2011, I think). See here.

Mirror of forum.nim-lang.org

3121 :: Reason for -fno-strict-aliasing?