I've noticed that Nimrod generates the following pattern for temporaries created inside a proc call. For example, this code:
proc add(a: int, b: int): int {.noinit, inline.} =
  result = a + b
let x = add(1, add(2, 3))
generates:
LOC1 = 0;
LOC1 = add_121227(2, 3);
x = add_121227(1, LOC1);
While the initial zero-initialization of LOC1 will most likely be optimized away by the C compiler, my concern is with compound types or SIMD types, for which a memset is emitted instead. For example:
let af_f = mm256_add_ps(cf, mm256_add_ps(af, bf))
generates:
memset((void*)&LOC1, 0, sizeof(LOC1));
LOC1 = _mm256_add_ps(af, bf);
aff = _mm256_add_ps(cf, LOC1);
In this case, the memset is not optimized away. Is there some way to bypass this behavior?
Note that this does not happen when I introduce the temporary explicitly with a let myself. For example:
let tmp0 = mm256_add_ps(af, bf)
let af_f = mm256_add_ps(cf, tmp0)
... results in the best possible x86 code.
Thanks a lot!