Assume we have an array with index 0 to 3, we want to print elements in a never ending loop. Start index is i. Looping forward is of course
i = (i + 1) mod 4
#print element[i]
I have no idea to do it branchless in backward order currently, this is wrong of course:
i = (i - 1) mod 4
#print element[i]
Should be easy I guess, but no idea currently :-(
for i in 1..12:
echo -i and 3
To mitigate negative numbers you need to add the length of the array:
i = ((i - 1) + 4) mod 4
leaving you with:
i = (i + 3) mod 4
Thanks, works both perfect:
var i = 0 i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i echo "-----" i = 0 i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i
$ ./t
3
2
1
0
3
-----
3
2
1
0
3
2
Make sure that the modulo is known at compile-time so the compiler can optimize it using and (for power of 2) and shifts.
Modulo is much more expensive than a mispredicted branch.
works both perfect
@cblake's version works only if the length is power of two, my version should be more general
Out of curiosity, have you actually checked whether the branchless version is faster?
For me, using the following code is actually faster by a factor of almost three than any of the variants using bit masking or modulo:
j -= 1
if j < 0:
j = 3
This is most likely because branch predictors tend to be very good at such regular branching patterns. Results may vary if there's a lot of interleaved code, but as always: measure first, then optimize.
I'll also note that you have to be careful with assuming that x mod n will be optimized to an and operation where n is a power of 2. It generally will not happen, as the compiler can only rarely prove that x is non-negative. The reason is that in C99, integer modulo is defined to yield negative results for negative xx. You may get something like this for x mod 8, where x is a 32-bit integer.
movl %edi, %eax
sarl $31, %eax
shrl $29, %eax
addl %edi, %eax
andl $-8, %eax
subl %eax, %edi
movl %edi, %eax
A safer approach would be a template that special-cases the case where the modulo is a power of 2, e.g. via:
when (m and (m-1)) == 0: # m is a power of 2 if non-negative.
x and (m-1)
else:
# modulo code for arbitrary x and m
Alternatively, cast to uint before the mod if you know that the value is non-negative, and back to an int afterwards.
you have to be careful with assuming that x mod n will be optimized to an and operation where n is a power of 2
That is a very good hint indeed. In C code often unsigned ints are used, so its a plain "and" op, but in Nim we most often use signed ints.
I have not benchmarked these statements myself, and speed was not really a concern. It was more about simple code in this case, I typed
i = (i + 1) mod 4
for the forward loop, as it looks simpler for me than
j -= 1
if j < 0:
j = 3
and I assumed that the mod operation at least would not be slower, which seems to be wrong indeed. Indeed simple if ... else statements can be very fast due to cmov instruction, I tested it recently: