nimforum mirror - Loop backward through array -- howto do it brachless

Stefan_Salewski (orginal) [2018-10-05T22:08:32+02:00] view original

Assume we have an array with index 0 to 3, we want to print elements in a never ending loop. Start index is i. Looping forward is of course

i = (i + 1) mod 4
#print element[i]

I have no idea to do it branchless in backward order currently, this is wrong of course:

i = (i - 1) mod 4
#print element[i]

Should be easy I guess, but no idea currently :-(

cblake (orginal) [2018-10-05T22:18:22+02:00] view original

for i in 1..12:
  echo -i and 3

miran (orginal) [2018-10-06T08:13:28+02:00] view original

To mitigate negative numbers you need to add the length of the array:

i = ((i - 1) + 4) mod 4

leaving you with:

i = (i + 3) mod 4

Stefan_Salewski (orginal) [2018-10-06T09:10:22+02:00] view original

Thanks, works both perfect:

 var i = 0 i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i i = ((i - 1) + 4) mod 4 echo i echo "-----" i = 0 i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i i = (i - 1) and 3 echo i


$ ./t
3
2
1
0
3
-----
3
2
1
0
3
2


      
    
      
        
          

            
              mratsim
             

            
              (orginal)
             

            [2018-10-06T10:04:26+02:00] 

            
              view original
            
          

          Make sure that the modulo is known at compile-time so the compiler can optimize it using and (for power of 2) and shifts.

Modulo is much more expensive than a mispredicted branch.

        
      
      
    
      
        
          

            
              Stefan_Salewski
             

            
              (orginal)
             

            [2018-10-06T10:12:14+02:00] 

            
              view original
            
          

          mratsim, a very good and important hint!

        
      
      
    
      
        
          

            
              miran
             

            
              (orginal)
             

            [2018-10-06T10:43:47+02:00] 

            
              view original
            
          

          
 works both perfect
@cblake's version works only if the length is power of two, my version should be more general

        
      
      
    
      
        
          

            
              Jehan
             

            
              (orginal)
             

            [2018-10-09T18:10:50+02:00] 

            
              view original
            
          

          Out of curiosity, have you actually checked whether the branchless version is faster?

For me, using the following code is actually faster by a factor of almost three than any of the variants using bit masking or modulo:

j -= 1
if j < 0:
  j = 3

This is most likely because branch predictors tend to be very good at such regular branching patterns. Results may vary if there's a lot of interleaved code, but as always: measure first, then optimize.

I'll also note that you have to be careful with assuming that x mod n will be optimized to an and operation where n is a power of 2. It generally will not happen, as the compiler can only rarely prove that x is non-negative. The reason is that in C99, integer modulo is defined to yield negative results for negative xx. You may get something like this for x mod 8, where x is a 32-bit integer.


        movl    %edi, %eax
        sarl    $31, %eax
        shrl    $29, %eax
        addl    %edi, %eax
        andl    $-8, %eax
        subl    %eax, %edi
        movl    %edi, %eax

A safer approach would be a template that special-cases the case where the modulo is a power of 2, e.g. via:

when (m and (m-1)) == 0: # m is a power of 2 if non-negative.
  x and (m-1)
else:
  # modulo code for arbitrary x and m

Alternatively, cast to uint before the mod if you know that the value is non-negative, and back to an int afterwards.

        
      
      
    
      
        
          

            
              Stefan_Salewski
             

            
              (orginal)
             

            [2018-10-09T19:21:19+02:00] 

            
              view original
            
          

          
 you have to be careful with assuming that x mod n will be optimized to an and operation where n is a power of 2
That is a very good hint indeed. In C code often unsigned ints are used, so its a plain "and" op, but in Nim we most often use signed ints.

I have not benchmarked these statements myself, and speed was not really a concern. It was more about simple code in this case, I typed

i = (i + 1) mod 4

for the forward loop, as it looks simpler for me than

j -= 1
if j < 0:
  j = 3

and I assumed that the mod operation at least would not be slower, which seems to be wrong indeed. Indeed simple if ... else statements can be very fast due to cmov instruction, I tested it recently:

https://forum.nim-lang.org/t/4142#25901

Mirror of forum.nim-lang.org

4282 :: Loop backward through array -- howto do it brachless