nimforum mirror - Optimizing division to masking bits

andrea (orginal) [2015-05-18T10:04:50+02:00] view original

Say I write a simple type to do modular arithmetic like the following

type Modulo[q: static[int]] = distinct int

proc modulo(q: static[int], a: int): Modulo[q] = Modulo[q](a mod q)

proc `*`[q: static[int]](a, b: Modulo[q]): Modulo[q] = Modulo[q]((a.int * b.int) mod q)

proc `$`[q: static[int]](a: Modulo[q]): string = $(a.int) & " mod " & $(q)

If I later use it like this

let
  x = modulo(32, 5)
  y = modulo(32, 17)
echo x * y

I see from the csources that the Nim compiler is able to propagate the constant 32, so that for instance the multiplication is compiled down to

N_NIMCALL(NI, HEX2A_90097)(NI a, NI b) {
  NI result;
  result = 0;
  result = ((NI) ((NI)((NI)(a * b) % ((NI) 32))));
  return result;
}

Now, performing a division or a remainder operation modulo 32 (or any power of 2) does not actually require a full division, but can be done by a single shift operation (respectively, setting to 0 the highest bits).

It is my understanding that GCC or Clang will make such kinds of optimizations. Is there a way to check whether this actually happens?

My problem, in general, is the following. I have some C code that relies on such tricks to optimize some operations. I would like to be able to exploit the Nim type system (in particular the inference of static[int] types) to write something that is more general and more similar to the mathematical description of the algorithm, while still being sure that when I finally specialize the constants to powers of 2, the resulting compiled code will be as efficient as the manually optimized version.

def (orginal) [2015-05-18T10:31:00+02:00] view original


$ nim -d:release c x.nim
$ grep gcc nimcache/x.c
/* Compiled for: Linux, amd64, gcc */
   gcc -c  -w -O3 -fno-strict-aliasing  -I/media/nim/lib -o /home/def/nimcache/x.o /home/def/nimcache/x.c */
$ gcc -c -S -w -O3 -fno-strict-aliasing  -I/media/nim/lib -o /home/def/nimcache/x.s /home/def/nimcache/x.c

Then check the output of x.s, for me it contains:


HEX2A_90103:
.LFB61:
        .cfi_startproc
        imulq	%rsi, %rdi
        movq	%rdi, %rdx
        sarq	$63, %rdx
        shrq	$59, %rdx
        leaq	(%rdi,%rdx), %rax
        andl	$31, %eax
        subq	%rdx, %rax
        ret
        .cfi_endproc

So you see the andl $31, %eax for your mod 32.

andrea (orginal) [2015-05-18T10:36:59+02:00] view original

Great, this is exactly what I was missing!

andrea (orginal) [2015-05-18T11:28:59+02:00] view original

By the way, what's up with these other instructions? I see the need, of course, for a multiplication and an and with a mask $31.

Does anyone have a clue why arithmetic shifts or leaq are needed? I am a complete noob in assembly and I have a little trouble figuring out what is going on

def (orginal) [2015-05-18T12:30:37+02:00] view original

That's the handling for modulo with a negative number. With unsigned ints it looks like this instead:


HEX2A_90103:
.LFB61:
        .cfi_startproc
        imull	%esi, %edi
        movq	%rdi, %rax
        andl	$31, %eax
        ret
        .cfi_endproc

Edit: This is what I imagine to go on:


                        rsi rdi rdx rax
                         a   b
imulq %rsi, %rdi         a  a*b
movq  %rdi, %rdx         a  a*b a*b
sarq  $63,  %rdx         a  a*b  x      x = filled with 64 1s if negative, otherwise 0
shrq  $59,  %rdx         a  a*b  y      y = 1111 if a*b negative, otherwise 0
leaq  (%rdi,%rdx), %rax  a  a*b  y   z  z = a*b + y
andl  $31,  %eax         a  a*b  y   w  w = higher half: z, lower half: z && 31
subq  $rdx, %rax         a  a*b  y   v  v = w-y
result is v

And with actual values:
                         10  -5
imulq %rsi, %rdi         10 -50
movq  %rdi, %rdx         10 -50 -50
sarq  $63,  %rdx         10 -50 2⁶⁴-1
shrq  $59,  %rdx         10 -50  15
leaq  (%rdi,%rdx), %rax  10 -50  15  -35
andl  $31,  %eax         10 -50  15   -3
subq  $rdx, %rax         10 -50  15  -18
result is -18

andrea (orginal) [2015-05-18T14:47:17+02:00] view original

Great explanation, thank you very much for your patience!

Jehan (orginal) [2015-05-18T17:02:17+02:00] view original

Note: To force unsigned modulo arithmetic on signed integers without having to convert to uints, you can use the %% operator.

E.g.:

let a = -1
let b = a %% 32
echo b

joseph_wecker (orginal) [2015-05-19T01:06:43+02:00] view original

If you care about bit twiddling brushing up on ones' complement and how its commonly twiddled might help. For your purposes though- your instinct to trust the compiler to care about stuff like that so you can get to the more abstract and semantic code is spot on.

andrea (orginal) [2015-05-19T09:33:53+02:00] view original

Thank you for your other suggestions :-)

Mirror of forum.nim-lang.org

1239 :: Optimizing division to masking bits