type Modulo[q: static[int]] = distinct int
proc modulo(q: static[int], a: int): Modulo[q] = Modulo[q](a mod q)
proc `*`[q: static[int]](a, b: Modulo[q]): Modulo[q] = Modulo[q]((a.int * b.int) mod q)
proc `$`[q: static[int]](a: Modulo[q]): string = $(a.int) & " mod " & $(q)
If I later use it like this
let
x = modulo(32, 5)
y = modulo(32, 17)
echo x * y
I see from the csources that the Nim compiler is able to propagate the constant 32, so that for instance the multiplication is compiled down to
N_NIMCALL(NI, HEX2A_90097)(NI a, NI b) {
NI result;
result = 0;
result = ((NI) ((NI)((NI)(a * b) % ((NI) 32))));
return result;
}
Now, performing a division or a remainder operation modulo 32 (or any power of 2) does not actually require a full division, but can be done by a single shift operation (respectively, setting to 0 the highest bits).
It is my understanding that GCC or Clang will make such kinds of optimizations. Is there a way to check whether this actually happens?
My problem, in general, is the following. I have some C code that relies on such tricks to optimize some operations. I would like to be able to exploit the Nim type system (in particular the inference of static[int] types) to write something that is more general and more similar to the mathematical description of the algorithm, while still being sure that when I finally specialize the constants to powers of 2, the resulting compiled code will be as efficient as the manually optimized version.
$ nim -d:release c x.nim
$ grep gcc nimcache/x.c
/* Compiled for: Linux, amd64, gcc */
gcc -c -w -O3 -fno-strict-aliasing -I/media/nim/lib -o /home/def/nimcache/x.o /home/def/nimcache/x.c */
$ gcc -c -S -w -O3 -fno-strict-aliasing -I/media/nim/lib -o /home/def/nimcache/x.s /home/def/nimcache/x.c
Then check the output of x.s, for me it contains:
HEX2A_90103:
.LFB61:
.cfi_startproc
imulq %rsi, %rdi
movq %rdi, %rdx
sarq $63, %rdx
shrq $59, %rdx
leaq (%rdi,%rdx), %rax
andl $31, %eax
subq %rdx, %rax
ret
.cfi_endproc
So you see the andl $31, %eax for your mod 32.By the way, what's up with these other instructions? I see the need, of course, for a multiplication and an and with a mask $31.
Does anyone have a clue why arithmetic shifts or leaq are needed? I am a complete noob in assembly and I have a little trouble figuring out what is going on
That's the handling for modulo with a negative number. With unsigned ints it looks like this instead:
HEX2A_90103:
.LFB61:
.cfi_startproc
imull %esi, %edi
movq %rdi, %rax
andl $31, %eax
ret
.cfi_endproc
Edit: This is what I imagine to go on:
rsi rdi rdx rax
a b
imulq %rsi, %rdi a a*b
movq %rdi, %rdx a a*b a*b
sarq $63, %rdx a a*b x x = filled with 64 1s if negative, otherwise 0
shrq $59, %rdx a a*b y y = 1111 if a*b negative, otherwise 0
leaq (%rdi,%rdx), %rax a a*b y z z = a*b + y
andl $31, %eax a a*b y w w = higher half: z, lower half: z && 31
subq $rdx, %rax a a*b y v v = w-y
result is v
And with actual values:
10 -5
imulq %rsi, %rdi 10 -50
movq %rdi, %rdx 10 -50 -50
sarq $63, %rdx 10 -50 2⁶⁴-1
shrq $59, %rdx 10 -50 15
leaq (%rdi,%rdx), %rax 10 -50 15 -35
andl $31, %eax 10 -50 15 -3
subq $rdx, %rax 10 -50 15 -18
result is -18
Note: To force unsigned modulo arithmetic on signed integers without having to convert to uints, you can use the %% operator.
E.g.:
let a = -1
let b = a %% 32
echo b