From www.godbolt.org, compiling with -O3, we get:
#define eqabs(a, b) a*a == b*b
int t(int x) {
  return eqabs(x, 9) or eqabs(x, 5);
}
t(int):
imul edi, edi
cmp edi, 81
sete al
cmp edi, 25
sete dl
or eax, edx
movzx eax, al
ret
That seems to be the fastest possible code, with only one mul op. (I am still not sure whether that is really faster than abs().)
For Nim, I think inline procs do not help, so I would need a template. But how can I ensure that the integer literal parameter is evaluated at compile time and that "common subexpression elimination" kicks in, so that I also get the best possible code? Would untyped template parameters suffice?
I have not looked at Nim's assembly output yet, as finding the template's instructions in the assembly listing takes some work.
template eqabs(a, b): bool = a * a == b * b
proc t(x: int32): bool = eqabs(x, 5) or eqabs(x, 9)
echo t(10)
Compiled with nim -d:release c x. The assembly from gcc -S -fverbose-asm shows:
# /home/d067158/nimcache/x.c:50: T1_ = ((NI32)(x * x) == ((NI32) 25));
imull %edi, %edi # x, _1
movl $1, %eax #, <retval>
# /home/d067158/nimcache/x.c:51: if (T1_) goto LA2_;
cmpl $25, %edi #, _1
je .L3 #,
# /home/d067158/nimcache/x.c:52: T1_ = ((NI32)(x * x) == ((NI32) 81));
cmpl $81, %edi #, _1
sete %al #, <retval>
.L3:
# /home/d067158/nimcache/x.c:56: }
ret
Which looks fine. So if you want to depend on compiler optimizations, check the assembly output. If you don't want that, do the optimization manually.
But why multiply the number (which can also overflow) instead of doing this?
let s = x shr 31
(x xor s) - s
Reference for tricks like this: http://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
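For comparison, the same sign-mask trick can be sketched in C. This is only an illustration under assumptions: the name abs_branchless is made up here, the input is taken to be 32 bits wide, and >> on a negative signed value is assumed to be an arithmetic shift (true for GCC and Clang, but implementation-defined in the C standard).

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): branchless absolute value
   for a 32-bit integer, following the sign-mask idea above. */
int32_t abs_branchless(int32_t x) {
    /* Assumption: >> on a negative int32_t is an arithmetic shift,
       so s is -1 (all bits set) for negative x and 0 otherwise. */
    int32_t s = x >> 31;
    /* Do the xor/subtract in unsigned arithmetic so that INT32_MIN
       wraps around instead of hitting signed-overflow undefined behavior. */
    uint32_t m = (uint32_t)s;
    return (int32_t)(((uint32_t)x ^ m) - m);
}
```

For x = -5 the mask is all ones, so the xor gives 4 and subtracting the mask adds 1, yielding 5. Non-negative inputs pass through unchanged because the mask is 0.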
I assume your template uses untyped parameters, as in
template eqabs(a, b: untyped): bool = a * a == b * b
(Indeed I should have used 64 bit int for the C code.)
It is great that we get the same optimized assembly as in C (for a template, but not for an inline proc).
But indeed comparing the squares seems to give no real advantage -- Nim's and C's abs() are already fully optimized.
#include <stdint.h>
#include <stdlib.h>
#define eqabs(a, b) a*a == b*b
#define eq(a, b) llabs(a) == b
int8_t t1(int64_t x) {
return eqabs(x, 9) or eqabs(x, 5);
}
int8_t t2(int64_t x) {
return eq(x, 9) or eq(x, 5);
}
int64_t a1(int64_t x) {
return (x < 0 ? -x : x);
}
int64_t a2(int64_t x) {
return llabs(x);
}
t1(long):
imul rdi, rdi
cmp rdi, 81
sete al
cmp rdi, 25
sete dl
or eax, edx
ret
t2(long):
mov rax, rdi
sar rax, 63
xor rdi, rax
sub rdi, rax
sub rdi, 5
test rdi, -5
sete al
ret
a1(long):
mov rdx, rdi
mov rax, rdi
sar rdx, 63
xor rax, rdx
sub rax, rdx
ret
a2(long):
mov rdx, rdi
mov rax, rdi
sar rdx, 63
xor rax, rdx
sub rax, rdx
ret
Function t1 has one instruction fewer, but that does not mean that it is faster than t2.
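Speed aside, the squared comparison and the abs() comparison are not equivalent over the whole input range, which is the overflow concern raised earlier. A minimal sketch in C, with hypothetical function names, and with the multiplication done on uint64_t to model wraparound (signed overflow itself is undefined behavior in C):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compare squares, modelling 64-bit wraparound
   with unsigned arithmetic (signed overflow would be undefined in C). */
bool eq_by_square(int64_t x, int64_t b) {
    uint64_t ux = (uint64_t)x, ub = (uint64_t)b;
    return ux * ux == ub * ub;
}

/* Hypothetical helper: compare magnitudes without multiplying.
   Assumes b >= 0; testing both signs avoids negating INT64_MIN. */
bool eq_by_abs(int64_t x, int64_t b) {
    return x == b || x == -b;
}
```

For small inputs both agree, e.g. both return true for x = -9, b = 9. For x = INT64_MIN + 9, the wrapped square is (2^63 + 9)^2 mod 2^64 = 81, so eq_by_square(x, 9) is true although the magnitude of x is not 9, while eq_by_abs(x, 9) is false.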
There is also the asm statement, which lets you define how a function behaves at the assembly level.
It is useful for people working with very constrained resources.
Only with the asm statement can you control exactly what the assembly will look like; without it, the code is translated to C and then optimized by the C compiler.
There is also the emit pragma, but as mentioned there, use it with caution.