nimforum mirror - Disadvantages of static proc parameters?

Stefan_Salewski [2019-05-27T19:11:27+02:00]

I still wonder what the disadvantages of static proc parameters are.

Is it an increase in compile time, or are there other issues?

I ask because I recently suggested the following in

https://github.com/nim-lang/Nim/issues/10910#issuecomment-490977890

import math

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  # y is known at compile time, so only the matching branch is compiled in
  when y == 0:
    result = T(1)
  elif y == 1:
    result = x
  elif y == 2:
    result = x * x
  elif y == 3:
    result = x * x * x
  elif y == 4:
    result = x * x
    result *= result             # x^4 = (x^2)^2
  elif y == 5:
    result = x * x
    result *= (result * x)       # x^5 = x^2 * x^3
  elif y == 6:
    result = x * x
    result *= (result * result)  # x^6 = x^2 * x^4
  else:
    result = math.`^`(x, y)      # larger exponents: fall back to the runtime proc

# or

proc `^`*[T](x: T, y: static[Natural]): T {.inline.} =
  when y < 10:
    result = T(1)
    var i = y
    while i > 0:
      result *= x
      dec(i)
  else:
    result = math.`^`(x, y)

to make small integer powers more Nim-like (as fast as writing out the multiplications).

Miran's choice was instead to add this code for small powers:


case y
  of 0: result = 1
  of 1: result = x
  of 2: result = x * x
  of 3: result = x * x * x
  # ... followed by an else: branch with the general loop for larger y (not shown)

in https://github.com/nim-lang/Nim/blob/devel/lib/pure/math.nim#L966

This seems to be an improvement, but I cannot imagine that it leads to optimal code. (Maybe the case statement is compiled into a cmov instruction that avoids a slow branch, but the ^ proc is still not inlined unless we compile with -flto.)

Araq [2019-05-27T21:24:49+02:00]

Well, if you can't "imagine" it, look at the produced assembler.

Stefan_Salewski [2019-05-27T21:33:21+02:00]

Well, I have tested the running time, without -flto but with -d:release, of course.

Miran's fix was an improvement for ^ 2 (squaring) but still slower than a plain "*". Both of my suggested versions were as fast as "*", and the assembler was perfect.

So I assumed there must be some drawback to my solution.

OK, I will look at the assembler.

Stefan_Salewski [2019-05-27T22:28:24+02:00]

OK, here it is:

First, my well-known Nim test code:

# nim c -d:release k.nim
import random, math

proc main =
  var s, j: int
  for i in 0 .. 10000000:
    j = rand(7)
    #s += j * j # j ^ 2
    s += j ^ 2
  echo s

main()

Compiled with nim c -d:release k.nim using GCC 9.1, the main loop (from k.c) and the ^ proc (from stdlib_math.c) look like this:


        .file	"k.c"
.L33:
        movl	$7, %edi
        call	rand_v7jZDEs4VOsrcpvk0yo8Rg
        movq	%rax, %rdi
        movl	$2, %esi
        call	roof__e6fgxN584SyDK8XF8s1uig
        addq	%rax, %rbp
        decq	%rbx
        jne	.L33


        .file	"stdlib_math.c"
        .text
        .p2align 4
        .globl	roof__e6fgxN584SyDK8XF8s1uig
        .hidden	roof__e6fgxN584SyDK8XF8s1uig
        .type	roof__e6fgxN584SyDK8XF8s1uig, @function
roof__e6fgxN584SyDK8XF8s1uig:
.LFB3:
        .cfi_startproc
        movq	%rdi, %rax
        cmpq	$2, %rsi
        je	.L2
        jg	.L3
        testq	%rsi, %rsi
        je	.L9
        cmpq	$1, %rsi
        jne	.L5
        ret
        .p2align 4,,10
        .p2align 3
.L3:
        cmpq	$3, %rsi
        jne	.L5
        movq	%rdi, %rdx
        imulq	%rdi, %rdx
        imulq	%rdx, %rax
        ret
        .p2align 4,,10
        .p2align 3
.L9:
        movl	$1, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L5:
        movq	%rax, %rdx
        movl	$1, %eax
        jmp	.L7
        .p2align 4,,10
        .p2align 3
.L22:
        imulq	%rdx, %rdx
.L7:
        testb	$1, %sil
        je	.L8
        imulq	%rdx, %rax
.L8:
        shrq	%rsi
        jne	.L22
        ret
        .p2align 4,,10
        .p2align 3
.L2:
        imulq	%rdi, %rax
        ret
        .cfi_endproc
.LFE3:
        .size	roof__e6fgxN584SyDK8XF8s1uig, .-roof__e6fgxN584SyDK8XF8s1uig
        .ident	"GCC: (Gentoo 9.1.0 p1.0) 9.1.0"
        .section	.note.GNU-stack,"",@progbits

Not surprising: as the roof proc (the C name Nim generates for ^) is not inlined, the full proc is called for every j ^ 2.

And to verify, here is the timing with your roof proc:
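The fallback path in the listing above (.L5, .L7 and .L22) is exponentiation by squaring: test the low bit of the exponent, multiply the running square into the result if the bit is set, shift the exponent right, and square. A minimal Nim sketch of that same loop follows; powBySquaring is a hypothetical name used only for illustration, not a stdlib proc.

```nim
# Hypothetical `powBySquaring`; mirrors the .L5/.L7/.L22 loop of roof above.
proc powBySquaring[T](x: T; y: Natural): T =
  var base = x          # running square: x, x^2, x^4, ...
  var e = int(y)        # remaining exponent bits
  result = T(1)
  while true:
    if (e and 1) != 0:  # low bit set: multiply this square into the result
      result *= base
    e = e shr 1         # drop the low bit
    if e == 0:
      break
    base *= base        # square for the next bit

when isMainModule:
  echo powBySquaring(3, 5)  # 243
```

Each iteration handles one bit of the exponent, so the loop runs about log2(y) times; the special cases for y = 0..3 in math's ^ exist to skip this loop for the most common small exponents.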


$ time ./k
175032899

real	0m0.172s

Plain j * j gives:


$ time ./k
175032899

real	0m0.164s

It is not a big difference, of course, but note that both timings include the rand() and loop overhead. For Python we would be happy with the roof proc, but this is Nim. And as I wrote above, my static suggestion is as fast as j * j. Of course we can use -flto, and maybe we should make that the default; then everything is inlined and your roof proc works perfectly.
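The two timings above come from separate builds and both include the rand() and loop overhead. As a sketch under assumptions, both variants can also be timed in a single program with std/monotimes; the proc names viaMul, viaPow and timed and the fixed seed 42 are illustrative choices, and the absolute numbers will depend on the machine, the C compiler and whether -flto is used.

```nim
# Build with: nim c -d:release bench.nim
import std/[random, math, monotimes, times]

const N = 10_000_000

proc viaMul(): int =
  var r = initRand(42)            # fixed seed: both procs see the same j values
  for _ in 0 .. N:
    let j = r.rand(7)
    result += j * j

proc viaPow(): int =
  var r = initRand(42)
  for _ in 0 .. N:
    let j = r.rand(7)
    result += j ^ 2               # math.`^`: a real call unless it gets inlined

proc timed(label: string; f: proc (): int {.nimcall.}): int =
  let t0 = getMonoTime()
  result = f()
  echo label, ": ", (getMonoTime() - t0).inMilliseconds, " ms"

when isMainModule:
  let a = timed("j * j", viaMul)
  let b = timed("j ^ 2", viaPow)
  doAssert a == b                 # same inputs, so the sums must agree
```

Running both in one process removes build-to-build noise, but the difference is still small relative to the rand() cost, as noted above.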

mratsim [2019-05-28T10:41:16+02:00]

The main disadvantage is that you get one proc per static instantiation value, which usually increases code size (and makes less efficient use of the instruction cache).

It doesn't really matter in this case, as the proc is inline and compiles to a single machine instruction.
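To illustrate the code-size point: a static parameter makes every distinct constant produce its own generic instantiation, just as distinct type arguments do. In this sketch (powStatic is a hypothetical name, not a library proc), the calls with n = 2 and n = 3 each instantiate the proc separately, while a later call with the same T and n reuses an existing instantiation.

```nim
# Hypothetical `powStatic`: each distinct (T, n) pair yields its own
# instantiation, i.e. its own generated C function.
proc powStatic[T](x: T; n: static[Natural]): T =
  result = T(1)
  for _ in 1 .. n:   # n is a compile-time constant, so the C compiler can unroll this
    result *= x

let a = powStatic(3, 2)   # instantiation for T = int, n = 2
let b = powStatic(3, 3)   # a second instantiation for T = int, n = 3
let c = powStatic(5, 2)   # same T and n as `a`: reuses the first instantiation
echo a, " ", b, " ", c    # 9 27 25
```

For a tiny inline body like this the duplication is harmless; it becomes a cost when a large proc is instantiated for many different constants.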

arnetheduck [2019-05-28T18:23:39+02:00]

Compilers nowadays often do call specialization for constant arguments: they create alternate functions for specific constant values based on metrics collected during compilation, instead of doing it blindly. This means that static ends up being a noisy premature optimization if used for this reason.

The story is similar to inline in C: effectively obsoleted by better compilers.

This works even better if you also enable LTO, which incidentally also removes the need for the Nim {.inline.} noise in your code.
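Enabling LTO does not require changing the code. One way, assuming GCC or Clang as the C backend, is to pass -flto to both the C compiler and the linker, either on the command line (nim c -d:release --passC:-flto --passL:-flto k.nim) or in a config.nims file next to the source, as in this sketch:

```nim
# config.nims (NimScript), placed next to k.nim.
# -flto lets GCC/Clang inline procs such as math's `^` (roof) across the
# separately generated C files (k.c, stdlib_math.c, ...).
switch("passC", "-flto")
switch("passL", "-flto")
```

With this file in place, a plain nim c -d:release k.nim should pick up both flags.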

mratsim [2019-05-29T11:50:43+02:00]

In this case, ^ is a library function and lives in a different module from where it's used. So you need either static or {.inline.} so that the Nim VM or the C compiler, respectively, can do constant folding.

Also, while C compilers certainly do constant propagation of add, mul, or/and/xor and shifts, I'm not sure they do it for the power function.
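To make the two options concrete: with {.inline.} the C compiler sees the proc body at every call site and can specialize it for a constant exponent, whereas a const expression is evaluated by the Nim VM itself at compile time. A sketch of both follows; ipow is a hypothetical name and not part of std/math.

```nim
import std/math

# Hypothetical `ipow`: the exponent is an ordinary runtime parameter, but the
# proc is {.inline.}, so with a literal exponent the C compiler can fold the
# `case` down to the matching branch.
proc ipow[T](x: T; y: Natural): T {.inline.} =
  case y
  of 0: result = T(1)
  of 1: result = x
  of 2: result = x * x
  of 3: result = x * x * x
  else: result = x ^ y           # larger exponents: fall back to math.`^`

# The Nim VM, in contrast, evaluates the general math.`^` for constants:
const c = 3 ^ 5                  # computed at compile time
static: doAssert c == 243

when isMainModule:
  echo ipow(7, 2)                # 49
```

Whether the C compiler actually folds the case depends on the optimization level; the const path does not depend on the C compiler at all.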

Mirror of forum.nim-lang.org

4878 :: Disadvantages of static proc parameters?