nimforum mirror - if-else VS case-else VS case

drkameleon (orginal) [2022-08-04T17:25:14+02:00] view original

I've been experimenting a bit with the performance of the following constructs... and I'm a bit baffled by the results (especially the last 2). And they are totally consistent, regardless of how many times I run my benchmarks.

Can somebody explain to me what is going on?

type
    MyType* = enum
        A,
        B,
        C,
        D,
        E

var loops = 100000000

benchmark "if-else":
    var item = A
    for i in 1..loops:
        if item == A:
            item = B
        elif item == B:
            item = C
        elif item == C:
            item = D
        elif item == D:
            item = E
        else:
            item = A

# this is the fastest version, clearly
benchmark "case-else":
    var item = A
    for i in 1..loops:
        case item:
            of A:
                item = B
            of B:
                item = C
            of C:
                item = D
            of D:
                item = E
            else:
                item = A

# this is quite ok, without -d:release. WITH it, it's definitely the slowest.
benchmark "case-all":
    var item = A
    for i in 1..loops:
        case item:
            of A:
                item = B
            of B:
                item = C
            of C:
                item = D
            of D:
                item = E
            of E:
                item = A

Prestige (orginal) [2022-08-04T17:40:00+02:00] view original

I'm not sure, but I'll provide my runtimes of these benchmarks:

nim r -d:release foo.nim:


if-else: 0.086073489
case-else: 0.10265143
case-all: 0.14253516

nim r foo.nim:


if-else: 0.346636164
case-else: 0.249107282
case-all: 0.313689373

drkameleon (orginal) [2022-08-04T17:56:57+02:00] view original

My results are similar, with -d:danger included (which I hadn't tried before). Basically, the case-else version seems like the best of the bunch.

Now, here comes an even weirder realization...

If I add this to the mix a .computedGoTo. benchmark, the results baffle me even more:

benchmark "case-all+computedGoTo+while":
    var item = A
    var i = 1
    while true:
        {.computedGoto.}
        case item:
            of A:
                item = B
            of B:
                item = C
            of C:
                item = D
            of D:
                item = E
            of E:
                item = A
        i += 1
        if i > loops:
            break

This is clearly the fastest generally. BUT: when compiled with -d:danger it takes roughly double the time each and every one of the rest takes.

DMisener (orginal) [2022-08-04T17:57:10+02:00] view original

Using bency benchmarking tool -- I get:


nim git:(master) ✗ nim r -d:release test_case.nim
name ............................... min time      avg time    std dv   runs
if-else ........................... 20.048 ms     20.724 ms    ±0.370   x241
case-else ......................... 20.078 ms     21.153 ms    ±0.536   x236
case-all .......................... 20.153 ms     21.113 ms    ±0.376   x236

All pretty close.

drkameleon (orginal) [2022-08-04T18:06:33+02:00] view original

Hmm... Interesting. Then I guess I'm miscalculating the timings?

r3c (orginal) [2022-08-04T20:01:51+02:00] view original

One might see a difference when using C compiler that generates jump tables, it should be several times faster than if-else

treeform (orginal) [2022-08-04T21:18:30+02:00] view original

As writer of benchy I trust that benchmark the most. My gut feeling is that there is no real difference and benchy shows that. You are benching your CPU's microcode jump prediction.

The compiler can generate different code for different cases which might results in different performance. I feel what you are seeing is "compiler noise" where slightly different code produces different results even though it should have the same outcome.

The main problem with this benchmark is that it too small and too contrived. You are benchmarking the jmp instruction and how well a compiler can place it. How well can the microcode predict the jumps. Its not a place which usually slow.

I recommend benching real problems that can't be faked. As soon as you get even a single cached miss is x100 slower then then the jumps everything will just block on that. Using if or case just usually don't matter.

I have seen {.linearScanEnd.} help with jump tables.

mratsim (orginal) [2022-08-14T11:14:32+02:00] view original

Are you writing an interpreter?

Your benchmark might be a bit too simple and compilers might optimize some stuff away.

Here are mine like 4 years ago, at the bottom: https://github.com/status-im/nimbus-eth1/wiki/Interpreter-optimization-resources

import random, sequtils, times

type
  Op = enum
    Halt # = 0x0000
    Inc  # = 0x0100
    Dec  # = 0x0110
    Mul2 # = 0x0230
    Div2 # = 0x0240
    Add7 # = 0x0307
    Neg  # = 0x0400

func interp_switch(code: seq[Op], initVal: int): int =
  
  var
    pc = 0
  result = initVal
  
  while true:
    case code[pc]:
    of Halt:
      return
    of Inc:
      inc pc
      inc result
    of Dec:
      inc pc
      dec result
    of Mul2:
      inc pc
      result *= 2
    of Div2:
      inc pc
      result = result div 2
    of Add7:
      inc pc
      inc result, 7
    of Neg:
      inc pc
      result = -result

#################################################################################################################

func interp_cgoto(code: seq[Op], initVal: int): int =
  # Requires a dense enum
  var
    pc = 0
  result = initVal
  
  while true:
    {.computedGoto.}
    let instr = code[pc]
    case instr:
    of Halt:
      return
    of Inc:
      inc pc
      inc result
    of Dec:
      inc pc
      dec result
    of Mul2:
      inc pc
      result *= 2
    of Div2:
      inc pc
      result = result div 2
    of Add7:
      inc pc
      inc result, 7
    of Neg:
      inc pc
      result = -result

#################################################################################################################

func halt(result: var int, stop: var bool) {.inline, nimcall.}=
  stop = true

func inc(result: var int, stop: var bool) {.inline, nimcall.}=
  inc result

func dec(result: var int, stop: var bool) {.inline, nimcall.}=
  dec result

func mul2(result: var int, stop: var bool) {.inline, nimcall.}=
  result *= 2

func div2(result: var int, stop: var bool) {.inline, nimcall.}=
  result = result div 2

func add7(result: var int, stop: var bool) {.inline, nimcall.}=
  inc result, 7

func neg(result: var int, stop: var bool) {.inline, nimcall.}=
  result = -result

# Requires dense enum
type InstrF = proc (result: var int, stop: var bool){.inline, nimcall, noSideEffect, gcsafe, locks: 0.}

type FuncTable = array[Op, InstrF]

const funcTable: FuncTable = [
  Halt: halt,
  Inc: inc,
  Dec: dec,
  Mul2: mul2,
  Div2: div2,
  Add7: add7,
  Neg: neg
]

proc interp_ftable(code: seq[Op], initVal: int): int =
  # Requires dense enum
  var
    pc = 0
    stop = false
  result = initVal
  
  while not stop:
    funcTable[code[pc]](result, stop)
    inc pc

#################################################################################################################

type
  InstrNext = proc (val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
  
  OpH = ref object
    handler: InstrNext
  
  FuncTableNext = array[Op, OpH]

proc halt(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
proc inc(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
proc dec(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
proc mul2(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
proc div2(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
proc add7(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}
proc neg(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}

let funcTableNext: FuncTableNext = [
  Halt: OpH(handler: halt),
  Inc: OpH(handler: inc),
  Dec: OpH(handler: dec),
  Mul2: OpH(handler: mul2),
  Div2: OpH(handler: div2),
  Add7: OpH(handler: add7),
  Neg: OpH(handler: neg)
]

proc halt(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  stop = true

proc inc(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  
  inc val
  inc pc
  result = funcTableNext[code[pc]]

proc dec(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  dec val
  inc pc
  result = funcTableNext[code[pc]]

proc mul2(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  val *= 2
  inc pc
  result = funcTableNext[code[pc]]

proc div2(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  val = val div 2
  inc pc
  result = funcTableNext[code[pc]]

proc add7(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  inc val, 7
  inc pc
  result = funcTableNext[code[pc]]

proc neg(val: var int, code: seq[Op], pc: var int, stop: var bool): OpH {.inline, nimcall.}=
  val = -val
  inc pc
  result = funcTableNext[code[pc]]

proc interp_handlers(code: seq[Op], initVal: int): int =
  # Requires dense enum
  var
    pc = 0
    stop = false
    oph = funcTableNext[code[pc]]
  result = initVal
  
  while not stop:
    oph = oph.handler(result, code, pc, stop)

#################################################################################################################

type
  OpD = ref object {.inheritable.}
  
  Ohalt {.final.}= ref object of OpD
  Oinc {.final.}= ref object of OpD
  Odec {.final.}= ref object of OpD
  Omul2 {.final.}= ref object of OpD
  Odiv2 {.final.}= ref object of OpD
  Oadd7 {.final.}= ref object of OpD
  Oneg {.final.}= ref object of OpD
  
  FuncTableToken = array[Op, OpD]

method execute(op: OpD, result: var int, stop: var bool) {.base, inline, noSideEffect.} =
  raise newException(ValueError, "To override")

method execute(op: Ohalt, result: var int, stop: var bool) {.inline, noSideEffect.}=
  stop = true

method execute(op: Oinc, result: var int, stop: var bool) {.inline, noSideEffect.}=
  inc result

method execute(op: Odec, result: var int, stop: var bool) {.inline, noSideEffect.}=
  dec result

method execute(op: Omul2, result: var int, stop: var bool) {.inline, noSideEffect.}=
  result *= 2

method execute(op: Odiv2, result: var int, stop: var bool) {.inline, noSideEffect.}=
  result = result div 2

method execute(op: Oadd7, result: var int, stop: var bool) {.inline, noSideEffect.}=
  inc result, 7

method execute(op: Oneg, result: var int, stop: var bool) {.inline, noSideEffect.}=
  result = -result

let funcTableToken: FuncTableToken = [
  Halt: Ohalt(),
  Inc: Oinc(),
  Dec: Odec(),
  Mul2: Omul2(),
  Div2: Odiv2(),
  Add7: Oadd7(),
  Neg: Oneg()
]

proc interp_methods(code: seq[Op], initVal: int): int =
  # Requires dense enum
  var
    pc = 0
    stop = false
    opt: OpD
  result = initVal
  
  while not stop:
    opt = funcTableToken[code[pc]]
    opt.execute(result, stop)
    inc pc

#################################################################################################################

import random, sequtils, times, os, strutils, strformat

const Nb_Instructions = 1_000_000_000

template bench(impl: untyped) =
  let start = cpuTime()
  let r = impl(instructions, n)
  let stop = cpuTIme()
  let elapsed = stop - start
  echo "result: " & $r
  let procname = impl.astToStr
  let mips = (Nb_Instructions.float / (1_000_000.0 * elapsed))
  echo procname & " took " & $elapsed & "s for " & $Nb_Instructions & " instructions: " & $mips & " Mips (M instructions/s)"

proc main(n: int)=
  randomize(42)
  
  let ops = [Inc, Dec, Mul2, Div2, Add7, Neg]
  let instructions = newSeqWith(Nb_Instructions, rand(ops)) & Halt
  
  bench(interp_switch)
  bench(interp_cgoto) # requires dense enum (no holes)
  bench(interp_ftable) # requires dense enum (no holes) or tables (instead of arrays)
  bench(interp_handlers) # requires dense enum (no holes) or tables (instead of arrays)
  bench(interp_methods) # requires dense enum (no holes) or tables (instead of arrays)

# Warmup
var start = cpuTime()
block:
  var foo = 123
  for i in 0 ..< 1_000_000_000:
    foo += i*i mod 456
    foo = foo mod 789

# Compiler shouldn't optimize away the results as cpuTime rely on sideeffects
var stop = cpuTime()
echo "Warmup: " & $(stop - start) & "s"

# Main loop
let arguments = commandLineParams()
let initial = if arguments.len > 0: parseInt($arguments[0])
              else: 1

main(initial)

## Results on i5-5257U (Broadwell mobile dual core 2.7 turbo 3.1Ghz)
# Note that since Haswell, Intel CPU are significantly improed on Switch prediction
# This probably won't carry to ARM devices

# Warmup: 4.081501s
# result: -14604293096444
# interp_switch took 8.604712000000003s for 1000000000 instructions: 116.2153945419672 Mips (M instructions/s)
# result: -14604293096444
# interp_cgoto took 7.367597000000004s for 1000000000 instructions: 135.7294651159665 Mips (M instructions/s)
# result: -201628509198920 <--- some bug here to fix
# interp_ftable took 8.957571000000002s for 1000000000 instructions: 111.6374070604631 Mips (M instructions/s)
# result: -14604293096444
# interp_handlers took 11.039072s for 1000000000 instructions: 90.58732473164413 Mips (M instructions/s)
# result: -14604293096444
# interp_methods took 23.359635s for 1000000000 instructions: 42.80888806695823 Mips (M instructions/s)

drkameleon (orginal) [2022-08-15T17:12:27+02:00] view original

Are you writing an interpreter?

Yep. haha. (And I believe we have talked about similar things in the past... ;-) )

https://github.com/arturo-lang/arturo

Basically, given that the language is already quite mature and used in different projects, I decided it was high time to start benchmarking a bit and make any performance-related improvements needed. Hence, here I am re-visiting things...

Btw, in case you want to have a look, here's the "loop": https://github.com/arturo-lang/arturo/blob/master/src/vm/exec.nim#L281-L409

Mirror of forum.nim-lang.org

9348 :: if-else VS case-else VS case