nimforum mirror - "Benchmarking the Beast" update

3-2-1 [2023-09-16T06:33:59+02:00]

Hi all,

I just discovered Nim and have been going through some of the blog articles. I read "Benchmarking the Beast", which is an awesome demonstration of Nim's features. I tried to run it with Nim 2.0.0_1 and the LSP flagged many errors (406). I assume these are mostly deprecations.

Can this script be easily fixed for modern Nim?

dlesnoff [2023-09-16T10:39:59+02:00]

Thanks for pointing that out! I may look into it.

giaco [2023-09-16T11:20:13+02:00]

Surely an impressive post.

Worth keeping it working as Nim evolves.

dlesnoff [2023-09-16T13:05:50+02:00]

All the snippets are still working for me.

import std/macros

import std/monotimes
from times import inMilliseconds

const cLOOPS {.intdefine.} = 1225

# avoids some "bit-twiddling" for better speed...
const cBITMSK = [ 1'u8, 2, 4, 8, 16, 32, 64, 128 ]

macro unrollLoops(ca, sz, strtndx, bp: untyped) =
  let cmpstsalmtid = "cmpstsalmt".newIdentNode
  let szbitsid = "szbits".newIdentNode
  let strtndx0id = "strtndx0".newIdentNode
  let strt0id = "strt0".newIdentNode
  let strt7id = "strt7".newIdentNode
  let endalmtid = "endalmt".newIdentNode
  let bpintid = "bpint".newIdentNode
  let cullaid = "culla".newIdentNode
  result = quote do:
    let `szbitsid` = `sz` shl 3
    let `cmpstsalmtid` = `ca` + `sz`
    let `bpintid` = `bp`.int
    let `strtndx0id` = `strtndx`
    let `strt0id` = `strtndx0id` shr 3
  for i in 1 .. 7:
    let strtndxido = newIdentNode("strtndx" & $(i - 1))
    let strtndxidn = newIdentNode("strtndx" & $i)
    let strtid = newIdentNode("strt" & $i)
    result.add quote do:
      let `strtndxidn` = `strtndxido` + `bp`
      let `strtid` = (`strtndxidn` shr 3) - `strt0id`
  let csstmnt = quote do:
    case (((`bpintid` and 0x6) shl 2) + (`strtndx0id` and 7)).uint8
    of 0'u8: break
  csstmnt.del 1 # delete last dummy "of"
  for n in 0'u8 .. 0x3F'u8: # actually used cases...
    let pn = (n shr 2) or 1'u8
    let cn = n and 7'u8
    let mod0id = newLit(cn)
    let cptr0id = "cptr0".newIdentNode
    let loopstmnts = nnkStmtList.newTree()
    for i in 0'u8 .. 7'u8:
      let mskid = newLit(1'u8 shl ((cn + pn * i.uint8) and 7).int)
      let cptrid = ("cptr" & $i).newIdentNode
      let strtid = ("strt" & $i).newIdentNode
      if i == 0'u8:
        loopstmnts.add quote do:
          let `cptrid` = cast[ptr uint8](`cullaid`)
      else:
        loopstmnts.add quote do:
          let `cptrid` = cast[ptr uint8](`cullaid` + `strtid`)
      loopstmnts.add quote do:
        `cptrid`[] = `cptrid`[] or `mskid`
    loopstmnts.add quote do:
      `cullaid` += `bpintid`
    let ofbrstmnts = quote do:
      while `cullaid` < `endalmtid`:
        `loopstmnts`
      `cullaid` = ((`cullaid` - `ca`) shl 3) or `mod0id`.int
      while `cullaid` < `szbitsid`:
        let `cptr0id` = cast[ptr uint8](`ca` + (`cullaid` shr 3))
        `cptr0id`[] = `cptr0id`[] or cBITMSK[`cullaid` and 7]
        `cullaid` += `bpintid`
    csstmnt.add nnkOfBranch.newTree(
      newLit(n),
      ofbrstmnts
    )
  for n in 0x40'u8 .. 255'u8: # fill in defaults for remaining possibilities
    csstmnt.add nnkOfBranch.newTree(
      newLit(n),
      nnkStmtList.newTree(
        nnkBreakStmt.newTree(
          newEmptyNode()
        )
      )
    )
  result.add quote do:
    let `endalmtid` = `cmpstsalmtid` - `strt7id`
    var `cullaid` = `ca` + `strt0id`
    `csstmnt`
#  echo csstmnt[9].astGenRepr # see AST for a given case
#  echo csstmnt[9].toStrLit # see code for a given case
#  echo result.toStrLit # see entire produced code at compile time

proc benchSoE(): iterator(): int {.closure.} =
  var cmpsts = newSeq[byte](16384)
  let cmpstsa = cast[int](cmpsts[0].addr)
  for _ in 0 ..< cLOOPS:
    for i in 0 .. 254: # cull to square root of limit
      if (cmpsts[i shr 3] and cBITMSK[i and 7]) == 0'u8: # if prime -> cull its composites
        let bp = i +% i +% 3
        let swi = (i +% i) *% (i +% 3) +% 3
        unrollLoops(cmpstsa, 16384, swi, bp)
  return iterator(): int {.closure.} =
    yield 2 # the only even prime
    for i in 0 .. 131071: # separate iteration over results
      if (cmpsts[i shr 3] and cBITMSK[i and 7]) == 0'u8:
        yield i +% i +% 3

let strt = getMonotime()
let answr = benchSoE()
let elpsd = (getMonotime() - strt).inMilliseconds
var cnt = 0; for _ in answr(): cnt += 1
echo "Found ", cnt, " primes to 262146 for ", cLOOPS,
     " loops in ", elpsd, " milliseconds."


$ nim --version
Nim Compiler Version 2.0.0 [Linux: amd64]
Compiled at 2023-08-01
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: a488067a4130f029000be4550a0fb1b39e0e9e7c
active boot switches: -d:release

$ nim r --d:release eratosthenes2.nim
Found 23000 primes to 262146 for 1225 loops in 78 milliseconds.

My CPU: 11th Gen Intel i5-1135G7 (8) @ 4.200GHz (a laptop CPU; I guess an i9 desktop CPU would do much better ^^).
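For readers who want to sanity-check the count of 23000 without the macro, here is a minimal sketch of the same odds-only, bit-packed sieve with a plain inner loop instead of the unrolled one. The names countPrimes, bitMask and sieveBytes are made up for this illustration and do not come from the article, and the sketch makes no attempt at speed.

```nim
# Plain odds-only, bit-packed Sieve of Eratosthenes (no macro unrolling).
# Illustrative names only; this is not the article's code.
const
  bitMask = [1'u8, 2, 4, 8, 16, 32, 64, 128]
  sieveBytes = 16384  # 131072 bits, one per odd number from 3 to 262145

proc countPrimes(): int =
  var composites = newSeq[uint8](sieveBytes)
  # Bit i stands for the odd number 2*i + 3; i = 254 gives 511,
  # the largest odd number below the square root of 262146.
  for i in 0 .. 254:
    if (composites[i shr 3] and bitMask[i and 7]) == 0'u8:
      let bp = i + i + 3              # the base prime itself
      var c = (i + i) * (i + 3) + 3   # bit index of bp * bp
      while c < sieveBytes * 8:
        composites[c shr 3] = composites[c shr 3] or bitMask[c and 7]
        c += bp                       # stepping by bp bits skips even multiples
  result = 1                          # count 2, the only even prime
  for i in 0 ..< sieveBytes * 8:
    if (composites[i shr 3] and bitMask[i and 7]) == 0'u8:
      inc result

echo countPrimes()
```

This does a single pass rather than 1225 loops, so it only checks the result, not the timing.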

3-2-1 [2023-09-16T16:42:49+02:00]

Hey, thank you for taking a look at this!

I owe you an apology. I read the article a couple days ago, then came back to it last night just before falling asleep and, blithely ignoring all the text, C&P'ed my way into the problem.

It for sure still works. You just have to, you know, read it...

Here are the stats on a 12-core M2 Pro:


nim r -d:release -d:danger unrolled_loop_demo.nim
Found 23000 primes to 262146 for 1225 loops in 42 milliseconds.

Fastest Rust run for comparison (compiled in release mode):


Found 23000 primes up to 262146 for 1225 loops in 51 milliseconds.

I bet you're leaving some performance on the table without disabling danger checks.

Very cool language, I look forward to exploring it more!

dlesnoff [2023-09-16T19:47:17+02:00]

> I bet you're leaving some performance on the table without disabling danger checks.

Indeed! I put my laptop in performance mode, and added the flags you used + --opt:speed. I tried -d:lto too, but it does not change anything.

Some runs here:

nim r -d:release -d:danger eratosthenes2.nim
Found 23000 primes to 262146 for 1225 loops in 52 milliseconds.

nim r -d:release -d:danger --opt:speed eratosthenes2.nim
Found 23000 primes to 262146 for 1225 loops in 48 milliseconds.

miran [2023-09-16T21:31:10+02:00]

> nim r -d:release -d:danger --opt:speed

You can just do nim r -d:danger; there is no need for the other two flags with it.
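One way to see what a given set of flags actually turns on is to have the program report it. The sketch below uses `defined` and `compileOption` from the system module; buildSummary is a hypothetical helper name chosen for this example and is not part of the benchmark.

```nim
# Reports which build-mode symbols and options were active at compile time.
# buildSummary is an illustrative name, not from the article.
const
  isDanger = defined(danger)                    # set by -d:danger
  isRelease = defined(release)                  # set by -d:release
  hasBoundChecks = compileOption("boundChecks") # runtime index checks
  optSpeed = compileOption("opt", "speed")      # --opt:speed

proc buildSummary(): string =
  result = "danger=" & $isDanger &
           " release=" & $isRelease &
           " boundChecks=" & $hasBoundChecks &
           " optSpeed=" & $optSpeed

echo buildSummary()
```

Compiling this once with -d:danger alone and once with all three flags, then comparing the two lines, should show whether the extra flags change anything on a given installation.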

Araq [2023-09-17T09:15:39+02:00]

Please note that we keep improving the compiler and things will get faster for -d:release without requiring -d:danger.

Mirror of forum.nim-lang.org

10486 :: "Benchmarking the Beast" update