I was reading this: "Golang's Real-time GC in Theory and Practice" and thought I'd try the same code in Nim.
Here's my initial cut, anyone see anything that should be changed/improved to make the benchmark more comparable/meaningful?
# run with 'nim c -r -d:release bm.nim'
import strutils
import times

const windowSize = 200000
const msgCount = 1000000

type msg = ref array[1024, byte]
type buffer = ref array[windowSize, msg]

var worst: float

proc mkMessage( n: int ): msg =
  result = new( msg )
  for i in items( result[] ):
    result[i] = byte( n )

proc pushMsg( b: var buffer, highID: int ) =
  let start = cpuTime()
  let m = mkMessage( highID )
  b[highID mod windowSize] = m
  let elapsed = cpuTime() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  var b = new( buffer )
  for i in 0..<msgCount:
    pushMsg( b, i )
  let worst_ms = formatFloat( worst*1000, format=ffDecimal, precision=2 )
  echo( "Worst push time: ", worst_ms, "ms" )

when isMainModule:
  main()
I also don't know enough about the various garbage collectors to identify which would produce the best numbers, but I thought that would be good information to include in the top comment block, e.g. something like:
# Try with nim c -r -d:release --gc:stack bm.nim for best performance!
Once there are no concerns/objections I'll submit it back to their repo for inclusion. Thanks!
Thanks for the link, sounds interesting, will read soon...
For your test code, I am not really sure what the purpose is. Nim's realtime GC is tunable, so you can get very short pause times with a tradeoff in throughput. See
http://nim-lang.org/docs/gc.html
You seem to test only the pause time for the default configuration, which may be tuned more towards throughput. (To my understanding, pause times are not critical for most software, or at least pause times of a few ms are fully OK, even for action games. For hardware control, pauses may be critical, but if we do not allocate GC memory in critical parts, then the GC will not run in those critical parts.) And while you are testing, you may want to test the Boehm GC as well.
I hope my understanding is not too wrong; I have never spent time investigating or understanding the GC.
For your code itself there is some room for improvement of course, for example
for i in items( result[] ):
  result[i] = byte( n )
should be writable as
for i in mitems( result[] ):
  i = byte( n )
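To make the difference concrete, here is a tiny standalone sketch of the suggested rewrite (a minimal example of mine, not from the benchmark; the names are made up):

```nim
# `items` yields copies of the values, which is why the original loop
# had to index the array; `mitems` yields mutable slots, so assigning
# to the loop variable writes straight into the array.
var buf: array[8, byte]
let n = 42
for b in mitems(buf):
  b = byte(n)
doAssert buf[0] == byte(42) and buf[7] == byte(42)
```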
I was planning on porting these benchmarks to Nim myself, happy to see that you beat me to it :)
I will play around with your version and see if I can speed it up.
Here is my version:
# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'
import strutils
import times

const
  windowSize = 200000
  msgCount = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: float

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg(b: var Buffer, highID: int) =
  let start = epochTime()
  let m = mkMessage(highID)
  b[highID mod windowSize] = m
  GC_step(10_000) # 10ms
  let elapsed = epochTime() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  GC_disable()
  var b = newSeq[Msg](windowSize)
  for i in 0 .. <msgCount:
    pushMsg(b, i)
  let worst_ms = formatFloat(worst * 1000, format=ffDecimal, precision=2)
  echo("Worst push time: ", worst_ms, "ms")
  echo(GC_getStatistics())

when isMainModule:
  main()
Things I've changed:
Strangely, switching to seq causes an overall slowdown (the max push time is unaffected, but the application takes longer to complete).
On my machine the typical worst push time is 10ms, both for my version of the code and yours. The addition of GC_step doesn't really help, but it is nice to show off. Go 1.5 (maybe I should upgrade) gets 8ms. So Nim is 2ms off, which isn't bad. As pointed out by /u/matthieum on Reddit, the power of Nim is the fact that you can specify exactly where the GC should run (as I've done using the GC_step call).
BTW, stack isn't a GC.
Here is my version; I switched to measuring nanoseconds to get numbers different from 0:
# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'
import strutils
#import times
include "$lib/system/timers"

const
  windowSize = 200000
  msgCount = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: Nanos

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg(b: var Buffer, highID: int) =
  let start = getTicks()
  let m = mkMessage(highID)
  # seqs copy, we don't want a copy here, so I used 'shallowCopy'.
  # Alternatively the 'ref array' version could be used.
  # Note that due to Nim's design we could also *re-use* the memory
  # completely, eliminating the GC runs, but this would make this
  # benchmark pointless.
  shallowCopy(b[highID mod windowSize], m)
  GC_step(500) # 0.5ms, but the GC never takes this long.
  let elapsed = getTicks() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  GC_disable()
  # The mark-and-sweep backup collector is only required for cyclic data
  # structures, but in this program we generate no cycles. Nim allows us
  # to specify this behaviour directly, so IMO it's fair to disable it.
  # Nim's GC docs also mention this fact, it's not undocumented voodoo:
  GC_disableMarkAndSweep()
  var b = newSeq[Msg](windowSize)
  for i in 0 .. <msgCount:
    pushMsg(b, i)
  echo("Worst push time: ", worst, " nanoseconds")
  echo(GC_getStatistics())

when isMainModule:
  main()
On my machine I get roughly 23,000 nanoseconds.
Take with a big grain of salt!
On my Win7 64-bit system, running @Araq's version 25 times (values in nanoseconds):

Count   25
Min     204,940
Mean    975,141
Max     8,422,127
StdDev  1,601,464

So a fairly wide deviation of values (and a much slower system).
Follow-on comments:
Removed extraneous spaces after ( and before ).
You'll take the spaces from my cold, dead fingers. ;)
Rewrote code to make it more idiomatic.
Using seq[byte] instead of ref array[1024, byte] (this is not idiomatic at all)
Tracking. I used the array/ref array because that seemed a more comparable data structure to Go's array & slice types but I agree that idiomatic code is probably better for a benchmark like this.
Switched from cpuTime to epochTime; the former only measures CPU time(!).
I originally used epochTime, but the standard library docs for cpuTime say:
This may be more useful for benchmarking than epochTime.
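For what it's worth, the difference is easy to demonstrate: cpuTime only counts time the CPU spends executing this process, so a sleep is invisible to it, while epochTime measures wall-clock time (a minimal sketch of mine, not from the thread):

```nim
import times, os

let cpuStart  = cpuTime()
let wallStart = epochTime()
sleep(200)  # the process is idle: wall time passes, CPU time does not
let cpuElapsed  = cpuTime() - cpuStart
let wallElapsed = epochTime() - wallStart
doAssert wallElapsed >= 0.15  # the wall clock saw the sleep
doAssert cpuElapsed < 0.1     # CPU time barely moved
```

That is exactly why cpuTime can be the better choice for benchmarking pure computation, but misleading for measuring pauses that include waiting.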
Thanks for all the feedback everyone!
Ok, on my Mac this one produces better timings.
# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'
import strutils
#import times
include "$lib/system/timers"

const
  windowSize = 200000
  msgCount = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: Nanos

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg0(b: var Buffer, highID: int) =
  # warmup:
  let m = mkMessage(highID)
  shallowCopy(b[highID mod windowSize], m)

proc pushMsg1(b: var Buffer, highID: int) =
  # with benchmarking:
  let start = getTicks()
  let m = mkMessage(highID)
  shallowCopy(b[highID mod windowSize], m)
  let elapsed = getTicks() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  # Don't use GC_disable() and GC_step(). Instead use GC_setMaxPause().
  GC_disableMarkAndSweep()
  GC_setMaxPause(300)
  var b = newSeq[Msg](windowSize)
  # We need to warm up Nim's memory allocator so that most of the time
  # is not spent in mmap()... Hopefully later versions of Nim will be
  # smarter and allocate larger pieces from the OS:
  for i in 0 .. <msgCount:
    pushMsg0(b, i)
  # now, after warmup, we can measure things:
  for i in 0 .. <msgCount:
    pushMsg1(b, i)
  echo("Worst push time: ", worst, " nanoseconds")
  echo(GC_getStatistics())

when isMainModule:
  main()
On my slower machine, I get latencies of max 0.3ms.
For comparison purposes, @Araq's second solution gives me (values in nanoseconds):

Count   25
Min     272,765
Mean    1,271,322
Max     7,401,332
StdDev  1,503,168

which over 25 tests is not much different from the first test.
Turning on cycle detection doesn't seem to affect the pause times for me. I still get sub-millisecond pauses for Araq's Nim snippet.
This is the snippet I'm using:
# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'
import strutils
#import times
include "$lib/system/timers"

const
  windowSize = 200000
  msgCount = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: Nanos

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg0(b: var Buffer, highID: int) =
  # warmup:
  let m = mkMessage(highID)
  shallowCopy(b[highID mod windowSize], m)

proc pushMsg1(b: var Buffer, highID: int) =
  # with benchmarking:
  let start = getTicks()
  let m = mkMessage(highID)
  shallowCopy(b[highID mod windowSize], m)
  let elapsed = getTicks() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  # Don't use GC_disable() and GC_step(). Instead use GC_setMaxPause().
  # GC_disableMarkAndSweep()
  GC_setMaxPause(300)
  var b = newSeq[Msg](windowSize)
  # We need to warm up Nim's memory allocator so that most of the time
  # is not spent in mmap()... Hopefully later versions of Nim will be
  # smarter and allocate larger pieces from the OS:
  for i in 0 .. <msgCount:
    pushMsg0(b, i)
  # now, after warmup, we can measure things:
  for i in 0 .. <msgCount:
    pushMsg1(b, i)
  echo("Worst push time: ", worst, " nanoseconds")

when isMainModule:
  main()
@Varriount
You are right; for your example I get pause times between 44 and 110us on Linux, while for Araq's first example GC_disableMarkAndSweep() was necessary to get such small values.
@Araq
Well you can always switch it off and see what happens,
And what may happen? I assume that some cyclic memory may not be freed, so one may get a slow memory leak, with maybe out-of-memory errors after some months of uptime?
Breaking up cycles via ptr -- well before that we have to detect the cycles of course :-)
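A sketch of what "breaking up cycles via ptr" can look like in practice (my own illustrative example; the type and field names are made up): the traced ref graph stays acyclic because the back link is an untraced ptr, so the mark-and-sweep cycle collector has nothing to find.

```nim
type
  NodeObj = object
    value: int
    next: Node          # forward link: a ref, traced by the GC
    prev: ptr NodeObj   # back link: a raw ptr, so no ref cycle exists
  Node = ref NodeObj

var a = Node(value: 1)
var b = Node(value: 2)
a.next = b              # a -> b via ref
b.prev = addr(a[])      # b -> a via ptr: the GC does not follow this
doAssert b.prev.value == 1
doAssert a.next.value == 2
```

The trade-off is manual lifetime management: the ptr must never outlive the node it points to, since the GC knows nothing about it.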
Thanks @dom96!
It kept sliding down my todo list unfortunately.
For reference, here is a comparison of the performance of most of the languages in this benchmark:
https://gist.github.com/dom96/77b32e36b62377b2e7cadf09575b8883