Hi,
Is there a way to influence the GC's behavior? I can use --gc: to change the algorithm. Is it possible to play with its parameters as well?
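For context, what I already know of is the compile-time switch plus the GC_* procs in the system module, roughly like this (if I understand the docs correctly):

nim c --gc:markAndSweep test5.nim

GC_disable()                # pause collections around a hot loop
for i in 1 .. 1_000_000:
  discard newSeq[int](4)    # many small, short-lived allocations
GC_enable()
GC_fullCollect()            # trigger a full collection manually
GC_setMaxPause(100)         # ask the default GC for max ~100 us pauses
echo GC_getStatistics()     # dump heap/GC counters

But none of these seems to reach the allocator's own parameters, which is what I'm really after.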
Motivation: I have two almost identical files with different running times. (Please ignore any +/-10% differences here; these are single runs, not averages.) https://github.com/petermora/nimMapBenchmarks/blob/master/test5.nim
nim c test5.nim; time ./test5
...
real 0m3.052s
user 0m1.380s
sys 0m1.677s -> roughly 1.5 sec
https://github.com/petermora/nimMapBenchmarks/blob/master/test7.nim
nim c test7.nim; time ./test7
...
real 0m1.615s
user 0m1.107s
sys 0m0.507s -> roughly 0.5 sec
Am I assuming correctly that the sys part is measuring the memory management (since in my examples there is no file IO, just an echo)?
Just for comparison (please don't get me wrong, I'm completely happy with the GC, I'm just trying to understand it), the same Rust program gives: https://github.com/petermora/nimMapBenchmarks/blob/master/test.rs
real 0m1.724s
user 0m1.717s
sys 0m0.007s
Thank you, Peter
The sys part would actually be time spent in the kernel and has nothing to do with the GC (other than anything that incidentally happens as a result of mmap()s and page faults). On my system (OS X) I cannot see any measurable difference in the sys part between the two versions.
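If you are on Linux, one way to check where the sys time goes (I have not run this against your exact binaries) is to count the relevant syscalls:

strace -c -e trace=mmap,munmap,madvise,brk ./test5

A high munmap count would point at the allocator returning pages rather than at the GC itself.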
Edit: Actually, I stand (partly) corrected. There is (I think) some overhead returning the pages in system.freeOsChunks(). I'll have to investigate that more; at first I thought it was an end-of-process thing, but now I'm wondering if it may occur mid-GC.
Edit 2: This looks like a micro-benchmark artifact, because you use basically no memory but allocate a lot of throw-away seqs. It may still be a useful idea to have an option to not return pages to the OS; Nim does this to minimize its virtual memory footprint, but often that is an unnecessary optimization that can backfire.
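By "a lot of throw-away seqs" I mean a pattern roughly like this (a minimal stand-in, not your actual test5.nim):

proc work(): int =
  for i in 1 .. 1_000_000:
    var s = newSeq[int](64)   # allocated, touched briefly, then garbage
    s[0] = i
    result += s[0]

echo work()

Each iteration's seq dies almost immediately, so whole chunks keep becoming empty, get handed back to the OS, and are then mmap()ed again for the next burst of allocations; that is presumably where the extra kernel time comes from.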
@Jehan: Thank you for taking a look. I agree, this is a micro-benchmark. We probably want to free memory immediately on a router, but can be a bit lazier about returning pages on a PC. I guess there is no perfect setup for all cases. That's why I'm asking whether there is a way to control/suggest the behavior.
Thanks, Peter
So the good news is: with weirdUnmap = true, most of the versions run in 1.0-1.2 sec, beating Rust's 1.7 sec.
I could also achieve the exact same speedup by leaving weirdUnmap as it was and increasing the ChunkOsReturn constant from 1 MB to 8 MB. If I understand the role of this constant correctly, an unused chunk is returned to the OS if it is bigger than this size. My benchmark is very special because of the small allocations. However, a webserver could have similar characteristics (serving small files, converting data to JSON, etc.), and webserver benchmarks are popular these days.
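Concretely, the only change I made was to this constant in lib/system/alloc.nim (quoting from memory, the surrounding code may differ slightly):

const
  ChunkOsReturn = 2048 * PageSize   # was 256 * PageSize, i.e. 1 MB; now 8 MB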
Do you think that increasing this ChunkOsReturn parameter would be reasonable? Could we have some smart (and obviously quick) logic that recognizes the recent allocation pattern and increases it automatically when needed?
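Just to make the idea concrete, here is the kind of heuristic I am imagining, as a completely hypothetical sketch (the names are made up, this is not the real alloc.nim code):

# Hypothetical sketch: raise the return threshold when chunks are
# being handed back to the OS too often. Not actual allocator code.
var
  chunkOsReturn = 1024 * 1024      # start at the current 1 MB default
  recentReturns = 0                # chunks returned since the last check

proc shouldReturnToOs(chunkSize: int): bool =
  result = chunkSize >= chunkOsReturn
  if result: inc recentReturns

proc maybeAdaptThreshold() =
  if recentReturns > 1000:         # lots of churn: keep more memory around
    chunkOsReturn = min(chunkOsReturn * 2, 64 * 1024 * 1024)
  recentReturns = 0

echo shouldReturnToOs(2 * 1024 * 1024)   # true with the 1 MB threshold
maybeAdaptThreshold()

Something similar could also shrink the threshold again once the returns stop, so a long-running process does not hold on to memory forever.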
Update: in this special benchmark, with ChunkOsReturn = 8 MB there are no page returns to the OS; with ChunkOsReturn = 1 MB there are exactly 6000 page returns.
Thanks, Peter