This is great!
Could you elaborate on the shared heap and how memory is passed from one thread to another?
Does spawn transfer ownership of its arguments and return ownership of its result? Would love to see official examples.
Does spawn transfer ownership of its arguments and return ownership of its result?
Exactly. The ownership transfer is also done for channels' send and recv.
We are working on a better thread pool and maybe a more light-weight channel implementation. I'll post some examples here when they are ready.
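In the meantime, here is a rough sketch (not an official example) of what that transfer means in practice with the existing threadpool/spawn API; the `sum` proc and the numbers are made up for illustration, and the comments describe the intended move semantics under the shared heap:

```nim
import threadpool   # compile with: nim c --threads:on --gc:arc example.nim

proc sum(data: seq[int]): int =
  for x in data:
    result += x

proc main =
  var numbers = newSeq[int](1_000)
  for i in 0 ..< numbers.len: numbers[i] = i
  # With the shared heap, `numbers` can be moved into the spawned task
  # instead of being deep-copied as the classic GC had to do.
  let fv = spawn sum(numbers)
  # `^` blocks until the task finishes; the result is moved back out of
  # the FlowVar in the same way.
  echo ^fv

main()
```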
The threadpool wasn't ported yet; I'll post examples here once it is.
This is awesome, and a great overview. Thanks for writing it up.
I am also curious about the shared heap, the implication from your post is that you are still sticking to the "one heap per thread" model and that we will continue to restrict the sharing of memory between threads. Is that the case?
I am also curious about the shared heap, the implication from your post is that you are still sticking to the "one heap per thread" model and that we will continue to restrict the sharing of memory between threads. Is that the case?
No. The heap is now shared, as it's done in C++, C#, Rust, etc. A shared heap allows us to move subgraphs between threads without deep copies, but the subgraph must be "isolated", ensuring freedom from data races while also allowing us to use non-atomic reference counting operations. How to ensure this "isolation" at compile time was pioneered by Pony and we can do it too via our owned ref syntax. However, we will likely do it at runtime because it's simpler and amounts to a variation of the "cycle collection" algorithm. The pieces fit together in a marvelous way. :-)
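To make the "isolated subgraph" idea a bit more concrete, here is a hedged sketch using today's Channel API (the Node type and the values are only illustrative): the tree below is reachable only through `subgraph`, so under the shared heap `send` could hand the root pointer over to the other thread instead of deep-copying the whole graph as the classic runtime does.

```nim
# compile with --threads:on
type
  Node = ref object
    left, right: Node
    value: int

var chan: Channel[Node]

proc worker() {.thread.} =
  # the receiving thread becomes the sole owner of the subgraph
  let root = chan.recv()
  echo root.value + root.left.value + root.right.value

proc main =
  open(chan)
  var t: Thread[void]
  createThread(t, worker)
  # `subgraph` is "isolated": nothing else points into this tree, so
  # handing it to another thread cannot introduce a data race and
  # non-atomic reference counting stays correct.
  let subgraph = Node(value: 1,
                      left: Node(value: 2),
                      right: Node(value: 3))
  chan.send(subgraph)
  joinThread(t)
  close(chan)

main()
```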
This looks awesome!
Can you give more info on what you mean when you say that this new GC will provide "deterministic memory management"?
For example, if I create a local seq or string variable in a procedure A, and that variable is only used within procedure A or at most by procedures called from within procedure A, will its memory be freed immediately when procedure A is done (and before other code is executed)? I guess my question is whether this will behave like a shared pointer in C++ (or perhaps even like a unique pointer in certain cases)?
That is, are there separate, non-deterministic garbage collection events? Is this suitable for embedded, hard real-time code?
Also, does this require anything that is not in nim v1.0?
For example, if I create a local seq or string variable in a procedure A, and that variable is only used within procedure A or at most by procedures called from within procedure A, will its memory be freed immediately when procedure A is done (and before other code is executed)?
Correct, it's very close to how C++ works.
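A tiny sketch of what --gc:arc does for the case described above (the procedure and sizes are made up for illustration); roughly, a local seq behaves like a C++ std::vector:

```nim
proc procA() =
  var buf = newSeq[int](1_000)   # heap memory owned by the local `buf`
  for i in 0 ..< buf.len:
    buf[i] = i
  echo buf[^1]
  # --gc:arc injects `=destroy(buf)` right here, so the memory is freed
  # deterministically when procA returns, before any other code runs.

procA()
```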
That is, are there separate, non-deterministic garbage collection events? Is this suitable for embedded, hard real-time code?
Yes, embedded, hard real-time code is supported and was a design goal. There are no separate GC events.
Also, does this require anything that is not in nim v1.0?
It adds a .cursor pragma that v1.0 can emulate via the ptr T type. Everything else is in the language spec for v1.0, but the implementation is in the Nim development branch. You need a nightly build to try it.
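For reference, a small illustrative use of that pragma (the Node type here is just an example): a .cursor variable observes a ref without touching its reference count, which is handy for traversals.

```nim
type
  Node = ref object
    next: Node
    value: int

proc total(head: Node): int =
  # `it` is a non-owning view: no incRef/decRef while walking the list,
  # which is roughly what v1.0 would emulate with a ptr-based traversal.
  var it {.cursor.} = head
  while it != nil:
    result += it.value
    it = it.next
```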
Just to make sure I'm not confused - there are no changes to the source code needed, though "sink" and "lent" annotations do provide optimization? And it's independent of the bacon/dingle "owned" system (and is it going to stay or is it deprecated and going away?)
In what way does "async" need to be ported to arc? Is it just a performance thing, a cycle thing, or is it a deeper issue?
Just to make sure I'm not confused - there are no changes to the source code needed, though "sink" and "lent" annotations do provide optimization?
Correct.
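For readers following along, a minimal sketch of what those annotations buy you (the proc names are invented for illustration):

```nim
proc put(dest: var seq[string]; key: sink string) =
  # `sink` lets the caller move `key` in; passing a temporary or the
  # last use of a variable avoids a string copy entirely.
  dest.add key

proc firstItem(s: seq[string]): lent string =
  # `lent` returns a borrowed view of the element instead of copying it.
  result = s[0]

var names: seq[string]
put(names, "nim")
echo firstItem(names)
```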
And it's independent of the bacon/dingle "owned" system (and is b/d going to stay or is it deprecated and going away?)
It's independent of it, but the owned ref idea is still very useful and will find its way in, somehow. Both --gc:arc and --newruntime are implemented with the same technology, so fixing a bug for --gc:arc most likely also fixes a bug for --newruntime.
In what way does "async" need to be ported to arc? Is it just a performance thing, a cycle thing, or is it a deeper issue?
On Unix I got the async tests to work, but it leaks memory, probably because of cycles. On Windows I got crashes instead, for reasons yet unknown... It's hard and takes time, but the number of bugs is finite, right? ;-)
Used latest devel (git hash af27e6bdea63bbf66718193ec44bc61e745ded38), Linux, AMD Ryzen 7 3700X.
All results here are mean values, not minimal ones (so take that into account).
Used hyperfine to benchmark binaries.
| Compiler | bintrees_gc (refc) | bintrees_gc (arc) | bintrees_manual withRC | bintrees_manual |
|---|---|---|---|---|
| GCC 10.1.10 | 13.857s | 5.664s | 5.516s | 4.853s |
| GCC + LTO | 13.303s | 5.764s | 4.934s | 3.980s |
| Clang 10.0.0 | 15.639s | 6.890s | 6.443s | 5.938s |
| Clang + LTO | 12.762s | 6.124s | 6.188s | 5.472s |
(And yes, I don't know how Clang + LTO with ARC is faster than manual withRC :D, but I checked this pair twice and indeed got similar results.)
Take all of these results with a grain of salt since I also had about 5-15% background CPU usage and, well, benchmarks can't always be reliable :)
P.S.: I think we need to implement proper grid tables in our RST parser :)
Interesting, did you compile with -d:danger --panics:on? Also watch out, the allocator now uses 16-byte alignments IIRC, and for reasons currently unknown to me -d:withRC uses much more peak memory than ARC.
Also a fun fact: even without .acyclic, ORC has the same performance as ARC on this benchmark. For me at least, YMMV.
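For context, that pragma is a one-line hint on the node type (sketched below with an illustrative type, not the benchmark's actual source) telling ORC that such objects can never form reference cycles, so it can skip them during cycle collection:

```nim
type
  TreeNode {.acyclic.} = ref object  # a tree never forms cycles, so ORC skips it
    left, right: TreeNode
    value: int
```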
bintrees_gcs_gcc_arc.bin - 5.51s ± 0.057s
bintrees_gcs_clang_arc.bin - 6.34s ± 0.11s
bintrees_manual_gcc_withrc.bin - 4.80s ± 0.10s
bintrees_manual_clang_withrc.bin - 5.91s ± 0.10s
bintrees_gcs_gcc_arc_lto.bin - 5.77s ± 0.06s
bintrees_gcs_clang_arc_lto.bin - 6.02s ± 0.04s
bintrees_manual_gcc_withrc_lto.bin - 4.30s ± 0.08s
bintrees_manual_clang_withrc_lto.bin - 5.93s ± 0.08s
bintrees_gcs_gcc_refc.bin - 13.61s ± 0.13s
bintrees_gcs_clang_refc.bin - 15.56s ± 0.17s
bintrees_gcs_gcc_refc_lto.bin - 12.91s ± 0.08s
bintrees_gcs_clang_refc_lto.bin - 12.80s ± 0.12s
bintrees_manual_gcc.bin - 4.05s ± 0.06s
bintrees_manual_clang.bin - 5.22s ± 0.08s
bintrees_manual_gcc_lto.bin - 3.58s ± 0.04s
bintrees_manual_clang_lto.bin - 5.22s ± 0.09s
Seems like --panics:on helped manual quite a lot