Hello,
I read about --gc:arc here:
https://forum.nim-lang.org/t/5734
But before that there was an article on introducing ownership, similar to Rust.
Are these things completely unrelated? As I understand it, --gc:arc will be the way to go, right? So what will happen to ownership?
Also am I correct that --gc:arc is very similar to the memory model described here:
https://aardappel.github.io/lobster/memory_management.html
Thanks a lot!
Are these things completely unrelated?
They are related; --gc:arc is the evolution of owned ref.
So what will happen to ownership?
It is outlined in these RFCs: https://github.com/nim-lang/RFCs/issues/178 and https://github.com/nim-lang/RFCs/issues/177
Ownership is still coming but owned can be implemented as a library.
Also am I correct that --gc:arc is very similar to the memory model described here
Yes, it's basically C++'s / Rust's / Lobster's algorithm, as it's a pretty natural outcome of move semantics.
I have not followed the arc story too closely but I have a few questions about it. In particular I’d like to understand how it compares to the C++ memory model.
I skimmed the Lobster memory model document referenced by @Araq. In it they talk about "in-line, by-value structs". Is this model used by arc for local, "simple" variables? That is, with arc, are integers, floats, etc. stored on the stack like they are in C++? What about strings and seqs? Are those immediately destroyed when the procedure in which they are declared exits? What about objects in general?
In another part of the Lobster memory management document they say:
Lobster combines its original (runtime) reference counting with a lifetime analysis algorithm, to get “compile time reference counting”.
Is this applicable to arc? If so, when do we get the compile time reference counting and when do we get the runtime reference counting?
Would it be correct to assume that arc would be equivalent to wrapping all variables of "non-simple" types with std::shared_ptr in C++? Or perhaps with std::unique_ptr, or a mix of both?
How deterministic is the memory management with arc? What about orc? Is a garbage collector still needed and running in either case? Is there any runtime overhead due to the reference counting compared to what would be achieved with C++'s manual memory management?
Sorry if these questions have very obvious answers. I really think that some documentation comparing the arc (and the future orc?) memory model to the C++ model would be very useful for people coming from that kind of language.
Well, I can only describe how arc works, but I can assure you that Rust/Lobster work very similarly, and none of these languages has "compile-time reference counting", strictly speaking.
Nim's integers, floats, enums, bools, and objects and arrays of these have always been "value"-based types. They are embedded into their host container. They are not necessarily allocated on the stack (though usually they are), for example:
type
  O = object
    a: array[2, int]

proc main =
  var x = (ref O)(a: [1, 2]) # aha! we have an array here! and it's not allocated on the stack!
a is directly embedded into the O and we put it onto the heap. The x itself is stored on the stack and points into the heap. Now the question is: when is the block on the heap freed? Under --gc:arc/orc it's always at the end of main. Under the other GCs it's "you don't know". That is also true for C++'s shared_ptr and unique_ptr, Rust's equivalents, and whatever Lobster's name for these things is.
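To make the value-vs-ref distinction concrete, here is a small self-contained sketch (same O type as above; the echo is only there to keep it runnable):

type
  O = object
    a: array[2, int]

proc main =
  var v = O(a: [1, 2])        # plain object: fully embedded on the stack, no heap allocation, no RC
  var x = (ref O)(a: [3, 4])  # ref object: the O (with its embedded array) lives on the heap,
                              # x is just a stack cell pointing at it
  echo v.a, " ", x.a
  # under --gc:arc/orc the heap cell behind x is freed right here, at the end of main

main()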
Why is that? Well there is one pointer to the ref O, it's a uniquely referenced memory cell. Now let's make this program more complex:
proc main =
  var x = (ref O)(a: [1, 2])
  var y = x
Now we have 2 references to the ref O. When is it freed? Still at the end of main. Why? Because that's how reference counting works. What's the point of move semantics then? It makes the assignment var y = x cheaper by exploiting that x isn't used afterwards. Does this affect "deterministic" memory management? No.
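To see the difference move semantics makes, here is a self-contained sketch; the comments describe, in rough terms, the RC operations the compiler inserts:

type
  O = object
    a: array[2, int]

proc main =
  var x = (ref O)(a: [1, 2])  # rc = 1
  var y = x                   # x is not used afterwards -> the compiler moves: no incRef/decRef
  echo y.a
  # if x *were* used after the assignment, it would be a copy instead: an incRef (rc = 2)
  # here, plus a decRef for each of x and y at the end of main.
  # Either way the heap cell is freed at the end of main, deterministically.

main()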
Ok, so about this example:
proc construct(p: int): ref O =
  var x = (ref O)(a: [1, 2])
  if haltingProblem(p):
    result = x
  else:
    result = nil

proc main =
  let x = construct(12)
When is the ref O object really freed? Well, it depends on the halting problem: either it's freed right after construct, or at the end of main. Why is that? Because unique pointers are really a 1-bit reference counting system. Note how even uniqueness doesn't help all that much with "deterministic" memory management, because uniqueness means "0 or 1" and not "always 1". However, in practice, if you do a minimal amount of testing or reasoning about your code, the runtime profile of your code remains analysable. That's true for classic reference counting like C++'s shared_ptr, for Rust's 1-bit reference counting, and for the various schemes in between where you optimize away more and more RC operations ("compile-time reference counting").
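Roughly, this is what the compiler does with the example above (a hand-written sketch; haltingProblem is replaced by an arbitrary stand-in so the snippet compiles, and the real injected code differs in its details):

type
  O = object
    a: array[2, int]

proc haltingProblem(p: int): bool = p mod 2 == 0  # stand-in for the placeholder above

proc construct(p: int): ref O =
  var x = (ref O)(a: [1, 2])
  if haltingProblem(p):
    result = x    # last use of x: moved into result, no RC traffic
  else:
    result = nil
  # compiler-injected at scope exit, roughly: `=destroy`(x)
  #   then-branch: x was moved away, so this is a no-op
  #   else-branch: this frees the heap cell right here, before main ever sees the nil

proc main =
  let x = construct(12)
  # if construct returned a non-nil ref, it is freed here at the end of main

main()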
So why is this "better" for "hard realtime" systems than classical tracing GC algorithms, copying GCs, or Jamaica's hard realtime GC? It's better in the sense that it attaches a simpler cost model to the program, one in which some modularity is preserved. Your subsystem allocates N objects on the heap? The cost is N deallocations when the subsystem is done, regardless of the other subsystems in your program.
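As a concrete (made-up) illustration of that cost model:

type
  Item = ref object
    payload: int

proc runSubsystem(n: int) =
  var items: seq[Item]
  for i in 0 ..< n:
    items.add Item(payload: i)   # n heap allocations
  # ... the subsystem does its work ...
  # scope exit: under --gc:arc the seq and all n Items are destroyed here,
  # so the cost is exactly n deallocations, no matter what the rest of the
  # program is doing

proc main =
  runSubsystem(1000)

main()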
Thank you @Araq, that is pretty clear. This matches what I would expect from a "deterministic" memory management model. I don't take that to mean that I know _exactly_ when memory is freed, but that you can reason about it (as opposed to a garbage collector that can run more or less at any time).
What kinds of optimizations are currently done by arc? Would it be fair to say that the baseline is that everything (other than value types) is handled more or less like a shared_ptr by default but that the compiler is able to do some optimizations, which would be equivalent to using a unique_ptr in C++ where it makes sense or even not using any kind of reference counting at all (at runtime)? That is, is it possible for the compiler to completely eliminate the cost of reference counting during compilation, achieving the same performance that you could achieve if you managed your memory manually? I guess that is what it would mean to do “compile time reference counting”.
Also, how does orc change the picture?
Also it's quite impressive that your arc implementation is actually faster than GC, because it's always been the opposite (for example, Swift's abysmal performance).
I assume it's because of the compile time optimizations you do to avoid reference counting at runtime, and just insert free() during compilation, right?
for example, Swift's abysmal performance
Whatever performance you're referring to, I doubt it's because of ARC vs GC. Swift's ref-counting implementation is based on Objective-C's. Objective-C has transitioned through manual ref-counting, GC, and now ARC; the GC was uncomfortably slow, but ARC is much faster, around the same performance as the manual ref-counting. (Objective-C has been ref-counted since sometime in the 1990s.)
FWIW, I'm using homemade atomic ref-counting in the C++ codebase I now work on, and I've never noticed the retain/release functions being significantly 'hot' in performance tests. Maybe because they're testing large-scale stuff like database access and networking, not tiny artificial things that spend all their time creating objects ;-)
the biggest impact is the fact that Nim's ARC does not use atomic reference counting, instead a complete SCC will be moved between threads
Just watch out that the moving doesn't become the critical path! Erlang does this, by copying the data between its 'process' heaps, but in the early days of Couchbase Dustin Sallings found that to be a major performance killer in CouchDB. (And part of the reason Couchbase abandoned CouchDB, or rather tore it into pieces and replaced half of them with C code.)
I think you've mentioned Pony elsewhere, though — their approach sounds much better; transfer ownership of a heap object between threads without actually having to copy the bytes around. That would be awesome to have in Nim.
I think you've mentioned Pony elsewhere, though — their approach sounds much better; transfer ownership of a heap object between threads without actually having to copy the bytes around.
Yes, that's what we're working on.
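For reference, a minimal sketch of that direction using the std/isolation module that shipped in later Nim versions (the actual cross-thread channel is elided here; the point is that isolate proves at compile time that the graph has no outside aliases, so ownership can be handed over without copying the bytes):

import std/isolation

type
  Node = ref object
    data: seq[int]
    next: Node

proc main =
  # the construction expression is checked to be free of external aliases
  var msg = isolate(Node(data: @[1, 2, 3], next: Node(data: @[4, 5])))
  # a channel designed for this would move `msg` to another thread here,
  # without touching the payload bytes; on the receiving side:
  let received = extract(msg)
  echo received.data

main()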
I did a test with and without --gc:arc, with the same parameters.

DEBUG mode
- without: the program is smaller (1.2 MB) but much slower
- with: the program is larger (1.4 MB) but much more fluid

PROD mode
- without: 523 KB, very slightly slower
- with: 602 KB

I did not find a report on the subject. Can you please explain?
Also, can anyone paste a link to the tree benchmark? I can't find it.
It compared the performance of a tree implementation using gc:arc, Boehm, and something else. It was pretty interesting.