This is connected to https://forum.nim-lang.org/t/4964, but I think it merits its own thread
I've been away from Nim and this forum for some time, so excuse me if the questions in this post have been asked before. One of the things I love about Nim, and that attracted me to it, is the unique, customisable GC. This allows Nim to be used in all 3 main programming use-cases:
Now, @Arak is working on the newruntime, which, the way I understand it (and please correct me if I'm wrong), is a kind of memory-borrowing mechanism, similar to Rust's. My understanding is that this will cover, for now, use-case #3 above. However, the plan is to expand it in the future so that it covers all 3 use-cases and completely replaces the GC.
Here's the thing. I've played with Rust a little bit, and one of the things that put me off it, apart from the syntax, is the ownership/borrowing mechanism. This is great for use-case #3, as you can have safe memory management, an annoyance for use-case #2, and a huge overhead for use-case #1, where I mostly work. When I write solutions for my clients, I want to spend my time and effort on creating those solutions, not on worrying about which lifetime owns which pointer. The GC is great for that.
So my question is this: If my understanding is correct, then why should I invest in Nim when I won't have the advantage of the GC and instead I could use another language with the same memory-management mechanism as Nim but with a huge community behind it, backed by Mozilla?
Arak
s/k/q
If my understanding is correct, then why should I invest in Nim when I won't have the advantage of the GC and instead I could use another language with the same memory-management mechanism as Nim but with a huge community behind it, backed by Mozilla?
Here's the thing. I've played with Rust a little bit, and one of the things that put me off it, apart from the syntax, is the ownership/borrowing mechanism. This is great for use-case #3, as you can have safe memory management, an annoyance for use-case #2, and a huge overhead for use-case #1, where I mostly work. When I write solutions for my clients, I want to spend my time and effort on creating those solutions, not on worrying about which lifetime owns which pointer. The GC is great for that.
I don't know the exact number, but I do know a lot of Nim users also work primarily in use-case #1. I would be shocked if the newruntime replaced the GC while being unsuitable for higher-level applications.
At the beginning, the ownership system will have a runtime failsafe via a GC, until we feel confident that we can remove it.
The main issue today is that if you don't want the GCs, you have to give up on large parts of the standard library and base types like seq and string. Also, the current GC has issues for multithreaded applications.
In any case, the new runtime will come with the possibility to implement custom allocators for types, which means that you could have GCs as a library, and it would also be easier to implement memory pools and object pools, for example.
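To give an idea of what "pools as a library" could look like, here is a minimal sketch in plain Nim of today (NodePool, acquire and release are made-up names for illustration, not the new runtime's allocator API):

type
  Node = object
    data: int

  NodePool = object
    slots: seq[Node]   # pre-allocated storage
    free: seq[int]     # indices of currently unused slots

proc initNodePool(capacity: int): NodePool =
  result.slots = newSeq[Node](capacity)
  result.free = newSeqOfCap[int](capacity)
  for i in countdown(capacity - 1, 0):
    result.free.add i

proc acquire(pool: var NodePool): int =
  ## Returns the index of a free slot, or -1 if the pool is exhausted.
  if pool.free.len == 0: return -1
  pool.free.pop()

proc release(pool: var NodePool, idx: int) =
  ## Hands a slot back to the pool for reuse.
  pool.free.add idx

var pool = initNodePool(64)
let i = pool.acquire()
pool.slots[i].data = 42
echo pool.slots[i].data   # 42
pool.release(i)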
So my question is this: If my understanding is correct, then why should I invest in Nim when I won't have the advantage of the GC and instead I could use another language with the same memory-management mechanism as Nim but with a huge community behind it, backed by Mozilla?
Nim's new runtime is different from Rust's ownership/borrowing/lifetime system; it resembles more or less C++'s shared_ptr and unique_ptr system.
An owned ref is similar to a unique_ptr: you can only move them, not copy them. They are destroyed when they go out of scope, much like unique_ptr. All refs start out as owned refs.
An owned ref is transparently converted into a ref when a copy of the pointer is wanted. Each ref acts like a shared_ptr in the C++ sense; these are ref-counted. The only rule is that all refs that come from an owned ref must be destroyed before the owned ref is destroyed, or your code will abort. See this example:
type
  Node = ref object
    data: int

var x = Node(data: 3)     # inferred to be an ``owned ref``
let dangling: Node = x    # unowned ref
assert dangling.data == 3
x = Node(data: 4)         # destroys x! But x has dangling refs --> abort.
At the very least, this is my understanding of Nim's new runtime. There's no borrow checker to fight with, and no lifetime annotations, only an owned annotation that you have to add yourself where appropriate (the compiler will guide you via its error messages). For more examples see Araq's original blog post ("Owned refs" section).
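And, if I read the rule above correctly, the abort is avoided by making sure no unowned ref outlives its owner, e.g. by niling it out before the owner is destroyed (a sketch; dangling is declared with var here so it can be reset):

type
  Node = ref object
    data: int

var x = Node(data: 3)     # owned ref
var dangling: Node = x    # unowned ref
assert dangling.data == 3
dangling = nil            # drop the unowned ref while the owner is still alive
x = Node(data: 4)         # old node destroyed: no dangling refs left, no abort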
Also see the spec @mratsim linked above for the latest iteration of new runtime.
It's worth noting that --newruntime is only an experiment, and only with Nim v2 would --newruntime replace the GC. Currently it's just a better --gc:none, and its effectiveness will be evaluated during v1 to see if the benefits outweigh the costs of adding owned. In the worst case, the approach will be dropped, but we would still benefit from the improvements in the destructors system. (Nim's core devs, correct me if I'm wrong.)
The two mentioned use-cases are the most frequent by far.
Quoting Libman:
While Rust may have an optional GC mechanism (heck, so does C), for Nim it's been the default approach from the beginning. Even if your main project needs manual memory management, you'll likely have other projects you'll want to share code with, and most programming needs do better with GC.
Exactly. Manual memory management is never as flexible as automatic MM. Nim's standard GC has proven to be very fast, with almost no impact on performance. So a GC'ed Nim is not flawed; quite the opposite, it's a feature in its own right.
On the other side, there is a new bandwagon: "you'd better skip the GC". Supposedly a "systems programming language" (whatever that means) must not have a GC. But this is propaganda, because there is no evidence for that claim regarding single-threaded applications. Multi-threaded tasks do change the game, but then many other things change too; GC'ed MM is only one particular aspect.
So, why should Nim shift gears just now? It doesn't make sense, and it's a bit late. Nim should keep the focus on traits with vtables instead; that should be done before Nim 1.0. If this is not enough, some pattern matching could be added.
While Rust may have an optional GC mechanism (heck, so does C), for Nim it's been the default approach from the beginning. Even if your main project needs manual memory management, you'll likely have other projects you'll want to share code with, and most programming needs do better with GC.
That's exactly my point. If the GC is eventually replaced in Nim by a borrowing system, one of its biggest advantages (for me, at least) will disappear.
Thanks for the info @mratsim. I fully understand/appreciate the need for smart memory management for use-case #3; however, what worries me is the potential removal/replacement of the GC.
In any case, the new runtime will come with the possibility to implement custom allocators for types, which means that you could have GCs as a library, and it would also be easier to implement memory pools and object pools, for example.
Yes, having the option to choose between the GC and the borrowing system would be much better IMO.
Thanks for a greatly detailed reply @leorize, much appreciated!
There's no borrow checker to fight with, and no lifetime annotations, only an owned annotation that you have to add yourself where appropriate (the compiler will guide you via its error messages).
That puts me much more at ease. A Rust-like borrow checker would be a big turn-off.
That said, I'd like to complete the "storage classes" aka "reference types" (abbr reft) with the following system...
I'm glad to see you take a combination of Rust's ideas and my own and run with them (although I haven't seen your RFC, as suggested by @Araq, yet).
However, the more I see this expanded, especially to the idea of "lifetimes" other than what Rust calls "static" implicitly inferred global lifetimes, the more I hate it. If we need to expand beyond "static" lifetimes to make this work, Nim would become just another Rust, and I don't think any of us want that. The only advantage I can see in it for Nim is mutability type safety, and since Nim has mostly been built around the concept that arbitrary mutability is a given whenever the var keyword is used, moving to controlled mutability would be a huge change.

Part of it is that the Nim ecosystem prefers "value"/"struct" objects as the building blocks of new types rather than "reference" types built on the heap with just a pointer to them; the second is possible, but it doesn't seem to be encouraged by the ecosystem. For instance, the crucial string and seq types are primarily objects that contain a pointer to heap data, rather than a pointer to an object on the heap that contains all the fields related to the structure. There are advantages and disadvantages either way, but one of the major disadvantages of the current system is that special overloaded procs such as GC_ref/GC_unref need to be used to handle edge cases manually for string and seq, rather than being able to handle them generally as standard ref types.
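To make the value-vs-reference distinction concrete, a rough sketch of the two layouts (the type names are made up for illustration):

# Value-style: the object itself lives wherever it is declared (stack, or
# inside another object); only its payload is on the heap. This is roughly
# how string/seq are laid out today.
type
  ValueBuf = object
    len, cap: int
    data: ptr UncheckedArray[byte]   # heap payload

# Reference-style: the whole object lives on the heap and you only hold a ref.
type
  RefBuf = ref object
    len, cap: int
    data: ptr UncheckedArray[byte]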
Although in one of the other forum threads @Araq has convinced me of the advantages of B/D, when fully implemented as "the team" seems to be moving toward, it is becoming more and more a subset of what Rust does. As I understand it, "ownership" no longer applies just to the new ref type but can be applied to everything, especially objects and tuples; and if it is extended to those, why not to all primitives as well, as in SomeNumber, enum, set, proc, func (including the closure versions of them), etc.? As I pointed out in my post, that makes it even more possible to implement a "kind of" affine type-safe system similar to Rust's, to which I have no objection as long as it doesn't overly impact program complexity. However, I'm afraid that it will make programs considerably more complex, as per Rust.
My other objection to B/D and/or Rust's ideas is that they don't work for everything, meaning we still need either GC or RC. Rust chose RC; many here seem to hate pure RC, objecting both to its speed and to the fact that one can't use cyclic data with it. The reason Rust must support either RC or GC is the same reason we must: some data structures, such as lists, usually have multiple owners of their nodes, and duplication by deepCopy ("Clone" in Rust terms) is too expensive for structures that may be linked millions of items deep. Rust also has a huge problem handling the captured environment for closures, as it doesn't fit the ownership paradigm, but that seems to be alright (so far) for us.
According to my tests, RC speed isn't that bad when comparing apples to apples: the current GC can't efficiently be used across threads, and non-thread-safe RC is actually faster than the current GC. If the current GC were expanded, at great difficulty, to be usable across threads, it would be slower, likely about the same speed as atomic RC. Plus, there are all kinds of optimizations we can make with RC to improve the efficiency of allocations that are likely much more difficult with GC, and eliminating atomic ref counts by using the simpler of the Bacon/Dingle techniques is not difficult for many use cases. I don't really see speed being an issue, just as Rust doesn't seem to see it as an issue, other than offering the option of non-atomic RC, when atomicity isn't needed, to make it a little faster.
So the real issue is RC not handling cyclic data. Do we really need cyclic data? Elm has taken the step of forbidding cyclic data except at the global level in order to keep RC a future option (perhaps to be applied to WebAssembly output), and it doesn't seem to be much of a limitation. One can still define cyclic lists/structures at the global level (with deferred definition of the tails, because Elm doesn't expose pointers to the programmer) if one really needs them, but in most cases there are other (and more memory-efficient) ways to get the same effect. In what cases must one have cyclic data at a non-global level? I've been able to apply such things as "ring" lists, but there is almost always an alternate data structure that turns out to be more efficient in memory use and/or execution speed. Remember that an ordinary linked list of numbers takes as much space for the tail links as it does for the actual storage of the numbers, plus there is the extra processing to "chase the tails"; a simple array/seq has almost no memory overhead, is extremely efficient, and when one wants to roll around to the head, one only needs to check the index against the len and start over again from the low end, as sketched below.
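A rough sketch of what I mean by treating a seq as a "ring" instead of building a cyclic list (plain Nim, nothing newruntime-specific):

# Walk a seq as if it were a circular list: wrap the index instead of
# chasing a tail pointer. No per-element link overhead, and no cycle for
# RC to worry about.
let ones = @[1, 1, 1, 1, 1]
var idx = 0
for _ in 1 .. 20:
  stdout.write ones[idx], " "
  idx = if idx == ones.len - 1: 0 else: idx + 1
echo ""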
I think there is too much arbitrary rejection of RC because of the way C++ uses it (and Swift's not-so-good automatic RC) without fully exploring whether we can use it. Since we don't really want to have to expand the GC to multi-threaded use, and B/D techniques don't work for such things as lists with multiple owners, I don't see that we have another choice, unless someone brilliant comes up with something that isn't on the table now.
I think there is too much arbitrary rejection of RC because of the way C++ uses it (and Swift's not-so-good automatic RC) without fully exploring whether we can use it. Since we don't really want to have to expand the GC to multi-threaded use, and B/D techniques don't work for such things as lists with multiple owners, I don't see that we have another choice, unless someone brilliant comes up with something that isn't on the table now.
Completely disagree: "lists with multiple owners" is a very theoretical use-case that I don't care about; lists usually have a terrible performance profile anyway. But again, if you want RC, atomic or not, Nim is giving you the building blocks to do exactly that.
There is no "arbitrary rejecting of RC" going on here; the reasoning behind it is very clear: RC is a form of GC, just more incomplete than a tracing GC. B/D, however, is different: it turns imperative manual memory management, with its dealloc calls, into a declarative style. The hope is that it encourages a design that prevents logical memory leaks (leaks that full tracing GCs don't prevent either!) by construction.
B/D follows a strict and simple rule: The lifetime on the stack determines the lifetime on the heap.
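A sketch of what that rule means in practice, assuming owned-ref semantics roughly as in the current spec:

type
  Node = ref object
    data: int

proc work() =
  var n = Node(data: 1)   # owned ref: the heap cell is tied to this stack slot
  echo n.data
  # `n` goes out of scope here; the heap cell is destroyed with it,
  # with no explicit dealloc call anywhere

work()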
No, that's wrong. I'm sorry but you don't understand B/D, simple as that.
But again, if you want RC, atomic or not, Nim is giving you the building blocks to do exactly that.
I'm grateful for that capability and have used it with a destructor implementation when I wanted to try an algorithm that depended on list-like structures without using the GC.
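Roughly the shape of it (a sketch only, with explicit retain/release calls instead of destructor hooks to keep it version-independent; the names are mine):

# "RC as a library" with plain untraced pointers and a manual ref count.
type
  NodeObj = object
    rc: int
    data: int
  Node = ptr NodeObj

proc newNode(data: int): Node =
  result = create(NodeObj)   # untraced heap allocation, zero-initialized
  result.rc = 1
  result.data = data

proc retain(n: Node) = inc n.rc

proc release(n: Node) =
  dec n.rc
  if n.rc == 0:
    dealloc(n)

let n = newNode(7)
retain(n)       # a second owner appears
echo n.data     # 7
release(n)
release(n)      # count hits zero -> memory freed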
The real issue is that RC cannot even detect cycles so you're left with having to run a separate leak detector.
Would it help to reject cyclic data completely (except at the global level) as per what Elm does currently? Or do you see use cases where we can't live without cyclic data? According to the Elm people, cyclic data isn't a problem if you can't create it in the first place and they apparently think that they can live without it.
I can see that introducing a cycle (leak) detector could cost a lot of CPU cycles and is best avoided.
Introducing cycles with B/D is much harder to accomplish.
I don't see it being hard at all to introduce cycles with B/D, if we want B/D to be able to handle everything that the GC currently can. The following trivial example works fine with the current GC, but it would be a cycle requiring cycle detection, or the special destruction extensions, if the ref were a B/D owned ref or a combination of owned and dangling refs:
type
  List[T] = ref object
    head: T
    tail: List[T]

proc makeCycle[T](x: T): List[T] =
  result = List[T](head: x)
  result.tail = result

var ones = makeCycle(1)
for _ in 1 .. 20:
  stdout.write ones.head, " "
  ones = ones.tail
Yes, this is a list which isn't really very useful, but the same principles apply if it were a more useful structure such as a binary tree (with cycles).
It's possible to fix this for B/D by using the extension(s) proposed by B/D for destruction, but those, in turn, cost cycles and don't seem to cover deepCopy if one truly needs it. Again, deepCopy extensions can be implemented in a similar way, but they also cost cycles, although I suppose that would be acceptable given how rarely deepCopy would be needed.
As well, B/D doesn't seem to apply in all cases where one must have multiple owners, such as for the sub-nodes in a binary tree, as the only alternatives would seem to be deepCopy ("Clone" for Rust), which is extremely wasteful for a large tree structure; GC as a fallback (I've come to see your objections to it, so I don't really want to have to revert to using that); and RC (which is Rust's solution, in spite of being "incomplete").