I've been using ORC ever since it arrived in the stable release channel, and generally have had few problems, but every once in a while I run into some severe bug that causes a crash, usually via SIGSEGV, and usually related to multithreading. Given this is not the default memory management strategy at the moment, it can't be expected to be as stable as refc, but I've run into issues frequently enough to be unsure of it.
Since ORC will become the default memory management strategy in Nim 2.0, I'm wondering how much focus there is on it, and what the consensus is. I know that there are a lot of projects right now that crash under ORC, including Prologue with threads on (I think the same is true for Jester), which already will cause loads of issues when Nim 2.0 is released.
I also find that ORC's cycle collector sometimes seems to need some manual help to be called, and I get better memory usage when manually calling GC_runOrc periodically. In general, I'm pretty clueless as to how it actually behaves, and what I should do to coax it into collecting more reliably. There is very little documentation on its behavior.
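For illustration, here is a minimal sketch of the kind of periodic call I mean; handleOneRequest and the 1000-iteration interval are placeholders, not recommendations:

proc handleOneRequest() =
  discard # placeholder for real work that allocates ref objects

proc serveForever() =
  var iterations = 0
  while true:
    handleOneRequest()
    inc iterations
    if iterations mod 1000 == 0:
      when defined(gcOrc): # only defined when compiling with --mm:orc
        GC_runOrc() # force a cycle-collector pass instead of waiting for the internal threshold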
This post isn't to complain, but as someone who isn't a compiler engineer, I'd like to hear what other people think about the state of ORC and its readiness for use, especially in the context of multithreading and general stability.
I have found that ORC + multithreading do not mix in the way one might expect, imo. The mental model from Java, Go, C#, etc. is not transferable for those who expect it to be. It took me a long time to really internalize that, coming from that background myself.
Possibly you already know all of this but I'll put it here in case others are interested or if maybe some could correct my understanding.
It is not correct or safe to share a ref object between multiple threads. Doing so will result in broken behavior under ARC, even if you manually manage the ref count, since that cannot be made bulletproof: the count is neither atomic nor lockable. Under ORC it is easy to create crashes, since IIRC the cycle check list is thread-local. Marking ref objects as {.acyclic.} will resolve the ORC thread-local cycle check list issue, but you then still need to be careful not to create cycles, and you still cannot share references across threads safely.
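For reference, the pragma goes on the type definition; a minimal sketch (Node is a made-up type):

type
  Node {.acyclic.} = ref object
    # acyclic promises that values of this type never form reference cycles,
    # so ORC does not add them to its (thread-local) cycle candidate list.
    # Keeping that promise is entirely up to the programmer.
    next: Node
    value: int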
Assuming a ref object is only ever used / referenced from one thread and carefully moved across threads, ORC and multithreading can work without issue in my experience. I do not have issues, crashes etc in Mummy in my usage. I was quite careful with how I managed the one or two ref types I have, only using them internally and just manually managing memory for shared resources.
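To make that concrete, here is a minimal sketch (not taken from Mummy, all names made up): ref objects stay private to the thread that created them, while the shared resource lives in manually managed shared memory guarded by a lock.

import std/locks

type
  Shared = object # plain object, no refs inside
    lock: Lock
    counter: int

proc worker(shared: ptr Shared) {.thread.} =
  # Any ref objects created here are owned by this thread only.
  var localState = new(seq[int])
  for i in 0 ..< 1000:
    localState[].add i
    withLock shared.lock:
      inc shared.counter # the shared data is only touched under the lock

proc main() =
  let shared = createShared(Shared) # manually managed shared resource
  initLock shared.lock
  var threads: array[4, Thread[ptr Shared]]
  for t in threads.mitems:
    createThread(t, worker, shared)
  joinThreads(threads)
  echo shared.counter
  deinitLock shared.lock
  freeShared(shared)

main()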
While the care needed with ref objects in a multithreaded setting is different from other languages, the shared heap is still such a huge improvement over refc that I'm very happy with what is now possible.
Making the refcounting atomic isn't that hard (there are people who use an unofficial atomic ARC mode) but the cycle collector is quite tricky to multi-thread.
I'm more interested though in ensuring that plain old locking works with ref types. It's quite unexplored territory but it makes sense: You leave out the atomic instructions for a single refcount field but you protect it and other RCs at the same time with a plain lock. Single threaded case remains fast and the multi-threaded case has a higher chance of actually being correct.
It is not correct or safe to share a ref object between multiple threads
Isn't that what channels are for?
Regarding documentation about ORC, there's the official announcement blog post plus this guest entry.
For more details on the technical and practical side of things, you can also refer to this documentation (Disclaimer: I authored it). As of now, it still applies to Nim for the most part.
The key points of interest about ORC are:
As a consequence of the collector removing outgoing edges (i.e. ref locations) through which reference cycles are possible, one needs to watch out for the following:
type Cyclic = object
  a: ref Cyclic

...

proc `=destroy`(x: var Cyclic) =
  if x.a != nil: # check the type is initialized
    # ^^ this won't work if `x` is destroyed by the cycle collector
    ...
As @guzba mentioned, if one is very careful with ref types and ensures that whole subgraphs are only owned by a single thread at a time, it's possible to use ORC when multi-threading. The threshold is (currently) a non-atomic and unguarded global, however, so performing operations relevant to the cycle collector (copying, sinking, or destroying a ref through which reference cycles are possible) in multiple threads leads to, strictly speaking, undefined behaviour.
It's important to note that there is a long-standing (at least two years old) bug in the Nim compiler's cyclic type detection logic that causes all compound and seq types not explicitly marked with acyclic to be treated as cyclic. In other words, types like ref array[1, int], ref (int, int), and ref seq[int] are currently all considered relevant to the cycle collector.
Finally, since the automatic reference counting used for ARC and ORC is built upon the lifetime-tracking-hook mechanism, it is also affected by bugs and issues with the latter.
It's important to note that there is a long-standing (at least two years old) bug in the Nim compiler's cyclic type detection logic that causes all compound and seq types not explicitly marked with acyclic to be treated as cyclic.
That's news to me. I mean, I've seen a recent bug report about it and I'm working on it but it's not 2 years old. :-)
The threshold is (currently) a non-atomic and unguarded global ...
Good catch. This should be thread local.
That's news to me. I mean, I've seen a recent bug report about it and I'm working on it but it's not 2 years old. :-)
The problematic lines, 395-397 in compiler/types.nim, were introduced by this PR, which was merged on May 12th 2021, almost two years ago.
So, with that out of the way, the way we approach production readiness is:
So, what can help this process, from the point of view of someone that has a lot of Nim code and coders around?
Fortunately, many of these things are well underway, in particular on the tooling front. If I have a wish for 2.2, it's that it happens 3 months after 2.0. There is a trivial way to get there: make time-based releases instead of feature-based releases. If a feature isn't ready for the release, it gets cut from it and that's the end of the story, with no compromises: there will be another release on a predictable date, so the feature gets another chance soon, without having to compromise on quality just to ship it.
It's a bit late, but it does seem like the stdlib's async library will need some TLC to work well with ORC. Futures cause all sorts of chaotic cycles. Ideally the core async library will be ARC compatible itself, which would make it more deterministic.
Also, using ORC on embedded is tricky since the cycle collector doesn't run often enough. There used to be a "debug mode" which ran the cycle collector more often, but I couldn't find it the last time I looked. That may be an important property to tune in the future as well.
Atomic refcounting works, however, because the access to heap memory is protected by atomic operations even when the ref variable goes out of scope, and thus prevents races.
It's very important to note that this only protects against races on the atomic count, not data races in general.
In my opinion, relying on just atomic refs is an easy way to lure developers into a false sense of security, as is seen in Golang. Essentially you have one of: read-only data (atomic shared-ptr wrappers), movable data with single ownership (Isolate[T]), or some sort of locking mechanism around the data in "atomic chunks" (locks).
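For the "movable data with single ownership" option, here is a minimal single-threaded sketch of the std/isolation API (the stdlib spells the type Isolated[T]; Payload is a made-up type and no actual thread hand-off is shown):

import std/isolation

type Payload = ref object
  data: seq[int]

# A freshly constructed value has no other aliases, so isolate() accepts it
# and wraps it as the sole reference to the object.
var iso: Isolated[Payload] = isolate(Payload(data: @[1, 2, 3]))

# The receiving thread would call extract to take back unique ownership;
# here it happens in the same thread just to show the API.
let payload = extract iso
echo payload.data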
On that topic, @arnetheduck, is Chronos non-cyclic / ARC capable now? I recall reading that it was, but I would be curious if there have been updates in that world, especially since it looks like there was some big refactoring in Chronos recently.
chronos has been acyclic for a good while for memory efficiency reasons - ie even with refc, it is a lot better to avoid cycles... as to arc, we don't test that specifically and I don't think we ever will.
I imagine that in some future when orc is (more) stable we'll start using / supporting that, but ARC looks like a niche compromise that will never work quite well because developers will keep shooting themselves in the foot with it and then complain it hurts.
Thanks! To be clear I don’t think ARC should be officially supported. Rather as you mention avoiding cycles is more efficient.
Though I do plan to try and use Chronos with ARC on embedded someday. That’s a niche case though and requires devs to design carefully.
Hmmm, it possibly could be useful in audio or robotics where real-time networking matters.
It's a bit late, but it does seem like the stdlib's async library will need some TLC to work well with ORC. Futures cause all sorts of chaotic cycles. Ideally the core async library will be ARC compatible itself, which would make it more deterministic.
I imagine in the coming months that there will be quite a reckoning when it comes to projects depending on asyncdispatch. I haven't tested them on 2.0.0 yet, but last time I checked, things running on Jester and Prologue crash under ORC. We shall see. I'm hopeful that the bugs get ironed out quickly now that it's the default, though.
@Araq I think Jester has fixed this issue since last time I checked on it, which is good. Prologue still has issues with ORC and threads, and they all still have leaking issues with asyncdispatch (asyncnet?), but I think things are looking better than they were before this release.
Jester doesn't leak because it avoids using asyncdispatch at all if it can (and any allocation that it can avoid for that matter). I'll withhold further judgements until I get the chance to re-test things on Nim 2.0.0, since it looks like things may be more stable than they used to be.