The Arc memory management system, which reference-counts heap references and lets the compiler elide counts it can prove redundant via graph Data Flow Analysis (DFA), is almost perfect in that it is deterministic; however, the Orc system, which extends it to manage cyclic references, is not so perfect, as it is non-deterministic and relies on either a manual trigger or a memory-pressure trigger to activate, and thus stinks more of Garbage Collection...
When there are no cyclic references (or when a type is declared free of cycles via the {.acyclic.} pragma), the underlying Arc system destroys references deterministically, so Orc only adds processing when cycles are detected. Even cyclic structures can be collected deterministically by determining when the last external reference to the cyclic structure goes out of scope and injecting a GC_fullCollect() at that point. Surely, if we can trigger this manually, the compiler should be able to determine that point with an algorithm and do it automatically...
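As a minimal sketch of what I mean under --mm:orc (the Node type and the procs are made up purely for illustration; only GC_fullCollect() is real API):

```nim
type
  Node = ref object
    data: int
    next: Node            # can point back into the structure, forming a cycle

proc makeCycle(): Node =
  ## Builds a two-node cycle: result -> second -> result.
  result = Node(data: 1)
  result.next = Node(data: 2, next: result)

proc work() =
  let head = makeCycle()
  echo head.next.data
  # `head` (the last external reference) goes out of scope here;
  # plain reference counting cannot reclaim the cycle on its own.

work()
GC_fullCollect()          # deterministic point at which the cycle is reclaimed
```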
I'm thinking of a wrapper for the entire cyclic data structure that deterministically tracks the total number of reference counts versus the number of internal edges in the structure and automatically destroys it when the count drops to the number of edges (i.e. when no external references remain), with the wrapper's counters updated whenever reference counts change or nodes are added anywhere in the cyclic structure. This does depend on the ability to detect cyclic data structures in order to apply the wrapper, but these are already detected in the current Orc implementation in order to trigger tracing; if one couldn't reliably detect cycles, the wrappers could always be applied even when the structure is not cyclic, at a slight cost in overhead (again elided away somewhat by DFA). Detecting cycles is harder when mutation is allowed, since a non-cyclic structure can be made cyclic (or vice versa) by mutation, such as a non-cyclic linked list becoming cyclic when its last node (or any node, really) is mutated to refer to the head of the list (or any other node), but this must already be handled in the current Orc implementation.
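Very roughly, the bookkeeping I have in mind looks something like the following (all names here are hypothetical; nothing of the sort exists in Arc/Orc today): since each internal edge accounts for exactly one reference count, whatever is left over must come from outside the structure, and once that leftover hits zero the whole structure can be destroyed.

```nim
type
  CycleWrapper = object
    totalRefs: int        # sum of the reference counts of all nodes inside
    internalEdges: int    # edges whose source and target are both inside

proc externalRefs(w: CycleWrapper): int =
  ## References coming from outside the wrapped structure.
  w.totalRefs - w.internalEdges

proc decRefTracked(w: var CycleWrapper): bool =
  ## Called whenever a reference into the structure is dropped.
  ## Returns true when the last external reference has just gone away,
  ## which is the deterministic point to destroy the whole structure.
  dec w.totalRefs
  result = w.externalRefs == 0

proc addEdgeTracked(w: var CycleWrapper) =
  ## Called when a node inside the structure gains a reference to
  ## another node inside it (e.g. a list is mutated into a cycle).
  inc w.internalEdges
  inc w.totalRefs
```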
It certainly would be nice to avoid all non-deterministic memory management and get rid of the "stink" of Garbage Collection...
Sorry if I don't really reply to your points in detail. I've studied in detail every cycle collection algorithm I could find. I think I found a couple of genuinely new combinations myself; there is definitely much unexplored terrain here, especially if you combine trial deletion algorithms with ideas from generational GCs.
However, it is very hard to find something that is an actual improvement in practice; even algorithms exploiting special cases for certain benchmarks tend not to change benchmark results.
@Araq:
However, it is very hard to find something that is an actual improvement in practice ... In my experience.
Of course, you have much more practical experience in this area than I do; I merely lament that Orc, when it is necessary to use it, is non-deterministic unless one takes steps to manually trigger a collection at the point where the last external reference is known to be gone.
I seem to recall you once said that one of the main reasons Nim needs to handle cyclic references is that Nim's version of async/await generates cyclic references. I wonder whether this is a necessity or whether there might be alternate implementations of async/await? I notice that in some languages (e.g. Rust) creating cyclic value references is quite awkward and may require unsafe manipulations, and, to avoid dealing with the problem, some languages seem to prohibit cyclic value references altogether, yet it still seems possible to implement some version of async/await in those languages.
I raise these questions because I value your research and experience and consider your opinions...
async itself is a bit overrated/overused, IMO. Most "phases" of high concurrency | parallelism are long-lived relative to their creation time. So, you can use a process pool instead of async|threads, with a pool size of say 10X the number of CPU cores, and mostly just have 100 idle processes. You aren't Google with >100 simultaneous clients, and (careful) servers will block you if you hit them with >100 connections. Bonus - no need for non-portable assumptions about blocking on disk reads/writes.
(Yes, yes - this is not a great idea if you are trying to "ping the whole internet", start with a giant virtual memory image, or probably 5 other special cases. Hey...Maybe that's not what you're doing, though? Everything almost always "depends on a lot".)
It's imperfect, but it's very "low tech". cligen/procpool.nim is one example (that yes, could use some Windows love/testing/hardening). If 100 processes in your (Linux) procs table bother you, use procs to aggregate them. Threads & processes are not that different. I know someone who recently gave up on async and, in frustration, used procpool instead and was very happy.
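For a flavour of how "low tech" this can be, here is a stand-in sketch using only the standard library (this is not procpool's API, and curl plus the URLs are just placeholders): execProcesses runs the given command lines with at most n of them alive at once, and the calling code needs no async transformation at all.

```nim
import std/[osproc, cpuinfo]

let urls = @[
  "https://example.com/a.dat",   # placeholder URLs
  "https://example.com/b.dat"
]

var cmds: seq[string]
for u in urls:
  cmds.add "curl -sSO " & u      # one short-lived worker process per file

# Run the downloads with up to 10X as many workers as CPU cores;
# most of them just sit blocked on the network, which is fine.
discard execProcesses(cmds, options = {poEvalCommand, poParentStreams},
                      n = 10 * countProcessors())
```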
I know someone who recently gave up on async and, in frustration, used procpool instead and was very happy.
Just as a quick aside / explanation: that was me. Note for the following that I neither deal with web stuff often nor use async (pretty much ever), so my frustration is possibly just lack of experience.
The above refers to me wanting to speed up downloading a bunch of data files. I wrote a script that was initially only meant to download a small number, O(50), of files from an "online calculator" (for X-ray reflectivities):
https://github.com/jovoy/AxionElectronLimit/blob/master/tools/download_henke_files.nim
Due to a change of use case, if you will, I now needed O(2000) files, so I thought I would at least use the AsyncHttpClient to request multiple files concurrently. The thing was, though, that I felt like I had to change a ton of code to switch from HttpClient to AsyncHttpClient, instead of just changing signatures & attaching an async.
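Schematically (this is not the real script; fetch/fetchAsync and the paths are made up), the body of a download proc barely changes, but the async variant forces {.async.}/await, a different client type, and waitFor or an async caller all the way up, which is what made the switch feel heavy:

```nim
import std/[httpclient, asyncdispatch]

proc fetch(url, path: string) =
  let client = newHttpClient()
  client.downloadFile(url, path)
  client.close()

proc fetchAsync(url, path: string) {.async.} =
  let client = newAsyncHttpClient()
  await client.downloadFile(url, path)
  client.close()

when isMainModule:
  fetch("https://example.com/a.dat", "a.dat")               # plain blocking call
  waitFor fetchAsync("https://example.com/b.dat", "b.dat")  # needs an event loop
```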
So I ended up going the opposite direction and used procpool to leave the meat of my code unchanged:
https://github.com/Vindaar/AxionElectronLimit/commit/87194ea29abf85c5251ac25245b68438b4232ec6
(Note that a bunch of error handling was also added that didn't exist before, which makes the diff a bit longer).
So YMMV, but you can't beat the simplicity, and why use a more complicated tool if the simple one works fine...