The Developer Stream was cool; at least one a month would be nice to have.
Everything sounds good. 👍
Any update on newruntime status? It seems that the new GC is a bit of an admission of defeat on that front, or at least an admission that the new runtime won't be ready for use with async any time soon.
Well, good question. Let me try to untangle it: we have two different, competing designs ("araqsgc" and "newruntime") with much overlap in their implementations; both use the destructor-based strings and seqs, for example. Both also focus on these goals (and more or less achieve them):
Whichever supports async first will "win", and so far araqsgc looks like it's winning, which would mean newruntime's importance gets downplayed to a new --gc:none switch for embedded development. If that happens, async should still support "newruntime", but it is not that important.
Ok, now what are the differences? How does "araqsgc" work? Will tell you later, I have to go now.
I assume you've changed your mind about the other reasons for pursuing a solution without a tracing GC: that GCs have been a source of many hard-to-fix Nim issues, that GCs don't play well with each other, and that you want Nim programs to be callable from other languages with their own runtimes, etc.
I don't follow the discussions on IRC or the videos, so this is a bit of a surprise to me. It seems like now you envision --newruntime as analogous to D's -betterC switch.
@Araq:
Ok, now what are the differences? How does "araqsgc" work? Will tell you later, I have to go now.
I've been checking this thread for answers for the past two weeks since this post appeared and there haven't been any, so I have researched your "araqsgc" project to find the following (part of what I think I found is posted in the "types of gc" thread):
The "newruntime" has been discussed at length in other forum threads and currently partially works in the current Nim version 1.0 and the "devel" branch, with promises that work will continue and that it won't be dropped. The main limitation of the current state of "newruntime" is that it still doesn't work smoothly with multi-threading and the current multi-threading libraries (which was supposed to be one of its features/advantages), but it does look possible to make it work. Paraphrasing what you have said: you have pulled back a little from the "newruntime" implementation in favor of this new GC implementation because of the changes required to existing libraries to support "newruntime", not limited to the required addition of the owned ref keyword and other supporting programming constructs needed to make it work with those libraries.
The "araqsgc" project seems to be a "drop-in" solution to the problems of multi-threading that requires no library changes: only that the araqsgc library be imported and that the "--gc:destructors" compiler flag be used.
To those reading this post: both of these sub-projects show that @Araq continues to research better solutions for the current "state of Nim". The "newruntime" resulted from his reading a Bacon/Dingle paper describing the owned ref idea, and this latest multi-threaded GC from his reading of the Microsoft paper describing the "mi-malloc" project. Hopefully, one of these solutions will make multi-threading in Nim much easier, which is one of Nim version 1.0's main (and most difficult to work around) weaknesses...
What you say is entirely correct, albeit incomplete, so let me fill in the missing parts:
A simple mark&sweep GC has the very nice property of not adding any overhead to pointer assignments. It's fundamentally compatible with manual memory management: you can free individual pieces on your own, reducing memory pressure, so the GC runs less often. If you manage everything "manually", you can disable the GC entirely. It's a memory management "hybrid" and can be seen as "gradual" memory management.
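A minimal sketch of that "gradual" style using today's default GC; GC_fullCollect, GC_disable, and GC_enable are in Nim's system module, while Node and buildList are made-up names for illustration:

```nim
# A "node" based structure the GC normally manages for us.
type Node = ref object
  data: int
  next: Node

proc buildList(n: int): Node =
  ## Builds a singly linked list of n nodes.
  for i in 1 .. n:
    result = Node(data: i, next: result)

var head = buildList(1000)
# Drop the only reference; the GC reclaims the whole list on its next run.
head = nil
GC_fullCollect()

# For a fully "manual" section, pause the collector entirely:
GC_disable()
var scratch = buildList(10)   # no collections happen in here
scratch = nil
GC_enable()
```

The point is only that a mark&sweep collector tolerates this mixing: nothing breaks if you free early, collect eagerly, or suspend collection for a hot region.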
Araqsgc offers dispose and deepDispose operations (these are currently being backported to the other GCs) for this reason. The idea is that the stdlib uses deepDispose in strategic places (async's event loop comes to mind). It can also be put into a custom =destroy.
For example, consider a "node" based data structure (json.nim, lists.nim, ropes...): You can free individual nodes (risky, but often you do know enough about your program to do that) or you can bulk-free every node in it.
To do that (mostly) safely and easily, you can encapsulate your data structure in a refcounted wrapper like Refcount[JsonNode]. This would use refcounting for a complete Json graph, not for individual objects. This seems to be a key feature for performance. A granularity that works on individual objects is almost never desired for performance.
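A sketch of that coarse-grained wrapper idea: one shared counter per graph, with the whole payload freed at once when the last owner releases it. Refcount, initRefcount, retain, and release are hypothetical names invented here, not araqsgc's actual API:

```nim
type
  RcCell[T] = object
    count: int
    data: T
  Refcount*[T] = object
    cell: ptr RcCell[T]

proc initRefcount*[T](x: sink T): Refcount[T] =
  ## Wraps a complete graph behind a single reference count.
  result.cell = create(RcCell[T])
  result.cell.count = 1
  result.cell.data = x

proc retain*[T](r: Refcount[T]): Refcount[T] =
  ## Explicit copy: the whole graph gains one owner.
  inc r.cell.count
  result = r

proc release*[T](r: var Refcount[T]) =
  ## Drop one owner; the last release bulk-frees the whole graph.
  if r.cell != nil:
    dec r.cell.count
    if r.cell.count == 0:
      reset(r.cell.data)   # drop the payload graph in one go
      dealloc(r.cell)
    r.cell = nil
```

Note the counter is touched only when a whole graph changes hands, never on assignments of individual nodes inside it, which is exactly the per-graph granularity argued for above.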
Now this leaves us with the inherently problematic use-after-free problems. Under this scheme, they are mitigated, but not solved. But B/D shows that detecting use-after-free bugs can be changed from "detect dangerous pointer read operations" to "detect potentially dangling refs via refcounting" and maybe that's good enough. The overhead seems to be so high that you want to disable it in a production setting though. For exploit prevention you can use type-based node allocation then.
Having said that, classic B/D with owned still looks more elegant... ;-)
Maybe it's good that I am coming from outside the dev team: all of you have been buried in the current implementations and problems for so many years that maybe there is something else so simple that it has been missed...
Definitely true.
I've got something cooking that I think might be possible within the current general code base. It should be compatible with owned, although owned shouldn't be necessary to make it work, given how clever the current Nim compiler is at the data flow analysis built around the AST. Another good thing about it is that it isn't a tracing GC, yet I think it should be able to support some implementation of async.
Please don't repeat my mistake, no need to implement it when you can already talk about it. ;-)
@Araq:
Please don't repeat my mistake, no need to implement it when you can already talk about it. ;-)
I really do like to back up my "talking" with little "mini-experiments" in code to prove and develop the concepts, but I'll have to think more about how to do that simply, as otherwise it needs compiler help. Meanwhile...
I'll just throw the idea out and hopefully it isn't too laughable, as follows:
There, I'm done talking except for discussion. Does it merit consideration, investigation, and testing?
@Araq:
I haven't tried to use async in Nim yet because I haven't had a need for it, but I am very familiar with the concept from F#, where it works very well (but which also has a tracing GC).
I've now had a quick look at those libraries from Nim. I'm not really in a position to comment meaningfully as I haven't tried to use them, but my first impression is that they are quite "bulky" as compared to the F# async/await implementations with which I am familiar.
However, since we are now at version 1.0+ and these don't seem to be marked as experimental libraries, I don't know whether changes to make the API as clean as F#'s are possible without breaking it. Part of why F#'s implementation is so clean for the user is that async is implemented as a computation expression, which has elements of Haskell's monads; the fact that computation expressions are slow doesn't matter much here (unlike for sequences, the F# seq, where implementing enumeration with computation expressions makes it extremely slow), because async is a more "coarse-grained" use. However, async/await has crossed over to C# (perhaps partly by "compiler magic", as I don't believe C# has computation expressions) and TypeScript, and has been adopted by JavaScript/ECMAScript, so migration does seem possible.
I suppose that Nim's "asyncmacro" library could be considered to be the "compiler magic" applied to make it possible in Nim and with that in mind, the implementation doesn't seem that bad, but somehow we seem to have lost some of the F# elegance in being able to do something like the following:
let fetchAsync (name, url: string) =
    async {
        try
            let uri = new System.Uri(url)
            let webClient = new WebClient()
            let! html = webClient.AsyncDownloadString(uri)
            printfn "Read %d characters for %s" html.Length name
        with
        | ex -> printfn "%s" ex.Message
    }
But as I said at the start, I haven't used the Nim async libraries enough to be able to give a valid opinion.

It's important to see the full picture: most of what Nim's async does is probably simply hidden from your view, as F# delegates the hard work to C#'s event loop implementation, whereas in Nim there is little in between async and the underlying event loop. IMHO that's why you think it's "bulky".
It's also very easy to perform an incomplete, insufficient analysis on the problem. That's why I claimed "owned can do async" months ago... And maybe it can if we rewrite async a bit :P
@Araq:
It's important to see the full picture, most of what Nim's async does is probably simply hidden from your view as F# delegates the hard work to C#'s event loop implementation whereas in Nim there is little in between async and the underlying event loop. IMHO that's why you think it's "bulky".
You are probably right; although I have looked at how F#'s computation expressions are implemented (investigating why seq enumerations are so slow), I've never looked under the covers at the DotNet code used to actually implement them, though that is now open source and available in DotNet Core. I do know that F# calls into DotNet's thread pool dispatcher when it needs threads for any sort of multi-threading, including where threading is required for async/await, so it isn't so much calling into C# facilities as into the general facilities available in DotNet.
As stated, that's beside the point, which is either implementing a GC that can support Nim's async as it is, or making it possible for "newruntime" to do it. As you say, any working tracing shared-memory GC such as Boehm, Go's, or your new "araqsgc" can be made to do the job. My point is that it doesn't look too hard to make "newruntime" support a hybrid of reference counting and B/D that also does the job without requiring changes to the current async.
My idea is to add a combination of the different concepts to "newruntime" to make it work. It combines the currently almost-working owned refs with reference counting where multiple ownership of the same data is required (especially in multi-threading), plus deepDestroy/deepDispose, which I think may be able to accommodate cyclic data if desired, so it seems to fill the bill. I've now backed off from trying to do this through "--gc:destructors" hooks; I now think these ideas would just be part of "newruntime", with the main extra requirement being a little extra code to handle the two levels of destruction as in B/D along with the shared-memory reference count.
I've also backed off from tying into the old threadpool-style deepCopy, which has already been removed for "newruntime". My point there is that one would inject the shared-memory ref-counted copies at the same places where those deepCopy's used to be injected, but they shouldn't need all the "compiler magic and cruft", given the new destructor-based versions of seq's/string's (and closures?), perhaps with the currently forbidden copy of an owned ref becoming possible through reference counting.
You've mentioned adding some extra fields, such as task IDs, to async to make it work with "newruntime"; my idea here is to make "newruntime" work so that no changes to async are required, keeping the accommodations in one place. My thinking is that if fields need to be added to async tasks, likely atomic ones, to make it work with "newruntime", it would be better to first investigate adding atomic ref counting to B/D in "newruntime" in such a way that it is used about as often as those extra task fields would be, in order to narrow and thus simplify what needs to be changed and tested.
I am trying to think of a way to easily test my ideas in the current code base, as dropping back to "newruntime" would seem to eliminate the need for "compiler magic" that isn't already there and would only require changes within the handling of owned ref. The primary areas that need to be tested are how the "big three" of seq's, string's, and closure environments, as well as ref's, cross the shared-memory barrier. You have mentioned back-porting deepDispose and dispose across all the memory management options, so that part will have to be done anyway.
You've mentioned a "race" between "araqsgc" and "newruntime": whichever supports async first wins. Although I can see that the mi-malloc allocator may be a desirable option to have available in Nim, "araqsgc" is just another mark-and-sweep GC, and I would really like to see "newruntime" win, as it is unique and more elegant.
I would really like to see "newruntime" win, as it is unique and more elegant.
Me too. Ok, here is what I know: our async seems to work with the refcounting GC without its cycle collector, given a tiny codegen patch (I don't know whether that patch is always correct). So let's assume (atomic) refcounting is sufficient. B/D is not only a "faster refcounting"; it describes the spanning tree of a graph, so the graph is free of cycles by construction (ignoring some edge cases). This is achieved by making owned a "move-only" type, but that's too inexpressive for our async. So we need a way to duplicate owned refs that still guarantees cycle freedom, statically. I know that's what you're proposing via a =deepCopy for owned, but I don't know if it can work out.
@Araq: Thanks for the further data on async using the ref counting GC - I didn't realize it only worked without the cycle detector and with a further "tiny" codegen patch.
So we need a way to duplicate owned refs that still guarantees cycle freedom, statically. I know that's what you're proposing via a =deepCopy for owned...
Perhaps we should stop calling what I proposed deepCopy (via =deepCopy), as I just grabbed that term as being the location where the "outer" ref counting would be injected. Perhaps that hook is where my proposal could best be implemented, maybe not...
but I don't know if it can work out.
So how do we test this, and against which code base? I can probably do some patches against the current devel branch if they don't involve codegen/"compiler magic", but I would prefer to leave any AST manipulations to you and your dev team, as I have little experience in that area.
I have held off doing too much in the way of experiments as I would like to include "--seqsv2", but I suppose that isn't really necessary, since it is implicitly turned on by "--newruntime". I guess I'll start with some copying experiments combining copies of seq's, string's, closures containing ref's, and ref's. It would be easy to test if I just knew how to "hook" into the disposal of ref's at the end of proc's, the way "nominal" types get their "hooks" called now.
I would like to work within the framework of the ``=destroy``, ``=``, ``=sink``, and ``=deepCopy`` "hooks", but as applied to ``ref``'s and not just to "nominal" types. Would this be possible?
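For reference, a minimal sketch of what the hooks already allow today on a nominal object type, which is what I'd like extended to ref's (Handle and the destroyed bookkeeping seq are illustrative names; assumes a destructor-injecting memory mode such as --gc:arc):

```nim
type Handle = object
  id: int

# Record which handles were destroyed, so we can observe hook injection.
var destroyed: seq[int]

proc `=destroy`(h: var Handle) =
  if h.id != 0:
    destroyed.add h.id

proc use() =
  var h = Handle(id: 42)
  # the compiler injects `=destroy`(h) here, at scope exit

use()
```

The open question above is whether user-defined hooks like these could be attached to ``ref`` types directly rather than only to nominal object types.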
Is there a place in the code that could make these "hooks" readily accessible to me?
Thanks for the further data on async using ref counting GC - I didn't realize it only worked without the cycle detector and a further "tiny" code gen patch.
No, it also works with its cycle collector; the cycle collector does no harm. The codegen patch is to break up a cycle that keeps the RC component from collecting.
Is there a place in the code that could make these "hooks" readily accessible to me?
At this point I suggest you join our IRC channel so that I can guide you through the codebase.