The Developer Stream was cool; at least one a month would be nice to have.
Everything sounds good. 👍
Any update on newruntime status? It seems that the new GC is a bit of an admission of defeat on that front, or at least an admission that the new runtime won't be ready for use with async any time soon.
Well, good question. Let me try to untangle it: we have two different, competing designs ("araqsgc" and "newruntime") with much overlap in their implementations; both use the destructor-based strings and seqs, for example. Both also focus on these goals (and more or less achieve them):
Whichever supports async first will "win", and so far araqsgc looks like it's winning, which would mean newruntime's importance gets downplayed to a new --gc:none switch for embedded development. If that happens, async should still support "newruntime", but it is not that important.
Ok, now what are the differences? How does "araqsgc" work? Will tell you later, I have to go now.
I assume you've changed your mind about the other reasons for pursuing a solution without a tracing GC: that GCs have been a source of many hard-to-fix Nim issues, that GCs don't play well with each other, and that you want Nim programs to be callable from other languages with their own runtimes, etc.
I don't follow the discussions on IRC or the videos, so this is a bit of a surprise to me. It seems like now you envision --newruntime as analogous to D's -betterC switch.
@Araq:
Ok, now what are the differences? How does "araqsgc" work? Will tell you later, I have to go now.
I've been checking this thread for answers for the past two weeks since this post appeared and there haven't been any, so I have researched your "araqsgc" project to find the following (part of what I think I found is posted in the "types of gc" thread):
The "newruntime" has been discussed at length in other forum threads and currently partially works in the current Nim version 1.0 and the "devel" branch, with promises that work will continue and that it won't be dropped. The main limitation of the current state of "newruntime" is that it still doesn't work smoothly with multi-threading and the current multi-threading libraries (which was supposed to be one of its features/advantages), but it does look possible to make it work. Paraphrasing what you have said: you have pulled back a little from the "newruntime" implementation in favor of this new GC implementation because of the changes required to existing libraries to support "newruntime", not limited to the required addition of the owned ref keyword and other supporting programming constructs needed to make it work with those libraries.
The "araqsgc" project seems to be a "drop-in" solution to the problems of multi-threading that requires no library changes: only that the araqsgc library be imported and that the "--gc:destructors" compiler flag be used.
To those reading this post: both of these sub-projects show that @Araq continues to research better solutions for the current "state of Nim". The "newruntime" resulted from his reading a Bacon/Dingle paper describing the owned ref idea, and this latest multi-threaded GC from his reading of the Microsoft paper describing the "mi-malloc" project. Hopefully, one of these solutions will make multi-threading in Nim much easier, which is one of Nim version 1.0's main (and most difficult to work around) weaknesses...
What you say is entirely correct, albeit incomplete, so let me fill in the missing parts:
A simple mark&sweep GC has the very nice property of not adding any overhead to pointer assignments. It's fundamentally compatible with manual memory management: you can free individual pieces on your own, reducing memory pressure, so the GC runs less often. If you manage everything "manually", you can disable the GC entirely. It's a memory management "hybrid" and can be seen as "gradual" memory management.
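A minimal sketch of that "gradual" style using today's default GC; GC_fullCollect, GC_disable, and GC_enable are in Nim's system module, while Node and buildList are made-up names for illustration:

```nim
# A "node" based structure the GC normally manages for us.
type Node = ref object
  data: int
  next: Node

proc buildList(n: int): Node =
  ## Builds a singly linked list of n nodes.
  for i in 1 .. n:
    result = Node(data: i, next: result)

var head = buildList(1000)
# Drop the only reference; the GC reclaims the whole list on its next run.
head = nil
GC_fullCollect()

# For a fully "manual" section, pause the collector entirely:
GC_disable()
var scratch = buildList(10)   # no collections happen in here
scratch = nil
GC_enable()
```

The point is only that a mark&sweep collector tolerates this mixing: nothing breaks if you free early, collect eagerly, or suspend collection for a hot region.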
Araqsgc offers dispose and deepDispose operations (these are currently being backported to the other GCs) for this reason. The idea is that the stdlib uses deepDispose in strategic places (async's event loop comes to mind). It can also be put into a custom =destroy.
For example, consider a "node" based data structure (json.nim, lists.nim, ropes...): You can free individual nodes (risky, but often you do know enough about your program to do that) or you can bulk-free every node in it.
To do that (mostly) safely and easily, you can encapsulate your data structure in a refcounted wrapper like Refcount[JsonNode]. This would use refcounting for a complete Json graph, not for individual objects. This seems to be a key feature for performance. A granularity that works on individual objects is almost never desired for performance.
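A sketch of that coarse-grained wrapper idea: one shared counter per graph, with the whole payload freed at once when the last owner releases it. Refcount, initRefcount, retain, and release are hypothetical names invented here, not araqsgc's actual API:

```nim
type
  RcCell[T] = object
    count: int
    data: T
  Refcount*[T] = object
    cell: ptr RcCell[T]

proc initRefcount*[T](x: sink T): Refcount[T] =
  ## Wraps a complete graph behind a single reference count.
  result.cell = create(RcCell[T])
  result.cell.count = 1
  result.cell.data = x

proc retain*[T](r: Refcount[T]): Refcount[T] =
  ## Explicit copy: the whole graph gains one owner.
  inc r.cell.count
  result = r

proc release*[T](r: var Refcount[T]) =
  ## Drop one owner; the last release bulk-frees the whole graph.
  if r.cell != nil:
    dec r.cell.count
    if r.cell.count == 0:
      reset(r.cell.data)   # drop the payload graph in one go
      dealloc(r.cell)
    r.cell = nil
```

Note the counter is touched only when a whole graph changes hands, never on assignments of individual nodes inside it, which is exactly the per-graph granularity argued for above.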
Now this leaves us with the inherently problematic use-after-free problems. Under this scheme, they are mitigated, but not solved. But B/D shows that detecting use-after-free bugs can be changed from "detect dangerous pointer read operations" to "detect potentially dangling refs via refcounting" and maybe that's good enough. The overhead seems to be so high that you want to disable it in a production setting though. For exploit prevention you can use type-based node allocation then.
Having said that, classic B/D with owned still looks more elegant... ;-)
Maybe it's good that I am coming from outside the dev team: all of you have been buried in the current implementations and problems for so many years that maybe there is something else so simple that it has been missed...
Definitely true.
I've got something cooking that I think might be possible within the current general code base. It should be compatible with owned, although owned shouldn't be necessary to make it work, given how clever the current Nim compiler is at the data flow analysis built around the AST. Another good thing about it is that it isn't a tracing GC, yet I think it should be able to support some implementation of async.
Please don't repeat my mistake, no need to implement it when you can already talk about it. ;-)
@Araq:
Please don't repeat my mistake, no need to implement it when you can already talk about it. ;-)
I really do like to back up my "talking" with little "mini-experiments" in code to prove and develop the concepts, but I'll have to think more about how to do that simply, as otherwise it needs compiler help. Meanwhile...
I'll just throw the idea out and hopefully it isn't too laughable, as follows:
There, I'm done talking except for discussion. Does it merit consideration, investigation, and testing?
@Araq:
I haven't tried to use async in Nim yet because I haven't had a need for it, but I am very familiar with the concept from F#, where it works very well (but which also has a tracing GC).
I've now had a quick look at those libraries from Nim. I'm not really in a position to comment meaningfully as I haven't tried to use them, but my first impression is that they are quite "bulky" as compared to the F# async/await implementations with which I am familiar.
However, since we are now at version 1.0+ and these don't seem to be marked as experimental libraries, I don't know whether changes to make the API as clean as F#'s are possible without breaking it. Part of why F#'s implementation is so clean for the user is that async is implemented as a computation expression, which has elements of Haskell's monads; the fact that computation expressions are slow doesn't matter much here (unlike for sequences, the F# seq, where implementing enumeration with computation expressions makes it extremely slow), because async is a more "coarse-grained" use. However, async/await has crossed over to C# (perhaps partly by "compiler magic", as I don't believe C# has computation expressions) and TypeScript, and has been adopted by JavaScript/ECMAScript, so migration does seem possible.
I suppose that Nim's "asyncmacro" library could be considered to be the "compiler magic" applied to make it possible in Nim and with that in mind, the implementation doesn't seem that bad, but somehow we seem to have lost some of the F# elegance in being able to do something like the following:
let fetchAsync (name, url: string) =
    async {
        try
            let uri = new System.Uri(url)
            let webClient = new WebClient()
            let! html = webClient.AsyncDownloadString(uri)
            printfn "Read %d characters for %s" html.Length name
        with
        | ex -> printfn "%s" ex.Message
    }
But as I said at the start, I haven't used the Nim async libraries enough to be able to give a valid opinion.

It's important to see the full picture: most of what Nim's async does is probably simply hidden from your view, as F# delegates the hard work to C#'s event loop implementation, whereas in Nim there is little in between async and the underlying event loop. IMHO that's why you think it's "bulky".
It's also very easy to perform an incomplete, insufficient analysis on the problem. That's why I claimed "owned can do async" months ago... And maybe it can if we rewrite async a bit :P
@Araq:
It's important to see the full picture, most of what Nim's async does is probably simply hidden from your view as F# delegates the hard work to C#'s event loop implementation whereas in Nim there is little in between async and the underlying event loop. IMHO that's why you think it's "bulky".
You are probably right; although I have looked at how F#'s computation expressions are implemented (investigating why seq enumerations are so slow), I've never looked under the covers at the DotNet code used to actually implement them, though that is now open source and available in DotNet Core. I do know that F# calls into DotNet's thread pool dispatcher when it needs threads for any sort of multi-threading, including where threading is required for async/await, so it isn't so much calling into C# facilities as into the general facilities available in DotNet.
As stated, that's beside the point, which is either implementing a GC that can support Nim's async as it is, or making it possible for "newruntime" to do it. As you say, any working tracing shared-memory GC such as Boehm, Go's, or your new "araqsgc" can be made to do the job. My point is that it doesn't look too hard to make "newruntime" support a hybrid of reference counting and B/D that also does the job without requiring changes to the current async.
My idea is to add a combination of the different concepts to "newruntime" to make it work. It combines the currently almost-working owned refs with reference counting where multiple ownership of the same data is required (especially in multi-threading), plus deepDestroy/deepDispose, which I think may be able to accommodate cyclic data if desired, so it seems to fill the bill. I've now backed off from trying to do this through "--gc:destructors" hooks; I now think these ideas would just be part of "newruntime", with the main extra requirement being a little extra code to handle the two levels of destruction as in B/D along with the shared-memory reference count.
I've also backed off from tying into the old threadpool-style deepCopy, which has already been removed for "newruntime". My point there is that one would inject the shared-memory ref-counted copies at the same places where those deepCopy's used to be injected, but they shouldn't need all the "compiler magic and cruft", given the new destructor-based versions of seq's/string's (and closures?), perhaps with the currently forbidden copy of an owned ref becoming possible through reference counting.
You've mentioned adding some extra fields, such as task IDs, to async to make it work with "newruntime"; my idea here is to make "newruntime" work so that no changes to async are required, keeping the accommodations in one place. My thinking is that if fields need to be added to async tasks, likely atomic ones, to make it work with "newruntime", it would be better to first investigate adding atomic ref counting to B/D in "newruntime" in such a way that it is used about as often as those extra task fields would be, in order to narrow and thus simplify what needs to be changed and tested.
I am trying to think of a way to easily test my ideas in the current code base, as dropping back to "newruntime" would seem to eliminate the need for "compiler magic" that isn't already there and would only require changes within the handling of owned ref. The primary areas that need to be tested are how the "big three" of seq's, string's, and closure environments, as well as ref's, cross the shared-memory barrier. You have mentioned back-porting deepDispose and dispose across all the memory management options, so that part will have to be done anyway.
You've mentioned a "race" between "araqsgc" and "newruntime": whichever supports async first wins. Although I can see that the mi-malloc allocator may be a desirable option to have available in Nim, "araqsgc" is just another mark-and-sweep GC, and I would really like to see "newruntime" win, as it is unique and more elegant.
I would really like to see "newruntime" win, as it is unique and more elegant.
Me too. Ok, here is what I know: our async seems to work with the refcounting GC without its cycle collector, given a tiny codegen patch (I don't know whether that patch is always correct). So let's assume (atomic) refcounting is sufficient. B/D is not only a "faster refcounting"; it describes the spanning tree of a graph, so the graph is free of cycles by construction (ignoring some edge cases). This is achieved by making owned a "move-only" type, but that's too inexpressive for our async. So we need a way to duplicate owned refs that still guarantees cycle freedom, statically. I know that's what you're proposing via a =deepCopy for owned, but I don't know if it can work out.
@Araq: Thanks for the further data on async using the ref counting GC - I didn't realize it only worked without the cycle detector and with a further "tiny" codegen patch.
So we need a way to duplicate owned refs that still guarantees cycle freedom, statically. I know that's what you're proposing via a =deepCopy for owned...
Perhaps we should stop calling what I proposed deepCopy (via =deepCopy), as I just grabbed that term as being the location where the "outer" ref counting would be injected. Perhaps that hook is where my proposal could best be implemented, maybe not...
but I don't know if it can work out.
So how do we test this, and against which code base? I can probably do some patches against the current devel branch if they don't involve codegen/"compiler magic", but I would prefer to leave any AST manipulations to you and your dev team, as I have little experience in that area.
I have held off doing too much in the way of experiments as I would like to include "--seqsv2", but I suppose that isn't really necessary, since it is implicitly turned on by "--newruntime". I guess I'll start with some copying experiments combining copies of seq's, string's, closures containing ref's, and ref's. It would be easy to test if I just knew how to "hook" into the disposal of ref's at the end of proc's, the way "nominal" types get their "hooks" called now.
I would like to work within the framework of the ``=destroy``, ``=``, ``=sink``, and ``=deepCopy`` "hooks", but as applied to ``ref``'s and not just to "nominal" types. Would this be possible?
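For reference, a minimal sketch of what the hooks already allow today on a nominal object type, which is what I'd like extended to ref's (Handle and the destroyed bookkeeping seq are illustrative names; assumes a destructor-injecting memory mode such as --gc:arc):

```nim
type Handle = object
  id: int

# Record which handles were destroyed, so we can observe hook injection.
var destroyed: seq[int]

proc `=destroy`(h: var Handle) =
  if h.id != 0:
    destroyed.add h.id

proc use() =
  var h = Handle(id: 42)
  # the compiler injects `=destroy`(h) here, at scope exit

use()
```

The open question above is whether user-defined hooks like these could be attached to ``ref`` types directly rather than only to nominal object types.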
Is there a place in the code that could make these "hooks" readily accessible to me?
Thanks for the further data on async using ref counting GC - I didn't realize it only worked without the cycle detector and a further "tiny" code gen patch.
No, it also works with its cycle collector; the cycle collector does no harm. The codegen patch is to break up a cycle that keeps the RC component from collecting.
Is there a place in the code that could make these "hooks" readily accessible to me?
At this point I suggest you join our IRC channel so that I can guide you through the codebase.