My progress was blocked by some compiler bugs, and since hitting them I haven't invested much more time into it, but I strongly believe it's possible to complete it with enough effort.
Some ideas of how it should work. The async macro becomes a lot more involved: it splits an async function into states, derives an environment object type for every async function, lifts local variables and the result into the environment object, and places every state in a separate function. The environment object also contains a variant switch for the sub-state environment objects, so in theory the whole system requires no allocations on its own, unless there's async recursion. In the case of recursion, every recursion step will allocate another environment dynamically.
An example of how the async macro works:
proc foo(a: int): float {.async.} =
  if a == 1:
    await sleepAsync(1000)
  else:
    await sleepAsync(2000)
  return 5
Translates into:
type # These are base types, always declared
  ContBase = object {.inheritable, pure.}
    retEnv: ptr ContBase
    retFunc: proc(a: ptr ContBase) {.nimcall.}
    finished: bool

  Cont[T] = object of ContBase
    when T is void:
      discard
    else:
      result: T

# Async transformation follows
type
  ContEnv_foo = object of Cont[float]
    arg_a: int
    case sub: uint8
    of 0:
      sub0: ContEnv_sleepAsync
    of 1:
      sub1: ContEnv_sleepAsync
    else:
      discard
proc foo(a: int): float =
  # This is just an equivalent of a waitFor'ed function. Can be called in non-async code.
  # Note how everything that used to be in closure iterator environments and Futures now
  # resides in a single stack-allocated object.
  var env: ContEnv_foo
  env.arg_a = a
  foo_state0(addr env)
  while not env.finished:
    runLoopOnce()
  read(env)
proc foo_state0(env: ptr ContEnv_foo) =
  if env.arg_a == 1:
    env.sub = 0
    env.sub0.arg_ms = 1000
    env.sub0.retEnv = env
    env.sub0.retFunc = foo_state1
    sleepAsync_state0(addr env.sub0)
  else:
    env.sub = 1
    env.sub1.arg_ms = 2000
    env.sub1.retEnv = env
    env.sub1.retFunc = foo_state1
    sleepAsync_state0(addr env.sub1)

proc foo_state1(env: ptr ContEnv_foo) =
  env.result = 5
  complete(env)
As you can see, foo is split into 2 states (actually there are more, but the tail states of the if branches are collapsed into one, as they are identical). Every state gets its own function, and a high-level waitFor-ish wrapper is created. The async macro would take care of calling an async function through its *_state0.
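To illustrate the recursion case mentioned earlier: a recursive async proc cannot embed its own environment type in the variant (the object would be infinitely large), so the macro would have to fall back to one dynamic allocation per recursion step. A rough sketch of what it could emit, mirroring the pseudocode above (countdown and the create/dealloc pairing are my illustration, not the actual design):

type
  ContEnv_countdown = object of Cont[void]
    arg_n: int
    subRec: ptr ContEnv_countdown # heap-allocated, one per recursion step

proc countdown_state1(env: ptr ContEnv_countdown)

proc countdown_state0(env: ptr ContEnv_countdown) =
  if env.arg_n > 0:
    env.subRec = create(ContEnv_countdown) # the dynamic allocation
    env.subRec.arg_n = env.arg_n - 1
    env.subRec.retEnv = env
    env.subRec.retFunc = countdown_state1
    countdown_state0(env.subRec)
  else:
    complete(env)

proc countdown_state1(env: ptr ContEnv_countdown) =
  dealloc(env.subRec) # free the child environment once it completes
  complete(env)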
Some nuances: whether a function's result type is declared as T or Future[T] is an implementation detail of the transformation macro, as is whether the await keyword is actually required or not.
The states are split by the same logic as in compiler/closureiters.nim, but in macro code; translating it is quite some effort, and it's not done yet.
Also there's for-loop (over inline iterators) lowering that has to be done in the macro code, which I found not doable due to bugs: #18056, #17185, #16867, #16758, and maybe more, who knows :).
I hope to resume working on it once the above bugs are fixed (btw, I would be very thankful if anyone could fix them :), but for now I'd love to hear some thoughts on whether it is a good idea overall or whether it can be improved.
Initially I wanted to reimplement state splitting in a macro, as it would allow for easier experimenting with the transformation algorithms; doing this in compiler code is arguably more cumbersome. It also gives me full control over memory management and layout.
But your question is actually very good. Should there be a way to call a closure iterator over a preallocated environment, it might invalidate most of my work so far :). But currently it is not possible, or is it?
> Should there be a way to call a closure iterator over a preallocated environment, it might invalidate most of my work so far :). But currently it is not possible, or is it?
No, but it should be added.
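For instance, something like this (purely hypothetical; neither environmentType nor bindEnvironment exists today):

# Hypothetical sketch: obtain the compiler-generated environment type of
# a closure iterator and run the iterator over a caller-provided
# (e.g. stack-allocated) instance, avoiding the hidden heap allocation.
iterator myIter(): int {.closure.} =
  yield 1
  yield 2

var env: environmentType(myIter)           # hypothetical magic type query
let it = bindEnvironment(myIter, addr env) # hypothetical magic binding
while not finished(it):
  echo it()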
Really awesome to see your approach focus on keeping the semantics the same as (or as close as possible to) the existing async in Nim. I think adoption of something that forks the semantics/API will be difficult, especially if the improvements over the existing implementation are relatively niche.
From what I can see you have effectively extracted the closure iterator code from the compiler into a macro. I do believe in the long-held objective of Nim being a language with a small but extensible core, so your work here is really great.
> Should there be a way to call a closure iterator over a preallocated environment, it might invalidate most of my work so far :). But currently it is not possible, or is it?
Indeed it is not, but this could surely be implemented :)
I don't think this invalidates your work, like I said above I think putting this in a macro is a win anyway and would let us make the core language smaller.
Btw I see your code generates objects which inherit from each other, doesn't Nim require those to be put on the heap?
This seems quite similar to where I wanted to drive CPS: having everything be objects and only using ref when needed (JS target or escaping continuations).
In C, to avoid pitfalls outlined by @dom96, I would use union types, like this https://github.com/disruptek/cps/blob/52b2ed2/talk-talk/manual1_stack.nim
My idea for composition and allowing any compatible executor (async/threadpool/streams) to pass continuations to each other and use the appropriate resource is to add 2 pragmas and 3 magic calls: the {.suspend.} and {.resumable.} pragmas and calls like suspendAfter and bindCallerContinuation, as used below.
And a motivating example:
type Awaitable[T] = concept aw
  ## An awaitable concept, for example a Future or Flowvar.
  ## Implementation left at the scheduler's discretion.
  aw.get is T
  aw.isReady is bool

var ioScheduler {.threadvar.}: IOScheduler
var computeScheduler {.threadvar.}: CPUScheduler
var isRootThread {.threadvar.}: bool

proc saveContinuation() {.suspend.} =
  ioScheduler.enqueue bindCallerContinuation()

proc await[T](e: Awaitable[T]): T {.resumable.} =
  if not e.isReady():
    suspendAfter saveContinuation()
  return e.get()

proc continueOnThreadpool(pool: CPUScheduler) {.suspend.} =
  pool.spawn bindCallerContinuation()

proc serve(socket: Socket) =
  while true:
    let conn = await socket.connect()
    if isRootThread:
      suspendAfter pool.continueOnThreadpool()
    # --- the rest is on another thread on the CPU threadpool,
    # and the root thread can handle IO again.

    # Stream processing
    var finished = false
    while not finished:
      let size = await conn.hasData()
      var chunk = newSeq[byte](size)
      conn.read(chunk) # non-blocking read
      finished = process(chunk)

    # Now that thread still owns the socket,
    # we can return it to the main thread via a channel
    # or keep its ownership.
Full design:
type
  ContinuationProc[T] = proc(c: var T) {.nimcall.}
    ## using a mutating continuation

  Continuation* = concept cont
    cont.fn is ContinuationProc[Continuation]
    cont.frame is (object or ref)

  Coroutine* = concept coro
    type Output = auto
    coro is Continuation
    coro.promise is Option[Output]
    coro.hasFinished is bool
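For illustration, a minimal concrete type that should satisfy the Continuation concept above (all names below are invented for the example):

type
  Frame_tick = object
    i: int
  Cont_tick = object
    fn: proc(c: var Cont_tick) {.nimcall.}
    frame: Frame_tick

proc tick(c: var Cont_tick) {.nimcall.} =
  # One resumption step: advance the frame, clear fn when done.
  inc c.frame.i
  if c.frame.i >= 3:
    c.fn = nil

var c = Cont_tick(fn: tick, frame: Frame_tick(i: 0))
while c.fn != nil:
  c.fn(c) # resume the continuation until it finishes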
Portable storage schemes for non-supportsCopyMem types or the JS backend:
type
  ContFrameBase_myFn = object of RootObj

  ContFrameBase_myFn0 = object of ContFrameBase_myFn
    a_atomblock0: int
    b_atomblock0: int

  ContFrameBase_myFn1 = object of ContFrameBase_myFn
    a_atomblock1: seq[float64]
    b_atomblock1: string

  ContFrameBase_myFn2 = object of ContFrameBase_myFn
    a_atomblock2: int
    b_atomblock2: int
    c_atomblock2: int
Stack-allocatable storage scheme for supportsCopyMem types on C/C++:
type
  SomeContinuation = object of RootObj

  ContFrameBase_myFn0 = object
    a_atomblock0: int
    b_atomblock0: int

  ContFrameBase_myFn1 = object
    # Warning: when writing the spec I forgot that seqs/strings need destructors
    a_atomblock1: seq[float64]
    b_atomblock1: string

  ContFrameBase_myFn2 = object
    c_atomblock3: int

  ContFrameBase_myFn {.union.} = object of SomeContinuation
    frame0: ContFrameBase_myFn0
    frame1: ContFrameBase_myFn1
    frame2: ContFrameBase_myFn2
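The union trades generality for layout control: the whole continuation becomes a single fixed-size value (as large as the biggest frame), so it can live on the stack or inside a parent frame, at the cost of only working for supportsCopyMem types on the C/C++ backends, as noted above.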
Regarding an in-depth overview of all async approaches in most (all?) languages, you can check my compendium: https://github.com/weavers-guild/weave-io/tree/master/research
I'm curious, what is the benefit of building a state object manually this way over using a closure iterator?
Off the top of my head, there are two annoying problems with the giant-closure-iterator approach: lifetimes are awful, and there's lots of extraneous copying being done.
I've thought about doing similar things in the chronos async macro. Now that we've solved exception tracking and acyclic references in chronos, these kinds of optimizations would be next; one could think of it as an async optimizer, which for example removes the closure iterator entirely when it's not needed (similar to tail call optimization in regular code), etc.
The last thing I'd really like to do is to force an annotation on anything that goes into the closure. C++ gets this right by not copying things into closures by default, instead requiring an explicit capture list. This solves the remaining problem of things accidentally ending up in closures; otherwise, in practical applications of async, one often ends up with accidental huge copies, shadowing, and shared mutable state that gets updated out of order. A smarter macro could be more strict here.
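To make the accident concrete, a sketch in plain Nim (no async needed to show it; the names are made up):

proc makeHandler(): proc() =
  var data = newSeq[byte](10_000_000) # big buffer, needed only once
  result = proc() =
    echo data.len # `data` is silently lifted into the heap environment

let h = makeHandler() # the 10 MB seq now lives as long as `h` does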
@dom96 > From what I can see you have effectively extracted the closure iterator code from the compiler into a macro.
Well yes, except I haven't done it, but intend to do so :)
> I don't think this invalidates your work, like I said above I think putting this in a macro is a win anyway and would let us make the core language smaller.
As you noticed, my solution implies reimplementing a significant chunk of the compiler, so that alone is already questionable. I surely see potential benefits in it, as described above, but of course there are drawbacks too, such as functionality duplication and worse compile times. So it's tempting to reuse compiler-provided closureiters transformation here, like @PMunch suggests, if only we had more control over closure environments...
> Btw I see your code generates objects which inherit from each other, doesn't Nim require those to be put on the heap?
Inheritance on its own doesn't require the objects to be heap allocated. In my case I use inheritance merely to avoid code bloat, to define some primitives akin to FutureBase.
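A minimal sketch of what I mean; nothing here forces a heap allocation:

type
  Base = object of RootObj
    x: int
  Derived = object of Base
    y: int

var d = Derived(x: 1, y: 2) # plain stack-allocated value object
let p: ptr Base = addr d    # viewed through the base type, no GC involved
echo p.x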
@mratsim
Heh, as always, I'll need some more time to study what you wrote :).
@arnetheduck
> lifetimes are awful
If I remember correctly it's fixed in both chronos and lately asyncdispatch by nullifying the future (or reusing the future variable, to be precise) after reading it. If cycles are still there, I believe they could easily be fixed by nullifying callbacks after they fire. But TBH it's been a long time since I last looked at that code :)
> lots of extraneous copying being done
Right, that one was also on my list. In terms of the closure iterator transformation it boils down to a lambda-lifting optimization that would not lift a variable if it is only used in a single state. So not only does it reduce the number of copies, it also leaves some variables on the stack, leading to better performance. But again, it's a relatively simple optimization which doesn't affect semantics in any way.
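Sketched on the example from earlier (hypothetical optimized output; tmp is made up):

proc foo_state0(env: ptr ContEnv_foo) =
  # `tmp` never crosses an await, so it is not lifted into ContEnv_foo
  # and stays an ordinary stack local of this state proc.
  let tmp = env.arg_a * 2
  if tmp > 2:
    discard # ... continue as before, nothing extra copied into env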
> The last thing I'd really like to do is to force an annotation on anything that goes into the closure
Though does it have anything to do with closure iterators (and thus async functions)? They are kinda different from closures, as their primary purpose is not to capture surrounding variables, but their own :)
> If I remember correctly it's fixed in both chronos and lately asyncdispatch by nullifying the future (
it's a bit more involved than that - the closure iterator ends up referring to the future and vice versa in the generated code - setting futures to nil aggressively removes some references but not all - to the best of my knowledge this is not yet solved in AD - the patch to chronos is here: https://github.com/status-im/nim-chronos/pull/243
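Roughly the shape of the cycle, reduced to a manual sketch with asyncdispatch (the generated code sets this up implicitly):

import asyncdispatch

proc demo() =
  var fut = newFuture[int]("example")
  proc cb() {.gcsafe.} =
    echo fut.finished # `fut` is captured in cb's closure environment
  fut.addCallback(cb) # fut -> callback -> environment -> fut: a cycle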
> But again, it's a relatively simple optimization which doesn't affect semantics in any way.
it seems simple but it requires a liveness / lifetime analysis pass which is where things get more involved.
> Though does it have anything to do with closure iterators (and thus async functions)?
well, async is the place where the problem becomes most visible: the out-of-order execution leads to shared-state bugs and data races ("all the problems with threads, but none of the benefits"). I'm not hopeful that this will become a feature in the compiler / language any time soon, but a macro would "likely" be a good starting point to model the feature, and async, being a common source of such bugs, is good ground for experiments.
I'm finally at the point where I can show something with passing tests :). Albeit to run it you'll need to use the latest nim devel plus this patch, which "fixes" this bug :).
As per @PMunch's idea I've thrown away everything I had before and started from scratch, and here it is: closure iterators called over preallocated environments and stuff like that.
Would be happy to hear your feedback now.