My progress was blocked by some compiler bugs, and since hitting them I haven't invested much more time into it, but I strongly believe it's possible to complete it with enough effort.
Some ideas of how it should work. The async macro becomes a lot more involved: it splits an async function into states, derives an environment object type for every async function, lifts local variables and the result into the environment object, and places every state in a separate function. The environment object also contains a variant switch for the sub-state environment objects, so in theory the whole system requires no allocations on its own, unless there's async recursion. In the case of recursion, every recursion step will allocate another environment dynamically.
An example of how the async macro works:
proc foo(a: int): float {.async.} =
  if a == 1:
    await sleepAsync(1000)
  else:
    await sleepAsync(2000)
  return 5
Translates into:
type # These are base types, always declared
  ContBase = object {.inheritable, pure.}
    retEnv: ptr ContBase
    retFunc: proc(a: ptr ContBase) {.nimcall.}
    finished: bool

  Cont[T] = object of ContBase
    when T is void:
      discard
    else:
      result: T

# Async transformation follows
type
  ContEnv_foo = object of Cont[float]
    arg_a: int
    case sub: uint8
    of 0:
      sub0: ContEnv_sleepAsync
    of 1:
      sub1: ContEnv_sleepAsync
    else:
      discard
proc foo(a: int): float =
  # This is just an equivalent of a waitFor'ed function. Can be called in non-async code.
  # Note how everything that used to be in closure iterator environments and Futures now
  # resides in a single stack-allocated object.
  var env: ContEnv_foo
  env.arg_a = a
  foo_state0(addr env)
  while not env.finished:
    runLoopOnce()
  read(env)
proc foo_state0(env: ptr ContEnv_foo) =
  if env.arg_a == 1:
    env.sub = 0
    env.sub0.arg_ms = 1000
    env.sub0.retEnv = env
    env.sub0.retFunc = foo_state1
    sleepAsync_state0(addr env.sub0)
  else:
    env.sub = 1
    env.sub1.arg_ms = 2000
    env.sub1.retEnv = env
    env.sub1.retFunc = foo_state1
    sleepAsync_state0(addr env.sub1)

proc foo_state1(env: ptr ContEnv_foo) =
  env.result = 5
  complete(env)
As you can see, foo is split into 2 states (actually there are more, but the tail states of the if branches are collapsed into one, as they are identical). Every state gets its own function, and a high-level waitFor-ish wrapper is created. The async macro would take care of calling an async function through its *_state0.
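To illustrate the recursion case mentioned earlier: a recursive async proc cannot embed its own environment type in the variant (the object would be infinitely large), so the macro would have to fall back to one dynamic allocation per recursion step. A rough sketch of what it could emit, mirroring the pseudocode above (countdown and the create/dealloc pairing are my illustration, not the actual design):

type
  ContEnv_countdown = object of Cont[void]
    arg_n: int
    subRec: ptr ContEnv_countdown # heap-allocated, one per recursion step

proc countdown_state1(env: ptr ContEnv_countdown)

proc countdown_state0(env: ptr ContEnv_countdown) =
  if env.arg_n > 0:
    env.subRec = create(ContEnv_countdown) # the dynamic allocation
    env.subRec.arg_n = env.arg_n - 1
    env.subRec.retEnv = env
    env.subRec.retFunc = countdown_state1
    countdown_state0(env.subRec)
  else:
    complete(env)

proc countdown_state1(env: ptr ContEnv_countdown) =
  dealloc(env.subRec) # free the child environment once it completes
  complete(env)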
Some nuances: whether a function's result type is declared as T or Future[T] is an implementation detail of the transformation macro, as is whether the await keyword is actually required or not.
The states are split by the same logic as in compiler/closureiters.nim, but in macro code; translating it is quite some effort, and it's not done yet.
Also there's for-loop (over inline iterators) lowering that has to be done in the macro code, which I found not doable due to bugs: #18056, #17185, #16867, #16758, and maybe more, who knows :).
I hope to resume working on it once the above bugs are fixed (btw, I would be very thankful if anyone could fix them :), but for now I'd love to hear some thoughts on whether it is a good idea overall or whether it can be improved.
Initially I wanted to reimplement state splitting in a macro, as it would allow for easier experimenting with the transformation algorithms; doing this in compiler code is arguably more cumbersome. It also gives me full control over memory management and layout.
But your question is actually very good. Should there be a way to call a closure iterator over a preallocated environment, it might invalidate most of my work so far :). But currently it is not possible, or is it?
> Should there be a way to call a closure iterator over a preallocated environment, it might invalidate most of my work so far :). But currently it is not possible, or is it?
No, but it should be added.
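For instance, something like this (purely hypothetical; neither environmentType nor bindEnvironment exists today):

# Hypothetical sketch: obtain the compiler-generated environment type of
# a closure iterator and run the iterator over a caller-provided
# (e.g. stack-allocated) instance, avoiding the hidden heap allocation.
iterator myIter(): int {.closure.} =
  yield 1
  yield 2

var env: environmentType(myIter)           # hypothetical magic type query
let it = bindEnvironment(myIter, addr env) # hypothetical magic binding
while not finished(it):
  echo it()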
Really awesome to see your approach focus on keeping the semantics the same as (or as close as possible to) the existing async in Nim. I think adoption of something that forks the semantics/API will be difficult, especially if the improvements over the existing implementation are relatively niche.
From what I can see you have effectively extracted the closure iterator code from the compiler into a macro. I do believe in the long-held objective of Nim being a language with a small but extensible core, so your work here is really great.
> Should there be a way to call a closure iterator over a preallocated environment, it might invalidate most of my work so far :). But currently it is not possible, or is it?
Indeed it is not, but this could surely be implemented :)
I don't think this invalidates your work, like I said above I think putting this in a macro is a win anyway and would let us make the core language smaller.
Btw I see your code generates objects which inherit from each other, doesn't Nim require those to be put on the heap?
This seems quite similar to where I wanted to drive CPS: having everything be objects and only using ref when needed (JS target or escaping continuations).
In C, to avoid pitfalls outlined by @dom96, I would use union types, like this https://github.com/disruptek/cps/blob/52b2ed2/talk-talk/manual1_stack.nim
My idea for composition and allowing any compatible executor (async/threadpool/streams) to pass continuations to each other and use the appropriate resource is to add 2 pragmas and 3 magic calls: the {.suspend.} and {.resumable.} pragmas and calls like suspendAfter and bindCallerContinuation, as used below.
And a motivating example:
type Awaitable[T] = concept aw
  ## An awaitable concept, for example a Future or Flowvar.
  ## Implementation left at the scheduler's discretion.
  aw.get is T
  aw.isReady is bool

var ioScheduler {.threadvar.}: IOScheduler
var computeScheduler {.threadvar.}: CPUScheduler
var isRootThread {.threadvar.}: bool

proc saveContinuation() {.suspend.} =
  ioScheduler.enqueue bindCallerContinuation()

proc await[T](e: Awaitable[T]): T {.resumable.} =
  if not e.isReady():
    suspendAfter saveContinuation()
  return e.get()

proc continueOnThreadpool(pool: CPUScheduler) {.suspend.} =
  pool.spawn bindCallerContinuation()

proc serve(socket: Socket) =
  while true:
    let conn = await socket.connect()
    if isRootThread:
      suspendAfter pool.continueOnThreadpool()
    # --- the rest is on another thread on the CPU threadpool,
    # and the root thread can handle IO again.

    # Stream processing
    var finished = false
    while not finished:
      let size = await conn.hasData()
      var chunk = newSeq[byte](size)
      conn.read(chunk) # non-blocking read
      finished = process(chunk)

    # Now that thread still owns the socket,
    # we can return it to the main thread via a channel
    # or keep its ownership.
Full design:
type
  ContinuationProc[T] = proc(c: var T) {.nimcall.}
    ## using a mutating continuation

  Continuation* = concept cont
    cont.fn is ContinuationProc[Continuation]
    cont.frame is (object or ref)

  Coroutine* = concept coro
    type Output = auto
    coro is Continuation
    coro.promise is Option[Output]
    coro.hasFinished is bool
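For illustration, a minimal concrete type that should satisfy the Continuation concept above (all names below are invented for the example):

type
  Frame_tick = object
    i: int
  Cont_tick = object
    fn: proc(c: var Cont_tick) {.nimcall.}
    frame: Frame_tick

proc tick(c: var Cont_tick) {.nimcall.} =
  # One resumption step: advance the frame, clear fn when done.
  inc c.frame.i
  if c.frame.i >= 3:
    c.fn = nil

var c = Cont_tick(fn: tick, frame: Frame_tick(i: 0))
while c.fn != nil:
  c.fn(c) # resume the continuation until it finishes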
Portable storage schemes for non-supportsCopyMem types or the JS backend:
type
  ContFrameBase_myFn = object of RootObj

  ContFrameBase_myFn0 = object of ContFrameBase_myFn
    a_atomblock0: int
    b_atomblock0: int

  ContFrameBase_myFn1 = object of ContFrameBase_myFn
    a_atomblock1: seq[float64]
    b_atomblock1: string

  ContFrameBase_myFn2 = object of ContFrameBase_myFn
    a_atomblock2: int
    b_atomblock2: int
    c_atomblock2: int
Stack-allocatable storage scheme for supportsCopyMem types on C/C++:
type
  SomeContinuation = object of RootObj

  ContFrameBase_myFn0 = object
    a_atomblock0: int
    b_atomblock0: int

  ContFrameBase_myFn1 = object
    # Warning: when writing the spec I forgot that seqs/strings need destructors
    a_atomblock1: seq[float64]
    b_atomblock1: string

  ContFrameBase_myFn2 = object
    c_atomblock3: int

  ContFrameBase_myFn {.union.} = object of SomeContinuation
    frame0: ContFrameBase_myFn0
    frame1: ContFrameBase_myFn1
    frame2: ContFrameBase_myFn2
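The union trades generality for layout control: the whole continuation becomes a single fixed-size value (as large as the biggest frame), so it can live on the stack or inside a parent frame, at the cost of only working for supportsCopyMem types on the C/C++ backends, as noted above.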
Regarding an in-depth overview of all async approaches in most (all?) languages, you can check my compendium: https://github.com/weavers-guild/weave-io/tree/master/research
I'm curious, what is the benefit of building a state object manually this way over using a closure iterator?
Off the top of my head, there are two annoying problems with the giant-closure-iterator approach: lifetimes are awful, and there's lots of extraneous copying being done.
I've thought about doing similar things in the chronos async macro. Now that we've solved exception tracking and acyclic references in chronos, these kinds of optimizations would be next; one could think of it as an async optimizer, which for example removes the closure iterator entirely when it's not needed (similar to tail call optimization in regular code), etc.
The last thing I'd really like to do is to force an annotation on anything that goes into the closure. C++ gets this right by not copying things into closures by default, instead requiring an explicit capture list. This solves the remaining problem of things accidentally ending up in closures; otherwise, in practical applications of async, one often ends up with accidental huge copies, shadowing, and shared mutable state that gets updated out of order. A smarter macro could be more strict here.
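To make the accident concrete, a sketch in plain Nim (no async needed to show it; the names are made up):

proc makeHandler(): proc() =
  var data = newSeq[byte](10_000_000) # big buffer, needed only once
  result = proc() =
    echo data.len # `data` is silently lifted into the heap environment

let h = makeHandler() # the 10 MB seq now lives as long as `h` does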
@dom96 > From what I can see you have effectively extracted the closure iterator code from the compiler into a macro.
Well yes, except I haven't done it, but intend to do so :)
> I don't think this invalidates your work, like I said above I think putting this in a macro is a win anyway and would let us make the core language smaller.
As you noticed, my solution implies reimplementing a significant chunk of the compiler, so that alone is already questionable. I surely see potential benefits in it, as described above, but of course there are drawbacks too, such as functionality duplication and worse compile times. So it's tempting to reuse compiler-provided closureiters transformation here, like @PMunch suggests, if only we had more control over closure environments...
> Btw I see your code generates objects which inherit from each other, doesn't Nim require those to be put on the heap?
Inheritance on its own doesn't require the objects to be heap allocated. In my case I use inheritance merely to avoid code bloat, to define some primitives akin to FutureBase.
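A minimal sketch of what I mean; nothing here forces a heap allocation:

type
  Base = object of RootObj
    x: int
  Derived = object of Base
    y: int

var d = Derived(x: 1, y: 2) # plain stack-allocated value object
let p: ptr Base = addr d    # viewed through the base type, no GC involved
echo p.x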
@mratsim
Heh, as always, I'll need some more time to study what you wrote :).
@arnetheduck
> lifetimes are awful
If I remember correctly it's fixed in both chronos and lately asyncdispatch by nullifying the future (or reusing the future variable, to be precise) after reading it. If cycles are still there, I believe they could easily be fixed by nullifying callbacks after they fire. But TBH it's been a long time since I last looked at that code :)
> lots of extraneous copying being done
Right, that one was also on my list. In terms of the closure iterator transformation it boils down to a lambda-lifting optimization that would not lift a variable if it is only used in a single state. So not only does it reduce the number of copies, it also leaves some variables on the stack, leading to better performance. But again, it's a relatively simple optimization which doesn't affect semantics in any way.
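Sketched on the example from earlier (hypothetical optimized output; tmp is made up):

proc foo_state0(env: ptr ContEnv_foo) =
  # `tmp` never crosses an await, so it is not lifted into ContEnv_foo
  # and stays an ordinary stack local of this state proc.
  let tmp = env.arg_a * 2
  if tmp > 2:
    discard # ... continue as before, nothing extra copied into env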
> The last thing I'd really like to do is to force an annotation on anything that goes into the closure
Though does it have anything to do with closure iterators (and thus async functions)? They are kinda different from closures, as their primary purpose is not to capture surrounding variables, but their own :)
> If I remember correctly it's fixed in both chronos and lately asyncdispatch by nullifying the future (
it's a bit more involved than that - the closure iterator ends up referring to the future and vice versa in the generated code - setting futures to nil aggressively removes some references but not all - to the best of my knowledge this is not yet solved in AD - the patch to chronos is here: https://github.com/status-im/nim-chronos/pull/243
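Roughly the shape of the cycle, reduced to a manual sketch with asyncdispatch (the generated code sets this up implicitly):

import asyncdispatch

proc demo() =
  var fut = newFuture[int]("example")
  proc cb() {.gcsafe.} =
    echo fut.finished # `fut` is captured in cb's closure environment
  fut.addCallback(cb) # fut -> callback -> environment -> fut: a cycle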
> But again, it's a relatively simple optimization which doesn't affect semantics in any way.
it seems simple but it requires a liveness / lifetime analysis pass which is where things get more involved.
> Though does it have anything to do with closure iterators (and thus async functions)?
well, async is the place where the problem becomes most visible: the out-of-order execution leads to shared-state bugs and data races ("all the problems with threads, but none of the benefits"). I'm not hopeful that this will become a feature in the compiler / language any time soon, but a macro would "likely" be a good starting point to model the feature, and async, being a common source of such bugs, is good ground for experiments.
I'm finally at the point where I can show something with passing tests :). Albeit to run it you'll need to use the latest nim devel plus this patch, which "fixes" this bug :).
As per @PMunch's idea I've thrown away everything I had before and started from scratch, and here it is: closure iterators called over preallocated environments and stuff like that.
Would be happy to hear your feedback now.