# CPS runner:
var mycps = startMyCPS()
while mycps.next != nil: mycps.next(mycps)
# Closure iterators
var it = myClosureIter()
while it.state != -1: it()
Please explain/debunk/destroy.
Closure iterators don't have an afterYield operation (yet), which seems to make a crucial difference for an event loop implementation.
But you're right, they are very close and it's basically two different terms for the same thing.
Continuation-passing style and .closure iterator cannot be meaningfully compared, as they're two conceptually different things.
A .closure iterator is an implementation of suspendable procedures, using a form of continuation-passing style internally.
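For concreteness, here's a minimal sketch of a `.closure` iterator acting as a suspendable procedure: the locals live in a hidden environment object, each `yield` suspends, and the next call resumes right after the previous `yield` (the `counter` name is just for illustration).

```nim
# A .closure iterator is a suspendable procedure: its locals are kept
# in a hidden environment between calls.
iterator counter(n: int): int {.closure.} =
  var i = 0
  while i < n:
    yield i      # suspend here; the next call resumes after this line
    inc i

var it = counter
while true:
  let x = it(3)
  if finished(it): break
  echo x         # prints 0, 1, 2 on separate lines
```

`finished` (from `system`) reports whether the iterator has run to completion; note it only becomes true after the call that steps past the last `yield`.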
Continuation-passing style is a style of writing programs, generally contrasted with direct style. The following simple program in direct style:
proc a(x: int): int =
  return x

proc b() =
  echo a(1)
  return

b()
written in continuation-passing style could look like so:
proc a(x: int, cont: proc(ret: int)) =
  cont(x)

proc b(cont: proc()) =
  a(1, proc(ret: int) =
    echo ret
    cont())

b(proc() = quit(0))
# note: using an empty continuation would work just as well,
# in this specific situation
In simple cases such as the above, no language support or library extensions are required to write programs (or parts of them) in continuation-passing style!
If the language doesn't support guaranteed tail-call elimination (Nim does not), use of explicit trampolining is required in most real-world use cases, as you'd get a continuously growing stack otherwise. Continuation-passing style by itself does not require trampolining.
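A minimal sketch of what explicit trampolining looks like (the `Cont` type and `countdown` are illustrative names, not from any library): instead of calling the next continuation directly, each step returns it as a value, and a flat loop bounces from step to step, so the stack depth stays constant no matter how many steps run.

```nim
type Cont = proc(): Cont   # each step returns the next step (or nil when done)

var visited: seq[int]

proc countdown(n: int): Cont =
  # Instead of recursing (and growing the stack), each step just
  # returns the next step as a value.
  result = proc(): Cont =
    if n > 0:
      visited.add n
      result = countdown(n - 1)

# The trampoline: a flat loop that invokes one step after another.
var step = countdown(3)
while step != nil:
  step = step()

echo visited   # @[3, 2, 1]
```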
Continuation-passing style and .closure iterator cannot be meaningfully compared, as they're two conceptually different things.
I think we all use CPS and "continuations" interchangeably here, and nobody really focuses on the "style" aspect. Strictly speaking, you're correct.
And why do you need afterYield?
So that the closure iterator does not have to be wrapped in yet another closure for the event loop.
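For context, the extra wrapping being referred to looks roughly like this (an illustrative sketch, not any particular event loop's API): a loop that only understands plain callbacks forces each closure iterator to be wrapped in yet another closure that resumes it and re-schedules itself.

```nim
type Callback = proc()

var queue: seq[Callback]
var produced: seq[int]

iterator task(): int {.closure.} =
  yield 1
  yield 2

var it = task

# The extra wrapper the event loop forces on us: the loop only knows
# proc(), so resuming the iterator needs one more proc that
# re-schedules itself until the iterator is finished.
proc step() =
  let x = it()
  if not finished(it):
    produced.add x
    queue.add step   # re-schedule ourselves for the next resume

queue.add step
while queue.len > 0:
  let cb = queue[0]
  queue.delete(0)
  cb()

echo produced   # @[1, 2]
```

An `afterYield` hook would let the iterator itself tell the loop what to do on suspension, removing the need for this per-task wrapper.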
I regard CPS as "lower" than closure iterators; CPS is the underlying building block that allows you to implement the other.
(Yes I know it's not strictly correct)
I'm 100% sure that CPS, with the current version and current level of compiler support, could be at least 2x slower.
Well, I did the benchmarks, and nim-cps was faster than any async implementation; the wrapping needed for awaiting futures eats all the closure iterator speed… and then you ask why CPS.
https://github.com/blackmius/uasync/blob/master/benchmark/raw_cps_throughput.nim
https://github.com/blackmius/uasync/blob/master/benchmark/throughput_async_await.nim
Don't be surprised that there's no chronos comparison: its performance was so bad that I didn't even include it.
nim-cps is pretty well optimized, both for memory and performance.
People have been doing pretty insane things with it, like spawning and completing 1 billion (as in G, 1e9) tasks across one thousand different threads while consuming a mere 50 GB of memory, that is, only 50 bytes per task. It is also used to run tens of tasks on an Atmel AVR ATmega8 with 2K of RAM.
Maybe you can try yasync. When integrating with libuv, its asyncRaw feature makes me feel comfortable, and I'm deeply impressed.
Here's a modified version of raw_cps_throughput.nim that uses yasync. On my machine, it's 4x faster than the original CPS implementation.
import std/monotimes
import std/deques
import yasync

type
  MyCont = object of Cont[void]

var i = 0
var callSoonQueue = Deque[ptr MyCont]()

proc nop(env: ptr MyCont) {.asyncRaw.} =
  callSoonQueue.addLast(env)

proc run() {.async.} =
  while true:
    await nop()
    var j = 0
    while j < 10000:
      i += 1
      j += 1

const coroutinesCount = 100_000
# var coroutines = newSeq[Future[void]](coroutinesCount)
for i in 0..<coroutinesCount:
  # coroutines[i] = run()
  discard run()

let start = getMonoTime().ticks
while i < 1_000_000_000:
  if callSoonQueue.len > 0:
    let c = callSoonQueue.popFirst()
    c.complete()
echo i
let duration = getMonoTime().ticks - start
echo duration.float / 1_000_000, "ms"
# nim -d:release -d:danger c yasync_throughput.nim
# ./yasync_throughput
1000000000
2.866101ms
# nim -d:release -d:danger c raw_cps_throughput.nim
# ./raw_cps_throughput
1000000000
11.126889ms
CPS packs every state into its own function; the closure iterator transform would be better if it did this too. That way, the "local" variables that affect only a particular state can stay local to that function, avoiding massive stack usage and allowing better optimizations by the underlying compiler.
There are a few other things on my wishlist for the closure iterator transform as well, e.g. tail call optimization, and skipping the transformation entirely when there's only one state (i.e. no await inside the async function), avoiding heap allocations, etc.
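A hypothetical sketch of the "one proc per state" idea (illustrative only, not actual nim-cps output): only the variables that survive a suspension point go into the shared environment object, while a variable used within a single state stays a plain local of that state's proc.

```nim
# Hypothetical "one proc per state" sketch (not actual nim-cps output).
type Env = ref object
  x: int            # survives the suspension point, so it lives in the env

proc state2(e: Env) =
  echo e.x          # execution resumes here after the suspension

proc state1(e: Env): proc(e: Env) =
  let tmp = 21      # used only within this state: stays a plain local,
                    # never hoisted into the shared environment object
  e.x = tmp * 2
  result = state2   # "suspend" by handing the next state back to the runner

let e = Env()
let next = state1(e)
next(e)             # resume: prints 42
```

In the closure iterator transform, by contrast, all locals of the iterator body end up in one environment object regardless of which states actually use them, which is the stack/environment bloat being described above.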