# CPS runner:
var mycps = startMyCPS()
while mycps.next != nil: mycps.next(mycps)
# Closure iterators
var it = myClosureIter()
while it.state != -1: it()
Please explain/debunk/destroy.
Closure iterators don't have an afterYield operation (yet), which seems to make a crucial difference for an event loop implementation.
But you're right, they are very close and it's basically two different terms for the same thing.
Continuation-passing style and .closure iterator cannot be meaningfully compared, as they're two conceptually different things.
A .closure iterator is an implementation of suspendable procedures, using a form of continuation-passing style internally.
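For concreteness, here's a minimal sketch of a `.closure` iterator acting as a suspendable procedure: the locals live in a hidden environment object, each `yield` suspends, and the next call resumes right after the previous `yield` (the `counter` name is just for illustration).

```nim
# A .closure iterator is a suspendable procedure: its locals are kept
# in a hidden environment between calls.
iterator counter(n: int): int {.closure.} =
  var i = 0
  while i < n:
    yield i      # suspend here; the next call resumes after this line
    inc i

var it = counter
while true:
  let x = it(3)
  if finished(it): break
  echo x         # prints 0, 1, 2 on separate lines
```

`finished` (from `system`) reports whether the iterator has run to completion; note it only becomes true after the call that steps past the last `yield`.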
Continuation-passing style is a style of writing programs, generally contrasted with direct style. The following simple program in direct style:
proc a(x: int): int =
  return x

proc b() =
  echo a(1)
  return

b()
written in continuation-passing style could look like so:
proc a(x: int, cont: proc(ret: int)) =
  cont(x)

proc b(cont: proc()) =
  a(1, proc(ret: int) =
    echo ret
    cont())

b(proc() = quit(0))
# note: using an empty continuation would work just as well,
# in this specific situation
In simple cases such as the above, no language support or library extensions are required to write programs (or parts of them) in continuation-passing style!
If the language doesn't support guaranteed tail-call elimination (Nim does not), use of explicit trampolining is required in most real-world use cases, as you'd get a continuously growing stack otherwise. Continuation-passing style by itself does not require trampolining.
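A minimal sketch of what explicit trampolining looks like (the `Cont` type and `countdown` are illustrative names, not from any library): instead of calling the next continuation directly, each step returns it as a value, and a flat loop bounces from step to step, so the stack depth stays constant no matter how many steps run.

```nim
type Cont = proc(): Cont   # each step returns the next step (or nil when done)

var visited: seq[int]

proc countdown(n: int): Cont =
  # Instead of recursing (and growing the stack), each step just
  # returns the next step as a value.
  result = proc(): Cont =
    if n > 0:
      visited.add n
      result = countdown(n - 1)

# The trampoline: a flat loop that invokes one step after another.
var step = countdown(3)
while step != nil:
  step = step()

echo visited   # @[3, 2, 1]
```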
Continuation-passing style and .closure iterator cannot be meaningfully compared, as they're two conceptually different things.
I think we all use CPS and "continuations" interchangeably here, and nobody really focuses on the "style" aspect. Strictly speaking, you're correct.
And why do you need afterYield?
So that the closure iterator does not have to be wrapped in yet another closure for the event loop.
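For context, the extra wrapping being referred to looks roughly like this (an illustrative sketch, not any particular event loop's API): a loop that only understands plain callbacks forces each closure iterator to be wrapped in yet another closure that resumes it and re-schedules itself.

```nim
type Callback = proc()

var queue: seq[Callback]
var produced: seq[int]

iterator task(): int {.closure.} =
  yield 1
  yield 2

var it = task

# The extra wrapper the event loop forces on us: the loop only knows
# proc(), so resuming the iterator needs one more proc that
# re-schedules itself until the iterator is finished.
proc step() =
  let x = it()
  if not finished(it):
    produced.add x
    queue.add step   # re-schedule ourselves for the next resume

queue.add step
while queue.len > 0:
  let cb = queue[0]
  queue.delete(0)
  cb()

echo produced   # @[1, 2]
```

An `afterYield` hook would let the iterator itself tell the loop what to do on suspension, removing the need for this per-task wrapper.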
I regard CPS as "lower" than closure iterators; CPS is the underlying building block that allows you to implement the other.
(Yes I know it's not strictly correct)
I'm 100% sure that CPS, with the current version and current level of compiler support, could be at least 2x slower.
Well, I did the benchmarks, and nim-cps was faster than any async implementation; the wrapping needed for awaiting futures eats all the closure iterator speed… and then you ask why CPS.
https://github.com/blackmius/uasync/blob/master/benchmark/raw_cps_throughput.nim
https://github.com/blackmius/uasync/blob/master/benchmark/throughput_async_await.nim
Don't be surprised that there's no chronos comparison: its performance was so bad that I didn't even include it.
nim-cps is pretty well optimized, both for memory and performance.
People have been doing pretty insane things with it, like spawning and completing 1 billion (as in G, 1e9) tasks across one thousand different threads while consuming a mere 50 GB of memory, that is, only 50 bytes per task. It is also used to run tens of tasks on an Atmel AVR ATmega8 with 2K of RAM.
Maybe you can try yasync. When integrating with libuv, its asyncRaw feature makes me feel comfortable, and I'm deeply impressed.
Here's a modified version of raw_cps_throughput.nim that uses yasync. On my machine, it's 4x faster than the original CPS implementation.
import std/monotimes
import std/deques
import yasync

type
  MyCont = object of Cont[void]

var i = 0
var callSoonQueue = Deque[ptr MyCont]()

proc nop(env: ptr MyCont) {.asyncRaw.} =
  callSoonQueue.addLast(env)

proc run() {.async.} =
  while true:
    await nop()
    var j = 0
    while j < 10000:
      i += 1
      j += 1

const coroutinesCount = 100_000
# var coroutines = newSeq[Future[void]](coroutinesCount)
for i in 0..<coroutinesCount:
  # coroutines[i] = run()
  discard run()

let start = getMonoTime().ticks
while i < 1_000_000_000:
  if callSoonQueue.len > 0:
    let c = callSoonQueue.popFirst()
    c.complete()
echo i
let duration = getMonoTime().ticks - start
echo duration.float / 1_000_000, "ms"
# nim -d:release -d:danger c yasync_throughput.nim
# ./yasync_throughput
1000000000
2.866101ms
# nim -d:release -d:danger c raw_cps_throughput.nim
# ./raw_cps_throughput
1000000000
11.126889ms
CPS packs every state into its own function; the closure iterator transform would be better if it did this too. That way, the "local" variables that affect only a particular state can stay local to that function, avoiding massive stack usage and allowing better optimizations by the underlying compiler.
There are a few other things on my wishlist for the closure iterator transform as well, e.g. tail call optimization, and skipping the transformation entirely when there's only one state (i.e. no await inside the async function), avoiding heap allocations, etc.
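A hypothetical sketch of the "one proc per state" idea (illustrative only, not actual nim-cps output): only the variables that survive a suspension point go into the shared environment object, while a variable used within a single state stays a plain local of that state's proc.

```nim
# Hypothetical "one proc per state" sketch (not actual nim-cps output).
type Env = ref object
  x: int            # survives the suspension point, so it lives in the env

proc state2(e: Env) =
  echo e.x          # execution resumes here after the suspension

proc state1(e: Env): proc(e: Env) =
  let tmp = 21      # used only within this state: stays a plain local,
                    # never hoisted into the shared environment object
  e.x = tmp * 2
  result = state2   # "suspend" by handing the next state back to the runner

let e = Env()
let next = state1(e)
next(e)             # resume: prints 42
```

In the closure iterator transform, by contrast, all locals of the iterator body end up in one environment object regardless of which states actually use them, which is the stack/environment bloat being described above.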