You can also look into CPS based solutions:
atlas search cps
cps:
  url: https://github.com/nim-works/cps (git)
  tags: async, await, concurrency, continuation, coroutines, cps, fibers, io, nim, parallel, passing, style, threads
  description: Continuation-Passing Style for Nim 🔗
  license: MIT
  website: https://github.com/nim-works/cps
eventqueue:
  url: https://github.com/disruptek/eventqueue (git)
  tags:
  description: reference CPS dispatcher using selectors
  license: MIT
  website: https://github.com/disruptek/eventqueue
passenger:
  url: https://github.com/disruptek/passenger (git)
  tags:
  description: a demo of cps dispatch-fu
  license: MIT
  website: https://github.com/disruptek/passenger
cpstest:
  url: https://github.com/zevv/cpstest (git)
  tags:
  description:
  license:
  website: https://github.com/zevv/cpstest
cps-baremetal:
  url: https://github.com/zevv/cps-baremetal (git)
  tags:
  description:
  license:
  website: https://github.com/zevv/cps-baremetal
cpslearning:
  url: https://github.com/shayanhabibi/cpslearning (git)
  tags:
  description: Some things while I learn cps
  license:
  website: https://github.com/shayanhabibi/cpslearning
cps-runtime:
  url: https://github.com/gabearro/cps-runtime (git)
  tags:
  description: Nim CPS runtime with http1.1, http2, http3, ws, sse, webtransport, irc, dns and a React-like DSL, http server DSL, and wasm compilation
  license:
  website: https://github.com/gabearro/cps-runtime

I think real OS threads got a bad reputation back in the late 2000s, when they were slow and unreliable, so people tried to avoid them. As a result, many said, "Let's use green threads instead and stop relying on OS threads."
However, OS threads provide significant advantages: they integrate well with locking, OS-level features, floating point modes, and so on. Once you start emulating these features in green threads, they become just as slow as OS-level threads.
Just use OS-level threads. They are good now. You can create around 10,000 OS threads, and they will run fine on most modern operating systems. Yes, 15 to 20 years ago, doing that would have been terrible, but nowadays it is completely feasible.
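As a rough illustration of how cheap raw OS threads have become (a sketch, not a benchmark; note that in CPython, `threading.Thread` maps to a real OS thread, and the count here is kept modest so the demo finishes quickly):

```python
import threading

N = 1000  # modern OSes comfortably handle an order of magnitude more
counter = 0
lock = threading.Lock()

def work():
    # Each OS thread does a trivial locked increment, then exits.
    global counter
    with lock:
        counter += 1

threads = [threading.Thread(target=work) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 1000
```

Spawning and joining a thousand threads like this completes in a fraction of a second on a typical modern machine.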
minicoro is very fast. I once vibe-coded a library (but please note it was vibe-coded, so... you know): https://github.com/kobi2187/arsenal
If I remember correctly, there is also a library that embeds the Go runtime, or uses the Go support in GCC, and aims for perfect semantics (useful if, for example, you are porting a complex Go library and need verified identical behavior).
I don't fully agree with this. Modern concurrency and parallelism research has moved well past the rather ancient model of raw OS threads. That model is certainly useful in simpler scenarios, but it is no longer a proper systems approach to multi-threading, which calls for decoupling logical concurrency from physical threads. Green threads, async, and CPS are all different answers to the same problem of wasted threads that spend most of their time idle, but CPS satisfies both concurrency and parallelism, because the transform itself precludes neither.
Chase-Lev deques, MPMC inject queues, and the pinned per-worker inbox concept in work-stealing schedulers are fundamentally superior, and would be capped off by a sound and efficient CPS solution. What I hope for Nimony and beyond is that whatever it provides for CPS allows for library specialization, so that experts like mratsim, who have implemented their own high-performance solutions and have intimate knowledge of the nim-cps library, can make the new CPS approach in Nim the most scientifically and practically elegant one. I'm hoping that Araq is cooking something tasty with his own Jarvis by the side ;)
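For readers unfamiliar with the Chase-Lev discipline: each worker owns a deque, pushing and popping work at the bottom, while idle thieves steal from the top, so owner and thief rarely contend on the same end. Here is a deliberately simplified sketch using a lock (the real Chase-Lev deque is lock-free; all names here are illustrative):

```python
import threading
from collections import deque

class Worker:
    def __init__(self):
        self.deque = deque()
        self.lock = threading.Lock()

    def push(self, task):
        with self.lock:
            self.deque.append(task)        # owner pushes at the bottom

    def pop(self):
        with self.lock:
            return self.deque.pop() if self.deque else None      # owner pops the bottom

    def steal(self):
        with self.lock:
            return self.deque.popleft() if self.deque else None  # thief takes the top

done, done_lock = [], threading.Lock()
workers = [Worker(), Worker()]
for i in range(8):
    workers[0].push(i)   # all work starts on worker 0; worker 1 must steal

def run(me, victim):
    while True:
        task = me.pop()
        if task is None:
            task = victim.steal()
        if task is None:
            return           # both deques drained
        with done_lock:
            done.append(task)

threads = [threading.Thread(target=run, args=(workers[0], workers[1])),
           threading.Thread(target=run, args=(workers[1], workers[0]))]
for t in threads: t.start()
for t in threads: t.join()
```

Every task is executed exactly once, with the idle worker automatically balancing the load by stealing.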
"decoupling logical concurrency from physical threads"

That works only if you don't care about performance. Otherwise the coupling comes back in the form of a multitude of knobs you have to tweak (and hope things work out): GOMAXPROCS, thread_no_node_processor_spread, etc. When push comes to shove, I'd rather control manually which work goes where.
"different approaches to the same issue of wasting away threads that spend most time idling"

This issue arises only if you're inclined to write your concurrent code the same way you write sequential code, and expect things to automagically happen in parallel.
Nim's own weave library is proof that you can have your cake and eat it too. Its current shortcomings could be overcome by a sound CPS implementation plus compile-time enforcement of lifetime and ownership semantics.
What you are describing is precisely the "magic" that CPS and high-performance work-stealing schedulers enable. You write sequential-looking code, but the library abstracts the transformations away so that you don't deal with a "multitude of knobs". That is how OpenCilk and Intel TBB work, and Weave's own syntax aligns closely with that model.
The paradigm has well and truly shifted toward more hardware-aware libraries and compiler design. Take Malebolgia, for example, for structured concurrency right now: you are not dealing with individual threads anymore, and it works very well as a general-purpose solution providing automatic, safe concurrency.
The elegance of CPS + work stealing will be a "superpower" for massively parallel workloads in HPC or ML/AI. The only thing left for Nim is smarter out-of-the-box CPU runtime defaults, so you can get close to peak performance without any manual setup. Using hwloc (from the Open MPI project) data at compile time, for example, would allow developers to metaprogram kernels for arbitrary NUMA topologies.