It's been a while since the RFC for the Picasso multithreading runtime (https://github.com/nim-lang/RFCs/issues/160 / https://forum.nim-lang.org/t/5083).
The project now lives at https://github.com/mratsim/weave
It's well tested on Linux with 32-bit and 64-bit CI, and also on ARM64, where Travis offers a whopping 32-core undisclosed ARM CPU. Windows is not supported yet; I'm only lacking a low-level wrapper for Synchronization Barriers. OSX should work, but somehow it trips some assertions on Travis, so your mileage may vary.
It offers both task parallelism and data parallelism. The task parallelism API is similar to async/await on Futures, except that you call spawn/sync on a Flowvar. The data parallelism API is similar to OpenMP.
One important caveat: it doesn't support GC-ed types; you need to pass a pointer (see the seq example in the README) or use Nim channels.
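To make the spawn/sync/Flowvar API concrete, here is a minimal sketch of the classic Fibonacci task-parallelism example, modeled on the Weave README (the `init(Weave)`/`exit(Weave)` setup and `spawn`/`sync` calls are assumed from that README):

```nim
import weave

proc fib(n: int): int =
  # Tiny tasks like this mostly stress scheduler overhead;
  # real workloads should use coarser-grained tasks.
  if n < 2:
    return n
  let x = spawn fib(n - 1)  # returns a Flowvar[int], akin to a Future
  let y = fib(n - 2)
  result = sync(x) + y      # blocks until the Flowvar is resolved

init(Weave)
echo fib(20)
exit(Weave)
```

Note how this mirrors async/await: `spawn` plays the role of starting an async computation and `sync` the role of `await`, but on plain threads rather than an event loop.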
There are a couple of low-level routines that may be of interest:
There are 10 benchmarks available to stress several aspects of the runtime, 8 of them being as fast as or much faster than established runtimes like Intel TBB or GCC/Clang/Intel OpenMP (the 2 slow ones being parallel reductions):
Name | Parallelism | Notable for stressing
---|---|---
Black & Scholes (Finance) | Data Parallelism |
DFS (Depth-First Search) | Task Parallelism | Scheduler Overhead
Fibonacci | Task Parallelism | Scheduler Overhead
Heat diffusion (Physics) | Task Parallelism |
Matrix Multiplication (Cache-Oblivious) | Task Parallelism |
Matrix Transposition | Nested Data Parallelism | Nested loops
Nqueens | Task Parallelism | Conditional parallelism
SPC (Single Task Producer) | Task Parallelism | Load Balancing
Does this aim to be part of Nim? Part of the standard library? Does it have an accessible API that feels like idiomatic Nim code?
Thanks!
Does this aim to be part of Nim?
No
Part of the standard library?
It's probably too big, though some of the underlying code like the memory subsystem could be in the standard library.
Does it have an accessible api?
spawn/sync/Flowvar are directly taken from https://nim-lang.org/docs/threadpool.html. The parallelFor is just a for-loop.
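To show how close parallelFor stays to an ordinary for-loop, here is a sketch based on the README's loop syntax (the `captures:` block and the use of a raw pointer to work around the no-GC-types restriction are assumptions drawn from the README examples):

```nim
import weave

init(Weave)

var squares: array[100, int]
let buf = squares.addr  # raw pointer: tasks can't capture GC-ed types

# Reads almost exactly like `for i in 0 ..< 100`, but iterations
# are split into tasks and load-balanced across worker threads.
parallelFor i in 0 ..< 100:
  captures: {buf}
  buf[][i] = i * i

exit(Weave)
```

The only additions over a plain for-loop are the explicit capture list and the runtime init/exit, which keeps the API close to idiomatic Nim.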
It's similar to the Rayon or Tokio libraries from Rust.
Weave provides a set of fundamental building blocks to create safe and high performance multi-threaded programs.
Anyone who wants to write multi-threaded programs in Nim should be very excited about this.
Like Rust, Nim itself provides a minimal runtime (as it should be for a systems language), and allows libraries to provide fundamental features without modification of the compiler.
This is actually a core philosophy of Nim. It has a small core but allows for large extensions (mainly through macros and a few other features).
A runtime means something that operates at runtime, with extra overhead compared to anything done at compile-time by the compiler. It also potentially means interoperability issues.
For example, a garbage collector or a reference counting scheme is also a runtime: extra overhead, not done at compile-time. Systems languages need to keep a runtime very lightweight in most cases and nonexistent in certain critical cases (operating systems, interoperability/embeddability in other languages).
Now a runtime is also supposed to bring benefits: abstracting away the programmer's worries about manual memory management or manual thread management.
I would say the goal is not to replace threadpool, but to provide an advanced version. The standard library threadpool gives you a simple way to use threads, however those are not load-balanced, which is a critical issue in many cases.
Now users have 2 choices: either they have a common use-case (process all my tasks as fast as possible) and they can use Weave, or they have very specific constraints like real-time scheduling/latency/fairness or priority jobs and they need to build their own scheduler on top of the threadpool or raw Nim threads. For example, if you process audio or video in real-time, the goal is not to process the whole video as fast as possible but to get the next frame processed before the deadline. Weave would guarantee the former, but maybe it would schedule the first frame as the very last processed (a guarantee of throughput but not of latency).
@mratsim, Sorry. I should have been more specific.
I didn't mean to compare Weave to those Rust libraries in terms of features. I was comparing them in terms of library size and level of abstraction (for lack of a better phrase.)
What I meant to say is both Tokio and Rayon are "runtime" libraries for Rust in the same way that Weave is a runtime library for Nim.
I only support trivial types, checked at compile-time via supportsCopyMem(T).
What I do is here: https://github.com/mratsim/weave/blob/v0.1.0/weave/parallel_tasks.nim#L125-L150. From the function call spawn foo(a, b, c), I check the return type (void, or whether I need a future).
I package the following in a task:
The task is allocated on a shared-memory heap via a memory pool (or via malloc). It is load-balanced between threads if needed, and when execution time comes:
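The compile-time check mentioned above can be sketched as follows. This is a hypothetical helper, not Weave's actual code: it only illustrates how `supportsCopyMem(T)` rejects GC-ed argument types at compile-time, the way `spawn` does:

```nim
import std/typetraits

proc checkSpawnable[T](arg: T) =
  ## Hypothetical stand-in for the check spawn performs on each argument:
  ## only trivially-copyable (non-GC-ed) types may cross thread boundaries.
  static:
    doAssert supportsCopyMem(T),
      "spawn arguments must be trivially copyable (no ref/seq/string)"

checkSpawnable(3.14)          # float: passes the compile-time check
checkSpawnable((1, 2, 3))     # tuple of ints: passes
# checkSpawnable(@[1, 2, 3])  # seq: would fail to compile
```

Because the check runs in a `static:` block, violations are caught at compile-time rather than causing memory corruption at runtime.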
I can't edit my title, but I'm happy to announce the release of Weave 0.2.0, codenamed "Overture".
This is the result of a fight of over 8 hours to reconcile setting $PATH in Azure Pipelines with nimble/findExe.
Weave now supports Windows in addition to Linux, macOS and all platforms with Pthreads.
Furthermore, Weave's backoff system has been reworked and formally verified to be deadlock-free. It is now enabled by default, with no noticeable performance impact. It allows Weave to park idle threads to save power.
Side-story: in the process, a critical bug in the glibc and musl implementations of condition variables was found: signal does not always wake up a waiting thread. This does not happen with macOS and Windows condition variables.
And for the new year 0.3.0: https://github.com/mratsim/weave/releases/tag/v0.3.0
Next developments will probably take a while, as the "low-hanging" fruit is done (i.e. the items from my PoC in July/August). If someone wants to add something like Graphviz output to Synthesis, that would be helpful for displaying Weave's internals/control-flow visually.
Changelog
One thing of note: measuring performance on a busy system is approximate at best; you need a lot of runs to get a ballpark figure. Furthermore, in a multithreading runtime, workers often "yield" or "sleep" when they fail to steal work. In that case, the OS might give the timeslice to other processes (and not to other threads in the runtime). If a process like nimsuggest hogs a core at 100%, it will receive a lot of those yielded and slept timeslices even though your other 17 threads would have made useful progress. The result is that while nimsuggest (or any other application) is stuck at 100%, Weave gets worse than sequential performance, and I don't think I can do anything about it.