nimforum mirror - How mature is async/threading in Nim?

snej (orginal) [2020-05-19T01:56:12+02:00] view original

The project for which I'm evaluating Nim does a lot of concurrent network I/O (over WebSockets) and database access. It needs to run on most mainstream platforms (iOS, Android, Mac, Windows, Linux.) The current implementation is in C++17, using a homemade Actor library. The other language I'm seriously considering is Rust.

Nim seems to have the necessary pieces, like the async macro and the asyncnet module. But are they mature enough to use in production code?

I have a few specific questions:

The asyncnet module documentation has a couple of caveats about Windows, like "on Windows it only supports select()" and "In theory you should be able to work with any of these layers interchangeably (as long as you only care about non-Windows platforms)." I'm not clear on how this would impact clients, and whether these caveats apply to using asyncnet or just the lower-level modules.

The async examples I've seen run on a single thread. Is there any support for distributing async calls across a thread pool?

The spawn function's thread safety seems to be based on segregating heap objects per thread (Erlang-style.) This can involve a lot of copying, especially when passing data buffers around. Is this something baked into the language or is it specific to the spawn function? Are there alternatives to this, like the more Rust-style model using move semantics?

We found an interesting quote on r/nim from a few days ago:

What ... caused my last team to abandon Nim in favor of Haskell ... was the weak concurrency story. There is a story there (thread-local heaps) but we found it far too easy to get yourself into very confusing situations where behavior wasn't as expected (especially since you can pass around pointers which breaks the thread safety checker).

To be fair, our C++ code has almost no built-in thread safety at all; you have to be careful what types you pass to Actor methods. But that's one of the things I'd like to improve on!

I've written some prototype code in both languages (C API bindings, not any async or net code yet) and I have to say I really enjoyed Nim a lot. Rust I found frustrating, and ugly due to lack of function overloading and the need to convert between equivalent types like str/String. Nim also builds faster, and seems to generate much smaller code. But Rust does have huge momentum behind it so it feels safer for that reason, and the ironclad memory safety is a good thing to have.

Any comments or perspective from those who've been using Nim a lot?

--Jens

snej (orginal) [2020-05-19T02:09:05+02:00] view original

Right after posting that (how often this happens!) I came across the big ARC thread here from last December. It sounds like ARC means a lot of (positive) changes to the things I've read earlier, like:

The heap is now shared as it's done in C++, C#, Rust, etc etc. A shared heap allows us to move subgraphs between threads without the deep copies but the subgraph must be "isolated" ensuring the freedom of data races and at the same time allowing us to use non-atomic reference counting operations. How to ensure this "isolation" at compile-time was pioneered by Pony and we can do it too via our owned ref syntax

I've been interested in the Pony language for several years and adopting its memory model would be amazing!

I'm still reading through this long thread. At this point I'm unclear on how much of this stuff is solid and enabled-by-default (in particular, what's the difference between "arc" and "orc"?)

Yardanico (orginal) [2020-05-19T03:22:49+02:00] view original

ORC is ARC with a cycle collector to deal with cycles (and collect them so they won't leak)

treeform (orginal) [2020-05-19T07:56:19+02:00] view original

I use async/await in production, but only server side on Linux. It's pretty mature for a simple web application. My app runs on multiple servers instead of threds. Async does not mesh well with threds. Multiprocessing is more scalable anyways.

federico3 (orginal) [2020-05-19T11:12:55+02:00] view original

Parallelism/concurrency and async are some of the few pain points of the language. ARC/ORC together with https://github.com/mratsim/weave might be very promising.

Araq (orginal) [2020-05-19T13:19:45+02:00] view original

At this point I'm unclear on how much of this stuff is solid and enabled-by-default (in particular, what's the difference between "arc" and "orc"?)

ARC is in version 1.2 with significant stability improvements around the corner in 1.2.2. Many Nimble packages already work with --gc:arc. While the stability is still not good enough for Nim compiler bootstrapping, for new projects I wouldn't use anything else because the tooling is so much better. All the sanitizers from C++ simply work, compile your code with nim c --gc:arc --debuginfo -d:useMalloc y.nim && valgrind ./y and you can be assured the remaining ARC bugs (sorry!) don't affect you.

You can also slowly move from the C++ code to Nim, the interop between Nim and C++ is superb and only getting better with ARC.

andrea (orginal) [2020-05-19T14:16:10+02:00] view original

for new projects I wouldn't use anything else because the tooling is so much better

That would be great, but it requires an introduction that explains to users what ARC is, how to make use of it, how it impacts multithreading, the new sync and lent parameters, how to design collections and libraries without a GC and much more.

snej (orginal) [2020-05-19T17:11:42+02:00] view original

Async does not mesh well with threds.

Could you explain why not? My understanding is that it’s thread-agnostic; an unfinished async call is just a sort of lightweight continuation that can be restarted in any context.

Multiprocessing is more scalable anyways.

This project is a library for (primarily) mobile apps, so that’s not an option!

snej (orginal) [2020-05-19T17:19:44+02:00] view original

it requires an introduction that explains to users what ARC is, how to make use of it

+1 👍🏻 The existing documentation is great (I’ve read the tutorial, manual, and “Nim In Action” cover to cover), but in some areas seems to lag behind. Which is understandable since the language is evolving quickly.

I’m one of those weird people who likes writing documentation, so maybe when/if I get up to speed on this stuff I can help out.

jasonfi (orginal) [2020-05-19T18:32:14+02:00] view original

Are you using Jester or another web framework?

k0zmo (orginal) [2020-05-22T14:33:20+02:00] view original

Not really. What you can achieve is something similar to what I created with httpbeast: each thread running its own Async event loop and allowing the system to load balance the socket connections. For clients that will likely be trickier.

This event-loop per thread is only required on Linux where epoll is inherently single threaded. Using IOCP you simply spawn N threads and make them spin on GetQueuedCompletionStatus. All balancing is done by the kernel. ARC should make this work more natural.

jackhftang (orginal) [2020-05-23T16:17:48+02:00] view original

Speaking from my user experience with nim (~1 yr), you don't have to worry about the stability of async related things, they are quite robust.

The things that is not very mature is the GC. I am not talking about the arc one, but the current default one refc. I have a single threaded project around 10k~20k line. When I run tests with defaut gc, 3 to 4 times out of 10, I will run into illegal storage access error. After I moved to boehm, I have never seen the same error again. There are chance that could be my fault, but I believe this is GC bug.

For inter-thread communication, I have made a library https://github.com/jackhftang/threadproxy.nim to simplify ITC programming. You can take a look at it~ Again, in my practical experience, the little trick to be stable for ITC is to use JSON as data exchange =] When I was developing multi-threaded program, I found that the deep copy of channels do not handle well with null pointer. It seems it will run into problem when there are nil somewhere in data structure. And with JSON you would easily have nil and cyclic structures.

slonik_az (orginal) [2020-11-25T22:27:53+01:00] view original

Bumping the thread to see what changed in the intervening 6 months. Specifically, is there a multithreaded executor for CPU bound async operations like rust's tokio ?

mratsim (orginal) [2020-11-28T09:05:54+01:00] view original

No but tokio is for multithreaded IO and IO-bound operations not CPU-bound.

Here is an outline of my evaluation on the challenges to write one https://forum.nim-lang.org/t/7065#44457 (better discuss in the current thread than the original).

slonik_az (orginal) [2020-11-28T20:37:13+01:00] view original

@mratsim: No but tokio is for multithreaded IO and IO-bound operations not CPU-bound.

That's incorrect. tokio has thread-pull based parallel executor that can utilize up to all the cpu cores and, thus, deal with CPU intensive threads.

mratsim (orginal) [2020-11-30T08:03:37+01:00] view original

The usual recommendation for CPU intensive task is to use Rayon. Tokio is ill-suited even if it uses a threadpool underneath.

The role of the threadpool is to avoid the overhead of create/teardown thread. What makes a multithreading runtime suitable for IO or CPU is its scheduler.

The Tokio scheduler has a budget system to ensure fairness and minimize latency (https://github.com/tokio-rs/tokio/blob/8880222/tokio/src/runtime/thread_pool/worker.rs#L192-L194) but this will hurt throughput which is what CPU intensive task need.

The reason why it hurts throughput is that when the budgeted time end, the task is unloaded until further notice. Unfortunately for a workload to be CPU-bound, it requires not to be memory-bound. The speed of CPU has been greatly improved in the past years but the speed of storage has not changed much. While waiting for L1 cache you can execute 50 instructions, while waiting for L2 cache you can execute hundreds, while waiting for disk or networking you can execute thousands.

This means that when you interrupt a CPU-bound task with a budget system, the data in cache will be flushed and need to be reloaded later, incurring heavy costs.

The main difference between IO and CPU is that for IO to make progress you need to wait while for CPU to make progress you need work. So IO multithreading is how to make multiple threads wait efficiently and CPU multithreading is how to make multiple threads work efficiently. While context switching a CPU doesn't do work, and those budget tracker which are great for fairness significantly hurt CPU-intensive workloads.

slonik_az (orginal) [2020-11-30T12:31:57+01:00] view original

@mratsim: Thanks for a great explanation of the intricacies of Rust's rayon and tokio.

Are there similar facilities in the Nim's universe?

pposca (orginal) [2022-11-13T22:55:45+01:00] view original

Well, it has been 2 years since the last post, and I wonder what is the state of Nim regarding IO and CPU intensive work.

I personally like Rust's multithreaded async (the chance to be able to opt-in, in fact) for IO tasks; and also Go goroutines, useful for both IO and CPU tasks. Does Nim has any alternative to them?

Araq (orginal) [2022-11-14T10:09:55+01:00] view original

There is https://github.com/status-im/nim-taskpools for example.

Mirror of forum.nim-lang.org

6352 :: How mature is async/threading in Nim?