nimforum mirror - Criticism of Parallel Nim

alexeypetrushin [2021-03-13T21:13:38+01:00]

A couple of days ago I started evaluating Nim for the server side.

I have already used Nim for half a year for data processing, and it works well, so I decided to also use it on the server. The project is not critical, yet it's important enough that the server should work more or less reliably and be relatively performant.

Nim supports Threads and Async, plus Weave and CPS via third-party libraries. Unfortunately, it seems that right now none of them can be used to build a simple, reliable and performant web server.

After a couple of days trying to understand parallel Nim, I still don't feel like I understand how to use it. Parallel support in Nim feels fragile, experimental, unfinished and buggy.

Compare that to the learning experience in Java / Ruby / Elixir / Node: after a day or two you know pretty much everything you need to build a web server and are ready to go.

Problems:

Async

I personally don't like async, as it has some problems, but OK, it seems that's the official Nim way, and I just have to accept it.

How do you read/write some data from/to a file? You just use the open/read/write file operations from the std you already know, right? Wrong. They can't be used because they are not async; you need to learn a special async API like std/asyncfile.

That was unexpected. I use one library (a sync library) in my data processing code, but for the web server code I can't use it and should spend time learning something else? And what about lots of my custom functions that are written in sync style? Seems like I can't use them either. And you can't use third-party libraries like DB APIs etc. that are not async. So no code reusability; code can't be shared easily.

This is worse than Node.JS, where all IO functions are async whether they are used in plain scripting or in a web server, so you use the same functions everywhere.

Another red flag, one that pretty much makes Async unacceptable for any production server, is the ability to accidentally block the event loop.

Use some non-async IO call in any server handler, accidentally, and the whole server and all other requests will block until that handler is finished.
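The effect is not specific to Nim; any single-threaded event loop behaves this way. As a minimal sketch, here it is in Python's asyncio rather than Nim's asyncdispatch (an analogy only; the names `heartbeat`, `handler_blocking`, `handler_async` and `max_gap` are made up for illustration). A heartbeat task stands in for "all other requests", and we measure the longest pause it sees while one handler runs:

```python
import asyncio
import time

async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Stands in for every other request: records each time it gets to run."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())

async def handler_blocking(delay: float) -> None:
    time.sleep(delay)           # sync call: the event loop is frozen meanwhile

async def handler_async(delay: float) -> None:
    await asyncio.sleep(delay)  # yields to the event loop while waiting

async def max_gap(handler, delay: float = 0.3) -> float:
    """Longest pause between heartbeat ticks while `handler` runs."""
    ticks = [time.monotonic()]
    hb = asyncio.create_task(heartbeat(ticks, 0.02, 10))
    await asyncio.sleep(0.05)   # let the heartbeat start ticking
    await handler(delay)
    await hb
    return max(b - a for a, b in zip(ticks, ticks[1:]))

blocking_gap = asyncio.run(max_gap(handler_blocking))
async_gap = asyncio.run(max_gap(handler_async))
print(f"blocking handler: longest heartbeat pause {blocking_gap:.2f}s")
print(f"async handler:    longest heartbeat pause {async_gap:.2f}s")
```

With the blocking handler the heartbeat stalls for at least the full handler duration; with the async one it keeps ticking at roughly its normal interval.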

You deployed changes with a non-async function to the server, tested it on staging, it seemed OK, then deployed to prod - and the server became unavailable. The problem didn't manifest itself on staging because that specific route with sync IO was not hit there, but it was hit in production.

Or, you paid good money for ads, the traffic increased 10x, but your server unexpectedly slowed down, providing a terrible experience for users and wasting your ad money. All because of some sync function that was invisible under the usual load but manifested itself during the 10x traffic boost.

Such a problem is not possible in Node.JS because there are no sync IO functions (and sync CPU-intensive functions are usually rare).

That pretty much means that you can't use Nim Async if you care about reusable code and a reliable and performant server.

Threads

It seems that it works more or less well. But nobody uses it, there are no popular web servers with thread support, and everyone discourages using threads and advises using async.

Also, there is documentation, but it still leaves lots of blank spots, and there are surprises with compiler complaints about GC-safety, closures not being supported, etc.

And the many different GC modes and ways to deal with threads just add to the confusion, so in the end you have no clear understanding of how to actually use Threads in Nim.

Weave and CPS

Weave - is a specialised solution, not designed for web-server-like load.

CPS - experimental, not ready for usage.

All that pretty much means that currently Nim is not ready to be used on the server side.

Yardanico [2021-03-13T21:34:17+01:00]

How do you read/write some data from/to a file? You just use the open/read/write file operations from the std you already know, right? Wrong. They can't be used because they are not async; you need to learn a special async API like std/asyncfile.

This is wrong - you can use them just fine, but they'll block. Also, std/asyncfile is not a fully async implementation: as far as I understand, it only allows creating async file descriptors, while reading/writing is still synchronous.

That was unexpected. I use one library (a sync library) in my data processing code, but for the web server code I can't use it and should spend time learning something else? And what about lots of my custom functions that are written in sync style? Seems like I can't use them either. And you can't use third-party libraries like DB APIs etc. that are not async. So no code reusability; code can't be shared easily.

This is again wrong, you can use all sync code just fine in async code.

Yardanico [2021-03-13T21:35:16+01:00]

And have you actually seen https://github.com/dom96/httpbeast (Jester uses it by default)?

alexeypetrushin [2021-03-13T21:45:48+01:00]

This is wrong - you can use them just fine, but they'll block.

Well, you can also use a microscope to hammer nails, it would work too, no one said you can't do that.

mratsim [2021-03-13T22:17:31+01:00]

Note that there is no way at a low level to do file IO without blocking; kernels may just not offer an async API. JavaScript and Go delegate those calls to a threadpool to make them async.
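The "delegate to a threadpool" technique can be sketched in a few lines. This is Python's asyncio, used purely as an illustration of the idea and not of how Nim, JavaScript or Go implement it internally; `read_file_blocking` and `read_file_async` are hypothetical helper names. The read itself is still an ordinary blocking call, it just runs on a worker thread so the event loop stays free:

```python
import asyncio
import os
import tempfile

def read_file_blocking(path: str) -> bytes:
    """Ordinary synchronous file read; blocks the calling thread."""
    with open(path, "rb") as f:
        return f.read()

async def read_file_async(path: str) -> bytes:
    # asyncio.to_thread (Python 3.9+) runs the blocking call on a worker
    # thread of the default executor and awaits its result.
    return await asyncio.to_thread(read_file_blocking, path)

async def main() -> bytes:
    # Create a throwaway file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from a worker thread")
    try:
        return await read_file_async(tmp.name)
    finally:
        os.remove(tmp.name)

data = asyncio.run(main())
print(data)
```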

jackhftang [2021-03-14T03:18:38+01:00]

I have read your long previous post together with this one. My overall feeling is that you are shooting yourself in the foot.

If you have some custom synchronous functions and you want to use them in a web server, then create dedicated threads for those tasks or send those tasks to some other dedicated processes/servers. This is more an architectural decision than a problem of the language/stdlib. Also, it has nothing to do with reusability of code: calling sync code simply blocks no matter which language or library you are using. It is the responsibility of programmers to choose the right solution to ensure the application meets the requirements.

It seems that it works more or less well. But nobody uses it, there are no popular web servers with thread support, and everyone discourages using threads and advises using async.

The wording you are using (not only in the above quote) is a bit extreme. There are good reasons to avoid using threads in a web server. In many cases, a web server with only multiple threads is slower than a web server with only async (this was proven by the advent of Node.js). If you need to scale out a web server, current technology tends towards orchestration. Also, while it is true that fewer people choose the multi-threaded path (one of the reasons is that it is harder to test and debug), it is not nobody. And even if fewer people use threads, you can just go your own way. As you have said, threads work more or less, so it is not a problem of Nim. Nim provides the possibilities; it is the programmers who decide how to construct their application.

I think the real problem you are facing is that your mind strongly refuses to write async code and you are not sure how to write a proper multi-threaded application in Nim.

alexeypetrushin [2021-03-14T06:48:25+01:00]

If you have some custom synchronous functions and you want to use them in a web server, then create dedicated threads for those tasks or send those tasks to some other dedicated processes/servers.

It's like the two-language problem in Python: if you want it fast, you need to use C wrappers. In this case, if you want your server to be fast, you can't use std/files and need to use something else, like std/asyncfile.

calling sync code simply blocks no matter which language or library you are using. It is the job of programmers to choose a sound solution that meets requirements

The difference is how the server deals with the "blocking sync code".

alexeypetrushin [2021-03-14T07:13:28+01:00]

Note that there is no way at a low level to do some IO without blocking; kernels may just not offer an async API. JavaScript and Go delegate those calls to a threadpool to make them async.

The low-level primitives could be the same, but how those low-level building blocks are used makes the end result different. I don't know what Erlang or Go use at the low level, but I can see the end result: simple blocking code that is fast.

Araq [2021-03-14T08:04:01+01:00]

There is http://goran.krampe.se/2014/10/25/nim-socketserver/ which you can use to base your server on. It uses a single spawn handle(client) and inside your handler you can use ordinary, blocking Nim code that uses the thread pool and thus the blocking nature doesn't affect the handling of other clients.

Would I use that personally for production? Yes, I would, given the requirements (can use DB modules, don't have to watch out for blocking calls).
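For readers unfamiliar with the shape of such a server, here is a rough sketch of the "accept, then hand the client to a thread pool" pattern. It is written in Python with `ThreadPoolExecutor` standing in for Nim's `spawn`; it is not the code from the linked article, and `handle`, `serve` and `request` are illustrative names. The handler is plain blocking code, and one slow client does not hold up the others:

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

def handle(client: socket.socket) -> None:
    """Plain blocking handler: read one line, do slow work, reply."""
    with client, client.makefile("rb") as reader:
        line = reader.readline().strip()
        time.sleep(0.05)  # stands in for a blocking database call
        client.sendall(b"echo: " + line + b"\n")

def serve(listener: socket.socket, pool: ThreadPoolExecutor, n_clients: int) -> None:
    """Accept loop: every accepted connection is handed to the pool."""
    for _ in range(n_clients):
        client, _addr = listener.accept()
        pool.submit(handle, client)

def request(port: int, payload: bytes) -> bytes:
    """Tiny blocking client used only to exercise the server."""
    with socket.create_connection(("127.0.0.1", port)) as s, s.makefile("rb") as reader:
        s.sendall(payload + b"\n")
        return reader.readline().strip()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

with ThreadPoolExecutor(max_workers=4) as workers, ThreadPoolExecutor(max_workers=4) as clients:
    acceptor = workers.submit(serve, listener, workers, 3)
    replies = list(clients.map(lambda p: request(port, p), [b"a", b"b", b"c"]))
    acceptor.result()
listener.close()
print(replies)
```

The trade-off is the usual one: the number of requests handled at the same time is bounded by the pool size, in exchange for handlers that can call any blocking library directly.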

mratsim [2021-03-14T08:25:32+01:00]

There is nothing parallel about a web server by the way. It's all about concurrency.

And the Erlang and Go approaches have significant overheads that are unacceptable for a systems programming language.

You can't use either Erlang or Go for parallel problems such as those from high-performance computing.

Erlang actors are just unsuited for this.

Go requires a different calling convention, making calls to C (or Fortran) significantly more expensive, which would force us to basically not use all the C code lying around, as we would be way slower. Furthermore, goroutines are littered with syscalls that just thrash CPU caches with kernel data.

The low-level primitives could be the same, but how those low-level building blocks are used makes the end result different. Erlang or Go somehow execute simple blocking code in an efficient parallel way. Freeing humans from hard work of using state machines and thread-pools explicitly :)

Nothing prevents Nim from providing this except time and people ready to implement it. Unfortunately, Nim doesn't have a worldwide company with unlimited resources (Telecom or Google) to push that every day. Nim is still very much enthusiast-driven.

shirleyquirk [2021-03-15T01:59:08+01:00]

I don't have a horse in this race, but neither guildenstern nor httpbeast relies on Nim's async. httpbeast interfaces with asyncdispatcher to allow user code to be async (and to update the time). Both use threads, both use selectors and their own event loop for the real work. So both are threaded and asynchronous. Comparisons between them should not be stretched to draw conclusions about whether concurrency or parallelism is "better". Did no one read mratsim's blog post?

alexeypetrushin [2021-03-15T06:45:38+01:00]

I think we can look at the most common web server architecture for inspiration, Nginx + PHP:

Request Receiver. Nginx, efficient async server, complicated low-level code. Accepts requests and holds them until a free worker is available, smoothing and balancing the load.

Pool of N Workers. PHP, single-threaded Unix process, plain easy code, not efficient. Processes the request.

Binary Streaming, Static Files, Caching etc. Nginx, efficient async server, complicated low-level code.

Something like that could be used in Nim (it seems GuildenStern works that way):

Request Receiver. Async Nim Server. Hidden component, not used explicitly.

Pool of N Workers. Nim Thread Pool. About 95% of application code would be here.

Binary Streaming, Static Files, Caching etc. Async Nim Server. About 5% of application code would be here, it's ok to write a little bit of async to make special cases more efficient, like serving static files or streaming binary content.
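A rough sketch of that split, in Python rather than Nim and with no claim about how GuildenStern or any Nim server is actually built: an async front end accepts every request immediately and queues it onto a fixed pool of N workers, while the application code in `app_handler` (a hypothetical name) stays plain and blocking. The counters exist only to show that no more than N handlers ever run at once:

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 2
active = 0  # handlers currently running
peak = 0    # highest value `active` ever reached
lock = threading.Lock()

def app_handler(request_id: int) -> str:
    """Plain blocking application code; runs on a worker thread."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # stands in for blocking DB or file access
    with lock:
        active -= 1
    return f"response {request_id}"

async def receiver(n_requests: int) -> list:
    """Async front end: accepts everything at once, queues work on the pool."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        pending = [loop.run_in_executor(pool, app_handler, i) for i in range(n_requests)]
        # Requests beyond N_WORKERS wait in the pool's queue until a worker frees up.
        return await asyncio.gather(*pending)

responses = asyncio.run(receiver(6))
print(responses, "peak concurrent workers:", peak)
```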

juancarlospaco [2021-03-15T09:17:41+01:00]

In the end everything gets transformed into sync blocking code. AFAIK there are no truly "async assembly instructions", so Nim's async macro expands to functions, and you can see it with expandMacro. Other programming languages lie to you that they are async, and throw you into an internal thread pool or green thread or similar thing with a hipster name.
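To make the "it is all ordinary function calls underneath" point concrete, here is a toy round-robin scheduler built from Python generators. It only illustrates the general idea of resumable functions driven by a plain loop; it is not what Nim's async macro generates, and `task` and `run` are made-up names. Each `yield` plays the role of an `await` suspension point:

```python
def task(name, steps, log):
    """A resumable function: each `yield` is a suspension point, like `await`."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def run(tasks):
    """A tiny event loop: resume each task in turn until all have finished."""
    queue = list(tasks)
    while queue:
        current = queue.pop(0)
        try:
            next(current)          # an ordinary synchronous call into the task
            queue.append(current)  # not finished yet: schedule it again
        except StopIteration:
            pass                   # task ran to completion

log = []
run([task("a", 2, log), task("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```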

cblake [2021-03-15T09:56:48+01:00]

Hardware interrupts happen "between instructions" (though there are "polling modes" often useful in high perf scenarios). And what is being considered is largely about IO and so relates to HW interrupts on network cards/disks/keyboards/etc. The real world is more parallel & async than serial & sync. Serial & sync is just enough easier to reason about that we erect barrier after barrier.

There may be some "quantum of time" and, yeah, people have been chattering about being "in a simulation" which might have some sync program counter, but that stuff is all super speculative physics and/or philosophy. I doubt waiting on figuring any of that out would help IO abstraction. ;-)

dom96 [2021-03-15T15:39:36+01:00]

Don't tell them their requirements are "weird", they are not.

Just to be clear, I have never told anyone in this thread that their requirements are "weird" (I don't think I ever said that to anyone honestly). Not sure why you are implying that I did.

The arguments against await are very solid.
And I'm challenging this "vast majority" claim.

Await isn't a panacea. But I am unconvinced that it's not the best approach for the vast majority of web apps.

We shouldn't be shy to tell our users about the spawn'ing server solution. [...] in the meantime for production usages we should encourage people to try both solutions and see what fits their use cases better.

Sure, I don't think we are being shy though. Plenty of people suggesting all sorts of solutions which is great!

I picked a real world example aligning well with alexeypetrushin's requirements:

Your example shows clearly that async/await fits those requirements well. After all, that is how the playground is implemented. I've also used the same architecture for my obfuscator and it works well.

There was no need for threads and no need to rewrite anything. So if it was in an effort to disprove my "vast majority" claim then it's not convincing me :)

Araq [2021-03-15T20:43:35+01:00]

Since I've switched to GuildenStern (no async, threads only), things have been peachy, running very reliably. And with ARC, the memory consumption is super low, and remains super low in a deterministic way. My app uses blocking database calls, which sometimes take a few seconds, and with threads/spawn, I'm getting close to 100% utilization of my server's capacity when requests pile up. If I need to handle more scale, I'd just scale-out knowing that I'm not leaving any performance behind.

This is a very interesting success story. Would you consider writing a blog post for us about it?

boia01 [2021-03-17T06:02:30+01:00]

Yes, I can write a blog post about this. I won't be able to share the specific code (it's proprietary) but I can create an example project that looks pretty close to it. It's a web server that receives/sends json and does postgres database calls. Not very fancy but overall representative of a lot of webapps, I think.

r3c [2021-03-20T18:00:54+01:00]

"The Asyncthread Wars started have..."

-- Master Yoda

Mirror of forum.nim-lang.org

7621 :: Criticism of Parallel Nim