The service I migrated to Nim manages around 100k concurrent mostly long-lived WebSocket connections. These WebSocket connections are opened by client apps (Android, Windows desktop, etc) and are used to receive realtime events.
I wrote the new WebSocket server using Mummy, Ready and JSONy. I wasn't quite sure what to expect for performance but have been pleasantly surprised.
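As a rough sketch of how one of those pieces fits in: JSONy turns event objects into the JSON payloads pushed over the WebSockets. The Event type and its fields below are made up for illustration, not the actual production schema.

    import jsony

    type Event = object
      # Hypothetical event shape, purely for illustration.
      kind: string
      payload: string

    # Serialize an event before sending it down a WebSocket.
    let outgoing = Event(kind: "heartbeat", payload: "").toJson()
    echo outgoing # {"kind":"heartbeat","payload":""}

    # Parse an incoming JSON message back into a typed object.
    let incoming = """{"kind":"hello","payload":"hi"}""".fromJson(Event)
    echo incoming.kind # hello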
The new Nim server has now been running for a week on a single 2 vCPU + 16GB RAM VM. This VM is behind a load balancer for HTTPS termination, the same kind of setup as described in my previous post.
With 100k open WebSocket connections, the new server's CPU and RAM usage are both under 10% (roughly 7.5%). This is pretty great.
While I don't have 1 million production WebSocket connections to prove it, this provides a solid indication that Nim + Mummy can handle 1M+ WebSocket connections on a single small VM, which is quite cool.
The server is not just sitting there idle either: every minute over 200k messages are sent to clients (most of these are simple heartbeat messages).
Considering this new Nim server has been humming along without any issues for the past week I am ready to say the migration has been a success.
I have now put Nim and Mummy into production serving both a lot of HTTP request traffic and a lot of WebSocket connections. Since everything has performed really well in both cases it is pretty easy to say I am happy with the results I have achieved with Nim.
Why does this matter to you?
Thanks for giving this a read and good luck with your own Nim adventures.
Congrats! Could you give any insights into the previous system and how it compares to the new Nim codebase? It'd be cool to know what factors led to the re-write as well, like was it performance or maintenance costs.
It sounds very interesting. Can I ask a technical question: did you compare this with an async solution?
Because even if these are long-running WebSockets, that would still be 100k+ threads with context switching, and most of the time a WebSocket is probably just waiting.
I like Mummy's idea of a simple interface without async, but it would still be interesting to compare it with async.
There are not 100k+ threads.
There is 1 Mummy socket thread and 2 worker threads for the HTTP Upgrade requests. Then there is 1 thread receiving on a Redis pubsub connection and 1 thread handling heartbeat intervals.
In total there are 5 threads, and this number does not scale with the number of connections.
I'll clean up and open source a minimal example to show how to set this up.
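In the meantime, here is a rough sketch of that thread layout, following Mummy's README-style API. The Redis/Ready calls are left out, and the workerThreads parameter name is my reading of Mummy's API rather than something confirmed in this post, so check the docs.

    import mummy, mummy/routers
    import std/[os, typedthreads]

    proc upgradeHandler(request: Request) =
      # HTTP Upgrade requests are handled on one of Mummy's worker threads.
      let websocket = request.upgradeToWebSocket()
      websocket.send("connected")

    proc websocketHandler(
      websocket: WebSocket, event: WebSocketEvent, message: Message
    ) =
      discard # WebSocket events are also dispatched to the worker threads.

    proc redisPubsubThread() {.thread.} =
      # 1 thread blocked receiving on a Redis pubsub connection (via Ready),
      # forwarding published events to the relevant WebSockets. The Ready
      # calls are omitted since they are not the point of this sketch.
      discard

    proc heartbeatThread() {.thread.} =
      # 1 thread waking up on an interval to push heartbeats to the open
      # connections (connection tracking is shown in a later sketch).
      while true:
        sleep(60_000)

    var background: array[2, Thread[void]]
    createThread(background[0], redisPubsubThread)
    createThread(background[1], heartbeatThread)

    var router: Router
    router.get("/ws", upgradeHandler)

    # newServer starts 1 socket thread plus the worker threads when served.
    # (workerThreads is my reading of the parameter name; check Mummy's docs.)
    let server = newServer(router, websocketHandler, workerThreads = 2)
    server.serve(Port(8080))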
Nice!
The documentation notes that "handlers" are free to take over the thread and do lengthy processing, citing blocking PostgreSQL queries as an example of one of the main advantages over async. What happens once the "few" worker threads are all occupied, i.e. if each of those blocking queries ties up a thread?
I have put together a simplified example WebSocket server that works in the same way as the production server this post is about.
The example can be seen at https://github.com/guzba/mummy/blob/master/examples/advanced_websockets.nim
This is not exactly simple but at ~200 lines it is hopefully something those interested can get through and get some value out of. I can add more comments or answer more questions from here now that we have some code I can reference.
Note that this does use threads and global memory, so some locks etc. are required.
There is a lot to this and I'm sure you're aware of it all too. Happy to answer more questions if you're interested. My main thesis is more "pro-thread" than "anti-async", to be clear. I just like blocking code, maybe I'm crazy.
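For anyone who doesn't want to read the full ~200 lines, here is a trimmed-down sketch of the locks + global memory pattern, assuming a lock-protected global seq of connections. Removing closed connections and the actual event routing are elided; the linked example is the real reference.

    import mummy, mummy/routers
    import std/[locks, os, typedthreads]

    var
      connectionsLock: Lock
      connections: seq[WebSocket] # Shared state, guarded by connectionsLock.

    initLock(connectionsLock)

    proc upgradeHandler(request: Request) =
      let websocket = request.upgradeToWebSocket()
      {.gcsafe.}: # Global GC'd state shared across threads, hence the lock.
        withLock connectionsLock:
          connections.add(websocket)

    proc websocketHandler(
      websocket: WebSocket, event: WebSocketEvent, message: Message
    ) =
      if event == CloseEvent:
        # The real example also drops the closed connection from
        # `connections`; that bookkeeping is elided in this sketch.
        discard

    proc heartbeatThread() {.thread.} =
      # Background thread broadcasting to every tracked connection.
      while true:
        sleep(60_000)
        {.gcsafe.}:
          withLock connectionsLock:
            for websocket in connections:
              websocket.send("heartbeat")

    var heartbeat: Thread[void]
    createThread(heartbeat, heartbeatThread)

    var router: Router
    router.get("/ws", upgradeHandler)

    let server = newServer(router, websocketHandler)
    server.serve(Port(8080))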
This is super encouraging to hear. I was really tempted recently to see if I could get away with putting some Nim into production at work, but I'm holding off for the moment.
One thing I didn't see last time I searched was Nim libraries for OpenTelemetry. If I can find the time I might try to work on an implementation.
Eventually the server will OOM. This is totally the same as async though, IMO, with caveats as always.
Unmanaged queues are usually a pattern to be avoided due to the bad failure mode (OOM), but also because clients can time out and generate unnecessary load. Often it's better to apply (incremental) backpressure, e.g. returning HTTP 429 early.
Unfortunately unmanaged queues are present in many async implementations.
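To make that concrete, here is a rough sketch of early backpressure in a Mummy handler. The in-flight counter, limit, and route are made up for illustration, and the handler API follows Mummy's README.

    import mummy, mummy/routers
    import std/atomics

    const maxInFlight = 1_000 # Hypothetical limit, purely for illustration.

    var inFlight: Atomic[int]

    proc workHandler(request: Request) =
      # Shed load early instead of letting an unbounded queue build up.
      if inFlight.load(moRelaxed) >= maxInFlight:
        var headers: HttpHeaders
        headers["Retry-After"] = "1"
        request.respond(429, headers, "Too Many Requests")
        return
      discard inFlight.fetchAdd(1, moRelaxed)
      try:
        # ... the actual (possibly blocking) work would happen here ...
        var headers: HttpHeaders
        headers["Content-Type"] = "text/plain"
        request.respond(200, headers, "OK")
      finally:
        discard inFlight.fetchSub(1, moRelaxed)

    var router: Router
    router.get("/work", workHandler)

    let server = newServer(router)
    server.serve(Port(8080))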
I want to highlight that this is based on the mummy webserver that does not use Nim's asyncdispatch, but instead uses OS threads, ORC memory management, and a custom HTTP parser. It's all forward-looking to Nim 2.0. It has a really cool model where all IO is handled asynchronously in one thread, but all the real CPU work is done in a fixed thread pool. This means that there is no "what color is your function" problem. Operations that are not async, such as DNS lookup, file reading, or any OS call, just work. This also means that this server can't really succumb to many of the performance issues that can happen to a single thread-per-connection server or an async-but-single-threaded server. It's a cool combination of both.
I also want to highlight that this type of WebSocket support is pretty rare. Most web servers treat WebSockets as an escape hatch, where they just hand you a TCP socket after they do the initial HTTP handshake. But not with mummy. WebSocket support is first class. Each message gets put into the thread work pool. New messages don't run before old messages are finished. There is no async reading loop. Everything is very fast.
The upcoming Nim 2.0, with --threads:on and --mm:orc as the new defaults, is uniquely suited to these performance gains. It's very cool to see real-world benchmarks. It's easy to have synthetic benchmarks win, but nothing beats real-world experience. This also shows just how many connections can be served, even with a modest VM. Maybe as an industry we should spend less time on cloud, Kubernetes, and microservices, and more time in the profiler optimizing things. Maybe then Twitter could actually run on one machine if it were written in Nim.
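For anyone who wants to try this before 2.0 lands, these flags opt in to the same setup on current Nim (assuming a main module named server.nim):

    nim c -d:release --threads:on --mm:orc server.nim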