Hey all, I recently migrated a bunch of production HTTP traffic to a new server written in Nim using Mummy, Ready, JSONy and Curly.
The results have been great so I want to share some concrete details about the real-world performance that can be achieved today using nothing but open source Nim code to build the server.
The context: The new server is currently handling around 350 requests per second sustained, with peaks at about double that. The hardware it runs on is a 2 vCPU + 2GB RAM virtual machine, and it's only using ~10% of the CPU and less than 10% of the RAM in top. This is not a big VM, and even so it barely notices the traffic.
The VM is in Google Cloud, behind a load balancer which handles HTTPS termination. This is a "load balancer" in name only, since just one VM is able to easily handle all the traffic.
This is an API server so it is JSON-in-JSON-out with some Redis and database / HTTP RPC calls in the endpoint handlers.
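To give a flavor of what such an endpoint looks like, here is a minimal sketch of a JSON-in/JSON-out Mummy handler using JSONy. The request/response types and the /add route are made up for illustration; the Mummy and JSONy calls follow their public READMEs:

```nim
import mummy, mummy/routers, jsony

type
  AddRequest = object   # hypothetical request payload
    a, b: int
  AddResponse = object  # hypothetical response payload
    sum: int

proc addHandler(request: Request) =
  # Parse the JSON body, do the work, serialize the JSON response.
  let payload = request.body.fromJson(AddRequest)
  let response = AddResponse(sum: payload.a + payload.b)
  var headers: HttpHeaders
  headers["Content-Type"] = "application/json"
  request.respond(200, headers, response.toJson())

var router: Router
router.post("/add", addHandler)

let server = newServer(router)
server.serve(Port(8080))
```

In a real service the handler body would also make the Redis, database, and HTTP RPC calls mentioned above; since Mummy handlers run on worker threads, those calls can simply block.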
I have had the process running for 48+ hours between deploying updates and the memory use has been perfectly stable, so memory leaks do not appear to be an issue.
My goal with this post is to share that 1) Nim is being used in production with hundreds of thousands of users, and 2) Nim 1.6.10 with --threads:on --mm:orc + Mummy is performing great. For me, Mummy, Ready etc. have proven reliable for building real services.
So why is this of interest? Well, it's pretty rare to hear about the performance of real production services. Usually it's just benchmarks of artificial scenarios, which are of murky value.
Because every service is different it is hard to make comparisons. However, this post does give you evidence that Nim is capable of thousands of requests per second per vCPU. I have every reason to believe a 4, 8, 12, 16+ vCPU VM would scale throughput linearly too, all else being equal. One machine running a Nim server handling tens of thousands of real-world HTTP requests per second is totally possible, likely much more than that.
Some HTTP performance links as food for thought:
GitHub processes 2.8 billion API requests per day, peaking at 55k requests per second
Production Twitter on One Machine? 100Gbps NICs and NVMe are fast
Really cool to see (impressive) real world numbers from a framework!
Have you tried running any stress tests to see what the maximum performance of a single server is?
There are some stress tests for mummy, see the readme: https://github.com/guzba/mummy#benchmarks
But I don't think synthetic stress tests are useful here, beyond telling you it's good enough. They pretty much converge to the same value. If you just run a web server that does practically nothing, the async stuff will be faster. But as I learned the hard way, a single blocking or CPU-intensive operation can really lower the performance of your async server.
Blocking or CPU-intensive operations are really hard to find and get rid of. Any library, OS call, or API call can be one. Are you talking to a database (can randomly block)? Are you talking to an API (DNS resolution blocks)? Are you parsing JSON (can be CPU intensive for large payloads)? Are you reading a file (blocks)?
This is where Mummy wins, because each request goes onto its own thread and can run on its own core. Unlike the old thread-per-connection server model, the socket I/O is done in a dedicated reactor/async thread, which isolates the HTTP reading/writing from your handler code. Your code runs in a worker thread and is free to block or do CPU-heavy work, and with modern computers having so many cores this is a good fit. And Mummy can have 1000 threads... that's why it works.
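As a sketch of that model: a handler that blocks is harmless because only its worker thread waits, while the dedicated I/O thread keeps accepting and reading other connections. The `workerThreads` parameter below matches Mummy's README; the count of 100 is an arbitrary choice for illustration:

```nim
import mummy, mummy/routers

proc slowHandler(request: Request) =
  # This handler may block on a database, DNS, file I/O, or chew CPU.
  # That's fine: it runs on a worker thread, not the I/O thread.
  var headers: HttpHeaders
  headers["Content-Type"] = "text/plain"
  request.respond(200, headers, "done")

var router: Router
router.get("/slow", slowHandler)

# One dedicated thread does the socket I/O; a pool of worker threads
# runs the handlers, so one blocked request doesn't stall the rest.
let server = newServer(router, workerThreads = 100)
server.serve(Port(8080))
```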
Stories like this are great to hear! I guess I should've shared how I ran Nim in production at my old job. It was part of a DNS server and handled about a thousand requests a second (with decryption and verification of messages, and logging via Kafka, using both async and threading). It ran stable and as far as I know still does. It was also split in two parts from the same codebase: one cross-compiled to Windows and Linux, and the server part only running on Linux (I never tried to compile it for Windows, but I don't see why that wouldn't work).
It's really nice that you're sharing this, more people need to hear these stories. Just out of curiosity, how much does that machine cost you a month? For companies it would be nice to show how using a better language can lower costs. I'm currently running a couple of small web servers and worker scripts on the lowest Linode tier for work, and it's like they're not even there (I'd love to give performance numbers, but just running htop over ssh skews the CPU numbers).
That's great! I might contribute code that lets the server listen on multiple sockets at the same time and use socket file descriptors inherited from the environment.
I'm struggling with HTTP servers in Nim because of that. Ideally I'd want to do systemd socket activation, and I also have services listening on both IPv4 and IPv6 on non-wildcard addresses, which makes two sockets.
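For context, with systemd socket activation the sockets are opened by systemd and passed to the service as inherited file descriptors (starting at fd 3, with the count in the LISTEN_FDS environment variable), so the server never calls bind/listen itself. A sketch of the socket unit for the two-socket case above (the unit name and addresses are made-up examples; 192.0.2.x and 2001:db8:: are documentation address ranges):

```ini
# myapi.socket - systemd binds these and hands the fds to the service
[Socket]
ListenStream=192.0.2.10:8080
ListenStream=[2001:db8::10]:8080

[Install]
WantedBy=sockets.target
```

The matching myapi.service would then need the server to accept pre-opened file descriptors instead of binding its own, which is exactly the feature being proposed here.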
For static files, I would have thought of something like X-Sendfile or X-Accel-Redirect:
https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/
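To spell out the X-Accel-Redirect pattern: the app server responds with only a header naming an internal path, and nginx serves the file itself, so large static transfers never occupy an app worker. A sketch of the nginx side (the /protected/ prefix and paths are arbitrary examples):

```nginx
# An internal location only reachable via X-Accel-Redirect,
# never directly from a client request.
location /protected/ {
    internal;
    alias /var/www/files/;
}
```

The application would respond with a header like `X-Accel-Redirect: /protected/report.pdf` and an empty body, after doing its auth checks.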