Hi. I recently came across Nim. Andreas's great presentation about Nim(rod)'s metaprogramming capabilities convinced me that this language really deserves attention. I have spent many years playing with C++ templates, but Nim offers so much more. Especially interesting is building templates (such as HTML templates) out of the AST and some high-level constructs.
So, going back to the point: Nim seems like it could be the perfect high-performance solution for advanced web services, thanks to a great language and the performance of C... but according to my own benchmarks, right now IT ISN'T.
It seems AsyncHTTPServer is currently single-threaded. Unfortunately, I couldn't find any clue as to whether it can be made multi-threaded so that Nim can match the best frameworks. For me, proving that Nim is as fast as native C or Java solutions is the main selling point, since I get:
So again, I need some clues on how we can make AsyncHTTPServer reach 60k req/sec. Also, I don't want to introduce any extra dependency, such as proxying several Nim instances with Nginx.
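For readers who have not used it, the kind of hello-world server under discussion can be sketched roughly as follows. This follows the standard `asynchttpserver` module example and is not the exact benchmark code; the port and the response body are assumptions.

```nim
import asynchttpserver, asyncdispatch

# One server object, driven by a single-threaded async event loop.
var server = newAsyncHttpServer()

# Called once per request; every request gets the same plain-text reply.
proc cb(req: Request) {.async.} =
  await req.respond(Http200, "Hello World")

# Port 8080 is an arbitrary choice for this sketch.
waitFor server.serve(Port(8080), cb)
```

Everything here runs on one thread, which is why the req/sec ceiling is set by a single core.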
I did what I could and documented it here https://github.com/def-/nim-http-speedup and the accompanying PR: https://github.com/Araq/Nim/pull/2244
This is still single threaded, but performance should be improved 3-4 times for your Hello World test, which should beat the other frameworks even single-threaded.
Still, the main problem is that Nim's async implementation makes many GC-managed heap allocations and copies, and turns procs into closures. I don't know if there's a way around that.
I've recently talked to k1i on IRC and he's looking into speeding up Nim's HTTP server, mainly by multithreading and edge-triggered epoll/kqueue.
Alright, I merged your pull request into my local repo and re-ran the test. I got almost a 2x performance boost; however, it is still below the Java/C/Go benchmarks, see: https://github.com/nanoant/WebFrameworkBenchmark
So it is really impressive for a single-threaded server, but it could be even better if we ran it multithreaded. Once this lands in the main Nim branch, we can ask TechEmpower to update their benchmark at https://www.techempower.com/benchmarks/ hopefully making Nim jump into 1st position ;)
dom96 wrote: Out of curiosity, how does the Nim standard implementation compare to @def's PR with the mark and sweep GC?
@def's PR with the mark & sweep GC reaches 67 330 req/sec, while unpatched Nim with the standard GC reaches 28 994 req/sec.
One has to be careful with assessing the results of such benchmarks. Crucially, once you start looking at large heaps (multiple GB), then mark-and-sweep GCs can quickly become less attractive.
A common scenario where mark-and-sweep doesn't perform well is one where you have, say, 2GB of memory that is never collected. This means 2GB of extra marking work (and generally, pauses that last multiple seconds) for each collection. The deferred reference counting collector only needs to touch the actual changing part of the heap (except for cycles, but you usually can avoid cycles).
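To put rough numbers on that, here is a back-of-the-envelope cost model, not a real collector. The `HeapProfile` type, both procs, and the 50 MiB churn figure are hypothetical; only the 2GB of long-lived data comes from the scenario above, and cycle collection is ignored.

```nim
type
  HeapProfile = object
    liveBytes: int64    # long-lived data that survives every collection
    churnBytes: int64   # data allocated and dropped between collections

# Mark-and-sweep must trace everything reachable, so marking work
# grows with the live heap even when that data never changes.
proc markWork(h: HeapProfile): int64 =
  h.liveBytes

# Deferred reference counting (ignoring cycle collection) only
# revisits the part of the heap that changed since the last collection.
proc deferredRcWork(h: HeapProfile): int64 =
  h.churnBytes

const MiB = 1024'i64 * 1024
const GiB = 1024'i64 * MiB

let h = HeapProfile(liveBytes: 2 * GiB, churnBytes: 50 * MiB)
echo "mark-and-sweep traces ", markWork(h) div MiB, " MiB per collection"
echo "deferred RC touches ", deferredRcWork(h) div MiB, " MiB per collection"
```

In a hello-world benchmark the live heap is tiny, so this cost never shows up in the req/sec figures.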
I'll also add that I personally wouldn't run any mission-critical internet-facing service with -d:release. At the very least I'd enable the usual memory safety checks so that buffer overflows etc. result in an error rather than a potential exploit.
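One way to keep such checks on for sensitive code, whatever the global build settings are, is Nim's `push`/`pop` pragmas. The following is only a sketch: the `byteAt` proc and the `packet` data are hypothetical, and turning the checks back on for the whole build via compiler switches is an equally valid choice.

```nim
{.push boundChecks: on.}
proc byteAt(buf: openArray[byte], i: int): byte =
  ## Returns buf[i]; with bound checks on, an out-of-range index
  ## raises an error instead of reading adjacent memory.
  buf[i]
{.pop.}

let packet = [byte 1, 2, 3]
try:
  discard byteAt(packet, 10)   # index is past the end of the buffer
except Exception:
  echo "out-of-range read rejected"
```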
Works for me, updated results for Nim 0.11:
https://github.com/nanoant/WebFrameworkBenchmark
https://github.com/nanoant/WebFrameworkBenchmark/commit/e08ec2d989
Looking at your results again I do wonder, why do MB/s differ so much? Shouldn't the results be ordered by that?
It doesn't seem like your benchmarks transfer the same data.
Because some frameworks add their own headers, such as Server and Date, and others do not. Still, I don't think this has much impact on overall performance. This is all about req/sec, not MB/s. But to prove that, I'd need to make all frameworks emit the same headers, which could be tricky.
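The relationship is simple arithmetic: MB/s is req/sec multiplied by the bytes sent per response. A small sketch, in which the `mbPerSec` proc and both response sizes are hypothetical and not taken from the benchmark:

```nim
# MB/s follows from req/sec and the size of each response on the wire.
proc mbPerSec(reqPerSec: float, bytesPerResponse: int): float =
  reqPerSec * float(bytesPerResponse) / 1_000_000.0

# Hypothetical sizes: a bare response vs. one with extra Server/Date headers.
let lean = mbPerSec(60_000.0, 80)
let chatty = mbPerSec(60_000.0, 160)
echo lean, " MB/s vs ", chatty, " MB/s at the same 60000 req/sec"
```

Two frameworks serving the same number of requests can therefore report MB/s figures that differ by a factor of two purely because of headers.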
Thank you very much for your initiative, @ono!
I'm a big fan of Nim (since 2012!), though so far mostly from the sidelines: I'm trapped using scripting languages, and rarely find economic justification to use a compiled static language (except occasionally C). To justify using Nim, it would need to be a champ at high-productivity development of highly scalable server-side APIs - significantly faster than things like OpenResty (LuaJIT in nginx), PUBE (pypy + uwsgi + bottle + (e)nginx), etc.
This is why I think benchmarks like TechEmpower as well as yours are very important. I hope this leads to focused optimization work and improved Nim performance in future rounds, which I believe will bring it much recognition.
A Facebook post I've made on my Copyfree page showing your results has had over a thousand views! (Official Web-site for the Copyfree Initiative is copyfree.org.)
Unfortunately, Nim does not score well, because it is single-threaded, compared to the other multi-threaded solutions. Still, Araq told me there are some real plans for a multi-threaded async HTTP server. So I'm keeping my fingers crossed.