nimforum mirror - Making AsyncHTTPServer multi-threaded?

ono [2015-04-13T16:46:00+02:00]

Hi. I've recently come across Nim. I was convinced by Andreas and his great presentation about Nim(rod)'s metaprogramming capabilities that this language really deserves attention. I have spent many years playing with C++ templates, but Nim offers so much more. Especially interesting is building templates (such as HTML templates) out of the AST and some high-level constructs.

So, getting to the point: Nim seems to be (or could be) a perfect high-performance solution for advanced web services, thanks to a great language and the performance of C... but according to my own benchmarks, right now IT ISN'T.

It seems AsyncHTTPServer is currently single-threaded. Unfortunately, I couldn't find any hint as to whether it can be made multi-threaded so that Nim can match the best frameworks. For me, proving that Nim is as fast as native C or Java solutions is the main selling point, since I get:

great & clear high-level syntax

great performance

no havoc and no dependencies

a self-contained binary app once built

So again, I need some pointers on how we can make AsyncHTTPServer reach 60k req/sec. Also, I don't want to introduce any extra dependencies, such as proxying several Nim instances with Nginx.

def [2015-04-13T17:52:43+02:00]

I did what I could and documented it here: https://github.com/def-/nim-http-speedup with the accompanying PR: https://github.com/Araq/Nim/pull/2244

This is still single-threaded, but performance should be improved 3-4 times for your Hello World test, which should beat the other frameworks even single-threaded.

Still, the main problem is that Nim's async implementation makes many GC-collected heap allocations and copies, and turns procs into closures. I don't know if there's a way around that.

I've recently talked to k1i on IRC and he's looking into speeding up Nim's HTTP server, mainly by multithreading and edge-triggered epoll/kqueue.

ono [2015-04-13T18:17:24+02:00]

Alright, I merged your pull request into my local repo and re-ran the test. I got an almost 2x performance boost; however, it is still below the Java/C/Go benchmarks, see: https://github.com/nanoant/WebFrameworkBenchmark

So it is really impressive for a single-threaded server, but it could be even better if we ran it multithreaded. Once this lands in the main Nim branch, we can ask TechEmpower to update their benchmark at https://www.techempower.com/benchmarks/ and hopefully make Nim jump into 1st position ;)

def [2015-04-13T18:36:58+02:00]

You should also use --gc:markandsweep or --gc:boehm for more performance.

ono [2015-04-13T18:56:43+02:00]

OMG OMG! With --gc:markandsweep it is now faster than Nginx and matches multithreaded Java's Undertow while being single-threaded! 67k req/sec. It has to be updated on https://www.techempower.com/benchmarks and we need to spread this great news to the web developer community.

dom96 [2015-04-13T21:59:20+02:00]

@ono Out of curiosity, how does the Nim standard implementation compare to @def's PR with the mark and sweep GC?

ono [2015-04-13T22:07:52+02:00]

dom96 wrote: Out of curiosity, how does the Nim standard implementation compare to @def's PR with the mark and sweep GC?

@def's PR with the mark & sweep GC reaches 67 330 req/sec, while unpatched Nim with the standard GC reaches 28 994 req/sec.
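
As a quick check of how those two figures relate, the ratio can be computed directly. This is only a sketch: the variable names are made up for illustration, and the numbers are the ones quoted above.

```nim
import strutils

let patched = 67_330.0   # req/sec: @def's PR with the mark & sweep GC
let baseline = 28_994.0  # req/sec: unpatched Nim with the standard GC

# Ratio of the two throughput figures, formatted to two decimals.
echo formatFloat(patched / baseline, ffDecimal, 2)  # prints 2.32
```

So the patched build handles roughly 2.3 times as many requests per second in this particular Hello World test.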

Jehan [2015-04-13T23:46:00+02:00]

One has to be careful with assessing the results of such benchmarks. Crucially, once you start looking at large heaps (multiple GB), then mark-and-sweep GCs can quickly become less attractive.

A common scenario where mark-and-sweep doesn't perform well is one where you have, say, 2GB of memory that is never collected. This means 2GB of extra marking work (and generally, pauses that last multiple seconds) for each collection. The deferred reference counting collector only needs to touch the actual changing part of the heap (except for cycles, but you usually can avoid cycles).

I'll also add that I personally wouldn't run any mission-critical internet facing service with -d:release. At the very least I'd enable the usual memory safety checks so that buffer overflows etc. result in an error rather than a potential exploit.
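
One possible way to keep a safety check on in optimized code is sketched below, assuming a hypothetical helper `at` that indexes into a buffer. The `push`/`pop` pragmas turn bounds checking on for the enclosed proc regardless of the global setting; passing `--boundChecks:on` on the command line next to `-d:release` should be another route. Neither is taken from the post above, which does not say which switches to use.

```nim
# Hypothetical helper, for illustration only.
# Bounds checking is forced on for everything between push and pop,
# even when the rest of the program is built with -d:release.
{.push boundChecks: on.}
proc at(buf: seq[int]; i: int): int =
  result = buf[i]
{.pop.}

let data = @[10, 20, 30]
echo at(data, 1)  # in-range access behaves as usual
```

With the check enabled, an out-of-range index raises an exception instead of silently reading past the end of the buffer.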

Sixte [2015-04-14T00:02:20+02:00]

Is it possible to circumvent the cycle checking in Nim's GC? (In Rust, the ref-counted GC was removed Sept/14 ... and the current RC doesn't check for cycles.)

Jehan [2015-04-14T00:10:07+02:00]

The compiler normally infers from the types whether cycle checking is necessary. You can use the {.acyclic.} pragma to tell the compiler that a data structure doesn't contain cycles, too.
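
A minimal sketch of the pragma, assuming a hypothetical binary tree type (`Node`, `NodeObj`, `newNode` and `count` are illustrative names, not from this thread). The type refers to itself, so it could form cycles in principle; the pragma states that it never will.

```nim
type
  Node = ref NodeObj
  NodeObj {.acyclic.} = object
    # Children only ever point "downwards", so no cycle can form.
    left, right: Node
    data: string

proc newNode(data: string; left, right: Node = nil): Node =
  Node(left: left, right: right, data: data)

proc count(n: Node): int =
  ## Counts the nodes in the tree rooted at `n`.
  if n != nil:
    result = 1 + count(n.left) + count(n.right)

let tree = newNode("root", newNode("a"), newNode("b"))
echo count(tree)  # prints 3
```

The pragma is a promise to the compiler: if such a type does end up in a cycle at run time, that memory may leak.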

ono [2015-05-02T14:27:58+02:00]

Works for me, updated results for Nim 0.11:

https://github.com/nanoant/WebFrameworkBenchmark

https://github.com/nanoant/WebFrameworkBenchmark/commit/e08ec2d989

dom96 [2015-05-02T17:21:45+02:00]

Looking at your results again I do wonder, why do MB/s differ so much? Shouldn't the results be ordered by that?

It doesn't seem like your benchmarks transfer the same data.

ono [2015-05-05T18:57:41+02:00]

Looking at your results again I do wonder, why do MB/s differ so much? Shouldn't the results be ordered by that?

Because some frameworks tend to add their own headers, such as Server and Date, and some others don't. Yet I don't think this has much impact on overall performance. This is all about req/sec, not MB/s. But to prove that, I'd need to make all frameworks emit the same headers, which could be tricky.
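
The relationship between the two columns can be sketched as throughput = requests per second times bytes per response. The figures below are hypothetical, chosen only to show that two servers with the same req/sec report different MB/s when one of them sends extra headers; `megabytesPerSec` is an illustrative helper.

```nim
proc megabytesPerSec(reqPerSec, bytesPerResponse: int): float =
  ## Throughput in MB/s (1 MB = 1_000_000 bytes) for a given request rate.
  result = float(reqPerSec) * float(bytesPerResponse) / 1_000_000.0

# Hypothetical: both servers answer 60_000 req/sec, but the second one
# adds 60 bytes of Server and Date headers to every response.
echo megabytesPerSec(60_000, 100)  # about 6 MB/s
echo megabytesPerSec(60_000, 160)  # about 9.6 MB/s
```

Ordering by MB/s would therefore reward larger responses rather than faster request handling.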

Libman [2015-05-06T18:32:19+02:00]

Thank you very much for your initiative, @ono!

I'm a big fan of Nim (since 2012!), though so far mostly from the sidelines: I'm trapped using scripting languages, and rarely find economic justification to use a compiled static language (except occasionally C). To justify using Nim, it would need to be a champ at high-productivity development of highly scalable server-side APIs, significantly faster than things like OpenResty (LuaJIT in nginx), PUBE (pypy + uwsgi + bottle + (e)nginx), etc.

This is why I think benchmarks like TechEmpower as well as yours are very important. I hope this leads to focused optimization work and improved Nim performance in future rounds, which I believe will bring it much recognition.

A Facebook post I've made on my Copyfree page showing your results has had over a thousand views! (Official Web-site for the Copyfree Initiative is copyfree.org.)

ono [2015-05-18T01:00:10+02:00]

FYI, I updated my benchmark using a more server-like machine: my Linux workstation with a 6-core Xeon CPU.

https://github.com/nanoant/WebFrameworkBenchmark

Unfortunately, Nim does not score well compared to the multi-threaded solutions, because it is single-threaded. Yet Araq told me there are real plans for a multi-threaded async HTTP server, so I'm keeping my fingers crossed.

dom96 [2015-05-18T01:15:19+02:00]

@ono could you test to see how well Jester performs?

ono [2015-05-18T10:19:26+02:00]

@dom96 I did, but the number was really bad, so I skipped Jester for the moment. I have a similar problem with some Node frameworks on this Linux machine: they get around 2.5k req/sec, while I would expect something around 80k req/sec. There are some stalls, maybe a kernel bug or misconfiguration; I haven't figured it out yet.

dom96 [2015-05-18T15:32:17+02:00]

@ono How long ago did you try it? I recently optimised it a bit.

ono [2015-05-18T21:53:50+02:00]

@dom96: I managed to update Jester and rerun the benchmark; now it scores around 80% of the performance of the pure async HTTP module. Not bad, not bad. But now it would be great to try to reach 90% :)

Mirror of forum.nim-lang.org

1125 :: Making AsyncHTTPServer multi-threaded?