I'm happy to announce that GuildenStern 0.9 has just been released:
https://github.com/olliNiinivaara/GuildenStern
If you are interested in multithreading or web programming, please check it out.
I'll publish 1.0 to nimble after serious bugs (if any) have been detected and fixed.
My original idea was to use Weave, but as its support for low latency is still in the research phase, I stuck with a threadpool for now. But the opportunity is there...
Any feedback welcome.
Cheers!
For those interested in latency-optimized / IO-optimized multithreading, here are a couple of interesting pointers.
A budget system to prevent one task from hogging the scheduler, as used in Tokio, Rust's main multithreaded IO runtime.
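To make the idea concrete, here is a toy sketch of a budget-limited round-robin scheduler (this is just an illustration of the concept, not Tokio's actual mechanism; `BUDGET` and the task names are made up):

```python
# Toy illustration of the "task budget" idea: each task may perform at
# most BUDGET units of work per scheduling turn, after which it must
# yield back to the scheduler, so no single task can monopolize it.
from collections import deque

BUDGET = 4  # hypothetical per-turn work budget

def run(tasks):
    """Round-robin scheduler; tasks are generators yielding once per work unit."""
    queue = deque(tasks)
    order = []
    while queue:
        name, task = queue.popleft()
        for _ in range(BUDGET):
            try:
                next(task)                  # perform one unit of work
                order.append(name)
            except StopIteration:
                break                       # task finished: drop it
        else:
            queue.append((name, task))      # budget exhausted: requeue

    return order

def work(units):
    for _ in range(units):
        yield

# A "greedy" 10-unit task no longer starves the short 2-unit one:
print(run([("greedy", work(10)), ("short", work(2))]))
```

Without the budget the greedy task would run to completion first; with it, the short task gets a turn after four units.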
For IO, I find the documentation on Windows I/O Completion Ports (IOCP) quite interesting:
So basically they advocate a threadpool waiting on an IOCP (or APC, Asynchronous Procedure Call: https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-queueuserapc), and when any completion is ready, processing continues on an available thread.
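A rough userland analogue of that pattern (a real IOCP is a kernel object; the `queue.Queue` here is just a stand-in for illustration): all worker threads block on one completion queue, and whichever thread is free picks up the next completed operation.

```python
# Worker threads all block on a shared "completion port"; any free
# thread dequeues and handles the next completed IO operation.
import queue
import threading

completion_port = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        item = completion_port.get()
        if item is None:              # sentinel: shut this worker down
            return
        with lock:
            results.append(f"handled {item}")

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for i in range(8):                    # pretend 8 IO operations completed
    completion_port.put(i)
for _ in threads:                     # one shutdown sentinel per worker
    completion_port.put(None)
for t in threads:
    t.join()

print(sorted(results))
```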
On Linux, it seems way messier with epoll, io_uring or even AIO:
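As a minimal illustration of the readiness style that epoll exposes, here is a sketch using Python's `selectors` module (which wraps epoll on Linux): the kernel reports that a descriptor is readable, and we then perform the non-blocking read ourselves — in contrast to completion-based APIs like io_uring or IOCP, where the kernel hands back an already-finished operation.

```python
# Single-threaded readiness loop: register a socket, wait for the
# kernel to report it readable, then read without blocking.
import selectors
import socket

sel = selectors.DefaultSelector()        # epoll under Linux
a, b = socket.socketpair()
a.setblocking(False)
sel.register(a, selectors.EVENT_READ)

b.sendall(b"hello")                      # make `a` readable

received = None
for key, events in sel.select(timeout=1):
    received = key.fileobj.recv(1024)    # readiness reported: won't block

sel.unregister(a)
a.close()
b.close()
print(received)
```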
From those I get that an efficient multithreading runtime for IO will need:
A couple of very interesting Rust write-ups on this topic:
- https://aturon.github.io/blog/2016/08/11/futures/
- https://aturon.github.io/blog/2016/09/07/futures-design/
- https://jblog.andbit.net/2019/11/10/rust-async-execution/
- https://tmandry.gitlab.io/blog/posts/optimizing-await-1/
In particular, they distinguish between traditional completion-based futures and Rust's own poll-based futures. Completion-based futures require a buffer per future and therefore more memory allocations, which are problematic because they stress the GC and lead to memory fragmentation in long-running applications. The poll approach is attractive because it eases cancellation (just don't poll) and, since there is no heap indirection for the future, the compiler can do deep optimizations.
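A toy model of the poll-based design may help (all names here are made up for illustration; real Rust futures also involve wakers so the executor doesn't busy-poll): the executor repeatedly polls, the future returns a pending marker until its result is ready, and cancellation is literally "stop polling".

```python
# Minimal poll-based future and a trivial blocking executor.
PENDING = object()   # sentinel meaning "not ready yet"

class CountdownFuture:
    """Becomes ready after being polled `ticks` times."""
    def __init__(self, ticks, value):
        self.ticks, self.value = ticks, value

    def poll(self):
        self.ticks -= 1
        return self.value if self.ticks <= 0 else PENDING

def block_on(future, max_polls=100):
    """Poll in a loop until the future is ready."""
    for _ in range(max_polls):
        result = future.poll()
        if result is not PENDING:
            return result
    # Cancellation in this model is simply ceasing to poll:
    raise TimeoutError("gave up polling")

print(block_on(CountdownFuture(3, "done")))
```

Note there is no per-future result buffer handed to the kernel and no callback registration; the state lives inside the future object itself, which is what enables the optimizations mentioned above.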
Note: this is from the perspective of writing a full-blown high-performance multithreaded IO runtime. As you can see, there are many design tradeoffs to consider: futures API (completion- vs poll-based), kernel vs userspace (task budget tracking), OS event primitives and resumable function primitives.
Obviously you can always go the current way, threadpool + the current async, which can already give decent performance (GuildenStern and Httpbeast use this).
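In Python terms, that baseline amounts to dispatching blocking handlers onto a pool and collecting the results (`handle` is a stand-in for real request handling):

```python
# The "threadpool" baseline: hand each request to a pool of worker
# threads and gather the responses.
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    return f"response to {request}"   # placeholder for real work

with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle, ["req1", "req2", "req3"]))

print(responses)
```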
The service I'm using (http://htmlpreview.github.io/) is apparently facing the kiss of death due to Nim Forum popularity.
Try again when traffic goes down, and the docs should be OK.
That's a lot of knowledge to digest... Let's not hijack this thread, but continue the research within Weaver's guild. But I cannot resist adding yet another interesting piece here: https://kristoff.it/blog/zig-colorblind-async-await
Httpbeast and Guildenstern are quite different beasts under the hood.
Httpbeast runs one selector loop per physical core and handles requests with async/await and non-blocking I/O.
GuildenStern runs one selector loop and spawns a new thread for every request, using blocking I/O and no async/await constructs.
In theory, Httpbeast fares better when there are few cores, when writes may block, or when there are lots of "light" requests. GuildenStern should dominate when many CPU cores are available, a reverse proxy is used to buffer outgoing writes, and requests are "heavy" (requiring lots of CPU work).
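The GuildenStern-style model described above can be sketched like this (heavily simplified: a real server would accept in a loop and handle errors; this one serves a single local request to show the shape):

```python
# One accept loop spawns a fresh thread per connection; each thread
# uses plain blocking I/O, no async/await.
import socket
import threading

def handle(conn):
    with conn:
        data = conn.recv(1024)           # blocking read, fine in its own thread
        conn.sendall(b"echo: " + data)   # blocking write

server = socket.socket()
server.bind(("127.0.0.1", 0))            # OS-assigned free port
server.listen()
port = server.getsockname()[1]

def serve_one():
    conn, _ = server.accept()
    threading.Thread(target=handle, args=(conn,)).start()

threading.Thread(target=serve_one).start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hi")
reply = client.recv(1024)
client.close()
server.close()
print(reply)
```

Because each request owns a whole thread, a slow or CPU-heavy handler blocks only itself — which is exactly the "heavy requests, many cores" case where this design should shine.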
There's a simple benchmark against Httpbeast in the repo.
We are talking about performance differences on a sub-millisecond scale, which don't matter in practical web development - just think of what performance Django or Rails offer...