I decided to try compiling Nim with Cosmopolitan and it worked pretty well!
Cosmopolitan is a C compiler based on GCC that creates fat binaries which run... pretty much anywhere: Mac, Linux, the *BSDs, and Windows, on either arm64 or amd64. I wrote details about how to compile Nim with it on my blog.
It uses some eldritch horrors of hacks to work. However, it has resulted in some pretty impressive projects like llamafile: https://www.theregister.com/2024/04/03/llamafile_performance_gains/
I just wish it had cross platform OpenGL but they haven't ported that (yet).
That's a good question. I haven't run benchmarks but there's no noticeable lag on startup. It has identical boot-time.
The binary has both arm64 and amd64 versions, I believe, so it shouldn't be too slow. Their C library might have some overhead, but from what I've read they've really optimized it.
Turns out most libc's are actually full of crap accrued over decades. So writing a new libc gives some interesting optimization opportunities. Both their forking and threading seem more efficient.
Their threading support in particular looks to be very efficient and can handle thousands of threads on Linux:
Then it somehow only uses 40mb of peak resident memory, and according to htop, greenbean's virtual memory usage is 76,652kb. That's for 9,001 threads. Like redbean, greenbean is able to handle hundreds of thousands of requests per second on my Intel Core i9-9900, except (1) greenbean has better shared memory support, (2) it sets up and tears down connections faster, and (3) it lets you experience the joy of using Mike Burrows' *NSYNC library, which is the basis of Cosmopolitan's POSIX synchronization primitives. If you're not familiar with the man, he's the guy who coded Chubby and Altavista, a global search engine which was so efficient it only needed to operate on a single server. But you wouldn't think *NSYNC is as prolific as it is if you're only going off star count.
Wow, their process handling stuff is pretty impressive!
Running atlas install on my Figuro project after everything is cloned and the binaries are warmed up (and after double-checking that I'm using the correct binaries), I'm actually getting a speed boost of 17%:
Native Atlas: 4.856, 4.844, 4.601, 4.917, 4.907 => 4.825 ± 0.129 s
Cosmo Atlas: 4.068, 4.057, 4.086, 4.218, 4.165 => 4.119 ± 0.070 s
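The summary figures can be checked from the five runs of each binary listed above. This is only a minimal sketch: it assumes the ± values are sample standard deviations (which is what the numbers appear to match), and the variable names are made up for illustration.

```python
from statistics import mean, stdev

# Wall-clock seconds for `atlas install`, copied from the two result lines.
native = [4.856, 4.844, 4.601, 4.917, 4.907]
cosmo = [4.068, 4.057, 4.086, 4.218, 4.165]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    return mean(samples), stdev(samples)


native_mean, native_sd = summarize(native)
cosmo_mean, cosmo_sd = summarize(cosmo)

# Speed boost: how much more work per second the Cosmo build gets through.
speedup_pct = (native_mean / cosmo_mean - 1) * 100
# The same gap expressed as wall-clock time saved per run.
time_saved_pct = (1 - cosmo_mean / native_mean) * 100

print(f"native: {native_mean:.3f} ± {native_sd:.3f} s")
print(f"cosmo:  {cosmo_mean:.3f} ± {cosmo_sd:.3f} s")
print(f"speed boost: {speedup_pct:.1f}%, time saved: {time_saved_pct:.1f}%")
```

Note that the 17% is a throughput figure (native time divided by Cosmo time); measured as wall-clock time saved per run, the same gap is closer to 15%.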
Atlas is a pretty good test case for this, come to think of it, since it does a lot of sub-process invocations, which Cosmo libc apparently has really optimized.
Maybe I'm going to have to try compiling Nim with it... A 17% compiler speed boost could be kinda nice! ;)
It is kind of its own category - it is dynamic in the sense that it uses its own /usr/bin/ape dynamic loader (analogous to ld.so), but static in the sense that all the code it needs to run is self-contained in the one file. From a performance point of view on Linux, binary start-up overhead behaves more like a slower ld.so. So, the run time commentary above is more about "once things are up and running" than static vs. dynamic "overhead". For example,
touch foo.nim # empty file
nim c -d:cosmo -d:danger foo
tim /tmp/true /bin/true ./foo
shows
208.0 +- 2.0 μs (AlreadySubtracted)Overhead # glibc-static dash
33.3 +- 3.0 μs /tmp/true # musl-static trivial return 0;
315 +- 11 μs /bin/true # glibc-dynamic, coreutils
554 +- 10 μs ./foo # cosmocc empty prog
So, you know, your program probably (hopefully?) does more than 200 μs worth of work, in which case start-up cost matters less than the implementations of the libc functions, which may pick better trade-offs for your workload/whatnot.
Thanks for documenting how to do this @elcritch! I tried getting it to work with https://github.com/guzba/mummy, but for some reason the cross compilation doesn't seem to actually work. I am able to compile and run the binary on my Linux x86 machine, but after compiling a fresh binary, not running it, and moving it to my ARM64 Linux machine, I get "cannot execute binary file: Exec format error".
Are you able to reproduce this with one of the Mummy examples out of curiosity? I did have to adjust the compilation flags to say -static -fPIE to get it to even compile locally.
I spent far too much time digging into this - going to document a couple of my thoughts.
First of all, yes - I had an issue with my local Cosmopolitan install. I needed to run a couple of commands from the readme to prevent my binaries from being treated as Windows binaries by Wine.
Using Cosmopolitan with the tips above to compile something like mummy is actually pretty doable. I did need to patch lib/pure/selectors.nim to rely on poll instead of epoll. Epoll is not implemented by Cosmopolitan yet, and I couldn't figure out any way to force Nim to use poll. Setting alternative OSes breaks other parts of the build.
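For anyone unfamiliar with what swapping the selector backend means in practice, here is a rough analogy in Python rather than Nim. Python's standard selectors module has a similar layering: one interface with epoll, kqueue, poll and select implementations behind it. The sketch only shows that the poll-based backend can stand in for the platform default behind the same calls; it says nothing about how lib/pure/selectors.nim chooses its backend, and wait_readable is a hypothetical helper named for this example.

```python
import selectors
import socket


def wait_readable(selector_cls, timeout=1.0):
    """Register one end of a socket pair and report which events fire."""
    left, right = socket.socketpair()
    sel = selector_cls()
    try:
        sel.register(left, selectors.EVENT_READ)
        right.send(b"ping")  # makes `left` readable
        events = sel.select(timeout)
        # For each event: (is it our socket?, is it a read event?)
        return [
            (key.fileobj is left, bool(mask & selectors.EVENT_READ))
            for key, mask in events
        ]
    finally:
        sel.close()
        left.close()
        right.close()


# DefaultSelector is whatever the platform prefers (epoll on Linux).
default_result = wait_readable(selectors.DefaultSelector)
# PollSelector forces the poll(2)-based backend (not available on Windows).
poll_result = wait_readable(selectors.PollSelector)

print("same events from both backends:", default_result == poll_result)
```

The calling code is identical for both backends, which is why a runtime that lacks epoll but has poll can in principle be supported by changing only the backend selection.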
The main issue came when I tried to compile my main application (which uses debby (sqlite) and curly (curl)). I managed to get the third_party folder from Cosmopolitan building to give me a libsqlite3.a file to statically link against, but I was unable to do the same thing for curl. OpenSSL and a bunch of other dependencies are a pain.
I then tried using the cosmo_dlopen functions by patching lib/system/dyncalls.nim. That allowed my app to finally compile and start, but then it started segfaulting right after a SQL query was run. I haven't figured out why yet, and I'll likely leave this as is and continue cross compiling using zigcc.
Seems like there should be a flag to force using poll instead of epoll.
Thanks for the updates! It’s a bit of a nerd snipe.
The lack of curl or SSL libraries to link against is a blocker. It might be possible to use BearSSL or similar and build it in. It seems like the SSL library would need to understand enough of the OS to find the certificate chains. It seems like a problem the Cosmo people would've worked on.
Seems like there should be a flag to force using poll instead of epoll.
There is one: https://github.com/nim-lang/Nim/blob/c6352ce0ab5fef061b43c8ca960ff7728541b30b/lib/pure/selectors.nim#L345
Maybe it needs better documentation.