Hey all, just wanted to share the result of the work I've done recently here in the hope it helps people. Zippy https://github.com/guzba/zippy/ is a new, pure Nim implementation of the deflate compression algorithm and the gzip and zlib data formats.
I hope this will save people a bunch of trouble fighting with wrappers for zlib, etc., which I have fought with myself (I work a lot on Windows, which makes this extra painful).
There are a couple of simple examples of using Zippy to handle gzip in HTTP client and HTTP server scenarios in the /examples dir of the repo.
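For anyone who just wants the shape of it, here is a minimal round-trip sketch along the lines of those examples. It assumes the compress/uncompress procs and the dfGzip format enum shown in the repo's README, so check the repo for the exact current API:

```nim
# Minimal sketch only; verify names against https://github.com/guzba/zippy/.
import zippy

let original = "Hello, gzip!"

# Compress to the gzip format (dfZlib and dfDeflate are assumed alternatives).
let compressed = compress(original, dataFormat = dfGzip)

# uncompress is assumed to detect the data format by default.
let roundTripped = uncompress(compressed)

doAssert roundTripped == original
```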
This is a fresh implementation, so while I have a lot of tests in place and have fuzzed the library, it is obviously not as mature as zlib. I'll be continuing to work on it from here.
It's just 1% slower than regular zlib, but in pure Nim. This is a big deal. It does not require zlib1.dll or C. zlib has had something like 20 years of effort put into optimizing it, and with Nim you were able to achieve nearly the same performance in a much shorter time.
You have solved the HTTP gzipped payload problem. This lib should be installed by default as part of the Nim install.
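To make the HTTP payload point concrete, here is a hedged sketch of what client-side handling could look like with std/httpclient, assuming Zippy's uncompress proc as described in its README (the actual examples live in the repo's /examples dir):

```nim
# Sketch only: the zippy uncompress call is an assumption based on the README.
import std/httpclient, zippy

var client = newHttpClient()
# Advertise that we can accept gzip-compressed response bodies.
client.headers = newHttpHeaders({"Accept-Encoding": "gzip"})

let response = client.get("http://example.com")  # placeholder URL

var body = response.body
# Only decompress when the server actually gzipped the payload.
if response.headers.getOrDefault("Content-Encoding") == "gzip":
  body = uncompress(body)

echo body.len
```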
I have run into zlib1.dll not matching 32/64-bit Windows before. It's really annoying.
This is really cool. It's obviously what I will be using from now on.
@treeform: This lib should be installed by default as part of the Nim install.
I think fusion is a good place to put this code.
This is awesome! I will be pushing to use it in choosenim :D
Thank you for working on this.
I think fusion is a good place to put this code.
FWIW I disagree. Why push to have so much stuff shipping with Nim? Installing Nim packages is trivial for most use cases; if it is not, then we should fix that instead of pushing the Nim core team to maintain more packages themselves.
Well, but gzip support for our HTTP server/client stuff should work "out of the box". The problem is that we added HTTP support to the stdlib; now we might as well make it work well...
But there is not much to regret here either; both the "web of third-party libs" approach (Rust) and the "stdlib is big and audited" approach (Go?, Python) are valid designs. I very much prefer the "stdlib is big" approach, as it's simply the much better user experience: all libraries work well together (if not, they'll receive bugfixes), use the same conventions, have been reviewed, and are accessible via Nim's documentation system.
@dom96: Why push to have so much stuff shipping with Nim?
TL;DR: New Nim users tend to favor the stdlib and fusion over third-party packages.
My perspective is that of a new Nim user (I started using Nim a month ago). As other newcomers mentioned here https://forum.nim-lang.org/t/7031#44328, I am trying to stay as close as possible to what is officially shipped with Nim (stdlib and fusion). You might ask why? There are several reasons:
It's just 1% slower than regular zlib, but in pure Nim.
I didn't try it, but reading guzba's README table, it's 0-20% slower when compressing, but about 200% slower when decompressing.
It's still great to have a native Nim version, but if that's the speed difference, it's still not time to kick zlib out.
Honest question: how can we expect to build a mature ecosystem if nobody uses external packages?
Packages mature as we use them and contribute to them.
Many popular packages probably wouldn't be where they are today if nobody contributed but the original author.
Relying only on the stdlib also puts the burden of developing & maintaining the ecosystem on the core team. That's just not sustainable with limited resources.
Relying only on the stdlib also puts the burden of developing & maintaining the ecosystem on the core team. That's just not sustainable with limited resources.
I agree, but the core team (or, better, let's call it the "library" team) can grow too. In the end it's not that important where the .nim files are stored, but that they work well together, that somebody reviewed them, and that they won't disappear and will keep working.
One common GitHub repo is the stdlib itself, which is where people seem to try to submit first, but anything too featureful or too hard to maintain gets pushback there. Slightly more granular is stdlib+fusion, but I've never been sure that split had much value... Fusion mostly seems to be a staging area, and it has also been much less active than the stdlib and the nimbleverse in general.
FWIW, most of the Nimbleverse is actually quite fresh/not stale... something like 60-70% by package count. I think this all goes back to the maybe not so well resolved "Nim distribution" idea (link for newcomers). The fusion package may be a better approach (although I somewhat prefer the English spelling "fission" :-)). The general freshness may mean the underlying problem to be solved isn't so bad. Nim has great programmer/project retention, i.e. a low Abandonware Quotient.
Anyway, this discussion is maybe getting slightly off track. Zlib (and its HTTP cousins) is very often in prog.lang stdlibs for the "Big Tent" stdlib approaches. On that basis alone, I think this work should be fast-tracked to become a new stdlib module, maybe bypassing fusion entirely. (And I say this being no fan of the deflate algo. Zstd is much better. Deflate is just "standard".)
I totally agree with cblake's point: most programming languages have zlib/deflate as part of the standard library because so many things depend on it, like HTTP and unpacking package zips, which are themselves usually part of the standard library.
I prefer snappy over zstd. Why? Because guzba has the fastest implementation of it in the world, in pure Nim as well: https://github.com/guzba/supersnappy, beating Google's C++ and the alternative C implementation!
I would use guzba's zippy when I need compatible compression, and guzba's supersnappy when it's just between my own code, for networking or my own file formats.
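For comparison, a supersnappy round trip would presumably look about the same as the Zippy one; this is a sketch assuming it exposes compress/uncompress procs on strings, so verify against its README:

```nim
# Sketch only; verify names against https://github.com/guzba/supersnappy.
import supersnappy

let payload = "internal message between my own services"

# Snappy trades compression ratio for speed, which suits internal
# networking and custom file formats.
let packed = compress(payload)
let unpacked = uncompress(packed)

doAssert unpacked == payload
```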
Did not know about @guzba's fast snappy. Nice!! He should make a library/tool that can do parallel compression/decompression. lz4 would smoke with parallelism. With parallel decompression, in particular, you could probably get aggregate throughput numbers competitive with DIMM bandwidth, meaning "only" CPU cost and no throughput slowdown, but not quite as good a compression ratio as Zstd. Sometimes that compression ratio takes a huge factor off of how fast the backing store needs to be, as for the data of this thread.
Anyway, as with so many things, a small toolchest of these is better than any single one... Sounds like we are near 2 out of the ultimate 3 that would satisfy the (most standard, fastest, most compressing without tons of speed compromise) triple. :-) I'd vote for all 3 in the stdlib, since I kind of think of "compress/decompress" as being like basic IO... super fundamental.
I am now working on approximating zlib's behavior at its various compression levels (I've been focused just on the default level until now).
My most recent Zippy release has support for a levels parameter and includes a fast BestSpeed level that uses Snappy's algorithm for finding runs that can be copied, instead of the slower LZ77 matching. It turns out the Go team did the same thing in their standard library. The compressed output is of course fully compatible.
I still need to do levels 2-9 plus lazy matching, but progress is progress.
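To illustrate how the levels parameter might be used, here is a hedged sketch; the BestSpeed constant and the argument order are assumptions based on this post, not a checked API:

```nim
# Sketch only: BestSpeed and the compress signature are assumed from the post above.
import std/strutils, zippy

let data = "example payload ".repeat(10_000)  # stand-in for a real file

# BestSpeed uses the Snappy-style run finder; the output is still valid gzip.
let fastOut = compress(data, BestSpeed, dfGzip)

# The default level favors compression ratio over speed.
let defaultOut = compress(data, dataFormat = dfGzip)

echo "BestSpeed: ", fastOut.len, " bytes, default: ", defaultOut.len, " bytes"
```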
As for the standard lib vs fusion vs just GitHub+nimble discussion:
1. I'd be very happy if my zlib implementation were acceptable, in part or in whole, for either fusion or the standard lib. I started the project as my own repo simply because there was no friction, not as a position against the standard lib.
2. To me, zlib/gzip support is pretty standard-library-ish. The deflate, zlib, and gzip formats are super stable, old, and extremely widespread. Once it works, it kind of shouldn't need to get touched much. It's also not something most people want to mess with; in my past lives I just wanted it to work so I could move on.
I agree that dynamic, new, or opinionated projects are probably not good fits for the standard library, but I don't think that's a concern here. The deflate standard is almost as old as I am lol.