Hey all, wanted to share an update on Zippy now that the library has improved a lot since my original post.
Some highlights:
All of the above is of course in pure Nim!
Tarballs and Zip archives have been tested between Windows and Mac, using the tar command and the OS's built-in archive utilities. Everything appears compatible and working well in my testing, so hopefully it goes just as smoothly for anyone who can benefit from these new features. If you notice any issues or have any feedback, let me know here or on GitHub.
This is great! With the speed improvements you are making, soon the C/C++ people will have to come to us for the fastest zlib implementation. Zlib's inflate and deflate are part of so many file formats (PNG, ZIP, WOFF, Minecraft, etc.), and many languages, such as Python, include them in their standard library.
Having Nim's HTTP client use Zippy would be great. Having nimble/choosenim use Zippy for tarballs would be great too. Using pure Nim would show the power and maturity of the Nim community. And it might be the fastest option.
Currently, Zippy is written to work on a seq/string entirely in memory, not on streams.
I did this to keep things simple. I hadn't written compression code before, so I didn't want to make it any harder than it had to be. I also think in-memory is great for most scenarios (as an analogy, I'd expect Nim's readFile is used much more than FileStream).
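As a rough illustration of that whole-buffer style, here it is with Python's stdlib zlib (mentioned earlier in the thread); Zippy's in-memory API works on strings in the same spirit, though the exact Nim procs aren't shown here:

```python
import zlib

# Whole-buffer style: the entire input and output live in memory at once.
data = b"hello world " * 1000
compressed = zlib.compress(data)
restored = zlib.decompress(compressed)

assert restored == data
assert len(compressed) < len(data)  # repetitive data compresses well
```

For data that comfortably fits in memory, this is the simplest possible API surface: one call in, one call out.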
I do think streaming support would be a nice improvement for some scenarios (like very large files), but supporting it would be a fairly big undertaking. I don't anticipate working on that in the short term.
As for Zippy vs Snappy, I think my choice would be based on something like this:
Zippy is great for compatibility. HTTP gzip, Zip files, tarballs, PNG: so many formats require zlib, so you don't really get a choice. However, zlib is slow to compress and uncompress, so I would choose a more modern technique when I can get away with it.
Snappy would be that more modern technique I'd prefer when it is an option, for example when compressing my own data for transport over UDP or something. Nobody else's code needs to read it. Snappy is drastically faster at both compressing and uncompressing, and is super tiny in terms of code. To me Snappy is an awesome local maximum of good-enough compression, fast compressing, fast uncompressing, and low code complexity.
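To make the compatibility point above concrete: gzip (as used for HTTP Content-Encoding) is just a thin header/trailer around the same deflate stream zlib produces, which Python's stdlib can show directly:

```python
import gzip
import zlib

# gzip wraps a deflate stream in a small container with a header and CRC trailer.
payload = b"served over HTTP " * 100
body = gzip.compress(payload)

assert gzip.decompress(body) == payload
# zlib can decode the gzip container directly by selecting wbits=31
assert zlib.decompress(body, wbits=31) == payload
```

This is why a fast deflate implementation pays off in so many places at once: gzip, Zip, PNG, and WOFF all sit on top of the same stream format.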
Hm, interesting. Would you be able to (or could you) read these parts of a file into memory? If so, I don't think you'd necessarily need to make any changes to supersnappy. I only see streams helping in the specific case where you can't fit something into memory.
It sounds like you could read a part of a file into memory, then compress/uncompress that part and write it out using a stream at the file system level (FileStream or similar). I'm not sure what you're trying to do though so I may be totally misunderstanding.
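A rough sketch of that chunked approach, written in Python since its stdlib zlib exposes a streaming compressor object (the BytesIO objects stand in for real files; the same shape would apply with file streams in any language):

```python
import io
import zlib

src = io.BytesIO(b"example payload " * 4096)  # stands in for a large input file
dst = io.BytesIO()                            # stands in for the output file

comp = zlib.compressobj()
while True:
    chunk = src.read(64 * 1024)  # only one bounded chunk in memory at a time
    if not chunk:
        break
    dst.write(comp.compress(chunk))
dst.write(comp.flush())          # emit any bytes still buffered by the compressor

# The concatenated output is one valid zlib stream.
assert zlib.decompress(dst.getvalue()) == src.getvalue()
```

The key point is that peak memory is bounded by the chunk size plus the compressor's internal window, not by the total file size.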
I am open to patches, of course, but I do have opinions on what I'd want to maintain long term. In my head I like the idea of having an entire-in-memory API (which Zippy already has) and adding stream support with a stream-in, stream-out API for memory-constrained scenarios.