Hey all, wanted to share an update on Zippy now that the library has improved a lot since my original post.
Some highlights:
All of the above is of course in pure Nim!
Tarballs and Zip archives have been tested between Windows and Mac, using the tar command and the OS's built-in archive utilities. Everything appears compatible and working well in my testing, so hopefully it goes just as smoothly for anyone who can benefit from these new features. If you notice any issues or have any feedback, let me know here or on GitHub.
This is great! With the speed improvements you are making, soon the C/C++ people will have to come to us for the fastest zlib implementation. Zlib's inflate and deflate are part of so many file formats (PNG, ZIP, WOFF, Minecraft, etc.), and many languages, such as Python, include them in their standard library.
Having Nim's HTTP client use Zippy would be great. Having nimble/choosenim use Zippy for tarballs would be great too. Using pure Nim would show the power and maturity of the Nim community. And it might be the fastest option.
Currently, Zippy is written to work on a seq/string entirely in memory, not on streams.
I did this to keep things simple. I hadn't written compression code before, so I didn't want to make it any harder than it had to be. I also think in-memory is great for most scenarios (as an analogy, I'd expect Nim's readFile is used much more than FileStream).
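As a rough illustration of that whole-buffer style, here it is with Python's stdlib zlib (mentioned earlier in the thread); Zippy's in-memory API works on strings in the same spirit, though the exact Nim procs aren't shown here:

```python
import zlib

# Whole-buffer style: the entire input and output live in memory at once.
data = b"hello world " * 1000
compressed = zlib.compress(data)
restored = zlib.decompress(compressed)

assert restored == data
assert len(compressed) < len(data)  # repetitive data compresses well
```

For data that comfortably fits in memory, this is the simplest possible API surface: one call in, one call out.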
I do think streaming support would be a nice improvement for some scenarios (like very large files), but supporting it would be a fairly big undertaking. I don't anticipate working on that in the short term.
As for Zippy vs Snappy, I think my choice would be based on something like this:
Zippy is great for compatibility. HTTP gzip, Zip files, tarballs, PNG: so many formats require zlib, so you don't really get a choice. However, zlib is slow to compress and uncompress, so I would choose a more modern technique when I can get away with it.
Snappy would be that more modern technique I'd prefer when it is an option, for example when compressing my own data for transport over UDP or something. Nobody else's code needs to read it. Snappy is drastically faster at both compressing and uncompressing, and is super tiny in terms of code. To me Snappy is an awesome local maximum of good-enough compression, fast compressing, fast uncompressing, and low code complexity.
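To make the compatibility point above concrete: gzip (as used for HTTP Content-Encoding) is just a thin header/trailer around the same deflate stream zlib produces, which Python's stdlib can show directly:

```python
import gzip
import zlib

# gzip wraps a deflate stream in a small container with a header and CRC trailer.
payload = b"served over HTTP " * 100
body = gzip.compress(payload)

assert gzip.decompress(body) == payload
# zlib can decode the gzip container directly by selecting wbits=31
assert zlib.decompress(body, wbits=31) == payload
```

This is why a fast deflate implementation pays off in so many places at once: gzip, Zip, PNG, and WOFF all sit on top of the same stream format.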
Hm, interesting. Would you be able to (or could you) read these parts of a file into memory? If so, I don't think you'd necessarily need to make any changes to supersnappy. I only see streams helping in the specific case where you can't fit something into memory.
It sounds like you could read a part of a file into memory, then compress/uncompress that part and write it out using a stream at the file system level (FileStream or similar). I'm not sure what you're trying to do though so I may be totally misunderstanding.
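A rough sketch of that chunked approach, written in Python since its stdlib zlib exposes a streaming compressor object (the BytesIO objects stand in for real files; the same shape would apply with file streams in any language):

```python
import io
import zlib

src = io.BytesIO(b"example payload " * 4096)  # stands in for a large input file
dst = io.BytesIO()                            # stands in for the output file

comp = zlib.compressobj()
while True:
    chunk = src.read(64 * 1024)  # only one bounded chunk in memory at a time
    if not chunk:
        break
    dst.write(comp.compress(chunk))
dst.write(comp.flush())          # emit any bytes still buffered by the compressor

# The concatenated output is one valid zlib stream.
assert zlib.decompress(dst.getvalue()) == src.getvalue()
```

The key point is that peak memory is bounded by the chunk size plus the compressor's internal window, not by the total file size.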
I am open to patches, of course, but I do have opinions on what I'd want to maintain long term. In my head I like the idea of having an entire-in-memory API (which Zippy already has) and adding stream support with a stream-in, stream-out API for memory-constrained scenarios.