For some applications, it is necessary to read large (10-100 GB+) compressed files line by line.
Moreover, it would be useful for the program to decide whether a file is compressed (ends in ".gz") and, accordingly, open it as a File or as a GZFile.
Specifically, being able to write the following function would be very useful and much easier than playing with streams:
proc myopen(filename: string, mode: FileMode = fmRead): File =
  # Look for ".gz" at the end of "filename"
  # to decide if a File or a GZFile should be returned
  if filename.endsWith(".gz"):   # endsWith is from std/strutils
    gzopen(filename, mode)
  else:
    open(filename, mode)
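For what it's worth, until a gzopen like the one above exists, the same dispatch can be sketched today by returning a Stream instead of a File. This is only a sketch, assuming the zip/gzipfiles module from the "zip" nimble package; myOpenStream is a hypothetical name:

```nim
import std/[streams, strutils]
import zip/gzipfiles   # from the "zip" nimble package

# Hypothetical helper: pick the stream type based on the extension.
proc myOpenStream(filename: string): Stream =
  if filename.endsWith(".gz"):
    newGZFileStream(filename)        # decompresses transparently
  else:
    newFileStream(filename, fmRead)

# Usage: read any file, compressed or not, line by line.
for line in myOpenStream("data.txt.gz").lines:
  echo line
```

This works because GZFileStream and FileStream both inherit from Stream, so the streams module's lines iterator applies to either.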
The zip/gzipfiles module offers the possibility to create GZFileStream objects, but I'd argue that having to work with streams makes this harder on new Nim users than it could (should?) be.
Having GZFile objects (that behave like File) and a gzopen proc (that behaves like open does for files) would make this case much easier to handle.
I think this would be very useful to the many data scientists attracted to Nim; these people, who already use Python a lot, are a big pool of potential Nim enthusiasts.
Thanks. I looked at nimarchive. I cannot see how this can be a simpler approach than using zip/gzipfiles and Streams.
Do you have sample code using nimarchive to read gzip files line by line?
I put together a GZipInputStream implementation here: https://gist.github.com/aboisvert/c08e63727d0a3c5de53afa04498e9a90
You can use it to read gzip files line by line like this:
let filestream = newFileStream(filename, fmRead)
let gzip = newGZipInputStream(filestream)
for s in gzip.lines:
  echo s
Hope it helps.