Dear Nim community,
I am investigating porting some speed-critical parts of my projects from Python to Nim, for performance reasons. In some naive initial tests, the Nim version was significantly slower, so I must be doing something wrong. Here is the Nim:
import zip/gzipfiles
import times

echo now()
let filePath = "somelargefile.nt.gz"
let file = newGzFileStream(filePath)
var count = 0
let startTime = cpuTime()
while not file.atEnd:
  var l = file.readLine()
  count = count + 1
  if count mod 1000000 == 0:
    let lps = int(count / int(cpuTime() - startTime))
    echo now(), " ", count, " ", lps
echo cpuTime() - startTime, count
file.close()
and here is the Python:
import time, gzip

start_time = time.time()
count = 0
line = True
with gzip.open('somelargefile.nt.gz') as F:
    while line:
        line = F.readline()
        count += 1
        if count % 1000000 == 0:
            lps = int(count / int(time.time() - start_time))
            print(time.ctime(), count, lps)
end_time = time.time()
print(int(end_time - start_time), count)
Any tips on what I am doing wrong?
Thx for the tips, recompiling with -d:release makes a big difference: it is then roughly 5 times faster. But even then, the Python is still roughly 3 times faster.
This is not a proper benchmark, it is a "quick wins" investigation. I was hoping that using a compiled language would make a giant difference.
Any tips on what I am doing wrong?
for l in lines(file) should be slightly faster than your while loop.
Was hoping that using a compiled language would make a giant difference.
There is no reason to assume that though since you're mostly using Python's interface to compiled code.
for l in lines(file) should be slightly faster than your while loop.
That actually made more of a difference than I would've expected!
There is no reason to assume that though since you're mostly using Python's interface to compiled code.
To expand a bit on this: many of Python's libraries are actually C libraries with a Python interface. This means that as long as you're just interfacing with the library, it stays fast. However, as soon as you actually start doing work on the Python side, or possibly even shuffle data between libraries, you will notice things starting to slow down. The Nim standard library module for gzip is likely not as optimised as the Python one, hence the slowdown.
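You can see this boundary effect in Python itself. A hedged sketch (the small in-memory blob stands in for a real input file): one bulk call into zlib's C code versus the same decompression driven by a Python-level per-line loop.

```python
import gzip
import io
import time

# Small in-memory gzip blob standing in for the real input file.
payload = b"\n".join(b"line %d" % i for i in range(100_000)) + b"\n"
blob = gzip.compress(payload)

# One call into zlib's C code: the whole blob is decompressed at C speed.
t0 = time.perf_counter()
data = gzip.decompress(blob)
bulk_time = time.perf_counter() - t0

# Same decompression work, but the loop body now runs in the interpreter,
# crossing the Python/C boundary once per line.
t0 = time.perf_counter()
count = 0
with gzip.open(io.BytesIO(blob)) as f:
    for line in f:
        count += 1
loop_time = time.perf_counter() - t0

print(count, len(data))
```

Both paths do identical decompression work in C; the difference in wall time is almost entirely interpreter overhead in the second loop.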
I tried with Zippy, doing something similar to what your original code does:
import std/monotimes, times, strutils
import zippy

let filePath = "somelargfile.nt.gz"
var count = 0
let startTime = getMonoTime()
let data = filePath.readFile.uncompress()
for line in data.splitLines:
  count = count + 1
  if count mod 1000000 == 0:
    let lps = count / (getMonoTime() - startTime).inSeconds
    echo now(), " ", count, " ", lps
echo getMonoTime() - startTime, " ", count
And on my test input this ran 2.26 times faster than the Python code (measured using hyperfine) when compiled with -d:release. There are other flags you can play with as well, but that's a bit of an advanced topic. I managed to push it to 2.75x though with some minor flag tweaking.
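For comparison, the same "read everything, decompress once, split in memory" strategy can be sketched on the Python side too (hedged: an in-memory blob stands in for the real file):

```python
import gzip

# Whole-file strategy: one bulk decompression pass, then split in memory.
blob = gzip.compress(b"alpha\nbeta\ngamma\n")

data = gzip.decompress(blob).decode()  # single pass through zlib
count = 0
for line in data.splitlines():         # splitting happens purely in memory
    count += 1

print(count)
```

This trades memory for speed in exactly the way the Zippy version above does: the whole decompressed file must fit in RAM.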
As @joppez says zippy beating Python above is not very surprising, since it just decompresses the file in a single pass after having read the entire file into memory. Streaming implementations tend to be slower (but of course use less memory, which is kind of the point).
As for why gzipfiles is slower than Python: it uses std/streams, which is difficult to use efficiently. For example, the default readLine implementation (which gzipfiles does not override) looks like:
proc readLine*(s: Stream, line: var string): bool =
  # [...]
  line.setLen(0)
  while true:
    var c = readChar(s)
    if c == '\c':
      c = readChar(s)
      break
    elif c == '\L': break
    elif c == '\0':
      if line.len > 0: break
      else: return false
    line.add(c)
  result = true
readChar just calls readDataImpl with a buffer length of 1; that means gzipfiles will call a function pointer (which then calls gzread) for every single byte of the file. If anything, I'm surprised it even gets close to Python's performance; the reason may be that zlib does some internal buffering, so at least the decompression algorithm isn't invoked for every single call.
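The usual fix for this pattern is to pull large chunks from the stream and split lines in memory. A hedged Python sketch of that buffering idea (the helper name and buffer size are illustrative, not from any library):

```python
import io

def buffered_lines(stream, bufsize=64 * 1024):
    # Read big chunks and split lines in memory, instead of issuing one
    # read call per byte the way std/streams' readLine effectively does.
    leftover = b""
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            if leftover:
                yield leftover
            return
        parts = (leftover + chunk).split(b"\n")
        leftover = parts.pop()  # possibly incomplete last line
        yield from parts

lines = list(buffered_lines(io.BytesIO(b"one\ntwo\nthree")))
print(lines)
```

With a 64 KiB buffer, the expensive call across the stream boundary happens once per chunk rather than once per byte.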
@epoz - this & related topics come up a lot, but here is a perhaps more informative thread than many: https://forum.nim-lang.org/t/5103 (though maybe you will fall asleep before reading all of it).
@PMunch, I think hyperfine leaves "on the table/floor" much opportunity for precision on the hottest-cache paths, which I try to get at with tim. Might be worth a look. Also, it's only a few screenfuls of Nim instead of the 6,000 lines of Rust or whatever hyperfine is up to. So, you could import it (or bu/emin, cligen/strUt.fmtUncertain) and perhaps use the ideas for non-command-line timings as well. The CLI tool tim can do most of what hyperfine does except calculate ratios like "2.26X" (without propagating uncertainties, which hyperfine does not really try to present, AFAICT). I usually do that by hand with @Vindaar's Measuremancer.
Thanks for the comments all, very valuable. My use case is reading a large file that does not fit in memory (circa 45 GB compressed), so it has to be streaming.
While reading the file, I then want to do some basic splitting of the lines, and hash each part with xxhash.
And @cblake thanks for the link to thread 5103, will give it a read.