Dear Nim community,
I am investigating porting some speed-critical parts of my projects from Python to Nim, for performance reasons. In some naive initial tests, the Nim version was significantly slower, so I must be doing something wrong. Here is the Nim:
import zip/gzipfiles
import times

echo now()
let filePath = "somelargefile.nt.gz"
let file = newGzFileStream(filePath)
var count = 0
let startTime = cpuTime()
while not file.atEnd:
  var l = file.readLine()
  count = count + 1
  if count mod 1000000 == 0:
    let lps = int(count / int(cpuTime() - startTime))
    echo now(), " ", count, " ", lps
echo cpuTime() - startTime, count
file.close()
and here is the Python:
import time, gzip

start_time = time.time()
count = 0
line = True
with gzip.open('somelargefile.nt.gz') as F:
    while line:
        line = F.readline()
        count += 1
        if count % 1000000 == 0:
            lps = int(count / int(time.time() - start_time))
            print(time.ctime(), count, lps)
end_time = time.time()
print(int(end_time - start_time), count)
Any tips on what I am doing wrong?
Thx for the tips, recompiling with -d:release makes a big difference: it is then roughly 5 times faster. But even then, the Python is still roughly 3 times faster.
This is not a proper benchmark, it is a "quick wins" investigation. I was hoping that using a compiled language would make a giant difference.
Any tips on what I am doing wrong?
for l in lines(file) should be slightly faster than your while loop.
Was hoping that using a compiled language would make a giant difference.
There is no reason to assume that though since you're mostly using Python's interface to compiled code.
for l in lines(file) should be slightly faster than your while loop.
That actually made more of a difference than I would've expected!
There is no reason to assume that though since you're mostly using Python's interface to compiled code.
To expand a bit on this: many of Python's libraries are actually C libraries with a Python interface. This means that as long as you're just interfacing with the library, it stays fast. However, as soon as you actually start doing work on the Python side, or possibly even shuffle data between libraries, you will notice things starting to slow down. The Nim standard library module for gzip is likely not as optimised as the Python one, hence the slowdown.
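You can see this boundary effect in Python itself. A hedged sketch (the small in-memory blob stands in for a real input file): one bulk call into zlib's C code versus the same decompression driven by a Python-level per-line loop.

```python
import gzip
import io
import time

# Small in-memory gzip blob standing in for the real input file.
payload = b"\n".join(b"line %d" % i for i in range(100_000)) + b"\n"
blob = gzip.compress(payload)

# One call into zlib's C code: the whole blob is decompressed at C speed.
t0 = time.perf_counter()
data = gzip.decompress(blob)
bulk_time = time.perf_counter() - t0

# Same decompression work, but the loop body now runs in the interpreter,
# crossing the Python/C boundary once per line.
t0 = time.perf_counter()
count = 0
with gzip.open(io.BytesIO(blob)) as f:
    for line in f:
        count += 1
loop_time = time.perf_counter() - t0

print(count, len(data))
```

Both paths do identical decompression work in C; the difference in wall time is almost entirely interpreter overhead in the second loop.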
I tried with Zippy, doing something similar to what your original code does:
import std/monotimes, times, strutils
import zippy

let filePath = "somelargfile.nt.gz"
var count = 0
let startTime = getMonoTime()
let data = filePath.readFile.uncompress()
for line in data.splitLines:
  count = count + 1
  if count mod 1000000 == 0:
    let lps = count / (getMonoTime() - startTime).inSeconds
    echo now(), " ", count, " ", lps
echo getMonoTime() - startTime, " ", count
And on my test input this ran 2.26 times faster than the Python code (measured using hyperfine) when compiled with -d:release. There are other flags you can play with as well, but that's a bit of an advanced topic. I managed to push it to 2.75x though with some minor flag tweaking.
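For comparison, the same "read everything, decompress once, split in memory" strategy can be sketched on the Python side too (hedged: an in-memory blob stands in for the real file):

```python
import gzip

# Whole-file strategy: one bulk decompression pass, then split in memory.
blob = gzip.compress(b"alpha\nbeta\ngamma\n")

data = gzip.decompress(blob).decode()  # single pass through zlib
count = 0
for line in data.splitlines():         # splitting happens purely in memory
    count += 1

print(count)
```

This trades memory for speed in exactly the way the Zippy version above does: the whole decompressed file must fit in RAM.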
As @joppez says zippy beating Python above is not very surprising, since it just decompresses the file in a single pass after having read the entire file into memory. Streaming implementations tend to be slower (but of course use less memory, which is kind of the point).
As for why gzipfiles is slower than Python: it uses std/streams, which is difficult to use efficiently. For example, the default readLine implementation (which gzipfiles does not override) looks like:
proc readLine*(s: Stream, line: var string): bool =
  # [...]
  line.setLen(0)
  while true:
    var c = readChar(s)
    if c == '\c':
      c = readChar(s)
      break
    elif c == '\L': break
    elif c == '\0':
      if line.len > 0: break
      else: return false
    line.add(c)
  result = true
readChar just calls readDataImpl with a buffer length of 1; that means gzipfiles will call a function pointer (which then calls gzread) for every single byte of the file. If anything, I'm surprised it even gets close to Python's performance; the reason may be that zlib does some internal buffering, so at least the decompression algorithm isn't invoked for every single call.
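The usual fix for this pattern is to pull large chunks from the stream and split lines in memory. A hedged Python sketch of that buffering idea (the helper name and buffer size are illustrative, not from any library):

```python
import io

def buffered_lines(stream, bufsize=64 * 1024):
    # Read big chunks and split lines in memory, instead of issuing one
    # read call per byte the way std/streams' readLine effectively does.
    leftover = b""
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            if leftover:
                yield leftover
            return
        parts = (leftover + chunk).split(b"\n")
        leftover = parts.pop()  # possibly incomplete last line
        yield from parts

lines = list(buffered_lines(io.BytesIO(b"one\ntwo\nthree")))
print(lines)
```

With a 64 KiB buffer, the expensive call across the stream boundary happens once per chunk rather than once per byte.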
@epoz - this & related topics come up a lot, but here is a perhaps more informative thread than many: https://forum.nim-lang.org/t/5103 (though maybe you will fall asleep before reading all of it).
@PMunch, I think hyperfine leaves "on the table/floor" much opportunity for precision on the hottest-cache paths, which I try to get at with tim. Might be worth a look. Also, it's only a few screenfuls of Nim instead of the 6,000 lines of Rust or whatever hyperfine is up to. So, you could import it (or bu/emin, cligen/strUt.fmtUncertain) and perhaps use the ideas for non-command-line timings as well. The CLI tool tim can do most of what hyperfine does except calculate ratios like "2.26X" (without propagating uncertainties, which hyperfine does not really try to present, AFAICT). I usually do that by hand with @Vindaar's Measuremancer.
Thanks for the comments all, very valuable. My use case is reading a large file that does not fit in memory (circa 45 GB compressed), so it has to be streaming.
While reading the file, I then want to do some basic splitting of the lines, and hash each part with xxhash.
And @cblake thanks for the link to thread 5103, will give it a read.