Hello Nim users!
Just discovered Nim and have been playing with it.
Can you tell me what I am doing wrong here? Python seems to be much faster than Nim with this code.
This Python code is run on a text file 3.6 MB in size, some 43,000 lines (sorry, I cannot publish the file):
import sys
import time;
# Usage: python test.py <filename.txt> "<what is replaced>" "<replaced with this>" """
ms_1 = time.time()*1000.0
sFile, sFind, sReplaced = sys.argv[1], sys.argv[2], sys.argv[3]
fp = open(sFile.replace(".", "_new."), "w")
sFind = sFind.replace("\"", "")
for sLine in open(sFile):
    if len(sLine) > 2:
        if sLine.find(sFind) > -1:
            fp.write(sLine.replace(sFind, sReplaced.replace("\"", "")))
        else:
            fp.write(sLine)
    else:
        fp.write(sLine)
print "\nTook: " , (time.time()*1000.0 - ms_1), " ms"
vs. the same code in Nim:
import os, times, strutils
#Compile: nim --passc:-flto --opt:size c test.nim
let lstParams = commandLineParams()
var
  flDurat: float = 0.0
  sFind: string = lstParams[1]
  sLine: string = ""
let
  sFile: string = lstParams[0]
  sReplaced: string = lstParams[2]
  flTime = cpuTime()
let f2 = open(sFile.replace(".", "_new."), fmWrite)
sFind = sFind.replace("\"", "")
let f = open(sFile)
while f.readLine(sLine):
  if len(sLine) > 2:
    if sLine.find(sFind) > -1:
      f2.writeLine(sLine.replace(sFind, sReplaced.replace("\"", "")))
    else:
      f2.writeLine(sLine)
  else:
    f2.writeLine(sLine)
flDurat = (cpuTime() - flTime)
close(f)
echo "\nReplace took: ", flDurat, " s" # --> 0.249 s
==> results:
Python: Took: 78.0 ms
Nim: Replace took: 0.24 s
So Python seems to be about 3x faster?
Same result if I use getTime().nanosecond for calculation.
I have Nim Compiler Version 1.0.6 [Windows: amd64]
#Compile: nim --passc:-flto --opt:size c test.nim
Use -d:release and try again, please.
Oh, sorry! I added -d:release, and when compiling with only that flag, I got:
0.085 s
So pretty much as fast as Python, although I assumed it would still be faster?
You have to be aware that using strings this way is always going to be somewhat inefficient, since each replace call will make a copy!
In particular, in the following line:
f2.writeLine(sLine.replace(sFind, sReplaced.replace("\"", "")))
the sReplaced.replace("\"", "") seems unnecessary. Why not perform the replacement when defining sReplaced above? Since it doesn't depend on the current line, the result is the same either way.
Also, as far as I can tell, the whole find seems unnecessary too. If replace cannot find the string sFind, no replacement takes place. So you can just replace:
if sLine.find(sFind) > -1:
  f2.writeLine(sLine.replace(sFind, sReplaced.replace("\"", "")))
else:
  f2.writeLine(sLine)
by
f2.writeLine(sLine.replace(sFind, sReplaced)) # with `sReplaced` changed as above
Especially given that the substring seems to be found in 1/4 of the cases, I imagine this should be faster. The little overhead of replace over find shouldn't matter in that case.
To be fair, both things also apply to the Python code.
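A minimal Python 3 sketch of the two suggestions above. The helper names `replace_guarded` and `replace_lines` and the sample lines are made up for illustration and do not come from the original script. It hoists the quote stripping out of the loop and calls `replace` unconditionally, relying on the fact that `str.replace` returns an equal string when the search string does not occur.

```python
def replace_guarded(lines, find, replaced):
    """Mirror of the original loop: test with find(), strip quotes per line."""
    out = []
    for line in lines:
        if line.find(find) > -1:
            out.append(line.replace(find, replaced.replace('"', '')))
        else:
            out.append(line)
    return out


def replace_lines(lines, find, replaced):
    """Simplified loop: strip quotes once, then call replace() on every line."""
    replaced = replaced.replace('"', '')  # loop-invariant, so do it once
    # No find() guard: replace() leaves lines without a match unchanged.
    return [line.replace(find, replaced) for line in lines]


# Hypothetical sample input, not taken from the unpublished file.
sample = ["foo bar\n", "no match here\n", "\n", "foo foo\n"]
assert replace_lines(sample, "foo", '"baz"') == replace_guarded(sample, "foo", '"baz"')
```

Both functions produce the same lines; the second simply does less work per line, which is the point being made for the Nim and the Python versions alike.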
Still at 0.09 s
with:
var
  flDurat: float = 0.0
  sFind: string = seqParams[1].replace("\"", "")
  sLine: string = ""
let
  sFile: string = seqParams[0]
  sReplaced: string = seqParams[2].replace("\"", "")
  flTime = cpuTime()
let f2 = open(sFile.replace(".", "_new."), fmWrite)
let f = open(sFile)
while f.readLine(sLine):
  f2.writeLine(sLine.replace(sFind, sReplaced))
P.S. This forum responds very slowly at times; it can take a whole minute to open a page.
As far as I know, simple string manipulations like these are actually pretty fast in Python. So don't expect an amazing speed improvement over Python if your code is this simple.
In more "real world" examples you'll see Nim outperforming Python.
OK, I guess the string manipulations in Python are implemented in C as well, so Python's dynamic nature does not cost much in this case.
I checked once more after all the modifications to both programs, with 100% identical replaced lines. Here are the results:
Nim: 0.085 s
Python: 70.0 ms
Python code:
import sys
import time;
if "-h" in sys.argv or "-help" in sys.argv:
    print """\n\nUsage: ..." """
    print "--> new file is created, e.g: <filename>_new.txt\n\n"
    sys.exit(0)
ms_1 = time.time() * 1000.0
sFile, sFind, sReplaced = sys.argv[1], \
                          sys.argv[2].replace("\"", ""), \
                          sys.argv[3].replace("\"", "")
fp = open(sFile.replace(".", "_new."), "w")
for sLine in open(sFile):
    fp.write(sLine.replace(sFind, sReplaced))
print "\nTook: " , (time.time() * 1000.0 - ms_1), " ms"
Nim:
import os, times, strutils
let seqParams = commandLineParams()
if "-h" in seqParams or "-help" in seqParams:
  echo """\n\nUsage: ..."""
  echo "--> new file is created, e.g: <filename>_new.txt\n\n"
  quit(0)
var
  flDurat: float = 0.0
  sLine: string = ""
let
  sFile: string = seqParams[0]
  sFind: string = seqParams[1].replace("\"", "")
  sReplaced: string = seqParams[2].replace("\"", "")
  flTime = cpuTime()
let f2 = open(sFile.replace(".", "_new."), fmWrite)
let f = open(sFile)
while f.readLine(sLine):
  f2.writeLine(sLine.replace(sFind, sReplaced))
flDurat = (cpuTime() - flTime)
close(f)
echo "\nReplace took: ", flDurat, " s"
Thanks for the responses!
Ok, I guess the string manipulations in Python are implemented with C as well,
Yes, most basic operations in Python are generally coded in C and are optimized well.
But here is something you can try: put all your code in a main() proc. You should do that whenever you benchmark; in some cases it can increase performance drastically.
I'm always saying this, but strutils is the biggest performance trap in Nim.
Its operations always return a new string, which makes them easy to compose but very heavy on memory management.
Python and JavaScript have heavily optimized string handling, whereas fast string manipulation in Nim requires avoiding most strutils procs and instead modifying an existing string in place.
Hopefully, a library that composes string transformations in a reasonable manner while being as efficient as low-level in-place string mutation will become possible once we have openarrays as values (https://github.com/nim-lang/RFCs/issues/178#issuecomment-583830275).