This string test uses s.add('x') instead of s = s & x for Nim, and s += 'x' for Python.
ms:nim jim$ cat str1.nim
var
  s: string
for i in 0..100_000_000:
  s.add('x')
echo len(s)
ms:nim jim$ nim c -d:danger str1
Hint: 14210 LOC; 0.275 sec; 15.977MiB peakmem; Dangerous Release build; proj: /Users/jim/nim/str1; out: /Users/jim/nim/str1 [SuccessX]
ms:nim jim$ /usr/bin/time -l ./str1
100000001
0.68 real 0.56 user 0.10 sys
326627328 maximum resident set size
79753 page reclaims
8 page faults
1 voluntary context switches
6 involuntary context switches
ms:nim jim$ cat str1.py
s = ''
for i in xrange(100000000):
    s += 'x'
print len(s)
ms:nim jim$ /usr/bin/time -l py str1.py
100000000
20.74 real 20.67 user 0.06 sys
105099264 maximum resident set size
25834 page reclaims
9 involuntary context switches
Nim blows Python out of the water on this, though it uses 326M of RAM to create a 100M string.
Python's memory use is good, only 105M for a 100M string, but it's slow.
For these tests, I'm not so much looking to find the best way to create a 100M string in Nim or Python. I'm comparing the two to find out where there may be large performance differences, hopefully in Nim's favor, and to get a better understanding of how Nim works.
Thanks. I tried that just now:
ms:nim jim$ nim c -d:danger --gc:arc str1
Hint: 11937 LOC; 0.390 sec; 12.988MiB peakmem; Dangerous Release build; proj: /Users/jim/nim/str1; out: /Users/jim/nim/str1 [SuccessX]
ms:nim jim$ /usr/bin/time -l ./str1
100000001
0.90 real 0.73 user 0.15 sys
440176640 maximum resident set size
107478 page reclaims
5 page faults
1 voluntary context switches
4 involuntary context switches
Does this need 1.3x?
@cumulonimbus - I tried that. It didn't alter the behavior I was seeing.
If this behavior was not always there, then my guess is that some arc bug was causing a crash, got fixed, and now the fix causes this. Regardless of whether it was always there or appeared by bug-jello-squishing accident as I theorize, we should probably have a little suite of "memory use regression" tests to prevent the scenario I described. Such a suite would be a kind of correctness testing for deterministic memory management, and it could do a fuzzy/ball-park compare.
Maybe we already have such a suite, perhaps informally? If so, we should add this str1 to it. If not, it can be the first test. :-)
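As a rough illustration (not an existing test, and assuming getOccupiedMem() reflects heap usage under the GC being tested), such a ball-park check could look like this:
proc checkStringMem() =
  # build a ~100MB string the same way str1.nim does
  var s: string
  for i in 0..100_000_000:
    s.add('x')
  # generous factor so only gross regressions fail; the threshold is arbitrary
  let used = getOccupiedMem()
  doAssert used < 2 * s.len, "used " & $used & " bytes for a " & $s.len & "-char string"
checkStringMem()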
The reason I suggest comparing against Python 3 is that Python 2 is no longer supported by the CPython project. Also, by far most of the people who start with Python will use Python 3.
If Python 2 is faster in many string benchmarks, that's most likely because the default string type in Python 2 is simpler (just bytes) than in Python 3 (code points). If you see your data as just bytes and want to compare on those grounds, compare against Python 3's bytes type.
Now, when benchmarking Nim vs. Python, should you use a Python version and/or code style because it's more similar in implementation to Nim or should you use a Python version and/or code style because that's how most people would use Python? :-)
By the way, I think it's similar to the question: When benchmarking Nim, should you use the fastest implementation or the most idiomatic/straightforward implementation? I guess it depends.
@HashBackupJim - newSeqOfCap[T](someLen) also exists and, yes, pre-sizing can help a lot in Nim (and almost any lang that supports it).
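For example, a sketch with sizes matching the tests above (the exact numbers are mine, only the stdlib calls are the point):
var s = newStringOfCap(100_000_001)   # reserve the final length up front
for i in 0..100_000_000:
  s.add('x')                          # appends never need to reallocate
var q = newSeqOfCap[int](12_500_001)  # same idea for a seq
for i in 0..12_500_000:
  q.add(1)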
Profile-guided optimization at the gcc level can also help Nim run timings a lot; in this case 1.6x to 1.9x for various gc modes. https://forum.nim-lang.org/t/6295 explains how. LTO also helps, since most of the boost from PGO is probably due to well-chosen inlining.
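For the record, the gcc-level PGO build goes roughly like this (my paraphrase of the linked post, assuming a gcc backend; exact flags may differ):
nim c -d:danger --passC:-fprofile-generate --passL:-fprofile-generate str1
./str1                                # training run to collect profile data
nim c -f -d:danger --passC:-fprofile-use --passL:-fprofile-use str1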
@sschwartzer - not only string benchmarks, but also interpreter start-up, etc. Anyway, this isn't a Python forum, and benchmarking "always depends". :-) :-)
Someone else should reproduce my finding that --gc:arc uses more memory than --gc:none for the original str1.nim, or for a version with a main() (or both). I think this kicking of the tires has probably uncovered a real problem.
Indeed, there is a high priority bug lurking here, please keep investigating!
One other Nim-level thing I can say is that things work as expected for seq[int] of the same memory scale (100MB). I.e.,
proc main() =
  var s: seq[int]
  for i in 0..12_500_000: s.add 1
  echo len(s)
main()
produces a memory report (using /usr/bin/time on Linux) like:
187MB seq2-arc
250MB seq2-default
250MB seq2-msweep
265MB seq2-boehm
300MB seq2-none
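For reference, I assume the five binaries were built roughly like this (the exact invocations are my guess, not shown above):
nim c -d:danger --gc:arc -o:seq2-arc seq2
nim c -d:danger -o:seq2-default seq2
nim c -d:danger --gc:markAndSweep -o:seq2-msweep seq2
nim c -d:danger --gc:boehm -o:seq2-boehm seq2
nim c -d:danger --gc:none -o:seq2-none seq2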
So, this problem is only for Nim string. Indeed, if one changes string to seq[char] in the original example, usage goes down to 139MB, roughly what one would expect for a 3/2 growth policy.
I was mistaken: I was compiling my seq test with -d:useMalloc, which fixes the problem. Sorry, fiddling with too many knobs.
string and seq[char] behave identically with gc:arc, and both get fixed (139MB) with --gc:arc -d:useMalloc. Other GCs (including none) still beat gc:arc-without-useMalloc on string. However, the other GCs use more memory (around 420MB) than gc:arc on seq[char]. So, whatever is going on, seq is actually worse than string, not better. { But also a side note for @HashBackupJim: try -d:useMalloc with --gc:arc. }
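For completeness, the seq[char] variant of the original example is just (a sketch):
var s: seq[char]
for i in 0..100_000_000:
  s.add('x')
echo len(s)
and the build @HashBackupJim would try is something like:
nim c -d:danger --gc:arc -d:useMalloc str1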
At this point, I should raise an issue over at GitHub. I'll link it back here.