In HashBackup (a Python app), planning a restore requires a pre-traversal of all the data blocks to be restored, to decide how to manage the cache during the restore. For multi-TB restores this can take around 5 minutes, which isn't great.
This small test compares creating and sorting a 10M-element array of tuples in Python and Nim. First, Python:
# Python 2: build a list of 10M (random key, index) tuples, then sort it
import random
import time

r = random.randint
a = [(r(0, 1000000), i) for i in xrange(10000000)]
print 'sorting'
t = time.time()
a.sort()
print 'done', time.time() - t
ms:nim jim$ /usr/bin/time -l py sort2.py
sorting
done 29.7032859325
47.61 real 46.82 user 0.76 sys
1389780992 maximum resident set size
374738 page reclaims
408 page faults
11 block input operations
19 block output operations
67 voluntary context switches
127 involuntary context switches
Now Nim:
import random
import algorithm

const
  els = 10_000_000
  maxval = 1_000_000

type E = object
  r: int32
  i: int32

# wrap the big array in an object so it can be allocated on the heap via a ref
type A = object
  a: array[els, E]

func cmp(a, b: E): int = cmp(a.r, b.r)

proc main() =
  var
    a: ref A
  new a
  for i in 0..high(a.a):
    a.a[i].r = int32(rand(maxval))
    a.a[i].i = int32(i)
  a.a.sort(cmp)
  echo a.a[0..<5], " ... ", a.a[^5..^1]

main()
ms:nim jim$ nim c -d:danger sort2
Hint: 32979 LOC; 0.850 sec; 47.047MiB peakmem; Dangerous Release build; proj: /Users/jim/nim/sort2; out: /Users/jim/nim/sort2 [SuccessX]
ms:nim jim$ /usr/bin/time -l ./sort2
@[(r: 0, i: 861925), (r: 0, i: 1018886), (r: 0, i: 1112145), (r: 0, i: 3095791), (r: 0, i: 3127185)] ... @[(r: 1000000, i: 5801839), (r: 1000000, i: 6467043), (r: 1000000, i: 6768282), (r: 1000000, i: 7387067), (r: 1000000, i: 8454119)]
2.28 real 2.23 user 0.04 sys
120729600 maximum resident set size
29483 page reclaims
10 page faults
1 voluntary context switches
16 involuntary context switches
This is a great example of where Nim shines. Not only is it ~21x faster, it also uses 11.5x less memory: ~120MB vs ~1.4GB.
In the actual Python app, this huge memory use was a major issue to work around, because each unique integer in Python is a 24-byte structure (28 bytes for 64-bit integers). Instead of creating a list of tuples in memory, I had to write the tuples to a file, sort the file, then read the sorted file back to create the data structure I actually needed, and it still used around 400MB.

I could maybe have used Python's array.array somehow, by bit-shifting two 32-bit integers into a 64-bit array element, but array.array has no sort() method. I could have used NumPy, and probably should have, but HashBackup is a static build, which requires baking all 3rd-party code into a customized Python interpreter that is used to build the final static executable. Based on the other, much smaller things I've done this with, adding NumPy would probably be quite a challenge.
Here's the simpler Nim program I would have liked to have been able to write:
import random
import algorithm

const
  els = 10_000_000
  maxval = 1_000_000

type E = object
  r: int32
  i: int32

proc main() =
  var
    a: array[els, E]
  for i in 0..high(a):
    a[i].r = int32(rand(maxval))
    a[i].i = int32(i)
  a.sort()
  echo a[0..<5], " ... ", a[^5..^1]

main()
I'm a Nim newbie, so maybe something like that is possible. Still, even with a little extra complexity, I was able to put this Nim test together in a few minutes and the performance is great, whereas I spent a day or two hacking the Python code to get the memory usage down to 400MB. And that's not even counting that writing a disk file, sorting it, and reading it back in was obviously much slower than this in-memory Python test program.
Great result for Nim!
I would have liked to have been able to write:
What exactly was your problem?
The array? Well, an array in Nim is a value object; it lives on the stack, so there is no pointer indirection involved. But for large arrays the default stack size can be too small. In that case we generally use sequences. The use of sequences is explained in all the fine tutorials; for performance you should create them at the right size up front and assign with [] instead of growing them with add(). My explanation is at http://ssalewski.de/nimprogramming.html#_arrays_and_sequences -- let me know what is unclear or where the grammar is wrong; I may continue it in winter.
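A minimal sketch of that advice (the sizes and values here are just for illustration): create the seq at its final size and assign by index, instead of starting empty and growing it with add():

# Pre-sized seq, filled by index: one allocation, no reallocation on append.
var a = newSeq[int](1_000)
for i in 0 ..< a.len:
  a[i] = i * i

# The same data built with add() starts empty and may reallocate as it grows.
var b: seq[int]
for i in 0 ..< 1_000:
  b.add(i * i)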
Your "simpler" version can be made to work with a sequence (as Stefan said), and actually using a tuple instead of an object (since they have comparison operators defined):
import random
import algorithm

const
  els = 10_000_000
  maxval = 1_000_000

type E = tuple
  r: int32
  i: int32

proc main() =
  var a = newSeq[E](els)
  for i in 0..high(a):
    a[i].r = int32(rand(maxval))
    a[i].i = int32(i)
  a.sort()
  echo a[0..<5], " ... ", a[^5..^1]

main()
Also, if you compile with -d:useMalloc and run under valgrind (I did it with ARC), you can clearly see this:
==30089== Memcheck, a memory error detector
==30089== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30089== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==30089== Command: ./b
==30089==
@[(r: 0, i: 861925), (r: 0, i: 1018886), (r: 0, i: 1112145), (r: 0, i: 3095791), (r: 0, i: 3127185)] ... @[(r: 1000000, i: 5801839), (r: 1000000, i: 6467043), (r: 1000000, i: 6768282), (r: 1000000, i: 7387067), (r: 1000000, i: 8454119)]
==30089==
==30089== HEAP SUMMARY:
==30089== in use at exit: 0 bytes in 0 blocks
==30089== total heap usage: 53 allocs, 53 frees, 120,003,383 bytes allocated
==30089==
==30089== All heap blocks were freed -- no leaks are possible
==30089==
==30089== For lists of detected and suppressed errors, rerun with: -s
==30089== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Your version using sequences works well. I wondered how an array version of that would look, so I tried your sequence version but with a ref array, i.e.:
***************
*** 12,15 ****
  proc main() =
!   var a = newSeq[E](els)
!
    for i in 0..high(a):
--- 12,17 ----
  proc main() =
!   var
!     a: ref array[els, E]
!
!   new a
    for i in 0..high(a):
ms:nim jim$ nim c sort4
/Users/jim/nim/sort4.nim(17, 19) Error: type mismatch: got <ref array[0..9999999, E]>
but expected one of:
proc high(T: typedesc[SomeFloat]): T:type
first type mismatch at position: 1
required type for T: type SomeFloat
but expression 'a' is of type: ref array[0..9999999, E]
proc high(x: cstring): int
first type mismatch at position: 1
required type for x: cstring
but expression 'a' is of type: ref array[0..9999999, E]
proc high(x: string): int
first type mismatch at position: 1
required type for x: string
but expression 'a' is of type: ref array[0..9999999, E]
proc high[I, T](x: array[I, T]): I
first type mismatch at position: 1
required type for x: array[I, T]
but expression 'a' is of type: ref array[0..9999999, E]
proc high[I, T](x: typedesc[array[I, T]]): I
first type mismatch at position: 1
required type for x: type array[I, T]
but expression 'a' is of type: ref array[0..9999999, E]
proc high[T: Ordinal | enum | range](x: T): T
first type mismatch at position: 1
required type for x: T: Ordinal or enum or range
but expression 'a' is of type: ref array[0..9999999, E]
proc high[T: Ordinal | enum | range](x: typedesc[T]): T
first type mismatch at position: 1
required type for x: type T: Ordinal or enum or range
but expression 'a' is of type: ref array[0..9999999, E]
proc high[T](x: openArray[T]): int
first type mismatch at position: 1
required type for x: openArray[T]
but expression 'a' is of type: ref array[0..9999999, E]
expression: high(a)
I don't understand why high(a) works if the array isn't a ref array, but fails on a ref array. But to continue, I changed 0..high(a) to 0..<els. Then it didn't like sort:
/Users/jim/nim/sort4a.nim(20, 4) Error: type mismatch: got <ref array[0..9999999, E]>
but expected one of:
func sort[T](a: var openArray[T]; cmp: proc (x, y: T): int {.closure.};
order = SortOrder.Ascending)
first type mismatch at position: 1
required type for a: var openArray[T]
but expression 'a' is of type: ref array[0..9999999, E]
proc sort[T](a: var openArray[T]; order = SortOrder.Ascending)
first type mismatch at position: 1
required type for a: var openArray[T]
but expression 'a' is of type: ref array[0..9999999, E]
expression: sort(a)
Shouldn't this change from array to ref array + new have worked?

ref means that it's a managed reference, so you need to dereference it (see https://nim-lang.org/docs/manual.html#types-reference-and-pointer-types).
import random
import algorithm

const
  els = 10_000_000
  maxval = 1_000_000

type E = tuple
  r: int32
  i: int32

proc main() =
  var a: ref array[els, E]
  new(a)
  for i in 0..high(a[]):
    a[i].r = int32(rand(maxval))   # indexing dereferences the ref automatically
    a[i].i = int32(i)
  a[].sort()                       # routine calls and slicing need an explicit deref
  echo a[][0..<5], " ... ", a[][^5..^1]

main()
Thanks, I should have read more before asking and will try to do better on that.
I did see that the experimental implicitDeref pragma would have allowed the a.sort() syntax to work (I think). In the original version, without ref, a.high worked fine too, so I guess with this experimental pragma a.high and high(a) would also deref and work correctly. I'll switch to the devel build today and play around with it. It seems like a good thing, so that if an array blows through the stack and has to be moved to the heap, existing code referencing it doesn't have to be changed, though the doc comment says this only happens if the first arg is a ref array. It seems like it should happen any time the array is referenced in a non-pointer context, but maybe that's hard for the compiler to determine.
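For context, a tiny sketch of what the explicit dereference looks like without the pragma (the array size is arbitrary); the implicitDeref part is only a comment, since that switch was experimental:

import algorithm

var a: ref array[5, int]
new(a)
echo a[].high   # explicit deref: high of the array, not of the ref
a[].sort()      # explicit deref so sort gets a var openArray
# the idea of {.experimental: "implicitDeref".} was to dereference the first
# argument automatically, allowing plain a.sort()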
so that if an array blows through the stack and has to be moved to the heap
Note that array references are rarely used in real life, as there is no benefit over sequences. A seq is an object on the stack which contains a pointer to the actual data, and an array ref is a pointer on the stack that points to the data elements, so performance should be identical. Even the performance advantage of a plain array over a seq should not be big -- a seq has one indirection, but when we work with the seq data, for example sorting it, we load the start address of the data once into a register and never care about that detail again. Mratsim did a performance comparison of array vs seq once; there was no difference.
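A rough sketch of the layout point (sizes and names are only illustrative): in both cases there is a single pointer between the stack and the element data, so element access goes through one indirection either way:

type E = tuple[r, i: int32]

var s = newSeq[E](10)      # seq header on the stack -> heap buffer of elements
var ra: ref array[10, E]
new(ra)                    # ref (one pointer) on the stack -> heap array

# element access goes through one pointer in both cases
s[3] = (r: 7'i32, i: 3'i32)
ra[3] = (r: 7'i32, i: 3'i32)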
I just noticed that you are the Mr Wilcoxson who did the grammar fixes for my book -- thanks again. But I have the feeling that I still have to explain some details better, maybe next winter; currently I am working on a GTK4 book.
Note that array references are rarely used in real life, as there is no benefit over sequences.
This is not true: array references have the benefit that their size is known at compile time. It makes the intention of the code clearer and eliminates potential errors such as forgetting to initialise the seq to the right size.
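For example (a small sketch with made-up names): the length is part of a ref array's type, so passing data of the wrong size is a compile-time error, while a seq's length is only a runtime property:

type Row = ref array[4, float]   # the length 4 is part of the type

proc sumRow(row: Row): float =
  for x in row[]:                # always exactly 4 elements, by construction
    result += x

var r: Row
new(r)
r[0] = 1.5
echo sumRow(r)   # a seq-based version would accept any length at runtime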