I have this little program where I create a seq with many entries and then assign a second big seq to the same variable. What I want is for the program to no longer use the memory of that first sequence. But when I check the memory usage with htop, it shows that the program still holds on to all of it.
import strutils

type HashTableEntry = object
  key: int
  value: float

type HashTable = object
  table: seq[HashTableEntry]

func newHashTable*(sizeInBytes: int): HashTable =
  let numEntries = sizeInBytes div sizeof(HashTableEntry)
  result.table = newSeq[HashTableEntry](numEntries)

func getIndex(ht: HashTable, key: int): int =
  key mod ht.table.len.int

var hashTable = newHashTable(140_000_000)

var key = readLine(stdin).parseInt
var index = hashTable.getIndex(key)
hashTable.table[index] = HashTableEntry(value: 1.0, key: key)

hashTable = newHashTable(128_000_000)

key = readLine(stdin).parseInt
index = hashTable.getIndex(key)
hashTable.table[index] = HashTableEntry(value: 1.0, key: key)

discard readLine(stdin)
When I run this and enter any number (for example 123), it shows a memory usage of ~260MB even though only ~128MB are accessible (I am compiling with the default GC). How can I make sure that this program uses at most ~140MB at any time?
I can't test the minimal example I posted here, but regarding the seemingly non-released memory, the real code behaves identically on Windows. When I use /usr/bin/time I get "Maximum resident set size (kbytes): 263504".
The program is a chess engine that isn't allowed to use excessively more RAM than a given limit, so I would like to eliminate any situation where my program uses more RAM than I want it to.
large_thing_1 = large_thing_2 logically requires that, for a brief moment, both exist at the same time.
Modify the hash table in place instead of creating a new one and sinking it over; see the sketch below.
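A minimal sketch of that in-place approach, applied to the example above (the clear proc is hypothetical, not part of the original code, and assumes the table only needs to be emptied, not resized):

proc clear(ht: var HashTable) =
  # Overwrite every slot with a default-initialized entry, reusing the
  # existing allocation instead of building a second large seq.
  for entry in ht.table.mitems:
    entry = HashTableEntry()

# Usage: instead of `hashTable = newHashTable(128_000_000)`,
# keep the first table and call `hashTable.clear()`.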
I used nim c -d:danger --passC:"-flto" --passL:"-flto -static" --cc:clang --threads:on main.nim. If I add --gc:arc -d:useMalloc it is about 4% slower (which is not terrible, but I want to make sure there isn't another solution that doesn't involve a performance penalty).
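For reference, the variant being compared is just the same command with the two extra switches added:

nim c -d:danger --gc:arc -d:useMalloc --passC:"-flto" --passL:"-flto -static" --cc:clang --threads:on main.nim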
I didn't know about PGO, I'll try that.
(The real application is this: https://gitlab.com/tsoj/Nalwald)
I tried out PGO. It gives about a 2% improvement when not using --gc:arc -d:useMalloc; when using it, it becomes even slower, by almost 20%.
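For readers who haven't done this before, a clang-based PGO build of a Nim program roughly follows this two-pass pattern (the flags are standard clang PGO options and the file names are illustrative, not taken from this thread):

# 1. Build an instrumented binary and run it on a representative workload.
nim c -d:danger --cc:clang --passC:"-fprofile-instr-generate" --passL:"-fprofile-instr-generate" main.nim
./main < typical_input
# 2. Merge the collected profile and rebuild using it.
llvm-profdata merge -output=main.profdata default.profraw
nim c -d:danger --cc:clang --passC:"-fprofile-instr-use=main.profdata" --passL:"-fprofile-instr-use=main.profdata" main.nim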
@juancarlospaco That's nice, now my compiling commands are slightly easier to read :)
You may profit from profiling the code in the various modes, but differences on the order of 5-10% (or more) can easily be code layout noise, which is easily perturbed as you add code/logic. Maybe Nalwald is toward the very end of its dev cycle...
Too bad the PGO didn't just work. I've seen it make things run slower on occasion, too. The job the gcc/backend compiler is trying to do is really quite hard, and it is also sometimes hard to steer. You might get quite different answers on AMD vs. Intel vs. Intel from 3 generations back, etc.
My advice (just my opinion) would be: if you get the memory conservation properties that you want, don't worry about a 10-20% perf delta, at least not until the very, very final stage of everything and after testing on true deployment targets. I mean, maybe you have and are at that stage, but it seemed like good advice to mention if you have not/are not. :-)
If you are using --gc:arc, your top-level code should be wrapped in a main proc for better optimizations.
That applies much more so for the older GCs than it does for --gc:arc.
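For reference, the wrapping suggested above applied to the example would look roughly like this (the main name and structure are just the usual convention, not code from this thread):

proc main() =
  # Top-level logic moved into a proc so locals live on the stack and the
  # compiler/GC can reason about their lifetimes.
  var hashTable = newHashTable(140_000_000)
  var key = readLine(stdin).parseInt
  var index = hashTable.getIndex(key)
  hashTable.table[index] = HashTableEntry(value: 1.0, key: key)
  # ... rest of the driver code ...

when isMainModule:
  main()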