Hi,
while trying to optimize bitabs.nim, a bidirectional table implementation taken from the compiler, I had to group the key and hcode together, because benchmarking showed that the getKey check was faster due to cache locality (25%). Now, for that specific benchmark (which does no favors to this implementation, btw), downcasting the cached hash(val) showed another 5% speedup and a 15% reduction in size (a (int64, uint32) tuple takes 16 bytes due to padding). Is this downcasting (as you can see in the links) a good choice, or may it cause more hash collisions in other cases?
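For reference, here is a minimal sketch of the layout I'm describing, assuming an int64 key stored next to a hash cached as uint32; the type and field names are mine, not the actual bitabs.nim code:

import std/hashes

type
  KeyEntry = object
    key: int64     # the stored key
    h: uint32      # cached hash, truncated from Hash (an int) to 32 bits

proc cacheHash(key: int64): KeyEntry =
  KeyEntry(key: key, h: cast[uint32](hash(key)))

echo sizeof(KeyEntry)   # 16: 8 for key + 4 for h + 4 bytes of padding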
Thanks for the affirmation.
Maybe making LitId 64 bits as well as the hash could help?
I could do that, but how would it help?
Is this downcasting of a Hash (as you can see in the links) a good choice, or is it misguided and may it cause more hash collisions in other cases?
It seems to be a good choice and I should do the same in my implementation.
Also, FWIW, the benchmark results are:
You should also try other JSON benchmarks. But the memory consumption looks too high to me; maybe we need to use SSO-based strings inside the BiTable.
but the quality of the hash algorithm itself can be paramount.
Exactly, it depends on how good murmurHash is at producing unique hashes. I intend to find out with a benchmark that produces different combinations from the same letters and see how much truncating worsens the performance.
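Something along these lines, perhaps (a rough sketch; the proc name and letter set are made up): hash every permutation of the same letters and compare how many distinct values survive with the full Hash versus its 32-bit truncation.

import std/[hashes, sets, algorithm, sequtils, strutils]

proc collisionStats(letters: string): tuple[perms, full, truncated: int] =
  ## Hashes every permutation of `letters` and counts distinct values
  ## for the full Hash and for its 32-bit truncation.
  var chars = toSeq(letters.items)
  sort(chars)                       # nextPermutation needs ascending order to start
  var fullSet = initHashSet[Hash]()
  var truncSet = initHashSet[uint32]()
  while true:
    let h = hash(chars.join(""))
    fullSet.incl h
    truncSet.incl cast[uint32](h)
    inc result.perms
    if not nextPermutation(chars): break
  result.full = fullSet.len
  result.truncated = truncSet.len

echo collisionStats("abcdefgh")     # 8! = 40320 permutations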
But the memory consumption looks too high to me
I tried to approximate it and I came out 86 MB lower, but very close: https://play.nim-lang.org/#ix=3OXu
maybe we need to use SSO-based strings inside the BiTable.
will do that.
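For anyone unfamiliar with SSO (small string optimization), the idea is roughly this: short strings live inline with no heap allocation, long ones fall back to a regular string. A minimal sketch, where the 15-byte inline capacity and the names are just for illustration, not necessarily what the BiTable port would use:

type
  SsoString = object
    case isLong: bool
    of false:
      len: uint8
      buf: array[15, char]   # short strings stored inline, no heap allocation
    of true:
      data: string           # long strings fall back to a regular Nim string

proc toSso(s: string): SsoString =
  if s.len <= 15:
    result = SsoString(isLong: false, len: uint8(s.len))
    for i, c in s:
      result.buf[i] = c
  else:
    result = SsoString(isLong: true, data: s)

proc `$`(s: SsoString): string =
  if s.isLong:
    result = s.data
  else:
    result = newString(int(s.len))
    for i in 0 ..< int(s.len):
      result[i] = s.buf[i]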
A collision for Java's String.hashCode can easily be constructed from data[i]*31 + (data[i+1] + 31) = (data[i]+1)*31 + data[i+1].
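To make that concrete, here is the same recurrence (h = h*31 + c) re-implemented in Nim (the proc name is made up), showing the classic "Aa"/"BB" collision that the identity produces:

proc javaHash(s: string): int32 =
  ## Java's String.hashCode: h = h*31 + c for each character.
  for c in s:
    result = result * 31 + int32(ord(c))

assert javaHash("Aa") == javaHash("BB")        # both 2112
assert javaHash("AaAa") == javaHash("BBBB")    # the colliding blocks can be swapped freely
assert javaHash("AaAa") == javaHash("AaBB")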
Nim's murmurhash would be more involved but it's still 32-bit only. There are collision benches here:
For our use case, hash tables are used to store P2P message topics, and generating collisions on a 32-bit hash is relatively inexpensive.
For a generic hash table in the standard library, it's important that it has decent worst-case behavior on collisions so that network applications don't need to reimplement it.
If anyone is still interested, especially @Araq: I have ported it to use SSO strings internally. The same benchmark results are now:
packedjson2: used Mem: 386.075MiB time: 2.85s
packed sso: used Mem: 308.023MiB time: 3.35s
packed json: used Mem: 94.02MiB time: 2.0s
stdlib json: used Mem: 1.32GiB time: 3.3s
There is a bug that needs investigation, with some strings containing garbage when echoed, but it doesn't affect the bench. I had an idea of removing nodes of kind opcodeKeyValuePair in order to trim some overhead, but I don't know if it's feasible unless I try to implement it. Lastly, the remaining issue is duplicate keys stored in the tree, which I haven't found a performant way to avoid.
Just for fun, I made up a benchmark that I can win:
packedjson2: used Mem: 307.755MiB time: 5.68s
packed json: used Mem: 312.02MiB time: 7.90s
stdlib json: used Mem: 1.577GiB time: 17.1s