Team,
A client of mine needs a set of highly specialised algorithms that draw a variety of inferences from a big graph. The graph is multi-mode and currently has millions of vertices and tens of millions of edges. As we infer secondary and tertiary transitive associations (with appropriate weights), the number of edges is likely to increase by an order of magnitude.
Some of the algorithms involve numerical computations of a descriptive nature, while others perform classifications and suggest probabilistic interpretations of specific associations.
My initial prototype in Java developed memory issues rather quickly. I tried Haskell, but it appears that managing large graphs is still a challenge in purely functional languages!
Is Nimrod a good candidate? Nimrod's low memory overhead looks attractive. How does Nimrod's GC behave with the graph scaling to several (tens of) gigabytes of memory? Do we have any precedents? Thanks.
-- 0|0
Well, in terms of performance, Nimrod is without a doubt ideal. My benchmarks (some of which are non-trivial) show that straightforward, unoptimized Nimrod often outperforms an optimized C++ version (both compiled with Clang). It also does this with significantly fewer lines of code; about half as many is common.
Now, memory-wise, it depends. If you use the GC, you can certainly run into some problems, but nowhere near as bad as with Java. Go read up on it: you can choose between different GCs, and the default is a real-time capable deferred reference counter with a cycle collector. It's blazing fast and has very little memory inflation. It should scale to several gigabytes of memory far better than the conventional GCs used in C# and Java, by orders of magnitude I'd assume (although this is pure speculation). Because it uses deferred reference counting, the GC heap is only scanned during cycle collections. And even then, if you know you have no cyclic references, you can disable the cycle collector completely, which means you never do a full-heap sweep, so the size of the heap has virtually no influence on the GC.
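To illustrate why the cycle collector exists at all, and what "no cyclic references" has to mean before it can be switched off, here is a minimal sketch in C++ rather than Nimrod. It is only an analogy: std::shared_ptr is plain (non-deferred) reference counting, not Nimrod's GC, and the names StrongVertex, WeakVertex and aliveAfterDroppingTwoCycle are made up for this example. Two vertices that point at each other are never freed by reference counting alone when the edges are owning references; with non-owning edges they are.

```cpp
#include <memory>
#include <vector>

// Counts live vertex objects so that unreclaimed memory is observable.
static int liveVertices = 0;

// Edges are owning (strong) references: a cycle keeps its members alive.
struct StrongVertex {
    std::vector<std::shared_ptr<StrongVertex>> out;
    StrongVertex() { ++liveVertices; }
    ~StrongVertex() { --liveVertices; }
};

// Edges are non-owning (weak) references: no ownership cycle can form.
struct WeakVertex {
    std::vector<std::weak_ptr<WeakVertex>> out;
    WeakVertex() { ++liveVertices; }
    ~WeakVertex() { --liveVertices; }
};

// Builds the two-vertex cycle a <-> b, drops every external reference,
// and returns how many of the two vertices are still alive afterwards.
template <typename V>
int aliveAfterDroppingTwoCycle() {
    const int before = liveVertices;
    std::weak_ptr<V> probe;  // non-owning handle, used only for cleanup
    {
        auto a = std::make_shared<V>();
        auto b = std::make_shared<V>();
        a->out.push_back(b);
        b->out.push_back(a);
        probe = a;
    }  // a and b go out of scope here
    const int alive = liveVertices - before;
    // Break the cycle by hand so the demo itself does not leak.
    if (auto a = probe.lock()) a->out.clear();
    return alive;
}
```

With StrongVertex both vertices survive until the cycle is broken by hand; with WeakVertex both are reclaimed as soon as the last external reference goes away. A graph whose vertices hold references to each other will almost always contain cycles of the first kind, so disabling the cycle collector is only safe if the edges are stored in some other way.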
So do try to program with the GC enabled, and if it's too slow, switch to the real-time version. But if you have serious memory constraints and find yourself exceeding acceptable limits, profile the code and modify it to reduce the footprint, using manual memory management where needed.
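One common way to reduce the footprint of a large weighted graph, in any language, is to avoid one heap object per vertex or edge and store the adjacency in a few flat arrays (compressed sparse row, CSR). The sketch below is in C++ and is only an illustration under assumptions: the names Edge, CsrGraph, buildCsr and outWeight are hypothetical, vertices are assumed to be numbered 0..n-1 with 32-bit ids, and weights are assumed to fit in a float. Because vertices refer to each other by index rather than by reference, this layout also contains no cyclic references for a collector to trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One weighted, directed edge of the input list (illustrative layout).
struct Edge {
    std::uint32_t src;
    std::uint32_t dst;
    float weight;
};

// CSR adjacency: three flat arrays, no pointers between vertices.
struct CsrGraph {
    std::vector<std::uint64_t> offsets;  // size = vertexCount + 1
    std::vector<std::uint32_t> targets;  // size = edgeCount
    std::vector<float> weights;          // size = edgeCount

    std::size_t vertexCount() const { return offsets.size() - 1; }
    std::size_t outDegree(std::uint32_t v) const {
        return static_cast<std::size_t>(offsets[v + 1] - offsets[v]);
    }
};

CsrGraph buildCsr(std::uint32_t vertexCount, const std::vector<Edge>& edges) {
    CsrGraph g;
    g.offsets.assign(static_cast<std::size_t>(vertexCount) + 1, 0);
    g.targets.resize(edges.size());
    g.weights.resize(edges.size());

    // Pass 1: count the out-edges of every source vertex.
    for (const Edge& e : edges) {
        assert(e.src < vertexCount && e.dst < vertexCount);
        ++g.offsets[e.src + 1];
    }
    // Prefix sum: offsets[v] becomes the index of v's first edge.
    for (std::size_t v = 0; v < vertexCount; ++v) {
        g.offsets[v + 1] += g.offsets[v];
    }
    // Pass 2: scatter each edge into the next free slot of its source.
    std::vector<std::uint64_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const Edge& e : edges) {
        const std::uint64_t slot = next[e.src]++;
        g.targets[slot] = e.dst;
        g.weights[slot] = e.weight;
    }
    return g;
}

// Sum of the weights on the outgoing edges of vertex v.
float outWeight(const CsrGraph& g, std::uint32_t v) {
    float sum = 0.0f;
    for (std::uint64_t i = g.offsets[v]; i < g.offsets[v + 1]; ++i) {
        sum += g.weights[i];
    }
    return sum;
}
```

In this layout each edge costs 8 bytes (a 4-byte target plus a 4-byte weight) and each vertex 8 bytes of offset, so as a rough guide 500 million edges would need about 4 GB. The trade-off is that CSR is cheap to read but expensive to modify, so newly inferred edges would have to be batched and the arrays rebuilt.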
Note that I'm still no expert on Nimrod. My main languages are C++ and Ada; Nimrod is simply my favorite language, which I use for a lot of hobby development.