nimforum mirror - Nim module like Python multiprocessing?

cdunn2001 [2017-03-09T06:20:36+01:00]

I have been pulling my hair out over a strange slow-down when using spawn. Things worked fine on Mac, but on Linux my multithreaded Nim program was slower than my multiprocessing Python program.

Both the Nim and Python versions are mostly C code; in fact, they use exactly the same C code. When I run everything in the main thread of a single process, Nim is a little faster than Python, probably because of start-up time. I added microsecond timing to the C code, so I am fairly confident that the slow-down in the threaded case is in the C code itself, not in Nim's thread setup (deep-copying, etc.).

The C function takes about 0.03 s per call from Nim when everything runs in a single process on the main thread. It takes about the same from Python run that way, and also when using Python's multiprocessing module. But as soon as I use Nim threads, the C code takes about 0.25 s per call.

I tried compiling with -O2 instead of -O3 (which I think is the wiser choice anyway). No apparent difference.

I also tried limiting the Nim threadpool to exactly 1 thread (by modifying Nim/lib/pure/concurrency/threadpool.nim:setup()). Again, no difference.

My best guess for the slow-down is false sharing, but if someone really wants to dive into this bioinformatics example, I could try to provide a full test case.
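For reference, false sharing happens when threads write to different variables that sit on the same cache line, so every write invalidates the other cores' copies of that line. Below is a minimal C sketch of the usual mitigation, giving each thread's data its own cache line. It assumes 64-byte cache lines (typical on x86-64, but not guaranteed), and the names NTHREADS, ITERS and run_padded_counters are illustrative, not taken from the Nim or FALCON code.

```c
/* Sketch: one counter per thread, each padded to its own cache line.
 * Assumes 64-byte cache lines (typical on x86-64, not guaranteed). */
#include <assert.h>
#include <pthread.h>

#define CACHE_LINE 64
#define NTHREADS 4
#define ITERS 1000000L

/* _Alignas puts each array element on its own cache line, and sizeof
 * rounds up to CACHE_LINE as a result. volatile keeps the compiler from
 * collapsing the increment loop into a single add. */
struct counter_padded {
    _Alignas(CACHE_LINE) volatile long value;
};

static struct counter_padded counters[NTHREADS];

static void *worker(void *arg) {
    struct counter_padded *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;                 /* writes stay on this thread's line */
    return NULL;
}

/* Starts NTHREADS workers and returns the sum of their counters. */
long run_padded_counters(void) {
    pthread_t tids[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        counters[t].value = 0;
        int rc = pthread_create(&tids[t], NULL, worker, &counters[t]);
        assert(rc == 0);
        (void)rc;
    }
    long total = 0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tids[t], NULL);
        total += counters[t].value;
    }
    return total;
}
```

Swapping the padded struct for a plain long array would put all four counters on one or two cache lines; timing both variants is a quick way to see whether false sharing matters on a given machine.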

I'm not sure that it's worth the effort. I think we might actually be better off with a Nim version of Python's multiprocessing. It solves so many problems.

Does such a thing exist? Am I crazy to want this? Thoughts?

For reference (nearly up-to-date):

Nim version: https://github.com/cdunn2001/nim-consensus/blob/master/n.nim

Py version: https://github.com/PacificBiosciences/FALCON/blob/master/falcon_kit/mains/consensus.py

Red herring: https://github.com/nim-lang/Nim/issues/5499

Araq [2017-03-09T09:25:44+01:00]

> Does such a thing exist?

No, but osproc plus marshal can give you the building blocks. :-)

Interesting issue.

cdunn2001 [2017-03-09T16:07:43+01:00]

I've narrowed the problem down to a single line of code: calloc() of 2 MB to 20 MB, called many times. My guess is that the standard malloc/calloc takes a lock for large allocations.

So there are probably several solutions:

Rewrite the C code in Nim, which uses thread-local allocation by default.

Use a different memory manager than the Linux/glibc default.

Use multiprocessing (starting from Araq's suggestion).
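The thread-local idea in the first option can be approximated in C without a rewrite, assuming the hot function can reuse one buffer between calls on the same thread. This is a hypothetical sketch (scratch_zeroed and scratch_release are illustrative names): each thread keeps its own buffer, grows it only when a larger size is requested, and zeroes it with memset() to match calloc() semantics. It still pays for the zeroing, but after the first call it stops asking the allocator (and possibly the kernel) for fresh memory on every call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-thread scratch buffer (C11 _Thread_local), reused across calls. */
static _Thread_local unsigned char *scratch = NULL;
static _Thread_local size_t scratch_cap = 0;

/* Returns a zeroed buffer of at least n bytes (n must be nonzero), owned
 * by the calling thread and valid until its next call; NULL on failure. */
unsigned char *scratch_zeroed(size_t n) {
    if (n > scratch_cap) {
        free(scratch);                /* old contents are not needed */
        scratch = malloc(n);
        scratch_cap = scratch ? n : 0;
        if (scratch == NULL)
            return NULL;
    }
    memset(scratch, 0, n);            /* the zeroing calloc() would do */
    return scratch;
}

/* Frees the calling thread's buffer; call before the thread exits. */
void scratch_release(void) {
    free(scratch);
    scratch = NULL;
    scratch_cap = 0;
}
```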

This was probably not a case of false sharing. Still, I don't quite understand why the same code is fast in the main thread. Wouldn't the large allocation still take a lock?
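A minimal C harness for checking this outside Nim, assuming POSIX threads and clock_gettime(): it runs the same calloc()/free() loop over 2 MB to 20 MB buffers first on the main thread and then on a second thread, and reports elapsed microseconds for each. The round count and the names compare_calloc and calloc_loop are illustrative assumptions, and the timings depend on the C library and kernel, so no particular result is implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define ROUNDS 50                   /* arbitrary; raise for steadier timings */

static void *volatile sink;         /* keeps the compiler from eliding calloc */

static double now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* calloc()/free() buffers of 2..20 MB; returns elapsed microseconds. */
static double calloc_loop(void) {
    double start = now_us();
    for (int i = 0; i < ROUNDS; i++) {
        size_t bytes = (2 + (size_t)(i % 19)) << 20;   /* 2..20 MB */
        unsigned char *p = calloc(bytes, 1);
        if (p == NULL)
            abort();
        p[bytes - 1] = 1;           /* touch the last page */
        sink = p;
        free(p);
    }
    return now_us() - start;
}

static void *thread_entry(void *out) {
    *(double *)out = calloc_loop();
    return NULL;
}

/* Runs the loop on the calling (main) thread, then on a new thread. */
void compare_calloc(double *main_us, double *thread_us) {
    *main_us = calloc_loop();
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, thread_entry, thread_us);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, NULL);
}
```

Printing both values after compare_calloc(&m, &t) shows whether the gap between the main thread and a worker thread reproduces with the allocator alone, independent of Nim.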

Jehan [2017-03-09T16:40:22+01:00]

If you're allocating that much memory, you may be better off using mmap() directly. That said, the calloc() implementation may already dispatch to mmap() for large allocations, in which case something in the Linux kernel may be the underlying cause.
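A minimal sketch of the mmap() route on Linux: anonymous private mappings are zero-filled by the kernel, so for large buffers they can stand in for calloc(). The helper names mmap_zeroed and mmap_free are hypothetical, and the caller has to keep track of the length, because munmap() needs it.

```c
#define _DEFAULT_SOURCE               /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Returns n bytes of zero-filled memory straight from the kernel, or
 * NULL on failure. Anonymous private mappings start out zeroed, so no
 * memset() is needed. */
void *mmap_zeroed(size_t n) {
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unmaps a buffer from mmap_zeroed(); n must match the original size. */
void mmap_free(void *p, size_t n) {
    if (p != NULL)
        munmap(p, n);
}
```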

As for multiprocessing, I'd use fork() instead of osproc. Also, marshal has some limitations in its current incarnation:

It encodes data as JSON, which is slow if the data you are sending is large. I always wanted to write a version that uses a binary format, but never got around to doing that.

There may be slight floating-point inaccuracies, because floats are also encoded as ASCII text.

Marshalling does not work properly with inheritance.
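A minimal sketch of the fork() approach in C, which also sidesteps the JSON and ASCII-float issues by sending results as raw bytes through a pipe (reasonable here only because parent and child are the same binary on the same machine). work() is a hypothetical stand-in for the expensive C call, MAX_WORKERS is an arbitrary limit, and error handling is deliberately thin: after a mid-loop failure, already-forked children are not waited for and open pipe ends are not closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64                /* arbitrary cap for this sketch */

/* Hypothetical stand-in for the expensive C call. */
static double work(int task) {
    return 1.0 / (task + 3);
}

/* Forks one child per task; each child writes its double result as raw
 * bytes to its own pipe, so the value arrives bit-for-bit (no JSON or
 * decimal text). All children run concurrently; the parent then reads
 * and reaps them in order. Returns 0 on success, -1 on any failure. */
int run_workers(const int *tasks, double *results, int n) {
    int fds[MAX_WORKERS];
    pid_t pids[MAX_WORKERS];
    if (n < 0 || n > MAX_WORKERS)
        return -1;
    for (int i = 0; i < n; i++) {
        int p[2];
        if (pipe(p) != 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {               /* child: compute, send, exit */
            close(p[0]);
            double r = work(tasks[i]);
            ssize_t w = write(p[1], &r, sizeof r);
            _exit(w == (ssize_t)sizeof r ? 0 : 1);
        }
        close(p[1]);                  /* parent keeps only the read end */
        fds[i] = p[0];
        pids[i] = pid;
    }
    int rc = 0;
    for (int i = 0; i < n; i++) {
        ssize_t got = read(fds[i], &results[i], sizeof results[i]);
        if (got != (ssize_t)sizeof results[i])
            rc = -1;
        close(fds[i]);
        int status = 0;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            rc = -1;
    }
    return rc;
}
```

Raw bytes only work when both ends share the same binary and architecture; across machines or languages a portable binary format is the safer choice.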

cdunn2001 [2017-03-09T19:43:39+01:00]

I'd use msgpack for marshalling. Thanks for the tips though.

Mirror of forum.nim-lang.org

2837 :: Nim module like Python multiprocessing?