Nim Compiler Version 0.17.3 (2017-10-25) [Linux: amd64]
Copyright (c) 2006-2017 by Andreas Rumpf
git hash: fa02ffaeba219ca3f259667d5161d30e47bb13e0
active boot switches: -d:release
Code:
import os except sleep
import posix, strutils
import threadpool

when isMainModule:
  proc main() =
    var i = 0
    while true:
      i.inc()
      echo i
      discard sleep(1)

  if fork() == 0:
    echo("Hello from the child!")
    spawn main()
    threadpool.sync()
  else:
    quit(QUITSUCCESS)
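This needs to be compiled with threads enabled, e.g.:

nim c --threads:on test.nim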
I was using https://github.com/OpenSystemsLab/daemonize.nim to try to write a simple daemon.
I ran into a problem where invoking threadpool's spawn from a forked process results in a deadlock.
The code above is a simplified version of how I am using daemonize.nim.
Attaching strace to the hung process shows it blocked on a futex wait:

strace: Process 146711 attached
futex(0x65dd24, FUTEX_WAIT_PRIVATE, 1, NULL)
Is there any way around this? I couldn't find anything explicitly prohibiting threadpool from being used like this.
I tried running your code under helgrind.
valgrind --tool=helgrind test
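To get the resolved C frames and line numbers shown below, the binary has to be built with C debug info, e.g. something like:

nim c --threads:on --debuginfo test.nim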
It reported a data race:
==24599== ----------------------------------------------------------------
==24599==
==24599== Possible data race during read of size 8 at 0x33B168 by thread #1
==24599== Locks held: none
==24599== at 0x118490: nimSpawn3 (stdlib_threadpool.c:394)
==24599== by 0x10948A: NimMainModule (test.c:186)
==24599== by 0x10943E: NimMain (test.c:165)
==24599== by 0x10918C: main (test.c:172)
==24599==
==24599== This conflicts with a previous write of size 8 by thread #5
==24599== Locks held: none
==24599== at 0x11813A: slave_ZLEiMrITYl7xqyEhA5iC1g (stdlib_threadpool.c:281)
==24599== by 0x1156FB: threadProcWrapDispatch_t0lnloO9aBTBU0elWgvMSlw_2 (stdlib_system.c:4330)
==24599== by 0x115811: threadProcWrapStackFrame_t0lnloO9aBTBU0elWgvMSlw (stdlib_system.c:4454)
==24599== by 0x115811: threadProcWrapper_2AvjU29bJvs3FXJIcnmn4Kg (stdlib_system.c:4472)
==24599== by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==24599== by 0x5555493: start_thread (pthread_create.c:333)
==24599== by 0x5853AFE: clone (clone.S:97)
==24599== Address 0x33b168 is 0 bytes inside data symbol "readyWorker_BT69aUhVoxO4qq9b8EinSVNA"
==24599==
==24599== ----------------------------------------------------------------
Followed by this:
==24703==
==24703== Possible data race during read of size 1 at 0x33C848 by thread #1
==24703== Locks held: none
==24703== at 0x11849E: selectWorker_BW1ODxSJ4rN4KmfdYS8AFw (stdlib_threadpool.c:369)
==24703== by 0x11849E: nimSpawn3 (stdlib_threadpool.c:394)
==24703== by 0x10948A: NimMainModule (test.c:186)
==24703== by 0x10943E: NimMain (test.c:165)
==24703== by 0x10918C: main (test.c:172)
==24703==
==24703== This conflicts with a previous write of size 1 by thread #5
==24703== Locks held: none
==24703== at 0x118130: slave_ZLEiMrITYl7xqyEhA5iC1g (stdlib_threadpool.c:280)
==24703== by 0x1156FB: threadProcWrapDispatch_t0lnloO9aBTBU0elWgvMSlw_2 (stdlib_system.c:4330)
==24703== by 0x115811: threadProcWrapStackFrame_t0lnloO9aBTBU0elWgvMSlw (stdlib_system.c:4454)
==24703== by 0x115811: threadProcWrapper_2AvjU29bJvs3FXJIcnmn4Kg (stdlib_system.c:4472)
==24703== by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==24703== by 0x5555493: start_thread (pthread_create.c:333)
==24703== by 0x5853AFE: clone (clone.S:97)
==24703== Address 0x33c848 is 4648 bytes inside data symbol "workersData_R5YxoJYCt3PvKeJGhluUDQ"
Of course, there's no guarantee that the race Helgrind reports is actually the cause of your deadlock.
Anyhow, the code in question looks to be this, in threadpool.nim:
proc nimSpawn3(fn: WorkerProc; data: pointer) {.compilerProc.} =
  # implementation of 'spawn' that is used by the code generator.
  while true:
    if selectWorker(readyWorker, fn, data): return
vs this:
proc slave(w: ptr Worker) {.thread.} =
  isSlave = true
  while true:
    when declared(atomicStoreN):
      atomicStoreN(addr(w.ready), true, ATOMIC_SEQ_CST)
    else:
      w.ready = true
    readyWorker = w
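For what it's worth, the same spawn/sync pattern without an intervening fork() runs and exits normally. A minimal check along these lines (my own sketch, built with --threads:on, using a finite workload so sync() can return):

import threadpool

proc work() =
  # finite loop so threadpool.sync() can actually return
  for i in 1 .. 3:
    echo i

when isMainModule:
  spawn work()
  threadpool.sync()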
Thanks for the helgrind example. I'll have to add that to my toolkit.
I think this comes down to what fork() does in a multi-threaded process. From https://linux.die.net/man/3/fork:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources...
Even this minimal program, which only imports threadpool and exits, already ends up multi-threaded:

import threadpool

when isMainModule:
  quit(QUITSUCCESS)
Running it under strace, I noticed that 12 clone() syscalls were made.
In nim/lib/pure/concurrency/threadpool.nim, setup() is called in the module's main body and creates a worker thread for each processor.
I think this explains why the mutex is never cleared: we are forking from a process that threadpool.nim has already made multi-threaded, so the child inherits the pool's mutexes in whatever state they were in at the moment of the fork, but none of the worker threads that could release or signal them.
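Given that, one way around it might be to make sure the process that uses threadpool starts out single-threaded: have the child replace itself via exec with a separate binary that does the spawning, instead of calling spawn in the forked copy. A rough sketch only (daemon_child is a hypothetical second program that imports threadpool and runs the worker loop):

import posix

when isMainModule:
  if fork() == 0:
    # The exec'd image starts with exactly one thread, so threadpool's
    # module-level setup in daemon_child creates its workers from a clean
    # state instead of inheriting half-copied mutexes. On success execv
    # does not return.
    let argv = allocCStringArray(["daemon_child"])
    discard execv("./daemon_child", argv)
    quit(1)  # only reached if execv failed
  else:
    quit(QuitSuccess)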