Nim Compiler Version 0.17.3 (2017-10-25) [Linux: amd64]
Copyright (c) 2006-2017 by Andreas Rumpf
git hash: fa02ffaeba219ca3f259667d5161d30e47bb13e0
active boot switches: -d:release
Code:
import os except sleep
import posix, strutils
import threadpool

when isMainModule:
  proc main() =
    var i = 0
    while true:
      i.inc()
      echo i
      discard sleep(1)

  if fork() == 0:
    echo("Hello from the child!")
    spawn main()
    threadpool.sync()
  else:
    quit(QUITSUCCESS)
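This needs to be compiled with threads enabled, e.g.:

nim c --threads:on test.nim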
I was using https://github.com/OpenSystemsLab/daemonize.nim to try to write a simple daemon.
I ran into a problem where invoking threadpool's spawn from a forked process results in a deadlock.
The code above is a simplified version of how I am using daemonize.nim.
Attaching strace to the hung process shows it blocked on a futex wait:

strace: Process 146711 attached
futex(0x65dd24, FUTEX_WAIT_PRIVATE, 1, NULL)
Is there any way around this? I couldn't find anything explicitly prohibiting threadpool from being used like this.
I tried running your code under helgrind.
valgrind --tool=helgrind test
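To get the resolved C frames and line numbers shown below, the binary has to be built with C debug info, e.g. something like:

nim c --threads:on --debuginfo test.nim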
It reported a data race:
==24599== ----------------------------------------------------------------
==24599==
==24599== Possible data race during read of size 8 at 0x33B168 by thread #1
==24599== Locks held: none
==24599== at 0x118490: nimSpawn3 (stdlib_threadpool.c:394)
==24599== by 0x10948A: NimMainModule (test.c:186)
==24599== by 0x10943E: NimMain (test.c:165)
==24599== by 0x10918C: main (test.c:172)
==24599==
==24599== This conflicts with a previous write of size 8 by thread #5
==24599== Locks held: none
==24599== at 0x11813A: slave_ZLEiMrITYl7xqyEhA5iC1g (stdlib_threadpool.c:281)
==24599== by 0x1156FB: threadProcWrapDispatch_t0lnloO9aBTBU0elWgvMSlw_2 (stdlib_system.c:4330)
==24599== by 0x115811: threadProcWrapStackFrame_t0lnloO9aBTBU0elWgvMSlw (stdlib_system.c:4454)
==24599== by 0x115811: threadProcWrapper_2AvjU29bJvs3FXJIcnmn4Kg (stdlib_system.c:4472)
==24599== by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==24599== by 0x5555493: start_thread (pthread_create.c:333)
==24599== by 0x5853AFE: clone (clone.S:97)
==24599== Address 0x33b168 is 0 bytes inside data symbol "readyWorker_BT69aUhVoxO4qq9b8EinSVNA"
==24599==
==24599== ----------------------------------------------------------------
Followed by this:
==24703==
==24703== Possible data race during read of size 1 at 0x33C848 by thread #1
==24703== Locks held: none
==24703== at 0x11849E: selectWorker_BW1ODxSJ4rN4KmfdYS8AFw (stdlib_threadpool.c:369)
==24703== by 0x11849E: nimSpawn3 (stdlib_threadpool.c:394)
==24703== by 0x10948A: NimMainModule (test.c:186)
==24703== by 0x10943E: NimMain (test.c:165)
==24703== by 0x10918C: main (test.c:172)
==24703==
==24703== This conflicts with a previous write of size 1 by thread #5
==24703== Locks held: none
==24703== at 0x118130: slave_ZLEiMrITYl7xqyEhA5iC1g (stdlib_threadpool.c:280)
==24703== by 0x1156FB: threadProcWrapDispatch_t0lnloO9aBTBU0elWgvMSlw_2 (stdlib_system.c:4330)
==24703== by 0x115811: threadProcWrapStackFrame_t0lnloO9aBTBU0elWgvMSlw (stdlib_system.c:4454)
==24703== by 0x115811: threadProcWrapper_2AvjU29bJvs3FXJIcnmn4Kg (stdlib_system.c:4472)
==24703== by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==24703== by 0x5555493: start_thread (pthread_create.c:333)
==24703== by 0x5853AFE: clone (clone.S:97)
==24703== Address 0x33c848 is 4648 bytes inside data symbol "workersData_R5YxoJYCt3PvKeJGhluUDQ"
Of course, there's no guarantee that the race Helgrind reports is actually the cause of your deadlock.
Anyhow, the code in question looks to be this, in threadpool.nim:
proc nimSpawn3(fn: WorkerProc; data: pointer) {.compilerProc.} =
  # implementation of 'spawn' that is used by the code generator.
  while true:
    if selectWorker(readyWorker, fn, data): return
vs this:
proc slave(w: ptr Worker) {.thread.} =
  isSlave = true
  while true:
    when declared(atomicStoreN):
      atomicStoreN(addr(w.ready), true, ATOMIC_SEQ_CST)
    else:
      w.ready = true
    readyWorker = w
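For what it's worth, the same spawn/sync pattern without an intervening fork() runs and exits normally. A minimal check along these lines (my own sketch, built with --threads:on, using a finite workload so sync() can return):

import threadpool

proc work() =
  # finite loop so threadpool.sync() can actually return
  for i in 1 .. 3:
    echo i

when isMainModule:
  spawn work()
  threadpool.sync()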
Thanks for the helgrind example. I'll have to add that to my toolkit.
I think this comes down to what fork() does in a multi-threaded process. From https://linux.die.net/man/3/fork:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources...
Even this minimal program, which only imports threadpool and exits, already ends up multi-threaded:

import threadpool

when isMainModule:
  quit(QUITSUCCESS)
Running it under strace, I noticed that 12 clone() syscalls were made.
In nim/lib/pure/concurrency/threadpool.nim, setup() is called in the module's main body and creates a worker thread for each processor.
I think this explains why the mutex is never cleared: we are forking from a process that threadpool.nim has already made multi-threaded, so the child inherits the pool's mutexes in whatever state they were in at the moment of the fork, but none of the worker threads that could release or signal them.
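Given that, one way around it might be to make sure the process that uses threadpool starts out single-threaded: have the child replace itself via exec with a separate binary that does the spawning, instead of calling spawn in the forked copy. A rough sketch only (daemon_child is a hypothetical second program that imports threadpool and runs the worker loop):

import posix

when isMainModule:
  if fork() == 0:
    # The exec'd image starts with exactly one thread, so threadpool's
    # module-level setup in daemon_child creates its workers from a clean
    # state instead of inheriting half-copied mutexes. On success execv
    # does not return.
    let argv = allocCStringArray(["daemon_child"])
    discard execv("./daemon_child", argv)
    quit(1)  # only reached if execv failed
  else:
    quit(QuitSuccess)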