nimforum mirror - Boehm GC

wiffel [2015-09-05T21:27:21+02:00]

I'm currently testing Nim with the Boehm garbage collector. But whatever I try, I keep getting errors. I created a tiny test program:

import threadpool
{.experimental.}

proc f(id: int) {.thread.} =
  var str = "test(" & $id & ") "

parallel:
  spawn f(1)
  spawn f(2)
f(3)

When I run this with nim c --threads:on --gc:boehm test.nim it crashes with the following message:

Exclusion ranges overlap
...
Exclusion ranges overlap
Exclusion ranges overlap
Traceback (most recent call last)
threadpool.nim(300)      slave
test.nim(12)           fWrapper
test.nim(7)            f
mmdisp.nim(108)          allocAtomic
mmdisp.nim(108)          allocAtomic
...
mmdisp.nim(108)          allocAtomic
SIGABRT: Abnormal termination.

Could it be that Boehm GC support in Nim is broken? Or am I missing something?

Jehan [2015-09-05T22:45:19+02:00]

This looks more like a broken installation of the Boehm GC. Basically, the Boehm GC has a routine (GC_exclude_static_roots()) that lets you specify that a certain area of memory should not be scanned for GC roots. Nim does NOT use this routine; however, the Boehm GC uses it internally. The error message that you are getting ("Exclusion ranges overlap") means that the Boehm GC is getting somewhat confused about the memory areas it is telling itself about.

What version of the Boehm GC are you using?
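
To make the "Exclusion ranges overlap" message more concrete: an exclusion is an address range, and the collector aborts when a newly registered range collides with one it already holds. The sketch below is a toy model of that check in plain Nim, not the Boehm GC's actual code. The names AddrRange, overlaps and firstOverlap are hypothetical, and treating the ranges as half-open [lo, hi) is an assumption.

```nim
type
  AddrRange = tuple[lo, hi: int]  # hypothetical: half-open range [lo, hi)

proc overlaps(a, b: AddrRange): bool =
  ## Two half-open ranges overlap when each one starts before the other ends.
  a.lo < b.hi and b.lo < a.hi

proc firstOverlap(ranges: seq[AddrRange]): int =
  ## Index of the first range that overlaps an earlier one, or -1 if none does.
  result = -1
  for i in 1 .. ranges.high:
    for j in 0 .. i - 1:
      if overlaps(ranges[i], ranges[j]):
        return i

# The third range lies inside the first one, so it is reported.
let registered = @[(lo: 0x1000, hi: 0x2000),
                   (lo: 0x3000, hi: 0x4000),
                   (lo: 0x1800, hi: 0x1900)]
echo firstOverlap(registered)  # prints 2
```

In this model, registering the same region twice (for example through a repeated or mis-ordered initialization) would trip the check, which fits the description of the collector confusing itself.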

wiffel [2015-09-05T22:52:48+02:00]

@Jehan: Thanks for the reply.

A broken installation could well be the problem. I'm using Ubuntu with the standard libgc.so.1.0.3. But I'm running it inside a VirtualBox image. Maybe libgc does not like that.

I have since tested it on a native Linux machine, and there it seems to work fine.

So, it must indeed be a problem with my installation or with the VirtualBox virtualisation.

Jehan [2015-09-05T23:17:06+02:00]

Huh. I have no idea what version that would be. The current version is 7.4.2, and the major version number has been 7 for years. But yeah, it may be a VirtualBox thing, I'll have to check that myself.

wiffel [2015-09-05T23:39:24+02:00]

@Jehan: That is version 7.2 (but the library file has version number 1.0.3). I used the same version on the native machine and on the virtual image. So I guess it must be a VirtualBox thing.
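
Because the shared-library file name (libgc.so.1.0.3) says nothing about the collector release, one way to check which release is actually loaded is to ask the library itself. The sketch below assumes the library exports GC_get_version() returning a packed unsigned value, and that it can be found under the name libgc.so.1. The helpers decodeVersion and boehmVersion are hypothetical. The meaning of the low byte is also an assumption: in recent releases it should be the micro version, while older releases may store an alpha marker there.

```nim
import dynlib

type
  GetVersionProc = proc (): cuint {.cdecl.}

proc decodeVersion(v: cuint): tuple[major, minor, low: int] =
  ## Splits a value packed as (major shl 16) or (minor shl 8) or low.
  let n = int(v)
  (major: n shr 16, minor: (n shr 8) and 0xFF, low: n and 0xFF)

proc boehmVersion(libName: string): string =
  ## Loads the library at run time so a missing libgc is reported, not fatal.
  let lib = loadLib(libName)
  if lib == nil:
    return "could not load " & libName
  let sym = lib.symAddr("GC_get_version")
  if sym == nil:
    result = libName & " does not export GC_get_version"
  else:
    let getVersion = cast[GetVersionProc](sym)
    let v = decodeVersion(getVersion())
    result = $v.major & "." & $v.minor & "." & $v.low
  unloadLib(lib)

# Assumption: the Linux soname; adjust for other platforms.
echo boehmVersion("libgc.so.1")
```

Running this on both the native machine and the VirtualBox image would confirm whether the two really use the same collector release.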

wiffel [2015-09-06T00:58:42+02:00]

Next problem ...

I know that sharing of data between threads can be problematic, but I was under the impression that using createShared in combination with the Boehm collector should work. (I'm not sure if that is a correct assumption).

So I made a small test program. It creates a very small linked list that is shared between the main thread and two other threads. Both threads grab the second node and replace it, so each one typically holds a reference to a node created in the other thread. That is a problematic case for a thread-local GC, and indeed it is very easy to make this crash. I thought it might work with the Boehm collector, but that also gives an error.

import os, threadpool, locks
{.experimental.}

type
  Node = object
    name : string
    next : ptr Node

var nodeLck : Lock

proc changerLoop(node : ptr Node; thrName: string) {.thread.} =
  for t in 0 .. 300_000:
    acquire(nodeLck)
    let
      oldNode = node.next
      newNode = createShared(Node)
    newNode.name = thrName & "." & $t
    node.next =  newNode
    if t mod 500 == 0:
      echo thrName, ": ", t, " - ", oldNode.name
    release(nodeLck)

var
  a = createShared(Node)
  b = createShared(Node)
a.name = "a"
a.next = b
b.name = "b"
initLock(nodeLck)
parallel:
  spawn changerLoop(a, "T1")
  sleep(123)
  spawn changerLoop(a, "T2")

Running this code using nim c -r --threads:on --gc:boehm par.nim gives the following error:

...
T1: 1000 - T1.999
T1: 1500 - T1.1499
T1: 2000 - T1.1999
Collecting from unknown thread
Traceback (most recent call last)
threadpool.nim(300)      slave
par.nim(31)              changerLoopWrapper
par.nim(17)              changerLoop
mmdisp.nim(108)          allocAtomic
SIGABRT: Abnormal termination.
Error: execution of an external program failed

It looks like the Boehm collector has a problem with deallocating across multiple threads?

Jehan [2015-09-06T01:52:21+02:00]

First, it looks like there's a problem in that the Boehm GC wasn't properly initialized on platforms other than OS X. I've got a proposed fix here. I'll look at your other code in a bit.

wiffel [2015-09-06T02:23:07+02:00]

@Jehan: I manually patched my local files based on your proposed fix. That seems to fix the problem of the first example. It also works fine in VirtualBox with that fix :-)

The Collecting from unknown thread problem from the second example is still there. :-(

wiffel [2015-09-06T02:59:41+02:00]

As a quick and dirty test, I added the following to the mmdisp.nim file:

proc boehmAllowRegisterThreads {.importc: "GC_allow_register_threads",
                                 dynlib: boehmLib.}

type
  GCStackBase = tuple
    mem_base : pointer # Base of memory stack
    reg_base : pointer # Base of separate register stack

# Both C functions return a C int, hence cint (Nim's int is pointer-sized).
proc boehmGetStackBase(base: ptr GCStackBase): cint
     {.importc: "GC_get_stack_base", dynlib: boehmLib.}
proc boehmRegisterMyThread(base: ptr GCStackBase): cint
     {.importc: "GC_register_my_thread", dynlib: boehmLib.}

proc boehmRegisterThisThread*() =
  var base : GCStackBase
  discard boehmGetStackBase(addr base)
  discard boehmRegisterMyThread(addr base)

I added a call to boehmAllowRegisterThreads after the GCinit, and a call to boehmRegisterThisThread at the beginning of the thread code in my second example. That seems to do the trick :-) It works fine with that.

I think that the threads should be registered with the Boehm library this way. I'm not sure where it should be called in the Nim code though.

Jehan [2015-09-06T03:16:43+02:00]

If you use GC_register_my_thread(), you also need to unregister it and call GC_allow_register_threads() to initialize the underlying data structures. While this works, GC_get_stack_base() is not portable. A portable solution is to use GC_pthread_create() in lieu of pthread_create(), and I've updated my pull request accordingly.
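
The register/unregister pairing described here can be sketched as a small wrapper. This is an illustration only, not the code from the pull request: withBoehmThread and worker are hypothetical names, the library name libgc.so.1 is an assumed Linux soname, and the sketch assumes that GC_register_my_thread() returns 0 on success and that GC_allow_register_threads() has already been called once after the collector was initialized. It also inherits the portability caveat of GC_get_stack_base().

```nim
# Sketch only; meant for programs built with --threads:on --gc:boehm.
const boehmLib = "libgc.so.1"  # assumption: Linux soname of the Boehm GC

type
  GCStackBase = object
    mem_base: pointer  # base of memory stack
    reg_base: pointer  # base of separate register stack

proc boehmGetStackBase(base: ptr GCStackBase): cint
  {.importc: "GC_get_stack_base", dynlib: boehmLib.}
proc boehmRegisterMyThread(base: ptr GCStackBase): cint
  {.importc: "GC_register_my_thread", dynlib: boehmLib.}
proc boehmUnregisterMyThread(): cint
  {.importc: "GC_unregister_my_thread", dynlib: boehmLib.}

template withBoehmThread(body: untyped) =
  ## Hypothetical helper: registers the calling thread, runs `body`,
  ## and unregisters again even if `body` raises.
  var base: GCStackBase
  discard boehmGetStackBase(addr base)
  # Assumption: 0 means success; a thread the collector already knows
  # is left alone so that it is not unregistered by mistake.
  let registered = boehmRegisterMyThread(addr base) == 0
  try:
    body
  finally:
    if registered:
      discard boehmUnregisterMyThread()

proc worker(name: string) =
  ## Example thread body; every allocation inside the block happens
  ## while the thread is known to the collector.
  withBoehmThread:
    echo name, " is registered with the collector"
```

The try/finally keeps registration and unregistration in one place, which is the part the quick test earlier in the thread left out.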

wiffel [2015-09-06T03:22:53+02:00]

@Jehan: Thanks for all of this work.

I hope the pull request gets in soon :-)

Jehan [2015-09-06T04:03:52+02:00]

Until it's been reviewed and incorporated, you can download the patch directly and apply it; just append .diff or .patch to the PR URL:


wget https://patch-diff.githubusercontent.com/raw/nim-lang/Nim/pull/3292.patch
git apply 3292.patch

wiffel [2015-09-06T10:42:21+02:00]

I applied the patch and have been hammering on the Boehm collector with the ugly code below. It is almost the same code as the previous example, but this time using a normal ref and the parallel / spawn construction.

Everything seems to be working OK.

Thanks again for this patch.

import os, threadpool, locks
{.experimental.}

# nim c -r --threads:on -d:release --gc:boehm test.nim

type
  Node = ref object
    name : string
    next : Node
var
  loopLck : Lock

proc loop(node: var Node; thrName: string) =
  for t in 0 .. 300_000:
    sleep(10)
    acquire(loopLck)
    let
      oldNode = node.next
      newNode = Node(name: thrName & "." & $t)
    node.next = newNode
    if t mod 100 == 0:
      echo thrName, ": ", t,
           " - old node: ", oldNode.name
    release(loopLck)
    var str = " "
    for _ in 0 .. 1_000: str = str & " "

proc main () =
  var
    b = Node(name: "b")
    a = Node(name: "a", next: b)
  initLock(loopLck)
  parallel:
    for ix in 1 .. 8:
      spawn loop(a, "Thr" & $ix)
      sleep(33)

when isMainModule:
  main()

Mirror of forum.nim-lang.org