type
  UncheckedArray*{.unchecked.}[A] = array[1, A]

  Blob* = object
    size: int
    step: int
    data: ptr UncheckedArray[uint8]
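For illustration, the data buffer of such a Blob could be allocated manually on the shared heap, along the lines of the following sketch (a hypothetical newBlob/freeBlob pair, not the actual code):

proc newBlob(size, step: int): Blob =
  # hypothetical constructor: the buffer comes from the shared heap
  # (allocShared0), so it is not managed by any thread-local GC
  result.size = size
  result.step = step
  result.data = cast[ptr UncheckedArray[uint8]](allocShared0(size))

proc freeBlob(b: Blob) =
  # hypothetical destructor: manually allocated memory must be freed by hand
  deallocShared(b.data)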
I am producing a few such structures inside threads and I want to collect them and do something with them afterwards. I have a procedure
proc sortBlock(source: string, size, rowSize, piece, max: int): Blob =
  ...
and what I am doing is
var blobs = newSeq[Blob]()
parallel:
  for i in 0 ..< cores:
    let blob = spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores)
    blobs.add(blob)
Now, here is what is strange: before returning from each thread I echo the size of each blob and I see some non-zero numbers; after the parallel block I echo the size of each blob and they are all 0!
I am trying to understand the documentation about parallel and spawn but it is not very clear. Apparently, deepCopy is called for data that is sent across threads.
So, I have tried to override deepCopy for Blob, but the compiler complains that it is not a ref or ptr type. Fair enough; so I tried to override it for ptr UncheckedArray[uint8], just to check if it is called, and I got
Error: cannot bind 'deepCopy' to: UncheckedArray
In any case, I would expect a deep copy to be possibly slow, but not to lose integer values.
Is there some more detailed explanation to understand what is really going on during spawn (which locks, if any, are acquired; which copies, if any, are performed; what happens with manually allocated memory, and so on)?
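From what I can tell from the threadpool module, outside a parallel section spawn returns a FlowVar[T] right away, ^ blocks until the spawned task has produced its value, and sync() waits for all outstanding tasks. A sketch, reusing sortBlock and the variables from the snippet above:

import threadpool

# spawn returns a FlowVar[Blob] immediately; the call runs on a worker thread
let fv: FlowVar[Blob] = spawn sortBlock(source, size = size, rowSize = L, piece = 0, max = cores)
let first = ^fv   # ^ blocks until the task has finished and yields its result
sync()            # waits for every task spawned so far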
var blobs = newSeq[Blob](cores)
parallel:
  for i in 0 .. blobs.high:
    blobs[i] = spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores)
I am not sure what the difference is, but it seems that the assignment is treated specially. That is, the snippet I posted above runs as follows:
parallel:
  for i in 0 ..< cores:
    let blob = spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores)  # here blob is initialized empty
    blobs.add(blob)  # control flow goes on here
    # when spawn has finally done its task, blob is correctly assigned, but it is too late
It seems that the support for dataflow variables - which I understand à la Oz - is only partial, and the dependency of the line blobs.add(blob) on the spawned blob is not tracked correctly.
I would still be interested in understanding what happens under the covers with spawn and parallel, especially with regard to which locks, if any, are acquired; which copies, if any, are performed; and what happens with manually allocated memory.
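In the meantime, one workaround seems to be to drop parallel altogether, keep the FlowVars explicitly and read them back with ^ once everything has been spawned (a sketch, again reusing sortBlock and the same variables):

import threadpool

var futures = newSeq[FlowVar[Blob]]()
for i in 0 ..< cores:
  # each call starts on a worker thread; here we only keep the FlowVar
  futures.add(spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores))

var blobs = newSeq[Blob]()
for fv in futures:
  blobs.add(^fv)   # ^ blocks until the corresponding Blob is ready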
I am sorry I was not clear: I do not have any slowdown right now, since I managed to avoid the sharing that was causing it. I really would like to share the full example, but it relies on proprietary C libraries we have bought, so it is a little hard to do that. If necessary, I can take some time to create an example in pure Nim, but it takes a while.
So, I do not have concrete problems right now. Still, I would like to understand more about the threading model of Nim - if anything, to use it with more confidence. I have read everything I have found, but I am still confused about some things.
I apologize if I have been insistent about this topic: I did not want to abuse people's time and kindness.
import threadpool

type Foo = object
  bar: int

proc slow_fib(n: int): int =
  if n <= 1: n
  else: slow_fib(n - 1) + slow_fib(n - 2)

proc makeFoo(i: int): Foo =
  return Foo(bar: slow_fib(30))

var foos = newSeq[Foo]()
parallel:
  for i in 0 .. 3:
    let foo = spawn makeFoo(i)
    foos.add(foo)

for foo in foos:
  echo foo.bar
I would expect 832040 to be printed four times, but instead I get four 0s. If I change the above to
var foos = newSeq[Foo](4)
parallel:
  for i in 0 .. foos.high:
    foos[i] = spawn makeFoo(i)
everything works as expected.
When using dataflow variables in, say, Oz, the line foos.add(foo) would not be executed until foo had been assigned by the thread. Now, this is not inherently right or wrong - now that I understand it, I can pay attention. But it prompted me to try to understand better what goes on behind the scenes, to avoid such misunderstandings in the future.
I tried this too and also don't understand what is going on! I feel like @andrea has a reasonable question here!
I made this code, which runs on my system such that test1() always prints 55 while test2() prints 55 and 0 intermixed (depending on CPU load). You may need to change slow_fib(10) to a higher or lower value to get zeros or mixed results.
It looks like the test1() version carries some "magical" info that foos[i] has to wait until the thread completes before being accessed.
The assignment in the let foo = spawn ... / foos.add(foo) case, on the other hand, destroys this "extra" info.
import threadpool

type Foo = object
  bar: int

proc slow_fib(n: int): int =
  if n <= 1: n
  else: slow_fib(n - 1) + slow_fib(n - 2)

proc makeFoo(): Foo =
  result.bar = slow_fib(10)

proc test1() =
  var foos = newSeq[Foo](4)
  parallel:
    for i in 0 .. foos.high:
      foos[i] = spawn makeFoo()
  echo "test1:"
  for foo in foos:
    echo foo.bar

proc test2() =
  var foos = newSeq[Foo]()
  parallel:
    for i in 0 .. 3:
      let foo = spawn makeFoo()
      foos.add(foo)
  echo "test2:"
  for foo in foos:
    echo foo.bar

test1()
test2()
Output:
test1:
55
55
55
55
test2:
55
0
0
55
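For comparison, a test3 that avoids parallel entirely and collects the FlowVars directly should, if I am not mistaken, print 55 for every entry regardless of load (using the same Foo and makeFoo as above):

proc test3() =
  var futures = newSeq[FlowVar[Foo]]()
  for i in 0 .. 3:
    futures.add(spawn makeFoo())
  echo "test3:"
  for fv in futures:
    echo (^fv).bar   # ^ blocks until the spawned makeFoo call has completed

test3()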
So the intended behaviour is that the second test does not compile, because of the statement dependency?
Exactly.