type
  UncheckedArray*{.unchecked.}[A] = array[1, A]

  Blob* = object
    size: int
    step: int
    data: ptr UncheckedArray[uint8]
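For illustration, the data buffer of such a Blob could be allocated manually on the shared heap, along the lines of the following sketch (a hypothetical newBlob/freeBlob pair, not the actual code):

proc newBlob(size, step: int): Blob =
  # hypothetical constructor: the buffer comes from the shared heap
  # (allocShared0), so it is not managed by any thread-local GC
  result.size = size
  result.step = step
  result.data = cast[ptr UncheckedArray[uint8]](allocShared0(size))

proc freeBlob(b: Blob) =
  # hypothetical destructor: manually allocated memory must be freed by hand
  deallocShared(b.data)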
I am producing a few such structures inside threads and I want to collect them and do something with them afterwards. I have a procedure
proc sortBlock(source: string, size, rowSize, piece, max: int): Blob =
  ...
and what I am doing is
var blobs = newSeq[Blob]()
parallel:
  for i in 0 ..< cores:
    let blob = spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores)
    blobs.add(blob)
Now, here is what is strange: before returning from each thread I echo the size of each blob and I see some non-zero numbers; after the parallel block I echo the size of each blob and they are all 0!
I am trying to understand the documentation about parallel and spawn but it is not very clear. Apparently, deepCopy is called for data that is sent across threads.
So, I have tried to override deepCopy for Blob, but the compiler complains that it is not a ref or ptr type. Fair enough; so I tried to override it for ptr UncheckedArray[uint8], just to check if it is called, and I got
Error: cannot bind 'deepCopy' to: UncheckedArray
In any case, I would expect a deep copy to be possibly slow, but not to lose integer values.
Is there some more detailed explanation to understand what is really going on during spawn (which locks, if any, are acquired; which copies, if any, are performed; what happens with manually allocated memory, and so on)?
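From what I can tell from the threadpool module, outside a parallel section spawn returns a FlowVar[T] right away, ^ blocks until the spawned task has produced its value, and sync() waits for all outstanding tasks. A sketch, reusing sortBlock and the variables from the snippet above:

import threadpool

# spawn returns a FlowVar[Blob] immediately; the call runs on a worker thread
let fv: FlowVar[Blob] = spawn sortBlock(source, size = size, rowSize = L, piece = 0, max = cores)
let first = ^fv   # ^ blocks until the task has finished and yields its result
sync()            # waits for every task spawned so far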
var blobs = newSeq[Blob](cores)
parallel:
  for i in 0 .. blobs.high:
    blobs[i] = spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores)
I am not sure what the difference is, but it seems that the assignment is treated specially. That is, the snippet I posted above runs as follows:
parallel:
  for i in 0 ..< cores:
    let blob = spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores)  # here blob is initialized empty
    blobs.add(blob)  # control flow goes on here
    # when spawn has finally done its task, blob is correctly assigned, but it is too late
It seems that the support for dataflow variables - which I understand à la Oz - is only partial, and the dependency of the line blobs.add(blob) on the spawned blob is not tracked correctly.
I would still be interested in understanding what happens under the covers with spawn and parallel, especially with regard to which locks, if any, are acquired; which copies, if any, are performed; and what happens with manually allocated memory.
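In the meantime, one workaround seems to be to drop parallel altogether, keep the FlowVars explicitly and read them back with ^ once everything has been spawned (a sketch, again reusing sortBlock and the same variables):

import threadpool

var futures = newSeq[FlowVar[Blob]]()
for i in 0 ..< cores:
  # each call starts on a worker thread; here we only keep the FlowVar
  futures.add(spawn sortBlock(source, size = size, rowSize = L, piece = i, max = cores))

var blobs = newSeq[Blob]()
for fv in futures:
  blobs.add(^fv)   # ^ blocks until the corresponding Blob is ready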
I am sorry I was not clear: I do not have any slowdown right now, since I managed to avoid the sharing that was causing it. I really would like to share the full example, but it relies on proprietary C libraries we have bought, so it is a little hard to do that. If necessary, I can take some time to create an example in pure Nim, but it takes a while.
So, I do not have concrete problems right now. Still, I would like to understand more about the threading model of Nim - if anything, to use it with more confidence. I have read everything I have found, but I am still confused about some things.
I apologize if I have been insistent about this topic: I did not want to abuse people's time and kindness.
import threadpool

type Foo = object
  bar: int

proc slow_fib(n: int): int =
  if n <= 1: n
  else: slow_fib(n - 1) + slow_fib(n - 2)

proc makeFoo(i: int): Foo =
  return Foo(bar: slow_fib(30))

var foos = newSeq[Foo]()
parallel:
  for i in 0 .. 3:
    let foo = spawn makeFoo(i)
    foos.add(foo)

for foo in foos:
  echo foo.bar
I would expect 832040 to be printed four times, but instead I get four 0s. If I change the above to
var foos = newSeq[Foo](4)
parallel:
  for i in 0 .. foos.high:
    foos[i] = spawn makeFoo(i)
everything works as expected.
When using dataflow variables in, say, Oz, the line foos.add(foo) would not be executed until foo had been assigned by the thread. Now, this is not inherently right or wrong - now that I understand it, I can pay attention. But it prompted me to try to understand better what goes on behind the scenes, to avoid such misunderstandings in the future.
I tried this too and also don't understand what is going on! I feel like @andrea has a reasonable question here!
I made this code, which runs on my system such that test1() always prints 55 while test2() prints 55 and 0 intermixed (depending on CPU load). You may need to change slow_fib(10) to a higher or lower value to get zeros or mixed results.
It looks like the test1() version carries some "magical" info that foos[i] has to wait until the thread completes before being accessed.
The assignment in the let foo = spawn ... / foos.add(foo) case, on the other hand, destroys this "extra" info.
import threadpool

type Foo = object
  bar: int

proc slow_fib(n: int): int =
  if n <= 1: n
  else: slow_fib(n - 1) + slow_fib(n - 2)

proc makeFoo(): Foo =
  result.bar = slow_fib(10)

proc test1() =
  var foos = newSeq[Foo](4)
  parallel:
    for i in 0 .. foos.high:
      foos[i] = spawn makeFoo()
  echo "test1:"
  for foo in foos:
    echo foo.bar

proc test2() =
  var foos = newSeq[Foo]()
  parallel:
    for i in 0 .. 3:
      let foo = spawn makeFoo()
      foos.add(foo)
  echo "test2:"
  for foo in foos:
    echo foo.bar

test1()
test2()
Output:
test1:
55
55
55
55
test2:
55
0
0
55
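For comparison, a test3 that avoids parallel entirely and collects the FlowVars directly should, if I am not mistaken, print 55 for every entry regardless of load (using the same Foo and makeFoo as above):

proc test3() =
  var futures = newSeq[FlowVar[Foo]]()
  for i in 0 .. 3:
    futures.add(spawn makeFoo())
  echo "test3:"
  for fv in futures:
    echo (^fv).bar   # ^ blocks until the spawned makeFoo call has completed

test3()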
So the intended behaviour is that the second test does not compile, because of the statement dependency?
Exactly.