Hi all,
I have been trying out the Nim lang threadpool module and I have a few questions about its behavior.
Example 1:
import os, threadpool

proc t(i : int) =
  sleep 8000 /% (i+1)
  echo "exit t: ", i

proc p() =
  parallel:
    for i in 1..4:
      spawn t(i)
    echo "Spawn complete"
  sync()
  echo "Sync exit "
  echo "P block exit "

p()
On a two core machine it will sometimes output this:
Spawn complete
exit t: 4
exit t: 3
exit t: 2
exit t: 1
Sync exit
P block exit
and sometimes output this:
exit t: 3
Spawn complete
exit t: 2
exit t: 4
exit t: 1
Sync exit
P block exit
I guess it makes sense: if you fire off more jobs than cores, the scheduler might run a spawned task before the thread doing the spawning completes. However, that raises a question: is it possible to have a thread (e.g. for the user interface) that will not pause, but at the same time can fire off a bunch of jobs to the thread pool? Do we have, or need, the concept of a parent thread that has a higher priority than the spawned threads, or something like that?
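For reference, here is the kind of non-blocking pattern I have in mind (just a sketch; I am assuming that spawn used outside a parallel block returns a FlowVar that can be polled with isReady, and the names job and uiLoop are only placeholders):

import os, threadpool

proc job(i: int): int =
  # placeholder for a long-running task
  sleep(500)
  result = i * i

proc uiLoop() =
  # fire off the jobs; each spawn returns a FlowVar immediately
  var pending: seq[FlowVar[int]] = @[]
  for i in 1..4:
    pending.add(spawn job(i))
  # keep "responding" while polling the FlowVars without blocking
  while pending.len > 0:
    var stillRunning: seq[FlowVar[int]] = @[]
    for fv in pending:
      if fv.isReady:
        echo "job finished: ", ^fv
      else:
        stillRunning.add(fv)
    pending = stillRunning
    echo "UI thread still responsive"
    sleep(50)

uiLoop()

The idea is that the loop only peeks at the FlowVars and never reads ^fv until isReady says the value is there, so the calling thread is never parked by a spawned job.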
Another question about this example: when the spawned jobs run before the spawning thread completes, the sleep call seems to actually 'park' the core for the duration; the full sleep time runs out before the spawning block completes. It's been a while, but I am pretty sure that if I did that in, say, Java, the sleep call would release the CPU and the scheduler would only wake the thread again after the allotted time. So, is this the intended behavior of the sleep call? (Running on Fedora 21.)
Example 2: This one is a bit longer, but the idea is pretty simple. It declares two objects: Node and NodeList. Node contains a big array of float set to random values (random(1.0)), and NodeList holds a big seq of refs to Nodes. The single-threaded version of 'work' consists of calculating the average variance of the nodes' arrays in the nodelist. That works OK. However, when I try to split up the list of node refs (using the sequtils distribute proc) and spawn jobs that calculate the variance of each node subset, memory use increases severalfold and performance decreases. If I pass a ptr instead of a ref to the NodeList to the 'workSet' proc in the spawn call, memory does not balloon, and performance is as hoped. It is as if by passing the ref, every thread copies the entire data structure while running the job. By passing a ptr, the spawned threads just use the ptr and don't track all of the refs in each thread. Or something. What is happening here, and is my workaround appropriate?
Thanks to everyone who read this post!
-Steve
Example 2:
import threadpool, math, sequtils

const
  nFloatCount = 3000
  nodeCount = 9000
  threads = 4

type
  Node = ref object
    nfa : array[nFloatCount, float]
  NodeListObj = object
    nodes: seq[Node]
    nodeSpread: seq[seq[Node]]
  NodeList = ref NodeListObj
  NodeListPtr = ptr NodeListObj
    # Change ptr in the above line to ref and
    # memory demand explodes while
    # execution speed slows.
    # You may also thrash your computer!

proc makeNode() : Node =
  result = Node()
  for i in result.nfa.low..result.nfa.high:
    result.nfa[i] = random(1.0)

proc makeNodeList() : NodeList =
  var n : seq[Node] = @[]
  for i in 0..<nodeCount:
    n.add(makeNode())
  result = NodeList(nodes: n, nodeSpread: n.distribute(threads))

proc work(n : Node) : float = return n.nfa.variance()

proc workNodeList(NodeList : NodeList) =
  # single thread version, called to confirm the data structure
  var tvar = 0.0
  for n in NodeList.nodes:
    tvar += n.work()
  tvar /= float(NodeList.nodes.len)
  echo " average variance ", tvar

proc workSet(NodeList: NodeListPtr, i : int) =
  var tvar = 0.0
  for n in NodeList.nodeSpread[i]:
    tvar += n.work()
  tvar /= float(NodeList.nodeSpread[i].len)
  echo i, " average sub-variance ", tvar

proc workNodeListT(NodeList : NodeList) =
  var tvar = 0.0
  let pNodeList = cast[NodeListPtr](NodeList)
  # This is the critical cast. If the ref is not cast
  # to a ptr to the node obj, memory explodes and
  # performance crashes.
  parallel:
    for i in 0..NodeList.nodeSpread.high:
      spawn(pNodeList.workSet(i))
  sync()

proc main() =
  randomize()
  var NodeList = makeNodeList()
  for i in 0..5:
    NodeList.workNodeList
  for i in 0..50:
    NodeList.workNodeListT

main()
If I pass a ptr instead of a ref to the NodeList to the 'workSet' proc in the spawn call, memory does not balloon, and performance is as hoped. It is as if by passing the ref, every thread copies the entire data structure while running the job. By passing a ptr, the spawned threads just use the ptr and don't track all of the refs in each thread.
That is the documented and intended behaviour. :-)
Btw the parallel section doesn't require a sync.
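I.e. the end of the parallel block already acts as the barrier for every spawned task, so (if I read the docs right) the first example can drop the sync() call entirely; a minimal sketch:

import os, threadpool

proc t(i : int) =
  sleep(8000 div (i + 1))
  echo "exit t: ", i

proc p() =
  parallel:
    for i in 1..4:
      spawn t(i)
    echo "Spawn complete"
  # the end of the parallel block already waits for all spawned tasks,
  # so no explicit sync() is needed here
  echo "P block exit "

p()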
When the spawned jobs run before the spawning thread completes, the sleep call seems to actually 'park' the core for the duration; the full sleep time runs out before the spawning block completes. It's been a while, but I am pretty sure that if I did that in, say, Java, the sleep call would release the CPU and the scheduler would only wake the thread again after the allotted time. So, is this the intended behavior of the sleep call?
The parallel section is about parallelism; even running everything sequentially and ignoring your parallel and spawn annotations entirely is allowed.
Thank you for your quick response. I was wondering if I needed the sync call there after the parallel block (I thought I saw some examples using it in that context).
I was not clear on the behavior of ref and ptr with regard to spawn, although I tried to find it in the manual. Could you point me to where this is documented? Maybe we should add a section to the tutorials?
If I take out the 'parallel:' block in example one and keep the sync, it still has the same behavior; sometimes a spawned thread activates before spawning is done, and it prevents spawn from completing until the sleep count is complete. So, that still leaves me wondering if there is a preferred technique for firing off jobs in a thread pool while ensuring that the calling thread will not pause. Again, this would be nice for user interfaces, for example. (Something the Chrome browser still struggles with, to my frustration ;-) )
If I take out the 'parallel:' block in example one and keep the sync, it still has the same behavior; sometimes a spawned thread activates before spawning is done, and it prevents spawn from completing until the sleep count is complete.
Ah but that one is a bug. I think.
Could you point me to where this is documented?
From http://nim-lang.org/docs/manual.html#parallel-spawn-spawn-statement
ref parameters are deeply copied, which is a subtle semantic change and can cause performance problems but ensures memory safety. This deep copy is performed via system.deepCopy and so can be overridden.
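In other words (my reading of that paragraph, boiled down from Example 2; the BigObj/BigRef/BigPtr names are just placeholders): passing a ref hands spawn something it deep-copies for every task, while casting to a ptr hands it a raw address that is shared, so nothing gets copied:

import threadpool

type
  BigObj = object
    data: seq[float]
  BigRef = ref BigObj
  BigPtr = ptr BigObj

proc useRef(b: BigRef) =
  # per the manual text above, b is a deep copy of the caller's object
  echo "ref task sees ", b.data.len, " floats"

proc usePtr(b: BigPtr) =
  # b is just the caller's address; nothing is copied,
  # but memory safety is now your responsibility
  echo "ptr task sees ", b.data.len, " floats"

proc main() =
  let big = BigRef(data: newSeq[float](1_000_000))
  spawn useRef(big)                  # whole structure is deep-copied
  spawn usePtr(cast[BigPtr](big))    # shared, no copy
  sync()

main()

Seen that way, the Example 2 numbers make sense: with the ref version, each spawned job gets its own copy of the full NodeList, refs and all.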