proc main() =
  var threads: array[0..4, Thread[tuple[a, b: int]]]
  var res: Atomic[int]
  proc worker(interval: tuple[a, b: int]) {.gcsafe, nimcall, thread.} =
    # loop over interval and compute some value
    res += value
  for i in 0..high(threads):
    let first = ... # computing first and last index of a chunk of data
    let last = ...
    createThread(threads[i], worker, (first, last))
  joinThreads(threads)
Nim compiler output:
Error: illegal capture 'res' because 'worker' has the calling convention: <nimcall>
If I omit nimcall, the signature of the worker proc does not match the expected 2nd argument of createThread(). I do understand why these errors make sense in general, but in this specific case it should be fine to capture res, since it's of type Atomic, shouldn't it?
Is there anything I can do to convince the compiler to compile the code?
I know there are other ways to achieve what I want, but I'm currently only interested in this specific case.
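For reference, one of those other ways can be sketched as follows. A nimcall proc has no closure environment, so it cannot capture a local variable regardless of its type; the sketch therefore passes a pointer to the Atomic inside the thread argument instead. The WorkerArg type, the counter field, the parallelSum name and the chunking into 20 integers per thread are all assumptions made for illustration and are not part of the original code. On Nim versions before 2.0 this would need --threads:on.

```nim
import std/atomics

# Hypothetical argument type: chunk bounds plus a pointer to the shared counter.
type WorkerArg = tuple[a, b: int, counter: ptr Atomic[int]]

proc worker(arg: WorkerArg) {.thread.} =
  # Sum the chunk locally, then publish it with a single atomic add.
  var partial = 0
  for i in arg.a .. arg.b:
    partial += i
  arg.counter[] += partial

proc parallelSum(): int =
  var threads: array[0..4, Thread[WorkerArg]]
  var res: Atomic[int]
  for i in 0 .. high(threads):
    # Illustrative chunking: 20 consecutive integers per thread, covering 1..100.
    let first = i * 20 + 1
    let last = first + 19
    createThread(threads[i], worker, (a: first, b: last, counter: addr res))
  # `res` lives in this stack frame, so join before it goes out of scope.
  joinThreads(threads)
  result = res.load()

when isMainModule:
  echo parallelSum()
```

The pointer is only valid while parallelSum is still running, which is why joinThreads is called before the proc returns.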
In the future it would be nice for closures to auto-capture addresses and do escape analysis / lifetime analysis.
Currently it's tedious to pass var parameters and seq buffers to closures for compute.
For example, here is a parallel "max" computation using Weave. You need to capture the address of the var and the address of the lock. It would be even more verbose if Matrix were a seq: we would need to capture M[0, 0].addr as a workaround.
proc maxWeaveStaged[T: SomeFloat](M: Matrix[T]) : T =
  var max = T(-Inf)
  let maxAddr = max.addr
  var lock: Lock
  lock.initLock()
  let lockAddr = lock.addr
  parallelForStaged i in 0 ..< M.nrows:
    captures:{maxAddr, lockAddr, M}
    awaitable: maxLoop
    prologue:
      var localMax = T(-Inf)
    loop:
      for j in 0 ..< M.ncols:
        localMax = max(localMax, M[i, j])
      loadBalance(Weave)
    epilogue:
      lockAddr[].acquire()
      maxAddr[] = max(maxAddr[], localMax)
      lockAddr[].release()
  let waslastThread = sync(maxLoop)
  lock.deinitLock()
I just noticed that the following code results in unexpected behavior. Sometimes the numbers are printed to the console in random order; sometimes the program is stuck in the call to acquire, with all cores close to 100% usage and no output at all. I'm not sure if I'm doing something wrong. The only difference between your code example and mine is that you acquire and release the lock in the epilogue, not directly in the loop.
proc test() =
  var lock: Lock
  lock.initLock()
  let lockAddr = lock.addr
  parallelFor i in 0 ..< 10:
    captures: {lockAddr}
    lockAddr[].acquire() # stuck here
    echo i
    lockAddr[].release()
For some reason, it works as intended if I pass a ptr Lock to the function instead of initializing it inside the function:
proc test(lock: ptr Lock) =
  parallelFor i in 0 ..< 10:
    captures:{lock}
    lock[].acquire()
    echo i
    lock[].release()

when isMainModule:
  init(Weave)
  var lock: Lock
  lock.initLock()
  test(lock.addr)
  exit(Weave)
To be honest, this behaviour confuses me even more. I'm glad that I got it to work, though.
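One possible explanation, offered as an assumption rather than something confirmed in this thread: in the first version the lock lives in the stack frame of test(), and the proc may return while loop tasks are still pending, leaving them with a dangling lockAddr. In the second version the lock is declared at module level and outlives the loop. If that is the cause, waiting on the loop before test() returns should help. The sketch below reuses the awaitable/sync pattern from the maxWeaveStaged example above; the name printLoop is hypothetical, and the code has not been verified against a specific Weave version.

```nim
import weave
import std/locks

proc test() =
  var lock: Lock
  lock.initLock()
  let lockAddr = lock.addr
  parallelFor i in 0 ..< 10:
    captures: {lockAddr}
    awaitable: printLoop   # hypothetical handle name for this loop
    lockAddr[].acquire()
    echo i
    lockAddr[].release()
  # Block until every iteration has finished, so that `lock` is still
  # alive whenever a task dereferences lockAddr.
  discard sync(printLoop)
  lock.deinitLock()

when isMainModule:
  init(Weave)
  test()
  exit(Weave)
```

Even with the wait in place, the numbers can still appear in any order, since the lock only serializes the echo calls and does not order the iterations.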