Hi there, I'm trying to understand threading concepts in Nim by adapting one of the examples from the docs. Right now I'm getting an error and can't figure out how to get around it. I'd appreciate any suggestions. Thank you.
import threadpool

proc runIt(mystr: string): string =
  result = "AAA"

proc testB() =
  var mySeq: seq[string] = @["a", "b", "c"]
  var mySeqReturn = newSeq[string](3)
  parallel:
    for k in 0..mySeq.high:
      mySeqReturn[k] = spawn runIt(mySeq[k])

testB()
Error: type mismatch: got (FlowVar[string]) but expected 'string'
mySeqReturn[k] = spawn runIt(mySeq[k])
That seems easy to fix by writing
mySeqReturn[k] = ^spawn runIt(mySeq[k])
spawn returns a FlowVar, whose value only becomes available once the spawned proc has finished, so you have to await it with ^: http://nim-lang.org/docs/threadpool.html#^,FlowVar[ref.T]
But now I'm wondering why it's necessary to do this with strings, but not with ints/floats. Maybe this doesn't even work? Edit: Now it compiles, but indeed the string can't be passed.
Anyway, the next problem is that the prover is not very intelligent and doesn't recognize that mySeq and mySeqReturn have the same length, so you have to do:
parallel:
  for k in 0..min(mySeq.high, mySeqReturn.high):
    mySeqReturn[k] = ^spawn runIt(mySeq[k])
mySeqReturn[k] = ^spawn runIt(mySeq[k])
Destroys the point of the parallelism as it blocks until the result is available!
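To make the blocking point concrete, here is a minimal sketch (not from the docs, and not checked against every compiler version) that starts all tasks first and only awaits the FlowVars afterwards, so the `^` no longer serializes the work. It uses plain spawn outside of a parallel block. The names processAll and pending are made up for illustration, and runIt's body is just a stand-in for real work. It assumes compilation with --threads:on.

```nim
import threadpool

proc runIt(mystr: string): string =
  # Stand-in for real work: builds a new string from the input.
  result = mystr & "-done"

proc processAll(input: seq[string]): seq[string] =
  # Hypothetical helper: start every task before waiting on any of them.
  var pending = newSeq[FlowVar[string]](input.len)
  for k in 0 .. input.high:
    pending[k] = spawn runIt(input[k])
  # Only now block on each FlowVar, in order; the tasks may already be running.
  result = newSeq[string](input.len)
  for k in 0 .. pending.high:
    result[k] = ^pending[k]

when isMainModule:
  echo processAll(@["a", "b", "c"])
```

Each result string is still copied back to the calling thread, so this keeps the tasks concurrent but does not avoid the copy.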
But now I'm wondering why it's necessary to do this with strings, but not with ints/floats.
Because the size of an int/float is known at compile-time, the compiler can optimize the FlowVar away and instead tell the spawned proc where to write the result. This is not possible for string. The price of this optimization is the inconsistency that you noticed, but since parallel is about optimization, I decided it's worth it.
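For comparison, a sketch of the int case described above, modeled on the PI example from the Nim docs: inside parallel, the spawn result is assigned straight into the seq slot with no `^`. The procs square and squares are hypothetical names. The experimental pragma is an assumption about newer compilers, which gate the parallel statement behind it; compilers from the time of this thread may not need it.

```nim
import threadpool

# Assumption: newer Nim versions require this switch for `parallel`.
{.experimental: "parallel".}

proc square(x: int): int =
  # Result size is known at compile time, so no FlowVar is needed by the caller.
  result = x * x

proc squares(n: int): seq[int] =
  var ch = newSeq[int](n)
  parallel:
    for k in 0 .. ch.high:
      # The spawned proc writes its int result directly into ch[k].
      ch[k] = spawn square(k)
  # All spawned tasks have finished once the parallel block ends.
  result = ch

when isMainModule:
  echo squares(4)
```

Swapping int for string in this sketch brings back the type mismatch from the first post, which is the inconsistency being discussed.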
Not sure what I'm doing wrong here. Having serialized access to the seq does not prevent this example from crashing.
import threadpool
import locks

var L: TLock

proc runIt(k: int, mystr: ptr seq[string]): void =
  acquire(L)
  echo k, " ", mystr[][k]
  mystr[][k] = "update" # crash
  release(L)

proc testB() =
  var mySeq: seq[string] = @["a", "b", "c"]
  var mySeqReturn = newSeq[string](3)
  parallel:
    for k in 0..min(mySeq.high, mySeqReturn.high):
      spawn runIt(k, addr mySeq)

initLock(L)
testB()
So let me summarize what I've understood so far. To process a sequence of strings in parallel, I have two options at the moment. I can either return a copy of the string from a thread (via FlowVar), or disable the GC, which should lift all GC restrictions but brings more problems than benefits: unsafety, manual synchronization errors, etc. Returning a value from a thread by copy will work and will be efficient as long as the strings are small. If they are large enough, the time spent copying data will dominate, which makes the parallelization inefficient.
Does that sound about right?
Araq, one more question regarding your earlier reply:
Because the size of an int/float is known at compile-time, the compiler can optimize the FlowVar away and instead tell the spawned proc where to write the result. This is not possible for string.
Does this basically mean that any object with string fields falls into that category?