nimforum mirror - FlowVar type mismatch in parallel block

kuba [2015-05-07T02:23:58+02:00]

Hi there, I'm trying to understand threading concepts in Nim by adapting one of the examples from the docs. Right now I'm getting an error and can't figure out how to get around it. I'd appreciate any suggestions. Thank you.

import threadpool

proc runIt(mystr: string) : string=
    result = "AAA"

proc testB()=
  var mySeq : seq[string] = @["a","b","c"]
  var mySeqReturn = newSeq[string](3)
  
  parallel:
    for k in 0..mySeq.high:
      mySeqReturn[k] = spawn runIt(mySeq[k])

testB()

Error: type mismatch: got (FlowVar[string]) but expected 'string'
        mySeqReturn[k] = spawn runIt(mySeq[k])

def [2015-05-07T03:09:51+02:00]

That seems easy to fix by writing

mySeqReturn[k] = ^spawn runIt(mySeq[k])

spawn returns a FlowVar, whose value only becomes available once the spawned proc has finished, so you have to await it with ^: http://nim-lang.org/docs/threadpool.html#^,FlowVar[ref.T]

But now I'm wondering why it's necessary to do this with strings, but not with ints/floats. Maybe this doesn't even work? Edit: Now it compiles, but indeed the string can't be passed.

Anyway, the next problem is that the prover is not very intelligent and doesn't recognize that mySeq and mySeqReturn have the same length, so you have to do:

parallel:
    for k in 0..min(mySeq.high, mySeqReturn.high):
      mySeqReturn[k] = ^spawn runIt(mySeq[k])

Araq [2015-05-07T11:56:21+02:00]

mySeqReturn[k] = ^spawn runIt(mySeq[k])

This destroys the point of the parallelism, as it blocks until the result is available!

But now I'm wondering why it's necessary to do this with strings, but not with ints/floats.

Because the size of an int/float is known at compile-time, the compiler can optimize the FlowVar away and instead tell the spawned proc where to write the result. This is not possible for string. The price of this optimization is the inconsistency that you noticed, but since parallel is about optimization, I decided it's worth it.
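The asymmetry described here can be sketched with a fixed-size result type. This is a minimal sketch, not code from the thread: square and squares are hypothetical names, and the experimental pragma is an assumption about newer compilers (the compiler of the thread's era accepted parallel without it). Compile with --threads:on.

```nim
import threadpool

# Newer compilers gate the parallel statement behind this switch.
# Assumption: remove the line if your compiler rejects it.
{.experimental: "parallel".}

proc square(x: int): int =
  x * x

proc squares(n: int): seq[int] =
  var ch = newSeq[int](n)
  parallel:
    for k in 0..ch.high:
      # No ^ needed: int has a size known at compile time, so the
      # spawned proc can be told to write straight into ch[k].
      ch[k] = spawn square(k)
  # All tasks spawned inside the parallel block are done at this point.
  result = ch
```

With string in place of int, the same assignment is where the FlowVar[string] type mismatch from the first post appears.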

kuba [2015-05-07T20:28:31+02:00]

Thanks for your comments. So just to confirm - shared memory concurrency with strings is not supported at the moment?

Araq [2015-05-07T21:07:54+02:00]

It IS supported via addr and ptr string for reading strings, and currently the string passed from mySeq[k] to runIt isn't copied either, for better or worse. Writing strings back happens via FlowVars and the ^ operator, which currently performs a copy but doesn't have to.
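One way to write strings back through FlowVars without blocking on each spawn is to start every task first and only then await the results. The following is a rough sketch under assumptions: it uses plain spawn outside a parallel block, processAll is a hypothetical helper, and the body of runIt is a stand-in workload rather than the original one. Compile with --threads:on.

```nim
import threadpool

proc runIt(mystr: string): string =
  # Stand-in workload (hypothetical): build a new string per input.
  result = mystr & "!"

proc processAll(inputs: seq[string]): seq[string] =
  # Phase 1: start all tasks; each spawn hands back a FlowVar[string]
  # immediately, so the loop does not wait for any result.
  var pending = newSeq[FlowVar[string]](inputs.len)
  for k in 0..inputs.high:
    pending[k] = spawn runIt(inputs[k])

  # Phase 2: await the results. ^ blocks per element, but by now
  # all tasks are already running (or finished).
  result = newSeq[string](inputs.len)
  for k in 0..pending.high:
    result[k] = ^pending[k]
```

Each ^ still copies the string into the caller, which is the cost discussed later in the thread.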

kuba [2015-05-11T09:37:05+02:00]

Not sure what I'm doing wrong here. Having serialized access to the seq does not prevent this example from crashing.

import threadpool
import locks

var  L: TLock

proc runIt(k: int, mystr: ptr seq[string]) : void=
    acquire(L)
    echo k, "  ", mystr[][k]
    mystr[][k]="update"              # crash
    release(L)

proc testB()=
  var mySeq : seq[string] = @["a","b","c"]
  var mySeqReturn = newSeq[string](3)
  
  parallel:
    for k in 0..min(mySeq.high, mySeqReturn.high):
      spawn runIt(k, addr mySeq)

initLock(L)
testB()

kuba [2015-05-11T20:10:33+02:00]

EDIT: Looks like the problem is most likely related to the GCC version. Today's test ran on a different machine with GCC 4.9.2; yesterday's ran with 4.8.3.

Araq [2015-05-11T22:18:15+02:00]

Nevertheless your code is wrong. Note that I said "It IS supported via addr and ptr string for reading strings". What do you do? You write. ;-)
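For contrast, a read-only use of addr and ptr might look like the sketch below. It is an illustration under assumptions, not code from the thread: lengthAt and allLengths are hypothetical names, the spawned proc only reads through the pointer, and its result is a fixed-size int. Compile with --threads:on.

```nim
import threadpool

proc lengthAt(k: int, strs: ptr seq[string]): int =
  # Read-only access through the pointer; nothing is allocated or
  # stored into the shared seq from the worker thread.
  strs[][k].len

proc allLengths(inputs: seq[string]): seq[int] =
  # Local mutable copy so that taking its address is valid.
  var shared = inputs
  var pending = newSeq[FlowVar[int]](shared.len)
  for k in 0..shared.high:
    pending[k] = spawn lengthAt(k, addr shared)

  # Await every task before returning, so `shared` outlives all readers.
  result = newSeq[int](shared.len)
  for k in 0..pending.high:
    result[k] = ^pending[k]
```

No lock is used because no task writes to the seq; the caller must keep the seq alive and unmodified until every result has been awaited.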

kuba [2015-05-11T22:41:00+02:00]

Oops, my bad. So writing does not guarantee safety.

Araq [2015-05-12T10:43:19+02:00]

No, it's not only unsafe. It's not possible. The string "update" that runIt allocates and stores is local to the thread it runs on and will be collected independent of testB.

kuba [2015-05-12T18:39:00+02:00]

So let me summarize what I've understood so far. To process a sequence of strings in parallel, I have two options at the moment. I can either return a copy of the string from a thread (via FlowVar) or disable the GC, which should lift all GC restrictions but brings more problems than benefits: unsafety, manual synchronization errors, etc. Returning a value from a thread by copy will work and will be efficient as long as the strings are small. If they are large enough, the time spent copying data will dominate, which makes the parallelization inefficient.

Does that sound about right?

kuba [2015-05-12T19:21:29+02:00]

Araq, one more question regarding your earlier reply:

Because the size of an int/float is known at compile-time, the compiler can optimize the FlowVar away and instead tell the spawned proc where to write the result. This is not possible for string.

Does this basically mean that any object with string fields falls into that category?

Araq [2015-05-12T20:15:27+02:00]

Yes and yes, but see my reply in some other thread: SharedString and SharedTable are coming.

Mirror of forum.nim-lang.org

1202 :: FlowVar type mismatch in parallel block