OK, I've racked my brain enough and need help. Using Nim 0.17.2 on Linux, I have the proc below.
proc segcount(row, Kn: int): int =
  var cnt = 0
  for k in 0..<Kn: cnt += seg[row + k].int
  result = cnt
It works when I use it as below.
proc segsieve(Kn: int) =         # for Kn resgroups in segment
  ...
  ...
  var cnt = 0                    # count for the segment primes '1' bytes
  for i in 0..<rescnt:           # count Kn resgroups|bytes each restrack
    cnt += segcount(i*KB, Kn)
  primecnt += cnt.uint           # update primecnt for the segment
But when I try to parallelize this using spawn I get this error.
proc segsieve(Kn: int) =         # for Kn resgroups in segment
  ...
  ...
  var cnt = 0                    # count for the segment primes '1' bytes
  for i in 0..<rescnt:           # count Kn resgroups|bytes each restrack
    cnt += spawn segcount(i*KB, Kn)   <-- error points to start of '+='
  sync()
  primecnt += cnt.uint           # update primecnt for the segment
-------------------------------------------------------------
[jzakiya@localhost nim]$ nim c --cc:gcc --threads:on --d:release ssozp5x1c1par.nim
Hint: used config file '/home/jzakiya/nim-0.17.2/config/nim.cfg' [Conf]
Hint: system [Processing]
Hint: ssozp5x1c1par [Processing]
Hint: math [Processing]
Hint: strutils [Processing]
Hint: parseutils [Processing]
Hint: algorithm [Processing]
Hint: typetraits [Processing]
Hint: threadpool [Processing]
Hint: cpuinfo [Processing]
Hint: os [Processing]
Hint: times [Processing]
Hint: posix [Processing]
Hint: ospaths [Processing]
Hint: linux [Processing]
Hint: cpuload [Processing]
ssozp5x1c1par.nim(154, 16) Error: type mismatch: got (uint64, FlowVar[system.int])
but expected one of:
proc `+=`[T: SomeOrdinal | uint | uint64](x: var T; y: T)
proc `+=`[T: float | float32 | float64](x: var T; y: T)
[jzakiya@localhost nim]$
Do you use the parallel statement at all as described in the manual? Or only a plain spawn? See
https://nim-lang.org/docs/manual.html#parallel-spawn
You may also need a FlowVar.
I did test parallel once for calculation of a convex hull, see
https://forum.nim-lang.org/t/483/2
Was not really fast at that time, but I think that is fixed already.
That example is even included in
Nim/tests/parallel/tconvexhull.nim
I've tried it both with and without parallel:, as shown below, but I get the same compiler output.
parallel:
  var cnt = 0                    # count for the segment primes '1' bytes
  for i in 0..<rescnt:           # count Kn resgroups|bytes each restrack
    cnt += spawn segcount(i*KB, Kn)
sync()
primecnt += cnt.uint             # update primecnt for the segment
I'm also using spawn earlier in segsieve which works with no problems, and actually does operate in parallel, which I can verify by looking at the program's operation using htop.
Below is the total segsieve code.
# This routine performs the prime sieve for a restrack of Kn resgroups|bytes.
# 'nextp' resgroup vals for restrack 'r' mark prime multiples on it in 'seg'
# and are updated for each prime for the next segment.
proc residue_sieve(row: int, seg_rti: int, Kn: int)=
  for j, prime in primes:            # for each prime r1..sqrt(N)
    if nextp[row+j] < Kn.uint:       # if 1st mult resgroup is within 'seg'
      var k = nextp[row+j].int       # starting from this resgroup in 'seg'
      while k < Kn:                  # for each primenth byte to end of 'seg'
        seg[seg_rti + k] = 0         # mark byte in segment as nonprime
        k += prime                   # compute next prime multiple resgroup
      nextp[row+j] = uint(k - Kn)    # save 1st resgroup in next eligible seg
    else: nextp[row+j] -= Kn.uint    # do if 1st mult resgroup not within seg
# Count the primes on each row of Kn resgroups|bytes in 'seg' memory.
proc segcount(row, Kn: int): int =   # for this row in 'seg' of Kn bytes
  var cnt = 0
  for k in 0..<Kn: cnt += seg[row + k].int  # add primes '1' (and nonprimes '0')
  result = cnt                       # return count of primes for 'row'
# This routine performs the total prime sieve for Kn resgroups|bytes by
# processing each residue track individually (in parallel). Then the
# segment primes count is computed and added to global var 'primecnt'.
proc segsieve(Kn: int) =             # for Kn resgroups in segment
  for b in 0..<seg.len: seg[b] = 1   # initialize seg bytes to all prime '1'
  parallel:
    for r in 0..<rescnt:             # for each residue track number 'r'
      let row = r * pcnt             # set the 'nextp' table row address
      let seg_rti = r * KB           # set the segment mem row address
      spawn residue_sieve(row, seg_rti, Kn)  # mark the prime multiples along it
  sync()
  #parallel:
  var cnt = 0                        # count for the primes, the '1' bytes
  for i in 0..<rescnt:               # count Kn resgroups along each restrack
    #cnt += segcount(i*KB, Kn)
    cnt += spawn segcount(i*KB, Kn)
  sync()
  primecnt += cnt.uint
I'm trying to get segcount to operate in parallel too, which should make the program even faster. When I get this working I'll write it all up and update my paper to show the new parallel algorithm architecture and the Nim implementation.
I think we will not be able to compile your code, as it does not look like a complete program.
Do you really expect that
cnt += spawn segcount(i*KB, Kn)
may work? I have no idea how it could.
Maybe what you intend is something like
parallel:
  var cnt: array[rescnt, int]
  for i in 0..<rescnt:
    cnt[i] = spawn segcount(i*KB, Kn)
sync()
for i in 0..<rescnt:
  primecnt += cnt[i].uint
Such a shape would make some sense to me, but I have not used Nim's parallel in the last two years, so I would have to consult the manual.
After reading the Nim in Action book I got it to compile by placing a ^ before spawn, but it makes the program slower. The problem has to do with a FlowVar[T] type mismatch on the value that comes back from the spawned segcount. And when I use parallel: it won't compile, and shows even more errors. Doing more research.
var cnt = 0                      # count for the primes, the '1' bytes
for i in 0..<rescnt:             # count Kn resgroups|bytes each restrack
  cnt += ^segcount(i*KB, Kn)
primecnt += cnt.uint             # update primecnt for the segment
Now in your code there is no spawn at all!
For parallel processing, you have to ensure that there are no conflicts when parallel tasks access your data; otherwise the compiler may copy the data beforehand, which can make it slow. Good use of the CPU cache is also important for parallel processing. Many parallel tasks will give no speed increase when data is always fetched from slow RAM instead of cache.
Try
for i in 0..rescnt-1:
  cnt[i] = spawn segcount(i*KB, Kn)
I think the issue is with ..<
In the previous snippet I forgot the spawn. The code below compiles, but is slower.
var cnt = 0                      # count for the primes, the '1' bytes
for i in 0..<rescnt:             # count Kn resgroups|bytes each restrack
  cnt += ^spawn segcount(i*KB, Kn)   <-- the '^' gets it to compile, but threads wait to finish
sync()
primecnt += cnt.uint             # update primecnt for the segment
Changing 0..<rescnt to 0..rescnt-1 has same error below. Even when I do a while loop it throws the same error.
ssozp5x1c1par.nim(156, 11) Error: type mismatch: got (uint, FlowVar[system.uint])
but expected one of:
proc `+=`[T: SomeOrdinal | uint | uint64](x: var T; y: T)
proc `+=`[T: float | float32 | float64](x: var T; y: T)
The problem seems to be that when segcount returns its output there is a type mismatch with cnt (?).
When I use parallel I get this compiler error:
parallel:
  var cnt = 0'u                  # count for the primes, the '1' bytes
  for i in 0..rescnt-1:          # count Kn resgroups along each restrack
    cnt += spawn segcount(i*KB, Kn)   <-- points to start of '('
sync()
primecnt += cnt
-------------------------------------------------------------
ssozp5x1c1par.nim(155, 28) Error: 'spawn' must not be discarded
Why do you refuse to try
cnt[i]
as jlp765 suggests?
Do you have an idea how plain
cnt +=
should work? All the parallel calculated results should accumulate in this single variable. Then you may need something to control the access to it.
The error messages keep saying the issue is a mismatch with FlowVar[T]. In Chapter 6 of Nim in Action, here is what it says they are.
FlowVar[T] can be thought of as a container similar to the Future[T] type, which you used in chapter 3. At first, the container has nothing inside it. When the spawned procedure is executed in a separate thread, it returns a value sometime in the future. When that happens, the returned value is put into the FlowVar container.
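To make the quoted description concrete, here is a minimal sketch of what spawn hands back. The proc `square` and the wrapper `demo` are hypothetical stand-ins, not part of the sieve program; `square` takes only ints and touches no globals so that it can be spawned without GC-safety complaints. It assumes the standard threadpool module and compilation with --threads:on.

```nim
# Compile with: nim c --threads:on flowvar_demo.nim
import threadpool

proc square(x: int): int =
  # Hypothetical stand-in for segcount: a pure function of its arguments.
  x * x

proc demo(): int =
  # spawn does not return an int; it returns the FlowVar[int] container.
  let fv: FlowVar[int] = spawn square(7)
  # `^` blocks until the spawned call has finished, then unwraps the value.
  result = ^fv

echo demo()
```

Since `cnt += x` needs both sides to have the same type, passing the FlowVar itself on the right-hand side is what produces the "got (uint, FlowVar[system.uint])" mismatch, whatever type cnt has.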
Here is updated segcount
proc segcount(row, Kn: int): uint =
  var cnt = 0'u
  for k in 0..<Kn: cnt += seg[row + k].uint
  result = cnt
So segcount is returning a uint value. This works perfectly well as below with no type mismatch.
var cnt = 0'u
for i in 0..<rescnt:
  cnt += segcount(i*KB, Kn)
primecnt += cnt
But using spawn causes a type mismatch: cnt += spawn segcount(i*KB, Kn)
No matter what kind of container for cnt you use a type mismatch error appears.
Try
var cnt: array[rescnt, int]
parallel:
  for i in 0..rescnt-1:
    cnt[i] = spawn segcount(i*KB, Kn)
sync()
for i in 0..rescnt-1:
  primecnt += cnt[i].uint
OK, I had to clean it up a little to make it work, but here is the code that gets it to compile.
var cnt: array[rescnt, uint]
parallel:
  for i in 0..rescnt-1:
    cnt[i] = spawn segcount(i*KB, Kn)
sync()
for i in 0..rescnt-1:
  primecnt += cnt[i].uint
So was the issue that the single value of cnt was getting clobbered by each thread's return value, causing the type mismatch?
Thanks for getting it to compile, but if you can explain why this works I'd appreciate it even more.
The cnt += operation in parallel is a ripe candidate for a reduction-like option in Nim, similar to the one in OpenMP.
https://stackoverflow.com/questions/13290245/reduction-with-openmp#13290673
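As a possible answer to the "why does this work" question, and a hand-rolled version of the reduction idea: the value was not being clobbered. The mismatch is purely about types, because `cnt += spawn ...` tries to add a FlowVar[uint] to a uint. Writing `^spawn ...` inside the loop compiles, but each iteration then blocks until its own task finishes, so the tasks effectively run one after another. As I read the manual's parallel section, inside parallel: the compiler only recognizes the `dest = spawn f(...)` pattern, where dest can be a plain slot such as cnt[i], which would explain both the "'spawn' must not be discarded" error and why the array version compiles. The sketch below shows the same shape without parallel:, under the assumption that plain spawn is acceptable: start every task first, then read the results. The names `countOnes` and `parallelCount` are hypothetical, and `countOnes` counts multiples of 3 only as a self-contained stand-in for segcount.

```nim
# Compile with: nim c --threads:on spawn_then_sum.nim
import threadpool

proc countOnes(lo, hi: int): int =
  # Hypothetical stand-in for segcount: counts multiples of 3 in lo..<hi.
  for k in lo ..< hi:
    if k mod 3 == 0: inc result

proc parallelCount(chunks, chunkLen: int): int =
  # One FlowVar per task, so no two tasks ever write to the same location.
  var pending = newSeq[FlowVar[int]](chunks)
  for i in 0 ..< chunks:
    # Start all tasks before reading any result.
    pending[i] = spawn countOnes(i * chunkLen, (i + 1) * chunkLen)
  for fv in pending:
    # `^` blocks only here, after every task has already been started.
    result += ^fv

echo parallelCount(4, 300)
```

The final summing loop is the serial "reduction" step. It is cheap compared with the counting itself, since it only adds one number per task.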