I don't know if this is currently possible, but it would be nice to have.
It seems I could, in theory, speed up the execution of this code:
proc segsieve(Kmax: uint, KB: int) =      # for Kn resgroups|bytes in segment
  let Ks = KB                             # make default seg size immutable
  parallel:                               # perform SSoZ in parallel
    for r in 0..rescnt-1:                 # for each residue track number 'r'
      let nextp_row = r * pcnt            # set the 'nextp' table row address
      let seg_row = r * Ks                # set the 'seg' memory row address
      spawn residue_sieve(nextp_row, seg_row, Kmax, Ks, r) # do sieve for row 'r'
  sync()                                  # wait for all row threads to finish
  for i in 0..rescnt-1:                   # update 'primecnt' with the count of
    primecnt += cnts[i]                   # segment primes for each 'seg' row
Here sync() makes the code that follows wait until all the threads have finished executing. In theory, overall execution could be sped up by having each thread asynchronously put its cnt value into a thread-safe queue (FIFO), from which the values are extracted and added to primecnt. Since the number of cnt values is known in advance (rescnt of them), primecnt can be updated as the values become available, until rescnt of them have been added. Is this possible now? Could it be faster?
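A minimal sketch of what I have in mind, assuming Nim's built-in Channel type can serve as the FIFO. The names residueSieveStub, segsieveSketch and results are hypothetical, the rescnt value is made up, and the stub only fakes a per-row count instead of sieving:

```nim
# Sketch only; compile with: nim c --threads:on sketch.nim
import std/threadpool

const rescnt = 8                  # assumed number of residue tracks

var results: Channel[int]         # FIFO carrying one count per residue track

proc residueSieveStub(r: int) {.thread.} =
  # Hypothetical stand-in for residue_sieve: fakes the count for row 'r'.
  let cnt = r + 1
  results.send(cnt)               # publish the count as soon as it is ready

proc segsieveSketch(): int =
  results.open()
  for r in 0 ..< rescnt:          # start one task per residue track
    spawn residueSieveStub(r)
  for _ in 0 ..< rescnt:          # take exactly rescnt counts, in arrival order
    result += results.recv()      # blocks only until the next count arrives
  sync()                          # all counts are in; let the tasks wind down
  results.close()

let total = segsieveSketch()
echo total
```

Whether this is actually faster would need measuring. The summing loop is only rescnt additions, so any gain over sync() followed by the loop is probably small compared with the sieving itself.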