Hi, I'm playing with parallel execution/multithreading and I've got the program below:
proc spawn_collatz_p(cpu:int, ub:uint64):seq[(uint64,int)] =
  var responses = newSeq[FlowVarBase](cpu)
  parallel:
    for i in 0..<cpu:
      responses[i] = spawn list_collatz(uint64(i + 1), ub, cpu.uint64)
  for i in 0..<cpu:
    result = result & ^responses[i]
and I get an error at the last line:
type mismatch: got <FlowVarBase>
but expected one of:
proc `^`[T](fv: FlowVar[T]): T
first type mismatch at position: 1
required type for fv: FlowVar[^.T]
but expression 'responses[i]' is of type: FlowVarBase
proc `^`[T](fv: FlowVar[ref T]): ref T
first type mismatch at position: 1
required type for fv: FlowVar[ref T]
but expression 'responses[i]' is of type: FlowVarBase
template `^`(x: int): BackwardsIndex
first type mismatch at position: 1
required type for x: int
but expression 'responses[i]' is of type: FlowVarBase
expression: ^responses[i]
I couldn't find a suitable example in the documentation for spawn, so I'm hoping someone will enlighten me!
For reference, the code below works, but yeah, it's not ideal if you want to set your number of threads dynamically ;-)
proc spawn_collatz_p2(cpu:int, ub:uint64):seq[(uint64,int)] =
  parallel:
    let c1 = spawn list_collatz(uint64(1), ub, cpu.uint64)
    let c2 = spawn list_collatz(uint64(2), ub, cpu.uint64)
  result = ^c1 & ^c2
If '^' wants a FlowVar, give it a FlowVar (typed with whatever list_collatz returns, which here has to match the proc's result type):
proc spawn_collatz_p(cpu:int, ub:uint64):seq[(uint64,int)] =
  var responses:seq[FlowVar[seq[(uint64,int)]]]
  responses.setLen(cpu)
  parallel:
    for i in 0..<cpu:
      responses[i] = spawn list_collatz(uint64(i + 1), ub, cpu.uint64)
  for i in 0..<cpu:
    result = result & ^responses[i]
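For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch. It assumes the threadpool module and a build with --threads:on; squares and gather are made-up names, and squares is only a stand-in for list_collatz, which isn't shown in this thread. It uses plain spawn without a parallel block to keep the example small.

```nim
import std/threadpool   # compile with --threads:on

proc squares(start, ub, step: int): seq[int] =
  ## Hypothetical stand-in for list_collatz:
  ## squares of start, start+step, ... up to ub.
  var n = start
  while n <= ub:
    result.add(n * n)
    n += step

proc gather(workers, ub: int): seq[int] =
  ## One typed FlowVar per worker, so `^` knows which T to hand back.
  var responses = newSeq[FlowVar[seq[int]]](workers)
  for i in 0 ..< workers:
    responses[i] = spawn squares(i + 1, ub, workers)
  for i in 0 ..< workers:
    # `^` blocks until worker i has finished, then returns its seq.
    result.add(^responses[i])
```

The results come back grouped per worker, not in ascending order of n, so sort afterwards if the order matters.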
Of course!! Thank you, I mixed up parentheses with square brackets..
What I'm actually trying to do with this program is use all the cores/threads on my processor to minimize the total time to finish. So I'm calculating the lengths of all Collatz sequences (https://en.wikipedia.org/wiki/Collatz_conjecture) up to a certain upper bound. At the moment, using 15 threads, it takes about half an hour to calculate them up to 10^11 (I've got an 8 core/16 thread processor). Because each number can be calculated separately, it's an ideal candidate for testing parallel execution. It would also be a nice challenge to see who can write the fastest program!
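To make the "separately per number" part concrete, here is a rough sketch of what each worker could compute. collatzLen and listCollatz are hypothetical names, not the actual list_collatz from the program above; the assumption is only that worker i handles i, i+step, i+2*step, ... up to the upper bound, as the spawn calls suggest.

```nim
proc collatzLen(n: uint64): int =
  ## Number of steps until n reaches 1 (0 for n = 1).
  var x = n
  while x != 1'u64:
    if (x and 1'u64) == 0'u64:
      x = x shr 1          # even: halve
    else:
      x = 3'u64 * x + 1'u64  # odd: 3x + 1
    inc result

proc listCollatz(start, ub, step: uint64): seq[(uint64, int)] =
  ## Hypothetical stand-in for list_collatz:
  ## (n, length) for n = start, start+step, ... while n <= ub.
  var n = start
  while n <= ub:
    result.add((n, collatzLen(n)))
    n += step
```

With step equal to the number of workers and start running from 1 to step, every number up to ub is visited exactly once, and no worker needs anything from another.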
Well I did ;-)
So my Nim program, compiled with -d:danger --threads:on took 5.1 seconds to create a list with the 'running maximums' for the numbers up to 2*10^8 (see https://oeis.org/A006877)
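For illustration only (this is not the program timed above), one way to get the running maximums out of strided workers is to let each worker keep just its local records and merge them afterwards. A global record is always a local record in its own stride, so nothing gets lost. localRecords and mergeRecords are made-up names.

```nim
import std/algorithm

proc collatzLen(n: uint64): int =
  ## Number of steps until n reaches 1 (0 for n = 1).
  var x = n
  while x != 1'u64:
    if (x and 1'u64) == 0'u64: x = x shr 1
    else: x = 3'u64 * x + 1'u64
    inc result

proc localRecords(start, ub, step: uint64): seq[(uint64, int)] =
  ## Records within one stride: start, start+step, ... while n <= ub.
  var best = -1
  var n = start
  while n <= ub:
    let steps = collatzLen(n)
    if steps > best:
      result.add((n, steps))
      best = steps
    n += step

proc mergeRecords(parts: seq[seq[(uint64, int)]]): seq[(uint64, int)] =
  ## Merge per-stride records into the global running maximums.
  var merged: seq[(uint64, int)]
  for p in parts:
    merged.add(p)
  merged.sort()              # tuples compare by n first
  var best = -1
  for item in merged:
    if item[1] > best:       # strictly longer than everything before it
      result.add(item)
      best = item[1]
```

Each localRecords call could be handed to spawn, with the merge run once all FlowVars have been read. The per-worker lists stay tiny compared with storing every length.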
What I also did is write a version for my GPU (Nvidia RTX 2060) in Futhark (https://futhark-lang.org), a Haskell-like language that compiles down to C, CUDA or OpenCL. It could most probably be improved as well, I'm not an expert, but it took a staggering 1.1 seconds...!