2. calls proc sum, which sums the values of the ref array.
For compiling I use `` nim c -d:release --threads:on sumArray2.nim ``. The time values are: real 0m1.482s, user 0m0.577s, sys 0m0.905s.
When I use a thread for the proc sum (listing below), the time values are: real 0m4.208s, user 0m2.405s, sys 0m1.805s.
When I divide up the work between 2 threads for the proc sum (listing below), the time values are: real 0m6.833s, user 0m4.307s, sys 0m2.650s.
Am I doing something wrong?
** just the main thread **
```nim
const asize = 500_000_000

proc sum(i, j: int; aref: ref array[asize, int]): int =
  for k in i .. j:
    result += aref[k]

proc fillArray(aref: ref array[asize,int]) =
  let median = asize div 2
  for i in aref[].low .. aref[].high:
    if i < median: aref[i] = 1
    else: aref[i] = 2

proc main() =
  var aref: ref array[asize, int]
  new(aref)
  fillArray(aref)
  var s1: int
  s1 = sum(0, asize - 1, aref)
  echo s1

main()
```
** using 1 thread for the proc sum **
```nim
import std/threadpool

const asize = 500_000_000

proc sum(i, j: int; aref: ref array[asize, int]): int =
  for k in i .. j:
    result += aref[k]

proc fillArray(aref: ref array[asize,int]) =
  let median = asize div 2
  for i in aref[].low .. aref[].high:
    if i < median: aref[i] = 1
    else: aref[i] = 2

proc main() =
  var aref: ref array[asize, int]
  new(aref)
  fillArray(aref)
  var s1 = spawn sum(0, asize - 1, aref)
  echo ^s1

main()
```
** using 2 threads for the proc sum **
```nim
import std/threadpool

const asize = 500_000_000

proc sum(i, j: int; aref: ref array[asize, int]): int =
  for k in i .. j:
    result += aref[k]

proc fillArray(aref: ref array[asize,int]) =
  let median = asize div 2
  for i in aref[].low .. aref[].high:
    if i < median: aref[i] = 1
    else: aref[i] = 2

proc main() =
  var aref: ref array[asize, int]
  new(aref)
  fillArray(aref)
  var s1 : FlowVar[int] = spawn sum(0, asize div 2 - 1, aref)
  var s2 : FlowVar[int] = spawn sum(asize div 2, asize-1, aref)
  echo ^s1
  echo ^s2

main()
```
What is your data size? It looks like 4 GB. The actual performance may then be limited by the memory bandwidth of your box, I guess. Actually, I am not really sure about this :-)
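The size estimate can be checked with a little arithmetic. This is only a sketch, and it assumes a 64-bit target where Nim's `int` is 8 bytes wide; the name `bytes` is made up for the illustration.

```nim
const asize = 500_000_000

# Assumption: 64-bit target, so sizeof(int) == 8.
const bytes = asize * sizeof(int)

echo "bytes:      ", bytes
echo "GB (10^9):  ", bytes.float / 1e9
echo "GiB (2^30): ", bytes.float / float(1 shl 30)
```

Under that assumption the array holds 4_000_000_000 bytes, far more than any CPU cache, so every pass over it has to stream from main memory.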
For testing spawn(), you may start with the tiny example from my book; that one worked fine for me. For your code, you could try passing two distinct variables to proc sum() instead of the same one. You only do read access, but can the compiler verify that, or might it assume write access and so block parallel execution? I don't know. Of course, to execute code in parallel, you need at least 2 physical cores. You could also test with much smaller data that fits into cache. And why a ref array, why not a seq? Please also tell us your Nim compiler version, whether you use the gcc or clang backend, and your MM option (arc, orc, refc). Finally, I see that this post is not really helpful for you, but I would be interested in the reasons for your observations myself. No time for testing now, sorry.
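A minimal sketch of the "smaller data, seq instead of ref array" test suggested above. It also times the fill and the sum separately inside the program, because `time` on the whole process includes allocating and filling the array. The size `n`, the proc name `fill` and the use of `std/monotimes` are my assumptions, not part of the original code.

```nim
import std/[monotimes, times]

# Hypothetical size: small enough that the data is more likely to fit in cache.
const n = 1_000_000

proc sum(i, j: int; a: seq[int]): int =
  # Sums a[i] .. a[j], both bounds inclusive.
  for k in i .. j:
    result += a[k]

proc fill(a: var seq[int]) =
  # First half gets 1, second half gets 2, as in the original fillArray.
  let median = a.len div 2
  for i in a.low .. a.high:
    a[i] = if i < median: 1 else: 2

proc main() =
  var a = newSeq[int](n)

  let t0 = getMonoTime()
  fill(a)
  let t1 = getMonoTime()
  let s = sum(0, n - 1, a)
  let t2 = getMonoTime()

  echo "result:    ", s
  echo "fill time: ", (t1 - t0).inMicroseconds, " us"
  echo "sum time:  ", (t2 - t1).inMicroseconds, " us"

main()
```

If the sum phase turns out to be only a small part of the total run time, splitting it across threads cannot improve the `real` value by much, whatever the memory management option.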
Fun fact: Your code fails for me with orc, is fast with arc, and slow with refc. All of this is without -d:release, as we should always test with debug code first. I had to remove a space from the beginning of each line of your code:
```nim
import std/threadpool

const asize = 500_000_000

proc sum(i, j: int; aref: ref array[asize, int]): int =
  for k in i .. j:
    result += aref[k]

proc fillArray(aref: ref array[asize,int]) =
  let median = asize div 2
  for i in aref[].low .. aref[].high:
    if i < median: aref[i] = 1
    else: aref[i] = 2

proc main() =
  var aref: ref array[asize, int]
  new(aref)
  fillArray(aref)
  var s1 : FlowVar[int] = spawn sum(0, asize div 2 - 1, aref)
  var s2 : FlowVar[int] = spawn sum(asize div 2, asize-1, aref)
  echo ^s1
  echo ^s2

main()
```
nim r --mm:arc t.nim
Hint: used config file '/home/salewski/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/salewski/Nim/config/config.nims' [Conf]
.................................................................................................................
Hint: [Link]
Hint: mm: arc; threads: on; opt: none (DEBUG BUILD, `-d:release` generates faster code)
51917 lines; 0.416s; 71.145MiB peakmem; proj: /tmp/hhh/t.nim; out: /home/salewski/.cache/nim/t_d/t_2FE7193ABD2B5B38E5EC03CCAD8F34218781F691 [SuccessX]
Hint: /home/salewski/.cache/nim/t_d/t_2FE7193ABD2B5B38E5EC03CCAD8F34218781F691 [Exec]
250000000
500000000
salewski@hx90 /tmp/hhh $ nim r --mm:orc t.nim
Hint: used config file '/home/salewski/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/salewski/Nim/config/config.nims' [Conf]
...................................................................................................................
CC: ../../home/salewski/Nim/lib/system/exceptions.nim
CC: ../../home/salewski/Nim/lib/std/private/digitsutils.nim
CC: ../../home/salewski/Nim/lib/std/assertions.nim
CC: ../../home/salewski/Nim/lib/system/dollars.nim
CC: ../../home/salewski/Nim/lib/std/typedthreads.nim
CC: ../../home/salewski/Nim/lib/std/syncio.nim
CC: ../../home/salewski/Nim/lib/system.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/cpuinfo.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/cpuload.nim
CC: ../../home/salewski/Nim/lib/pure/times.nim
CC: ../../home/salewski/Nim/lib/pure/os.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/threadpool.nim
CC: t.nim
Hint: [Link]
Hint: mm: orc; threads: on; opt: none (DEBUG BUILD, `-d:release` generates faster code)
52472 lines; 0.584s; 71.051MiB peakmem; proj: /tmp/hhh/t.nim; out: /home/salewski/.cache/nim/t_d/t_E1C062DF3CB7AF52F54AB2238F57D1CC87BFD0EF [SuccessX]
Hint: /home/salewski/.cache/nim/t_d/t_E1C062DF3CB7AF52F54AB2238F57D1CC87BFD0EF [Exec]
Traceback (most recent call last)
/home/salewski/Nim/lib/pure/concurrency/threadpool.nim(370) slave
/tmp/hhh/t.nim(16) sumWrapper
/home/salewski/Nim/lib/system/orc.nim(497) nimDecRefIsLastCyclicStatic
/home/salewski/Nim/lib/system/orc.nim(469) rememberCycle
/home/salewski/Nim/lib/system/orc.nim(147) unregisterCycle
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
250000000
500000000
Error: execution of an external program failed: '/home/salewski/.cache/nim/t_d/t_E1C062DF3CB7AF52F54AB2238F57D1CC87BFD0EF'
salewski@hx90 /tmp/hhh $ nim r --mm:refc t.nim
Hint: used config file '/home/salewski/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/salewski/Nim/config/config.nims' [Conf]
.....................................................................................................................
CC: ../../home/salewski/Nim/lib/system/exceptions.nim
CC: ../../home/salewski/Nim/lib/std/private/digitsutils.nim
CC: ../../home/salewski/Nim/lib/std/assertions.nim
CC: ../../home/salewski/Nim/lib/system/dollars.nim
CC: ../../home/salewski/Nim/lib/std/typedthreads.nim
CC: ../../home/salewski/Nim/lib/std/syncio.nim
CC: ../../home/salewski/Nim/lib/system.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/cpuinfo.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/cpuload.nim
CC: ../../home/salewski/Nim/lib/pure/times.nim
CC: ../../home/salewski/Nim/lib/pure/os.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/threadpool.nim
CC: t.nim
Hint: [Link]
Hint: mm: refc; threads: on; opt: none (DEBUG BUILD, `-d:release` generates faster code)
53710 lines; 0.629s; 70.953MiB peakmem; proj: /tmp/hhh/t.nim; out: /home/salewski/.cache/nim/t_d/t_FC801BBD30666F1C90BFB44E28FFDC799824A11B [SuccessX]
Hint: /home/salewski/.cache/nim/t_d/t_FC801BBD30666F1C90BFB44E28FFDC799824A11B [Exec]
250000000
500000000
And for me, it works well with --mm:arc compiled with -d:release. When I run it, I get
real 0m1.240s user 0m0.616s sys 0m0.736s
And when I comment out s2 with
```nim
  var s1 : FlowVar[int] = spawn sum(0, asize div 2 - 1, aref)
  #var s2 : FlowVar[int] = spawn sum(asize div 2, asize-1, aref)
  echo ^s1
  #echo ^s2
```
I get nearly the same times. For me, that is an indication that the threads run in parallel when s2 is enabled.
$ time ./t
250000000
500000000
real 0m6.602s
user 0m4.583s
sys 0m2.154s
salewski@hx90 /tmp/hhh $ nim c --mm:refc -d:release t.nim
Hint: used config file '/home/salewski/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/salewski/Nim/config/config.nims' [Conf]
.....................................................................................................................
CC: ../../home/salewski/Nim/lib/pure/concurrency/threadpool.nim
CC: t.nim
Hint: [Link]
Hint: mm: refc; threads: on; opt: speed; options: -d:release
53710 lines; 0.546s; 71MiB peakmem; proj: /tmp/hhh/t.nim; out: /tmp/hhh/t [SuccessX]
salewski@hx90 /tmp/hhh $ time ./t
250000000
real 0m4.015s
user 0m2.514s
sys 0m1.503s
First run is with s1 and s2 enabled, last run with s2 commented out.
And finally, with seq instead of ref array, orc works fine, but is not very fast:
```nim
import std/threadpool

const asize = 500_000_000

proc sum(i, j: int; aref: seq[int]): int =
  for k in i .. j:
    result += aref[k]

proc fillArray(aref: var seq[int]) =
  let median = asize div 2
  for i in aref.low .. aref.high:
    if i < median: aref[i] = 1
    else: aref[i] = 2

proc main() =
  var aref: seq[int] = newSeq[int](asize)
  #new(aref)
  fillArray(aref)
  var s1 : FlowVar[int] = spawn sum(0, asize div 2 - 1, aref)
  var s2 : FlowVar[int] = spawn sum(asize div 2, asize-1, aref)
  echo ^s1
  echo ^s2

main()
```
$ nim c --mm:orc -d:release t.nim
Hint: used config file '/home/salewski/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/salewski/Nim/config/config.nims' [Conf]
...................................................................................................................
CC: ../../home/salewski/Nim/lib/system/exceptions.nim
CC: ../../home/salewski/Nim/lib/std/private/digitsutils.nim
CC: ../../home/salewski/Nim/lib/std/assertions.nim
CC: ../../home/salewski/Nim/lib/system/dollars.nim
CC: ../../home/salewski/Nim/lib/std/typedthreads.nim
CC: ../../home/salewski/Nim/lib/std/syncio.nim
CC: ../../home/salewski/Nim/lib/system.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/cpuload.nim
CC: ../../home/salewski/Nim/lib/pure/times.nim
CC: ../../home/salewski/Nim/lib/pure/os.nim
CC: ../../home/salewski/Nim/lib/pure/concurrency/threadpool.nim
CC: t.nim
Hint: [Link]
Hint: mm: orc; threads: on; opt: speed; options: -d:release
52472 lines; 1.000s; 71.148MiB peakmem; proj: /tmp/hhh/t.nim; out: /tmp/hhh/t [SuccessX]
salewski@hx90 /tmp/hhh $ time ./t
250000000
500000000
real 0m3.019s
user 0m1.629s
sys 0m1.509s
I think the conclusion for spawn from threadpool is: it is just not always the best choice. But I have to admit that I do not understand the details; maybe I should really buy and read Mr. Rumpf's book.
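For comparison, here is a sketch of the same two-way split without spawn, using plain `createThread` and a raw pointer to the data. This is a different technique from the threadpool code in this thread, not a fix for it, and I have not benchmarked it. The names `Job`, `worker` and `parallelSum` and the size `n` are made up for the illustration. The main thread must keep the seq alive and unmodified until the threads are joined, and the seq must not be empty.

```nim
# Needs --threads:on with Nim 1.x; threads are on by default in Nim 2.
const n = 1_000_000  # hypothetical size for the sketch

type
  Job = object
    data: ptr UncheckedArray[int]  # borrowed view of the seq, not owned
    lo, hi: int                    # inclusive index range to sum
    total: int                     # written by the worker, read after join

proc worker(job: ptr Job) {.thread.} =
  var acc = 0
  for k in job.lo .. job.hi:
    acc += job.data[k]
  job.total = acc

proc parallelSum(a: var seq[int]): int =
  # Assumption: a.len > 0, otherwise `addr a[0]` is invalid.
  var jobs: array[2, Job]
  var threads: array[2, Thread[ptr Job]]
  let base = cast[ptr UncheckedArray[int]](addr a[0])
  let mid = a.len div 2
  jobs[0] = Job(data: base, lo: 0, hi: mid - 1)
  jobs[1] = Job(data: base, lo: mid, hi: a.high)
  for t in 0 .. 1:
    createThread(threads[t], worker, addr jobs[t])
  joinThreads(threads)  # wait for both workers before reading the totals
  result = jobs[0].total + jobs[1].total

proc main() =
  var a = newSeq[int](n)
  for i in 0 ..< n:
    a[i] = if i < n div 2: 1 else: 2
  echo parallelSum(a)

main()
```

Because the workers only see a `ptr`, no reference count is touched from another thread. That may or may not be related to the orc crash above; I have not verified the cause.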
@Stefan_Salewski Thanks for your thorough analysis. With the options -d:release --mm:arc I get the following timings:
Main thread only:
real 0m1.463s
user 0m0.611s
sys 0m0.851s
With one thread for proc sum:
real 0m1.470s
user 0m0.573s
sys 0m0.897s
using 2 threads for proc sum:
real 0m1.334s
user 0m0.567s
sys 0m0.875s
If I use -d:release and --mm:orc with the 2 threads for proc sum, I get:
250000000
500000000
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Segmentation fault (core dumped)
It printed the correct result, which is the end of the program, and then got the SIGSEGV.