I didn't find any statements that ARC and Weave don't work together, but they seem not to work together even with a trivial example. I found some older forum discussion by mratsim on the topic: https://forum.nim-lang.org/t/6352#39259
Toggling loadBalance(Weave) in and out changes when it crashes with ARC, which makes me think this has something to do with how the main thread is handled, but this is just a wild guess. For all I know I'm missing some important Weave command that makes this work.
(sorry for the awkward example - it's a reduction from the affected piece of my actual project)
import weave

type
  Node = ref object
    id: int
    parent: Node

var globalID {.threadvar.}: int

proc inc(head: var Node) =
  let old = head
  new head
  inc globalID
  head.id = old.id + 1
  head.parent = old

proc multiMain =
  init(Weave)
  parallelFor i in 1..8:
    # make and grow a linked list
    # one link at a time
    var list: Node
    new list
    list.parent = nil
    var timesPrinted = 1
    for time in 0..20_000_000:
      # grow ref chain (linked list)
      inc list
      if time mod 4_000_000 == 0:
        stdout.write $timesPrinted
        stdout.flushFile
        inc timesPrinted
      loadBalance(Weave)
  exit(Weave)

multiMain()
quit(0)
That would be fantastic for porting!
In case you're curious about a user experience with ARC, I'm confused about how a custom destructor should actually clear each ref object of my list. I saw some posts that showed using dispose(node), but it seems the modern dispose is meant for inter-thread freeing only. So currently I just ensure that there are no references to the current node, then leave it dangling in memory and march on with the rest of the list, hoping it gets cleared. How should I "delete" the ref object here, or is it happening automatically?
I tried looking at memory usage to infer whether memory was indeed being cleared up, but the Linux per-process memory usage and Nim's own occupied-memory figure report very different things: the OS shows an ever-increasing blob of memory (roughly 100 MB/s) while Nim's total stays under 1 MB.
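A tiny sketch of how Nim's side of that comparison can be read, using the standard getOccupiedMem/getTotalMem counters (they only cover memory managed by Nim's allocator, which is part of why the two numbers can diverge so much):

# Illustrative only: Nim's view of its own heap. Pages the allocator has
# mapped but not yet returned to the OS, thread stacks, etc. still show up
# in the process figure the OS reports.
echo "occupied by Nim: ", getOccupiedMem() div 1024, " KiB"
echo "total reserved:  ", getTotalMem() div 1024, " KiB"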
Warning, untested:
proc `=destroy`(x: Node) {.nodestroy.} =
  # Walk the chain iteratively and break each link before destroying it,
  # so freeing a long list cannot recurse and blow the stack.
  var it = x
  while it != nil:
    let nxt = it.parent
    it.parent = nil
    `=destroy`(it[])
    it = nxt
Your code is problematic.
You use threadvar: with Weave your loop iterations can run on any worker thread, so a {.threadvar.} like globalID is not a reliable per-task (or even per-iteration) counter.
Regarding slowness, Nim's allocator uses a simple lock strategy with --threads:on. This doesn't scale when you do many repeated small allocations; performance is killed by lock contention. Use an object pool.
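A minimal sketch of what such a pool could look like (NodePool, getNode and recycle are hypothetical names, reusing the Node type from the example above; a real arena would pre-allocate nodes in blocks):

type
  NodePool = object
    free: seq[Node]            # recycled nodes, owned by a single task

proc getNode(pool: var NodePool): Node =
  # Reuse a recycled node when possible, so the shared allocator
  # (and its lock) is only touched when the pool is empty.
  if pool.free.len > 0:
    result = pool.free.pop()
    result.id = 0
    result.parent = nil
  else:
    new result

proc recycle(pool: var NodePool, n: Node) =
  n.parent = nil               # drop the chain reference before reuse
  pool.free.add n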
Thank you! I did not know about the locking as a bottleneck. I've made a pooling/arena mechanism and it helps as you said.
I'm very confused about how to avoid threadvar, if I understand you correctly. It sounds like global ID won't be unique, and threadvar is similar to using global ID. So with Weave, how is thread-local memory achieved? For instance, each thread having a global counter for something. Avoiding threadvar, I would have attempted a global list on the heap and use Global ID to index it, but you're saying that won't work either.
The issue I'm working on is still speed under ARC, which remains slower for me at slightly larger thread counts.
I'm very confused about how to avoid threadvar, if I understand you correctly. It sounds like global ID won't be unique, and threadvar is similar to using global ID. So with Weave, how is thread-local memory achieved? For instance, each thread having a global counter for something. Avoiding threadvar, I would have attempted a global list on the heap and use Global ID to index it, but you're saying that won't work either.
It's not just Weave: multithreading runtimes in general require the liberty to schedule your tasks on any available thread as they see fit when you use data parallelism (parallel for). In that case, what you usually need is not thread-local memory but "task-local" memory.
In Weave this is done with the parallelForStaged construct, which allows you to set up a local context before splitting of the task occurs.
Your parallel loop goes in a loop: section, and your local context is created and destroyed in the prologue: and epilogue: sections.
https://github.com/mratsim/weave#parallel-for-staged
import weave

proc sumReduce(n: int): int =
  let res = result.addr # For mutation we need to capture the address.

  parallelForStaged i in 0 .. n:
    captures: {res}
    awaitable: iLoop
    prologue:
      var localSum = 0
    loop:
      localSum += i
    epilogue:
      echo "Thread ", getThreadID(Weave), ": localsum = ", localSum
      res[].atomicInc(localSum)

  let wasLastThread = sync(iLoop)

init(Weave)
let sum1M = sumReduce(1000000)
echo "Sum reduce(0..1000000): ", sum1M
doAssert sum1M == 500_000_500_000
exit(Weave)
Alternatively, you allocate a heap array with your "thread-local" contexts and spawn a task per context.
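A minimal sketch of that alternative, assuming Weave's spawn/syncRoot API and a hypothetical Ctx type (illustrative only, not from the project above):

import weave

type
  Ctx = object                 # hypothetical per-task context
    id: int
    counter: int

proc work(ctx: ptr Ctx) =
  # Each spawned task owns exactly one context, so no threadvar is needed.
  for i in 0 ..< 1_000:
    inc ctx.counter

init(Weave)
var contexts = newSeq[Ctx](8)  # one context per task, allocated on the heap
for i in 0 ..< contexts.len:
  contexts[i].id = i
  spawn work(addr contexts[i])
syncRoot(Weave)                # global barrier: wait for all spawned tasks
echo "counter of context 0: ", contexts[0].counter
exit(Weave)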