Hi all!
So - I always wondered whether I really understood memory management in Nim. I had some surprising moments in the past and was never sure how to access variables when using threads, and I probably wrote a lot of bad code or used unnecessary structures. The documentation said that each Nim thread has its own heap, but I usually had no idea how to handle each heap or how to create workarounds. The new memory model "arc/orc" didn't simplify this either: it is just documented to work differently and faster, with a shared heap, but trying to use arc/orc sometimes resulted in strange crashes that I didn't have before... So I created some sample code - just to try things out - and I wanted to share it with you.
Just some early observations - correct me if I'm wrong:
And here is the sample code
import std/os
import std/strformat

type
  TestObj* = object
    name*: string

proc getAddr(p: TestObj | pointer): string =
  let intVal = cast[uint64](p.unsafeaddr)
  return $intVal # 0x & $(intVal.toHex()) causes early crash

proc getAddr(p: ref TestObj | ptr TestObj): string =
  let intVal = cast[uint64](p[].unsafeaddr)
  return $intVal

proc `=destroy`*(x: var TestObj) =
  echo fmt"destroying '{x.name}' {getAddr(x.unsafeaddr)} ({getAddr(x)})"

proc newObj(name: string): ref TestObj =
  new result
  result.name = name
  echo fmt"creating new obj '{result.name}' {getAddr(result)} ({getAddr(result[])})"

when compileOption("threads"):
  proc threadFunc[T](someObj: T) =
    echo fmt"starting thread with '{someObj.name}' "
    sleep(1_000)
    echo fmt"Accessing '{someObj.name}' {getAddr(someObj)} from thread"

proc main() =
  echo "x will not be deleted and can be accessed safely as pointer"
  echo "y is a ref and could be traced"
  # echo "z is referenced as ptr and access is therefore dangerous"
  echo ""
  let x = newObj("x")
  GC_ref(x) # avoid deletion of x
  let y = newObj("y")
  # let z = newObj("z")
  when compileOption("threads"):
    var thread1: Thread[ptr TestObj]
    var thread2: Thread[ref TestObj]
    var thread3: Thread[ptr TestObj]
    createThread(thread1, threadFunc[ptr TestObj], x[].addr)
    createThread(thread2, threadFunc[ref TestObj], y)
    # createThread(thread3, threadFunc[ptr TestObj], z[].addr)
  sleep(200)
  echo "end of main"
  # GC_unref(x)

when isMainModule:
  main()
  echo "all scope objects destroyed"
  GC_fullCollect()
  sleep(3_000) # wait for threads to finish - not using thread join
Output of nim c -r -d:release --threads:on --gc:orc .\gcref.nim (causes an access violation on Linux):
x will not be deleted and can be accessed safely as pointer
y is a ref and could be traced
creating new obj 'x' 10485840 (6552720)
creating new obj 'y' 10485872 (6552720)
starting thread with 'x'
starting thread with 'y'
end of main
destroying 'y' 10485872 (6552736)
all scope objects destroyed
Accessing 'x' 10485840 from thread
Accessing '' 10485872 from thread
Output of nim c -r -d:release --threads:on .\gcref.nim (no access violation, as the referenced object is different):
x will not be deleted and can be accessed safely as pointer
y is a ref and could be traced
creating new obj 'x' 10481744 (6552464)
creating new obj 'y' 10481776 (6552464)
starting thread with 'x'
starting thread with 'y'
end of main
all scope objects destroyed
destroying 'y' 10481776 (6552656)
Accessing 'y' 17367120 from thread
Accessing 'x' 10481744 from thread
The rule is the same as in other multithreaded languages without a multithreading-aware GC:
Ensure the lifetime of whatever you access.
From this you can derive a couple of rules:
Now, the old and the new memory management (deferred refcounting vs ARC, automatic refcounting) do not change those access principles, but ARC significantly improves sharing.
In many cases it is significantly more efficient, maintainable and debuggable to have a unique owner of an object or piece of data and to dispatch all transformation steps of this object to several procs or services. Those may or may not be on separate threads. This is called a producer/consumer architecture (or actor model in some cases, or microservices if done at whole-machine scale for some reason). The main advantages are:
However, there is no longer a notion of an ancestor thread that can wait for its child thread to stop processing.
What's the issue? You need to pass the data from one thread to the other. Actually you don't have to copy it; you can be much faster by passing "ownership", i.e. the pointer (handle) to the data. Still, if there is GC-ed memory behind it, it needs to be collected once you are done, and no ancestor thread can be relied on. Hence, in the old memory management model:
This is not the case anymore with ARC (or Boehm, or if the data doesn't use GC-ed memory).
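For illustration, a minimal sketch of such an ownership hand-off (hypothetical names, compiled with --threads:on): the payload is allocated on the shared heap, its pointer is sent through a channel, and the receiving thread frees it once it is done, so no ancestor thread has to collect anything.

type Msg = object
  payload: int

var chan: Channel[ptr Msg]    # a plain pointer: no GC-ed memory crosses threads

proc consumer() {.thread.} =
  let m = chan.recv()         # ownership is transferred together with the pointer
  echo "consumer got ", m.payload
  freeShared(m)               # the consumer, not the producer, frees it

proc main() =
  var th: Thread[void]
  chan.open()
  createThread(th, consumer)
  let m = createShared(Msg)   # allocated on the shared heap
  m.payload = 42
  chan.send(m)                # the producer gives up ownership here
  joinThread(th)
  chan.close()

main()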
there is a guarantee that the multithreaded section ends before the ref object is collected by the ancestor thread.
I assumed that too, but this doesn't seem to be correct when using arc/orc. The thread isn't preventing the ref from being collected. My application crashes when I use arc/orc. Currently, it seems that I need to manually prevent the ref from being collected.
Ideally, createThread would also pass the ownership somehow to the new thread, so the ref doesn't get deleted if the thread survives.
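A minimal sketch of the manual workaround I mean (hypothetical names, compiled with --threads:on --gc:orc; this assumes GC_ref/GC_unref simply bump the reference count under arc/orc):

type Conf = ref object
  name: string

proc worker(c: Conf) {.thread.} =
  echo "worker sees ", c.name
  GC_unref(c)                  # give back the extra count taken before createThread

proc main() =
  var th: Thread[Conf]
  let c = Conf(name: "shared cfg")
  GC_ref(c)                    # extra count, so the thread's copy stays valid
  createThread(th, worker, c)
  joinThread(th)               # ARC refcounts are not atomic, so this is only
                               # safe because main joins before dropping its
                               # own reference

main()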
Use channels, not pointers
The thread isn't preventing the ref from being collected. My application crashes when I use arc/orc.
Use channels + ref object or atomic refcounting, not ARC.
Use channels, not pointers
This isn't about feeding incoming data to threads. I agree that new data that is relevant for multithreading should be passed via channels. But this is, for example, about a general cfg that is read from a file before any thread starts and which doesn't change - or actually the channel itself as a ptr. Even the doc recommends passing Channels as ptr to share them between threads: "Channels cannot be passed between threads. Use globals or pass them by ptr."
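For reference, a minimal sketch of what that doc note describes (hypothetical names, compiled with --threads:on): the Channel lives in a global, and the worker only gets its address:

var chan: Channel[string]

proc worker(c: ptr Channel[string]) {.thread.} =
  echo "worker received: ", c[].recv()     # access only through the ptr param

proc main() =
  var th: Thread[ptr Channel[string]]
  chan.open()
  createThread(th, worker, chan.addr)      # share the channel by address
  chan.send("cfg loaded before any thread started")
  joinThread(th)
  chan.close()

main()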
Use channels + ref object or atomic refcounting, not ARC
So - you are telling me not to use arc when using threads? I thought that arc could become the next default GC.
Would it make sense to create some ref-counted datatype that can be shared more easily between threads, similar to shared_ptr in C++? Not sure if this is a good idea, as even shared_ptrs in C++ have their traps that require using atomic_shared_ptr in some situations. Just thinking out loud.
But this is, for example, about a general cfg that is read from a file before any thread starts and which doesn't change
In that case it is indeed fine: the address is that of a global, which is guaranteed to survive until the whole program exits.
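Concretely, something along these lines (hypothetical names; the config holds no GC-ed memory and is filled before any thread is spawned):

type Config = object
  logLevel: int
  maxConns: int

var cfg: Config                             # a global outlives every thread

proc worker(c: ptr Config) {.thread.} =
  echo "worker sees maxConns = ", c.maxConns

proc main() =
  cfg = Config(logLevel: 2, maxConns: 64)   # set up before spawning anything
  var th: Thread[ptr Config]
  createThread(th, worker, cfg.addr)
  joinThread(th)

main()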
So - you are telling me not to use arc when using threads? I thought that arc could become the next default GC.
No. ARC (ref object) solves sharing memory so that it can be collected by any thread. When you have shared ownership, you need an atomic reference counter, or guarantees by design that the object won't be collected (the ancestor threads are the owners and are responsible for creation and deletion), or fancier memory management techniques like hazard pointers, epoch-based reclamation or quiescent-state-based reclamation.
ARC (ref object) is still correct for any object that does not have joint ownership between 2 or more threads.
Would it make sense to create some ref-counted datatype that can be shared more easily between threads, similar to shared_ptr in C++? Not sure if this is a good idea, as even shared_ptrs in C++ have their traps that require using atomic_shared_ptr in some situations. Just thinking out loud.
C++ shared pointers have the same issue as ARC: they are not threadsafe and are only usable when, at any point in time, only a single thread can access and mutate those objects, which is guaranteed if you use a Communicating Sequential Processes architecture (aka producer/consumer or channel-based architecture).
A threadsafe shared smartpointer is mentioned in my very first reply.
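To make that concrete, here is a rough, untested sketch of what an atomically refcounted shared pointer could look like (hypothetical SharedRc type, not the implementation referenced above): the count lives next to the value on the shared heap and is updated via std/atomics, so any thread may safely drop the last reference.

import std/atomics

type
  SharedCell[T] = object
    rc: Atomic[int]
    value: T

  SharedRc*[T] = object
    cell: ptr SharedCell[T]

proc newSharedRc*[T](value: sink T): SharedRc[T] =
  ## Allocate the cell on the shared heap; the creator holds count 1.
  result.cell = createShared(SharedCell[T])
  result.cell.rc.store(1)
  result.cell.value = value

proc `=destroy`*[T](s: var SharedRc[T]) =
  if s.cell != nil:
    # Whoever drops the count from 1 to 0 frees the cell, on whatever thread.
    if s.cell.rc.fetchSub(1, moAcquireRelease) == 1:
      `=destroy`(s.cell.value)
      freeShared(s.cell)

proc `=copy`*[T](dst: var SharedRc[T], src: SharedRc[T]) =
  if dst.cell == src.cell: return
  if src.cell != nil:
    discard src.cell.rc.fetchAdd(1, moRelaxed)
  `=destroy`(dst)
  dst.cell = src.cell

proc get*[T](s: SharedRc[T]): T =
  s.cell.value

when isMainModule:
  let a = newSharedRc("hello")
  let b = a          # the count is bumped atomically
  echo b.get()       # the last owner frees the cell

Note that this only makes the ownership bookkeeping threadsafe; mutating the wrapped value from several threads still needs its own synchronization, exactly as with C++ shared_ptr.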