I'm writing this here as well as on the issue tracker (https://github.com/status-im/nim-chronos/issues/542) in the hopes that maybe somebody with more experience can see the problem.
So I'm playing around with a multithreading lib that has a while-loop going on another thread that has an async-event-loop and can receive and process messages. The goal is to have the simplest possible way of pushing work from thread A to thread B with thread B acting in a server-like manner and enabling using async like you would in e.g. JS where you don't have to care yourself when the async work gets done. It just happens "eventually". To do that I use ThreadSignalPtr to put the thread to sleep when there's nothing to do / do async work. However, in the constellation I'm using for that I'm seeing memory leaks when compiling with address sanitizer. I managed to condense that down into a simple program:
import std/[os]
import chronos
import chronos/threadsync
proc cleanupThread*() {.gcsafe, raises: [].}=
{.cast(gcsafe).}:
try:
when defined(gcOrc):
GC_fullCollect() # from orc.nim. Has no destructor.
except Exception as e:
echo "Exception during cleanup: ", e[]
proc processAsync() {.async.} =
await sleepAsync(1)
echo "Stuff"
proc threadProc(threadSignal: ThreadSignalPtr) {.thread.} =
asyncSpawn processAsync()
waitFor threadSignal.wait()
cleanupThread()
var thread: Thread[ThreadSignalPtr]
let signal = new(ThreadSignalPtr)[]
createThread(thread, threadProc, signal)
sleep(10)
discard signal.fireSync()
joinThread(thread)
discard signal.close()
Compiled with ` nim r -f --debugger:native --cc:clang -d:useMalloc --passc:"-fsanitize=address -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" --passl:"-fsanitize=address -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" -d:release src/playground.nim`
(For reference, the passc/passl flags are just to use address sanitizer and provide better stacktraces, this can also be used with gcc, I just prefer clang).
And it shows a ton of leaks, all relating to getThreadDispatcher in one way or another:
... way more stacktraces ...
Indirect leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x6493f5f086f1 in calloc (/home/isofruit/.cache/nim/playground_r/playground_CAED38D1B3F5F9549C0D683A5A5C9A6761A36907+0x1266f1) (BuildId: 45b58de06b1e762d7de2cf93447bf65a9b6087d7)
#1 0x6493f5f599eb in newSeqPayload /home/isofruit/.choosenim/toolchains/nim-2.0.4/lib/system/seqs_v2.nim:44:773
#2 0x6493f5f61dff in add::add(var<seq<ref<asyncengine::TimerCallbackcolonObjectType_>>>, sink<ref<asyncengine::TimerCallbackcolonObjectType_>>) /home/isofruit/.choosenim/toolchains/nim-2.0.4/lib/system/seqs_v2.nim:116:26
#3 0x6493f5f6ae9e in push::push(var<HeapQueue<ref<asyncengine::TimerCallbackcolonObjectType_>>>, sink<ref<asyncengine::TimerCallbackcolonObjectType_>>) /home/isofruit/.choosenim/toolchains/nim-2.0.4/lib/pure/collections/heapqueue.nim:137:2
#4 0x6493f5f85f59 in asyncengine::setTimer(timer::Moment, proc<pointer>, pointer) /home/isofruit/.nimble/pkgs2/chronos-4.0.2-5ed2d147ae4760a4ec6f01daf794df761b5229af/chronos/internal/asyncengine.nim:1110:2
#5 0x6493f5f9154f in asyncfutures::sleepAsync(timer::Duration) /home/isofruit/.nimble/pkgs2/chronos-4.0.2-5ed2d147ae4760a4ec6f01daf794df761b5229af/chronos/internal/asyncfutures.nim:1397:163
#6 0x6493f5f97a41 in asyncfutures::sleepAsync(int) /home/isofruit/.nimble/pkgs2/chronos-4.0.2-5ed2d147ae4760a4ec6f01daf794df761b5229af/chronos/internal/asyncfutures.nim:1403:107
#7 0x6493f5f97a41 in processAsync::processAsync(ref<futures::FutureBasecolonObjectType_>) /home/isofruit/dev/playground/src/playground.nim:14:27
#8 0x6493f5f91d47 in asyncfutures::futureContinue(ref<futures::FutureBasecolonObjectType_>) /home/isofruit/.nimble/pkgs2/chronos-4.0.2-5ed2d147ae4760a4ec6f01daf794df761b5229af/chronos/internal/asyncfutures.nim:382:95
#9 0x6493f5f98cde in playground::processAsync /home/isofruit/dev/playground/src/playground.nim:14:2
#10 0x6493f5f99465 in playground::threadProc(ptr<threadsync::ThreadSignal>) /home/isofruit/dev/playground/src/playground.nim:18:15
#11 0x6493f5f5e579 in threadProcWrapDispatch::threadProcWrapDispatch(ptr<Thread<ptr<threadsync::ThreadSignal>>>) /home/isofruit/.choosenim/toolchains/nim-2.0.4/lib/system/threadimpl.nim:69:2
#12 0x6493f5f5e579 in threadProcWrapStackFrame::threadProcWrapStackFrame(ptr<Thread<ptr<threadsync::ThreadSignal>>>) /home/isofruit/.choosenim/toolchains/nim-2.0.4/lib/system/threadimpl.nim:95:2
#13 0x6493f5f54de8 in threadProcWrapper::threadProcWrapper(pointer) /home/isofruit/.choosenim/toolchains/nim-2.0.4/lib/system/threadimpl.nim:101:2
#14 0x6493f5e3fa36 in asan_thread_start(void*) asan_interceptors.cpp.o
SUMMARY: AddressSanitizer: 19288 byte(s) leaked in 11 allocation(s).
Does anyone see where the leak is in this scenario and how to tackle it?
Not directly related to the leak, but the way you're using cleanupThread doesn't guarantee that the list with potential cycle roots is empty on thread termination. Consider the following:
type Cycle = ref object
x: Cycle
proc run() {.thread.} =
var x = Cycle()
x.x = x # create a cycle
discard x # make sure `x` is not moved
cleanupThread()
In the case shown above, the cycle collector is run before x goes out of scope. When x goes out of scope, the referenced cell will be registered as a potential cycle root (which allocates memory for ORC's root list), and you'll thus get a memory leak.
To make sure that cleanupThread is called after the thread's procedure exits but before thread destruction, you can use system.onThreadDestruction, like so:
proc run() {.thread.} =
onThreadDestruction(cleanupThread)
# `cleanupThread` is invoked even if `run` exits due to an exception or early return
...
Of course, it'd be better if Thread cleans up after itself instead.
First up, thanks a ton for making me aware of "onThreadDestruction", I was very much not aware and am a huge fan of more explicit hooks like that. I'll include that going forward for sure.
Next up, regarding the ThreadDispatcher: > Looking at the log you provided in the GitHub issue, the internal per-thread dispatcher (which is a ref object) is what causes the leak, as .threadvar's aren't destroyed on thread exit.
The problem here is that I seemingly can't clean this up properly. Trying to nil it will... okay I just tried that attempt out again for like the 5th time or so, and found the solution.
Where previously it lead to problems over and over again, I believe a combination of your hint (that cleanupThread must be in a different scope than the rest of the code) and some fiddling with the proc itself provided the solution.
Literally doing this change:
proc cleanupThread*() {.gcsafe, raises: [].}=
{.cast(gcsafe).}:
try:
`=destroy`(getThreadDispatcher())
when defined(gcOrc):
GC_fullCollect() # from orc.nim. Has no destructor.
except Exception as e:
echo "Exception during cleanup: ", e.repr
Fixed the leak problems I had.