nimforum mirror - Threads dying, memory issues?

jasonfi [2022-08-03T20:29:30+02:00]

I've got a multi-threaded program, with threads communicating via channels. Unfortunately, over time, the thread count decreases. I know the threads die because sometimes another thread raises an exception with a "thread died" message when it tries to send a message over a channel. I also monitor the process's thread count and can see it gradually decreasing. The threads that do die seem to have random problems under ARC/ORC, but don't show any errors at all under refc.

When I run the program with Valgrind there are no problems reported, and threads don't seem to die, but it's very slow and doesn't use all cores. I read that Valgrind is a VM where there are fewer potential problems such as alignment issues.

Has anyone seen this before?

Araq [2022-08-04T06:38:10+02:00]

No, but remember to compile with --mm:orc -d:useMalloc in order for Valgrind to be effective.
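As a rough sketch of how those flags fit into a build-and-check cycle: the file name app.nim is hypothetical, and --threads:on and --debugger:native are assumptions added here for a threaded build with readable stack traces, not flags mentioned above. The commands only run if nim, valgrind, and the source file are present.

```shell
# Hypothetical entry point; replace with your own source file.
src="app.nim"
bin="./${src%.nim}"

# --mm:orc and -d:useMalloc come from the advice above. --threads:on and
# --debugger:native are assumptions for a threaded build with line info.
nim_flags="--mm:orc -d:useMalloc --threads:on --debugger:native"

if command -v nim >/dev/null 2>&1 && command -v valgrind >/dev/null 2>&1 && [ -f "$src" ]; then
  # Word splitting of $nim_flags is intentional here.
  nim c $nim_flags "$src" && valgrind --leak-check=full "$bin"
else
  echo "nim, valgrind, or $src not available; nothing was run"
fi
```

With -d:useMalloc the Nim runtime allocates through the C allocator, which is what lets Valgrind track individual allocations.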

jasonfi [2022-08-04T08:14:07+02:00]

Thanks, I've made sure I'm using --mm:orc -d:useMalloc but the issue persists without any information on why threads are dying, even with Valgrind.

On Windows I don't see the same issue. It seems this is Linux-specific.

sekao [2022-08-04T21:53:57+02:00]

You're using the new channels, right? I run a server on Linux using ORC and threads + channels and haven't seen any problems like that. How do you know they are memory-related problems? Are you getting any stack trace?

jasonfi [2022-08-05T07:53:50+02:00]

No, I'm not using the new channels. I've installed the threading Nimble package and will try it out, though. I had heard of it before, but wasn't sure it was ready for serious use; otherwise, why are the old channels still around? Are the docs outdated, then? When I search for Nim channels, the first page shows the old channels rather than the new ones.

jasonfi [2022-08-05T07:58:24+02:00]

There was no stack trace, although when I run Valgrind's drd there are some issues showing up. I'll retest after I finish making some other changes, and I'll try new channels too. Thanks.

sekao [2022-08-05T10:52:51+02:00]

Yeah there's no documentation right now; I'm guessing the idea is to wait until ORC is the default in 2.0, but @Araq would know better. Anyway the API didn't change much. The one huge benefit is that they can be shared between threads; the old channels had to be declared as globals or manually allocated/deallocated due to the limitations of refc.

jasonfi [2022-08-05T11:09:56+02:00]

I'm using channels 1:1 right now. Channels shared between threads sound good, though, unless there's a performance hit (I'll have to benchmark).

If new channels don't fix the issue I'll work on a minimum program to reproduce the issue.

jasonfi [2022-08-06T12:35:40+02:00]

I've updated my code to make sure globals aren't used. I now get exceptions. I've actually seen similar exceptions before, though not while I was preparing to write the initial post in this thread.

I didn't have stack traces enabled yet for this one:


double free or corruption (fasttop)
Traceback (most recent call last)
/.choosenim/toolchains/nim-1.6.6/lib/system/seqs_v2.nim(114) myThread
/.choosenim/toolchains/nim-1.6.6/lib/system/arc.nim(164) nimRawDispose
SIGABRT: Abnormal termination.

This was one where I was iterating through an array of an object type:


/.choosenim/toolchains/nim-1.6.6/lib/system/orc.nim(494) nimDecRefIsLastCyclicStatic
/.choosenim/toolchains/nim-1.6.6/lib/system/orc.nim(466) rememberCycle
/.choosenim/toolchains/nim-1.6.6/lib/system/orc.nim(146) unregisterCycle
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Segmentation fault

jasonfi [2022-08-06T18:25:49+02:00]

I've just fixed a memory issue caused by not initializing a table in an object with initTable. Isn't there a way for the Nim compiler to check for such cases?

sekao [2022-08-06T18:53:32+02:00]

That must be a bug... the docs say that tables are initialized by default.

jasonfi [2022-08-07T12:34:29+02:00]

I think I've fixed the problem. The first and biggest problem was that Valgrind reported no errors. I had some code using a DB connection stored in a global, but it wasn't used by any of the threads I created, so I didn't think it was a big deal. I fixed that and some other issues.

Then Valgrind started showing errors and I've fixed those. Now my program seems stable again, but I continue to test.

I'm actually not yet using the new channels, I'd rather only do that if I have to. But thanks for the advice.

jasonfi [2022-08-08T15:09:19+02:00]

I think I've solved the original problem now. My user on the Linux instance had a very limited number of open file descriptors available as shown by ulimit -n. Increasing this limit to a higher number seems to fix it.

This also explains why the problem wasn't seen running the same program under Windows or on Linux with Valgrind, since both are different environments which presumably don't have the limitation my Linux user had.
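A quick way to check whether a process is close to that limit, as a minimal sketch assuming a Linux system with /proc mounted. The variable names are illustrative.

```shell
# Soft limit: the number of open file descriptors a process may actually hold.
soft_limit=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit can be raised without privileges.
hard_limit=$(ulimit -Hn)

# Count the descriptors currently open in this shell (Linux-specific, needs /proc).
if [ -d "/proc/$$/fd" ]; then
  open_fds=$(ls "/proc/$$/fd" | wc -l)
else
  open_fds="unknown"
fi

echo "soft=$soft_limit hard=$hard_limit open=$open_fds"
```

To inspect a running program rather than the shell, replace $$ with that program's process ID. Running ulimit -Sn with a value raises the soft limit for the current shell and its children, up to the hard limit.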

planetis [2022-08-09T10:25:34+02:00]

Btw, I took a peek at the threading/channels implementation and it seems to have changed compared to the original Nim and C code. For example, ChannelCache is not used anywhere. Also, ThreadSanitizer reports a data race when running the top example. So did the changes result in unsafe code?

jasonfi [2022-08-09T10:53:51+02:00]

I didn't use the new threading/channels. I'd prefer to stick with the stdlib that comes with Nim unless I really have to switch.

jasonfi [2022-08-12T11:02:49+02:00]

It looks like the issue never went away. However, using valgrind --tool=helgrind I found a data race related to Chronicles logging. Initially I thought Chronicles might be able to handle logging from multiple threads internally, but to be safe I modified my code to log each thread's output to a separate log file. However, the data race still occurred.
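A minimal sketch of running both of Valgrind's thread-error tools (Helgrind here, DRD as mentioned earlier in the thread) against the same binary and keeping the reports for comparison. The binary name ./app and the log file names are hypothetical.

```shell
# Hypothetical binary, assumed to be built with --mm:orc -d:useMalloc.
bin="./app"

if command -v valgrind >/dev/null 2>&1 && [ -x "$bin" ]; then
  # Helgrind and DRD are separate tools, so their reports can differ.
  for tool in helgrind drd; do
    valgrind --tool="$tool" --log-file="$tool.log" "$bin"
  done
else
  echo "valgrind or $bin not available; nothing was run"
fi
```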

If anyone's interested, I've attached a minimal reproducible test case in the issue I logged.

https://github.com/status-im/nim-chronicles/issues/118

Mirror of forum.nim-lang.org

9346 :: Threads dying, memory issues?