I’ve finished adding basic networking to my “message queue” API, using the “asyncdispatch” facilities.
Now, the test module fails without a hint as to where or why. So I started adding traces in the code.
I seem to have narrowed the crash down to a proc called "fillAndSendMsg()". The strange part is this: if I only have an "echo()" as the first line of the proc, like this:
proc fillAndSendMsg[T](q: QueueID, m: ptr Msg[T], buffered: bool): void =
  ## Sends a message. m cannot be nil.
  echo("fillAndSendMsg(…)")
  # More code
I see this output:
Process 0 initialised.
Thread 0 initialised.
Topic 33 in Thread 0 initialised.
fillAndSendMsg(0.1.99, not-nil, false)
Error: execution of an external program failed: '../bin/test_kueues '
The terminal process terminated with exit code: 1
Terminal will be reused by tasks, press any key to close it.
OTOH, if I add a second trace to it, I don’t see the first trace anymore!
proc fillAndSendMsg[T](q: QueueID, m: ptr Msg[T], buffered: bool): void =
  ## Sends a message. m cannot be nil.
  echo("fillAndSendMsg(…)")
  # More code
  echo("fillAndSendMsg(…): XXX")
  # More code
Which produces this output:
Process 0 initialised.
Error: execution of an external program failed: '../bin/test_kueues '
The terminal process terminated with exit code: 1
Terminal will be reused by tasks, press any key to close it.
So, the following lines of output are now missing:
Thread 0 initialised.
Topic 33 in Thread 0 initialised.
fillAndSendMsg(0.1.99, not-nil, false)
In other words, the program crashes before ever entering fillAndSendMsg(), but only if I add a line of code to fillAndSendMsg() (a line which never even gets executed!). I tried commenting out and later re-adding the additional "echo()"; the behavior is reproducible. With the second "echo()" commented out, the program behaves as it does when there is only one "echo()".
How do you deal with that?
I've run into things like this in the past, and the cause is usually stack corruption.
Unfortunately I don't have any magic ways to debug this. I can see that you're using raw pointers so it's likely that you're doing something unsafe incorrectly, so I would start by carefully looking at all usages of Nim's unsafe features such as 'ptr' and 'addr'.
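To make that concrete, here is a minimal sketch of one classic way 'addr' goes wrong. It is not taken from your code: the 'Msg' type below is a hypothetical stand-in with a single field, and 'danglingMsg' and 'heapMsg' are names made up for illustration. The first proc returns a pointer into its own stack frame, which is the kind of bug that tends to show up as "moving" crashes when unrelated code changes. The second allocates the message manually instead.

```nim
type
  Msg[T] = object        # hypothetical stand-in, not the real Msg[T]
    content: T

proc danglingMsg(): ptr Msg[int] =
  ## BUG (for illustration only, never called below): `local` lives in
  ## this proc's stack frame, so the returned pointer is dangling as
  ## soon as the proc returns.
  var local = Msg[int](content: 42)
  result = addr local

proc heapMsg(): ptr Msg[int] =
  ## Allocates the message with `create`, so it outlives the call.
  ## The caller is responsible for releasing it with `dealloc`.
  result = create(Msg[int])
  result.content = 42

let m = heapMsg()
echo m.content
dealloc(m)
```

If any 'addr' in your code is taken of a local variable and the resulting 'ptr' is stored somewhere or handed to another thread, that would be the first place I'd look.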
Others may be able to give you more advice. I'm sure there are tools out there that can help more (and I would actually love to see a guide about them); maybe even gdb can help.
Thanks for the reply. I've made some progress. Compiling and running the same code multiple times doesn't always result in the same output. I now suspect it might be a "timing" thing; depending on how fast the program crashes, stdout might or might not be flushed to the terminal. Looks like I'll have to find an alternative way to follow the code, or learn to debug Nim...
EDIT: OK, just checked the echo() doc, and it flushes stdout. I guess that makes memory/stack corruption the more likely suspect.
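For an alternative way to follow the code, one option is a small trace helper that writes to stderr, flushes explicitly, and tags each message with its call site. This is only a sketch: 'traceLine' and 'trace' are names invented here, not part of the standard library. Since echo() already flushes, it mostly serves to rule out buffering and to show exactly which line was reached last.

```nim
proc traceLine(file: string, line: int, msg: string): string =
  ## Formats one trace message as "file(line): msg".
  file & "(" & $line & "): " & msg

template trace(msg: string) =
  ## Writes msg to stderr, prefixed with the file and line where
  ## `trace` was invoked, then flushes stderr explicitly.
  let info = instantiationInfo()
  stderr.writeLine(traceLine(info.filename, info.line, msg))
  flushFile(stderr)

trace("before send")
trace("after send")
```

If the traces still appear and disappear depending on unrelated code changes, that points further away from buffering and towards memory corruption.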