nimforum mirror - Need debugging help

cdunn2001 (orginal) [2019-08-26T07:46:19+02:00] view original

This fairly simple program will fail, and I cannot find my bug:


git clone https://github.com/pb-cdunn/nim-help.git
cd nim-help
make
# nim c -r t_raptor_db.nim

(I am using the devel branch of Nim, but the problem happens on 0.20.2 also.)


Traceback (most recent call last)
t_raptor_db.nim(181) t_raptor_db
raptor_db.nim(271) get_length_cutoff
/raptor_db.nim(254) load_rdb
Nim/lib/system/assign.nim(111) genericAssign
Nim/lib/system/assign.nim(100) genericAssignAux
Nim/lib/system/assign.nim(24) genericAssignAux
Nim/lib/system/assign.nim(20) genericAssignAux
Nim/lib/system/assign.nim(54) genericAssignAux
Nim/lib/system/gc.nim(435) newObjNoInit
Nim/lib/system/gc.nim decRef
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

This is very discouraging. There is a GC deref problem with fairly simple code, and I am completely unable to debug it myself. It crashes in different places when I make very small changes, so it is clearly a memory bug. It might be from how I use sscanf, but I am clueless. I've added echo all over the place, but I have no idea where the problem originates.

I hoped I could debug this by using --newruntime and valgrind, but I am not able to build with newruntime.


raptor_db.nim(178, 38) Error: region needs to be an object type

(That's on the newruntime branch.)

I would be very excited to see "newruntime" working, precisely to help me with this sort of memory bug. But that's in the distant future.

If no-one can help me fix this, I might have to give up on Nim completely within my company.

cdunn2001 (orginal) [2019-08-26T07:57:03+02:00] view original

I've pushed a commit to master which completely drops all use of the unittest module. It still fails in GC, so it's not a result of any unittest-macro magic.

There must be something fundamental that I'm doing wrong.

napalu (orginal) [2019-08-26T08:24:20+02:00] view original

Hi cdunn

change your strlen wrapper in raptor_db.nim to

proc strlen(a: var Headroom): int =
    let n = strlen(cast[cstring](a[0].addr))
    echo "  calc'd:", n
    return n

You'll then get an assertion error in t_raptor_db.nim:

: unhandled exception: /data/Projects/nim-help/t_raptor_db.nim(156, 16) `"m1/3005/0_5852" == db.seqs[0].header`  [AssertionError]

cdunn2001 (orginal) [2019-08-26T16:00:25+02:00] view original

Switching to your wrapper makes no difference for me.

Yes, if you make small changes the code can crash in different places, though I haven't been able to reproduce your assertion failure.

Do you think you've the actual bug? Could you point me to it?

mashingan (orginal) [2019-08-26T16:36:33+02:00] view original

Can you try changing line 236 with var sr = SequenceRecord() ?

I'm not sure but maybe what you're doing is scope-escaping memory? That var sr: SequenceRecord only available in the stack and it got destroyed after you copy (or move?) to result.seqs.add(sr)

I cannot run it, currently don't have available linux box currently.

cdunn2001 (orginal) [2019-08-26T18:21:14+02:00] view original

-            var sr: SequenceRecord
+            var sr = SequenceRecord()

Shoot. That didn't make any difference.

I am not over-stating the problem here. If we cannot fix this, probably the entire Bioinformatics community will be forced to give up on Nim. (A colleague is already fed up with Nim multi-threading, though I've been willing simply to avoid the threadpool library.) I will still re-adopt Nim when newruntime is available, but I will have zero credibility on Nim with my colleagues. We need a solution for weird memory bugs like this, a simple way to discover them, even if it turns out to be programmer error. I am not a terrible coder.

Stefan_Salewski (orginal) [2019-08-26T18:37:40+02:00] view original

May this help:

https://github.com/nim-lang/Nim/wiki/Hunting-crashes:-The-ultimate-guide

cdunn2001 (orginal) [2019-08-26T19:55:25+02:00] view original

I haven't tried these yet:

-d:nimBurnFree

-d:corruption

editing lib/system/mmdisp.nim "Most of these have no --define equivalent, unfortunately."

cdunn2001 (orginal) [2019-08-26T22:11:15+02:00] view original

-d:nimBurnFree and -d:corruption have not helped so far.

cdunn2001 (orginal) [2019-08-26T22:14:37+02:00] view original

I thought I'd try -d:useMalloc:


nim c --gc:none -d:useMalloc -d:useSysAssert -d:useGcAssert --debugger:native -d:nimBurnFree -d:corruption -r t_raptor_db.nim
/pbi/flash/cdunn/gh/Nim/lib/system/mmdisp.nim(443, 25) Error: pragmas are only allowed in the header of a proc; redefinition of 'deallocOsPages' from /pbi/flash/cdunn/gh/Nim/lib/system/threads.nim(111, 8)

cdunn2001 (orginal) [2019-08-27T01:21:12+02:00] view original

You're right that returning Db instead of ref Db "fixes" the problem, but is that really a solution? Returning a "ref" simply means that load_rdb() is really a Db constructor. That should work.

And no, an alias for DbRef = ref Db does _not fix it. Same for DB = ref object. Still crashes....

... Oh! Your clue helped me find the problem.

let scanned = sscanf(line.cstring, v_frmt.cstring,
                addr result.version)

"version" there is a string, not a cstring, so this is over-writing the ref-counting data.

I really, really want the "newruntime". This kind of mistake is way too easy.

dom96 (orginal) [2019-08-27T02:23:22+02:00] view original

I really, really want the "newruntime". This kind of mistake is way too easy.

This mistake is easy, but you really should reconsider mixing such low-level code with your high-level scripts. Any use of addr, ptr, cast should be very deliberate, the analogy to Rust's unsafe is very apt here. I'm sure that even the "newruntime" will have escape hatches which will allow you to ignore the compiler, this is what you're effectively doing here.

A good solution here would be to write a Nim native sscanf or just use the stdlib's parseutils or strscans.

Vindaar (orginal) [2019-08-27T02:24:43+02:00] view original

I wasn't sure whether you actually fixed your code with that post now, but I was already looking at your code when I saw the post, so I continued.

I fixed the code differently, by just removing the c types you used. I trust that the test case works, because I'm not sure if I broke something. :)

https://github.com/pb-cdunn/nim-help/pull/2

cdunn2001 (orginal) [2019-08-27T03:03:43+02:00] view original

Yes, the suggestion to use parseutils and strscans is not a bad one, and thanks for the code example. But I did consider both of those before proceeding.

First, I couldn't figure out how parseBiggestInt works. It is very important to specify the size of integer. I don't want a machine-dependent solution. It drives me crazy that I cannot specify the size that the input format provides.

Second, I saw that strscans provides a weird $w, which is just the wrong thing. I want any non-whitespace. There is an inversion matcher, but I wasn't sure how to use it. I couldn't figure out the "user-definable matchers". Basically, I just had much more confidence in the API for sscanf than for strscans. Maybe examples would help.

+({'a'..'z', 'A'..'Z', '_', '0'..'9', '/'} -> buf0.add($_)),

I would never have guessed that. As Vindaar commented in his code, "this is kinda ugly". Yes it is.

Third, I have trouble telling when memory allocations will occur. That's a big problem with Nim strings around to the parsers. Just look at the disaster in streams.FileStream, which is incredibly slow. (And don't get me started on threadpool). I need to know what is going on at a pretty low level.

Finally, on avoiding low-level code, I think people in this forum misunderstand the niche for Nim. There are tons of good languages these days. I need something specifically for:

rapid prototyping

easy glue logic

speedy interfacing with C/C++

Python is better at (1). Python is almost as good at (2), but I want a language with static-typing and fast compilation, instead of an interpreted language. (3) is where Python fails. Static-typing is absolutely required for C interfaces because so many things can go wrong.

I included in my broken code only the minimum needed for debugging. It's actually a lot of code, growing all the time, and calling various C and C++ libraries. That's not even up for discussion. Without the C/C++ interfacing, I would use a different language. And what I'm saying is that I still need to be able to debug. I will have low-level bugs in my code. I need ways to detect them. The "newruntime" seems to be critical for that. Araq is right.

cdunn2001 (orginal) [2019-08-27T03:10:17+02:00] view original

Oh, I should also mention that I am parsing GBs of data. In bioinformatics, we deal with TBs.

On strscans, apparently $+ is the way to parse non-whitespace into a sting. That was not at all obvious to me. But even that feature is not enough. The failure to specify integer sizes is really a HUGE problem.

mratsim (orginal) [2019-08-27T06:43:16+02:00] view original

I agree, generic int parsing would be a very nice addition to strutils/parseutils.

Somehow binary, hex and octal got generic int parsing before plain number stored in strings: https://github.com/nim-lang/Nim/pull/11107/files.

And haha good catch on sscanf, didn't check that one after strlen.

Unfortunately I don't see how Nim newruntime can help detecting buffer overflow.

One thing that can be done is that when sysAssert and/or gcAssert flag are defined, all seq/string/ref get an additional field canary field, initialized with a pattern say 0x42424242, and the pattern is checked regularly to make sure no wild C proc or unsafe cast/addr usage overwrote it.

cdunn2001 (orginal) [2019-08-27T17:44:31+02:00] view original

Yes, a "canary field" would be a good debug-mode for reference-counting. That's a very good idea.

Buffer-overflow is hard to detect, but when you switch to malloc/free there is a very good chance that valgrind will notice something. Maybe that's wishful thinking on my part.

cdunn2001 (orginal) [2019-08-27T18:03:11+02:00] view original

Anyway, your help saved the day. I'm expanding our use of Nim internally.

I still have a lot of concerns about performance for simple things, but Nim has so much upside that I think we'll be ok.

mratsim (orginal) [2019-08-27T18:26:03+02:00] view original

As shown in many other threads, you can get the same speed in Nim as in C or Assembly, especially for scientific computing:

https://forum.nim-lang.org/t/5124

my experiments for Project Picasso: https://github.com/mratsim/weave/tree/master/benchmarks/fibonacci (https://forum.nim-lang.org/t/5083).

Matrix Multiplication in pure Nim vs expert-tuned Assembly libraries from Intel and others: https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/benchmarks/gemm/gemm_bench_float32.nim#L418-L432

Unfortunately, it is also as slow as C if you use strutils for naive string processing due to plenty of intermediate allocations, and the code that tends to loop over and over on the same string.

A way to fix that would be to have a library in the spirit of zero-functional but specialized for strings. Temporary immutable allocations could use alloca to avoid stressing the GC (or newruntime could do that optimization).

I see multiple potential ways to implement that in an efficient and extensible manner if someone wants to explore it:

via a backend similar to D ranges or C++20 ranges, see proof of concept by @timotheecour: https://github.com/timotheecour/vitanim/tree/master/drange

via a state machine, similar to Nim inline iterators or Rust inline iterators. This involves lots of compile-time machinery.

in a functional way:
- transducers: http://sixty-north.com/blog/deriving-transducers-from-first-principles.html
- catamorphisms: https://medium.com/@olxc/catamorphisms-and-f-algebras-b4e91380d134
- others (?)

Mirror of forum.nim-lang.org

5130 :: Need debugging help