Hello,
I have an application ported from Python that leaks memory. As a first step to confirm the leak, the application calls GC_getStatistics() and mratsim's getrusage() procedure every 100,000 JSON messages it processes. The output shows the process's max RSS growing by 1.7GB to 1.9GB for every 100K messages, while the output from GC_getStatistics() doesn't show any significant increase. The last 3 updates before the application violates a 4 GB process memory limit set with prlimit are included at the end of this post.
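For readers who want to reproduce this kind of check, here is a minimal sketch of the periodic logging described above. It is not the original code: maxRssKb and onMessage are hypothetical names, and it assumes the posix module's getrusage() wrapper on Linux, where ru_maxrss is reported in kilobytes.

```nim
import posix

proc maxRssKb(): int =
  ## Peak resident set size of this process (kilobytes on Linux), or -1 on error.
  var ru: RUsage
  if getrusage(RUSAGE_SELF, addr ru) != 0:
    return -1
  result = int(ru.ru_maxrss)

var processed = 0
var lastRss = maxRssKb()

proc onMessage() =
  ## Hypothetical hook, called once per processed message.
  inc processed
  if processed mod 100_000 == 0:
    echo "GC_getStatistics():"
    echo GC_getStatistics()
    let rss = maxRssKb()
    # Growth of the peak RSS since the previous report.
    echo "Incremental max RSS: ", rss - lastRss
    lastRss = rss
```

Comparing the two numbers is the point: GC_getStatistics() only covers memory managed by Nim's allocator, while max RSS covers the whole process, including anything allocated by C libraries.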
I'm in need of some guidance on the most effective approach to isolate the source of the leak. Is digging in with a tool such as Memcheck the best approach? Other than systematically cutting down the application, are there any things that one should try when approaching a memory leak issue in Nim?
The version of the Nim compiler used is 1.5.1 with git hash: 220b55c5d7c40aa93df7879ebc6c9f71147c3eda on Ubuntu Linux 18.04.
GC_getStatistics():
[GC] total memory: 13606912
[GC] occupied memory: 7475072
[GC] stack scans: 3827446
[GC] stack cells: 185
[GC] cycle collections: 1006
[GC] max threshold: 9055712
[GC] zct capacity: 1024
[GC] max cycle table size: 0
[GC] max pause time [ms]: 0
[GC] max stack size: 6720
Incremental max RSS: 1661992
GC_getStatistics():
[GC] total memory: 13606912
[GC] occupied memory: 7907312
[GC] stack scans: 4041233
[GC] stack cells: 185
[GC] cycle collections: 1069
[GC] max threshold: 9064704
[GC] zct capacity: 1024
[GC] max cycle table size: 0
[GC] max pause time [ms]: 0
[GC] max stack size: 6720
Incremental max RSS: 1865964
GC_getStatistics():
[GC] total memory: 13606912
[GC] occupied memory: 6686992
[GC] stack scans: 4278141
[GC] stack cells: 185
[GC] cycle collections: 1132
[GC] max threshold: 9080896
[GC] zct capacity: 1024
[GC] max cycle table size: 0
[GC] max pause time [ms]: 0
[GC] max stack size: 6720
Incremental max RSS: 1868440
This is mostly useful to find logical leaks where data structures simply keep growing and since they stay alive, the GC cannot do anything about it.
We use -d:nimTypeNames together with https://github.com/status-im/nim-metrics to track memory usage over time, per type. It's quite useful as a first hint at what's wrong in a long-running process.
async in particular has a nasty habit of keeping memory around for longer than necessary, since everything is copied into a closure that has a tendency to stick around for a bit. Make sure not to end up with dangling futures that reference a large async tree of data. This is especially a problem when multiple await steps keep lots of data alive even though some steps have already completed and, in theory, will no longer touch it.
FYI, I've had the best luck finding memory leaks using Valgrind (https://www.valgrind.org/).
It requires the use of -d:useMalloc, and I suggest combining it with ARC (if you can) or ORC.
Thanks for the suggestions.
I'm on the trail of the leak and cutting down the application to a small(er) testcase. The issue seems related to a proc call across modules on a ref object with embedded ref objects.
The module dependency was a red herring. I managed to produce a trivial testcase that reproduces the leak when using nimble "decimal [0.0.2]" for numeric values:
while true:
  var totStr = ""
  for item in seqOfDecimals:
    totStr &= $item
Does this seem to be an issue? If so, is the issue with the decimal package or with Nim?
@araq:
Compile with --gc:orc...
Or use --gc:arc and keep comparing getOccupiedMem() in strategic places; that can work too. Be aware, however, that getOccupiedMem() doesn't currently work with --threads:on combined with ARC/ORC, as per issue #18494: it always reports zero with these combinations.
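A minimal sketch of what comparing getOccupiedMem() around a suspect code path might look like. The suspect proc is a hypothetical stand-in for whatever part of the application is under suspicion, and the iteration counts are arbitrary.

```nim
proc suspect() =
  ## Hypothetical stand-in for the code path under suspicion.
  var s = ""
  for i in 0 ..< 1000:
    s &= $i

let before = getOccupiedMem()
for _ in 0 ..< 10_000:
  suspect()
let after = getOccupiedMem()

# A delta that keeps growing with the iteration count hints at a leak
# inside memory managed by Nim's allocator.
echo "occupied before: ", before, " after: ", after, " delta: ", after - before
```

Note that this only sees memory handed out by Nim's own allocator. Memory obtained by a C library through malloc does not show up here, which would be consistent with the flat GC statistics and growing RSS reported at the start of the thread.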
The decimal maintainers surmise the leak is from not freeing the cstring returned by the underlying mpd_decimal library function mpd_to_sci().
This seems to be in the following code snippet:
proc `$`*(s: DecimalType): string =
  ## Convert DecimalType to string
  result = $mpd_to_sci(s[], 0)
Would the following modification be an appropriate fix?
proc `$`*(s: DecimalType): string =
  ## Convert DecimalType to string
  let cs = mpd_to_sci(s[], 0)
  result = $cs
  c_free(cs)
The c_free() function is from https://github.com/nim-lang/Nim/blob/0c4582c66524d58a3c72827d9546a5c5f1c40286/lib/system/ansi_c.nim#L149
Another question is how to make the Nim compiler aware of that file: compiling the nimble package with the above modification fails because c_free() cannot be found.
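A minimal sketch of one way to make c_free() visible: system/ansi_c is not imported automatically, so it needs an explicit import. To keep the example self-contained without the decimal package, strdup from the C standard library stands in for a C function that returns a malloc'ed string (an assumption for illustration), and toNimString is a hypothetical name.

```nim
import system/ansi_c  # provides c_free(); not auto-imported like the rest of system

# Stand-in for a C function that returns a malloc'ed, NUL-terminated string.
proc strdup(s: cstring): cstring {.importc, header: "<string.h>".}

proc toNimString(s: string): string =
  ## Copy the C string into a Nim string, then release the C allocation.
  let cs = strdup(s)
  result = $cs   # `$` copies the bytes, so `result` no longer depends on `cs`
  c_free(cs)     # safe to free once the copy has been made
```

The same pattern applies to the `$` proc above: copy first, free afterwards, and never touch the cstring again after the free.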
Thanks! I had mistakenly assumed that system modules were auto-imported in Nim.
The change does fix the memory leak when using the default refc GC.
For --gc:{orc|arc}, the app still leaks memory, and at a far faster rate than it originally did with --gc:refc.