Hi,
I am in the process of creating Nim bindings for the C++ cheminformatics toolkit RDKit. I have tried this several times before but for the first time I am making some progress. But now Valgrind is reporting a memory leak for a function that is calling a wrapped function which in turn allocates a std::vector. I have tried to call the vector destructor manually, but that does not seem to help.
Does someone have any ideas how to solve this?
Many thanks in advance.
Kind regards, Axel
When you call it from C++ valgrind is happy?
And you are using --gc:arc ?
Have you tested storing Nim objects in std::vector before and that works fine?
I can not really help you, as I do know not much about C++ wrapping. But there was a thread about std::vector mapping recently, and the core devs may help you...
Hi Stefan,
thanks for the quick reaction.
When you call it from C++ valgrind is happy?
Not tried. But the C++ code should be fine.
And you are using --gc:arc ?
Valgrind complains both with the regular GC and with --gc:arc.
Have you tested storing Nim objects in std::vector before and that works fine?
No, not tested.
The irony I am facing here is really that the reason I want to create bindings in Nim is that I do not know C++ and can not program in it. But it seems, in order to bind a C++ project, I would need to learn so much about the language, that I could as well continue programming in it.
Creating bindings can be more difficult than programming in the target language. I can program in C and a bit in C++. But creating Ruby bindings for C/C++ was not easy. Low level C bindings can be generated with tools like c2nim, nimterop and similar, high level bindings are not that easy still.
Araq created the C++ wxWidgets bindings, so it seems to be possible. Maybe when your target lib is not too big you can hire someone? For large C++ libs like Qt, boost, cgal with many templates it may be a fulltime job for many years unfortunately.
Destructor in C++ is a function that called automatically and usually you don't need to call it explicitly. If an object is a local variable, destructor is called when returning from the function. If an object is allocated with new operator, destructor is called when you use delete operator to it.
In https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/SmilesParse/SmilesParse.h
RDKIT_SMILESPARSE_EXPORT RWMol *SmilesToMol(const std::string &smi,
const SmilesParserParams ¶ms);
In https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/RWMol.h
class RDKIT_GRAPHMOL_EXPORT RWMol : public ROMol {
In https://github.com/apahl/rdkit_nim/blob/master/src/rdkit/mol.nim
type
## The base Mol object
Mol* = ref object
obj*: ptr ROMol
Mol.obj take pointer to ROMol returned from SmilesToMol. I guess SmilesToMol allocate RWMol with new operator and it is supposed to be freed with delete operator. But delete operator is not used in your example code. https://github.com/apahl/rdkit_nim/blob/master/examples/ex03_mem.nim Calling destructor of RWMol cleanup resources it has but it doen't free the heap memory of RWMol.
inline std::unique_ptr<RDKit::RWMol> operator"" _smiles(const char *text,
size_t len) {
This operator call SmilesToMol and return RWMol wrapped with std::unique_ptr. So you don't need to use delete operator or anything to free pointer to RWMol but it automatically freed. But I don't know nim can work with operator"" or std::unique_ptr.
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Substruct/SubstructMatch.cpp#L286
std::vector<MatchVectType> SubstructMatch(
const ROMol &mol, const ROMol &query,
const SubstructMatchParameters ¶ms) {
SubstructMatch returns std::vector and it is automatically freed when exiting createLotsOfMolecules proc. You don't need to call destructor of std::vector.Thanks for your detailed answer.
SubstructMatch returns std::vector and it is automatically freed when exiting createLotsOfMolecules proc. You don't need to call destructor of std::vector.
Yes, that was also my understanding. But it is exactly this call that creates the memory leak. If I comment it out, Valgrind stops complaining.
In any case unless I missed something, substructMatches returns a seq of seq not a std::Vector, furthermore you also have Mol ref objects created in the procedures:
type
## The base Mol object
Mol* = ref object
obj*: ptr ROMol
https://github.com/apahl/rdkit_nim/blob/d4bbeee6d05329d1217bfe73e4c5ffa63086d2ac/src/rdkit/sss.nim#L17-L26
proc substructMatches*(this; query: Mol): seq[seq[uint]] =
# returns atom indices in the mol that match the query as seq of seq.
let matches = rdkitSubstructMatch(this.obj[], query.obj[])
if matches.isEmpty:
return result
for match in matches:
var m: seq[uint]
for pair in match:
m.add(uint(pair.second))
result.add(m)
Unless you use --gc:arc ref objects and seq are not deterministically destroyed on function exit but when the GC decides it's a good time (often on a new allocation). At the end of the function, there is no more opportunities for the GC to collect that memory so it shows up as used on program exit in Valgrind.
If you want to use Valgrind, you need to "disable" Nim memory management by passing -d:useMalloc so that instead of keeping memory around, Nim GC directly calls malloc/free.
Hi Mamy (I love Arraymancer),
I tried to debug but unfortunately RDKit seems like a nightmare to install :/
I agree, the easiest way (which I also use) is to install via anaconda, the Python virtual environment management system.
[...] substructMatches returns a seq of seq not a std::Vector
True, but the RDKit function it wraps, returns a <std::vector<std::vector<std:Pair>>>
[...] you need to "disable" Nim memory management by passing -d:useMalloc [...]
When I use that option for compiling, Valgrind still complains, but thanks a lot for your comments in general, they give me some directions to investigate.
Thanks a lot.
KR Axel
I haven't looked too closely at your code but it looks to be using pattern matching which is likely calling re. See #12952.
Jason
It seems to me that the Valgrind output in the issue you are referring to is not necessarily a memory leak. According to the documentation and to entries on StackOverflow "Still reachable" can have other, mostly harmless reasons.
Also, in my case the pattern matching happens in the C++ library, so the calls to re would be in the C++ code, I do not call Nim's re module.
So, I now ran this code:
import ../src/rdkit / [mol, parsers, sss]
proc createLotsOfMolecules() =
for i in 1 .. 1000000:
let
smi = "c1ccccc1C(=O)NC2CC2" # cyclopropyl benzamide
m = smilesToMol(smi)
q = smilesToMol("C1CC1")
if m.ok:
let
na = m.numAtoms
matches = m.substructMatches(q)
when isMainModule:
echo getOccupiedMem()
createLotsOfMolecules()
# GC_fullCollect()
echo getOccupiedMem()
which creates 1 million molecules and checked getOccupiedMem() before and after:
61440 61864
Although the numbers are not identical, they are reasonably close together (don't know, if that counts), and the difference does not increase with the number of created molecule objects.
So, maybe the code does not leak after all....?