I made a quick and (very) dirty piece of code that might be useful to others. It is written in Python and uses clang. It seems to handle big projects like OpenCascade (more than 8000 header files) or OpenSceneGraph (215 header files).
It creates lots of files, but that doesn't mean it works; they might still be a good starting point.
I made it because I had some issues getting nimterop and c2nim to work (probably my bad).
You can find it here.
I hope it helps.
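For anyone curious what the libclang route looks like from Python, here is a minimal sketch (not the actual cpp2nim code) that parses one header with the clang Python bindings and lists the classes and methods it finds; the header name and compiler flags are placeholders you would need to adapt.

# Minimal sketch of driving libclang from Python (not the cpp2nim source).
# Requires the `clang` pip package and a matching libclang shared library.
from clang import cindex

index = cindex.Index.create()
# Parse one header as C++; the file name and flags below are placeholders.
tu = index.parse("gp_Quaternion.hxx", args=["-x", "c++", "-std=c++14"])

# Only top-level declarations are visited here; real code would also
# recurse into namespaces.
for cursor in tu.cursor.get_children():
    if cursor.kind == cindex.CursorKind.CLASS_DECL and cursor.is_definition():
        print("class", cursor.spelling)
        for member in cursor.get_children():
            if member.kind == cindex.CursorKind.CXX_METHOD:
                args = ", ".join(a.type.spelling for a in member.get_arguments())
                print("  ", member.spelling, "(", args, ") ->",
                      member.result_type.spelling)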
Clang is IMO indeed the correct way to go: it is guaranteed to understand C and C++, unlike the tree-sitter parser used in nimterop, which has difficulty with C++ as it is used in the wild in large projects.
I'd be very interested in seeing this project translated from Python to Nim, but still using (lib)clang as the engine. Using the clang Python bindings instead of the clang C++ bindings is potentially easier, and the Nim-Python bridge can be done via https://github.com/yglukhov/nimpy. Even though the clang bindings then take a two-step route (C++ => Python => Nim), I suspect this may be the easiest way.
IMHO new tools won't cut it. If you want something better than c2nim, evolve c2nim further by giving it a parser based on libclang. Same applies for nimterop.
You simply won't get far with code generation via
nimcode = "{} = proc({}): {}".format(name, inputs, output)
Parsing C++ is one thing, and it's great that libclang can do it much better than c2nim. But then you also need to produce decent Nim code, and Python offers no library for that...
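To make that concrete, here is a purely illustrative helper (the function name and keyword list are mine, not from any of these tools) showing just one detail a string-template generator has to get right: C++ operator spellings and Nim keyword clashes both need backtick quoting on the Nim side.

# Illustrative only: one of many renaming rules naive string formatting
# would have to handle before the output compiles as Nim.
NIM_KEYWORDS = {"method", "proc", "type", "object", "block", "end"}

def nim_proc_name(cxx_spelling):
    # Map a C++ method spelling to a Nim-safe proc name.
    if cxx_spelling.startswith("operator"):
        op = cxx_spelling[len("operator"):]   # e.g. "+" from "operator+"
        return "`{}`".format(op)              # Nim operator procs are backtick-quoted
    if cxx_spelling in NIM_KEYWORDS:
        return "`{}`".format(cxx_spelling)    # keyword clash: quote it
    return cxx_spelling

print(nim_proc_name("operator+"))  # `+`
print(nim_proc_name("method"))     # `method`
print(nim_proc_name("Multiply"))   # Multiply

And that is before templates, overloads and const/ref qualifiers enter the picture.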
Very interesting.
I'll try to use it to kickstart my libtorch bindings effort at https://github.com/SciNim/minitorch/issues/1.
If you want to see what I need to deal with:
git clone https://github.com/SciNim/minitorch
cd minitorch
nim c -r --outdir:build minitorch/torch_installer.nim
This will compile and run the torch installer, which downloads the libtorch package listed at https://pytorch.org/get-started/locally/ (for Linux: https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.7.0.zip); the archive contains a DLL (shared library) plus lots of headers.
My code is crappy, believe me. Nothing to do with nimterop or c2nim. No aspirations here to be a contender.
The results are crappy as well. Let me show some examples:
proc ` new`*(this: var gp_Quaternion, theSize: cint) {.importcpp: "` new`".}
or
proc `+`*(this: gp_Mat2d, Other: gp_Mat2d): gp_Mat2d {.importcpp: "`+`".}
or
gp_TrsfNLerp* {.include: "gp_TrsfNLerp.hxx", importcpp: "gp_TrsfNLerp".} = NCollection_Lerp<gp_Trsf>
or
Handle_gp_VectorWithNullMagnitude* {.include: "gp_VectorWithNullMagnitude.hxx", importcpp: "Handle_gp_VectorWithNullMagnitude".} = opencascade::handle<gp_VectorWithNullMagnitude>
It does not handle enums, structs, or free functions either (just methods and constructors).
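For what it's worth, libclang does expose those declarations too, so extending the script looks incremental; a rough (untested here) sketch of the cursor kinds involved:

# Rough sketch: cursor kinds needed to also pick up enums, structs and
# free functions (not something cpp2nim does today).
from clang import cindex

WANTED = {
    cindex.CursorKind.ENUM_DECL: "enum",
    cindex.CursorKind.STRUCT_DECL: "struct",
    cindex.CursorKind.FUNCTION_DECL: "free function",
}

def list_declarations(header, flags=("-x", "c++", "-std=c++14")):
    tu = cindex.Index.create().parse(header, args=list(flags))
    for cursor in tu.cursor.walk_preorder():
        label = WANTED.get(cursor.kind)
        # Skip declarations pulled in from #included files.
        if label and cursor.location.file and cursor.location.file.name == header:
            print(label, cursor.spelling)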
I hope this serves to manage expectations! ;oP
I agree that it would be better to improve nimterop or c2nim.
For a noob like me, with very little C++ knowledge, it is difficult just to take a first shot at a library with c2nim (you need to clean the header until it is able to digest it). I think nimterop is more tolerant, but it is not trivial to use. I didn't manage to make the two work together.
This tool is sufficient for the purpose I intend to use it for (just a few tests with OpenCascade, whose API seems to follow a similar pattern across many headers).
I tried it with libtorch. I just downloaded libtorch-shared-with-deps-1.7.0.zip and ran the script over the first folder, ATen:
python cpp2nim.py "libtorch/include/ATen/*.h" ATen
It created only 19 files out of 79. I think this is because most of the files probably don't contain a class (just functions).
For instance, for CUDAGeneratorImpl.h it created:
{.push header: "CUDAGeneratorImpl.h".}
# Constructors and methods
proc constructor_CUDAGeneratorImpl*(device_index: c10::DeviceIndex): CUDAGeneratorImpl {.constructor,importcpp: "CUDAGeneratorImpl(@)".}
proc clone*(this: CUDAGeneratorImpl): std::shared_ptr<CUDAGeneratorImpl> {.importcpp: "clone".}
proc set_current_seed*(this: var CUDAGeneratorImpl, seed: uint64_t) {.importcpp: "set_current_seed".}
proc current_seed*(this: CUDAGeneratorImpl): uint64_t {.importcpp: "current_seed".}
proc seed*(this: var CUDAGeneratorImpl): uint64_t {.importcpp: "seed".}
proc set_philox_offset_per_thread*(this: var CUDAGeneratorImpl, offset: uint64_t) {.importcpp: "set_philox_offset_per_thread".}
proc philox_offset_per_thread*(this: var CUDAGeneratorImpl): uint64_t {.importcpp: "philox_offset_per_thread".}
proc philox_engine_inputs*(this: var CUDAGeneratorImpl, increment: uint64_t): std::pair<uint64_t, uint64_t> {.importcpp: "philox_engine_inputs".}
proc device_type*(this: var CUDAGeneratorImpl): c10::DeviceType {.importcpp: "device_type".}
proc clone_impl*(this: CUDAGeneratorImpl): at::CUDAGeneratorImpl * {.importcpp: "clone_impl".}
{.pop.} # header: "CUDAGeneratorImpl.h
where there are obvious issues (that maybe are not that difficult to fix).
One good thing is that cpp2nim is right now less than 500 lines of actual code, so it shouldn't be that hard to fix.
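The most visible issues are C++ type spellings (std::shared_ptr<...>, c10::DeviceIndex, uint64_t, trailing *) leaking verbatim into the Nim signatures. A crude, purely illustrative post-processing rule for the easy cases could look like this (the mappings and the SharedPtr wrapper name are hypothetical, not anything cpp2nim produces):

# Illustrative only: spelling fixes for a few of the easy cases where
# C++ type names leak verbatim into the generated Nim signatures.
import re

SIMPLE_TYPES = {
    "uint64_t": "uint64",
    "int64_t": "int64",
    "size_t": "csize_t",
}

def nim_type(cxx_spelling):
    cxx_spelling = SIMPLE_TYPES.get(cxx_spelling, cxx_spelling)
    # std::shared_ptr<T> -> a hypothetical SharedPtr[T] wrapper type.
    m = re.fullmatch(r"std::shared_ptr<(.+)>", cxx_spelling)
    if m:
        return "SharedPtr[{}]".format(nim_type(m.group(1)))
    # Namespaced names such as c10::DeviceIndex: keep the last component
    # and assume a matching Nim type is declared elsewhere.
    return cxx_spelling.split("::")[-1].replace(" *", "")

print(nim_type("std::shared_ptr<CUDAGeneratorImpl>"))  # SharedPtr[CUDAGeneratorImpl]
print(nim_type("c10::DeviceIndex"))                    # DeviceIndex
print(nim_type("uint64_t"))                            # uint64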
Both c2nim and nimterop use the Nim compiler's renderer.nim to convert an AST into Nim code so there's no difference there. c2nim parses C++ itself, whereas nimterop uses tree-sitter but both generate Nim AST.
My rationale for using tree-sitter and relying on the C preprocessor is documented in the nimterop README. The current implementation is quite effective for C in spite of the compromises.
I don't think libclang makes life any easier or perfect - you need to generate valid AST for all C/C++ variants as well as translate preprocessor magic into Nim logic. That Nim logic needs to map all #define values related to the platform, OS and compiler into Nim equivalents. You cannot use the nimterop shortcut of deferring to the preprocessor, since not everyone uses Clang. You might need to write some compile-time logic to discover values that Nim does not cover. Given the diversity of platform/OS/compiler combinations, I don't envy that job.
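As a concrete illustration of the preprocessor point: libclang does not even report macros unless you ask for the detailed preprocessing record, and once you have them, every value still has to be mapped to something Nim understands per platform and compiler. A minimal sketch (the header name and flags are placeholders):

# Sketch: macros only show up in libclang's AST when the detailed
# preprocessing record is requested; mapping their values to Nim,
# per platform and compiler, is still left to the tool.
from clang import cindex

tu = cindex.Index.create().parse(
    "some_header.h",  # placeholder
    args=["-x", "c++"],
    options=cindex.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD,
)
for cursor in tu.cursor.get_children():
    if cursor.kind == cindex.CursorKind.MACRO_DEFINITION:
        print("#define", " ".join(t.spelling for t in cursor.get_tokens()))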
Note that nimterop does not wrap C++. My goal was to experiment with a different approach to the problem, and I think that has been achieved for C. However, I've yet to see any real acceptance within the community, so working on C++ is not yet worth the time investment.
I've tried it on libtorch as well; here is the result (unedited): https://github.com/SciNim/minitorch/tree/master/minitorch/bindings
The command I used (the other folders are a bonus; I don't understand why they hid the high-level API torch.h in such a long path):
python build/cpp2nim.py "minitorch/libtorch/include/torch/csrc/api/include/torch/**/*.h" bindings
The **/* globbing was important to get all files.
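(Side note, in case it trips anyone up: Python's glob only expands ** across directories when recursive=True is passed, which the script presumably does; quoting the pattern on the command line also matters so the shell does not expand it first.)

# Recursive globbing in Python: "**" only matches across directories
# when recursive=True is passed (Python 3.5+).
import glob

headers = glob.glob(
    "minitorch/libtorch/include/torch/csrc/api/include/torch/**/*.h",
    recursive=True,
)
print(len(headers), "headers found")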
So the code is producing `-` in some filenames, which will prevent imports, and some `::` ends up in parameter types: https://github.com/SciNim/minitorch/blob/8c443ca/minitorch/bindings/expanding_array.nim#L5
# Constructors and methods
proc constructor_ExpandingArray<D, T>*(list: std::initializer_list<T>): ExpandingArray {.constructor,importcpp: "ExpandingArray<D, T>(@)".}
## Constructs an `ExpandingArray` from an `initializer_list`. The extent
## of the length is checked against the `ExpandingArray`'s extent
## parameter `D` at runtime.
That said, it's enough for me to kickstart my bindings effort, which is my main goal, though it's not enough if I want to keep up with upstream changes (but I may be able to diff dumped header changes to ease that).