Hey, I've been working on narrow, Nim bindings for Apache Arrow's columnar format and compute engine. It wraps the Arrow GLib C API.
Repo: https://github.com/BontaVlad/narrow
It covers columnar data structures (arrays, tables, record batches), I/O (Parquet, CSV, Feather/IPC, JSON), and a compute layer (filtering, sorting, aggregations, an expression DSL, the Acero engine). Memory safety comes from ARC/ORC combined with GObject refcounting.
Really early stages but usable. Tests pass under ASAN, there's a cookbook, core APIs work. For the brave who want to test stuff, expect rough edges and missing pieces.
Started as a learning exercise, digging into Arrow's internals and Nim's FFI story. Half manual, half AI-assisted. Grew into something that might be useful to others.
All feedback welcome: bugs, API ergonomics, missing features. The issue tracker is open.
# Ubuntu
sudo apt install libarrow-glib-dev libparquet-glib-dev libarrow-dataset-glib-dev pkg-config
nimble install narrow
import narrow
let table = newArrowTable(@[
  (name: "alice", age: 30),
  (name: "bob", age: 25),
  (name: "carol", age: 35),
])
echo table.sortBy([("age", Ascending)])
This is my first ever published package in any language. The documentation and the cookbook are both in their early stages; this will probably improve. I'm a bit burned out on this project, and my hope is that posting this will give me some momentum to continue.
Docs: API reference · Cookbook
Arrow GLib C API
My advice: make sure the core data structures and serialization code are all in native Nim, and avoid GLib and its complex, slow reference-counted object system. (Yes, Nim's RC is faster because it is built on move semantics, which C does not have.)
Nice addition to the ecosystem, thanks for sharing and congrats on your first package!
I think this could now be the go-to library for Parquet I/O, and having access to the query engine as well seems to cover, at least partially, dataframe-like needs. And I guess Arrow's query engine makes it likely the top-performing option we have for data processing (I imagine it might beat datamancer on some tasks?).
I do not think memory management performance issues should matter in this context (the data workload usually dominates). Of course a Nim-native version would be nicer, but as a pragmatic step this is definitely a good thing.
We have not been updating the SciNim book much recently, but this definitely belongs there: https://scinim.github.io/getting-started/overview/index.html