Good day
I'm trying to implement a very lightweight object database in Nim and have run into a problem: is it possible to introduce a new custom pointer type into the language?
I need a long 64-bit pointer that addresses data in mmapped regions of the process memory, relative to each region's mmap base address. The high byte must be masked off and used as an index into a table of base addresses (TBASE), and the rest of the pointer must be an offset into the corresponding mmap area.
As long as I'm working with a single memory area allocated via libpmem, and with a bytecode interpreter and its own virtual machine memory, there are no problems. But what if I want to work with such mmapped virtual memory directly from Nim code? The biggest problem I see at first glance is that I would have to reimplement all the garbage collection and data structures from scratch.
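To make the encoding concrete, here is a minimal sketch of such a based pointer. The names BasedPtr, tbase, makeBasedPtr and resolve are hypothetical, and an ordinary array stands in for an mmapped area; this only illustrates the bit layout described above, not an existing Nim feature.

```nim
# Sketch only: BasedPtr, tbase, makeBasedPtr and resolve are made-up names.
type
  BasedPtr = distinct uint64   # high 8 bits: table index, low 56 bits: offset

const
  offsetBits = 56
  offsetMask = (1'u64 shl offsetBits) - 1

var tbase: array[256, pointer]  # base address of each mapped area (nil = unused)

proc makeBasedPtr(index: uint8, offset: uint64): BasedPtr =
  ## Pack a table index and an offset into one 64-bit value.
  assert offset <= offsetMask
  BasedPtr((uint64(index) shl offsetBits) or offset)

proc index(p: BasedPtr): int =
  ## The high byte selects the entry in the base table.
  int(uint64(p) shr offsetBits)

proc offset(p: BasedPtr): uint64 =
  ## The low 56 bits are the offset inside the selected area.
  uint64(p) and offsetMask

proc resolve(p: BasedPtr): pointer =
  ## Turn a based pointer into a real address valid in this process.
  let base = tbase[p.index]
  assert base != nil
  cast[pointer](cast[uint](base) + uint(p.offset))

# Usage: a plain buffer plays the role of mmap area number 3.
var area: array[64, byte]
tbase[3] = addr area[0]
let bp = makeBasedPtr(3, 10)
cast[ptr byte](resolve(bp))[] = 0xAB
assert area[10] == 0xAB
```

Because only the offset and the table index are stored, the same value stays meaningful if the area is mapped at a different address in the next run; only tbase has to be refilled.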
I might be wrong but I think you're thinking too complicated 😀
Accessing non traced data is pretty straightforward in Nim and it's done all the time when interacting with external C/C++ code.
See: https://nim-lang.org/docs/manual.html#types-reference-and-pointer-types (and especially the following section „Mixing GC'ed memory with ptr“).
When you're doing pointer arithmetic you might want to use ByteAddress and ptr UncheckedArray[sometype], as it's not possible to add to/subtract from ptr T or pointer.
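A small sketch of both approaches, using a stack array in place of external memory. It uses ByteAddress as named above; as far as I know, newer Nim releases flag ByteAddress as deprecated in favor of uint, so treat the exact type as an assumption about your compiler version.

```nim
# Sketch: two ways to step through a raw buffer.
var buf: array[4, int32] = [10'i32, 20, 30, 40]
let base: pointer = addr buf[0]

# 1. Index through ptr UncheckedArray (no bounds checking is done).
let arr = cast[ptr UncheckedArray[int32]](base)
assert arr[2] == 30

# 2. Do the arithmetic on an integer address, then cast back to a typed ptr.
let address = cast[ByteAddress](base) + 3 * sizeof(int32)
let last = cast[ptr int32](address)
assert last[] == 40
```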
I don't want to access it -- I want to use Nim as a persistence-enabled language, with all its power of dynamic structures (mapped to SSD or NVDIMM).
But I don't think it is possible -- I would have to implement not only a custom GC but also integrate these "offset/based pointers" into the Nim core, or at least its stdlib, which looks too complicated to me.
I see. Hm that complicates things. Idk how practical that would be, but how about something like this:
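type
  PMemPtr[T] = distinct int  # offset into the persistent heap, not a raw address

var pmemHeapStart: ByteAddress  # base address of the mapped heap

# Dereference: add the heap base to the offset and reinterpret it as ptr T.
proc `[]`[T](pmemPtr: PMemPtr[T]): var T =
  cast[ptr T](cast[ByteAddress](pmemPtr) + pmemHeapStart)[]

# SomeType stands for any object type that has a `field`.
proc test(a: PMemPtr[SomeType]) =
  a[].field = 42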
Why this is important: for some years now, hardware has offered non-volatile RAM that keeps its state after a full power-down (Intel Optane, HPE NVDIMMs). Even more important, modern OSes have long been able to extend any program's address space via mmap onto storage of arbitrary size, especially SSDs that are fast enough.
But there is still no programming language besides GemStone/Smalltalk that gives this power, with 10 TB of vRAM, to any ordinary developer. I'm looking closely at Nim for this role, because it produces fast low-level code in the end, yet to the programmer it looks as friendly as Python.
I think the best you can do right now is map your memory at a known address (this is nearly impossible in 32-bit, but the 64-bit address space is vast enough to declare something "yours"). There might be collisions if someone else wants to do the same, of course ...
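As a rough sketch of the known-address idea, assuming 64-bit Linux and Nim's std/posix wrapper: the address 0x7000_0000_0000 and the helper mapNear are arbitrary choices for illustration, and an anonymous mapping stands in for a real file or pmem device.

```nim
import std/posix

proc mapNear(hint: int, size: int): pointer =
  ## Ask the kernel for an anonymous mapping at `hint`. Without MAP_FIXED the
  ## address is only a request, so the caller has to compare the result.
  result = mmap(cast[pointer](hint), size,
                PROT_READ or PROT_WRITE,
                MAP_PRIVATE or MAP_ANONYMOUS, -1, 0)

const wantedBase = 0x7000_0000_0000  # hypothetical "reserved" address
let region = mapNear(wantedBase, 1 shl 20)

if region == MAP_FAILED:
  echo "mmap failed"
elif region != cast[pointer](wantedBase):
  echo "kernel picked another address; stored absolute pointers would be invalid"
else:
  echo "mapped at the requested base"
```

The hint is deliberately passed without MAP_FIXED here, since forcing the address can silently replace whatever is already mapped there, which is exactly the collision case mentioned above.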
Using stdlib data structures in persisted memory is a huge headache, and is likely impossible to retrofit reasonably onto an existing language/runtime. You somehow have to make sure that no assignment into any item's field ever points outside the mmapped area.
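To illustrate what that check means in practice, here is a hand-written sketch in which every pointer store goes through one guarded proc. Region, Node and setNext are hypothetical names, and a plain array stands in for the mapped area; nothing in Nim enforces that other code uses this API, which is the core of the problem.

```nim
type
  Region = object
    base: uint   # start address of the mapped area
    size: uint   # length in bytes
  Node = object
    next: ptr Node

proc contains(r: Region, p: pointer): bool =
  ## True if `p` lies inside the region.
  let a = cast[uint](p)
  a >= r.base and a - r.base < r.size

proc setNext(r: Region, node: ptr Node, target: ptr Node) =
  ## The only proc meant to write `next`; rejects pointers that leave the region.
  if target != nil and not r.contains(target):
    raise newException(ValueError, "pointer escapes the mapped region")
  node.next = target

# Usage: an array plays the role of the persistent heap.
var heap: array[4, Node]
let region = Region(base: cast[uint](addr heap[0]), size: uint(sizeof(heap)))
var outside: Node

setNext(region, addr heap[0], addr heap[1])    # accepted
try:
  setNext(region, addr heap[0], addr outside)  # points outside the heap
except ValueError:
  echo "rejected"
```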
I remember attempts from 20-30 years ago or so, when object databases were all the rage; I think it was called "pointer swizzling" back then, and it was handled in a language-agnostic way using page faults on Unix (NT was at 3.0 or 3.5 and wasn't yet considered a serious OS at the time). The conclusion back then -- and I suspect it still holds -- was that mmapped/page-faulted memory for persisted data is great for reading, but writing requires discipline that cannot be automated and needs its own API. This is also the approach adopted by LMDB / MDBX (in a different setting).