nimforum mirror - relative "based/biased" long pointers and data structures over it

dponyatov [2021-03-03T07:33:48+01:00]

Good day

I'm trying to implement a very lightweight object database in Nim and ran into a problem: is it possible to introduce a new custom pointer type into the language?

I need a long 64-bit pointer that addresses mmapped data spaces in the process memory relative to the mmap base address. The high byte must be masked off and hold an index into a TBASE table of base addresses, and the rest of the pointer must be an offset into the corresponding mmap area.
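To make the layout described above concrete, here is a minimal sketch, assuming the top 8 bits select an entry in the base table and the low 56 bits are the offset. The names `LongPtr`, `tbase`, `makeLongPtr`, `areaIndex`, `offset` and `resolve` are made up for illustration and are not an existing Nim API.

```nim
# Hypothetical layout: high byte = index into a table of mmap base
# addresses (TBASE in the post), low 56 bits = offset inside that area.
const
  offsetBits = 56
  offsetMask = (1'u64 shl offsetBits) - 1

type LongPtr = distinct uint64

# Base address of each mapped area; would be filled in after mapping.
var tbase: array[256, uint64]

proc makeLongPtr(area: uint8, offset: uint64): LongPtr =
  assert offset <= offsetMask
  LongPtr((uint64(area) shl offsetBits) or offset)

proc areaIndex(p: LongPtr): uint8 =
  uint8(uint64(p) shr offsetBits)

proc offset(p: LongPtr): uint64 =
  uint64(p) and offsetMask

proc resolve(p: LongPtr): pointer =
  ## Absolute address, valid for the current run only.
  ## It should never be stored back into the mapped area.
  cast[pointer](tbase[int(p.areaIndex)] + p.offset)
```

Only the `LongPtr` value would be written into persistent memory; `resolve` is called on every access, so a different base on the next run does not invalidate anything.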

While I'm working with a single memory area allocated via libpmem, and with a bytecode interpreter and its own virtual machine memory, there are no problems. But what if I want to work with such mmapped virtual memory directly from Nim code? The biggest problem I see at first glance is that I would have to reimplement all the garbage collection and data structures from scratch.

dponyatov [2021-03-03T07:39:54+01:00]

Thinking about the x86_64 architecture and C compilers, there is the option of using memory segments and a dedicated segment register (for a single mmap area); it looks like this could work at the hardware level. Is it possible to make Nim and the C compiler use this mode?

doofenstein [2021-03-03T07:41:32+01:00]

I might be wrong, but I think you're overcomplicating this 😀

Accessing untraced data is pretty straightforward in Nim, and it's done all the time when interacting with external C/C++ code.

See: https://nim-lang.org/docs/manual.html#types-reference-and-pointer-types (and especially the following section „Mixing GC'ed memory with ptr“).

When you're doing pointer arithmetic you might want to use ByteAddress and ptr UncheckedArray[sometype], as it's not possible to add to/subtract from ptr T or pointer.
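A small sketch of the two approaches mentioned above. `ByteAddress` is an alias for `int`, so plain `int` is used here; the buffer and variable names are just for illustration.

```nim
# A stack/global buffer stands in for some raw, untraced memory.
var buf = [10'i32, 20, 30, 40]

# 1) Go through an integer address, do the arithmetic there, cast back.
let base = cast[int](addr buf[0])
let third = cast[ptr int32](base + 2 * sizeof(int32))

# 2) View the same memory as an unchecked array and index it.
#    No bounds checks are performed on UncheckedArray.
let arr = cast[ptr UncheckedArray[int32]](addr buf[0])
arr[3] = 99
```

The second form is usually easier to read when the memory really is an array of one element type; the first is more convenient for byte-level offsets.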

dponyatov [2021-03-03T09:09:37+01:00]

I don't just want to access it -- I want to use Nim as a persistence-enabled language, with all its power of dynamic structures (mapped to SSD or NVDIMM).

But I don't think it is possible -- I would have to implement not only a custom GC but also integrate these "offset/based pointers" into the Nim core or at least its stdlib, which looks too complicated to me.

dponyatov [2021-03-03T09:13:41+01:00]

Maybe I should point out specifically: the PMEM base address moves randomly across the address space between program runs. That's why generic pointers are not acceptable -- they use the absolute address, but I must address relative to a BASE that moves randomly in memory.

doofenstein [2021-03-03T09:26:40+01:00]

I see. Hm that complicates things. Idk how practical that would be, but how about something like this:

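type
  SomeType = object            # placeholder record type for the example
    field: int
  PMemPtr[T] = distinct int    # offset from the heap start, not an absolute address

var pmemHeapStart: ByteAddress # base of the mapped area, set once per run

proc `[]`[T](pmemPtr: PMemPtr[T]): var T =
  # rebase the stored offset onto this run's heap start, then dereference
  cast[ptr T](cast[ByteAddress](pmemPtr) + pmemHeapStart)[]

proc test(a: PMemPtr[SomeType]) =
  a[].field = 42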
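A self-contained sketch of how such an offset pointer behaves when the base moves. Two ordinary buffers stand in for the same PMEM area in two program runs; `Rec`, `runA` and `runB` are hypothetical names, and `int` is used in place of `ByteAddress` (which is an alias for it).

```nim
type
  Rec = object                 # hypothetical record type
    field: int
  PMemPtr[T] = distinct int    # offset from the heap start

var pmemHeapStart: int         # base of the mapped area for this run

proc `[]`[T](p: PMemPtr[T]): var T =
  cast[ptr T](cast[int](p) + pmemHeapStart)[]

# int arrays keep the simulated areas suitably aligned for Rec.
var runA, runB: array[8, int]
let p = PMemPtr[Rec](16)       # offset 16 inside the area

# "First run": the area lives at runA.
pmemHeapStart = cast[int](addr runA[0])
p[].field = 42

# "Second run": same bytes, different base address.
copyMem(addr runB[0], addr runA[0], sizeof(runA))
pmemHeapStart = cast[int](addr runB[0])
let seenAfterMove = p[].field
```

The stored value `p` never changes; only `pmemHeapStart` is updated after the move, which is the property the randomly moving base requires.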

dponyatov [2021-03-03T09:31:17+01:00]

Why this matters: for some years now, hardware has offered non-volatile RAM that keeps its state after a full power-down (Intel Optane, HPE NVDIMMs). More importantly, modern OSes have long been able to extend any program's address space via mmap over storage of arbitrary size, especially reasonably fast SSDs.

But there is still no programming language besides GemStone/Smalltalk that gives this power, with 10 TB of vRAM, to an ordinary developer. I'm looking closely at Nim for this role, because it produces fast low-level code in the end, yet for the programmer it looks as friendly as Python.

cumulonimbus [2021-03-03T14:25:17+01:00]

I think the best you can do right now is to map your memory at a known address (this is nearly impossible in 32-bit, but the address space in 64-bit is vast enough to declare something "yours"). There might be collisions if someone else wants to do the same, of course...
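A rough sketch of asking for a mapping at a chosen address through `std/posix`, assuming a 64-bit POSIX system. The address `wantedBase` is an arbitrary assumption, and an anonymous mapping is used only to keep the example short; a real persistent area would map a file or device. Without `MAP_FIXED` the address is only a hint, so the result has to be compared against it.

```nim
import std/posix

const
  wantedBase = 0x1000_0000_0000  # hypothetical "reserved" address (assumption)
  areaSize = 1 shl 20            # 1 MiB, arbitrary

proc mapNearKnownAddress(): pointer =
  ## Returns whatever mmap returns; MAP_FAILED signals an error.
  ## The kernel is free to ignore the hint if the range is taken.
  mmap(cast[pointer](wantedBase), areaSize,
       PROT_READ or PROT_WRITE,
       MAP_PRIVATE or MAP_ANONYMOUS, -1, 0)

let area = mapNearKnownAddress()
# True only if nobody else was using that range in this process.
let landedAtWantedBase = area == cast[pointer](wantedBase)
```

`MAP_FIXED` would force the address but silently replaces anything already mapped there, which is exactly the collision risk mentioned above, so checking the hint seems the safer default.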

Using stdlib data structures in persisted memory is a huge headache, and is likely impossible to reasonably retrofit onto an existing language/runtime. You somehow have to make sure that no assignment into any item's field points outside the mmapped area.
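The "nothing may point outside the area" rule can at least be checked by hand at the API boundary. A minimal sketch, with `Arena`, `contains` and `toOffset` as hypothetical names; it does nothing for code that bypasses the check, which is the hard part.

```nim
type Arena = object        # hypothetical descriptor of one mapped area
  base: uint
  size: uint

proc contains(a: Arena, p: pointer): bool =
  let x = cast[uint](p)
  x >= a.base and x - a.base < a.size

proc toOffset(a: Arena, p: pointer): uint =
  ## Converts an absolute address to an offset that is safe to persist,
  ## refusing any pointer that does not lie inside the area.
  if p notin a:
    raise newException(ValueError, "pointer escapes the mapped area")
  cast[uint](p) - a.base

# A plain buffer stands in for the mapped area.
var backing: array[4, int]
let arena = Arena(base: cast[uint](addr backing[0]),
                  size: uint(sizeof(backing)))
```

Every store into the persistent area would have to go through something like `toOffset`, which is why a dedicated write API tends to be unavoidable.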

I remember attempts from 20-30 years ago or so, when object databases were all the rage; I think it was called "pointer swizzling" back then, and was handled in a language-agnostic way using page faults on Unix (NT was at 3.0 or 3.5 and wasn't yet considered a serious OS at the time). The conclusion back then (and I suspect it's still true) was that mmapped/page-faulted memory for persisted data is great for reading, but writing requires discipline that cannot be automated and needs its own API. This is also the approach adopted by LMDB / MDBX (in a different setting).

Araq [2021-03-03T14:39:16+01:00]

I would start with custom Table, seq and string implementations (DiscTable, DiscSeq, DiscString) and then see how the programs really work out. Usually code out there assumes that state changes are not persistent -- the data is gone after the process dies. This assumption doesn't hold for your new containers, so beware of undisciplined code reuse.
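As a starting point for the `DiscSeq` idea, here is a very rough sketch of a fixed-capacity sequence that lives entirely inside one area and is addressed by offset. Everything here (`DiscSeqHeader`, `areaBase`, `initDiscSeq`, the layout) is an assumption for illustration; it only handles plain value types, never grows, and ignores crash consistency.

```nim
type
  DiscSeqHeader = object
    len, cap: int
  DiscSeq[T] = object
    offset: int              # where the header sits in the area; safe to persist

var areaBase: pointer        # set after mapping; may differ between runs

proc header[T](s: DiscSeq[T]): ptr DiscSeqHeader =
  cast[ptr DiscSeqHeader](cast[int](areaBase) + s.offset)

proc data[T](s: DiscSeq[T]): ptr UncheckedArray[T] =
  # Elements are stored directly after the header.
  cast[ptr UncheckedArray[T]](
    cast[int](areaBase) + s.offset + sizeof(DiscSeqHeader))

proc initDiscSeq[T](offset, cap: int): DiscSeq[T] =
  result = DiscSeq[T](offset: offset)
  let h = header(result)
  h.len = 0
  h.cap = cap

proc len[T](s: DiscSeq[T]): int =
  header(s).len

proc add[T](s: DiscSeq[T], x: T) =
  let h = header(s)
  if h.len >= h.cap:
    raise newException(ValueError, "DiscSeq is full")
  let d = data(s)
  d[h.len] = x
  inc h.len

proc `[]`[T](s: DiscSeq[T], i: int): T =
  if i < 0 or i >= s.len:
    raise newException(IndexDefect, "DiscSeq index out of range")
  let d = data(s)
  d[i]

# Two buffers simulate the same area seen at different addresses.
var areaA, areaB: array[16, int]
areaBase = addr areaA[0]
let s = initDiscSeq[int](0, 4)
s.add 7
s.add 9

copyMem(addr areaB[0], addr areaA[0], sizeof(areaA))
areaBase = addr areaB[0]     # "next run": the contents survive the move
```

Note that `add` mutates the area even though `s` itself is immutable, which is the kind of persistent state change that ordinary Nim code does not expect.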

Mirror of forum.nim-lang.org

7576 :: relative "based/biased" long pointers and data structures over it