What's up guys,
I've been playing around with the concept of a Jupyter Notebook-like Nim IDE for a while. I call it Nim Notebook.
So far I've experimented with nim secret (currently the best, and the default), inim, and various implementations of embedded NimScript as the REPL. The only method that has had reliable and persistent state is nim secret, but as many of you know, it uses the Nim VM and has its fair share of issues, most importantly the inability to access the FFI.
My first goal with this project is to have a REPL that:
I now have the idea of using Hot Code Reloading to serve as the REPL.. a quasi-REPL, anyway. From what I've read in the docs, it seems viable?
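Roughly, I'm imagining something like this (loosely adapted from the hot-code-reloading docs; untested, and the module/proc names are just placeholders):

# cell.nim: the "notebook cell" code that gets edited and recompiled
proc runCell*() =
  echo "hello from cell v1"

# main.nim: compile and run with: nim c -r --hotCodeReloading:on main.nim
import std/os
import hotcodereloading   # stdlib module providing performCodeReload()
import cell

while true:
  runCell()               # call into the reloadable module
  sleep(2000)             # meanwhile, edit cell.nim and rebuild it
  performCodeReload()     # swap in the newly built cell module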
Can I get some feedback from those of you that have used hot code reloading and understand its benefits/drawbacks/limitations before I dive too deep into this?
Otherwise, more than happy to get any feedback on how to best tackle this problem. The REPL aspect of this project is the most difficult part; everything else is relatively easy.
It'll be pretty damn cool in the end to be able to combine code with some nice visuals and markdown text.
Thanks Araq, looks like it'll have to be HCR for now. For the life of me I couldn't get Arraymancer to work with nim secret, and deep learning is what I originally intended the notebook IDE for.
As for your recommendation, I'll definitely head in that direction at a later time. I have to learn lower-level programming first; I don't even know where to begin with this "patching" lol
I found https://nim-lang.org/docs/memfiles.html for supporting shared memory and have a basic example working where I'm writing an integer into the file and loading it during another run. That works.
But how the hell do I store and access multiple variables using this library? The documentation is a little lacking, and ChatGPT has been no help. I'm assuming it involves some kind of pointer arithmetic, but I'd love to hear some feedback before digging in any deeper.
my example:
# nowhere close to a good implementation (AI wrote it.. most of it) but it'll do for now
import memfiles, os

const
  SHM_FILE = "shared_memory.dat"
  SHM_SIZE = 4096

var
  shm: MemFile       # keep the mapping around so cleanup can flush/close it
  globalVar: ptr int # points at the first int of the mapped region

# Create or open a memory-mapped file
proc initSharedMemory(): MemFile =
  if fileExists(SHM_FILE):
    result = memfiles.open(filename = SHM_FILE, mode = fmReadWrite,
                           allowRemap = true)
  else:
    result = memfiles.open(filename = SHM_FILE, mode = fmReadWrite,
                           newFileSize = SHM_SIZE, allowRemap = true)

# Initialize the global variable pointer
proc initGlobals() =
  shm = initSharedMemory()  # memfiles.open raises OSError on failure
  if shm.mem == nil:
    raise newException(OSError, "Failed to map memory")
  globalVar = cast[ptr int](shm.mem)

proc useGlobalVar() =
  globalVar[] = 42

proc printGlobalVar() =
  echo "GlobalVar: ", globalVar[]

# Clean up: flush and close the same mapping we wrote through
proc cleanupSharedMemory() =
  flush(shm)
  close(shm)

# Initialize shared memory and the global variable pointer
initGlobals()

# Example usage
printGlobalVar() # prints whatever a previous run stored (0 on the first run)
useGlobalVar()
printGlobalVar() # prints 42

# Clean up
cleanupSharedMemory()
@Niminem - I have done the std/memfiles thing a lot to avoid "recalculating The World" just because your process exited. More than one value is not hard and you're already close (see the sketch below). https://github.com/c-blake/nio is really a generic data layer ("like CSV, but running off live mmaps" because, once you have an MMU/virtual memory, memory is memory). It has a few ways to handle string data in FileArrays.
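A minimal sketch of the simplest approach: overlay a fixed-layout object on the mapping instead of a single ptr int (the Workspace type below is just illustrative, not anything from nio):

import std/[memfiles, os]

type
  Workspace = object       # fixed layout known to every run of the program
    counter: int
    pi: float
    flags: array[8, bool]

const ShmFile = "shared_memory.dat"

var mf =
  if fileExists(ShmFile):
    memfiles.open(ShmFile, mode = fmReadWrite)
  else:
    memfiles.open(ShmFile, mode = fmReadWrite, newFileSize = sizeof(Workspace))

let ws = cast[ptr Workspace](mf.mem)   # all "variables" live inside the file

ws.counter.inc                         # persists across runs
ws.pi = 3.14159
ws.flags[0] = true
echo "counter so far: ", ws.counter

mf.flush()
mf.close()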
The simplest self-contained end-to-end application example I have is probably the ultimate ident bike shedder's tool: https://github.com/c-blake/thes . Something with a more ornate variable-length list allocation is the older and less simple https://github.com/c-blake/suggest . https://github.com/c-blake/ndup has a couple more examples.
https://github.com/c-blake/cligen has an alternative layering of file memory maps and some utility/parsing code in cligen/mfile & cligen/mslice, which is used by wgt.nim in https://github.com/c-blake/bu along with the (sorry, rather partial) https://github.com/c-blake/adix/blob/master/adix/oats.nim new-style concept hash table with "more delegated" memory handling. That lets wgt run off a live file-resident hash table, which is much like a "database" (but trusting the OS to flush data to devices, and with no coordination of simultaneous multi-process access, since giving up either assumption pulls in a lot of locking/etc. complexity that many use cases may well not need and so should be layered in judiciously).
If all those examples in my code are not enough, @Vindaar has recently done https://github.com/Vindaar/forked and added a std/memfiles alternative to https://github.com/Vindaar/flatBuffers . If you map a file in a RAM filesystem (like /dev/shm on Linux), populate it in one process, and then read it in another, you realize one-write, one-read communication (aka zero-overhead comms), much like shared-memory multi-threading, but A) opting in to danger only for surgically scoped memory, B) as a bonus, having that memory outlive your process, and C) if you are using the file system as your primal allocator on a modern FS like btrfs, ZFS, bcachefs, .., then you can also save on IO with transparent data decompression (though modern NVMe drives can often go faster than even multi-threaded data decompression, and some OSes like Linux will also compress RAM).
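A tiny illustration of that pattern using nothing but std/memfiles (the path and message layout here are just for the example):

# writer.nim: run first; /dev/shm keeps the file in RAM on Linux (any path works)
import std/memfiles
var wf = memfiles.open("/dev/shm/msg.dat", mode = fmReadWrite, newFileSize = 64)
cast[ptr int](wf.mem)[] = 12345   # "send" by writing straight into the mapping
wf.close()

# reader.nim: run as a separate process; no copy or serialization step involved
import std/memfiles
var rf = memfiles.open("/dev/shm/msg.dat", mode = fmRead)
echo cast[ptr int](rf.mem)[]      # prints 12345
rf.close()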
Nice benefits, but there are costs, namely needing to "name memory areas" in the filesystem, write your own allocators against them, and having much of the standard library work against types like string rather than openArray[char].
I should say that little of this is really "new".. it's almost as old as processes themselves. https://en.wikipedia.org/wiki/MIT-SHM was doing it back in 1991, and other systems before that, as a way to optimize large message transfers. It has always seemed "under-attended" to me in the design of PLs and programming systems, though. Maybe that relates to deployment portability concerns, since I guess some embedded CPU providers still skimp on MMUs, but to me virtual memory, files, and folders/directories are all done deals. Anyway, hopefully something in the above helps.
Does the stdlib HCR work at all? I tested the program in the docs (making sure that the logic was in a separate file), and on 2.0.2 and 2.0.4 it produced bad codegen. On 1.6.14 it compiled, but it didn't actually work.
Yeah it used to kind of work before 2.0 (but crashed even just printing a float). I tried integrating it into jupyternim but there were just too many bugs. The memfile idea is nice but introduces quite a bit of complexity with saving variables and reloading them, especially if you want to allow changing the type. Seems doable though, at least for the simpler cases that are usually explored in a REPL.
Nice benefits, but there are costs, namely needing to "name memory areas" in the filesystem, write your own allocators against them, and having much of the standard library work against types like string rather than openArray[char].
That's because you're doing it in the wrong layer. ;-) Somebody needs to patch alloc.nim to allocate from a mmap'ed named file.
Yeah.. that came up in a private conversation as the easiest next step for global "workspaces" like the R REPL has. There's still perhaps some trickiness if any objects contain pointers to other objects, since pointers are not "fat" (or encoded as offsets to some base address to be "re-linked" at load time), unlike plain global numbers which are unproblematic. So you probably need to remap the data to the same VM address it had before (which is easier to demand on 64-bit systems, but not all OSes support this kind of thing; portability may be less critical for a REPL, etc., etc.).
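The offset idea looks roughly like this toy sketch (an ordinary array stands in for the mapped region; nothing here is from a real library):

# References inside the region are stored as byte offsets from its base, so
# they stay valid regardless of which virtual address the region maps to.
type
  Node = object
    value: int
    next: int                  # byte offset of the next Node, or -1 for "none"

proc at(base: pointer; off: int): ptr Node =
  cast[ptr Node](cast[uint](base) + uint(off))

var region: array[2, Node]                       # pretend this is the mmap'ed file
region[0] = Node(value: 1, next: sizeof(Node))   # "points" at region[1]
region[1] = Node(value: 2, next: -1)

var off = 0
while off >= 0:                                  # walk the list by offset
  let n = at(addr region[0], off)
  echo n.value                                   # 1, then 2
  off = n.next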
If it helps @Niminem conceptualize things, the program state problem is virtually identical to the problem of .DLL/.so shared libraries, but with "pre-compiled" data instead of (or as well as) pre-compiled object code. Old Lisp systems would just dump their entire memory image to disk for later reloading (this is surely what R's workspace saving/loading was inspired by), and the GNU Emacs build system still works this way today (though they recently added ".eln" ELisp-native shared object files). What makes this easier (& so more common) for code is that, being (mostly) up-front & read-only, there is no shared mutable memory. Since data is under user semantics (not, e.g., Intel's) and so more surgically editable than CPU instructions, people tend to reach more for edits (which DB people would complain about next, but devs can also be taught the hazards, OSes/stdlibs can default to exclusive file access, etc.).
But also, @Niminem, the same private conversation noted that "data longer lived than the process" is sort of the easiest part of the problem. Most pithily, it's the "E" in R-E-P-L that is "the hard part", since to "evaluate" you want/need an expression interpreter/compiler, and soon you want it to be "just like" AOT compiles (e.g. not just HCR/NimVM FFIs, but some knob to compile code that runs faster on a larger data sample once your logic is debugged). Then - boom! - you're back at incremental compilation (IC) to native code of some kind.
So, I doubt having program state in files can itself solve the Jupyter Notebook IDE problem for Nim that began this thread. My recommendation for rapid development environments is usually to just use regular old editor buffers. (Everyone always wants faster compiles. A first, easy step for faster compiles is changing your nim.cfg from the factory default of -Og to -O0; PRs to get more Nim-science libs working with tcc could yield more. On some CPUs and C compilers compiling Nim, PGO of the compiler itself is said to help. Etc., etc. This is an oft-discussed topic in the Forum.)
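For concreteness, that first tweak is roughly the following in a project-local nim.cfg (assuming the gcc backend; exact defaults vary by Nim version):

# nim.cfg
gcc.options.debug = "-O0 -g"   # replace the slower-to-compile -Og default
# or per-invocation:  nim c --passC:-O0 main.nim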
Ahhhh yes. I revisit this Jupyter Notebook-like IDE idea every so often, and every time I get a little deeper into the rabbit hole I realize 1) how little I currently know, 2) that this is a much larger and more complex problem than anticipated, and 3) how much time is involved in even getting the most basic proof of concept working.
Thank you for the deep reply, this is plenty to research and build from.