What's up guys,
I've been playing around with the concept of a Jupyter Notebook-like Nim IDE for a while. I call it Nim Notebook.
So far I've experimented with nim secret (currently the best, and the default), inim, and various implementations of embedded NimScript as the REPL. The only method that has had reliable and persistent state is nim secret, but as many of you know, it uses the Nim VM and has its fair share of issues, most importantly the inability to access the FFI.
My first goal with this project is to have a REPL that:
I now have the idea of using Hot Code Reloading to serve as the REPL.. a quasi-REPL, anyway. From what I've read in the docs, it seems viable?
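Roughly, I'm imagining something like this (loosely adapted from the hot-code-reloading docs; untested, and the module/proc names are just placeholders):

# cell.nim: the "notebook cell" code that gets edited and recompiled
proc runCell*() =
  echo "hello from cell v1"

# main.nim: compile and run with: nim c -r --hotCodeReloading:on main.nim
import std/os
import hotcodereloading   # stdlib module providing performCodeReload()
import cell

while true:
  runCell()               # call into the reloadable module
  sleep(2000)             # meanwhile, edit cell.nim and rebuild it
  performCodeReload()     # swap in the newly built cell module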
Can I get some feedback from those of you that have used hot code reloading and understand its benefits/drawbacks/limitations before I dive too deep into this?
Otherwise, more than happy to get any feedback on how to best tackle this problem. The REPL aspect of this project is the most difficult part; everything else is relatively easy.
It'll be pretty damn cool in the end to be able to combine code with some nice visuals and markdown text.
Thanks Araq, looks like it'll have to be HCR for now. For the life of me I couldn't get Arraymancer to work with nim secret, and deep learning is what I originally intended the notebook IDE for.
As for your recommendation, I'll definitely head in that direction at a later time. I have to learn lower-level programming first; I don't even know where to begin with this "patching" lol
I found https://nim-lang.org/docs/memfiles.html for supporting shared memory and have a basic example working where I'm writing an integer into the file and loading it during another run. That works.
But how the hell do I store and access multiple variables using this library? The documentation is a little lacking, and ChatGPT has been no help. I'm assuming it involves some kind of pointer arithmetic, but I'd love to hear some feedback before digging in any deeper.
my example:
# nowhere close to a good implementation (AI wrote it.. most of it) but it'll do for now
import memfiles, os

const
  SHM_FILE = "shared_memory.dat"
  SHM_SIZE = 4096

var
  shm: MemFile       # keep the mapping around so cleanup can flush/close it
  globalVar: ptr int # points at the first int of the mapped region

# Create or open a memory-mapped file
proc initSharedMemory(): MemFile =
  if fileExists(SHM_FILE):
    result = memfiles.open(filename = SHM_FILE, mode = fmReadWrite,
                           allowRemap = true)
  else:
    result = memfiles.open(filename = SHM_FILE, mode = fmReadWrite,
                           newFileSize = SHM_SIZE, allowRemap = true)

# Initialize the global variable pointer
proc initGlobals() =
  shm = initSharedMemory()  # memfiles.open raises OSError on failure
  if shm.mem == nil:
    raise newException(OSError, "Failed to map memory")
  globalVar = cast[ptr int](shm.mem)

proc useGlobalVar() =
  globalVar[] = 42

proc printGlobalVar() =
  echo "GlobalVar: ", globalVar[]

# Clean up: flush and close the same mapping we wrote through
proc cleanupSharedMemory() =
  flush(shm)
  close(shm)

# Initialize shared memory and the global variable pointer
initGlobals()

# Example usage
printGlobalVar() # prints whatever a previous run stored (0 on the first run)
useGlobalVar()
printGlobalVar() # prints 42

# Clean up
cleanupSharedMemory()
@Niminem - I have done the std/memfiles thing a lot to avoid "recalculating The World" just because your process exited. More than one value is not hard and you're already close (see the sketch below). https://github.com/c-blake/nio is really a generic data layer ("like CSV, but running off live mmaps" because, once you have an MMU/virtual memory, memory is memory). It has a few ways to handle string data in FileArrays.
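A minimal sketch of the simplest approach: overlay a fixed-layout object on the mapping instead of a single ptr int (the Workspace type below is just illustrative, not anything from nio):

import std/[memfiles, os]

type
  Workspace = object       # fixed layout known to every run of the program
    counter: int
    pi: float
    flags: array[8, bool]

const ShmFile = "shared_memory.dat"

var mf =
  if fileExists(ShmFile):
    memfiles.open(ShmFile, mode = fmReadWrite)
  else:
    memfiles.open(ShmFile, mode = fmReadWrite, newFileSize = sizeof(Workspace))

let ws = cast[ptr Workspace](mf.mem)   # all "variables" live inside the file

ws.counter.inc                         # persists across runs
ws.pi = 3.14159
ws.flags[0] = true
echo "counter so far: ", ws.counter

mf.flush()
mf.close()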
The simplest self-contained end-to-end application example I have is probably the ultimate ident bike shedder's tool: https://github.com/c-blake/thes . Something with a more ornate variable-length list allocation is the older and less simple https://github.com/c-blake/suggest . https://github.com/c-blake/ndup has a couple more examples.
https://github.com/c-blake/cligen has an alternative layering of file memory maps and some utility/parsing code in cligen/mfile & cligen/mslice, which is used by wgt.nim in https://github.com/c-blake/bu along with the (sorry, rather partial) https://github.com/c-blake/adix/blob/master/adix/oats.nim new-style concept hash table with "more delegated" memory handling. That lets wgt run off a live file-resident hash table, which is much like a "database" (but trusting the OS to flush data to devices, and with no coordination of simultaneous multi-process access, since giving up either assumption pulls in a lot of locking/etc. complexity that many use cases may well not need and so should be layered in judiciously).
If all those examples in my code are not enough, @Vindaar has recently done https://github.com/Vindaar/forked and added a std/memfiles alternative to https://github.com/Vindaar/flatBuffers . If you map a file in a RAM filesystem (like /dev/shm on Linux), populate it in one process, and then read it in another, you realize one-write, one-read communication (aka zero-overhead comms), much like shared-memory multi-threading, but A) opting in to danger only for surgically scoped memory, B) as a bonus, having that memory outlive your process, and C) if you are using the file system as your primal allocator on a modern FS like btrfs, ZFS, bcachefs, .., then you can also save on IO with transparent data decompression (though modern NVMe drives can often go faster than even multi-threaded data decompression, and some OSes like Linux will also compress RAM).
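A tiny illustration of that pattern using nothing but std/memfiles (the path and message layout here are just for the example):

# writer.nim: run first; /dev/shm keeps the file in RAM on Linux (any path works)
import std/memfiles
var wf = memfiles.open("/dev/shm/msg.dat", mode = fmReadWrite, newFileSize = 64)
cast[ptr int](wf.mem)[] = 12345   # "send" by writing straight into the mapping
wf.close()

# reader.nim: run as a separate process; no copy or serialization step involved
import std/memfiles
var rf = memfiles.open("/dev/shm/msg.dat", mode = fmRead)
echo cast[ptr int](rf.mem)[]      # prints 12345
rf.close()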
Nice benefits, but there are costs, namely needing to "name memory areas" in the filesystem, write your own allocators against them, and having much of the standard library work against types like string rather than openArray[char].
I should say that little of this is really "new".. it's almost as old as processes themselves. https://en.wikipedia.org/wiki/MIT-SHM was doing it back in 1991, and other systems before that, as a way to optimize large message transfers. It has always seemed "under-attended" to me in the design of PLs and programming systems, though. Maybe that relates to deployment portability concerns, since I guess some embedded CPU providers still skimp on MMUs, but to me virtual memory, files, and folders/directories are all done deals. Anyway, hopefully something in the above helps.
Does the stdlib HCR work at all? I tested the program in the docs (making sure that the logic was in a separate file), and on 2.0.2 and 2.0.4 it produced bad codegen. On 1.6.14 it compiled, but it didn't actually work.
Yeah it used to kind of work before 2.0 (but crashed even just printing a float). I tried integrating it into jupyternim but there were just too many bugs. The memfile idea is nice but introduces quite a bit of complexity with saving variables and reloading them, especially if you want to allow changing the type. Seems doable though, at least for the simpler cases that are usually explored in a REPL.
Nice benefits, but there are costs, namely needing to "name memory areas" in the filesystem, write your own allocators against them, and having much of the standard library work against types like string rather than openArray[char].
That's because you're doing it in the wrong layer. ;-) Somebody needs to patch alloc.nim to allocate from a mmap'ed named file.
Yeah.. that came up in a private conversation as the easiest next step for global "workspaces" like the R REPL has. There's still perhaps some trickiness if any objects contain pointers to other objects, since pointers are not "fat" (or encoded as offsets to some base address to be "re-linked" at load time), unlike plain global numbers which are unproblematic. So you probably need to remap the data to the same VM address it had before (which is easier to demand on 64-bit systems, but not all OSes support this kind of thing; portability may be less critical for a REPL, etc., etc.).
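The offset idea looks roughly like this toy sketch (an ordinary array stands in for the mapped region; nothing here is from a real library):

# References inside the region are stored as byte offsets from its base, so
# they stay valid regardless of which virtual address the region maps to.
type
  Node = object
    value: int
    next: int                  # byte offset of the next Node, or -1 for "none"

proc at(base: pointer; off: int): ptr Node =
  cast[ptr Node](cast[uint](base) + uint(off))

var region: array[2, Node]                       # pretend this is the mmap'ed file
region[0] = Node(value: 1, next: sizeof(Node))   # "points" at region[1]
region[1] = Node(value: 2, next: -1)

var off = 0
while off >= 0:                                  # walk the list by offset
  let n = at(addr region[0], off)
  echo n.value                                   # 1, then 2
  off = n.next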
If it helps @Niminem conceptualize things, the program state problem is virtually identical to the problem of .DLL/.so shared libraries, but with "pre-compiled" data instead of (or as well as) pre-compiled object code. Old Lisp systems would just dump their entire memory image to disk for later reloading (this is surely what R's workspace saving/loading was inspired by), and the GNU Emacs build system still works this way today (though they recently added ".eln" ELisp-native shared object files). What makes this easier (& so more common) for code is that, being (mostly) up-front & read-only, there is no shared mutable memory. Since data is under user semantics (not, e.g., Intel's) and so more surgically editable than CPU instructions, people tend to reach more for edits (which DB people would complain about next, but devs can also be taught the hazards, OSes/stdlibs can default to exclusive file access, etc.).
But also, @Niminem, the same private conversation noted that "data longer lived than the process" is sort of the easiest part of the problem. Most pithily, it's the "E" in R-E-P-L that is "the hard part", since to "evaluate" you want/need an expression interpreter/compiler, and soon you want it to be "just like" AOT compiles (e.g. not just HCR/NimVM FFIs, but some knob to compile code that runs faster on a larger data sample once your logic is debugged). Then - boom! - you're back at incremental compilation (IC) to native code of some kind.
So, I doubt having program state in files can itself solve the Jupyter Notebook IDE problem for Nim that began this thread. My recommendation for rapid development environments is usually to just use regular old editor buffers. (Everyone always wants faster compiles. A first, easy step for faster compiles is changing your nim.cfg from the factory default of -Og to -O0; PRs to get more Nim-science libs working with tcc could yield more. On some CPUs and C compilers compiling Nim, PGO of the compiler itself is said to help. Etc., etc. This is an oft-discussed topic in the Forum.)
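For concreteness, that first tweak is roughly the following in a project-local nim.cfg (assuming the gcc backend; exact defaults vary by Nim version):

# nim.cfg
gcc.options.debug = "-O0 -g"   # replace the slower-to-compile -Og default
# or per-invocation:  nim c --passC:-O0 main.nim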
Ahhhh yes. I revisit this Jupyter Notebook-like IDE idea every so often, and every time I get a little deeper into the rabbit hole I realize 1) how little I currently know, 2) that this is a much larger and more complex problem than anticipated, and 3) how much time is involved in even getting the most basic proof of concept working.
Thank you for the deep reply, this is plenty to research and build from.