Incremental compilation (IC) has been a long-standing goal for Nim—the ability to recompile only what changed, dramatically speeding up development cycles. I planned to surprise you with a somewhat working IC feature for Christmas, but unfortunately the progress, while steady, is slower than anticipated.
So instead I will talk about how it's being developed and how it will work. The design is taken from Nimony: there is a pre-pass that inspects the import structure of your program and produces a dependency graph. This step uses a helper tool called "Nifler" and is incremental too: it does not reparse your entire program, only the modules that changed. We can do this reliably because the pure parsing step into a tree structure can be done without a symbol table. The dependency graph is then turned into a "nifmake" file (similar to a makefile, but for .nif files); nifmake gives us incremental and parallel builds!
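To make that concrete, here is a toy sketch of such an import-structure scan. It is illustrative only: Nifler parses modules into NIF trees rather than scanning lines, and the proc names below (importsOf, depGraph) are made up for this example.

```nim
import std/[strutils, os, sets, tables]

proc importsOf(file: string): seq[string] =
  # Hypothetical stand-in for the parsing step: collect the modules named in
  # top-level `import a, b, c` statements, without needing a symbol table.
  for line in lines(file):
    let l = line.strip()
    if l.startsWith("import "):
      for m in l[len("import ") .. ^1].split(','):
        result.add m.strip()

proc depGraph(root: string): Table[string, seq[string]] =
  # Walk the import structure starting at `root`, visiting each module once.
  var todo = @[root]
  var seen = initHashSet[string]()
  while todo.len > 0:
    let m = todo.pop()
    if m in seen: continue
    seen.incl m
    result[m] = importsOf(m.addFileExt("nim"))
    for d in result[m]: todo.add d
```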
The nifmake file mostly consists of nim m invocations. The new m switch is a key feature here: nim m x.nim precompiles the file to an x.nif internal representation, but x's dependencies are not recompiled; they are loaded from their respective .nif files instead. The m switch does not detect which files are outdated, that is nifmake's job. (The m switch is also what I use for development of the IC feature.)
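As a rough illustration of that division of labor, here is what such a staleness check could look like; this is an assumed sketch, not nifmake's actual implementation:

```nim
import std/[os, times]

proc outdated(module: string; deps: seq[string]): bool =
  ## A module's precompiled output is stale when it is missing, older than
  ## its source, or older than any dependency's precompiled output.
  let nif = module & ".nif"
  if not fileExists(nif): return true
  let built = getLastModificationTime(nif)
  if getLastModificationTime(module & ".nim") > built: return true
  for d in deps:
    if getLastModificationTime(d & ".nif") > built: return true
```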
After every module has its precompiled .nif file, the code generation backend runs over the set of .nif files and produces C/C++ code as usual. This step is not incremental for the first versions of the IC feature. Instead it uses Nim's existing dead-code-elimination (DCE) infrastructure and only produces code that ends up in the binary. DCE creates a challenge for incremental compilation: If you start using a symbol from a precompiled module that was previously unused (and thus eliminated), you have to recompile that module to regenerate the code.
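A tiny hypothetical example of that scenario (module and proc names made up):

```nim
# lib.nim
proc used*(): int = 1
proc unused*(): int = 2   # no caller yet, so DCE emits no C code for it

# main.nim
import lib
echo used()
# If a later edit adds `echo unused()`, lib.nim itself is unchanged, yet its
# C code has to be regenerated, because a previously eliminated symbol now
# has to end up in the binary.
```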
The bad news is that it's not working yet. The good news is that this implementation of IC finally seems to meet my performance goals: the precompiled modules are not too large, they are fast to load from disk, and the loading is lazy, so what is not needed is not processed at all.
Merry Christmas! I'm looking forward to sharing more progress in the new year.
I hope macros are fully completed and only a fairly straightforward process of creating C code is left?
Macros are fully precompiled and cause no overhead, yes. The process of creating C code is still quite involved, though, as we have to inject destructors etc. and that step is not cached. It can be, but every cache means more files and disk IO, so it's always unclear whether it's beneficial.
It can be cached, but every cache means more files and disk IO, so it's always unclear whether it's beneficial.
Perhaps it could be an option. Some users may choose to mount a filesystem from memory to make I/O overhead almost zero, so I could see that being very useful if tuned properly.
Macros are fully precompiled and cause no overhead, yes.
This sounds like a bit of a trap, or at least highly case-dependent. We can assume that parsing Nim and NIF is about the same for unexpanded code, so if we look at generic expansion, template expansion and macro expansion, it's easy to construct cases that are faster or slower with or without pre-processing. It's easiest to argue for a macro: a macro that outputs a lot of code and doesn't do any significant computation is not going to benefit from pre-expansion, because parsing the outcome will be slower than performing the computation, but a macro that reduces its output (say, hashes the AST or whatever) will benefit.
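A pair of hypothetical macros, not taken from the post, that illustrate the two extremes:

```nim
import std/[macros, hashes]

# Expands its body `n` times: pre-expanding this only trades a cheap
# compile-time loop for parsing/loading a much larger output.
macro unroll(n: static int; body: untyped): untyped =
  result = newStmtList()
  for _ in 0 ..< n:
    result.add copyNimTree(body)

# Reduces its argument to a single literal (a hash of the AST's text):
# caching this expansion genuinely avoids redoing the computation.
macro astDigest(e: untyped): untyped =
  result = newLit(hash(e.treeRepr))

unroll(4):
  echo "hi"                        # expansion is four times the call's size
echo astDigest(foo(1) + bar(x))    # expansion is just one int literal
```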
It's similar with a template: a small template might be more efficient to pre-expand if it appears in a context with complex overload matching (which is an expensive computation), but in many cases I'd assume it's more efficient to load an unexpanded template and expand it during compilation, since the template version is a more compact representation of the same code, especially when considering nested expansions: a NIF file that expands templates recursively will be much larger than its corresponding unexpanded version.
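For instance, with hypothetical templates, the fully expanded form grows at every nesting level:

```nim
template double(x: untyped): untyped = (x + x)
template quad(x: untyped): untyped = double(x) * double(x)
template octo(x: untyped): untyped = quad(x) - quad(x)

echo octo(3)   # expands to ((3 + 3) * (3 + 3)) - ((3 + 3) * (3 + 3))
```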
This brings us to generics. The C++ route of expanding everything in every translation unit (TU) and then reconciling is one of the main reasons compiling C++ is slow: it takes time to generate the same code over and over. In the C++ case, LTO works around many of the performance problems related to this, since optimization can happen post-deduplication and it's the optimization that is computation-heavy. In fact, what one would really like to cache is the post-optimization version of each function, using a key derived from the code structure.
Similarly, pre-expanding per module would cause an explosion in NIF size, and all of that redundancy must later be removed in a process not dissimilar to C++'s one-definition rule. In C++, it's very easy to violate that rule, ending up with broken compiles and rendering the whole process brittle.
So the trap here really is to perform the IC measurements on a simple codebase (where IC arguably doesn't matter) and expect the results to carry over to a complex codebase, where compounded expansions may very well change the equation of what ends up being a "total compilation time win". Beyond getting it to work at all (due to the additional brittleness of the pipeline), this is where similar efforts in both the LTO-caching and JIT worlds, which solve more or less the same problem, have struggled.
a macro that outputs a lot of code and doesn't do any significant computation is not going to benefit from pre-expansion, because parsing the outcome will be slower than performing the computation, but a macro that reduces its output (say, hashes the AST or whatever) will benefit.
What you seem to be missing here is that we don't necessarily have to parse the expanded NIF code in subsequent runs at all. A NIF file has an index, and we use mmap and offsets into the file to load sections on demand. It's not a purely text-based format, it's actually a hybrid.
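A minimal sketch of that "index + mmap + load on demand" idea; the index layout, section names and proc names here are assumptions for illustration, not the actual NIF loader:

```nim
import std/[memfiles, tables]

type
  SectionIndex = Table[string, tuple[offset, len: int]]  # assumed layout
  LazyNif = object
    mf: MemFile          # the whole file stays memory-mapped
    index: SectionIndex  # read once up front: name -> (offset, len)

proc section(n: LazyNif; name: string): string =
  ## Materializes a single section only when it is actually requested;
  ## sections that are never asked for are never copied out of the mapping.
  let (offset, len) = n.index[name]
  result = newString(len)
  if len > 0:
    copyMem(addr result[0], cast[pointer](cast[int](n.mf.mem) + offset), len)

# Usage: open once, then touch only what the current compilation needs.
# let nif = LazyNif(mf: memfiles.open("x.nif"), index: readIndex())  # readIndex is hypothetical
# let body = nif.section("some.section.name")
```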
Later iterations of the design can optimize it further: the text aspect of NIF, which currently helps tremendously during development and debugging, doesn't have to remain. A code generator that does not require the old PNode structures but instead operates on the NIF token streams is also feasible; it is what Nimony already does, after all.
don't necessarily have to parse the expanded NIF code in subsequent runs at all.
This is great - but what happens when you do have to parse it?
This is the main question above, and it seems to me it arises every time you have to monomorphize a generic, re-expand a template or re-apply a macro, which in Nim happens quite often, i.e. even for seemingly trivial changes.
The thing I hope to get from IC (apart from its more obvious tooling applications, where again you need to be able to expand templates and macros fast and not just work with post-expansion data) is a semantic per-function cache: basically what you get after performing the per-function simplification passes. The really interesting thing there is that you can de-duplicate the code. Basically, Table.len is the same for all monomorphizations of Table; most Table instantiations are also the same for all ref types, since the size of the entry is always the same (sizeof ptr) and most ref types semantically behave the same from the point of view of the table, and so on. In this world, each function would be checked for semantic equivalence and cached accordingly, but above all, it could then be optimized only once by the later optimization passes (imagine integrating this with the C/LLVM optimization passes).
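A small sketch of that de-duplication argument, with hypothetical types and a made-up container:

```nim
type
  Node = ref object
  Widget = ref object
  Bag[T] = object
    items: seq[T]

# Does not touch T at all: identical code for every instantiation.
proc size[T](b: Bag[T]): int = b.items.len

# Relies only on properties every ref type shares (pointer-sized entries,
# reference equality), so Bag[Node].has and Bag[Widget].has lower to the
# same code.
proc has[T: ref](b: Bag[T]; x: T): bool =
  for it in b.items:
    if it == x: return true

var a: Bag[Node]
var b: Bag[Widget]
# A per-function cache keyed on a hash of each simplified body could emit
# and optimize `size` (and `has`) once instead of once per instantiation.
discard a.size + b.size
```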
Is the index fine-grained enough for something like this? Deduplicating semantically, similar to how `treetab` does it, would drastically reduce the amount of work that the compiler has to do end-to-end.