Having recently come back to Nim after not using it much since pre-2.0, I was very confused about the state of things. I used to make use of async patterns, and now I need threads for a particular project. Nim is a great fit for every other reason I can come up with, but the ambiguity and lack of direction around threading has set me back several days. Finally, I turned to AI for answers. I'm relaying them to the community so that others in my position may have a clearer picture than I did. What follows was written by Claude's deep research...
Nim's concurrency features have undergone dramatic transformations from 2008 to August 2025, shifting from an ambitious compile-time safety model to deprecated standard library modules, with the creator now recommending his new Malebolgia library while the community fragments across competing solutions. The most significant current development is that the standard threadpool is officially deprecated, async/await has memory leaks under the default ORC memory manager, and developers face a 40% performance regression when using threads in Nim 2.0—creating what many describe as a concurrency crisis.
This research reveals a programming language caught between revolutionary ambitions and pragmatic constraints. The journey from Nimrod's 2013 vision of statically-verified thread safety to today's ecosystem of external libraries represents both technical evolution and philosophical shifts about where concurrency should live in a modern systems language. The continuous changes stem from fundamental conflicts between thread-local garbage collection, shared-heap memory models, and the challenge of composing different concurrency paradigms.
The concurrency story begins with Andreas Rumpf's 2008 launch of Nimrod, initially featuring basic threading support typical of systems languages. By June 2013, Rumpf published an extraordinarily ambitious vision: a shared-memory concurrency system using compile-time verification to prevent data races through sophisticated type qualifiers like shared ptr and guarded ptr, combined with static lock hierarchy analysis to prevent deadlocks. The system would track lock levels at compile-time, ensuring locks were always acquired in order to mathematically prevent circular wait conditions.
This theoretical elegance proved too complex for practical implementation. The type system changes were overwhelming, the static analysis couldn't prevent all race conditions, and developers found the abstractions too far removed from real-world concurrent programming. Rumpf himself acknowledged the trade-offs, noting that while the system couldn't prevent every possible data race, it "surely looks like a sweet trade-off"—but the community disagreed.
The pragmatic pivot came in 2015 when Dominik Picheta created the asyncdispatch module, implementing async/await through Nim's macro system rather than core language features. This library-first approach succeeded where the ambitious type system failed, providing immediate utility for web developers building HTTP servers and network applications. The async/await model became Nim's primary concurrency story through version 1.0 (released September 2019), coexisting uneasily with the older spawn/threadpool system for CPU-bound parallelism. This dual model - async for I/O, threadpool for computation - established a pattern of fragmentation that would only intensify.
The transition from Nim 1.x to 2.0 brought fundamental changes driven by memory management evolution. The original refc garbage collector used thread-local heaps, making it impossible to share GC-managed objects (strings, sequences, references) between threads without deep copies. This architectural constraint meant async/await and spawn/parallel couldn't compose—you literally couldn't await a spawned task because they operated in different memory universes.
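A minimal sketch of that composition problem, using the old standard modules (pre-2.0 required an explicit --threads:on for this to compile):

import std/[asyncdispatch, threadpool]

proc heavy(n: int): int = n * n

proc serve() {.async.} =
  let fv = spawn heavy(21)   # FlowVar[int] from the threadpool world
  # await fv                 # does not compile: a FlowVar is not a Future
  echo(^fv)                  # the blocking read `^` stalls the event loop

waitFor serve()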
Nim 2.0 introduced ORC (ARC's deterministic reference counting plus a cycle collector) as the default memory manager in 2023, enabling shared heaps across threads. But this seemingly positive change triggered cascading problems. The reference counting operations weren't atomic, creating race conditions. Performance benchmarks showed programs compiled with --threads:on (now the default in Nim 2.0) ran 40% slower than their single-threaded counterparts, even when using only one thread. Memory leaks appeared in async code under ORC that didn't exist under the old GC.
The threadpool module, already suffering from global queue contention and lack of work-stealing, became officially deprecated with an explicit message: "use the nimble packages malebolgia, taskpools or weave instead". The async/await implementation showed its age with documented memory leaks, poor cancellation support, and the inability to integrate with parallel constructs. One developer reported their production HTTP server experienced a 3x memory usage increase when upgrading from Nim 1.6 to 2.0, requiring the -d:useMalloc workaround to restore normal behavior.
By 2024, Andreas Rumpf had developed clear opinions about Nim's concurrency future, crystallized in his Malebolgia library - a sub-300-line structured concurrency solution emphasizing predictability over flexibility. Malebolgia deliberately omits FlowVars, instead using awaitAll barriers for synchronization. It focuses on bounded memory consumption, built-in cancellation, and what Rumpf calls "the 'backpressure' problem as a side effect" of its design constraints.
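To make the structured model concrete, here is a sketch along the lines of Malebolgia's README (createMaster/awaitAll/spawn; treat the exact signatures as an assumption against whichever version you install):

import malebolgia

proc fib(n: int): int =
  if n < 2: n
  else: fib(n-1) + fib(n-2)

proc main() =
  var m = createMaster()            # owns and scopes every spawned task
  var results: array[20, int]
  m.awaitAll:                       # structured barrier: no task escapes this block
    for i in 0 ..< results.len:
      m.spawn fib(i) -> results[i]  # each result lands in a caller-owned slot
  echo results                      # safe: awaitAll guarantees completion

main()

The absence of FlowVars is the point: because every task must finish inside the awaitAll scope, memory consumption stays bounded and cancellation has an obvious boundary.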
Rumpf's current recommendations are unambiguous: use Malebolgia for general concurrency, or alternatives like taskpools (Status-im's lightweight solution) and weave (mratsim's high-performance computing runtime supporting trillions of tasks). He explicitly warns against the deprecated standard threadpool and acknowledges "open secret" criticisms of async/await, particularly its incompatibility with gc:orc and the complexity of its macro-based implementation generating 3x more code than equivalent threaded solutions.
The philosophical shift is striking. The 2013 vision sought to encode all concurrency safety in the type system; Malebolgia achieves safety through structural constraints - all tasks must complete within defined scopes. Where the original model offered maximum flexibility with compile-time verification, the new approach trades flexibility for predictability. Rumpf's warning on the old concurrency documentation captures this evolution: "The information presented here is severely outdated. Nim's concurrency is now based on different mechanisms (scope based memory management and destructors)."
Developer reactions to Nim's concurrency changes range from confusion to anger, with performance regressions and breaking changes creating production headaches. The 40% performance penalty from default threading in Nim 2.0 forced many to disable threads entirely or use memory allocator workarounds. Multiple developers reported that multi-threaded async code became 1.7x slower than single-threaded execution—the opposite of expected behavior.
The ecosystem fragmentation between asyncdispatch (standard library) and chronos (Status's alternative) forces library authors to choose sides or attempt supporting both incompatible APIs. Documentation gaps mean basic operations require searching forums, IRC logs, or reading source code. One frustrated developer compared async/await to threading implementations, finding threading required less code, generated smaller binaries, and was actually simpler despite its reputation for complexity. Their analysis showed async generated 125KB binaries versus 52KB for threads, with significantly more complex codegen.
Success stories exist but remain limited. Nim Forum runs successfully on async/await through the httpbeast server, demonstrating stability when properly implemented. Status's Ethereum client uses chronos for P2P networking without major issues. Some developers achieved OpenBLAS-level performance for matrix multiplication using careful threading. But these successes require deep expertise and careful navigation of undocumented pitfalls.
The most telling community sentiment comes from RFC #295: "This is not the Nim concurrency story I want to tell to newcomers, and I think if you're honest, it's not the one you want to share, either." Developers consistently mention envying Go's simple goroutines, Rust's clear async semantics, and even Python's better-documented asyncio despite its complexity.
The engineering reasons for Nim's concurrency churn stem from fundamental incompatibilities between memory management strategies and concurrency models. Thread-local garbage collection makes thread communication expensive, requiring deep copies for safety. Shared-heap models enable communication but introduce race conditions in reference counting without atomic operations or complex synchronization.
Performance measurements reveal each model's limitations. Async/await generates 3x more C code than equivalent threading due to closure iterator transformations and macro complexity. The standard threadpool's global queue creates contention without work-stealing for load balancing, making it unsuitable for dynamic parallelism. The new ORC memory manager conflicts with existing async implementations, causing memory leaks in exception handlers and steady memory growth in long-running servers.
Different concurrency models optimize for incompatible goals - async/await for high-throughput I/O, ARC for deterministic real-time performance, spawn/parallel for CPU-intensive computation, and CPS for maximum composability. These models can't compose because they make different assumptions about memory ownership, execution contexts, and error handling. You literally cannot await a spawned task because spawn assumes thread-local heaps while await assumes single-threaded execution.
Platform differences compound the complexity. Windows shows different memory leak patterns than Linux under ARC/ORC. NUMA architectures require memory locality awareness that current implementations lack. The threading infrastructure includes platform-specific resource management like CloseHandle on Windows that doesn't exist on Unix systems.
As of August 2025, Nim's concurrency exists in managed chaos. The standard library threadpool is deprecated but still present. Async/await remains the default for I/O but leaks memory under the default ORC manager. The community has fragmented across Malebolgia, taskpools, weave, chronos, and experimental CPS implementations. The nim-lang/threading repository offers new multi-producer multi-consumer channels for ARC/ORC, but adoption remains limited.
Breaking changes continue accumulating. Code using spawn must migrate to external libraries with different APIs. Async code may leak memory without -d:useMalloc workarounds. The performance regression from default threading means many applications run slower after upgrading. Documentation hasn't kept pace, leaving developers to discover solutions through trial and error or community forums.
Migration paths exist but require significant effort. Moving from standard threadpool to taskpools means rewriting spawn calls with different semantics. Choosing between asyncdispatch and chronos affects entire library ecosystems. Developers must understand memory management implications to choose between refc (stable but limited), ARC (fast but sharp edges), and ORC (default but problematic).
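To illustrate the shape of that migration, here is a minimal sketch against taskpools' documented API (signatures may vary by version): instead of the implicit global pool, you create, sync, and shut down a pool explicitly.

import taskpools

proc double(x: int): int = x * 2

var tp = Taskpool.new(numThreads = 4)  # explicit pool replaces the global one
let fv = tp.spawn double(21)           # Flowvar tied to this pool
echo sync(fv)                          # blocking read replaces threadpool's `^`
tp.shutdown()                          # pools must be torn down explicitly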
Active development continues on multiple fronts. CPS (Continuation Passing Style) promises composition across all backends with minimal overhead—prototypes show 4x fewer instructions than async/await. RFC proposals suggest making FlowVars and channels awaitable to bridge parallelism and concurrency. The threading library evolves toward better channel abstractions. But without unified direction, these efforts risk creating more fragmentation rather than convergence.
Nim's concurrency evolution from 2008 to 2025 represents a case study in the challenges of evolving fundamental language features while maintaining compatibility and performance. The journey from ambitious compile-time verification through pragmatic async/await to external library solutions reflects both technical constraints and philosophical shifts about where complexity should live in a programming language.
The current prescription from Andreas Rumpf - use Malebolgia or third-party solutions rather than standard library features - acknowledges that concurrency may be too complex and evolving for standard library stability. This represents a fundamental shift from languages like Go or Java that provide canonical concurrency models, toward a more fragmented but flexible ecosystem where different use cases choose different solutions.
The continuous changes aren't arbitrary but driven by genuine engineering challenges: incompatible memory models, performance regressions, safety requirements, and the fundamental difficulty of composing different concurrency paradigms. Until ORC stabilizes, integration between models is achieved, and performance regressions are resolved, Nim's concurrency story will remain one of evolution rather than stability. For developers, this means carefully evaluating requirements, accepting that current solutions are temporary, and being prepared for future migrations as the ecosystem continues its search for the optimal balance between power, safety, and simplicity.
Yep, concurrency is still hard. Nim's concurrency ecosystem isn't beginner-friendly and is fragmented, but it does exist and it works.
For context, Nim's concurrency has rough edges but can work well and be very productive. Many Rust devs complain that its async is very difficult and that the ecosystem is fragmented. Experienced Go devs routinely recommend avoiding bare goroutines for serious concurrency because they lack key patterns for error handling and invite race conditions. Python's support for true multi-threading is still experimental, and asyncio isn't compatible with many existing Python web servers or libraries.
IMO, only Elixir/Erlang and Java have truly "easy" concurrency in 2025, maybe C# is in there.
My take for Nim in 2025 for production concurrency:
Best concurrency experience: OS threads + nim-taskpools, --mm:atomicArc, -d:useMalloc, threading/channels, and plain old withLock combined with the thread sanitizer; avoid async, or use waitFor if you need an async-only library.
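Here's a minimal sketch of that stack, assuming the nim-lang/threading package (compile with something like nim c --mm:atomicArc -d:useMalloc app.nim):

import std/locks
import std/typedthreads
import threading/channels   # Chan[T] from the nim-lang/threading package

var
  chan = newChan[int]()
  lock: Lock
  total: int

proc worker() {.thread.} =
  for _ in 0 ..< 5:
    let msg = chan.recv()   # blocking receive from the shared channel
    withLock lock:          # plain old lock around the shared counter
      total += msg

proc main() =
  initLock lock
  var t: Thread[void]
  createThread(t, worker)
  for i in 1 .. 5:
    chan.send(i)
  joinThread t
  echo total                # 15

main()

Run it under the thread sanitizer (e.g. --cc:clang --passC:-fsanitize=thread --passL:-fsanitize=thread) to catch the data races the compiler can't.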
I'm finding it super easy to work on a new greenfield project with good performance.
Note: avoiding cycles isn't too hard, and every Obj-C/Swift, C++, and Rust developer already does it.
The new ORC memory manager conflicts with existing async implementations, causing memory leaks in exception handlers and steady memory growth in long-running servers.
Note, I don't believe this is true anymore, at least for Chronos, which was rewritten to support ARC. I think the stdlib also had a PR to make it compatible with ARC. Being compatible with ARC means no cycles in the async library, which should avoid the memory leaks. Of course, keeping applications written with async free of cycles and leaks might still be challenging.
Still, async code has problems with code and memory bloat. Threads FTW!
at least for Chronos which was rewritten to support ARC.
Not quite - ORC and refc are not that different, ie the thing that materially differs between them is the root/liveness tracking mechanism - the rest that came with the ORC package (move support etc) is entirely orthogonal. ARC is simply a crippled ORC - not terribly interesting on its own except in special cases.
In particular, in both refc and ORC, circular references are expensive / bad and cause memory to not be reused as efficiently.
chronos was rewritten to be memory-efficient and less CPU-hungry under refc and a collateral benefit is that the core became ARC-friendly. Your application will still most likely not be ARC-compatible regardless, ie it takes additional effort to write things this way and it's easy to mess up. A circular reference is not that much different from a memory leak in non-gc'd languages, ie you have to be careful to not introduce them which is hard in "user code".
We might do something about making it more thread-friendly soon as well, ie either by combining with taskpools or similar - for example, we already have async futures for threadpool tasks which is the basic building block for a lot of cross-thread work.
Re CPS, it's mostly the same as the current closure iterator transformation and will lead to similar state machine bloat as async - what differs is mainly how you plug in the executor that schedules continuations. Much of the bloat that chronos adds is actually due to other things, such as exception support (a major source of bloat!), supporting implicit returns (which requires doubling the compile-time code size and thus causes compile-time costs), and other bells and whistles to make it "feel" like non-async code - these have nothing to do with async per se and are more side effects of available language features. With CPS, the compiler can maybe "cheat" and do slightly better - or it could expose more of its analysis to the macro / trait system, and then frameworks like chronos could be less bloated.
Re atomic arc, this is not at all what we need for the chronos/taskpools-based stack at least - instead, we'd want to model it as passing ownership of data between threads. Ie we don't want multiple threads to access the same data, and therefore atomicArc doesn't really make sense - all we need is an efficient method for transferring ownership of data from one thread to another. For that, a single-owner, movable type would be a lot more useful, which is where early efforts have gone (ie reporting lots of move-only type bugs :) ), as far as that part of the ecosystem is concerned.
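For illustration, std/isolation already models this single-owner idea in today's Nim (the Payload type and consume proc below are made up, not chronos/taskpools API):

import std/isolation

type Payload = ref object
  data: seq[int]

proc consume(iso: sink Isolated[Payload]) =
  # In a real system this would run on another thread; the point is that
  # ownership arrived by move, so no atomic refcounting is required.
  var owned = iso          # we own the isolate; make it mutable for extract
  let p = extract owned    # now the sole owner of the payload
  echo p.data.len

consume isolate(Payload(data: @[1, 2, 3]))  # construct-and-move: no aliases can exist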
ARC is simply a crippled ORC - not terribly interesting on its own except in special cases.
Nah, ORC is just a fancy bloated ARC. ORC is really only useful for special cases like dealing with graphs. :P
@arnetheduck depends on your use cases. Sure ORC is needed if you’re dealing with lots of self-referential trees or graphs. You folks at Status deal with those a lot. Most software doesn’t though.
To your point about cycles being tricky, I’d like an ARC mode where the cycle detector just errors / flags cycles because they can be painful to detect if accidentally introduced. Hmmm though valgrind should detect cycles too and be more precise about it.
all we need is an efficient method for transferring ownership of data from one thread to another - for that, a single-owner, movable type would be a lot more useful which is where early efforts have gone (ie reporting lots of move-only type bugs :) )
That’d be ideal, but it’s tricky currently and requires more ecosystem support. Unfortunately there are enough move bugs and gotchas, with things like createThread not moving things properly unless done just right. I’m finding atomicArc more practical, and most other devs will too. It’s not any more overhead than Swift/Obj-C or C++ have with their ARC and shared-pointer systems.
I’d like an ARC mode where the cycle detector just errors / flags cycles
Oh yes please.
Huh? Recent Nim versions got an API for that...
when defined(nimOrcStats):
  type
    OrcStats* = object ## Statistics of the cycle collector subsystem.
      freedCyclicObjects*: int ## Number of freed cyclic objects.

  proc GC_orcStats*(): OrcStats =
    ## Returns the statistics of the cycle collector subsystem.
    result = OrcStats(freedCyclicObjects: freedCyclicObjects)
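So as a crude cycle detector you can compile with --mm:orc -d:nimOrcStats and check whether the counter moves - a sketch:

type Node = ref object
  next: Node

proc makeCycle() =
  var a = Node()
  a.next = a            # deliberate reference cycle

makeCycle()
GC_fullCollect()        # force the cycle collector to run now
echo GC_orcStats().freedCyclicObjects  # nonzero means cycles were found and freed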
ORC is really only useful for special cases like dealing with graphs
Or, better yet, just don't represent graph-like data structures with ref types ;) - seq + indices or a stable seq type / an arena allocation strategy with ptr works better 95% of the time
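A minimal sketch of the seq-plus-indices idea (names are illustrative): the links can be freely cyclic while the memory holds no ref cycles for the collector to chase.

type
  NodeId = int
  Node = object
    value: string
    neighbors: seq[NodeId]   # indices into Graph.nodes, not refs
  Graph = object
    nodes: seq[Node]

proc addNode(g: var Graph; value: string): NodeId =
  g.nodes.add Node(value: value)
  g.nodes.high             # the new node's index is its id

proc addEdge(g: var Graph; a, b: NodeId) =
  g.nodes[a].neighbors.add b
  g.nodes[b].neighbors.add a

var g: Graph
let x = g.addNode("x")
let y = g.addNode("y")
g.addEdge(x, y)            # mutual links, yet no ref cycles in memory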