I ran into Ctries the other day and found them intriguing. They extend HAMTs (hash array mapped tries) to be concurrent and to provide non-blocking snapshots. Anyone tried implementing one in Nim? A quick search didn't yield anything.
What seemed particularly interesting to me was the ability to create concurrency-safe iterators in O(1) time using the snapshots. For comparison, the Zephyr RTOS kernel makes extensive use of singly-linked lists to implement its thread-safe queues so they can be concurrently updated and iterated. However, their kernel queues are unbounded, which sorta bugs me.
Might Ctries be a good alternative for that sort of low-level system data structure? I haven't fully thought through the details, just ideating. Perhaps it'd be useful for something like sharing/synchronizing pages of memory. Combined with the ability to make non-blocking iterators, that seems handy.
Controversial opinion ahead: All this lockfree stuff is so severely limited and error-prone and hard to reason about that it's to be avoided at almost all costs.
For example: serial code like tab[k] = tab.len # map k to unique ID is not correct in a multi-threaded setting just because you switched tab to a concurrent hash table. (And for this reason, a concurrent hash table had better not provide a len operation at all.)
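To make that concrete, here is a minimal Nim sketch (the names getOrAssignId and ids are made up for illustration): even if the table itself were thread-safe, len-then-insert is two separate steps, so two threads can read the same len and hand out the same "unique" ID. The compound operation has to be made atomic, e.g. with one lock around both steps.

```nim
# Sketch: the read (ids.len) and the write (ids[key] = ...) are two steps,
# so without a lock two threads can both observe the same len and assign
# the same ID. A "concurrent table" does not fix this; only making the
# whole read-modify-write atomic does.
import std/[locks, tables]

var
  idLock: Lock
  ids: Table[string, int]

initLock(idLock)

proc getOrAssignId(key: string): int =
  withLock idLock:
    # len is read and the new entry written under the same lock, so no
    # other thread can interleave between the two steps.
    if key notin ids:
      ids[key] = ids.len
    result = ids[key]

echo getOrAssignId("alpha")   # 0
echo getOrAssignId("beta")    # 1
echo getOrAssignId("alpha")   # still 0
```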
Usually you need a single lock to cover multiple different data structures for correctness. Locks can do that easily, but if every data structure by itself is lockfree, nothing is gained except unnecessary complexity. If contention becomes an issue, split the data structures into buckets and use lock striping. If you need guaranteed O(1) behavior for these designs, use a provably fair lock implementation and base your WCET on the maximum number of threads.
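For illustration, a rough lock-striping sketch in Nim (StripedTable and its procs are made-up names, not an existing library): keys hash to one of a fixed number of stripes, and each stripe has its own lock and table, so unrelated keys rarely contend on the same lock.

```nim
# Lock striping sketch: N independently locked buckets instead of one
# global lock. Contention drops because different keys usually hit
# different stripes.
import std/[locks, tables, hashes]

const NumStripes = 16   # must stay a power of two for the mask below

type
  StripedTable[K, V] = object
    locks: array[NumStripes, Lock]
    buckets: array[NumStripes, Table[K, V]]

proc init[K, V](t: var StripedTable[K, V]) =
  for i in 0 ..< NumStripes:
    initLock(t.locks[i])
    t.buckets[i] = initTable[K, V]()

proc stripe[K](key: K): int =
  # Map the key's hash onto a stripe index.
  int(hash(key)) and (NumStripes - 1)

proc put[K, V](t: var StripedTable[K, V]; key: K; val: V) =
  let i = stripe(key)
  withLock t.locks[i]:
    t.buckets[i][key] = val

proc get[K, V](t: var StripedTable[K, V]; key: K; val: var V): bool =
  let i = stripe(key)
  withLock t.locks[i]:
    if key in t.buckets[i]:
      val = t.buckets[i][key]
      return true

var shared: StripedTable[string, int]
shared.init()
shared.put("answer", 42)
var v: int
if shared.get("answer", v):
  echo v
```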
Disclaimer: I wrote this when I had a minor sunstroke.
Controversial opinion ahead: All this lockfree stuff is so severely limited and error-prone and hard to reason about that it's to be avoided at almost all costs.
I'd be disappointed if you didn't. ;) Generally I'd agree with you about locks being simpler and better.
The use case on my mind for Ctries would be in a kernel: using linked lists protected with spin locks and iterating on them for thread tracking, queues, etc. However, it all seems very error-prone. In that case the complexity of a well-designed lock-free DS might be better than spin-locked linked lists everywhere. Though it could just be mostly due to C's limitations.
I've looked into concurrent hash maps to keep track of tasks in a multithreading runtime.
There are some use-cases that would greatly benefit from a concurrent hashmap-like data structure.
However, like Araq mentioned, it should absolutely not be marketed as a table or reuse the usual single-threaded idioms.
For instance, I like that Nim channels use peek instead of len: for a lock-free concurrent data structure, the number of items held is only an approximation due to concurrent accesses.
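As a small illustration (compile with --threads:on; the producer here is just for the example), peek gives you a count that can already be stale by the time you act on it:

```nim
# peek on a Nim Channel is advisory: a producer thread may change the
# count right after we read it, so treat the value as a hint, not a size.
var chan: Channel[int]
chan.open()

proc producer() {.thread.} =
  for i in 1 .. 100:
    chan.send(i)

var t: Thread[void]
createThread(t, producer)

let hint = chan.peek()   # somewhere between 0 and 100, and already stale
echo "roughly ", hint, " messages queued right now"

joinThread(t)
chan.close()
```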
While you iterate, new entries are being added. Since it's thread-safe and lock-free there is no real bug here; the linked list supports this traversal. But what are the actual semantics?
Of course the semantics are tricky and I don't like lockless lists, but they're used a lot in kernel land so they are (hopefully) valuable enough to be worth the pain in some cases. The Linux docs list their reasons for lockless lists (RCU lists):
"A widely used usecase for RCU lists in the kernel is lockless iteration over all processes in the system. task_struct::tasks represents the list node that links all the processes. The list can be traversed in parallel to any list additions or removals."
The RCU lists aren't general-purpose linked lists and only make sense in a few specific use cases. Unfortunately those use cases are often interesting ones (for me that's low level, bare metal, or low latency data shuffling) and solutions like striped locks aren't ideal either. But on the lockless pain points, there's a good quote from LWN:
... They simply do not think in this kind of language. As long as lockless algorithms require this sort of understanding, they will be underused and many of the implementations that do show up are likely to be buggy in subtle ways.
Those pain points are why Ctries intrigued me. They inherit some of the "time travel" properties from HAMTs. A Ctrie iterator creates a snapshot at a point in time. That means it should be possible to take snapshots at different points. That could make them actually debuggable, unlike, say, the RCU lists. Perhaps something sort of like this. There was also a video of Clojure folks using their HAMT stuff to enable time-travel debugging of web UIs.
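To sketch what those point-in-time semantics could look like from the caller's side (this is only a stand-in: the "snapshot" here is an O(n) copy taken under a lock, whereas a real Ctrie reaches the same point-in-time view in O(1) by atomically swapping a generation tag on the root; all names are made up):

```nim
# Semantics sketch only, not the Ctrie algorithm: take a point-in-time
# view, then iterate it while the live structure keeps changing.
import std/[locks, tables]

var
  guard: Lock
  live: Table[string, int]

initLock(guard)

proc put(key: string; val: int) =
  withLock guard:
    live[key] = val

proc snapshot(): Table[string, int] =
  # Everything inserted before this call is visible; nothing inserted
  # afterwards ever appears in the iteration below.
  withLock guard:
    result = live   # value copy of the table

put("a", 1)
put("b", 2)
let snap = snapshot()
put("c", 3)          # mutation after the snapshot

for k, v in snap.pairs:
  echo k, " = ", v   # prints only a and b, never c
```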
I think it'd be nice to have a lockless DS that isn't completely a pain to use. As in, locks are better for general cases, but if you have a specific use case where lockless is preferable, here's a usable lockless DS.
However, like Araq mentioned, it should absolutely not be marketed as a table or reuse the usual single-threaded idioms.
Certainly! Hopefully that's clear from my previous posts. I'm interested in some specific use cases, e.g. implementing the core of buffer-page sharing between different cores where latency is critical.
For instance, I like that Nim channels use peek instead of len: for a lock-free concurrent data structure, the number of items held is only an approximation due to concurrent accesses.
Good point. Implementing something like a Ctrie should probably come with a limited API, perhaps excluding ref types as well, etc. But having a concurrent hash-map style DS with valid point-in-time iterators just seems like it'd be useful for some scenarios. Even more so if you could enable cheap(ish) time-travel debugging on it.
Eh, I'm too busy to implement it at the moment anyways. I just like keeping a lookout for potentially useful concurrent DS's and I was hoping someone had implemented a version in Nim to play with.
Unfortunately those use cases are often interesting ones (for me that's low level, bare metal, or low latency data shuffling) and solutions like striped locks aren't ideal either.
I'm quite familiar with RCU. I remain skeptical though. It's not all that clear to me that a kernel really needs all this complexity and performance; instead it could provide an API that allows many more things to run in user space. Maybe a microkernel, maybe an exokernel, maybe something new entirely.
Regardless of the kernel's architecture, most systems have bottlenecks that are not in the kernel.
Regardless of the kernel's architecture, most systems have bottlenecks that are not in the kernel.
A lot is bottlenecked on IO (networking, disk, pipes), and to provide selectors or completion- or readiness-based IO you need an efficient thread-safe event subscription mechanism in the kernel.
A Ctrie might be a good fit for this.