Quick question folks,
What is the best practice Nim version of:
std::map<Price, Qty, std::greater<Price>> b;
std::map<Price, Qty, std::less<Price>> a;
Do we have a way to express something like this using the Nim stdlib?
There is compiler/btrees.nim. Why it's not in the stdlib, I don't know. But you can import it as
import "$nim" / compiler / btrees
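Something like this might work as a minimal sketch (going by my reading of the compiler sources, where BTree compares keys with < and == and takes no comparator parameter the way std::map does; the Desc wrapper is just an illustrative way to reverse the ordering):

import "$nim" / compiler / btrees

type
  Price = int64
  Qty = int64
  Desc = distinct Price            # illustrative wrapper for descending keys

proc `<`(a, b: Desc): bool = Price(b) < Price(a)    # reversed comparison
proc `==`(a, b: Desc): bool = Price(a) == Price(b)

var
  bids = initBTree[Desc, Qty]()    # highest price first, like std::greater
  asks = initBTree[Price, Qty]()   # lowest price first, like std::less

bids.add(Desc(100), 3)
bids.add(Desc(101), 5)
bids.add(Desc(99), 7)
asks.add(Price(102), 4)

for p, q in bids:                  # pairs yields entries in key order
  echo Price(p), " -> ", q         # prints 101, 100, 99

One caveat: the compiler modules are not a stable API, so pin your Nim version or just vendor btrees.nim.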
I would say he analyzes the problems & FPGA solutions and all that more completely, but the main point, the exact same one that I was making, is in his light/dark blue "brief look at our data" slide at 23min:41sec into the video. The top of the book is 1e3 to 1e5 times more active than deep in the book, which is notably only a few hundred times deeper at its deepest, or at least in what he's showing of it. If this feature of the world were NOT true, then he'd probably have had to do something different on the FPGA / give a different talk. :-) Anyway, pure software can benefit from these dynamics just as the FPGA hardware solutions do.
Perhaps as a more mundane point, this is also how slow-as-molasses human traders used to organize their order books, and it is literally the very reason it is called the "top of the book": no pages to turn for most of the orders & cancellations coming & going.
I haven't read accounts of their bookkeeping optimizations, but I can guess that they'd just cross out slots and replace those "hot cache" entries and such on busy / noisy trading floors, with (in exciting times!) phones ringing and all sorts of chaos going on. Necessity was essentially the mother of invention of low-latency paper & pencil algos, back before there was even an International Business Machines / IBM corporation. Probably shortcuts to tally totals fast & all that, too. Various methods might be described in some interesting biography of a floor trader from around 1900 to the early 1920s.
On a more mundane note, std::map in the C++ STL is simply awful. At least as implemented by g++, it had ludicrous space overhead - something like 80+ bytes for what amounts to 2 pointers when I last looked at it, about 20 years ago. It is also quite slow and inflexible. The STL in general is kind of a perf disaster past std::vector (which is barely anything more than realloc).
reach out to me over email
If you don't mind, whatever is convenient, put it right here; it's a fascinating read.
There is also the topic of "why" the dynamics of order books are dominated by the "top" of the book to the tune of multiple orders of magnitude. That's basically just how price discovery works. Essentially, it costs participants a lot of money one way or another if they don't mostly jiggle around the top of the book in small motions, and it's very easy to see why just from the idea of the book.
Elaborating on that just a little, since it may not be obvious to newcomers: if you want to buy a gagillion units of something freely traded on the open market, you will "usually" not want to "sweep the book". If you do just match against all open orders like that, then you pay a high average cost or get a low average benefit. You pay whatever the total money of price*quantity "down the book" comes to - a kind of VWAP - until your gagillion is met.
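To make that concrete with made-up numbers (just a toy sketch; sweepVwap is a hypothetical helper, not from any library):

type Level = tuple[price: float, qty: float]

proc sweepVwap(book: openArray[Level]; want: float): float =
  # Average price paid to fill `want` units by matching down the book in order.
  var remaining = want
  var cost = 0.0
  for lvl in book:
    let take = min(lvl.qty, remaining)
    cost += take * lvl.price
    remaining -= take
    if remaining <= 0: break
  cost / (want - remaining)

let book = [(price: 100.0, qty: 10.0), (price: 101.0, qty: 20.0),
            (price: 102.0, qty: 30.0)]
echo sweepVwap(book, 60.0)   # ~101.33 average, vs. 100.0 sitting at the top

Sweeping 60 units here costs you over a point per unit more than the best price, and real books get much worse much faster.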
Instead, if you do little jiggly motions at the top, you give the "other side" of the market time to notice and become a counterparty to your trading. So, even if you have a big order, you break your buying/selling up into many little orders spread out over time.
I mean, in a panic, people do just sweep the book because there is a perceived "free fall" or "spike up" or some other physical metaphor, or "FOMO" or other psychological slang. That is not "usual", by definition, though it probably is a near continuous spectrum rather than a discrete state of being. Nevertheless, real-time patience for order fulfillment is not a bad way to measure "panic" in financial markets, and it relates to other measures. E.g., if you have a bunch of book sweepers out there, then the price moves a lot further in both directions - so any measure of that, like the many kinds of "volatility", will also be large.
This is sufficiently off topic for the Nim forum, though, that I might also ask you to reach out by email. :-) But a first-principles explanation of the "why" behind the data structures seems at least apropos of "programming".