nimforum mirror - Announce: LimDB, a fast, persistent table with LMDB under the hood

cmc [2022-06-06T02:30:41+02:00]

Hello everyone,

I would like to announce LimDB, a table-like interface to LMDB that makes it unusually easy to persist data to disk using memory-mapped files and full database semantics. It is based on the really great nim-lmdb wrapper. Thanks, Federico! And of course, Howard, Andreas and contributors.

It works like so:

# save.nim
import limdb

let db = initDatabase("myDirectory")
db["foo"] = "bar"  # that's it, persisted to disk

# load.nim
import limdb

let db = initDatabase("myDirectory")
echo db["foo"]  # oh, it says "bar"


apt install liblmdb0  # or equivalent
nimble install https://github.com/capocasa/limdb  # directory submission pending
nim r save.nim
nim r load.nim

Transactions, named databases and iteration are all supported; see the API Documentation and the Code.

In real-world use you would probably use it to save some kind of form data you get from jester. It would probably also work really well for machine learning datasets, or perhaps game or app assets. It should also work well as an alternative to many tiny files, because that was the original use case: serving huge address books, which at the time were usually stored as lots of little vCard files.

There is nothing particularly inventive here, just glue code that will probably be familiar and a really good and proven database with an often-used Nim wrapper. I hope this will lower the bar a bit to using fast in-process key value data storage with Nim.

One limitation I kept is that only strings are supported as keys and values, even though LMDB supports any blob of bytes. So you are for now expected to bring your own serializer. Flatty uses strings natively so it seems like a good match, but there is also Planetis-M and Frosty I know of.

The reason I left type support out for now is that I'm not quite sure how to do it in a way that would bring a lot of value above a BYOS, or bring-your-own-serializer, approach. Serializing mostly costs an additional memory chunk copy, which isn't that much. It also brings flexibility: you can serialize anything anywhere as long as you remember the format for each key. If this were a full Nim table-like made with generics, then each key and each value of a database would have to be of the same type, a limitation that LMDB doesn't have with its raw bytes.
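As a rough illustration of the bring-your-own-serializer idea, here is a minimal sketch. It uses a standard-library Table as a stand-in for the database so that it runs without LMDB, and the encode/decode helpers are hypothetical names, not part of LimDB. Each key has its own format, which the caller has to remember.

```nim
import std/[tables, strutils]

# Stand-in for a LimDB database: any string-to-string store behaves the same here.
var store = initTable[string, string]()

# Hypothetical helpers: one encode/decode pair per value format.
proc encodeInt(x: int): string = $x
proc decodeInt(s: string): int = parseInt(s)

proc encodeList(xs: seq[string]): string = xs.join("\n")
proc decodeList(s: string): seq[string] =
  # An empty string decodes to an empty list (result starts out empty).
  if s.len > 0:
    result = s.split('\n')

# Different keys can hold differently formatted values in the same store.
store["visits"] = encodeInt(42)
store["tags"] = encodeList(@["nim", "lmdb"])

echo decodeInt(store["visits"]) + 1
echo decodeList(store["tags"])
```

With a generic table-like, both values above would have to share one type; with plain strings, the format is a per-key convention.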

Also, so far I really can't wrap my head around how to do the whole pointer+length thing safely and flexibly. There's openArray but also UncheckedArray[byte] and various objects with those two fields. So strings it is for the time being.

What would be really nice, mostly for vanity, would be to implement an interface with Nim views. LMDB doesn't copy anything when reading data, so it would be amazing if LimDB didn't either. For now it makes one copy, from the blob into the Nim-managed string.
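For illustration, the one copy described above might look roughly like the following sketch. blobToString is a hypothetical helper, not LimDB's actual code; it takes the kind of pointer and length pair a C library hands back and copies the bytes into a Nim-managed string. A local byte array stands in for the memory-mapped blob.

```nim
# Hypothetical helper: copy `size` bytes starting at `data` into a new Nim string.
proc blobToString(data: pointer, size: int): string =
  result = newString(size)
  if size > 0:
    # The single copy: from foreign memory into the Nim-managed string.
    copyMem(addr result[0], data, size)

# Simulate a foreign blob with a local byte array holding the bytes of "bar".
var raw = [byte 98, 97, 114]
let value = blobToString(addr raw[0], raw.len)
echo value
```

A view-based interface would skip the copyMem step and expose the mapped bytes directly, at the cost of tying the value's lifetime to the read transaction.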

I would love to hear what you think of the interface (it's not frozen yet) and also of the implementation, if you're interested in looking in there. Bug reports are welcome too, of course.

I like to use it with jester, sometimes combined with SQLite. Not really to take load off SQLite, but because it's nice to keep the SQL tidy and store various ephemeral data blobs in a dedicated key/value store instead of having to maintain a schema for them. LMDB is old, so the dependency should be readily available on most systems. And if your site actually does go into the zillions of views, this would keep write load off SQLite.

I hope this ends up being useful for you and have a wonderful time.

Yardanico [2022-06-06T07:00:37+02:00]

Nice project, however I wanted to note one interesting fact about SQLite:

> I like to use it with jester and sometimes combined with SQLite. Not really to take load off SQLite but because it's nice to keep the SQL tidy and store various ephemeral data blobs in a dedicated key/value instead of having to maintain a schema for them.

Funny thing is that even SQLite's own creator uses SQLite as a key-value and blob store in his other projects, notably Fossil. And you can always just have a separate SQLite connection/file to keep your key-value storage separate from the main DB :)

cmc [2022-06-06T09:48:18+02:00]

Oh yes, if you're looking to have fewer dependencies, SQLite is a perfectly adequate key-value store.

But LMDB is full database semantics on memory mapped files- I don't know about you but there is something about that that just makes my mouth water.

Performance shows it as well. I think Howard bolted LMDB onto SQLite once as an experiment, and the resulting Frankenbase was a heck of a lot faster just from swapping the persistence layer.

What really interests me is how much worse is an LMDB store compared to just memory? Can I skip the loading step and just use the object like any other? That's an open question with LMDB and may require a view type but with SQLite the answer is most likely no.

cumulonimbus [2022-06-06T14:17:47+02:00]

It's been years since I've used LMDB, and it was amazing. Note that there's a fork called MDBX which claims to be much better (I have not tested it myself, but I have heard from others that these claims are credible). It appears to have moved from GitHub to here: https://gitflic.ru/project/erthink/libmdbx#improvements-beyond-lmdb

cmc [2022-06-06T15:19:56+02:00]

Yes, and there's a Nim wrapper for MDBX as well, NimDBX! But aside from the portability issues that come from NimDBX's Nimterop usage, I wanted the original for its maturity.

Gtriangle [2023-02-17T12:50:55+01:00]

Great work, thank you.

SerjEpatoff [2023-03-13T01:55:46+01:00]

The DB–on–top–of–mmap idea has been in the air for many decades. But unfortunately no serious DB engine writers are using it in production. Some of them tried (early MongoDB is one example) but evidently gave up. Let's try to reason about this status quo.

First of all, read this paper: https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf

Then think about the fact that mmap is a completely opaque black box provided by the OS. You have no control over atomicity, page faults, liftings, implicit syscalls, caching strategy, or coherence across CPU cores. Operations that look like RAM access may fall back to the kernel for an unpredictable time. The semantics of concurrent read-modify-write from multiple threads are not strictly specified. And so on, and so forth.

All brave–hearted persons who think that mmap is good for a DB just don't know what they don't know. Explicit memory area(s), explicit storage area(s), and explicit up–down lifting algorithms under the full control of the developer are a strict must for a DB engine.

It's better to consider this road: https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/ if your DB is accessed from a single server process.

SQLite is really a hidden gem with unbelievable scalability potential that few people know about.

radsoc [2023-03-13T09:23:19+01:00]

LMDB is a special beast.

Read this blog post from Cloudflare:

LMDB stability has been exceptional. It has been running in production for over three years. We have experienced only a single bug and zero data corruption. Considering we serve over 2.5 trillion read requests and 30 million write requests a day on over 90,000 database instances across thousands of servers, this is very impressive.

The only issue with LMDB is write amplification.

SerjEpatoff [2023-03-13T15:16:05+01:00]

Hmm, very interesting, thank you. The key paragraph from the text:

LMDB is also append-only, meaning it only writes new data, it doesn’t overwrite existing data. Beyond that, nothing is ever written to disk in a state which could be considered corrupted. This makes it crash-proof, after any termination it can immediately be restarted without issue. This means it does not require any type of crash recovery tooling.

Yes, the decision not to overwrite existing data is a game changer that can make mmap-backed storage a perfect fit for some DB usage scenarios.

Mirror of forum.nim-lang.org