Hello everyone,
I would like to announce LimDB, a table-like interface to LMDB that makes it unusually easy to persist data to disk, using memory-mapped files with full database semantics. It is based on the really great nim-lmdb wrapper. Thanks, Federico! And of course thanks to Howard, Andreas and the contributors.
It works like so:
```nim
# save.nim
import limdb
let db = initDatabase("myDirectory")
db["foo"] = "bar" # that's it, persisted to disk
```

```nim
# load.nim
import limdb
let db = initDatabase("myDirectory")
echo db["foo"] # oh, it says "bar"
```

To try it:

```sh
apt install liblmdb0 # or equivalent
nimble install https://github.com/capocasa/limdb # package directory submission pending
nim r save.nim
nim r load.nim
```
Transactions, named databases and iteration are all supported; see the API documentation and the code.
In real-world use, you would probably use it to save some kind of form data you get from jester. It would probably also work really well for machine learning datasets, or perhaps game or app assets. It should also work well as an alternative to many tiny files, because that was the original use case: serving huge address books, which at the time were usually stored as lots of little vCard files.
There is nothing particularly inventive here, just familiar-looking glue code on top of a really good, proven database with a widely used Nim wrapper. I hope this will lower the bar a bit for using fast, in-process key-value storage with Nim.
One limitation I kept is that only strings are supported as keys and values, even though LMDB supports any blob of bytes. So for now, you are expected to bring your own serializer. Flatty uses strings natively, so it seems like a good match, but I also know of Planetis-M and Frosty.
The reason I left type support out for now is that I'm not quite sure how to do it in a way that would add much value over a BYOS (bring-your-own-serializer) approach. Serializing mostly costs one additional memory copy, which isn't that much. It also brings flexibility: you can serialize anything anywhere, as long as you remember the format for each key. If this were a fully generic, table-like Nim type, every key and every value in a database would have to be of the same type, a limitation that LMDB, with its raw bytes, doesn't have.
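A minimal sketch of what BYOS can look like, using only the Nim standard library (`std/json`) rather than Flatty. So that the snippet runs on its own, a `Table[string, string]` stands in for the database; the `Contact` type and the key names are hypothetical. With LimDB, the `store[...]` lines would presumably become `db[...]`, matching the `db["foo"] = "bar"` usage from the announcement.

```nim
# Sketch of "bring your own serializer" with stdlib only.
# ASSUMPTION: a Table[string, string] stands in for a LimDB database,
# which (like LimDB) only ever sees string keys and string values.
import std/[json, strutils, tables]

type
  Contact = object          # hypothetical example record
    name: string
    email: string

var store = initTable[string, string]()   # stand-in for the database

# Different keys, different formats: the caller remembers which is which.
store["visits"] = $42                     # plain integer text
let ada = Contact(name: "Ada", email: "ada@example.com")
store["contact:1"] = $(%ada)              # object serialized as JSON text

# On the way out, each key is decoded with the format it was written in.
let visits = parseInt(store["visits"])
let contact = parseJson(store["contact:1"]).to(Contact)

echo visits        # 42
echo contact.name  # Ada
```

The trade-off described above is visible here: each value costs one extra encode/decode step, but a single store can hold an integer under one key and a whole object under another.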
Also, so far I really can't wrap my head around how to handle the whole pointer+length thing safely and flexibly: there's openArray, but also UncheckedArray[byte] and various objects with those two fields. So strings it is for the time being.
What would be really nice, mostly for vanity, would be to implement an interface with Nim views. LMDB doesn't copy anything when reading data, so it would be amazing if LimDB didn't either. For now it makes one copy, from the blob into the Nim-managed string.
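For illustration, here is roughly what the pointer+length problem looks like in plain Nim. LMDB hands back each value as a size plus a data pointer (`MDB_val`'s `mv_size` and `mv_data`); this sketch simulates that with a pointer into an ordinary string, since no database is involved. The hypothetical `copyToString` makes the one copy mentioned above, while `toOpenArray` on a `ptr UncheckedArray[char]` reads the same bytes in place. Neither proc is part of LimDB or nim-lmdb. Keep in mind that in LMDB such a borrowed view is only valid until the next update or the end of the transaction, which is what makes a safe view-based API tricky.

```nim
# Sketch: turning a raw (pointer, length) pair into Nim data.
# ASSUMPTION: `backing` simulates a memory-mapped region; with LMDB the
# pointer and size would come from an MDB_val instead.

proc copyToString(data: pointer, size: int): string =
  ## One copy: allocate a Nim-managed string and copy the bytes into it.
  result = newString(size)
  if size > 0:
    copyMem(addr result[0], data, size)

proc countChar(view: openArray[char], c: char): int =
  ## Works directly on borrowed memory; nothing is copied.
  for x in view:
    if x == c: inc result

var backing = "hello, lmdb"              # stand-in for mmap'd bytes
let data = pointer(addr backing[0])
let size = backing.len

let copied = copyToString(data, size)    # owned copy, outlives `backing`
let raw = cast[ptr UncheckedArray[char]](data)
let ls = countChar(toOpenArray(raw, 0, size - 1), 'l')  # zero-copy view

echo copied  # hello, lmdb
echo ls      # 3
```

The copy is safe to keep around indefinitely; the view is cheaper but must not escape the lifetime of the memory it points into.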
I would love to hear what you think of the interface (it's not frozen yet), and also of the implementation if you're interested in looking in there. Bug reports are welcome too, of course.
I like to use it with jester, sometimes combined with SQLite. Not really to take load off SQLite, but because it's nice to keep the SQL tidy and store various ephemeral data blobs in a dedicated key/value store instead of having to maintain a schema for them. LMDB is old, so the dependency should be readily available on most systems. And if your site really does reach the zillions of views, this would actually keep write load off SQLite.
I hope this ends up being useful to you. Have a wonderful time!
Nice project! However, I wanted to note one interesting fact about SQLite:
> I like to use it with jester and sometimes combined with SQLite. Not really to take load off SQLite but because it's nice to keep the SQL tidy and store various ephemeral data blobs in a dedicated key/value instead of having to maintain a schema for them.
The funny thing is that even SQLite's creator uses SQLite as key-value and blob storage in his other projects, notably Fossil. And you can always just open a separate SQLite connection/file to keep your key-value storage apart from the main DB :)
Oh yes, if you're looking for fewer dependencies, SQLite is a perfectly adequate key-value store.
But LMDB offers full database semantics on memory-mapped files. I don't know about you, but there is something about that that just makes my mouth water.
Performance shows it as well: I think Howard once bolted LMDB onto SQLite as an experiment, and the resulting Frankenbase was a heck of a lot faster just from swapping the persistence layer.
What really interests me is how much worse an LMDB store is compared to plain memory. Can I skip the loading step and just use the object like any other? That's an open question with LMDB and may require a view type, but with SQLite the answer is most likely no.
If your DB is accessed from a single server process, it's better to consider this road: https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/
SQLite is really a hidden gem, with unbelievable scalability potential that few people know about.
LMDB is a special beast.
Read this blog post from Cloudflare:
LMDB stability has been exceptional. It has been running in production for over three years. We have experienced only a single bug and zero data corruption. Considering we serve over 2.5 trillion read requests and 30 million write requests a day on over 90,000 database instances across thousands of servers, this is very impressive.
The only issue with LMDB is write amplification.
Hmm, very interesting, thank you. Key paragraph from the text:
LMDB is also append-only, meaning it only writes new data, it doesn’t overwrite existing data. Beyond that, nothing is ever written to disk in a state which could be considered corrupted. This makes it crash-proof, after any termination it can immediately be restarted without issue. This means it does not require any type of crash recovery tooling.
Yes, the decision not to overwrite existing data is a game changer, and it can make mmap-backed storage a perfect fit for some DB usage scenarios.
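A toy model of why never overwriting data gives crash safety: a write copies the current version, changes the copy and appends it, and only a separate commit step makes the new version visible. If the process dies between the two, the last committed version is still intact. This illustrates the idea only and is not how LMDB lays out its file; LMDB's actual mechanism is a copy-on-write B+tree whose commit updates a meta page. The `Store`, `stage`, `commit` and `lookup` names are hypothetical.

```nim
# Toy model of "never overwrite, then publish".
# ASSUMPTION: `pages` plays the role of the data file and `root` the meta
# record saying which version is current; real LMDB works on B+tree pages.
import std/tables

type
  Store = object
    pages: seq[Table[string, string]]  # every version ever written
    root: int                          # index of the last committed version

proc initStore(): Store =
  Store(pages: @[initTable[string, string]()], root: 0)

proc lookup(s: Store, key: string): string =
  ## Readers only ever look at the committed version.
  s.pages[s.root].getOrDefault(key)

proc stage(s: var Store, key, value: string): int =
  ## Copy the committed version, modify the copy, append it.
  ## No existing version is modified.
  var next = s.pages[s.root]   # Table has value semantics: this is a copy
  next[key] = value
  s.pages.add next
  result = s.pages.high

proc commit(s: var Store, version: int) =
  ## The single step that makes a staged version visible.
  s.root = version

var s = initStore()
let v1 = s.stage("foo", "bar")
s.commit(v1)
discard s.stage("foo", "baz")  # simulated crash: staged but never committed
echo s.lookup("foo")           # bar
```

Because the only in-place change is the tiny `root` update, there is never a moment where a reader (or a restarted process) could see a half-written version.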