nimforum mirror - Why is db

rishavs (orginal) [2022-08-15T22:41:06+02:00] view original

Hey folks

I want to use Nim for server side development but the slow speed of db_postgres is making me hesitant. in the Techempower benchmarks, the db queries for nim stdlib have been benchmarked as being extremely slow.

I am opening this forum post to discuss why the postgres driver is so slow and what can we do to improve it. I think having a performant db driver will go a long way in making great for webdev as well.

The two immediately likely candidates that come to mind are that the db_postgres doesn't supports async queries and that the libpq it uses is old. Are these assumptions correct. Are there any other reasons?

treeform (orginal) [2022-08-15T23:07:46+02:00] view original

I can't find Nim's db_postgres benchmarks on Techempower. Do you have a link?

It uses the libpq nearly directly it should be roughly same speed as C. Is there a pure C libpq benchmark? Is there a faster library than libpq that nim could use?

Postgres itself is not that great with async. In my library I have an async pool of 10 connections: https://github.com/treeform/pg - the async does not really improve performance but does provide comfort when using from other async code.

Nim-Postgres layer speed is never been my bottleneck its always just internal postgres query time.

jasonfi (orginal) [2022-08-16T06:23:11+02:00] view original

You need to look at how the benchmark was implemented, not all are equal, but sometimes this is because of a technical problem with the library.

Relatedly, I've just found that a race condition occurs when multiple threads try to open a DB connection with db_postgres. While a lock fixes this, every dev will the same problem and need to implement a lock when they realize it's needed, so the lock really needs to be in db_postgres and possibly the other stdlib DB libraries.

I filed this issue yesterday: https://github.com/nim-lang/Nim/issues/20231

Polarian (orginal) [2022-08-16T07:44:02+02:00] view original

Are you using postgres on the same server, or is it over a network. If it is over the network it could be that there is considerable latency, especially if you are accessing a postgres server over the internet (WAN) instead over LAN.

The older libpg should not be a massive issue, but it depends solely on how old the version used is, and also whether there was any major issues patched in the newer versions, or if it was just small fixes.

You have to bare in mind, although postgres is fast compared to other SQL servers, it still is bottlenecked by the complexity of sql, and it also depends on the complexity of the tables, if the tables within the database are very complex there is additional overhead.

If you are writing large amounts of data to and from a postgres server, I highly recommend using a cache, either storing the data on memory and using a thread which syncs it with the database every x period (x being a set period of your desire, more often == more data stability, in case of sudden shutdown, but higher overhead). Furthermore, if you are caching in memory all the data you need, you should be able to easily use a separate thread for syncing the data with the persistent database.

Or you could use @treeform idea of using async, and use a postgres pool, and each time you want to modify the data, you modify the data in memory and then pass this data into a async task which syncs it with the database, this means that the data can be accessed quickly because it is on memory, and also has the stability of a persistent database which is updated immediately, the only downside is that having a pool and async tasks does have overhead, but being lightweight and speed often go against one another, lightweight implies a low memory fingerprint, while speed uses memory to reduce the latency of fetching data, more you cache in memory the less latency you have fetching from disk, or even worse, over the network!

TLDR: The speed of postgres depends on the complexity of the database and also where the server is running, other factors can affect the speed of the database operations. If you need fast data access and modification, using a persistent database directly is slow, you should use a cache in memory to do so.

Hope this helps :)

rishavs (orginal) [2022-08-16T08:53:19+02:00] view original

The benchmark numbers are here; https://www.techempower.com/benchmarks/#section=test&runid=ecfbe3c9-f577-4836-9403-41f9120dfc6e&l=zdk8an-1kn

nim stdlib is showing 0.6% of the top framework in Rust. Now while I don't expect nim to come in the above 60% group as most of those frameworks abuse db pipelining, I was expecting Nim to fall in the general area around crystal, kotlin etc - at about 20%+ of the fastest frameworks out there. 0.6% means there is some serious problem with the db driver. If you look at the other benchmarks, which are not about db connections, Nim ranks fairly high.

Here's the link to the code. https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Nim/nim-stdlib/src/handlers.nim

I had actually written this PR. I like Nim and really wanted it to work for me in webdev, but the benchmark numbers are way too low.

@treeform, while db_postgres uses libpq internally, I am not sure it uses async queries properly - which is a huge speed loss. WHile your library wraps db_postgres in async, I am not sure if there is much improvement in perf if db_postgres internally doesn't use the async query apis. https://www.postgresql.org/docs/current/libpq-async.html

@polarian, thanks! but the benchmarks use extremely simple data insertion and queries and the same db data apply to all benchmark targets.

dom96 (orginal) [2022-08-16T12:24:01+02:00] view original

The bottleneck here is likely that db_postgres blocks. I would give @treeform's async library a try.

On the other hand, even though Nim stdlib is 0.6% of the top Rust framework that's still nearly 4k QPS. If you want to help market Nim then look into optimising this, but if you want to just get shit done then 4k QPS is going to be more than enough. You'll run into other bottlenecks before the db driver becomes a problem (though for long-running queries you will need async so that a single request doesn't block everything).

The only reason Nim is near the top is because myself (and others) spent significant time making httpbeast fast. Anyone can do the same for any language. So these benchmarks are largely meaningless.

rishavs (orginal) [2022-08-16T14:33:47+02:00] view original

The bottleneck here is likely that db_postgres blocks. I would give @treeform's async library a try. I don't think it will help considering that @treeform's lib wraps db_postgres itself, instead of libpq
So these benchmarks are largely meaningless when it comes to how fast a language is, it's all about the particular libraries.

Agreed. That why I started this discussion on db_postgres, instead of Nim in general - as any improvement that we do, will have to be done in the libpq wrapper.

If you want to help market Nim then look into optimising this,

Yep. I am hoping to improve the webdev story for Nim. I myself am not blocked on this.

juancarlospaco (orginal) [2022-08-16T17:14:52+02:00] view original

https://nim-lang.github.io/Nim/db_postgres.html#unix-socket

anusree (orginal) [2022-08-16T17:57:57+02:00] view original

PostgreSQL attempts to do a lot of its work in memory, and spread out writing to disk to minimize bottlenecks, but on an overloaded system with heavy writing, it's easily possible to see heavy reads and writes cause the whole system to slow as it catches up on the demands.

Polarian (orginal) [2022-08-18T15:13:31+02:00] view original

You are free to write a postgres client fully in nim, the protocol is open, its probably a lot easier to have a full nim varient, instead of wrapping a library. It means you have more control instead of just adding nim bindings to a library, and you can design the infrastructure yourself (for example to make it fully async based, which will help with the blocking problem)

anusree (orginal) [2023-02-16T16:47:31+01:00] view original

The key problem seems to be IOPS - it's incredibly meager and pegged directly to storage size and unless you increase your storage size to terabytes and spend.

<a href="https://anonigstalk.com/">anonigviewer</a> <a href="https://bingenerator.one/">bingenerator</a>

alexeypetrushin (orginal) [2023-02-17T11:44:45+01:00] view original

A bit of critique... there somtimes questions arise why nim is not popular. This topics explains why.

Working with DB is very frequent task. And what Nim does? It bocks on network calls. This is not just slow, the "normal" CPU bounded slowness, this slownes is "wild" as its network bounded.

And the suggestion to write db driver themselves is even better, I imagine everyone deciding to adopt new language in a business would love this task :).

Also asynk, az I mentioned in critique async, you cant do it half way. You ether shall go full async with all 100% IO async only, or dont use async at all. Otherwise there going to be all sort of unpredictable blockings, and things like "but you cant use it for long running queried" etc.

alexeypetrushin (orginal) [2023-02-17T11:50:08+01:00] view original

And for many businesses, the software is just a smart router for IO data streams. Talk to one db, enrich it with data from another, fetch cashe, check access right in auth service, etc.

So reliable and simple IO is critical.

Nim missed that market completelly. As its IO is immature has lots of bugs and surprises and very hard to work with.

coffeepot (orginal) [2023-02-17T19:12:47+01:00] view original

I expect a not insignificant slowdown with the stdlib's db_* modules is because they don't use parameter binding and instead use string interpolation all the way down.

I wrote a lib ages ago (https://github.com/coffeepots/odbc) that uses proper parameter binding and was written with performance in mind, allocating only when necessary and using machine equivalent types instead of strings for everything. Back then, I benched it against Delphi's production ready ODBC driver and it was faster, and a lot faster than Nim's stdlib implementations.

I expect proper parameter binding would improve the stdlib performance significantly for the following reasons:

bound parameters can be changed without retokenising and preparing the query each time - this often means a round trip to the database over the network, recreating execution plans, and wrecking the database's plan caching mechanisms,

parameters are typed instead of just text which saves the database having to spend time translating this for you, helps planning efficient queries, and drivers can sanity check things client side,

string processing in databases is extremely complex with layers of translation happening based on your client collation settings and server collation settings and database collation settings as well. Then there's the complexities of internationalisation and values (1,000 vs 1.000, etc.),

there's no translation needed from strings to 'real' data types by the client when receiving data from queries,

no copying and heap churn from concatenating strings (especially for large data values),

less bandwidth sending native data.

Using strings to bind parameters is also not recommended from a security perspective for obvious reasons (e.g., little bobby tables). Personally I wouldn't even consider using string interpolation for query parameters in a production environment, and certainly not if your parameters are from user input.

Somewhere I had an experimental patch for stdlib to use parameters properly but unfortunately I didn't get around to submitting at the time and things have probably changed too much to use it now. It might be worth digging that out in light of the above.

Mirror of forum.nim-lang.org

9374 :: Why is db_postgres so slow?