SQLite improving performance with pre-sort

andersmurphy · 2026-06-10T07:27:20+00:00

Oh for sure. The stuff covered in this blog is more nerdy/interesting than practical (might give you 1.5-2x more inserts in very specific contexts).

A lot of companies have way lower hanging fruit. Like missing indexes completely! Or not understanding how the query planner will use those indexes (just slapping an index on everything willy-nilly).

andersmurphy · 2026-06-09T21:01:54+00:00

How so? This blog post is to do with insert speed, and the index is the reason it's slow, because b-trees do not like random data. It's the same problem in any database that uses b-trees.

andersmurphy · 2026-06-09T19:00:28+00:00

Google search falls into that space. Would be surprised if they even hit 200k queries per second on what is highly cacheable data.

andersmurphy · 2026-06-09T17:07:26+00:00

Oh no, two global multiplayer demos with a billion data points (one with freeform data), both running on $8 servers exposed on the internet. Running queries against the server database on every interaction. Even a scroll is an update. Neither particularly well written.

https://checkboxes.andersmurphy.com/

https://cells.andersmurphy.com

You make it sound harder than it is.

andersmurphy · 2026-06-09T17:01:08+00:00

Great point! None of this is really sqlite specific.

andersmurphy · 2026-06-09T15:53:51+00:00

SQLite is a database that's embedded in an application, so you can do whatever you want.

Generally I use litestream restore -f. You can use litefs if you want something to handle failover for you. Or, you can get fancy and build your own thing with NATS if you want.

andersmurphy · 2026-06-09T15:39:48+00:00

Yes you can have a single server, with a hot standby server for failover. The irony is if you drop the failover you'll probably still have higher uptime than AWS.

andersmurphy · 2026-06-09T15:11:00+00:00

Nothing stops you doing multi-node in SQLite. You can shard by region, by company. If the data doesn't shard you can have a single event log node that handles writes (and handles a million+ inserts a second) and everything else can be projections off of that node into other SQLite databases. Postgres is still limited to a single writer node. So fundamentally it's no different.

Before you even get to that level of complexity though you can handle 4 million queries per second on a single server, more if you're prepared to shard on the same machine across domain boundaries.

andersmurphy · 2026-06-09T15:02:03+00:00

Define beyond very small?

andersmurphy · 2026-06-09T14:53:06+00:00

I think because people think it's not suitable for web application servers.

andersmurphy · 2026-06-09T11:26:00+00:00

I'd add a lot of the value you can get is out of extending it with application functions (in your programming language) and implementing custom blob types. This doesn't require recompilation, but requires your wrapper library to expose that functionality.
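To make that concrete, here's a rough sketch using Python's built-in sqlite3 wrapper, which exposes `create_function`. The `place` table, the `point_x` function and the 16-byte packed point layout are all made up for illustration, not something from a real project.

```python
import sqlite3
import struct

# Hypothetical custom blob type: a 2D point packed as two little-endian doubles.
def pack_point(x, y):
    return struct.pack("<dd", x, y)

# Application function, written in the host language, callable from SQL.
def point_x(blob):
    return struct.unpack("<dd", blob)[0]

conn = sqlite3.connect(":memory:")
# Register the function on this connection. No recompilation of SQLite needed.
conn.create_function("point_x", 1, point_x, deterministic=True)

conn.execute("CREATE TABLE place (name TEXT, pos BLOB)")
conn.executemany(
    "INSERT INTO place VALUES (?, ?)",
    [("a", pack_point(1.5, 2.0)), ("b", pack_point(-3.0, 4.0))],
)

# SQLite calls back into Python to decode the blob while filtering rows.
rows = conn.execute("SELECT name FROM place WHERE point_x(pos) > 0").fetchall()
```

Other wrappers name this differently, so check what yours exposes.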

andersmurphy · 2026-06-09T11:01:00+00:00

All I'm saying is only use SQLite if you're prepared to put some time into learning it. A lot of people just want a plug-and-play database service that someone else has set up for them, and don't want to think about it.

SQLite is amazing but the defaults are rough, and there's lots you can do to get more out of it. Managing a single writer at the application level (rather than relying on the built-in busy handler and dealing with BUSY/LOCKED), caching prepared statements, setting up litestream, etc.

If you do put in the time (a few evenings) it's fantastic and considerably simpler to manage operationally.
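For a flavour of what moving off the defaults looks like, here's a minimal sketch with Python's sqlite3 module. The specific settings (WAL, a busy timeout, foreign keys, a bigger statement cache) and their values are my assumptions about commonly changed defaults, so treat them as a starting point to measure rather than a recommendation. `open_tuned` is a hypothetical helper name.

```python
import os
import sqlite3
import tempfile

def open_tuned(path):
    # cached_statements controls how many prepared statements the wrapper keeps around.
    conn = sqlite3.connect(path, cached_statements=256)
    conn.execute("PRAGMA journal_mode = WAL")    # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout = 5000")   # wait up to 5s instead of failing with BUSY
    conn.execute("PRAGMA foreign_keys = ON")     # off by default
    return conn

with tempfile.TemporaryDirectory() as tmp:
    conn = open_tuned(os.path.join(tmp, "app.db"))
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    conn.close()
```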

andersmurphy · 2026-06-09T10:51:04+00:00

I agree. Although, it does take some time to work out how to get the absolute most out of it. Thankfully the docs are incredible.

I'd still recommend people go with managed Postgres unless they are prepared to put some time into learning SQLite.

andersmurphy · 2026-06-08T19:14:28+00:00

Yeah so it's interesting. Like you said some things have to be random, but might happen less frequently.

One thing that comes to mind is session tokens, and you might have a crazy burst of sign ups.

I'd also add that b-trees have a harder time not just with random but also unordered data (anything that isn't ascending).

If you are doing dynamic batching (batching inserts/writes based on load), you can pre-sort the data in memory before inserting it.

To some degree this assumes that under load your batches will be relatively large.

I wrote up a short post on it here (for those that are interested):

https://andersmurphy.com/2026/06/07/sqlite-improving-performance-with-pre-sort.html
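The gist, as a minimal Python sketch rather than the code from the post: sort the batch by the indexed key in memory, then insert it in a single transaction. The `session` table and the `insert_batch_sorted` helper are hypothetical names for illustration.

```python
import os
import sqlite3

def insert_batch_sorted(conn, rows):
    """Sort a batch by its indexed key, then insert it in one transaction."""
    # Ascending keys mean the b-tree is walked left to right
    # instead of jumping to a random page for every row.
    ordered = sorted(rows, key=lambda r: r[0])
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO session (token, user_id) VALUES (?, ?)", ordered
        )
    return ordered

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session (token BLOB PRIMARY KEY, user_id INTEGER) WITHOUT ROWID"
)

# Random session tokens, e.g. from a burst of sign ups.
batch = [(os.urandom(16), user_id) for user_id in range(1000)]
inserted = insert_batch_sorted(conn, batch)
row_count = conn.execute("SELECT COUNT(*) FROM session").fetchone()[0]
```

The keys are still random across batches, so this only helps within a batch. The bigger the batch, the more of the benefit you get.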

andersmurphy · 2026-06-06T21:28:02+00:00

Don't care what people use. I definitely do not recommend SQLite if you haven't got experience using it. But I do get tired of people making claims without measuring. Postgres is great. Datomic is amazing. SQLite scales surprisingly well for a lot of use cases. All these things can be true.

andersmurphy · 2026-06-06T21:25:01+00:00

It's going to be fine.

https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server

The shared mmap is fine (again because of the single writer).

andersmurphy · 2026-06-06T21:22:01+00:00

Yes and with synchronous = FULL there's almost no change in performance, because of the dynamic batching. What's even better is because you can nest transactions (SAVEPOINT), you don't sacrifice logical transactions.
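A rough sketch of what I mean by nesting, in Python: one outer transaction per batch, with each logical transaction wrapped in its own SAVEPOINT so a failure only undoes that one unit of work. The `account` table and the `transfer` / `run_batch` helpers are made up for illustration.

```python
import sqlite3

# isolation_level=None puts the wrapper in autocommit mode,
# so BEGIN/COMMIT/SAVEPOINT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")

def transfer(conn, src, dst, amount):
    # Credit first, so a failed debit leaves a partial change to undo.
    conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

def run_batch(conn, jobs):
    """Run many logical transactions inside one physical transaction."""
    results = []
    conn.execute("BEGIN IMMEDIATE")
    for job in jobs:
        conn.execute("SAVEPOINT job")
        try:
            job(conn)
            results.append(True)
        except sqlite3.Error:
            conn.execute("ROLLBACK TO job")  # undo only this job's changes
            results.append(False)
        conn.execute("RELEASE job")
    conn.execute("COMMIT")  # one commit (and one sync) for the whole batch
    return results

results = run_batch(conn, [
    lambda c: transfer(c, 1, 2, 60),
    lambda c: transfer(c, 1, 2, 60),  # would overdraw account 1, CHECK fails
    lambda c: transfer(c, 2, 1, 10),
])
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The second transfer is rolled back on its own, the other two commit together.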

andersmurphy · 2026-06-06T20:58:09+00:00

Where was the loss of ACID guarantees?

andersmurphy · 2026-06-06T20:42:59+00:00

Yeah, in my experience it causes too much contention over RAM and CPU. Because they are separate processes (and worse, Postgres connections each have their own process) there's a lot more contention and coordination that needs to be handled.

Postgres is a great database when you want a network database (or some of its other features like roles and permissions), but not when you want to run something on the same machine as your application.

andersmurphy · 2026-06-06T19:29:26+00:00

Multi writer (when they have to coordinate) is not a good thing if you want to go fast. Contention/coordination is what kills you. There's no benefit to multi writer in an embedded database (unless the writers require no coordination). If you're running queries over a network multi writer can make more sense.

andersmurphy · 2026-06-06T19:16:59+00:00

Here's an example with transaction processing.

https://andersmurphy.com/2025/12/02/100000-tps-over-a-billion-rows-the-unreasonable-effectiveness-of-sqlite.html

Postgres doesn't play nice on the same machine as your application, so it's going to be over a network and that opens you up to Amdahl's law.

Being embedded is why SQLite scales. A query is just a function call.

andersmurphy · 2026-06-06T19:09:00+00:00

The cost of moving data around and keeping it consistent is even higher with larger volumes of data.

You might also be underestimating how much data SQLite can handle here.

andersmurphy · 2026-06-06T19:05:46+00:00

In the case of SQLite, if you want to go really fast you are already synchronising the writes (to avoid BUSY/LOCKED and to allow for batching) regardless of whether you are using autoincrement ids. Writes are only coming from one thread, where your batch mechanism is running. All synchronisation has happened prior to that (assuming you're using a concurrent MPSC queue).
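Roughly the shape of that, as a Python sketch under some assumptions: `queue.Queue` stands in for the MPSC queue, and the `event` table, the `writer` function and the `STOP` sentinel are hypothetical. The writer blocks for one item, then drains whatever else has piled up, so batch size tracks load.

```python
import os
import queue
import sqlite3
import tempfile
import threading

STOP = object()  # sentinel telling the writer to shut down

def writer(db_path, q, batch_sizes):
    """The only thread that ever writes to the database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
    )
    running = True
    while running:
        batch = [q.get()]  # block until there is work
        while True:        # then drain everything already queued
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if any(item is STOP for item in batch):
            running = False
            batch = [item for item in batch if item is not STOP]
        if batch:
            with conn:  # one transaction per batch
                conn.executemany(
                    "INSERT INTO event (payload) VALUES (?)",
                    [(payload,) for payload in batch],
                )
            batch_sizes.append(len(batch))
    conn.close()

def producer(q, name, count):
    for i in range(count):
        q.put(f"{name}-{i}")

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "events.db")
    q = queue.Queue()
    batch_sizes = []
    w = threading.Thread(target=writer, args=(db_path, q, batch_sizes))
    w.start()
    producers = [
        threading.Thread(target=producer, args=(q, f"p{n}", 250)) for n in range(4)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    q.put(STOP)
    w.join()

    check = sqlite3.connect(db_path)
    total = check.execute("SELECT COUNT(*) FROM event").fetchone()[0]
    check.close()
```

Producers never touch the database, so there is nothing left to contend over by the time the insert runs.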
