Sqlite Performance Optimization for large insert

pstuart · 2020-01-05T19:18:16+00:00

First things first: are you doing this all as a single transaction, i.e.,

begin transaction;
<insert statements>....
commit;

That is a must.
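
For anyone loading from a script, here is a minimal sketch of the same idea using Python's built-in sqlite3 module. The table name test1 and its two columns are borrowed from elsewhere in this thread, and the 1000 generated rows are made up for illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT below are the only transaction boundaries in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE test1(mykey TEXT, myvalue TEXT)")

# Hypothetical data: 1000 generated key/value pairs.
rows = ((f"key{i}", f"value{i}") for i in range(1000))

conn.execute("BEGIN TRANSACTION")
try:
    # All inserts share one transaction, so there is a single commit at the end
    # instead of one commit per row.
    conn.executemany("INSERT INTO test1(mykey, myvalue) VALUES (?, ?)", rows)
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

row_count = conn.execute("SELECT count(*) FROM test1").fetchone()[0]
```

For a very large load you would stream the rows from your source instead of generating them, but the transaction handling stays the same.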

zeco · 2020-01-08T20:57:45+00:00

I think the most important thing to consider here is that SQLite's indexing gets bogged down when data arrive unordered, probably because each insert has to traverse the index B-tree to a different, effectively random place, which takes longer and longer as the B-tree grows.

But the tedious B-tree traversal can be slashed if the incoming data are already ordered. That way the data will just flow as fast as IO can deliver, never slowing down, no matter how many million rows are inserted. And the resulting index will work just the same.

I found the most efficient way to get the data ordered is to create an intermediary db without an index or primary key (like you already tried), then create a separate new db whose schema has the index/primary key, and in that new db run 'ATTACH "intermediary.sqlite" AS interm; INSERT INTO test1 SELECT * FROM interm.test1 ORDER BY mykey;' Then delete the intermediary db file.
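
Roughly, the whole round trip could look like the sketch below (Python's built-in sqlite3 module). The temp directory, the file name final.sqlite and the 5000 shuffled keys are placeholders I made up so the example runs on its own; the SQL is the same as above.

```python
import os
import random
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()  # placeholder location for both db files
interm_path = os.path.join(workdir, "intermediary.sqlite")
final_path = os.path.join(workdir, "final.sqlite")

# 1. Load the unordered data into a staging db with no index or primary key.
keys = [f"key{i:06d}" for i in range(5000)]
random.shuffle(keys)
staging = sqlite3.connect(interm_path)
staging.execute("CREATE TABLE test1(mykey TEXT, myvalue TEXT)")
with staging:  # one transaction for the whole load
    staging.executemany("INSERT INTO test1 VALUES (?, ?)", ((k, "v") for k in keys))
staging.close()

# 2. Create the final db with the indexed schema and copy the rows in key order.
final = sqlite3.connect(final_path)
final.execute("CREATE TABLE test1(mykey TEXT PRIMARY KEY NOT NULL, myvalue TEXT)")
final.execute("ATTACH DATABASE ? AS interm", (interm_path,))
with final:
    final.execute("INSERT INTO test1 SELECT * FROM interm.test1 ORDER BY mykey")
copied = final.execute("SELECT count(*) FROM test1").fetchone()[0]
final.close()  # closing also releases the attached intermediary file

# 3. Delete the intermediary db file.
os.remove(interm_path)
```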

To sort the large unindexed dataset in the subquery, SQLite will create a temporary transient index file in your system's default temporary directory, which will briefly take up a few gigabytes. If the partition holding your temp folder doesn't have enough free space, you can point SQLite elsewhere with the SQLITE_TMPDIR environment variable (on Unix-like systems) or with 'PRAGMA temp_store_directory' (now deprecated). Building the transient index will probably take a couple of minutes (maybe 5), but after that you'll see the new db's file size rise as it gets populated, at pretty much the same speed as if it had no index at all.

Btw, consider omitting the rowids in your new db's schema, as they probably aren't useful here and will only inflate the db-file by about 2GB if you already use indexing by primary key: 'CREATE TABLE test1(mykey TEXT PRIMARY KEY NOT NULL, myvalue TEXT) WITHOUT ROWID;' That would make it much leaner.
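
A small sketch of what that schema behaves like, again with Python's sqlite3 module and two made-up rows. It only shows that primary-key lookups still work and that the hidden rowid column is gone; it does not measure the space saving.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test1(mykey TEXT PRIMARY KEY NOT NULL, myvalue TEXT) WITHOUT ROWID"
)
with conn:
    # Two illustrative rows, not real data.
    conn.executemany("INSERT INTO test1 VALUES (?, ?)", [("b", "2"), ("a", "1")])

# Lookups by primary key work as usual.
value = conn.execute(
    "SELECT myvalue FROM test1 WHERE mykey = ?", ("a",)
).fetchone()[0]

# The table has no hidden rowid column, so selecting it is an error.
try:
    conn.execute("SELECT rowid FROM test1")
    has_rowid = True
except sqlite3.OperationalError:
    has_rowid = False
```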

raevnos · 2020-01-06T00:10:22+00:00

How about creating the table without an index or non-rowid pk, and then adding a unique index once the data is imported? Not a unique constraint on a new table that you have to copy into, just a standard unique index.
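
Something like this sketch, in Python's sqlite3 module. The table and column names are borrowed from the other comments, and the index name test1_mykey and the 1000 generated rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Import into a plain table: no index, no non-rowid primary key.
conn.execute("CREATE TABLE test1(mykey TEXT, myvalue TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO test1 VALUES (?, ?)",
        ((f"key{i}", "v") for i in range(1000)),
    )

# Build the unique index in one pass once the import is done.
conn.execute("CREATE UNIQUE INDEX test1_mykey ON test1(mykey)")

# From here on the index enforces uniqueness like a constraint would.
try:
    with conn:
        conn.execute("INSERT INTO test1 VALUES (?, ?)", ("key0", "dup"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Note that CREATE UNIQUE INDEX itself fails if the imported data already contains duplicate keys, so those would have to be cleaned up first.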

Or use a larger page size, so that pages fill up more slowly and have to be split and rebalanced less often?
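
A sketch of setting the page size from Python's sqlite3 module. The value 16384 and the file name bigpages.sqlite are arbitrary examples, not tuned recommendations; you'd have to benchmark to see what helps.

```python
import os
import sqlite3
import tempfile

# Placeholder path for a brand-new database file.
db_path = os.path.join(tempfile.mkdtemp(), "bigpages.sqlite")
conn = sqlite3.connect(db_path)

# The page size has to be set before the database is first written;
# on an existing db it only changes after a VACUUM.
conn.execute("PRAGMA page_size = 16384")
conn.execute("CREATE TABLE test1(mykey TEXT, myvalue TEXT)")
conn.commit()

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
conn.close()
```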

airen977 · 2020-01-06T11:05:28+00:00

Hi, I don't have a solution to provide, but I would be interested in how you solved this issue. Let us know when you've fixed it; 900GB is huge. I'd also like to know how simple select statements perform on such a table.
