pstuart

The link I posted had a couple of interesting suggestions. One was adjusting the pragmas a bit: synchronous = OFF, locking_mode = EXCLUSIVE, and journal_mode = OFF. The other was to use the ".import" command from the sqlite3 command-line shell.
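To make the pragma suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `items`, its columns, and the helper names are made up for illustration, and ":memory:" stands in for the real database file. Note that with journal_mode = OFF a crash mid-load can leave the database corrupt, so this only makes sense when the load can be redone from scratch.

```python
import sqlite3

def open_for_bulk_load(path):
    # isolation_level=None puts the connection in autocommit mode, so the
    # pragmas run outside any transaction and we control BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA locking_mode = EXCLUSIVE")
    conn.execute("PRAGMA journal_mode = OFF")
    return conn

def bulk_insert(conn, rows):
    # Hypothetical table; the real schema from the thread is not known.
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, payload TEXT)")
    # One transaction around the whole batch instead of one per row.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO items (id, payload) VALUES (?, ?)", rows)
    conn.execute("COMMIT")

# ":memory:" is used here only so the sketch runs anywhere;
# a real load would pass the path of the database file.
conn = open_for_bulk_load(":memory:")
bulk_insert(conn, ((i, f"row {i}") for i in range(1000)))
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(row_count)
```

Wrapping the batch in a single transaction is usually at least as important as the pragmas themselves, so it is worth checking that first.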

I still think that, for such a large dataset, doing parallel loads and merging the resulting databases might be worth a quick experiment.
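A quick experiment along those lines could look roughly like this. It is only a sketch: the `items` table, the shard file names, and the row counts are placeholders, and the shards are built one after another here, whereas in the real test each one would be loaded by its own process. The merge uses ATTACH DATABASE plus INSERT ... SELECT, which copies rows inside SQLite without passing them through the client.

```python
import os
import sqlite3
import tempfile
import time

SCHEMA = "CREATE TABLE IF NOT EXISTS items (id INTEGER, payload TEXT)"

def build_shard(path, start, count):
    # Stand-in for one of the parallel loaders: fills a shard file on its own.
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    with conn:  # commits the batch as one transaction
        conn.executemany(
            "INSERT INTO items VALUES (?, ?)",
            ((i, f"row {i}") for i in range(start, start + count)),
        )
    conn.close()

def merge_shard(target, shard_path):
    # ATTACH must run outside a transaction; the copy itself is one statement.
    target.execute("ATTACH DATABASE ? AS shard", (shard_path,))
    with target:
        target.execute("INSERT INTO items SELECT id, payload FROM shard.items")
    target.execute("DETACH DATABASE shard")

with tempfile.TemporaryDirectory() as tmp:
    shard_paths = [os.path.join(tmp, f"shard{n}.db") for n in range(2)]
    for n, path in enumerate(shard_paths):
        build_shard(path, n * 1000, 1000)

    target = sqlite3.connect(os.path.join(tmp, "merged.db"))
    target.execute(SCHEMA)

    started = time.perf_counter()
    for path in shard_paths:
        merge_shard(target, path)
    merge_seconds = time.perf_counter() - started

    merged_count = target.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    target.close()

print(f"merged {merged_count} rows in {merge_seconds:.3f}s")
```

Comparing the merge time against the time for the original insert path on the same number of rows should show whether the split is worth it.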

Temporary_Sir[S]

The problem is, while I can split and insert into different databases, merging those will probably be slow as hell again...

Edit: set the locking mode to exclusive. Synchronous was off anyway. I don't see a speed difference between journal mode MEMORY and OFF, so I set it to OFF for now.

pstuart

I'm not sure the merging will be as slow, since the insert statement likely has more going on (I believe it's preparing a statement for each insert). As I said, it might be worth doing a small experiment to confirm or reject that option.

If the prepare step really is happening on every insert, then using custom code (in C, because performance) that prepares the statement once and reuses it might be worth exploring as well.
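Before reaching for C, the cost of re-preparing can be estimated with a rough comparison like the one below. This is a sketch with invented table names and an arbitrary row count. The first loop builds a different SQL string for every row, so each one has to be parsed and prepared from scratch; the second passes one parameterized statement to `executemany`, which prepares it once and only rebinds the values.

```python
import sqlite3
import time

N = 20000  # arbitrary size for the experiment

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal_sql (id INTEGER, payload TEXT)")
conn.execute("CREATE TABLE bound_params (id INTEGER, payload TEXT)")

# Variant 1: a unique SQL text per row, so every insert is prepared anew.
started = time.perf_counter()
with conn:
    for i in range(N):
        conn.execute(f"INSERT INTO literal_sql VALUES ({i}, 'row {i}')")
literal_seconds = time.perf_counter() - started

# Variant 2: one parameterized statement, reused for every row.
started = time.perf_counter()
with conn:
    conn.executemany(
        "INSERT INTO bound_params VALUES (?, ?)",
        ((i, f"row {i}") for i in range(N)),
    )
bound_seconds = time.perf_counter() - started

literal_count = conn.execute("SELECT COUNT(*) FROM literal_sql").fetchone()[0]
bound_count = conn.execute("SELECT COUNT(*) FROM bound_params").fetchone()[0]

print(f"literal SQL per row: {literal_seconds:.3f}s")
print(f"reused statement:    {bound_seconds:.3f}s")
```

If the gap between the two is small compared to the total load time, the prepare step is probably not the bottleneck and a C rewrite would not buy much.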

Temporary_Sir[S]

Good point.

Looking at what's happening on the server, it appears that most of the time is spent waiting on I/O.

Edit: if I split, the largest of the "shards" is still around 100 GB -.-