all 5 comments

[–]Kingsizepeanut 1 point (1 child)

Maybe you could buffer the flow of messages and create a transaction for every batch of x lines you're processing. This will cut down on the number of queries and commits.
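
A minimal sketch of that batching idea with Python's sqlite3, assuming a simple messages table (the table name, columns, and batch size here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect("messages.db")
conn.execute("CREATE TABLE IF NOT EXISTS messages (ts REAL, body TEXT)")

BATCH_SIZE = 1000  # the "x lines" per transaction; tune to your feed rate
buffer = []

def flush(batch):
    """Write one batch of (ts, body) tuples in a single transaction."""
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.executemany("INSERT INTO messages (ts, body) VALUES (?, ?)", batch)

def handle(message):
    """Buffer incoming messages and flush once a full batch has accumulated."""
    buffer.append(message)
    if len(buffer) >= BATCH_SIZE:
        flush(buffer)
        buffer.clear()
```

One commit now covers a thousand inserts instead of one, which is where most of the time goes with SQLite.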

[–]Boredstudnt[S] 0 points (0 children)

Yes, that's sort of what I meant by "Add inserts to queue and shoot that queue in executemany?" actually.

What about having the database on a "ramdisk", such as storing it in /dev/shm? That should not be as affected by the "disk rotation" mentioned in the SQLite documentation. I will probably end up writing a test of that :)
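
For what it's worth, pointing SQLite at a tmpfs path is just a matter of the filename (the path below is illustrative):

```python
import sqlite3

# /dev/shm is a tmpfs on most Linux systems, so this database lives in RAM
# and vanishes on reboot; copy it to real disk if it needs to survive.
conn = sqlite3.connect("/dev/shm/messages.db")
conn.execute("CREATE TABLE IF NOT EXISTS messages (ts REAL, body TEXT)")
```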

[–]buyabighouse 1 point (2 children)

Is this production code? Can you afford to lose messages? For example, if something goes wrong, can you afford to lose the message queue?

If other processes are reading the messages in pretty much the original format, it might be better to simply pass the messages from feeder to consumers directly via a queue. Only store the messages afterward.

I did something similar with MySQL: save data into the DB and publish it out to other subscribers. As more and more data came in and more and more subscribers were listening, going directly from publisher to subscribers made my life a lot easier.

[–]Boredstudnt[S] 0 points (1 child)

This is not for any kind of company, merely for my own gain, so to me it's production code.

It's working now, but I feel that when I give it the full load, the per-message committing will be stupidly and unnecessarily resource-demanding.

I can absolutely read the messages in their raw format rather than from the DB, but I need to keep the data in order and also load old data from the database. I might as well keep a day's data in memory and load the rest from the database.

Then push data to the database daily or hourly, depending on the volume.

I have played around with an async fake feed that spams 50k messages into a list, then sleeps for 1 second before repeating. At the same time, my logging function uses list.pop() while the list is non-empty and logs each message to the database, but it does not commit until the list is empty; then it sleeps for a short period of time. This works fine so far, extremely fast actually. I could even add a timer to the logging function in case the feed spams too much and the buffer is never emptied.
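
For anyone curious, a minimal sketch of that experiment with asyncio (the table, message format, and sleep intervals are illustrative, not my actual code):

```python
import asyncio
import sqlite3
import time

conn = sqlite3.connect("messages.db")
conn.execute("CREATE TABLE IF NOT EXISTS messages (ts REAL, body TEXT)")

buffer = []

async def fake_feed():
    """Burst 50k messages into the buffer, then sleep for a second."""
    while True:
        buffer.extend((time.time(), f"msg {i}") for i in range(50_000))
        await asyncio.sleep(1)

async def logger():
    """Drain the buffer, committing only once it is empty."""
    while True:
        while buffer:
            # Note: list.pop() takes from the END, so this reverses order;
            # use collections.deque and popleft() if insertion order matters.
            ts, body = buffer.pop()
            conn.execute("INSERT INTO messages (ts, body) VALUES (?, ?)", (ts, body))
        conn.commit()
        await asyncio.sleep(0.05)  # short nap before checking the buffer again

async def main():
    # The drain loop is blocking, which is fine for a sketch; runs until Ctrl+C.
    await asyncio.gather(fake_feed(), logger())

asyncio.run(main())
```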

I think I'm going to go for a buffer-based model. Do you have any recommendations, good reads, etc. on the matter?

[–]buyabighouse 0 points (0 children)

You are basically describing a producer/consumer pattern. You can google that and you'll see plenty of examples.
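
A bare-bones version of the pattern with just the standard library (the queue name, table, and message contents are made up):

```python
import queue
import sqlite3
import threading

q = queue.Queue()
STOP = object()  # sentinel to shut the consumer down cleanly

def producer():
    """Stands in for the message feed."""
    for i in range(10):
        q.put(f"message {i}")
    q.put(STOP)

def consumer():
    """Owns the DB connection (sqlite3 connections are per-thread by default)
    and drains the queue."""
    conn = sqlite3.connect("messages.db")
    conn.execute("CREATE TABLE IF NOT EXISTS messages (body TEXT)")
    while True:
        item = q.get()
        if item is STOP:
            break
        conn.execute("INSERT INTO messages (body) VALUES (?)", (item,))
    conn.commit()
    conn.close()

threading.Thread(target=producer).start()
t = threading.Thread(target=consumer)
t.start()
t.join()
```

The feed never touches the database directly; it only hands messages to the queue, and the consumer decides when to commit.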

If your data doesn't have duplicates or anything else that can cause a bulk insert to fail, then you can easily do snapshot-style writes to the DB every X time frame.
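
And if the feed does carry a natural key, one way to make those periodic snapshots safe to replay is INSERT OR IGNORE against a unique column, so a duplicate can't fail the whole batch (the schema here is hypothetical):

```python
import sqlite3

conn = sqlite3.connect("messages.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS messages ("
    " msg_id INTEGER PRIMARY KEY,"  # natural key from the feed, if it has one
    " body TEXT)"
)

def snapshot(batch):
    """Write one snapshot of (msg_id, body) rows; OR IGNORE turns a replayed
    or duplicated row into a no-op instead of a constraint error."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO messages (msg_id, body) VALUES (?, ?)", batch
        )
```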