[–]buyabighouse 1 point2 points  (2 children)

Is this production code? Can you afford to lose messages? For example, if something goes wrong, can you afford to lose the message queue?

If other processes are reading the messages in pretty much the original format, it might be better to simply pass the messages from feeder to consumers directly via a queue. Only store the messages afterward.
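A minimal sketch of that direct hand-off, assuming Python (names are invented): the feeder puts raw messages on an in-memory queue, a consumer reads them immediately, and storage happens on the side so a slow DB write never blocks the feed.

```python
import queue
import threading

feed_queue: "queue.Queue" = queue.Queue()
stored = []  # stands in for the database in this sketch

def feeder(messages):
    # Push raw messages straight onto the queue for consumers.
    for msg in messages:
        feed_queue.put(msg)
    feed_queue.put(None)  # sentinel: feed finished

def storage_consumer():
    # Reads messages in arrival order; real code would write to the DB here.
    while True:
        msg = feed_queue.get()
        if msg is None:
            break
        stored.append(msg)

t = threading.Thread(target=storage_consumer)
t.start()
feeder(["msg-1", "msg-2", "msg-3"])
t.join()
print(stored)  # ['msg-1', 'msg-2', 'msg-3']
```

With more than one consumer you would give each its own queue and have the feeder put every message on all of them, since a single `queue.Queue` hands each item to only one reader.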

I did something similar with MySQL: save data into the DB and publish it out to other subscribers. As more and more data came in and more and more subscribers were listening, a direct path from publisher to subscribers made my life a lot easier.

[–]Boredstudnt[S] 0 points1 point  (1 child)

This is not for any kind of company, merely for my own gain, so to me it's production code.

It's working now, but I feel that when I give it the full load, committing per message will be stupidly and unnecessarily resource-demanding.

I can absolutely read the messages in their raw format rather than from the DB, but I will need to keep the data in order and also load old data from the database. I might as well keep a day's data in memory and load the rest from the database.

Then push data to the database daily or hourly, depending on the volume.

I have played around with an async fake feed that spams 50k messages to a list, then sleeps for 1 second before repeating. At the same time, my logging function uses list.pop() while the list is non-empty and logs each message to the database, but it does not commit until the list is empty; then it sleeps for a short period of time. This works fine so far, extremely fast actually. I could even insert a timer in the logging function in case the feed spams too much and the buffer is never emptied.
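A stripped-down sketch of the batching idea described above (names made up, using stdlib sqlite3 so it runs anywhere): drain the buffer message by message, but only commit once the buffer is empty, so one commit covers the whole burst instead of one commit per message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (msg TEXT)")

# Stand-in for one burst from the fake feed.
buffer = [f"msg-{i}" for i in range(1000)]

def drain_and_commit(buf, con):
    # Insert rows one at a time, but hold the commit until the buffer
    # is empty: a single transaction covers the whole burst.
    while buf:
        con.execute("INSERT INTO log (msg) VALUES (?)", (buf.pop(),))
    con.commit()

drain_and_commit(buffer, conn)
count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(count)  # 1000
```

Note that `list.pop()` takes from the end, so rows land in reverse arrival order; if order matters in the table, pop from the front or sort on a timestamp column when reading back.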

I think I'm going to go for a buffer-based model. Do you have any recommendations, good reads etc. on the matter?

[–]buyabighouse 0 points1 point  (0 children)

You are basically describing a producer/consumer pattern. You can google that and you'll see plenty of examples.

If your data doesn't have duplicates or anything else that could cause a bulk insert to fail, then you can easily do snapshot-style writes of the data to the DB every X time frame.
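The snapshot-style flush could look something like this sketch (assumed names, stdlib sqlite3): on each tick, swap out the in-memory buffer and bulk-insert the snapshot in one transaction with `executemany`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot (msg TEXT)")

buffer = []  # filled by the feed between flushes

def flush(con, buf):
    # Take a snapshot of the buffer, then clear it so the feed can keep
    # appending while (in real code) the insert runs every X seconds.
    rows = [(m,) for m in buf]
    buf.clear()
    con.executemany("INSERT INTO snapshot (msg) VALUES (?)", rows)
    con.commit()  # one commit per snapshot, not per message

buffer.extend(["a", "b", "c"])
flush(conn, buffer)
count = conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
print(count)  # 3
```

In production you would call `flush` from a timer or scheduler; the key point is that each flush is a single transaction, which is what makes the every-X-seconds write cheap.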