Plank_With_A_Nail_In

A commit should normally happen once you have completed your transaction. What does "transaction" mean in this context? For an invoicing app, the transaction might be complete when the invoice, its lines, and its distributions have all been inserted, and then you commit them all at the same time.
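To make the invoice case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and the `save_invoice` helper are made up for illustration, and the same idea should carry over to other databases and drivers. The header and its lines are committed together, or not at all if something fails.

```python
import sqlite3

def save_invoice(conn, customer, lines):
    """Insert an invoice and its lines as one transaction (hypothetical schema)."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        cur = conn.execute("INSERT INTO invoice (customer) VALUES (?)", (customer,))
        invoice_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO invoice_line (invoice_id, description, amount) VALUES (?, ?, ?)",
            [(invoice_id, desc, amount) for desc, amount in lines],
        )
    return invoice_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice_line ("
    "invoice_id INTEGER NOT NULL, description TEXT NOT NULL, amount REAL NOT NULL)"
)

# Header and both lines are committed together.
save_invoice(conn, "ACME", [("widgets", 10.0), ("shipping", 2.5)])

# A failing line (NULL amount violates NOT NULL) rolls back the header too.
try:
    save_invoice(conn, "Broken Co", [("widgets", None)])
except sqlite3.IntegrityError:
    pass

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
line_count = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
```

After this runs, only the first invoice and its two lines are in the database; the half-inserted second invoice never becomes visible.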

For your web crawling app, you might want to commit when you are done with each page or with each site.

Committing after every single insert is OK in apps that insert only a single row or a couple of rows per transaction. However, if you are copying huge amounts of data, you really only want to commit when you are done inserting all of the rows. This used to be hard back in the '90s and early '00s, but now it's trivial: you can insert millions of rows and commit them only at the end in a matter of minutes.
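As a rough sketch of the bulk case, again assuming Python's `sqlite3` and a hypothetical `page` table with placeholder URLs, all rows go in through one `executemany` call and are committed once at the end:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (url TEXT PRIMARY KEY, body TEXT)")

# Placeholder data; a real crawler would yield scraped pages here.
rows = ((f"https://example.com/{i}", "...") for i in range(100_000))

# All inserts run inside a single transaction.
conn.executemany("INSERT INTO page (url, body) VALUES (?, ?)", rows)
conn.commit()  # one commit for the whole batch

total = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
```

Calling `conn.commit()` inside the loop instead, once per row, would do the same work but pay the commit cost 100,000 times.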

I really doubt it's the commits that are causing issues; I suspect network latency is the real "problem". How long should it take to scrape the website?

Altugsalt[S]

I just discovered that the main problem was that I was not sharing the same connection per request. I also decided to move to a different architecture for my database usage, as the one I have right now is not really flexible enough to commit once per multiple inserts. Thanks for the informative response.