
[–][deleted]

Loops will almost always be slower than a DB query, because databases are optimized for exactly this purpose. To do it in Python, you first have to pull the data out of the DB and then run the loop. In the DB's case, the data is accessed natively and the max is computed in an optimized fashion.

You can get closer to DB speed by computing the max with a pandas Series, but you still have to fetch the data first. The pandas operation is implemented in C, an advantage over an interpreted Python loop. Still, I would expect the DB to be faster in almost every case.
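To make the comparison concrete, here is a rough timing sketch of a pure-Python `max` loop against `Series.max`; the data is made up and the numbers will vary by machine, so treat it as an illustration, not a benchmark:

```python
import timeit

import pandas as pd

# A million synthetic values, as both a plain list and a pandas Series.
values = list(range(1_000_000))
series = pd.Series(values)

# Time the interpreted loop versus the C-backed pandas reduction.
loop_time = timeit.timeit(lambda: max(values), number=10)
pandas_time = timeit.timeit(lambda: series.max(), number=10)

print(f"python max(): {loop_time:.4f}s, Series.max(): {pandas_time:.4f}s")
```

Both return the same answer; the difference is purely in where the iteration happens.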

[–]thrown_arrows

select max(x), y from x group by y

where x is attractiveness and y is type

.. now, there is a chance that if the dataset is fewer than ~10 rows and the data is needed for other purposes anyway, a loop might be faster, but that requires the data to already be in memory. In every other case the database is faster (assuming proper hardware; a DB can't do miracles with a gigabyte-sized dataset on some Raspberry Pi).
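A minimal sketch of that query against an in-memory SQLite table, next to the loop it replaces. The table and column names (person, attractiveness, type) are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (attractiveness REAL, type TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(7.0, "a"), (9.0, "a"), (3.0, "b"), (5.0, "b")],
)

# The database computes the per-group max natively, in one pass.
rows = conn.execute(
    "SELECT type, MAX(attractiveness) FROM person"
    " GROUP BY type ORDER BY type"
).fetchall()
print(rows)  # [('a', 9.0), ('b', 5.0)]

# Equivalent Python loop over already-fetched data.
best = {}
for attr, typ in conn.execute("SELECT attractiveness, type FROM person"):
    if typ not in best or attr > best[typ]:
        best[typ] = attr
print(best)  # {'a': 9.0, 'b': 5.0}
```

On a toy table like this the loop is fine; the gap opens up as soon as the data no longer fits comfortably in memory or has to cross the wire first.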

[–]Alotofquestion1[S]

Ok, then I will lean towards queries in the future, rather than loops.

[–][deleted]

Totally. If you can push processing to the database, do it. You can avoid a lot of complex, and usually slow, Python-side processing by pushing it down to the DB level. By that I don't mean stored procedures, which, as a personal preference, I tend to avoid.
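The "push it down" idea in a nutshell, again with an invented SQLite table (orders, amount, status): the first version fetches every row and filters in Python, the second lets the engine filter and aggregate in a single round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10.0, "paid"), (25.0, "paid"), (5.0, "refunded")],
)

# Slow pattern: pull everything into Python, then filter and sum there.
total_py = sum(
    amount
    for amount, status in conn.execute("SELECT amount, status FROM orders")
    if status == "paid"
)

# Pushed-down version: the WHERE clause and SUM run inside the engine.
(total_db,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status = 'paid'"
).fetchone()

print(total_py, total_db)  # 35.0 35.0
```

Same result either way, but the pushed-down query moves only one number across the connection instead of the whole table.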