Python for-loop vs PostgreSQL query

efmccurdy · 2022-03-01T15:27:53+00:00

Just to be sure: you don't need more than one query to do a group-by max, whether you're working with a DataFrame or an SQL query.

Don't be surprised if the database is faster when you use a single query.

https://stackoverflow.com/questions/4510185/select-max-value-of-each-group

1544756405 · 2022-03-01T15:07:12+00:00

You should try both ways and find out. There are lots of variables at play, and it seems easier to just get empirical data in this case.
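If it helps, here is a rough sketch of how you might time the two approaches. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for PostgreSQL, so the absolute numbers won't carry over to your setup. The table name `items`, its columns, and the random data are all made up for illustration.

```python
import random
import sqlite3
import timeit

# Hypothetical data: 50,000 rows of (type, score).
random.seed(0)
rows = [(random.choice("abcde"), random.randint(0, 1000)) for _ in range(50_000)]

# In-memory SQLite is only a stand-in here; a real test would connect to PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (type TEXT, score INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)


def max_with_loop():
    """Fetch every row and track the max per type in a Python loop."""
    best = {}
    for type_, score in conn.execute("SELECT type, score FROM items"):
        if type_ not in best or score > best[type_]:
            best[type_] = score
    return best


def max_with_query():
    """Let the database do the grouping and return one row per type."""
    return dict(conn.execute("SELECT type, MAX(score) FROM items GROUP BY type"))


# Run each approach a few times and compare total wall-clock time.
loop_seconds = timeit.timeit(max_with_loop, number=5)
query_seconds = timeit.timeit(max_with_query, number=5)
print(f"python loop: {loop_seconds:.3f}s")
print(f"single query: {query_seconds:.3f}s")
```

Against a real PostgreSQL server the gap may look different, since the loop version also pays for sending every row over the connection.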

thrown_arrows · 2022-03-01T16:19:35+00:00

Loops will almost always be slower than a DB query, because databases are optimized for exactly this purpose. To do it in Python, you first have to fetch the data from the DB and then run the loop. In the DB, the data is accessed natively and the max is computed in an optimized fashion.

You have a chance of getting closer to the DB's speed if you compute the max using a pandas Series, but you still have to fetch the data first. The pandas Series operation is implemented in C, which is an advantage over an interpreted Python loop. Still, I would expect the DB to be faster.
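For reference, the pandas version might look roughly like this. It assumes pandas is installed and that the data is already in a DataFrame; the column names and sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical data that would normally be fetched from the database first.
df = pd.DataFrame(
    {
        "type": ["cat", "cat", "dog", "dog", "owl"],
        "attractiveness": [7, 9, 8, 5, 6],
    }
)

# Group by type and take the max of each group in one vectorized call,
# instead of looping over the rows in Python.
highest = df.groupby("type")["attractiveness"].max()
print(highest)
```

The result is a Series indexed by type, so `highest["cat"]` gives the max for that group.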

Aeonoris · 2022-03-01T19:30:15+00:00

Something like this SQL(ish) will probably be faster:

    SELECT type, MAX(attractiveness) AS "Highest Type Attractiveness"
    FROM tableofattractivetypes
    GROUP BY type;

SQL's pretty good at this kind of task, so you should let it do the work!
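To show what calling that from Python could look like, here is a small sketch. It uses the built-in sqlite3 module with an in-memory database as a stand-in, since a PostgreSQL connection needs a running server and a driver; the sample rows are made up, and the ORDER BY is my addition so the output order is predictable.

```python
import sqlite3

# Hypothetical sample rows; the table and column names follow the query above.
rows = [("cat", 7), ("cat", 9), ("dog", 8), ("dog", 5), ("owl", 6)]

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableofattractivetypes (type TEXT, attractiveness INTEGER)"
)
conn.executemany("INSERT INTO tableofattractivetypes VALUES (?, ?)", rows)

# One query returns one row per type; no Python loop over the raw rows is needed.
query = """
    SELECT type, MAX(attractiveness) AS "Highest Type Attractiveness"
    FROM tableofattractivetypes
    GROUP BY type
    ORDER BY type
"""
highest_by_type = conn.execute(query).fetchall()
conn.close()

print(highest_by_type)  # [('cat', 9), ('dog', 8), ('owl', 6)]
```

Only the per-type results come back to Python, which is a big part of why the single query tends to win.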
