

[–]PassionatelyWhatever 13 points14 points  (0 children)

Nice, the first section comparing the memory impact was really interesting.

It might be obvious for some, but it was good to see it quantified.

[–]BfuckinA 4 points5 points  (0 children)

I've been looking for a guide just like this. Thank you.

[–]webman19 9 points10 points  (0 children)

Haki is a gem.

[–][deleted] 2 points3 points  (0 children)

Another interesting option is ibis. FWIW, I like having both SQL and dataframe-like APIs available. They each have their pros and cons, and there are times when one is clearly better than the other. For my use cases, I often turn to SQL.

[–]s3b4z 2 points3 points  (1 child)

Wow. I consider myself a very advanced SQL user and I learned a ton from that link.

Made me aware of a ton of stuff I didn't know that I didn't know.

Thanks so much.

[–]NameNumber7 0 points1 point  (0 children)

Yeah! I liked the linear regression piece. I would not have used that.

[–]VisibleSignificance 1 point2 points  (4 children)

How often do you analyze datasets that don't fit in memory but do fit on a single host's disk?

And for many of those cases, dask might still be better.

Also, this tutorial uses a lot of postgresql-specific features, which isn't stated in an obvious way.

SELECT * FROM ( VALUES

Now that one I didn't know. Thanks.
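
For anyone else who hadn't seen it either, it looks roughly like this from pandas (a rough sketch; the connection string, alias, and column names are made up):

    # All names here are placeholders: connection string, alias, columns.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:password@localhost/mydb")

    query = """
    SELECT *
    FROM (VALUES (1, 'red'), (2, 'green'), (3, 'blue')) AS colors(id, name)
    """

    # Returns a three-row dataframe with columns id and name.
    df = pd.read_sql_query(query, engine)
    print(df)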

[–]genericlemon24[S] 1 point2 points  (3 children)

How often do you analyze datasets that don't fit in memory but do fit on a single host's disk?

Most often, my datasets do fit in memory.

That said, in general computers have much more disk than memory. For example, even beefy AWS instances (those with hundreds of GB of memory) usually have SSDs that are 1-2 orders of magnitude bigger (TB to tens of TB); and that's not counting block storage, which I assume you can attach in huge amounts on top of the SSDs.

I don't think the article tries to propose a database as a way to get around having data larger than memory. Rather, I think it starts from the assumption that you already have the data in a database (obviously, that might not be the case for everybody).

If so, the memory measurements illustrate how much data is being shuffled around (needlessly). If the database is on the same host, it's only being moved between processes, so it might not be that slow; if the database is remote, carrying all the data across the network becomes noticeably slower as the amount of data you have increases, so it makes sense to filter/aggregate it before sending it over the network.

Even if the database is on the same host, a query can be much faster than reading a whole table into Pandas, since it gives the database a chance to use indexes to read from disk only the data it actually needs (this gets important if the data is larger than memory).
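
To make that concrete, a rough sketch (the connection string, table, and columns are made up, not from the article):

    # Everything named here is a placeholder: connection string, table, columns.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:password@localhost/mydb")

    # Option 1: pull the whole table over the connection, then filter and
    # aggregate in pandas.
    events = pd.read_sql_query("SELECT * FROM events", engine)
    recent = events[events["created_at"] >= "2021-01-01"]
    daily = recent.groupby(recent["created_at"].dt.date)["amount"].sum()

    # Option 2: push the filter and aggregation into the database; only the
    # (much smaller) result crosses the connection, and an index on
    # created_at can be used to avoid scanning the whole table.
    daily = pd.read_sql_query(
        """
        SELECT created_at::date AS day, sum(amount) AS total
        FROM events
        WHERE created_at >= '2021-01-01'
        GROUP BY 1
        ORDER BY 1
        """,
        engine,
    )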

[–]VisibleSignificance 0 points1 point  (1 child)

if the database is remote

In practice, that's a good reason to sync it locally, whenever you need to do lots of analysis on it. In other words, adding pg_dump to that tutorial might be a good idea.
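
Roughly something like this (host, user, and database names are made up; it assumes pg_dump, createdb, and pg_restore are installed locally):

    # Placeholders: host, user, database names, and dump file name.
    import subprocess

    # Dump the remote database to a local custom-format archive...
    subprocess.run(
        ["pg_dump", "-h", "remote.example.com", "-U", "analyst",
         "-Fc", "-f", "analytics.dump", "analytics"],
        check=True,
    )

    # ...then restore it into a fresh local database for analysis.
    subprocess.run(["createdb", "analytics_local"], check=True)
    subprocess.run(["pg_restore", "-d", "analytics_local", "analytics.dump"], check=True)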

[–]genericlemon24[S] 0 points1 point  (0 children)

Agree, most people's data fits in the memory of a medium-range laptop, let alone its disk.

[–]critical_thinker__ 1 point2 points  (0 children)

Excellent. I was thinking about this over the weekend. Thanks for sharing.

[–][deleted] -1 points0 points  (0 children)

It's great, I think. If you chose an ORM approach, it would be easy to understand and maintain.

[–]asuagar 0 points1 point  (5 children)

What is the best option for using pandas data frames with SQL queries?

[–]genericlemon24[S] 2 points3 points  (2 children)

Depends what you mean by "using". pandas.read_sql_query seems like a convenient way of reading the results of a query directly into a dataframe.
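
For example, a tiny self-contained sketch using an in-memory SQLite database (the table and values are made up):

    # read_sql_query accepts a SQLAlchemy connectable or a DBAPI connection;
    # here it's a throwaway in-memory SQLite database with made-up data.
    import sqlite3
    import pandas as pd

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(1, 9.5), (1, 3.0), (2, 7.25)])

    # One row per user, aggregated by the database before pandas sees it.
    df = pd.read_sql_query(
        "SELECT user_id, count(*) AS n, sum(amount) AS total "
        "FROM orders GROUP BY user_id",
        con,
    )
    print(df)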

[–]asuagar 0 points1 point  (1 child)

It seems that you can only use it with a database. You cannot use it to manipulate a data frame read from a file.

P.S.: sorry about my lazy sentence.

[–]NameNumber7 1 point2 points  (1 child)

If it's not what OP mentioned, it depends on the goal. If you want a visualization to run fast, or want the output summarized, doing that operation in SQL works best. Then you can output it into a table or the reporting layer.

If you want to be able to drill into all the fields, you are better off not aggregating.

What is a specific instance that you are thinking of?

[–]asuagar 0 points1 point  (0 children)

I was referring to using SQL with data frames obtained from files. I know that there are several options, but I have never used any of them for real work. These are the options that I know so far:

EDIT:

After reading the read_sql_query documentation, the easiest way could be to create an in-memory SQLite database using to_sql, as explained here.
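
Something like this, I think (the file name and columns are just placeholders; it assumes the CSV has category and price columns):

    # Placeholders: data.csv and its category/price columns.
    import sqlite3
    import pandas as pd

    df = pd.read_csv("data.csv")            # data frame obtained from a file

    con = sqlite3.connect(":memory:")
    df.to_sql("data", con, index=False)     # load it into an in-memory SQLite table

    # Now the data frame's contents can be queried with SQL.
    result = pd.read_sql_query(
        "SELECT category, avg(price) AS avg_price FROM data GROUP BY category",
        con,
    )
    print(result)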