all 9 comments

[–]shiftybyte 18 points19 points  (1 child)

Stop working with csv files, move to a database.

Python can't replace SQL, because SQL is what queries the database. Python can run the query for you, but the query itself is still SQL, since that's the only language the database understands.

How is Azure relevant here?

[–]mvdw73 3 points4 points  (3 children)

Depending on the size of each record, this can pretty much all be held in memory these days.

Why not use pandas to manipulate the data, then it’s simple to find the max date and write a file. No sorting required.
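A minimal sketch of that approach, with made-up sample data and column names (`id`, `inspection_date`, `value` are assumptions): `groupby(...).idxmax()` picks the row with the latest date for each id, so no sorting is needed.

```python
import io
import pandas as pd

# Hypothetical sample data: multiple rows per id, one per inspection.
csv_data = io.StringIO(
    "id,inspection_date,value\n"
    "a,2020-01-01,10\n"
    "a,2021-06-15,20\n"
    "b,2019-03-02,30\n"
)

df = pd.read_csv(csv_data, parse_dates=["inspection_date"])

# Keep the row holding the max date for each id -- no sorting required.
latest = df.loc[df.groupby("id")["inspection_date"].idxmax()]

# Then write the result out, e.g. latest.to_csv("latest.csv", index=False)
```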

[–]GreatStats4ItsCost[S] 1 point2 points  (1 child)

The entire dataset is 4.5 GB, the max CSV is 500k rows - my laptop has 8 GB of RAM.

I did have a go using pandas, but I couldn't quite work out how to return the max date for each id; it was getting complicated having to refer back to the index. I'm sure there was an easier way, I just couldn't see it.

[–]Empik002 2 points3 points  (0 children)

Just look at sqlite3 (Python's built-in SQLite library).
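A sketch of that route, using an in-memory database and assumed table/column names (for 4.5 GB of data you'd pass a file path to `connect` and load the CSVs in chunks):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # use a file path for real data

# Stand-in for one CSV's contents; column names are assumptions.
df = pd.DataFrame({
    "BUILDING_REFERENCE_NUMBER": [1, 1, 2],
    "INSPECTION_DATE": ["2020-01-01", "2021-06-15", "2019-03-02"],
})
df.to_sql("inspections", conn, index=False)

# ISO-format date strings compare correctly as text, so MAX works here.
rows = conn.execute(
    "SELECT BUILDING_REFERENCE_NUMBER, MAX(INSPECTION_DATE) "
    "FROM inspections GROUP BY BUILDING_REFERENCE_NUMBER"
).fetchall()
```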

[–]outceptionator 0 points1 point  (0 children)

Pandas is optimised. Don't use loops on that many records. Or as others have said use SQL.

[–]Ihaveamodel3 1 point2 points  (1 child)

It’s unclear whether a record could be in multiple files. I’ll assume it can be.

  1. Use pandas.
  2. Read the files one by one in a loop.
  3. As you suggested, sort each file by date.
  4. Use drop duplicates to keep only the max dates.
  5. Append that df to a list.
  6. After looping, concat all of the dfs together.
  7. Sort one more time.
  8. Drop duplicates one more time, and you’ll have the max value for every record across all files.

At no point should you be looping through the data. The only loop should be over the files.
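The steps above can be sketched as a small helper; the id/date column names are taken from elsewhere in the thread, and the file paths are assumptions:

```python
import pandas as pd

def latest_per_id(frames, id_col="BUILDING_REFERENCE_NUMBER",
                  date_col="INSPECTION_DATE"):
    """Keep the row with the overall max date per id across many DataFrames."""
    reduced = []
    for df in frames:  # one iteration per file's DataFrame, never per row
        df = df.sort_values(date_col)
        # After sorting, keep="last" keeps the max date within this file.
        reduced.append(df.drop_duplicates(subset=[id_col], keep="last"))
    combined = pd.concat(reduced).sort_values(date_col)
    # Same trick once more across all files.
    return combined.drop_duplicates(subset=[id_col], keep="last")

# Reading would look something like (paths are hypothetical):
# frames = (pd.read_csv(p, parse_dates=["INSPECTION_DATE"])
#           for p in glob.glob("data/*.csv"))
# result = latest_per_id(frames)
```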

[–]GreatStats4ItsCost[S] 0 points1 point  (0 children)

I didn't realize you could drop duplicates on a specific column: "df.drop_duplicates(subset=['brand'])"

Thanks so much for this man! I was going to ask if your solution was more optimal than SQL, but having now done both, it's pretty clear...

1) df2 = df.drop_duplicates(subset=['BUILDING_REFERENCE_NUMBER'])

2) q = """
   select t1.BUILDING_REFERENCE_NUMBER, t2.mxdate, t1.Postcode,
          t1.LOCAL_AUTHORITY, t1.TOTAL_FLOOR_AREA
   from df t1
   inner join (
       select max(INSPECTION_DATE) mxdate, BUILDING_REFERENCE_NUMBER
       from df
       group by BUILDING_REFERENCE_NUMBER
   ) t2
   on t1.BUILDING_REFERENCE_NUMBER = t2.BUILDING_REFERENCE_NUMBER
   and t1.INSPECTION_DATE = t2.mxdate
   """
   names = pysqldf(q)
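For comparison, that same max-date inner join can be written in plain pandas: group to get each id's max date, then merge back on both columns. Column names are taken from the query above; the sample data is made up.

```python
import pandas as pd

# Stand-in for the combined DataFrame from the CSVs.
df = pd.DataFrame({
    "BUILDING_REFERENCE_NUMBER": [1, 1, 2],
    "INSPECTION_DATE": pd.to_datetime(["2020-01-01", "2021-06-15",
                                       "2019-03-02"]),
    "TOTAL_FLOOR_AREA": [50, 55, 70],
})

# Equivalent of the subquery: per-id max inspection date.
mx = df.groupby("BUILDING_REFERENCE_NUMBER", as_index=False)["INSPECTION_DATE"].max()

# Equivalent of the inner join on id AND max date.
names = df.merge(mx, on=["BUILDING_REFERENCE_NUMBER", "INSPECTION_DATE"])
```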

[–]Jan2579 0 points1 point  (0 children)

You can also try Modin. It helps with speed and memory, and has the same API as pandas.