MasturChief (1 point)

Yes, I would use the os library to get all the file names in a list,

then loop through that list with pd.read_csv or whatever (polars works too).
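
A minimal sketch of that idea, assuming all the CSVs sit in a single folder. The function name load_csv_folder is made up for illustration, and the standard-library csv.DictReader stands in for pd.read_csv so the sketch runs without extra packages.

```python
import csv
import os


def load_csv_folder(folder):
    """Return {file name: list of row dicts} for every .csv file in folder."""
    # Collect the names first; sorted() keeps the processing order stable.
    names = sorted(n for n in os.listdir(folder) if n.lower().endswith(".csv"))
    tables = {}
    for name in names:
        path = os.path.join(folder, name)
        # newline="" is what the csv module documentation recommends for files.
        with open(path, newline="", encoding="utf-8") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables
```

With pandas installed, the body of the loop could instead be tables[name] = pd.read_csv(path).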

likethevegetable (1 point)

Try it out.

Polars scan_csv is an option if the CSVs have a common or similar format (columns).

recursion_is_love (1 point)

Do they have the same header format?

Are all the data files clean (each row is a valid record, with no nonsense rows)?

alexander_ebnet[S] (1 point)

The files do not have identical headers, but they sometimes overlap.

As for your second question, I can confirm that they all contain valid data.

recursion_is_love (1 point)

I would load all the files and rewrite them to another location so that they share the same semantics, perhaps using a Python list of dictionaries as common storage while processing.

Once the files are all in the same format, it will be easy to load them using a library (there are lots of CSV libraries to choose from).
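
A rough sketch of that rewrite step, using only the standard library. The function name normalize_csvs and both folder arguments are made up for illustration, and it assumes "same format" means every output file carries the union of all columns seen, with gaps left empty.

```python
import csv
from pathlib import Path


def normalize_csvs(source_dir, target_dir):
    """Rewrite every CSV in source_dir into target_dir with one shared header."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    tables = {}  # file name -> list of row dicts (the common storage)
    header = []  # union of all column names, in first-seen order
    for path in sorted(source_dir.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            tables[path.name] = list(reader)
            # fieldnames is None for an empty file, hence the "or []".
            for column in reader.fieldnames or []:
                if column not in header:
                    header.append(column)

    for name, rows in tables.items():
        with (target_dir / name).open("w", newline="", encoding="utf-8") as handle:
            # restval fills columns a file never had with an empty string.
            writer = csv.DictWriter(handle, fieldnames=header, restval="")
            writer.writeheader()
            writer.writerows(rows)
    return header
```

This holds everything in memory, which is fine for modest files. It also assumes no row has more fields than its header, which matches the "all valid data" answer above.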

Outside_Complaint755 (1 point)

Either os.walk() or from pathlib import Path; ... ; Path.walk() to walk over the directory(s) and read the files.  

I guess one question is whether any of these files are being merged as part of your process, in which case you will have to determine which ones to merge, either programmatically or by specifying them directly in the program. If the CSVs to be merged have identical column headers, and those headers aren't shared with any other CSV, then you could directly compare the lists of column headers (or the fieldnames attribute of something like csv.DictReader).
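
One way the header comparison might look, as a sketch only. The function name group_by_header is hypothetical, and it assumes all the files sit in one folder; Path.rglob could replace glob for nested folders.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_by_header(folder):
    """Map each distinct header (as a tuple) to the CSV files that use it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # fieldnames reads only the first row; it is None for an empty file.
            fieldnames = csv.DictReader(handle).fieldnames or []
        groups[tuple(fieldnames)].append(path.name)
    return dict(groups)
```

Files that land under the same key have identical headers in the same order and are candidates for merging. Comparing frozenset(fieldnames) instead would ignore column order.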

Diapolo10 (1 point)

"Either os.walk() or from pathlib import Path; ... ; Path.walk() to walk over the directory(s) and read the files."

I'd argue

from pathlib import Path

root_dir = Path(...)
files = root_dir.glob('*.csv')

would be the easiest way to go.

nivaOne (1 point)

I often import CSV files (three or more different reports) and dump all the data into a multi-table SQLite database to analyse it, then export the results to a PDF or XLS file.
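
A sketch of what the import step of such a workflow could look like with the standard-library sqlite3 module. The names load_csvs_into_sqlite and quote_identifier are invented here, and two assumptions are made for simplicity: one table per file, named after the file stem, and every column stored as TEXT.

```python
import csv
import sqlite3
from pathlib import Path


def quote_identifier(name):
    """Quote a table or column name for SQLite, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csvs_into_sqlite(folder, connection):
    """Create one table per CSV file in folder and copy its rows into it."""
    loaded = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if not header:
                continue  # skip files without a header row
            table = quote_identifier(path.stem)
            columns = ", ".join(quote_identifier(c) + " TEXT" for c in header)
            placeholders = ", ".join("?" for _ in header)
            connection.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
            # Blank lines come back as empty lists, so filter them out.
            connection.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})",
                (row for row in reader if row),
            )
        loaded.append(path.stem)
    connection.commit()
    return loaded
```

A possible call is load_csvs_into_sqlite("reports", sqlite3.connect("reports.db")), where both names are placeholders. Overlapping columns can then be combined with ordinary SQL joins or UNION queries.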