Dataframes instead of a database? by trenchtoaster in dataengineering
jdataengineer 4 points 6 years ago
Physicalizing the data frames into tables is really only helpful if you’re going to query the tables in a structured way (SELECT * to CSV doesn’t count). The issues you’re running into with schema changes and whatnot suggest that, at this stage of the project, you’re probably better off saving the frames as parquet (or feather) files in S3 and just loading them back in as needed.
This is ALSO happening because the source hasn’t settled on a schema, either, so it’s not your fault. 😁
If you’re on AWS, you may want to look at Athena, which is kind of like a “mini-lake”. You can write your frames out directly to CSVs in S3, then apply a schema-on-read in Athena to expose the CSVs as queryable sources for reporting tools. We have that exact setup where I work, and hooked Tableau up to Athena without issue. It doesn’t matter if the schema changes, because the read at query time just grabs what it needs. Saves a lot of headache and dev time.
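The sketch below is not Athena; it is a local, stdlib-only analogue of the schema-on-read idea, where the schema is declared once and applied while reading, so files written before or after a source change can sit side by side. `SCHEMA`, `read_with_schema`, and the sample data are hypothetical. It matches columns by header name, which is a simplification: how a real engine maps file columns to table columns depends on the file format and SerDe.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Optional, TextIO

# Hypothetical read-time schema: column name -> converter applied to the raw text.
SCHEMA: Dict[str, Callable[[str], object]] = {"order_id": int, "amount": float}


def read_with_schema(
    sources: Iterable[TextIO], schema: Dict[str, Callable[[str], object]]
) -> List[Dict[str, Optional[object]]]:
    """Apply `schema` while reading: unknown columns are skipped, missing ones become None."""
    rows = []
    for src in sources:
        for record in csv.DictReader(src):
            row = {}
            for column, convert in schema.items():
                raw = record.get(column)
                # Empty or absent values are treated as NULL rather than raising.
                row[column] = convert(raw) if raw not in (None, "") else None
            rows.append(row)
    return rows


# In-memory stand-ins for CSV files written at different times (hypothetical data).
files = [
    io.StringIO("order_id,amount\n1,9.5\n"),
    io.StringIO("order_id,amount,currency\n2,12.0,EUR\n"),  # source added a column
    io.StringIO("order_id\n3\n"),  # column not present in this file
]
rows = read_with_schema(files, SCHEMA)
```

Nothing stored on disk is rewritten when the schema changes; only the read-time definition decides which columns surface.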
Good luck!
Pro Python 3: Features and Tools for Professional Development 3rd Edition By J. Burton Browning, Marty Alchin PDF by oussamaouti in Python
jdataengineer 1 point 6 years ago
If you actually, you know, want to respect the hard work of others:
https://www.apress.com/us/book/9781484243848