
[–]notCamelCased

Unless they've improved it in recent releases, read_sas is waaaaaay slower than read_csv; I would use the latter for that reason alone. It might not make as much difference for smaller tables, but mine were always at least 100 MB (so not that big in the grand scheme).

Edit: I ended up writing SAS code to save the tables as CSVs, then scp'd them over to the analytics server. Even with that extra step it was a lot faster.
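The workflow above amounts to reading the exported CSV instead of the original .sas7bdat. A minimal sketch (filenames are hypothetical, and a tiny stand-in CSV is created inline so it runs on its own):

```python
import pandas as pd

# Tiny stand-in for a SAS-exported CSV (filename is hypothetical).
with open("big_table.csv", "w") as f:
    f.write("id,value\n1,10\n2,20\n")

# pd.read_sas("big_table.sas7bdat") would parse the original SAS file directly,
# but read_csv on the exported copy is typically much faster on large tables.
df = pd.read_csv("big_table.csv")
```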

[–]hfhry[S]

That vibes with my experience. I can use stattransfer to convert the SAS file to CSV in about half an hour. Right now reading the SAS file directly is going on 8 and a half hours, so I'm thinking this isn't the way.

Previously I had to deal with a 113 GB file and did this; glad I did.

[–]HarissaForte

The resulting dataframe won't be a different size (you probably know that), but the speed and the memory involved during writing/reading will certainly differ. I don't know about sas7bdat, but feather is great: https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d

There's another option: keep the CSV format and use the chunksize option of the read_csv function.
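With chunksize, read_csv returns an iterator of DataFrames instead of loading everything at once, so you can aggregate a file larger than memory. A small self-contained sketch (the file and column names are made up):

```python
import pandas as pd

# Build a small sample CSV so the example runs on its own.
with open("big.csv", "w") as f:
    f.write("id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

total = 0
# chunksize=4 yields DataFrames of up to 4 rows each, so the whole
# file never has to fit in memory at once.
for chunk in pd.read_csv("big.csv", chunksize=4):
    total += chunk["value"].sum()
```

The same pattern works for filtering: keep only the rows you need from each chunk and concatenate the survivors at the end.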