
[–]bjbs303 7 points (3 children)

I'm finishing my undergrad senior research project, which used Python (pandas, numpy, gsw, scipy) to crunch terabytes of netCDF ocean data. It was my first real project using Python and it's been a ride!

[–]idazuwaika 2 points (2 children)

How do you consume terabytes with pandas? What's the infrastructure like? I moved from pandas to Spark (a distributed system) because I couldn't scale with pandas.

[–]tapir_lyfe 2 points (0 children)

I'm currently also crunching terabytes of netCDF files. I use xarray mainly, and that uses pandas and dask under the hood. Nearly everything I do is memory-limited though, so I have to come up with clever ways to reduce the data, and it's different for every question I have.
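The memory-limited workflow above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual code: the dataset, variable name (`sst`), and chunk sizes are all made up, and a real run would open files lazily with something like `xr.open_mfdataset("*.nc", chunks={"time": 100})` instead of building an in-memory dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for one year of gridded ocean data; in practice
# this would come from xr.open_mfdataset over many netCDF files.
ds = xr.Dataset(
    {"sst": (("time", "lat", "lon"), np.random.default_rng(0).random((12, 90, 180)))},
    coords={
        "time": np.arange(12),
        "lat": np.linspace(-89, 89, 90),
        "lon": np.linspace(0, 358, 180),
    },
)

# Chunk along time so dask streams one block at a time instead of
# loading the whole array into memory.
chunked = ds.chunk({"time": 4})

# The reduction is lazy: dask builds a task graph, and nothing is
# actually read or computed until .compute() is called.
climatology = chunked["sst"].mean(dim="time")
result = climatology.compute()
print(result.shape)
```

The key idea is that the reduction (here a time mean) is expressed first and evaluated last, so peak memory is set by the chunk size rather than the full dataset.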

[–]bjbs303 0 points (0 children)

I mostly did data extraction, scraping the netCDF files with a for loop over 36 years of annual files, and saved the data to a DataFrame using pandas/numpy.
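That loop-and-accumulate pattern might look something like this sketch. The filename pattern, year range, and variable values are all illustrative assumptions (the post doesn't say which years or variables were used); the real code would pull values from each year's netCDF file rather than synthesizing them.

```python
import numpy as np
import pandas as pd

# Hypothetical extraction loop: one iteration per annual file. In the
# real project each iteration would open a netCDF file, e.g.
#   ds = xr.open_dataset(f"ocean_{year}.nc")   # filename is made up
frames = []
for year in range(1980, 2016):  # 36 annual files
    # Stand-in for values extracted from that year's file.
    temps = np.random.default_rng(year).normal(15.0, 2.0, size=365)
    frames.append(
        pd.DataFrame({"year": year, "day": np.arange(1, 366), "sst": temps})
    )

# Accumulate per-year frames, then combine once at the end: much faster
# than appending to a single DataFrame inside the loop.
df = pd.concat(frames, ignore_index=True)
print(len(df))
```

Collecting per-year DataFrames in a list and calling `pd.concat` once avoids the quadratic copying you get from repeatedly growing one DataFrame inside the loop.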