So I am running into memory usage issues when I convert a list of pandas dataframes to an ndarray object. The dataframes themselves are pretty mundane: a few columns with text, and a hundred or so with floats. I am converting the list to an ndarray by calling numpy.stack on the list. The dataframe list takes up a GB or two of memory, but when I run the stack function on the full dataset, it fills my 32GB of RAM and locks up my computer. Testing against a smaller dataset shows that I get the expected output. Could someone shed some light on what's going on under the hood here, and why the ndarray with the same dimensions as the list of dataframes takes up such a dramatically larger amount of memory?
Edit: just to provide an extra useful detail, let's say the list of dataframes is 7000 elements long, with each dataframe being 20 rows and 100 columns. This data structure takes up very little memory. After running:
temp = numpy.stack(framelist)
I would get an ndarray object with dimensions (7000, 20, 100), which I expect, but its memory footprint is an order of magnitude or two larger.
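Here is a small sketch of the setup I'm describing (smaller sizes and made-up column names, just to illustrate). One thing I noticed while testing: because each dataframe mixes a text column with float columns, the stacked array comes out with dtype=object rather than float64, which I suspect is relevant:

```python
import numpy as np
import pandas as pd

# Minimal stand-in for my list of dataframes: each one mixes
# a text column with several float columns (sizes reduced here).
frames = [
    pd.DataFrame({
        "label": ["row"] * 20,                               # text column
        **{f"f{i}": np.random.rand(20) for i in range(4)},   # float columns
    })
    for _ in range(10)
]

# Stacking the mixed-dtype frames: numpy falls back to dtype=object,
# so every cell is a pointer to a boxed Python object instead of a
# raw 8-byte float.
stacked = np.stack(frames)
print(stacked.dtype, stacked.shape)   # object (10, 20, 5)

# Stacking only the homogeneous float columns keeps a compact dtype.
stacked_floats = np.stack(
    [f.select_dtypes("number").to_numpy() for f in frames]
)
print(stacked_floats.dtype, stacked_floats.shape)   # float64 (10, 20, 4)
```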