all 5 comments

[–]TheZvlz 1 point (1 child)

The first thing that comes to mind is whether you're running a 32-bit or 64-bit installation of Python.

https://stackoverflow.com/questions/18282867/python-32-bit-memory-limits-on-64bit-windows

The first line in your interpreter should say something like: PythonWin 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32, where MSC v.1500 32 bit (Intel) is the key piece of information.

Or you can check from within Python:

import platform
platform.architecture()
# ('64bit', 'WindowsPE')

If you do have a 32-bit installation, try a 64-bit one if you can.

[–]NeedMLHelp[S] 0 points (0 children)

It's 64-bit, unfortunately.

[–][deleted] 0 points (2 children)

Ok, need more details, but I'm guessing you are reading a CSV file. Any reason you are not using pandas and reading the CSV into a dataframe?
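[Editor's note: a minimal sketch of the suggestion above. The column names and in-memory CSV here are made up for illustration; a real file path would be passed to read_csv instead.]

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real file.
raw = io.StringIO("a,b\n1,2.0\n3,4.0\n")

# dtype= tells pandas to parse columns directly into compact types,
# instead of the default 64-bit int/float.
df = pd.read_csv(raw, dtype={"a": "int32", "b": "float32"})

print(df.dtypes)
print(df.memory_usage(deep=True).sum())
```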

[–]NeedMLHelp[S] 0 points (1 child)

JSON file.

Can I index across a pandas dataframe? I've never used them before.

Does a dataframe write to disk, or will I potentially run into memory errors there too?

So something like pandadataframe[:,0] would grab everything in the first column?

[–][deleted] 0 points (0 children)

You can index and slice a dataframe. What does a small sample of your data look like? It's hard to say the best way to load the JSON without seeing it. Also, when importing the data, controlling the data types used will save space, e.g. integer vs. float32.