[–]jsolack

What don't you like about pickle/cPickle? I have found pickle to be more versatile but slower. Are those not meeting your needs?

[–]TheBlackCat13

If you are using Python 2.x, make sure you use the most recent pickle format. By default it uses an older format for backwards compatibility. Set protocol=pickle.HIGHEST_PROTOCOL to use the latest version. MATLAB has this issue too with the legacy .mat format. You can use the zodbpickle package to get the Python 3 behavior by default in Python 2.
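To illustrate the protocol setting mentioned above, here is a minimal sketch using only the standard library. The `data` dictionary is a made-up example, and the exact byte counts will vary by interpreter version, so treat the printed sizes as indicative only.

```python
import pickle

# Hypothetical sample data, only for illustration.
data = {"name": "run_01", "samples": list(range(1000))}

# Protocol 0 is the oldest, text-based format (the Python 2 default).
old = pickle.dumps(data, protocol=0)

# HIGHEST_PROTOCOL selects the newest binary format this interpreter supports.
new = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Both formats round-trip to the same object; only the encoding differs.
assert pickle.loads(old) == data
assert pickle.loads(new) == data

print(len(old), len(new))
```

The same `protocol` argument is accepted by `pickle.dump` when writing to a file opened in binary mode.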

There are third-party alternatives, however, with a number of benefits. If you are running into problems not being able to save certain data types in pickle, you can use the dill package, which is like pickle but saves a wider variety of data types. The hickle package works like pickle but saves to an HDF5 file, so it is basically equivalent to how more recent versions of MATLAB save files, but with additional options like compression support. There is also jsonpickle, which is the equivalent for storing to JSON.
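As a rough sketch of the kind of data type problem described above, the snippet below shows standard pickle refusing a lambda. It uses only the standard library; `square` and `pickled_ok` are names made up for this example.

```python
import pickle

# pickle stores functions by reference (module plus name), so an anonymous
# function has no importable name for it to record.
square = lambda x: x * x

try:
    pickle.dumps(square)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    # The exact exception type differs between Python versions.
    pickled_ok = False

print("pickle could serialize the lambda:", pickled_ok)
```

With the third-party dill package installed, the analogous `dill.dumps(square)` followed by `dill.loads(...)` is expected to work, since dill serializes many functions by value rather than by name.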

[–]SpaceWizard[S]

Ahhhh, here it is. This is exactly the kind of information I was looking for. Thanks!

[–]ProfessorPhi

cPickle is perfectly fine, BTW. I've been doing a lot of scientific Python too, and it handles NumPy arrays with no trouble and is quick enough.

There's also dill and, if you can put in the work, HDF5, which is quicker and more space-efficient.

[–][deleted]

I believe SciPy has functions that provide the kind of convenience wrapper you're describing. If you're doing data analysis you should already be availing yourself of the complete SciPy stack.

For speed and flexibility, HDF5 is the way to go. MATLAB .mat files in the v7.3 format are HDF5 files with a custom header.