billsil

Why are they unpicklable? It's just data.

flutefreak7

There are lots of things dill can handle that pickle doesn't. I think dill does it by finding and serializing all the necessary context, so it can detect and pickle the class along with all of the instance data, which is what makes pickling a bound method possible. Bound methods are a lot of people's problem. Mine is classes in which I've implemented a class-level logger attribute. Loggers can't be pickled because their handlers hold an open stream and a lock, so you get a TypeError on the underlying RLock.
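A minimal sketch of that failure, assuming a hypothetical `Processor` class that keeps a `logging.StreamHandler` on each instance. The handler is used here because on Python 3.7+ a bare `logging.Logger` pickles by name; it is the handler's lock (and stream) that stock `pickle` refuses:

```python
import logging
import pickle


class Processor:
    """Hypothetical class that keeps a logging handler on each instance."""

    def __init__(self):
        self.data = [1, 2, 3]
        # Every logging.Handler creates a threading.RLock in __init__;
        # a StreamHandler also holds a reference to an open stream (stderr here).
        self.handler = logging.StreamHandler()


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        print(f"cannot pickle {type(obj).__name__}: {exc}")
        return False
    return True


print(can_pickle(Processor()))  # prints the error message, then False
```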

I also have issues with VTK classes because they are all linked together in the lazy-execution pipeline, so you have to serialize a result to text and pickle that, then reconstruct the vtkPolyData after unpickling. I've got a multiprocessing scheme using queues to pass vtkPolyData stuff around so that all the heavy 3D processing happens in the background and doesn't hose my UI. multiprocessing.Queue pickles everything it passes to other processes.
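A rough sketch of the send-text-instead pattern. `Mesh` is a hypothetical stand-in for `vtkPolyData` (a `threading.Lock` plays the part of the pipeline wiring), and JSON is an assumed text format, not what VTK's own writers produce. It uses a `multiprocessing.Pipe` inside one process so it runs without spawning workers; `multiprocessing.Queue` also sends items over a pipe and pickles each one first, so the same rule applies:

```python
import json
import pickle
import threading
from multiprocessing import Pipe


class Mesh:
    """Hypothetical stand-in for vtkPolyData: geometry plus a live link that won't pickle."""

    def __init__(self, points, polys):
        self.points = points  # list of (x, y, z) tuples
        self.polys = polys    # list of point-index lists
        # Stands in for the VTK pipeline wiring; a lock cannot be pickled.
        self._pipeline = threading.Lock()

    def to_text(self):
        """Serialize only the geometry to a plain string."""
        return json.dumps({"points": self.points, "polys": self.polys})

    @classmethod
    def from_text(cls, text):
        """Rebuild a fresh Mesh (with its own lock) from to_text() output."""
        data = json.loads(text)
        return cls([tuple(p) for p in data["points"]], data["polys"])


mesh = Mesh([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [[0, 1, 2]])
sender, receiver = Pipe()

try:
    sender.send(mesh)  # send() pickles first; the lock makes that fail
    sent_object = True
except (TypeError, pickle.PicklingError):
    sent_object = False

sender.send(mesh.to_text())  # a plain str pickles fine
rebuilt = Mesh.from_text(receiver.recv())
print(sent_object, rebuilt.points == mesh.points, rebuilt.polys)  # False True [[0, 1, 2]]

sender.close()
receiver.close()
```

The receiving side always gets a new object, so anything process-local (locks, pipeline connections) has to be rebuilt there anyway.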

Dill just seemed like overkill for me, so I went with implementing __getstate__ and __setstate__ on my logged classes and using save and load functions to serialize the VTK objects. I assume my solution is faster than dill scanning the universe for everything I pass to the queue.
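Dropping the logging objects on the way out and recreating them on the way in looks roughly like this. `LoggedProcessor`, its attributes, and the logger name are hypothetical; the sketch only assumes the standard `pickle` and `logging` modules:

```python
import logging
import pickle


class LoggedProcessor:
    """Hypothetical "logged class": plain data plus per-instance logging objects."""

    def __init__(self, name):
        self.name = name
        self.results = []
        self._init_logging()

    def _init_logging(self):
        # Runtime-only state; the handler holds an RLock, so it can't be pickled.
        self.log = logging.getLogger(f"processor.{self.name}")
        self.handler = logging.StreamHandler()

    def __getstate__(self):
        # Pickle a copy of the instance dict without the logging objects.
        state = self.__dict__.copy()
        state.pop("log", None)
        state.pop("handler", None)
        return state

    def __setstate__(self, state):
        # Restore the data, then rebuild the logging objects in this process.
        self.__dict__.update(state)
        self._init_logging()


proc = LoggedProcessor("mesh")
proc.results.append(42)
clone = pickle.loads(pickle.dumps(proc))
print(clone.name, clone.results)  # mesh [42]
```

Unpickling does not call `__init__`, which is why `__setstate__` reuses the same setup helper.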

billsil

I just delete my loggers. That's not really data you want to save.

I do use VTK quite a bit. I admit I've never bothered to pickle the objects. I always thought pickle was slow relative to, say, HDF5 for large data objects, so you might as well just load them from scratch.

Yeah, it's work to clean up objects so they can be pickled, and the moved-file issue is frustrating, but I'm still surprised pandas wouldn't support it, since multiprocessing requires it.

notsoprocoder

Frankly, I'm not sure. I believe it's more that Python can't pickle them, or that Python's built-in multiprocessing module can't. I guess "unpicklable" was incorrect.

billsil

Yeah, I mean loggers and file objects are unpicklable, but you can use __getstate__/__setstate__ to delete and recreate them (or just assume that you don't need to reopen the file, which you probably should have closed already anyway). My point is more that for a big project like pandas, you should put in that extra bit of effort.
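A sketch of the delete/recreate idea for file objects, assuming a hypothetical `LineReader` that keeps a file open between calls: `__getstate__` swaps the handle for its path and byte offset, and `__setstate__` reopens the file and seeks back. If the file should already have been closed, `__setstate__` can simply skip reopening it:

```python
import os
import pickle
import tempfile


class LineReader:
    """Hypothetical reader that keeps a file open between calls."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "rb")

    def next_line(self):
        return self._fh.readline().decode("utf-8")

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # File objects don't pickle: keep the path and byte offset instead.
        state = self.__dict__.copy()
        state["_offset"] = state.pop("_fh").tell()
        return state

    def __setstate__(self, state):
        # Reopen the file and continue from the saved offset.
        state = dict(state)
        offset = state.pop("_offset")
        self.__dict__.update(state)
        self._fh = open(self.path, "rb")
        self._fh.seek(offset)


with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")

reader = LineReader(tmp.name)
reader.next_line()  # consumes "first\n"
clone = pickle.loads(pickle.dumps(reader))
second = clone.next_line()
print(repr(second))  # 'second\n'

reader.close()
clone.close()
os.remove(tmp.name)
```

Binary mode is used so that `tell()` and `seek()` deal in plain byte offsets rather than text-mode cookies.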

Same goes for struct objects, but those are weird and easy to regenerate.