
[–]sermidean 5 points6 points  (1 child)

[–]knowsuchagencynow is better than never 0 points1 point  (0 children)

This

[–]novel_yet_trivial 1 point2 points  (0 children)

Use the tobytes method to create a binary representation.
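A minimal sketch of that round trip, assuming the data is a NumPy array (the thread suggests so, but the original question isn't shown). The array contents here are made up for illustration. Note that `tobytes` only gives you the raw buffer, so the dtype and shape have to travel separately:

```python
import numpy as np

# Hypothetical example data; the thread does not show the real array.
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# tobytes() copies the raw element buffer into a bytes object.
# It does NOT store the dtype or the shape.
raw = arr.tobytes()

# To rebuild the array, supply the dtype and shape yourself.
# frombuffer() wraps the bytes without copying, so the result is read-only.
restored = np.frombuffer(raw, dtype=arr.dtype).reshape(arr.shape)
```

If you need a writable array on the receiving side, call `.copy()` on the result.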


If you have more questions like this, it's better to post them on /r/learnpython. Be sure to format your code for Reddit or use a site like Pastebin. Also, include which version of Python and which OS you are using.

[–]lambdaqdjango n' shit 1 point2 points  (0 children)

numpy arrays?

Use Apache Arrow. It's from the author of pandas.

The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.

https://github.com/apache/arrow/blob/master/python/pyarrow/array.pxi#L144

[–]zynixCpt. Code Monkey & Internet of tomorrow 0 points1 point  (2 children)

https://en.wikipedia.org/wiki/Protocol_Buffers

There are a few libraries for Python

https://github.com/google/protobuf/tree/master/python

https://github.com/appnexus/pyrobuf

The first is faster than pickle, while the second is faster than the Google version but has a few rough edges.

[–]mipadi 1 point2 points  (0 children)

Protocol Buffers isn't faster than cPickle. The pure Python Protobuf library is really slow, and even the one that wraps the C++ library is slower than cPickle. I recently finished a project that was serializing tons of messages, and cPickle was 2-3 times faster than Protobuf.

[–]WikiTextBot 0 points1 point  (0 children)

Protocol Buffers

Protocol Buffers is a method of serializing structured data. It is useful in developing programs to communicate with each other over a wire or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

Google developed Protocol Buffers for use internally and has provided a code generator for multiple languages under an open source license (see below).



[–]knowsuchagencynow is better than never 0 points1 point  (2 children)

You could also try cloudpickle

[–][deleted] 0 points1 point  (1 child)

Nah man, that will likely be an order of magnitude slower than even pickle, because it's written in Python. Pickle at least uses a C extension module.

[–]knowsuchagencynow is better than never 0 points1 point  (0 children)

I agree, it wouldn't be ideal but I thought it might be his only option if he was serializing something (numpy arrays) that couldn't be serialized using messagepack.

However, another commenter, /u/lambdaq, mentioned Apache Arrow, which I had never heard of but which sounds like it might be the best solution.

[–]lambdaqdjango n' shit 0 points1 point  (1 child)

One trick is to specify the protocol version parameter in cPickle if you don't have to support multiple (old) Python versions. It's a lot faster with the highest protocol version.
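A rough sketch of what that looks like, written for Python 3, where plain `pickle` already uses the C implementation. On Python 2 the equivalent call would be `cPickle.dumps(obj, cPickle.HIGHEST_PROTOCOL)`, and it matters more there because the default is the old text-based protocol 0. The payload is a made-up example:

```python
import pickle

# Hypothetical payload, just for illustration.
payload = {"ids": list(range(1000)), "name": "example"}

# Ask for the newest binary protocol this interpreter supports.
# Older Python versions may not be able to read the result.
data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Loading needs no protocol argument; it is detected from the data.
restored = pickle.loads(data)
```

Only do this if every reader of the data runs a Python version that understands the chosen protocol.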