• mina86.com

  • Stop using pickle already. Seriously, stop it!

    Perusing glossy magazines,1 I was made aware of CVE-2024-2912, which describes how a POST request could lead to remote code execution (RCE) in BentoML servers. That is a feature most users would rather live without. Bugs happen and I don’t want to criticise the developers unjustly, but knowing the root of the issue was Python’s pickle module, I can only wonder: How the fuck is this still happening?

    pickle is insecure by design

    import pickle
    pickle.loads(b'cos\nsystem\n'
                 b'(S"echo evil"\ntR.')
    Example of how ‘unpickling’ untrusted data leads to execution of a shell command.

    pickle serialisation uses a stack-based virtual machine with a ‘reduce’ operation which allows calling arbitrary Python functions (as shown in the example above). It’s no surprise it keeps popping up in security vulnerabilities. It’s been known for decades that using pickle invites trouble.2 The documentation highlights the dangers quite clearly, but that’s apparently not enough.
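
    To see the virtual machine at work without running anything, the payload from the example above can be inspected with the standard pickletools module. This is a minimal sketch; the helper name opcode_names is made up for illustration, and the bytes are only parsed, never passed to pickle.loads.

```python
import pickletools

# The same bytes as in the example above. They are only inspected here,
# never unpickled.
PAYLOAD = b'cos\nsystem\n(S"echo evil"\ntR.'


def opcode_names(data: bytes) -> list[str]:
    """Return the opcode names of a pickle stream without executing it."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]


# GLOBAL pushes os.system onto the stack, MARK/STRING/TUPLE build the
# argument tuple, REDUCE calls the function with that tuple and STOP ends
# the stream.
print(opcode_names(PAYLOAD))
```

    The REDUCE opcode is the step that turns data into a function call, which is why no amount of care in the calling code makes loading an untrusted pickle safe.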

    Call to action

    I call upon you to stop this madness. There are easy steps you can take to make everyone safer:

    • If you see a junior developer type import pickle, mentor them and explain the module must never be used due to security holes.
    • If you see a +import pickle line during a code review, reject the patch.
    • If you write Python code yourself, use an alternative serialisation method. Some options are listed below.
    • And finally, if you’re a Python project member, deprecate pickle. Many Python features have been deprecated already, so backwards compatibility by itself is not a valid excuse. C managed to get rid of gets; I believe it’s possible to heal Python as well.3
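
    The code review step can be partly automated. Below is a rough sketch of a check that could run in CI or a pre-commit hook; the function name find_pickle_imports is hypothetical, and the check only catches plain import statements, not dynamic imports via importlib or __import__.

```python
import ast


def find_pickle_imports(source: str) -> list[int]:
    """Return line numbers of statements that import the pickle module."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Covers `import pickle`, `import os, pickle`, `import pickle as p`.
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # Covers `from pickle import loads`; relative imports are skipped.
            modules = [node.module or ""]
        else:
            continue
        if any(name.split(".")[0] == "pickle" for name in modules):
            lines.append(node.lineno)
    return sorted(lines)


example = "import json\nimport os, pickle\nfrom pickle import loads\n"
print(find_pickle_imports(example))
```

    A non-empty result would be the cue to reject the patch or at least start a conversation with its author.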

    Alternatives

    There are myriad alternatives, so — aside from sheer developer convenience at the cost of user safety — there’s no reason to use pickle in new projects. Some options are:

    | Format | Implementation | Notes |
    |---|---|---|
    | JSON | json built-in | Claims it’s ‘easy for humans to read and write,’ but in practice that’s not true. Does not natively support custom types. |
    | TOML | tomllib built-in | Easy for humans to read and write. Does not natively support custom types. |
    | YAML | StrictYAML | Easy for humans to read and write. StrictYAML supports a subset of YAML. See below for discussion of security implications. |
    | Parquet | PyArrow | Columnar format natively supported by pandas and Polars. |
    | Protocol Buffers | protobuf package | Fast, binary format. Requires schema definitions and a compilation step. |
    | Borsh | borsh-construct package | Fast, binary format. Requires schema definitions. |
    | Safetensors | safetensors package | Designed for ML models. |
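
    The lack of native support for custom types in JSON is less of an obstacle than it sounds. Here is a minimal sketch, assuming a made-up User dataclass and hypothetical helpers dump_user and load_user: the type is converted to a plain dictionary on the way out and rebuilt explicitly, with validation, on the way in.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:  # hypothetical type used only for illustration
    name: str
    age: int


def dump_user(user: User) -> str:
    """Serialise a User as a JSON object."""
    return json.dumps(asdict(user))


def load_user(text: str) -> User:
    """Rebuild a User from JSON, rejecting anything unexpected."""
    data = json.loads(text)
    if not isinstance(data, dict) or set(data) != {"name", "age"}:
        raise ValueError("unexpected fields")
    if not isinstance(data["name"], str) or not isinstance(data["age"], int):
        raise ValueError("unexpected field types")
    return User(name=data["name"], age=data["age"])


print(load_user(dump_user(User("Alice", 30))))
```

    Unlike unpickling, the loader decides which type gets constructed; the input can only supply field values.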

    What about PyYAML?

    The most popular Python library implementing YAML is PyYAML.4 It is better than pickle in that it can be used safely, but for some inexplicable reason it has insecure defaults as seen in the following snippet:

    >>> import yaml
    >>> payload = '''!!python/object/apply:eval
    ... args: ["__import__('os').system('echo evil')"]'''
    >>> yaml.load(payload, yaml.Loader)
    evil
    0

    To guarantee safety, yaml.SafeLoader has to be used. Why a safe loader isn’t the default is beyond my comprehension, but because that is the case, I don’t recommend using PyYAML. It’s too easy to do the wrong thing as attested by CVE-2026-24009 published just a fortnight ago.

    Instead, StrictYAML is a safe alternative, although it handles only a subset of YAML features and may not always be a feasible option. If full YAML support is necessary, it is prudent to create wrapper functions which hide the choice of the loader and explicitly document the security concerns.
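
    Such a wrapper could look roughly like the sketch below. It assumes PyYAML is installed; the name load_yaml is hypothetical, while yaml.load, yaml.SafeLoader and yaml.YAMLError are part of PyYAML.

```python
import yaml


def load_yaml(text: str):
    """Parse YAML from a possibly untrusted source.

    Security note: this always uses yaml.SafeLoader, which only builds
    plain data (mappings, sequences, strings, numbers and so on). Do not
    call yaml.load directly elsewhere in the code base.
    """
    return yaml.load(text, Loader=yaml.SafeLoader)


payload = """!!python/object/apply:eval
args: ["__import__('os').system('echo evil')"]"""

print(load_yaml("answer: 42"))
try:
    load_yaml(payload)
except yaml.YAMLError as err:
    # The safe loader refuses the python/object/apply tag instead of
    # executing it.
    print("rejected:", type(err).__name__)
```

    With the loader fixed in one place, a reviewer only needs to check that the rest of the code never imports yaml directly.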

    Conclusion

    Just say no to pickle. You should sooner do drugs than use pickle.

    1 Robbe Van Roey. 2025. Unveiling BentoML Pickle-Based Serialization. Paged Out! Issue 7. https://pagedout.institute/download/PagedOut_007.pdf

    2 The issue was discussed in 2001, and the oldest vulnerability in the CVE catalogue is the 15-year-old CVE-2011-2520. Overall, the database lists 51 entries which reference ‘pickle’ in the context of ‘Python’.

    3 Completely eliminating the pickle format might be harder, but it is possibly unnecessary. It is used internally by the Python standard library (see the multiprocessing module), but at least there it is encapsulated by an abstraction layer which minimises the risk of misuse. However, that legacy usage is no reason for users to interact with the pickle module directly or to introduce new uses of the format.

    4 According to pypistats.org, PyYAML gets 673 M monthly downloads, ruamel 145 M, StrictYAML 18.4 M and poyo only 234 k.