mina86.com

  Stop using pickle already. Seriously, stop it!

    Perusing glossy magazines,1 I was made aware of CVE-2024-2912 which describes how a POST request can lead to Remote Code Execution (RCE) in BentoML servers. A feature most users would rather live without. Bugs happen and I don’t want to criticise the developers unjustly, but knowing the root of the issue was Python’s pickle module, I can only wonder: How the fuck is this still happening?

    pickle is insecure by design

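    import pickle
    # Protocol 0 opcodes: GLOBAL os.system, MARK, STRING "echo evil",
    # TUPLE, REDUCE (i.e. call os.system("echo evil")), STOP.
    pickle.loads(b'cos\nsystem\n'
                 b'(S"echo evil"\ntR.')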
    Example of how unpickling untrusted data leads to execution of a shell command.

    pickle serialisation uses a stack-based virtual machine with a ‘reduce’ operation which allows calling arbitrary Python functions (as shown in the example above). It’s no surprise it keeps popping up in security vulnerabilities. It’s been known for years that using pickle invites trouble.2 The documentation highlights the dangers quite clearly, but that’s apparently not enough.
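    To see the ‘reduce’ mechanism without hand-writing opcodes, the sketch below lets pickle.dumps generate the byte stream and then disassembles it with the standard pickletools module. The Innocent class is hypothetical, and the harmless built-in divmod stands in for os.system. Any object can return a (callable, arguments) pair from __reduce__, and by default the unpickler will import and call whatever global the stream names.

```python
import io
import pickle
import pickletools


class Innocent:
    """Hypothetical class whose __reduce__ makes unpickling call an
    arbitrary function (here the harmless built-in divmod)."""

    def __reduce__(self):
        # (callable, args): the unpickler will execute callable(*args).
        return (divmod, (17, 5))


payload = pickle.dumps(Innocent())

# Disassemble the byte stream: the callable is pushed as a global, its
# arguments as a tuple, and the REDUCE opcode invokes it.
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())

# Loading never recreates an Innocent object; it returns divmod(17, 5).
result = pickle.loads(payload)
print(result)  # (3, 2)
```

    Swap divmod for os.system and the same few bytes become the exploit shown earlier; nothing in the format distinguishes the two.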

    Call to action

    I call upon you to stop this madness. There are easy steps you can take to make everyone safer:

    • If you see a junior developer type import pickle, mentor them and explain that the module must never be used because of its security holes.
    • If you see an +import pickle line during a code review, reject the patch.
    • If you write code yourself, use an alternative serialisation method, e.g. one of those listed below.
    • And finally, if you’re a member of the Python project, deprecate pickle. Many features have been deprecated already, so backwards compatibility is not a valid excuse. C managed to get rid of gets; I believe it’s possible to heal Python as well.
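    The code-review rule can be partly automated. Below is a minimal sketch of a pre-commit or CI check using only the standard ast module; find_pickle_imports and FORBIDDEN are hypothetical names, and the check only sees static import statements (not, for example, importlib.import_module("pickle") or third-party wrappers around pickle).

```python
import ast

# Hypothetical deny-list: the C accelerator _pickle and the Python 2 name
# cPickle expose the same unsafe loader as pickle itself.
FORBIDDEN = {"pickle", "cPickle", "_pickle"}


def find_pickle_imports(source: str, filename: str = "<string>") -> list[int]:
    """Return sorted line numbers of statements importing a forbidden module."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Import):
            # import pickle / import pickle as pkl / import os, pickle
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # from pickle import loads (relative imports are skipped)
            modules = [node.module]
        else:
            continue
        if any(name.split(".")[0] in FORBIDDEN for name in modules):
            hits.append(node.lineno)
    return sorted(hits)


sample = "import json\nimport pickle as pkl\nfrom pickle import loads\n"
print(find_pickle_imports(sample))  # [2, 3]
```

    Parsing the code rather than grepping for the string avoids false positives in comments and docstrings, at the cost of missing dynamic imports.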

    Alternatives

    There are myriad alternatives, so — aside from sheer developer convenience at the cost of user safety — there’s no reason to use pickle in new projects. Some options are presented in the following table:

    Format            Provided by                Notes
    JSON              json (built-in)
    TOML              tomllib (built-in)         Human-editable. tomllib only parses TOML; writing needs a third-party package such as tomli-w.
    Protocol Buffers  protobuf package           Fast binary format; requires schema definitions.
    Borsh             borsh-construct package    Fast binary format; requires schema definitions.
    Safetensors       safetensors package        Designed for ML models.
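    As an illustration of the first row, here is a sketch of replacing a pickled object with JSON, assuming the data is a simple record; Settings, dump_settings and load_settings are hypothetical names. The point of the design is that json.loads can only ever produce dicts, lists, strings, numbers, booleans and None, so turning that into an object is an explicit step under the program’s control rather than something the input dictates.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Settings:
    """Hypothetical record type standing in for whatever used to be pickled."""
    name: str
    threshold: float
    tags: list[str]


def dump_settings(settings: Settings) -> str:
    # asdict() recursively converts the dataclass into plain dicts and lists.
    return json.dumps(asdict(settings))


def load_settings(text: str) -> Settings:
    # json.loads only yields dict/list/str/int/float/bool/None: no code runs.
    data = json.loads(text)
    # Unknown or missing keys make the constructor raise TypeError. Field
    # types are not checked here; add validation if the input is untrusted.
    return Settings(**data)


original = Settings(name="demo", threshold=0.5, tags=["a", "b"])
text = dump_settings(original)
print(text)  # {"name": "demo", "threshold": 0.5, "tags": ["a", "b"]}
assert load_settings(text) == original
```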

    Conclusion

    Just say no to pickle. You should sooner do drugs than use pickle.

    Appendix: What about PyYAML?

    YAML is another ‘human-friendly data serialisation language for all programming languages.’ In Python it is handled by the PyYAML package, which is better than pickle in that it can be used safely. However, for some inexplicable reason it has insecure defaults, as seen in the following snippet:

    >>> import yaml
    >>> payload = '''!!python/object/apply:eval
    ... args: ["__import__('os').system('echo evil')"]'''
    >>> yaml.load(payload, yaml.Loader)
    evil
    0

    To guarantee safety, yaml.SafeLoader has to be used. Why a safe loader isn’t the default is beyond my comprehension, but because that is the case, I recommend steering clear of PyYAML. It’s too easy to do the wrong thing as attested by CVE-2026-24009 published just a fortnight ago.
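    For comparison, a sketch (assuming PyYAML is installed) that feeds the same payload to the safe loader: it fails with an error instead of running the command, while ordinary YAML still loads. yaml.safe_load is PyYAML’s shorthand for yaml.load(..., Loader=yaml.SafeLoader).

```python
import yaml

payload = '''!!python/object/apply:eval
args: ["__import__('os').system('echo evil')"]'''

try:
    yaml.load(payload, Loader=yaml.SafeLoader)
except yaml.YAMLError as err:
    # SafeLoader has no constructor for python/* tags, so the document is refused.
    print("rejected:", type(err).__name__)

# Plain data is unaffected.
print(yaml.safe_load("name: demo\ntags: [a, b]"))  # {'name': 'demo', 'tags': ['a', 'b']}
```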

    If YAML must be used, a helpful option is to create wrapper functions which hide the choice of loader and explicitly document the security concerns.
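    A minimal sketch of such wrappers, assuming PyYAML is installed; load_yaml and dump_yaml are hypothetical names. They prefer the libyaml-backed CSafeLoader and CSafeDumper when PyYAML was built with libyaml and fall back to the pure-Python safe classes otherwise, so callers never pick a loader themselves.

```python
"""Project-wide YAML helpers (hypothetical module).

Security: call these functions instead of yaml.load directly. They only use
PyYAML's *safe* loader and dumper, which build plain Python types (dict,
list, str, int, float, bool, None, dates, ...) and reject tags such as
!!python/object that could execute code.
"""
import yaml

# CSafeLoader/CSafeDumper exist only when PyYAML was built against libyaml.
_LOADER = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
_DUMPER = getattr(yaml, "CSafeDumper", yaml.SafeDumper)


def load_yaml(text):
    """Parse a YAML document into plain Python data without running code."""
    return yaml.load(text, Loader=_LOADER)


def dump_yaml(data):
    """Serialise plain Python data to a YAML string."""
    return yaml.dump(data, Dumper=_DUMPER)


config = load_yaml("name: demo\nretries: 3\n")
print(config)  # {'name': 'demo', 'retries': 3}
print(dump_yaml(config), end="")
```

    Keeping the loader choice in one module also means a future switch (or an audit) touches a single file.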

    1 Robbe Van Roey. 2025. Unveiling BentoML Pickle-Based Serialization. Paged Out! Issue 7. https://pagedout.institute/download/PagedOut_007.pdf

    2 The oldest such vulnerability in the CVE catalogue is the 15-year-old CVE-2011-2520. Overall, the database lists 51 entries which reference ‘pickle’ in the context of ‘Python’.