• mina86.com

  • Stop using pickle already. Seriously, stop it!

    Perusing glossy magazines,1 I was made aware of CVE-2024-2912 which describes how a POST request can lead to Remote Code Execution (RCE) in BentoML servers. A feature most users would rather live without. Bugs happen and I don’t want to criticise the developers unjustly, but knowing the root of the issue was Python’s pickle module, I can only wonder: How the fuck is this still happening?

    pickle is insecure by design

    import pickle
    pickle.loads(b'cos\nsystem\n'
                 b'(S"echo evil"\ntR.')
    Example of how ‘unpickling’ untrusted data leads to execution of a shell command.

    pickle serialisation uses a stack-based virtual machine with a ‘reduce’ operation which allows calling arbitrary Python functions (as the example above shows). It’s no surprise it keeps popping up in security vulnerabilities. It’s been known for decades that using pickle invites trouble.2 The documentation highlights the dangers quite clearly, but that’s apparently not enough.
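
    To see the virtual machine at work without running anything dangerous, the payload from the example can be disassembled rather than loaded. The following is a minimal sketch using the standard pickletools module, which only parses the byte stream; the opcode_names helper is a name made up for this illustration.

```python
import pickletools

# The same bytes as in the example; they are parsed here, never unpickled.
PAYLOAD = b'cos\nsystem\n(S"echo evil"\ntR.'

def opcode_names(data: bytes) -> list[str]:
    """Lists the VM opcodes found in a pickle stream (hypothetical helper)."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]

# GLOBAL pushes os.system, MARK/STRING/TUPLE build the argument tuple and
# REDUCE pops both and calls the function with those arguments.
print(opcode_names(PAYLOAD))
# Prints out: ['GLOBAL', 'MARK', 'STRING', 'TUPLE', 'REDUCE', 'STOP']
```

    The REDUCE step is where unpickling turns into a function call chosen by whoever produced the data.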

    Call to action

    I call upon you to stop this madness. There are easy steps you can take to make everyone safer:

    • If you see a junior developer type import pickle, mentor them and explain the module must never be used due to security holes.
    • If you see a +import pickle line during a code review, reject the patch.
    • If you write code yourself, use an alternative serialisation method, e.g. one listed below.
    • And finally, if you’re a Python project member, deprecate pickle. Many features have been deprecated already, so backwards compatibility is not a valid excuse. C managed to get rid of gets, I believe it’s possible to heal Python as well.
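
    As a rough illustration of the third point, plain data can often be stored as JSON instead. This is only a sketch under the assumption that the data consists of dictionaries, lists, strings and numbers; save_settings and load_settings are hypothetical names, not part of any library.

```python
import json

def save_settings(settings: dict) -> str:
    """Serialises plain data to JSON text (hypothetical helper)."""
    return json.dumps(settings, sort_keys=True)

def load_settings(blob: str) -> dict:
    """Parses JSON text; parsing never calls functions named in the input."""
    data = json.loads(blob)
    if not isinstance(data, dict):
        raise ValueError('expected a JSON object')
    return data

original = {'name': 'demo', 'retries': 3, 'tags': ['a', 'b']}
assert load_settings(save_settings(original)) == original

# Feeding the malicious pickle payload to the JSON parser is merely an error.
try:
    load_settings('cos\nsystem\n(S"echo evil"\ntR.')
except ValueError as err:
    print('rejected:', type(err).__name__)
```

    The worst a hostile input can do here is fail to parse or fail validation; there is no opcode that invokes a callable.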

    No Nick, 7-bit colour depth is not enough

    In ‘Your Screen is Secretly 30 Years Old’ video on The Science Asylum channel, Nick Lucid argues that 7-bit colour depth is sufficient for screens, claiming that 2 million colours (vs 16 million at 8-bit colour depth) ‘would be more than enough for most people’:

    Can screens make fewer [than 16 million] colours without us noticing? The answer is absolutely yes. 16 million is an overkill. 2 million would be more than enough for most people. It’s just that controlling a screen with 16 million colours costs the same as the screen with 2 million because they use the same number of bytes.

    This article interrogates that statement. To provide a visual baseline, Fig. 1 compares two grey gradients: one uses 8 bits per component (bpc) while the other uses 7 bpc. Having fewer discrete colour levels makes the 7 bpc gradient exhibit clearly visible banding (rather than a smooth transition, the places where the colour changes can be identified), immediately undermining the premise that 7-bit colour depth is perceptually indistinguishable from higher bit depths.

    Grey gradients using 8- and 7-bit colour depth
    Fig. 1 Grey gradients using different colour depths. Each goes from colour #10 10 10 to #7f 7f 7f but uses different quantisation. The top gradient uses 112 distinct grey levels while the bottom one only 54.
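
    The ‘2 million’ and ‘16 million’ figures in the quote follow directly from the bit depth. A short sketch of the arithmetic, assuming three components (red, green and blue) per pixel; colour_count is a name invented for this illustration:

```python
def colour_count(bits_per_component: int, components: int = 3) -> int:
    """Number of distinct colours for a given bit depth (hypothetical helper)."""
    return 2 ** (bits_per_component * components)

for bpc in (7, 8):
    print(f'{bpc} bpc: {colour_count(bpc):,} colours, '
          f'{2 ** bpc} levels per component')
# Prints out:
#   7 bpc: 2,097,152 colours, 128 levels per component
#   8 bpc: 16,777,216 colours, 256 levels per component
```

    Dropping one bit per component cuts the total number of colours by a factor of eight, but it is the halving of levels per component that shows up as banding in a single-hue gradient.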

    Save 22% of your storage with this one easy trick

    If you’re a digital archivist, photography enthusiast or data hoarder, you’ve likely faced the ‘No space left on device’ error. Before you start deleting your files, there’s a way to reclaim space without losing a single pixel: JPEG XL. Unlike typical lossy transcoding which degrades quality, JPEG XL allows for bit-for-bit reconstruction of the original files.

    JPEG XL is the newest image file format developed by the Joint Photographic Experts Group with, among other features, better compression. Normally converting between formats with lossy compression is inadvisable as it exacerbates quality loss; however, JPEG XL offers lossless JPEG transcoding.

    Unlike normal conversion (e.g. from JPEG to AVIF), JPEG transcoding doesn’t re-encode the image data. It merely repackages the existing information so that the original JPEG file can be recreated bit-for-bit. And all that with 22% better compression.
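
    The bit-for-bit claim is easy to check on your own files. Below is a hedged sketch which assumes the cjxl and djxl command-line tools from the reference libjxl implementation are installed and on PATH, and that cjxl performs lossless JPEG transcoding by default when given a JPEG input. The function names and the .restored.jpg naming are made up for this example.

```python
import subprocess
from pathlib import Path

def transcode_commands(jpeg: Path) -> tuple[list[str], list[str]]:
    """Builds the encode and decode command lines (hypothetical helper)."""
    jxl = jpeg.with_suffix('.jxl')
    restored = jpeg.with_name(jpeg.stem + '.restored.jpg')
    encode = ['cjxl', str(jpeg), str(jxl)]      # JPEG -> JPEG XL
    decode = ['djxl', str(jxl), str(restored)]  # JPEG XL -> JPEG
    return encode, decode

def verify_roundtrip(jpeg: Path) -> bool:
    """Runs both tools and compares the restored file with the original."""
    encode, decode = transcode_commands(jpeg)
    subprocess.run(encode, check=True)
    subprocess.run(decode, check=True)
    return Path(decode[2]).read_bytes() == jpeg.read_bytes()
```

    Calling verify_roundtrip(Path('photo.jpg')) on a real file should return True if the transcoding was lossless; only then is it reasonable to consider deleting the original.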

    Linux distributions are not like cars

    Even though using Linux has never been easier, newcomers often hit a confusing roadblock: the overwhelming choice of distributions. Someone coming from Windows or macOS will find the very concept of a ‘distribution’ alien, and the endless listicles and user arguments about which one is ‘best’ only make the confusion worse.

    This article aims to clear up that confusion by providing a better way to think about Linux choices, so you won’t be overwhelmed by all the different options if you’re thinking of switching.

    Be My Guest at DSN 2025

    View of Naples with Vesuvius in the background.

    I’m delighted to share the paper I presented at the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN): ‘Be My Guest: Welcoming Interoperability into IBC-Incompatible Blockchains’.

    It introduces the concept of a guest blockchain which runs on top of a blockchain and provides features necessary to support the Inter-Blockchain Communication (IBC) protocol. This enables trustless cross-chain interoperability between blockchains which would otherwise not support IBC-based communication. We demonstrate our approach by deploying the guest blockchain on Solana, connecting it to the Cosmos ecosystem with performance comparable to native IBC implementations.

    A tale of two pull requests: Addendum

    In the previous post, I criticised Rust’s contribution process, where a simple patch languished due to communication hurdles. Rust isn’t unique in struggling with its process. This time, the story is about Python.

    Parsing HTML in Python

    As its name implies, the html.parser module provides interfaces for parsing HTML documents. It offers an HTMLParser base class users can extend to implement their own handling of HTML markup. Of interest to us is the unknown_decl method, which ‘is called when an unrecognised declaration is read by the parser.’ It’s called with an argument containing ‘the entire contents of the declaration inside the <![...]> markup.’ For example:

    from html.parser import HTMLParser
    
    class MyParser(HTMLParser):
        def unknown_decl(self, data: str) -> None:
            print(data)
    
    parser = MyParser()
    parser.feed('<![if test]>')
        # Prints out: if test
        # (unless Python 3.13.4+, see below)
    parser.feed('<![CDATA[test]]>')
        # Prints out: CDATA[test

    A tale of two pull requests

    In November 2015, rmcgibbo opened Twine Issue #153. Less than two months later, he closed it with no explanation. The motive behind this baffling move might have remained an unsolved Internet mystery if not for one crucial fact: someone asked and rmcgibbo was willing to talk:

    thedrow on Dec 31, 2015
    Contributor
    Were you able to resolve the issue?

    rmcgibbo on Dec 31, 2015
    Author
    No. I decided I don’t care.

    We all had such moments, and this humorous exchange serves as a reminder that certain matters are not worth stressing about. Like Marcus Aurelius once said, ‘choose not to be harmed — and you won’t feel harmed.’ However, instead of discussing philosophy, I want to bring up some of my experiences to make a point about contributions to free software projects.

    The two pull requests

    Rather than London and Paris, this tale takes place on GitHub and the linux-kernel mailing list. The two titular pull requests (PRs) are of my own making, and the contrast between them helps discuss and critique Rust’s development process.

    Could this be null?

    In my previous post, I mentioned an ancient C++ technique of using the ((X*)0)->f() syntax to simulate static methods. It ‘works’ by considering things from a machine code point of view, where a non-virtual method call is the same as a function call with an additional this argument. In general, a well-behaving obj->method() call is compiled into method(obj). Assuming this is true, one might construct the following code:

    #include <cstdio>
    
    struct Value {
        int safe_get() {
            return this ? value : -1;
        }
        int value;
    };
    
    void print(Value *val) {
        printf("value = %d", val->safe_get());
        if (val == nullptr) puts("val is null");
    }

    Will it work as expected though?

    Axiomatic view of undefined behaviour

    Draw an arbitrary triangle with corners A, B and C. (Bear with me; I promise this is a post about undefined behaviour.) Draw a line parallel to line BC that goes through point A. On each side of point A, mark points B′ and C′ on the new line such that ∠B′AB, ∠BAC and ∠CAC′ form a straight angle, i.e., ∠B′AB + ∠BAC + ∠CAC′ = 180°.

    Observe that line AB intersects two parallel lines: BC and B′C′. Via proposition 29, ∠B′AB = ∠ABC. Similarly, line AC intersects those lines, hence ∠C′AC = ∠ACB. We now get ∠BAC + ∠ABC + ∠ACB = ∠BAC + ∠B′AB + ∠C′AC = 180°. This proves that the sum of interior angles in a triangle is 180°.

    Proof that sum of internal triangle angles is 180° next to triangle drawn on a ball with sum of internal angles over 180°

    Now, take a ball whose circumference is c. Start drawing a straight line of length c/4 on it. Turn 90° and draw another straight line of length c/4. Finally, make another 90° turn in the same direction and draw a straight line closing the loop. You’ve just drawn a triangle with three right angles, so its internal angles sum to 270°, well over 180°. Something we’ve just proved is impossible‽

    There is no secret. Everyone sees what is happening. The geometry of a sphere’s surface is non-Euclidean, so the proof doesn’t work on it. The real question is: what does this have to do with undefined behaviour?

    Is Ctrl+D really like Enter?

    ‘Ctrl+D in terminal is like pressing Enter,’ Gynvael claims. A surprising proclamation, but pondering on it one realises that it cannot be dismissed out of hand. Indeed, there is a degree of truth to it. However, the statement can create more confusion if it’s made without further explanation, which I provide in this article.

    To keep things focused, this post assumes a terminal in canonical mode. This is what one gets when running bash --noediting or one of the many non-interactive tools which can read data from standard input, such as cat, sed or sort. Bash, other shells and TUI programs normally run in raw mode and provide their own line editing capabilities.
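
    Whether a terminal is currently in canonical mode can be checked by looking at the ICANON bit of its local flags. The following is a minimal sketch for Unix-like systems using Python’s standard termios module; is_canonical and stdin_mode are hypothetical helper names.

```python
import os
import sys
import termios

def is_canonical(lflag: int) -> bool:
    """Tells whether the ICANON bit is set in termios local flags."""
    return bool(lflag & termios.ICANON)

def stdin_mode() -> str:
    """Describes the mode of the terminal attached to standard input."""
    try:
        fd = sys.stdin.fileno()
    except (OSError, ValueError):
        return 'not a terminal'
    if not os.isatty(fd):
        return 'not a terminal'
    # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
    lflag = termios.tcgetattr(fd)[3]
    return 'canonical' if is_canonical(lflag) else 'non-canonical'

if __name__ == '__main__':
    print(stdin_mode())
```

    Run directly from a shell prompt, the script would normally report canonical mode, because the shell restores the terminal settings before starting a command.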