• mina86.com

    HTML: No, you don’t need to escape that

    Posted by Michał ‘mina86’ Nazarewicz on 7th of March 2021

    This website being my personal project allows me to experiment and do things I’d never do in professional settings. Most notably, I’m rather fond of trying everything I can to reduce the size of the page. This goes beyond mere minification and eventually led me to wonder whether all those characters I’ve been escaping in HTML code require such treatment.

    Libraries offering HTML support will typically provide a function which indiscriminately replaces all ampersands, quote characters, less-than and greater-than signs with their corresponding HTML-safe representations. This allows the result to be used in any context in the document and is a good choice when handling user input. It’s a different matter when it comes to squeezing out every last byte. Herein I will explore which characters need to be escaped in an HTML document and under what conditions.
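    To make the trade-off concrete, here is a short Python sketch. It compares the standard library’s html.escape, which escapes all five characters (&, <, >, " and '), with a hypothetical escape_text helper which handles only & and <, the two characters that can be misread in the text content of ordinary elements. The helper is only an illustration; it does not attempt to apply the finer, context-dependent rules this article explores, and it is not meant for attribute values or elements such as script and style.

```python
import html


def escape_text(text: str) -> str:
    """Hypothetical helper: escape only & and < for the text content of
    ordinary elements (not attribute values, not <script> or <style>)."""
    # Ampersand first, so the entities introduced below are not escaped twice.
    return text.replace('&', '&amp;').replace('<', '&lt;')


sample = 'if a < b && b > c: print("ok")'
full = html.escape(sample)      # escapes &, <, >, " and '
minimal = escape_text(sample)   # escapes only & and <
print(full)
print(minimal)
print(len(full) - len(minimal), 'bytes saved')  # 13 bytes saved
```

    The savings come entirely from the > and the two quote characters left as-is.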

    Regular expressions are broken

    Posted by Michał ‘mina86’ Nazarewicz on 28th of February 2021

    Quick! What does re.search('foo|foobar', 'foobarbaz').group() produce? Or for those not fluent in Python, how about /foo|foobar/.exec('foobarbaz')? Or, to put it into words, what part of the string foobarbaz will the regular expression foo|foobar match?


    Perhaps it’s just me, but I expected the result to be foobar, that is, for the regular expression to match the longest leftmost substring. Alas, that’s not what happens. Instead, Python’s and JavaScript’s regex engines match only the foo prefix.

    Knowing that, what does re.search('foobar|foo', 'foobarbaz').group() produce? (Notice that the subexpressions in the alternation are swapped.) This can be reasoned about in two ways: either the order of branches in the alternation doesn’t matter, in which case the result should be the same as before, i.e. foo; or it does matter, and now the result will be foobar.

    A computer scientist might lean towards the first option, but a software engineer will know it’s the second. Here’s how a browser handles those cases:

    /foo|foobar/.exec('foobarbaz') → ['foo']
    /foobar|foo/.exec('foobarbaz') → ['foobar']
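    The same behaviour is easy to check with Python’s re module. The snippet below also sketches one common workaround for alternations of plain literals: sorting the alternatives from longest to shortest, so that at any given position the first branch to match is also the longest. The longest_first helper is hypothetical and does not provide leftmost-longest semantics for arbitrary subexpressions.

```python
import re

# Python's backtracking engine takes the first alternative which matches,
# so the order of branches decides the result.
print(re.search('foo|foobar', 'foobarbaz').group())  # foo
print(re.search('foobar|foo', 'foobarbaz').group())  # foobar


def longest_first(words):
    """Hypothetical helper: compile an alternation of literal strings with
    longer words tried first, so the leftmost match is also the longest."""
    ordered = sorted(words, key=len, reverse=True)
    return re.compile('|'.join(re.escape(word) for word in ordered))


print(longest_first(['foo', 'foobar']).search('foobarbaz').group())  # foobar
```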

    Reading stdin with Emacs Client

    Posted by Michał ‘mina86’ Nazarewicz on 21st of February 2021

    One feature Emacs doesn’t have out of the box is reading data from standard input. Trying to open - (e.g. echo stdin | emacs -) results in Emacs complaining about an unknown option (if it ends up starting in graphical mode) or that ‘standard input is not a tty’ (when starting in a terminal).

    With a sufficiently advanced shell, one potential solution is the --insert flag paired with process substitution: echo stdin | emacs --insert <(cat). Sadly, it’s not a panacea. It messes up the initial buffer (and thus may break setups with a custom initial-buffer-choice) and doesn’t address the issue of standard input not being a tty when running Emacs in a terminal.

    For me, though, the biggest problem is that it isn’t available when using emacsclient. Fortunately, as previously mentioned, the Emacs Server protocol allows for far more than just instructions to open a file. Indeed, my solution revolves around the use of the --eval option:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
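    my @args = @ARGV;
    if (!@args) {
    	# No arguments given: slurp standard input and escape
    	# backslashes and double quotes so the data can be embedded
    	# in an Emacs Lisp string literal.
    	my $data = join '', <STDIN>;
    	$data =~ s/\\/\\\\/g;
    	$data =~ s/"/\\"/g;
    	# Lisp code for the running Emacs to evaluate: put the data in
    	# a new *stdin* buffer, show it and focus the frame.
    	$data = <<ELISP;
    (let ((buf (generate-new-buffer "*stdin*")))
      (switch-to-buffer buf)
      (insert "$data")
      (goto-char (point-min))
      (x-focus-frame nil)
      (buffer-name buf))
    ELISP
    	@args = ('-e', $data);
    }
    
    # Replace this process with emacsclient; exec returns only on failure.
    exec 'emacsclient', @args;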
    die "emacsclient: $!\n";

    People allergic to Perl may find this Python version more palatable:

    Emacs remote file editing over SSHFS

    Posted by Michał ‘mina86’ Nazarewicz on 14th of February 2021

    The previous article described how to use emacsclient inside of an SSH session. While the solution mentioned there relied on TRAMP, I confessed that it isn’t what I’m actually using. In my experience, TRAMP doesn’t cache as much information as it could and, as a result, some operations are needlessly slow. For example, the delay of find-file prompt completion is noticeable when working over connections with latency of tens of milliseconds or more. Because for a long while I’d been working on a workstation ‘in the cloud’ in a data centre in another country, I built my setup on SSHFS instead.

    It is important to note that TRAMP has a myriad of features which won’t be available with this alternative approach. Most notably, it transparently routes shell commands executed from Emacs through SSH, which often results in much faster execution than trying to do the same thing over SSHFS. The grep command in particular avoids copying entire files over the network when run through TRAMP.

    Depending on one’s workflow, either a TRAMP-based or an SSHFS-based solution may be preferred. If you are happy with TRAMP’s performance or rely on some of its features, there’s no reason to switch. Otherwise, you might want to try the alternative approach described below.

    Greyscale, you might be doing it wrong

    Posted by Michał ‘mina86’ Nazarewicz on 7th of February 2021

    While working on the ansi_colours crate I’ve learned more about colour spaces than I ever thought I would. One of those things was the intricacies of greyscale, or rather not greyscale itself but conversion to it from sRGB. ‘How hard can it be?’ one might ask, following it up with a helpful suggestion to ‘just sum all components and divide by three!’

    Taking the arithmetic mean of the red, green and blue coordinates is an often-mentioned method. Its inaccuracy is usually acknowledged and justified by its simplicity and speed. That’s a fair trade-off, except that equally simple and fast algorithms which are noticeably more accurate exist. One such method is built on the observation that green contributes the most to the perceived brightness of a colour. The formula is (r + 2g + b) / 4, and it improves accuracy (by counting the green channel twice) as well as speed (by turning the division into a bit shift). But that’s not all. Even better formulæ exist.
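    Both simple formulæ are easy to express in integer arithmetic. The Python sketch below (function names are illustrative) shows how differently they treat pure green and pure blue; note that both truncate rather than round.

```python
def grey_mean(r: int, g: int, b: int) -> int:
    """Arithmetic mean of the three 8-bit channels (needs a real division)."""
    return (r + g + b) // 3


def grey_green_twice(r: int, g: int, b: int) -> int:
    """(r + 2g + b) / 4: green counted twice, the division done as a shift."""
    return (r + 2 * g + b) >> 2


for colour in [(255, 255, 255), (0, 255, 0), (0, 0, 255)]:
    print(colour, grey_mean(*colour), grey_green_twice(*colour))
```

    For pure green the mean gives 85 while the green-weighted formula gives 127; for pure blue the values are 85 and 63.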

    TL;DR

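    fn grey_from_rgb_avg_32bit(r: u8, g: u8, b: u8) -> u8 {
        // Weights ≈ 0.2126, 0.7152 and 0.0722 in 8.24 fixed point; they sum
        // to exactly 1 << 24, so white (255, 255, 255) maps to 255.
        let y = 3567664 * r as u32 + 11998547 * g as u32 + 1211005 * b as u32;
        // Adding 1 << 23 (one half) before the shift rounds to nearest.
        ((y + (1 << 23)) >> 24) as u8
    }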

    The above implements the best algorithm for converting sRGB into greyscale if speed and simplicity are the main concern. It does not involve gamma and thus forgoes the most complicated and time-consuming arithmetic. It’s much more precise than the arithmetic mean and just as fast.
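    For experimenting without a Rust toolchain, a direct Python port of the function above may be handy. Its integer weights sum to exactly 2^24, so white stays white, and adding 2^23 before the shift rounds to the nearest integer instead of truncating. The WEIGHTS constant is introduced here only for readability.

```python
WEIGHTS = (3567664, 11998547, 1211005)  # copied from the Rust function above
assert sum(WEIGHTS) == 1 << 24          # weights add up to 1.0 in 8.24 fixed point


def grey_from_rgb_avg_32bit(r: int, g: int, b: int) -> int:
    """Python port of the Rust function: weighted sum in 8.24 fixed point."""
    y = WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b
    # Adding half of 2**24 before shifting rounds to nearest.
    return (y + (1 << 23)) >> 24


for colour in [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]:
    print(colour, grey_from_rgb_avg_32bit(*colour))
```

    Pure red, green and blue come out as 54, 182 and 18 respectively, reflecting how unevenly the channels contribute to brightness.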

    Emacs remote file editing over TRAMP

    Posted by Michał ‘mina86’ Nazarewicz on 31st of January 2021

    I often develop software on remote machines, logged in via SSH to a workstation where all the source code resides. In those situations, I like things to work the same way regardless of which host I’m on. Since more often than not I open files from the shell rather than from within my editor, this in particular means having the same command for opening files in Emacs available on all computers. emacsclient filename works locally but gets a bit tricky over SSH.

    Running Emacs in a terminal is of course possible, but the graphical interface provides minor benefits which I like to keep. X forwarding is another option but gets sluggish over high-latency connections. And besides, having multiple Emacs instances running (one local and one remote) is not the way.

    Fortunately, by utilising SSH remote forwarding, Emacs can be configured to edit remote files and accept server commands from within an SSH session. Herein I will describe how to accomplish that.

    Hacking GOG.com for fun and profit

    Posted by Michał ‘mina86’ Nazarewicz on 24th of August 2020

    If you have a GOG account, you might have received an email announcing a Harvest Sale. While it’s unusual for a harvest to last only 48 hours, apart from that naming blunder the sale is no different from many that came before it. What caught my attention was the somewhat creative spot-the-difference puzzle that accompanied it, and specifically the pretext it offers to share some image-processing insights.

    Portion of the spot the difference Harvest Sale puzzle

    Each identified difference presents a discount code for an exciting game. Surely, one cannot let it go to waste! Alas, years of sitting in a cave in front of a computer destroyed our eyesight; how could we possibly succeed‽ Simple!

    sRGB↔XYZ conversion

    Posted by Michał ‘mina86’ Nazarewicz on 7th of July 2019

    In an earlier post, I showed how to calculate an RGB→XYZ conversion matrix. It’s only natural to follow up with code for converting between the sRGB and XYZ colour spaces. While the matrix is a significant portion of the algorithm, one more step is necessary: gamma correction.

    What is gamma correction?

    Human perception of light’s brightness approximates a power function of its intensity. This can be expressed as $P = S^\alpha$, where $P$ is the perceived brightness and $S$ is the linear intensity. $\alpha$ has been experimentally measured to be less than one, which means that people are more sensitive to changes in dark colours than in bright ones.
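    Assuming additionally that $\alpha > 0$, differentiating the model makes the last step explicit; the derivative measures how much perceived brightness changes per unit change of intensity:

```latex
\frac{dP}{dS} = \alpha S^{\alpha - 1}, \qquad 0 < \alpha < 1
\;\Longrightarrow\; \alpha - 1 < 0
\;\Longrightarrow\; \frac{dP}{dS} \text{ decreases as } S \text{ grows, and } \frac{dP}{dS} \to \infty \text{ as } S \to 0^{+}.
```

    A small change in intensity therefore produces a larger perceived change for dark colours than for bright ones.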

    Based on that observation, a colour space’s encoding can be made more efficient by using higher precision when encoding dark colours and lower precision when encoding bright ones. This is akin to the precision of floating-point numbers scaling with a value’s magnitude. In RGB systems, the role of precision scaling is played by gamma correction. When colour is captured (for example by a digital camera) it goes through gamma compression, which spaces dark colours apart and packs lighter colours more densely. When an image is displayed, the opposite happens and the encoded value goes through gamma expansion.
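    The standard sRGB transfer functions illustrate both directions. The Python sketch below works on values normalised to [0, 1]; the function names are illustrative. Note that the sRGB curve is not a pure power law: it has a short linear segment near black, followed by a power segment with exponent 2.4 on the decoding side.

```python
def gamma_expand(encoded: float) -> float:
    """sRGB decoding: gamma-compressed value in [0, 1] to linear intensity."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def gamma_compress(linear: float) -> float:
    """sRGB encoding: linear intensity in [0, 1] to gamma-compressed value."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


# Compression spaces dark colours apart: the same 0.01 step in linear
# intensity covers far more of the encoded range near black than near white.
print(gamma_compress(0.01) - gamma_compress(0.0))   # roughly 0.1
print(gamma_compress(1.0) - gamma_compress(0.99))   # roughly 0.004
```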

    TIL: Browsers ignore Expires header on reload

    Posted by Michał ‘mina86’ Nazarewicz on 26th of February 2019

    This may have been obvious, but I’ve just learned that browsers ignore the Expires header when the user manually reloads the page (as in by pressing F5 or choosing the Reload option).

    I’ve run into this when testing how Firefox treats pages which ‘never’ expire. To my surprise, the browser made requests for files it had a fresh copy of in its cache. To see behaviour much more representative of a returning user’s experience, one should select the address bar (Alt+D does the trick) and then press Return to navigate to the current page again. Hitting Reload is closer to, though not exactly the same as, a first visit.

    Of course, all of the above applies to the max-age directive of the Cache-Control header as well.
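    A heavily simplified Python model of the behaviour described above may help. It computes a response’s freshness lifetime roughly the way HTTP caching does, with Cache-Control: max-age taking precedence over Expires minus Date, and then lets a manual reload force a request regardless. It ignores heuristic freshness and all other directives; must_contact_server is a hypothetical illustration, not how any particular browser is implemented.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict) -> timedelta:
    """Lifetime of a cached response: max-age wins over Expires - Date."""
    for directive in headers.get('Cache-Control', '').split(','):
        name, _, value = directive.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            return timedelta(seconds=int(value))
    if 'Expires' in headers and 'Date' in headers:
        try:
            expires = parsedate_to_datetime(headers['Expires'])
            date = parsedate_to_datetime(headers['Date'])
            return expires - date
        except (TypeError, ValueError):
            return timedelta(0)  # an invalid Expires means already expired
    return timedelta(0)  # no explicit lifetime; heuristics are ignored here


def must_contact_server(headers: dict, age: timedelta, user_reload: bool) -> bool:
    """Hypothetical model: a manual reload revalidates even fresh entries."""
    if user_reload:
        return True
    return age >= freshness_lifetime(headers)


cached = {'Cache-Control': 'public, max-age=31536000'}
print(must_contact_server(cached, timedelta(hours=1), user_reload=False))  # False
print(must_contact_server(cached, timedelta(hours=1), user_reload=True))   # True
```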

    Moral of the story? Make sure you test real-life scenarios before making any decisions.

    Setting up Tor hidden service

    Posted by Michał ‘mina86’ Nazarewicz on 17th of February 2019

    Anyone can think of myriad reasons to run a Tor hidden service. Surely many unsavoury endeavours spring to mind, but of course there are as many noble ones. There are also various pragmatic causes like circumventing lousy NATs. Me? I just wanted to play around with my router.

    Configuring a hidden service is actually straightforward, so to make things more interesting, this article will cover configuring one on a Turris Omnia router with the help of Linux Containers to maximise isolation of components. While some steps are Omnia-specific, most translate easily to other systems, so this post may be useful regardless of the distribution used.