Regular expressions are broken

Posted by Michał ‘mina86’ Nazarewicz on 28th of February 2021

Quick! What does re.search('foo|foobar', 'foobarbaz').group() produce? Or for those not fluent in Python, how about /foo|foobar/.exec('foobarbaz')? Or to put it into words, what part of string foobarbaz will a foo|foobar regular expression match?

foo foobar

Perhaps it’s just me, but I expected the result to be foobar. That is, for the regular expression to match the longest leftmost substring. Alas, that’s not what is happening. Instead, Python’s and JavaScript’s regex engine will only match foo prefix.

Knowing that, what does re.search('foobar|foo', 'foobarbaz').group() produce (notice the subexpressions in the alternation are swapped). This can be reasoned in two ways: either order of branches in the alternation doesn’t matter — in which case the result should be the same as before, i.e. foo — or it does matter — and now the result will be foobar.

A computer scientist might lean towards the first option but a software engineer will know it’s the second.

Reading stdin with Emacs Client

Posted by Michał ‘mina86’ Nazarewicz on 21st of February 2021

One feature Emacs doesn’t have out of the box is reading data from standard input. Trying to open - (e.g. echo stdin | emacs -) results in Emacs complaining about unknown option (if it ends up starting in graphical mode) or that ‘standard input is not a tty’ (when starting in terminal).

With sufficiently advanced shell one potential solution is the --insert flag paired with command substitution: echo stdin | emacs --insert <(cat). Sadly, it’s not a panacea. It messes up initial buffer (and thus may break setups with custom initial-buffer-choice) and doesn’t address the issue of standard input not being a tty when running Emacs in terminal.

For me the biggest problem though is that it isn’t available when using emacsclient. Fortunately, as previously mentioned the Emacs Server protocol allows for far more than just instructions to open a file. Indeed, my solution to the problem revolves around the use of --eval option:

#!/usr/bin/perl

use strict;
use warnings;

my @args = @ARGV;
if (!@args) {
	my $data;
	$data = join '', <STDIN>;
	$data =~ s/\\/\\\\/g;
	$data =~ s/"/\\"/g;
	$data = <<ELISP;
(let ((buf (generate-new-buffer "*stdin*")))
  (switch-to-buffer buf)
  (insert "$data")
  (goto-char (point-min))
  (x-focus-frame nil)
  (buffer-name buf))
ELISP
	@args = ('-e', $data);
}

exec 'emacsclient', @args;
die "emacsclient: $!\n";

People allergic to Perl may find this Python version more palatable:

Emacs remote file editing over SSHFS

Posted by Michał ‘mina86’ Nazarewicz on 14th of February 2021

Previous article described how to use emacsclient inside of an SSH session. While the solution mentioned there relied on TRAMP, I’ve confessed that it isn’t what I’m actually using. From my experience, TRAMP doesn’t cache as much information as it could and as a result some operations are needlessly slow. For example, delay of find-file prompt completion is noticeable when working over connections with latency in the range of tens of milliseconds or more. Because for a long while I’d been working on a workstation ‘in the cloud’ in a data centre in another country, I’ve built my setup based on SSHFS instead.

It is important to note that TRAMP has myriad of features which won’t be available with this alternative approach. Most notably, it transparently routes shell commands executed from Emacs through SSH which often results in much faster execution than trying to do the same thing over SSHFS. grep command in particular will avoid copying entire files over network when done through TRAMP.

Depending on one’s workflow, either TRAMP-based or SSHFS-based solution may be preferred. If you are happy with TRAMP’s performance or rely on some of its feature, there’s no reason to switch. Otherwise, you might want to try an alternative approach described below.

Greyscale, you might be doing it wrong

Posted by Michał ‘mina86’ Nazarewicz on 7th of February 2021

While working on ansi_colours crate I’ve learned about colour spaces more than I’ve ever thought I would. One of the things were intricacies of greyscale. Or rather not greyscale itself but conversion from sRGB. ‘How hard can it be?’ one might ask following it up with a helpful suggestion to, ‘just sum all components and divide by three!’

Taking an arithmetic mean of red, green and blue coordinates is an often mentioned method. Inaccuracy of the method is usually acknowledged and justified by its simplicity and speed. That’s a fair trade-off except that equally simple and fast algorithms which are noticeably more accurate exist. One such method it is built on an observation that green contributes the most to the perceived brightens of a colour. The formula is (r + 2g + b) / 4 and it increases accuracy of the calculation by taking green channel twice and speed by changing division operation into a bit shift. But that’s not all. Even if speed is an important factor, there are better algorithms.

TL;DR

fn grey_from_rgb_avg_32bit(r: u8, g: u8, b: u8) -> u8 {
    let y = 3567454 * r as u32 + 11998779 * g as u32 + 1210983 * b as u32;
    ((y + (1 << 23)) >> 24) as u8
}

The above function implements the best algorithm for converting sRGB into greyscale which does not involve gamma. At the same time it’s practically as fast as arithmetic mean. That’s hardly the end of the story though as slightly more complicated methods can increase accuracy much further.

Emacs remote file editing over TRAMP

Posted by Michał ‘mina86’ Nazarewicz on 31st of January 2021

I often develop software on remote machines; logged in via SSH to a workstation where all source code reside. In those situations, I like to have things work the same way regardless of which host I’m on. Since more often than not I open files from shell rather than from within my editor, this in particular means having the same command opening files in Emacs available on all computers. emacsclient filename works locally but gets a bit tricky over SSH.

Running Emacs in a terminal is of course possible, but graphical interface provides minor benefits which I like to keep. X forwarding is another option but gets sluggish over high-latency connections. And besides, having multiple Emacs instance running (one local and one remote) is not the way.

Fortunately, by utilising SSH remote forwarding, Emacs can be configured to edit remote files and accept server commands from within an SSH session. Herein I will describe how to accomplish that.