Will the real ARG_MAX please stand up? Part 1

Posted by Michał ‘mina86’ Nazarewicz on 14th of March 2021

arg max is a set of values from function’s domain at which said function reaches its maxima. That’s certainly an arg max but spelled without an underscore thus not the one we are searching for. No, this article is regarding the ARG_MAX that limits the length of arguments to an executable.

Or in other words, why you are getting:

bash: command: Argument list too long

HTML: No, you don’t need to escape that

Posted by Michał ‘mina86’ Nazarewicz on 7th of March 2021

This website being my personal project allows me to experiment and do things I’d never do in professional settings. Most notably, I’m rather found of trying everything I can to reduce the size of the page. This goes beyond mere minification and eventually lead me to wonder if all those characters I’ve been escaping in HTML code really need such treatment.

Libraries offering HTML support will typically provide a function to indiscriminately replace all ampersands, quote characters, less-than and greater-then signs with their corresponding HTML-safe representation. This allows the result to be used in any context in the document and is a good choice for user-input validation. It’s a different matter when it comes to squeezing every last byte. Herein I will explore just which characters and when exactly really need to be escaped in an HTML document.

Regular expressions are broken

Posted by Michał ‘mina86’ Nazarewicz on 28th of February 2021

Quick! What does re.search('foo|foobar', 'foobarbaz').group() produce? Or for those not fluent in Python, how about /foo|foobar/.exec('foobarbaz')? Or to put it into words, what part of string foobarbaz will a foo|foobar regular expression match?

foo foobar

Perhaps it’s just me, but I expected the result to be foobar. That is, for the regular expression to match the longest leftmost substring. Alas, that’s not what is happening. Instead, Python’s and JavaScript’s regex engine will only match foo prefix.

Knowing that, what does re.search('foobar|foo', 'foobarbaz').group() produce (notice the subexpressions in the alternation are swapped). This can be reasoned in two ways: either order of branches in the alternation doesn’t matter — in which case the result should be the same as before, i.e. foo — or it does matter — and now the result will be foobar.

A computer scientist might lean towards the first option but a software engineer will know it’s the second.

Reading stdin with Emacs Client

Posted by Michał ‘mina86’ Nazarewicz on 21st of February 2021

One feature Emacs doesn’t have out of the box is reading data from standard input. Trying to open - (e.g. echo stdin | emacs -) results in Emacs complaining about unknown option (if it ends up starting in graphical mode) or that ‘standard input is not a tty’ (when starting in terminal).

With sufficiently advanced shell one potential solution is the --insert flag paired with command substitution: echo stdin | emacs --insert <(cat). Sadly, it’s not a panacea. It messes up initial buffer (and thus may break setups with custom initial-buffer-choice) and doesn’t address the issue of standard input not being a tty when running Emacs in terminal.

For me the biggest problem though is that it isn’t available when using emacsclient. Fortunately, as previously mentioned the Emacs Server protocol allows for far more than just instructions to open a file. Indeed, my solution to the problem revolves around the use of --eval option:

#!/usr/bin/perl

use strict;
use warnings;

my @args = @ARGV;
if (!@args) {
	my $data;
	$data = join '', <STDIN>;
	$data =~ s/\\/\\\\/g;
	$data =~ s/"/\\"/g;
	$data = <<ELISP;
(let ((buf (generate-new-buffer "*stdin*")))
  (switch-to-buffer buf)
  (insert "$data")
  (goto-char (point-min))
  (x-focus-frame nil)
  (buffer-name buf))
ELISP
	@args = ('-e', $data);
}

exec 'emacsclient', @args;
die "emacsclient: $!\n";

People allergic to Perl may find this Python version more palatable:

Emacs remote file editing over SSHFS

Posted by Michał ‘mina86’ Nazarewicz on 14th of February 2021

Previous article described how to use emacsclient inside of an SSH session. While the solution mentioned there relied on TRAMP, I’ve confessed that it isn’t what I’m actually using. From my experience, TRAMP doesn’t cache as much information as it could and as a result some operations are needlessly slow. For example, delay of find-file prompt completion is noticeable when working over connections with latency in the range of tens of milliseconds or more. Because for a long while I’d been working on a workstation ‘in the cloud’ in a data centre in another country, I’ve built my setup based on SSHFS instead.

It is important to note that TRAMP has myriad of features which won’t be available with this alternative approach. Most notably, it transparently routes shell commands executed from Emacs through SSH which often results in much faster execution than trying to do the same thing over SSHFS. grep command in particular will avoid copying entire files over network when done through TRAMP.

Depending on one’s workflow, either TRAMP-based or SSHFS-based solution may be preferred. If you are happy with TRAMP’s performance or rely on some of its feature, there’s no reason to switch. Otherwise, you might want to try an alternative approach described below.

Greyscale, you might be doing it wrong

Posted by Michał ‘mina86’ Nazarewicz on 7th of February 2021

While working on ansi_colours crate I’ve learned about colour spaces more than I’ve ever thought I would. One of the things were intricacies of greyscale. Or rather not greyscale itself but conversion from sRGB. ‘How hard can it be?’ one might ask following it up with a helpful suggestion to, ‘just sum all components and divide by three!’

Taking an arithmetic mean of red, green and blue coordinates is an often mentioned method. Inaccuracy of the method is usually acknowledged and justified by its simplicity and speed. That’s a fair trade-off except that equally simple and fast algorithms which are noticeably more accurate exist. One such method it is built on an observation that green contributes the most to the perceived brightens of a colour. The formula is (r + 2g + b) / 4 and it increases accuracy of the calculation by taking green channel twice and speed by changing division operation into a bit shift. But that’s not all. Even if speed is an important factor, there are better algorithms.

TL;DR

fn grey_from_rgb_avg_32bit(r: u8, g: u8, b: u8) -> u8 {
    let y = 3567664 * r as u32 + 11998547 * g as u32 + 1211005 * b as u32;
    ((y + (1 << 23)) >> 24) as u8
}

The above implements the best algorithm for converting sRGB into greyscale if speed and simplicity is main concern. It does not involve gamma thus forgoes most complicated and time-consuming arithmetic. It’s much more precise and as fast as arithmetic mean.

That’s hardly the end of the story though as slightly more complicated methods can increase accuracy much more.

Emacs remote file editing over TRAMP

Posted by Michał ‘mina86’ Nazarewicz on 31st of January 2021

I often develop software on remote machines; logged in via SSH to a workstation where all source code reside. In those situations, I like to have things work the same way regardless of which host I’m on. Since more often than not I open files from shell rather than from within my editor, this in particular means having the same command opening files in Emacs available on all computers. emacsclient filename works locally but gets a bit tricky over SSH.

Running Emacs in a terminal is of course possible, but graphical interface provides minor benefits which I like to keep. X forwarding is another option but gets sluggish over high-latency connections. And besides, having multiple Emacs instance running (one local and one remote) is not the way.

Fortunately, by utilising SSH remote forwarding, Emacs can be configured to edit remote files and accept server commands from within an SSH session. Herein I will describe how to accomplish that.

Hacking GOG.com for fun and profit

Posted by Michał ‘mina86’ Nazarewicz on 24th of August 2020

If you have a GOG account, you might have received an email announcing a Harvest Sale. While it’s unusual for harvest to last only 48 hours, but apart from that naming blunder, the sale is no different than many that came before it. What caught my attention was somewhat creative spot the difference puzzle that accompanied it. Specifically, as pretext to share some image processing insights.

Portion of the spot the difference Harvest Sale puzzle

Each identified difference presents a discount code for an exciting game. Surely, one cannot let it go to waste! Alas, years of sitting in a cave in front of a computer destroyed our eyesight; how could we possibly succeed‽ Simple!

sRGB↔XYZ conversion

Posted by Michał ‘mina86’ Nazarewicz on 7th of July 2019

In an earlier post, I’ve shown how to calculate an RGB↔XYZ conversion matrix. It’s only natural to follow up with a code for converting between sRGB and XYZ colour spaces. While the matrix is a significant portion of the algorithm, there is one more step necessary: gamma correction.

What is gamma correction?

Human perception of light’s brightness approximates a power function of its intensity. This can be expressed as \(P = S^\alpha\) where \(P\) is the perceived brightness and \(S\) is linear intensity. \(\alpha\) has been experimentally measured to be less than one which means that people are more sensitive to changes to dark colours rather than to bright ones.

Based on that observation, colour space’s encoding can be made more efficient by using higher precision when encoding dark colours and lower when encoding bright ones. This is akin to precision of floating-point numbers scaling with value’s magnitude. In RGB systems, the role of precision scaling is done by gamma correction. When colour is captured (for example from a digital camera) it goes through gamma compression which spaces dark colours apart and packs lighter colours more densely. When displaying an image, the opposite happens and encoded value goes through gamma expansion.

TIL: Browsers ignore Expires header on reload

Posted by Michał ‘mina86’ Nazarewicz on 26th of February 2019

This may have been obvious, but I’ve just learned that browsers ignore Expires header when the user manually reloads the page (as in by pressing F5 or choosing Reload option).

I’ve run into this when testing how Firefox treats pages which ‘never’ expire. To my surprise, the browser made requests for files it had a fresh copy of in its cache. To see behaviour much more representative of the experience of a returning user, one should select the address bar (Alt+D does the trick) and then press Return to navigate to the current page again. Hitting Reload is more akin, though not exactly the same, to the first visit.

Of course, all of the above applies to the max-age directive of the Cache-Control header as well.

Moral of the story? Make sure you test the actual real-life scenarios before making any decisions.