Posted by Michał ‘mina86’ Nazarewicz on 20th of June 2021 | (cite)
In a past life I’ve talked about a challenge to write the shortest program which prints all prime numbers less than a hundred. Back then I’ve discussed a 60-character long solution written in C. Since Rust is the future, inspired by a recent thread on Sieve of Eratosthenes I’ve decided to carry the task for Rust as well.
To avoid spoiling the solution, I’m padding this article with a bit of unrelated content. To jump straight to the code, skip the next block of paragraphs. Otherwise, here’s a joke for ya:
Posted by Michał ‘mina86’ Nazarewicz on 13th of June 2021 | (cite)
Have you ever wondered how balanced audio cables work? For the longest time I have until finally deciding to look into it. Turns out the principle is actually rather straightforward.
In a normal, unbalanced wire an analogue signal S is sent over a pair of wires: one carries the signal while the other a reference zero. Receiver interprets voltage between the two as the signal. The issue is that over the length of a cable noise is introduced. While transmitter sends S, receiver gets S + e (where e denotes the noise).
A balanced cable addresses this problem by sending the information over three wires: hot (or positive), cold (or negative) and ground. Hot wire carries the signal S as before, cold one carries the inverse of the signal -S and ground is zero as before. Like before, when information travels over the cable, noise is introduced. Crucially, because it’s a single cable, noise on the positive and negative wires are strongly correlated. Receiver therefore gets S + e on hot wire and -S + e on cold wire. All it needs to do is inverse the signal on negative wire and add both signals together. Inversion changes phase of the noises on the cold wire such that it cancels out error remaining on the positive wire: (S + e) + -(-S + e) = S + e + S - e → S.
Posted by Michał ‘mina86’ Nazarewicz on 6th of June 2021 | (cite)
Continuing the new tradition of clickbaity titles, let’s talk about explicitness. It’s a subject that comes up when bike-shedding language and API designs. Pointing out that a construct or a function exhibits implicit behaviour is often taunted as an ultimate winning argument against it.
There are two problems with such line of reasoning. First of all, people claim to care about feature being explicit but came to accept a lot of implicit behaviour without batting an eye. Second of all, no one actually agrees what the terms mean.
In this article I’ll demonstrate those two issues and show that ‘explicit over implicit’ is the wrong value to uphold. It’s merely a proxy for a much more useful goal interfaces should strive for. By the end I’ll demonstrate what we should look at instead.
Posted by Michał ‘mina86’ Nazarewicz on 30th of May 2021 | (cite)
Updated in October 2021 to include direct comparison shift usage between Dvorak and Programmer Dvorak layouts.
A few years age I’ve made a decision that had the potential to change the course of history. Had I went a different path, the pl(dvp) layout might have never seen the light of day. But did I make a wise choice? Or had I chosen poorly?
I’m talking of course about the decision to learn Programmer Dvorak rather than a regular Dvorak keyboard layout. The main differences between the two is that in the former digits are entered with Shift key pressed down which allows several punctuation marks often used when programming to be typed without the need to reach for Shift. The hypothesis goes that developers use digits less often thus such design optimises the layout for them.
To test this I’ve grabbed all my git repositories and constructed a histogram of characters used in text files present there. Since letters are on the same position on both layouts in question, only digits and punctuation characters are compared on the histogram:
Posted by Michał ‘mina86’ Nazarewicz on 23rd of May 2021 | (cite)
Some years ago, during a friendly discussion about C++, a colleague challenged me with a question: what’s the best way to represent a sequence of numbers if delete operation is one that needs to be supported. I argued in favour of a linked list suggesting that with sufficiently large number of elements, it would be much preferred.
In a twist of fate, I’ve been recently discussing an algorithm which reminded my of that conversation. Except this time I was the one arguing against a node-based data structure. Rather than ending things at a conversation, I’ve decided to benchmark a few solutions to make sure which approach is the best.
The problem
The task at hand is simple. Design a data structure which stores a set of words, all of the same length, and offers lookup operation which returns all words matching globs in the form ‘prefix*suffix’. That is, words which start with a given prefix and end with a given suffix. Either part of the pattern may be empty and their concatenation is never longer than length of the words in the collection. Initialisation time and memory footprint are not a concern. Complexity of returning a result can be assumed to be constant.
In this article I’me going to describe possible solutions — some using a boring vector while others taking advantage of an exciting prefix tree — and benchmark the implementations in an ultimate battle between contiguous-memory-based and a node-based containers.
Posted by Michał ‘mina86’ Nazarewicz on 16th of May 2021 | (cite)
‘I’m using slock as my screen locker,’ a wise man once said. He had a beard so surely he was wise.
‘Oh?’ his colleague raised a brow intrigued. ‘Did they fix the PAM bug?’ he prodded inquisitively. Nothing but a confused stare came in reply. ‘slock crashes on systems using PAM,’ he offered an explanation and to demonstrate, he approached a nearby machine and pressed the Return key.
Screens, blanked by a locker a few minutes prior, came back to life, unlocked without the need to enter the password.
Posted by Michał ‘mina86’ Nazarewicz on 9th of May 2021 | (cite)
I’ve written about L*a*b* so it’s only fair that I’ll also describe its twin sister: the L*u*v* colour space (a.k.a. CIELUV). The two share a lot in common. For example, they use the same luminance value, base their chromaticity on opponent process theory and each of them has a corresponding cylindrical LCh coordinate system. Yet, despite those similarities — or perhaps because of them — the CIELUV colour space is often overlooked.
Even though L*a*b* is getting all the limelight, L*u*v* model has its advantages. Before we start comparing the two colour spaces, let’s first go through the conversion formulæ.
Posted by Michał ‘mina86’ Nazarewicz on 2nd of May 2021 | (cite)
Every now and again I need a specific name for operands or results of various arithmetic operations. It usually takes me embarrassingly long time to look that information up. To save time in the future, here’s the list:
† Occasionally used to mean any operand of addition. ‡ Occasionally used to mean the fraction itself rather than only the integer part.
Posted by Michał ‘mina86’ Nazarewicz on 18th of April 2021 | (cite)
In part one we’ve looked at the ARG_MAX parameter on Linux-based systems. We’ve established experimentally how it limits arguments passed to programs and what influences the value. This time, we’ll look directly at the source to verify our findings and see how ARG_MAX looks from the point of view of system libraries and kernel itself.
Posted by Michał ‘mina86’ Nazarewicz on 11th of April 2021 | (cite)
Anyone who uses a screen locker surely can recall a situation where they approached their computer and started typing their password to unlock it even though it was never locked. Even if the machine is configured to lock automatically after a period of inactivity, there may be situations when power saving blanks the monitor even before the automatic locking happens.
If one’s lucky, they realise their mistake in time before hitting Return in a chat window. It’s not uncommon however that one ends with the password blasted into ether over IRC or Google Docs; lazy people might ignore the secret getting saved in their shell history file but even that should facilitate, often annoying, password change.
What if I told you there’s a way to avoid those problems? A one simple trick which will eliminated at least some forms of possible leaks. Simply prefix all your passwords with /! (slash followed by an exclamation mark).
Posted by Michał ‘mina86’ Nazarewicz on 1st of April 2021 | (cite)
Some people find it surprising that zero is an even number. Turns out it’s such a controversial point that Wikipedia’s article on the subject has nearly 5000 words and 75 citations. That’s ten times as long as an entry on toast sandwich which is clearly a more important topic. On the other hand perhaps the confusion is to be expected considering that for centuries zero has been judged an odd digit (if a digit at all). Regardless, zero being even is not news and this is not what this post is about.
Rather, I wish to share another bit of knowledge. As it turns out, infinity is even. Who would have thought! As the working group for the C programming language (SC22 WC14) explains in C99 rationale, ‘all large positive floating-point values are even integers’.
There you go. Next time your Maths teacher asks what’s -42 to infinite power go ahead and exclaim with conviction that it’s plus infinity! The teacher will fail you of course (and rightfully so) unless you’re so lucky that they turn out to secretly be an expert on IEEE 754 standard.
Posted by Michał ‘mina86’ Nazarewicz on 28th of March 2021 | (cite)
No, your eyes are not deceiving you. This website has gone through a redesign and in the process gained a dark mode. Thanks to media queries, the darkness should commence automatically according to reader’s system preferences (as reported by the browsers). You can also customise this website in settings panel in top right (or bottom right).
What are media queries? And how to use them to adjust website’s appearance based on user preferences? I’m glad you’ve asked, because I’m about to describe the CSS and JavaScript magic that enables this feature.
Media queries overview
body { font-family: sans-serif; }
@media print {
body { font-family: serif; }
}
Media queries grew from the @media rule present since the inception of CSS. At first it provided a way to use different styles depending on a device used to view the page. Most commonly used media types where screen and print as seen in the example on the right. Over time the concept evolved into general media queries which allow checking other aspects of the user agent such as display size or browser settings. A simple stylesheet respecting reader’s preferences might be as simple as:
body {
/* Black-on-white by default */
background: #fff;
color: #000;
}
@media (prefers-color-scheme: dark) {
/* White-on-black if user prefers dark colour scheme */
body {
background: #000;
color: #fff;
}
}
That’s enough to get us started but not all browsers support that feature or provide a way for the user to specify desired mode. For example, without a desktop environment Chrome will report light theme preference and Firefox users need to go deep into the bowels of about:config to change ui.systemUsesDarkTheme flag if they are fond of darkness. To accommodate such situations, it’s desirable to provide a JavaScript toggle which defaults to option specified in system settings.
Fortunately, media can be queried through JavaScript and herein I’ll describe how it’s done and how to marry theme switching with browser preferences detection. TL;DR version is to grab a demonstration HTML file which includes a fully working CSS and JavaScript code that can be used to switch themes on a website.
Posted by Michał ‘mina86’ Nazarewicz on 21st of March 2021 | (cite)
After writing about conversion between sRGB and XYZ colour spaces I’ve been asked about a related process: moving between sRGB and CIELAB (perhaps better known as L*a*b*). As this may be of interest to others, I’ve decided to go ahead and make an article out of it. I’ll also touch on CIELChab which is a closely related colour representation.
The L*a*b* colour space was intended to be perceptually uniform. While it’s not truly uniform it’s nonetheless useful and widely used in the industry. For example, it’s the basis of the ΔE*00 colour difference metric. LChab aim to make L*a*b* easier to interpret by replacing a* and b* axes with more intuitive chroma and hue parameters.
Importantly, the conversion between sRGB and L*a*b* goes through XYZ colour space. As such, the full process has multiple steps with a round trip conversion being: sRGB→XYZ→L*a*b*→XYZ→sRGB. Because of that structure I will describe each of the steps separately.
Posted by Michał ‘mina86’ Nazarewicz on 14th of March 2021 | (cite)
arg max is a set of values from function’s domain at which said function reaches its maxima. That’s certainly an arg max but not the one we’re after. No, this article is regarding the ARG_MAX that limits the length of arguments to an executable. Or in other words, why you are getting:
Posted by Michał ‘mina86’ Nazarewicz on 7th of March 2021 | (cite)
This website being my personal project allows me to experiment and do things I’d never do in professional settings. Most notably, I’m rather fond of trying everything I can to reduce the size of the page. This goes beyond mere minification and eventually lead me to wonder if all those characters I’ve been escaping in HTML code require such treatment.
Libraries offering HTML support will typically provide a function to indiscriminately replace all ampersands, quote characters, less-than and greater-then signs with their corresponding HTML-safe representation. This allows the result to be used in any context in the document and is a good choice for user-input validation. It’s a different matter when it comes to squeezing every last byte. Herein I will explore which characters and under what conditions need to be escaped in an HTML document.
Posted by Michał ‘mina86’ Nazarewicz on 28th of February 2021 | (cite)
Quick! What does re.search('foo|foobar', 'foobarbaz').group() produce? Or for those not fluent in Python, how about /foo|foobar/.exec('foobarbaz')? Or to put it into words, what part of string foobarbaz will a foo|foobar regular expression match?
Perhaps it’s just me, but I expected the result to be foobar. That is, for the regular expression to match the longest leftmost substring. Alas, that’s not what is happening. Instead, Python’s and JavaScript’s regex engine will only match foo prefix.
Knowing that, what does re.search('foobar|foo', 'foobarbaz').group() produce (notice the subexpressions in the alternation are swapped). This can be reasoned in two ways: either order of branches in the alternation doesn’t matter — in which case the result should be the same as before, i.e. foo — or it does matter — and now the result will be foobar.
A computer scientist might lean towards the first option but a software engineer will know it’s the second. Here’s a live demonstration of current browser’s handling of those cases:
Posted by Michał ‘mina86’ Nazarewicz on 21st of February 2021 | (cite)
One feature Emacs doesn’t have out of the box is reading data from standard input. Trying to open - (e.g. echo stdin | emacs -) results in Emacs complaining about unknown option (if it ends up starting in graphical mode) or that ‘standard input is not a tty’ (when starting in terminal).
With sufficiently advanced shell one potential solution is the --insert flag paired with command substitution: echo stdin | emacs --insert <(cat). Sadly, it’s not a panacea. It messes up initial buffer (and thus may break setups with custom initial-buffer-choice) and doesn’t address the issue of standard input not being a tty when running Emacs in terminal.
For me the biggest problem though is that it isn’t available when using emacsclient. Fortunately, as previously mentioned the Emacs Server protocol allows for far more than just instructions to open a file. Indeed, my solution to the problem revolves around the use of --eval option:
#!/usr/bin/perl
use strict;
use warnings;
my @args = @ARGV;
if (!@args) {
my $data;
$data = join '', <STDIN>;
$data =~ s/\\/\\\\/g;
$data =~ s/"/\\"/g;
$data = <<ELISP;
(let ((buf (generate-new-buffer "*stdin*")))
(switch-to-buffer buf)
(insert "$data")
(goto-char (point-min))
(x-focus-frame nil)
(buffer-name buf))
ELISP
@args = ('-e', $data);
}
exec 'emacsclient', @args;
die "emacsclient: $!\n";
People allergic to Perl may find this Python version more palatable:
Posted by Michał ‘mina86’ Nazarewicz on 14th of February 2021 | (cite)
Previous article described how to use emacsclient inside of an SSH session. While the solution mentioned there relied on TRAMP, I’ve confessed that it isn’t what I’m actually using. From my experience, TRAMP doesn’t cache as much information as it could and as a result some operations are needlessly slow. For example, delay of find-file prompt completion is noticeable when working over connections with latency in the range of tens of milliseconds or more. Because for a long while I’d been working on a workstation ‘in the cloud’ in a data centre in another country, I’ve built my setup based on SSHFS instead.
It is important to note that TRAMP has myriad of features which won’t be available with this alternative approach. Most notably, it transparently routes shell commands executed from Emacs through SSH which often results in much faster execution than trying to do the same thing over SSHFS. grep command in particular will avoid copying entire files over network when done through TRAMP.
Depending on one’s workflow, either TRAMP-based or SSHFS-based solution may be preferred. If you are happy with TRAMP’s performance or rely on some of its feature, there’s no reason to switch. Otherwise, you might want to try an alternative approach described below.
Posted by Michał ‘mina86’ Nazarewicz on 7th of February 2021 | (cite)
While working on ansi_colours crate I’ve learned about colour spaces more than I’ve ever thought I would. One of the things were intricacies of greyscale. Or rather not greyscale itself but conversion from sRGB. ‘How hard can it be?’ one might ask following it up with a helpful suggestion to, ‘just sum all components and divide by three!’
Taking an arithmetic mean of red, green and blue coordinates is an often mentioned method. Inaccuracy of the method is usually acknowledged and justified by its simplicity and speed. That’s a fair trade-off except that equally simple and fast algorithms which are noticeably more accurate exist. One such method is built on an observation that green contributes the most to the perceived brightens of a colour. The formula is (r + 2g + b) / 4 and it increases accuracy (by taking green channel twice) as well as speed (by changing division operation into a bit shift). But that’s not all. Even better formulæ exist.
TL;DR
fn grey_from_rgb_avg_32bit(r: u8, g: u8, b: u8) -> u8 {
let y = 3567664 * r as u32 + 11998547 * g as u32 + 1211005 * b as u32;
((y + (1 << 23)) >> 24) as u8
}
The above implements the best algorithm for converting sRGB into greyscale if speed and simplicity is main concern. It does not involve gamma thus forgoes most complicated and time-consuming arithmetic. It’s much more precise and as fast as arithmetic mean.
Posted by Michał ‘mina86’ Nazarewicz on 31st of January 2021 | (cite)
I often develop software on remote machines; logged in via SSH to a workstation where all source code reside. In those situations, I like to have things work the same way regardless of which host I’m on. Since more often than not I open files from shell rather than from within my editor, this in particular means having the same command opening files in Emacs available on all computers. emacsclient filename works locally but gets a bit tricky over SSH.
Running Emacs in a terminal is of course possible, but graphical interface provides minor benefits which I like to keep. X forwarding is another option but gets sluggish over high-latency connections. And besides, having multiple Emacs instance running (one local and one remote) is not the way.
Fortunately, by utilising SSH remote forwarding, Emacs can be configured to edit remote files and accept server commands from within an SSH session. Herein I will describe how to accomplish that.