Techblog — mina86.comhttp://mina86.com/atom/cat/techblog/content/html/2023-04-01T13:13:13ZMichał ‘mina86’ Nazarewiczhttps://mina86.comRAII: Tragedy in three actshttp://mina86.com/2023/04/01/raii-c-cpp-and-rust2023-04-01T13:13:13Z2023-04-01T13:13:13ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>In <a href="//www.youtube.com/watch?v=pTMvh6VzDls">a recent Computerphile video</a>, Ian Knight talked about the RAII idiom and its application in C++ and Rust. While the video described the general concepts, I felt different examples could more clearly convey the essence of the topic.<p>I’ve decided to give my own explanation to hopefully better illustrate what RAII is and how it relates to Rust’s ownership. Then again, for whatever reason I’ve decided to write it as a play with dialogue in faux Old English so it may well be even more confusing instead.<style>.play h3{font-size:1.25em;margin:0}.play td{vertical-align:top}.play td:not(:last-child){font-size:0.875em;font-variant:small-caps;text-align:right;text-transform:uppercase;}.play td:not(:last-child) abbr{font-variant:small-caps;text-transform:uppercase;}.play td:last-child{text-align:justify;}.dir{font-style:oblique}</style><table class=play><tbody><tr><td><td><h2>Cast of characters</h2><tr><td><td class=sm>(In the order of appearance)<tr><td>Gregory<td>A software engineer and Puteal’s employee number #1<tr><td>Sampson<td>A software engineer and a self-proclaimed 10× developer<tr><td>Paris<td>An apprentice returning to Puteal two summers in a row<tr><td>CTO<td>Puteal’s Chief Technical Officer spending most of his time in meetings<tr><td>Admin<td>Administrative assistant working in Puteal Corporation’s headquarters in Novear<tbody><tr><td><td><h2>Act I</h2><tbody><tr><td><h3>Scene I</h3><td>Novear. 
A public place.<tr><td><td class=dir>Enter Sampson and Gregory, two senior engineers of the Puteal Corporation, carrying laptops and phones<tr><td>Gregory<td>Pray tell, what doth the function’s purpose?<tr><td>Sampson<td>It doth readeth a number from a file. A task as trivial as can be and yet QA reports memory leak after my change. Hence, I come to thee for help.<tr><td><td class=dir>Both look at a laptop showing code Sampson has written [error handling omitted for brevity from all source code listings]:<tr><td colspan=2><pre>double read_double(FILE *fd) {
char *buffer = malloc(1024); <i>/* allocate temporary buffer */</i>
fgets(buffer, 1024, fd); <i>/* read first line of the file */</i>
return atof(buffer); <i>/* parse and return the number */</i>
}</pre><tr><td>Gregory<td>Thine mistake is apparent. Thou didst allocate memory but ne’er freed it. Verily, in C thou needs’t to explicitly free any memory thou dost allocate. Submit this fix and thy code shall surely pass.<tr><td colspan=2><pre>double read_double(FILE *fd) {
char *buffer = malloc(1024); <i>/* allocate temporary buffer */</i>
fgets(buffer, 1024, fd); <i>/* read first line of the file */</i>
double result = atof(buffer); <i>/* parse the line */</i>
<ins>free(buffer); <i>/* free the temporary buffer */</i></ins>
return result; <i>/* return parsed number */</i>
}</pre><tbody><tr><td><h3>Scene II</h3><td>A hall.<tr><td><td class=dir>Enter Puteal CTO, an apprentice called Paris and an Admin<tr><td>Paris<td>I’ve done as Sampson beseeched of me. I’ve taken the <code>read_double</code> function and changed it so that it doth taketh file path as an argument. He hath warned me about managing memory and so I’ve made sure all temporary buffers are freed. Nonetheless, tests fail.<tr><td colspan=2><pre>double read_double(const char *path) {
FILE *fd = fopen(path, "r"); <i>/* open file */</i>
<small>char *buffer = malloc(1024);</small>
<small>fgets(buffer, 1024, fd);</small>
<small>double result = atof(buffer);</small>
<small>free(buffer);</small>
<small>return result;</small>
}</pre><tr><td>CTO<td>Thou didst well managing memory, but memory isn’t the only resource that needs to be freed. Just like allocations, if thou dost open a file, thou must close it anon once thou art done with it.<tr><td><td class=dir>Exit CTO and Admin towards sounds of a starting meeting<tr><td>Paris<td>Managing resources is no easy task but I think I’m starting to get the hang of it.<tr><td colspan=2><pre>double read_double(const char *path) {
FILE *fd = fopen(path, "r"); <i>/* open file */</i>
<small>char *buffer = malloc(1024);</small>
<small>fgets(buffer, 1024, fd);</small>
<ins>fclose(fd); <i>/* close the file */</i></ins>
<small>double result = atof(buffer);</small>
<small>free(buffer);</small>
<small>return result;</small>
}</pre><tbody><tr><td><h3 na>Scene III</h3><td>Novear. A room in Puteal’s office.<tr><td><td class=dir>Enter Paris and Sampson they set them down on two low stools, and debug<tr><td>Paris<td>The end of my apprenticeship is upon me and yet my code barely doth work. It canst update the sum once but as soon as I try doing it for the second time, nothing happens.<tr><td colspan=2><pre>double update_sum_from_file(mtx_t *lock,
double *sum,
const char *path) {
double value = read_double(path); <i>/* read next number from file */</i>
mtx_lock(lock); <i>/* reserve access to `sum` */</i>
value += *sum; <i>/* calculate sum */</i>
*sum = value; <i>/* update the sum */</i>
return value; <i>/* return new sum */</i>
}</pre><tr><td>Sampson<td>Thou hast learned well that resources need to be acquired and released. But what thou art missing is that not only system memory or a file descriptor are resources.<tr><td>Paris<td>So just like memory needs to be freed, files need to be closed and locks needs to be unlocked!<tr><td colspan=2><pre>double update_sum_from_file(mtx_t *lock,
double *sum,
const char *path) {
double value = read_double(path); <i>/* read next number from file */</i>
mtx_lock(lock); <i>/* reserve access to `sum` */</i>
value += *sum; <i>/* calculate sum */</i>
*sum = value; <i>/* update the sum */</i>
<ins>mtx_unlock(lock); <i>/* release `sum` */</i></ins>
return value; <i>/* return new sum */</i>
}</pre><tr><td rowspan=2>Paris<td>I’m gladdened I partook the apprenticeship. Verily, I’ve learned that resources need to be freed once they art no longer used. But also that many things can be modelled like a resource.<tr><td>I don’t comprehend why it all needs to be done manually?<tr><td><td class=dir>Exit Sampson while Paris monologues leaving him puzzled<tbody><tr><td><td><h2>Act II</h2><tbody><tr><td><h3>Scene I</h3><td>Court of Puteal headquarters.<tr><td><td class=dir>Enter Sampson and Paris bearing a laptop before him<tr><td>Paris<td>Mine last year’s apprenticeship project looks naught like mine own handiwork.<tr><td>Sampson<td>Thou seest, in the year we migrated our code base to C++.<tr><td>Paris<td>Aye, I understandeth. But I spent so much time learning about managing resources and yet the new code doth not close its file.<tr><td><td class=dir>Enter Gregory and an Admin with a laptop. They all look at code on Paris’ computer:<tr><td colspan=2><pre>double read_double(const char *path) {
std::fstream file{path}; <i>/* open file */</i>
double result; <i>/* declare variable to hold result */</i>
file >> result; <i>/* read the number */</i>
return result; <i>/* return the result */</i>
}</pre><tr><td>Sampson<td>Oh, that’s RAII. <dfn>Resource Acquisition Is Initialisation</dfn> idiom. C++ useth it commonly.<tr><td rowspan=2>Gregory<td>Resource is acquired when object is initialised and released when it’s destroyed. The compiler tracks lifetimes of local variables and thusly handles resources for us.<tr><td>By this method, all manner of resources can be managed. And forsooth, for more abstract concepts without a concrete object representing them, such as the concept of exclusive access to a variable, a guard class can be fashioned. Gaze upon this other function:<tr><td colspan=2><pre>double update_sum_from_file(std::mutex &mutex,
double *sum,
const char *path) {
double value = read_double(path); <i>/* read next number from file */</i>
std::lock_guard<std::mutex> lock{mutex}; <i>/* reserve access to `sum` */</i>
value += *sum; <i>/* calculate sum */</i>
*sum = value; <i>/* update the sum */</i>
return value; <i>/* return new sum */</i>
}</pre><tr><td>Paris<td>I perceive it well. When the <code>lock</code> goes out of scope, the compiler shall run its destructor, which shall release the mutex. Such was my inquiry yesteryear. Thus, compilers can render managing resources more automatic.<tbody><tr><td><h3>Scene II</h3><td>Novear. Sampson’s office.<tr><td><td class=dir>Enter Gregory and Sampson<tr><td>Sampson<td>Verily, this bug doth drive me mad! To make use of the RAII idiom, I’ve writ an <code>nptr</code> template to automatically manage memory.<tr><td colspan=2><pre>template<class T>
struct nptr {
nptr(T *ptr) : ptr(ptr) {} <i>/* take ownership of the memory */</i>
~nptr() { delete ptr; } <i>/* free memory when destructed */</i>
T *operator->() { return ptr; }
T &operator*() { return *ptr; }
private:
T *ptr;
};</pre><tr><td>Gregory<td>I perceive… And what of the code that bears the bug?<tr><td>Sampson<td>'Tis naught but a simple code which calculates sum of numbers in a file:<tr><td colspan=2><pre>std::optional<double> try_read_double(nptr<std::istream> file) {
double result;
return *file >> result ? std::optional{result} : std::nullopt;
}
double sum_doubles(const char *path) {
nptr<std::istream> file{new std::fstream{path}};
std::optional<double> number;
double result = 0.0;
while ((number = try_read_double(file))) {
result += *number;
}
return result;
}</pre><tr><td><td class=dir>Enter Paris with an inquiry for Sampson; seeing them talk he pauses and listens in<tr><td rowspan=2>Gregory<td>The bug lies in improper ownership tracking. When ye call the <code>try_read_double</code> function, a copy of thy <code>nptr</code> is made pointing to the file stream. When that function doth finish, it frees that very stream, for it believes that it doth own it. Alas, then you try to use it again in next loop iteration.<tr><td>Why hast thou not made use of <code>std::unique_ptr</code>?<tr><td>Sampson<td>Ah! I prefer my own class, good sir.<tr><td>Gregory<td>Thine predicament would have been easier to discern if thou hadst used standard classes. In truth, if thou wert to switch to the usage of <code>std::unique_ptr</code>, the compiler would verily find the issue and correctly refuse to compile the code.<tr><td colspan=2><pre>std::optional<double> try_read_double(<ins>std::unique_ptr<std::istream></ins> file) {
double result;
return *file >> result ? std::optional{result} : std::nullopt;
}
double sum_doubles(const char *path) {
<ins>auto file = std::make_unique<std::fstream>(path);</ins>
std::optional<double> number;
double result = 0.0;
while ((number = try_read_double(file))) { <i>/* compile error */</i>
result += *number;
}
return result;
}</pre><tr><td><td class=dir>Exit Gregory, exit Paris moment later<tbody><tr><td><h3 na>Scene III</h3><td>Before Sampson’s office.<tr><td><td class=dir>Enter Gregory and Paris, meeting<tr><td>Paris<td>I’m yet again vexed. I had imagined that with RAII, the compiler would handle all resource management for us?<tr><td>Gregory<td>Verily, for RAII to function, each resource must be owned by a solitary object. If the ownership may be duplicated then problems shall arise. Ownership may only be moved.<tr><td>Paris<td>Couldn’t compiler enforce that just like it can automatically manage resources?<tr><td>Gregory<td>Mayhap the compiler can enforce it, but it’s not a trivial matter. Alas, if thou art willing to spend time to model ownership in a way that the compiler understands, it can prevent some of the issues. However, thou wilt still require an escape hatch, for in the general case, the compiler cannot prove the correctness of the code.<tr><td><td class=dir>Exit Gregory and Paris, still talking<tbody><tr><td><td><h2 na>Act III</h2><tbody><tr><td><h3>Scene I</h3><td>A field near Novear.<tr><td><td class=dir>Enter Gregory and Paris<tr><td>Gregory<td>Greetings, good fellow! How hast thou been since thy apprenticeship?<tr><td>Paris<td>I’ve done as thou hast instructed and looked into Rust. It is as thou hast said. I’ve recreated Sampson’s code and the compiler wouldn’t let me run it:<tr><td colspan=2><pre>fn try_read_double(rd: Box<dyn std::io::Read>) -> Option<f64> {
todo!()
}
fn sum_doubles(path: &std::path::Path) -> f64 {
let file = std::fs::File::open(path).unwrap();
let file: Box<dyn std::io::Read> = Box::new(file);
let mut result = 0.0;
while let Some(number) = try_read_double(file) {
result += number;
}
result
}</pre><tr><td>Gregory<td>Verily, the compiler hath the vision to behold the migration of file’s ownership into the realm of <code>try_read_double</code> function during the first iteration and lo, it is not obtainable any longer by <code>sum_doubles</code>.<tr><td colspan=2><pre>error[E0382]: use of moved value: `file`
let file: Box<dyn std::io::Read> = Box::new(file);
---- move occurs because `file` has type `Box<dyn std::io::Read>`,
which does not implement the `Copy` trait
let mut result = 0.0;
while let Some(number) = try_read_double(file) {
^^^^ value moved here, in previous
iteration of loop</pre><tr><td>Paris<td>Alas, I see not what thou hast forewarned me of. The syntax present doth not exceed that which wouldst be used had this been writ in C++:<tr><td colspan=2><pre>fn try_read_double(rd: &dyn std::io::Read) -> Option<f64> {
todo!()
}
fn sum_doubles(path: &std::path::Path) -> f64 {
let file = std::fs::File::open(path).unwrap();
let file: Box<dyn std::io::Read> = Box::new(file);
let mut result = 0.0;
while let Some(number) = try_read_double(&*file) {
result += number;
}
result
}</pre><tr><td>Gregory<td>Verily, the Rust compiler is of great wit and often elides lifetimes. Nonetheless, other cases may prove more intricate.<tr><td colspan=2><pre>struct Folder<T, F>(T, F);
impl<T, F: for <'a, 'b> Fn(&'a mut T, &'b T)> Folder<T, F> {
fn push(&mut self, element: &T) {
(self.1)(&mut self.0, element)
}
}</pre><tr><td>Paris<td>Surely though, albeit this code is more wordy, it is advantageous if I cannot commit an error in ownership.<tr><td>Gregory<td>Verily, there be manifold factors in the selection of a programming tongue. And there may be aspects which may render other choices not imprudent.</table><h2>Aforeword</h2><p>A thing to keep in mind is that the examples are somewhat contrived. For example, the buffer and file object present in <code>read_double</code> function can easily live on the stack. A real-life code wouldn’t bother allocating them on the heap. Then again, I could see a beginner make a mistake of trying to bypass <code>std::unique_ptr</code> not having a copy constructor by creating objects on heap and passing pointers around.<p>In the end, is this better explanation than the one in aforementioned Computerphile video? I’d argue code examples represent discussed concepts better though to be honest form of presentation hinders clarity of explanation. Yet, I had too much fun messing around with this post so here it is in this form.<p>Lastly, I don’t know Old English so the dialogue is probably incorrect. I’m happy to accept corrections but otherwise I don’t care that much. One shouldn’t take this post too seriously.Monospace considered harmfulhttp://mina86.com/2023/03/19/monospace-considered-harmful2023-03-19T16:34:53Z2023-03-19T16:34:53ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>No, I haven’t gone completely mad yet and still, I write this as an appeal to stop using monospaced fonts for code <small>(conditions may apply)</small>. While fixed-width fonts have undeniable benefits when authoring software, their use is excessive and even detrimental in certain contexts. 
Specifically, when displaying inline code within a paragraph of text, proportional fonts are a better choice.<h2>The downsides</h2><figure class=fr><svg xmlns=http://www.w3.org/2000/svg version=1.1 width=17.5em height=10em viewbox="0 0 280 160" stroke-width=1><g fill=var(--i) stroke=var(--e)><circle cx=16 cy=12 r=4></circle><circle cx=32 cy=30 r=4></circle><circle cx=80 cy=48 r=4></circle><circle cx=92 cy=66 r=4></circle><circle cx=96 cy=84 r=4></circle><circle cx=132 cy=102 r=4></circle><circle cx=160 cy=120 r=4></circle></g><path stroke=var(--e) d=M20,0v135M140,0v135M260,0v135 /><text font-size=0.75em text-anchor=middle fill=var(--e)><tspan x=20 y=144>4′30″</tspan><tspan x=140 y=144>5′</tspan><tspan x=260 y=144>5′30″</tspan></text><text font-size=0.75em><tspan x=26 y=16>Tahoma</tspan><tspan x=42 y=34>Times New Roman</tspan><tspan x=90 y=52>Verdana</tspan><tspan x=102 y=70>Arial</tspan><tspan x=106 y=88>Comic Sans</tspan><tspan x=142 y=106>Georgia</tspan><tspan x=170 y=124>Courier New</tspan></text></svg><figcaption>Fig. 1. Comparison of time needed to read text set with different fonts. Reading fixed-width Courier New is 13% slower than reading Tahoma.</figcaption></figure><p>Fixed-width fonts for inline code have a handful of downsides. Firstly, text set in such font takes up more space and, depending on the font pairing, individual letters may appear larger. This creates unbalanced look and opportunities for awkward line wrapping.<p>Moreover, a fixed-width typeface <a href=//blog.codinghorror.com/comparing-font-legibility/ >has been shown to be slower to read</a>. Even disregarding the speed differences, switching between two drastically different types of font isn’t comfortable.<p>To make matters worse, many websites apply too many styles to inline code fragments. For example GitHub and GitLab (i) change the font, (ii) decrease its size, (iii) add background and (iv) add padding. 
This overemphasis detracts from the content rather than enhancing it.<h2>A better way</h2><p id=b1>A better approach is using serif (sans-serif) font for the main text and a sans-serif (serif) font for inline code<a href=#f1>†</a>. Or if serifs aren’t one’s cup of tea, even within the same font group a pairing allowing for clear differentiation between the main text and the code is possible. For example a humanist font paired with a complementary geometric font.<p id=b2>Another option is to format code with a different colour. To avoid using <a href=//www.w3.org/WAI/WCAG21/Understanding/use-of-color.html>it as the only means of conveying information</a>, a subtle colour change may be used in conjunction with font change. This is the approach I’ve taken on this blog<a href=#f2>‡</a>.<p>It’s also worth considering whether inline code even needs any kind of style change. For example, the sentence ‘Execution of a C program starts from the main function’ is perfectly understandable whether or not ‘main’ is styled differently.<h2>Epilogue</h2><p>What about code blocks? Using proportional typefaces for them can be done with some care. Indentation isn’t a major problem but some alignment may need adjustments. Depending on the type of code listings, it may be an option. Having said that, I don’t claim this as the only correct option for web publishing.<p>As an aside, what’s the deal with parentheses after a function name? To demonstrate, let’s reuse an earlier example: ‘Execution of a C program starts from the <code>main()</code> function’. The brackets aren’t part of the function name and unless they are used to disambiguate between multiple overloaded functions, there’s no need for them.<p>To conclude, while fixed-width fonts have their place when writing code, their use in displaying inline code is often unnecessary. Using a complementary pairing of proportional typefaces is a better option that can enhance readability. 
Changing background of inline code is virtually never a good idea.<p id=f1><a href=#b1>†</a> Using serif faces on websites used to carry risk of aliasing reducing legibility. Thankfully, the rise of high DPI displays largely alleviated those concerns.<p id=f2><a href=#b2>‡</a> Combining colour change and typeface change breaks principle of using small style changes. Nonetheless, I believe some leniency for websites is in order. It’s not always guaranteed that readers will see fonts author has chosen making colour change kind of a backup. Furthermore, compared to books, change in colour isn’t as focus-grabbing on the Internet.URLs with <code>//</code> at the beginninghttp://mina86.com/2022/12/04/urls-support-slash-slash2022-12-04T02:09:04Z2022-12-04T02:09:04ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>A quick reminder that relative URLs can start with a double slash and that this means something different than a single slash at the beginning. Specifically, such relative addresses are resolved by taking the scheme (and only the scheme) of the website they are on.<p>For example, the code for the link to my repositories in the site’s header is <code><a href="//github.com/mina86">Code</a></code>. Since this page uses the <code>https</code> scheme, browsers will navigate to <code>https://github.com/mina86</code> if the link is activated.<p>This little trick can save you some typing, but more importantly, if you’re developing URL parsing code or a crawler, make sure that it handles this case correctly. 
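<p>For anyone writing such a parser, the double-slash case can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the function name <code>resolve_scheme_relative</code> is hypothetical, it handles nothing but references starting with <code>//</code>, and it assumes the base URL starts with its scheme (the part before the first colon). Real code should follow the complete resolution rules of RFC 3986 or rely on an established URL library.

```rust
/// Resolves a scheme-relative reference (one starting with `//`) against
/// a base URL by borrowing the base's scheme and nothing else.
///
/// Hypothetical helper for illustration; returns `None` for any other
/// kind of reference or when the base has no scheme.
fn resolve_scheme_relative(base: &str, reference: &str) -> Option<String> {
    if !reference.starts_with("//") {
        return None; // absolute or path-relative: not handled in this sketch
    }
    // The scheme is everything before the first colon of the base URL.
    let (scheme, _rest) = base.split_once(':')?;
    Some(format!("{scheme}:{reference}"))
}

fn main() {
    let base = "https://mina86.com/2022/12/04/urls-support-slash-slash";
    let resolved = resolve_scheme_relative(base, "//github.com/mina86");
    println!("{resolved:?}");
}
```

<p>Note that a reference with a single leading slash, such as <code>/atom</code>, is deliberately rejected by this sketch, since it keeps the host of the base as well as the scheme and needs different handling.<p>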
It may seem like a small detail, but it can have a lasting impact on the functionality of your code.Secret command Google doesn’t want you to knowhttp://mina86.com/2022/11/20/how-to-change-google-language2022-11-20T17:55:45Z2022-11-20T17:55:45ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p style=text-align:center>Or how to change language of Google website.<p>If you’ve travelled abroad, you may have noticed that Google <em>tries</em> to be helpful and uses the language of the region you’re in on its websites. It doesn’t matter if your operating system is set to Spanish, for example; Google Search will still use Portuguese if you happen to be in Brazil.<p>Fortunately, there’s a simple way to force Google to use a specific language. All you need to do is append <code>?hl=<var>lang</var></code> to the website’s address, replacing <var>lang</var> with <a href=//en.wikipedia.org/wiki/List_of_ISO_639-1_codes>a two-letter code</a> for the desired language. For instance, <a href="//www.google.com/?hl=es"><code>?hl=es</code></a> for Spanish, <a href="//www.google.com/?hl=ht"><code>?hl=ht</code></a> for Haitian, or <a href="//www.google.com/?hl=uk"><code>?hl=uk</code></a> for Ukrainian.<p>If the URL already contains a question mark, you need to append <code>&hl=<var>lang</var></code> instead. Additionally, if it contains a hash symbol, you need to insert the string immediately before the hash symbol. For example:<ul><li><code>https://www.google.com/<strong>?hl=es</strong></code><li><code>https://www.google.com/search?q=bread+sandwich<strong>&hl=es</strong></code><li><code>https://analytics.google.com/analytics/web/<strong>?hl=es</strong>#/report-home/</code></ul><p>By the way, as a legacy of Facebook having hired many ex-Google employees, the parameter also works on some of the Facebook properties.<p>This trick doesn’t work on all Google properties. 
However, it seems to be effective on pages that try to guess your language preference without giving you the option to override it. For example, while Gmail ignores the parameter, you can change its display language in the settings (accessible via the gear icon near the top right of the page). Similarly, YouTube strips the parameter, but it respects preferences configured in the web browser.<p>Anyone familiar with HTTP may wonder why Google doesn’t simply look at <a href=//developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language>the <code>Accept-Language</code> header</a>. Problem is that many users have their browser configured with defaults that send English as the only option, even though they would prefer another language. In those cases, it’s more user-friendly to ignore that header. As it turns out, localisation is hard.Curious case of missing πhttp://mina86.com/2022/06/28/curious-case-of-missing-pi2022-06-28T03:18:53Z2022-06-28T03:18:53ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>π is one of those constants which pops up when least expected. At the same time it’s sometimes missing when most needed. For example, consider the following application calculating area of a disk (not to be confused with area of a circle which is zero):<pre>#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
for (int i = 1; i < argc; ++i) {
const double r = atof(argv[i]);
printf("%f\n", M_PI * r * r);
}
}</pre><p>It uses features introduced in the 1999 edition of the C standard (often referred to as C99) so it might be good to inform the compiler of that fact with a <code>-std=c99</code> flag. Unfortunately, doing so leads to an error:<pre>$ gcc -std=c99 -o area area.c
area.c: In function ‘main’:
area.c:8:18: error: ‘M_PI’ undeclared (first use in this function)
8 | printf("%f\n", M_PI * r * r);
| ^~~~</pre><p>What’s going on? Shouldn’t <code>math.h</code> provide the definition of <code>M_PI</code> symbol? It’s what <a href=//pubs.opengroup.org/onlinepubs/9699919799/basedefs/math.h.html>the specification</a> claims after all. ‘glibc is broken’ some may even proclaim. In this article I’ll explain why the compiler conspires with the standard library to behave this way and why it’s the only valid thing it can do.<h2>The problem</h2><p id=b1>First of all, it needs to be observed that the aforecited specification is <em>not</em> the C standard. Instead, it’s POSIX and it marks <code>M_PI</code> with an [XSI] tag. This means that <code>M_PI</code> ‘is part of the X/Open Systems Interfaces option’ and ‘is an extension to the ISO C standard.’ Indeed, <a href=http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>the C99 standard</a><a href=#f1>†</a> doesn’t define this constant.<p>Trying to support multiple standards, gcc and glibc behave differently depending on arguments. With <code>-std=c99</code> switch the compiler conforms to C standard; without the switch, it includes all the POSIX extensions.<p>A naïve approach would be to make life easier and unconditionally provide the constant. Alas, the <code>M_PI</code> identifier is neither defined nor reserved by the C standard. In other words, programmer can freely use it and a conforming compiler cannot complain. For example, the following is a well-formed C program and C compiler has no choice but to accept it:<pre>#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
const double M_PI = 22.0 / 7.0;
for (int i = 1; i < argc; ++i) {
const double r = atof(argv[i]);
printf("%f\n", M_PI * r * r);
}
}</pre><p>Should compiler always define <code>M_PI</code> in <code>math.h</code> the above code wouldn’t work.<h2>The solution</h2><p>The developer who needs π constant has a few ways to solve the problem. For maximum portability it has to be defined in the program itself. To make things work even when building on a Unix-like system, an <code>ifdef</code> guard can be used. For example:<pre>⋮
#ifndef M_PI
# define M_PI 3.141592653589793238462643383279502884
#endif
⋮</pre><p>Another solution is to limit compatibility to Unix-like systems. In this case, <code>M_PI</code> constant can be freely used and <code>-std</code> switch shouldn’t be passed when building the program.<h3>Feature test macros</h3><p>glibc provides one more approach. The C standard has a notion of reserved identifiers which cannot be freely used by programmers. They are used for future language development and to allow implementations to provide their own extensions to the language.<p>For example, when C99 added boolean type to the language, it did it by defining a <code>_Bool</code> type. The <code>bool</code>, <code>true</code> and <code>false</code> symbols became available only through a <code>stdbool.h</code> file. Such approach means that C89 code continues to work when built with C99 compiler even if it used <code>bool</code>, <code>true</code> or <code>false</code> identifiers in its own way.<p>Similarly, glibc introduced <a href=//lwn.net/Articles/590381>feature test macros</a>. They all start with an underscore followed by a capital letter. Identifiers in this form are reserved by the C standard thus using them in a program invokes undefined behaviour. Technically speaking, the following is not a well-formed C program:<pre>#define _XOPEN_SOURCE
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
for (int i = 1; i < argc; ++i) {
const double r = atof(argv[i]);
printf("%f\n", M_PI * r * r);
}
}</pre><p>However, glibc <a href=//gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html>documents the behaviour</a> making the program well-defined.<p>It’s worth noting that uClibc and musl libraries handle the cases in much the same way. Microsoft uses the same technique though different macros. To get access to <code>M_PI</code> in particular, a <code>_USE_MATH_DEFINES</code> symbol needs to be defined. Newlib will define symbols conflicting with C standard unless code is compiled in strict mode (e.g. with <code>-std=c99</code> flag). Lastly, Bionic and Diet Libc define the constant unconditionally which strictly speaking means that they don’t conform to the C standard.<p id=f1><a href=#b1>†</a> Yes, I’m aware this link is to a draft. The actual standard is <a href=//webstore.ansi.org/Standards/INCITS/INCITSISOIEC98991999R2005>60 USD from ANSI webstore</a>. Meanwhile, for most practical uses the draft is entirely sufficient. It’s certainly enough for the discussion in this article.Pro tip: You can put URLs in C & C++ codehttp://mina86.com/2022/04/01/urls-in-cpp2022-04-01T01:17:43Z2022-04-01T01:17:43ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>Documenting source code is important part of software engineering. Code is read more often than it’s written making it crucial to provide enough context for reader to understand what the implementation is doing. This can come in the form of links to external resources providing description of an algorithm, reference for an API or historic context justifying the code.<p>As it turns out, C and C++ languages offer a little-known feature which allows URLs to be included directly in the function source code. For example:<pre>static float rsqrt(float x) {
<a href=//en.wikipedia.org/wiki/Fast_inverse_square_root>https://en.wikipedia.org/wiki/Fast_inverse_square_root</a>
static_assert(std::numeric_limits<float>::is_iec559);
auto i = std::bit_cast<uint32_t>(x) >> 1;
auto y = std::bit_cast<float>(UINT32_C(0x5F375A86) - i);
y *= 1.5f - x * 0.5F * y * y;
return y;
}</pre>Primes ≤ 100 in Rusthttp://mina86.com/2021/06/20/prime-numbers-less-than-a-hundred-in-rust2021-06-20T00:49:43Z2021-06-20T00:49:43ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>In a past life <a href=/2010/prime-numbers-less-than-a-hundred/ >I’ve talked about</a> a challenge to write the shortest program which prints all prime numbers less than a hundred. Back then I’ve discussed a 60-character long solution written in C. Since Rust is the future, inspired by <a href=//www.reddit.com/r/rust/comments/o1i3d2/prime_number_generator_in_rust_664579_primes/ >a recent thread on Sieve of Eratosthenes</a> I’ve decided to carry the task for Rust as well.<p>To avoid spoiling the solution, I’m padding this article with a bit of unrelated content. To jump straight to the code, skip the next block of paragraphs. Otherwise, here’s a joke for ya:<blockquote><p>After realising he got lost, a man in a hot air balloon spotted a woman below. He descended and shouted, ‘Excuse me, can you help me? I’ve promised a friend I would meet him an hour ago, but I don’t know where I am.’<p>The woman below looked up and replied matter-of-factly, ‘You are in a hot air balloon hovering around ten metres above the ground. You are between 47 and 48 degrees north latitude and between 8 and 9 degrees east longitude.’<p>‘You must be an engineer,’ the balloonist concluded.<p>‘I am,’ the woman replied intrigued, ‘How did you know?’<p>‘Well, everything you told me is technically correct, but I have no idea how to use your information and I am still lost. Frankly, you’ve not been much help.’<p>The woman pondered for a while and responded, ‘You must be in management.’<p>‘I am,’ the man confirmed, ‘but how did you know?’<p>‘Well, you don’t know where you are or where you are going, you have risen to where you are thanks to hot air, you made a promise which you have no idea how to keep and you expect people beneath you to solve your problems. 
The fact is you are in exactly the same position you were in before we met, but now, somehow, it’s my fault!’</blockquote><h2>The solution</h2><p>Now back to the matter at hand. Let’s first go with a 67-character long solution I came up with. It is as follows:<pre>fn main(){for n in 2..99{if(2..n).all(|k|n%k!=0){println!("{n}")}}}</pre><p>For comparison, here’s the aforementioned C variant:<pre>main(i,j){for(;j=++i<99;j<i||printf("%d\n",i))for(;i%++j;);}</pre><p>Let’s break it down a little, taking this opportunity to talk about Rust.<p>A commonality between the two variants is the lack of type declarations. It’s important to note that, while in C this was due to a (since deprecated) rule that variables are implicitly integers, Rust performs type inference. In many situations in Rust there’s no need to declare types and the compiler will figure out the correct ones.<p>Rust doesn’t have a C-style <code>for</code> syntax and offers a range loop instead. <code>for n in 2..99 { <var>body</var> }</code> will execute <var>body</var> with the <code>n</code> variable ranging from 2 to 98 inclusive. Since 99 is not a prime, we don’t need to include it in the range. By the way, <code>2..99</code> is not part of the syntax for the loop; rather, it declares a range object. And yes, ranges are right-open (though there’s also syntax for closed intervals).<p><code>|<var>args</var>| <var>expr</var></code> is Rust’s syntax for lambdas (also known as anonymous functions). I’m not a fan of the pipe characters in there (I’d much rather have Haskell’s syntax instead) but it’s something one can get used to.<p>The <code>n % k != 0</code> expression demonstrates that Rust doesn’t <a href=/2021/explicit-isnt-better-than-implicit/ >implicitly</a> convert integers to booleans. In fact, the exclamation mark unary operator performs bitwise (not logical) negation when applied to integer types. That’s something tilde does in C. 
Tilde used to declare boxed types and values in Rust but is now unused.<p>Perhaps due to quirk of history, ranges in Rust are iterators (as opposed to merely implementing <a href=//doc.rust-lang.org/std/iter/trait.IntoIterator.html><code>IntoIterator</code> trait</a>) which means that methods such as <code>all</code> are available on them. <code>all</code>, of course, checks whether all elements satisfy the predicate given as an argument. This means that <code>(2..n).all(<var>predicate</var>)</code> will test the <var>predicate</var> for all integers from 2 to n-1 inclusively (again, ranges are right-open unless different operator is used).<p>And finally, <a href=//doc.rust-lang.org/std/macro.println.html><code>println!</code></a> is rather self-explanatory. Since <a href=//rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html><code>format_args_implicits</code> feature</a> is now stable, the <code>"{n}"</code> syntax can be used to save one character. This is something Python programmers should be familiar with though in Rust the <code>f</code> sigil is not necessary. Programmer needs to know from context when <code>"{n}"</code> means string interpolation and when it’s a plain string literal.<h2>66-character solution?</h2><p>There is a way to reach 66 characters. It’s much more boring and I’m not sure if it’s in the spirit of the challenge. The trick is to hard-code the list of primes as a byte buffer. It’s not pretty, but it works:<pre>fn main(){for n in b"\r%)+/5;=CGIOSYa"{println!("{n}")}}</pre><p>Note that your user agent may fail to display control characters present in the above listing. Copying and pasting should work though.<p>The <code>b</code> sigil in front of the string is necessary to declare a byte-array rather than a <code>str</code> object. 
The latter cannot be iterated over without invoking <code>bytes</code> or <code>chars</code> method which would make the solution too long.How do balanced audio cables workhttp://mina86.com/2021/06/13/balanced-audio2021-06-13T12:00:11Z2021-06-13T12:00:11ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>Have you ever wondered how balanced audio cables work? For the longest time I have until finally deciding to look into it. Turns out the principle is actually rather straightforward.<p>In a normal, unbalanced wire an analogue signal <var>S</var> is sent over a pair of wires: one carries the signal while the other a reference zero. Receiver interprets voltage between the two as the signal. The issue is that over the length of a cable noise is introduced. While transmitter sends <var>S</var>, receiver gets <var>S + e</var> (where <var>e</var> denotes the noise).<figure><svg width=45em height=15em viewbox="-10 -10 480 160" fill=none><defs><path id=sin stroke=var(--i) stroke-width=2 d="m0.375,7.3575822c0,0 1.45216,-6.99595999 2.82746,-6.98255999 1.3753,0.0134 2.83797,6.96253999 2.83797,6.96253999 0,0 1.5241,6.9844798 2.86424,6.9891998 1.34014,0.005 2.87126,-6.9891998 2.87126,-6.9891998"/><use href=#sin id=nis transform="scale(1, -1)"/><path id=noise stroke-width=0.4 d="M0.0202051,6.3820138 0.2720171,2.2287188 0.5242361,12.947371 0.7760481,8.6079708e-4 1.0282661,5.9265648l0.251814,0.742079 0.251812,3.9994562 0.252219,-6.0981682 0.251813,6.2265022 0.251812,-6.4269002 0.252219,6.1945212 0.251812,-9.3263952 0.252218,1.87281 0.251814,-1.806416 0.251812,1.590229 0.252219,1.2704 0.251812,-1.312909 0.252219,9.7409552 0.251813,-1.969162 0.251812,-7.5426522 0.252219,7.1677662 0.251813,-0.211733 0.252218,-5.9208462 0.251813,5.85769 0.251812,-4.293778 0.252219,8.2964722 0.251813,-13.1857742 0.251812,3.458179 0.252219,-3.638739 0.251813,7.513908 0.252217,-6.481151 0.251814,5.832185 0.251812,4.7893062 0.252219,-6.8127172 0.251813,1.592656 0.252217,-6.27063 0.251814,9.7490522 
0.251813,-4.9791782 0.252217,-0.442495 0.251814,-1.555816 0.2518129,7.3677582 0.252217,-4.1140262 0.251814,2.536349 0.252217,-6.510703 0.251813,2.758608"/></defs><path stroke=var(--e) stroke-width=1 d=M0,126V0H126V126ZM255,126V0h179V126Z stroke-dasharray="1 1"/><text font-style=italic x=6 y=6>Transmitter<tspan x=428 text-anchor=end>Receiver</tspan><tspan font-style=normal font-size=20 x=170 y=57 dy="0 2.5 -2.75 0 2.25" dx="0 1 -0.5 2 -0.75" rotate="5 -7 -13 5 -5">noise</tspan></text><g stroke=currentcolor><path d="m138.75,138h7.5M135,135h15m-18.75,-3h22.5M142.5,104.25V132M392.25001,52.5h12.75v13.500001m-21,-5.250001v11.250069c-126.21168,1.175002-209.78467,0.24377-335.999734,0V33c126.216284,2.5e-5 209.783454,6.9e-5 335.999734,6.9e-5V44.25M22.499997,58.5V52.500069H48.000276M32.999997,104.25H405.00001V90.000001M375.75001,52.5l8.25,-8.25 8.25,8.25-8.25,8.25z"/><g fill=var(--a)><path d=m95.249996,72.000069-26.999998,-15v30zm0,-39-26.999998,-15v30zm229.823004,39-27,-15v30zm0,-39-27,-15v30z /><circle cx=98 cy=72 r=6 /><circle cx=328 cy=72 r=6 /></g><path d=M273.27,22.35h5M273.27,62.85h5M275.77,19.85v5M275.77,60.5v5M351,22.35h5M353.5,19.85v5M351,62.85h5M384,49v7M380.5,52.5h7 /></g><path stroke=var(--f) stroke-width=4 stroke-linecap=round d="m22.499991,63.208315c3.5162,0 6.34694,2.83073 6.34694,6.34693v16.57465c0,3.5162-2.83074,6.34694-6.34694,6.34694-3.5162,0-6.34693,-2.83074-6.34693,-6.34694v-16.57465c0,-3.5162 2.83073,-6.34693 6.34693,-6.34693zm406.521429,-5.342357-14.33261,12.41469h-13.56474v15.97585h13.35921l14.53814,12.59375zM22.499991,97.317001V104.25m-5.40959,0 10.81917,-6e-5m6.367676,-18.697485c2e-6,6.492508-5.284824,11.757544-11.777325,11.764546-6.4925,0.007-11.763346,-5.246675-11.77735,-11.739166M437.93866,69.450978a16.294282,16.294282 0 0 1 0,17.88193m5.93278,-23.81471a25.068125,25.068125 0 0 1 0,29.7475m5.43143,-35.17893a32.421441,32.421441 0 0 1 0,40.61036"/><use href=#sin x=27 y=35 /> <use href=#sin x=108 y=15 /> <use href=#nis x=108 y=70 /> <use href=#sin 
x=259 y=15 /> <use href=#nis x=259 y=70 /> <use href=#sin x=337 y=15 /> <use href=#sin x=337 y=55 /> <use href=#sin x=337 y=55 /> <use href=#sin x=392 y=35 /><g stroke=var(--q) stroke-dasharray="1, 0.5, 0.75, 1.25"><path stroke-width=0.5 d="m140.40622,48.622739 1.16321,-26.54809 1.15941,68.51422 1.16319,-82.7548196 1.16321,37.8774396 1.15941,4.7434 1.16319,25.56474 1.16321,-38.97981 1.16318,39.80013 1.15941,-41.08109 1.16322,39.59571 1.16317,-59.61482 1.15942,11.97108 1.1632,-11.54669 1.1632,10.16483 1.1594,8.12046 1.16322,-8.39219 1.16318,62.26472 1.16321,-12.58698 1.15942,-48.21304 1.16317,45.81674 1.16322,-1.35342 1.15941,-37.84637 1.16318,37.44269 1.16321,-27.44607 1.15942,53.0315 1.16318,-84.28418 1.16321,22.10488 1.16318,-23.25904 1.15942,48.02931 1.16321,-41.42787 1.16317,37.27965 1.15943,30.6135 1.1632,-43.54726 1.16318,10.18036 1.15942,-40.08223 1.16321,62.31649 1.16318,-31.82719 1.15941,-2.82845 1.16323,-9.94485 1.16316,47.09511 1.16322,-26.29709 1.15942,16.21249 1.16318,-41.61677 1.1632,17.63316 1.15942,-29.83716 1.16319,29.27561 1.1632,-23.94736 1.15942,46.63966 1.16317,-16.23836 1.16323,14.64946 1.16316,-20.94037 1.15943,47.78606 1.16321,-37.81274 1.16318,-2.48686 1.15942,-39.99423 1.1632,41.74617 1.16319,23.13739 1.15941,-23.64977 1.16321,-39.07557 1.16318,20.74887 1.15942,50.45406 1.16318,-38.48036 1.16321,-7.76337 1.16321,20.98434 1.15942,-44.34946 1.16318,41.47962 1.16321,19.96476 1.15939,-48.86774 1.1632,48.85223 1.16321,-23.47123 1.15939,-20.80062 1.1632,42.72693 1.16322,-32.04971 1.16318,2.67575 1.15943,-16.23576 1.1632,-7.99625 1.16318,51.81265 1.15942,8.01696 1.16318,-51.36239 1.1632,-0.0881 1.15942,29.14364 1.16321,8.43619 1.16318,-24.43905 1.16321,-11.81584 1.15942,7.26909 1.16319,-28.4216496 1.16316,35.7683796 1.15946,-22.23167 1.16318,-7.63655"/><use href=#noise x=279.87 y=15.39 /> <use href=#noise x=279.87 y=55.89 /> <use href=#noise x=357.55 y=15.39 /> <use href=#noise x=357.55 y=55.89 /></g></svg><figcaption>Illustration of transmission 
of an analogue signal over a balanced cable. For brevity the diagram misuses symbols from digital signal processing and should not be taken as a technically correct representation.</figcaption></figure><p>A balanced cable addresses this problem by sending the information over three wires: hot (or positive), cold (or negative) and ground. The hot wire carries the signal <var>S</var> as before, the cold one carries the inverse of the signal <var>-S</var> and ground is zero as before. Like before, when information travels over the cable, noise is introduced. Crucially, because it’s a single cable, noise on the positive and negative wires is <em>strongly</em> correlated. The receiver therefore gets <var>S + e</var> on the hot wire and <var>-S + e</var> on the cold wire. All it needs to do is invert the signal on the negative wire and add both signals together. Inversion flips the phase of the noise on the cold wire such that it cancels out the noise remaining on the positive wire: <var>(S + e) + -(-S + e) = S + e + S - e = 2S</var>, i.e. the original signal at twice the amplitude with the noise removed.Explicit isn’t better than implicithttp://mina86.com/2021/06/06/explicit-isnt-better-than-implicit2021-06-06T20:12:49Z2021-06-06T20:12:49ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p>Continuing <a href=/2021/embrace-the-bloat/ >the new tradition of clickbaity titles</a>, let’s talk about explicitness. It’s a subject that comes up when bike-shedding language and API designs. Pointing out that a construct or a function exhibits implicit behaviour is often touted as an ultimate winning argument against it.<p>There are two problems with such a line of reasoning. First of all, people claim to care about features being explicit but have come to accept a lot of implicit behaviour without batting an eye. Second of all, no one actually agrees what the terms mean.<p>In this article I’ll demonstrate those two issues and show that ‘explicit over implicit’ is the wrong value to uphold. It’s merely a proxy for a much more useful goal interfaces should strive for. 
By the end I’ll demonstrate what we should look at instead.<h2>Dispelling the myths</h2><p>Let’s start by examining just how explicit Python and Rust are. After all, their communities often boast the virtue of explicitness in their respective languages. I’m sure it’s going to be completely uncontroversial to suggest that they may be much more implicit than some give them credit for.<h3>The Zen of Python is a lie</h3><p><a href=//www.python.org/dev/peps/pep-0020/ >The Zen of Python</a> is a collection of aphorisms which represent Python’s guiding principles. It’s not clear, at least to me, whether they are listed in order of importance, but the second entry states that ‘explicit is better than implicit’. Despite that, there are multiple instances where this rule is broken. In particular, Python implicitly:<ul><li>creates new variables. One cannot even argue that the assignment statement defines a variable since that’s not the case as can be seen in the following toy example:<pre>foo = 'foo'
def func():
    return foo
    foo = 'bar'
print(func())</pre><li>propagates exceptions turning any statement into a possible function exit point;<li>converts values to booleans in conditions. For example, <code>if some_list:</code> is a Pythonic way to check if a list is non-empty while <code>if not var:</code> is a Pythonic way of checking whether a variable is <code>None</code>;<li>converts between booleans, integers and floats in arithmetic operations. Depending on the operation, this happens even if both operands are of the same type;<li>concatenates string literals separated by white-space;<li>constructs tuples from comma-separated values (without the need to type parentheses);<li>loads a package’s <code>__init__.py</code> file when importing a module; and<li>returns <code>None</code> from functions lacking an explicit return.</ul><p>It could even be argued that garbage collection is an implicit behaviour. After all, objects are never explicitly freed and all memory management is hidden from the user.<h3>Rust philosophy of explicitness is not a thing</h3><p>But let’s not dwell on scripting languages; let’s change gears and go a level lower to compiled and strictly-typed Rust. While it doesn’t have a formal list of guiding principles, explicitness is often cited as an important value. 
Yet, what’s often forgotten is that Rust implicitly:<ul><li>infers types,<li>infers lifetimes in function prototypes,<li>shortens lifetimes of references,<li>performs <code>Deref</code> coercion,<li>passes <code>self</code> by value or reference based on the method prototype,<li>copies values of <code>Copy</code> types when they are passed to functions by value,<li>calls <code>drop</code> of <code>Drop</code> objects when they go out of scope,<li>converts the error type when the question mark operator is used,<li>returns <code>()</code> from functions lacking a tail or return expression and<li>resolves <code>format!</code> named arguments not explicitly listed in the invocation of the macro.</ul><p>To be even more contrarian, I could claim the <code>sort</code> method <em>implicitly</em> uses natural ordering. Or that the order of operations in the <code>230 - 220 / 2</code> expression isn’t explicitly specified.<p>But the point here isn’t to demonstrate that Python or Rust aren’t ‘explicit’. Rather, it is to show that even languages which seemingly champion explicitness compromise on that principle. This means that saying some new feature exhibits implicit behaviour is not a be-all and end-all argument for blocking such a design.<h2>‘You’ve typed it so it’s explicit’</h2><p>Then again, maybe I’m completely off the mark? Perhaps all the aforementioned behaviours aren’t implicit after all? For example, Python documents <a href=//docs.python.org/3/reference/expressions.html#booleans>quite clearly</a> what values are interpreted as false and which as true. This means that there is nothing implicit in <code>if some_list:</code> not executing the body if the list is empty, right?<p>Some of the examples I’ve enumerated are definitely less clear-cut than others, but I challenge anyone who thinks that none of them are valid to come up with a justification and then present an example of a feature of any non-esoteric language which is implicit. 
One will quickly realise that either at least some of the aforementioned behaviours are implicit or there’s no such thing as an implicit behaviour and thus the whole discussion is moot.<p>Ultimately this leaves us with no commonly agreed definition of what it means for a feature or interface to be explicit. There’s not even a consensus on some vague understanding of the phrase. On one extreme, a not unreasonable argument that nothing is implicit could be made (after all a program behaves exactly according to the language’s documentation); on the other, some C♯ programmers argue that <code>""</code> isn’t an explicit-enough way of specifying an empty string. Unfortunately, I don’t have a definition which would satisfy everyone and thus solve this particular problem. Instead, I’m side-stepping the entire discussion.<h2>Explicit doesn’t matter</h2><p>Because, you see, when people say ‘design X is bad because it’s not explicit’ they actually don’t care about the feature being implicit. Instead, they (potentially subconsciously) use the level of explicitness as a proxy to decide how easy it is to reason about the program.<p>To continue with Python’s boolean coercion example, it is well understood that Python considers zero values and empty containers to be false. Therefore there’s little issue with <code>if some_list:</code> checking whether the list is empty. However, time of day is neither a collection nor a number so <code>if some_time:</code> <a href=//lwn.net/Articles/590299/ >checking whether the object represents midnight</a> <a href=//github.com/python/cpython/commit/ee6bdc07d65d5df418ad9dc2bb7139666af9a6b2>was</a> an issue. It wasn’t because the check was implicit (this was also true when testing for an empty container) but rather because it was an unexpected behaviour.<p id=b1>When Rust infers types, the compiler picks the only sensible and obvious choice. If there’s any ambiguity, the programmer has to explicitly specify the type. 
Contrast it with function overloading in C++. The rules are defined, but they are so convoluted only a handful of people understand them. Again, the issue isn’t whether the feature is implicit or not; the problem is how easy it is to reason about the code and how likely it is that the compiler does something unexpected.<a href=#f1>†</a><h3>Principle of least astonishment</h3><p>What actually matters is following the principle of least astonishment. Rather than wondering whether a particular design is explicit enough, the correct question to ask is how likely it is for a feature to lead to surprising behaviour.<p>Bugs often emerge when the compiler does something the programmer doesn’t expect. Python treating <code>True</code> as one in arithmetic operations is the only sensible non-throwing interpretation which means that the principle is preserved. On the other hand, C promoting integers can easily lead to astonishing results (such as an unsigned object comparing less than a negative value of a signed type) which violates the principle.<h3>Summary</h3><p>In conclusion, advocating explicitness for explicitness’s sake is not sound. Being explicit is a tool which, in some cases, helps minimise surprises in the code and makes it easier to reason about a program. If implicit behaviour does not make the source more confusing than it normally would be, there’s no reason to fight it.<p>But even then, all those things need to be weighed in the context of other useful properties of a design. Ergonomics of a programming language matter and it may sometimes be worth sacrificing the principle of least astonishment if that means the code may be more beautiful (which coincidentally is another aphorism from The Zen of Python).<p id=f1><a href=#b1>†</a> As an aside, confusing a general idea of a feature with the implementation of the feature in a specific language is another common fallacy. 
Function resolution in C++ may be convoluted but that doesn’t mean that function overloading in general needs to be confusing. For example, picking methods in Java is quite straightforward and allowing overloading on <a href=//en.wikipedia.org/wiki/Arity>arity</a> in Rust would leave no ambiguity in the code. Arguing against default parameters on account of how mystifying C++ can get is therefore invalid. A similar comparison could be made with Python’s treatment of false values. Yes, it may sometimes lead to astonishing results (e.g. intending to test for <code>None</code> but forgetting that other, non-<code>None</code>, false values pass the test as well), but if adapted to Rust, those surprising behaviours would not be an issue (thanks to strict typing).Programmer (vs) Dvorakhttp://mina86.com/2021/05/30/programmer-dvorak2021-05-30T11:36:23Z2021-05-30T11:36:23ZMichał ‘mina86’ Nazarewiczhttps://mina86.com<p><small>Update: The article was updated in October 2021 to include a direct comparison of shift usage between Dvorak and Programmer Dvorak layouts.</small><p>A few years ago I made a decision that had the potential to change the course of history. Had I gone down a different path, the <a href="//bugs.freedesktop.org/show_bug.cgi?id=25200">pl(dvp)</a> layout might never have seen the light of day. But did I make a wise choice? Or had I chosen poorly?<p>I’m talking of course about the decision to learn <a href=//www.kaufmann.no/roland/dvorak/ >Programmer Dvorak</a> rather than a regular Dvorak keyboard layout. The main difference between the two is that in the former digits are entered with the Shift key pressed down which allows several punctuation marks often used when programming to be typed without the need to reach for Shift. 
The hypothesis goes that developers use digits less often thus such design optimises the layout for them.<p>To test this I’ve grabbed <a href=//github.com/mina86>all my git repositories</a> and constructed a histogram of characters used in text files present there. Since letters are on the same position on both layouts in question, only digits and punctuation characters are compared on the histogram:<figure><svg xmlns=http://www.w3.org/2000/svg version=1.1 width=44em height=17em viewbox="0 0 704 272" stroke-width=1><defs> <pattern id=un patternunits=userSpaceOnUse width=4 height=4><rect width=4 height=4 fill=#1b9e77 /><path d=M-1,1l2,-2M0,4l4,-4M3,5l2,-2 stroke=#000 /></pattern> <pattern id=sh patternunits=userSpaceOnUse width=4 height=4><rect width=4 height=4 fill=#d95f02 /><path d=M-1,-1l2,2M0,0l4,4M3,3l2,2 stroke=#000 /></pattern> </defs><g id=gr><path stroke=var(--e) d=M8,32h688m0,40H8m0,40h688m0,40H8m0,40h688 /><path fill=#fffc d=M438,32h246v80H438Z /><text style=fill:#212121><tspan x=470 y=55>Not number row</tspan><tspan x=470 y=79>Unshifted (number row)</tspan><tspan x=470 y=103>Shifted (number row)</tspan></text></g><path fill=url(#un) d=M64,192v-125h14v125zm16,0v-124h14v124zm48,0v-111h14v111zm16,0v-110h14v110zm160,0v-50h14v50zm16,0v-41h14v41zm16,0v-41h14v41zm128,0v-26h14v26zm16,0v-24h14v24zm32,0v-24h14v24zm32,0v-14h14v14zm16,0v-10h14v10zM446,64v16h16V64zM16,224h360v14H16z /><path fill=url(#sh) d=M192,192v-89h14v89zm64,0v-66h14v66zm16,0v-55h14v55zm80,0v-37h14v37zm16,0v-35h14v35zm16,0v-33h14v33zm32,0v-32h14v32zm16,0v-31h14v31zm16,0v-27h14v27zm48,0v-24h14v24zm80,0v-10h14v10zm48,0v-6h14v6zm32,0v-3h14v3zM446,88v16h16V88zM16,240h230v14H16z /><path fill=#7570b3 
d=M16,192v-160h14v160zm16,0v-151h14v151zm16,0v-148h14v148zm48,0v-114h14v114zm16,0v-112h14v112zm48,0v-99h14v99zm16,0v-95h14v95zm32,0v-84h14v84zm16,0v-84h14v84zm16,0v-73h14v73zm48,0v-53h14v53zm112,0v-32h14v32zm128,0v-18h14v18zm64,0v-9h14v9zm16,0v-8h14v8zm32,0v-4h14v4zm32,0v-2h14v2zM446,40v16h16V40zM16,208h640v14H16z /><text font-size=0.875em text-anchor=middle><tspan x=23 y=24>-</tspan><tspan x=39 y=33>"</tspan><tspan x=55 y=36>.</tspan><tspan x=71 y=59>)</tspan><tspan x=87 y=60>(</tspan><tspan x=103 y=70>,</tspan><tspan x=119 y=72>/</tspan><tspan x=135 y=73>*</tspan><tspan x=151 y=74>=</tspan><tspan x=167 y=85>_</tspan><tspan x=183 y=89>;</tspan><tspan x=199 y=95>0</tspan><tspan x=215 y=100>></tspan><tspan x=231 y=100>:</tspan><tspan x=247 y=111><</tspan><tspan x=263 y=118>1</tspan><tspan x=279 y=129>2</tspan><tspan x=295 y=131>'</tspan><tspan x=311 y=134>#</tspan><tspan x=327 y=143>{</tspan><tspan x=343 y=143>}</tspan><tspan x=359 y=147>4</tspan><tspan x=375 y=149>3</tspan><tspan x=391 y=151>8</tspan><tspan x=407 y=152>\</tspan><tspan x=423 y=152>5</tspan><tspan x=439 y=153>6</tspan><tspan x=455 y=157>9</tspan><tspan x=471 y=158>$</tspan><tspan x=487 y=160>[</tspan><tspan x=503 y=160>7</tspan><tspan x=519 y=160>]</tspan><tspan x=535 y=166>&</tspan><tspan x=551 y=170>+</tspan><tspan x=567 y=174>!</tspan><tspan x=583 y=174>%</tspan><tspan x=599 y=175>|</tspan><tspan x=615 y=176>@</tspan><tspan x=631 y=178>`</tspan><tspan x=647 y=180>?</tspan><tspan x=663 y=181>~</tspan><tspan x=679 y=182>^</tspan><tspan x=672 y=219>52%</tspan><tspan x=392 y=235>29%</tspan><tspan x=262 y=251>19%</tspan></text></svg><figcaption>Fig. 1. Histogram of characters used in text files authored by me present in my Git repositories.</figcaption></figure><h2>Analysis</h2><p>The graph supports the idea that punctuation is used more often than digits when programming. 
Keys outside of the number row are for the most part in the same position on regular Dvorak and Programmer Dvorak layouts so the shifted and unshifted number-row characters are of main interest. Of those, the first four are non-digits and within the first ten only three are digits.<p>A quite striking feature of Programmer Dvorak is that it puts digits in a ‘7531902468’ sequence (rather than ordering them). The histogram supports that design decision as well. The two most used digits are zero and one which are accessible with the index finger. Traditional sorted ordering puts them in the little finger’s column.<p>A minor difference between the two Dvorak variants is the position of the apostrophe and semicolon keys. The data shows that I use the double quote quite often. Since I’ve never learned to properly use Shift keys, the location used by Programmer Dvorak ends up beneficial for me. After all, pressing Shift with the key right next to it is easier than Shift and the key two rows up.<p id=b1>Lastly, Programmer Dvorak moves the plus and equals signs to the number row. On the regular Dvorak layout those keys are accessible by stretching the right little finger one row up and two columns to the side. It’s hardly a comfortable key to reach. Programmer Dvorak puts the caret and at sign there instead. And yes, you’ve guessed it, the data supports that decision: equals and plus are used much more than at sign and caret.<a href=#f1>†</a><h3>Direct comparison to Dvorak</h3><p>While Dvorak and Programmer Dvorak share most of the layout outside of the number row, there are some differences. Most notably, the equals sign is present in both layouts on unshifted positions: on the top row in Dvorak and on the number row in Programmer Dvorak. 
Looking at frequencies of number row keys only isn’t enough to evaluate the two layouts.<figure><svg xmlns=http://www.w3.org/2000/svg version=1.1 width=44em height=6em viewbox="0 0 704 96" stroke-width=1><path fill=url(#un) d=M0,29h384v16H0ZM0,75h423v16H0Z /><path fill=url(#sh) d=M384,29h320v16H384ZM423,75h281v16H423Z /><text x=8 y=25>Dvorak <tspan font-size=0.875em> <tspan x=380 text-anchor=end>55% unshifted</tspan> <tspan x=388>45% shifted</tspan> </tspan></text><text x=8 y=71>Programmer Dvorak <tspan font-size=0.875em> <tspan x=419 text-anchor=end>60% unshifted</tspan> <tspan x=427>40% shifted</tspan> </tspan></text></svg><figcaption>Fig. 2. Frequencies of shifted and unshifted non-letter characters on Dvorak and Programmer Dvorak layouts.</figcaption></figure><p>Collating data for all non-letter characters shows that Programmer Dvorak has a slight advantage with 60% of the typed characters being on unshifted positions on that layout compared to 55% on Dvorak. The big wins for the former are the parentheses and the asterisk while the latter catches up a little with the first three natural numbers.<h3>Language composition</h3><p>Different languages use their own syntax and thus the most commonly used characters end up different. Someone writing exclusively in Lisp will wear out their parentheses keys much quicker than someone writing in Haskell. 
Turns out, I write mostly in languages whose syntax is based on C.<table><thead><tr><th>File type<th>Percentage of lines<tbody><tr style=background:var(--j)><td>C or C++<td class=r>41.53%<tr style="background:linear-gradient(to right,var(--j) 22%,#fff0 22%)"><td>HTML or XML-based<td class=r>8.98%<tr style="background:linear-gradient(to right,var(--j) 19%,#fff0 19%)"><td>Java<td class=r>8.02%<tr style="background:linear-gradient(to right,var(--j) 17%,#fff0 17%)"><td>Misc configuration<td class=r>7.20%<tr style="background:linear-gradient(to right,var(--j) 16%,#fff0 16%)"><td>Python<td class=r>6.64%<tr style="background:linear-gradient(to right,var(--j) 14%,#fff0 14%)"><td>Shell script<td class=r>5.65%<tr style="background:linear-gradient(to right,var(--j) 9%,#fff0 9%)"><td>Rust<td class=r>3.89%<tr style="background:linear-gradient(to right,var(--j) 9%,#fff0 9%)"><td>Perl<td class=r>3.89%<tr style="background:linear-gradient(to right,var(--j) 8%,#fff0 8%)"><td>Misc text<td class=r>3.33%<tr style="background:linear-gradient(to right,var(--j) 7%,#fff0 7%)"><td>JavaScript<td class=r>2.94%<tr style="background:linear-gradient(to right,var(--j) 6%,#fff0 6%)"><td>Lisp<td class=r>2.55%<tr style="background:linear-gradient(to right,var(--j) 6%,#fff0 6%)"><td>LaTeX<td class=r>2.33%<tr style="background:linear-gradient(to right,var(--j) 3%,#fff0 3%)"><td>Makefile<td class=r>1.34%<tr style="background:linear-gradient(to right,var(--j) 4%,#fff0 4%)"><td>(Rest)<td class=r>1.71%</table><p>But I also dabble in Lisp. This brings up a question: is that why I write so many parentheses? Analysing all files except for <code>*.el</code> reveals that my using Emacs doesn’t skew the result too much. 
There is a little bit of shuffling but parentheses still end up in the top five (even though their relative frequency suffers) and the first digit (zero) is in 12th position.<figure><svg xmlns=http://www.w3.org/2000/svg version=1.1 width=44em height=12em viewbox="0 0 704 192" stroke-width=1><use href=#gr /><path fill=url(#un) d=M64,192v-127h14v127zm16,0v-125h14v125zm48,0v-119h14v119zm16,0v-119h14v119zm160,0v-54h14v54zm16,0v-45h14v45zm16,0v-44h14v44zm128,0v-28h14v28zm32,0v-26h14v26zm16,0v-25h14v25zm32,0v-15h14v15zm16,0v-11h14v11zM446,64v16h16V64z /><path fill=url(#sh) d=M192,192v-96h14v96zm64,0v-72h14v72zm16,0v-61h14v61zm80,0v-41h14v41zm16,0v-39h14v39zm16,0v-36h14v36zm16,0v-35h14v35zm16,0v-34h14v34zm32,0v-29h14v29zm32,0v-27h14v27zm96,0v-11h14v11zm48,0v-7h14v7zm32,0v-3h14v3zM446,88v16h16V88z /><path fill=#7570b3 d=M16,192v-160h14v160zm16,0v-159h14v159zm16,0v-159h14v159zm48,0v-125h14v125zm16,0v-123h14v123zm48,0v-108h14v108zm16,0v-100h14v100zm32,0v-91h14v91zm16,0v-90h14v90zm16,0v-79h14v79zm48,0v-57h14v57zm144,0v-33h14v33zm96,0v-19h14v19zm64,0v-10h14v10zm16,0v-8h14v8zm32,0v-4h14v4zm32,0v-2h14v2zM446,40v16h16V40z /><text font-size=0.875em text-anchor=middle><tspan x=23 y=24>.</tspan><tspan x=39 y=25>"</tspan><tspan x=55 y=25>-</tspan><tspan x=71 y=57>)</tspan><tspan x=87 y=59>(</tspan><tspan x=103 y=59>/</tspan><tspan x=119 y=61>,</tspan><tspan x=135 y=65>*</tspan><tspan x=151 y=65>=</tspan><tspan x=167 y=76>_</tspan><tspan x=183 y=84>;</tspan><tspan x=199 y=88>0</tspan><tspan x=215 y=93>></tspan><tspan x=231 y=94>:</tspan><tspan x=247 y=105><</tspan><tspan x=263 y=112>1</tspan><tspan x=279 y=123>2</tspan><tspan x=295 y=127>'</tspan><tspan x=311 y=130>#</tspan><tspan x=327 y=139>{</tspan><tspan x=343 y=140>}</tspan><tspan x=359 y=143>4</tspan><tspan x=375 y=145>3</tspan><tspan x=391 y=148>8</tspan><tspan x=407 y=149>5</tspan><tspan x=423 y=150>6</tspan><tspan x=439 y=151>\</tspan><tspan x=455 y=155>9</tspan><tspan x=471 y=156>$</tspan><tspan x=487 y=157>7</tspan><tspan x=503 
y=158>[</tspan><tspan x=519 y=159>]</tspan><tspan x=535 y=165>&</tspan><tspan x=551 y=169>+</tspan><tspan x=567 y=173>!</tspan><tspan x=583 y=173>%</tspan><tspan x=599 y=174>|</tspan><tspan x=615 y=176>@</tspan><tspan x=631 y=177>`</tspan><tspan x=647 y=180>?</tspan><tspan x=663 y=181>~</tspan><tspan x=679 y=182>^</tspan></text></svg><figcaption>Fig. 3. Histogram of characters used in text files authored by me present in my Git repositories excluding Emacs Lisp files.</figcaption></figure><h2>Conclusion</h2><p>Looks like my choice was correct. It has been scientifically proven that Programmer Dvorak is better than regular Dvorak. Who knows what catastrophes were averted thanks to me!<p>Except of course this was hardly scientific and the results should be taken with a grain of salt. After all, looking at the characters in source files doesn’t necessarily reflect what keys are pressed to construct those files. For example, while I don’t configure my editor to automatically insert parentheses, braces or apostrophes, if I had such a feature enabled, the methodology chosen here would overestimate use of punctuation characters. Looking at the files also ignores effects of copying and pasting text. 
It’s hard to tell, though, whether that would affect the data or whether the effect balances itself out.<p>Overall though, I’m fairly satisfied with the results, which demonstrate that for me Programmer Dvorak was a better choice than the regular Dvorak layout.<h2>Addendum: Qwerty</h2><p>Since inevitably someone will be curious about a comparison to the Qwerty layout, the statistics for the letters are as follows:<table><thead><tr><th>Letter<th>Frequency<th>Dvorak<th>Qwerty<tbody><tr style=background:var(--j)><th>e<td class=r>11.5%<td>home<td>top<tr style="background:linear-gradient(to right,var(--j) 79%,#fff0 79%)"><th>t<td class=r>9.1%<td>home<td>top*<tr style="background:linear-gradient(to right,var(--j) 65%,#fff0 65%)"><th>i<td class=r>7.5%<td>home*<td>top<tr style="background:linear-gradient(to right,var(--j) 62%,#fff0 62%)"><th>a<td class=r>7.2%<td>home<td>home<tr style="background:linear-gradient(to right,var(--j) 61%,#fff0 61%)"><th>n<td class=r>7.0%<td>home<td>bottom*<tr style="background:linear-gradient(to right,var(--j) 59%,#fff0 59%)"><th>s<td class=r>6.8%<td>home<td>home<tr style="background:linear-gradient(to right,var(--j) 59%,#fff0 59%)"><th>r<td class=r>6.7%<td>top<td>top<tr style="background:linear-gradient(to right,var(--j) 56%,#fff0 56%)"><th>o<td class=r>6.5%<td>home<td>top<tr style="background:linear-gradient(to right,var(--j) 40%,#fff0 40%)"><th>l<td class=r>4.6%<td>top<td>home<tr style="background:linear-gradient(to right,var(--j) 37%,#fff0 37%)"><th>c<td class=r>4.3%<td>top<td>bottom<tbody><tr style="background:linear-gradient(to right,var(--j) 32%,#fff0 32%)"><th>d<td class=r>3.7%<td>home*<td>home<tr style="background:linear-gradient(to right,var(--j) 27%,#fff0 27%)"><th>p<td class=r>3.1%<td>top<td>top<tr style="background:linear-gradient(to right,var(--j) 27%,#fff0 27%)"><th>u<td class=r>3.1%<td>home<td>top<tr style="background:linear-gradient(to right,var(--j) 26%,#fff0 26%)"><th>m<td class=r>2.9%<td>bottom<td>bottom<tr 
style="background:linear-gradient(to right,var(--j) 25%,#fff0 25%)"><th>h<td class=r>2.9%<td>home<td>home*<tr style="background:linear-gradient(to right,var(--j) 24%,#fff0 24%)"><th>f<td class=r>2.8%<td>top*<td>home<tr style="background:linear-gradient(to right,var(--j) 18%,#fff0 18%)"><th>g<td class=r>2.1%<td>top<td>home*<tr style="background:linear-gradient(to right,var(--j) 14%,#fff0 14%)"><th>b<td class=r>1.6%<td>bottom*<td>bottom*<tr style="background:linear-gradient(to right,var(--j) 14%,#fff0 14%)"><th>y<td class=r>1.6%<td>top*<td>top*<tr style="background:linear-gradient(to right,var(--j) 12%,#fff0 12%)"><th>w<td class=r>1.3%<td>bottom<td>top<tbody><tr style="background:linear-gradient(to right,var(--j) 9%,#fff0 9%)"><th>v<td class=r>1.0%<td>bottom<td>bottom<tr style="background:linear-gradient(to right,var(--j) 8%,#fff0 8%)"><th>k<td class=r>1.0%<td>bottom<td>home<tr style="background:linear-gradient(to right,var(--j) 8%,#fff0 8%)"><th>x<td class=r>0.9%<td>bottom*<td>bottom<tr style="background:linear-gradient(to right,var(--j) 4%,#fff0 4%)"><th>z<td class=r>0.4%<td>bottom<td>bottom<tr style="background:linear-gradient(to right,var(--j) 2%,#fff0 2%)"><th>j<td class=r>0.3%<td>bottom<td>home<tr style="background:linear-gradient(to right,var(--j) 2%,#fff0 2%)"><th>q<td class=r>0.2%<td>bottom<td>top</table><p>An asterisk indicates keys towards the middle of the keyboard, which the index finger needs to stretch sideways to reach.<p>This also gives some credibility to the idea of swapping the ‘i’ and ‘u’ keys on Dvorak-based layouts. At least in the source code I write, the former is used over twice as often as the latter (7.5% versus 3.1%). 
On the other hand, as <a href=//www.reddit.com/r/dvorak/comments/uonpi6/comment/i8gsuh0/ >The Temp has suggested</a>, a reason to prefer keeping the two keys as they are might be bigrams: ‘“ou” is very common, and it would be a strain to enter that sequence if “u” and “i” were swapped.’<p id=f1><a href=#b1>†</a> Then again, I usually type on a Kinesis Advantage keyboard with <a href=//github.com/mina86/dot-files/tree/master/kinesis>a small dose of remapping</a>. This puts the at sign under my middle finger at the bottom of the keyboard. As a result, it’s slightly easier to type than the equals sign, which requires my index finger to be stretched far up and to the right.