• mina86.com

  • Programmer (vs) Dvorak

    Posted by Michał ‘mina86’ Nazarewicz on 30th of May 2021

    Updated in October 2021 to include a direct comparison of Shift usage between the Dvorak and Programmer Dvorak layouts.

    A few years ago I made a decision that had the potential to change the course of history. Had I gone down a different path, the pl(dvp) layout might never have seen the light of day. But did I make a wise choice? Or did I choose poorly?

    I’m talking of course about the decision to learn Programmer Dvorak rather than the regular Dvorak keyboard layout. The main difference between the two is that in the former digits are entered with the Shift key pressed down, which allows several punctuation marks often used when programming to be typed without reaching for Shift. The hypothesis is that developers use digits less often than those punctuation marks, so such a design optimises the layout for them.

    To test this I grabbed all my Git repositories and constructed a histogram of the characters used in the text files present there. Since letters are in the same positions on both layouts in question, only digits and punctuation characters are compared on the histogram:

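    Characters in descending order of frequency: - " . ) ( , / * = _ ; 0 > : < 1 2 ' # { } 4 3 8 \ 5 6 9 $ [ 7 ] & + ! % | @ ` ? ~ ^
    Legend: Not number row 52%, Unshifted (number row) 29%, Shifted (number row) 19%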
    Fig. 1. Histogram of characters used in text files authored by me present in my Git repositories.
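
    A minimal sketch of how such a histogram could be built, in Python. It is not the script used for the figure: the function names, the use of `git ls-files` to enumerate tracked files and the treatment of undecodable files as binary are all assumptions, and no attempt is made to filter files by author.

```python
import collections
import string
import subprocess
from pathlib import Path

# Characters of interest: ASCII digits and punctuation.  Letters are skipped
# because they sit on the same keys in both layouts.
INTERESTING = set(string.digits + string.punctuation)


def count_chars(text, counter=None):
    """Adds counts of digits and punctuation found in `text` to `counter`."""
    if counter is None:
        counter = collections.Counter()
    counter.update(ch for ch in text if ch in INTERESTING)
    return counter


def tracked_files(repo):
    """Lists files tracked in a Git repository (assumes `git` is installed)."""
    out = subprocess.run(
        ['git', '-C', str(repo), 'ls-files', '-z'],
        check=True, capture_output=True).stdout
    return [Path(repo) / name.decode('utf-8', 'replace')
            for name in out.split(b'\0') if name]


def histogram(repos, skip_suffixes=()):
    """Counts characters in text files tracked in the given repositories."""
    counter = collections.Counter()
    for repo in repos:
        for path in tracked_files(repo):
            if path.suffix in skip_suffixes:
                continue
            try:
                text = path.read_text(encoding='utf-8')
            except (OSError, UnicodeDecodeError):
                continue  # unreadable or not valid UTF-8; treated as binary
            count_chars(text, counter)
    return counter
```

    Calling `histogram(repos).most_common()` on a list of repository paths would give the characters in descending order of frequency. The hypothetical `skip_suffixes` parameter allows leaving out selected file types.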

    Analysis

    The graph supports the idea that punctuation is used more often than digits when programming. Keys outside of the number row are for the most part in the same positions on the regular Dvorak and Programmer Dvorak layouts, so the shifted and unshifted number-row characters are of main interest. Of those, the four most frequent are non-digits, and only three of the ten most frequent are digits.

    A quite striking feature of Programmer Dvorak is that it puts digits in a ‘7531902468’ sequence (rather than in numerical order). The histogram supports that design decision as well. The two most used digits are zero and one, which are accessible with an index finger. The traditional sorted ordering puts them in the little finger’s column.

    A minor difference between the two Dvorak variants is the position of the apostrophe and semicolon keys. The data shows that I use the double quote quite often. Since I’ve never learned to use the Shift keys properly, the location used by Programmer Dvorak ends up being beneficial for me. After all, pressing Shift together with the key right next to it is easier than pressing Shift and a key two rows up.

    Lastly, Programmer Dvorak moves the plus and equals signs to the number row. On the regular Dvorak layout those characters are reached by stretching the right little finger one row up and two columns to the side. It’s hardly a comfortable key to reach. Programmer Dvorak puts the caret and at sign there instead. And yes, you’ve guessed it, the data supports that decision: equals and plus are used much more than the at sign and caret.¹

    Direct comparison to Dvorak

    While Dvorak and Programmer Dvorak share most of the layout outside of the number row, there are some differences. Most notably, the equals sign is in an unshifted position in both layouts: on the top row in Dvorak and on the number row in Programmer Dvorak. Looking only at the frequencies of number-row keys therefore isn’t enough to evaluate the two layouts.

    Dvorak: 55% unshifted, 45% shifted
    Programmer Dvorak: 60% unshifted, 40% shifted
    Fig. 2. Frequencies of shifted and unshifted non-letter characters on Dvorak and Programmer Dvorak layouts.

    Collating data for all non-letter characters shows that Programmer Dvorak has a slight advantage, with 60% of the typed characters being in unshifted positions on that layout compared to 55% on Dvorak. The big wins for the former are the parentheses and the asterisk, while the latter catches up a little with the first three natural numbers.
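
    The comparison boils down to checking, for each counted character, whether a given layout needs Shift to type it. Below is a rough sketch in Python. The key placements are an assumption based on the standard US Dvorak and Programmer Dvorak layouts rather than something taken from the original analysis, and the function names are made up for illustration.

```python
import collections
import string

# Non-letter characters typed WITHOUT Shift on each layout.  These sets are
# an assumption based on the standard US Dvorak and Programmer Dvorak
# layouts; adjust them for any local remapping.
DVORAK_UNSHIFTED = set("`1234567890[]" "',./=\\" "-" ";")
PROGRAMMER_DVORAK_UNSHIFTED = set("$&[{}(=*)+]!#" ";,./@\\" "-" "'")

NON_LETTERS = set(string.digits + string.punctuation)


def count_non_letters(text):
    """Counts ASCII digits and punctuation characters in `text`."""
    return collections.Counter(ch for ch in text if ch in NON_LETTERS)


def unshifted_share(counts, unshifted):
    """Returns the fraction of counted characters typed without Shift."""
    total = sum(counts.values())
    if not total:
        return 0.0
    plain = sum(n for ch, n in counts.items() if ch in unshifted)
    return plain / total


counts = count_non_letters("f(x) = a[0] * 2;")
print(unshifted_share(counts, DVORAK_UNSHIFTED))
print(unshifted_share(counts, PROGRAMMER_DVORAK_UNSHIFTED))
```

    For the toy input `f(x) = a[0] * 2;`, six of the nine non-letter characters are unshifted on Dvorak and seven on Programmer Dvorak. The parentheses and the asterisk favour Programmer Dvorak while the digits favour Dvorak.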

    Language composition

    Different languages use their own syntax and thus the most commonly used characters end up being different. Someone writing exclusively in Lisp will wear out their parenthesis keys much quicker than someone writing in Haskell. Turns out, I write mostly in languages whose syntax is based on C.

    File type             Percentage of lines
    C or C++              41.53%
    HTML or XML-based      8.98%
    Java                   8.02%
    Misc configuration     7.20%
    Python                 6.64%
    Shell script           5.65%
    Rust                   3.89%
    Perl                   3.89%
    Misc text              3.33%
    JavaScript             2.94%
    Lisp                   2.55%
    LaTeX                  2.33%
    Makefile               1.34%
    (Rest)                 1.71%
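
    A breakdown like this can be approximated by grouping line counts by file suffix. The sketch below is in Python. The suffix-to-category mapping is hypothetical and incomplete, since the exact grouping behind the table is not given here.

```python
import collections
from pathlib import Path

# Hypothetical mapping from file suffix to a few of the categories in the
# table; the real grouping used for the table is not known.
FILE_TYPES = {
    '.c': 'C or C++', '.h': 'C or C++', '.cc': 'C or C++', '.cpp': 'C or C++',
    '.java': 'Java', '.py': 'Python', '.sh': 'Shell script',
    '.rs': 'Rust', '.pl': 'Perl', '.js': 'JavaScript',
    '.el': 'Lisp', '.tex': 'LaTeX',
}


def file_type(path):
    """Maps a path to a category based on its suffix."""
    return FILE_TYPES.get(Path(path).suffix.lower(), '(Rest)')


def line_percentages(line_counts):
    """Converts {path: number of lines} into {file type: percentage}."""
    totals = collections.Counter()
    for path, lines in line_counts.items():
        totals[file_type(path)] += lines
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {kind: 100 * n / grand_total for kind, n in totals.most_common()}
```

    For example, `line_percentages({'a.c': 60, 'b.h': 15, 'x.py': 20, 'README': 5})` groups the first two files under ‘C or C++’ and puts the file without a recognised suffix under ‘(Rest)’.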

    But I also dabble in Lisp. This brings up a question: is that why I write so many parentheses? Analysing all files except for *.el reveals that my using Emacs doesn’t skew the result too much. There is a little bit of shuffling, but parentheses still end up in the top five (even though their relative frequency suffers) and the first digit (zero) is in 12th position.

    Characters in descending order of frequency: . " - ) ( / , * = _ ; 0 > : < 1 2 ' # { } 4 3 8 5 6 \ 9 $ 7 [ ] & + ! % | @ ` ? ~ ^
    Fig. 3. Histogram of characters used in text files authored by me present in my Git repositories excluding Emacs Lisp files.

    Conclusion

    Looks like my choice was correct. It has been scientifically proven that Programmer Dvorak is better than regular Dvorak. Who knows what catastrophes were averted thanks to me!

    Except of course this was hardly scientific and the results should be taken with a grain of salt. After all, looking at the characters in source files doesn’t necessarily reflect what keys are pressed to construct those files. For example, while I don’t configure my editor to automatically insert parentheses, braces or apostrophes, if I had such a feature enabled, the methodology chosen here would overestimate the use of punctuation characters. Looking at the files also ignores the effects of copying and pasting text. It’s hard to tell, though, whether that would affect the data or whether the effect balances itself out.

    Overall though, I’m fairly satisfied with the results demonstrating that for me Programmer Dvorak was a better choice than the regular Dvorak layout.

    Addendum: Qwerty

    Since inevitably someone will be curious about a comparison to the Qwerty layout, the statistics for the letters are as follows:

    Letter  Frequency  Dvorak   Qwerty
    e       11.5%      home     top
    t        9.1%      home     top*
    i        7.5%      home*    top
    a        7.2%      home     home
    n        7.0%      home     bottom*
    s        6.8%      home     home
    r        6.7%      top      top
    o        6.5%      home     top
    l        4.6%      top      home
    c        4.3%      top      bottom
    d        3.7%      home*    home
    p        3.1%      top      top
    u        3.1%      home     top
    m        2.9%      bottom   bottom
    h        2.9%      home     home*
    f        2.8%      top*     home
    g        2.1%      top      home*
    b        1.6%      bottom*  bottom*
    y        1.6%      top*     top*
    w        1.3%      bottom   top
    v        1.0%      bottom   bottom
    k        1.0%      bottom   home
    x        0.9%      bottom*  bottom
    z        0.4%      bottom   bottom
    j        0.3%      bottom   home
    q        0.2%      bottom   top

    An asterisk indicates keys towards the middle of the keyboard, which the index finger needs to stretch sideways to reach.
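
    To put a single number on the table, the frequencies can be summed per keyboard row. The Python sketch below copies the frequencies from the table. The row strings are the letter rows of the standard Dvorak and Qwerty layouts, and the names used are only for illustration.

```python
# Letter frequencies (in per cent) copied from the table.
FREQUENCY = {
    'e': 11.5, 't': 9.1, 'i': 7.5, 'a': 7.2, 'n': 7.0, 's': 6.8, 'r': 6.7,
    'o': 6.5, 'l': 4.6, 'c': 4.3, 'd': 3.7, 'p': 3.1, 'u': 3.1, 'm': 2.9,
    'h': 2.9, 'f': 2.8, 'g': 2.1, 'b': 1.6, 'y': 1.6, 'w': 1.3, 'v': 1.0,
    'k': 1.0, 'x': 0.9, 'z': 0.4, 'j': 0.3, 'q': 0.2,
}

# Letters on each row of the two layouts.
DVORAK_ROWS = {'top': 'pyfgcrl', 'home': 'aoeuidhtns', 'bottom': 'qjkxbmwvz'}
QWERTY_ROWS = {'top': 'qwertyuiop', 'home': 'asdfghjkl', 'bottom': 'zxcvbnm'}


def row_shares(rows):
    """Sums letter frequencies per row, rounded to one decimal place."""
    return {row: round(sum(FREQUENCY[ch] for ch in letters), 1)
            for row, letters in rows.items()}


print('Dvorak:', row_shares(DVORAK_ROWS))
print('Qwerty:', row_shares(QWERTY_ROWS))
```

    With the figures from the table, home-row letters account for roughly 65% of the letters on Dvorak and roughly 31% on Qwerty. The per-layout totals come to 100.1% because the table entries are rounded.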

    This also gives some credibility to the idea of swapping the ‘i’ and ‘u’ keys on Dvorak-based layouts. At least in the source code I write, the former is used more than twice as frequently as the latter. On the other hand, as The Temp has suggested, a reason to prefer keeping the two keys as they are might be bigrams: ‘“ou” is very common, and it would be a strain to enter that sequence if “u” and “i” were swapped.’

    ¹ Then again, I usually type on a Kinesis Advantage keyboard with a small dose of remapping. This puts the at sign under my middle finger at the bottom of the keyboard. As a result, it’s slightly easier to type than the equals sign, which requires my index finger to be stretched far up and to the right.