Java: `String`↔`char[]`

Posted by Michał ‘mina86’ Nazarewicz on 9th of February 2019

Do you recall when I decided to abuse Go’s runtime and play with string↔[]byte conversion? Fun times… I wonder if we could do the same to Java?

To remind ourselves of the ‘problem’: strings in Java are immutable, but because Java has no concept of ownership and no const keyword to make good on that promise, the Java runtime has to make a defensive copy each time a new string is created or when a string’s characters are returned.

Alas, do not despair! There is another way (exception handling elided for brevity):

private static Field getValueField() {
	final Field field = String.class.getDeclaredField("value");
	field.setAccessible(true);
	/* Test that it works. */
	final char[] chars = new char[]{'F', 'o', 'o'};
	final String string = new String();
	field.set(string, chars);
	if (string.equals("Foo") && field.get(string) == chars) {
		return field;
	}
	throw new UnsupportedOperationException(
		"UnsafeString not supported by the runtime");
}

private final static Field valueField = getValueField();

public static String fromChars(final char[] chars) {
	final String string = new String();
	valueField.set(string, chars);
	return string;
}

public static char[] toChars(final String string) {
	return (char[]) valueField.get(string);
}

However. There is a twist…

Calculating RGB↔XYZ matrix

Posted by Michał ‘mina86’ Nazarewicz on 3rd of February 2019

I’ve recently found myself in need of an RGB↔XYZ transformation matrix expressed to the maximum possible precision. Sources on the Internet typically limit the precision to a handful of decimal places, so I’ve done the calculations myself.

What we’re looking for is a 3-by-3 matrix $M$ which, when multiplied by red, green and blue coordinates of a colour, produces its XYZ coordinates. In other words, a change of basis matrix from a space whose basis vectors are RGB’s primary colours: $M = \begin{bmatrix} X_{r} & X_{g} & X_{b} \\ Y_{r} & Y_{g} & Y_{b} \\ Z_{r} & Z_{g} & Z_{b} \end{bmatrix}$
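As a rough illustration of how such a matrix can be obtained exactly, here is a minimal Python sketch. It is not necessarily the method used in the full article. It assumes the sRGB primaries and the D65 white point given as xy chromaticities, normalises white to Y = 1, and uses rational arithmetic so that no precision is lost until the final conversion to floats. All function and variable names are made up for this example.

```python
from fractions import Fraction as F

def xyz_from_xy(x, y):
    """XYZ coordinates of chromaticity (x, y), scaled so that Y = 1."""
    return [x / y, F(1), (1 - x - y) / y]

def det3(m):
    """Determinant of a 3-by-3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, v):
    """Solve m * s = v using Cramer's rule (exact with Fractions)."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[v[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

def rgb_to_xyz_matrix(primaries, white):
    """Build M whose columns are the XYZ coordinates of the primaries."""
    cols = [xyz_from_xy(x, y) for x, y in primaries]
    unscaled = [[cols[c][r] for c in range(3)] for r in range(3)]
    # Scale each column so that RGB = (1, 1, 1) maps onto the white point.
    scale = solve3(unscaled, xyz_from_xy(*white))
    return [[unscaled[r][c] * scale[c] for c in range(3)] for r in range(3)]

# Assumed inputs: sRGB primaries (red, green, blue) and D65 white point.
SRGB_PRIMARIES = [(F(64, 100), F(33, 100)),
                  (F(30, 100), F(60, 100)),
                  (F(15, 100), F(6, 100))]
D65_WHITE = (F(3127, 10000), F(3290, 10000))

M = rgb_to_xyz_matrix(SRGB_PRIMARIES, D65_WHITE)
for row in M:
    print([float(value) for value in row])
```

Because every entry is a Fraction, the middle row (the Y coefficients) sums to exactly one, and the entries can be printed to as many digits as needed.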

PSA: Yes, 64-byte key file is okay

Posted by Michał ‘mina86’ Nazarewicz on 4th of April 2017

In an earlier entry I changed the generated key file used for disk encryption from 4096 to a meagre 64 bytes. I made no mention of that adjustment, considering it unimportant, but have since been asked about the security of such a short password.

Rest assured, a 64-byte key file is sufficient for any symmetric encryption (disk encryption being one example) and anything more does not improve security.

Go: `string`↔`[]byte`

Posted by Michał ‘mina86’ Nazarewicz on 28th of February 2017

Yes… I’ve started coding in Go recently. It lacks many things but the one feature relevant to this post is the const keyword. Arrays and slices in particular are always mutable and so an equivalent of C’s const char * does not exist.

On the other hand, strings are immutable, which means that conversion between a string and []byte requires memory allocation and copying of the data^†. Often this might be acceptable, but to squeeze every last cycle the following two functions might help achieve a zero-copy implementation:

func String(bytes []byte) string {
	hdr := *(*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	return *(*string)(unsafe.Pointer(&reflect.StringHeader{
		Data: hdr.Data,
		Len:  hdr.Len,
	}))
}

func Bytes(str string) []byte {
	hdr := *(*reflect.StringHeader)(unsafe.Pointer(&str))
	return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: hdr.Data,
		Len:  hdr.Len,
		Cap:  hdr.Len,
	}))
}

Depending on the length of the strings, the difference in performance might be noticeable:

PSA: Creating world-unreadable files

Posted by Michał ‘mina86’ Nazarewicz on 5th of February 2017

I’ve been reading tutorials on using key-files for disk encryption. A common approach for generating such a file is to create it using something similar to head -c 4096 /dev/urandom >key-file and only then change its permissions (usually with a plain chmod 400 key-file) to prevent others from reading it.

Please, stop doing this and spreading that method. The correct way of achieving the effect is:

(umask 077; head -c 64 /dev/random >key-file)

Or if the file needs to be created as root while command is run by a different user:

sudo sh -c 'umask 077; head -c 64 /dev/random >key-file'

The chmod approach creates the file as world-readable^† and, until its permissions are changed, anyone can read it. The umask approach creates the file readable only by its owner from the very beginning, thus preventing disclosure of the secret.
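The same idea applies when the file is created from a program rather than from the shell: ask for restrictive permissions at creation time instead of tightening them afterwards. Below is a minimal Python sketch assuming a POSIX system. The helper name write_key_file is made up for this example and os.urandom stands in for reading the random device directly.

```python
import os
import stat
import tempfile

def write_key_file(path, size=64):
    """Create path with owner-only permissions and fill it with random bytes."""
    # The mode is applied when the file is created, so there is no window in
    # which other users could open it.  O_EXCL refuses to reuse an existing
    # file, whose permissions might be looser.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(os.urandom(size))

# Demonstration in a throw-away directory.
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "key-file")
    write_key_file(key_path)
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    size = os.path.getsize(key_path)

print(oct(mode), size)
```

The process umask can only remove permission bits from the requested mode, so group and others never gain access regardless of how it is set.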

Generating random reals

Posted by Michał ‘mina86’ Nazarewicz on 26th of December 2016

A well known way of generating random floating-point numbers in the presence of a pseudo-random number generator (PRNG) is to divide output of the latter by one plus its maximum possible return value.

extern uint64_t random_uint64(void);

double random_double(void) {
	return random_uint64() / (UINT64_MAX + 1.0);
}

This method is simple, effective, inefficient and wrong on a few levels.
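One of the problems can be seen without any PRNG, assuming the result is expected to lie in the half-open range [0, 1). A double has only 53 bits of precision, so 64-bit integers close to the maximum are rounded up to 2^64 on conversion and the division yields exactly 1.0. The Python sketch below mirrors the C expression. Python floats are IEEE 754 doubles on common platforms, and the function takes the would-be PRNG output as an argument purely for the sake of the demonstration.

```python
UINT64_MAX = 2**64 - 1

def random_double(value):
    """Mirror of the C expression random_uint64() / (UINT64_MAX + 1.0)."""
    # float(value) rounds to the nearest double, just like the implicit
    # conversion in C; UINT64_MAX + 1.0 evaluates to 2**64 as a double.
    return float(value) / (UINT64_MAX + 1.0)

# The largest PRNG output maps to 1.0 rather than to something below it.
largest = random_double(UINT64_MAX)

# A thousand distinct inputs near the top all collapse into a single double.
collapsed = {random_double(UINT64_MAX - offset) for offset in range(1000)}

print(largest, len(collapsed))
```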

Fear

Posted by Michał ‘mina86’ Nazarewicz on 28th of September 2016

English version available on The Codeless Code.

A monk recently admitted to the temple approached the master.

“I have been given the task of adding several new features to the Imperial Shoemaker’s order-processing system, but I am unable to understand how it works. The logic is scattered across many applications implemented with all manner of technologies. Instead of creating shared libraries, the authors simply copied fragments of code from place to place, often introducing subtle discrepancies. Background jobs look up and modify records in the database for no documented reason. The database itself seems to conspire against me: a simple modification of one table can trigger a cascade of changes in many others.”

Python tips and tricks

Posted by Michał ‘mina86’ Nazarewicz on 1st of September 2016

Python! My old nemesis, we meet again. Actually, we meet all the time, but despite that there are always things which I cannot quite remember how to do and need to look up. To help with the searching, here they are collected in one post:

On Unicode

Posted by Michał ‘mina86’ Nazarewicz on 25th of October 2015

There are a lot of misconceptions about Unicode. Most are there because people assume what they know about ASCII or ISO-8859-* is true about Unicode. They are usually harmless but they tend to creep into minds of people who work with text which leads to badly designed software and technical decisions made based on false information.

Without further ado, here are a few facts about Unicode that might surprise you.

Bash right prompt

Posted by Michał ‘mina86’ Nazarewicz on 28th of September 2015

There are multiple ways to customise the Bash prompt. There’s no need to look for long to find a plethora of examples with fancy, colourful PS1s. What has been a bit problematic is having text on the right of the input line. In this article I’ll try to address that shortcoming.

Getting text on the right

The typical approach is using PROMPT_COMMAND to output the desired content. The variable specifies shell code which Bash executes prior to rendering the primary prompt (i.e. PS1).

The idea is to align text to the right and then, using a carriage return, move the cursor back to the beginning of the line where Bash will start rendering its prompt. Let’s look at an example of showing time in various locations:

__command_rprompt() {
	local times= n=$COLUMNS tz
	for tz in ZRH:Europe/Zurich PIT:US/Eastern \
	          MTV:US/Pacific TOK:Asia/Tokyo; do
		[ $n -gt 40 ] || break
		times="$times ${tz%%:*}\e[30;1m:\e[0;36;1m"
		times="$times$(TZ=${tz#*:} date +%H:%M)\e[0m"
		n=$(( $n - 10 ))
	done
	[ -z "$times" ] || printf "%${n}s$times\\r" ''
}
PROMPT_COMMAND=__command_rprompt

Terminal window presenting right prompt behaviour.

Clearing the line on execution

It has one annoying issue. The right text remains on screen even after executing a command. Typically this is a matter of aesthetics but it also makes copying and pasting session history more convoluted.

A manual solution is to use redraw-current-line readline function (e.g. often bound to C-l). It clears the line and prints the prompt and whatever input has been entered thus far. PROMPT_COMMAND is not executed so the right text does not reappear.

Lack of automation can be addressed with a tiny bit of readline magic and a ~/.inputrc file which deserves much more fame than what it usually gets.

The tricky part is binding C-m and C-j to two readline functions, redraw-current-line followed by accept-line, which is normally not possible. This limitation can be overcome by binding the key sequences to a different sequence which will be interpreted recursively.

To test that idea it’s enough to execute:

bind '\C-l:redraw-current-line'
bind '\M-\C-j:accept-line'
bind '\C-j:"\C-l\M-\C-j"' '\C-m:"\C-j"'

Making this permanent is as easy as adding the following lines to ~/.inputrc:

$if Bash
    "\C-l": redraw-current-line
    "\e\C-j": accept-line
    "\C-j": "\C-l\e\C-j"
    "\C-m": "\C-l\e\C-j"
$endif

With that, the right prompt will disappear as soon as the shell command is executed. (Note the use of \M- in the bind command vs. \e in the ~/.inputrc file.)