Java: String↔char[]

Posted by Michał ‘mina86’ Nazarewicz on 9th of February 2019

Do you recall when I decided to abuse Go’s run-time and play with string↔[]byte conversion? Fun times… I wonder whether we could do the same in Java.

To remind ourselves of the ‘problem’: strings in Java are immutable, but because Java has no concept of ownership or a const keyword (can we move the industry to Rust already?) to make good on that promise, the Java run-time has to make a defensive copy each time a string is created from a character array or when a string’s characters are returned as one.
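The defensive copies are easy to observe: mutate an array after handing it to a String, or mutate an array obtained from one, and the string stays the same. Below is a minimal sketch that uses only the standard String(char[]) constructor and String::toCharArray; the class name DefensiveCopyDemo is made up for illustration.

```java
import java.util.Arrays;

final class DefensiveCopyDemo {
	/* Returns true if new String(char[]) copied its argument. */
	static boolean constructorCopies() {
		final char[] chars = {'F', 'o', 'o'};
		final String string = new String(chars);
		chars[0] = 'B';               /* mutate the caller's array... */
		return string.equals("Foo");  /* ...the string is unaffected */
	}

	/* Returns true if toCharArray() hands out a fresh copy on every call. */
	static boolean toCharArrayCopies() {
		final String string = "Foo";
		final char[] first = string.toCharArray();
		final char[] second = string.toCharArray();
		first[0] = 'B';               /* scribble over one returned array */
		return first != second
			&& Arrays.equals(second, new char[]{'F', 'o', 'o'})
			&& string.equals("Foo");
	}

	public static void main(final String[] args) {
		System.out.println(constructorCopies());  /* true */
		System.out.println(toCharArrayCopies());  /* true */
	}
}
```

Both copies cost time and memory proportional to the string’s length, which is exactly what the trick below tries to avoid.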

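But do not despair! There is another way: use reflection to get at String’s private value field directly (checked exceptions are merely wrapped to keep the code short):

import java.lang.reflect.Field;

public final class UnsafeString {
	private static Field getValueField() {
		try {
			final Field field = String.class.getDeclaredField("value");
			field.setAccessible(true);
			/* Test that it works. */
			final char[] chars = new char[]{'F', 'o', 'o'};
			final String string = new String();
			field.set(string, chars);
			if (string.equals("Foo") && field.get(string) == chars) {
				return field;
			}
		} catch (final ReflectiveOperationException | RuntimeException e) {
			/* Fall through: the run-time does not allow the trick. */
		}
		throw new UnsupportedOperationException(
			"UnsafeString not supported by the run-time");
	}

	private static final Field valueField = getValueField();

	/* Wraps chars in a String without copying; chars must not be modified later. */
	public static String fromChars(final char[] chars) {
		final String string = new String();
		try {
			valueField.set(string, chars);
		} catch (final IllegalAccessException e) {
			throw new IllegalStateException(e);
		}
		return string;
	}

	/* Exposes the String's own array without copying; it must not be modified. */
	public static char[] toChars(final String string) {
		try {
			return (char[]) valueField.get(string);
		} catch (final IllegalAccessException e) {
			throw new IllegalStateException(e);
		}
	}
}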
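The flip side of skipping the copy is aliasing: the string and the array are now the same object in memory, so later writes to the array show through the supposedly immutable string. The sketch below (AliasingDemo is a hypothetical name) repeats the reflective steps of getValueField and fromChars inline so it is self-contained, and returns null wherever the run-time refuses them.

```java
import java.lang.reflect.Field;

final class AliasingDemo {
	/*
	 * Stores chars into a fresh String via reflection, then modifies chars.
	 * Returns the string afterwards, or null if the run-time refuses the trick.
	 */
	static String contentsAfterMutation() {
		try {
			final Field field = String.class.getDeclaredField("value");
			field.setAccessible(true);  /* throws on Java 16+ by default */
			final char[] chars = {'F', 'o', 'o'};
			final String string = new String();
			field.set(string, chars);   /* throws on Java 9+: value is a byte[] */
			chars[0] = 'B';             /* no copy was made... */
			return string;              /* ...so the "immutable" string changed */
		} catch (final ReflectiveOperationException | RuntimeException e) {
			return null;
		}
	}

	public static void main(final String[] args) {
		/* Boo on Java 7u6 through 8, null on later releases. */
		System.out.println(contentsAfterMutation());
	}
}
```

In other words, whoever calls fromChars or toChars takes over the responsibility of never touching the array again.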

However. There is a twist…

Benchmarks

Benchmarks shouldn’t surprise anyone:

Argument length  UnsafeString::fromChars [ns]  String::new [ns]  Safe ÷ unsafe
              0                         13.92              4.97           0.37
              1                         13.98              8.07           0.58
              3                         14.07              8.07           0.57
              4                         14.06              8.09           0.58
             10                         14.15              9.25           0.65
             33                         14.23             12.54           0.88
            100                         14.12             29.68           2.10
          10000                         14.17           2937.98         207.33
        1000000                         14.04         319440.00       22754.73

Argument length  UnsafeString::toChars [ns]  String::toCharArray [ns]  Safe ÷ unsafe
              0                        5.79                      4.64           0.80
              1                        5.13                      9.17           1.79
              3                        5.57                      9.08           1.63
              4                        5.13                      9.13           1.78
             10                        5.67                     10.47           1.85
             33                        5.49                     13.03           2.37
            100                        5.11                     29.38           5.75
          10000                        5.12                   2950.88         575.79
        1000000                        5.15                 318074.00       61728.38

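The unsafe variant takes roughly the same amount of time regardless of the size of the argument, while the safe variant scales linearly. Interestingly, because reflection is slow, the safe call is faster for short strings: String::new beats fromChars up to at least 33 characters, while toCharArray wins only for the empty string.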
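For readers who want to reproduce the shape of these numbers, here is a crude timing loop. It is a hypothetical harness (CrudeTimer is an invented name), not the benchmark code from the java-unsafe-string repository, and it measures only the safe calls; a dedicated tool such as JMH handles warm-up and dead-code elimination far more carefully.

```java
import java.util.function.Supplier;

final class CrudeTimer {
	/* Each result is written here so the JIT cannot discard the work;
	 * the volatile store adds a small constant cost to every call. */
	static volatile Object sink;

	/* Calls op in batches until at least minNanos have passed; returns ns per call. */
	static double nsPerOp(final Supplier<?> op, final long minNanos) {
		final int batch = 1000;
		long ops = 0;
		final long start = System.nanoTime();
		long elapsed;
		do {
			for (int i = 0; i < batch; ++i) {
				sink = op.get();
			}
			ops += batch;
			elapsed = System.nanoTime() - start;
		} while (elapsed < minNanos);
		return (double) elapsed / ops;
	}

	/* Spends ~0.2 s warming op up so the JIT compiles it, then measures for ~0.5 s. */
	static double measure(final Supplier<?> op) {
		nsPerOp(op, 200_000_000L);
		return nsPerOp(op, 500_000_000L);
	}

	public static void main(final String[] args) {
		for (final int length : new int[]{0, 100, 10_000}) {
			final char[] chars = new char[length];
			final String string = new String(chars);
			System.out.printf("String::new/%d: %.2f ns/op%n",
				length, measure(() -> new String(chars)));
			System.out.printf("String::toCharArray/%d: %.2f ns/op%n",
				length, measure(string::toCharArray));
		}
	}
}
```

With a loop like this the linear growth of both safe calls should be visible, although absolute figures will differ from the tables above.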

The code, including tests and benchmarks, can be found in the java-unsafe-string repository.

If the benchmarks aren’t surprising, what’s up with the twist then?

Java 6

While we’re on the subject of messing with Java’s String object, it is worth mentioning that the above code won’t work in Java 6 and earlier versions.

Until Java 7u6, String::substring created objects which shared the character array with the ‘parent’ string. This had some advantages (the operation took constant time and constant memory) but could lead to memory leaks: even once the base string became unreachable, its entire character array stayed in memory for as long as the substring was alive, even if the substring needed only a small portion of it. It also complicated the String class by requiring offset and length fields.

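In the end, the implementation was changed and each string now owns its entire character array. Interestingly, the ‘trigger’ for the change was the introduction of a new (since removed) hashing algorithm for strings. Whatever the case, the code presented here won’t work before Java 7u6: a string created with new String() keeps an offset and length of zero whatever array is stored in value, so the self-test in getValueField fails and UnsupportedOperationException is thrown.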
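To make the pre-7u6 layout concrete, here is a hypothetical SharedSlice class (not the JDK’s code) that keeps an array, an offset and a length the way the old String did. Taking a substring is O(1) because only the offset and length change, but every slice keeps the whole array reachable.

```java
final class SharedSlice {
	private final char[] value;  /* possibly shared with other slices */
	private final int offset;    /* index of the first character in value */
	private final int count;     /* number of characters in this slice */

	SharedSlice(final String s) {
		this(s.toCharArray(), 0, s.length());
	}

	private SharedSlice(final char[] value, final int offset, final int count) {
		this.value = value;
		this.offset = offset;
		this.count = count;
	}

	/* O(1): no characters are copied, the new slice points into the same array. */
	SharedSlice substring(final int begin, final int end) {
		if (begin < 0 || end > count || begin > end) {
			throw new IndexOutOfBoundsException(begin + ".." + end);
		}
		return new SharedSlice(value, offset + begin, end - begin);
	}

	/* True if both slices keep the same backing array alive. */
	boolean sharesStorageWith(final SharedSlice other) {
		return value == other.value;
	}

	/* Length of the array this slice keeps reachable, which may far exceed count. */
	int retainedChars() {
		return value.length;
	}

	@Override
	public String toString() {
		return new String(value, offset, count);
	}
}
```

For example, new SharedSlice("a very long string").substring(2, 6) prints as very, yet it still pins all 18 characters of the original array; that is the leak described above.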

But wait, this is still not the twist I’ve promised. ;)

Java 9

The above benchmarks were run on Java 8 and by now probably everyone and their dog knows that this particular version’s support has ended. Let’s jump to the next LTS version, Java 11:

$ javac com/mina86/unsafe/*.java &&
      echo && java -version && echo &&
      java com.mina86.unsafe.UnsafeStringBenchmark

openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Debian-3)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3, mixed mode, sharing)

Testing safe implementation: ........................... done, all ok
 +   safe::fromChars/0      : 1194602409 ops in 1731473175 ns: 1.44941 ns/op
 +   safe::fromChars/1      :  204060009 ops in 1622419993 ns: 7.95070 ns/op
 +   safe::fromChars/3      :  312337323 ops in 2857745803 ns: 9.14955 ns/op
 +   safe::fromChars/4      :  124336092 ops in 2170864835 ns: 17.4597 ns/op
 +   safe::fromChars/10     :  306122448 ops in 2816903678 ns: 9.20189 ns/op
 +   safe::fromChars/33     :  172483182 ops in 1914933095 ns: 11.1021 ns/op
 +   safe::fromChars/100    :  103099869 ops in 2107079434 ns: 20.4373 ns/op
 +   safe::fromChars/10000  :     661688 ops in 1031572901 ns: 1559.00 ns/op
 +   safe::fromChars/1000000:       4397 ops in 1002248806 ns: 227939 ns/op
 +     safe::toChars/0      :  280731006 ops in 2171809870 ns: 7.73627 ns/op
 +     safe::toChars/1      :  273448179 ops in 2172255240 ns: 7.94394 ns/op
 +     safe::toChars/3      :  284117814 ops in 2760800696 ns: 9.71710 ns/op
 +     safe::toChars/4      :  240143619 ops in 2666941237 ns: 11.1056 ns/op
 +     safe::toChars/10     :  234594930 ops in 2264769324 ns: 9.65396 ns/op
 +     safe::toChars/33     :  205747203 ops in 2952933911 ns: 14.3522 ns/op
 +     safe::toChars/100    :   94298106 ops in 2873368834 ns: 30.4711 ns/op
 +     safe::toChars/10000  :     357551 ops in 1046061057 ns: 2925.63 ns/op
 +     safe::toChars/1000000:       9012 ops in 2813949290 ns: 312245 ns/op

So far so good. The times are a bit noisier, though creating an empty string seems to have been optimised (about 1.4 ns per operation compared with roughly 5 ns on Java 8). Let’s see how the unsafe version compares.

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.mina86.unsafe.UnsafeStringImpl (file:/home/mpn/code/unsafe-str/) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.mina86.unsafe.UnsafeStringImpl
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
java.lang.IllegalArgumentException: Can not set final [B field java.lang.String.value to [C
	at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
	at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
	at java.base/jdk.internal.reflect.UnsafeQualifiedObjectFieldAccessorImpl.set(UnsafeQualifiedObjectFieldAccessorImpl.java:83)
	at java.base/java.lang.reflect.Field.set(Field.java:780)
	at com.mina86.unsafe.UnsafeStringImpl.makeUnsafeMaybe(UnsafeStringImpl.java:19)
	at com.mina86.unsafe.UnsafeStringBenchmark.main(UnsafeStringBenchmark.java:15)
Testing unsafe implementation: unsupported by the run-time

How’s that for a twist? Uh? I overhyped the twist, you say? Well… Dumbledore dies!

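On a serious note though: the exception shows that since Java 9 String.value is a byte[] rather than a char[] (‘[B’ and ‘[C’ are the JVM’s names for those two array types), so the trick cannot work on these versions no matter what. On top of that, as the warnings show, starting with Java 9 Oracle began locking down internal APIs, making some low-level optimisations no longer possible. So as you move from Java 8, remember to check any libraries which achieve high performance by messing around with Java’s internals.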
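A library that depends on tricks like this could check up front which representation it is running on. A minimal sketch, with the hypothetical class name ValueFieldType, reads the declared type of String’s value field. Reading that metadata needs no setAccessible call, so it works even where access is denied. It does assume the private field is still called value, which is an OpenJDK implementation detail; on Java 9 and later the type is byte[] (compact strings, JEP 254).

```java
import java.lang.reflect.Field;

final class ValueFieldType {
	/* Returns the declared type of String's private value field. */
	static Class<?> valueFieldType() {
		try {
			/* Only metadata is read, so no setAccessible call is needed. */
			final Field field = String.class.getDeclaredField("value");
			return field.getType();
		} catch (final NoSuchFieldException e) {
			throw new IllegalStateException("String has no value field", e);
		}
	}

	public static void main(final String[] args) {
		/* char[] on Java 7u6 through 8, byte[] on Java 9 and later. */
		System.out.println(valueFieldType().getSimpleName());
	}
}
```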