Go: string↔[]byte

Posted by Michał ‘mina86’ Nazarewicz on 28th of February 2017

Yes… I’ve started coding in Go recently. It lacks many things, but the one relevant to this post is a C-style const qualifier. Go’s const keyword only declares compile-time constants; arrays and slices in particular are always mutable, so an equivalent of C’s const char * does not exist.

On the other hand, strings are immutable, which means that converting between a string and a []byte requires allocating memory and copying the data. Often this is acceptable, but to squeeze out every last cycle the following two functions might help achieve a zero-copy implementation:

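import (
	"reflect"
	"unsafe"
)

// String returns a string which shares memory with bytes; nothing is
// copied.  The caller must not modify bytes for as long as the string
// is in use.
func String(bytes []byte) string {
	hdr := *(*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	return *(*string)(unsafe.Pointer(&reflect.StringHeader{
		Data: hdr.Data,
		Len:  hdr.Len,
	}))
}

// Bytes returns a slice which shares memory with str; nothing is
// copied.  The caller must treat the returned slice as read-only.
func Bytes(str string) []byte {
	hdr := *(*reflect.StringHeader)(unsafe.Pointer(&str))
	return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: hdr.Data,
		Len:  hdr.Len,
		Cap:  hdr.Len,
	}))
}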
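
For readers on newer toolchains: reflect.SliceHeader and reflect.StringHeader are deprecated in current Go releases, and Go 1.20 added unsafe.String, unsafe.StringData and unsafe.SliceData for this purpose. Below is a minimal sketch of the same two conversions written with those helpers, assuming Go 1.20 or newer. The names stringNoCopy and bytesNoCopy are hypothetical and are not taken from the repository. The main function also illustrates the price of zero-copy: the resulting string aliases the slice, so later writes to the slice show through.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringNoCopy (hypothetical name) returns a string sharing memory with b.
// Assumes Go 1.20+; b must not be modified while the string is in use.
func stringNoCopy(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// bytesNoCopy (hypothetical name) returns a slice sharing memory with s.
// Assumes Go 1.20+; the result must be treated as read-only.
func bytesNoCopy(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	buf := []byte("hello")

	// The built-in conversion copies the data into fresh memory.
	copied := string(buf)
	// The unsafe conversion only builds a new header over the same memory.
	aliased := stringNoCopy(buf)

	// Deliberately break the rule above to show the aliasing.
	buf[0] = 'j'

	// copied still holds the old contents; aliased sees the write.
	fmt.Println(copied, aliased)

	// Reading through the zero-copy slice is fine; writing to it is not.
	fmt.Println(len(bytesNoCopy("abc")))
}
```

The aliasing is exactly why the built-in conversions copy: without a copy, the language could not keep its promise that strings never change.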

Depending on the length of the strings, the difference in performance might be noticeable:

Argument length  String(bytes) [ns]  string(bytes) [ns]     Ratio
              0                1.86                3.08      1.7×
             10                1.86                9.75      5.2×
           1000                1.86                 232      124×
         100000                1.89               13981     7397×
       10000000                1.88             1126164   599023×

Argument length     Bytes(str) [ns]    []byte(str) [ns]     Ratio
              0                1.86                6.24      3.4×
             10                1.86                7.31      3.9×
           1000                1.85                 230      124×
         100000                1.86               15198     8171×
       10000000                1.89             1141424   603928×

As expected, the time taken by the unsafe variant stays constant, since it always allocates the same amount of memory (a single structure) and copies two or three values. The safe version, on the other hand, scales roughly linearly since it has to copy the data, but it is slower even when dealing with an empty string. This could be the overhead of a function call, which the unsafe version avoids by getting inlined.
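
The copy made by the safe conversion can also be observed without timing anything, by counting heap allocations. The following is a rough sketch and not the benchmark from the repository: allocsPerConversion and sink are hypothetical names, and the zero-copy path uses unsafe.String (Go 1.20+) as a stand-in for the String function above. The result is stored in a package-level variable so that the compiler cannot keep the converted string on the stack.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

// sink forces the converted string to escape to the heap.
var sink string

// allocsPerConversion (hypothetical helper) reports the average number of
// heap allocations per []byte to string conversion of an n-byte slice,
// first for the built-in conversion and then for a zero-copy one.
func allocsPerConversion(n int) (safe, zeroCopy float64) {
	buf := make([]byte, n)

	safe = testing.AllocsPerRun(100, func() {
		// Built-in conversion: allocates and copies n bytes.
		sink = string(buf)
	})

	zeroCopy = testing.AllocsPerRun(100, func() {
		// Zero-copy conversion: only a new string header is built.
		sink = unsafe.String(unsafe.SliceData(buf), len(buf))
	})

	return safe, zeroCopy
}

func main() {
	for _, n := range []int{10, 1000, 100000} {
		safe, zeroCopy := allocsPerConversion(n)
		fmt.Printf("n=%d safe=%v zero-copy=%v allocations per conversion\n",
			n, safe, zeroCopy)
	}
}
```

Counting allocations says nothing about nanoseconds, but it separates the two effects discussed above: the per-call allocation and the length-dependent copy both belong to the safe path only.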

The code, including tests and benchmarks used to generate the table above, can be found in a git repository at github.com/mina86/unsafeConvert.

As of now, and to the best of my knowledge, the Go compiler is not smart enough to recognise in the general case when the copying can be avoided without sacrificing any promises the language is making.