java.lang.String is probably one of the most used classes in Java. Naturally, it contains its string data internally.
Do you know how the data is actually stored in String, and what happens when instantiating a String from a byte array? In this post, we'll explore the internal structure of java.lang.String and discuss ways to improve instantiation performance.
Internal structure of java.lang.String in Java 8 or earlier
In Java 8, java.lang.String contains its string data as a 16-bit char array.
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    // ...
}
When instantiating a String from a byte array, StringCoding.decode() is called.
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}
In the case of US_ASCII, sun.nio.cs.US_ASCII.Decoder.decode() is eventually called, which copies the bytes of the source byte array into a char array one by one.
public int decode(byte[] src, int sp, int len, char[] dst) {
    int dp = 0;
    len = Math.min(len, dst.length);
    while (dp < len) {
        byte b = src[sp++];
        if (b >= 0)
            dst[dp++] = (char)b;
        else
            dst[dp++] = repl;
    }
    return dp;
}
The newly created char array becomes the new String instance's value.
As you can see, even if the source byte array contains only single-byte characters, each byte is still widened to a char in a byte-by-byte loop.
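The following is a minimal, self-contained sketch of the same per-byte widening. It is not the JDK code. The class name is hypothetical, and the choice of U+FFFD as the replacement for non-ASCII bytes is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the Java 8-style approach: every byte is widened into a 16-bit char,
// one loop iteration per byte, even when the input is pure ASCII.
final class WideningAsciiDecoder {
    // Assumed replacement character for bytes outside the ASCII range.
    private static final char REPLACEMENT = '\uFFFD';

    static char[] decode(byte[] src, int offset, int length) {
        char[] dst = new char[length]; // 2 bytes of storage per character
        for (int i = 0; i < length; i++) {
            byte b = src[offset + i];
            // Non-negative bytes are ASCII (0..127); negative ones are replaced.
            dst[i] = b >= 0 ? (char) b : REPLACEMENT;
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] bytes = "Hello".getBytes(StandardCharsets.US_ASCII);
        char[] chars = decode(bytes, 0, bytes.length);
        System.out.println(new String(chars) + " " + chars.length);
    }
}
```

The loop count grows with the input length no matter what the bytes contain. This is the work that the Java 9+ fast path avoids for ASCII-only input.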
Internal structure of java.lang.String in Java 9 or later
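In Java 9 or later, java.lang.String stores its string data in an 8-bit byte array, together with a coder field that indicates whether the bytes are LATIN1 (one byte per character) or UTF16 (two bytes per character).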
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;
    // ...
}
When instantiating a String from a byte array, StringCoding.decode() is also called.
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret =
        StringCoding.decode(charset, bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}
In the case of US_ASCII, StringCoding.decodeASCII() is called. When the input contains only ASCII bytes, it copies the source byte array with Arrays.copyOfRange(), since both the source and the destination are byte arrays. Arrays.copyOfRange() internally uses System.arraycopy(), which is a native method and very fast.
private static Result decodeASCII(byte[] ba, int off, int len) {
    Result result = resultCached.get();
    if (COMPACT_STRINGS && !hasNegatives(ba, off, len)) {
        return result.with(Arrays.copyOfRange(ba, off, off + len),
                           LATIN1);
    }
    byte[] dst = new byte[len<<1];
    int dp = 0;
    while (dp < len) {
        int b = ba[off++];
        putChar(dst, dp++, (b >= 0) ? (char)b : repl);
    }
    return result.with(dst, UTF16);
}
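The decision decodeASCII() makes can be sketched in a runnable, simplified form. Several parts of this sketch are assumptions. Result is a hypothetical stand-in for StringCoding.Result. hasNegatives() is written as a plain loop, whereas the JDK's version is a JIT intrinsic. UTF16 chars are written big-endian for readability, whereas the JDK uses the platform's native byte order.

```java
import java.util.Arrays;

// Sketch of the Compact Strings idea: ASCII-only input keeps one byte per char (LATIN1)
// via a single bulk copy; anything else is inflated to two bytes per char (UTF16).
final class CompactAsciiDecoder {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical stand-in for StringCoding.Result.
    static final class Result {
        final byte[] value;
        final byte coder;

        Result(byte[] value, byte coder) {
            this.value = value;
            this.coder = coder;
        }
    }

    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    static Result decodeAscii(byte[] ba, int off, int len) {
        if (!hasNegatives(ba, off, len)) {
            // Fast path: one bulk copy (Arrays.copyOfRange delegates to System.arraycopy).
            return new Result(Arrays.copyOfRange(ba, off, off + len), LATIN1);
        }
        byte[] dst = new byte[len << 1];
        for (int i = 0; i < len; i++) {
            int b = ba[off + i];
            char c = b >= 0 ? (char) b : '\uFFFD';
            // Assumption: big-endian layout; the JDK writes chars in native byte order.
            dst[2 * i] = (byte) (c >> 8);
            dst[2 * i + 1] = (byte) c;
        }
        return new Result(dst, UTF16);
    }
}
```

A single negative byte anywhere in the input is enough to push the whole string off the bulk-copy path.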
You may have noticed the COMPACT_STRINGS constant. This improvement, introduced in Java 9, is called Compact Strings (JEP 254). The feature is enabled by default, but you can disable it with the -XX:-CompactStrings JVM option. See https://docs.oracle.com/en/java/javase/17/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-D2E3DC58-D18B-4A6C-8167-4A1DFB4888E4 for details.
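To check which mode a running JVM uses, you can read the flag through com.sun.management.HotSpotDiagnosticMXBean. The following is a HotSpot-specific sketch. On JVMs that don't expose this MXBean or the flag, it returns an empty result instead of failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

final class CompactStringsFlag {
    // Returns "true"/"false" for -XX:[+|-]CompactStrings, or empty if the JVM doesn't expose it.
    static Optional<String> value() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption("CompactStrings").getValue());
        } catch (IllegalArgumentException e) {
            // Not a HotSpot-based JVM, or the flag doesn't exist (e.g. Java 8).
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // e.g. run with: java -XX:-CompactStrings CompactStringsFlag
        System.out.println("CompactStrings = " + value().orElse("unavailable"));
    }
}
```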
The performance of new String(byte[]) in Java 8, 11, 17 and 21
I wrote this simple JMH benchmark to evaluate the performance of new String(byte[], Charset):
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
    private static final int STR_LEN = 512;
    private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
    private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;
    static {
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN; i++) {
                sb.append("x");
            }
            SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN / 2; i++) {
                sb.append("あ");
            }
            MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }
    @Benchmark
    public void newStrFromSingleByteStrBytes() {
        new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }
    @Benchmark
    public void newStrFromMultiByteStrBytes() {
        new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }
}
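The two inputs differ more than their names suggest. The following sketch rebuilds them and reports their sizes. It uses String.repeat (Java 11+) instead of the StringBuilder loop, and it writes "あ" as the escape \u3042 to avoid source-encoding issues. The single-byte input is 512 ASCII bytes, which can take the bulk-copy fast path. The multi-byte input is 768 bytes of UTF-8 (3 bytes per "あ") that must actually be decoded into 256 chars, which presumably explains most of the throughput gap.

```java
import java.nio.charset.StandardCharsets;

final class BenchmarkInputs {
    static final int STR_LEN = 512;

    static byte[] singleByteSource() {
        return "x".repeat(STR_LEN).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] multiByteSource() {
        // U+3042 ("あ") is encoded as 3 bytes in UTF-8.
        return "\u3042".repeat(STR_LEN / 2).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] single = singleByteSource();
        byte[] multi = multiByteSource();
        System.out.println(single.length + " bytes -> "
            + new String(single, StandardCharsets.UTF_8).length() + " chars");
        System.out.println(multi.length + " bytes -> "
            + new String(multi, StandardCharsets.UTF_8).length() + " chars");
    }
}
```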
The benchmark results are as follows:
- Java 8
Benchmark                      Mode  Cnt      Score      Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1672.397 ±   11.338  ops/ms
newStrFromSingleByteStrBytes  thrpt    4   4789.745 ±  553.865  ops/ms
- Java 11
Benchmark                      Mode  Cnt      Score      Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1507.754 ±   17.931  ops/ms
newStrFromSingleByteStrBytes  thrpt    4  15117.040 ± 1240.981  ops/ms
- Java 17
Benchmark                      Mode  Cnt      Score      Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1529.215 ±  168.064  ops/ms
newStrFromSingleByteStrBytes  thrpt    4  17753.086 ±  251.676  ops/ms
- Java 21
Benchmark                      Mode  Cnt      Score      Error  Units
newStrFromMultiByteStrBytes   thrpt    4   1543.525 ±   69.061  ops/ms
newStrFromSingleByteStrBytes  thrpt    4  17711.972 ± 1178.212  ops/ms
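The throughput of newStrFromSingleByteStrBytes() improved drastically (roughly 3x) from Java 8 to Java 11. This is likely due to the change from a char array to a byte array in the String class. The benchmark decodes with UTF_8 rather than US_ASCII, but as far as I can tell, the Java 9+ UTF-8 decoder has the same kind of fast path as decodeASCII(): it copies ASCII-only input with Arrays.copyOfRange() into a LATIN1 string.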
Further performance improvement with zero copy
Okay, Compact Strings is a great performance improvement. But is there really no room left to speed up String instantiation from a byte array? In Java 9 or later, String(byte bytes[], int offset, int length, Charset charset) still copies the byte array. It uses System.arraycopy(), a fast native method, but the copy still takes time.
While reading the source code of Apache Fury, "a blazingly-fast multi-language serialization framework powered by JIT (just-in-time compilation) and zero-copy", I found that its StringSerializer achieves zero-copy String instantiation. Let's look into the implementation.
StringSerializer can be used as follows:
import org.apache.fury.serializer.StringSerializer;
...
byte[] bytes = "Hello".getBytes();
String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
System.out.println(s); // >>> Hello
Ultimately, StringSerializer.newBytesStringZeroCopy() calls the non-public (package-private) String constructor String(byte[] value, byte coder). This constructor assigns the source byte array directly to String.value without copying it.
/*
 * Package private constructor which shares value array for speed.
 */
String(byte[] value, byte coder) {
    this.value = value;
    this.coder = coder;
}
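Because this constructor is package-private and java.base is strongly encapsulated (by default since Java 16), ordinary reflection cannot call it unless the JVM is started with --add-opens java.base/java.lang=ALL-UNNAMED. This is presumably why Fury goes through a specially obtained Lookup (STRING_LOOK_UP, whose setup isn't shown in this post). The following sketch only probes the constructor. It uses AccessibleObject.trySetAccessible(), which returns false instead of throwing.

```java
import java.lang.reflect.Constructor;

final class StringCtorProbe {
    // Looks up the package-private String(byte[] value, byte coder) constructor (Java 9+).
    static Constructor<String> find() throws NoSuchMethodException {
        return String.class.getDeclaredConstructor(byte[].class, byte.class);
    }

    static boolean exists() {
        try {
            find();
            return true;
        } catch (NoSuchMethodException e) {
            return false; // e.g. Java 8, where String is backed by char[]
        }
    }

    // False when java.lang is not opened to the caller,
    // e.g. Java 16+ without --add-opens java.base/java.lang=ALL-UNNAMED.
    static boolean canOpen() {
        try {
            return find().trySetAccessible();
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("exists: " + exists() + ", accessible: " + canOpen());
    }
}
```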
newBytesStringZeroCopy() delegates to either the BYTES_STRING_ZERO_COPY_CTR BiFunction or the LATIN_BYTES_STRING_ZERO_COPY_CTR Function, depending on the coder and on which of the two is available:
public static String newBytesStringZeroCopy(byte coder, byte[] data) {
    if (coder == LATIN1) {
        // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
        // string length 230.
        // 50% faster than unsafe put field in java11 for string length 10.
        if (LATIN_BYTES_STRING_ZERO_COPY_CTR != null) {
            return LATIN_BYTES_STRING_ZERO_COPY_CTR.apply(data);
        } else {
            // JDK17 removed newStringLatin1
            return BYTES_STRING_ZERO_COPY_CTR.apply(data, LATIN1_BOXED);
        }
    } else if (coder == UTF16) {
        // avoid byte box cost.
        return BYTES_STRING_ZERO_COPY_CTR.apply(data, UTF16_BOXED);
    } else {
        // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
        // string length 230.
        // 50% faster than unsafe put field in java11 for string length 10.
        // `invokeExact` must pass exact params with exact types:
        // `(Object) data, coder` will throw WrongMethodTypeException
        return BYTES_STRING_ZERO_COPY_CTR.apply(data, coder);
    }
}
BYTES_STRING_ZERO_COPY_CTR is initialized to a BiFunction returned from getBytesStringZeroCopyCtr():
private static BiFunction<byte[], Byte, String> getBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
        return null;
    }
    MethodHandle handle = getJavaStringZeroCopyCtrHandle();
    if (handle == null) {
        return null;
    }
    // Faster than handle.invokeExact(data, byte)
    try {
        MethodType instantiatedMethodType =
            MethodType.methodType(handle.type().returnType(), new Class[] {byte[].class, Byte.class});
        CallSite callSite =
            LambdaMetafactory.metafactory(
                STRING_LOOK_UP,
                "apply",
                MethodType.methodType(BiFunction.class),
                handle.type().generic(),
                handle,
                instantiatedMethodType);
        return (BiFunction) callSite.getTarget().invokeExact();
    } catch (Throwable e) {
        return null;
    }
}
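The method returns a BiFunction that takes a byte[] value and a (boxed) Byte coder. The function invokes a MethodHandle for the String constructor String(byte[] value, byte coder) through a CallSite created by LambdaMetafactory.metafactory(). According to the comment in the code, this is faster than calling MethodHandle.invokeExact() directly. My guess is that the bootstrap process is skipped because the CallSite is reused. Another plausible factor is that LambdaMetafactory generates a class whose apply() calls the constructor directly, which the JIT can inline like ordinary code.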

https://cr.openjdk.org/~ntv/talks/eclipseSummit16/indyunderTheHood.pdf
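The LambdaMetafactory pattern can be tried without any privileged access by applying it to a class you own. In the following sketch, Tagged is a hypothetical stand-in for String. Like String(byte[], byte), its constructor keeps the array it receives instead of copying it. The metafactory() arguments mirror the ones in getBytesStringZeroCopyCtr(), including the boxed Byte in the instantiated type, which the generated class unboxes to byte. This is an illustration of the technique, not Fury's code.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.BiFunction;

final class LambdaCtorDemo {
    // Hypothetical stand-in for String: shares the array, like String(byte[], byte).
    static final class Tagged {
        final byte[] value;
        final byte tag;

        Tagged(byte[] value, byte tag) {
            this.value = value; // shared, not copied
            this.tag = tag;
        }
    }

    @SuppressWarnings("unchecked")
    static BiFunction<byte[], Byte, Tagged> makeCtor() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle ctor = lookup.findConstructor(
                Tagged.class, MethodType.methodType(void.class, byte[].class, byte.class));
            CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "apply",                                  // BiFunction's method name
                MethodType.methodType(BiFunction.class),  // factory type: () -> BiFunction
                ctor.type().generic(),                    // erased: (Object, Object) -> Object
                ctor,                                     // implementation: new Tagged(byte[], byte)
                MethodType.methodType(Tagged.class, byte[].class, Byte.class));
            // Link once; later calls go through the generated class, not the handle.
            return (BiFunction<byte[], Byte, Tagged>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("Could not link constructor lambda", t);
        }
    }

    public static void main(String[] args) {
        BiFunction<byte[], Byte, Tagged> newTagged = makeCtor();
        byte[] data = {1, 2, 3};
        Tagged t = newTagged.apply(data, (byte) 0);
        System.out.println(t.value == data); // the array was not copied
    }
}
```

For String itself, the same call needs a Lookup with private access to java.lang, which is the part Fury solves separately.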
LATIN_BYTES_STRING_ZERO_COPY_CTR is initialized to a Function returned from getLatinBytesStringZeroCopyCtr():
private static Function<byte[], String> getLatinBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
        return null;
    }
    if (STRING_LOOK_UP == null) {
        return null;
    }
    try {
        Class<?> clazz = Class.forName("java.lang.StringCoding");
        MethodHandles.Lookup caller = STRING_LOOK_UP.in(clazz);
        // JDK17 removed this method.
        MethodHandle handle =
            caller.findStatic(
                clazz, "newStringLatin1", MethodType.methodType(String.class, byte[].class));
        // Faster than handle.invokeExact(data, byte)
        return _JDKAccess.makeFunction(caller, handle, Function.class);
    } catch (Throwable e) {
        return null;
    }
}
Like getBytesStringZeroCopyCtr(), this method returns a function, but this one takes only a byte[]. No coder is needed because it handles LATIN1 only. The Function invokes a MethodHandle for StringCoding.newStringLatin1(byte[] src) instead of the String constructor String(byte[] value, byte coder). _JDKAccess.makeFunction() wraps the MethodHandle with LambdaMetafactory.metafactory(), just as getBytesStringZeroCopyCtr() does.
StringCoding.newStringLatin1() was removed in Java 17. So for LATIN1 strings, the LATIN_BYTES_STRING_ZERO_COPY_CTR function is used before Java 17, while the BYTES_STRING_ZERO_COPY_CTR function is used in Java 17 or later.
The key points are:
- Call the non-public StringCoding.newStringLatin1() or String(byte[] value, byte coder) to avoid copying the byte array.
- Keep the cost of MethodHandle invocation as low as possible by going through a CallSite created with LambdaMetafactory.
It's time for the benchmark. I updated the JMH benchmark code as follows:
- build.gradle.kts
dependencies {
    implementation("org.apache.fury:fury-core:0.9.0")
    ...
- org/komamitsu/stringinstantiationbench/StringInstantiationBenchmark.java
package org.komamitsu.stringinstantiationbench;
import org.apache.fury.serializer.StringSerializer;
import org.openjdk.jmh.annotations.*;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
    private static final int STR_LEN = 512;
    private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
    private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;
    static {
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN; i++) {
                sb.append("x");
            }
            SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < STR_LEN / 2; i++) {
                sb.append("あ");
            }
            MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }
    @Benchmark
    public void newStrFromSingleByteStrBytes() {
        new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }
    @Benchmark
    public void newStrFromMultiByteStrBytes() {
        new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
    }
    // Copied from org.apache.fury.serializer.StringSerializer.
    private static final byte LATIN1 = 0;
    private static final Byte LATIN1_BOXED = LATIN1;
    private static final byte UTF16 = 1;
    private static final Byte UTF16_BOXED = UTF16;
    private static final byte UTF8 = 2;
    @Benchmark
    public void newStrFromSingleByteStrBytesWithZeroCopy() {
        StringSerializer.newBytesStringZeroCopy(LATIN1, SINGLE_BYTE_STR_SOURCE_BYTES);
    }
    @Benchmark
    public void newStrFromMultiByteStrBytesWithZeroCopy() {
        // Note: java.lang.String only defines the coders LATIN1 (0) and UTF16 (1).
        // With UTF8 (2), newBytesStringZeroCopy() wraps the UTF-8 bytes as-is,
        // so the resulting String is not a correct decoding of the input.
        StringSerializer.newBytesStringZeroCopy(UTF8, MULTI_BYTE_STR_SOURCE_BYTES);
    }
}
And the result is as follows:
- Java 11
Benchmark                                  Mode  Cnt        Score        Error  Units
newStrFromMultiByteStrBytes               thrpt    4     1505.580 ±     13.191  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2284141.488 ±   5509.077  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    15246.342 ±    258.381  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2281817.367 ±   8054.568  ops/ms
- Java 17
Benchmark                                  Mode  Cnt        Score        Error  Units
newStrFromMultiByteStrBytes               thrpt    4     1545.503 ±     15.283  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2273566.173 ±  10212.794  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17598.209 ±    253.282  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2277213.103 ±  13380.823  ops/ms
- Java 21
Benchmark                                  Mode  Cnt        Score        Error  Units
newStrFromMultiByteStrBytes               thrpt    4     1556.272 ±     16.482  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  3698101.264 ± 429945.546  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17803.149 ±    204.987  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  3817357.204 ±  89376.224  ops/ms
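The benchmark failed with Java 8 due to an NPE. This is probably expected rather than a usage mistake. In Java 8, String.value is a char[], so STRING_VALUE_FIELD_IS_BYTES is false and both getBytesStringZeroCopyCtr() and getLatinBytesStringZeroCopyCtr() return null. As a result, newBytesStringZeroCopy() ends up calling apply() on a null function.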
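The initialization of Fury's STRING_VALUE_FIELD_IS_BYTES isn't shown in this post, so the following is only an assumption about how such a check could be written. It uses plain reflection. Reading a field's declared type doesn't require access to the field's value, so this works without --add-opens. The field name "value" is a JDK implementation detail.

```java
final class StringValueFieldProbe {
    // Declared type of String.value: char[] on Java 8, byte[] on Java 9+ (implementation detail).
    static Class<?> valueFieldType() {
        try {
            return String.class.getDeclaredField("value").getType();
        } catch (NoSuchFieldException e) {
            return null; // the JDK could rename or remove the field
        }
    }

    static boolean valueFieldIsBytes() {
        return valueFieldType() == byte[].class;
    }

    public static void main(String[] args) {
        System.out.println("String.value is byte[]: " + valueFieldIsBytes());
    }
}
```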
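StringSerializer.newBytesStringZeroCopy() was more than 100 times faster than the regular new String(byte[] bytes, Charset charset) in Java 11 and 17, and more than 200 times faster in Java 21. Maybe this is one of the secrets of why Fury is so blazingly fast. Keep in mind that the benchmark methods discard the created String. At roughly 0.26 to 0.44 ns per operation, the zero-copy numbers are low enough that the JIT may have optimized away much of the work. Returning the String from each @Benchmark method (JMH then consumes it with a Blackhole) would likely give a more conservative comparison.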
A possible concern with such a zero-copy strategy is that the byte array passed to new String(byte[] value, byte coder) becomes shared. Both the new String and any other object that still holds a reference to the array can see it and modify it.
byte[] bytes = "Hello".getBytes();
String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
System.out.println(s); // >>> Hello
bytes[4] = '!';
System.out.println(s); // >>> Hell!
This breaks String's immutability: the content of a string can change unexpectedly whenever someone modifies the shared array.
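For contrast, the public constructors copy (or decode into) a fresh array, so writes to the source array afterwards are invisible. The following sketch uses new String(byte[], Charset) with ISO_8859_1. The helper name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CopyingVsSharing {
    // Public constructors never share the caller's array, so later writes don't leak in.
    static String safeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = "Hello".getBytes(StandardCharsets.ISO_8859_1);
        String s = safeLatin1(bytes);
        bytes[4] = '!';
        System.out.println(s); // still "Hello"
    }
}
```

With zero copy, a safe convention is to hand over only arrays that nothing else references afterwards, for example a buffer that a deserializer has just filled and will not reuse.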
Conclusion
- If you're still on Java 8, consider moving to Java 9 or later. String instantiation from a byte array is considerably faster there.
- There is a technique to instantiate a String from a byte array with zero copy. It's blazingly fast, but the byte array must not be modified after the String is created.