Mitsunori Komatsu

Posted on Dec 8, 2024 • Edited on Dec 12, 2024

Inside `java.lang.String`: Understanding and Optimizing Instantiation Performance

#java #performance #development #fury

java.lang.String is probably one of the most used classes in Java. Naturally, it contains its string data internally.

Do you know how the data is actually stored in String, and what happens when instantiating a String from a byte array? In this post, we'll explore the internal structure of java.lang.String and discuss ways to improve instantiation performance.

Internal structure of `java.lang.String` in Java 8 or earlier

In Java 8, java.lang.String contains its string data as a 16-bit char array.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

When instantiating a String from a byte array, StringCoding.decode() is called.

    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }

In the case of US_ASCII, sun.nio.cs.US_ASCII.Decoder.decode() is ultimately called. It copies the bytes of the source byte array into a char array one by one.

        public int decode(byte[] src, int sp, int len, char[] dst) {
            int dp = 0;
            len = Math.min(len, dst.length);
            while (dp < len) {
                byte b = src[sp++];
                if (b >= 0)
                    dst[dp++] = (char)b;
                else
                    dst[dp++] = repl;
            }
            return dp;
        }

The newly created char array is used as the new String instance's char array value.

Note that this byte-to-char copy loop runs even when the source byte array contains only single-byte characters.
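To make the cost concrete, here is a minimal sketch of the same widening loop outside the JDK. The class name AsciiWideningSketch and the method decodeAscii are made up for illustration, and the replacement character is assumed to be U+FFFD; only the loop shape follows the decoder shown above.

```java
final class AsciiWideningSketch {
    // Assumed replacement for bytes outside the ASCII range.
    private static final char REPLACEMENT = '\uFFFD';

    // Widens each 8-bit byte to a 16-bit char, one element at a time.
    static char[] decodeAscii(byte[] src, int offset, int length) {
        char[] dst = new char[length];
        for (int i = 0; i < length; i++) {
            byte b = src[offset + i];
            // Negative bytes (0x80..0xFF) are not ASCII and get replaced.
            dst[i] = (b >= 0) ? (char) b : REPLACEMENT;
        }
        return dst;
    }
}
```

Every element is touched individually, and the destination needs twice as many bytes as the source.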

Internal structure of `java.lang.String` in Java 9 or later

In Java 9 or later, java.lang.String stores its string data as an 8-bit byte array. A separate coder field records whether those bytes are encoded as LATIN1 or UTF16.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;
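A quick way to see which layout the running JVM uses is to ask reflection for the declared type of the value field. This is only an illustrative sketch; the class name StringValueProbe is made up, and it reads the field's type, not its contents.

```java
import java.lang.reflect.Field;

final class StringValueProbe {
    // Returns the declared type of the private String.value field.
    // Looking up the field and its type does not require setAccessible().
    static Class<?> valueFieldType() {
        try {
            Field value = String.class.getDeclaredField("value");
            return value.getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Expected to report char[] on Java 8 and byte[] on Java 9 or later.
        System.out.println("String.value is " + valueFieldType().getSimpleName());
    }
}
```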

When instantiating a String from a byte array, StringCoding.decode() is also called.

    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBoundsOffCount(offset, length, bytes.length);
        StringCoding.Result ret =
            StringCoding.decode(charset, bytes, offset, length);
        this.value = ret.value;
        this.coder = ret.coder;
    }

In the case of US_ASCII, StringCoding.decodeASCII() is called. Because both the source and the destination are byte arrays, it can copy the source with Arrays.copyOfRange(). Arrays.copyOfRange() internally uses System.arraycopy(), a native method that is very fast.

    private static Result decodeASCII(byte[] ba, int off, int len) {
        Result result = resultCached.get();
        if (COMPACT_STRINGS && !hasNegatives(ba, off, len)) {
            return result.with(Arrays.copyOfRange(ba, off, off + len),
                               LATIN1);
        }
        byte[] dst = new byte[len<<1];
        int dp = 0;
        while (dp < len) {
            int b = ba[off++];
            putChar(dst, dp++, (b >= 0) ? (char)b : repl);
        }
        return result.with(dst, UTF16);
    }
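The two paths of decodeASCII() can be sketched in a self-contained form as follows. This is a simplified illustration, not the JDK code: the names CompactDecodeSketch and Result are made up, the replacement character is assumed to be U+FFFD, and the byte order used on the slow path is an arbitrary choice for the sketch.

```java
import java.util.Arrays;

final class CompactDecodeSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Holds the backing array and the coder that says how to read it.
    static final class Result {
        final byte[] value;
        final byte coder;

        Result(byte[] value, byte coder) {
            this.value = value;
            this.coder = coder;
        }
    }

    // True if any byte in the range has its high bit set (is not ASCII).
    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    static Result decodeAscii(byte[] ba, int off, int len) {
        if (!hasNegatives(ba, off, len)) {
            // Fast path: one bulk copy, one byte per character.
            return new Result(Arrays.copyOfRange(ba, off, off + len), LATIN1);
        }
        // Slow path: two bytes per character, non-ASCII bytes are replaced.
        byte[] dst = new byte[len << 1];
        for (int i = 0; i < len; i++) {
            int b = ba[off + i];
            char c = (b >= 0) ? (char) b : '\uFFFD';
            dst[2 * i] = (byte) (c >> 8); // high byte first (sketch-only choice)
            dst[2 * i + 1] = (byte) c;
        }
        return new Result(dst, UTF16);
    }
}
```

For pure ASCII input the result array has the same length as the input and is produced by a single copy; only input containing non-ASCII bytes pays for the per-element loop.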

You may have noticed the COMPACT_STRINGS constant. This improvement, introduced in Java 9, is called Compact Strings. The feature is enabled by default, but you can disable it if you want. See https://docs.oracle.com/en/java/javase/17/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-D2E3DC58-D18B-4A6C-8167-4A1DFB4888E4 for details.

The performance of `new String(byte[])` in Java 8, 11, 17 and 21

I wrote this simple JMH benchmark code to evaluate the performance of new String(byte[]):

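import org.openjdk.jmh.annotations.*;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)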
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
  private static final int STR_LEN = 512;
  private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
  private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;
  static {
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN; i++) {
        sb.append("x");
      }
      SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN / 2; i++) {
        sb.append("あ");
      }
      MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  @Benchmark
  public void newStrFromSingleByteStrBytes() {
    new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }

  @Benchmark
  public void newStrFromMultiByteStrBytes() {
    new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }
}

The benchmark results are as follows:

Java 8

Benchmark                        Mode  Cnt     Score     Error   Units
newStrFromMultiByteStrBytes     thrpt    4  1672.397 ±  11.338  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  4789.745 ± 553.865  ops/ms

Java 11

Benchmark                        Mode  Cnt      Score      Error   Units
newStrFromMultiByteStrBytes     thrpt    4   1507.754 ±   17.931  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  15117.040 ± 1240.981  ops/ms

Java 17

Benchmark                        Mode  Cnt      Score     Error   Units
newStrFromMultiByteStrBytes     thrpt    4   1529.215 ± 168.064  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  17753.086 ± 251.676  ops/ms

Java 21

Benchmark                        Mode  Cnt      Score      Error   Units
newStrFromMultiByteStrBytes     thrpt    4   1543.525 ±   69.061  ops/ms
newStrFromSingleByteStrBytes    thrpt    4  17711.972 ± 1178.212  ops/ms

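The throughput of newStrFromSingleByteStrBytes() roughly tripled from Java 8 to Java 11 (about 4,790 to 15,117 ops/ms), while newStrFromMultiByteStrBytes() stayed roughly flat. The improvement is likely due to the change from a char array to a byte array in the String class, which lets single-byte data be copied in bulk instead of being widened one element at a time.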

Further performance improvement with zero copy

Okay, Compact Strings is a great performance improvement. But is there any room left to improve the performance of String instantiation from a byte array? String(byte bytes[], int offset, int length, Charset charset) in Java 9 or later still copies the byte array. Even though it uses System.arraycopy(), a fast native method, the copy takes some time.

When I read the source code of Apache Fury, which describes itself as "a blazingly-fast multi-language serialization framework powered by JIT (just-in-time compilation) and zero-copy", I found that its StringSerializer achieves zero-copy String instantiation. Let's look at the implementation.

The usage of the StringSerializer is as follows:

import org.apache.fury.serializer.StringSerializer;

...

    byte[] bytes = "Hello".getBytes();
    String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
    System.out.println(s);    // >>> Hello

What StringSerializer.newBytesStringZeroCopy() ultimately does is call the non-public String constructor new String(byte[] value, byte coder), which assigns the source byte array directly to String.value without copying any bytes.

   /*
    * Package private constructor which shares value array for speed.
    */
    String(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }

When StringSerializer.newBytesStringZeroCopy() is called, it delegates to either the BYTES_STRING_ZERO_COPY_CTR BiFunction or the LATIN_BYTES_STRING_ZERO_COPY_CTR Function.

  public static String newBytesStringZeroCopy(byte coder, byte[] data) {
    if (coder == LATIN1) {
      // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
      // string length 230.
      // 50% faster than unsafe put field in java11 for string length 10.
      if (LATIN_BYTES_STRING_ZERO_COPY_CTR != null) {
        return LATIN_BYTES_STRING_ZERO_COPY_CTR.apply(data);
      } else {
        // JDK17 removed newStringLatin1
        return BYTES_STRING_ZERO_COPY_CTR.apply(data, LATIN1_BOXED);
      }
    } else if (coder == UTF16) {
      // avoid byte box cost.
      return BYTES_STRING_ZERO_COPY_CTR.apply(data, UTF16_BOXED);
    } else {
      // 700% faster than unsafe put field in java11, only 10% slower than `new String(str)` for
      // string length 230.
      // 50% faster than unsafe put field in java11 for string length 10.
      // `invokeExact` must pass exact params with exact types:
      // `(Object) data, coder` will throw WrongMethodTypeException
      return BYTES_STRING_ZERO_COPY_CTR.apply(data, coder);
    }
  }

BYTES_STRING_ZERO_COPY_CTR is initialized to a BiFunction returned from getBytesStringZeroCopyCtr():

  private static BiFunction<byte[], Byte, String> getBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
      return null;
    }
    MethodHandle handle = getJavaStringZeroCopyCtrHandle();
    if (handle == null) {
      return null;
    }
    // Faster than handle.invokeExact(data, byte)
    try {
      MethodType instantiatedMethodType =
          MethodType.methodType(handle.type().returnType(), new Class[] {byte[].class, Byte.class});
      CallSite callSite =
          LambdaMetafactory.metafactory(
              STRING_LOOK_UP,
              "apply",
              MethodType.methodType(BiFunction.class),
              handle.type().generic(),
              handle,
              instantiatedMethodType);
      return (BiFunction) callSite.getTarget().invokeExact();
    } catch (Throwable e) {
      return null;
    }
  }

The method returns a BiFunction that takes byte[] value and byte coder as arguments. The function invokes a MethodHandle for the String constructor new String(byte[] value, byte coder) through a CallSite created by LambdaMetafactory.metafactory(). According to the code comment, this is faster than calling MethodHandle.invokeExact() directly. I guess that's because reusing the CallSite skips the bootstrap process.

https://cr.openjdk.org/~ntv/talks/eclipseSummit16/indyunderTheHood.pdf
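The LambdaMetafactory pattern can be tried without touching JDK internals. The following is a minimal sketch, not Fury's code: the class LambdaFactorySketch and its describe() method are made up, and describe() merely stands in for the non-public String constructor because it has the same (byte[], byte) parameter shape.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.BiFunction;

final class LambdaFactorySketch {
    // Hypothetical target method with the same parameter shape as String(byte[], byte).
    static String describe(byte[] value, byte coder) {
        return value.length + " bytes, coder " + coder;
    }

    @SuppressWarnings("unchecked")
    static BiFunction<byte[], Byte, String> makeFunction() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle handle = lookup.findStatic(
                LambdaFactorySketch.class, "describe",
                MethodType.methodType(String.class, byte[].class, byte.class));
            CallSite callSite = LambdaMetafactory.metafactory(
                lookup,
                "apply",                                 // BiFunction's abstract method
                MethodType.methodType(BiFunction.class), // factory takes no captured arguments
                handle.type().generic(),                 // erased: (Object, Object) -> Object
                handle,                                  // method the generated class calls
                MethodType.methodType(String.class, byte[].class, Byte.class));
            // Invoking the factory once yields a reusable BiFunction instance.
            return (BiFunction<byte[], Byte, String>) callSite.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The returned object is an ordinary BiFunction, so later calls go through a regular interface method such as makeFunction().apply(bytes, (byte) 0) rather than through MethodHandle.invokeExact().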

LATIN_BYTES_STRING_ZERO_COPY_CTR is initialized to a Function returned from getLatinBytesStringZeroCopyCtr():

  private static Function<byte[], String> getLatinBytesStringZeroCopyCtr() {
    if (!STRING_VALUE_FIELD_IS_BYTES) {
      return null;
    }
    if (STRING_LOOK_UP == null) {
      return null;
    }
    try {
      Class<?> clazz = Class.forName("java.lang.StringCoding");
      MethodHandles.Lookup caller = STRING_LOOK_UP.in(clazz);
      // JDK17 removed this method.
      MethodHandle handle =
          caller.findStatic(
              clazz, "newStringLatin1", MethodType.methodType(String.class, byte[].class));
      // Faster than handle.invokeExact(data, byte)
      return _JDKAccess.makeFunction(caller, handle, Function.class);
    } catch (Throwable e) {
      return null;
    }
  }

Like getBytesStringZeroCopyCtr(), this method returns a function object, but here it is a Function that takes only a byte[]; no coder is needed because it handles LATIN1 only. This Function invokes a MethodHandle for StringCoding.newStringLatin1(byte[] src) instead of the String constructor new String(byte[] value, byte coder). _JDKAccess.makeFunction() wraps the MethodHandle invocation with LambdaMetafactory.metafactory(), just as getBytesStringZeroCopyCtr() does.

StringCoding.newStringLatin1() was removed in Java 17. So, for LATIN1 data, the BYTES_STRING_ZERO_COPY_CTR function is used in Java 17 or later, while the LATIN_BYTES_STRING_ZERO_COPY_CTR function is used in earlier versions.
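The "resolve once, fall back if the method is missing" pattern can be illustrated with public APIs only. This is a sketch under assumptions, not Fury's code: VersionFallbackSketch is a made-up class, and String.strip() (available in Java 11 or later) stands in for a method that exists only in some JDK versions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class VersionFallbackSketch {
    // Resolved once at class initialization; null when the running JDK lacks the method.
    private static final MethodHandle STRIP = findStrip();

    private static MethodHandle findStrip() {
        try {
            return MethodHandles.publicLookup().findVirtual(
                String.class, "strip", MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            return null; // method not present on this JDK
        }
    }

    static String trimmed(String s) {
        if (STRIP != null) {
            try {
                return (String) STRIP.invokeExact(s);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }
        return s.trim(); // fallback available on every JDK
    }
}
```

The null check at call time mirrors the `LATIN_BYTES_STRING_ZERO_COPY_CTR != null` branch in newBytesStringZeroCopy().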

The key points are:

Call the non-public StringCoding.newStringLatin1() or new String(byte[] value, byte coder) to avoid copying the byte array.
Minimize the cost of the MethodHandle invocation by going through a CallSite.

It's time for the benchmark. I updated the JMH benchmark code as follows:

build.gradle.kts

dependencies {
    implementation("org.apache.fury:fury-core:0.9.0")
    ...

org/komamitsu/stringinstantiationbench/StringInstantiationBenchmark.java

package org.komamitsu.stringinstantiationbench;

import org.apache.fury.serializer.StringSerializer;
import org.openjdk.jmh.annotations.*;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Measurement(time = 3, iterations = 4)
@Warmup(iterations = 2)
public class StringInstantiationBenchmark {
  private static final int STR_LEN = 512;
  private static final byte[] SINGLE_BYTE_STR_SOURCE_BYTES;
  private static final byte[] MULTI_BYTE_STR_SOURCE_BYTES;
  static {
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN; i++) {
        sb.append("x");
      }
      SINGLE_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
    {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < STR_LEN / 2; i++) {
        sb.append("あ");
      }
      MULTI_BYTE_STR_SOURCE_BYTES = sb.toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  @Benchmark
  public void newStrFromSingleByteStrBytes() {
    new String(SINGLE_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }

  @Benchmark
  public void newStrFromMultiByteStrBytes() {
    new String(MULTI_BYTE_STR_SOURCE_BYTES, StandardCharsets.UTF_8);
  }

  // Copied from org.apache.fury.serializer.StringSerializer.
  private static final byte LATIN1 = 0;
  private static final Byte LATIN1_BOXED = LATIN1;
  private static final byte UTF16 = 1;
  private static final Byte UTF16_BOXED = UTF16;
  private static final byte UTF8 = 2;

  @Benchmark
  public void newStrFromSingleByteStrBytesWithZeroCopy() {
    StringSerializer.newBytesStringZeroCopy(LATIN1, SINGLE_BYTE_STR_SOURCE_BYTES);
  }

  @Benchmark
  public void newStrFromMultiByteStrBytesWithZeroCopy() {
    StringSerializer.newBytesStringZeroCopy(UTF8, MULTI_BYTE_STR_SOURCE_BYTES);
  }
}

And the result is as follows:

Java 11

Benchmark                                  Mode  Cnt        Score      Error   Units
newStrFromMultiByteStrBytes               thrpt    4     1505.580 ±   13.191  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2284141.488 ± 5509.077  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    15246.342 ±  258.381  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2281817.367 ± 8054.568  ops/ms

Java 17

Benchmark                                  Mode  Cnt        Score       Error   Units
newStrFromMultiByteStrBytes               thrpt    4     1545.503 ±    15.283  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  2273566.173 ± 10212.794  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17598.209 ±   253.282  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  2277213.103 ± 13380.823  ops/ms

Java 21

Benchmark                                  Mode  Cnt        Score        Error   Units
newStrFromMultiByteStrBytes               thrpt    4     1556.272 ±     16.482  ops/ms
newStrFromMultiByteStrBytesWithZeroCopy   thrpt    4  3698101.264 ± 429945.546  ops/ms
newStrFromSingleByteStrBytes              thrpt    4    17803.149 ±    204.987  ops/ms
newStrFromSingleByteStrBytesWithZeroCopy  thrpt    4  3817357.204 ±  89376.224  ops/ms

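The benchmark code failed on Java 8 with a NullPointerException. That is most likely because getBytesStringZeroCopyCtr() returns null when String.value is not a byte array, as is the case in Java 8, so there is no function to call. I may also have used the method in the wrong way.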

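For the single-byte input, StringSerializer.newBytesStringZeroCopy() was more than 100 times faster in Java 11 and Java 17, and more than 200 times faster in Java 21, than the normal new String(byte[] bytes, Charset charset). Keep in mind that the two paths do different amounts of work: the normal constructor decodes and copies the bytes, while the zero-copy path only wraps the existing array. Maybe this is one of the secrets of why Fury is blazing fast.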

A possible concern with this zero-copy strategy is that the byte array passed to new String(byte[] value, byte coder) ends up shared: both the new String object and any other object holding a reference to the array can reach it.

    byte[] bytes = "Hello".getBytes();
    String s = StringSerializer.newBytesStringZeroCopy(LATIN1, bytes);
    System.out.println(s);    // >>> Hello
    bytes[4] = '!';
    System.out.println(s);    // >>> Hell!

Because the shared array stays mutable, the content of a String can change unexpectedly.
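For contrast, the public constructor does not have this problem, because it decodes into its own internal array. The sketch below shows the same mutation against a String created the normal way; the class name DefensiveCopySketch is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class DefensiveCopySketch {
    static String copyingString(byte[] bytes) {
        // The public constructor decodes into a fresh internal array.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = "Hello".getBytes(StandardCharsets.ISO_8859_1);
        String s = copyingString(bytes);
        bytes[4] = '!';        // mutate the source array afterwards
        System.out.println(s); // prints "Hello": the String kept its own copy
    }
}
```

The zero-copy approach is therefore safest when the caller gives up the byte array after handing it over.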

Conclusion

If you're still on Java 8, moving to Java 9 or later is worthwhile for the performance of String instantiation alone.
There is a technique to instantiate a String from a byte array with zero copy. It's blazing fast, but the String then shares its array with the caller.
