Juan Torchia

Posted on May 17 • Originally published at juanchi.dev

Spring Boot 2026: Why Measuring Only Startup Time Is a Trap

#english #performance #arquitectura #springboot

There's a question that surfaces every time someone mentions GraalVM or Spring AOT in a technical meeting: how long does it take to start? It's the first metric that hits the screen, the number that closes the debate in five minutes. The problem is that question alone isn't enough to make any serious architecture decision, and in 2026 we have enough evidence to prove it with a reproducible lab.

I built JuanTorchia/springboot-jvm-2026 (tag editorial-final-startup-matrix) around exactly that working hypothesis: if you only look at startup time, you're ignoring half the costs that actually matter in production.

The lab backend is not a Hello World

Choosing what to measure matters as much as measuring it. A GET /ping endpoint that returns {"status":"ok"} doesn't activate the same bean graph or the same JIT behavior as a real application. So the lab backend has concrete surface area:

POST /api/orders with Jakarta Validation on a record
GET /api/orders/{id} with Spring Data JDBC on PostgreSQL 17
POST /api/work with deterministic work (iterative CRC32, up to 5,000 iterations)
Flyway for migrations, Actuator for readiness/liveness
HikariCP with the pool explicitly configured in the benchmark profile

The WorkService deserves its own paragraph because it's the only endpoint that mixes real CPU with a database query (countOrders()). That matters: without that endpoint, native and classic JVM look practically identical on warm latency because the JIT has nothing interesting to optimize.

// WorkService.java — deterministic work to force real differences between modes
public long calculateScore(String input, int iterations) {
    byte[] seed = input.getBytes(StandardCharsets.UTF_8);
    long score = 17;
    for (int i = 0; i < iterations; i++) {
        CRC32 crc = new CRC32();
        crc.update(seed);
        crc.update(longToBytes(score + i));
        // rotate, then add the 64-bit golden-ratio constant (the one used in Fibonacci hashing) for dispersion
        score = Long.rotateLeft(score ^ crc.getValue(), 7) + 0x9E3779B97F4A7C15L;
    }
    return score & Long.MAX_VALUE;
}

The 5_000 iteration cap isn't arbitrary: I validated it with WorkServiceTest so the work per request stays bounded and predictable, and the benchmark doesn't accidentally become a throughput test.
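To make the excerpt above runnable on its own, here is a minimal sketch. The loop is copied from the article, but the class name, the MAX_ITERATIONS constant, the big-endian longToBytes helper, and the choice to reject out-of-range values (rather than clamp them) are assumptions, since the repo code for those parts isn't shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class WorkScoreSketch {

    // Cap value taken from the article; the constant name is an assumption.
    static final int MAX_ITERATIONS = 5_000;

    // Hypothetical helper: the article calls longToBytes but does not show it.
    // Big-endian encoding is assumed here.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long calculateScore(String input, int iterations) {
        // Assumption: out-of-range values are rejected; the real service may clamp instead.
        if (iterations < 0 || iterations > MAX_ITERATIONS) {
            throw new IllegalArgumentException(
                    "iterations must be in [0, " + MAX_ITERATIONS + "]");
        }
        byte[] seed = input.getBytes(StandardCharsets.UTF_8);
        long score = 17;
        for (int i = 0; i < iterations; i++) {
            CRC32 crc = new CRC32();
            crc.update(seed);
            crc.update(longToBytes(score + i));
            score = Long.rotateLeft(score ^ crc.getValue(), 7) + 0x9E3779B97F4A7C15L;
        }
        // Clear the sign bit so the result is never negative.
        return score & Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        long first = calculateScore("order-42", 1_000);
        long second = calculateScore("order-42", 1_000);
        System.out.println("deterministic: " + (first == second));
        System.out.println("zero iterations: " + calculateScore("order-42", 0));
    }
}
```

Same input and iteration count always yield the same score, and zero iterations returns the initial value 17. That determinism is what lets the four modes be compared on identical work.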

Four modes, four distinct operational surfaces

The lab compares:

jvm: java -jar on Eclipse Temurin 21, the baseline for every team that hasn't touched anything
cds: JVM with a dynamic AppCDS archive prepared in a separate phase
aot-jvm: Spring Boot AOT on JVM, with -Dspring.aot.enabled=true verified in the container
native: GraalVM Native Image compiled inside ghcr.io/graalvm/native-image-community:21

That last point about AOT has a story. In the editorial run on May 17, 2026 (17:31–17:44 Buenos Aires time), the aot-jvm results made no sense until I confirmed the flag was actually reaching the container. Without spring.aot.enabled=true verified in the runtime env, AOT mode is indistinguishable from classic JVM on startup. The results/environment.json captures exactly that so anyone reproducing the lab knows what was actually running.
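One cheap way to catch this kind of silent misconfiguration is to log the flag from inside the running JVM. The sketch below is not how the lab records its environment; it is a hypothetical standalone check (class and method names are assumptions) that only inspects the JVM system property, which is how -Dspring.aot.enabled=true arrives. Spring may pick the flag up from other sources too, so treat it as a sanity check rather than proof.

```java
import java.util.Properties;

final class AotFlagCheck {

    static final String AOT_PROPERTY = "spring.aot.enabled";

    // True only when the property is present and equals "true" (case-insensitive).
    static boolean aotRequested(Properties props) {
        return Boolean.parseBoolean(props.getProperty(AOT_PROPERTY));
    }

    // Human-readable line suitable for a startup log or an environment report.
    static String describe(Properties props) {
        String raw = props.getProperty(AOT_PROPERTY);
        String shown = (raw == null) ? "<unset>" : raw;
        String verdict = aotRequested(props) ? "AOT requested" : "AOT NOT requested";
        return AOT_PROPERTY + "=" + shown + " -> " + verdict;
    }

    public static void main(String[] args) {
        // Run with: java -Dspring.aot.enabled=true AotFlagCheck
        System.out.println(describe(System.getProperties()));
    }
}
```

If that line reports the property as unset inside the container, the aot-jvm numbers are really classic JVM numbers under a different label.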

The Dockerfile.native does the full build inside the builder container:

# Dockerfile.native — the native build happens inside the builder, no local GraalVM required
FROM ghcr.io/graalvm/native-image-community:21 AS builder
WORKDIR /workspace
RUN microdnf install -y maven && microdnf clean all
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
COPY src/ src/
RUN chmod +x ./mvnw && ./mvnw -Pnative -DskipTests native:compile

FROM ubuntu:24.04
# final image with no JRE: just the compiled binary
COPY --from=builder /workspace/target/startup-lab /workspace/startup-lab
ENTRYPOINT ["/workspace/startup-lab"]

That means the startup-lab binary runs without a JRE in the final image. The image is smaller and startup is much faster, but that work now happens at build time. That's the central trade-off of native mode: you don't eliminate work, you move it from runtime to build time.

What the startup number doesn't capture

In this local matrix, native reduced startup time and RSS compared to JVM modes. That's true and reproducible on the editorial-final-startup-matrix tag. But that number alone doesn't tell the full story.

Build time for native is an order of magnitude higher than a classic mvn package. If you're on a CI pipeline with frequent deploys, that cost shows up on every merge to main. It's not a startup cost: it's a development cycle cost.

First-request latency can differ materially from warm latency. On classic JVM, the first request pays the cost of unloaded classes and a cold JIT. On native there's no JIT, so the first request and request number one thousand have a similar profile. That can be an advantage or a disadvantage depending on your actual load profile.
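To see that difference rather than assume it, the first call has to be timed separately from the rest. The sketch below is not the lab's harness: FirstRequestProbe and its methods are hypothetical names, and the Runnable stands in for whatever issues the request (for example an HTTP call to POST /api/work).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

final class FirstRequestProbe {

    record Result(long firstNanos, long medianWarmNanos) {}

    // Times one cold call, then warmCalls further calls, and reports the warm median.
    static Result measure(Runnable request, int warmCalls) {
        if (warmCalls < 1) {
            throw new IllegalArgumentException("warmCalls must be >= 1");
        }
        long firstNanos = time(request);
        long[] warm = new long[warmCalls];
        for (int i = 0; i < warmCalls; i++) {
            warm[i] = time(request);
        }
        Arrays.sort(warm);
        // Middle element; for even counts this is the upper of the two middle values.
        return new Result(firstNanos, warm[warmCalls / 2]);
    }

    private static long time(Runnable request) {
        long start = System.nanoTime();
        request.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with a real request against the service under test.
        Runnable fakeRequest = () -> {
            CRC32 crc = new CRC32();
            crc.update("order-42".getBytes(StandardCharsets.UTF_8));
        };
        Result r = measure(fakeRequest, 1_000);
        System.out.printf("first=%d ns, warm median=%d ns%n",
                r.firstNanos(), r.medianWarmNanos());
    }
}
```

The first sample is kept out of the warm set on purpose: averaging it in would hide exactly the gap this section is about.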

The AppCDS preparation cost is a third dimension that only appears in cds mode: there's an archive dump phase that runs before the container is ready for traffic. Operationally that means an initialization step that doesn't exist in the other modes, and one you need to model in your deploy pipeline if you choose CDS.

Warm latency under sustained load, GC behavior under high memory pressure, and scheduling on Kubernetes are dimensions this lab intentionally doesn't measure. Running three iterations on Docker Desktop over WSL2 on Windows is not production. What the lab does guarantee is local reproducibility: anyone can clone the repo and reproduce the matrix with:

# Windows — full editorial run with 3 runs per mode and native enabled
powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\run-lab.ps1 -Preset editorial
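
With only three runs per mode, a mean is easily skewed by one noisy run, so the median and the range are safer to report. The sketch below is one way to summarize such a sample. It is not the script that produces results/comparison.md, the names are hypothetical, and the sample values in main are placeholders, not lab results.

```java
import java.util.Arrays;

final class RunSummary {

    record Summary(double min, double median, double max) {}

    // Summarizes a small sample without assuming anything about its distribution.
    static Summary summarize(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return new Summary(sorted[0], median, sorted[n - 1]);
    }

    public static void main(String[] args) {
        // Placeholder startup times in milliseconds, NOT measurements from the lab.
        double[] startupMs = {1500.0, 1400.0, 1900.0};
        Summary s = summarize(startupMs);
        System.out.println("min=" + s.min() + " median=" + s.median() + " max=" + s.max());
    }
}
```

Reporting the range next to the median makes it obvious when the spread between runs is larger than the difference between two modes.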

The decision startup time can't make on its own

My position after building this: startup time is useful as a tiebreaker when everything else is even. Using it as the primary metric to choose between classic JVM, AppCDS, AOT-JVM, and native is making an architecture decision on a single axis.

What I can claim with evidence from this matrix:

If the requirement is startup around 1.4 seconds with controlled RSS, native delivers that in this matrix, but you pay with higher build time and give up JIT optimization once the service is warm.
If the team needs fast CI cycles and current startup is tolerable, AOT-JVM with -Dspring.aot.enabled=true improves boot time without changing the deploy artifact.
AppCDS has the lowest operational change cost of the three alternatives to classic JVM, but its archive preparation phase needs to be explicitly modeled in the deploy pipeline.
Classic JVM is still the correct baseline for any comparison. Dropping it without measuring the other three axes is pure vibes.

There's no universal winner. There are trade-offs that depend on how many times per hour the service scales, how heavy the CI pipeline is, and whether the team can take on the additional operational complexity of native.
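A back-of-the-envelope model makes that dependency concrete. The sketch below is a deliberate simplification, not something from the repo: it treats build seconds and startup seconds as the same currency, which they are not (CI time and cold-start latency hurt different people), and every name and number in it is a placeholder you would replace with your own measurements.

```java
final class ModeCostSketch {

    record Mode(String name, double buildSeconds, double startupSeconds) {}

    // Total seconds per day spent building and starting, under the simplified model.
    static double dailySeconds(Mode mode, int buildsPerDay, int startsPerDay) {
        return mode.buildSeconds() * buildsPerDay + mode.startupSeconds() * startsPerDay;
    }

    // Number of starts per build at which the slower build is paid back by faster startup.
    static double breakEvenStartsPerBuild(Mode slowBuild, Mode fastBuild) {
        double extraBuild = slowBuild.buildSeconds() - fastBuild.buildSeconds();
        double savedPerStart = fastBuild.startupSeconds() - slowBuild.startupSeconds();
        if (savedPerStart <= 0) {
            return Double.POSITIVE_INFINITY; // no startup saving, so never paid back
        }
        return Math.max(0.0, extraBuild / savedPerStart);
    }

    public static void main(String[] args) {
        // Placeholder figures for illustration only, NOT results from the lab.
        Mode nativeLike = new Mode("native-like", 300.0, 1.0);
        Mode jvmLike = new Mode("jvm-like", 60.0, 5.0);
        System.out.println("break-even starts per build: "
                + breakEvenStartsPerBuild(nativeLike, jvmLike));
        System.out.println("native-like daily seconds: "
                + dailySeconds(nativeLike, 10, 100));
    }
}
```

A service that scales out dozens of times per deploy and one that restarts once per release land on opposite sides of that break-even, which is why the same startup number can justify opposite decisions.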

The repo is at JuanTorchia/springboot-jvm-2026, tag editorial-final-startup-matrix. Raw results are in results/raw/*.json and the aggregated matrix in results/comparison.md. If you're going to cite it, use the wording from the README: "In the editorial-final-startup-matrix tag of JuanTorchia/springboot-jvm-2026, measured locally on Windows Docker Desktop/WSL2..." — that environment context isn't a decorative disclaimer, it's part of the data.

What's the dimension that drives your decision most between these four modes? Build time, warm latency, or library compatibility on native?


Top comments (1)

buildbasekit • May 23

This is such an important point.

Startup time gets way too much attention because it’s the easiest number to compare, but real production decisions are rarely that one-dimensional.

The build-time vs runtime tradeoff for native is where a lot of teams underestimate the actual cost. Really liked that you used a non-trivial benchmark instead of the usual hello-world comparisons.