Arkadiusz Przychocki

Posted on • Originally published at blog.arkstack.dev

StructuredTaskScope beyond toy examples: dependency-aware kernel bootstrap in modern Java

I did not start this because I wanted to write an article about StructuredTaskScope.

I got there from a more annoying direction: bootstrap had stopped being a startup script.

Once the kernel had a real subsystem graph — config, memory, persistence, graph, events, flow, transport — the old mental model broke down. The question was no longer "how do I start modules?" It became "what is actually allowed to start now, what must already be ready, and what happens if one piece fails halfway through?"

That is a different problem from request fan-out.

This article is a follow-up to my earlier piece on DOP, ScopedValue, and Loom. There, I used StructuredTaskScope as a clean example of native fail-fast execution. Here I want to show the more useful case: what happened once I tried to fit it into a real lifecycle model.

Constraint upfront: this only makes sense when the execution path is still under my control. If you are building a plugin surface or a highly open extension model, parts of this break down quickly.


Bootstrap stopped being linear

A lot of startup code still assumes the system is basically a list:

  1. build some objects
  2. call start()
  3. maybe wait a bit
  4. hope shutdown is the reverse

That works until the dependency graph becomes real.

In Exeris, bootstrap is constrained by subsystem relationships, not by the order I happen to like in a main() method. Some subsystems are foundational. Some are optional. Some can start only after several others are already running. Some failures can degrade. Some cannot.

At that point, startup becomes a graph problem whether you admit it or not.

What I kept from the old model was determinism.
What I dropped was the idea that everything meaningful should happen inside one generic "start all modules" phase.

The shape of the graph matters more than the urge to parallelize it.


Figure 1: Dependency-aware kernel bootstrap graph in Exeris. The point is not that several subsystems exist. The point is that concurrency is legal only where the graph permits it.

I noticed that once the graph was explicit, "just parallelize bootstrap" stopped being a serious answer pretty quickly. The graph already tells you where concurrency is allowed and where it is simply too early.

The bootstrap docs in Exeris describe the same thing from the subsystem side: L0 remains foundational, higher layers can move only after the substrate is ready, and shutdown keeps that structure in reverse.
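Exeris's actual resolver is not shown in this article, but the core move — deriving a start order from declared dependencies instead of from `main()` layout — can be sketched with Kahn's algorithm. The subsystem names and the `startOrder` helper below are illustrative, not Exeris APIs.

```java
import java.util.*;

// Sketch only: derive a deterministic start order from declared dependencies
// using Kahn's algorithm. Subsystem names here are illustrative.
public class TopoOrder {

    // deps maps subsystem -> the subsystems it depends on (must start first)
    static List<String> startOrder(Map<String, Set<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        deps.forEach((name, ds) -> {
            inDegree.merge(name, 0, Integer::sum);
            for (String d : ds) {
                inDegree.merge(name, 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(name);
            }
        });
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, degree) -> { if (degree == 0) ready.add(name); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            for (String dependent : dependents.getOrDefault(name, List.of())) {
                if (inDegree.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        // If something never reached in-degree zero, the graph has a cycle.
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("circular dependency among: " + inDegree.keySet());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("config", Set.of());
        deps.put("memory", Set.of("config"));
        deps.put("persistence", Set.of("config", "memory"));
        deps.put("events", Set.of("config"));
        deps.put("transport", Set.of("events", "persistence"));
        System.out.println(startOrder(deps));
        // → [config, memory, events, persistence, transport]
    }
}
```

The useful property is that the order is a function of the graph, not of the code layout — which is exactly what makes "where is concurrency legal" answerable later.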


The split that actually mattered

The design choice that mattered most was not using StructuredTaskScope.

It was deciding where not to use it.

At first, the obvious temptation was to parallelize more of bootstrap. If the JVM gives you virtual threads and structured concurrency, it is very easy to start looking for places to apply them.

I ended up doing less than that.

I kept initialize() sequential and topological.
I only allowed structured parallelism in start().

That was not ideological. It was practical.

Initialization is where the orchestrator builds structure:

  • provider bindings
  • health registration
  • active subsystem ordering
  • dependency-safe lifecycle state
  • bootstrap telemetry hooks

That phase wants determinism more than it wants speed. I did not want graph construction, provider composition, and lifecycle execution to collapse into one concurrent blur.

Startup is different. Once the graph is already resolved and the active set is known, concurrency becomes useful — but only if it stays inside the same lifecycle boundaries the graph already established.

That led to a much simpler rule:

initialization stays ordered, startup may become parallel.

This was the point where the old model stopped making sense for me. I was no longer trying to make bootstrap faster in the abstract. I was trying to keep lifecycle ownership readable.

for (BootstrapPhase phase : BootstrapPhase.values()) {
    List<Subsystem> forPhase = orderedSubsystems.stream()
            .filter(s -> s.phase() == phase)
            .toList();
    if (forPhase.isEmpty()) {
        continue;
    }

    if (phase == BootstrapPhase.FOUNDATION) {
        startSequential(forPhase, phase, profileName, startedNames);
    } else {
        startParallel(forPhase, phase, profileName, startedNames);
    }
}

In practice, FOUNDATION stays sequential on purpose. That includes the parts of bootstrap that decide whether the rest of the kernel can even be interpreted correctly: configuration roots, base runtime substrate, exception boundaries, and core providers.

I could have parallelized more of that. I did not.

The trade-off is deliberate:

  • I give up some startup parallelism early
  • in exchange for a cleaner substrate
  • and less ambiguity when the higher layers begin to move

This is not universal. If your startup graph is shallow and your layers are genuinely independent, you can be more aggressive. In my case, the real cost of a bad foundation was not a few extra milliseconds. It was a fuzzier lifecycle model and harder-to-classify failures later.


ScopedValue still mattered at the boundary

This article is about StructuredTaskScope, but I ended up reusing the same lesson from the previous piece: context propagation only stays clean if the boundary is explicit.

In Exeris, bootstrap resolves configuration once, then binds it at the kernel boundary before the rest of the lifecycle begins. Everything spawned under that boundary inherits the same immutable context.

try {
    ScopedValue.where(KernelProviders.CURRENT_CONFIG, config)
            .call(() -> {
                runBootInsideScope(orchestrator, config, configRegistry, configWatcher, kernelMain);
                return null;
            });
} catch (SubsystemCircularDependencyException ex) {
    throw ex;
} catch (SubsystemOrchestrator.BootstrapException ex) {
    throw new BootstrapException("Subsystem bootstrap failed: " + ex.getMessage(), ex);
}

That choice mattered more than another layer of constructor wiring would have.

I did not want every subsystem, handler, or virtual thread to receive config through argument threading just because bootstrap needed lifecycle scope. I also did not want to fall back to ThreadLocal and reintroduce the same inheritance and mutability problems I had already rejected elsewhere.

So the boundary stayed strict:

  • config is resolved once
  • bound once
  • inherited downward
  • and torn down when boot exits

That kept the lifecycle model cleaner. It also meant that when I later opened structured startup rounds, they inherited the same immutable runtime context without extra ceremony.
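The binding pattern itself is small. Here is a minimal standalone sketch of it, assuming JDK 25 where ScopedValue is final; the `Config` record and `ConfigBoundary` names are illustrative stand-ins, not Exeris types.

```java
// Minimal sketch of the boundary pattern (JDK 25, where ScopedValue is final).
// The Config record and class names here are illustrative, not Exeris APIs.
public class ConfigBoundary {
    record Config(String profile) {}

    static final ScopedValue<Config> CURRENT_CONFIG = ScopedValue.newInstance();

    static String bootUnderBoundary(Config config) {
        // Bind once at the boundary; everything called under this call
        // (including subtasks forked in a StructuredTaskScope opened inside it)
        // sees the same immutable value.
        return ScopedValue.where(CURRENT_CONFIG, config)
                .call(() -> "booted with profile " + CURRENT_CONFIG.get().profile());
    }

    public static void main(String[] args) {
        System.out.println(bootUnderBoundary(new Config("community")));
        // Once boot exits, the binding is torn down automatically:
        System.out.println(CURRENT_CONFIG.isBound()); // → false
    }
}
```

The teardown-on-exit behavior is the part that ThreadLocal never gave cleanly: there is no state left behind to unset or to leak into unrelated threads.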


The useful part was not STS itself

The useful part was computing a safe round before opening a scope.

I do not want to smooth this into a generic explanation, because it is really the center of the design.

The orchestrator does not just fork all pending subsystems for a phase and wait.

It first computes which subsystems are actually safe to start now.

Set<String> pendingNames = pending.stream()
        .map(Subsystem::name)
        .collect(java.util.stream.Collectors.toCollection(LinkedHashSet::new));

List<Subsystem> ready = pending.stream()
        .filter(subsystem -> dependenciesReadyForRound(subsystem, pendingNames, startedNames))
        .toList();

if (ready.isEmpty()) {
    throw new BootstrapException(
            "Phase " + phase + " cannot make progress: unresolved dependencies among pending subsystems "
            + pendingNames);
}
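The `dependenciesReadyForRound` helper itself is not shown above. A plausible sketch of the rule — a subsystem joins a round only when none of its dependencies are still pending and all of them have already started — might look like this. The `Subsystem` record here is a hypothetical stand-in for the real type.

```java
import java.util.Set;

// Hypothetical sketch of the eligibility check referenced above. The real
// Exeris helper is not shown in the article; this captures the rule that a
// subsystem may join a round only when every dependency has already started.
public class RoundEligibility {
    record Subsystem(String name, Set<String> dependencies) {}

    static boolean dependenciesReadyForRound(Subsystem subsystem,
                                             Set<String> pendingNames,
                                             Set<String> startedNames) {
        for (String dependency : subsystem.dependencies()) {
            // A dependency still pending in this phase blocks the subsystem
            // for this round; it may become eligible in a later round.
            if (pendingNames.contains(dependency)) return false;
            // A dependency that is neither pending nor started means the graph
            // was mis-resolved; treat it as not ready rather than guessing.
            if (!startedNames.contains(dependency)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        var events = new Subsystem("events", Set.of("config"));
        System.out.println(dependenciesReadyForRound(
                events, Set.of("events", "flow"), Set.of("config", "memory")));
        // → true: the only dependency, config, is already started
    }
}
```

Note that a phase may need several rounds: whatever starts in round one moves into `startedNames`, which can unblock subsystems for round two.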

That changed the role of StructuredTaskScope completely.

It was no longer responsible for discovering order.
It was responsible for executing one dependency-safe round inside an order the orchestrator had already made explicit.

That is why I keep saying this is a graph problem first and a concurrency problem second.


Figure 2: Dependency-safe startup round. The orchestrator computes eligibility first, then gives StructuredTaskScope a bounded unit of work to own.

This was the point where the old "just launch it and coordinate later" model stopped making sense. I did not want startup order to become an emergent property of timing, future composition, or whichever task completed first.

I wanted concurrency to appear after dependency eligibility had already been established.


This is where StructuredTaskScope actually earned its place

Once the ready set exists, the role of StructuredTaskScope becomes very narrow and very clean.

It owns one startup round.

That is it.

// Joiner.awaitAll() lets every subtask run to completion so the round can
// aggregate all failures. With the default open(), join() would throw on the
// first failed subtask and the collection below would never execute.
try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAll())) {
    List<StructuredTaskScope.Subtask<Object>> tasks = ready.stream()
            .<StructuredTaskScope.Subtask<Object>>map(
                    subsystem -> scope.fork(() -> {
                        doStart(subsystem, phase, profile);
                        return null;
                    }))
            .toList();

    scope.join();

    List<Throwable> failures = tasks.stream()
            .filter(task -> task.state() == StructuredTaskScope.Subtask.State.FAILED)
            .map(StructuredTaskScope.Subtask::exception)
            .toList();

    if (!failures.isEmpty()) {
        Throwable first = failures.getFirst();
        throw new BootstrapException(
                failures.size() + " subsystem(s) failed in phase " + phase
                + ". First failure: " + first.getMessage(), first);
    }
}

This is the part I actually like.

Not because it is clever. Mostly because it is boring in the right way.

The round has:

  • an owner
  • explicit lifetime
  • explicit completion
  • explicit failure collection

No task belongs to some vague executor that outlives the lifecycle moment that created it. No background startup work escapes into "maybe still running" territory. The concurrency boundary finally matches the lifecycle boundary.

That was the point.

And that is also why I think StructuredTaskScope is more interesting here than in the usual "fetch two things in parallel" examples. Those examples prove the API works. This kind of orchestrator is where it starts to fit the shape of the system.


I could have done this with futures. I did not want to.

There is nothing impossible about building this with:

  • ExecutorService
  • CompletableFuture
  • latches
  • custom worker tracking
  • hand-rolled failure aggregation

If the goal was just "run multiple startup actions in parallel," all of those would work.
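For contrast, here is roughly what one such round looks like with CompletableFuture. This is a sketch under assumed names, not Exeris code — it works, but notice how much of the boundary is hand-rolled: the executor's lifetime, the failure aggregation, and the guarantee that nothing outlives the round.

```java
import java.util.*;
import java.util.concurrent.*;

// Rough CompletableFuture equivalent of one startup round, for contrast.
// Workable, but ownership, failure aggregation, and "the round has ended"
// are all manual conventions rather than structural guarantees.
public class FutureRound {

    static List<Throwable> runRound(List<Runnable> startActions) {
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletableFuture<?>[] futures = startActions.stream()
                    .map(action -> CompletableFuture.runAsync(action, pool)
                            // Swallow into the shared list so allOf() completes normally.
                            .exceptionally(ex -> { failures.add(ex.getCause()); return null; }))
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join(); // hand-rolled "round end"
        } // try-with-resources close() is what actually bounds the round
        return failures;
    }

    public static void main(String[] args) {
        List<Throwable> failures = runRound(List.of(
                () -> {},                                          // starts fine
                () -> { throw new IllegalStateException("boom"); } // fails
        ));
        System.out.println(failures.size() + " failure(s)"); // → 1 failure(s)
    }
}
```

Nothing here is broken. But the round boundary lives in comments and conventions, not in the shape of the code — which is exactly the difference the structured version removes.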

But the real problem was never just parallelism.

What I actually cared about was:

  • who owns this work
  • when exactly this round ends
  • what belongs to this phase and what does not
  • how failure is surfaced
  • how shutdown reasoning stays clean

Bootstrap is one of the worst places to tolerate vague concurrency ownership. When startup fails, I do not want to guess whether some task is still alive in the background or whether a future chain has already detached from the lifecycle moment that spawned it.

That is the real difference here.

The point is not that StructuredTaskScope can run tasks. The point is that it gives this round a proper boundary.

I would still use more conventional concurrency tools when the lifecycle is shallower, the ownership model is already loose, or the surrounding architecture does not benefit from such a strict boundary. This is not a universal replacement story.


The failure policy mattered at least as much as the concurrency primitive

I do not think this model would feel coherent without an explicit failure policy.

Exeris supports both:

  • FAIL_FAST
  • DEGRADE

But not symmetrically.

Foundational subsystems are still mandatory. They do not get to degrade just because higher layers can. That boundary matters.

Inside the orchestrator, that asymmetry is explicit. Optional subsystems may be removed under DEGRADE, but a mandatory failure still aborts boot.

boolean isMandatory =
        (subsystem.phase() == BootstrapPhase.FOUNDATION) || !subsystem.isOptional();

if (failurePolicy == FailurePolicy.DEGRADE && !isMandatory) {
    removeSubsystemAndTransitiveDependents(subsystem.name());
} else {
    healthMonitor.markKernelState(KernelHealthMonitor.KernelState.FAILED);
    throw new BootstrapException(
            "Subsystem '" + subsystem.name() + "' failed: "
            + failure.getMessage(), failure);
}
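`removeSubsystemAndTransitiveDependents` is not shown above. A plausible sketch is a walk over the reverse dependency edges, so that everything that directly or indirectly depends on the failed subsystem is dropped with it — the names and the `toRemove` helper below are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of removeSubsystemAndTransitiveDependents: walk the
// reverse dependency edges so every subsystem that (directly or indirectly)
// depends on the failed one is also dropped from the active set.
public class DegradeRemoval {

    // dependents maps subsystem -> the subsystems that depend on it
    static Set<String> toRemove(String failed, Map<String, Set<String>> dependents) {
        Set<String> removed = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(failed));
        while (!work.isEmpty()) {
            String name = work.poll();
            if (removed.add(name)) {
                // Anything depending on a removed subsystem must go too.
                work.addAll(dependents.getOrDefault(name, Set.of()));
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dependents = Map.of(
                "graph", Set.of("flow"),
                "flow", Set.of("transport"));
        System.out.println(toRemove("graph", dependents));
        // → [graph, flow, transport]
    }
}
```

The transitive part is what keeps DEGRADE honest: removing only the failed subsystem would leave its dependents waiting on something that will never become ready.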

I kept that asymmetry because not all failures mean the same thing. An optional higher-level capability failing to start can be survivable. A foundation-layer failure usually means the system no longer has a sane substrate to run on.

This was another place where I resisted smoothing the model into something more uniform. Uniformity would have looked cleaner on paper, but it would have made the lifecycle semantics less truthful.

So the useful question was never:

can these tasks run in parallel?

It was:

what does it mean for the kernel if this one fails right now?

That question forces the architecture to stay honest.


Startup only makes sense if shutdown keeps the same shape

One thing I did not want to lose was lifecycle symmetry.

A graph-shaped startup model should not collapse into an improvised shutdown path. If startup order is derived from dependency structure, shutdown should preserve that structure in reverse.

In practice, that means I care about reverse topological shutdown just as much as startup rounds.

List<Subsystem> reversed = new ArrayList<>(orderedSubsystems);
Collections.reverse(reversed);

for (Subsystem subsystem : reversed) {
    if (subsystem.isRunning()) {
        subsystem.stop();
    }
}

That sounds obvious, but it matters more once concurrency enters the picture. Structured startup is easier to trust when the rest of the lifecycle still behaves like one coherent model instead of a collection of unrelated hooks.


Figure 3: Lifecycle symmetry in Exeris. Startup derives capability from dependency order; shutdown preserves that order in reverse so the lifecycle remains one coherent model.

I would not make reverse shutdown the headline of the article, but I would not treat it as a footnote either. It is part of the same argument: structured concurrency helps most when the surrounding lifecycle is already structured.


What I measured, and what I did not claim

This article is architectural first, but I do not want to leave it floating at the level of "this feels cleaner."

The evidence I care about here is not generic throughput. It is lifecycle evidence.

For this model, the useful signals are things like:

  • sequential startup vs phase-grouped structured startup
  • total cold boot duration until boot-ready
  • per-phase startup timing
  • round timing in parallel phases
  • repeated cold-start variance
  • degraded boot timing when optional subsystems are removed
  • jdk.VirtualThreadPinned events during startup
  • final active subsystem count recorded at boot-ready

That is why I treat JFR as part of the architecture here, not just as a performance tool. If the startup model is real, it should leave a readable lifecycle trace behind it.
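Custom JFR events are the natural way to make per-phase timing first-class alongside the JDK's own events. A minimal sketch of such an event might look like this — the event name, fields, and `startPhase` helper are illustrative, not the actual Exeris telemetry schema.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Sketch of a custom JFR event for per-phase startup timing. The event name
// and fields are illustrative, not the actual Exeris telemetry schema.
public class BootPhaseTelemetry {

    @Name("exeris.BootPhase")
    @Label("Bootstrap Phase")
    static class BootPhaseEvent extends Event {
        @Label("Phase")
        String phase;
        @Label("Subsystems Started")
        int subsystemsStarted;
    }

    static void startPhase(String phase, Runnable body, int subsystemCount) {
        BootPhaseEvent event = new BootPhaseEvent();
        event.phase = phase;
        event.subsystemsStarted = subsystemCount;
        event.begin();        // duration is measured between begin() and commit()
        try {
            body.run();
        } finally {
            event.commit();   // recorded only if a recording is active and the event enabled
        }
    }

    public static void main(String[] args) {
        // With -XX:StartFlightRecording this event appears in the recording
        // next to jdk.VirtualThreadPinned and the rest of the JDK events.
        startPhase("FOUNDATION", () -> {}, 3);
    }
}
```

Because commit() is nearly free when no recording is active, the instrumentation can stay in the bootstrap path permanently instead of living in a benchmark branch.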

The bootstrap documentation in Exeris already treats startup telemetry as part of the contract: boot-ready, shutdown completion, dependency-cycle detection, and lifecycle state are all first-class signals rather than incidental logs.

I also have some early exploratory startup measurements, although I am deliberately treating them as supporting evidence rather than a headline claim.

| Metric | Exeris community h1 | Quarkus JVM VT tuned |
| --- | --- | --- |
| Startup → health-ready | 1132 ms | 2182 ms |
| Startup → first request | 1205 ms | 2432 ms |
| Health-ready → first request | 73 ms | 250 ms |

These measurements were taken on dev hardware and remain sensitive to local runtime conditions, including machine state and GUI / no-GUI setup. So I am not using them to claim broad startup superiority yet.

What they do show already is narrower, but still useful: this bootstrap model is measurable in operational terms. It is not just architecturally cleaner on paper.

The smaller health-ready → first-request gap is especially interesting to me, because it suggests the lifecycle boundary is not only short on paper but also closer to usable work.

I have also validated the same runtime under a constrained exploratory profile with zero request errors, but that belongs to a different discussion than this article. The point here is narrower: the lifecycle model is observable and survives contact with measurement.

I am also deliberately keeping the claim scope narrow. Early measurements are useful, but they are still sensitive to:

  • classloading state
  • JIT state
  • machine noise
  • native load conditions
  • startup environment shape

So I would rather say:

this model is measurable and operationally inspectable

than jump too quickly to:

this model is definitively faster.

That comes later, if the data actually holds.


What this does not solve

This model does not fix bad subsystem boundaries.

It does not fix circular graphs.
It does not fix startup work that should not exist in the first place.
It does not mean every subsystem should suddenly become parallel.
And it definitely does not generalize to every runtime architecture.

I would still use more conventional patterns when:

  • the graph is shallow
  • the lifecycle is simpler
  • subsystem ownership is fuzzy by design
  • plugin-style openness matters more than deterministic startup shape

This is not a universal recipe. It applies when the execution path is still under my control and the lifecycle itself is part of the architecture.

That boundary matters.


What I kept, what I dropped

I think this is the part that usually gets lost when articles get too polished.

I did not end up with a universal "use STS for bootstrap" rule.

What I kept:

  • topological lifecycle order
  • deterministic initialization
  • explicit phase boundaries
  • explicit failure policy
  • reverse shutdown symmetry

What I dropped:

  • the idea that initialize() and start() should be the same phase
  • the idea that startup parallelism should be maximal
  • the idea that concurrency should appear before dependency safety is known

I also considered using StructuredTaskScope in other places where it looked fashionable on paper. In at least one case, it simply did not buy me anything meaningful, so I left it out.

That contrast was useful. It made the bootstrap use case clearer. StructuredTaskScope was not valuable because it was new. It was valuable because this part of the system already had a natural owner, a natural boundary, and a natural failure model.


Conclusion

A few things became clearer to me while building this:

  1. Bootstrap became a graph problem before it became a concurrency problem.
    That changed which part of the design actually needed structure.

  2. StructuredTaskScope helped only after order was already explicit.
    The useful move was not "fork everything" but "compute a safe round, then run it inside a bounded scope."

  3. The trade-off is intentional.
    I kept initialize() and FOUNDATION sequential on purpose. I gave up some parallelism to keep lifecycle ownership and failure semantics easier to reason about.

What this unlocks next in Exeris is not just cleaner startup code. It gives me a more inspectable lifecycle model for later work around health, telemetry, subsystem isolation, and eventually more demanding cold-start contracts.

If you want to see what this looks like outside a toy example, the bootstrap and lifecycle code is in the Exeris Kernel repository.


Explore the Exeris Kernel — zero-allocation architecture in running code:
🔗 exeris-systems/exeris-kernel
