Arkadiusz Przychocki

Posted on • Originally published at blog.arkstack.dev

StructuredTaskScope beyond toy examples: dependency-aware kernel bootstrap in modern Java

I did not start this because I wanted to write an article about StructuredTaskScope.

I got there from a more annoying direction: bootstrap had stopped being a startup script.

Once the kernel had a real subsystem graph — config, memory, persistence, graph, events, flow, transport — the old mental model broke down. The question was no longer "how do I start modules?" It became "what is actually allowed to start now, what must already be ready, and what happens if one piece fails halfway through?"

That is a different problem from request fan-out.

This article is a follow-up to my earlier piece on DOP, ScopedValue, and Loom. There, I used StructuredTaskScope as a clean example of native fail-fast execution. Here I want to show the more useful case: what happened once I tried to fit it into a real lifecycle model.

Constraint upfront: this only makes sense when the execution path is still under my control. If you are building a plugin surface or a highly open extension model, parts of this break down quickly.


Bootstrap stopped being linear

A lot of startup code still assumes the system is basically a list:

  1. build some objects
  2. call start()
  3. maybe wait a bit
  4. hope shutdown is the reverse

That works until the dependency graph becomes real.

In Exeris, bootstrap is constrained by subsystem relationships, not by the order I happen to like in a main() method. Some subsystems are foundational. Some are optional. Some can start only after several others are already running. Some failures can degrade. Some cannot.

At that point, startup becomes a graph problem whether you admit it or not.

What I kept from the old model was determinism.
What I dropped was the idea that everything meaningful should happen inside one generic "start all modules" phase.

The shape of the graph matters more than the urge to parallelize it.


Figure 1: Dependency-aware kernel bootstrap graph in Exeris. The point is not that several subsystems exist. The point is that concurrency is legal only where the graph permits it.

I noticed that once the graph was explicit, "just parallelize bootstrap" stopped being a serious answer pretty quickly. The graph already tells you where concurrency is allowed and where it is simply too early.

The bootstrap docs in Exeris describe the same thing from the subsystem side: L0 remains foundational, higher layers can move only after the substrate is ready, and shutdown keeps that structure in reverse.
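Exeris's actual resolver is not shown in this article, but the core move — deriving a start order from declared dependencies instead of from `main()` layout — can be sketched with Kahn's algorithm. The subsystem names and the `startOrder` helper below are illustrative, not Exeris APIs.

```java
import java.util.*;

// Sketch only: derive a deterministic start order from declared dependencies
// using Kahn's algorithm. Subsystem names here are illustrative.
public class TopoOrder {

    // deps maps subsystem -> the subsystems it depends on (must start first)
    static List<String> startOrder(Map<String, Set<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        deps.forEach((name, ds) -> {
            inDegree.merge(name, 0, Integer::sum);
            for (String d : ds) {
                inDegree.merge(name, 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(name);
            }
        });
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, degree) -> { if (degree == 0) ready.add(name); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            for (String dependent : dependents.getOrDefault(name, List.of())) {
                if (inDegree.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        // If something never reached in-degree zero, the graph has a cycle.
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("circular dependency among: " + inDegree.keySet());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("config", Set.of());
        deps.put("memory", Set.of("config"));
        deps.put("persistence", Set.of("config", "memory"));
        deps.put("events", Set.of("config"));
        deps.put("transport", Set.of("events", "persistence"));
        System.out.println(startOrder(deps));
        // → [config, memory, events, persistence, transport]
    }
}
```

The useful property is that the order is a function of the graph, not of the code layout — which is exactly what makes "where is concurrency legal" answerable later.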


The split that actually mattered

The design choice that mattered most was not using StructuredTaskScope.

It was deciding where not to use it.

At first, the obvious temptation was to parallelize more of bootstrap. If the JVM gives you virtual threads and structured concurrency, it is very easy to start looking for places to apply them.

I ended up doing less than that.

I kept initialize() sequential and topological.
I only allowed structured parallelism in start().

That was not ideological. It was practical.

Initialization is where the orchestrator builds structure:

  • provider bindings
  • health registration
  • active subsystem ordering
  • dependency-safe lifecycle state
  • bootstrap telemetry hooks

That phase wants determinism more than it wants speed. I did not want graph construction, provider composition, and lifecycle execution to collapse into one concurrent blur.

Startup is different. Once the graph is already resolved and the active set is known, concurrency becomes useful — but only if it stays inside the same lifecycle boundaries the graph already established.

That led to a much simpler rule:

initialization stays ordered, startup may become parallel.

This was the point where the old model stopped making sense for me. I was no longer trying to make bootstrap faster in the abstract. I was trying to keep lifecycle ownership readable.

for (BootstrapPhase phase : BootstrapPhase.values()) {
    List<Subsystem> forPhase = orderedSubsystems.stream()
            .filter(s -> s.phase() == phase)
            .toList();
    if (forPhase.isEmpty()) {
        continue;
    }

    if (phase == BootstrapPhase.FOUNDATION) {
        startSequential(forPhase, phase, profileName, startedNames);
    } else {
        startParallel(forPhase, phase, profileName, startedNames);
    }
}

In practice, FOUNDATION stays sequential on purpose. That includes the parts of bootstrap that decide whether the rest of the kernel can even be interpreted correctly: configuration roots, base runtime substrate, exception boundaries, and core providers.

I could have parallelized more of that. I did not.

The trade-off is deliberate:

  • I give up some startup parallelism early
  • in exchange for a cleaner substrate
  • and less ambiguity when the higher layers begin to move

This is not universal. If your startup graph is shallow and your layers are genuinely independent, you can be more aggressive. In my case, the real cost of a bad foundation was not a few extra milliseconds. It was a fuzzier lifecycle model and harder-to-classify failures later.


ScopedValue still mattered at the boundary

This article is about StructuredTaskScope, but I ended up reusing the same lesson from the previous piece: context propagation only stays clean if the boundary is explicit.

In Exeris, bootstrap resolves configuration once, then binds it at the kernel boundary before the rest of the lifecycle begins. Everything spawned under that boundary inherits the same immutable context.

try {
    ScopedValue.where(KernelProviders.CURRENT_CONFIG, config)
            .call(() -> {
                runBootInsideScope(orchestrator, config, configRegistry, configWatcher, kernelMain);
                return null;
            });
} catch (SubsystemCircularDependencyException ex) {
    throw ex;
} catch (SubsystemOrchestrator.BootstrapException ex) {
    throw new BootstrapException("Subsystem bootstrap failed: " + ex.getMessage(), ex);
}

That choice mattered more than another layer of constructor wiring would have.

I did not want every subsystem, handler, or virtual thread to receive config through argument threading just because bootstrap needed lifecycle scope. I also did not want to fall back to ThreadLocal and reintroduce the same inheritance and mutability problems I had already rejected elsewhere.

So the boundary stayed strict:

  • config is resolved once
  • bound once
  • inherited downward
  • and torn down when boot exits

That kept the lifecycle model cleaner. It also meant that when I later opened structured startup rounds, they inherited the same immutable runtime context without extra ceremony.
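The binding pattern itself is small. Here is a minimal standalone sketch of it, assuming JDK 25 where ScopedValue is final; the `Config` record and `ConfigBoundary` names are illustrative stand-ins, not Exeris types.

```java
// Minimal sketch of the boundary pattern (JDK 25, where ScopedValue is final).
// The Config record and class names here are illustrative, not Exeris APIs.
public class ConfigBoundary {
    record Config(String profile) {}

    static final ScopedValue<Config> CURRENT_CONFIG = ScopedValue.newInstance();

    static String bootUnderBoundary(Config config) {
        // Bind once at the boundary; everything called under this call
        // (including subtasks forked in a StructuredTaskScope opened inside it)
        // sees the same immutable value.
        return ScopedValue.where(CURRENT_CONFIG, config)
                .call(() -> "booted with profile " + CURRENT_CONFIG.get().profile());
    }

    public static void main(String[] args) {
        System.out.println(bootUnderBoundary(new Config("community")));
        // Once boot exits, the binding is torn down automatically:
        System.out.println(CURRENT_CONFIG.isBound()); // → false
    }
}
```

The teardown-on-exit behavior is the part that ThreadLocal never gave cleanly: there is no state left behind to unset or to leak into unrelated threads.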


The useful part was not STS itself

The useful part was computing a safe round before opening a scope.

I do not want to smooth this into a generic explanation, because it is really the center of the design.

The orchestrator does not just fork all pending subsystems for a phase and wait.

It first computes which subsystems are actually safe to start now.

Set<String> pendingNames = pending.stream()
        .map(Subsystem::name)
        .collect(java.util.stream.Collectors.toCollection(LinkedHashSet::new));

List<Subsystem> ready = pending.stream()
        .filter(subsystem -> dependenciesReadyForRound(subsystem, pendingNames, startedNames))
        .toList();

if (ready.isEmpty()) {
    throw new BootstrapException(
            "Phase " + phase + " cannot make progress: unresolved dependencies among pending subsystems "
            + pendingNames);
}
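The `dependenciesReadyForRound` helper itself is not shown above. A plausible sketch of the rule — a subsystem joins a round only when none of its dependencies are still pending and all of them have already started — might look like this. The `Subsystem` record here is a hypothetical stand-in for the real type.

```java
import java.util.Set;

// Hypothetical sketch of the eligibility check referenced above. The real
// Exeris helper is not shown in the article; this captures the rule that a
// subsystem may join a round only when every dependency has already started.
public class RoundEligibility {
    record Subsystem(String name, Set<String> dependencies) {}

    static boolean dependenciesReadyForRound(Subsystem subsystem,
                                             Set<String> pendingNames,
                                             Set<String> startedNames) {
        for (String dependency : subsystem.dependencies()) {
            // A dependency still pending in this phase blocks the subsystem
            // for this round; it may become eligible in a later round.
            if (pendingNames.contains(dependency)) return false;
            // A dependency that is neither pending nor started means the graph
            // was mis-resolved; treat it as not ready rather than guessing.
            if (!startedNames.contains(dependency)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        var events = new Subsystem("events", Set.of("config"));
        System.out.println(dependenciesReadyForRound(
                events, Set.of("events", "flow"), Set.of("config", "memory")));
        // → true: the only dependency, config, is already started
    }
}
```

Note that a phase may need several rounds: whatever starts in round one moves into `startedNames`, which can unblock subsystems for round two.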

That changed the role of StructuredTaskScope completely.

It was no longer responsible for discovering order.
It was responsible for executing one dependency-safe round inside an order the orchestrator had already made explicit.

That is why I keep saying this is a graph problem first and a concurrency problem second.


Figure 2: Dependency-safe startup round. The orchestrator computes eligibility first, then gives StructuredTaskScope a bounded unit of work to own.

This was the point where the old "just launch it and coordinate later" model stopped making sense. I did not want startup order to become an emergent property of timing, future composition, or whichever task completed first.

I wanted concurrency to appear after dependency eligibility had already been established.


This is where StructuredTaskScope actually earned its place

Once the ready set exists, the role of StructuredTaskScope becomes very narrow and very clean.

It owns one startup round.

That is it.

// Joiner.awaitAll() lets every subtask run to completion so the round can
// aggregate all failures. With the default open(), join() would throw on the
// first failed subtask and the collection below would never execute.
try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAll())) {
    List<StructuredTaskScope.Subtask<Object>> tasks = ready.stream()
            .<StructuredTaskScope.Subtask<Object>>map(
                    subsystem -> scope.fork(() -> {
                        doStart(subsystem, phase, profile);
                        return null;
                    }))
            .toList();

    scope.join();

    List<Throwable> failures = tasks.stream()
            .filter(task -> task.state() == StructuredTaskScope.Subtask.State.FAILED)
            .map(StructuredTaskScope.Subtask::exception)
            .toList();

    if (!failures.isEmpty()) {
        Throwable first = failures.getFirst();
        throw new BootstrapException(
                failures.size() + " subsystem(s) failed in phase " + phase
                + ". First failure: " + first.getMessage(), first);
    }
}

This is the part I actually like.

Not because it is clever. Mostly because it is boring in the right way.

The round has:

  • an owner
  • explicit lifetime
  • explicit completion
  • explicit failure collection

No task belongs to some vague executor that outlives the lifecycle moment that created it. No background startup work escapes into "maybe still running" territory. The concurrency boundary finally matches the lifecycle boundary.

That was the point.

And that is also why I think StructuredTaskScope is more interesting here than in the usual "fetch two things in parallel" examples. Those examples prove the API works. This kind of orchestrator is where it starts to fit the shape of the system.


I could have done this with futures. I did not want to.

There is nothing impossible about building this with:

  • ExecutorService
  • CompletableFuture
  • latches
  • custom worker tracking
  • hand-rolled failure aggregation

If the goal was just "run multiple startup actions in parallel," all of those would work.
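For contrast, here is roughly what one such round looks like with CompletableFuture. This is a sketch under assumed names, not Exeris code — it works, but notice how much of the boundary is hand-rolled: the executor's lifetime, the failure aggregation, and the guarantee that nothing outlives the round.

```java
import java.util.*;
import java.util.concurrent.*;

// Rough CompletableFuture equivalent of one startup round, for contrast.
// Workable, but ownership, failure aggregation, and "the round has ended"
// are all manual conventions rather than structural guarantees.
public class FutureRound {

    static List<Throwable> runRound(List<Runnable> startActions) {
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletableFuture<?>[] futures = startActions.stream()
                    .map(action -> CompletableFuture.runAsync(action, pool)
                            // Swallow into the shared list so allOf() completes normally.
                            .exceptionally(ex -> { failures.add(ex.getCause()); return null; }))
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join(); // hand-rolled "round end"
        } // try-with-resources close() is what actually bounds the round
        return failures;
    }

    public static void main(String[] args) {
        List<Throwable> failures = runRound(List.of(
                () -> {},                                          // starts fine
                () -> { throw new IllegalStateException("boom"); } // fails
        ));
        System.out.println(failures.size() + " failure(s)"); // → 1 failure(s)
    }
}
```

Nothing here is broken. But the round boundary lives in comments and conventions, not in the shape of the code — which is exactly the difference the structured version removes.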

But the real problem was never just parallelism.

What I actually cared about was:

  • who owns this work
  • when exactly this round ends
  • what belongs to this phase and what does not
  • how failure is surfaced
  • how shutdown reasoning stays clean

Bootstrap is one of the worst places to tolerate vague concurrency ownership. When startup fails, I do not want to guess whether some task is still alive in the background or whether a future chain has already detached from the lifecycle moment that spawned it.

That is the real difference here.

The point is not that StructuredTaskScope can run tasks. The point is that it gives this round a proper boundary.

I would still use more conventional concurrency tools when the lifecycle is shallower, the ownership model is already loose, or the surrounding architecture does not benefit from such a strict boundary. This is not a universal replacement story.


The failure policy mattered at least as much as the concurrency primitive

I do not think this model would feel coherent without an explicit failure policy.

Exeris supports both:

  • FAIL_FAST
  • DEGRADE

But not symmetrically.

Foundational subsystems are still mandatory. They do not get to degrade just because higher layers can. That boundary matters.

Inside the orchestrator, that asymmetry is explicit. Optional subsystems may be removed under DEGRADE, but a mandatory failure still aborts boot.

boolean isMandatory =
        (subsystem.phase() == BootstrapPhase.FOUNDATION) || !subsystem.isOptional();

if (failurePolicy == FailurePolicy.DEGRADE && !isMandatory) {
    removeSubsystemAndTransitiveDependents(subsystem.name());
} else {
    healthMonitor.markKernelState(KernelHealthMonitor.KernelState.FAILED);
    throw new BootstrapException(
            "Subsystem '" + subsystem.name() + "' failed: "
            + failure.getMessage(), failure);
}
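`removeSubsystemAndTransitiveDependents` is not shown above. A plausible sketch is a walk over the reverse dependency edges, so that everything that directly or indirectly depends on the failed subsystem is dropped with it — the names and the `toRemove` helper below are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of removeSubsystemAndTransitiveDependents: walk the
// reverse dependency edges so every subsystem that (directly or indirectly)
// depends on the failed one is also dropped from the active set.
public class DegradeRemoval {

    // dependents maps subsystem -> the subsystems that depend on it
    static Set<String> toRemove(String failed, Map<String, Set<String>> dependents) {
        Set<String> removed = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(failed));
        while (!work.isEmpty()) {
            String name = work.poll();
            if (removed.add(name)) {
                // Anything depending on a removed subsystem must go too.
                work.addAll(dependents.getOrDefault(name, Set.of()));
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dependents = Map.of(
                "graph", Set.of("flow"),
                "flow", Set.of("transport"));
        System.out.println(toRemove("graph", dependents));
        // → [graph, flow, transport]
    }
}
```

The transitive part is what keeps DEGRADE honest: removing only the failed subsystem would leave its dependents waiting on something that will never become ready.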

I kept that asymmetry because not all failures mean the same thing. An optional higher-level capability failing to start can be survivable. A foundation-layer failure usually means the system no longer has a sane substrate to run on.

This was another place where I resisted smoothing the model into something more uniform. Uniformity would have looked cleaner on paper, but it would have made the lifecycle semantics less truthful.

So the useful question was never:

can these tasks run in parallel?

It was:

what does it mean for the kernel if this one fails right now?

That question forces the architecture to stay honest.


Startup only makes sense if shutdown keeps the same shape

One thing I did not want to lose was lifecycle symmetry.

A graph-shaped startup model should not collapse into an improvised shutdown path. If startup order is derived from dependency structure, shutdown should preserve that structure in reverse.

In practice, that means I care about reverse topological shutdown just as much as startup rounds.

List<Subsystem> reversed = new ArrayList<>(orderedSubsystems);
Collections.reverse(reversed);

for (Subsystem subsystem : reversed) {
    if (subsystem.isRunning()) {
        subsystem.stop();
    }
}

That sounds obvious, but it matters more once concurrency enters the picture. Structured startup is easier to trust when the rest of the lifecycle still behaves like one coherent model instead of a collection of unrelated hooks.


Figure 3: Lifecycle symmetry in Exeris. Startup derives capability from dependency order; shutdown preserves that order in reverse so the lifecycle remains one coherent model.

I would not make reverse shutdown the headline of the article, but I would not treat it as a footnote either. It is part of the same argument: structured concurrency helps most when the surrounding lifecycle is already structured.


What I measured, and what I did not claim

This article is architectural first, but I do not want to leave it floating at the level of "this feels cleaner."

The evidence I care about here is not generic throughput. It is lifecycle evidence.

For this model, the useful signals are things like:

  • sequential startup vs phase-grouped structured startup
  • total cold boot duration until boot-ready
  • per-phase startup timing
  • round timing in parallel phases
  • repeated cold-start variance
  • degraded boot timing when optional subsystems are removed
  • jdk.VirtualThreadPinned events during startup
  • final active subsystem count recorded at boot-ready

That is why I treat JFR as part of the architecture here, not just as a performance tool. If the startup model is real, it should leave a readable lifecycle trace behind it.
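Custom JFR events are the natural way to make per-phase timing first-class alongside the JDK's own events. A minimal sketch of such an event might look like this — the event name, fields, and `startPhase` helper are illustrative, not the actual Exeris telemetry schema.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Sketch of a custom JFR event for per-phase startup timing. The event name
// and fields are illustrative, not the actual Exeris telemetry schema.
public class BootPhaseTelemetry {

    @Name("exeris.BootPhase")
    @Label("Bootstrap Phase")
    static class BootPhaseEvent extends Event {
        @Label("Phase")
        String phase;
        @Label("Subsystems Started")
        int subsystemsStarted;
    }

    static void startPhase(String phase, Runnable body, int subsystemCount) {
        BootPhaseEvent event = new BootPhaseEvent();
        event.phase = phase;
        event.subsystemsStarted = subsystemCount;
        event.begin();        // duration is measured between begin() and commit()
        try {
            body.run();
        } finally {
            event.commit();   // recorded only if a recording is active and the event enabled
        }
    }

    public static void main(String[] args) {
        // With -XX:StartFlightRecording this event appears in the recording
        // next to jdk.VirtualThreadPinned and the rest of the JDK events.
        startPhase("FOUNDATION", () -> {}, 3);
    }
}
```

Because commit() is nearly free when no recording is active, the instrumentation can stay in the bootstrap path permanently instead of living in a benchmark branch.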

The bootstrap documentation in Exeris already treats startup telemetry as part of the contract: boot-ready, shutdown completion, dependency-cycle detection, and lifecycle state are all first-class signals rather than incidental logs.

I also have some early exploratory startup measurements, although I am deliberately treating them as supporting evidence rather than a headline claim.

| Metric | Exeris community h1 | Quarkus JVM VT tuned |
| --- | --- | --- |
| Startup → health-ready | 1132 ms | 2182 ms |
| Startup → first request | 1205 ms | 2432 ms |
| Health-ready → first request | 73 ms | 250 ms |

These measurements were taken on dev hardware and remain sensitive to local runtime conditions, including machine state and GUI / no-GUI setup. So I am not using them to claim broad startup superiority yet.

What they do show already is narrower, but still useful: this bootstrap model is measurable in operational terms. It is not just architecturally cleaner on paper.

The smaller health-ready → first-request gap is especially interesting to me, because it suggests the lifecycle boundary is not only short on paper but also closer to usable work.

I have also validated the same runtime under a constrained exploratory profile with zero request errors, but that belongs to a different discussion than this article. The point here is narrower: the lifecycle model is observable and survives contact with measurement.

I am also deliberately keeping the claim scope narrow. Early measurements are useful, but they are still sensitive to:

  • classloading state
  • JIT state
  • machine noise
  • native load conditions
  • startup environment shape

So I would rather say:

this model is measurable and operationally inspectable

than jump too quickly to:

this model is definitively faster.

That comes later, if the data actually holds.


What this does not solve

This model does not fix bad subsystem boundaries.

It does not fix circular graphs.
It does not fix startup work that should not exist in the first place.
It does not mean every subsystem should suddenly become parallel.
And it definitely does not generalize to every runtime architecture.

I would still use more conventional patterns when:

  • the graph is shallow
  • the lifecycle is simpler
  • subsystem ownership is fuzzy by design
  • plugin-style openness matters more than deterministic startup shape

This is not a universal recipe. It applies when the execution path is still under my control and the lifecycle itself is part of the architecture.

That boundary matters.


What I kept, what I dropped

I think this is the part that usually gets lost when articles get too polished.

I did not end up with a universal "use STS for bootstrap" rule.

What I kept:

  • topological lifecycle order
  • deterministic initialization
  • explicit phase boundaries
  • explicit failure policy
  • reverse shutdown symmetry

What I dropped:

  • the idea that initialize() and start() should be the same phase
  • the idea that startup parallelism should be maximal
  • the idea that concurrency should appear before dependency safety is known

I also considered using StructuredTaskScope in other places where it looked fashionable on paper. In at least one case, it simply did not buy me anything meaningful, so I left it out.

That contrast was useful. It made the bootstrap use case clearer. StructuredTaskScope was not valuable because it was new. It was valuable because this part of the system already had a natural owner, a natural boundary, and a natural failure model.


Conclusion

A few things became clearer to me while building this:

  1. Bootstrap became a graph problem before it became a concurrency problem.
    That changed which part of the design actually needed structure.

  2. StructuredTaskScope helped only after order was already explicit.
    The useful move was not "fork everything" but "compute a safe round, then run it inside a bounded scope."

  3. The trade-off is intentional.
    I kept initialize() and FOUNDATION sequential on purpose. I gave up some parallelism to keep lifecycle ownership and failure semantics easier to reason about.

What this unlocks next in Exeris is not just cleaner startup code. It gives me a more inspectable lifecycle model for later work around health, telemetry, subsystem isolation, and eventually more demanding cold-start contracts.

If you want to see what this looks like outside a toy example, the bootstrap and lifecycle code is in the Exeris Kernel repository.


Explore the Exeris Kernel — zero-allocation architecture in running code:
🔗 exeris-systems/exeris-kernel
