Arkadiusz Przychocki

Posted on Jun 17 • Originally published at blog.arkstack.dev

What I Knew, and What I Built

#java #architecture #spi #jvm

This is the eighth article in the Exeris Kernel series. The first
seven covered decisions I'm confident about:
context propagation,
where runtime polymorphism stops paying for itself,
where StructuredTaskScope earns its keep,
off-heap TLS,
the Flow engine,
compensation correctness,
and the LoanedBuffer pattern.
This one is different. It's about two contracts I had to retrofit
after the initial design closed, and the one thing they have in
common that I didn't see until both had already happened.

I had one architectural vision for Exeris Kernel and two
oversights in executing it. They look unrelated from the outside —
one is in the HTTP client, the other is in the security
subsystem — but they have the same shape. In both cases I built
a single concrete instance of something the architecture needed
as a pluggable seam. And in both cases the gap stayed invisible
until I read the consuming code end to end.

That last part is the whole point of this article, so I'll state
it up front: I did not catch either of these in a design review.
I caught them by being the consumer of my own kernel. The reason
that matters — and the reason this isn't just a confession — is
that most people who design a contract never occupy the position
that reveals its gaps. They ship the contract, someone else
consumes it two layers downstream, and the gap surfaces as a
production incident. I caught mine early, and on my own, for one
reason: I don't just work on the kernel. I also build the tooling
that generates code against it, and the product that runs on top
of that. I sit on both sides of the boundary I designed.

"Early" is literal: I found the codec gap in local testing,
reading both the generator in exeris-tooling and the code it
emits. The tooling that emits this code isn't release-tagged yet —
generated output is regenerated on demand, with nothing published
for a downstream app to pin. So the version of the generator that
hard-wired the concrete codec never had a consumer depending on it;
by the time the rewrite that consumes the SPI landed, the contract
it consumes was already in a released kernel. That's not luck. I was
standing at the consumer edge while the contract still had zero
dependents — the cheapest possible moment to find a gap like this.

There's a third, smaller version of the same mistake nested
inside the fix for the first one. I'll get to it — it's the part
I find most useful, because it shows the blind spot isn't a
one-time lapse you can resolve by being more careful. It recurs,
even while you're actively fixing an instance of it.

The HttpClient I Built and Couldn't Use

A note on version numbers first, because there are two tracks and
conflating them is easy. The numbered versions in this article are
kernel releases — 0.5.0 and 0.8.0 below, 0.8.1 current, all tagged
on GitHub. The tooling that consumes these seams is a separate
track: implemented through 0.4, sitting at 0.5.0-SNAPSHOT,
deliberately not release-tagged yet. So when I say a fix "shipped,"
I mean it's in the released kernel. When I say a generator rewrite
is "sequenced," I mean the tooling track, which hasn't tagged a
release on purpose — more on why at the end.

The break point was concrete. I was building the SDK/tooling
layer — the code generator in exeris-tooling that emits typed
per-entity REST clients (WidgetClient.findById(id),
WidgetClient.create(widget)) against whatever kernel artifact
the application's POM selects. The generator's whole job is to be
tier-neutral: it emits code that compiles against the Community
tier (HTTP/1.1 + HTTP/2) or the Enterprise tier (HTTP/3 + Panama
JSON, when it ships) without knowing or caring which.

It couldn't. The HTTP client facade I'd shipped — then called
CommunityWebClient — hard-wired Jackson 3 as the JSON codec.
Body marshalling lived inside the facade: mapper.writeValueAsBytes
on the way out, mapper.readValue on the way in. The constructor
took an ObjectMapper directly. Two consequences fell out of that,
and I'd internalized neither until the generator forced them into
view.

The first was that the codec was not swappable. If a consumer
wanted Protobuf, CBOR, or a future Panama-native JSON binding,
there was no seam to plug into — the format was welded into the
facade. That was a deliberate decision when I made it. Jackson
was already the codec the rest of the kernel linked; exposing
ObjectMapper on the constructor was the path of least resistance
when I built the facade (ADR-026), and I told myself the kernel had
no concrete need for a second codec yet. Locally rational. The problem is that "no
concrete need yet" was a statement about the kernel in isolation,
and the consumer that needed the seam — the tier-neutral generator —
didn't exist yet when I made the call.

The second consequence was worse, because it wasn't even a
decision I remembered making. The facade was named
CommunityWebClient, and the generator emitted that name into
every client file it produced:

public final class WidgetClient {
    private final CommunityWebClient client;          // tier identity, in user code
    public WidgetClient(CommunityWebClient client) { ... }
    public Optional<Widget> findById(UUID id) {
        try { return Optional.ofNullable(client.get(...)); }
        catch (CommunityWebClient.WebClientException ex) { ... }  // tier identity again
    }
}

The tier name leaked into every application class the tooling
generated. This is the build-time analogue of a rule I'd already
written down and enforced everywhere else in the kernel — The
Wall (ADR-006): the kernel boundary visible to applications must
be implementation-blind. I had enforced it rigorously at runtime.
I had not noticed it applied at build time too, to the symbols
that show up in generated source.

The pattern I'd already built — once

Here's the part that made this sting. The server side already had
the SPI I was missing on the client side. HttpResponseBodyEncoder
had been a tier-neutral SPI seam since 0.5.0 (the ADR-009 era):
an encoder contract, an encoding-context record, a registry with
an empty() factory, a reusable off-heap HttpEncodedBody
carrier, and a Community-side Jackson driver implementing it. I
had designed exactly the right shape (encoder + context +
registry, with the concrete codec as a swappable driver) and
then built it in precisely one place without noticing that the same
shape was needed in three others.

The body-codec design space is a 2×2 matrix: {request, response} × {encode, decode}. One cell had a proper SPI from the start —
server-side response-encode, shipped at 0.5.0 (ADR-009). The other
three were hard-wired to Jackson, welded in at different times as
each surface got built: client-side request-encode and
response-decode went in when I later built the client facade
(CommunityWebClient, ADR-026), and server-side request-decode
lived in the generated request handlers. I wasn't missing the
concept of a codec SPI. I'd authored it — before the facade that
needed it even existed — and then didn't reuse it when I built that
facade, because at the time the one cell I'd filled was the only
one with a consumer staring at it.

            encode                          decode
request     client-side, ADR-034 (new)      server-side, ADR-036 (overlooked)
response    server-side, ADR-009 (0.5.0)    client-side, ADR-034 (new)

The fix (ADR-034)

ADR-034 closed the client-side half. Six new SPI types in
eu.exeris.kernel.spi.http, deliberately mirroring the existing
server-side encoder triplet for grep symmetry:
HttpRequestBodyEncoder, HttpRequestEncodingContext,
HttpRequestBodyEncoderRegistry on the encode side;
HttpResponseBodyDecoder, HttpResponseDecodingContext,
HttpResponseBodyDecoderRegistry on the decode side. The Jackson
binding became two Community drivers (CommunityJsonRequestBodyEncoder,
CommunityJsonResponseBodyDecoder) behind those contracts. The
HttpEncodedBody carrier from 0.5.0 was reused rather than
duplicated.
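
The encoder + context + registry shape is easiest to see in code. Below is a minimal sketch of the decode side, assuming drivers declare the content types they accept and the registry resolves them in priority order (of(List<…>), empty()). The names ResponseBodyDecoder, DecodingContext, DecoderRegistry and PlainTextDecoder, and every method signature, are hypothetical stand-ins rather than the kernel's API. The sketch also passes a plain byte[] where the kernel uses an off-heap carrier, and a toy text/plain driver takes the place of the Jackson one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical context record: what a decoder is told about the body it's decoding.
record DecodingContext(String contentType) {}

// Hypothetical decoder contract: a driver subscribes to content types, declares a
// priority, and turns wire bytes into a typed value.
interface ResponseBodyDecoder {
    boolean supports(String contentType);

    int priority(); // higher wins when several drivers accept the same content type

    <T> T decode(byte[] body, DecodingContext ctx, Class<T> type);
}

// Hypothetical registry: immutable, priority-ordered, built with empty() or of(List).
final class DecoderRegistry {
    private final List<ResponseBodyDecoder> decoders;

    private DecoderRegistry(List<ResponseBodyDecoder> decoders) {
        this.decoders = decoders.stream()
                .sorted(Comparator.comparingInt(ResponseBodyDecoder::priority).reversed())
                .toList();
    }

    static DecoderRegistry empty() {
        return new DecoderRegistry(List.of());
    }

    static DecoderRegistry of(List<ResponseBodyDecoder> decoders) {
        return new DecoderRegistry(decoders);
    }

    // First (highest-priority) driver that accepts the content type, if any.
    Optional<ResponseBodyDecoder> resolve(String contentType) {
        return decoders.stream().filter(d -> d.supports(contentType)).findFirst();
    }
}

// Toy driver standing in for a real codec such as a Jackson-backed JSON driver.
// It only decodes text/plain into String, which keeps the sketch dependency-free.
final class PlainTextDecoder implements ResponseBodyDecoder {
    @Override
    public boolean supports(String contentType) {
        return contentType.startsWith("text/plain");
    }

    @Override
    public int priority() {
        return 0;
    }

    @Override
    public <T> T decode(byte[] body, DecodingContext ctx, Class<T> type) {
        if (type != String.class) {
            throw new IllegalArgumentException("text/plain decodes to String only");
        }
        return type.cast(new String(body, StandardCharsets.UTF_8));
    }
}
```

Swapping codecs means registering a different driver; callers of resolve() don't change. That per-call lookup is also exactly the kind of indirection the original facade avoided by calling the mapper directly.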

The cost is real and worth naming: every body now crosses a
registry resolution and a driver indirection that in the original
facade was a direct mapper.writeValueAsBytes call. For the single-codec case
that's overhead I'm paying to keep the seam open — a deliberate
trade, not a free win.

The facade itself moved out of the Community tier into Core and
lost the tier name: CommunityWebClient became KernelWebClient
in eu.exeris.kernel.core.http.client. Its constructor lost the
ObjectMapper parameter — Jackson descended to the codec driver,
and Core stopped having any opinion about JSON at all. The
generator now emits KernelWebClient into application code; no
tier identity surfaces. ADR-034 superseded ADR-026, which had
been the facade's previous home and had treated the tier-name
leak as a string-substitution problem to fix in lockstep rather
than the structural symptom it actually was.
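
A sketch of what losing the ObjectMapper parameter means for the facade's shape, simplified to a single decoder seam and GET only (the kernel splits encode and decode into separate registries). WebClientFacade, BodyDecoder, Utf8TextDecoder and the Function-based transport are hypothetical; WidgetClient just mirrors the generated client shown earlier. None of this is KernelWebClient's real signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical codec seam. Core only knows "bytes in, typed value out"; which wire
// format that is lives in whatever driver the application wires in.
interface BodyDecoder {
    <T> T decode(byte[] body, Class<T> type);
}

// Hypothetical tier-neutral facade. Its constructor takes a codec seam and a transport,
// not an ObjectMapper, so no JSON library appears anywhere in its signature.
final class WebClientFacade {
    private final BodyDecoder decoder;
    // Stand-in for real HTTP: maps a path to a response body, or empty for "not found".
    private final Function<String, Optional<byte[]>> transport;

    WebClientFacade(BodyDecoder decoder, Function<String, Optional<byte[]>> transport) {
        this.decoder = decoder;
        this.transport = transport;
    }

    <T> Optional<T> get(String path, Class<T> type) {
        return transport.apply(path).map(body -> decoder.decode(body, type));
    }
}

// Roughly what a generated per-entity client could look like: it names only the facade,
// so neither the tier nor the codec shows up in application source.
final class WidgetClient {
    private final WebClientFacade client;

    WidgetClient(WebClientFacade client) {
        this.client = client;
    }

    Optional<String> findById(String id) {
        return client.get("/widgets/" + id, String.class); // body decoded as text for the sketch
    }
}

// Toy driver: UTF-8 text. A JSON, CBOR or Protobuf driver would plug in the same way.
final class Utf8TextDecoder implements BodyDecoder {
    @Override
    public <T> T decode(byte[] body, Class<T> type) {
        return type.cast(new String(body, StandardCharsets.UTF_8));
    }
}
```

The generated class imports nothing tier- or codec-specific, which is the build-time Wall property in miniature.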

I want to be precise about what was a decision and what was an
oversight here, because they're not the same and conflating them
would be too kind to myself. Hard-wiring Jackson into the client
facade was a
decision whose limits I underestimated — defensible when made,
wrong once the consumer arrived. The tier-name leak was not a
decision at all. It was something I'd have caught instantly if I'd
read the generator's output as application code instead of as my
own code. The difference between those two is the difference
between miscalibrated judgment and not looking — and only one of
them is fixable by thinking harder at design time.

The oversight inside the fix (ADR-036)

This is the part I find most useful, and it's why I'm writing the
article at all.

ADR-034 brought the matrix to three of four quadrants:
server-response-encode (already there since 0.5.0),
client-request-encode (new), client-response-decode (new). It left
the fourth uncovered: server-side request-body decode, the wire body
that becomes a typed handler argument like Widget for POST /widgets.
That quadrant had no SPI seam at all. It was hard-wired Jackson
inside generated code: the tooling's KernelHandlerGenerator
emitted a static Jackson MAPPER field and a JacksonException
import into every controller it produced. The exact same
build-time Wall breach I'd just fixed on the client side, sitting
untouched on the server side, in a different generator.

It surfaced the exact same way the original problem did — by
reading the consuming code end to end. The first set of gaps came
out when I re-read KernelClientGenerator. This one came out when
I re-read KernelHandlerGenerator. Same act, different file, same
class of finding.

I want to separate this cleanly from a decision I did make
deliberately in ADR-034, because the honest version of this story
depends on the distinction. ADR-034 explicitly deferred one thing:
unifying the server-side encode path (refactoring the working
0.5.0 JsonBodyEncoder so it subscribes to a content type the way
the new client decoder does). That was a choice, recorded as a
v1.0 cleanup, because it's a refactor of a seam that already
works. The server-side request-decode quadrant was a different
thing entirely: not a seam I chose to leave alone, but a seam that
never existed, hidden inside generated code I wasn't looking at.
One was deferred. The other was overlooked. ADR-036 closed the
overlooked one, mirroring the response-decoder triplet verbatim
into HttpRequestBodyDecoder / HttpRequestDecodingContext /
HttpRequestBodyDecoderRegistry. Both ADRs shipped in kernel
0.8.0 — the four-quadrant codec SPI is in the released kernel
today. The tooling generator that consumes the new server-decode
seam has since been rewritten to resolve through the registry
instead of baking in a concrete codec symbol; I come back at the
end to where that leaves the tooling release.
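
To picture the difference in generated code, here is a sketch of a POST /widgets handler that decodes its body through a registry instead of a static Jackson MAPPER. Every name here (RequestBodyDecoder, RequestDecoderRegistry, CreateWidgetHandler, Response, Widget) is hypothetical, a toy text/plain driver stands in for a real codec, and answering 415 when no driver matches is an assumption of the sketch, not something the article specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

record Widget(String name) {}

record Response(int status, String body) {}

// Hypothetical server-side request-decode seam.
interface RequestBodyDecoder {
    boolean supports(String contentType);

    <T> T decode(byte[] body, Class<T> type);
}

// Hypothetical registry; drivers are assumed to be passed in priority order.
final class RequestDecoderRegistry {
    private final List<RequestBodyDecoder> decoders;

    RequestDecoderRegistry(List<RequestBodyDecoder> decoders) {
        this.decoders = List.copyOf(decoders);
    }

    Optional<RequestBodyDecoder> resolve(String contentType) {
        return decoders.stream().filter(d -> d.supports(contentType)).findFirst();
    }
}

// Roughly what a generated POST /widgets handler could look like after the rewrite:
// no static MAPPER field and no codec-specific exception import, just a registry lookup.
final class CreateWidgetHandler {
    private final RequestDecoderRegistry decoders;

    CreateWidgetHandler(RequestDecoderRegistry decoders) {
        this.decoders = decoders;
    }

    Response handle(String contentType, byte[] body) {
        Optional<RequestBodyDecoder> decoder = decoders.resolve(contentType);
        if (decoder.isEmpty()) {
            return new Response(415, "unsupported content type"); // assumed policy
        }
        Widget widget = decoder.get().decode(body, Widget.class);
        return new Response(201, "created " + widget.name());
    }
}

// Toy driver: treats a text/plain body as the widget's name.
final class PlainTextWidgetDecoder implements RequestBodyDecoder {
    @Override
    public boolean supports(String contentType) {
        return contentType.startsWith("text/plain");
    }

    @Override
    public <T> T decode(byte[] body, Class<T> type) {
        if (type != Widget.class) {
            throw new IllegalArgumentException("toy decoder only builds Widget");
        }
        return type.cast(new Widget(new String(body, StandardCharsets.UTF_8)));
    }
}
```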

This is the part that matters: I was actively working on exactly
this problem — codec seams, build-time Wall hygiene, the matrix —
and still left one cell hard-wired, because the cell lived in a
consumer I hadn't re-read yet. Not carelessness. I was looking at
three of the four places the pattern lived; the fourth was
somewhere I hadn't pointed my attention. The closing section is
about why that's structural rather than a discipline failure.

The Validator I Built and the Seam I Didn't

The second oversight is in the security subsystem, and it's the
one that taught me the most — because the first time I described
it to myself, I got it wrong. My initial framing was "I forgot to
build identity provider support." That's not what happened, and
the real version is more useful.

I did not forget about identity. The kernel has had token
validation since 0.5.0. SecurityProvider.authenticate(LoanedBuffer rawToken) returns an AuthenticationResult carrying a
PrincipalContext (principal id, tenant id, roles, scopes) and a
StorageContext (the multi-tenant routing decision). That seam is
consumed at the HTTP edge today. Behind it sits
CommunityJwksValidator — a Nimbus JOSE pipeline that does the
real work: kid → key → signature → issuer → audience → expiry,
fail-closed at every step per ADR-012. Identity validation isn't
missing. It works, and it's been working.
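
For readers who want the fail-closed ordering spelled out, here is a JDK-only sketch of the kid → key → signature → issuer → audience → expiry chain. It is not CommunityJwksValidator: that validator uses Nimbus JOSE, while this one assumes the token has already been parsed into a hypothetical ParsedToken record and verifies RS256 with java.security.Signature. ParsedToken, Verdict and FailClosedValidator are illustrative names only.

```java
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.time.Instant;
import java.util.Map;
import java.util.Set;

// Hypothetical: a token already split into its parts. A real validator parses the
// compact JWT itself; parsing is out of scope here.
record ParsedToken(String kid, byte[] signingInput, byte[] signature,
                   String issuer, Set<String> audience, Instant expiresAt) {}

// Outcome of validation: allowed, or denied at a named step.
record Verdict(boolean allowed, String failedStep) {
    static Verdict allow() { return new Verdict(true, null); }
    static Verdict deny(String step) { return new Verdict(false, step); }
}

final class FailClosedValidator {
    private final Map<String, PublicKey> keysByKid; // static set, injected once
    private final String expectedIssuer;
    private final String expectedAudience;

    FailClosedValidator(Map<String, PublicKey> keysByKid, String expectedIssuer, String expectedAudience) {
        this.keysByKid = Map.copyOf(keysByKid);
        this.expectedIssuer = expectedIssuer;
        this.expectedAudience = expectedAudience;
    }

    // Steps run in a fixed order and the first failure denies.
    // Only the final return statement can produce an allow.
    Verdict validate(ParsedToken t, Instant now) {
        try {
            if (t == null || t.kid() == null) return Verdict.deny("kid");
            PublicKey key = keysByKid.get(t.kid());
            if (key == null) return Verdict.deny("key");
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(t.signingInput());
            if (!verifier.verify(t.signature())) return Verdict.deny("signature");
            if (!expectedIssuer.equals(t.issuer())) return Verdict.deny("issuer");
            if (t.audience() == null || !t.audience().contains(expectedAudience)) return Verdict.deny("audience");
            if (t.expiresAt() == null || !now.isBefore(t.expiresAt())) return Verdict.deny("expiry");
            return Verdict.allow();
        } catch (GeneralSecurityException | RuntimeException e) {
            return Verdict.deny("error"); // anything unexpected also denies: fail closed
        }
    }
}
```

The constructor-injected Map mirrors the static key set described next: nothing in this shape can rotate keys or host a second provider.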

What I built was a single, static, RSA-only validator fused into
one SecurityProvider implementation. The key set is an immutable
Map<kid, RSAPublicKey> injected at construction — no rotation,
no JWKS fetch, no EC, and crucially, no way to host a second
identity provider without subclassing or duplicating the
cross-cutting logic that sits around validation. The validation
pipeline is already about eighty percent of an OIDC resource-server
validator. I just welded that eighty percent into one concrete
class instead of exposing it as a contract.

I didn't notice this was the same mistake as the HttpClient codec
until I'd already written the validator down the same way. Different
subsystem, identical shape: there I built one codec quadrant and
left the rest concrete; here I built one validator and left no seam
for a second. In both cases the thing I
shipped works perfectly for the single case it handles. In both
cases the architecture needed a pluggable contract, and I gave it
a concrete instance.

Where it surfaced

The HttpClient gap surfaced when I read the tooling generators.
This one surfaced when I tried to put a real product on top of the
kernel.

A B2B SaaS deployment doesn't authenticate against one identity
provider. It fronts employees through one (say Okta), B2C users
through another (Auth0), and service-to-service traffic through a
third (something internal). Dispatch happens by inspecting the
token's issuer before validation. My single-SecurityProvider
architecture had no room for that. The only way to bolt on a
second provider was an application-level composite that would have
to re-implement the fail-closed, deterministic-deny invariants
that ADR-012 deliberately keeps inside the kernel. In other words:
to use the kernel for the thing I built the kernel for, an
application would have to reach around the kernel and re-implement
its most safety-critical contract. That's the gap. It's not
missing validation. It's a missing seam, and the seam is exactly
the part a real consumer needs first.

The consumer that revealed it was BudgetHQ — the product I'm
building on the kernel. BudgetHQ is B2C and B2B at once: individual
users on one side, business workspaces on the other. The B2C path
authenticates fine against a single provider, which is exactly why
the gap stayed hidden — the single-provider validator handles the
common case without complaint. It surfaced the moment I started
planning multi-provider authentication for business workspaces: a
workspace that federates its own employees through its own identity
provider while B2C users keep authenticating through the default
one. For that multi-provider case there was simply no kernel-supported
path from Authorization: Bearer … to a populated PrincipalContext.
The roadmap names it without hedging:
the single largest "ship a B2B SaaS" blocker for the ecosystem. I
only know that because I'm also the one trying to ship the business
workspace.

The decision (RFC, not yet code)

I'll be honest about the status of this one, because it differs
from the HttpClient case and the difference matters. The HttpClient
oversight is fixed and released — ADR-034 and ADR-036 shipped in
kernel 0.8.0, and the seam is in the kernel running today. The
tooling generator that consumes it has since been rewritten to
match; the only piece still in flight is the tooling release that
carries that rewrite, on a track that isn't tagged yet. The IDP
oversight is at a different stage entirely: diagnosed and decided,
but not yet built. The decision went through an RFC rather than
straight to an ADR, because unlike the codec fix it carried
genuinely open strategic choices — and I want to show that
distinction rather than paper over it.

The decision: introduce a dedicated IdentityProvider SPI that
SecurityProvider delegates to. SecurityProvider stays the entry
point but becomes a thin dispatcher — it selects one
IdentityProvider by a cheap issuer/format peek, delegates
validation, then applies the ADR-012 cross-cutting concerns
(isolation-claim → StorageContext, fail-closed semantics)
uniformly to whatever provider was dispatched. Validation becomes
per-provider; orchestration stays central. The first reference
driver is OIDC + JWKS, because the existing CommunityJwksValidator
is already most of the way there — extraction is mostly refactor,
which is the lowest-risk way to ship the first driver. The registry
reads the same way as the codec registries from ADR-034
(of(List<…>), priority-ordered), and that's deliberate: an identity
seam without pluggable variants should be the odd one out, not the
norm.

That dispatch isn't free either: every request now pays a
pre-validation peek to select a provider before any crypto runs.
Cheap next to signature verification, but it's a second step on
the auth edge the single-validator design didn't have.
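
How cheap can the peek be? A sketch, assuming compact JWS tokens: base64url-decode the payload segment and pull out the iss claim without verifying anything. IssuerPeek and unverifiedIssuer are hypothetical names, and the regex is deliberately naive (a real implementation would use a proper JSON parser and would not match escaped values). The result is untrusted; it may only choose which provider validates, never authorize anything.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IssuerPeek {
    // Naive match for "iss":"<value>" where the value contains no quotes or backslashes.
    private static final Pattern ISS = Pattern.compile("\"iss\"\\s*:\\s*\"([^\"\\\\]*)\"");

    private IssuerPeek() {}

    // Returns the UNVERIFIED iss claim of a compact JWS, or empty if it can't be read cheaply.
    static Optional<String> unverifiedIssuer(String compactJwt) {
        if (compactJwt == null) return Optional.empty();
        String[] parts = compactJwt.split("\\.", -1);
        if (parts.length != 3) return Optional.empty(); // compact JWS has exactly three segments
        try {
            // The URL decoder accepts unpadded input, which is how JWT segments are encoded.
            String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
            Matcher m = ISS.matcher(payload);
            return m.find() ? Optional.of(m.group(1)) : Optional.empty();
        } catch (IllegalArgumentException badBase64) {
            return Optional.empty(); // unreadable: no provider gets selected, so the request is denied
        }
    }
}
```

Because the value is unverified, a forged iss can at worst steer the token to a provider whose own signature and issuer checks then reject it.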

There's a sharp failure mode hiding in that design — the kind of
thing that turns a security seam into a security hole. If the dispatcher retried the next provider
after a selected provider's validation failed, a token matching
issuer A but failing A's signature could be "rescued" by provider
B — a federation fail-open. The contract closes it normatively:
selection precedes validation and is separate from it; once a
provider is selected, its failure is terminal, with no fall-through.
That invariant is a mandatory TCK assertion, not a code comment.
The seam I didn't build the first time turns out to have a
correctness property I wouldn't have specified at all if I hadn't
been forced to design the pluggable version.
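
The selection-then-validation rule fits in a few lines. A sketch, assuming each provider exposes a cheap selection test and a full validation step; IdentityProviderSketch, Principal and DispatchingAuthenticator are hypothetical names rather than the RFC's API, and the ADR-012 cross-cutting step (isolation claim to StorageContext) is left out. The point is what the code does not contain: a retry loop.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical parsed identity; the kernel's PrincipalContext carries more (roles, scopes).
record Principal(String id, String tenantId) {}

// Hypothetical per-provider contract: a cheap selection test plus full validation.
interface IdentityProviderSketch {
    boolean selects(String issuerHint);

    Optional<Principal> validate(String rawToken); // empty means rejected
}

final class DispatchingAuthenticator {
    private final List<IdentityProviderSketch> providers; // assumed priority-ordered

    DispatchingAuthenticator(List<IdentityProviderSketch> providers) {
        this.providers = List.copyOf(providers);
    }

    Optional<Principal> authenticate(String issuerHint, String rawToken) {
        // 1. Selection: exactly one provider, chosen from the unverified hint. No crypto yet.
        Optional<IdentityProviderSketch> selected = providers.stream()
                .filter(p -> p.selects(issuerHint))
                .findFirst();
        if (selected.isEmpty()) {
            return Optional.empty(); // nobody claims the token: deny
        }
        // 2. Validation: the selected provider's answer is terminal. There is deliberately
        //    no loop that tries the next provider after a failure; that would be fail-open.
        try {
            Optional<Principal> result = selected.get().validate(rawToken);
            return result == null ? Optional.empty() : result;
        } catch (RuntimeException e) {
            return Optional.empty(); // unexpected errors deny too
        }
    }
}
```

A TCK-style check for the invariant only has to show that a permissive provider later in the list is never consulted when an earlier, selected provider rejects the token.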

That's what catching the gap at the seam level instead of in
production buys you: the production version of this lesson is
a fail-open incident, discovered by someone who isn't you, in a
system you can no longer cheaply change. The design-time version is
a TCK assertion in an RFC. Same lesson, vastly different cost.

Identity propagation outbound — how a parsed identity travels to
the next service in a call chain — is a related question with its
own seam (ADR-032's HttpClientRequestEnricher, which propagates
parsed X-Tenant-Id / X-Principal-Id headers rather than
forwarding raw bearer tokens). That part already exists and is
deliberately kept narrow: the kernel holds the parsed identity, not
the raw credential, so it can't leak a token it never retained.
The inbound seam — the IdentityProvider SPI — is the one the RFC
decides and v0.10 implements.
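
A sketch of the narrow outbound shape, using java.net.http.HttpRequest.Builder as the outgoing request. RequestEnricher, IdentityHeaderEnricher and ParsedIdentity are hypothetical, and ADR-032's HttpClientRequestEnricher may be shaped differently. What it illustrates is the property the paragraph relies on: the enricher only ever receives parsed identity, so there is no raw credential in scope to forward.

```java
import java.net.http.HttpRequest;

// Hypothetical parsed identity: deliberately has no field for the raw bearer token.
record ParsedIdentity(String tenantId, String principalId) {}

// Hypothetical enricher contract.
interface RequestEnricher {
    HttpRequest.Builder enrich(HttpRequest.Builder request, ParsedIdentity identity);
}

// Propagates parsed identity as headers. It never sees the inbound credential,
// so it has nothing to forward even by mistake.
final class IdentityHeaderEnricher implements RequestEnricher {
    @Override
    public HttpRequest.Builder enrich(HttpRequest.Builder request, ParsedIdentity identity) {
        return request
                .header("X-Tenant-Id", identity.tenantId())
                .header("X-Principal-Id", identity.principalId());
    }
}
```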

Two Oversights, One Blind Spot

From the outside the two look like different categories of mistake.
One is a codec hard-wired into a facade; the other is a validator
welded into a single provider. Different subsystems, different
consequences, different fixes. But mechanically they're the same
error, and they failed the same way.

The error is the same: I built a concrete instance where the
architecture needed a pluggable seam (Jackson welded into the
client, JWT validation welded into one SecurityProvider). Both worked, for
exactly the one case that had a consumer staring at it when I wrote
the code; the shape I'd have needed for the second case was already
authored elsewhere (the server-side encoder SPI) or eighty percent
present (the JWKS validator), so this was never ignorance of the
right design. The failure mode is the same too: both gaps were
invisible from inside the kernel and obvious from the consumer's
side. The defect was never in the code I was reading — it was in
the relationship between that code and a consumer a layer or two
out, and you cannot see a relationship by staring harder at one end
of it.

The thing that actually caught both gaps was occupying the
consumer's position myself. I found the codec gap because I write
the tooling that generates code against the kernel. I found the
validator gap because I'm building a product that runs on the
kernel. In both cases I crossed from the author's side of the
boundary to the consumer's side, read the contract from there, and
the gap was immediately obvious. Not because I got smarter on the
walk across — because the gap is only visible from that side.

That generalizes past me, and not in a flattering direction. On
most teams the person who designs a contract is structurally
prevented from occupying its consumer position. They ship the
contract; a different team consumes it two layers downstream; the
gap surfaces months later as a production incident that nobody can
cheaply fix because the contract has hardened and acquired
dependents. The firefight is not a sign anyone was careless. It's
the default outcome of a division of labour where the designer
never stands where the gap is visible. I caught mine at the design
seam instead of in production for one structural reason, and it has
nothing to do with discipline: I happen to sit on both sides of the
boundary I designed.

I don't think the takeaway is "always be your own consumer" —
that's often not possible, and stated that broadly it's the kind of
advice that's true and useless. The narrow, actionable version is
this: a contract's gaps live at its consumer edge, so someone has
to read the contract from the consumer's side before it hardens.
If that can't be the designer, it has to be a real consumer brought
in early — not a reviewer evaluating the contract on its own terms,
but someone forced to actually build against it. The cheapest
moment to find these gaps is before the contract has dependents.
Once it has them, the same fix costs a coordinated SPI change plus
a migration of every consumer that hardened against the old shape —
which is the state most teams discover the gap in.

The recursion in the HttpClient section is the evidence that this
isn't a willpower problem. I left the fourth codec quadrant hard-wired
while actively fixing two of the other three. I was as alert to this
exact failure mode as I will ever be, and it still got past me,
because the fourth quadrant lived in a consumer I hadn't re-read
yet. You don't beat that by trying harder. You beat it by
arranging to read every consumer end to end before the contract
ships — or by accepting that the ones you don't read are where the
next gap is.

What This Doesn't Solve

The heuristic — read the consumer's code before the contract
hardens — has a boundary I want to be honest about.

It assumes a consumer exists to read. When you're designing a
contract genuinely ahead of any consumer, there's nothing to read
from the other side yet, and "be your own consumer" degenerates
into guessing. The method also isn't free: pulling a real consumer
in early, before the contract stabilizes, slows the design down and
couples two moving targets. I absorbed that cost cheaply because the
consumer and the contract are both mine. A team paying it
deliberately is making a real trade, not collecting a free win.

And it doesn't close the identity story. The IdentityProvider
seam is inbound only — token to PrincipalContext. Outbound
service-to-service identity past parsed-identity headers (token
exchange, on-behalf-of, client-credentials) is reserved in the RFC
as a future OutboundCredentialProvider seam, not built. A
zero-trust mesh that needs the kernel to mint a downstream
credential is exactly the case this design names and then defers.

What's Next

Concretely, on the kernel:

The HttpClient codec SPI is shipped and released. ADR-034
(client-side request encode + response decode, KernelWebClient
facade) and ADR-036 (server-side request decode, completing the
2×2 matrix) both landed in kernel 0.8.0; the four-quadrant seam is
in the kernel running today, 0.8.1. The fix wasn't housekeeping —
it was a precondition. The tier-neutral generator couldn't be
written while the codec was welded into the facade, so the seams
the kernel shipped in 0.8.0 are exactly what unblocked the generator
that needed them. Fixing the contract in the released kernel cleared
the next layer up. I didn't fix the codec in a vacuum — I fixed it
because the thing I was building next required it.

And the next layer followed. The generators that consume those
seams — KernelHandlerGenerator, KernelClientGenerator — have
since been rewritten to use them: the handler resolves the request
body through the decoder registry instead of an inlined Jackson
call, and the client emits the tier-neutral KernelWebClient,
which also closes the original tier-name leak this article opened
with. That rewrite is merged on the tooling's main line. What's
still pending is narrower than the rewrite — it's the tooling
release itself: the tooling sits at 0.5.0-SNAPSHOT with no tag
cut, a deliberate hold whose next tag waits on the Capabilities/SKU
milestone, the same way the kernel tagged 0.8.0 only once its seams
were ready. The loop is closed in source — kernel seam released,
consuming generators rewritten — and the only thing still sequenced
is the release that ships it.

The IdentityProvider SPI is decided and reserved — the direction
is locked by RFC (dedicated SPI, SecurityProvider as dispatcher,
OIDC-first reference driver, fail-closed terminal selection). The
ADR that locks the detail and the implementation both land in
v0.10. A load-bearing dependency lands first: JWKS key rotation
with an overlap window, in v0.9, which the OIDC driver then
consumes.
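
The article only says rotation arrives in v0.9 with an overlap window. One plausible shape, offered purely as an assumption: keys that drop out of a refreshed JWKS are retired rather than deleted, and stay resolvable until the overlap window elapses, so tokens signed just before a rotation still validate. RotatingKeySet and its methods are hypothetical, and fetching the JWKS document is out of scope.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical rotating key set, generic over the key type (e.g. java.security.PublicKey).
final class RotatingKeySet<K> {
    // retiredAt == null means the key is in the current JWKS.
    private record Slot<V>(V key, Instant retiredAt) {}

    private final Duration overlap;
    private final Map<String, Slot<K>> byKid = new HashMap<>();

    RotatingKeySet(Duration overlap) {
        this.overlap = overlap;
    }

    // Apply a freshly fetched key set.
    synchronized void rotate(Map<String, K> fresh, Instant now) {
        // Keys missing from the fresh set are retired, not deleted.
        for (Map.Entry<String, Slot<K>> e : byKid.entrySet()) {
            Slot<K> slot = e.getValue();
            if (!fresh.containsKey(e.getKey()) && slot.retiredAt() == null) {
                e.setValue(new Slot<>(slot.key(), now));
            }
        }
        fresh.forEach((kid, key) -> byKid.put(kid, new Slot<>(key, null)));
        // Forget retired keys whose overlap window has fully elapsed.
        byKid.values().removeIf(slot -> expired(slot, now));
    }

    synchronized Optional<K> resolve(String kid, Instant now) {
        Slot<K> slot = byKid.get(kid);
        if (slot == null || expired(slot, now)) {
            return Optional.empty(); // unknown or fully retired kid: caller must deny
        }
        return Optional.of(slot.key());
    }

    private boolean expired(Slot<K> slot, Instant now) {
        return slot.retiredAt() != null && !now.isBefore(slot.retiredAt().plus(overlap));
    }
}
```

Retiring instead of deleting is what makes the overlap work: a token minted just before the identity provider rotated still finds its kid until the window closes, and after that it fails closed.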

Neither fix is the interesting part. The interesting part is the
question I now run against every contract before I call its design
closed: who consumes this, and have I read their code — not my
code — end to end? It's a weak question. It doesn't sound like
architecture. But it catches the specific class of gap that strong
architectural thinking is structurally blind to, and after two
instances of the same blind spot I trust the weak question more
than I trust my own confidence that a design is complete.

The codec SPI discussed here lives in exeris-kernel-spi under eu.exeris.kernel.spi.http; the KernelWebClient facade is in exeris-kernel-core. The security seam (SecurityProvider, PrincipalContext, StorageContext) is in exeris-kernel-spi under eu.exeris.kernel.spi.security. The decision records — ADR-034, ADR-036, ADR-032, and the IdentityProvider RFC — are in the kernel docs tree:
🔗 exeris-systems/exeris-kernel
