DEV Community

Sergiy Yevtushenko

We Should Write Java Code Differently: Frictionless Prod

It's not a secret that modern production deployment is extremely complex. Following "best practices" and deploying in Kubernetes "for scalability" makes things even more complex. But how complex exactly? Let's look at the numbers.

The Setup

A mid-sized e-commerce platform. Nothing exotic -- catalog, cart, checkout, orders, payments, shipping, inventory, pricing, promotions, notifications. Standard bounded contexts, standard domain decomposition.

In a microservices architecture, this translates to roughly 30 services. Not because someone wanted 30 -- because the domain naturally decomposes into ~10 core services, ~10 supporting services (integrations, async processing, admin), and ~10 platform services (gateway, auth, search, analytics, event processing).

30 is not a large number. It is a realistic baseline for a system that does what mid-sized e-commerce systems do.

What 30 Services Actually Cost

Each service needs to be built, deployed, configured, monitored, and kept alive -- independently. On managed Kubernetes, the standard deployment substrate for this scale, here is what the numbers look like.

Production environment:

  • 12-18 shared platform components (ingress controller, cert manager, external DNS, metrics stack, logging pipeline, tracing collector, secrets integration, policy controller, autoscaler, GitOps controller, backup controller, image registry integration)
  • 220-280 Kubernetes workload objects (deployments, services, config maps, secrets, service accounts, network policies, autoscalers, pod disruption budgets, ingress rules, worker/consumer deployments)
  • 260-340 configuration sets (image tags, rollout strategies, replica counts, CPU/memory limits, probes, environment variables, feature flags, secret references, IAM permissions, network exposure rules, alerting configs -- per service)

Staging environment:

  • Topologically similar to production. You save on capacity, not on configuration complexity. 190-250 workload objects. 220-300 configuration sets.

Testing environment:

  • 80-160 workload objects in a shared cluster with ephemeral namespaces. 100-180 configuration sets.

Across all three environments: 500-700 managed runtime objects and 580-820 configuration surfaces.

This is before counting databases, message brokers, CDN, object storage, and external SaaS integrations. The application itself -- the business logic that actually generates revenue -- is a small fraction of this surface.

The Team That Manages This

This infrastructure does not manage itself. For a 30-service system on managed Kubernetes:

  • Minimum viable: 3-5 platform/DevOps engineers
  • Typical realistic: 5-8 engineers
  • Comfortable/mature: 8-12 engineers

This is platform and SRE combined -- not including the feature development teams that write the actual business logic. These engineers manage pipelines, rollout policies, cluster security, network configuration, secrets, observability, capacity planning, incident response, and the endless stream of version upgrades across 30 independent deployment units.

The dominant complexity is not code. It is coordination: version coexistence (new service talking to old service), schema evolution (new code on old database), deployment ordering (which service goes first), and failure propagation (one bad deploy cascading through dependent services).

A mid-sized organization pays for 5-8 engineers whose entire job is keeping the deployment machinery running. Not building features. Not serving customers. Managing the gap between code and production.

Where This Complexity Comes From

This is not accidental. The complexity has three structural sources, each one a consequence of architectural decisions made so long ago that they feel like laws of nature. They are not.

The Monolith Turned Inside Out

A microservices system is a monolith turned inside out. Every internal interaction -- a method call, a shared data structure, a module boundary -- transforms into infrastructure configuration. What was a function call within a single process becomes a network call that needs discovery, routing, serialization, timeout handling, retry logic, and circuit breaking. What was an internal module boundary becomes a deployment boundary with its own pipeline, versioning, and rollout policy.

The problem is not microservices as a concept. The problem is that there are no predefined patterns or limits on how services interact. The infrastructure must be infinitely flexible to accommodate every possible communication topology. Infinite flexibility means every single interaction path must be configured -- sometimes multiple times in different places. A call from the order service to the inventory service touches ingress rules, network policies, service discovery, load balancing, timeout configuration, and retry policy. Each one configured separately. Each one a potential source of misconfiguration.

A 30-service system with 50-100 service-to-service interactions does not have 30 configuration problems. It has a combinatorial configuration problem that grows with the interaction graph, not with the service count.
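The arithmetic behind this claim can be made concrete. A minimal sketch, using the article's own figures; the per-interaction surface count of six is an illustrative assumption taken from the list of touchpoints above, not a measurement:

```java
// Illustrates why configuration work tracks the interaction graph (edges),
// not the service count (nodes). Numbers are illustrative.
public class InteractionGrowth {
    public static void main(String[] args) {
        int services = 30;
        int interactions = 75;          // midpoint of the 50-100 range above
        int surfacesPerInteraction = 6; // ingress, network policy, discovery,
                                        // load balancing, timeout, retry

        System.out.println("Service-count problems:   " + services);
        System.out.println("Interaction surfaces:     " + interactions * surfacesPerInteraction);
        // Adding one service with 5 new interactions adds 5 * 6 = 30 new
        // surfaces to configure - growth follows edges, not nodes.
    }
}
```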

This creates an inherent contradiction. The ideal service boundary is determined by the business domain -- by cohesion, coupling, and team ownership. But the configuration and operational cost of each additional service is so high that it becomes a technical factor in the decomposition decision. Teams merge services that should be separate to avoid operational overhead, or keep services together that should be split because nobody wants to set up another pipeline. The result is almost inevitably a suboptimal split -- service boundaries driven by infrastructure cost rather than domain structure.

And there is a human cost too. Every service boundary is a communication boundary. Two services owned by two teams means coordination meetings, API contracts, versioning negotiations, shared testing environments, deployment ordering discussions. Conway's Law works in reverse here -- the architecture forces organizational communication patterns that would not exist if the split were different. The more services, the more cross-team coordination. The more coordination, the slower the delivery. The very thing microservices promised to fix -- team independence -- is undermined by the infrastructure overhead of maintaining the boundaries between them.

The Substrate-Application Disconnect

Kubernetes runs containers. It starts a binary, monitors a health endpoint, restarts it if it fails. That is the extent of the relationship. Once the binary is running, it is on its own.

This creates a two-way blindness. The application cannot trust the environment -- it must assume that any network call can fail, any service can be unavailable, any response can be delayed. So the application implements its own retries, its own circuit breakers, its own service discovery, its own health reporting. All of this requires configuration.

The blindness goes the other direction too. The substrate knows nothing about the application. It does not know which services talk to each other, what constitutes a meaningful health check for this specific business logic, which services must be deployed before others, or whether a particular service's "healthy" status actually means it can process requests. The cluster is fault-tolerant at the container level, but the application gets no benefit from that -- it must build its own fault tolerance on top.

The result: the application carries infrastructure concerns that the runtime should handle, and the runtime cannot provide services that would require understanding the application. Both sides are doing extra work because neither side can see the other.

There is a particularly painful consequence of this disconnect: the application and its infrastructure share a lifecycle. Every microservice bundles its own web server, serialization library, HTTP client, connection pool, metrics agent, and retry framework. When any of these components needs a security patch -- a CVE in Netty, a vulnerability in Jackson, an update in the connection pool -- the business logic must be rebuilt, retested, repackaged, and redeployed. All 30 services. Zero changes to business logic. Pure infrastructure maintenance.

This is not a deployment -- it is a tax. A Netty CVE means 30 rebuilds, 30 pipeline runs, 30 test suites, 30 rollouts. Each one risks introducing regressions in business logic that nobody touched. The operational burden scales with the service count, and the trigger has nothing to do with the business. The application and the runtime are monolithically coupled inside every single service, even as the services themselves are distributed.

The Tool Multiplication Effect

Because the substrate and application are disconnected, the gap between them must be filled. Each gap spawns a tool:

  • Routing and mTLS between services? Service mesh.
  • TLS certificate lifecycle? Certificate manager.
  • Configuration across environments? Config service.
  • Database schema evolution? Migration tool.
  • Connection management? Connection pooler.
  • Metrics and tracing? Observability agents.
  • Deployment orchestration? GitOps controller.
  • Secret management? Vault or cloud secrets integration.

Each tool has its own configuration language, its own upgrade cycle, its own failure modes, and its own operational surface. Each tool solves a real problem -- but the problem only exists because the substrate and application cannot communicate.

A 30-service system on managed Kubernetes typically depends on 9-12 distinct operational tools beyond Kubernetes itself. Each one was added to solve a legitimate gap. Together, they are the gap. The complexity is not in any single tool -- it is in the interaction between all of them. A certificate renewal that breaks a service mesh sidecar that causes a health check failure that triggers a cascading restart -- this class of incident exists only because the tools operate independently, each with partial knowledge, none with the full picture.

What If the Gap Didn't Exist?

The three sources of complexity share a root cause: the application and the runtime are strangers. The runtime starts a binary and watches a health endpoint. The binary assumes a hostile environment and brings its own infrastructure. The gap between them fills with tools. The tools fill with configuration. The configuration fills with inconsistencies. The inconsistencies fill incident postmortems.

What changes if the runtime understands the application?

Unification: From Multiplication to Addition

In a Kubernetes-based system, configuration complexity is multiplicative. Each service multiplied by each environment multiplied by each configuration surface produces the 580-820 number we saw earlier. Every new service adds a full column of config. Every new environment multiplies the entire matrix.

This multiplication exists because nothing is shared. Each service configures its own database connection, its own TLS, its own retries, its own health checks, its own metrics, its own log format. Even when two services use the same database, they each carry their own connection configuration -- independently managed, independently misconfigured.

A unified runtime changes the math. When the runtime handles TLS, there is one TLS configuration -- not 30. When the runtime handles database connections, there is one database configuration per database -- not one per service. When the runtime handles metrics, there is one observability configuration -- not 30 agents with 30 scrape configs.

The dependency structure changes from a product of factors to a sum of components. 30 slices sharing 3 databases, 1 TLS configuration, and 1 observability setup produce roughly 35 configuration surfaces -- not 340. The complexity scales with the number of distinct resources, not with the number of services that use them.
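The two scaling models can be put side by side in a few lines. The per-service surface count of 11 is an illustrative assumption drawn from the configuration list earlier in the article:

```java
// Back-of-envelope comparison of the multiplicative (per-service) and
// additive (shared-resource) configuration models described above.
public class ConfigScaling {
    public static void main(String[] args) {
        int services = 30;
        int surfacesPerService = 11; // image tag, rollout, replicas, limits, probes,
                                     // env vars, flags, secrets, IAM, network, alerts

        // Per-service model: every service carries its own full column of config.
        int multiplicative = services * surfacesPerService;

        // Unified runtime: configuration tracks distinct resources, not consumers.
        int additive = services /* slice declarations */
                     + 3       /* databases */
                     + 1       /* TLS */
                     + 1       /* observability */;

        System.out.println("Per-service model:     " + multiplicative); // 330
        System.out.println("Shared-resource model: " + additive);       // 35
    }
}
```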

Integration: The Cluster Becomes Service-Aware

When the application is embedded into the runtime -- not just started by it -- the two-way blindness disappears.

The runtime knows what slices exist, what resources they need, which slices communicate with each other, and what "healthy" means for each one. It knows because the application declared it: this slice needs a database, that slice publishes to a stream, this slice depends on that one. The declarations are not configuration files scattered across repositories. They are part of the application itself -- compiled in, type-checked, deployed as a single artifact.

From the application's perspective, the environment becomes trustworthy. Retries, circuit breaking, load balancing, service discovery -- these are not application concerns anymore. The runtime handles them because the runtime knows the topology. When slice A calls slice B, the runtime knows where B lives, which instances are healthy, how to route the request, and what to do if it fails. The application code is a method call. Everything between the call and the response is the runtime's responsibility.

This is not a framework providing libraries. It is a managed runtime providing services -- the way an operating system provides networking and storage to applications. The application does not implement TCP. It calls an API. The same principle, applied one level up.

The Communication Fabric

The deepest integration point is inter-slice communication. In a microservices system, service-to-service calls cross process boundaries, network boundaries, and trust boundaries. Each crossing adds latency, failure modes, and configuration.

When slices run inside the runtime, the communication fabric is built in. A call from one slice to another is a typed method invocation. The runtime resolves the target, handles serialization, routes the request -- potentially to the same node (microsecond latency, zero network overhead) or to a remote node (transparent, with automatic retry and failover). The slice developer writes inventory.check(items) and gets back a Promise<Availability>. The infrastructure between the call and the response is invisible.

This is not RPC dressed up as a method call. The type system guarantees that the caller and callee agree on the contract at compile time. There are no surprise serialization failures, no version mismatches discovered in production, no "the other service changed its API and nobody told us." The contract is a Java interface. The compiler enforces it.

The same fabric carries pub/sub, streaming, and scheduled task execution. All inter-slice communication -- synchronous, asynchronous, event-driven -- flows through the same runtime-managed infrastructure. One communication model, one set of guarantees, one thing to understand.

Resource Provisioning: Declare, Don't Configure

In the traditional model, the application includes its dependencies and configures them. A service that needs a database brings a connection pool library, configures the URL, credentials, pool size, timeouts, and retry behavior. A service that needs messaging brings a client library, configures the broker address, topics, serialization, and consumer groups. Each dependency is another library, another config surface, another thing to upgrade.

In a unified runtime, the application declares what it needs. A slice annotated with @Sql gets a database connection -- provisioned, pooled, health-checked, and monitored by the runtime. A slice that declares a stream gets a stream -- partitioned, replicated, persistent. The slice does not know or care how the database connection is managed. It declares intent. The runtime delivers.

This is why 30 slices do not produce 30× configuration. The resources are shared by design. Three slices using the same database share one connection pool. Ten slices publishing to the same stream share one stream configuration. The configuration count tracks resources, not consumers.

Different Tradeoffs: Splitting Without Penalty

When the operational cost of each additional slice is zero -- no pipeline, no deployment descriptor, no config set, no monitoring surface -- the decomposition decision changes fundamentally.

In a microservices system, splitting a service in two is an infrastructure project: new repository, new pipeline, new deployment config, new monitoring, new network policies, new on-call surface. This cost discourages splitting, leading to oversized services that bundle unrelated logic because nobody wants to pay the operational tax of separation.

When slices are the unit, splitting is a code decision. Extract an interface, annotate it with @Slice, implement the business logic. The runtime handles deployment, routing, scaling, and monitoring automatically. There is no operational penalty for having 50 slices instead of 10. The only consideration is the domain: does this logic belong together, or should it be independent?

This aligns perfectly with vertical slicing. Each slice is a complete vertical: an interface (the contract), an implementation (the logic), and resource declarations (what it needs). No horizontal layers spread across packages. No shared mutable state between slices. Each slice is independently deployable, independently scalable, and independently understandable.

The scaling is linear -- in development, not just in production. Adding a new slice to the system does not require understanding the deployment topology, the network configuration, or the CI/CD pipeline. It requires understanding the business domain. Write the interface. Implement the logic. Deploy.

JBCT: Structure That Carries Across Boundaries

This is where the code methodology meets the runtime. JBCT's six structural patterns -- Sequencer, Fork-Join, Condition, Iteration, Aspects, Leaf -- are not just coding rules. They are the structural language of business processes. A Sequencer is a sequence of dependent steps. A Fork-Join is parallel independent operations. A Condition is a routing decision. These map directly to BPMN constructs -- the same notation business analysts use to describe processes.

When every slice follows the same six patterns, several things happen at once:

The code becomes the specification. A business analyst who understands the process can review the code structure and verify that it matches the intended workflow. Not because the code is simple -- because the structure is familiar. The patterns are the same shapes they draw on whiteboards.

AI-assisted development becomes predictable. The six patterns constrain the generation space. An AI that produces JBCT code generates variations of known structures, not arbitrary architectures. The mechanical rules mean less manual correction, less review overhead, less accumulated drift.

Developer onboarding compresses. A developer who learns the six patterns can read and contribute to any slice in the system. There is no per-service learning curve, no "this team does it differently." The patterns are the same everywhere.

The entire stack speaks the same language. The business process is described in BPMN. The code implements it in JBCT patterns that mirror the BPMN structure. The runtime deploys and operates it without configuration. From business requirement to production, the structure is consistent. No translation layers. No impedance mismatches. No "we need a meeting to explain what the code does."
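A few of the six shapes can be sketched in plain Java. This uses CompletableFuture as a stand-in for the Promise type used elsewhere in the article (an assumption for the sake of a runnable example); the step names and values are invented for illustration, not taken from JBCT itself:

```java
import java.util.concurrent.CompletableFuture;

// Toy versions of four of the six JBCT shapes named above, with
// CompletableFuture standing in for the article's Promise type.
public class JbctShapes {
    // Leaf: a single self-contained step.
    static CompletableFuture<Integer> price(String item) {
        return CompletableFuture.completedFuture(item.length() * 10);
    }

    // Sequencer: dependent steps chained one after another.
    static CompletableFuture<String> sequencer(String item) {
        return price(item).thenApply(p -> p + 2)        // add a fixed fee
                          .thenApply(p -> "total=" + p);
    }

    // Fork-Join: independent steps run in parallel, results combined.
    static CompletableFuture<Integer> forkJoin(String a, String b) {
        return price(a).thenCombine(price(b), Integer::sum);
    }

    // Condition: a routing decision between branches.
    static CompletableFuture<String> condition(int total) {
        return total > 50 ? CompletableFuture.completedFuture("review")
                          : CompletableFuture.completedFuture("auto-approve");
    }

    public static void main(String[] args) {
        System.out.println(sequencer("book").join());       // total=42
        System.out.println(forkJoin("book", "pen").join()); // 70
        System.out.println(condition(70).join());           // review
    }
}
```

Each shape is the same shape a BPMN diagram draws: a sequence flow, a parallel gateway, an exclusive gateway.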

Independent Lifecycles

Remember the Netty CVE problem -- 30 rebuilds, 30 pipelines, 30 rollouts, zero business logic changes?

When the application and the runtime are separate layers, their lifecycles decouple completely. The runtime handles web serving, TLS, connection pooling, serialization, metrics, retry logic. The application handles business logic. When Netty needs a patch, the runtime updates -- once, across all nodes, without touching a single slice. When business logic changes, the slice deploys -- without rebuilding the runtime.

Update the runtime: roll out new nodes, slices continue serving. Update a slice: deploy a new version, the runtime handles routing. Update the database driver: runtime concern, not application concern. Update Java itself: runtime update, slices are unaffected.

Two independent update cadences. Two independent test surfaces. Two independent rollout schedules. The security patch that used to trigger 30 rebuilds is now a single runtime update that applications never see.

Aether: What This Looks Like in Practice

Everything described above is not a thought experiment. It is a working system called Aether.

Here is what the 30-slice e-commerce application looks like in practice.

The Application

A slice is a Java interface:

```java
@Slice
public interface OrderService {
    Promise<OrderResult> placeOrder(PlaceOrderRequest request);

    static OrderService orderService(InventoryService inventory,
                                     PricingService pricing) {
        return request -> inventory.check(request.items())
                                   .flatMap(pricing::calculate)
                                   .map(OrderResult::placed);
    }
}
```

That is the entire service. The interface is the contract. The factory method parameters are the dependencies. The implementation is the business logic. There is no framework, no annotations for routing or serialization or retry, no configuration file. The runtime handles all of it.
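To see the wiring run outside Aether, the same shape can be reproduced with a toy, synchronous Promise. Everything here except the structure itself (interface as contract, factory parameters as dependencies, flatMap/map chain as logic) is an invented stand-in; Aether's actual Promise is asynchronous and runtime-managed:

```java
import java.util.List;
import java.util.function.Function;

// Self-contained toy mirroring the slice wiring above. The Promise record is
// a trivial synchronous stand-in, not Aether's real type.
public class SliceWiringDemo {
    record Promise<T>(T value) {
        <R> Promise<R> map(Function<T, R> f)              { return new Promise<>(f.apply(value)); }
        <R> Promise<R> flatMap(Function<T, Promise<R>> f) { return f.apply(value); }
    }

    record Availability(List<String> items) {}
    record Quote(int total) {}
    record OrderResult(String status, int total) {
        static OrderResult placed(Quote q) { return new OrderResult("PLACED", q.total()); }
    }

    interface InventoryService { Promise<Availability> check(List<String> items); }
    interface PricingService   { Promise<Quote> calculate(Availability a); }

    interface OrderService {
        Promise<OrderResult> placeOrder(List<String> items);

        // Same shape as the factory above: dependencies in, behavior out.
        static OrderService orderService(InventoryService inventory, PricingService pricing) {
            return items -> inventory.check(items)
                                     .flatMap(pricing::calculate)
                                     .map(OrderResult::placed);
        }
    }

    public static void main(String[] args) {
        InventoryService inventory = items -> new Promise<>(new Availability(items));
        PricingService pricing = a -> new Promise<>(new Quote(a.items().size() * 10));
        OrderService orders = OrderService.orderService(inventory, pricing);
        System.out.println(orders.placeOrder(List.of("book", "pen")).value());
        // OrderResult[status=PLACED, total=20]
    }
}
```

Swapping the toy Promise for the runtime's is the only difference between this sketch and a deployed slice; the business logic does not change.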

The Deployment

The blueprint is a TOML file that describes the desired state:

```toml
id = "org.example:commerce:1.0.0"

[[slices]]
artifact = "org.example:order-service:1.0.0"
instances = 3

[[slices]]
artifact = "org.example:inventory-service:1.0.0"
instances = 5

# ... remaining slices
```

The Maven plugin packages the blueprint automatically -- along with database schema migration scripts and application configuration -- into a single deployable JAR. The entire build-to-production workflow:

```shell
# Build and install to local Maven repository
mvn install

# Push artifacts from local Maven repo to production cluster
aether artifact push org.example:commerce:1.0.0

# Deploy
aether deploy org.example:commerce:1.0.0
```

Three commands. The runtime resolves artifacts, runs schema migrations, distributes slice instances across nodes, registers routes, and starts serving traffic. No Docker images, no Helm charts, no deployment descriptors, no pipeline configuration.

The Cluster

```toml
[deployment]
type = "aws"
region = "eu-west-1"

[deployment.instances]
core = "m5.large"

[cluster]
name = "production"
core.count = 5

[cluster.auto_heal]
enabled = true
```

```shell
aether cluster bootstrap --config cluster.toml
```

Five nodes. Auto-healing. TLS derived from a shared secret -- no external certificate authority, no cert-manager. Service discovery via consensus -- no Consul, no etcd. Metrics and tracing built in -- no Prometheus scrape configs, no observability agents. QUIC transport with mandatory TLS 1.3 between all nodes.

Scaling

```shell
aether cluster scale --core 7
```

Two additional nodes join the cluster automatically -- same binary, same config. Worker nodes can scale independently via gossip protocol, adding capacity without touching the consensus layer. A 5-node cluster grows to 50 nodes without architecture changes. The runtime includes a reactive scaling controller that adjusts capacity based on CPU, latency, queue depth, and error rate.

Local Development

```shell
# Build your slices
mvn install

# Start a 5-node cluster on your laptop with your application deployed
aether-forge --blueprint org.example:commerce:1.0.0
```

A full cluster on your laptop. Real consensus, real routing, real failure scenarios. Web dashboard with live metrics. Kill a node, watch recovery. Deploy a new slice version, watch traffic shift. The same runtime that runs in production runs on your machine -- not a simulation, not a mock, the actual system. When it works in Forge, it works in production.

The Numbers

For the same 30-slice e-commerce application:

  • Configuration sets (production): 260-340 on Kubernetes; ~15 on Aether.
  • Configuration sets (all environments): 580-820 on Kubernetes; ~18 on Aether.
  • Platform components: 12-18 per environment on Kubernetes; 1 binary on Aether.
  • Deployment: 30 independent pipelines on Kubernetes; 3 commands on Aether.
  • Runtime security patch: 30 rebuilds on Kubernetes; 1 rolling update on Aether.
  • Operational team: 5-8 dedicated engineers on Kubernetes; 1 person on Aether.

One person does not mean zero operational work. Monitoring the cluster, watching for bottlenecks, checking alerts, reviewing scaling behavior, planning capacity -- that is enough work for one person. But it is one person, not eight. And that person spends time on operational judgment, not on managing the configuration machinery that connects 12 tools to 30 services across 3 environments.


Running Java should be as easy as writing it.

With Aether, it is.

pragmaticalabs.io


Previously: Introduction to Pragmatic Functional Java (2019) | We Should Write Java Code Differently (2021) | We Should Write Java Code Differently: Let's Get Practical (2026)
