Data fails first

#tpf #java #quarkus #datascience

The strange thing was that the first real scaling problems had nothing to do with scale. Or at least, not with scale in the "infrastructure" sense.

The application was still relatively small. Traffic was growing, yes, but nothing dramatic. CPUs were healthy, latency was acceptable, deployments remained pleasantly uneventful. From the outside, the system still looked exactly like what it technically was: a single Quarkus application running as a monolith, with pipeline steps invoking one another directly inside the same process.

And yet developers had started speaking about it differently.

Not formally. Nobody walked into a meeting and announced that “the architecture had changed.” It emerged indirectly, through the kind of comments engineers make when systems begin slipping away from them:

“Be careful changing that field.”
“I think another step depends on this.”
“We should probably keep the old format for compatibility.”
“I’m not entirely sure who consumes this anymore.”

The incidents themselves were subtle enough to avoid triggering panic. Customer support found two apparently identical shipments taking different routing paths. Finance noticed that premium customers occasionally missed SLA prioritization. A new carrier integration introduced a field named region, while another team added shipping_region, and for several months both quietly coexisted because somewhere in the middle of the pipeline a mapper handled the ambiguity without anybody realizing it.
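To make the region and shipping_region story concrete, here is a minimal sketch of how a lenient mapper can hide that kind of ambiguity. It assumes an untyped map payload; the class name, method name, and sample values are hypothetical and do not come from the actual application.

```java
import java.util.Map;

// Hypothetical illustration only: these names are not from the real codebase.
final class LenientRegionMapper {

    // Accepts either key, so neither producer ever sees a failure.
    // When both are present, shipping_region silently wins.
    static String resolveRegion(Map<String, Object> payload) {
        Object value = payload.containsKey("shipping_region")
                ? payload.get("shipping_region")
                : payload.get("region");
        return value == null ? "UNKNOWN" : value.toString();
    }

    public static void main(String[] args) {
        // Two payloads describing what a human would call the same shipment.
        Map<String, Object> fromCarrier =
                Map.of("id", "S-1", "region", "EU");
        Map<String, Object> fromOtherTeam =
                Map.of("id", "S-1", "region", "EU", "shipping_region", "EU-WEST");

        System.out.println(resolveRegion(fromCarrier));   // prints "EU"
        System.out.println(resolveRegion(fromOtherTeam)); // prints "EU-WEST"
    }
}
```

Both calls succeed, and the same shipment ends up with two different routing keys. No exception or log line points at the disagreement.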

Nothing crashed.

That was precisely the problem.

The failures were no longer technical failures. They were semantic failures. The system continued running while different parts of the application slowly stopped agreeing on what the data actually meant.

Marta recognized the smell immediately because she had seen it before in event-driven and serverless systems: payloads beginning life as wonderfully flexible JSON blobs and gradually hardening into undocumented distributed contracts. Except this time, it was happening entirely inside a monolith.

That realization bothered her more than any infrastructure issue they had faced so far, because it exposed something most engineers rarely think about when they talk about distributed systems.

When people hear “distributed systems,” they usually imagine the visible mechanics:

remote calls
retries
eventual consistency
latency
concurrency
partitions

But those are often second-order effects. The deeper problem appears much earlier, when independently evolving parts of a system begin sharing meaning imperfectly.

Two teams interpreting the same field differently. A payload surviving longer than the assumptions under which it was created. Producers and consumers changing independently without realizing the semantic contract between them has drifted.

That is already a distributed systems problem, even if everything still runs inside one JVM.

And that was the moment Marta realized something important:

Distribution is not a deployment problem. It starts as a data problem.

The architecture diagrams had not changed yet, but the organizational reality already had. Different teams owned different stages of the pipeline. External integrations introduced their own semantics. Some transformations were deterministic and easy to reason about; others depended on databases, carrier APIs, fraud services, and infrastructure outside the team’s control. The application might still have been physically centralized, but conceptually it had already become a collection of independently evolving boundaries.

That was where TPF started becoming much more than a convenient orchestration framework.

The team was not using it as a generic workflow interpreter where loosely defined tasks execute dynamically at runtime. What TPF increasingly provided was something much closer to a typed application pipeline compiler: a system capable of understanding the boundaries between stages well enough to generate contracts, validate transformations, wire transports, expose endpoints, and fail builds when parts of the pipeline drifted out of alignment.

The distinction mattered enormously.

The pipeline steps themselves remained ordinary application code. Developers wrote Quarkus services, plain Java methods, some reactive Mutiny handlers where concurrency justified the complexity, some blocking implementations where simplicity mattered more than theoretical throughput. Nobody had to abandon the normal ergonomics of application development to “enter the workflow engine.”

But the boundaries between those steps became explicit.

Instead of passing anonymous maps through the system and hoping conventions would survive organizational growth, stages now declared typed inputs and outputs, mapper rules, cardinality expectations, operator contracts, and transport compatibility in ways the compiler could actually verify.
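The idea of explicit, compiler-checked boundaries can be sketched in plain Java. This is not TPF's API: the Step interface, the record names, and the priority values below are assumptions made purely for illustration. It only shows how typed inputs and outputs turn a mismatch between stages into a build failure.

```java
// Hypothetical sketch in plain Java; none of this is TPF's actual API.
final class TypedBoundarySketch {

    // Each record is the explicit contract for one boundary.
    record RawShipment(String id, String shippingRegion, boolean premium) {}
    record RoutedShipment(String id, String route, int priority) {}

    // A step declares its input and output types up front.
    interface Step<I, O> {
        O apply(I input);

        // Chaining compiles only when this step's output type
        // is accepted as the next step's input type.
        default <R> Step<I, R> then(Step<? super O, ? extends R> next) {
            return input -> next.apply(apply(input));
        }
    }

    // Illustrative rule: premium shipments get priority 1, others 5.
    static final Step<RawShipment, RoutedShipment> ROUTE = raw ->
            new RoutedShipment(raw.id(), "hub-" + raw.shippingRegion(),
                    raw.premium() ? 1 : 5);

    static final Step<RoutedShipment, String> LABEL = routed ->
            routed.id() + "@" + routed.route() + "/p" + routed.priority();

    // The types line up: RawShipment -> RoutedShipment -> String.
    static final Step<RawShipment, String> PIPELINE = ROUTE.then(LABEL);

    // LABEL.then(ROUTE) would be rejected by the compiler,
    // because a String is not a RawShipment.

    public static void main(String[] args) {
        RawShipment shipment = new RawShipment("S-1", "EU-WEST", true);
        System.out.println(PIPELINE.apply(shipment)); // prints "S-1@hub-EU-WEST/p1"
    }
}
```

Renaming or retyping a field in RawShipment now breaks every step that reads it at compile time. A real pipeline compiler can presumably verify far more than the Java type checker does here (mapper rules, cardinality, transport compatibility), but the feedback loop has the same shape.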

The effect on development was immediate and surprisingly human. Refactoring became less frightening. Teams stopped relying on tribal knowledge to understand dependencies. A mapper mismatch became a compilation error instead of a production incident discovered three weeks later through telemetry dashboards and apologetic Slack messages.

Peace of mind was, at last, restored once all the data contracts between steps were formalised and the architecture stopped evolving accidentally.
