Designing from Invariants
Software architecture is frequently treated as an exercise in connecting infrastructure components. We often reach for Kafka, Redis, or microservice boundaries as if they are the building blocks of the business logic itself. But when tools come before logic, the resulting design prioritizes infrastructure choices over the problem they are meant to solve.
A high-reliability system does not start with a distributed queue or a complex workflow engine. It starts with the core constraints: the invariants that make the system reliable. If you choose your infrastructure before you have defined the logic that keeps your data correct, you are building complexity on an undefined foundation.
The Distribution Trap
Most designs become unmanageable because they assume every step of a business process must be distributed across new infrastructure from the beginning.
In this style of design, the business logic is spread across a database, a queue, and a workflow engine. To answer a simple question like "What is the state of this payment?", you have to reconstruct the story from multiple logs. This introduces the Dual Write problem: a database update succeeds but a message publish fails, and the two stores silently disagree, before the system has even achieved its basic purpose.
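The hazard can be sketched in a few lines. Here a dict stands in for the database and a list for the broker; `save_payment` and `publish_event` are illustrative names, not from any real library.

```python
# Sketch of the Dual Write problem: two independent systems,
# no shared transaction. All names here are illustrative.

db = {}        # stands in for the relational database
queue = []     # stands in for the message broker

def save_payment(payment_id, state):
    db[payment_id] = state          # write 1: succeeds

def publish_event(event, *, fail=False):
    if fail:
        raise ConnectionError("broker unreachable")
    queue.append(event)             # write 2: may never happen

try:
    save_payment("pay_1", "CLEARED")
    publish_event({"payment": "pay_1"}, fail=True)  # simulated outage
except ConnectionError:
    pass

# The database says CLEARED, but downstream consumers never hear about it.
assert db["pay_1"] == "CLEARED" and queue == []
```

No amount of retry logic in the caller fixes this cleanly: the two writes are not atomic, so some interleaving of failures always leaves them disagreeing.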
Coherence as the Minimal Solution
The strongest designs identify the Invariants first. An invariant is a statement that must always be true for the business to be valid. For example:
"A cleared risk decision must never exist without an authoritative payment record."
If the business rules require two things to change together to be valid, the simplest and most robust solution is to keep them in the same transaction.
This logical anchor is the Transactional Center.
The core state machine for an important process should have one queryable home, usually a relational database like Postgres. By starting here, you eliminate entire classes of distributed system bugs. You can scale the system outward later, but the authority remains in one place.
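This can be made concrete with a minimal sketch. The Python standard library's sqlite3 stands in for Postgres here, and the schema and function names are invented for illustration; the point is only that both rows commit or neither does.

```python
import sqlite3

# One queryable home for the state machine. sqlite3 stands in for
# Postgres; the schema is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, state TEXT)")
conn.execute("CREATE TABLE risk_decisions (payment_id TEXT, decision TEXT)")

def clear_payment(payment_id):
    # Both rows change in one transaction, so the invariant
    # "a cleared risk decision must never exist without an
    # authoritative payment record" cannot be violated.
    with conn:  # commits on success, rolls back on any exception
        conn.execute("INSERT INTO payments VALUES (?, 'CLEARED')",
                     (payment_id,))
        conn.execute("INSERT INTO risk_decisions VALUES (?, 'CLEARED')",
                     (payment_id,))

clear_payment("pay_1")
rows = conn.execute(
    "SELECT p.state, r.decision FROM payments p "
    "JOIN risk_decisions r ON r.payment_id = p.id"
).fetchall()
```

If either INSERT fails, the `with conn` block rolls the whole transaction back, so the question "What is the state of this payment?" always has exactly one answer.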
Scaling without Scattering
Scaling should be a reaction to a requirement, not a default architecture. The Four Plane Model provides a way to distribute workloads without losing the source of truth.
Plane 1 Transactional Truth
This is the core. It owns the current state, the audit trail, and the records used to reliably notify the rest of the system.
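One common way to keep "records used to reliably notify the rest of the system" inside Plane 1 is a transactional outbox: the notification row commits in the same transaction as the business state. A sketch, again with sqlite3 and an invented schema:

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, state TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, "
             "payload TEXT, published INTEGER DEFAULT 0)")

def clear_payment(payment_id):
    # Business state and the notification record commit together:
    # the event exists if and only if the state change does.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, 'CLEARED')",
                     (payment_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"payment": payment_id,
                                  "state": "CLEARED"}),))

clear_payment("pay_1")
pending = conn.execute(
    "SELECT payload FROM outbox WHERE published = 0").fetchall()
```

This sidesteps the Dual Write problem entirely: there is only one write, to one store, and publishing becomes someone else's job.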
Plane 2 Action Systems
These are Kafka workers and background jobs. They react to the truth committed in Plane 1. Asynchronous tasks like notifications or external fraud checks happen here without slowing down the core transaction.
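A Plane 2 worker can then be sketched as a poller that reads committed truth and reacts to it. The list below stands in for Kafka, and the outbox schema is the same invented one as before:

```python
import sqlite3, json

# Plane 1: committed truth, including an outbox of pending notifications.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, "
             "payload TEXT, published INTEGER DEFAULT 0)")
conn.execute("INSERT INTO outbox (payload) VALUES (?)",
             (json.dumps({"payment": "pay_1", "state": "CLEARED"}),))
conn.commit()

broker = []  # stands in for Kafka

def drain_outbox():
    # React only to truth already committed in Plane 1. Delivery is
    # at-least-once: if the process dies after publishing but before
    # the UPDATE, the row is retried, so consumers must be idempotent.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        broker.append(json.loads(payload))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                     (row_id,))
    conn.commit()

drain_outbox()
```

Because the worker runs after the commit, it can fail, retry, or fall behind without ever corrupting the core state machine.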
Plane 3 Real Time Reads
When you need fast dashboards, move those reads to a specialized replica like ClickHouse. This keeps analytical traffic from overwhelming the transactional core.
Plane 4 Historical Analytics
This is for deep history and data science (BigQuery or Snowflake). It stays completely separate from the operational system.
Choosing Your Path
The decision to distribute should always follow the logic of the problem.
Start with a Transactional Center when
Consistency is part of the business value. If a payment must be atomic with an order update, keep them together. This is the simplest possible solution and the most resilient to failure.
Extend to Distributed Choreography when
Domains are truly independent or you have reached a scale where a single database cannot handle the write volume. Use patterns like Sagas only when the local boundary can no longer support the technical requirements of the system.
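When domains really are independent, a Saga replaces the single transaction with a sequence of local steps, each paired with a compensation that undoes it. A minimal sketch; the step and compensation functions are hypothetical:

```python
# Minimal Saga sketch: each step commits locally; on failure,
# already-completed steps are undone in reverse order.
# All step names are illustrative.

log = []

def reserve_inventory():
    log.append("inventory_reserved")

def release_inventory():
    log.append("inventory_released")

def charge_card(fail=False):
    if fail:
        raise RuntimeError("card declined")
    log.append("card_charged")

def run_saga(fail_payment=False):
    steps = [(reserve_inventory, release_inventory),
             (lambda: charge_card(fail=fail_payment), None)]
    done = []
    try:
        for step, compensate in steps:
            step()
            done.append(compensate)
    except RuntimeError:
        # Rolling forward is impossible; undo what already committed.
        for compensate in reversed(done):
            if compensate:
                compensate()

run_saga(fail_payment=True)
```

Note what the pattern costs: compensations are new business logic you must write, test, and keep correct, which is exactly why a Saga should be adopted only when the local boundary truly cannot hold.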
A resilient system starts by identifying the center. Ask one question: Where is the authority?
Originally published at: https://looppass.mindmeld360.com/blog/system-design-transactional-center/