When I started this project, I wanted to learn how the backend systems behind payment processing apps work by building one myself.
I wanted to build something that made me think carefully about throughput, ordering, idempotency, auditability, and failure boundaries together.
That led me to HVTP (High Volume Transaction Processor) — a portfolio-grade, event-driven transaction processor that behaves like a small transaction backend rather than a single demo service.
What made this project valuable for me wasn’t just wiring Kafka into a system.
It was learning how to shape the system so the right work happened in the right place.
What the project actually is
At a practical level, HVTP is a signed transaction ingestion pipeline.
A merchant client sends a transaction request over HTTP. The system validates the request at ingress, accepts it quickly, and then hands it off for asynchronous processing.
From there, the system:
- validates and processes the transaction
- enforces idempotency
- persists ledger state
- stores immutable audit events
- exposes a status API
- supports reconciliation between stores
- emits terminal outcomes through downstream flows
The stack looks like this:
- Kafka for event flow
- Valkey (open-source Redis fork) for idempotency and some read-path control
- PostgreSQL for the ledger / queryable durable state
- MongoDB for immutable audit events
- Spring Boot services split by responsibility
- k6 for ingress load testing
This project is not about reproducing a regulated payments platform.
It is about building a system shape where correctness, isolation of responsibility, and observable behavior matter.
Why I kept the request path small
One option was to do everything in the request path:
- Receive HTTP request
- Validate everything in the same service
- Write directly to PostgreSQL
- Also write to MongoDB
- Return success
That would have been simpler to build at first.
But for this project, I wanted to separate request acceptance from downstream processing. I wanted the ingress layer to stay focused on validating, accepting, and handing work off quickly, instead of taking on ledger writes, audit writes, and every other downstream concern synchronously.
That decision shaped the rest of the architecture.
The architecture I ended up with
I split the write path into a small event-driven pipeline:

`api-service` → `transaction_requests` → `processor-service` → `transaction_log` → `ledger-writer-service` / `audit-service`
That split gave each service one main responsibility:
- `api-service` → signed ingress + fast acceptance
- `processor-service` → validation + idempotency
- `ledger-writer-service` → durable ledger persistence
- `audit-service` → immutable audit history
What I liked about this structure was that each boundary had a clear reason to exist.
Main architecture and design decisions
1) Keep the API fast
The api-service just accepts the request and returns 202 Accepted.
I did this to keep the HTTP layer an intake boundary, rather than a place where the full transaction gets processed.
In HVTP, the ingress path is intentionally limited to:
- validate request shape
- verify signature
- publish to Kafka
- return acceptance
That means the API is not waiting on:
- Idempotency checks
- PostgreSQL ledger persistence
- MongoDB audit writes
- downstream webhook behavior
This was one of the most important decisions in the project because it kept the front door responsive even when downstream work had different timing characteristics.
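Condensed into plain Java, the ingress decision flow looks roughly like this. This is a sketch, not the project's code: the real `api-service` is a Spring Boot controller, and the helper names (`verifySignature`, `publishToKafka`) are illustrative stand-ins.

```java
// Sketch of the ingress flow: validate shape, verify signature,
// hand off to Kafka, return acceptance. Helper names are illustrative.
public class IngressSketch {

    static int accept(String payload, String signature) {
        if (payload == null || payload.isBlank()) {
            return 400;                      // request shape invalid
        }
        if (!verifySignature(payload, signature)) {
            return 401;                      // signature check failed
        }
        publishToKafka("transaction_requests", payload); // async handoff
        return 202;                          // accepted; no downstream waits
    }

    static boolean verifySignature(String payload, String signature) {
        // Placeholder: the real service verifies a cryptographic signature.
        return signature != null && !signature.isBlank();
    }

    static void publishToKafka(String topic, String payload) {
        // Placeholder: the real service publishes via a Kafka producer.
    }

    public static void main(String[] args) {
        System.out.println(accept("{\"amount\":100}", "sig")); // prints 202
        System.out.println(accept("", "sig"));                 // prints 400
    }
}
```

Note what is absent: no ledger write, no audit write, no idempotency lookup. Everything after the `publishToKafka` call happens on the other side of the queue.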
2) Use Kafka for decoupling
I used Kafka because I wanted request acceptance, transaction processing, ledger persistence, and audit persistence to move at different speeds without being tightly bound to one another.
HVTP currently uses:
- `transaction_requests`
- `transaction_log`
- dead-letter topics for failure paths
That gave me a few concrete benefits:
- the API can accept requests without waiting for downstream writes
- the ledger writer and audit service can scale independently
- replay becomes possible
- failure handling becomes clearer
I also used accountId as the key for the main topics.
That was deliberate.
For this project, the ordering boundary I cared about was not global ordering across every transaction.
It was preserving ordering for transactions belonging to the same account.
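The mechanism behind that choice: Kafka's default partitioner hashes the record key, so every record with the same key lands on the same partition, and a single partition is consumed in order. The stand-in below uses `String.hashCode` rather than Kafka's murmur2 hash, but it relies on the same property.

```java
// Why keying by accountId preserves per-account ordering: same key ->
// same partition -> consumed in order. Simplified stand-in for Kafka's
// default partitioner (which uses murmur2, not String.hashCode).
public class AccountKeying {

    static int partitionFor(String accountId, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (accountId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Two transactions for the same account always map to the same
        // partition, so they cannot be reordered relative to each other.
        System.out.println(partitionFor("acct-42", 12) == partitionFor("acct-42", 12));
    }
}
```

Different accounts spread across partitions, so throughput still scales; only transactions within one account are serialized.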
3) Treat idempotency as a correctness concern
To make the system idempotent in the face of retries, duplicate submissions, and consumer reprocessing, I used an idempotency key.
Each request sent from the client includes an Idempotency-Key.
Without it, processing the same request twice could result in:
- Duplicate ledger updates
- Duplicate audit events
- Inconsistent downstream outcomes
I used Valkey to store and check this idempotency key in the processor service.
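The core of that check is a set-if-absent: claim the key atomically, and only process the request if the claim succeeded. In HVTP this runs against Valkey (with a TTL on the key); the sketch below substitutes a `ConcurrentHashMap` so the logic is runnable on its own, without the TTL behavior.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the idempotency gate. A ConcurrentHashMap stands in
// for Valkey here; the real check is a TTL-backed set-if-absent.
public class IdempotencyGate {
    private final ConcurrentHashMap<String, Boolean> seen = new ConcurrentHashMap<>();

    /** Returns true only for the first submission of a given key. */
    boolean tryClaim(String idempotencyKey) {
        // putIfAbsent is atomic: exactly one caller sees null (the claim).
        return seen.putIfAbsent(idempotencyKey, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        IdempotencyGate gate = new IdempotencyGate();
        System.out.println(gate.tryClaim("key-123")); // true: process it
        System.out.println(gate.tryClaim("key-123")); // false: duplicate, skip
    }
}
```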
One of the most useful mindset shifts from this project was moving from:
“How do I process this request?”
to:
“What must remain true even if this request appears more than once?”
That question improved the architecture more than any individual framework decision.
4) Let PostgreSQL and MongoDB do different jobs
I used two stores intentionally because the write patterns and query needs are different.
PostgreSQL is the ledger
PostgreSQL stores the durable transaction state that the system can query through the status path.
It holds the queryable record of a transaction in a ledger-style structure, including fields like:
`transaction_id`, `idempotency_key`, `merchant_id`, `account_id`, `amount`, `currency`, `type`, `status`, `processed_at`
That is the durable store for the transaction state I want to query directly.
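For illustration, that row maps naturally onto a value type like the one below. The field names follow the list above; the Java types are my assumptions, since the actual schema is a PostgreSQL table, not this class.

```java
import java.math.BigDecimal;
import java.time.Instant;

// The ledger row sketched as a Java record. Types are assumptions;
// the real schema lives in PostgreSQL.
public record LedgerEntry(
        String transactionId,
        String idempotencyKey,
        String merchantId,
        String accountId,
        BigDecimal amount,
        String currency,
        String type,
        String status,
        Instant processedAt
) {}
```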
MongoDB is the audit trail
MongoDB stores immutable audit events, including values such as:
- transaction IDs
- merchant/account IDs
- correlation IDs
- statuses
- source topic
- timestamps
These stores answer different questions.
The ledger answers:
“What is the durable transaction state?”
The audit store answers:
“What happened around this transaction over time?”
Separating those concerns made the model cleaner and easier to reason about.
5) Design for replay and reconciliation
The ledger writer and audit service consume from the same event stream, but they write to different storage systems.
That means there is always some possibility of drift, timing gaps, or mismatched writes across stores.
So I added reconciliation support.
The project includes a reconciliation model that compares recent ledger and audit state and records summary runs like:
- audit count
- ledger count
- missing in ledger
- run status
- notes
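The comparison at the heart of a run like that is a set difference: which transaction IDs appear in the audit store for the window but are absent from the ledger. A sketch under that assumption (the real run also records a status and notes):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the reconciliation comparison: IDs present in the audit
// store but missing from the ledger over some window.
public class ReconciliationSketch {

    static Set<String> missingInLedger(Set<String> auditIds, Set<String> ledgerIds) {
        Set<String> missing = new HashSet<>(auditIds);
        missing.removeAll(ledgerIds);   // present in audit, absent in ledger
        return missing;
    }

    public static void main(String[] args) {
        Set<String> audit = Set.of("t1", "t2", "t3");
        Set<String> ledger = Set.of("t1", "t3");
        System.out.println(missingInLedger(audit, ledger)); // prints [t2]
    }
}
```

An empty result means the two stores agree for the window; a non-empty one names exactly which transactions need replay or investigation.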
I also wanted replay support to exist in the architecture before it became necessary.
That decision made the system feel more operationally realistic.
It shifted the design from “write to multiple places” toward “write, verify, and recover.”
6) Measure ingress behavior under overload
I also ran k6 load tests against the signed transaction ingestion endpoint at multiple offered rates, including 50K RPS and 100K RPS.
The purpose was not to describe the whole system as completing transactions at those rates end to end.
The goal was more specific:
“How does the ingress layer behave when offered far more traffic than the machine can sustain?”
That framing was important to me because it matched what I was actually measuring.
What the numbers showed
In local testing on a single machine, the API maintained:
- 0% HTTP failure rate
- 100% `202 Accepted` for completed HTTP requests
- accepted ingress throughput that leveled off around 3.1K–3.2K req/sec
A few highlights:
- At 1K offered RPS, it handled 60,001 accepted requests in 60s
- At 50K offered RPS, accepted throughput peaked at about 3,172.5 req/sec
- At 100K offered RPS, it still completed 189,936 accepted requests in 60s
- P95/P99 latency increased under overload, but the HTTP layer remained responsive
What I liked about that result was not the raw offered rate, but the saturation behavior.
The ingress layer stayed usable, throughput leveled off in a predictable way, and latency rose before failure.
That is a useful property in an asynchronous system.
The important caveat
These are HTTP ingress acceptance results, not end-to-end transaction completion metrics.
So the correct interpretation is:
- the API accepted the requests
- downstream completion happens asynchronously
- the numbers describe front-door behavior, not full workflow completion
For this project, that was the honest and useful performance story to tell.
Real implementation friction and subtle problems
The architecture diagram is the clean version.
Implementation is where the edge cases become visible.
1) 202 Accepted creates a visibility obligation
Returning 202 Accepted simplified the ingress path, but it also meant the system needed to answer follow-up questions such as:
- did it persist?
- did it fail?
- was it rejected?
- is it still in-flight?
That is why HVTP includes:
- a status endpoint
- correlation IDs
- downstream event flow for terminal outcomes and tracing
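One way to make those answers concrete is a small, closed set of states for the status endpoint to report. The names below are illustrative, not HVTP's actual enum; the point is that `202 Accepted` maps to an in-flight state, never to a final outcome.

```java
// An illustrative closed set of states for an asynchronously accepted
// request. Names are assumptions; "202 Accepted" corresponds to PENDING.
public enum TransactionStatus {
    PENDING,     // accepted at ingress, still in-flight downstream
    COMPLETED,   // ledger and audit writes finished
    FAILED,      // downstream processing failed (e.g. routed to a DLQ)
    REJECTED;    // failed validation or an idempotency conflict

    /** Terminal states need no further polling by the client. */
    public boolean isTerminal() {
        return this != PENDING;
    }
}
```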
Moving work out of the synchronous path reduced coupling, but it also increased the need for visibility.
2) Ordering had to be defined carefully
Early on, I had to be specific about what “ordering” meant in this system.
For HVTP, global ordering across all transactions was not the target.
Per-account ordering was the meaningful boundary.
That is why Kafka messages are keyed by accountId.
It gives the ordering guarantee I actually needed without forcing all traffic through one serialized path.
3) Multi-store systems introduce operational edges
Using PostgreSQL for ledger state and MongoDB for audit events was the right choice for this project.
It also meant I had to care whether both stores continued to reflect the same logical transaction stream.
That is why reconciliation became part of the design rather than an afterthought.
There was also a useful implementation lesson here: the Mongo mapping used for reconciliation has to stay aligned with the collection the audit service is actually writing to.
That kind of mismatch does not always fail loudly.
It can quietly reduce trust in operational checks.
4) Performance framing matters
Once I added the higher offered-RPS tests, I spent time thinking about how to describe the results precisely.
The more useful framing was not a headline number.
It was explaining what the tests actually demonstrated:
- the ingress layer remains stable under overload
- throughput saturates at a predictable point
- latency rises as load increases
- the asynchronous boundary protects the front door on this hardware
That framing is more useful because it stays aligned with what the measurements actually represent.
What changed in how I think
Before building this, I mostly thought about high throughput as a performance problem.
After building it, I think about it much more as a boundary design problem.
The question that stayed with me was not:
“How fast can one service go?”
It was:
“Where should work happen, where should it not happen, and what must remain true when parts of the system are delayed, retried, duplicated, or partially broken?”
That shift changed how I think about backend systems.
A few things became much clearer to me:
- Async systems need strong visibility
- Idempotency is part of the design, not just an implementation detail
- Storage choices should follow write semantics
- Graceful saturation is a useful success condition
- Good architecture is often about clean responsibility boundaries
One practical lesson from this project was that precision matters.
202 Accepted should mean something specific.
A benchmark should measure something specific.
And each service should have a clearly defined responsibility.
That mindset ended up being one of the most useful outcomes of the project.
Final takeaway
If I had to compress the whole project into one sentence, it would be this:
I built HVTP to practice designing a system that can accept load quickly while keeping correctness, separation of concerns, and recovery paths in view.
That is what this project gave me.
It helped me think more clearly about how to keep the front door fast, how to handle duplicates intentionally, how to separate durable state from audit history, and how to design for verification instead of assuming everything will always stay aligned.
For me, that was far more valuable than just assembling a stack.
Final thoughts
If you’ve built something in this space, I’d be genuinely interested in how you approached trade-offs around:
- `202 Accepted` vs synchronous confirmation
- Redis idempotency boundaries
- ledger vs audit store separation
- what you consider a useful throughput benchmark
Those design choices ended up being the most interesting part of the project for me.
If you want to explore the implementation, docs, and load tests, the full repo is here:
`kaustubh-26/high-volume-transaction-processor`: event-driven transaction processor with signed ingress, Kafka workflows, ledger persistence, audit storage, and webhook notifications.

A production-style, event-driven transaction pipeline showcasing signed API ingestion, asynchronous Kafka processing, Redis idempotency, PostgreSQL ledger writes, and MongoDB audit persistence.
The repository is structured like a small payment platform:
- signed transaction ingestion over HTTP
- asynchronous processing over Kafka
- Redis-backed idempotency protection
- ledger persistence in PostgreSQL
- immutable audit persistence in MongoDB
- dead-letter topics for failed records
- webhook notifications for transaction state changes
- Actuator and Prometheus endpoints on every service
What This Project Demonstrates
- Event-driven microservices with clearly separated write responsibilities
- Per-account ordering by using `accountId` as the Kafka message key
- Idempotency enforcement in the processor with Redis TTL-backed keys
- PostgreSQL as the ledger source of truth for persisted transactions
- MongoDB as an append-only audit store
- Reconciliation between the audit store and the ledger
- Replay support for rebuilding ledger state from `transaction_log`
- Signed ingress requests and API-key-protected status…

Top comments (1)
the decision to keep the API path thin and push complexity downstream is underrated. most early-stage systems do the opposite, synchronously write to everything in the request handler, and then wonder why latency spikes under load.
the idempotency key treatment stood out too. treating it as a correctness concern rather than a best-effort retry mechanism is a meaningful distinction. once you model it that way, the architecture shifts toward asking "what does processing this twice actually break?" which surfaces failure modes early.
curious how you handle visualization of the event flows as the system evolves. the architecture diagram looks clean, but keeping those updated as services drift is a common pain point.