Let’s cut to the chase.
I’m working on a banking application made up of 10+ microservices, each owning its own schema and data domain. Some real-world challenges kept repeating:
- ❓ How do I keep queries blazing fast without hammering my transactional database?
- ❓ How do I fetch nested data (like user → accounts → cards) without orchestrating 3+ services?
- ❓ How do I scale reads and writes independently?
That’s when I decided to lean fully into CQRS — with PostgreSQL for writes, MongoDB for reads, and the real magic: Debezium + Kafka to wire it all together. ⚡
🏦 The Setup: A Modular Banking System
Here’s a simplified breakdown:
✅ User Service
- Tables: `user`, `user_metadata`, `user_address`, `user_kyc`
✅ Account Service
- Tables: `account`, `account_metadata`
- Types: savings, loan, FD, credit
✅ Card Service
- Tables: `cards`, `card_controls`
- Types: debit, credit, forex
Each service uses PostgreSQL as its transactional store and exposes APIs for CRUD operations.
🔍 The Query Headache
Here are the actual use cases I wanted to support:
- Get all accounts for a user
- Get all cards for an account
- Get all accounts and cards for a user in one shot
Sounds simple — but across microservices, this turns into a mess of joins, API calls, and orchestration.
💡 The CQRS Solution
Here’s the architectural shift I made:
- ✅ Writes go to PostgreSQL (normalized, ACID-compliant)
- ✅ Reads happen via MongoDB (denormalized, query-optimized)
- 🔄 Changes from Postgres are synced in real-time using Debezium → Kafka → Mongo
This lets me build denormalized Mongo documents like:
```json
{
  "userId": "u123",
  "name": "Alice",
  "accounts": [
    {
      "accountId": "a456",
      "type": "Savings",
      "cards": [
        {
          "cardId": "c789",
          "type": "Debit",
          "limit": 50000
        }
      ]
    }
  ]
}
```
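On the read side, each of the query headaches above becomes a single document lookup. Here's a minimal sketch, assuming Spring Data MongoDB; the `user_profiles` collection and all class names are hypothetical, simply mirroring the JSON above:

```java
import java.util.List;
import java.util.Optional;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;

// Denormalized read model mirroring the JSON document above (names are hypothetical).
@Document("user_profiles")
class UserProfileDocument {
    @Id String userId;
    String name;
    List<AccountView> accounts;

    static class AccountView {
        String accountId;
        String type;
        List<CardView> cards;
    }

    static class CardView {
        String cardId;
        String type;
        long limit;
    }
}

// One lookup replaces the user -> accounts -> cards orchestration across services.
interface UserProfileRepository extends MongoRepository<UserProfileDocument, String> {
    Optional<UserProfileDocument> findByUserId(String userId);
}
```

A single `findByUserId` call now answers "get all accounts and cards for a user in one shot" without touching the other services.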
🧪 Implementation: Two Approaches I Tried
❌ Option 1: JPA Entity Listeners
Use Spring JPA lifecycle hooks like `@PostPersist` and `@PostUpdate` to listen for changes:
- Capture the `before` and `after` states of the entity
- Push updates to Kafka or any message queue
- A downstream service listens and updates the Mongo read model (a sketch follows below)
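For illustration, here's a minimal sketch of the listener approach, assuming Spring Kafka; the `entity-changes` topic and the static wiring are hypothetical. Note that JPA instantiates entity listeners itself, so even getting a `KafkaTemplate` into one requires extra plumbing:

```java
import jakarta.persistence.PostPersist;
import jakarta.persistence.PostUpdate;
import org.springframework.kafka.core.KafkaTemplate;

// Attached to each entity with @EntityListeners(ChangePublishingListener.class).
public class ChangePublishingListener {

    // JPA creates listener instances itself, so the template has to be
    // handed in statically from Spring configuration at startup.
    private static KafkaTemplate<String, Object> kafkaTemplate;

    public static void setKafkaTemplate(KafkaTemplate<String, Object> template) {
        kafkaTemplate = template;
    }

    @PostPersist
    @PostUpdate
    public void onChange(Object entity) {
        // Publish the changed entity; a downstream service updates the Mongo read model.
        kafkaTemplate.send("entity-changes", entity);
    }
}
```

Multiply this wiring by every entity in every service, and the problems below follow quickly.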
📉 Why I Moved Away:
- Too tightly coupled with business logic
- Doesn’t capture raw DB writes (e.g., SQL scripts or tools)
- Hard to maintain consistency across services
- Requires manual plumbing for every entity
✅ Option 2: Debezium + Kafka (Deep Dive)
Debezium is a CDC (Change Data Capture) tool that listens to changes in your database (like INSERTs, UPDATEs, DELETEs) and publishes those changes to Kafka topics in real time. Your services can then consume these topics and build query-optimized read models (e.g., in MongoDB).
🔍 How It Works — Step by Step
1. **PostgreSQL emits changes to the WAL (Write-Ahead Log)**
   - WAL logs are internal logs that track every change made to the DB.
   - They are designed for crash recovery, but Debezium uses them for streaming.
2. **A replication slot is created in PostgreSQL**
   - Debezium uses this slot to track the WAL changes.
   - Think of it as a pointer to the latest event — so Debezium knows what to stream next.
3. **Debezium connects to PostgreSQL via logical decoding**
   - Logical decoding turns WAL logs into human-readable change events.
   - Debezium uses this to get structured data like `{before, after}`.
4. **Debezium pushes changes to Kafka**
   - Each table you're monitoring emits to its own Kafka topic.
   - Format: `serverName.schema.table`, e.g., `banking_app.public.user`.
5. **Kafka consumers (like your Mongo Writer Service) consume the events and store them in the read DB** (see the sketch after this list).
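Here's a minimal sketch of step 5, assuming Spring Kafka with String-deserialized records and Spring Data MongoDB; the group id, collection name, and column names (`id`, `name`) are assumptions based on the schema above:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

// Hypothetical Mongo Writer Service: consumes Debezium change events
// and upserts them into the denormalized read model.
@Service
public class UserChangeConsumer {

    private final MongoTemplate mongo;
    private final ObjectMapper mapper = new ObjectMapper();

    public UserChangeConsumer(MongoTemplate mongo) {
        this.mongo = mongo;
    }

    // Topic name follows Debezium's serverName.schema.table convention.
    @KafkaListener(topics = "banking_app.public.user", groupId = "mongo-writer")
    public void onUserChange(String message) throws Exception {
        JsonNode row = mapper.readTree(message);
        // With JsonConverter's schema envelope enabled, the row sits under "payload";
        // the ExtractNewRecordState transform has already flattened it to the "after" state.
        if (row.has("payload")) {
            row = row.get("payload");
        }
        // Upsert so inserts and updates take the same path.
        Query byId = new Query(Criteria.where("userId").is(row.get("id").asText()));
        Update update = new Update().set("name", row.get("name").asText());
        mongo.upsert(byId, update, "user_profiles");
    }
}
```

Consumers for the account and card topics would update the nested `accounts` and `cards` arrays of the same document in the same way.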
🧠 Key Concepts
🔌 PostgreSQL Replication Slot
A replication slot keeps track of how much of the WAL has been read by Debezium. It ensures:
- No WAL segments are deleted until they've been consumed
- Debezium doesn’t miss any change
- Only one logical consumer per slot
📌 Important: Don’t forget to drop the slot if you stop using Debezium, or your disk will fill up with old WAL logs!
```sql
-- View existing slots
SELECT * FROM pg_replication_slots;

-- Create one manually (Debezium normally creates this for you)
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'pgoutput');

-- Drop a slot you no longer need, so old WAL segments can be reclaimed
SELECT pg_drop_replication_slot('my_slot');
```
🔁 Streaming Changes from PostgreSQL to Kafka
With the replication slot in place, Debezium streams real-time changes from PostgreSQL into Apache Kafka, making it easy to sync data into systems like MongoDB, Elasticsearch, or downstream microservices.
📁 Debezium Kafka Connector Config (PostgreSQL)
Here's a sample Kafka Connect configuration for Debezium to monitor changes in a PostgreSQL database:
```json
{
  "name": "banking-postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.dbname": "banking",
    "database.server.name": "banking_app",
    "table.include.list": "public.user,public.account,public.cards",
    "include.schema.changes": "false",
    "slot.name": "banking_slot",
    "tombstones.on.delete": "false",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter"
  }
}
```
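To deploy it, POST this JSON to the Kafka Connect REST API (the `/connectors` endpoint, port 8083 by default). Debezium then creates the `banking_slot` replication slot and starts streaming changes from the three tables.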