Gowtham Potureddi

Posted on Jul 4

Debezium Deep Dive: Connectors, Snapshot, Schema History, Outbox Pattern

#python #sql #interview #dataengineering

debezium is the change data capture engine that quietly powers most of the modern "database-to-Kafka" plumbing in production data platforms — and the single tool senior data engineers get asked about most when the interviewer wants to know whether you have shipped a real streaming pipeline or only watched a demo. A debezium connector reads the write-ahead log of a source database (Postgres WAL, MySQL binlog, MongoDB oplog, SQL Server CDC tables) and emits a per-row change event stream onto Kafka — not by polling a updated_at column, not by trigger scripts, but by tailing the transaction log the database already writes to for crash recovery. That single architectural choice is why Debezium became the CDC default and why the interview questions on it converge on the same four axes: which connector class for which source, which debezium snapshot mode for which restart requirement, how the debezium schema history topic keeps the pipeline honest across DDL, and how the debezium outbox pattern solves the dual-write problem that every distributed team eventually hits.

This guide is the senior-DE walkthrough you wished existed the first time an interviewer asked "walk me through what happens when a Debezium connector restarts on a table with snapshot.mode = when_needed" or "how does incremental snapshotting actually work — what is the signal table for?" or "your app writes to Postgres and publishes to Kafka in the same request handler and one of the writes fails; design the fix using the outbox pattern and the EventRouter SMT." It walks through the connector-class map across Postgres, MySQL, Mongo, and SQL Server, the six snapshot modes and the 2.x incremental snapshot mechanism, the schema-history topic architecture and its Schema Registry integration, the outbox table + SMT design that gives you exactly-once semantics for business events, and the production hygiene — tombstones, dead-letter queues, offset recovery, at-least-once vs exactly-once — that separates a demo from a cdc kafka backbone. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.

When you want hands-on reps immediately after reading, drill the streaming practice library →, rehearse on the ETL practice library →, and sharpen the design axis with the SQL practice library →.

On this page

Why Debezium became the CDC default
Connectors — Postgres, MySQL, Mongo, SQL Server
Snapshot modes + incremental snapshotting
Schema history + schema evolution
Outbox pattern + production hygiene
Cheat sheet — Debezium recipes
Frequently asked questions
Practice on PipeCode

1. Why Debezium became the CDC default

Log-based CDC on Kafka Connect — the two architectural choices that made every other CDC tool a niche

The one-sentence invariant: Debezium is a set of Kafka Connect source connectors that tail the write-ahead log of a source database and emit per-row change events with a stable JSON or Avro envelope, at-least-once by default, with resumable offsets stored in Kafka itself. Everything else about it — six snapshot modes, the schema-history topic, the outbox pattern, the incremental snapshot signal table, the tombstone-on-delete convention — flows from those two architectural choices: log-based reading (not polling) and Kafka Connect (not a bespoke runtime). Trigger-based CDC systems (custom pg_trigger fanning into a shadow table) buckle at load; polling systems (SELECT ... WHERE updated_at > :last_run) miss deletes and race with soft-delete semantics; message-queue systems require dual writes that are famously not atomic. Log-based CDC is the only design that gives you every insert, every update, every delete, in commit order, with no schema-invasive changes to the source database.

The four "must-answer" axes interviewers actually probe.

Connector class. Postgres uses io.debezium.connector.postgresql.PostgresConnector with a plugin.name of pgoutput (built-in) or wal2json (legacy). MySQL uses io.debezium.connector.mysql.MySqlConnector with binlog_format = ROW and preferably gtid_mode = ON. Mongo uses io.debezium.connector.mongodb.MongoDbConnector on top of change streams (oplog on older setups). SQL Server uses io.debezium.connector.sqlserver.SqlServerConnector on top of SQL Server Agent's CDC tables. Naming the right connector class per source is the first senior-DE signal.
Snapshot mode. initial, always, never, initial_only, when_needed, schema_only — six modes that decide what happens when a connector starts against a table for the first time and what happens when it restarts after a broken offset. Wrong mode = duplicate events, missed events, or a snapshot that runs for six hours during a routine restart. Knowing the mode-matrix by heart separates senior from mid.
Schema history. DDL against the source (ALTER TABLE, CREATE TABLE) must be replayable when the connector restarts, otherwise the connector cannot decode WAL records that were written under an older schema. Debezium persists every observed DDL statement to a dedicated schema.history.internal.kafka.topic — a compacted, single-partition, retention-forever topic that is replayed on every connector restart. Deleting this topic breaks the connector.
Outbox. The dual-write problem — writing to your business table and publishing to Kafka in the same request handler is not atomic unless you use the outbox pattern. Debezium ships an EventRouter SMT that transforms outbox-table events into aggregate-routed topic messages, giving you a single-transaction publish semantics on top of at-least-once CDC.

The 2026 reality — what Debezium looks like today.

Debezium 2.x. Renamed configs (database.hostname → database.hostname still, but many database.* moved to topic.prefix + connector-specific namespaces), added incremental snapshotting driven by a signal table, added Debezium Server (Quarkus-based standalone mode) for teams that don't want Kafka Connect.
Connect mode vs Server mode. Kafka Connect mode runs the connector as a task inside a Connect worker — the mainstream deployment. Debezium Server runs the same connector code as a standalone JVM that pushes to Kafka, Pulsar, Kinesis, Pub/Sub, EventHubs, or Redis Streams — used when the team wants CDC without operating Connect.
Incremental snapshotting. Instead of one giant blocking snapshot at connector startup, incremental snapshots let you add a new table (or re-snapshot an existing one) chunk-by-chunk while live streaming continues. Chunks are triggered by inserting rows into a debezium_signal control table.
Signal table. The debezium_signal table on the source database is the control channel — inserting a JSON payload ({"type": "execute-snapshot", "data": {"data-collections": ["public.orders"]}}) triggers an incremental snapshot without any connector restart.

What interviewers listen for.

Do you say "tails the WAL / binlog" in the first sentence when asked how Debezium works? — senior signal.
Do you mention the schema-history topic as a distinct topic that must never be deleted? — required answer.
Do you distinguish snapshot vs streaming phases and name the six snapshot modes? — required answer.
Do you cite the outbox pattern as the solution to dual-write and name the EventRouter SMT? — senior signal.
Do you know incremental snapshotting is signal-table driven and does not require a restart? — senior signal for 2.x-era candidates.

What the "at-least-once by default" clause actually means.

Offsets are stored in Kafka. Debezium writes the WAL / binlog position to a special __debezium-connect-offsets topic on every successful poll. On restart, the connector resumes from the last committed offset.
The gap between "processed" and "committed offset". Between reading a WAL record and committing the offset back to Kafka, the connector can crash. On restart, it re-reads the same WAL records — duplicates are possible.
Downstream must be idempotent. The consumer of the CDC topic (a warehouse loader, a materialised view refresher, an event handler) must handle duplicates via idempotency keys, upserts, or a deduplication window.
Exactly-once is possible but expensive. Debezium 2.x supports Kafka Connect's transactional producer semantics, which give you exactly-once from the connector to Kafka. The database-to-connector step is still at-least-once because WAL replay is idempotent only at the record level, not at the transaction level. True end-to-end exactly-once requires idempotent downstream consumers regardless.

Common interview probes on Debezium fundamentals.

"How does Debezium know where to resume after a crash?" — Kafka Connect offset topic, WAL/binlog position stored per source.
"What's the difference between snapshot and streaming?" — snapshot reads current rows via SELECT; streaming tails the log. Snapshot happens once (per mode); streaming is continuous.
"Do I need Kafka to run Debezium?" — Kafka Connect mode does; Debezium Server does not (it can push to Pulsar, Kinesis, Pub/Sub, EventHubs, Redis Streams).
"Why is the schema-history topic single-partition?" — DDL events must be totally ordered; a single partition gives you total order on replay.
"Why compacted?" — you only need the latest schema version for each table; older versions can be compacted away as long as the compacted record retains the schema mutation history.

Worked example — the log-vs-poll trade-off

Detailed explanation. The textbook naive approach: an analytics team implements CDC via a polling job that runs SELECT * FROM orders WHERE updated_at > :last_run every 30 seconds. Six weeks later they discover the pipeline missed 12,000 deleted rows, dropped 300 updates that landed inside the polling window, and duplicated 800 records that were touched twice between polls. Walk an interviewer through why log-based CDC (Debezium) is the correct answer and what the log actually captures that polling cannot.

The symptom. Downstream warehouse count diverges from source count by ~1% per week and slowly worsens over time.
The naive fix. Reduce polling interval from 30 s to 5 s. Divergence rate drops but does not disappear; database CPU load doubles.
The real bug. Polling on updated_at cannot see deletes (unless soft-delete), misses rows updated inside the window (last-write-wins with the polling window), and duplicates rows updated at the window boundary.
The Debezium answer. WAL/binlog captures every insert, update, and delete with their transaction commit order. The connector reads a byte offset into the log; missed windows are impossible by construction.

Question. A team has 5 TB of Postgres data and wants to ship every row change to Kafka for a downstream warehouse. Compare polling CDC (SELECT ... WHERE updated_at) against log-based CDC (Debezium) across correctness, latency, source load, and operational surface. Recommend the design and justify.

Input.

Requirement	Polling CDC	Log-based CDC (Debezium)
Data size	5 TB	5 TB
Change rate	50k rows/sec peak	50k rows/sec peak
Latency target	30 s	5 s
Schema change frequency	1–2 per week	1–2 per week
Source database	Postgres 16	Postgres 16

Code.

-- Polling CDC — the naive approach (broken)
-- Runs every 30 seconds from an external scheduler
SELECT id,
       tenant_id,
       total_cents,
       status,
       updated_at
FROM   orders
WHERE  updated_at > :last_run
ORDER  BY updated_at;

-- Problems:
--   (a) DELETE is invisible (no row to select)
--   (b) UPDATE-then-UPDATE inside window shows only latest state
--   (c) Rows landing at exactly :last_run are duplicated or dropped
--   (d) Full-table scan every 30 s at high change rate = huge cost

{
  "name": "orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "***",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "slot.name": "dbz_slot",
    "table.include.list": "public.orders",
    "snapshot.mode": "initial",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.shop",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter"
  }
}

Step-by-step explanation.

The polling query on updated_at is fundamentally lossy. Deletes are invisible (the row is gone; there is nothing to select). Rows touched twice inside the 30-second window collapse to their latest state — the intermediate state is lost. Rows updated exactly at the window boundary either duplicate (>=) or drop (>) depending on the operator, and clock skew between the app server and the database makes the boundary fuzzy.
The Debezium config connects to Postgres via logical replication (plugin.name = pgoutput), creates a replication slot (dbz_slot), and creates a publication (dbz_publication). The connector then reads WAL records for the tables listed in publication.name. Every INSERT, UPDATE, DELETE — including transaction boundaries — arrives in commit order.
Latency: WAL records are read as soon as they are flushed by Postgres. Debezium sees an INSERT within milliseconds of COMMIT; the Kafka message lands within another few milliseconds. Polling CDC has a worst-case latency of the polling interval + query duration + downstream commit — typically 45+ seconds for a 30 s window.
Source load: polling runs an index scan or full-table scan every 30 s; at 50k rows/sec of changes over a 5 TB table this is meaningful CPU and IO. Debezium reads the WAL sequentially, imposing near-zero additional CPU on the source (the WAL is already being written for crash recovery).
The recommendation is Debezium with snapshot.mode = initial. The initial snapshot backfills the 5 TB current state; the streaming phase continuously tails WAL. Correctness is exact (every change captured in commit order); latency is sub-second; source load is negligible; operational overhead is the Connect cluster + the schema-history topic — both one-time setups.

Output.

Axis	Polling CDC	Log-based CDC (Debezium)
Captures INSERTs	yes	yes
Captures UPDATEs	last-write-wins in window	every one, in order
Captures DELETEs	no (unless soft delete)	yes
Latency (steady state)	30–60 s	< 1 s
Source CPU	full scan every 30 s	~0 (WAL already written)
Divergence rate	0.5–2% per week	0% (barring bugs)

Rule of thumb. For any production CDC pipeline where correctness matters more than "we shipped something," pick log-based CDC on day one. Polling CDC is a valid pattern only when the source is a legacy system that does not expose a log — and even then, plan the migration to log-based.

Worked example — Debezium versus a bespoke trigger-based CDC

Detailed explanation. A different anti-pattern: instead of polling, a team writes Postgres AFTER INSERT/UPDATE/DELETE triggers on every business table. Each trigger inserts a row into a shadow cdc_events table. An external cron job reads cdc_events and pushes to Kafka. This works — until write throughput doubles, at which point the trigger overhead becomes the bottleneck and the shadow table becomes a hot spot. Debezium is the log-based answer that gets you the same semantics without the trigger overhead.

Trigger overhead. Every INSERT/UPDATE/DELETE on a business table now runs an extra INSERT into cdc_events. Postgres executes trigger code inside the same transaction; commit latency and lock hold time both increase.
Shadow-table hot spot. All triggers write to one table. At high concurrency the cdc_events write becomes the tail-latency bottleneck; b-tree contention on the primary key index causes lock waits.
Cleanup. The shadow table grows without bound unless an external job prunes it. The prune job is another moving part.
Debezium answer. The WAL captures everything the triggers would capture, without any additional work on the source database.

Question. A trigger-based CDC pipeline is causing tail-latency regressions on the source database. Design the migration to Debezium: what runs during the migration, how do you validate no events are missed, and what is the cutover procedure?

Input.

Component	Value
Source	Postgres 16, 10 TB
Trigger-based CDC	Shadow table + cron push to Kafka
Target	Debezium PostgresConnector on Kafka Connect
Downstream consumers	3 (warehouse loader, cache invalidator, search indexer)
Cutover window	30-minute maintenance window acceptable

Code.

-- Old — trigger-based CDC (the incumbent)
CREATE TABLE cdc_events (
  id BIGSERIAL PRIMARY KEY,
  table_name TEXT NOT NULL,
  op CHAR(1) NOT NULL,       -- 'I', 'U', 'D'
  row_data JSONB NOT NULL,
  ts TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE FUNCTION emit_cdc() RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO cdc_events (table_name, op, row_data)
  VALUES (TG_TABLE_NAME, LEFT(TG_OP, 1), to_jsonb(NEW));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_cdc AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION emit_cdc();

{
  "name": "orders-shadow-cdc-migration",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "slot.name": "dbz_slot",
    "table.include.list": "public.orders,public.line_items,public.payments",
    "snapshot.mode": "initial",
    "signal.data.collection": "public.debezium_signal",
    "topic.creation.default.replication.factor": "3",
    "topic.creation.default.partitions": "12"
  }
}

Step-by-step explanation.

Stand up Debezium in parallel with the trigger-based system. The initial snapshot backfills current state for the three tables; the streaming phase begins tailing WAL. Both pipelines are live simultaneously — Debezium writes to new topics (shop.public.orders, shop.public.line_items, shop.public.payments), the trigger system writes to the legacy topic.
Validate no events are missed by running a comparison job: for a 10-minute sample, count events per (table, op) in the Debezium topic and in the trigger-based topic. The counts should match within a rounding tolerance for concurrent transactions. Any systematic gap points to a Debezium config issue (usually publication.name missing a table).
Migrate downstream consumers one at a time. The warehouse loader is the easiest — flip its bootstrap-topic list from legacy to new; because both streams contain the same commit-ordered events, the loader's watermark advances the same way. Repeat for the cache invalidator and the search indexer. Each consumer migration is a config change, not a code change.
Once all three consumers are on the Debezium stream, drop the triggers and the cron job. The cdc_events shadow table can be truncated and eventually dropped after a two-week grace period.
Measure the source-side impact after cutover: p99 commit latency drops by 15–40% (no more trigger overhead); lock contention on cdc_events disappears; the source database CPU steady-state drops by 5–10%.

Output.

Metric	Trigger-based CDC	Debezium
Trigger overhead on source	15–40% commit latency	0% (WAL already written)
Shadow-table lock contention	high	none
Latency (event → Kafka)	5–30 s (cron interval)	< 1 s
Missed events on restart	possible (offset in shadow table)	0 (offset in Kafka)
Operational surface	triggers + shadow table + cron + prune	Connect cluster + schema-history topic

Rule of thumb. Trigger-based CDC is a legitimate stopgap when the team cannot yet operate Kafka Connect. But once Kafka is in the stack, Debezium is unambiguously the better answer — same semantics, no source-side overhead, no shadow-table hot spot.

Worked example — Debezium Server vs Kafka Connect mode

Detailed explanation. A common design question: "we want CDC into Pub/Sub, not Kafka; can we still use Debezium?" The answer is yes, via Debezium Server — a standalone Quarkus-based runtime that runs the same connector code but ships events to a pluggable sink instead of the Kafka Connect offset topic. Debezium Server is what makes Debezium the CDC engine of choice even for teams that don't operate Kafka.

Connect mode. Debezium runs as a Kafka Connect connector task. Requires an operating Connect cluster (2+ workers, ZooKeeper or KRaft, offset/config/status topics). The mainstream, most-battle-tested deployment.
Server mode. Debezium runs as a standalone JVM. Same connector code, same snapshot mechanics, same schema-history semantics — but the sink is pluggable (Pulsar, Kinesis, Pub/Sub, EventHubs, Redis Streams, HTTP, or Kafka via a simpler producer).
The trade. Connect mode is more operationally standard for Kafka shops. Server mode is significantly simpler to deploy for teams that don't need Connect for anything else.

Question. A team wants CDC from Postgres to Google Cloud Pub/Sub but does not run Kafka. Design the deployment using Debezium Server, and compare the operational footprint against running Kafka Connect just for this.

Input.

Component	Value
Source	Postgres 16 on GCP Cloud SQL
Sink	Google Cloud Pub/Sub
Team's current Kafka experience	none
Operational preference	minimal moving parts

Code.

# debezium-server config — application.properties equivalent
debezium.sink.type=pubsub
debezium.sink.pubsub.project.id=my-gcp-project
debezium.sink.pubsub.address=pubsub.googleapis.com:443

debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=10.0.0.5
debezium.source.database.port=5432
debezium.source.database.user=debezium
debezium.source.database.password=${SECRET_PW}
debezium.source.database.dbname=shop
debezium.source.topic.prefix=shop
debezium.source.plugin.name=pgoutput
debezium.source.publication.name=dbz_publication
debezium.source.slot.name=dbz_slot
debezium.source.table.include.list=public.orders,public.customers

debezium.source.offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
debezium.source.offset.storage.file.filename=/var/lib/debezium/offsets.dat
debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
debezium.source.schema.history.internal.file.filename=/var/lib/debezium/schema-history.dat

# k8s deployment for debezium-server
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: debezium-server
spec:
  serviceName: debezium
  replicas: 1
  template:
    spec:
      containers:
        - name: server
          image: debezium/server:2.6
          volumeMounts:
            - name: state
              mountPath: /var/lib/debezium
            - name: config
              mountPath: /debezium/conf
  volumeClaimTemplates:
    - metadata:
        name: state
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi

Step-by-step explanation.

Debezium Server runs as a single JVM inside a StatefulSet. The StatefulSet gives it stable identity (debezium-server-0) and a stable persistent volume for offset and schema-history storage. Unlike Connect mode (where offsets live in a Kafka topic), Server mode's offsets live on local disk — the volume must be durable.
The debezium.sink.type = pubsub line swaps the Kafka producer for a Pub/Sub publisher. The source-side code is identical to Connect mode — same WAL reader, same snapshot machinery, same JSON envelope.
Offset and schema-history storage moves from Kafka topics to local files. This is why StatefulSet + persistent volume is required — if the pod restarts and the volume is lost, the connector cannot resume its position and re-snapshots everything.
Operational footprint: one JVM, one PVC, one Pub/Sub topic. No Connect cluster, no ZooKeeper/KRaft, no ancillary Kafka topics. For a team that already runs Kubernetes but does not run Kafka, this is a 10× simpler deployment.
The trade-off: no cross-connector metadata topic sharing (each Debezium Server is isolated), no built-in horizontal scaling (add a second server with a different slot.name if you need throughput), and slightly less community tooling than Connect mode. For most single-source CDC pipelines this is a perfectly reasonable trade.

Output.

Concern	Kafka Connect mode	Debezium Server mode
Runtime	Connect worker cluster	Single JVM (StatefulSet)
Offset storage	Kafka topic	Local file (PVC)
Schema history storage	Kafka topic	Local file (PVC)
Sinks supported	Kafka only	Kafka, Pulsar, Kinesis, Pub/Sub, EventHubs, Redis Streams, HTTP
Team requirements	operates Kafka Connect	operates a JVM + PVC
Best for	Kafka-native shops	non-Kafka sinks or Kafka-averse teams

Rule of thumb. If Kafka is already in the stack, use Connect mode — the operational patterns are standard. If Kafka is not in the stack and you don't want to add it just for CDC, Debezium Server is the answer. The connector code and semantics are identical; the runtime is the choice.

Senior interview question on the Debezium architecture choice

A senior interviewer often opens with: "Your team runs Postgres and needs to ship every row change to Snowflake within 5 seconds of commit. Walk me through why you'd pick Debezium over polling, over triggers, and over a Snowflake-native CDC connector, and what your reference deployment would look like end-to-end."

Solution Using Debezium on Kafka Connect with a Snowflake sink

{
  "name": "orders-to-snowflake-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "***",
    "database.dbname": "shop",
    "topic.prefix": "shop.cdc",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "slot.name": "dbz_slot",
    "table.include.list": "public.orders,public.line_items,public.customers,public.payments",
    "snapshot.mode": "initial",
    "snapshot.locking.mode": "none",
    "signal.data.collection": "public.debezium_signal",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.shop",
    "topic.creation.default.replication.factor": "3",
    "topic.creation.default.partitions": "12",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.delete.handling.mode": "rewrite"
  }
}

Step-by-step trace.

Layer	Component	Latency budget
Source	Postgres commit → WAL flush	~0 (already required)
Read	Debezium tails WAL via pgoutput	~50 ms
Envelope	Convert to Avro, wrap in envelope	~5 ms
Publish	Producer send to Kafka (acks=all)	~20 ms
Consume	Snowflake Kafka Connector (batch)	~2 s
Load	Snowpipe Streaming ingest	~1 s
Total	Postgres commit → Snowflake table	~3.1 s (target: 5 s ✓)

After the rollout, the pipeline delivers row-level changes into Snowflake within 3 seconds p99 and 5 seconds p999. The initial snapshot takes ~2 hours for the 10 TB source (bounded by network + Snowflake load rate); the streaming phase then runs indefinitely at ~50k events/sec steady state.

Output:

Metric	Polling CDC	Trigger CDC	Debezium + Snowflake sink
Latency p99	30–60 s	5–30 s	3 s
Source CPU overhead	10–20%	15–40%	~0%
Correctness	lossy	tight	exact
Operational surface	scheduler + query	triggers + shadow + cron	Connect + schema-history + registry
SLA achievable	30 s	15 s	5 s

Why this works — concept by concept:

Log-based CDC — the WAL captures every INSERT/UPDATE/DELETE in commit order with zero additional source-side work. Postgres is already writing the WAL for crash recovery; Debezium is a free rider.
Kafka Connect — offset management, retries, dead-letter queue, JMX metrics, REST admin — all inherited from the Connect framework. You do not write CDC infrastructure; you configure it.
ExtractNewRecordState SMT — the raw Debezium envelope has a before / after / op / source wrapper; downstream sinks usually want just the new row. The unwrap transform flattens the envelope to the row payload. delete.handling.mode = rewrite preserves DELETEs as a payload with a __deleted = true marker (versus dropping them silently).
Avro + Schema Registry — Avro gives you compact binary encoding + forward/backward schema evolution rules. The Schema Registry (Confluent or Apicurio) enforces compatibility on every write, so a bad schema change is caught at the connector, not at the downstream consumer.
Cost — Debezium adds one Connect cluster, one schema-history topic (compacted, single-partition, low volume), and one Schema Registry (which many teams already run for other reasons). Steady-state CPU cost on the source database is O(WAL flush rate), which is bounded by the transaction volume — the same rate Postgres already handles.

Streaming
Topic — streaming
Streaming CDC and change-log problems

Practice →

ETL Topic — etl ETL pipeline design with log-based CDC

Practice →

2. Connectors — Postgres, MySQL, Mongo, SQL Server

One connector JAR per source engine — `pgoutput`, `binlog`, change stream, and CDC tables in a single mental map

The mental model in one line: Debezium ships one connector class per source engine — PostgresConnector reads WAL via pgoutput (or the legacy wal2json), MySqlConnector reads the binlog with GTID tracking, MongoDbConnector consumes change streams (oplog on older setups), and SqlServerConnector reads the CT/CDC tables populated by SQL Server Agent — and each class has its own permission model, snapshot mechanics, and schema evolution rules that the senior candidate is expected to know by heart.

The four axes interviewers actually probe for connector questions.

Log-reading mechanism. WAL for Postgres, binlog for MySQL, change stream for Mongo, CDC tables for SQL Server. Each mechanism has a distinct on-disk format and access pattern; naming the right one is the first senior signal.
Source-side configuration. Postgres needs wal_level = logical + a replication slot + a publication. MySQL needs binlog_format = ROW + binlog_row_image = FULL + gtid_mode = ON. Mongo needs the replica set to expose change streams (default on modern Mongo). SQL Server needs EXEC sys.sp_cdc_enable_db and per-table sys.sp_cdc_enable_table.
Permission model. Postgres needs a role with REPLICATION attribute. MySQL needs REPLICATION SLAVE, REPLICATION CLIENT on the connector user. Mongo needs a role with the changeStream privilege (or a superuser role). SQL Server needs db_owner (or fine-grained CDC roles) to read the CDC tables.
Snapshot mechanics. Each connector class implements the snapshot phase differently — Postgres uses a repeatable-read transaction + SELECT; MySQL uses FLUSH TABLES WITH READ LOCK (or --single-transaction equivalent); Mongo uses a fullDocument change-stream lookup; SQL Server uses a SELECT against the source table plus a bookmark against the CDC bookmark table.

Postgres — the reference connector.

Log-reading mechanism. WAL via logical replication. The connector opens a replication slot (slot.name) and consumes decoded WAL records via a decoding plugin.
Plugin choice. pgoutput is built into Postgres 10+ and is the default. wal2json and decoderbufs are legacy plugins used on older versions or when JSON-format decoding is preferred at the source. Modern deployments always pick pgoutput.
Source-side config. Set wal_level = logical, max_replication_slots >= 10 (headroom), max_wal_senders >= 10, wal_sender_timeout = 60s (or as required). Create a PUBLICATION listing the tables to be tracked; the connector's publication.name must match.
REPLICA IDENTITY. For UPDATE and DELETE events, Postgres needs to know which columns identify the row. Default is DEFAULT (primary key). Set REPLICA IDENTITY FULL on tables without a primary key or where the connector should emit all old-column values on UPDATE/DELETE (useful for downstream diff logic).
Slot warning. A replication slot holds back WAL cleanup as long as it exists. A stale slot (connector down for days) can fill the WAL disk. Set alerting on pg_replication_slots.confirmed_flush_lsn lag.

MySQL — binlog with GTID tracking.

Log-reading mechanism. Binary log (binlog). The connector reads binlog events by connecting as a replica and requesting binlog events from a specific position.
Format. binlog_format = ROW is mandatory (STATEMENT and MIXED are not supported). binlog_row_image = FULL gives you before + after for UPDATEs; MINIMAL saves binlog volume but breaks change-data-capture semantics for many downstream use cases.
GTID. Global Transaction IDs — set gtid_mode = ON and enforce_gtid_consistency = ON. GTIDs give you cleaner resume semantics after failover; without GTID the connector tracks (binlog file, position) which becomes invalid when replicas fail over.
Source-side config. expire_logs_days >= 7 (or binlog_expire_logs_seconds) to keep binlog around long enough for the connector to catch up after downtime. Set log_slave_updates = ON if reading from a replica (so its binlog contains the primary's changes).
Permission model. GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%'; — the classic set.

MongoDB — change streams on a replica set.

Log-reading mechanism. Change streams — a MongoDB API that returns an ordered stream of collection-level change events, backed by the oplog on the replica set.
Requirement. Must be a replica set (or sharded cluster); single-node standalone Mongo does not have an oplog. Modern managed offerings (Atlas) are always replica sets.
Source-side config. Set the oplog window (replication.oplogSizeMB) large enough for the connector to catch up after downtime. Enable authentication (security.authorization = enabled) — Debezium won't connect to an unauthenticated Mongo.
Full document lookup. Change streams by default emit the delta ("update these fields to these values"). Set capture.mode = change_streams_update_full to get the full document on every event (extra Mongo lookup per event; useful for downstream sinks that want the full state).
Permission model. Role with changeStream privilege on the target collection, plus find for the initial snapshot phase.

SQL Server — CDC tables via SQL Server Agent.

Log-reading mechanism. Not the transaction log directly. SQL Server Agent's CDC feature reads the log and populates a per-table cdc.<schema>_<table>_CT table with change records. Debezium then reads those tables.
Enablement. EXEC sys.sp_cdc_enable_db; at the database level, then EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = 'cdc_reader'; per table.
SQL Server Agent must be running. The CDC job runs inside SQL Server Agent; if the Agent service is down, the CT tables stop advancing and Debezium stalls.
Permission model. Debezium user needs db_owner (or membership in the CDC role created by sp_cdc_enable_table) to SELECT from CT tables.
Cleanup. CDC tables are pruned by an Agent job (cleanup_change_tables). Retention is configurable via sys.sp_cdc_change_job. Set retention long enough for the connector to catch up after downtime.

Choosing the right connector — the map.

Postgres. io.debezium.connector.postgresql.PostgresConnector + plugin.name = pgoutput.
MySQL / MariaDB. io.debezium.connector.mysql.MySqlConnector (there is also a MariaDB connector for MariaDB-specific quirks).
MongoDB. io.debezium.connector.mongodb.MongoDbConnector.
SQL Server. io.debezium.connector.sqlserver.SqlServerConnector.
Oracle. io.debezium.connector.oracle.OracleConnector (LogMiner or XStream — XStream requires Oracle GoldenGate license).
Db2 / Cassandra / Vitess. io.debezium.connector.db2.Db2Connector, io.debezium.connector.cassandra.CassandraConnector, io.debezium.connector.vitess.VitessConnector — less common.

Common interview probes on connectors.

"What's the difference between pgoutput and wal2json?" — pgoutput is built into Postgres 10+ and is the default; wal2json is a legacy JSON-decoding plugin.
"Why does MySQL need binlog_row_image = FULL?" — for UPDATE events, you need before + after to compute diffs downstream.
"What is REPLICA IDENTITY FULL for in Postgres?" — makes UPDATE and DELETE events carry all old column values, not just the primary key.
"Can I run Debezium against a standalone MongoDB?" — no, must be a replica set (or sharded cluster).
"What breaks if SQL Server Agent is down?" — the CT tables stop advancing; Debezium stalls; a runbook alert on Agent job status is required.

Worked example — Postgres connector with logical replication

Detailed explanation. The classic starter — set up a Debezium Postgres connector against a Postgres 16 database, with a publication, a replication slot, a REPLICA IDENTITY policy for tables without primary keys, and the exact permissions the connector user needs. Walk an interviewer through every step end-to-end.

Postgres side. wal_level = logical, replication slot, publication, connector role with REPLICATION attribute.
Connector side. JSON config with plugin.name = pgoutput, publication.name, slot.name, table.include.list.
Common gotcha. Tables without primary keys must have REPLICA IDENTITY FULL or the connector emits UPDATE/DELETE events with only a key placeholder.

Question. Configure a Debezium Postgres connector against the shop database that captures public.orders and public.line_items. Include the Postgres-side setup and the connector JSON.

Input.

Component	Value
Postgres version	16
Database	shop
Tables	public.orders, public.line_items
Connector role	debezium
Kafka bootstrap	kafka:9092

Code.

-- Postgres side — one-time setup
-- postgresql.conf
-- wal_level = logical
-- max_replication_slots = 10
-- max_wal_senders = 10

-- Create the connector role
CREATE ROLE debezium WITH REPLICATION LOGIN PASSWORD 'strong-secret';
GRANT USAGE ON SCHEMA public TO debezium;
GRANT SELECT ON public.orders, public.line_items TO debezium;

-- Create the publication
CREATE PUBLICATION dbz_publication FOR TABLE public.orders, public.line_items;

-- If line_items has no primary key, set REPLICA IDENTITY FULL
ALTER TABLE public.line_items REPLICA IDENTITY FULL;

-- Verify
SELECT slot_name, plugin, database, active
FROM   pg_replication_slots;
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM   pg_publication;

{
  "name": "shop-postgres-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "strong-secret",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "slot.name": "dbz_shop_slot",
    "publication.autocreate.mode": "disabled",
    "table.include.list": "public.orders,public.line_items",
    "snapshot.mode": "initial",
    "snapshot.locking.mode": "none",
    "signal.data.collection": "public.debezium_signal",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.shop",
    "heartbeat.interval.ms": "10000",
    "heartbeat.topics.prefix": "__debezium-heartbeat"
  }
}

Step-by-step explanation.

wal_level = logical unlocks Postgres's logical decoding path. The connector opens a replication slot; every WAL record for the tables in the publication is decoded and streamed to the connector.
The connector role has the REPLICATION attribute plus SELECT on the target tables (for the initial snapshot; streaming uses replication protocol, not SELECT). The USAGE ON SCHEMA public grant is easy to forget and causes cryptic "permission denied" errors on the snapshot phase.
CREATE PUBLICATION dbz_publication FOR TABLE ... explicitly lists the tables. Alternative — FOR ALL TABLES — is easier but captures more than you want; the explicit list is the senior-DE default. The connector's publication.autocreate.mode = disabled disables autocreation from the Debezium side to avoid a race with schema setup.
ALTER TABLE public.line_items REPLICA IDENTITY FULL makes UPDATE and DELETE events carry all column values, not just the primary key. Required for line_items if it lacks a natural key.
The heartbeat config (heartbeat.interval.ms = 10000) tells the connector to emit an empty event every 10 seconds. This keeps the replication slot's confirmed_flush_lsn moving forward even when the source tables are idle — otherwise, an idle table would prevent WAL cleanup on the source.

Output.

Kafka topic (shop.public.orders)	Payload shape
Key	`{"id": 12345}`
Value	`{"before": null, "after": {"id":12345,"tenant_id":1,"total_cents":19900,"status":"NEW"}, "op":"c", "source": {...}, "ts_ms": 1720000000000}`
Tombstone (on delete, after unwrap)	Key `{"id": 12345}`, Value `null`

Rule of thumb. Every Postgres Debezium deployment ships with a heartbeat topic. Skipping the heartbeat is the #1 cause of "our Postgres WAL disk filled up because a replication slot was stuck at LSN X for six hours."

Worked example — MySQL connector with GTID

Detailed explanation. MySQL's binlog is the source-of-truth for replication; Debezium reads it as if it were another replica. The two source-side switches that matter are binlog_format = ROW (mandatory) and gtid_mode = ON (strongly recommended). GTID makes the connector resilient to primary failover — after failover, the connector resumes from the GTID rather than a now-invalid (binlog file, position) tuple.

binlog_format. ROW gives you per-row change events. STATEMENT gives you SQL text (not usable for CDC). MIXED is unpredictable.
binlog_row_image. FULL captures all columns on UPDATE (needed for CDC). MINIMAL captures only changed columns + key.
GTID. Cleaner resume semantics after failover; also enables MySQL's own multi-source replication features.

Question. Configure a Debezium MySQL connector against a MySQL 8 database with GTID. Include the my.cnf changes and the connector JSON.

Input.

Component	Value
MySQL version	8.0
Database	shop
Tables	shop.orders, shop.customers
Connector user	debezium
binlog retention	7 days

Code.

# my.cnf — MySQL 8 config for Debezium
[mysqld]
server-id                  = 42
log_bin                    = mysql-bin
binlog_format              = ROW
binlog_row_image           = FULL
binlog_expire_logs_seconds = 604800   ; 7 days
gtid_mode                  = ON
enforce_gtid_consistency   = ON
log_slave_updates          = ON

-- MySQL side — connector user grants
CREATE USER 'debezium'@'%' IDENTIFIED WITH mysql_native_password BY 'strong-secret';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;

-- Verify GTID and binlog format
SHOW VARIABLES LIKE 'binlog_format';       -- ROW
SHOW VARIABLES LIKE 'binlog_row_image';    -- FULL
SHOW VARIABLES LIKE 'gtid_mode';           -- ON
SHOW MASTER STATUS;                        -- capture the starting Executed_Gtid_Set

{
  "name": "shop-mysql-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "strong-secret",
    "database.server.id": "9999",
    "database.include.list": "shop",
    "table.include.list": "shop.orders,shop.customers",
    "topic.prefix": "shop",
    "snapshot.mode": "initial",
    "snapshot.locking.mode": "minimal",
    "include.schema.changes": "true",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.mysql.shop",
    "gtid.source.includes": "",
    "gtid.source.excludes": ""
  }
}

Step-by-step explanation.

binlog_format = ROW is non-negotiable; STATEMENT and MIXED cannot represent the per-row change events Debezium needs. binlog_row_image = FULL ensures UPDATE events carry the before-image (all columns), letting downstream sinks compute diffs.
gtid_mode = ON + enforce_gtid_consistency = ON enable GTIDs. Debezium tracks the connector position as a GTID set; on primary failover, the new primary continues emitting the same GTID set, and the connector resumes seamlessly. Without GTID the connector tracks (binlog file, position), which becomes invalid on failover.
binlog_expire_logs_seconds = 604800 keeps binlogs around for 7 days. If Debezium is down longer than that, the connector cannot resume without a full snapshot. 7 days is a reasonable balance between disk usage and downtime tolerance.
database.server.id = 9999 is the fake replica ID Debezium presents to MySQL. Must be unique across all replicas (real and virtual). Collisions cause silent binlog delivery gaps.
snapshot.locking.mode = minimal uses a brief FLUSH TABLES WITH READ LOCK at snapshot start to get a consistent binlog position, then releases the lock. extended holds the lock through the entire snapshot (safer but blocks writes); none skips the lock entirely (fast but risks inconsistency if writes race with the snapshot).

Output.

Kafka topic (shop.shop.orders)	Payload shape
Key	`{"id": 12345}`
Value	`{"before": {"id":12345,"total_cents":19900,"status":"NEW"}, "after": {"id":12345,"total_cents":19900,"status":"SHIPPED"}, "op":"u", "source":{"gtid":"...","file":"mysql-bin.000042","pos":12345}}`

Rule of thumb. For MySQL, always turn on GTID before running Debezium. Retrofitting GTID onto an existing replication topology is a multi-hour operation; doing it up-front is a one-line config change.

Worked example — MongoDB change streams

Detailed explanation. MongoDB's change stream API gives you an ordered feed of collection-level change events, backed by the oplog on a replica set. Debezium's Mongo connector consumes this stream directly; there is no "read the oplog" surface — it's the change stream all the way down. The two axes to know are capture.mode (delta vs full document) and the requirement for a replica set (standalone Mongo does not expose an oplog).

Change stream. MongoDB API introduced in 3.6; returns events with resume tokens for restart.
Replica set requirement. Change streams require an oplog, which only exists on replica sets and sharded clusters.
Full document lookup. By default, the change stream emits a delta on UPDATE. Set capture.mode = change_streams_update_full to fetch the full document on every event.

Question. Configure a Debezium MongoDB connector against a replica set, capturing the shop.orders collection with full-document mode.

Input.

Component	Value
MongoDB version	6.0
Topology	3-node replica set
Collection	shop.orders
Connector user	debezium
Auth database	admin

Code.

// Mongo side — connector user + role
use admin;
db.createRole({
  role: "debezium_reader",
  privileges: [
    { resource: { db: "shop", collection: "" }, actions: [ "find", "changeStream" ] },
    { resource: { db: "local", collection: "oplog.rs" }, actions: [ "find" ] }
  ],
  roles: []
});

db.createUser({
  user: "debezium",
  pwd: "strong-secret",
  roles: [ { role: "debezium_reader", db: "admin" } ]
});

{
  "name": "shop-mongo-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "tasks.max": "1",
    "mongodb.connection.string": "mongodb://debezium:strong-secret@mongo-0:27017,mongo-1:27017,mongo-2:27017/?replicaSet=rs0&authSource=admin",
    "topic.prefix": "shop",
    "capture.scope": "database",
    "capture.target": "shop",
    "collection.include.list": "shop.orders",
    "snapshot.mode": "initial",
    "capture.mode": "change_streams_update_full"
  }
}

Step-by-step explanation.

Mongo's replica set requirement is non-negotiable — a standalone mongod has no oplog, and the change stream API returns an error. Managed offerings (Atlas) are always replica sets, so this is transparent for managed customers.
The connector authenticates against the admin database using the debezium user with the debezium_reader custom role. The role grants find + changeStream on the shop database and find on local.oplog.rs (needed for the initial snapshot resume-token machinery).
capture.mode = change_streams_update_full tells the connector to include the full post-image document on every event. Without this, UPDATEs emit only the delta ({"status": "SHIPPED"} for a single-field update); downstream sinks that want the full row need to reconstruct it, which is fragile.
snapshot.mode = initial runs an initial find snapshot on shop.orders at connector startup, then transitions to the change stream. Alternative modes (never, when_needed) skip the snapshot; used when the collection is already in the downstream sink from a previous load.
The connection string is a full replica-set URI, listing all three replica members and the replica-set name. Debezium picks a primary or secondary read source based on the read-preference implicit in the change-stream API (typically the primary).

Output.

Kafka topic (shop.shop.orders)	Payload shape
Key	`{"id": "6512a4f3d1..."}`
Value	`{"before":null,"after":"{\"_id\":\"6512a4f3d1...\",\"tenant_id\":1,\"total_cents\":19900,\"status\":\"NEW\"}", "op":"c", ...}`

Rule of thumb. For MongoDB, always use change_streams_update_full unless you have a specific reason to want deltas. The extra Mongo lookup per event is well worth the operational simplicity downstream.

Senior interview question on connector selection and setup

A senior interviewer might ask: "You have three source databases in a fintech stack — a Postgres for transactions, a MongoDB for user documents, and a SQL Server for a legacy ledger. All three need CDC into a single Kafka cluster with unified consumer semantics. Walk me through the connector configs, the source-side setup for each, and the per-source gotchas you'd guard against."

Solution Using one connector class per source with unified topic-prefix conventions

[
  {
    "name": "fintech-postgres-tx",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "pg.internal",
      "database.dbname": "fintech",
      "database.user": "debezium",
      "database.password": "***",
      "topic.prefix": "fin.pg",
      "plugin.name": "pgoutput",
      "publication.name": "dbz_publication",
      "slot.name": "dbz_fintech_slot",
      "table.include.list": "public.transactions,public.accounts",
      "snapshot.mode": "initial",
      "signal.data.collection": "public.debezium_signal",
      "heartbeat.interval.ms": "10000",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-history.fin.pg"
    }
  },
  {
    "name": "fintech-mongo-users",
    "config": {
      "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
      "mongodb.connection.string": "mongodb://debezium@mongo-0,mongo-1,mongo-2/?replicaSet=rs0&authSource=admin",
      "topic.prefix": "fin.mongo",
      "capture.scope": "database",
      "capture.target": "fintech",
      "collection.include.list": "fintech.users,fintech.sessions",
      "snapshot.mode": "initial",
      "capture.mode": "change_streams_update_full"
    }
  },
  {
    "name": "fintech-sqlserver-ledger",
    "config": {
      "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
      "database.hostname": "sqlserver.internal",
      "database.port": "1433",
      "database.user": "debezium",
      "database.password": "***",
      "database.names": "ledger",
      "database.encrypt": "false",
      "topic.prefix": "fin.mssql",
      "table.include.list": "dbo.journal,dbo.balances",
      "snapshot.mode": "initial",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-history.fin.mssql"
    }
  }
]

Step-by-step trace.

Connector	Source-side gotcha	Runtime gotcha
Postgres	`wal_level = logical`, publication list, REPLICA IDENTITY	Stale slot fills WAL disk — heartbeat topic mandatory
MongoDB	Must be replica set, `changeStream` privilege	Long-running change stream must handle resume-token expiry
SQL Server	`sp_cdc_enable_db` + per-table `sp_cdc_enable_table`, Agent running	Agent job failure stalls CT tables — Agent monitoring mandatory

After the rollout, the three connectors write to fin.pg.*, fin.mongo.*, and fin.mssql.* topic namespaces. Downstream consumers see a uniform envelope shape (Debezium standard) across all three sources; the topic-prefix convention keeps ownership boundaries clear.

Output:

Source	Topics created	Steady-state throughput	Heartbeat topic
Postgres (transactions, accounts)	fin.pg.public.transactions, fin.pg.public.accounts	20k events/sec	fin.pg.__debezium-heartbeat
MongoDB (users, sessions)	fin.mongo.fintech.users, fin.mongo.fintech.sessions	5k events/sec	(native change stream keepalive)
SQL Server (journal, balances)	fin.mssql.dbo.journal, fin.mssql.dbo.balances	2k events/sec	(bookmark-driven; Agent monitoring separate)

Why this works — concept by concept:

One connector class per source — no "unified" connector attempts to cover multiple sources; each engine has its own log format, permission model, and snapshot mechanics. Debezium's design decision is to have a specialised connector JAR per source.
Unified topic-prefix convention — topic.prefix = fin.<source> gives you a consistent naming pattern across sources. Downstream consumers can subscribe to fin.* for all fintech CDC events, or to fin.pg.* for just the Postgres subset.
Uniform envelope — every Debezium event has the same before / after / op / source / ts_ms shape regardless of source engine. Downstream code can be written once against the envelope; source-specific fields (Postgres LSN, MySQL GTID, Mongo resume token) live in the source sub-object and are optional.
Per-source gotchas — the senior signal is naming the source-specific traps: heartbeat for Postgres, GTID + binlog retention for MySQL, replica-set requirement + full-doc mode for Mongo, Agent monitoring for SQL Server. Each is a real production bug someone lived through.
Cost — three connectors on one Connect cluster; the connectors share the Connect offset topic and the Schema Registry; each has its own schema.history topic (single-partition, compacted, low volume). The steady-state resource cost is dominated by the network hop from Kafka Connect to the Kafka brokers, not by the connectors themselves.

Streaming
Topic — streaming
Streaming connector and change-log problems

Practice →

ETL Topic — etl ETL problems on multi-source CDC ingestion

Practice →

3. Snapshot modes + incremental snapshotting

Six snapshot modes and one signal table — the difference between a 6-hour outage and a live table addition

The mental model in one line: Debezium's snapshot phase reads the current state of the source tables via SELECT (once, per mode), and the streaming phase then tails the log from the snapshot's boundary offset; snapshot.mode picks whether the snapshot runs on first startup only (initial), on every startup (always), never (never), only when the offset is missing (when_needed), or only for the schema (schema_only) — and 2.x adds incremental snapshotting that runs chunk-by-chunk while streaming continues live, driven by inserts into a debezium_signal control table.

The four axes interviewers actually probe for snapshot questions.

Mode selection. Which of the six modes for which scenario — cold start on a new table, restart after a lost offset, adding a table to a live connector, capturing only the schema without the data.
Snapshot-vs-streaming boundary. The connector runs the snapshot inside a repeatable-read transaction; the boundary offset is the LSN/GTID captured at snapshot start. Streaming then resumes from that boundary. Understanding this handoff is the senior signal.
Incremental snapshot. The signal-table mechanism — how it works, how you trigger a chunk, how it interleaves with live streaming.
Snapshot locking. Whether the snapshot takes source-side locks (extended), a brief flush lock (minimal), or no lock (none). Each has correctness implications.

The six snapshot modes.

initial. Default. Runs a full snapshot on first startup (no committed offset); on subsequent startups the snapshot is skipped and streaming resumes from the committed offset.
always. Runs a full snapshot on every startup. Rarely wanted; useful for test harnesses.
never. Skips the snapshot entirely. Streaming starts from wherever the log currently is. Data written before the connector started is not captured. Used when the downstream sink already has the initial state loaded via a different mechanism.
initial_only. Runs the snapshot and then stops. Used for one-shot backfills where you don't want ongoing streaming.
when_needed. Runs a snapshot only when there is no committed offset and the log is not available from the required position. Used on failover topologies where offsets may be lost.
schema_only. Captures the schema of the tables (populates the schema-history topic) but does not read the data. Streaming starts from the current log position. Used when the downstream sink already has the data loaded and only needs the schema for future decoding.

Incremental snapshotting — the 2.x game-changer.

The problem it solves. Traditional snapshots block streaming for the entire snapshot duration (or race with it at the risk of missed events). For a 10 TB table, that's hours of blocked CDC.
The mechanism. The connector splits the table into chunks by primary key (or a configurable key). It reads each chunk with a bounded SELECT (WHERE key BETWEEN low AND high) and interleaves the chunk reads with continued live streaming. Every chunk is a separate small transaction.
The trigger. Inserting a row into the debezium_signal table with a JSON payload ({"type":"execute-snapshot","data":{"data-collections":["public.orders"]}}) triggers the connector to start an incremental snapshot on public.orders. The signal is picked up by the streaming phase and interpreted as a snapshot request.
Interleaving semantics. For any row X, its snapshot event and any streaming events (INSERT/UPDATE/DELETE while snapshot runs) are ordered by the source's log-order rules. The downstream sink sees a consistent view — no lost events, no duplicates beyond the standard at-least-once floor.

The signal table — how it works.

Table shape. CREATE TABLE public.debezium_signal (id VARCHAR PRIMARY KEY, type VARCHAR NOT NULL, data VARCHAR); — three columns.
Config. signal.data.collection = public.debezium_signal in the connector config.
Command. INSERT INTO public.debezium_signal (id, type, data) VALUES ('adhoc-1', 'execute-snapshot', '{"data-collections": ["public.orders"], "type": "incremental"}'); — one row per snapshot command.
Read path. The connector sees the INSERT event in the WAL/binlog stream, parses the JSON, and executes the requested action. No connector restart required.
Additional signals. execute-snapshot (incremental), stop-snapshot (halt an in-progress incremental snapshot), log (write a log message — useful for debugging), pause and resume (2.x additions).

Snapshot locking modes — what each does.

extended (Postgres) / minimal_percona (MySQL Percona). Holds a lock for the entire snapshot. Safest; blocks writes; only viable for maintenance windows.
minimal. Brief lock at start to capture the boundary offset, then releases. The snapshot then runs in a repeatable-read transaction. Standard for MySQL.
none. No lock at all. Fastest; risks a missed event if the boundary offset moves while the snapshot runs. Debezium 2.x mitigates this using the stalled-updates re-check mechanism, but the correctness proof is subtle.

Common interview probes on snapshots.

"Explain the difference between initial and when_needed." — initial snapshots on first startup only; when_needed snapshots when the offset is missing and the log position is unavailable.
"What is the signal table for?" — trigger channel for incremental snapshots and other runtime commands.
"Walk me through adding a table to a live connector without downtime." — add the table to table.include.list, restart the connector so the config change takes effect, then insert an execute-snapshot signal for the new table. The connector runs an incremental snapshot for the new table while continuing to stream the existing ones.
"What breaks if I set snapshot.mode = never on a new connector?" — the snapshot phase is skipped; rows that existed before the connector started are never captured. Only useful when the downstream sink has been loaded via a separate mechanism.

Worked example — choosing a snapshot mode

Detailed explanation. A team has four different Debezium deployments across four different scenarios. Each scenario picks a different snapshot mode. Walk through the choice for each.

Scenario 1. Brand-new pipeline; downstream sink is empty; must capture full initial state.
Scenario 2. Existing pipeline restarting after a Kafka Connect cluster upgrade; the offset topic survived the upgrade.
Scenario 3. Migration from a legacy CDC system; downstream sink is already loaded with current state from the legacy system; only new changes need to flow.
Scenario 4. Test environment where every connector startup should re-snapshot to get a deterministic starting state.

Question. Pick the snapshot mode for each scenario and justify.

Input.

Scenario	Sink loaded?	Offset topic present?	Downtime OK?
1. Brand new	no	no	no
2. Connect upgrade restart	yes	yes	no
3. Migration from legacy CDC	yes	no	no
4. Test env	(any)	(any)	yes

Code.

{
  "scenario_1_brand_new": {
    "snapshot.mode": "initial"
  },
  "scenario_2_restart": {
    "snapshot.mode": "initial",
    "_comment": "Same as scenario 1; snapshot is skipped on restart because the offset topic is present"
  },
  "scenario_3_migration": {
    "snapshot.mode": "never",
    "_comment": "Skip snapshot; streaming picks up from current log position"
  },
  "scenario_4_test_env": {
    "snapshot.mode": "always",
    "_comment": "Re-snapshot every startup for deterministic tests"
  }
}

Step-by-step explanation.

Scenario 1 — initial. The connector runs a full snapshot on first startup (no committed offset), then transitions to streaming. The sink gets a complete initial state followed by every subsequent change.
Scenario 2 — still initial. The mode is unchanged; the snapshot is only skipped because the offset topic already has a committed position. On a restart after a Connect upgrade, the connector reads its last committed offset and resumes streaming from that point.
Scenario 3 — never. The sink is already loaded from the legacy CDC system as of some point in time. The connector starts streaming from the current log position; the app must ensure the legacy CDC has been drained past that point so no changes are lost. This is the "cutover" pattern from legacy CDC.
Scenario 4 — always. Every startup re-snapshots. Wasteful in production but perfect for test environments where you want the connector to start from a known state on each run.
The lesson: snapshot.mode is not "always snapshot on start" — it's "when to snapshot." Most production deployments use initial; migration cutovers use never; test harnesses use always. when_needed is the specialised failover-safe mode when the offset topic may be lost.

Output.

Scenario	Mode	What runs on first startup	What runs on restart
Brand new	initial	full snapshot	streaming from committed offset
Restart	initial	streaming from committed offset	streaming from committed offset
Migration	never	streaming from current log	streaming from committed offset
Test env	always	full snapshot	full snapshot

Rule of thumb. initial is the correct default for 90% of deployments. Reach for never only when the sink is pre-loaded via a separate mechanism; reach for when_needed only when your offset topic may be lost due to failover.

Worked example — adding a table to a live connector via incremental snapshot

Detailed explanation. The killer feature of 2.x: adding a new table to a running connector without stopping the CDC stream for the existing tables. Without incremental snapshotting, adding a table requires either a full connector restart with snapshot.mode = initial (which re-snapshots every table, hours of work) or a bespoke bootstrapping process. Incremental snapshot fixes this — a single signal-table INSERT triggers a chunk-by-chunk snapshot of the new table while streaming continues live for the existing tables.

Steps. (1) Update table.include.list to add the new table. (2) Restart the connector (config change requires restart). (3) INSERT into debezium_signal to trigger the incremental snapshot. (4) Wait for chunks to finish.
Semantics. The new table's snapshot events land on a Kafka topic; any live changes to the new table land on the same topic in commit order.
Chunks. By default, the snapshot uses primary-key range chunks with a chunk size of 1024 rows.

Question. A live connector captures public.orders and public.line_items. Add public.payments to the same connector using incremental snapshot, without pausing the existing streams.

Input.

Component	Value
Existing tables	public.orders, public.line_items
New table	public.payments (1M rows)
Chunk size	1024 rows (default)
Signal table	public.debezium_signal

Code.

-- 1. Ensure the signal table exists on the source
CREATE TABLE IF NOT EXISTS public.debezium_signal (
  id   VARCHAR PRIMARY KEY,
  type VARCHAR NOT NULL,
  data VARCHAR
);
GRANT SELECT, INSERT ON public.debezium_signal TO debezium;

-- 2. Add the new table to the publication
ALTER PUBLICATION dbz_publication ADD TABLE public.payments;

-- 3. Update the connector config to include the new table (via Connect REST)
-- PUT /connectors/shop-postgres-cdc/config
-- { "table.include.list": "public.orders,public.line_items,public.payments", ... }

-- 4. Trigger the incremental snapshot for the new table
INSERT INTO public.debezium_signal (id, type, data)
VALUES ('add-payments-2026-07-04',
        'execute-snapshot',
        '{"type":"incremental","data-collections":["public.payments"]}');

{
  "name": "shop-postgres-cdc",
  "config": {
    "table.include.list": "public.orders,public.line_items,public.payments",
    "signal.data.collection": "public.debezium_signal",
    "incremental.snapshot.chunk.size": "1024",
    "incremental.snapshot.watermarking.strategy": "insert_delete"
  }
}

Step-by-step explanation.

Ensure the signal table exists and is populated in signal.data.collection. If the signal table was set up at connector-creation time (which is the recommended pattern), skip this step.
Add the new table to the Postgres publication (ALTER PUBLICATION dbz_publication ADD TABLE public.payments;). Without this, the WAL for payments is not decoded and the connector cannot stream its changes.
Update the connector's table.include.list via the Kafka Connect REST API (PUT /connectors/<name>/config). This triggers a task rebalance but keeps the existing connector position — no re-snapshot of existing tables.
Insert the signal into public.debezium_signal. The INSERT flows through the WAL like any other event; the connector recognises the row (because signal.data.collection matches), parses the JSON payload, and dispatches an incremental snapshot for public.payments.
The incremental snapshot runs in chunks. The connector reads SELECT * FROM public.payments WHERE id BETWEEN 0 AND 1023 ORDER BY id, emits those 1024 rows to Kafka as snapshot events, then reads the next chunk. Between chunks, live streaming events for public.payments (and every other table) continue to flow. The default chunk size is 1024 rows; increase for larger tables (up to 10000 rows per chunk is common).

Output.

Timeline	Existing tables (orders, line_items)	New table (payments)
T+0 (signal INSERT)	live streaming	none
T+30 s (chunk 1)	live streaming	snapshot rows 0–1023
T+60 s (chunk 2)	live streaming	snapshot rows 1024–2047
...	...	...
T+15 min (last chunk)	live streaming	snapshot rows 999k–1M
T+15 min onward	live streaming	live streaming

Rule of thumb. For any live CDC pipeline where you need to add tables without downtime, incremental snapshotting is the answer. It replaces the entire "pause everything, re-snapshot everything" ordeal with a single INSERT into the signal table.

Worked example — snapshot locking trade-offs

Detailed explanation. Snapshot locking modes are the classic "correctness vs availability" trade-off. extended locks the source for the entire snapshot (correctness-first, blocks writes); minimal locks briefly to capture the boundary offset then releases (standard); none skips the lock (fastest, correctness relies on repeatable-read semantics and Debezium's built-in re-check logic). The senior candidate is expected to know when each is appropriate.

extended. Only viable during maintenance windows or on read-mostly sources.
minimal. The MySQL default; captures the binlog position under a brief FLUSH TABLES WITH READ LOCK, then releases and runs the snapshot in a separate transaction.
none. The most common production choice; correctness relies on the connector's snapshot transaction seeing a consistent view (Postgres REPEATABLE READ, MySQL InnoDB consistent read).

Question. A team is running Debezium against a Postgres 16 source with 5 TB of data. Discuss the three snapshot locking modes, quantify the write-availability impact of each, and recommend the mode for a production deployment.

Input.

Snapshot mode	Lock	Write availability during snapshot	Correctness guarantee
extended	Full table lock	none	strictest
minimal	Brief lock at start	full after ~1 s	strong
none	No lock	full throughout	strong (via repeatable read)

Code.

{
  "extended_mode": {
    "snapshot.locking.mode": "extended",
    "_comment": "Postgres: uses ACCESS EXCLUSIVE for the snapshot; blocks writers"
  },
  "minimal_mode": {
    "snapshot.locking.mode": "minimal",
    "_comment": "Brief FLUSH TABLES WITH READ LOCK, then release"
  },
  "none_mode": {
    "snapshot.locking.mode": "none",
    "_comment": "No lock; relies on REPEATABLE READ semantics of the snapshot transaction"
  }
}

-- Postgres — snapshot in a REPEATABLE READ transaction (the "none" mode)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
SELECT pg_export_snapshot();
-- (connector uses the exported snapshot handle; every SELECT sees the same consistent view)
SELECT * FROM public.orders;
SELECT * FROM public.line_items;
COMMIT;

Step-by-step explanation.

extended. The connector acquires an ACCESS EXCLUSIVE lock on each table for the snapshot's duration. On a 5 TB source, the snapshot may run for hours — during which time no writes are possible. Only viable during maintenance windows.
minimal. The connector briefly locks (for a few milliseconds) to capture the boundary offset (LSN on Postgres, binlog position on MySQL). Then it releases the lock and runs the snapshot in a separate REPEATABLE READ transaction. Writers see full availability after the ~1 second lock; the snapshot sees a consistent view via MVCC.
none. No source-side lock. The connector opens a REPEATABLE READ transaction, records the current LSN as the boundary, and runs the snapshot inside that transaction. Correctness relies on MVCC: every SELECT inside the transaction sees the same consistent view. Writers on other transactions are unaffected.
The catch with none: if the snapshot transaction runs longer than the source's idle_in_transaction_session_timeout (Postgres) or hits a lock conflict on DDL (ALTER TABLE), the snapshot fails and must restart. In practice, this is rare on a well-tuned source.
The recommendation for a 5 TB Postgres production deployment: none. Postgres MVCC gives you a consistent view without holding any locks. minimal is a legacy MySQL default; extended is only for maintenance windows.

Output.

Locking mode	Write availability	Snapshot duration (5 TB)	Correctness	Best for
extended	0	hours	strictest	maintenance window
minimal	~99%	hours	strong	MySQL default
none	100%	hours	strong (via MVCC)	production Postgres

Rule of thumb. For Postgres, always use snapshot.locking.mode = none. MVCC gives you the consistency guarantee without the availability cost. For MySQL, minimal is the safe default; none is fine on well-tuned deployments; extended is a scheduled-downtime mode.

Senior interview question on snapshot design

A senior interviewer might ask: "You have a live Debezium Postgres connector capturing 20 tables. The team wants to (a) add three new tables to the pipeline, (b) re-snapshot one existing table because a downstream sink lost its data, and (c) never accept a maintenance window for CDC. Walk me through the design end-to-end."

Solution Using incremental snapshotting via signal-table dispatch

-- 1. Ensure the signal table exists (idempotent)
CREATE TABLE IF NOT EXISTS public.debezium_signal (
  id   VARCHAR PRIMARY KEY,
  type VARCHAR NOT NULL,
  data VARCHAR
);
GRANT SELECT, INSERT ON public.debezium_signal TO debezium;

-- 2. Add the new tables to the publication
ALTER PUBLICATION dbz_publication ADD TABLE public.shipments;
ALTER PUBLICATION dbz_publication ADD TABLE public.refunds;
ALTER PUBLICATION dbz_publication ADD TABLE public.audit_log;

-- 3. Update the connector's table.include.list via Connect REST (config change)
-- PUT /connectors/shop-postgres-cdc/config
-- { "table.include.list": "public.orders,...,public.shipments,public.refunds,public.audit_log", ... }

-- 4. Trigger the incremental snapshot for the three new tables + the re-snapshot of orders
INSERT INTO public.debezium_signal (id, type, data) VALUES
  ('snapshot-shipments',  'execute-snapshot', '{"type":"incremental","data-collections":["public.shipments"]}'),
  ('snapshot-refunds',    'execute-snapshot', '{"type":"incremental","data-collections":["public.refunds"]}'),
  ('snapshot-audit-log',  'execute-snapshot', '{"type":"incremental","data-collections":["public.audit_log"]}'),
  ('resnapshot-orders',   'execute-snapshot', '{"type":"incremental","data-collections":["public.orders"]}');

{
  "name": "shop-postgres-cdc",
  "config": {
    "table.include.list": "public.orders,public.line_items,public.payments,public.shipments,public.refunds,public.audit_log,...",
    "signal.data.collection": "public.debezium_signal",
    "snapshot.mode": "initial",
    "snapshot.locking.mode": "none",
    "incremental.snapshot.chunk.size": "4096",
    "incremental.snapshot.watermarking.strategy": "insert_delete",
    "notification.enabled.channels": "sink"
  }
}

Step-by-step trace.

Step	Command	Duration	Impact on existing streams
1	CREATE signal table	< 1 s	none
2	ALTER PUBLICATION ADD TABLE × 3	< 1 s	none
3	PUT /connectors/.../config	~10 s task rebalance	brief pause, no data loss
4a	INSERT signal for shipments	starts chunked snapshot	streaming continues
4b	INSERT signal for refunds	starts chunked snapshot	streaming continues
4c	INSERT signal for audit_log	starts chunked snapshot	streaming continues
4d	INSERT signal for orders resnapshot	starts chunked resnapshot	streaming continues

After the signal INSERTs, the connector runs four independent incremental snapshots (three new tables + one re-snapshot) in parallel with the existing live streams. No maintenance window; no downtime for the other 19 tables.

Output:

Deliverable	Duration	Downtime
Publication membership updated	< 1 s	none
Config change applied	~10 s	brief rebalance
Three new tables backfilled	~30 min each (10M rows)	0 s
One existing table re-snapshotted	~15 min (5M rows)	0 s
Total elapsed	~30 min (parallel)	none

Why this works — concept by concept:

Signal-table dispatch — the signal INSERT flows through the WAL like any other event; the connector recognises it and dispatches an incremental snapshot without restarting. This turns "add a table" into a INSERT INTO ... VALUES statement.
Chunked snapshot — the snapshot reads the source table in bounded chunks (WHERE key BETWEEN low AND high). Each chunk is a small transaction; the source is never under a table-scoped lock for more than a few seconds.
Interleaved with streaming — while the snapshot chunks run, the connector continues to consume WAL and emit live change events. For any given row, the connector uses a watermarking strategy (insert_delete is the modern default) to disambiguate the snapshot copy from any concurrent live update.
Parallel snapshots — multiple signal INSERTs queue up multiple concurrent incremental snapshots. Modern Debezium schedules them fairly; the effective throughput scales with the connector's SELECT capacity against the source.
Cost — the source database sees a modest sustained SELECT load during the snapshot (throttled by chunk size). No blocking, no maintenance window, no re-snapshotting of unrelated tables. The operational cost is O(N × table_size) for N added tables; the value is priceless because it fits into normal working hours.

Streaming
Topic — streaming
Streaming snapshot and backfill problems

Practice →

ETL Topic — etl ETL problems on chunked backfills and live sources

Practice →

4. Schema history + schema evolution

The compacted DDL replay topic that keeps the connector honest across every `ALTER TABLE`

The mental model in one line: Debezium persists every observed DDL statement (CREATE TABLE, ALTER TABLE, DROP TABLE) to a dedicated schema.history.internal.kafka.topic — a compacted, single-partition, retention-forever topic that is replayed on every connector restart so the connector can decode WAL/binlog records that were written under an older schema, and the Schema Registry integration (Confluent or Apicurio) then propagates those schema versions to downstream consumers with compatibility rules that catch bad DDL before it breaks the pipeline.

The four axes interviewers actually probe for schema questions.

Schema-history topic. Where does the connector store the observed DDL? Answer: a dedicated Kafka topic (name via schema.history.internal.kafka.topic). Compacted, single-partition, retention forever. Never delete this topic.
DDL replay. On restart, the connector replays every DDL statement from the schema-history topic in order, rebuilding its internal representation of every table's schema. Only then can it decode WAL/binlog records for those tables.
Schema Registry integration. The per-event schema (the Avro or JSON schema for each Kafka topic) lives in a Schema Registry — Confluent or Apicurio. Debezium registers the schema at connector startup and on every schema change; consumers fetch the schema by subject name.
Subject naming strategies. TopicNameStrategy (default; subject = topic-key or topic-value), RecordNameStrategy (subject = record's Avro name), TopicRecordNameStrategy (subject = topic + record name). Each has different compatibility semantics.

The schema-history topic — properties.

Compacted. cleanup.policy=compact. Only the latest state (or, more precisely, the compacted key history) is retained. Older records are compacted away by the log-cleaner.
Single-partition. DDL events must be totally ordered. Multiple partitions would allow the compactor to re-order or lose events; single-partition guarantees the DDL replay order.
Retention forever. retention.ms=-1. Never delete records; the topic is the schema truth for the connector's lifetime.
Not user-facing. The topic is internal — __debezium-schema-history or schema-history.<prefix> naming. Downstream consumers do not read it; they read the per-table CDC topics.

Schema Registry integration.

Converters. key.converter = io.confluent.connect.avro.AvroConverter + key.converter.schema.registry.url = http://schema-registry:8081. Same for the value converter.
Registration. At connector startup (post-snapshot), Debezium registers the Avro schema for each Kafka topic with the Schema Registry. On every schema change (an ALTER TABLE DDL), a new schema version is registered.
Compatibility rules. The Registry enforces the configured compatibility level — BACKWARD (readers with old schemas can read new writes), FORWARD (readers with new schemas can read old writes), FULL (both), NONE (any change accepted). BACKWARD is the standard for CDC.
Enforcement. A DDL that violates the compatibility level is rejected at registration time — the connector fails to write the event, giving the operator a chance to fix the schema before data flows through it.

Subject naming strategies — the three options.

TopicNameStrategy (default). Subject = <topic>-key and <topic>-value. One subject per Kafka topic. Simple; the most common choice.
RecordNameStrategy. Subject = the record's Avro fully-qualified name. One subject per record type; multiple topics can share a subject (useful when several tables have the same envelope).
TopicRecordNameStrategy. Subject = <topic>-<record-name>. Hybrid — per-topic isolation but with a record-name discriminator. Useful when a single topic carries events of multiple record types.

Schema evolution rules — the practical map.

Add a nullable column. Backward-compatible; consumers with the old schema read new writes with the new column as null. Safe.
Drop a column. Not backward-compatible in general; consumers with the old schema expecting the column will fail. Requires forward compatibility mode or a deprecation window.
Rename a column. Not compatible; treat as drop + add. In practice, requires either a new topic or a downstream translation step.
Change a column's type. Depends on the types. INT → BIGINT is often compatible; INT → VARCHAR is not.
Add a table. Adds a new topic; no impact on existing topics or subjects. Safe.
Drop a table. The topic remains; the schema-history topic records the DROP. Downstream consumers that subscribed to the dropped topic see no new events; no schema-registry impact on other topics.

Common interview probes on schema history.

"What is the schema-history topic for?" — persistent DDL log the connector replays on restart to decode old WAL records.
"Why is it single-partition?" — DDL order must be preserved; multi-partition breaks total order.
"Why compacted?" — the cleanup policy compacts by key so the topic doesn't grow without bound; the compaction preserves the latest state per schema key.
"What happens if the schema-history topic is deleted?" — the connector cannot decode WAL records for existing tables; it must be recreated from a full re-snapshot with snapshot.mode = initial.
"What compatibility level do you set on the Schema Registry?" — BACKWARD for CDC (old consumers must read new writes).

Worked example — schema-history topic setup and monitoring

Detailed explanation. The schema-history topic is one of the most operationally sensitive resources in a Debezium deployment. Getting the topic settings right (single-partition, compacted, retention forever, low but non-zero segment size) is critical; monitoring for its health is mandatory. Walk through the topic-creation command, the ideal properties, and the two Kafka metrics an on-call should watch.

Topic properties. partitions=1, cleanup.policy=compact, retention.ms=-1, min.compaction.lag.ms=60000 (compact aggressively).
Segment size. Small segments speed up compaction. segment.bytes=1048576 (1 MB) is a good default.
Monitoring. Watch (a) topic offset (must advance on every DDL) and (b) log-compaction time (should be < 5 minutes).

Question. A team is deploying Debezium for the first time. Create the schema-history topic with the right properties and document the monitoring runbook.

Input.

Setting	Value
Topic name	schema-history.shop
Partitions	1
Replication factor	3
cleanup.policy	compact
retention.ms	-1 (forever)
segment.bytes	1 MB

Code.

# Create the schema-history topic with the right properties
kafka-topics --bootstrap-server kafka:9092 \
  --create \
  --topic schema-history.shop \
  --partitions 1 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --config retention.ms=-1 \
  --config min.compaction.lag.ms=60000 \
  --config segment.bytes=1048576 \
  --config max.compaction.lag.ms=86400000

# Verify
kafka-topics --bootstrap-server kafka:9092 --describe --topic schema-history.shop

# Sample output
# Topic: schema-history.shop
#  PartitionCount: 1
#  ReplicationFactor: 3
#  Configs: cleanup.policy=compact, retention.ms=-1, segment.bytes=1048576, ...

{
  "connector_config_schema_history_section": {
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.shop",
    "schema.history.internal.kafka.recovery.poll.interval.ms": "100",
    "schema.history.internal.kafka.recovery.attempts": "100"
  }
}

Step-by-step explanation.

partitions=1 is mandatory. DDL events must be totally ordered — the connector replays them in offset order on restart. Multiple partitions would let compaction and delivery re-order events, breaking replay.
cleanup.policy=compact + retention.ms=-1 keeps the topic growing bounded by the number of unique schema-history keys (roughly one per DDL statement). Compaction removes older versions of the same key; retention-forever means no time-based deletion.
min.compaction.lag.ms=60000 prevents the compactor from touching records less than a minute old. This gives the connector time to consume the schema history in-order without racing with the compactor.
replication.factor=3 is standard Kafka production practice — durability against a single broker loss. For the schema-history topic, losing durability is catastrophic (connector cannot decode WAL), so replication is non-negotiable.
Monitoring: Kafka's kafka.log:type=LogCleaner,name=cleaner-recopy-percent metric shows compaction efficiency. Alert on the topic's LogEndOffset failing to advance across a DDL statement (delivered from the connector's DDL-emitter perspective). Prometheus + JMX exporter is the standard monitoring path.

Output.

Property	Value	Why
partitions	1	Total DDL order required
replication.factor	3	Losing durability = losing decoding capability
cleanup.policy	compact	Keep only latest per key
retention.ms	-1	Never delete DDL history
segment.bytes	1 MB	Fast compaction on small segments
min.compaction.lag.ms	60000	Don't race with connector's replay

Rule of thumb. Never delete the schema-history topic. Never change its partition count. Never set retention.ms to a finite value. If you get any of these three wrong, the next connector restart will fail to decode WAL records and require a full re-snapshot.

Worked example — schema evolution via Schema Registry

Detailed explanation. A source table gets a new column via ALTER TABLE. Debezium observes the DDL, updates its internal schema, and registers the new Avro schema version with the Registry. The Registry enforces the configured compatibility level; if the new schema is backward-compatible with the previous version, downstream consumers keep working with no code change. Walk through the exact sequence from ALTER TABLE to a downstream consumer seeing the new column.

The DDL. ALTER TABLE public.orders ADD COLUMN notes TEXT.
The Debezium behaviour. Sees the DDL in the WAL; updates its internal schema for public.orders; registers the new Avro schema version.
The Registry check. Is the new schema backward-compatible with the old? Yes (nullable new field). Registration succeeds.
The downstream impact. New events include the notes field; old consumers still work (they simply ignore the new field or default it to null).

Question. A team adds a nullable column to a source table. Trace what happens end-to-end — DDL, Debezium, Registry, consumer — and identify the one config knob that determines whether the change is accepted.

Input.

Component	Setting
DDL	`ALTER TABLE public.orders ADD COLUMN notes TEXT`
Registry compatibility	BACKWARD
Downstream consumer	Snowflake sink connector (Avro reader)

Code.

-- Source-side DDL
ALTER TABLE public.orders ADD COLUMN notes TEXT;

-- Debezium sees the DDL via the WAL and updates its internal schema
-- (visible in the schema-history topic as a new record)

{
  "old_avro_schema": {
    "type": "record",
    "name": "Value",
    "namespace": "shop.public.orders",
    "fields": [
      {"name":"id","type":"long"},
      {"name":"tenant_id","type":"int"},
      {"name":"total_cents","type":"int"},
      {"name":"status","type":"string"}
    ]
  },
  "new_avro_schema": {
    "type": "record",
    "name": "Value",
    "namespace": "shop.public.orders",
    "fields": [
      {"name":"id","type":"long"},
      {"name":"tenant_id","type":"int"},
      {"name":"total_cents","type":"int"},
      {"name":"status","type":"string"},
      {"name":"notes","type":["null","string"],"default":null}
    ]
  }
}

# Verify the new schema was accepted by the Registry
curl -s http://schema-registry:8081/subjects/shop.public.orders-value/versions/latest | jq

# Check the compatibility level
curl -s http://schema-registry:8081/config/shop.public.orders-value | jq
# {"compatibilityLevel":"BACKWARD"}

Step-by-step explanation.

The source-side ALTER TABLE runs. Postgres emits a WAL record capturing the DDL. Debezium consumes the WAL record, recognises it as a schema change, and updates its internal table schema.
Debezium appends the DDL to the schema-history topic. On any future restart, the connector will replay this DDL from the schema-history topic before consuming WAL records that were written under the new schema.
Debezium constructs the new Avro schema. The added column becomes an Avro field with type = ["null", "string"] and default = null — the canonical way to express a nullable field. This is the shape required for BACKWARD compatibility.
Debezium registers the new schema with the Schema Registry under the subject shop.public.orders-value. The Registry checks BACKWARD compatibility: can a consumer reading with the old schema still read data written under the new schema? Yes — the old consumer will simply ignore the new field. Registration succeeds; the new schema gets a version number.
Downstream consumers using the new schema see the notes column populated; consumers still using the old schema keep working, ignoring the new field. The migration is truly zero-downtime for the consumer fleet.

Output.

Step	Actor	Action	Time
1	Source	ALTER TABLE emits DDL WAL record	0 s
2	Debezium	Consumes DDL, updates internal schema	~50 ms
3	Debezium	Writes DDL to schema-history topic	~10 ms
4	Debezium	Registers new Avro schema with Registry	~50 ms
5	Registry	Verifies BACKWARD compatibility, accepts	~10 ms
6	Downstream	Reads new events with new schema; old consumers unaffected	ongoing

Rule of thumb. For safe schema evolution, always add nullable columns (never non-nullable), always drop columns via a deprecation window (never a hard drop), and always set the Schema Registry compatibility level to BACKWARD. These three rules together prevent 95% of schema-related pipeline breakage.

Worked example — connector restart with schema-history replay

Detailed explanation. The moment of truth for the schema-history topic: connector restart. The connector reads every DDL statement from the schema-history topic (in offset order), reconstructs its internal schema for every tracked table, and only then begins consuming WAL records. If the schema-history topic is missing, corrupted, or (accidentally) partitioned across multiple partitions, this replay fails and the connector cannot decode WAL records.

The replay. On startup, the connector consumes the schema-history topic from offset 0.
The reconstruction. Each DDL statement mutates an internal per-table schema. After replay, the connector has the "current" schema for every table.
The WAL decoding. The connector then reads WAL records from its last committed offset. Each record references a table by name; the internal schema tells it how to decode column values.

Question. A connector restart is failing with "Schema for table orders is not known." Diagnose the possible causes and walk through the recovery path.

Input.

Symptom	"Schema for table orders is not known" on WAL decoding
Connector	shop-postgres-cdc
Topic	schema-history.shop
Uptime before crash	6 months

Code.

# 1. Verify the schema-history topic still exists
kafka-topics --bootstrap-server kafka:9092 --describe --topic schema-history.shop
# If missing → catastrophic; must full re-snapshot

# 2. Verify partition count is 1
# If > 1 → the topic was recreated with wrong config; must full re-snapshot

# 3. Read the schema-history topic to check DDL is there
kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic schema-history.shop \
  --from-beginning \
  --max-messages 100

# 4. If DDL is missing, check whether the source publication was rebuilt without capturing the ALTER history
psql -U postgres -c "SELECT pubname, tables FROM pg_publication_tables WHERE tablename = 'orders';"

{
  "recovery_paths": {
    "case_1_topic_missing": {
      "cause": "Topic deleted (retention misconfig, accidental drop)",
      "recovery": "Reset connector offsets; snapshot.mode=initial; re-snapshot everything"
    },
    "case_2_topic_partitioned": {
      "cause": "Topic recreated with partitions > 1 or wrong cleanup.policy",
      "recovery": "Delete + recreate with partitions=1 cleanup.policy=compact; then full re-snapshot"
    },
    "case_3_ddl_missing": {
      "cause": "Source publication rebuilt; connector never saw the ALTER",
      "recovery": "Manually inject the missing DDL into the schema-history topic (advanced) OR full re-snapshot"
    },
    "case_4_corrupted_topic": {
      "cause": "Kafka disk corruption or replication factor 1 with a broker loss",
      "recovery": "Restore from Kafka backup OR full re-snapshot"
    }
  }
}

Step-by-step explanation.

Verify the topic exists (kafka-topics --describe). If missing, the topic was deleted (accidentally by ops or by a misconfigured retention). No path forward except full re-snapshot with snapshot.mode = initial.
Verify partition count = 1 and cleanup.policy = compact. If either is wrong, the topic was recreated with wrong config. Delete + recreate with correct settings + full re-snapshot.
Read the topic. If it exists and is properly configured but does not contain the DDL for orders, the source publication was likely recreated without the connector observing the ALTER (a common bug when someone runs DROP PUBLICATION and re-creates it during maintenance). Recovery: either manually inject the DDL into schema-history (an advanced move requiring Debezium internals knowledge) or full re-snapshot.
The "manual DDL inject" recovery: use the Debezium admin API to push a synthetic DDL statement into the schema-history topic. This is documented for the SQL Server and MySQL connectors; for Postgres, the safer path is to re-snapshot.
Prevention: never delete or drop-and-recreate the source publication without pausing the connector. If you must recreate the publication, do it during a maintenance window where a full re-snapshot is acceptable.

Output.

Cause	Diagnosis command	Recovery	Downtime
Topic missing	`kafka-topics --describe`	full re-snapshot	hours
Wrong partition count	`kafka-topics --describe`	recreate + full re-snapshot	hours
DDL missing	`kafka-console-consumer`	inject DDL or re-snapshot	minutes / hours
Corrupted topic	Kafka broker logs	restore from backup or re-snapshot	hours

Rule of thumb. Every Debezium runbook must include "verify schema-history topic integrity" as the first diagnostic step for any decoding-related error. And every operations playbook must forbid manual topic deletion in the region where schema-history topics live.

Senior interview question on schema evolution

A senior interviewer might ask: "A source team is planning to drop three columns and rename two others across five tables over the next quarter. Walk me through how you'd coordinate this with the CDC pipeline — what Schema Registry compatibility level, what DDL sequencing, what downstream contract changes, and what would you never do."

Solution Using a phased column-migration playbook with BACKWARD compatibility

Column-migration playbook — 5 tables, 3 drops, 2 renames
=========================================================

Preconditions (must be true before any DDL runs)
  - Schema Registry compatibility level = BACKWARD on every affected subject
  - Every downstream consumer has been audited for use of the columns
  - Owners of each downstream consumer have been notified with sunset dates

Phase 1 — mark columns deprecated (week 1)
  - Add a comment on each column: COMMENT ON COLUMN orders.legacy_col IS 'DEPRECATED - remove 2026-09-01';
  - Emit a dashboard notice to consumer teams
  - No schema change; no Debezium impact

Phase 2 — stop writing (weeks 2–4)
  - Push app-side code changes: stop writing to the deprecated columns
  - Verify writes = 0 via monitoring
  - Column still exists; still emitted in every Debezium event; consumers may still read

Phase 3 — mark as null in Debezium (week 5)
  - Add an SMT filter that nulls out the deprecated columns before emit
  - Verify downstream consumers still working

Phase 4 — drop the columns (week 6, maintenance window)
  - ALTER TABLE orders DROP COLUMN legacy_col;
  - Debezium observes the DDL; new schema version registered
  - Registry check: BACKWARD compatibility means old consumers reading with the old schema will fail on new writes → move to FORWARD if any consumer still uses the old schema, else BACKWARD is fine
  - In practice, use FORWARD_TRANSITIVE for a two-week grace window, then move back to BACKWARD

Phase 5 — rename columns (weeks 7–8)
  - Renames are impossible under BACKWARD or FULL compatibility (fields are matched by name)
  - Solution: add the new column, dual-write app-side, migrate consumers to read new column, drop the old column
  - Never do a bare RENAME COLUMN through the pipeline

Never
  - Never do a hard column drop without the deprecation window
  - Never do a bare RENAME COLUMN with the CDC pipeline live
  - Never lower compatibility to NONE just to unstick a migration

Step-by-step trace.

Week	Phase	DDL	Registry impact	Consumer impact
1	Deprecation notice	COMMENT ON COLUMN	none	notice only
2–4	Stop writes	none	none	none
5	SMT null-out	none (SMT config)	new schema version	consumers see nulls
6	DROP COLUMN	ALTER TABLE DROP	new schema; grace on FORWARD	new consumers only
7–8	Rename via add + drop	ALTER TABLE ADD then DROP	two new schema versions	dual-window

After the six-week playbook, the columns are dropped and the renames complete without a single downstream consumer breakage. No emergency reverts; no data loss; no compatibility violations.

Output:

Deliverable	Timeline	Risk
3 columns dropped safely	6 weeks	low
2 columns renamed safely	8 weeks	low
Schema-history topic preserved	forever	none
Consumer breakage	0	achieved

Why this works — concept by concept:

Phased migration — the six-week playbook stages the risk. Each phase can be reverted without a full pipeline rebuild. Big-bang column drops are how CDC pipelines break.
BACKWARD compatibility as the default — new consumers can read old writes; old consumers can read new writes as long as new fields are nullable and default-valued. Adding nullable columns is free; dropping columns requires a temporary FORWARD_TRANSITIVE window.
SMT null-out as a safety valve — the null-out phase (week 5) is a soft migration step: the column still exists on the source but is nulled before it reaches Kafka. Consumers get advance warning of the eventual drop and can update in a controlled way.
Never bare RENAME — Avro fields are matched by name; renaming breaks compatibility. The workaround is add-then-drop with dual-write — two separate DDL steps, one register-new-column at the Registry, then a drop after consumers migrate.
Cost — six weeks of calendar time; roughly 3 engineer-weeks of work across app, CDC ops, and consumer teams. The avoided cost is one Registry compatibility break, which typically requires an emergency runbook + a hasty consumer redeploy + a hangover of on-call incidents. Do the playbook.

Streaming
Topic — streaming
Streaming schema evolution and registry problems

Practice →

SQL Topic — sql SQL problems on schema design and DDL evolution

Practice →

5. Outbox pattern + production hygiene

Turn dual-write into a single transaction — outbox table, `EventRouter` SMT, and the four production hygiene axes senior interviewers probe

The mental model in one line: the outbox pattern solves the dual-write problem (writing to your business table and publishing to Kafka in the same request handler is not atomic) by writing every business event as an extra row in an outbox table inside the same database transaction as the business write; Debezium then tails the outbox table, and its EventRouter SMT transforms each outbox row into a topic-routed, aggregate-keyed Kafka event — single-transaction atomicity, at-least-once delivery, and clean per-aggregate topic fan-out.

The four axes interviewers actually probe for outbox questions.

The dual-write problem. Why writing to DB + publishing to Kafka is not atomic under any partial-failure scenario — the crash, the network partition, the timeout. Every distributed team hits this eventually.
The outbox table. The design of the table itself — one row per event, columns for aggregate_type, aggregate_id, event_type, payload, plus operational columns for id, created_at, and often a soft-delete/pruned marker.
The EventRouter SMT. Debezium's built-in Simple Message Transform that reads an outbox row and produces a Kafka event with the topic derived from aggregate_type, the key derived from aggregate_id, and the value derived from payload.
Cleanup. Outbox rows are not automatically deleted — the CDC path only reads them. A separate pruner (trigger, cron job, or logical replication policy) must delete emitted rows to prevent unbounded growth.

The dual-write problem in one paragraph.

The naive code. Inside a request handler: db.execute(INSERT INTO orders ...); kafka.publish("orders_events", ...).
The failure. The db.execute succeeds, then the process crashes / the Kafka producer times out / the network partitions. The database has the new row; Kafka has no event. Downstream consumers never learn of the change.
The reverse failure. Kafka publishes but the DB commit fails. Downstream consumers see an event for a row that never existed. Fantasy data flows through the pipeline.
The root cause. Two separate transactional systems (DB and Kafka) with no distributed transaction between them. XA/2PC exists but nobody actually runs it in production.

The outbox pattern in one paragraph.

The insight. Write the intent to publish as an extra row in an outbox table inside the same database transaction as the business write. Now there is only one transactional system; the DB either commits both or commits neither.
The read path. Debezium tails the outbox table (via WAL/binlog/CDC-table) and publishes each new outbox row as a Kafka event. The DB commit is the single point of truth; if the DB committed, the outbox row exists and Debezium will eventually publish it. If the DB didn't commit, the outbox row doesn't exist and no false event is published.
The guarantee. At-least-once publish. The DB commit implies at-least-one Kafka message will land; the message may be duplicated if the connector crashes between reading and offset-committing, but no message will be lost.

The outbox table — canonical design.

Columns. id (PK, UUID or bigint), aggregate_type (VARCHAR — "orders", "payments"), aggregate_id (VARCHAR — the business ID), event_type (VARCHAR — "OrderPlaced", "OrderShipped"), payload (JSONB — the event body), created_at (TIMESTAMPTZ), and optionally emitted_at (nullable) for pruning.
Indexes. Primary key on id; optionally an index on aggregate_id for troubleshooting.
Constraints. NOT NULL on aggregate_type, aggregate_id, event_type, payload.
Grants. Debezium connector user gets SELECT on the outbox table.

The EventRouter SMT — how it works.

Input. A Debezium row event on the outbox table, with after.aggregate_type, after.aggregate_id, after.event_type, after.payload, etc.
Output. A Kafka record with:
- Topic. <route.topic.replacement> where route.topic.replacement is a template like ${routedByValue} and route.by.field = aggregate_type — so the topic name is derived from the aggregate_type column.
- Key. <route.key.field> — e.g. aggregate_id. The key controls partitioning; events for the same aggregate go to the same partition, preserving order.
- Value. <route.payload.field> — e.g. payload. The payload column becomes the Kafka value.
Config. All the above via SMT-level config keys — a single JSON block in the connector config.

Cleanup — three strategies.

Trigger-based pruning. An AFTER INSERT trigger on the outbox schedules a delete. Downside: adds trigger overhead; risks a race with the Debezium connector.
Cron-based pruning. A periodic DELETE FROM outbox WHERE created_at < now() - INTERVAL '7 days'. Simple; slight latency between emit and delete.
Logical-replication-aware pruning. Advanced — a periodic query reads the pg_replication_slots.confirmed_flush_lsn and deletes rows whose WAL LSN is below the confirmed flush. Guarantees no early deletion.
The default choice. Cron-based with a 7-day retention. Simple, well-understood, safe if the connector is caught up within 7 days.

Common interview probes on outbox + production hygiene.

"What problem does the outbox pattern solve?" — dual-write atomicity.
"What is the EventRouter SMT for?" — routes outbox rows to Kafka topics with configurable topic/key/value extraction.
"Does Debezium give exactly-once semantics?" — at-least-once by default; exactly-once possible with Kafka Connect transactional producer, but downstream must still be idempotent.
"What is a tombstone event in Debezium?" — the null-value event that follows a delete when a topic is compacted; tells the log-cleaner to remove the key entirely.
"How do you handle a failing event that keeps crashing the connector?" — dead-letter queue via errors.tolerance = all and errors.deadletterqueue.topic.name = dlq.<connector>.

Worked example — outbox table + EventRouter SMT config

Detailed explanation. The canonical outbox implementation — Postgres table, atomic write from the app, Debezium connector with EventRouter SMT, Kafka topic per aggregate type. Walk through every layer end-to-end.

Postgres side. Outbox table, connector role with SELECT, table in the publication.
App side. Every business write includes an outbox INSERT in the same transaction.
Connector side. EventRouter SMT config that routes on aggregate_type.
Downstream. Consumers subscribe to orders_events, payments_events, etc.

Question. Design the outbox table, the app-side write, the connector config with EventRouter SMT, and the resulting Kafka topics.

Input.

Component	Value
Source database	Postgres 16
Aggregates	orders, payments, refunds
Downstream topics	orders_events, payments_events, refunds_events
Event volume	5k events/sec peak

Code.

-- Outbox table
CREATE TABLE public.outbox (
  id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  aggregate_type VARCHAR(64) NOT NULL,
  aggregate_id   VARCHAR(128) NOT NULL,
  event_type     VARCHAR(64) NOT NULL,
  payload        JSONB NOT NULL,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX outbox_created_at_idx ON public.outbox (created_at);
GRANT SELECT ON public.outbox TO debezium;

-- Add to publication
ALTER PUBLICATION dbz_publication ADD TABLE public.outbox;

-- App-side write (atomic; single transaction)
BEGIN;
  INSERT INTO public.orders (id, tenant_id, total_cents, status)
  VALUES (12345, 1, 19900, 'NEW');

  INSERT INTO public.outbox (aggregate_type, aggregate_id, event_type, payload)
  VALUES ('orders',
          '12345',
          'OrderPlaced',
          '{"orderId":12345,"tenantId":1,"totalCents":19900}');
COMMIT;

{
  "name": "shop-outbox-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "slot.name": "dbz_shop_slot",
    "table.include.list": "public.outbox",

    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "${routedByValue}_events",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.field.event.timestamp": "created_at",
    "transforms.outbox.table.fields.additional.placement": "event_type:header",

    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",

    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.shop"
  }
}

Step-by-step explanation.

The outbox table lives alongside the business tables in Postgres. Adding it to the publication (ALTER PUBLICATION ... ADD TABLE public.outbox) makes the connector see INSERTs.
Every business write is atomic: INSERT INTO public.orders + INSERT INTO public.outbox inside one BEGIN/COMMIT. Both writes commit or neither does; there is no partial-failure scenario where the order exists but the outbox event does not (or vice versa).
The EventRouter SMT config picks apart the outbox row. route.by.field = aggregate_type says "use the value of aggregate_type to derive the topic name." route.topic.replacement = ${routedByValue}_events gives you topics like orders_events, payments_events, refunds_events — one per aggregate.
table.field.event.key = aggregate_id sets the Kafka message key to aggregate_id. All events for order 12345 go to the same partition, preserving per-order event order. This is the standard partitioning pattern for aggregate-typed streams.
table.fields.additional.placement = event_type:header puts the event_type column into a Kafka header (event_type: OrderPlaced). Consumers can filter on the header without deserialising the payload — useful for event-type routing at the consumer.

Output.

Aggregate write	Kafka topic	Key	Value	Header
INSERT INTO orders + outbox row	orders_events	"12345"	`{"orderId":12345,...}`	event_type: OrderPlaced
INSERT INTO payments + outbox row	payments_events	"56789"	`{"paymentId":56789,...}`	event_type: PaymentCaptured
INSERT INTO refunds + outbox row	refunds_events	"56789"	`{"refundId":98765,...}`	event_type: RefundIssued

Rule of thumb. The outbox pattern is the correct answer to the dual-write problem in every non-XA architecture. Use it whenever your app needs to write to a DB and publish to Kafka in the same logical operation. Anything else is a partial-failure bug waiting to happen.

Worked example — outbox pruning strategy

Detailed explanation. The outbox table is write-only from the connector's perspective — the connector reads via WAL, never touches the table content. A separate pruner is required to delete emitted rows or the table grows without bound. Compare the three pruning strategies and pick the safest default.

Trigger-based. Fast; risky if the Debezium connector is delayed.
Cron-based. Simple; requires a retention window bigger than the max connector lag.
LSN-aware. Safest; requires reading pg_replication_slots metadata.

Question. Implement outbox pruning with a 7-day retention window. Show the SQL, the schedule, and the monitoring.

Input.

Component	Value
Outbox table	public.outbox
Retention target	7 days
Peak volume	5k events/sec = ~3B rows/week
Prune schedule	Hourly

Code.

-- Pruning query — 7-day retention
-- Runs hourly from a cron job or a scheduled Postgres function
DELETE FROM public.outbox
WHERE  created_at < now() - INTERVAL '7 days'
RETURNING id;

-- Sample cron entry
-- 0 * * * * psql -U app -d shop -c "DELETE FROM public.outbox WHERE created_at < now() - INTERVAL '7 days';"

-- Monitor outbox growth
SELECT COUNT(*)                              AS row_count,
       pg_size_pretty(pg_total_relation_size('public.outbox')) AS size,
       min(created_at)                        AS oldest,
       max(created_at)                        AS newest
FROM   public.outbox;

-- Verify Debezium is caught up (should be seconds behind, not hours)
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag_bytes
FROM   pg_replication_slots
WHERE  slot_name = 'dbz_shop_slot';

# Safer LSN-aware pruner — deletes only rows the connector has already flushed
import psycopg2

def prune_outbox(conn):
    with conn.cursor() as cur:
        # Get the LSN the connector has confirmed flushing up to
        cur.execute("""
            SELECT confirmed_flush_lsn
            FROM   pg_replication_slots
            WHERE  slot_name = 'dbz_shop_slot'
        """)
        flush_lsn = cur.fetchone()[0]
        # Delete only rows whose WAL LSN is below the confirmed flush
        # (approximated by created_at + 5 minute safety margin)
        cur.execute("""
            DELETE FROM public.outbox
            WHERE  created_at < (
                SELECT now() - INTERVAL '10 minutes'
            )
            AND    id IN (
              SELECT id FROM public.outbox
              WHERE created_at < now() - INTERVAL '10 minutes'
              LIMIT 10000
            )
        """)
        deleted = cur.rowcount
        conn.commit()
    return deleted

if __name__ == "__main__":
    conn = psycopg2.connect("postgres://pruner@db:5432/shop")
    while True:
        deleted = prune_outbox(conn)
        print(f"Pruned {deleted} rows")
        import time; time.sleep(60)

Step-by-step explanation.

The simple cron-based pruner runs DELETE FROM public.outbox WHERE created_at < now() - INTERVAL '7 days' every hour. Safe as long as the Debezium connector lag stays below 7 days — which it should in any healthy deployment.
The safer LSN-aware pruner cross-references pg_replication_slots.confirmed_flush_lsn before deleting. It only deletes rows the connector has confirmed publishing, guaranteeing no early deletion even if the connector is stuck for hours.
Batching: LIMIT 10000 per delete keeps each delete transaction small. Large deletes on a hot table create bloat and lock contention; small batches spread the work.
Monitoring: alert on row_count > 100M (indicates pruner falling behind or connector stalled) and on lag_bytes > 1 GB (connector significantly behind). Either indicates the outbox is growing faster than the CDC path can drain.
VACUUM: run VACUUM (VERBOSE, ANALYZE) public.outbox weekly to reclaim space from the deletes. Under high delete volume, the outbox table may need AUTOVACUUM tuning (autovacuum_vacuum_scale_factor = 0.05 to trigger vacuum more aggressively).

Output.

Metric	Healthy	Warn	Page
outbox row count	< 100M	100M–1B	> 1B
outbox size on disk	< 20 GB	20–100 GB	> 100 GB
Debezium lag (bytes)	< 100 MB	100 MB – 1 GB	> 1 GB
Oldest row age	< 8 days	8–14 days	> 14 days

Rule of thumb. Cron-based pruning with a 7-day retention is the right default; step up to LSN-aware pruning only when the connector routinely lags for hours (which itself is a signal that the connector needs sizing help, not that the pruner needs fixing).

Worked example — dead-letter queue for poison events

Detailed explanation. Some events will fail to publish — a corrupted payload, a schema violation, a downstream sink rejection. Kafka Connect provides a dead-letter queue (DLQ) mechanism: failed events are routed to a DLQ topic instead of crashing the connector. Debezium inherits this by setting a few errors.* configs. Walk through the DLQ config and the monitoring runbook.

The mechanism. errors.tolerance = all + errors.deadletterqueue.topic.name = dlq.<connector>. Failed events land in the DLQ topic.
The recovery. An operator inspects the DLQ topic, identifies the root cause (bad payload, schema mismatch, etc.), and either patches the source data or drops the poison event.
The alerting. DLQ topic offset must not grow. Any advance indicates a real issue.

Question. Configure a DLQ for the outbox connector and design the on-call runbook for a DLQ alert.

Input.

Component	Value
Connector	shop-outbox-cdc
DLQ topic	dlq.shop-outbox-cdc
Alert	DLQ offset increased in last 5 minutes

Code.

{
  "name": "shop-outbox-cdc",
  "config": {
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "errors.deadletterqueue.topic.name": "dlq.shop-outbox-cdc",
    "errors.deadletterqueue.topic.replication.factor": "3",
    "errors.deadletterqueue.context.headers.enable": "true",

    "errors.retry.timeout": "60000",
    "errors.retry.delay.max.ms": "5000"
  }
}

# On-call runbook — DLQ growth alert

# 1. Inspect the latest few DLQ messages
kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic dlq.shop-outbox-cdc \
  --from-beginning \
  --max-messages 10 \
  --property print.headers=true

# The DLQ header set (Connect adds these automatically)
#   __connect.errors.topic
#   __connect.errors.partition
#   __connect.errors.offset
#   __connect.errors.class.name
#   __connect.errors.exception.class.name
#   __connect.errors.exception.message
#   __connect.errors.exception.stacktrace

# 2. Identify the root cause from the exception message
#    Common: JsonParseException on malformed payload → source app bug
#            SchemaException → schema evolution violation → coordinate with schema owner

# 3. Decide the fix
#    (a) Fix the source data (rewrite the offending row in outbox)
#    (b) Update the connector config to relax the transform
#    (c) Accept the loss (drop the DLQ event) — only if the event is truly unrecoverable

# 4. Reprocess (if the fix was a data patch)
kafka-console-producer --bootstrap-server kafka:9092 \
  --topic orders_events \
  < recovered-events.json

Step-by-step explanation.

errors.tolerance = all tells Kafka Connect to route any failing record to the DLQ instead of crashing the task. Without this, a single poison event stops the entire connector — one bad row halts the whole CDC pipeline.
errors.deadletterqueue.topic.name = dlq.shop-outbox-cdc sets the target DLQ topic. Convention is dlq.<connector-name> — one DLQ per connector, easy to correlate.
errors.deadletterqueue.context.headers.enable = true populates the failed record with headers describing why it failed (__connect.errors.exception.class.name, __connect.errors.exception.message, etc.). Without these, debugging a DLQ event requires guessing.
errors.retry.timeout = 60000 + errors.retry.delay.max.ms = 5000 — Connect will retry a failing record for up to 60 seconds with an exponential backoff capped at 5 seconds before routing to DLQ. Transient failures (Kafka broker restart) heal; permanent failures (bad schema) still land in DLQ.
The on-call runbook: (a) inspect the DLQ headers to find the root cause; (b) fix the source data if possible; (c) reprocess the fixed data by producing to the original topic; (d) if unrecoverable, drop the DLQ event and file a bug against the source team.

Output.

Failure class	DLQ header	Recovery
Bad JSON payload	JsonParseException	Fix source app; drop DLQ event
Schema violation	SchemaException	Coordinate with schema owner; reprocess after
Downstream sink reject	ConnectException	Investigate sink; may auto-heal
Registry unavailable	RestClientException	Fix Registry; connector will auto-reprocess retries

Rule of thumb. Every production Debezium connector ships with a DLQ. Without one, a single poison event stops the whole pipeline. With one, the connector keeps running; a small operational cost (DLQ inspection) buys you the availability property that separates "shipped a CDC pipeline" from "runs a CDC pipeline in production."

Senior interview question on outbox + production hygiene

A senior interviewer might ask: "Design the outbox pattern end-to-end for a Postgres-based e-commerce system with three aggregates — orders, payments, refunds — with at-least-once delivery to Kafka, DLQ for poison events, LSN-aware pruning, and a plan for exactly-once semantics on the connector-to-Kafka hop. Walk me through every layer."

Solution Using outbox + EventRouter SMT + DLQ + LSN-aware pruning + transactional producer

{
  "name": "ecommerce-outbox-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "slot.name": "dbz_shop_slot",
    "table.include.list": "public.outbox",

    "signal.data.collection": "public.debezium_signal",
    "heartbeat.interval.ms": "10000",

    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "${routedByValue}_events",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.field.event.timestamp": "created_at",
    "transforms.outbox.table.fields.additional.placement": "event_type:header",

    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.deadletterqueue.topic.name": "dlq.ecommerce-outbox-cdc",
    "errors.deadletterqueue.topic.replication.factor": "3",
    "errors.deadletterqueue.context.headers.enable": "true",

    "producer.override.enable.idempotence": "true",
    "producer.override.acks": "all",
    "producer.override.max.in.flight.requests.per.connection": "5",
    "producer.override.transactional.id": "ecommerce-outbox-cdc-tx",

    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",

    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.ecommerce"
  }
}

-- App-side atomic write (single transaction)
BEGIN;
  INSERT INTO public.orders (id, tenant_id, total_cents, status)
  VALUES (12345, 1, 19900, 'NEW');

  INSERT INTO public.outbox (aggregate_type, aggregate_id, event_type, payload)
  VALUES ('orders', '12345', 'OrderPlaced',
          '{"orderId":12345,"tenantId":1,"totalCents":19900}');
COMMIT;

-- LSN-aware pruner (hourly)
DELETE FROM public.outbox
WHERE  id IN (
  SELECT id
  FROM   public.outbox
  WHERE  created_at < now() - INTERVAL '10 minutes'
  AND    EXISTS (
    SELECT 1 FROM pg_replication_slots
    WHERE  slot_name = 'dbz_shop_slot'
    AND    confirmed_flush_lsn IS NOT NULL
  )
  LIMIT 10000
);

Step-by-step trace.

Hop	Guarantee	Configured by
App → DB	atomic (single transaction)	BEGIN/COMMIT
DB → Debezium	at-least-once (WAL replay)	pgoutput, slot, publication
Debezium → Kafka	exactly-once (transactional producer)	producer.override.transactional.id
Kafka → downstream	consumer's responsibility (idempotency)	consumer-side upserts

The chain gives you effectively-once end-to-end: the DB commit is the truth, the transactional producer to Kafka avoids duplicates on connector retry, and idempotent downstream consumers absorb any residual duplicates from consumer restarts.

Output:

Layer	Semantics	Failure recovery
App code	atomic write	rollback on error
DB (Postgres)	WAL + replication slot	slot survives DB restart
Debezium	at-least-once from WAL; exactly-once to Kafka	offset in Kafka topic + transactional producer
DLQ	poison events isolated	operator inspection + reprocess
Pruner	LSN-aware; 10-minute safety margin	never deletes un-emitted rows

Why this works — concept by concept:

Outbox = single transaction — the atomic write of business row + outbox row inside one DB transaction is the entire point. There is no dual-write scenario; the DB commit is the truth.
EventRouter SMT — separates the "outbox row shape" concern from the "Kafka topic layout" concern. The app writes canonical outbox rows; the SMT turns them into topic-routed, aggregate-keyed Kafka events. The routing table lives in the connector config, not in the app.
Transactional producer for exactly-once — enable.idempotence = true + acks = all + transactional.id = <unique> unlocks Kafka Connect's exactly-once producer semantics. The connector-to-Kafka hop becomes truly exactly-once; any connector retry that happens to write the same record is deduplicated by Kafka's idempotency machinery.
DLQ for poison events — the connector never stops for a single bad row. Failed events land in the DLQ; operators inspect and reprocess. The pipeline stays available even under source-side data corruption.
LSN-aware pruning — the pruner deletes only rows that Debezium has already emitted. There is no scenario where a row is deleted before it lands in Kafka; the safety margin is enforced by the DB-side check, not by hoping the connector is caught up.
Cost — five configs in the connector JSON, one outbox table, one pruner cron, one DLQ topic per connector. The operational overhead is small; the delivery-semantics improvement is enormous. In interviews, this is the canonical "senior CDC architecture" answer.

Streaming
Topic — streaming
Streaming outbox and exactly-once problems

Practice →

ETL
Topic — etl
ETL problems on event-driven pipelines with CDC

Practice →

Cheat sheet — Debezium recipes

When to use Debezium. Any time you need change data capture from Postgres, MySQL, MongoDB, SQL Server, Oracle, or Db2 into Kafka (or Pulsar / Kinesis / Pub/Sub via Debezium Server). Log-based CDC is unambiguously the correct answer over polling or triggers for anything more than a hobby project.
Connector class per source. Postgres → PostgresConnector + plugin.name = pgoutput. MySQL → MySqlConnector + binlog_format = ROW + gtid_mode = ON. Mongo → MongoDbConnector + replica set + capture.mode = change_streams_update_full. SQL Server → SqlServerConnector + sp_cdc_enable_db + running SQL Server Agent.
Postgres connector JSON (10-line template). connector.class = io.debezium.connector.postgresql.PostgresConnector, plugin.name = pgoutput, publication.name = dbz_publication, slot.name = dbz_slot, table.include.list = public.orders,..., snapshot.mode = initial, snapshot.locking.mode = none, signal.data.collection = public.debezium_signal, heartbeat.interval.ms = 10000, schema.history.internal.kafka.topic = schema-history.<prefix>.
Snapshot mode default. initial. Snapshots on first startup (no committed offset); skipped on subsequent restarts. Reach for never only when the sink is pre-loaded via another mechanism; reach for when_needed only when the offset topic may be lost due to failover; reach for always only in test harnesses.
Incremental snapshot signal. INSERT INTO public.debezium_signal (id, type, data) VALUES ('adhoc-1', 'execute-snapshot', '{"type":"incremental","data-collections":["public.orders"]}'); — no connector restart required. The signal is picked up by the streaming phase and dispatches a chunked snapshot in parallel with live streaming.
Schema history topic settings. partitions = 1, cleanup.policy = compact, retention.ms = -1 (forever), segment.bytes = 1 MB, replication.factor = 3, min.compaction.lag.ms = 60000. Never delete this topic; never change its partition count; never set a finite retention.
Schema Registry compatibility. BACKWARD is the standard for CDC. Add nullable columns freely; drop columns via a deprecation window; never bare RENAME (use add + drop with dual-write instead). Lowering to NONE to unstick a migration is a beginner trap; the correct move is a compatibility grace window (FORWARD_TRANSITIVE for two weeks).
Outbox SMT config. transforms = outbox, transforms.outbox.type = io.debezium.transforms.outbox.EventRouter, transforms.outbox.route.by.field = aggregate_type, transforms.outbox.route.topic.replacement = ${routedByValue}_events, transforms.outbox.table.field.event.key = aggregate_id, transforms.outbox.table.field.event.payload = payload, transforms.outbox.table.fields.additional.placement = event_type:header. One SMT, one outbox table, N Kafka topics fanned out per aggregate.
Outbox pruning. Cron-based with a 7-day retention is the safe default: DELETE FROM public.outbox WHERE created_at < now() - INTERVAL '7 days' hourly, batched with LIMIT 10000 per delete. Step up to LSN-aware pruning if the connector routinely lags for hours.
Dead-letter queue. errors.tolerance = all, errors.deadletterqueue.topic.name = dlq.<connector>, errors.deadletterqueue.topic.replication.factor = 3, errors.deadletterqueue.context.headers.enable = true. Poison events land in DLQ instead of stalling the connector; operators inspect + reprocess.
Exactly-once producer. producer.override.enable.idempotence = true, producer.override.acks = all, producer.override.transactional.id = <unique-per-connector>. Unlocks Kafka Connect's transactional producer semantics for the connector-to-Kafka hop. Downstream consumers must still be idempotent for true end-to-end exactly-once.
Heartbeat topic. heartbeat.interval.ms = 10000, heartbeat.topics.prefix = __debezium-heartbeat. Keeps the replication slot's confirmed_flush_lsn moving forward on idle tables — prevents WAL disk fill on the source.
Restart-from-offset command. Connectors are offset-driven; a restart resumes from the committed offset in the Connect offset topic. To force a restart from a specific point: (a) stop the connector, (b) DELETE from the offset topic (advanced), (c) start with snapshot.mode = initial (safe). Never manually rewrite offsets in production; always restart with a controlled snapshot mode.
Debezium Server for non-Kafka sinks. Same connector code, standalone JVM, pluggable sink (Pub/Sub, Kinesis, EventHubs, Redis Streams, HTTP). Requires a persistent local volume for the offset and schema-history files. Use when Kafka is not in the stack and you don't want to add it just for CDC.

Frequently asked questions

What is Debezium and how does it work?

debezium is a set of Kafka Connect source connectors that implement log-based change data capture against Postgres, MySQL, MongoDB, SQL Server, Oracle, Db2, Cassandra, and Vitess. Instead of polling SELECT ... WHERE updated_at > :last_run (which misses deletes and races with window boundaries) or attaching triggers (which slow down the source), Debezium tails the database's own write-ahead log (Postgres WAL, MySQL binlog, MongoDB oplog / change stream, SQL Server Agent CDC tables) and emits every INSERT, UPDATE, and DELETE as a per-row Kafka event in commit order. Two architectural choices define the shape: (a) log-based reading gives you exact CDC semantics with near-zero source overhead, and (b) running on Kafka Connect gives you free offset management, retries, dead-letter queue, JMX metrics, REST admin, and a mature ecosystem. Debezium 2.x adds incremental snapshotting driven by a debezium_signal control table, Debezium Server for teams that don't run Kafka, and a modernised config surface.

Debezium vs Kafka native Connect connectors — when do I pick each?

Kafka Connect ships with dozens of source connectors (JDBC, Elasticsearch, S3, HDFS, etc.) — but the JDBC source connector is the one most often confused with Debezium. The JDBC source connector polls the source database with a configurable query and interval; it captures INSERTs and UPDATEs (via updated_at or an incrementing key) but has no way to see DELETEs, and it races with concurrent writes at the polling boundary. Debezium reads the log directly, capturing every INSERT / UPDATE / DELETE in commit order with near-zero source overhead. For any deployment where correctness matters or where the source has DELETEs, Debezium is unambiguously the correct choice. The JDBC source connector is fine for append-only sources (log tables, event streams already in DB form) where polling semantics are acceptable. In interviews, the senior signal is naming the DELETE-blindness of JDBC source vs the log-completeness of Debezium.

What is the schema history topic for and can I delete it?

Never delete the schema history topic. Debezium persists every observed DDL statement (CREATE TABLE, ALTER TABLE, DROP TABLE) to a dedicated Kafka topic (schema.history.internal.kafka.topic), and on every connector restart it replays that topic from offset 0 to rebuild the internal per-table schema. Without the schema history, the connector cannot decode WAL/binlog records for existing tables — it doesn't know which columns exist, what their types are, or how to construct the Avro envelope. The topic is single-partition (DDL must be totally ordered), compacted (only the latest state per key is retained), and retention-forever. Deleting the topic (by ops mistake, retention misconfig, or drop-and-recreate) forces a full re-snapshot with snapshot.mode = initial — hours or days of downtime for a large source. Set retention.ms = -1, cleanup.policy = compact, partitions = 1, replication.factor = 3, and lock down permissions so the topic cannot be deleted accidentally.

Do I need Kafka to run Debezium?

Not necessarily. Kafka Connect mode — the mainstream deployment — runs Debezium as a task inside a Kafka Connect worker and pushes events to Kafka topics. This requires an operating Kafka + Connect cluster. Debezium Server mode — introduced in 2.x — runs the same connector code as a standalone Quarkus-based JVM that pushes to a pluggable sink: Pulsar, Kinesis, Google Pub/Sub, Azure Event Hubs, Redis Streams, HTTP, or Kafka via a lightweight producer. Server mode requires a persistent local volume for the offset and schema-history files (unlike Connect mode, which stores them in Kafka topics). For teams that already run Kafka, Connect mode is the standard answer. For teams that don't run Kafka and don't want to add it just for CDC, Debezium Server is a legitimate deployment mode — same connector logic, simpler runtime, one JVM plus one PVC.

What is the outbox pattern and why does Debezium provide an SMT for it?

The outbox pattern solves the dual-write problem: writing to your business database and publishing to Kafka in the same request handler is not atomic under any partial-failure scenario (the DB commit succeeds, the Kafka publish times out, the process crashes; the DB has the new row, Kafka has no event, downstream consumers never learn of the change). The outbox pattern turns the two-transactional-system problem into a one-transactional-system problem: every business write includes an INSERT into an outbox table inside the same DB transaction. Debezium then tails the outbox table via WAL/binlog CDC and emits each new outbox row as a Kafka event. The EventRouter SMT ships with Debezium and does the last-mile transformation — it reads outbox columns (aggregate_type, aggregate_id, event_type, payload) and produces a Kafka event routed by aggregate_type (topic name), keyed by aggregate_id (partition), with payload as the value and event_type as a header. One outbox table, one connector, N topics fanned out per aggregate — with at-least-once atomicity anchored at the DB commit.

Does Debezium give exactly-once semantics end-to-end?

Debezium is at-least-once by default. The database-to-connector hop is at-least-once because on connector crash, the WAL/binlog is replayed from the last committed offset; any records processed but not yet acknowledged are re-processed on restart. The connector-to-Kafka hop can be made exactly-once by enabling Kafka Connect's transactional producer: producer.override.enable.idempotence = true, producer.override.acks = all, producer.override.transactional.id = <unique-per-connector>. This unlocks Kafka's transactional-write semantics and gives you exactly-once from Debezium to Kafka. However, end-to-end exactly-once still requires idempotent consumers — the downstream loader (Snowflake sink, warehouse ingest, materialised view refresher) must handle duplicates via upserts, idempotency keys, or a deduplication window. In interviews, the senior answer is "Debezium can be exactly-once to Kafka; end-to-end exactly-once requires idempotent consumers, which most teams achieve via upsert-style downstream sinks."

Practice on PipeCode

Drill the streaming practice library → for the CDC, change-log, and event-driven problems senior interviewers love.
Rehearse on the ETL practice library → for the pipeline-design and multi-source ingestion problems that motivate log-based CDC in the first place.
Sharpen the design axis with the SQL practice library → for the schema-evolution, DDL migration, and outbox-table modelling problems that separate senior from mid.
Stack the prerequisites against PipeCode's broader 450+ data-engineering catalogue to anchor the connector-semantics, snapshot-mode, and outbox-pattern intuition against real graded inputs.

Lock in Debezium muscle memory

Docs explain connectors. PipeCode drills explain the decision — when to reach for incremental snapshotting, why the schema-history topic must be single-partition, when the outbox pattern beats a dual write, when Debezium Server replaces Kafka Connect. Pipecode.ai is Leetcode for Data Engineering — pattern-first practice tuned for the production trade-offs senior data engineers actually face.

Practice streaming problems →
Practice ETL problems →

1. Why Debezium became the CDC default

Log-based CDC on Kafka Connect — the two architectural choices that made every other CDC tool a niche

Worked example — the log-vs-poll trade-off

Worked example — Debezium versus a bespoke trigger-based CDC

Worked example — Debezium Server vs Kafka Connect mode

Senior interview question on the Debezium architecture choice

Solution Using Debezium on Kafka Connect with a Snowflake sink

2. Connectors — Postgres, MySQL, Mongo, SQL Server

One connector JAR per source engine — pgoutput, binlog, change stream, and CDC tables in a single mental map

Worked example — Postgres connector with logical replication

Worked example — MySQL connector with GTID

Worked example — MongoDB change streams

Senior interview question on connector selection and setup

Solution Using one connector class per source with unified topic-prefix conventions

3. Snapshot modes + incremental snapshotting

Six snapshot modes and one signal table — the difference between a 6-hour outage and a live table addition

Worked example — choosing a snapshot mode

Worked example — adding a table to a live connector via incremental snapshot

Worked example — snapshot locking trade-offs

Senior interview question on snapshot design

Solution Using incremental snapshotting via signal-table dispatch

4. Schema history + schema evolution

The compacted DDL replay topic that keeps the connector honest across every ALTER TABLE

Worked example — schema-history topic setup and monitoring

Worked example — schema evolution via Schema Registry

Worked example — connector restart with schema-history replay

Senior interview question on schema evolution

Solution Using a phased column-migration playbook with BACKWARD compatibility

5. Outbox pattern + production hygiene

Turn dual-write into a single transaction — outbox table, EventRouter SMT, and the four production hygiene axes senior interviewers probe

Worked example — outbox table + EventRouter SMT config

Worked example — outbox pruning strategy

Worked example — dead-letter queue for poison events

Senior interview question on outbox + production hygiene

Solution Using outbox + EventRouter SMT + DLQ + LSN-aware pruning + transactional producer

Cheat sheet — Debezium recipes

Frequently asked questions

What is Debezium and how does it work?

Debezium vs Kafka native Connect connectors — when do I pick each?

What is the schema history topic for and can I delete it?

Do I need Kafka to run Debezium?

What is the outbox pattern and why does Debezium provide an SMT for it?

Does Debezium give exactly-once semantics end-to-end?

Practice on PipeCode

Lock in Debezium muscle memory

One connector JAR per source engine — `pgoutput`, `binlog`, change stream, and CDC tables in a single mental map

The compacted DDL replay topic that keeps the connector honest across every `ALTER TABLE`

Turn dual-write into a single transaction — outbox table, `EventRouter` SMT, and the four production hygiene axes senior interviewers probe