debezium is the change data capture engine that quietly powers most of the modern "database-to-Kafka" plumbing in production data platforms — and the single tool senior data engineers get asked about most when the interviewer wants to know whether you have shipped a real streaming pipeline or only watched a demo. A debezium connector reads the write-ahead log of a source database (Postgres WAL, MySQL binlog, MongoDB oplog, SQL Server CDC tables) and emits a per-row change event stream onto Kafka — not by polling a updated_at column, not by trigger scripts, but by tailing the transaction log the database already writes to for crash recovery. That single architectural choice is why Debezium became the CDC default and why the interview questions on it converge on the same four axes: which connector class for which source, which debezium snapshot mode for which restart requirement, how the debezium schema history topic keeps the pipeline honest across DDL, and how the debezium outbox pattern solves the dual-write problem that every distributed team eventually hits.
This guide is the senior-DE walkthrough you wished existed the first time an interviewer asked "walk me through what happens when a Debezium connector restarts on a table with snapshot.mode = when_needed" or "how does incremental snapshotting actually work — what is the signal table for?" or "your app writes to Postgres and publishes to Kafka in the same request handler and one of the writes fails; design the fix using the outbox pattern and the EventRouter SMT." It walks through the connector-class map across Postgres, MySQL, Mongo, and SQL Server, the six snapshot modes and the 2.x incremental snapshot mechanism, the schema-history topic architecture and its Schema Registry integration, the outbox table + SMT design that gives you exactly-once semantics for business events, and the production hygiene — tombstones, dead-letter queues, offset recovery, at-least-once vs exactly-once — that separates a demo from a cdc kafka backbone. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.
When you want hands-on reps immediately after reading, drill the streaming practice library →, rehearse on the ETL practice library →, and sharpen the design axis with the SQL practice library →.
On this page
- Why Debezium became the CDC default
- Connectors — Postgres, MySQL, Mongo, SQL Server
- Snapshot modes + incremental snapshotting
- Schema history + schema evolution
- Outbox pattern + production hygiene
- Cheat sheet — Debezium recipes
- Frequently asked questions
- Practice on PipeCode
1. Why Debezium became the CDC default
Log-based CDC on Kafka Connect — the two architectural choices that made every other CDC tool a niche
The one-sentence invariant: Debezium is a set of Kafka Connect source connectors that tail the write-ahead log of a source database and emit per-row change events with a stable JSON or Avro envelope, at-least-once by default, with resumable offsets stored in Kafka itself. Everything else about it — six snapshot modes, the schema-history topic, the outbox pattern, the incremental snapshot signal table, the tombstone-on-delete convention — flows from those two architectural choices: log-based reading (not polling) and Kafka Connect (not a bespoke runtime). Trigger-based CDC systems (custom pg_trigger fanning into a shadow table) buckle at load; polling systems (SELECT ... WHERE updated_at > :last_run) miss deletes and race with soft-delete semantics; message-queue systems require dual writes that are famously not atomic. Log-based CDC is the only design that gives you every insert, every update, every delete, in commit order, with no schema-invasive changes to the source database.
The four "must-answer" axes interviewers actually probe.
-
Connector class. Postgres uses
io.debezium.connector.postgresql.PostgresConnectorwith aplugin.nameofpgoutput(built-in) orwal2json(legacy). MySQL usesio.debezium.connector.mysql.MySqlConnectorwithbinlog_format = ROWand preferablygtid_mode = ON. Mongo usesio.debezium.connector.mongodb.MongoDbConnectoron top of change streams (oplog on older setups). SQL Server usesio.debezium.connector.sqlserver.SqlServerConnectoron top of SQL Server Agent's CDC tables. Naming the right connector class per source is the first senior-DE signal. -
Snapshot mode.
initial,always,never,initial_only,when_needed,schema_only— six modes that decide what happens when a connector starts against a table for the first time and what happens when it restarts after a broken offset. Wrong mode = duplicate events, missed events, or a snapshot that runs for six hours during a routine restart. Knowing the mode-matrix by heart separates senior from mid. -
Schema history. DDL against the source (
ALTER TABLE,CREATE TABLE) must be replayable when the connector restarts, otherwise the connector cannot decode WAL records that were written under an older schema. Debezium persists every observed DDL statement to a dedicatedschema.history.internal.kafka.topic— a compacted, single-partition, retention-forever topic that is replayed on every connector restart. Deleting this topic breaks the connector. -
Outbox. The dual-write problem — writing to your business table and publishing to Kafka in the same request handler is not atomic unless you use the outbox pattern. Debezium ships an
EventRouterSMT that transforms outbox-table events into aggregate-routed topic messages, giving you a single-transaction publish semantics on top of at-least-once CDC.
The 2026 reality — what Debezium looks like today.
-
Debezium 2.x. Renamed configs (
database.hostname→database.hostnamestill, but manydatabase.*moved totopic.prefix+ connector-specific namespaces), added incremental snapshotting driven by a signal table, added Debezium Server (Quarkus-based standalone mode) for teams that don't want Kafka Connect. - Connect mode vs Server mode. Kafka Connect mode runs the connector as a task inside a Connect worker — the mainstream deployment. Debezium Server runs the same connector code as a standalone JVM that pushes to Kafka, Pulsar, Kinesis, Pub/Sub, EventHubs, or Redis Streams — used when the team wants CDC without operating Connect.
-
Incremental snapshotting. Instead of one giant blocking snapshot at connector startup, incremental snapshots let you add a new table (or re-snapshot an existing one) chunk-by-chunk while live streaming continues. Chunks are triggered by inserting rows into a
debezium_signalcontrol table. -
Signal table. The
debezium_signaltable on the source database is the control channel — inserting a JSON payload ({"type": "execute-snapshot", "data": {"data-collections": ["public.orders"]}}) triggers an incremental snapshot without any connector restart.
What interviewers listen for.
- Do you say "tails the WAL / binlog" in the first sentence when asked how Debezium works? — senior signal.
- Do you mention the schema-history topic as a distinct topic that must never be deleted? — required answer.
- Do you distinguish snapshot vs streaming phases and name the six snapshot modes? — required answer.
- Do you cite the outbox pattern as the solution to dual-write and name the
EventRouterSMT? — senior signal. - Do you know incremental snapshotting is signal-table driven and does not require a restart? — senior signal for 2.x-era candidates.
What the "at-least-once by default" clause actually means.
-
Offsets are stored in Kafka. Debezium writes the WAL / binlog position to a special
__debezium-connect-offsetstopic on every successful poll. On restart, the connector resumes from the last committed offset. - The gap between "processed" and "committed offset". Between reading a WAL record and committing the offset back to Kafka, the connector can crash. On restart, it re-reads the same WAL records — duplicates are possible.
- Downstream must be idempotent. The consumer of the CDC topic (a warehouse loader, a materialised view refresher, an event handler) must handle duplicates via idempotency keys, upserts, or a deduplication window.
- Exactly-once is possible but expensive. Debezium 2.x supports Kafka Connect's transactional producer semantics, which give you exactly-once from the connector to Kafka. The database-to-connector step is still at-least-once because WAL replay is idempotent only at the record level, not at the transaction level. True end-to-end exactly-once requires idempotent downstream consumers regardless.
Common interview probes on Debezium fundamentals.
- "How does Debezium know where to resume after a crash?" — Kafka Connect offset topic, WAL/binlog position stored per source.
- "What's the difference between snapshot and streaming?" — snapshot reads current rows via SELECT; streaming tails the log. Snapshot happens once (per mode); streaming is continuous.
- "Do I need Kafka to run Debezium?" — Kafka Connect mode does; Debezium Server does not (it can push to Pulsar, Kinesis, Pub/Sub, EventHubs, Redis Streams).
- "Why is the schema-history topic single-partition?" — DDL events must be totally ordered; a single partition gives you total order on replay.
- "Why compacted?" — you only need the latest schema version for each table; older versions can be compacted away as long as the compacted record retains the schema mutation history.
Worked example — the log-vs-poll trade-off
Detailed explanation. The textbook naive approach: an analytics team implements CDC via a polling job that runs SELECT * FROM orders WHERE updated_at > :last_run every 30 seconds. Six weeks later they discover the pipeline missed 12,000 deleted rows, dropped 300 updates that landed inside the polling window, and duplicated 800 records that were touched twice between polls. Walk an interviewer through why log-based CDC (Debezium) is the correct answer and what the log actually captures that polling cannot.
- The symptom. Downstream warehouse count diverges from source count by ~1% per week and slowly worsens over time.
- The naive fix. Reduce polling interval from 30 s to 5 s. Divergence rate drops but does not disappear; database CPU load doubles.
-
The real bug. Polling on
updated_atcannot see deletes (unless soft-delete), misses rows updated inside the window (last-write-wins with the polling window), and duplicates rows updated at the window boundary. - The Debezium answer. WAL/binlog captures every insert, update, and delete with their transaction commit order. The connector reads a byte offset into the log; missed windows are impossible by construction.
Question. A team has 5 TB of Postgres data and wants to ship every row change to Kafka for a downstream warehouse. Compare polling CDC (SELECT ... WHERE updated_at) against log-based CDC (Debezium) across correctness, latency, source load, and operational surface. Recommend the design and justify.
Input.
| Requirement | Polling CDC | Log-based CDC (Debezium) |
|---|---|---|
| Data size | 5 TB | 5 TB |
| Change rate | 50k rows/sec peak | 50k rows/sec peak |
| Latency target | 30 s | 5 s |
| Schema change frequency | 1–2 per week | 1–2 per week |
| Source database | Postgres 16 | Postgres 16 |
Code.
-- Polling CDC — the naive approach (broken)
-- Runs every 30 seconds from an external scheduler
SELECT id,
tenant_id,
total_cents,
status,
updated_at
FROM orders
WHERE updated_at > :last_run
ORDER BY updated_at;
-- Problems:
-- (a) DELETE is invisible (no row to select)
-- (b) UPDATE-then-UPDATE inside window shows only latest state
-- (c) Rows landing at exactly :last_run are duplicated or dropped
-- (d) Full-table scan every 30 s at high change rate = huge cost
{
"name": "orders-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "db.internal",
"database.port": "5432",
"database.user": "debezium",
"database.password": "***",
"database.dbname": "shop",
"topic.prefix": "shop",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_slot",
"table.include.list": "public.orders",
"snapshot.mode": "initial",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.shop",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter"
}
}
Step-by-step explanation.
- The polling query on
updated_atis fundamentally lossy. Deletes are invisible (the row is gone; there is nothing to select). Rows touched twice inside the 30-second window collapse to their latest state — the intermediate state is lost. Rows updated exactly at the window boundary either duplicate (>=) or drop (>) depending on the operator, and clock skew between the app server and the database makes the boundary fuzzy. - The Debezium config connects to Postgres via logical replication (
plugin.name = pgoutput), creates a replication slot (dbz_slot), and creates a publication (dbz_publication). The connector then reads WAL records for the tables listed inpublication.name. Every INSERT, UPDATE, DELETE — including transaction boundaries — arrives in commit order. - Latency: WAL records are read as soon as they are flushed by Postgres. Debezium sees an INSERT within milliseconds of COMMIT; the Kafka message lands within another few milliseconds. Polling CDC has a worst-case latency of the polling interval + query duration + downstream commit — typically 45+ seconds for a 30 s window.
- Source load: polling runs an index scan or full-table scan every 30 s; at 50k rows/sec of changes over a 5 TB table this is meaningful CPU and IO. Debezium reads the WAL sequentially, imposing near-zero additional CPU on the source (the WAL is already being written for crash recovery).
- The recommendation is Debezium with
snapshot.mode = initial. The initial snapshot backfills the 5 TB current state; the streaming phase continuously tails WAL. Correctness is exact (every change captured in commit order); latency is sub-second; source load is negligible; operational overhead is the Connect cluster + the schema-history topic — both one-time setups.
Output.
| Axis | Polling CDC | Log-based CDC (Debezium) |
|---|---|---|
| Captures INSERTs | yes | yes |
| Captures UPDATEs | last-write-wins in window | every one, in order |
| Captures DELETEs | no (unless soft delete) | yes |
| Latency (steady state) | 30–60 s | < 1 s |
| Source CPU | full scan every 30 s | ~0 (WAL already written) |
| Divergence rate | 0.5–2% per week | 0% (barring bugs) |
Rule of thumb. For any production CDC pipeline where correctness matters more than "we shipped something," pick log-based CDC on day one. Polling CDC is a valid pattern only when the source is a legacy system that does not expose a log — and even then, plan the migration to log-based.
Worked example — Debezium versus a bespoke trigger-based CDC
Detailed explanation. A different anti-pattern: instead of polling, a team writes Postgres AFTER INSERT/UPDATE/DELETE triggers on every business table. Each trigger inserts a row into a shadow cdc_events table. An external cron job reads cdc_events and pushes to Kafka. This works — until write throughput doubles, at which point the trigger overhead becomes the bottleneck and the shadow table becomes a hot spot. Debezium is the log-based answer that gets you the same semantics without the trigger overhead.
-
Trigger overhead. Every INSERT/UPDATE/DELETE on a business table now runs an extra INSERT into
cdc_events. Postgres executes trigger code inside the same transaction; commit latency and lock hold time both increase. -
Shadow-table hot spot. All triggers write to one table. At high concurrency the
cdc_eventswrite becomes the tail-latency bottleneck; b-tree contention on the primary key index causes lock waits. - Cleanup. The shadow table grows without bound unless an external job prunes it. The prune job is another moving part.
- Debezium answer. The WAL captures everything the triggers would capture, without any additional work on the source database.
Question. A trigger-based CDC pipeline is causing tail-latency regressions on the source database. Design the migration to Debezium: what runs during the migration, how do you validate no events are missed, and what is the cutover procedure?
Input.
| Component | Value |
|---|---|
| Source | Postgres 16, 10 TB |
| Trigger-based CDC | Shadow table + cron push to Kafka |
| Target | Debezium PostgresConnector on Kafka Connect |
| Downstream consumers | 3 (warehouse loader, cache invalidator, search indexer) |
| Cutover window | 30-minute maintenance window acceptable |
Code.
-- Old — trigger-based CDC (the incumbent)
CREATE TABLE cdc_events (
id BIGSERIAL PRIMARY KEY,
table_name TEXT NOT NULL,
op CHAR(1) NOT NULL, -- 'I', 'U', 'D'
row_data JSONB NOT NULL,
ts TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE FUNCTION emit_cdc() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO cdc_events (table_name, op, row_data)
VALUES (TG_TABLE_NAME, LEFT(TG_OP, 1), to_jsonb(NEW));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_cdc AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION emit_cdc();
{
"name": "orders-shadow-cdc-migration",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "db.internal",
"database.dbname": "shop",
"topic.prefix": "shop",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_slot",
"table.include.list": "public.orders,public.line_items,public.payments",
"snapshot.mode": "initial",
"signal.data.collection": "public.debezium_signal",
"topic.creation.default.replication.factor": "3",
"topic.creation.default.partitions": "12"
}
}
Step-by-step explanation.
- Stand up Debezium in parallel with the trigger-based system. The initial snapshot backfills current state for the three tables; the streaming phase begins tailing WAL. Both pipelines are live simultaneously — Debezium writes to new topics (
shop.public.orders,shop.public.line_items,shop.public.payments), the trigger system writes to the legacy topic. - Validate no events are missed by running a comparison job: for a 10-minute sample, count events per (table, op) in the Debezium topic and in the trigger-based topic. The counts should match within a rounding tolerance for concurrent transactions. Any systematic gap points to a Debezium config issue (usually
publication.namemissing a table). - Migrate downstream consumers one at a time. The warehouse loader is the easiest — flip its bootstrap-topic list from legacy to new; because both streams contain the same commit-ordered events, the loader's watermark advances the same way. Repeat for the cache invalidator and the search indexer. Each consumer migration is a config change, not a code change.
- Once all three consumers are on the Debezium stream, drop the triggers and the cron job. The
cdc_eventsshadow table can be truncated and eventually dropped after a two-week grace period. - Measure the source-side impact after cutover: p99 commit latency drops by 15–40% (no more trigger overhead); lock contention on
cdc_eventsdisappears; the source database CPU steady-state drops by 5–10%.
Output.
| Metric | Trigger-based CDC | Debezium |
|---|---|---|
| Trigger overhead on source | 15–40% commit latency | 0% (WAL already written) |
| Shadow-table lock contention | high | none |
| Latency (event → Kafka) | 5–30 s (cron interval) | < 1 s |
| Missed events on restart | possible (offset in shadow table) | 0 (offset in Kafka) |
| Operational surface | triggers + shadow table + cron + prune | Connect cluster + schema-history topic |
Rule of thumb. Trigger-based CDC is a legitimate stopgap when the team cannot yet operate Kafka Connect. But once Kafka is in the stack, Debezium is unambiguously the better answer — same semantics, no source-side overhead, no shadow-table hot spot.
Worked example — Debezium Server vs Kafka Connect mode
Detailed explanation. A common design question: "we want CDC into Pub/Sub, not Kafka; can we still use Debezium?" The answer is yes, via Debezium Server — a standalone Quarkus-based runtime that runs the same connector code but ships events to a pluggable sink instead of the Kafka Connect offset topic. Debezium Server is what makes Debezium the CDC engine of choice even for teams that don't operate Kafka.
- Connect mode. Debezium runs as a Kafka Connect connector task. Requires an operating Connect cluster (2+ workers, ZooKeeper or KRaft, offset/config/status topics). The mainstream, most-battle-tested deployment.
- Server mode. Debezium runs as a standalone JVM. Same connector code, same snapshot mechanics, same schema-history semantics — but the sink is pluggable (Pulsar, Kinesis, Pub/Sub, EventHubs, Redis Streams, HTTP, or Kafka via a simpler producer).
- The trade. Connect mode is more operationally standard for Kafka shops. Server mode is significantly simpler to deploy for teams that don't need Connect for anything else.
Question. A team wants CDC from Postgres to Google Cloud Pub/Sub but does not run Kafka. Design the deployment using Debezium Server, and compare the operational footprint against running Kafka Connect just for this.
Input.
| Component | Value |
|---|---|
| Source | Postgres 16 on GCP Cloud SQL |
| Sink | Google Cloud Pub/Sub |
| Team's current Kafka experience | none |
| Operational preference | minimal moving parts |
Code.
# debezium-server config — application.properties equivalent
debezium.sink.type=pubsub
debezium.sink.pubsub.project.id=my-gcp-project
debezium.sink.pubsub.address=pubsub.googleapis.com:443
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=10.0.0.5
debezium.source.database.port=5432
debezium.source.database.user=debezium
debezium.source.database.password=${SECRET_PW}
debezium.source.database.dbname=shop
debezium.source.topic.prefix=shop
debezium.source.plugin.name=pgoutput
debezium.source.publication.name=dbz_publication
debezium.source.slot.name=dbz_slot
debezium.source.table.include.list=public.orders,public.customers
debezium.source.offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
debezium.source.offset.storage.file.filename=/var/lib/debezium/offsets.dat
debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
debezium.source.schema.history.internal.file.filename=/var/lib/debezium/schema-history.dat
# k8s deployment for debezium-server
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: debezium-server
spec:
serviceName: debezium
replicas: 1
template:
spec:
containers:
- name: server
image: debezium/server:2.6
volumeMounts:
- name: state
mountPath: /var/lib/debezium
- name: config
mountPath: /debezium/conf
volumeClaimTemplates:
- metadata:
name: state
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
Step-by-step explanation.
- Debezium Server runs as a single JVM inside a StatefulSet. The StatefulSet gives it stable identity (
debezium-server-0) and a stable persistent volume for offset and schema-history storage. Unlike Connect mode (where offsets live in a Kafka topic), Server mode's offsets live on local disk — the volume must be durable. - The
debezium.sink.type = pubsubline swaps the Kafka producer for a Pub/Sub publisher. The source-side code is identical to Connect mode — same WAL reader, same snapshot machinery, same JSON envelope. - Offset and schema-history storage moves from Kafka topics to local files. This is why StatefulSet + persistent volume is required — if the pod restarts and the volume is lost, the connector cannot resume its position and re-snapshots everything.
- Operational footprint: one JVM, one PVC, one Pub/Sub topic. No Connect cluster, no ZooKeeper/KRaft, no ancillary Kafka topics. For a team that already runs Kubernetes but does not run Kafka, this is a 10× simpler deployment.
- The trade-off: no cross-connector metadata topic sharing (each Debezium Server is isolated), no built-in horizontal scaling (add a second server with a different
slot.nameif you need throughput), and slightly less community tooling than Connect mode. For most single-source CDC pipelines this is a perfectly reasonable trade.
Output.
| Concern | Kafka Connect mode | Debezium Server mode |
|---|---|---|
| Runtime | Connect worker cluster | Single JVM (StatefulSet) |
| Offset storage | Kafka topic | Local file (PVC) |
| Schema history storage | Kafka topic | Local file (PVC) |
| Sinks supported | Kafka only | Kafka, Pulsar, Kinesis, Pub/Sub, EventHubs, Redis Streams, HTTP |
| Team requirements | operates Kafka Connect | operates a JVM + PVC |
| Best for | Kafka-native shops | non-Kafka sinks or Kafka-averse teams |
Rule of thumb. If Kafka is already in the stack, use Connect mode — the operational patterns are standard. If Kafka is not in the stack and you don't want to add it just for CDC, Debezium Server is the answer. The connector code and semantics are identical; the runtime is the choice.
Senior interview question on the Debezium architecture choice
A senior interviewer often opens with: "Your team runs Postgres and needs to ship every row change to Snowflake within 5 seconds of commit. Walk me through why you'd pick Debezium over polling, over triggers, and over a Snowflake-native CDC connector, and what your reference deployment would look like end-to-end."
Solution Using Debezium on Kafka Connect with a Snowflake sink
{
"name": "orders-to-snowflake-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "db.internal",
"database.port": "5432",
"database.user": "debezium",
"database.password": "***",
"database.dbname": "shop",
"topic.prefix": "shop.cdc",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_slot",
"table.include.list": "public.orders,public.line_items,public.customers,public.payments",
"snapshot.mode": "initial",
"snapshot.locking.mode": "none",
"signal.data.collection": "public.debezium_signal",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.shop",
"topic.creation.default.replication.factor": "3",
"topic.creation.default.partitions": "12",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite"
}
}
Step-by-step trace.
| Layer | Component | Latency budget |
|---|---|---|
| Source | Postgres commit → WAL flush | ~0 (already required) |
| Read | Debezium tails WAL via pgoutput | ~50 ms |
| Envelope | Convert to Avro, wrap in envelope | ~5 ms |
| Publish | Producer send to Kafka (acks=all) | ~20 ms |
| Consume | Snowflake Kafka Connector (batch) | ~2 s |
| Load | Snowpipe Streaming ingest | ~1 s |
| Total | Postgres commit → Snowflake table | ~3.1 s (target: 5 s ✓) |
After the rollout, the pipeline delivers row-level changes into Snowflake within 3 seconds p99 and 5 seconds p999. The initial snapshot takes ~2 hours for the 10 TB source (bounded by network + Snowflake load rate); the streaming phase then runs indefinitely at ~50k events/sec steady state.
Output:
| Metric | Polling CDC | Trigger CDC | Debezium + Snowflake sink |
|---|---|---|---|
| Latency p99 | 30–60 s | 5–30 s | 3 s |
| Source CPU overhead | 10–20% | 15–40% | ~0% |
| Correctness | lossy | tight | exact |
| Operational surface | scheduler + query | triggers + shadow + cron | Connect + schema-history + registry |
| SLA achievable | 30 s | 15 s | 5 s |
Why this works — concept by concept:
- Log-based CDC — the WAL captures every INSERT/UPDATE/DELETE in commit order with zero additional source-side work. Postgres is already writing the WAL for crash recovery; Debezium is a free rider.
- Kafka Connect — offset management, retries, dead-letter queue, JMX metrics, REST admin — all inherited from the Connect framework. You do not write CDC infrastructure; you configure it.
-
ExtractNewRecordState SMT — the raw Debezium envelope has a
before/after/op/sourcewrapper; downstream sinks usually want just the new row. Theunwraptransform flattens the envelope to the row payload.delete.handling.mode = rewritepreserves DELETEs as a payload with a__deleted = truemarker (versus dropping them silently). - Avro + Schema Registry — Avro gives you compact binary encoding + forward/backward schema evolution rules. The Schema Registry (Confluent or Apicurio) enforces compatibility on every write, so a bad schema change is caught at the connector, not at the downstream consumer.
- Cost — Debezium adds one Connect cluster, one schema-history topic (compacted, single-partition, low volume), and one Schema Registry (which many teams already run for other reasons). Steady-state CPU cost on the source database is O(WAL flush rate), which is bounded by the transaction volume — the same rate Postgres already handles.
Streaming
Topic — streaming
Streaming CDC and change-log problems
2. Connectors — Postgres, MySQL, Mongo, SQL Server
One connector JAR per source engine — pgoutput, binlog, change stream, and CDC tables in a single mental map
The mental model in one line: Debezium ships one connector class per source engine — PostgresConnector reads WAL via pgoutput (or the legacy wal2json), MySqlConnector reads the binlog with GTID tracking, MongoDbConnector consumes change streams (oplog on older setups), and SqlServerConnector reads the CT/CDC tables populated by SQL Server Agent — and each class has its own permission model, snapshot mechanics, and schema evolution rules that the senior candidate is expected to know by heart.
The four axes interviewers actually probe for connector questions.
- Log-reading mechanism. WAL for Postgres, binlog for MySQL, change stream for Mongo, CDC tables for SQL Server. Each mechanism has a distinct on-disk format and access pattern; naming the right one is the first senior signal.
-
Source-side configuration. Postgres needs
wal_level = logical+ a replication slot + a publication. MySQL needsbinlog_format = ROW+binlog_row_image = FULL+gtid_mode = ON. Mongo needs the replica set to expose change streams (default on modern Mongo). SQL Server needsEXEC sys.sp_cdc_enable_dband per-tablesys.sp_cdc_enable_table. -
Permission model. Postgres needs a role with
REPLICATIONattribute. MySQL needsREPLICATION SLAVE, REPLICATION CLIENTon the connector user. Mongo needs a role with thechangeStreamprivilege (or a superuser role). SQL Server needsdb_owner(or fine-grained CDC roles) to read the CDC tables. -
Snapshot mechanics. Each connector class implements the snapshot phase differently — Postgres uses a repeatable-read transaction + SELECT; MySQL uses
FLUSH TABLES WITH READ LOCK(or--single-transactionequivalent); Mongo uses a fullDocument change-stream lookup; SQL Server uses a SELECT against the source table plus a bookmark against the CDC bookmark table.
Postgres — the reference connector.
-
Log-reading mechanism. WAL via logical replication. The connector opens a replication slot (
slot.name) and consumes decoded WAL records via a decoding plugin. -
Plugin choice.
pgoutputis built into Postgres 10+ and is the default.wal2jsonanddecoderbufsare legacy plugins used on older versions or when JSON-format decoding is preferred at the source. Modern deployments always pickpgoutput. -
Source-side config. Set
wal_level = logical,max_replication_slots >= 10(headroom),max_wal_senders >= 10,wal_sender_timeout = 60s(or as required). Create aPUBLICATIONlisting the tables to be tracked; the connector'spublication.namemust match. -
REPLICA IDENTITY. For UPDATE and DELETE events, Postgres needs to know which columns identify the row. Default is
DEFAULT(primary key). SetREPLICA IDENTITY FULLon tables without a primary key or where the connector should emit all old-column values on UPDATE/DELETE (useful for downstream diff logic). -
Slot warning. A replication slot holds back WAL cleanup as long as it exists. A stale slot (connector down for days) can fill the WAL disk. Set alerting on
pg_replication_slots.confirmed_flush_lsnlag.
MySQL — binlog with GTID tracking.
-
Log-reading mechanism. Binary log (
binlog). The connector reads binlog events by connecting as a replica and requesting binlog events from a specific position. -
Format.
binlog_format = ROWis mandatory (STATEMENT and MIXED are not supported).binlog_row_image = FULLgives you before + after for UPDATEs;MINIMALsaves binlog volume but breaks change-data-capture semantics for many downstream use cases. -
GTID. Global Transaction IDs — set
gtid_mode = ONandenforce_gtid_consistency = ON. GTIDs give you cleaner resume semantics after failover; without GTID the connector tracks (binlog file, position) which becomes invalid when replicas fail over. -
Source-side config.
expire_logs_days >= 7(orbinlog_expire_logs_seconds) to keep binlog around long enough for the connector to catch up after downtime. Setlog_slave_updates = ONif reading from a replica (so its binlog contains the primary's changes). -
Permission model.
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';— the classic set.
MongoDB — change streams on a replica set.
- Log-reading mechanism. Change streams — a MongoDB API that returns an ordered stream of collection-level change events, backed by the oplog on the replica set.
- Requirement. Must be a replica set (or sharded cluster); single-node standalone Mongo does not have an oplog. Modern managed offerings (Atlas) are always replica sets.
-
Source-side config. Set the oplog window (
replication.oplogSizeMB) large enough for the connector to catch up after downtime. Enable authentication (security.authorization = enabled) — Debezium won't connect to an unauthenticated Mongo. -
Full document lookup. Change streams by default emit the delta ("update these fields to these values"). Set
capture.mode = change_streams_update_fullto get the full document on every event (extra Mongo lookup per event; useful for downstream sinks that want the full state). -
Permission model. Role with
changeStreamprivilege on the target collection, plusfindfor the initial snapshot phase.
SQL Server — CDC tables via SQL Server Agent.
-
Log-reading mechanism. Not the transaction log directly. SQL Server Agent's CDC feature reads the log and populates a per-table
cdc.<schema>_<table>_CTtable with change records. Debezium then reads those tables. -
Enablement.
EXEC sys.sp_cdc_enable_db;at the database level, thenEXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = 'cdc_reader';per table. - SQL Server Agent must be running. The CDC job runs inside SQL Server Agent; if the Agent service is down, the CT tables stop advancing and Debezium stalls.
-
Permission model. Debezium user needs
db_owner(or membership in the CDC role created bysp_cdc_enable_table) to SELECT from CT tables. -
Cleanup. CDC tables are pruned by an Agent job (
cleanup_change_tables). Retention is configurable viasys.sp_cdc_change_job. Set retention long enough for the connector to catch up after downtime.
Choosing the right connector — the map.
-
Postgres.
io.debezium.connector.postgresql.PostgresConnector+plugin.name = pgoutput. -
MySQL / MariaDB.
io.debezium.connector.mysql.MySqlConnector(there is also a MariaDB connector for MariaDB-specific quirks). -
MongoDB.
io.debezium.connector.mongodb.MongoDbConnector. -
SQL Server.
io.debezium.connector.sqlserver.SqlServerConnector. -
Oracle.
io.debezium.connector.oracle.OracleConnector(LogMiner or XStream — XStream requires Oracle GoldenGate license). -
Db2 / Cassandra / Vitess.
io.debezium.connector.db2.Db2Connector,io.debezium.connector.cassandra.CassandraConnector,io.debezium.connector.vitess.VitessConnector— less common.
Common interview probes on connectors.
- "What's the difference between
pgoutputandwal2json?" — pgoutput is built into Postgres 10+ and is the default; wal2json is a legacy JSON-decoding plugin. - "Why does MySQL need
binlog_row_image = FULL?" — for UPDATE events, you need before + after to compute diffs downstream. - "What is
REPLICA IDENTITY FULLfor in Postgres?" — makes UPDATE and DELETE events carry all old column values, not just the primary key. - "Can I run Debezium against a standalone MongoDB?" — no, must be a replica set (or sharded cluster).
- "What breaks if SQL Server Agent is down?" — the CT tables stop advancing; Debezium stalls; a runbook alert on Agent job status is required.
Worked example — Postgres connector with logical replication
Detailed explanation. The classic starter — set up a Debezium Postgres connector against a Postgres 16 database, with a publication, a replication slot, a REPLICA IDENTITY policy for tables without primary keys, and the exact permissions the connector user needs. Walk an interviewer through every step end-to-end.
-
Postgres side.
wal_level = logical, replication slot, publication, connector role withREPLICATIONattribute. -
Connector side. JSON config with
plugin.name = pgoutput,publication.name,slot.name,table.include.list. -
Common gotcha. Tables without primary keys must have
REPLICA IDENTITY FULLor the connector emits UPDATE/DELETE events with only a key placeholder.
Question. Configure a Debezium Postgres connector against the shop database that captures public.orders and public.line_items. Include the Postgres-side setup and the connector JSON.
Input.
| Component | Value |
|---|---|
| Postgres version | 16 |
| Database | shop |
| Tables | public.orders, public.line_items |
| Connector role | debezium |
| Kafka bootstrap | kafka:9092 |
Code.
-- Postgres side — one-time setup
-- postgresql.conf
-- wal_level = logical
-- max_replication_slots = 10
-- max_wal_senders = 10
-- Create the connector role
CREATE ROLE debezium WITH REPLICATION LOGIN PASSWORD 'strong-secret';
GRANT USAGE ON SCHEMA public TO debezium;
GRANT SELECT ON public.orders, public.line_items TO debezium;
-- Create the publication
CREATE PUBLICATION dbz_publication FOR TABLE public.orders, public.line_items;
-- If line_items has no primary key, set REPLICA IDENTITY FULL
ALTER TABLE public.line_items REPLICA IDENTITY FULL;
-- Verify
SELECT slot_name, plugin, database, active
FROM pg_replication_slots;
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication;
{
"name": "shop-postgres-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "db.internal",
"database.port": "5432",
"database.user": "debezium",
"database.password": "strong-secret",
"database.dbname": "shop",
"topic.prefix": "shop",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_shop_slot",
"publication.autocreate.mode": "disabled",
"table.include.list": "public.orders,public.line_items",
"snapshot.mode": "initial",
"snapshot.locking.mode": "none",
"signal.data.collection": "public.debezium_signal",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.shop",
"heartbeat.interval.ms": "10000",
"heartbeat.topics.prefix": "__debezium-heartbeat"
}
}
Step-by-step explanation.
-
wal_level = logicalunlocks Postgres's logical decoding path. The connector opens a replication slot; every WAL record for the tables in the publication is decoded and streamed to the connector. - The connector role has the
REPLICATIONattribute plusSELECTon the target tables (for the initial snapshot; streaming uses replication protocol, not SELECT). TheUSAGE ON SCHEMA publicgrant is easy to forget and causes cryptic "permission denied" errors on the snapshot phase. -
CREATE PUBLICATION dbz_publication FOR TABLE ...explicitly lists the tables. Alternative —FOR ALL TABLES— is easier but captures more than you want; the explicit list is the senior-DE default. The connector'spublication.autocreate.mode = disableddisables autocreation from the Debezium side to avoid a race with schema setup. -
ALTER TABLE public.line_items REPLICA IDENTITY FULLmakes UPDATE and DELETE events carry all column values, not just the primary key. Required forline_itemsif it lacks a natural key. - The heartbeat config (
heartbeat.interval.ms = 10000) tells the connector to emit an empty event every 10 seconds. This keeps the replication slot'sconfirmed_flush_lsnmoving forward even when the source tables are idle — otherwise, an idle table would prevent WAL cleanup on the source.
Output.
| Kafka topic (shop.public.orders) | Payload shape |
|---|---|
| Key | {"id": 12345} |
| Value | {"before": null, "after": {"id":12345,"tenant_id":1,"total_cents":19900,"status":"NEW"}, "op":"c", "source": {...}, "ts_ms": 1720000000000} |
| Tombstone (on delete, after unwrap) | Key {"id": 12345}, Value null
|
Rule of thumb. Every Postgres Debezium deployment ships with a heartbeat topic. Skipping the heartbeat is the #1 cause of "our Postgres WAL disk filled up because a replication slot was stuck at LSN X for six hours."
Worked example — MySQL connector with GTID
Detailed explanation. MySQL's binlog is the source-of-truth for replication; Debezium reads it as if it were another replica. The two source-side switches that matter are binlog_format = ROW (mandatory) and gtid_mode = ON (strongly recommended). GTID makes the connector resilient to primary failover — after failover, the connector resumes from the GTID rather than a now-invalid (binlog file, position) tuple.
-
binlog_format.
ROWgives you per-row change events.STATEMENTgives you SQL text (not usable for CDC).MIXEDis unpredictable. -
binlog_row_image.
FULLcaptures all columns on UPDATE (needed for CDC).MINIMALcaptures only changed columns + key. - GTID. Cleaner resume semantics after failover; also enables MySQL's own multi-source replication features.
Question. Configure a Debezium MySQL connector against a MySQL 8 database with GTID. Include the my.cnf changes and the connector JSON.
Input.
| Component | Value |
|---|---|
| MySQL version | 8.0 |
| Database | shop |
| Tables | shop.orders, shop.customers |
| Connector user | debezium |
| binlog retention | 7 days |
Code.
# my.cnf — MySQL 8 config for Debezium
[mysqld]
server-id = 42
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_expire_logs_seconds = 604800 ; 7 days
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
-- MySQL side — connector user grants
CREATE USER 'debezium'@'%' IDENTIFIED WITH mysql_native_password BY 'strong-secret';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;
-- Verify GTID and binlog format
SHOW VARIABLES LIKE 'binlog_format'; -- ROW
SHOW VARIABLES LIKE 'binlog_row_image'; -- FULL
SHOW VARIABLES LIKE 'gtid_mode'; -- ON
SHOW MASTER STATUS; -- capture the starting Executed_Gtid_Set
{
"name": "shop-mysql-cdc",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql.internal",
"database.port": "3306",
"database.user": "debezium",
"database.password": "strong-secret",
"database.server.id": "9999",
"database.include.list": "shop",
"table.include.list": "shop.orders,shop.customers",
"topic.prefix": "shop",
"snapshot.mode": "initial",
"snapshot.locking.mode": "minimal",
"include.schema.changes": "true",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.mysql.shop",
"gtid.source.includes": "",
"gtid.source.excludes": ""
}
}
Step-by-step explanation.
-
binlog_format = ROWis non-negotiable; STATEMENT and MIXED cannot represent the per-row change events Debezium needs.binlog_row_image = FULLensures UPDATE events carry the before-image (all columns), letting downstream sinks compute diffs. -
gtid_mode = ON+enforce_gtid_consistency = ONenable GTIDs. Debezium tracks the connector position as a GTID set; on primary failover, the new primary continues emitting the same GTID set, and the connector resumes seamlessly. Without GTID the connector tracks (binlog file, position), which becomes invalid on failover. -
binlog_expire_logs_seconds = 604800keeps binlogs around for 7 days. If Debezium is down longer than that, the connector cannot resume without a full snapshot. 7 days is a reasonable balance between disk usage and downtime tolerance. -
database.server.id = 9999is the fake replica ID Debezium presents to MySQL. Must be unique across all replicas (real and virtual). Collisions cause silent binlog delivery gaps. -
snapshot.locking.mode = minimaluses a briefFLUSH TABLES WITH READ LOCKat snapshot start to get a consistent binlog position, then releases the lock.extendedholds the lock through the entire snapshot (safer but blocks writes);noneskips the lock entirely (fast but risks inconsistency if writes race with the snapshot).
Output.
| Kafka topic (shop.shop.orders) | Payload shape |
|---|---|
| Key | {"id": 12345} |
| Value | {"before": {"id":12345,"total_cents":19900,"status":"NEW"}, "after": {"id":12345,"total_cents":19900,"status":"SHIPPED"}, "op":"u", "source":{"gtid":"...","file":"mysql-bin.000042","pos":12345}} |
Rule of thumb. For MySQL, always turn on GTID before running Debezium. Retrofitting GTID onto an existing replication topology is a multi-hour operation; doing it up-front is a one-line config change.
Worked example — MongoDB change streams
Detailed explanation. MongoDB's change stream API gives you an ordered feed of collection-level change events, backed by the oplog on a replica set. Debezium's Mongo connector consumes this stream directly; there is no "read the oplog" surface — it's the change stream all the way down. The two axes to know are capture.mode (delta vs full document) and the requirement for a replica set (standalone Mongo does not expose an oplog).
- Change stream. MongoDB API introduced in 3.6; returns events with resume tokens for restart.
- Replica set requirement. Change streams require an oplog, which only exists on replica sets and sharded clusters.
-
Full document lookup. By default, the change stream emits a delta on UPDATE. Set
capture.mode = change_streams_update_fullto fetch the full document on every event.
Question. Configure a Debezium MongoDB connector against a replica set, capturing the shop.orders collection with full-document mode.
Input.
| Component | Value |
|---|---|
| MongoDB version | 6.0 |
| Topology | 3-node replica set |
| Collection | shop.orders |
| Connector user | debezium |
| Auth database | admin |
Code.
// Mongo side — connector user + role
use admin;
db.createRole({
role: "debezium_reader",
privileges: [
{ resource: { db: "shop", collection: "" }, actions: [ "find", "changeStream" ] },
{ resource: { db: "local", collection: "oplog.rs" }, actions: [ "find" ] }
],
roles: []
});
db.createUser({
user: "debezium",
pwd: "strong-secret",
roles: [ { role: "debezium_reader", db: "admin" } ]
});
{
"name": "shop-mongo-cdc",
"config": {
"connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
"tasks.max": "1",
"mongodb.connection.string": "mongodb://debezium:strong-secret@mongo-0:27017,mongo-1:27017,mongo-2:27017/?replicaSet=rs0&authSource=admin",
"topic.prefix": "shop",
"capture.scope": "database",
"capture.target": "shop",
"collection.include.list": "shop.orders",
"snapshot.mode": "initial",
"capture.mode": "change_streams_update_full"
}
}
Step-by-step explanation.
- Mongo's replica set requirement is non-negotiable — a standalone
mongodhas no oplog, and the change stream API returns an error. Managed offerings (Atlas) are always replica sets, so this is transparent for managed customers. - The connector authenticates against the
admindatabase using thedebeziumuser with thedebezium_readercustom role. The role grantsfind+changeStreamon theshopdatabase andfindonlocal.oplog.rs(needed for the initial snapshot resume-token machinery). -
capture.mode = change_streams_update_fulltells the connector to include the full post-image document on every event. Without this, UPDATEs emit only the delta ({"status": "SHIPPED"}for a single-field update); downstream sinks that want the full row need to reconstruct it, which is fragile. -
snapshot.mode = initialruns an initialfindsnapshot onshop.ordersat connector startup, then transitions to the change stream. Alternative modes (never,when_needed) skip the snapshot; used when the collection is already in the downstream sink from a previous load. - The connection string is a full replica-set URI, listing all three replica members and the replica-set name. Debezium picks a primary or secondary read source based on the read-preference implicit in the change-stream API (typically the primary).
Output.
| Kafka topic (shop.shop.orders) | Payload shape |
|---|---|
| Key | {"id": "6512a4f3d1..."} |
| Value | {"before":null,"after":"{\"_id\":\"6512a4f3d1...\",\"tenant_id\":1,\"total_cents\":19900,\"status\":\"NEW\"}", "op":"c", ...} |
Rule of thumb. For MongoDB, always use change_streams_update_full unless you have a specific reason to want deltas. The extra Mongo lookup per event is well worth the operational simplicity downstream.
Senior interview question on connector selection and setup
A senior interviewer might ask: "You have three source databases in a fintech stack — a Postgres for transactions, a MongoDB for user documents, and a SQL Server for a legacy ledger. All three need CDC into a single Kafka cluster with unified consumer semantics. Walk me through the connector configs, the source-side setup for each, and the per-source gotchas you'd guard against."
Solution Using one connector class per source with unified topic-prefix conventions
[
{
"name": "fintech-postgres-tx",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "pg.internal",
"database.dbname": "fintech",
"database.user": "debezium",
"database.password": "***",
"topic.prefix": "fin.pg",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_fintech_slot",
"table.include.list": "public.transactions,public.accounts",
"snapshot.mode": "initial",
"signal.data.collection": "public.debezium_signal",
"heartbeat.interval.ms": "10000",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.fin.pg"
}
},
{
"name": "fintech-mongo-users",
"config": {
"connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
"mongodb.connection.string": "mongodb://debezium@mongo-0,mongo-1,mongo-2/?replicaSet=rs0&authSource=admin",
"topic.prefix": "fin.mongo",
"capture.scope": "database",
"capture.target": "fintech",
"collection.include.list": "fintech.users,fintech.sessions",
"snapshot.mode": "initial",
"capture.mode": "change_streams_update_full"
}
},
{
"name": "fintech-sqlserver-ledger",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.hostname": "sqlserver.internal",
"database.port": "1433",
"database.user": "debezium",
"database.password": "***",
"database.names": "ledger",
"database.encrypt": "false",
"topic.prefix": "fin.mssql",
"table.include.list": "dbo.journal,dbo.balances",
"snapshot.mode": "initial",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.fin.mssql"
}
}
]
Step-by-step trace.
| Connector | Source-side gotcha | Runtime gotcha |
|---|---|---|
| Postgres |
wal_level = logical, publication list, REPLICA IDENTITY |
Stale slot fills WAL disk — heartbeat topic mandatory |
| MongoDB | Must be replica set, changeStream privilege |
Long-running change stream must handle resume-token expiry |
| SQL Server |
sp_cdc_enable_db + per-table sp_cdc_enable_table, Agent running |
Agent job failure stalls CT tables — Agent monitoring mandatory |
After the rollout, the three connectors write to fin.pg.*, fin.mongo.*, and fin.mssql.* topic namespaces. Downstream consumers see a uniform envelope shape (Debezium standard) across all three sources; the topic-prefix convention keeps ownership boundaries clear.
Output:
| Source | Topics created | Steady-state throughput | Heartbeat topic |
|---|---|---|---|
| Postgres (transactions, accounts) | fin.pg.public.transactions, fin.pg.public.accounts | 20k events/sec | fin.pg.__debezium-heartbeat |
| MongoDB (users, sessions) | fin.mongo.fintech.users, fin.mongo.fintech.sessions | 5k events/sec | (native change stream keepalive) |
| SQL Server (journal, balances) | fin.mssql.dbo.journal, fin.mssql.dbo.balances | 2k events/sec | (bookmark-driven; Agent monitoring separate) |
Why this works — concept by concept:
- One connector class per source — no "unified" connector attempts to cover multiple sources; each engine has its own log format, permission model, and snapshot mechanics. Debezium's design decision is to have a specialised connector JAR per source.
-
Unified topic-prefix convention —
topic.prefix = fin.<source>gives you a consistent naming pattern across sources. Downstream consumers can subscribe tofin.*for all fintech CDC events, or tofin.pg.*for just the Postgres subset. -
Uniform envelope — every Debezium event has the same
before/after/op/source/ts_msshape regardless of source engine. Downstream code can be written once against the envelope; source-specific fields (Postgres LSN, MySQL GTID, Mongo resume token) live in thesourcesub-object and are optional. - Per-source gotchas — the senior signal is naming the source-specific traps: heartbeat for Postgres, GTID + binlog retention for MySQL, replica-set requirement + full-doc mode for Mongo, Agent monitoring for SQL Server. Each is a real production bug someone lived through.
-
Cost — three connectors on one Connect cluster; the connectors share the Connect offset topic and the Schema Registry; each has its own
schema.historytopic (single-partition, compacted, low volume). The steady-state resource cost is dominated by the network hop from Kafka Connect to the Kafka brokers, not by the connectors themselves.
Streaming
Topic — streaming
Streaming connector and change-log problems
3. Snapshot modes + incremental snapshotting
Six snapshot modes and one signal table — the difference between a 6-hour outage and a live table addition
The mental model in one line: Debezium's snapshot phase reads the current state of the source tables via SELECT (once, per mode), and the streaming phase then tails the log from the snapshot's boundary offset; snapshot.mode picks whether the snapshot runs on first startup only (initial), on every startup (always), never (never), only when the offset is missing (when_needed), or only for the schema (schema_only) — and 2.x adds incremental snapshotting that runs chunk-by-chunk while streaming continues live, driven by inserts into a debezium_signal control table.
The four axes interviewers actually probe for snapshot questions.
- Mode selection. Which of the six modes for which scenario — cold start on a new table, restart after a lost offset, adding a table to a live connector, capturing only the schema without the data.
- Snapshot-vs-streaming boundary. The connector runs the snapshot inside a repeatable-read transaction; the boundary offset is the LSN/GTID captured at snapshot start. Streaming then resumes from that boundary. Understanding this handoff is the senior signal.
- Incremental snapshot. The signal-table mechanism — how it works, how you trigger a chunk, how it interleaves with live streaming.
-
Snapshot locking. Whether the snapshot takes source-side locks (
extended), a brief flush lock (minimal), or no lock (none). Each has correctness implications.
The six snapshot modes.
-
initial. Default. Runs a full snapshot on first startup (no committed offset); on subsequent startups the snapshot is skipped and streaming resumes from the committed offset. -
always. Runs a full snapshot on every startup. Rarely wanted; useful for test harnesses. -
never. Skips the snapshot entirely. Streaming starts from wherever the log currently is. Data written before the connector started is not captured. Used when the downstream sink already has the initial state loaded via a different mechanism. -
initial_only. Runs the snapshot and then stops. Used for one-shot backfills where you don't want ongoing streaming. -
when_needed. Runs a snapshot only when there is no committed offset and the log is not available from the required position. Used on failover topologies where offsets may be lost. -
schema_only. Captures the schema of the tables (populates the schema-history topic) but does not read the data. Streaming starts from the current log position. Used when the downstream sink already has the data loaded and only needs the schema for future decoding.
Incremental snapshotting — the 2.x game-changer.
- The problem it solves. Traditional snapshots block streaming for the entire snapshot duration (or race with it at the risk of missed events). For a 10 TB table, that's hours of blocked CDC.
-
The mechanism. The connector splits the table into chunks by primary key (or a configurable key). It reads each chunk with a bounded SELECT (
WHERE key BETWEEN low AND high) and interleaves the chunk reads with continued live streaming. Every chunk is a separate small transaction. -
The trigger. Inserting a row into the
debezium_signaltable with a JSON payload ({"type":"execute-snapshot","data":{"data-collections":["public.orders"]}}) triggers the connector to start an incremental snapshot onpublic.orders. The signal is picked up by the streaming phase and interpreted as a snapshot request. - Interleaving semantics. For any row X, its snapshot event and any streaming events (INSERT/UPDATE/DELETE while snapshot runs) are ordered by the source's log-order rules. The downstream sink sees a consistent view — no lost events, no duplicates beyond the standard at-least-once floor.
The signal table — how it works.
-
Table shape.
CREATE TABLE public.debezium_signal (id VARCHAR PRIMARY KEY, type VARCHAR NOT NULL, data VARCHAR);— three columns. -
Config.
signal.data.collection = public.debezium_signalin the connector config. -
Command.
INSERT INTO public.debezium_signal (id, type, data) VALUES ('adhoc-1', 'execute-snapshot', '{"data-collections": ["public.orders"], "type": "incremental"}');— one row per snapshot command. - Read path. The connector sees the INSERT event in the WAL/binlog stream, parses the JSON, and executes the requested action. No connector restart required.
-
Additional signals.
execute-snapshot(incremental),stop-snapshot(halt an in-progress incremental snapshot),log(write a log message — useful for debugging),pauseandresume(2.x additions).
Snapshot locking modes — what each does.
-
extended(Postgres) /minimal_percona(MySQL Percona). Holds a lock for the entire snapshot. Safest; blocks writes; only viable for maintenance windows. -
minimal. Brief lock at start to capture the boundary offset, then releases. The snapshot then runs in a repeatable-read transaction. Standard for MySQL. -
none. No lock at all. Fastest; risks a missed event if the boundary offset moves while the snapshot runs. Debezium 2.x mitigates this using the stalled-updates re-check mechanism, but the correctness proof is subtle.
Common interview probes on snapshots.
- "Explain the difference between
initialandwhen_needed." —initialsnapshots on first startup only;when_neededsnapshots when the offset is missing and the log position is unavailable. - "What is the signal table for?" — trigger channel for incremental snapshots and other runtime commands.
- "Walk me through adding a table to a live connector without downtime." — add the table to
table.include.list, restart the connector so the config change takes effect, then insert anexecute-snapshotsignal for the new table. The connector runs an incremental snapshot for the new table while continuing to stream the existing ones. - "What breaks if I set
snapshot.mode = neveron a new connector?" — the snapshot phase is skipped; rows that existed before the connector started are never captured. Only useful when the downstream sink has been loaded via a separate mechanism.
Worked example — choosing a snapshot mode
Detailed explanation. A team has four different Debezium deployments across four different scenarios. Each scenario picks a different snapshot mode. Walk through the choice for each.
- Scenario 1. Brand-new pipeline; downstream sink is empty; must capture full initial state.
- Scenario 2. Existing pipeline restarting after a Kafka Connect cluster upgrade; the offset topic survived the upgrade.
- Scenario 3. Migration from a legacy CDC system; downstream sink is already loaded with current state from the legacy system; only new changes need to flow.
- Scenario 4. Test environment where every connector startup should re-snapshot to get a deterministic starting state.
Question. Pick the snapshot mode for each scenario and justify.
Input.
| Scenario | Sink loaded? | Offset topic present? | Downtime OK? |
|---|---|---|---|
| 1. Brand new | no | no | no |
| 2. Connect upgrade restart | yes | yes | no |
| 3. Migration from legacy CDC | yes | no | no |
| 4. Test env | (any) | (any) | yes |
Code.
{
"scenario_1_brand_new": {
"snapshot.mode": "initial"
},
"scenario_2_restart": {
"snapshot.mode": "initial",
"_comment": "Same as scenario 1; snapshot is skipped on restart because the offset topic is present"
},
"scenario_3_migration": {
"snapshot.mode": "never",
"_comment": "Skip snapshot; streaming picks up from current log position"
},
"scenario_4_test_env": {
"snapshot.mode": "always",
"_comment": "Re-snapshot every startup for deterministic tests"
}
}
Step-by-step explanation.
-
Scenario 1 —
initial. The connector runs a full snapshot on first startup (no committed offset), then transitions to streaming. The sink gets a complete initial state followed by every subsequent change. -
Scenario 2 — still
initial. The mode is unchanged; the snapshot is only skipped because the offset topic already has a committed position. On a restart after a Connect upgrade, the connector reads its last committed offset and resumes streaming from that point. -
Scenario 3 —
never. The sink is already loaded from the legacy CDC system as of some point in time. The connector starts streaming from the current log position; the app must ensure the legacy CDC has been drained past that point so no changes are lost. This is the "cutover" pattern from legacy CDC. -
Scenario 4 —
always. Every startup re-snapshots. Wasteful in production but perfect for test environments where you want the connector to start from a known state on each run. - The lesson:
snapshot.modeis not "always snapshot on start" — it's "when to snapshot." Most production deployments useinitial; migration cutovers usenever; test harnesses usealways.when_neededis the specialised failover-safe mode when the offset topic may be lost.
Output.
| Scenario | Mode | What runs on first startup | What runs on restart |
|---|---|---|---|
| Brand new | initial | full snapshot | streaming from committed offset |
| Restart | initial | streaming from committed offset | streaming from committed offset |
| Migration | never | streaming from current log | streaming from committed offset |
| Test env | always | full snapshot | full snapshot |
Rule of thumb. initial is the correct default for 90% of deployments. Reach for never only when the sink is pre-loaded via a separate mechanism; reach for when_needed only when your offset topic may be lost due to failover.
Worked example — adding a table to a live connector via incremental snapshot
Detailed explanation. The killer feature of 2.x: adding a new table to a running connector without stopping the CDC stream for the existing tables. Without incremental snapshotting, adding a table requires either a full connector restart with snapshot.mode = initial (which re-snapshots every table, hours of work) or a bespoke bootstrapping process. Incremental snapshot fixes this — a single signal-table INSERT triggers a chunk-by-chunk snapshot of the new table while streaming continues live for the existing tables.
-
Steps. (1) Update
table.include.listto add the new table. (2) Restart the connector (config change requires restart). (3) INSERT intodebezium_signalto trigger the incremental snapshot. (4) Wait for chunks to finish. - Semantics. The new table's snapshot events land on a Kafka topic; any live changes to the new table land on the same topic in commit order.
- Chunks. By default, the snapshot uses primary-key range chunks with a chunk size of 1024 rows.
Question. A live connector captures public.orders and public.line_items. Add public.payments to the same connector using incremental snapshot, without pausing the existing streams.
Input.
| Component | Value |
|---|---|
| Existing tables | public.orders, public.line_items |
| New table | public.payments (1M rows) |
| Chunk size | 1024 rows (default) |
| Signal table | public.debezium_signal |
Code.
-- 1. Ensure the signal table exists on the source
CREATE TABLE IF NOT EXISTS public.debezium_signal (
id VARCHAR PRIMARY KEY,
type VARCHAR NOT NULL,
data VARCHAR
);
GRANT SELECT, INSERT ON public.debezium_signal TO debezium;
-- 2. Add the new table to the publication
ALTER PUBLICATION dbz_publication ADD TABLE public.payments;
-- 3. Update the connector config to include the new table (via Connect REST)
-- PUT /connectors/shop-postgres-cdc/config
-- { "table.include.list": "public.orders,public.line_items,public.payments", ... }
-- 4. Trigger the incremental snapshot for the new table
INSERT INTO public.debezium_signal (id, type, data)
VALUES ('add-payments-2026-07-04',
'execute-snapshot',
'{"type":"incremental","data-collections":["public.payments"]}');
{
"name": "shop-postgres-cdc",
"config": {
"table.include.list": "public.orders,public.line_items,public.payments",
"signal.data.collection": "public.debezium_signal",
"incremental.snapshot.chunk.size": "1024",
"incremental.snapshot.watermarking.strategy": "insert_delete"
}
}
Step-by-step explanation.
- Ensure the signal table exists and is populated in
signal.data.collection. If the signal table was set up at connector-creation time (which is the recommended pattern), skip this step. - Add the new table to the Postgres publication (
ALTER PUBLICATION dbz_publication ADD TABLE public.payments;). Without this, the WAL forpaymentsis not decoded and the connector cannot stream its changes. - Update the connector's
table.include.listvia the Kafka Connect REST API (PUT /connectors/<name>/config). This triggers a task rebalance but keeps the existing connector position — no re-snapshot of existing tables. - Insert the signal into
public.debezium_signal. The INSERT flows through the WAL like any other event; the connector recognises the row (becausesignal.data.collectionmatches), parses the JSON payload, and dispatches an incremental snapshot forpublic.payments. - The incremental snapshot runs in chunks. The connector reads
SELECT * FROM public.payments WHERE id BETWEEN 0 AND 1023 ORDER BY id, emits those 1024 rows to Kafka as snapshot events, then reads the next chunk. Between chunks, live streaming events forpublic.payments(and every other table) continue to flow. The default chunk size is 1024 rows; increase for larger tables (up to 10000 rows per chunk is common).
Output.
| Timeline | Existing tables (orders, line_items) | New table (payments) |
|---|---|---|
| T+0 (signal INSERT) | live streaming | none |
| T+30 s (chunk 1) | live streaming | snapshot rows 0–1023 |
| T+60 s (chunk 2) | live streaming | snapshot rows 1024–2047 |
| ... | ... | ... |
| T+15 min (last chunk) | live streaming | snapshot rows 999k–1M |
| T+15 min onward | live streaming | live streaming |
Rule of thumb. For any live CDC pipeline where you need to add tables without downtime, incremental snapshotting is the answer. It replaces the entire "pause everything, re-snapshot everything" ordeal with a single INSERT into the signal table.
Worked example — snapshot locking trade-offs
Detailed explanation. Snapshot locking modes are the classic "correctness vs availability" trade-off. extended locks the source for the entire snapshot (correctness-first, blocks writes); minimal locks briefly to capture the boundary offset then releases (standard); none skips the lock (fastest, correctness relies on repeatable-read semantics and Debezium's built-in re-check logic). The senior candidate is expected to know when each is appropriate.
- extended. Only viable during maintenance windows or on read-mostly sources.
-
minimal. The MySQL default; captures the binlog position under a brief
FLUSH TABLES WITH READ LOCK, then releases and runs the snapshot in a separate transaction. - none. The most common production choice; correctness relies on the connector's snapshot transaction seeing a consistent view (Postgres REPEATABLE READ, MySQL InnoDB consistent read).
Question. A team is running Debezium against a Postgres 16 source with 5 TB of data. Discuss the three snapshot locking modes, quantify the write-availability impact of each, and recommend the mode for a production deployment.
Input.
| Snapshot mode | Lock | Write availability during snapshot | Correctness guarantee |
|---|---|---|---|
| extended | Full table lock | none | strictest |
| minimal | Brief lock at start | full after ~1 s | strong |
| none | No lock | full throughout | strong (via repeatable read) |
Code.
{
"extended_mode": {
"snapshot.locking.mode": "extended",
"_comment": "Postgres: uses ACCESS EXCLUSIVE for the snapshot; blocks writers"
},
"minimal_mode": {
"snapshot.locking.mode": "minimal",
"_comment": "Brief FLUSH TABLES WITH READ LOCK, then release"
},
"none_mode": {
"snapshot.locking.mode": "none",
"_comment": "No lock; relies on REPEATABLE READ semantics of the snapshot transaction"
}
}
-- Postgres — snapshot in a REPEATABLE READ transaction (the "none" mode)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
SELECT pg_export_snapshot();
-- (connector uses the exported snapshot handle; every SELECT sees the same consistent view)
SELECT * FROM public.orders;
SELECT * FROM public.line_items;
COMMIT;
Step-by-step explanation.
-
extended. The connector acquires an
ACCESS EXCLUSIVElock on each table for the snapshot's duration. On a 5 TB source, the snapshot may run for hours — during which time no writes are possible. Only viable during maintenance windows. - minimal. The connector briefly locks (for a few milliseconds) to capture the boundary offset (LSN on Postgres, binlog position on MySQL). Then it releases the lock and runs the snapshot in a separate REPEATABLE READ transaction. Writers see full availability after the ~1 second lock; the snapshot sees a consistent view via MVCC.
- none. No source-side lock. The connector opens a REPEATABLE READ transaction, records the current LSN as the boundary, and runs the snapshot inside that transaction. Correctness relies on MVCC: every SELECT inside the transaction sees the same consistent view. Writers on other transactions are unaffected.
- The catch with
none: if the snapshot transaction runs longer than the source'sidle_in_transaction_session_timeout(Postgres) or hits a lock conflict on DDL (ALTER TABLE), the snapshot fails and must restart. In practice, this is rare on a well-tuned source. - The recommendation for a 5 TB Postgres production deployment:
none. Postgres MVCC gives you a consistent view without holding any locks.minimalis a legacy MySQL default;extendedis only for maintenance windows.
Output.
| Locking mode | Write availability | Snapshot duration (5 TB) | Correctness | Best for |
|---|---|---|---|---|
| extended | 0 | hours | strictest | maintenance window |
| minimal | ~99% | hours | strong | MySQL default |
| none | 100% | hours | strong (via MVCC) | production Postgres |
Rule of thumb. For Postgres, always use snapshot.locking.mode = none. MVCC gives you the consistency guarantee without the availability cost. For MySQL, minimal is the safe default; none is fine on well-tuned deployments; extended is a scheduled-downtime mode.
Senior interview question on snapshot design
A senior interviewer might ask: "You have a live Debezium Postgres connector capturing 20 tables. The team wants to (a) add three new tables to the pipeline, (b) re-snapshot one existing table because a downstream sink lost its data, and (c) never accept a maintenance window for CDC. Walk me through the design end-to-end."
Solution Using incremental snapshotting via signal-table dispatch
-- 1. Ensure the signal table exists (idempotent)
CREATE TABLE IF NOT EXISTS public.debezium_signal (
id VARCHAR PRIMARY KEY,
type VARCHAR NOT NULL,
data VARCHAR
);
GRANT SELECT, INSERT ON public.debezium_signal TO debezium;
-- 2. Add the new tables to the publication
ALTER PUBLICATION dbz_publication ADD TABLE public.shipments;
ALTER PUBLICATION dbz_publication ADD TABLE public.refunds;
ALTER PUBLICATION dbz_publication ADD TABLE public.audit_log;
-- 3. Update the connector's table.include.list via Connect REST (config change)
-- PUT /connectors/shop-postgres-cdc/config
-- { "table.include.list": "public.orders,...,public.shipments,public.refunds,public.audit_log", ... }
-- 4. Trigger the incremental snapshot for the three new tables + the re-snapshot of orders
INSERT INTO public.debezium_signal (id, type, data) VALUES
('snapshot-shipments', 'execute-snapshot', '{"type":"incremental","data-collections":["public.shipments"]}'),
('snapshot-refunds', 'execute-snapshot', '{"type":"incremental","data-collections":["public.refunds"]}'),
('snapshot-audit-log', 'execute-snapshot', '{"type":"incremental","data-collections":["public.audit_log"]}'),
('resnapshot-orders', 'execute-snapshot', '{"type":"incremental","data-collections":["public.orders"]}');
{
"name": "shop-postgres-cdc",
"config": {
"table.include.list": "public.orders,public.line_items,public.payments,public.shipments,public.refunds,public.audit_log,...",
"signal.data.collection": "public.debezium_signal",
"snapshot.mode": "initial",
"snapshot.locking.mode": "none",
"incremental.snapshot.chunk.size": "4096",
"incremental.snapshot.watermarking.strategy": "insert_delete",
"notification.enabled.channels": "sink"
}
}
Step-by-step trace.
| Step | Command | Duration | Impact on existing streams |
|---|---|---|---|
| 1 | CREATE signal table | < 1 s | none |
| 2 | ALTER PUBLICATION ADD TABLE × 3 | < 1 s | none |
| 3 | PUT /connectors/.../config | ~10 s task rebalance | brief pause, no data loss |
| 4a | INSERT signal for shipments | starts chunked snapshot | streaming continues |
| 4b | INSERT signal for refunds | starts chunked snapshot | streaming continues |
| 4c | INSERT signal for audit_log | starts chunked snapshot | streaming continues |
| 4d | INSERT signal for orders resnapshot | starts chunked resnapshot | streaming continues |
After the signal INSERTs, the connector runs four independent incremental snapshots (three new tables + one re-snapshot) in parallel with the existing live streams. No maintenance window; no downtime for the other 19 tables.
Output:
| Deliverable | Duration | Downtime |
|---|---|---|
| Publication membership updated | < 1 s | none |
| Config change applied | ~10 s | brief rebalance |
| Three new tables backfilled | ~30 min each (10M rows) | 0 s |
| One existing table re-snapshotted | ~15 min (5M rows) | 0 s |
| Total elapsed | ~30 min (parallel) | none |
Why this works — concept by concept:
-
Signal-table dispatch — the signal INSERT flows through the WAL like any other event; the connector recognises it and dispatches an incremental snapshot without restarting. This turns "add a table" into a
INSERT INTO ... VALUESstatement. -
Chunked snapshot — the snapshot reads the source table in bounded chunks (
WHERE key BETWEEN low AND high). Each chunk is a small transaction; the source is never under a table-scoped lock for more than a few seconds. -
Interleaved with streaming — while the snapshot chunks run, the connector continues to consume WAL and emit live change events. For any given row, the connector uses a watermarking strategy (
insert_deleteis the modern default) to disambiguate the snapshot copy from any concurrent live update. - Parallel snapshots — multiple signal INSERTs queue up multiple concurrent incremental snapshots. Modern Debezium schedules them fairly; the effective throughput scales with the connector's SELECT capacity against the source.
- Cost — the source database sees a modest sustained SELECT load during the snapshot (throttled by chunk size). No blocking, no maintenance window, no re-snapshotting of unrelated tables. The operational cost is O(N × table_size) for N added tables; the value is priceless because it fits into normal working hours.
Streaming
Topic — streaming
Streaming snapshot and backfill problems
4. Schema history + schema evolution
The compacted DDL replay topic that keeps the connector honest across every ALTER TABLE
The mental model in one line: Debezium persists every observed DDL statement (CREATE TABLE, ALTER TABLE, DROP TABLE) to a dedicated schema.history.internal.kafka.topic — a compacted, single-partition, retention-forever topic that is replayed on every connector restart so the connector can decode WAL/binlog records that were written under an older schema, and the Schema Registry integration (Confluent or Apicurio) then propagates those schema versions to downstream consumers with compatibility rules that catch bad DDL before it breaks the pipeline.
The four axes interviewers actually probe for schema questions.
-
Schema-history topic. Where does the connector store the observed DDL? Answer: a dedicated Kafka topic (name via
schema.history.internal.kafka.topic). Compacted, single-partition, retention forever. Never delete this topic. - DDL replay. On restart, the connector replays every DDL statement from the schema-history topic in order, rebuilding its internal representation of every table's schema. Only then can it decode WAL/binlog records for those tables.
- Schema Registry integration. The per-event schema (the Avro or JSON schema for each Kafka topic) lives in a Schema Registry — Confluent or Apicurio. Debezium registers the schema at connector startup and on every schema change; consumers fetch the schema by subject name.
-
Subject naming strategies.
TopicNameStrategy(default; subject =topic-keyortopic-value),RecordNameStrategy(subject = record's Avro name),TopicRecordNameStrategy(subject = topic + record name). Each has different compatibility semantics.
The schema-history topic — properties.
-
Compacted.
cleanup.policy=compact. Only the latest state (or, more precisely, the compacted key history) is retained. Older records are compacted away by the log-cleaner. - Single-partition. DDL events must be totally ordered. Multiple partitions would allow the compactor to re-order or lose events; single-partition guarantees the DDL replay order.
-
Retention forever.
retention.ms=-1. Never delete records; the topic is the schema truth for the connector's lifetime. -
Not user-facing. The topic is internal —
__debezium-schema-historyorschema-history.<prefix>naming. Downstream consumers do not read it; they read the per-table CDC topics.
Schema Registry integration.
-
Converters.
key.converter = io.confluent.connect.avro.AvroConverter+key.converter.schema.registry.url = http://schema-registry:8081. Same for the value converter. -
Registration. At connector startup (post-snapshot), Debezium registers the Avro schema for each Kafka topic with the Schema Registry. On every schema change (an
ALTER TABLEDDL), a new schema version is registered. -
Compatibility rules. The Registry enforces the configured compatibility level —
BACKWARD(readers with old schemas can read new writes),FORWARD(readers with new schemas can read old writes),FULL(both),NONE(any change accepted).BACKWARDis the standard for CDC. - Enforcement. A DDL that violates the compatibility level is rejected at registration time — the connector fails to write the event, giving the operator a chance to fix the schema before data flows through it.
Subject naming strategies — the three options.
-
TopicNameStrategy(default). Subject =<topic>-keyand<topic>-value. One subject per Kafka topic. Simple; the most common choice. -
RecordNameStrategy. Subject = the record's Avro fully-qualified name. One subject per record type; multiple topics can share a subject (useful when several tables have the same envelope). -
TopicRecordNameStrategy. Subject =<topic>-<record-name>. Hybrid — per-topic isolation but with a record-name discriminator. Useful when a single topic carries events of multiple record types.
Schema evolution rules — the practical map.
- Add a nullable column. Backward-compatible; consumers with the old schema read new writes with the new column as null. Safe.
- Drop a column. Not backward-compatible in general; consumers with the old schema expecting the column will fail. Requires forward compatibility mode or a deprecation window.
- Rename a column. Not compatible; treat as drop + add. In practice, requires either a new topic or a downstream translation step.
-
Change a column's type. Depends on the types.
INT → BIGINTis often compatible;INT → VARCHARis not. - Add a table. Adds a new topic; no impact on existing topics or subjects. Safe.
- Drop a table. The topic remains; the schema-history topic records the DROP. Downstream consumers that subscribed to the dropped topic see no new events; no schema-registry impact on other topics.
Common interview probes on schema history.
- "What is the schema-history topic for?" — persistent DDL log the connector replays on restart to decode old WAL records.
- "Why is it single-partition?" — DDL order must be preserved; multi-partition breaks total order.
- "Why compacted?" — the cleanup policy compacts by key so the topic doesn't grow without bound; the compaction preserves the latest state per schema key.
- "What happens if the schema-history topic is deleted?" — the connector cannot decode WAL records for existing tables; it must be recreated from a full re-snapshot with
snapshot.mode = initial. - "What compatibility level do you set on the Schema Registry?" —
BACKWARDfor CDC (old consumers must read new writes).
Worked example — schema-history topic setup and monitoring
Detailed explanation. The schema-history topic is one of the most operationally sensitive resources in a Debezium deployment. Getting the topic settings right (single-partition, compacted, retention forever, low but non-zero segment size) is critical; monitoring for its health is mandatory. Walk through the topic-creation command, the ideal properties, and the two Kafka metrics an on-call should watch.
-
Topic properties.
partitions=1,cleanup.policy=compact,retention.ms=-1,min.compaction.lag.ms=60000(compact aggressively). -
Segment size. Small segments speed up compaction.
segment.bytes=1048576(1 MB) is a good default. - Monitoring. Watch (a) topic offset (must advance on every DDL) and (b) log-compaction time (should be < 5 minutes).
Question. A team is deploying Debezium for the first time. Create the schema-history topic with the right properties and document the monitoring runbook.
Input.
| Setting | Value |
|---|---|
| Topic name | schema-history.shop |
| Partitions | 1 |
| Replication factor | 3 |
| cleanup.policy | compact |
| retention.ms | -1 (forever) |
| segment.bytes | 1 MB |
Code.
# Create the schema-history topic with the right properties
kafka-topics --bootstrap-server kafka:9092 \
--create \
--topic schema-history.shop \
--partitions 1 \
--replication-factor 3 \
--config cleanup.policy=compact \
--config retention.ms=-1 \
--config min.compaction.lag.ms=60000 \
--config segment.bytes=1048576 \
--config max.compaction.lag.ms=86400000
# Verify
kafka-topics --bootstrap-server kafka:9092 --describe --topic schema-history.shop
# Sample output
# Topic: schema-history.shop
# PartitionCount: 1
# ReplicationFactor: 3
# Configs: cleanup.policy=compact, retention.ms=-1, segment.bytes=1048576, ...
{
"connector_config_schema_history_section": {
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.shop",
"schema.history.internal.kafka.recovery.poll.interval.ms": "100",
"schema.history.internal.kafka.recovery.attempts": "100"
}
}
Step-by-step explanation.
-
partitions=1is mandatory. DDL events must be totally ordered — the connector replays them in offset order on restart. Multiple partitions would let compaction and delivery re-order events, breaking replay. -
cleanup.policy=compact+retention.ms=-1keeps the topic growing bounded by the number of unique schema-history keys (roughly one per DDL statement). Compaction removes older versions of the same key; retention-forever means no time-based deletion. -
min.compaction.lag.ms=60000prevents the compactor from touching records less than a minute old. This gives the connector time to consume the schema history in-order without racing with the compactor. -
replication.factor=3is standard Kafka production practice — durability against a single broker loss. For the schema-history topic, losing durability is catastrophic (connector cannot decode WAL), so replication is non-negotiable. - Monitoring: Kafka's
kafka.log:type=LogCleaner,name=cleaner-recopy-percentmetric shows compaction efficiency. Alert on the topic'sLogEndOffsetfailing to advance across a DDL statement (delivered from the connector's DDL-emitter perspective). Prometheus + JMX exporter is the standard monitoring path.
Output.
| Property | Value | Why |
|---|---|---|
| partitions | 1 | Total DDL order required |
| replication.factor | 3 | Losing durability = losing decoding capability |
| cleanup.policy | compact | Keep only latest per key |
| retention.ms | -1 | Never delete DDL history |
| segment.bytes | 1 MB | Fast compaction on small segments |
| min.compaction.lag.ms | 60000 | Don't race with connector's replay |
Rule of thumb. Never delete the schema-history topic. Never change its partition count. Never set retention.ms to a finite value. If you get any of these three wrong, the next connector restart will fail to decode WAL records and require a full re-snapshot.
Worked example — schema evolution via Schema Registry
Detailed explanation. A source table gets a new column via ALTER TABLE. Debezium observes the DDL, updates its internal schema, and registers the new Avro schema version with the Registry. The Registry enforces the configured compatibility level; if the new schema is backward-compatible with the previous version, downstream consumers keep working with no code change. Walk through the exact sequence from ALTER TABLE to a downstream consumer seeing the new column.
-
The DDL.
ALTER TABLE public.orders ADD COLUMN notes TEXT. -
The Debezium behaviour. Sees the DDL in the WAL; updates its internal schema for
public.orders; registers the new Avro schema version. - The Registry check. Is the new schema backward-compatible with the old? Yes (nullable new field). Registration succeeds.
-
The downstream impact. New events include the
notesfield; old consumers still work (they simply ignore the new field or default it to null).
Question. A team adds a nullable column to a source table. Trace what happens end-to-end — DDL, Debezium, Registry, consumer — and identify the one config knob that determines whether the change is accepted.
Input.
| Component | Setting |
|---|---|
| DDL | ALTER TABLE public.orders ADD COLUMN notes TEXT |
| Registry compatibility | BACKWARD |
| Downstream consumer | Snowflake sink connector (Avro reader) |
Code.
-- Source-side DDL
ALTER TABLE public.orders ADD COLUMN notes TEXT;
-- Debezium sees the DDL via the WAL and updates its internal schema
-- (visible in the schema-history topic as a new record)
{
"old_avro_schema": {
"type": "record",
"name": "Value",
"namespace": "shop.public.orders",
"fields": [
{"name":"id","type":"long"},
{"name":"tenant_id","type":"int"},
{"name":"total_cents","type":"int"},
{"name":"status","type":"string"}
]
},
"new_avro_schema": {
"type": "record",
"name": "Value",
"namespace": "shop.public.orders",
"fields": [
{"name":"id","type":"long"},
{"name":"tenant_id","type":"int"},
{"name":"total_cents","type":"int"},
{"name":"status","type":"string"},
{"name":"notes","type":["null","string"],"default":null}
]
}
}
# Verify the new schema was accepted by the Registry
curl -s http://schema-registry:8081/subjects/shop.public.orders-value/versions/latest | jq
# Check the compatibility level
curl -s http://schema-registry:8081/config/shop.public.orders-value | jq
# {"compatibilityLevel":"BACKWARD"}
Step-by-step explanation.
- The source-side
ALTER TABLEruns. Postgres emits a WAL record capturing the DDL. Debezium consumes the WAL record, recognises it as a schema change, and updates its internal table schema. - Debezium appends the DDL to the schema-history topic. On any future restart, the connector will replay this DDL from the schema-history topic before consuming WAL records that were written under the new schema.
- Debezium constructs the new Avro schema. The added column becomes an Avro field with
type = ["null", "string"]anddefault = null— the canonical way to express a nullable field. This is the shape required for BACKWARD compatibility. - Debezium registers the new schema with the Schema Registry under the subject
shop.public.orders-value. The Registry checks BACKWARD compatibility: can a consumer reading with the old schema still read data written under the new schema? Yes — the old consumer will simply ignore the new field. Registration succeeds; the new schema gets a version number. - Downstream consumers using the new schema see the
notescolumn populated; consumers still using the old schema keep working, ignoring the new field. The migration is truly zero-downtime for the consumer fleet.
Output.
| Step | Actor | Action | Time |
|---|---|---|---|
| 1 | Source | ALTER TABLE emits DDL WAL record | 0 s |
| 2 | Debezium | Consumes DDL, updates internal schema | ~50 ms |
| 3 | Debezium | Writes DDL to schema-history topic | ~10 ms |
| 4 | Debezium | Registers new Avro schema with Registry | ~50 ms |
| 5 | Registry | Verifies BACKWARD compatibility, accepts | ~10 ms |
| 6 | Downstream | Reads new events with new schema; old consumers unaffected | ongoing |
Rule of thumb. For safe schema evolution, always add nullable columns (never non-nullable), always drop columns via a deprecation window (never a hard drop), and always set the Schema Registry compatibility level to BACKWARD. These three rules together prevent 95% of schema-related pipeline breakage.
Worked example — connector restart with schema-history replay
Detailed explanation. The moment of truth for the schema-history topic: connector restart. The connector reads every DDL statement from the schema-history topic (in offset order), reconstructs its internal schema for every tracked table, and only then begins consuming WAL records. If the schema-history topic is missing, corrupted, or (accidentally) partitioned across multiple partitions, this replay fails and the connector cannot decode WAL records.
- The replay. On startup, the connector consumes the schema-history topic from offset 0.
- The reconstruction. Each DDL statement mutates an internal per-table schema. After replay, the connector has the "current" schema for every table.
- The WAL decoding. The connector then reads WAL records from its last committed offset. Each record references a table by name; the internal schema tells it how to decode column values.
Question. A connector restart is failing with "Schema for table orders is not known." Diagnose the possible causes and walk through the recovery path.
Input.
| Symptom | "Schema for table orders is not known" on WAL decoding |
|---|---|
| Connector | shop-postgres-cdc |
| Topic | schema-history.shop |
| Uptime before crash | 6 months |
Code.
# 1. Verify the schema-history topic still exists
kafka-topics --bootstrap-server kafka:9092 --describe --topic schema-history.shop
# If missing → catastrophic; must full re-snapshot
# 2. Verify partition count is 1
# If > 1 → the topic was recreated with wrong config; must full re-snapshot
# 3. Read the schema-history topic to check DDL is there
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic schema-history.shop \
--from-beginning \
--max-messages 100
# 4. If DDL is missing, check whether the source publication was rebuilt without capturing the ALTER history
psql -U postgres -c "SELECT pubname, tables FROM pg_publication_tables WHERE tablename = 'orders';"
{
"recovery_paths": {
"case_1_topic_missing": {
"cause": "Topic deleted (retention misconfig, accidental drop)",
"recovery": "Reset connector offsets; snapshot.mode=initial; re-snapshot everything"
},
"case_2_topic_partitioned": {
"cause": "Topic recreated with partitions > 1 or wrong cleanup.policy",
"recovery": "Delete + recreate with partitions=1 cleanup.policy=compact; then full re-snapshot"
},
"case_3_ddl_missing": {
"cause": "Source publication rebuilt; connector never saw the ALTER",
"recovery": "Manually inject the missing DDL into the schema-history topic (advanced) OR full re-snapshot"
},
"case_4_corrupted_topic": {
"cause": "Kafka disk corruption or replication factor 1 with a broker loss",
"recovery": "Restore from Kafka backup OR full re-snapshot"
}
}
}
Step-by-step explanation.
- Verify the topic exists (
kafka-topics --describe). If missing, the topic was deleted (accidentally by ops or by a misconfigured retention). No path forward except full re-snapshot withsnapshot.mode = initial. - Verify partition count = 1 and cleanup.policy = compact. If either is wrong, the topic was recreated with wrong config. Delete + recreate with correct settings + full re-snapshot.
- Read the topic. If it exists and is properly configured but does not contain the DDL for
orders, the source publication was likely recreated without the connector observing the ALTER (a common bug when someone runsDROP PUBLICATIONand re-creates it during maintenance). Recovery: either manually inject the DDL into schema-history (an advanced move requiring Debezium internals knowledge) or full re-snapshot. - The "manual DDL inject" recovery: use the Debezium admin API to push a synthetic DDL statement into the schema-history topic. This is documented for the SQL Server and MySQL connectors; for Postgres, the safer path is to re-snapshot.
- Prevention: never delete or drop-and-recreate the source publication without pausing the connector. If you must recreate the publication, do it during a maintenance window where a full re-snapshot is acceptable.
Output.
| Cause | Diagnosis command | Recovery | Downtime |
|---|---|---|---|
| Topic missing | kafka-topics --describe |
full re-snapshot | hours |
| Wrong partition count | kafka-topics --describe |
recreate + full re-snapshot | hours |
| DDL missing | kafka-console-consumer |
inject DDL or re-snapshot | minutes / hours |
| Corrupted topic | Kafka broker logs | restore from backup or re-snapshot | hours |
Rule of thumb. Every Debezium runbook must include "verify schema-history topic integrity" as the first diagnostic step for any decoding-related error. And every operations playbook must forbid manual topic deletion in the region where schema-history topics live.
Senior interview question on schema evolution
A senior interviewer might ask: "A source team is planning to drop three columns and rename two others across five tables over the next quarter. Walk me through how you'd coordinate this with the CDC pipeline — what Schema Registry compatibility level, what DDL sequencing, what downstream contract changes, and what would you never do."
Solution Using a phased column-migration playbook with BACKWARD compatibility
Column-migration playbook — 5 tables, 3 drops, 2 renames
=========================================================
Preconditions (must be true before any DDL runs)
- Schema Registry compatibility level = BACKWARD on every affected subject
- Every downstream consumer has been audited for use of the columns
- Owners of each downstream consumer have been notified with sunset dates
Phase 1 — mark columns deprecated (week 1)
- Add a comment on each column: COMMENT ON COLUMN orders.legacy_col IS 'DEPRECATED - remove 2026-09-01';
- Emit a dashboard notice to consumer teams
- No schema change; no Debezium impact
Phase 2 — stop writing (weeks 2–4)
- Push app-side code changes: stop writing to the deprecated columns
- Verify writes = 0 via monitoring
- Column still exists; still emitted in every Debezium event; consumers may still read
Phase 3 — mark as null in Debezium (week 5)
- Add an SMT filter that nulls out the deprecated columns before emit
- Verify downstream consumers still working
Phase 4 — drop the columns (week 6, maintenance window)
- ALTER TABLE orders DROP COLUMN legacy_col;
- Debezium observes the DDL; new schema version registered
- Registry check: BACKWARD compatibility means old consumers reading with the old schema will fail on new writes → move to FORWARD if any consumer still uses the old schema, else BACKWARD is fine
- In practice, use FORWARD_TRANSITIVE for a two-week grace window, then move back to BACKWARD
Phase 5 — rename columns (weeks 7–8)
- Renames are impossible under BACKWARD or FULL compatibility (fields are matched by name)
- Solution: add the new column, dual-write app-side, migrate consumers to read new column, drop the old column
- Never do a bare RENAME COLUMN through the pipeline
Never
- Never do a hard column drop without the deprecation window
- Never do a bare RENAME COLUMN with the CDC pipeline live
- Never lower compatibility to NONE just to unstick a migration
Step-by-step trace.
| Week | Phase | DDL | Registry impact | Consumer impact |
|---|---|---|---|---|
| 1 | Deprecation notice | COMMENT ON COLUMN | none | notice only |
| 2–4 | Stop writes | none | none | none |
| 5 | SMT null-out | none (SMT config) | new schema version | consumers see nulls |
| 6 | DROP COLUMN | ALTER TABLE DROP | new schema; grace on FORWARD | new consumers only |
| 7–8 | Rename via add + drop | ALTER TABLE ADD then DROP | two new schema versions | dual-window |
After the six-week playbook, the columns are dropped and the renames complete without a single downstream consumer breakage. No emergency reverts; no data loss; no compatibility violations.
Output:
| Deliverable | Timeline | Risk |
|---|---|---|
| 3 columns dropped safely | 6 weeks | low |
| 2 columns renamed safely | 8 weeks | low |
| Schema-history topic preserved | forever | none |
| Consumer breakage | 0 | achieved |
Why this works — concept by concept:
- Phased migration — the six-week playbook stages the risk. Each phase can be reverted without a full pipeline rebuild. Big-bang column drops are how CDC pipelines break.
- BACKWARD compatibility as the default — new consumers can read old writes; old consumers can read new writes as long as new fields are nullable and default-valued. Adding nullable columns is free; dropping columns requires a temporary FORWARD_TRANSITIVE window.
- SMT null-out as a safety valve — the null-out phase (week 5) is a soft migration step: the column still exists on the source but is nulled before it reaches Kafka. Consumers get advance warning of the eventual drop and can update in a controlled way.
- Never bare RENAME — Avro fields are matched by name; renaming breaks compatibility. The workaround is add-then-drop with dual-write — two separate DDL steps, one register-new-column at the Registry, then a drop after consumers migrate.
- Cost — six weeks of calendar time; roughly 3 engineer-weeks of work across app, CDC ops, and consumer teams. The avoided cost is one Registry compatibility break, which typically requires an emergency runbook + a hasty consumer redeploy + a hangover of on-call incidents. Do the playbook.
Streaming
Topic — streaming
Streaming schema evolution and registry problems
5. Outbox pattern + production hygiene
Turn dual-write into a single transaction — outbox table, EventRouter SMT, and the four production hygiene axes senior interviewers probe
The mental model in one line: the outbox pattern solves the dual-write problem (writing to your business table and publishing to Kafka in the same request handler is not atomic) by writing every business event as an extra row in an outbox table inside the same database transaction as the business write; Debezium then tails the outbox table, and its EventRouter SMT transforms each outbox row into a topic-routed, aggregate-keyed Kafka event — single-transaction atomicity, at-least-once delivery, and clean per-aggregate topic fan-out.
The four axes interviewers actually probe for outbox questions.
- The dual-write problem. Why writing to DB + publishing to Kafka is not atomic under any partial-failure scenario — the crash, the network partition, the timeout. Every distributed team hits this eventually.
- The outbox table. The design of the table itself — one row per event, columns for aggregate_type, aggregate_id, event_type, payload, plus operational columns for id, created_at, and often a soft-delete/pruned marker.
-
The EventRouter SMT. Debezium's built-in Simple Message Transform that reads an outbox row and produces a Kafka event with the topic derived from
aggregate_type, the key derived fromaggregate_id, and the value derived frompayload. - Cleanup. Outbox rows are not automatically deleted — the CDC path only reads them. A separate pruner (trigger, cron job, or logical replication policy) must delete emitted rows to prevent unbounded growth.
The dual-write problem in one paragraph.
-
The naive code. Inside a request handler:
db.execute(INSERT INTO orders ...); kafka.publish("orders_events", ...). -
The failure. The
db.executesucceeds, then the process crashes / the Kafka producer times out / the network partitions. The database has the new row; Kafka has no event. Downstream consumers never learn of the change. - The reverse failure. Kafka publishes but the DB commit fails. Downstream consumers see an event for a row that never existed. Fantasy data flows through the pipeline.
- The root cause. Two separate transactional systems (DB and Kafka) with no distributed transaction between them. XA/2PC exists but nobody actually runs it in production.
The outbox pattern in one paragraph.
- The insight. Write the intent to publish as an extra row in an outbox table inside the same database transaction as the business write. Now there is only one transactional system; the DB either commits both or commits neither.
- The read path. Debezium tails the outbox table (via WAL/binlog/CDC-table) and publishes each new outbox row as a Kafka event. The DB commit is the single point of truth; if the DB committed, the outbox row exists and Debezium will eventually publish it. If the DB didn't commit, the outbox row doesn't exist and no false event is published.
- The guarantee. At-least-once publish. The DB commit implies at-least-one Kafka message will land; the message may be duplicated if the connector crashes between reading and offset-committing, but no message will be lost.
The outbox table — canonical design.
-
Columns.
id(PK, UUID or bigint),aggregate_type(VARCHAR — "orders", "payments"),aggregate_id(VARCHAR — the business ID),event_type(VARCHAR — "OrderPlaced", "OrderShipped"),payload(JSONB — the event body),created_at(TIMESTAMPTZ), and optionallyemitted_at(nullable) for pruning. -
Indexes. Primary key on
id; optionally an index onaggregate_idfor troubleshooting. -
Constraints.
NOT NULLon aggregate_type, aggregate_id, event_type, payload. -
Grants. Debezium connector user gets
SELECTon the outbox table.
The EventRouter SMT — how it works.
-
Input. A Debezium row event on the outbox table, with
after.aggregate_type,after.aggregate_id,after.event_type,after.payload, etc. -
Output. A Kafka record with:
-
Topic.
<route.topic.replacement>whereroute.topic.replacementis a template like${routedByValue}androute.by.field = aggregate_type— so the topic name is derived from the aggregate_type column. -
Key.
<route.key.field>— e.g.aggregate_id. The key controls partitioning; events for the same aggregate go to the same partition, preserving order. -
Value.
<route.payload.field>— e.g.payload. The payload column becomes the Kafka value.
-
Topic.
- Config. All the above via SMT-level config keys — a single JSON block in the connector config.
Cleanup — three strategies.
-
Trigger-based pruning. An
AFTER INSERTtrigger on the outbox schedules a delete. Downside: adds trigger overhead; risks a race with the Debezium connector. -
Cron-based pruning. A periodic
DELETE FROM outbox WHERE created_at < now() - INTERVAL '7 days'. Simple; slight latency between emit and delete. -
Logical-replication-aware pruning. Advanced — a periodic query reads the
pg_replication_slots.confirmed_flush_lsnand deletes rows whose WAL LSN is below the confirmed flush. Guarantees no early deletion. - The default choice. Cron-based with a 7-day retention. Simple, well-understood, safe if the connector is caught up within 7 days.
Common interview probes on outbox + production hygiene.
- "What problem does the outbox pattern solve?" — dual-write atomicity.
- "What is the EventRouter SMT for?" — routes outbox rows to Kafka topics with configurable topic/key/value extraction.
- "Does Debezium give exactly-once semantics?" — at-least-once by default; exactly-once possible with Kafka Connect transactional producer, but downstream must still be idempotent.
- "What is a tombstone event in Debezium?" — the null-value event that follows a delete when a topic is compacted; tells the log-cleaner to remove the key entirely.
- "How do you handle a failing event that keeps crashing the connector?" — dead-letter queue via
errors.tolerance = allanderrors.deadletterqueue.topic.name = dlq.<connector>.
Worked example — outbox table + EventRouter SMT config
Detailed explanation. The canonical outbox implementation — Postgres table, atomic write from the app, Debezium connector with EventRouter SMT, Kafka topic per aggregate type. Walk through every layer end-to-end.
- Postgres side. Outbox table, connector role with SELECT, table in the publication.
- App side. Every business write includes an outbox INSERT in the same transaction.
-
Connector side. EventRouter SMT config that routes on
aggregate_type. -
Downstream. Consumers subscribe to
orders_events,payments_events, etc.
Question. Design the outbox table, the app-side write, the connector config with EventRouter SMT, and the resulting Kafka topics.
Input.
| Component | Value |
|---|---|
| Source database | Postgres 16 |
| Aggregates | orders, payments, refunds |
| Downstream topics | orders_events, payments_events, refunds_events |
| Event volume | 5k events/sec peak |
Code.
-- Outbox table
CREATE TABLE public.outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(64) NOT NULL,
aggregate_id VARCHAR(128) NOT NULL,
event_type VARCHAR(64) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX outbox_created_at_idx ON public.outbox (created_at);
GRANT SELECT ON public.outbox TO debezium;
-- Add to publication
ALTER PUBLICATION dbz_publication ADD TABLE public.outbox;
-- App-side write (atomic; single transaction)
BEGIN;
INSERT INTO public.orders (id, tenant_id, total_cents, status)
VALUES (12345, 1, 19900, 'NEW');
INSERT INTO public.outbox (aggregate_type, aggregate_id, event_type, payload)
VALUES ('orders',
'12345',
'OrderPlaced',
'{"orderId":12345,"tenantId":1,"totalCents":19900}');
COMMIT;
{
"name": "shop-outbox-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "db.internal",
"database.dbname": "shop",
"topic.prefix": "shop",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_shop_slot",
"table.include.list": "public.outbox",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.route.by.field": "aggregate_type",
"transforms.outbox.route.topic.replacement": "${routedByValue}_events",
"transforms.outbox.table.field.event.key": "aggregate_id",
"transforms.outbox.table.field.event.payload": "payload",
"transforms.outbox.table.field.event.timestamp": "created_at",
"transforms.outbox.table.fields.additional.placement": "event_type:header",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.shop"
}
}
Step-by-step explanation.
- The outbox table lives alongside the business tables in Postgres. Adding it to the publication (
ALTER PUBLICATION ... ADD TABLE public.outbox) makes the connector see INSERTs. - Every business write is atomic:
INSERT INTO public.orders+INSERT INTO public.outboxinside oneBEGIN/COMMIT. Both writes commit or neither does; there is no partial-failure scenario where the order exists but the outbox event does not (or vice versa). - The EventRouter SMT config picks apart the outbox row.
route.by.field = aggregate_typesays "use the value ofaggregate_typeto derive the topic name."route.topic.replacement = ${routedByValue}_eventsgives you topics likeorders_events,payments_events,refunds_events— one per aggregate. -
table.field.event.key = aggregate_idsets the Kafka message key toaggregate_id. All events for order 12345 go to the same partition, preserving per-order event order. This is the standard partitioning pattern for aggregate-typed streams. -
table.fields.additional.placement = event_type:headerputs theevent_typecolumn into a Kafka header (event_type: OrderPlaced). Consumers can filter on the header without deserialising the payload — useful for event-type routing at the consumer.
Output.
| Aggregate write | Kafka topic | Key | Value | Header |
|---|---|---|---|---|
| INSERT INTO orders + outbox row | orders_events | "12345" | {"orderId":12345,...} |
event_type: OrderPlaced |
| INSERT INTO payments + outbox row | payments_events | "56789" | {"paymentId":56789,...} |
event_type: PaymentCaptured |
| INSERT INTO refunds + outbox row | refunds_events | "56789" | {"refundId":98765,...} |
event_type: RefundIssued |
Rule of thumb. The outbox pattern is the correct answer to the dual-write problem in every non-XA architecture. Use it whenever your app needs to write to a DB and publish to Kafka in the same logical operation. Anything else is a partial-failure bug waiting to happen.
Worked example — outbox pruning strategy
Detailed explanation. The outbox table is write-only from the connector's perspective — the connector reads via WAL, never touches the table content. A separate pruner is required to delete emitted rows or the table grows without bound. Compare the three pruning strategies and pick the safest default.
- Trigger-based. Fast; risky if the Debezium connector is delayed.
- Cron-based. Simple; requires a retention window bigger than the max connector lag.
-
LSN-aware. Safest; requires reading
pg_replication_slotsmetadata.
Question. Implement outbox pruning with a 7-day retention window. Show the SQL, the schedule, and the monitoring.
Input.
| Component | Value |
|---|---|
| Outbox table | public.outbox |
| Retention target | 7 days |
| Peak volume | 5k events/sec = ~3B rows/week |
| Prune schedule | Hourly |
Code.
-- Pruning query — 7-day retention
-- Runs hourly from a cron job or a scheduled Postgres function
DELETE FROM public.outbox
WHERE created_at < now() - INTERVAL '7 days'
RETURNING id;
-- Sample cron entry
-- 0 * * * * psql -U app -d shop -c "DELETE FROM public.outbox WHERE created_at < now() - INTERVAL '7 days';"
-- Monitor outbox growth
SELECT COUNT(*) AS row_count,
pg_size_pretty(pg_total_relation_size('public.outbox')) AS size,
min(created_at) AS oldest,
max(created_at) AS newest
FROM public.outbox;
-- Verify Debezium is caught up (should be seconds behind, not hours)
SELECT slot_name,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'dbz_shop_slot';
# Safer LSN-aware pruner — deletes only rows the connector has already flushed
import psycopg2
def prune_outbox(conn):
with conn.cursor() as cur:
# Get the LSN the connector has confirmed flushing up to
cur.execute("""
SELECT confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'dbz_shop_slot'
""")
flush_lsn = cur.fetchone()[0]
# Delete only rows whose WAL LSN is below the confirmed flush
# (approximated by created_at + 5 minute safety margin)
cur.execute("""
DELETE FROM public.outbox
WHERE created_at < (
SELECT now() - INTERVAL '10 minutes'
)
AND id IN (
SELECT id FROM public.outbox
WHERE created_at < now() - INTERVAL '10 minutes'
LIMIT 10000
)
""")
deleted = cur.rowcount
conn.commit()
return deleted
if __name__ == "__main__":
conn = psycopg2.connect("postgres://pruner@db:5432/shop")
while True:
deleted = prune_outbox(conn)
print(f"Pruned {deleted} rows")
import time; time.sleep(60)
Step-by-step explanation.
- The simple cron-based pruner runs
DELETE FROM public.outbox WHERE created_at < now() - INTERVAL '7 days'every hour. Safe as long as the Debezium connector lag stays below 7 days — which it should in any healthy deployment. - The safer LSN-aware pruner cross-references
pg_replication_slots.confirmed_flush_lsnbefore deleting. It only deletes rows the connector has confirmed publishing, guaranteeing no early deletion even if the connector is stuck for hours. - Batching:
LIMIT 10000per delete keeps each delete transaction small. Large deletes on a hot table create bloat and lock contention; small batches spread the work. - Monitoring: alert on
row_count > 100M(indicates pruner falling behind or connector stalled) and onlag_bytes > 1 GB(connector significantly behind). Either indicates the outbox is growing faster than the CDC path can drain. - VACUUM: run
VACUUM (VERBOSE, ANALYZE) public.outboxweekly to reclaim space from the deletes. Under high delete volume, the outbox table may needAUTOVACUUMtuning (autovacuum_vacuum_scale_factor = 0.05to trigger vacuum more aggressively).
Output.
| Metric | Healthy | Warn | Page |
|---|---|---|---|
| outbox row count | < 100M | 100M–1B | > 1B |
| outbox size on disk | < 20 GB | 20–100 GB | > 100 GB |
| Debezium lag (bytes) | < 100 MB | 100 MB – 1 GB | > 1 GB |
| Oldest row age | < 8 days | 8–14 days | > 14 days |
Rule of thumb. Cron-based pruning with a 7-day retention is the right default; step up to LSN-aware pruning only when the connector routinely lags for hours (which itself is a signal that the connector needs sizing help, not that the pruner needs fixing).
Worked example — dead-letter queue for poison events
Detailed explanation. Some events will fail to publish — a corrupted payload, a schema violation, a downstream sink rejection. Kafka Connect provides a dead-letter queue (DLQ) mechanism: failed events are routed to a DLQ topic instead of crashing the connector. Debezium inherits this by setting a few errors.* configs. Walk through the DLQ config and the monitoring runbook.
-
The mechanism.
errors.tolerance = all+errors.deadletterqueue.topic.name = dlq.<connector>. Failed events land in the DLQ topic. - The recovery. An operator inspects the DLQ topic, identifies the root cause (bad payload, schema mismatch, etc.), and either patches the source data or drops the poison event.
- The alerting. DLQ topic offset must not grow. Any advance indicates a real issue.
Question. Configure a DLQ for the outbox connector and design the on-call runbook for a DLQ alert.
Input.
| Component | Value |
|---|---|
| Connector | shop-outbox-cdc |
| DLQ topic | dlq.shop-outbox-cdc |
| Alert | DLQ offset increased in last 5 minutes |
Code.
{
"name": "shop-outbox-cdc",
"config": {
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "dlq.shop-outbox-cdc",
"errors.deadletterqueue.topic.replication.factor": "3",
"errors.deadletterqueue.context.headers.enable": "true",
"errors.retry.timeout": "60000",
"errors.retry.delay.max.ms": "5000"
}
}
# On-call runbook — DLQ growth alert
# 1. Inspect the latest few DLQ messages
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic dlq.shop-outbox-cdc \
--from-beginning \
--max-messages 10 \
--property print.headers=true
# The DLQ header set (Connect adds these automatically)
# __connect.errors.topic
# __connect.errors.partition
# __connect.errors.offset
# __connect.errors.class.name
# __connect.errors.exception.class.name
# __connect.errors.exception.message
# __connect.errors.exception.stacktrace
# 2. Identify the root cause from the exception message
# Common: JsonParseException on malformed payload → source app bug
# SchemaException → schema evolution violation → coordinate with schema owner
# 3. Decide the fix
# (a) Fix the source data (rewrite the offending row in outbox)
# (b) Update the connector config to relax the transform
# (c) Accept the loss (drop the DLQ event) — only if the event is truly unrecoverable
# 4. Reprocess (if the fix was a data patch)
kafka-console-producer --bootstrap-server kafka:9092 \
--topic orders_events \
< recovered-events.json
Step-by-step explanation.
-
errors.tolerance = alltells Kafka Connect to route any failing record to the DLQ instead of crashing the task. Without this, a single poison event stops the entire connector — one bad row halts the whole CDC pipeline. -
errors.deadletterqueue.topic.name = dlq.shop-outbox-cdcsets the target DLQ topic. Convention isdlq.<connector-name>— one DLQ per connector, easy to correlate. -
errors.deadletterqueue.context.headers.enable = truepopulates the failed record with headers describing why it failed (__connect.errors.exception.class.name,__connect.errors.exception.message, etc.). Without these, debugging a DLQ event requires guessing. -
errors.retry.timeout = 60000+errors.retry.delay.max.ms = 5000— Connect will retry a failing record for up to 60 seconds with an exponential backoff capped at 5 seconds before routing to DLQ. Transient failures (Kafka broker restart) heal; permanent failures (bad schema) still land in DLQ. - The on-call runbook: (a) inspect the DLQ headers to find the root cause; (b) fix the source data if possible; (c) reprocess the fixed data by producing to the original topic; (d) if unrecoverable, drop the DLQ event and file a bug against the source team.
Output.
| Failure class | DLQ header | Recovery |
|---|---|---|
| Bad JSON payload | JsonParseException | Fix source app; drop DLQ event |
| Schema violation | SchemaException | Coordinate with schema owner; reprocess after |
| Downstream sink reject | ConnectException | Investigate sink; may auto-heal |
| Registry unavailable | RestClientException | Fix Registry; connector will auto-reprocess retries |
Rule of thumb. Every production Debezium connector ships with a DLQ. Without one, a single poison event stops the whole pipeline. With one, the connector keeps running; a small operational cost (DLQ inspection) buys you the availability property that separates "shipped a CDC pipeline" from "runs a CDC pipeline in production."
Senior interview question on outbox + production hygiene
A senior interviewer might ask: "Design the outbox pattern end-to-end for a Postgres-based e-commerce system with three aggregates — orders, payments, refunds — with at-least-once delivery to Kafka, DLQ for poison events, LSN-aware pruning, and a plan for exactly-once semantics on the connector-to-Kafka hop. Walk me through every layer."
Solution Using outbox + EventRouter SMT + DLQ + LSN-aware pruning + transactional producer
{
"name": "ecommerce-outbox-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "db.internal",
"database.dbname": "shop",
"topic.prefix": "shop",
"plugin.name": "pgoutput",
"publication.name": "dbz_publication",
"slot.name": "dbz_shop_slot",
"table.include.list": "public.outbox",
"signal.data.collection": "public.debezium_signal",
"heartbeat.interval.ms": "10000",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.route.by.field": "aggregate_type",
"transforms.outbox.route.topic.replacement": "${routedByValue}_events",
"transforms.outbox.table.field.event.key": "aggregate_id",
"transforms.outbox.table.field.event.payload": "payload",
"transforms.outbox.table.field.event.timestamp": "created_at",
"transforms.outbox.table.fields.additional.placement": "event_type:header",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.deadletterqueue.topic.name": "dlq.ecommerce-outbox-cdc",
"errors.deadletterqueue.topic.replication.factor": "3",
"errors.deadletterqueue.context.headers.enable": "true",
"producer.override.enable.idempotence": "true",
"producer.override.acks": "all",
"producer.override.max.in.flight.requests.per.connection": "5",
"producer.override.transactional.id": "ecommerce-outbox-cdc-tx",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.ecommerce"
}
}
-- App-side atomic write (single transaction)
BEGIN;
INSERT INTO public.orders (id, tenant_id, total_cents, status)
VALUES (12345, 1, 19900, 'NEW');
INSERT INTO public.outbox (aggregate_type, aggregate_id, event_type, payload)
VALUES ('orders', '12345', 'OrderPlaced',
'{"orderId":12345,"tenantId":1,"totalCents":19900}');
COMMIT;
-- LSN-aware pruner (hourly)
DELETE FROM public.outbox
WHERE id IN (
SELECT id
FROM public.outbox
WHERE created_at < now() - INTERVAL '10 minutes'
AND EXISTS (
SELECT 1 FROM pg_replication_slots
WHERE slot_name = 'dbz_shop_slot'
AND confirmed_flush_lsn IS NOT NULL
)
LIMIT 10000
);
Step-by-step trace.
| Hop | Guarantee | Configured by |
|---|---|---|
| App → DB | atomic (single transaction) | BEGIN/COMMIT |
| DB → Debezium | at-least-once (WAL replay) | pgoutput, slot, publication |
| Debezium → Kafka | exactly-once (transactional producer) | producer.override.transactional.id |
| Kafka → downstream | consumer's responsibility (idempotency) | consumer-side upserts |
The chain gives you effectively-once end-to-end: the DB commit is the truth, the transactional producer to Kafka avoids duplicates on connector retry, and idempotent downstream consumers absorb any residual duplicates from consumer restarts.
Output:
| Layer | Semantics | Failure recovery |
|---|---|---|
| App code | atomic write | rollback on error |
| DB (Postgres) | WAL + replication slot | slot survives DB restart |
| Debezium | at-least-once from WAL; exactly-once to Kafka | offset in Kafka topic + transactional producer |
| DLQ | poison events isolated | operator inspection + reprocess |
| Pruner | LSN-aware; 10-minute safety margin | never deletes un-emitted rows |
Why this works — concept by concept:
- Outbox = single transaction — the atomic write of business row + outbox row inside one DB transaction is the entire point. There is no dual-write scenario; the DB commit is the truth.
- EventRouter SMT — separates the "outbox row shape" concern from the "Kafka topic layout" concern. The app writes canonical outbox rows; the SMT turns them into topic-routed, aggregate-keyed Kafka events. The routing table lives in the connector config, not in the app.
-
Transactional producer for exactly-once —
enable.idempotence = true+acks = all+transactional.id = <unique>unlocks Kafka Connect's exactly-once producer semantics. The connector-to-Kafka hop becomes truly exactly-once; any connector retry that happens to write the same record is deduplicated by Kafka's idempotency machinery. - DLQ for poison events — the connector never stops for a single bad row. Failed events land in the DLQ; operators inspect and reprocess. The pipeline stays available even under source-side data corruption.
- LSN-aware pruning — the pruner deletes only rows that Debezium has already emitted. There is no scenario where a row is deleted before it lands in Kafka; the safety margin is enforced by the DB-side check, not by hoping the connector is caught up.
- Cost — five configs in the connector JSON, one outbox table, one pruner cron, one DLQ topic per connector. The operational overhead is small; the delivery-semantics improvement is enormous. In interviews, this is the canonical "senior CDC architecture" answer.
Streaming
Topic — streaming
Streaming outbox and exactly-once problems
ETL
Topic — etl
ETL problems on event-driven pipelines with CDC
Cheat sheet — Debezium recipes
- When to use Debezium. Any time you need change data capture from Postgres, MySQL, MongoDB, SQL Server, Oracle, or Db2 into Kafka (or Pulsar / Kinesis / Pub/Sub via Debezium Server). Log-based CDC is unambiguously the correct answer over polling or triggers for anything more than a hobby project.
-
Connector class per source. Postgres →
PostgresConnector+plugin.name = pgoutput. MySQL →MySqlConnector+binlog_format = ROW+gtid_mode = ON. Mongo →MongoDbConnector+ replica set +capture.mode = change_streams_update_full. SQL Server →SqlServerConnector+sp_cdc_enable_db+ running SQL Server Agent. -
Postgres connector JSON (10-line template).
connector.class = io.debezium.connector.postgresql.PostgresConnector,plugin.name = pgoutput,publication.name = dbz_publication,slot.name = dbz_slot,table.include.list = public.orders,...,snapshot.mode = initial,snapshot.locking.mode = none,signal.data.collection = public.debezium_signal,heartbeat.interval.ms = 10000,schema.history.internal.kafka.topic = schema-history.<prefix>. -
Snapshot mode default.
initial. Snapshots on first startup (no committed offset); skipped on subsequent restarts. Reach forneveronly when the sink is pre-loaded via another mechanism; reach forwhen_neededonly when the offset topic may be lost due to failover; reach foralwaysonly in test harnesses. -
Incremental snapshot signal.
INSERT INTO public.debezium_signal (id, type, data) VALUES ('adhoc-1', 'execute-snapshot', '{"type":"incremental","data-collections":["public.orders"]}');— no connector restart required. The signal is picked up by the streaming phase and dispatches a chunked snapshot in parallel with live streaming. -
Schema history topic settings.
partitions = 1,cleanup.policy = compact,retention.ms = -1(forever),segment.bytes = 1 MB,replication.factor = 3,min.compaction.lag.ms = 60000. Never delete this topic; never change its partition count; never set a finite retention. -
Schema Registry compatibility.
BACKWARDis the standard for CDC. Add nullable columns freely; drop columns via a deprecation window; never bare RENAME (use add + drop with dual-write instead). Lowering toNONEto unstick a migration is a beginner trap; the correct move is a compatibility grace window (FORWARD_TRANSITIVEfor two weeks). -
Outbox SMT config.
transforms = outbox,transforms.outbox.type = io.debezium.transforms.outbox.EventRouter,transforms.outbox.route.by.field = aggregate_type,transforms.outbox.route.topic.replacement = ${routedByValue}_events,transforms.outbox.table.field.event.key = aggregate_id,transforms.outbox.table.field.event.payload = payload,transforms.outbox.table.fields.additional.placement = event_type:header. One SMT, one outbox table, N Kafka topics fanned out per aggregate. -
Outbox pruning. Cron-based with a 7-day retention is the safe default:
DELETE FROM public.outbox WHERE created_at < now() - INTERVAL '7 days'hourly, batched withLIMIT 10000per delete. Step up to LSN-aware pruning if the connector routinely lags for hours. -
Dead-letter queue.
errors.tolerance = all,errors.deadletterqueue.topic.name = dlq.<connector>,errors.deadletterqueue.topic.replication.factor = 3,errors.deadletterqueue.context.headers.enable = true. Poison events land in DLQ instead of stalling the connector; operators inspect + reprocess. -
Exactly-once producer.
producer.override.enable.idempotence = true,producer.override.acks = all,producer.override.transactional.id = <unique-per-connector>. Unlocks Kafka Connect's transactional producer semantics for the connector-to-Kafka hop. Downstream consumers must still be idempotent for true end-to-end exactly-once. -
Heartbeat topic.
heartbeat.interval.ms = 10000,heartbeat.topics.prefix = __debezium-heartbeat. Keeps the replication slot'sconfirmed_flush_lsnmoving forward on idle tables — prevents WAL disk fill on the source. -
Restart-from-offset command. Connectors are offset-driven; a restart resumes from the committed offset in the Connect offset topic. To force a restart from a specific point: (a) stop the connector, (b) DELETE from the offset topic (advanced), (c) start with
snapshot.mode = initial(safe). Never manually rewrite offsets in production; always restart with a controlled snapshot mode. - Debezium Server for non-Kafka sinks. Same connector code, standalone JVM, pluggable sink (Pub/Sub, Kinesis, EventHubs, Redis Streams, HTTP). Requires a persistent local volume for the offset and schema-history files. Use when Kafka is not in the stack and you don't want to add it just for CDC.
Frequently asked questions
What is Debezium and how does it work?
debezium is a set of Kafka Connect source connectors that implement log-based change data capture against Postgres, MySQL, MongoDB, SQL Server, Oracle, Db2, Cassandra, and Vitess. Instead of polling SELECT ... WHERE updated_at > :last_run (which misses deletes and races with window boundaries) or attaching triggers (which slow down the source), Debezium tails the database's own write-ahead log (Postgres WAL, MySQL binlog, MongoDB oplog / change stream, SQL Server Agent CDC tables) and emits every INSERT, UPDATE, and DELETE as a per-row Kafka event in commit order. Two architectural choices define the shape: (a) log-based reading gives you exact CDC semantics with near-zero source overhead, and (b) running on Kafka Connect gives you free offset management, retries, dead-letter queue, JMX metrics, REST admin, and a mature ecosystem. Debezium 2.x adds incremental snapshotting driven by a debezium_signal control table, Debezium Server for teams that don't run Kafka, and a modernised config surface.
Debezium vs Kafka native Connect connectors — when do I pick each?
Kafka Connect ships with dozens of source connectors (JDBC, Elasticsearch, S3, HDFS, etc.) — but the JDBC source connector is the one most often confused with Debezium. The JDBC source connector polls the source database with a configurable query and interval; it captures INSERTs and UPDATEs (via updated_at or an incrementing key) but has no way to see DELETEs, and it races with concurrent writes at the polling boundary. Debezium reads the log directly, capturing every INSERT / UPDATE / DELETE in commit order with near-zero source overhead. For any deployment where correctness matters or where the source has DELETEs, Debezium is unambiguously the correct choice. The JDBC source connector is fine for append-only sources (log tables, event streams already in DB form) where polling semantics are acceptable. In interviews, the senior signal is naming the DELETE-blindness of JDBC source vs the log-completeness of Debezium.
What is the schema history topic for and can I delete it?
Never delete the schema history topic. Debezium persists every observed DDL statement (CREATE TABLE, ALTER TABLE, DROP TABLE) to a dedicated Kafka topic (schema.history.internal.kafka.topic), and on every connector restart it replays that topic from offset 0 to rebuild the internal per-table schema. Without the schema history, the connector cannot decode WAL/binlog records for existing tables — it doesn't know which columns exist, what their types are, or how to construct the Avro envelope. The topic is single-partition (DDL must be totally ordered), compacted (only the latest state per key is retained), and retention-forever. Deleting the topic (by ops mistake, retention misconfig, or drop-and-recreate) forces a full re-snapshot with snapshot.mode = initial — hours or days of downtime for a large source. Set retention.ms = -1, cleanup.policy = compact, partitions = 1, replication.factor = 3, and lock down permissions so the topic cannot be deleted accidentally.
Do I need Kafka to run Debezium?
Not necessarily. Kafka Connect mode — the mainstream deployment — runs Debezium as a task inside a Kafka Connect worker and pushes events to Kafka topics. This requires an operating Kafka + Connect cluster. Debezium Server mode — introduced in 2.x — runs the same connector code as a standalone Quarkus-based JVM that pushes to a pluggable sink: Pulsar, Kinesis, Google Pub/Sub, Azure Event Hubs, Redis Streams, HTTP, or Kafka via a lightweight producer. Server mode requires a persistent local volume for the offset and schema-history files (unlike Connect mode, which stores them in Kafka topics). For teams that already run Kafka, Connect mode is the standard answer. For teams that don't run Kafka and don't want to add it just for CDC, Debezium Server is a legitimate deployment mode — same connector logic, simpler runtime, one JVM plus one PVC.
What is the outbox pattern and why does Debezium provide an SMT for it?
The outbox pattern solves the dual-write problem: writing to your business database and publishing to Kafka in the same request handler is not atomic under any partial-failure scenario (the DB commit succeeds, the Kafka publish times out, the process crashes; the DB has the new row, Kafka has no event, downstream consumers never learn of the change). The outbox pattern turns the two-transactional-system problem into a one-transactional-system problem: every business write includes an INSERT into an outbox table inside the same DB transaction. Debezium then tails the outbox table via WAL/binlog CDC and emits each new outbox row as a Kafka event. The EventRouter SMT ships with Debezium and does the last-mile transformation — it reads outbox columns (aggregate_type, aggregate_id, event_type, payload) and produces a Kafka event routed by aggregate_type (topic name), keyed by aggregate_id (partition), with payload as the value and event_type as a header. One outbox table, one connector, N topics fanned out per aggregate — with at-least-once atomicity anchored at the DB commit.
Does Debezium give exactly-once semantics end-to-end?
Debezium is at-least-once by default. The database-to-connector hop is at-least-once because on connector crash, the WAL/binlog is replayed from the last committed offset; any records processed but not yet acknowledged are re-processed on restart. The connector-to-Kafka hop can be made exactly-once by enabling Kafka Connect's transactional producer: producer.override.enable.idempotence = true, producer.override.acks = all, producer.override.transactional.id = <unique-per-connector>. This unlocks Kafka's transactional-write semantics and gives you exactly-once from Debezium to Kafka. However, end-to-end exactly-once still requires idempotent consumers — the downstream loader (Snowflake sink, warehouse ingest, materialised view refresher) must handle duplicates via upserts, idempotency keys, or a deduplication window. In interviews, the senior answer is "Debezium can be exactly-once to Kafka; end-to-end exactly-once requires idempotent consumers, which most teams achieve via upsert-style downstream sinks."
Practice on PipeCode
- Drill the streaming practice library → for the CDC, change-log, and event-driven problems senior interviewers love.
- Rehearse on the ETL practice library → for the pipeline-design and multi-source ingestion problems that motivate log-based CDC in the first place.
- Sharpen the design axis with the SQL practice library → for the schema-evolution, DDL migration, and outbox-table modelling problems that separate senior from mid.
- Stack the prerequisites against PipeCode's broader 450+ data-engineering catalogue to anchor the connector-semantics, snapshot-mode, and outbox-pattern intuition against real graded inputs.
Lock in Debezium muscle memory
Docs explain connectors. PipeCode drills explain the decision — when to reach for incremental snapshotting, why the schema-history topic must be single-partition, when the outbox pattern beats a dual write, when Debezium Server replaces Kafka Connect. Pipecode.ai is Leetcode for Data Engineering — pattern-first practice tuned for the production trade-offs senior data engineers actually face.





Top comments (0)