DEV Community

Cover image for Change Data Capture(Example using Microsoft SQL Server Log-based CDC vs Trigger-based CDC
hammoudi wissem
hammoudi wissem

Posted on

Change Data Capture(Example using Microsoft SQL Server Log-based CDC vs Trigger-based CDC

Definition of Change Data Capture

Change Data Capture (CDC) is a technique used to detect, record, and track changes—such as inserts, updates, and deletes—made to data in real-time or near real-time. It is widely used in databases and information systems to maintain an accurate history of modifications and synchronize data across systems.

Note: CDC is considered an ETL (Extract, Transform, Load) technique where only changed data is extracted from a source system and loaded into a target system such as a data warehouse or data lake.


Importance of Change Data Capture

1. Real-time Data Synchronization

CDC captures and propagates changes instantly, ensuring all connected systems remain synchronized without delay. This is especially important when multiple databases or systems depend on updated data.

2. Event-Driven Architectures

CDC transforms data changes into events that trigger downstream processes or workflows. This enables responsive systems capable of reacting immediately to user actions, business events, or operational changes.

3. Efficient Data Processing

Since CDC transmits only modified data, it reduces the overhead of full database scans or batch processing. This speeds up pipelines and ensures downstream systems always receive fresh data.

4. Scalability and Flexibility

CDC supports asynchronous, decoupled communication, making it easier for distributed systems to scale horizontally while maintaining reliability.

5. Enhanced Analytics and Insights

Real-time data flow enables updated dashboards, instant anomaly detection, and more proactive decision-making.


Change Data Capture (CDC) Principles

  • Capture: Identify inserts, updates, and deletes without impacting the source system.
  • Log-based Tracking: Leverage transaction logs or replication logs to detect changes.
  • Incremental Updates: Transfer only changed rows instead of full datasets.
  • Real-time or Near Real-time: Propagate changes quickly to ensure data freshness.
  • Idempotent Processing: Avoid duplicate processing and maintain consistency.

Use Cases of Change Data Capture

  • Data Warehousing – Keep analytical systems updated with the latest transactional data.
  • Replication – Synchronize data across distributed systems for load balancing or disaster recovery.
  • Data Integration – Integrate modern and legacy systems or synchronize between cloud and on-premises systems.
  • Real-time Analytics – Feed streaming platforms or dashboards with immediate updates.
  • Data Synchronization – Maintain consistency across microservices, caching layers, or distributed components.

Real-World Example: CDC at Netflix

Netflix employs a CDC pipeline built on Apache Kafka and Apache Flink to extract changes from internal databases and process them in real-time. This architecture powers:

  • Service Usage Monitoring – Tracking user interactions to optimize performance.
  • Recommendation Engine – Providing personalized content suggestions.
  • Fraud Detection – Identifying abnormal activities instantly.

Change Data Capture Methods

Below are the most common CDC techniques, including how they work and their pros/cons.


1. Log-Based CDC

How It Works

Changes are captured directly from the database transaction log, which records every data modification for recovery purposes.

Advantages

  • Minimal performance overhead
  • No changes required to the source schema
  • Highly accurate and reliable
  • No triggers or additional queries needed
  • Works with most transactional databases

Disadvantages

  • Some tools require a paid license
  • More complex to configure
  • Requires primary or unique keys
  • Logs must be retained if the target system is down

2. Trigger-Based CDC

How It Works

Database triggers are created on the source table to record inserts, updates, or deletes into a shadow table.

Advantages

  • Works on any database supporting triggers
  • Provides real-time capture
  • Shadow table can be customized

Disadvantages

  • Impacts source database performance
  • Adds processing overhead
  • More difficult to maintain, especially with schema changes

3. Timestamp-Based CDC

How It Works

  • Requires a LastModified or timestamp column.
  • Queries fetch only rows where the timestamp > last extraction time.

Advantages

  • Easy to implement
  • No triggers or log access required

Disadvantages

  • Deletes cannot be detected
  • Can miss changes if timestamps are not updated
  • Can slow performance due to full table scans

Conclusion on CDC Methods

Choosing the right CDC method depends on:

  • Real-time requirements
  • Database load tolerance
  • Schema complexity
  • Infrastructure capabilities
  • Operational constraints

Log-based CDC is generally the best option for performance-critical systems, while trigger-based CDC is useful when log access is restricted.


Log-Based CDC and Trigger-Based CDC Using SQL Server

1. Installation

SQL Server

Download and install SQL Server (Developer Edition recommended).

SQL Server Management Studio (SSMS)

Download SSMS and connect to the SQL Server instance.


Types of SQL Server Authentication

Windows Authentication

Uses the logged-in Windows user credentials.
Benefits:

  • No password management
  • More secure
  • Integrated with enterprise policies
  • Simplifies permission handling

SQL Server Authentication

Requires a username and password stored in SQL Server.

Choice: Windows Authentication was selected for its security and ease of use.


Implementation of Trigger-Based CDC

Create Source Table

CREATE TABLE person (
 person_id INT IDENTITY PRIMARY KEY,
 person_name VARCHAR(255)
);
Enter fullscreen mode Exit fullscreen mode

Create History Table

CREATE TABLE person_history (
 hist_id INT PRIMARY KEY IDENTITY,
 person_id INT,
 person_name VARCHAR(255),
 status VARCHAR(255),
 UserName NVARCHAR(50),
 InsertedDateTime DATETIME
);
GO
Enter fullscreen mode Exit fullscreen mode

Create Trigger

CREATE TRIGGER tr_AllOperationsTrigger
ON person
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- DELETE
    IF EXISTS (SELECT * FROM deleted)
    INSERT INTO person_history (person_id, person_name, status, UserName, InsertedDateTime)
    SELECT person_id, person_name, 'deleted', SUSER_SNAME(), GETDATE()
    FROM deleted;

    -- INSERT
    IF EXISTS (SELECT * FROM inserted)
    INSERT INTO person_history (person_id, person_name, status, UserName, InsertedDateTime)
    SELECT person_id, person_name, 'inserted', SUSER_SNAME(), GETDATE()
    FROM inserted;

    -- UPDATE
    IF EXISTS (SELECT * FROM inserted) AND EXISTS (SELECT * FROM deleted)
    BEGIN
        INSERT INTO person_history
        SELECT person_id, person_name, 'updated old value', SUSER_SNAME(), GETDATE()
        FROM deleted;

        INSERT INTO person_history
        SELECT person_id, person_name, 'updated new value', SUSER_SNAME(), GETDATE()
        FROM inserted;
    END
END;
GO
Enter fullscreen mode Exit fullscreen mode

Implementation of Log-Based CDC

Enable CDC for the Database

EXEC sys.sp_cdc_enable_db;
GO
Enter fullscreen mode Exit fullscreen mode

Enable CDC for a Table

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'persontri',
    @role_name = NULL;
GO
Enter fullscreen mode Exit fullscreen mode

Disable Cleanup Job (Optional)

Prevents cleanup from removing captured data.

EXEC sys.sp_cdc_drop_job @job_type = 'cleanup';
Enter fullscreen mode Exit fullscreen mode

Verify CDC Status

SELECT is_cdc_enabled FROM sys.databases WHERE name = 'test';
SELECT * FROM cdc.change_tables;
Enter fullscreen mode Exit fullscreen mode

View CDC Data

SELECT * FROM cdc.dbo_person_CT;
Enter fullscreen mode Exit fullscreen mode

Operation Types in __$operation

Value Meaning
1 Delete
2 Insert
3 Update (old values)
4 Update (new values)
7 Metadata operation

Adding User and Operation Details

Add User Column

ALTER TABLE cdc.dbo_person_CT
ADD userid2 nvarchar(128) DEFAULT USER_NAME();
Enter fullscreen mode Exit fullscreen mode

Create Operation Table

CREATE TABLE operation (
    operation_id INT PRIMARY KEY,
    operation_name VARCHAR(255)
);

INSERT INTO operation VALUES
(1,'Delete'),(2,'Insert'),(3,'update(old)'),(4,'update(new)');
Enter fullscreen mode Exit fullscreen mode

Convert LSN to Timestamp

SELECT
    *,
    CONVERT(VARCHAR(5), CONVERT(DATETIME, sys.fn_cdc_map_lsn_to_time([__start_lsn]), 108)) AS ModificationHourMinute,
    CONVERT(DATE, sys.fn_cdc_map_lsn_to_time([__start_lsn])) AS ModificationDate
FROM
    operation o
JOIN
    cdc.dbo_person_CT a ON o.operation_id = a.__$operation;
Enter fullscreen mode Exit fullscreen mode

Comparison Between Log-based and Trigger-based CDC

Testing tools:

  • SQLQueryStress → workload generation
  • Performance Monitor → hardware-level performance metrics

Monitored Metrics

  • Disk Reads/Writes per second
  • Virtual memory page reads/writes
  • Transactions per second
  • Lock requests per second
  • Batch requests per second

Performance Summary

SQLQueryStress Results

  • Log-Based CDC

    • ~0.0037 seconds/iteration
    • ~5.672 logical reads
  • Trigger-Based CDC

    • ~0.0017 seconds/iteration
    • ~3.013 logical reads

→ Log-based CDC outperforms trigger-based CDC by 55 seconds over 100,000 iterations.


Performance Monitor Results

Log-Based CDC

  • Lower transactions/sec (623,995)
  • Lower disk writes/sec (316,574)
  • Much lower lock requests/sec (4,978)
  • Higher page writes/sec

Trigger-Based CDC

  • More batch requests/sec (833,331)
  • Very high lock requests/sec (2,548,997)
  • Higher disk writes/sec (370,824)

Interpretation:
Log-based CDC uses transaction logs efficiently without introducing overhead, while trigger-based CDC generates extra writes and locks, causing higher resource consumption.


Final Conclusion

  • Log-based CDC is the best choice for large-scale, high-performance systems.
    It minimizes overhead, provides accurate change tracking, and integrates naturally with SQL Server transaction logs.

  • Trigger-based CDC is easier to implement and works on many databases but introduces overhead and complexity.
    It is suitable when log access is restricted or custom auditing logic is required.

In performance tests, log-based CDC clearly outperforms trigger-based CDC, resulting in lower lock contention, fewer disk operations, and overall faster execution.

Top comments (0)