DEV Community

Apache SeaTunnel

Does Apache SeaTunnel MySQL CDC Support Timestamp-Based Startup?

In MySQL CDC tasks, many users encounter the same question:

If a task fails, where should it resume from?
What if you only know a point in time, but cannot obtain the corresponding binlog position?

Apache SeaTunnel 2.3.12 provides a more intuitive answer by introducing Timestamp Startup.

This article analyzes the design background, configuration, and implementation of this capability, helping readers understand how to perform CDC task recovery and data backfilling more efficiently based on time semantics.

Feature Overview

Problem: CDC Startup Configuration Is “Technically Correct but Hard to Use”

Before Apache SeaTunnel 2.3.12, the MySQL CDC connector mainly supported starting synchronization from a specific binlog position (file + position) or GTID.
While this approach is precise and reliable at the technical level, it often does not align with real-world production and operational practices.

In actual CDC operations, users are far more familiar with “time” than with low-level binlog details, for example:

  • After an abnormal task interruption, wanting to resume synchronization after “2024-04-01 10:00:00”
  • Performing backfill or replay for data within a specific time window
  • Knowing that “changes after yesterday 08:00 need to be resynchronized,” but being unable to locate the corresponding binlog file and offset

Requiring users to manually convert timestamps into binlog positions makes configuration complex, invites errors, and significantly increases operational costs.
This startup approach, convenient for the implementation but inconvenient for users, has become a common pain point in CDC recovery and backtracking scenarios.

Solution: Introducing Timestamp-Based Startup

To address these issues, Apache SeaTunnel introduced timestamp-based startup for the MySQL CDC connector in version 2.3.12.

This feature allows users to specify a Unix timestamp (in milliseconds) directly as the synchronization starting point.
During startup, the MySQL CDC connector automatically performs the following steps:

  1. Locates the corresponding binlog file and offset based on the specified timestamp
  2. Starts reading change events from that binlog position
  3. Automatically skips all historical events earlier than the given timestamp
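
Steps 2 and 3 above can be pictured with a small, self-contained sketch. The names here (BinlogEvent, readFrom) are hypothetical stand-ins rather than SeaTunnel or Debezium classes, and treating an event stamped exactly at the startup time as "kept" is an assumption based on the wording "earlier than the given timestamp".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: BinlogEvent and readFrom(...) are hypothetical stand-ins,
// not SeaTunnel or Debezium classes.
final class TimestampStartupSketch {

    static final class BinlogEvent {
        final long timestampMillis;
        final String description;

        BinlogEvent(long timestampMillis, String description) {
            this.timestampMillis = timestampMillis;
            this.description = description;
        }
    }

    // Walks events in binlog order and keeps only those at or after the
    // startup timestamp; earlier (historical) events are skipped.
    static List<String> readFrom(List<BinlogEvent> eventsInBinlogOrder, long startupMillis) {
        List<String> emitted = new ArrayList<>();
        for (BinlogEvent event : eventsInBinlogOrder) {
            if (event.timestampMillis < startupMillis) {
                continue; // historical event, skipped
            }
            emitted.add(event.description);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<BinlogEvent> events = Arrays.asList(
                new BinlogEvent(1672531199000L, "UPDATE one second before the startup time"),
                new BinlogEvent(1672531200000L, "INSERT exactly at the startup time"),
                new BinlogEvent(1672531201000L, "DELETE one second after the startup time"));
        System.out.println(readFrom(events, 1672531200000L));
    }
}
```

Because the reader starts at a located binlog position and then filters, the position only needs to be at or before the first relevant event; the filter takes care of anything older.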

By treating time as the startup dimension, SeaTunnel shifts CDC startup from binlog details to business-time semantics, lowering the barrier to using CDC in recovery, backtracking, and operational scenarios.

Configuration Parameters

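To enable timestamp-based startup, configure the following two parameters. startup.mode is optional in general but must be set to "timestamp" here, and startup.timestamp must then be provided: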

| Parameter Name | Type | Required | Description |
|---|---|---|---|
| startup.mode | Enum | No | Set to "timestamp" to enable timestamp mode |
| startup.timestamp | Long | Yes | Unix timestamp (milliseconds) specifying the startup time |
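
The relationship between the two parameters can be sketched as a simple conditional check. This is only an illustration of the rule "startup.timestamp is needed once startup.mode is timestamp"; the class and method names are hypothetical, and it does not reproduce SeaTunnel's actual validation code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical validation sketch, not SeaTunnel's real option-rule API.
final class StartupOptionCheckSketch {

    // Returns an error message when the combination is invalid, otherwise empty.
    static Optional<String> validate(Map<String, String> options) {
        String mode = options.get("startup.mode");
        if (mode == null || !"timestamp".equalsIgnoreCase(mode.trim())) {
            return Optional.empty(); // other modes do not need startup.timestamp
        }
        String raw = options.get("startup.timestamp");
        if (raw == null) {
            return Optional.of("startup.timestamp is required when startup.mode is timestamp");
        }
        try {
            Long.parseLong(raw.trim());
            return Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.of("startup.timestamp must be a Long in epoch milliseconds");
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("startup.mode", "timestamp");
        System.out.println(validate(options)); // missing timestamp is reported
        options.put("startup.timestamp", "1672531200000");
        System.out.println(validate(options)); // now acceptable
    }
}
```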

Configuration Example

env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    url = "jdbc:mysql://localhost:3306/testdb"
    username = "root"
    password = "root@123"
    table-names = ["testdb.table1"]

    # Enable timestamp-based startup
    startup.mode = "timestamp"
    startup.timestamp = 1672531200000  # 2023-01-01 00:00:00 UTC
  }
}

sink {
  Console {
  }
}

Technical Implementation

Startup Mode Enumeration

All supported startup modes, including the newly added TIMESTAMP mode, are defined in the MySqlSourceOptions class:

public static final SingleChoiceOption<StartupMode> STARTUP_MODE =
    (SingleChoiceOption)
        Options.key(SourceOptions.STARTUP_MODE_KEY)
            .singleChoice(
                StartupMode.class,
                Arrays.asList(
                    StartupMode.INITIAL,
                    StartupMode.EARLIEST,
                    StartupMode.LATEST,
                    StartupMode.SPECIFIC,
                    StartupMode.TIMESTAMP))
            // ... the remainder of the option definition is omitted in this excerpt

Timestamp Filtering Implementation

The core logic resides in the MySqlBinlogFetchTask class.
When the startup mode is detected as TIMESTAMP, TimestampFilterMySqlStreamingChangeEventSource is used to process binlog events:

StartupMode startupMode = startupConfig.getStartupMode();
if (startupMode.equals(StartupMode.TIMESTAMP)) {
    log.info(
        "Starting MySQL binlog reader,with timestamp filter {}",
        startupConfig.getTimestamp());

    mySqlStreamingChangeEventSource =
        new TimestampFilterMySqlStreamingChangeEventSource(
            sourceFetchContext.getDbzConnectorConfig(),
            sourceFetchContext.getConnection(),
            sourceFetchContext.getDispatcher(),
            sourceFetchContext.getErrorHandler(),
            Clock.SYSTEM,
            sourceFetchContext.getTaskContext(),
            sourceFetchContext.getStreamingChangeEventSourceMetrics(),
            startupConfig.getTimestamp());
}

Offset Calculation

The logic for locating the binlog offset based on a timestamp is implemented in MySqlSourceFetchTaskContext:

private Offset getInitOffset(SourceSplitBase mySqlSplit) {
    StartupMode startupMode = getSourceConfig().getStartupConfig().getStartupMode();
    if (startupMode.equals(StartupMode.TIMESTAMP)) {
        long timestamp = getSourceConfig().getStartupConfig().getTimestamp();
        try (JdbcConnection jdbcConnection =
                getDataSourceDialect().openJdbcConnection(getSourceConfig())) {
            return findBinlogOffsetBytimestamp(jdbcConnection, binaryLogClient, timestamp);
        } catch (Exception e) {
            throw new SeaTunnelException(e);
        }
    } else {
        return mySqlSplit.asIncrementalSplit().getStartupOffset();
    }
}
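
The excerpt above does not show the body of findBinlogOffsetBytimestamp. One plausible strategy, assumed here purely for illustration, is a binary search over the retained binlog files ordered from oldest to newest, using the timestamp of each file's first event. The class, the BinlogFile type, and the sample file names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of timestamp-to-binlog-file lookup; this is an assumed
// strategy, not the actual SeaTunnel implementation.
final class BinlogFileLocatorSketch {

    static final class BinlogFile {
        final String name;
        final long firstEventMillis;

        BinlogFile(String name, long firstEventMillis) {
            this.name = name;
            this.firstEventMillis = firstEventMillis;
        }
    }

    // Returns the newest file whose first event is strictly before targetMillis,
    // falling back to the oldest retained file when no such file exists.
    static String locate(List<BinlogFile> filesOldestFirst, long targetMillis) {
        if (filesOldestFirst.isEmpty()) {
            throw new IllegalArgumentException("no binlog files available");
        }
        int low = 0;
        int high = filesOldestFirst.size() - 1;
        int answer = 0;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (filesOldestFirst.get(mid).firstEventMillis < targetMillis) {
                answer = mid;   // candidate; a newer file may still qualify
                low = mid + 1;
            } else {
                high = mid - 1; // this file starts at or after the target
            }
        }
        return filesOldestFirst.get(answer).name;
    }

    public static void main(String[] args) {
        List<BinlogFile> files = Arrays.asList(
                new BinlogFile("mysql-bin.000001", 1000L),
                new BinlogFile("mysql-bin.000002", 2000L),
                new BinlogFile("mysql-bin.000003", 3000L));
        System.out.println(locate(files, 2500L));
    }
}
```

The strict comparison is a deliberately conservative choice in this sketch: starting one file early costs some extra reading, while the timestamp filter still discards events older than the requested time.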

Startup Mode Comparison and Use Cases

To better understand the role of timestamp-based startup within the overall CDC startup system, the following table compares all currently supported MySQL CDC startup modes:

| Startup Mode | Startup Basis | Advantages | Typical Use Cases |
|---|---|---|---|
| INITIAL | Full + current binlog | One-time full + incremental sync | First-time data ingestion |
| EARLIEST | Earliest available binlog | No specific offset required | Long binlog retention |
| LATEST | Latest binlog | Fast startup | Only future changes |
| SPECIFIC | Specific binlog file + position | Precise and controllable | Known binlog offsets |
| TIMESTAMP | Specified timestamp (ms) | Intuitive, business-friendly | Recovery, backfill, time-window sync |

TIMESTAMP mode is therefore not a replacement for the lower-level SPECIFIC mode or GTID-based startup. It is a complementary, usability- and operations-focused capability for scenarios where users know the time but not the binlog details.

Testing and Validation

This feature is covered by integration tests.
The test case MysqlCDCSpecificStartingOffsetIT verifies the correctness of timestamp-based startup.

Usage Notes

  1. Version requirement: SeaTunnel 2.3.12 or later
  2. Timestamp format: Unix timestamp in milliseconds
  3. Binlog availability: Ensure the binlog file for the specified time still exists
  4. Timezone considerations: A Unix timestamp counts milliseconds from 1970-01-01 00:00:00 UTC, so convert local times with their timezone taken into account
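
To avoid timezone mistakes when producing the millisecond value, the conversion can be done with java.time. The helper below is a hypothetical utility, not part of SeaTunnel; it simply turns a wall-clock time in a named zone into the epoch-millisecond number expected by startup.timestamp.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of SeaTunnel: converts a wall-clock time in a
// given zone to epoch milliseconds.
final class StartupTimestampSketch {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static long toEpochMillis(String localDateTime, String zoneId) {
        return LocalDateTime.parse(localDateTime, FORMAT)
                .atZone(ZoneId.of(zoneId))
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // Both lines describe the same instant, so both print 1672531200000.
        System.out.println(toEpochMillis("2023-01-01 00:00:00", "UTC"));
        System.out.println(toEpochMillis("2023-01-01 08:00:00", "Asia/Shanghai"));
    }
}
```

The same wall-clock string yields different values in different zones, which is why the zone should always be stated explicitly rather than left to the machine default.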

Summary

Timestamp-based startup in SeaTunnel MySQL CDC provides more intuitive control over where data synchronization begins, especially for scenarios requiring recovery from a specific point in time.
By converting timestamps into binlog offsets and filtering out earlier events, this feature enables efficient time-based positioning.

Notes

  • Parameter validation is implemented in the factory class MySqlIncrementalSourceFactory via conditional rules
  • In addition to MySQL CDC, other CDC connectors, such as SQL Server CDC, also support similar timestamp-based startup mechanisms.
