In MySQL CDC tasks, many users encounter the same question:
If a task fails, where should it resume from?
What if you only know a point in time, but cannot obtain the corresponding binlog position?
Apache SeaTunnel 2.3.12 provides a more intuitive answer by introducing Timestamp Startup.
This article analyzes the design background, configuration, and implementation of this capability, helping readers understand how to perform CDC task recovery and data backfilling more efficiently based on time semantics.
Feature Overview
Problem: CDC Startup Configuration Is “Technically Correct but Hard to Use”
Before Apache SeaTunnel 2.3.12, the MySQL CDC connector mainly supported starting synchronization from a specific binlog position (file + position) or GTID.
While this approach is precise and reliable at the technical level, it often does not align with real-world production and operational practices.
In actual CDC operations, users are far more familiar with “time” than with low-level binlog details, for example:
- After an abnormal task interruption, wanting to resume synchronization after “2024-04-01 10:00:00”
- Performing backfill or replay for data within a specific time window
- Knowing that “changes after yesterday 08:00 need to be resynchronized,” but being unable to locate the corresponding binlog file and offset
Requiring users to manually convert timestamps into binlog positions not only makes configuration complex, but is also highly error-prone and significantly increases operational costs.
This startup approach, convenient for the engine but unfriendly to operators, has become a common pain point in CDC recovery and backtracking scenarios.
Solution: Introducing Timestamp-Based Startup
To address these issues, Apache SeaTunnel introduced timestamp-based startup for the MySQL CDC connector in version 2.3.12.
This feature allows users to specify a Unix timestamp (in milliseconds) directly as the synchronization starting point.
During startup, the MySQL CDC connector automatically performs the following steps:
- Locates the corresponding binlog file and offset based on the specified timestamp
- Starts reading change events from that binlog position
- Automatically skips all historical events earlier than the given timestamp
By treating time as the starting dimension, SeaTunnel shifts CDC startup from binlog details to business-time semantics, which lowers the barrier to using CDC for recovery, backtracking, and day-to-day operations.
Configuration Parameters
To enable timestamp-based startup, the following two key parameters must be configured:
| Parameter Name | Type | Required | Description |
|---|---|---|---|
| startup.mode | Enum | No | Set to "timestamp" to enable timestamp mode |
| startup.timestamp | Long | Yes | Unix timestamp (milliseconds) specifying the startup time |
Configuration Example
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    url = "jdbc:mysql://localhost:3306/testdb"
    username = "root"
    password = "root@123"
    table-names = ["testdb.table1"]

    # Enable timestamp-based startup
    startup.mode = "timestamp"
    startup.timestamp = 1672531200000 # 2023-01-01 00:00:00 UTC
  }
}

sink {
  Console {
  }
}
Technical Implementation
Startup Mode Enumeration
All supported startup modes, including the newly added TIMESTAMP mode, are defined in the MySqlSourceOptions class:
public static final SingleChoiceOption<StartupMode> STARTUP_MODE =
        (SingleChoiceOption)
                Options.key(SourceOptions.STARTUP_MODE_KEY)
                        .singleChoice(
                                StartupMode.class,
                                Arrays.asList(
                                        StartupMode.INITIAL,
                                        StartupMode.EARLIEST,
                                        StartupMode.LATEST,
                                        StartupMode.SPECIFIC,
                                        StartupMode.TIMESTAMP))
Timestamp Filtering Implementation
The core logic resides in the MySqlBinlogFetchTask class.
When the startup mode is detected as TIMESTAMP, TimestampFilterMySqlStreamingChangeEventSource is used to process binlog events:
StartupMode startupMode = startupConfig.getStartupMode();
if (startupMode.equals(StartupMode.TIMESTAMP)) {
    log.info(
            "Starting MySQL binlog reader,with timestamp filter {}",
            startupConfig.getTimestamp());
    mySqlStreamingChangeEventSource =
            new TimestampFilterMySqlStreamingChangeEventSource(
                    sourceFetchContext.getDbzConnectorConfig(),
                    sourceFetchContext.getConnection(),
                    sourceFetchContext.getDispatcher(),
                    sourceFetchContext.getErrorHandler(),
                    Clock.SYSTEM,
                    sourceFetchContext.getTaskContext(),
                    sourceFetchContext.getStreamingChangeEventSourceMetrics(),
                    startupConfig.getTimestamp());
}
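The filtering idea can be illustrated without any Debezium or SeaTunnel classes. The sketch below is not the code of TimestampFilterMySqlStreamingChangeEventSource; it is a minimal stand-in that wraps a downstream handler and drops events older than the startup timestamp. The ChangeEvent type, the method names, and the choice of an inclusive boundary are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class TimestampFilterSketch {

    /** Hypothetical, simplified change event: only a timestamp and a payload. */
    static final class ChangeEvent {
        final long timestampMillis;
        final String payload;

        ChangeEvent(long timestampMillis, String payload) {
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    /** Wraps a downstream handler so that events older than startMillis are dropped. */
    static Consumer<ChangeEvent> filterFrom(long startMillis, Consumer<ChangeEvent> downstream) {
        return event -> {
            // Assumption: the boundary is inclusive, so an event stamped
            // exactly at startMillis is forwarded.
            if (event.timestampMillis >= startMillis) {
                downstream.accept(event);
            }
        };
    }

    /** Convenience for the example: returns the payloads that survive the filter. */
    static List<String> survivingPayloads(long startMillis, List<ChangeEvent> events) {
        List<String> kept = new ArrayList<>();
        Consumer<ChangeEvent> handler = filterFrom(startMillis, e -> kept.add(e.payload));
        events.forEach(handler);
        return kept;
    }

    public static void main(String[] args) {
        List<ChangeEvent> events = Arrays.asList(
                new ChangeEvent(1672531199000L, "before"),
                new ChangeEvent(1672531200000L, "at"),
                new ChangeEvent(1672531201000L, "after"));
        // prints [at, after]
        System.out.println(survivingPayloads(1672531200000L, events));
    }
}
```

Filtering per event is still needed after the reader has been positioned, because a binlog file generally contains events from both before and after the requested time.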
Offset Calculation
The logic for locating the binlog offset based on a timestamp is implemented in MySqlSourceFetchTaskContext:
private Offset getInitOffset(SourceSplitBase mySqlSplit) {
    StartupMode startupMode = getSourceConfig().getStartupConfig().getStartupMode();
    if (startupMode.equals(StartupMode.TIMESTAMP)) {
        long timestamp = getSourceConfig().getStartupConfig().getTimestamp();
        try (JdbcConnection jdbcConnection =
                getDataSourceDialect().openJdbcConnection(getSourceConfig())) {
            return findBinlogOffsetBytimestamp(jdbcConnection, binaryLogClient, timestamp);
        } catch (Exception e) {
            throw new SeaTunnelException(e);
        }
    } else {
        return mySqlSplit.asIncrementalSplit().getStartupOffset();
    }
}
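The article does not show the body of findBinlogOffsetBytimestamp, so the following is only one plausible way to map a timestamp to a binlog file, not the connector's actual logic. It assumes the files are known in oldest-to-newest order together with the timestamp of their first event, and picks the newest file that starts at or before the target. BinlogFile and locate are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class BinlogLocatorSketch {

    /** Hypothetical summary of one binlog file: its name and the timestamp of its first event. */
    static final class BinlogFile {
        final String name;
        final long firstEventMillis;

        BinlogFile(String name, long firstEventMillis) {
            this.name = name;
            this.firstEventMillis = firstEventMillis;
        }
    }

    /**
     * Returns the name of the newest file whose first event is not later than targetMillis.
     * The list must be ordered oldest to newest. If every file starts after targetMillis,
     * the oldest file is returned, since nothing earlier is available.
     */
    static String locate(List<BinlogFile> files, long targetMillis) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no binlog files available");
        }
        int low = 0;
        int high = files.size() - 1;
        int answer = 0; // falls back to the oldest file
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (files.get(mid).firstEventMillis <= targetMillis) {
                answer = mid;   // candidate; a newer file may still qualify
                low = mid + 1;
            } else {
                high = mid - 1; // this file starts too late
            }
        }
        return files.get(answer).name;
    }

    public static void main(String[] args) {
        List<BinlogFile> files = Arrays.asList(
                new BinlogFile("mysql-bin.000001", 1000L),
                new BinlogFile("mysql-bin.000002", 2000L),
                new BinlogFile("mysql-bin.000003", 3000L));
        // prints mysql-bin.000002
        System.out.println(locate(files, 2500L));
    }
}
```

Starting from the beginning of the selected file means some events will predate the target, which is why a per-event timestamp filter is applied afterwards.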
Startup Mode Comparison and Use Cases
To better understand the role of timestamp-based startup within the overall CDC startup system, the following table compares all currently supported MySQL CDC startup modes:
| Startup Mode | Startup Basis | Advantages | Typical Use Cases |
|---|---|---|---|
| INITIAL | Full + current binlog | One-time full + incremental sync | First-time data ingestion |
| EARLIEST | Earliest available binlog | No specific offset required | Long binlog retention |
| LATEST | Latest binlog | Fast startup | Only future changes |
| SPECIFIC | Specific binlog file + position | Precise and controllable | Known binlog offsets |
| TIMESTAMP | Specified timestamp (ms) | Intuitive, business-friendly | Recovery, backfill, time-window sync |
TIMESTAMP mode does not replace SPECIFIC or GTID-based startup at a lower level. It complements them with a usability- and operations-focused option for cases where users know the time but not the binlog details.
Testing and Validation
This feature is covered by integration tests.
The test case MysqlCDCSpecificStartingOffsetIT verifies the correctness of timestamp-based startup.
Usage Notes
- Version requirement: SeaTunnel 2.3.12 or later
- Timestamp format: Unix timestamp in milliseconds
- Binlog availability: Ensure the binlog file for the specified time still exists
- Timezone considerations: Timestamps are based on UTC; be mindful of timezone conversion
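Since startup.timestamp expects epoch milliseconds, a wall-clock time usually has to be converted first. The helper below is a small sketch using only java.time; the class and method names are illustrative and not part of SeaTunnel. The time zone is passed explicitly so that the conversion does not depend on the default zone of the machine running it.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class StartupTimestampSketch {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Converts a wall-clock time in the given zone to Unix epoch milliseconds. */
    static long toEpochMillis(String wallClock, String zoneId) {
        return LocalDateTime.parse(wallClock, FORMAT)
                .atZone(ZoneId.of(zoneId))
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // prints 1672531200000
        System.out.println(toEpochMillis("2023-01-01 00:00:00", "UTC"));
        // prints 1711936800000 (10:00 in UTC+8 is 02:00 UTC)
        System.out.println(toEpochMillis("2024-04-01 10:00:00", "Asia/Shanghai"));
    }
}
```

The same wall-clock string yields different values in different zones, so it is worth confirming which zone the operator had in mind before pasting the result into the job config.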
Summary
Timestamp-based startup in SeaTunnel MySQL CDC provides more precise control over data synchronization, especially for scenarios requiring recovery from a specific point in time.
By converting timestamps into binlog offsets, this feature enables efficient time-based positioning and event filtering.
Notes
- Parameter validation is implemented in the factory class MySqlIncrementalSourceFactory via conditional rules.
- In addition to MySQL CDC, other CDC connectors, such as SQL Server CDC, also support similar timestamp-based startup mechanisms.
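To make the idea of a conditional rule concrete, here is a plain-Java sketch of the constraint "startup.timestamp is required when startup.mode is timestamp". It does not use SeaTunnel's option-rule API, and the class and method are hypothetical. Only the two option keys come from the configuration shown earlier.

```java
import java.util.HashMap;
import java.util.Map;

final class StartupOptionCheckSketch {

    /** Rejects a config that selects timestamp mode without a usable startup.timestamp. */
    static void validate(Map<String, Object> options) {
        Object mode = options.get("startup.mode");
        // The rule only applies when timestamp mode is selected.
        if ("timestamp".equals(mode) && !(options.get("startup.timestamp") instanceof Long)) {
            throw new IllegalArgumentException(
                    "startup.timestamp (Long, epoch millis) is required "
                            + "when startup.mode is \"timestamp\"");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> options = new HashMap<>();
        options.put("startup.mode", "timestamp");
        options.put("startup.timestamp", 1672531200000L);
        validate(options); // passes silently
        System.out.println("config accepted");
    }
}
```

Checking the constraint when the job is built, rather than when the reader starts, lets a misconfigured job fail before any connection to MySQL is opened.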

