
The Apache SeaTunnel community has officially released version 2.3.13! This release is a milestone for the project, bringing a Checkpoint API, a Flink engine upgrade, parallel processing of large files, multi-table sync, an AI Embedding Transform, and a broader set of connectors. Whether you run batch jobs or real-time CDC sync into a Lakehouse, SeaTunnel can now handle your data integration tasks more efficiently, stably, and intelligently.
Thanks to 50+ community contributors, this release includes 100+ PRs of new features, optimizations, and bug fixes. If you are building data warehouses, real-time sync platforms, or AI data pipelines, this release is worth your attention.
No time to read the full Release Notes? No worries, here are the Top 10 features of this release, each with its PR number so you can look up the details.
- Full Release Notes: https://github.com/apache/seatunnel/releases/tag/2.3.13
01 New Checkpoint API Enhances Task Fault Tolerance
In data sync tasks, checkpoints are one of the core mechanisms for ensuring task reliability. SeaTunnel 2.3.13 introduces a Checkpoint API (#10065), making task state management more flexible and providing a solid foundation for future scheduling and operations capabilities. The Zeta engine also supports a min-pause configuration (#9804), which avoids the system pressure caused by overly frequent checkpoints.
Monitoring has been enhanced as well: Sink commit metrics now include a calculated commit rate (#10233), the task overview interface returns PendingJobs information (#9902), and a new REST API exposes the Pending queue (#10078).
These capabilities help users better understand task execution status and optimize checkpoint strategies.
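To make the min-pause idea concrete, here is a minimal Python sketch. It assumes min-pause carries its usual meaning (a minimum idle time between the end of one checkpoint and the start of the next, as in Flink's checkpoint settings). The class, field names, and millisecond values are hypothetical and are not SeaTunnel's API or configuration keys.

```python
from dataclasses import dataclass


@dataclass
class CheckpointScheduler:
    """Toy scheduler; names and semantics are illustrative, not SeaTunnel's API."""
    interval_ms: int    # desired time between checkpoint starts
    min_pause_ms: int   # assumed: minimum idle time after a checkpoint finishes

    def next_start(self, last_start_ms: int, last_end_ms: int) -> int:
        # Honor the regular interval, but never start sooner than
        # min_pause_ms after the previous checkpoint completed.
        return max(last_start_ms + self.interval_ms,
                   last_end_ms + self.min_pause_ms)


scheduler = CheckpointScheduler(interval_ms=10_000, min_pause_ms=5_000)
# A quick checkpoint (2 s) leaves the regular interval in charge.
fast = scheduler.next_start(last_start_ms=0, last_end_ms=2_000)
# A slow checkpoint (9 s) pushes the next one back to respect the pause.
slow = scheduler.next_start(last_start_ms=0, last_end_ms=9_000)
print(fast, slow)
```

Under these assumptions, a slow checkpoint delays the next one instead of letting checkpoints run back to back, which is the pressure the setting is meant to relieve.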
02 Flink 1.20.1 Support and Enhanced CDC
On the engine side, this version improves Apache Flink support. SeaTunnel now supports Flink 1.20.1 (#9576), and CDC sync capabilities have been enhanced. CDC Source now supports Schema Evolution (#9867), automatically adapting sync tasks to source table structure changes.
NO_CDC Source also supports checkpoints now (#10094), which improves task recovery. Together, these changes make SeaTunnel more stable in scenarios with frequent database schema changes.
03 Large File Parallel Reading Significantly Improved
In real data platforms, large amounts of data often live in files on HDFS, object storage, or local file systems.
This release optimizes file processing performance. The HDFS File Connector supports true parallel splitting of large files (#10332), the LocalFile Connector supports parallel reading of large CSV, Text, and JSON files (#10142), and Parquet files now support Logical Split (#10239).
HDFS File also supports multi-table reading (#9816). These improvements significantly increase throughput for TB-scale file processing.
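As a rough illustration of what splitting a large line-oriented file for parallel reading involves, the Python sketch below plans byte ranges and moves each boundary forward to the next newline so that no record is cut in half. This is a generic technique under simplifying assumptions (newline-delimited records, no quoted newlines inside CSV fields). The function names are hypothetical and it is not SeaTunnel's actual split logic.

```python
import os
import tempfile


def plan_splits(path: str, split_size: int) -> list[tuple[int, int]]:
    """Return contiguous (start, end) byte ranges that each end on a line boundary."""
    file_size = os.path.getsize(path)
    splits = []
    start = 0
    with open(path, "rb") as f:
        while start < file_size:
            end = min(start + split_size, file_size)
            if end < file_size:
                f.seek(end)
                f.readline()      # advance to the end of the line in progress
                end = f.tell()
            splits.append((start, end))
            start = end
    return splits


def read_split(path: str, start: int, end: int) -> list[bytes]:
    """Read one split; each split could be handed to a separate reader."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).splitlines()


# Build a small demo file, split it, and read every split back.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"".join(b"row-%d\n" % i for i in range(100)))
    demo_path = tmp.name

splits = plan_splits(demo_path, split_size=128)
rows = [line for s, e in splits for line in read_split(demo_path, s, e)]
os.remove(demo_path)
```

Because the ranges are contiguous and aligned to line ends, the splits can be read independently and still cover every row exactly once.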
04 File Connector Adds Update Sync Mode
Previously, file sync tasks only supported append or overwrite. In this version, multiple file connectors add sync_mode=update, including FTP, SFTP, and LocalFile Source (#10437), and HdfsFile Source (#10268). This allows file sync tasks to support update semantics, better fitting incremental data processing scenarios.
05 Connector Ecosystem Expansion
SeaTunnel 2.3.13 continues to expand and enhance the connector ecosystem. For analytical databases, it adds DuckDB Source and Sink support (#10285), suitable for local analysis and data exploration.
New or enhanced connectors include Apache HugeGraph Sink (#10002), AWS DSQL Sink (#9739), Lance Dataset Sink (#9894), and IoTDB 2.x Source and Sink (#9872).
Existing connectors have also been improved: PostgreSQL supports TIMESTAMP_TZ (#10048), Hive Sink supports SchemaSaveMode and DataSaveMode (#9743), MongoDB Sink supports multi-table writing and adds SaveMode (#9958 / #9883).
These updates significantly improve SeaTunnel’s adaptability in database and Lakehouse scenarios and the efficiency of building data pipelines.
| Category | Connector | Type | Feature Highlights | PR |
|---|---|---|---|---|
| Analytical DB | DuckDB | Source/Sink | Read and write data from DuckDB, suitable for local analysis and exploration | #10285 |
| Graph DB | Apache HugeGraph | Sink | Write data into HugeGraph | #10002 |
| Distributed SQL DB | AWS DSQL | Sink | Write data into AWS DSQL | #9739 |
| File/Dataset | Lance Dataset | Sink | Write data into Lance Dataset | #9894 |
| Time Series DB | IoTDB 2.x | Source/Sink | Add IoTDB 2.x source and sink support | #9872 |
| Relational DB | PostgreSQL | Source | Support TIMESTAMP_TZ type | #10048 |
| Data Warehouse | Hive | Sink | Support SchemaSaveMode and DataSaveMode | #9743 |
| Document DB | MongoDB | Sink | Support multi-table write and new SaveMode | #9958 / #9883 |
06 Kafka Supports Protobuf Schema Registry
In real-time scenarios, Kafka often uses Schema Registry. This release adds Protobuf Schema Registry Wire Format support (#10183) to Kafka Connector, allowing SeaTunnel to directly parse Protobuf data managed via Schema Registry, making real-time pipeline construction easier.
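For readers unfamiliar with the framing involved, the Python sketch below splits a record in the Confluent Schema Registry wire format for Protobuf: a zero magic byte, a 4-byte big-endian schema ID, a zigzag-varint list of message indexes (with a single `0` byte as shorthand for `[0]`), and then the Protobuf payload. It is an independent illustration of the format, not code from the SeaTunnel Kafka connector, and the function names are hypothetical.

```python
import struct


def read_zigzag_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zigzag varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos


def split_frame(frame: bytes) -> tuple[int, list[int], bytes]:
    """Split a framed record into (schema_id, message_indexes, payload)."""
    if len(frame) < 6 or frame[0] != 0:
        raise ValueError("not a Schema Registry framed Protobuf record")
    (schema_id,) = struct.unpack(">I", frame[1:5])
    count, pos = read_zigzag_varint(frame, 5)
    if count == 0:
        indexes = [0]             # shorthand: first message type in the schema
    else:
        indexes = []
        for _ in range(count):
            idx, pos = read_zigzag_varint(frame, pos)
            indexes.append(idx)
    return schema_id, indexes, frame[pos:]


# Schema ID 42, first message type, two payload bytes.
simple = b"\x00" + struct.pack(">I", 42) + b"\x00" + b"\x08\x01"
# Schema ID 7, message index path [1, 0] (a nested message type).
nested = b"\x00" + struct.pack(">I", 7) + b"\x04\x02\x00" + b"\x08\x01"
print(split_frame(simple))
print(split_frame(nested))
```

The schema ID is what a consumer would use to look up the schema in the registry, and the index path selects which message type within that schema the payload encodes.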
07 New AI Embedding Transform
As AI and data engineering converge, more companies need vector data pipelines.
SeaTunnel adds a Multimodal Embedding Transform (#9673) to its Transform components, so vector data can be generated directly in the pipeline for vector databases, RAG systems, and AI retrieval applications. A new RegexExtract Transform (#9829) further strengthens data cleaning.
08 Markdown Parser Supports RAG Scenarios
Markdown documents are common in AI data preparation. This release adds Markdown Parser (#9760) and related documentation (#9834) for parsing and structuring Markdown, facilitating RAG pipeline construction.
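To show what "structuring Markdown" for RAG can look like, here is a small Python sketch that groups a document into sections keyed by their heading trail, a common way to produce retrieval chunks. It is a simplified stand-in that handles only ATX (`#`) headings and fenced code blocks. It is not the SeaTunnel Markdown Parser, and the function name and output shape are assumptions.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3   # code-fence marker, built this way to keep the example readable


def split_by_headings(markdown: str) -> list[dict]:
    """Group Markdown text into sections keyed by their heading path."""
    sections = []
    path = []   # stack of (level, title) for the current heading trail
    body = []

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append({"path": [title for _, title in path], "text": text})
        body.clear()

    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return sections


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Usage\nStart a job.\n"
chunks = split_by_headings(doc)
for chunk in chunks:
    print(" > ".join(chunk["path"]), "|", chunk["text"])
```

Keeping the heading path with each chunk preserves context that would otherwise be lost when a section is embedded and retrieved on its own.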
09 Stability and Performance Improvements
This release includes numerous stability and performance optimizations, such as ClickHouse Connector parallel read strategy (#9801), MySQL Connector shard calculation (#9975), JSON parsing for nested structures (#10000), Zeta engine task metrics (#9833), and more.
It also fixes production issues like Zeta engine memory leak on task cancellation (#10315), ClickHouse ThreadLocal memory leak (#10264), MongoDB multi-task submit (#10116), HBase Source scan exception (#10287), Hive Sink init failure (#10331), etc.
10 Bug Fixes and Documentation Updates
Fixes include CDC Snapshot Split null pointer (#10404), ClickHouse memory leak (#10264), MongoDB multi-task submit (#10064, #10116), HBase scan exceptions (#10336, #10287), JDBC schema merge overflow (#10387, #9942, #10093), Hive Sink overwrite semantics (#10279, #9823, #9743), Elasticsearch Sink task exit issue (#10038), and other Connector, Transform, Engine, UI, CI fixes (#10422, #10013, etc.).
Documentation improvements include SeaTunnel MCP & x2SeaTunnel docs (#10108), connector config examples (#10283, #10250, #10241, #10202), multi-table sync examples (#10241), upgrade incompatibility notes (#10068), and doc structure optimizations (#10262, #10395, #10351, #10420, #10438, #10424, #10109, #10382, #10385), helping new users get started and developers better understand architecture and features.
Thanks to Contributors ❤️
Special thanks to release manager @xiaochen-zhou for strong support in planning and execution. Thanks to all volunteers; your efforts keep the SeaTunnel community growing!
Adam Wang, AzkabanWarden.Gf, Bo Schuster, cloud456, CloverDew, corgy-w, CosmosNi, Cyanty, David Zollo, dotfive-star, dy102, dyp12, Frui Guo, Jarvis, Jast, Jeremy, JeremyXin, Jia Fan, Joonseo Lee, krutoileshii, 老王, Leon Yoah, Li Dongxu, LiJie20190102, limin, LimJiaWenBrenda, liucongjy, loupipalien, mengxpgogogo-eng, misi, 巧克力黑, shfshihuafeng, silenceland, Sim Chou, Steven Zhao, wanmingshi, wtybxqm, yzeng1618, zhan7236, zhangdonghao, zhuxt2015, zy
Download & Try
- Download: https://seatunnel.apache.org/download
- Upgrade Guide: https://seatunnel.apache.org/docs/upgrade-guide
Upgrade Note: If you are on SeaTunnel 2.3.x, upgrading to 2.3.13 is generally safe, since the release focuses on feature enhancements and stability. Back up your config files and test in staging first. For tasks that use checkpoints, stop them and confirm state consistency before upgrading to avoid checkpoint conflicts. Review the configuration changes for the Hive, MongoDB, and Kafka connectors. If you use the Flink engine, consider upgrading to Flink 1.20.x for better compatibility and CDC support.