Apache SeaTunnel

Posted on Mar 20

Apache SeaTunnel 2.3.13 Major Release! Top 10 Features You Should Know

#apacheseatunnel #opensource #datascience #database

The Apache SeaTunnel community has officially released version 2.3.13! This release is a milestone for the project, bringing important features such as a Checkpoint API, a Flink engine upgrade, parallel processing of large files, multi-table sync, an AI Embedding Transform, and a richer set of connectors. Whether you run batch jobs or real-time CDC sync into a Lakehouse, SeaTunnel can now support your data integration tasks more efficiently, stably, and intelligently.

Thanks to 50+ community contributors, this release includes 100+ PRs of new features, optimizations, and bug fixes. If you are building data warehouses, real-time sync platforms, or AI data pipelines, this release is worth your attention.

No time to read the full Release Notes? No worries, here are the Top 10 features of this release, each with its PR number so you can look up the details.

Full Release Note: https://github.com/apache/seatunnel/releases/tag/2.3.13

01 New Checkpoint API Enhances Task Fault Tolerance

In data sync tasks, checkpoints are one of the core mechanisms for ensuring task reliability. SeaTunnel 2.3.13 introduces a Checkpoint API (#10065), which makes task state management more flexible and provides a solid foundation for future scheduling and operations capabilities. The Zeta engine also supports a min-pause configuration (#9804), which avoids the system pressure caused by overly frequent checkpoints.
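To make the idea of a minimum pause concrete, here is a small Python sketch of how such a rule can delay the next checkpoint. It is an illustration only, not Zeta engine code: the class, field names, and the exact rule (take the later of "one interval after the last start" and "min-pause after the last finish") are assumptions modeled on how minimum-pause settings commonly work.

```python
from dataclasses import dataclass


@dataclass
class CheckpointTimer:
    """Toy scheduler; all times are in milliseconds (hypothetical names)."""
    interval_ms: int   # target gap between checkpoint starts
    min_pause_ms: int  # required idle time after a checkpoint finishes

    def next_start(self, last_start_ms: int, last_end_ms: int) -> int:
        # The regular schedule would fire one interval after the last start.
        by_interval = last_start_ms + self.interval_ms
        # The min-pause rule forbids starting too soon after the last finish.
        by_pause = last_end_ms + self.min_pause_ms
        return max(by_interval, by_pause)


timer = CheckpointTimer(interval_ms=10_000, min_pause_ms=5_000)
fast = timer.next_start(last_start_ms=0, last_end_ms=2_000)  # quick checkpoint
slow = timer.next_start(last_start_ms=0, last_end_ms=9_000)  # slow checkpoint
print(fast, slow)
```

With a quick checkpoint the interval decides the next start. With a slow one, the pause pushes the next start later, so checkpoints never run back to back.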

Monitoring has also been enhanced: this release adds Sink commit metrics with commit rate calculation (#10233), returns PendingJobs information in the task overview interface (#9902), and provides a REST API for viewing the Pending queue (#10078).

These capabilities help users better understand task execution status and optimize checkpoint strategies.

02 Flink 1.20.1 Support and Enhanced CDC

On the engine side, this version improves Apache Flink support. SeaTunnel now supports Flink 1.20.1 (#9576), and CDC sync capabilities have been enhanced. CDC Source now supports Schema Evolution (#9867), automatically adapting sync tasks to source table structure changes.

Additionally, NO_CDC Source also supports checkpoints (#10094), improving task recovery. These changes make SeaTunnel more stable in scenarios with frequent database schema changes.

03 Large File Parallel Reading Significantly Improved

In real data platforms, large amounts of data often exist as files stored in HDFS, object storage, or local file systems.

This release significantly optimizes file processing performance. HDFS File Connector supports true large file parallel splitting (#10332), LocalFile Connector supports CSV, Text, JSON large file parallel reading (#10142), and Parquet files now support Logical Split (#10239).

HDFS File also supports multi-table reading (#9816). These improvements significantly increase throughput for TB-scale file processing.
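For readers unfamiliar with how a large text file can be read in parallel, the Python sketch below shows one common approach: cut the file into byte ranges of roughly equal size, then extend each range to the next newline so no record is split. This is an assumption-laden illustration, not the connector's implementation. The function name `split_by_lines` and the `target_bytes` parameter are hypothetical, and the approach only fits line-delimited data (a CSV with newlines inside quoted fields would need more care).

```python
import os
import tempfile


def split_by_lines(path: str, target_bytes: int) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges that each end on a line boundary."""
    size = os.path.getsize(path)
    splits = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            # Jump ahead by the target size, then extend to the end of that line.
            f.seek(min(start + target_bytes, size))
            f.readline()
            end = f.tell()
            splits.append((start, end))
            start = end
    return splits


# Demo: ten 6-byte lines (60 bytes total), split into chunks of about 16 bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"".join(b"row-%d\n" % i for i in range(10)))
splits = split_by_lines(tmp.name, target_bytes=16)
os.remove(tmp.name)
print(splits)
```

Each range can then be handed to a separate reader, which seeks to `start` and reads up to `end` without coordinating with the others.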

04 File Connector Adds Update Sync Mode

Previously, file sync tasks only supported append or overwrite. In this version, multiple file connectors add sync_mode=update, including FTP, SFTP, and LocalFile Source (#10437), and HdfsFile Source (#10268). This allows file sync tasks to support update semantics, better fitting incremental data processing scenarios.
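The release notes summarized here do not spell out how update mode decides which files to process, so the following Python sketch shows just one plausible rule: pick a file when it is missing from the target, has a different size, or was modified after the target copy. The comparison rule, the `FileStat` type, and the `files_to_sync` function are assumptions for illustration; check the connector documentation for the actual options.

```python
from typing import NamedTuple


class FileStat(NamedTuple):
    size: int     # length in bytes
    mtime: float  # last-modified time, seconds since epoch


def files_to_sync(source: dict[str, FileStat], target: dict[str, FileStat]) -> list[str]:
    """Pick source files that are missing from, or differ from, the target."""
    changed = []
    for name, stat in sorted(source.items()):
        seen = target.get(name)
        # New file, different length, or source modified after the target copy.
        if seen is None or seen.size != stat.size or stat.mtime > seen.mtime:
            changed.append(name)
    return changed


source = {
    "a.csv": FileStat(100, 1_000.0),
    "b.csv": FileStat(250, 2_000.0),
    "c.csv": FileStat(75, 3_000.0),
}
target = {
    "a.csv": FileStat(100, 1_500.0),  # unchanged, already copied
    "b.csv": FileStat(200, 1_800.0),  # stale copy
}
to_sync = files_to_sync(source, target)
print(to_sync)
```

Only the stale and the new file are selected, which is what makes incremental runs cheaper than a full overwrite.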

05 Connector Ecosystem Expansion

SeaTunnel 2.3.13 continues to expand and enhance the connector ecosystem. For analytical databases, it adds DuckDB Source and Sink support (#10285), suitable for local analysis and data exploration.

New or enhanced connectors include Apache HugeGraph Sink (#10002), AWS DSQL Sink (#9739), Lance Dataset Sink (#9894), and IoTDB 2.x Source and Sink (#9872).

Existing connectors have also been improved: PostgreSQL supports TIMESTAMP_TZ (#10048), Hive Sink supports SchemaSaveMode and DataSaveMode (#9743), MongoDB Sink supports multi-table writing and adds SaveMode (#9958 / #9883).

These updates significantly improve SeaTunnel’s adaptability in database and Lakehouse scenarios and the efficiency of building data pipelines.

Category	Connector	Type	Feature Highlights	PR
Analytical DB	DuckDB	Source/Sink	Read and write data from DuckDB, suitable for local analysis and exploration	#10285
Graph DB	Apache HugeGraph	Sink	Write data into HugeGraph	#10002
Distributed SQL DB	AWS DSQL	Sink	Write data into AWS DSQL	#9739
File/Dataset	Lance Dataset	Sink	Write data into Lance Dataset	#9894
Time Series DB	IoTDB 2.x	Source/Sink	Add IoTDB 2.x source and sink support	#9872
Relational DB	PostgreSQL	Source	Support TIMESTAMP_TZ type	#10048
Data Warehouse	Hive	Sink	Support SchemaSaveMode and DataSaveMode	#9743
Document DB	MongoDB	Sink	Support multi-table write and new SaveMode	#9958 / #9883

06 Kafka Supports Protobuf Schema Registry

In real-time scenarios, Kafka is often used together with a Schema Registry. This release adds Protobuf Schema Registry Wire Format support (#10183) to the Kafka Connector, allowing SeaTunnel to directly parse Protobuf data managed via Schema Registry and making real-time pipelines easier to build.
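For context, the Schema Registry wire format wraps each Kafka value in a small header. As documented by Confluent, it consists of a magic byte 0, a 4-byte big-endian schema ID, and, for Protobuf, a list of message indexes encoded as zigzag varints. The Python sketch below parses that header. It illustrates the framing only and is not the connector's code; the function names are hypothetical, and decoding the remaining payload would additionally require the Protobuf schema.

```python
import struct


def _read_zigzag_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zigzag varint starting at pos; return (value, next_pos)."""
    raw, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def parse_protobuf_frame(message: bytes) -> tuple[int, list[int], bytes]:
    """Split a framed Kafka value into (schema_id, message_indexes, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not a Schema Registry framed message")
    count, pos = _read_zigzag_varint(message, 5)
    if count == 0:
        # A single 0 byte is shorthand for the index path [0] (first message type).
        indexes = [0]
    else:
        indexes = []
        for _ in range(count):
            idx, pos = _read_zigzag_varint(message, pos)
            indexes.append(idx)
    return schema_id, indexes, message[pos:]


# Schema ID 42, first message type in the schema, two payload bytes.
simple = parse_protobuf_frame(b"\x00" + struct.pack(">I", 42) + b"\x00" + b"\x08\x01")
# Schema ID 7, index path [1, 0] (a nested message type).
nested = parse_protobuf_frame(b"\x00" + struct.pack(">I", 7) + b"\x04\x02\x00" + b"\x08\x01")
print(simple, nested)
```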

07 New AI Embedding Transform

As AI and data engineering converge, more companies need pipelines that produce vector data.

SeaTunnel adds Multimodal Embedding Transform (#9673) in the Transform component, generating vector data directly in pipelines for vector databases, RAG systems, and AI retrieval applications. RegexExtract Transform (#9829) further enhances data cleaning.
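As a rough picture of what a regex-extract step does during data cleaning, the Python sketch below pulls named groups out of a text column and adds them as new columns. It does not use SeaTunnel's configuration or API; the log format, the field names, and the `regex_extract` helper are all hypothetical.

```python
import re

# Hypothetical log format and field names, chosen only for this example.
PATTERN = re.compile(r"(?P<level>[A-Z]+)\s+user=(?P<user>\w+)\s+cost=(?P<cost_ms>\d+)ms")


def regex_extract(row: dict, field: str) -> dict:
    """Return a copy of row with the pattern's named groups added as columns."""
    match = PATTERN.search(row[field])
    # On a miss, still emit every output column, filled with None.
    extracted = match.groupdict() if match else dict.fromkeys(PATTERN.groupindex)
    return {**row, **extracted}


hit = regex_extract({"id": 1, "msg": "WARN user=alice cost=120ms"}, "msg")
miss = regex_extract({"id": 2, "msg": "unstructured text"}, "msg")
print(hit)
print(miss)
```

Emitting the same columns on a miss keeps the output schema stable, which matters when the rows flow on to a typed sink.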

08 Markdown Parser Supports RAG Scenarios

Markdown documents are common in AI data preparation. This release adds Markdown Parser (#9760) and related documentation (#9834) for parsing and structuring Markdown, facilitating RAG pipeline construction.
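To show what "structuring Markdown" can mean for a RAG pipeline, the Python sketch below groups a document's text under its heading path, producing chunks that carry their own context. It is a simplified illustration and not SeaTunnel's parser: `split_markdown` is a hypothetical helper, it only recognizes `#`-style headings, and it would misread `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown(text: str) -> list[dict]:
    """Group Markdown lines into sections keyed by their heading path."""
    sections, path, body = [], [], []

    def flush():
        # Emit the text collected so far under the current heading path.
        content = "\n".join(body).strip()
        if content:
            sections.append({"path": " > ".join(path), "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then push this one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Usage\nStart a job.\n"
sections = split_markdown(doc)
for section in sections:
    print(section)
```

Each resulting section can be embedded and indexed on its own, with the heading path stored as metadata for retrieval.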

09 Stability and Performance Improvements

This release includes numerous stability and performance optimizations, such as ClickHouse Connector parallel read strategy (#9801), MySQL Connector shard calculation (#9975), JSON parsing for nested structures (#10000), Zeta engine task metrics (#9833), and more.

It also fixes production issues such as a Zeta engine memory leak on task cancellation (#10315), a ClickHouse ThreadLocal memory leak (#10264), MongoDB multi-task submit (#10116), an HBase Source scan exception (#10287), and a Hive Sink init failure (#10331).

10 Bug Fixes and Documentation Updates

Fixes include CDC Snapshot Split null pointer (#10404), ClickHouse memory leak (#10264), MongoDB multi-task submit (#10064, #10116), HBase scan exceptions (#10336, #10287), JDBC schema merge overflow (#10387, #9942, #10093), Hive Sink overwrite semantics (#10279, #9823, #9743), Elasticsearch Sink task exit issue (#10038), and other Connector, Transform, Engine, UI, CI fixes (#10422, #10013, etc.).

Documentation improvements include SeaTunnel MCP & x2SeaTunnel docs (#10108), connector config examples (#10283, #10250, #10241, #10202), multi-table sync examples (#10241), upgrade incompatibility notes (#10068), and doc structure optimizations (#10262, #10395, #10351, #10420, #10438, #10424, #10109, #10382, #10385), helping new users get started and developers better understand architecture and features.

Thanks to Contributors ❤️

Special thanks to release manager @xiaochen-zhou for strong support in planning and execution. Thanks to all volunteers; your efforts keep the SeaTunnel community growing!

Adam Wang, AzkabanWarden.Gf, Bo Schuster, cloud456, CloverDew, corgy-w, CosmosNi, Cyanty, David Zollo, dotfive-star, dy102, dyp12, Frui Guo, Jarvis, Jast, Jeremy, JeremyXin, Jia Fan, Joonseo Lee, krutoileshii, 老王, Leon Yoah, Li Dongxu, LiJie20190102, limin, LimJiaWenBrenda, liucongjy, loupipalien, mengxpgogogo-eng, misi, 巧克力黑, shfshihuafeng, silenceland, Sim Chou, Steven Zhao, wanmingshi, wtybxqm, yzeng1618, zhan7236, zhangdonghao, zhuxt2015, zy

Download & Try

Download: https://seatunnel.apache.org/download
Upgrade Guide: https://seatunnel.apache.org/docs/upgrade-guide

Upgrade Note: If you are on SeaTunnel 2.3.x, upgrading to 2.3.13 is generally safe, since the release focuses on feature enhancements and stability. Before upgrading, back up your config files and test in a staging environment. For tasks that use checkpoints, stop the tasks and confirm state consistency first to avoid checkpoint conflicts. Review connector config changes, particularly for Hive, MongoDB, and Kafka. If you use the Flink engine, consider upgrading to Flink 1.20.x for better compatibility and CDC support.
