Apache SeaTunnel

Exciting Updates Coming in Apache SeaTunnel 2.3.8

Apache SeaTunnel 2.3.8 is due for release soon. At a recent community meeting, Apache SeaTunnel PMC member Fan Jia previewed the new features and updates. Here is an overview of what to expect:

Introduction to SeaTunnel

SeaTunnel is a high-performance open-source distributed data integration system that supports real-time streaming and offline batch processing of various data sources, making it suitable for massive data integration. Key features include:

  • Extensive Connectors: Supports over 100 data sources and storage systems.
  • Multi-Engine Support: Compatible with various data processing engines, including SeaTunnel Zeta Engine, Spark, and Flink.
  • HTTP Support: Enables data integration via HTTP interfaces.
  • Stream and Batch Integration: Supports both stream processing and batch processing.
  • Stream Rate Control: Capable of controlling the rate of data flow.
  • Automatic Table Creation: Automatically creates tables based on data structure.

New Features and Updates in Version 2.3.8

In the upcoming 2.3.8 release, the community will introduce several new features and updates:

Docker Images

The new version will provide official Docker images that include nearly all connectors. This lets users start SeaTunnel quickly and simplifies deployment, since no installation package needs to be downloaded.

  • Build Images via Command: Users with custom needs can build images locally using command-line instructions.

  • Start Services via Command: Supports starting services for distributed deployment, submitting tasks, and querying task statuses via the command line. Users can also submit tasks through REST APIs.

  • Submit Tasks via Command: Tasks can be submitted directly from the command line.

Spark Multi-Table Support

Currently, SeaTunnel supports multi-table tasks only on the Zeta Engine. The new version will add multi-table support to the Spark engine, so that multi-table jobs are recognized and executed automatically. Multi-table support for Flink is in progress, and interested contributors are welcome to join the work on GitHub.

Config Parameter Default Values

The current version supports variables in config parameters, but every variable must be given a value manually. The new version will let a variable declare a default value, which is used when no value is supplied, making job configs more flexible.
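
To make the idea concrete, here is a minimal Python sketch of how variable substitution with defaults can behave. It is not SeaTunnel's implementation: the `${name:default}` placeholder syntax, the `substitute` helper, and the variable names are all assumptions chosen for illustration.

```python
import re

# Assumed placeholder syntax (illustrative): ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def substitute(text: str, variables: dict) -> str:
    """Replace ${name} / ${name:default} placeholders in a config string."""

    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]      # an explicit value always wins
        if default is not None:
            return default              # otherwise fall back to the default
        raise KeyError(f"no value or default for variable '{name}'")

    return _PLACEHOLDER.sub(resolve, text)


# Hypothetical config fragment; only "parallelism" is supplied by the user.
config = 'job.name = "${jobName:nightly_sync}"\nparallelism = ${parallelism:2}'
resolved = substitute(config, {"parallelism": "4"})
print(resolved)
```

In this sketch a supplied value overrides the default, and a variable with neither a value nor a default is treated as an error rather than silently left in place.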

Prometheus Integration for Cluster Monitoring

Previously, SeaTunnel provided interfaces for retrieving task run metrics. The new version will support integration with Prometheus for cluster monitoring. Prometheus will regularly pull the status of SeaTunnel cluster tasks and present this in a visual interface, making it easier to monitor cluster status and quickly identify issues.
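
As a rough illustration of what a pull-based check looks like, the Python sketch below parses a small payload in the Prometheus text exposition format and flags failed jobs. The metric name `job_count` and its labels are hypothetical and are not taken from SeaTunnel; the parser only handles simple `name{labels} value` lines without timestamps.

```python
def parse_metrics(text: str) -> dict:
    """Parse simple 'name{labels} value' lines of the Prometheus text format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical scrape payload; the metric names are made up for illustration.
payload = """\
# HELP job_count Number of jobs by state (hypothetical metric)
# TYPE job_count gauge
job_count{state="running"} 3
job_count{state="failed"} 1
"""

metrics = parse_metrics(payload)
failed = metrics['job_count{state="failed"}']
if failed > 0:
    print(f"alert: {failed:.0f} failed job(s)")
```

In a real deployment Prometheus performs the scrape and stores the samples itself; the sketch only shows the shape of the data that such a pull returns.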

Embedding Transform

The addition of the Embedding transform will enable the integration of machine learning models into the data transformation process, converting raw fields into vector values for storage in appropriate machine learning databases. Current machine-learning model providers supported by SeaTunnel include Doubao, Qianfan, and OpenAI.
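
The Python sketch below shows the general shape of such a transform: selected fields of a row are turned into vectors and added as new columns. It is only an illustration. The `toy_embed` function is a hash-based stand-in for a real model call, and the `<field>_vector` column naming is an assumption, not SeaTunnel's behavior.

```python
import hashlib


def toy_embed(text: str, dims: int = 4) -> list:
    """Stand-in for a model call: derive a deterministic vector from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dims` bytes to floats in [0, 1].
    return [digest[i] / 255 for i in range(dims)]


def embed_fields(row: dict, fields: list) -> dict:
    """Return a copy of the row with an extra '<field>_vector' column per field."""
    out = dict(row)
    for field in fields:
        out[f"{field}_vector"] = toy_embed(str(row[field]))
    return out


row = {"id": 1, "title": "SeaTunnel 2.3.8"}
result = embed_fields(row, ["title"])
print(result)
```

A real pipeline would replace `toy_embed` with a request to one of the supported model providers and write the resulting vectors to the sink.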

Job-Level Log Filtering

The new version will enhance log filtering and viewing capabilities at the job level, enabling users to filter logs through three methods:

  1. Job ID in Logs: Users can search for logs associated with a specific Job ID, making it easier to troubleshoot when multiple tasks are running concurrently.

  2. Splitting Logs by Job ID: By modifying the log configuration file, users can ensure that logs for the same Job ID are categorized into the same file, simplifying log management.

Example modification for the log4j2.properties configuration file:

...
rootLogger.appenderRef.file.ref = routingAppender
...

appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%-30.30c{1.}] [%t] - %m%n
...
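
The effect of splitting logs by Job ID can be sketched in a few lines of Python. This is an illustration of the idea only: the `[job-<id>]` marker and the sample log lines are hypothetical, and in SeaTunnel the routing is done by the logging configuration rather than by post-processing.

```python
import re
from collections import defaultdict

# Assumed log shape (illustrative): each line carries a marker like "[job-<id>]".
JOB_ID = re.compile(r"\[job-(\d+)\]")


def split_by_job(lines):
    """Group log lines by the job ID they mention; lines without one go under None."""
    buckets = defaultdict(list)
    for line in lines:
        match = JOB_ID.search(line)
        buckets[match.group(1) if match else None].append(line)
    return dict(buckets)


# Hypothetical log lines from two jobs running concurrently.
logs = [
    "2024-09-01 10:00:00,001 INFO  [job-101] reader started",
    "2024-09-01 10:00:00,002 INFO  [job-202] reader started",
    "2024-09-01 10:00:01,500 ERROR [job-101] write failed",
]
grouped = split_by_job(logs)
for job_id, entries in grouped.items():
    print(job_id, len(entries))
```

Each bucket corresponds to what would end up in one per-job log file, which is why troubleshooting a single job no longer requires scanning the interleaved output of all jobs.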

Kafka Support for Protobuf Data

The Kafka connector has been enhanced to support the Protobuf data format, allowing for the definition of Protobuf data types for reading and writing.

File Support for Reading Compressed Files

The new version will introduce support for reading compressed file formats, eliminating the need for decompression steps.
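
For readers unfamiliar with the idea, the Python sketch below reads a gzip-compressed file line by line without a separate decompression step. It illustrates the general technique only. The `read_lines` helper, the extension-based detection, and the sample file are assumptions, and the article does not say which compression formats SeaTunnel will support.

```python
import gzip
import os
import tempfile


def read_lines(path: str):
    """Yield text lines from a plain or gzip-compressed file, chosen by extension."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Build a small compressed sample so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample = os.path.join(tmp_dir, "orders.csv.gz")
with gzip.open(sample, "wt", encoding="utf-8") as handle:
    handle.write("id,amount\n1,9.50\n2,12.00\n")

rows = list(read_lines(sample))
print(rows)
```

The data is decompressed as a stream while it is read, so no uncompressed copy of the file is ever written to disk.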

Other Features

The new version will also remove the filter on system tables so that users can read them, and it will improve Paimon support for stream reading and dynamic bucket writing.

How to Get the Latest Version and Contribute

Download

SeaTunnel 2.3.8 is expected to be released in early October. Watch the official SeaTunnel download page for the latest version.

Contributing

Conclusion

The release of SeaTunnel 2.3.8 will introduce a series of new features and improvements, making data integration more efficient and flexible. Thanks to all contributors for their efforts in making SeaTunnel a more powerful data integration tool.

For more information, please visit the SeaTunnel official website.
