Apache SeaTunnel

Apache SeaTunnel February Update: Community Busy Even During Holidays

The Apache SeaTunnel community has been very active recently. Judging by the latest PRs, developers have focused on four areas: preparing the v2.3.13 release, integrating new connectors, improving Zeta engine stability, and optimizing the JDBC and CDC connectors in depth.

1. New Connectors & Ecosystem Expansion

The community continues to expand SeaTunnel’s data integration boundaries, connecting not only traditional databases but also SaaS and cloud-native services.

  • Airtable Source & Sink (#10469) – New connector for reading from and writing to Airtable, which makes it easier to integrate Airtable data with data warehouses.
  • HubSpot Source (#10358) – Added HubSpot data source for CRM data integration.
  • AWS Glue Catalog (#10401) – Supports Glue Catalog with flexible credential management for S3 environments.
  • Gravitino Integration (#10402) – Introduces Gravitino as a metadata service for non-relational connectors, enhancing metadata management.

2. Existing Connector Enhancements

This is the most active PR area, especially for JDBC and CDC components. Developers refined details and filled gaps.

  • JDBC Connectors:

    • PostgreSQL COPY support (#10406) – Improves bulk data write performance.
    • SapHana CHAR type support (#10472) – Completes CHAR type handling.
    • Oracle unit tests (#10435) – Adds Testcontainers-based unit tests.
  • CDC (Change Data Capture):

    • MySQL & Postgres – Fixes unsigned type conversion (MySQL) and replication slot creation (Postgres) (#10453, #10416).
    • Oracle & SQLServer – Adds support for timestamp types (#10428).
  • Elasticsearch – Adds slicing support for better parallelism in large-scale reads (#10454).
  • S3 File Source – Enables file splitting for improved large file read performance (#10450).
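
The unsigned type conversion fix listed under CDC above is not described in detail here, but the general pitfall is easy to show. Java has no unsigned integer types, so the raw bits of a MySQL INT UNSIGNED or BIGINT UNSIGNED value turn negative if read as a signed int or long. The following is a minimal sketch of a safe widening, not SeaTunnel's actual code; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Illustrative only: UnsignedConversionSketch and its methods are
// hypothetical names, not part of the SeaTunnel code base.
public class UnsignedConversionSketch {

    // INT UNSIGNED (0..4294967295) does not fit in int, but fits in long.
    static long intUnsignedToLong(int rawBits) {
        return Integer.toUnsignedLong(rawBits);
    }

    // BIGINT UNSIGNED (0..2^64-1) does not fit in long, so widen to BigInteger.
    static BigInteger bigintUnsignedToBigInteger(long rawBits) {
        return new BigInteger(Long.toUnsignedString(rawBits));
    }

    public static void main(String[] args) {
        // All bits set is -1 when read as signed, but the maximum when unsigned.
        System.out.println(intUnsignedToLong(-1));           // 4294967295
        System.out.println(bigintUnsignedToBigInteger(-1L)); // 18446744073709551615
    }
}
```

Widening to the next larger type keeps the full unsigned range at the cost of a bigger in-memory representation, which is why a connector has to pick the target type per column type rather than reuse the signed mapping.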

3. Zeta Engine Stability

Zeta is SeaTunnel’s self-developed engine, so its stability is critical.

  • Checkpoint mechanism (#10448) – Fixes tasks not failing properly when checkpoints fail, ensuring data consistency.
  • Task scheduling (#10430) – Optimizes queue rescheduling for the WAIT strategy. A separate fix (#10456) resolves a NullPointerException (NPE) when querying suspended tasks.
  • Memory management (#10418) – Fixes a core memory leak.
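
To illustrate why the checkpoint item above matters: if a checkpoint fails but the task keeps running, later output is no longer covered by a recoverable state. The sketch below shows the general idea of propagating a checkpoint failure into the task state. It is a simplified assumption of how such wiring can look, and `Task`, `TaskState`, and `watchCheckpoint` are hypothetical names rather than Zeta classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: none of these types exist in the Zeta engine.
public class CheckpointFailureSketch {

    enum TaskState { RUNNING, FAILED }

    static final class Task {
        private final AtomicReference<TaskState> state =
                new AtomicReference<>(TaskState.RUNNING);
        private volatile Throwable failureCause;

        // Move RUNNING -> FAILED exactly once and remember the first cause.
        void fail(Throwable cause) {
            if (state.compareAndSet(TaskState.RUNNING, TaskState.FAILED)) {
                failureCause = cause;
            }
        }

        TaskState state() { return state.get(); }
        Throwable failureCause() { return failureCause; }
    }

    // Fail the task whenever its checkpoint completes exceptionally,
    // instead of silently continuing without a valid checkpoint.
    static void watchCheckpoint(Task task, CompletableFuture<Void> checkpoint) {
        checkpoint.whenComplete((ignored, error) -> {
            if (error != null) {
                task.fail(error);
            }
        });
    }

    public static void main(String[] args) {
        Task task = new Task();
        CompletableFuture<Void> checkpoint = new CompletableFuture<>();
        watchCheckpoint(task, checkpoint);

        checkpoint.completeExceptionally(
                new IllegalStateException("checkpoint storage unavailable"));
        System.out.println(task.state()); // FAILED
    }
}
```

The compare-and-set keeps the first failure cause if several checkpoints fail in quick succession, which tends to make the root cause easier to find in logs.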

4. Developer Experience & Documentation

  • Architecture docs (#10429) – Improved system documentation to help new contributors understand the architecture.
  • Release management – Preparing v2.3.13 (#10466).

Contributors Shoutout

Thanks to these developers for their outstanding contributions (from the last 30 PRs, in alphabetical order by GitHub ID): AshharAhmadKhan, chl-wxp, CNF96, corgy-w, CosmosNi, davidzollo, dik111, dybyte, krutoileshii, kuleat, LeonYoah, LiJie20190102, misi1987107, MukjepScarlet, Ruiii-w, Sephiroth1024, Suresh-Krishna-Kusuma, wgzhao, xiaochen-zhou, yzeng1618, zhangshenghang, zooo-code.
