A Closer Look at Connector Development with Apache SeaTunnel’s New API

Writer/ Fan Jia, SeaTunnel contributor

Translator/Critina

Introduction

After several days of community development, the preliminary development of SeaTunnel's new Connector API is complete, and the adaptation work will continue over the next few days. To help developers get started with the new API quickly, I wrote this article to introduce the development process for the new version of the API.

01 Preparation

Environment configuration: JDK 8 and Scala 2.11 are recommended.
As before, download the latest code locally with git and import it into your IDE. The project is hosted at https://github.com/apache/incubator-seatunnel. Switch to the api-draft branch, which is currently used to develop the new version of the API and the corresponding Connectors. The structure of the project is as follows:

02 Things you need to know

To distinguish between the different kinds of Connectors, the existing Flink and Spark Connectors are kept in the seatunnel-connectors/seatunnel-connectors-flink(spark) modules, while Connectors built on the new API are developed in the seatunnel-connectors/seatunnel-connectors-seatunnel module. As we can see from the above, the Fake, Console, and Kafka Connectors are already supported, and the Clickhouse Connector is under development.
Currently, the supported data type is SeaTunnelRow, so the data a Source produces and the data a Sink consumes should both be of type SeaTunnelRow.

03 Start

Taking the Fake Connector as an example, let us walk through how to implement a new Connector.

First, create the corresponding module under seatunnel-connectors-seatunnel, at the same level as the other new-version Connectors.
Then modify the seatunnel-connectors-seatunnel/pom.xml file and add the new module to its modules list. Next, modify the seatunnel-connectors-seatunnel/seatunnel-connector-seatunnel-fake/pom.xml file to add the seatunnel-api dependency and the correct reference to the parent. The example is as follows:

Then create the corresponding package and the related classes. Create FakeSource, which implements SeaTunnelSource. SeaTunnelSource is designed to unify stream and batch computing: whether a source is a stream source or a batch source is determined by getBoundedness. You can therefore use dynamic configuration (see the default method) to make a Source run as stream or batch, and read the custom settings from the user's configuration file in the prepare method. Next, create FakeSourceReader, FakeSourceSplitEnumerator, and FakeSourceSplit, each of which inherits from its corresponding abstract class. Once the required methods of these classes are implemented, the SeaTunnel Source Connector is complete.
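
As a rough illustration of this design, the sketch below shows a source that runs as a stream by default and switches to batch when the user's configuration asks for it. SketchBoundedness, SketchSource, and FakeSketchSource are hypothetical stand-ins written for illustration, not the actual SeaTunnel interfaces, and reading the mode from a job.mode key inside prepare is an assumption made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: BOUNDED means batch, UNBOUNDED means stream.
enum SketchBoundedness { BOUNDED, UNBOUNDED }

// Hypothetical stand-in for a unified stream/batch source interface.
interface SketchSource {
    // Receives the user's configuration before the job starts.
    void prepare(Map<String, String> config);

    // Tells the engine whether this source is a batch or a stream source.
    SketchBoundedness getBoundedness();
}

final class FakeSketchSource implements SketchSource {
    // Stream mode unless the configuration says otherwise.
    private SketchBoundedness boundedness = SketchBoundedness.UNBOUNDED;

    @Override
    public void prepare(Map<String, String> config) {
        // Assumption: the mode arrives under the key "job.mode".
        String mode = config.get("job.mode");
        boolean batch = mode != null && "BATCH".equalsIgnoreCase(mode.trim());
        boundedness = batch ? SketchBoundedness.BOUNDED : SketchBoundedness.UNBOUNDED;
    }

    @Override
    public SketchBoundedness getBoundedness() {
        return boundedness;
    }
}

final class BoundednessDemo {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("job.mode", "BATCH");

        FakeSketchSource source = new FakeSketchSource();
        source.prepare(config);
        System.out.println(source.getBoundedness()); // prints BOUNDED
    }
}
```

Keeping the decision inside getBoundedness means the same class can serve both modes, which is the point of the unified design described above.
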
Write the corresponding code by following the existing examples. Of these classes, FakeSourceReader matters the most: it defines how data is retrieved from the external system and is the key to a Source Connector. Every record produced must be put into the collector, as shown below:
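
A minimal sketch of this reader pattern follows. It uses hypothetical stand-in types (SketchCollector and FakeSketchReader) rather than the real SeaTunnel classes, and the method name pollNext is chosen for illustration only. The rows are generated in memory, in the spirit of a Fake source, instead of being fetched from an external system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the collector that the engine hands to a reader.
interface SketchCollector<T> {
    void collect(T record);
}

// Hypothetical reader that fabricates rows instead of calling an external system.
final class FakeSketchReader {
    private final int rowCount;

    FakeSketchReader(int rowCount) {
        this.rowCount = rowCount;
    }

    // Emits every produced record through the collector, one at a time.
    void pollNext(SketchCollector<Object[]> output) {
        for (int i = 0; i < rowCount; i++) {
            output.collect(new Object[] {"name-" + i, i});
        }
    }
}

final class ReaderDemo {
    public static void main(String[] args) {
        // A list stands in for whatever the engine does with collected rows.
        List<Object[]> received = new ArrayList<>();
        new FakeSketchReader(3).pollNext(received::add);

        for (Object[] row : received) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The reader never needs to know what happens to a record after collect is called, which is what lets the same reader run on different engines.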

After writing the code, update the plugin-mapping.properties configuration file under the seatunnel-connectors module. Add ‘seatunnel.Source.FakeSource = seatunnel-connector-fake’, which tells SeaTunnel which jar package to load when it looks up a source named FakeSource. The Connector can then be used in the configuration file.
For a detailed description of writing Source, Sink, and SeaTunnel API code, please refer to seatunnel-connectors/seatunnel-connectors-seatunnel/README.zh.md.
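
To show how such a mapping entry can be resolved, here is a minimal sketch that parses the line with the standard java.util.Properties class and looks up the jar name by key. PluginMappingSketch and its methods are hypothetical, and the way the key is assembled from three parts is an assumption based only on the entry quoted above, not a description of SeaTunnel's actual plugin discovery code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PluginMappingSketch {
    // Parses properties-format text such as the contents of plugin-mapping.properties.
    static Properties parse(String text) {
        Properties mapping = new Properties();
        try {
            mapping.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mapping;
    }

    // Assumption: the key has the form <engine>.<pluginType>.<pluginName>.
    // Returns null when no entry exists for the requested plugin.
    static String lookupJarName(Properties mapping, String engine,
                                String pluginType, String pluginName) {
        return mapping.getProperty(engine + "." + pluginType + "." + pluginName);
    }

    public static void main(String[] args) {
        Properties mapping =
                parse("seatunnel.Source.FakeSource = seatunnel-connector-fake\n");
        System.out.println(
                lookupJarName(mapping, "seatunnel", "Source", "FakeSource"));
    }
}
```

Property keys are case-sensitive, so the entry in the file has to match the name used in the job configuration exactly.
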

04 Testing

After writing the Connector, we need to test it. Find the seatunnel-flink(spark)-new-connector-example module under seatunnel-examples, and test against each engine to ensure that the Connector behaves as consistently as possible across engines. If there are differences, note them in the documentation. To run a test, modify the resource configuration file to add the Connector's configuration, add the Connector as a dependency in the seatunnel-flink(spark)-new-connector-example/pom.xml file, and then execute SeaTunnelApiExample.
Stream processing mode is the default. You can switch to batch mode by setting job.mode = BATCH in the env configuration.

05 Submitting the PR

When the Connector is ready, submit a PR on GitHub. Once your code has been reviewed and accepted by the community, you have successfully made your contribution!

About SeaTunnel

SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day in a stable and efficient manner.

Why do we need SeaTunnel?

SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

SeaTunnel Usage Scenarios

Massive data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with SeaTunnel quickly?

Want to experience SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite everyone interested in taking local open source global to join the SeaTunnel contributor family and grow open source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-10u1eujlc-g4E~ppbinD0oKpGeoo_dAw

Follow Twitter:

https://twitter.com/ASFSeaTunnel