Apache Kafka is a stream-processing platform best known for its performance: high throughput and low latency. Its persistence layer is essentially a "massively scalable publish/subscribe message queue designed as a distributed transaction log," making it valuable as enterprise-class infrastructure for processing streaming data. As a result, replicating data from one Kafka cluster to another is important for many enterprises.
This tutorial introduces how to use BladePipe to create a Kafka-Kafka real-time data pipeline.
About BladePipe
BladePipe is a real-time end-to-end data replication tool, simplifying your data movement between diverse data sources, including databases, message queues, real-time data warehouses, etc.
By using the technique of Change Data Capture (CDC), BladePipe can track, capture and deliver data changes automatically and accurately with ultra-low latency, greatly improving the efficiency of data integration. It provides sound solutions for use cases requiring real-time data replication, fueling data-driven decision-making and business agility.
Highlights
Pushing Messages
After a DataJob is created, BladePipe automatically creates a consumer group and subscribes to the topics to be synchronized. Then it pulls the messages from the source Kafka and pushes them to the target Kafka.
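BladePipe's internal implementation is its own, but the pull-and-push pattern itself can be sketched with the plain Apache Kafka Java clients. In the minimal sketch below, the broker addresses, consumer group id, and topic name are all placeholder assumptions:

```java
// A minimal sketch of the pull-and-push pattern described above, using the
// plain Apache Kafka Java clients. Broker addresses, group id, and topic
// name are placeholders; BladePipe's actual implementation is internal.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class KafkaToKafkaRelay {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "source-kafka:9092");   // placeholder
        consumerProps.put("group.id", "bladepipe-demo-group");         // placeholder
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "target-kafka:9092");   // placeholder
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders"));                     // topic placeholder
            while (true) {
                // Pull a batch of messages from the source cluster...
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // ...and push each one to the same topic on the target cluster.
                    producer.send(new ProducerRecord<>(record.topic(), record.key(), record.value()));
                }
                producer.flush();
                consumer.commitSync(); // commit offsets only after the batch is delivered
            }
        }
    }
}
```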
Kafka Heartbeat Mechanism
When no messages are produced at the source Kafka, BladePipe cannot accurately calculate the message latency.
To address this, BladePipe supports a Kafka heartbeat. Once the heartbeat is enabled, BladePipe monitors the consumer offsets of all partitions. If the difference between the latest offset and the current offset is smaller than the tolerated offset interval (configured by the dbHeartbeatToleranceStep parameter) for every partition, a heartbeat record containing the current system time is generated. Upon consuming this record, BladePipe calculates the latency from the timestamp it carries.
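As an illustration of the idea (not BladePipe's actual code), the sketch below checks every partition's lag against a tolerance and, when all lags are within it, produces a heartbeat record carrying the current system time. The heartbeat topic name and tolerance value are assumptions mirroring the dbHeartbeatToleranceStep parameter:

```java
// A simplified sketch of the heartbeat idea, not BladePipe's internal code.
// The heartbeat topic name and tolerance value are assumptions.
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Set;

public class HeartbeatSketch {
    static final long TOLERANCE_STEP = 16; // analogous to dbHeartbeatToleranceStep

    // Emit a heartbeat only when every partition's lag is within the tolerance.
    static void maybeEmitHeartbeat(KafkaConsumer<String, String> consumer,
                                   KafkaProducer<String, String> producer,
                                   Set<TopicPartition> partitions) {
        Map<TopicPartition, Long> latest = consumer.endOffsets(partitions);
        boolean allWithinTolerance = partitions.stream()
                .allMatch(tp -> latest.get(tp) - consumer.position(tp) < TOLERANCE_STEP);
        if (allWithinTolerance) {
            // The record carries the current system time; the consuming side
            // later derives latency as (consume time - produce time).
            String payload = Long.toString(System.currentTimeMillis());
            producer.send(new ProducerRecord<>("bladepipe_heartbeat", payload)); // topic name is illustrative
        }
    }

    // On the consuming side: latency in milliseconds from the heartbeat payload.
    static long latencyMillis(String heartbeatPayload) {
        return System.currentTimeMillis() - Long.parseLong(heartbeatPayload);
    }
}
```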
Procedure
Step 1: Grant Permissions
Please refer to Permissions Required for Kafka to grant the required permissions to a user for data movement using BladePipe.
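The authoritative permission list is in that document. For a rough idea of what such grants look like on an ACL-enabled cluster, here is a hedged sketch using Kafka's Admin API; the principal, host, and resource names are placeholders, and the chosen operations (Read on topics and groups, Write on topics, Create on the cluster) are assumptions about a typical replication user:

```java
// Illustrative only: grants a placeholder principal the permissions a typical
// replication user might need. Consult "Permissions Required for Kafka" for
// the authoritative list; principal, host, and resource names are assumptions.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.*;
import org.apache.kafka.common.resource.*;

import java.util.List;
import java.util.Properties;

public class GrantAclsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "source-kafka:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            String principal = "User:bladepipe";              // placeholder principal
            // Read topics (source side) and the consumer group BladePipe creates.
            AclBinding readTopics  = binding(ResourceType.TOPIC, "*", principal, AclOperation.READ);
            AclBinding readGroups  = binding(ResourceType.GROUP, "*", principal, AclOperation.READ);
            // Write and create topics (target side).
            AclBinding writeTopics = binding(ResourceType.TOPIC, "*", principal, AclOperation.WRITE);
            AclBinding createTopic = binding(ResourceType.CLUSTER, "kafka-cluster", principal, AclOperation.CREATE);
            admin.createAcls(List.of(readTopics, readGroups, writeTopics, createTopic)).all().get();
        }
    }

    static AclBinding binding(ResourceType type, String name, String principal, AclOperation op) {
        return new AclBinding(
                new ResourcePattern(type, name, PatternType.LITERAL),
                new AccessControlEntry(principal, "*", op, AclPermissionType.ALLOW));
    }
}
```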
Step 2: Install BladePipe
Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.
Step 3: Add DataSources
- Log in to the BladePipe Cloud.
- Click DataSource > Add DataSource, and add 2 DataSources.
Step 4: Create a DataJob
Click DataJob > Create DataJob.
Select the source and target DataSources and click Test Connection to ensure the connections to both the source and target DataSources are successful (a minimal standalone connectivity check is sketched after these steps).
Select the message format. If no specific message format is required, select Raw Message Format.
Confirm the DataJob creation.
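For reference, a connection test conceptually boils down to reaching the brokers and fetching cluster metadata. The standalone check below only illustrates that idea with placeholder broker addresses; it is not how BladePipe implements Test Connection:

```java
// What "Test Connection" boils down to conceptually: can we reach the brokers
// and read cluster metadata? Broker addresses are placeholders.
import org.apache.kafka.clients.admin.Admin;

import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class ConnectionCheck {
    public static void main(String[] args) throws Exception {
        for (String servers : new String[]{"source-kafka:9092", "target-kafka:9092"}) { // placeholders
            Properties props = new Properties();
            props.put("bootstrap.servers", servers);
            props.put("request.timeout.ms", "5000");
            try (Admin admin = Admin.create(props)) {
                String clusterId = admin.describeCluster().clusterId().get(10, TimeUnit.SECONDS);
                System.out.println(servers + " reachable, cluster id: " + clusterId);
            }
        }
    }
}
```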
Now the DataJob is created and started. BladePipe will automatically run the following DataTasks:
- Schema Migration: The topics will be created automatically in the target instance if they don't exist already (see the sketch after this list).
- Incremental Data Synchronization: Ongoing data changes will be continuously synchronized to the target instance.
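For Kafka targets, the Schema Migration step essentially amounts to creating any missing topics. A hedged sketch of that idea with Kafka's Admin API follows; the topic names, partition count, and replication factor are placeholders, not BladePipe's actual defaults:

```java
// A hedged sketch of what "Schema Migration" amounts to for Kafka: create any
// missing topics on the target. Topic names, partition count, and replication
// factor are placeholder assumptions.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

public class EnsureTopicsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "target-kafka:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            Set<String> existing = admin.listTopics().names().get();
            List<NewTopic> missing = List.of("orders", "payments").stream() // placeholder topics
                    .filter(t -> !existing.contains(t))
                    .map(t -> new NewTopic(t, 3, (short) 2)) // 3 partitions, RF 2: assumptions
                    .collect(Collectors.toList());
            if (!missing.isEmpty()) {
                admin.createTopics(missing).all().get();
            }
        }
    }
}
```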
Conclusion
In this tutorial, a data pipeline from Kafka to Kafka is created in minutes with 4 steps using BladePipe. This is just a glimpse of BladePipe's real-time end-to-end data replication capabilities. To discover more, visit https://www.bladepipe.com/