MongoDB is a widely used document-oriented database known for its schema flexibility and strong scalability, making it suitable for a variety of use cases.
This tutorial delves into how to quickly create a stable and efficient data pipeline from MongoDB to MongoDB using BladePipe. In this tutorial, MongoDB instances are configured as replica sets.
About BladePipe
BladePipe is a real-time end-to-end data replication tool, simplifying your data movement between diverse data sources, including databases, message queues, real-time data warehouses, etc.
By using the technique of Change Data Capture (CDC), BladePipe can track, capture and deliver data changes automatically and accurately with ultra-low latency, greatly improving the efficiency of data integration. It provides sound solutions for use cases requiring real-time data replication, fueling data-driven decision-making and business agility.
Highlights
Sync Data from MongoDB
Incremental data in the source MongoDB can be obtained from the oplog.rs
collection in the local
database (replica sets are required).
An event includes the following subdocuments (there are slight differences in different MongoDB versions). BladePipe delivers the data changes by parsing event records:
Subdocument Name | Description |
---|---|
op | Operation type. BladePipe supports operations including c (control operation), i (INSERT), u (UPDATE), d (DELETE). |
ns | Namespace in the format of dbName.collectionName . If collectionName is $cmd , it indicates an operation on the corresponding database. |
ts | Timestamp of the operation, in seconds. |
o | Changed data. It shows the mirroring of data after INSERT/UPDATE operations, and the mirroing of data before DELETE operations. Note that this subdocument in MongoDB 4.x is different from that in other versions. |
o2 | Present only in UPDATE events. It can be regarded as the primary key or identifier for locating data. |
Supported Data Types in MongoDB
In a full data migration from MongoDB or a data synchronization by consuming oplog, data type conversion is crucial for data processing with custom code and data write to target data sources. For this reason, BladePipe is iteratively expanding its support for MongoDB data types.
The supported data types in full data reading from MongoDB include: null, ObjectId, Date, Number, String.
The supported data types in incremental data synchronization from MongoDB oplog include: ObjectId, Date, Number, String, Integer, Long, BigInteger, Double, BigDecimal.
The supported data types are expanding along with the requests from the increasing users.
Procedure
Step 1: Install BladePipe
Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.
Step 2: Add DataSources
- Log in to the BladePipe Cloud.
- Click DataSource > Add DataSource, and add 2 DataSources.
Step 3: Create a DataJob
Click DataJob > Create DataJob.
Select the source and target DataSources, and click Test Connection to ensure the connection to the source and target DataSources are both successful.
Select Incremental for DataJob Type, together with the Full Data option.
Confirm the DataJob creation.
Now the DataJob is created and started, BladePipe will automatically run the following DataTasks:
- Schema Migration: The schemas of the source collections will be migrated to the target instance.
- Full Data Migration: All existing data from the selected source collections will be fully migrated to the target instance.
- Incremental Synchronization: Ongoing data changes will be continuously synchronized to the target instance.
FAQ
Does BladePipe support for MongoDB shards?
Overall, our understanding of MongoDB sharding is less comprehensive than that of replica sets. If a MongoDB shard is used as a source, please configure the replica set of the shard. If it is used as a target, the shard instance is regarded as a single instance.
What should I do if a single collection contains several data types of primary key?
Currently, the full data migration is done by pagination based on "_id" and scanning. If multiple types exist in "_id" field, it can lead to problems in offset tracking or resuming data transmission.
BladePipe will consider handling this situation in the way to handle a database without a primary key in future updates.
Top comments (0)