In today's data-driven landscape, ensuring real-time data synchronization between databases is crucial for maintaining data consistency and enabling immediate data access across systems. This guide presents a detailed approach for synchronizing multiple master-detail tables from one Oracle database to another in real time using Apache Kafka and Kafka Connect's built-in JDBC connectors. This solution offers a lightweight alternative to using Debezium or other third-party tools, streamlining the process while minimizing operational overhead.
Table of Contents
- Introduction
- Architecture Overview
- Understanding the Capabilities and Limitations
- Source Oracle Database Configuration
- Kafka Connect JDBC Source Connector Configuration
- Kafka Connect JDBC Sink Connector Configuration
- Handling Master-Detail Table Relationships
- Performance Optimization and Monitoring
- Deployment Considerations
- Conclusion
Introduction
Synchronizing data between Oracle databases in real time can be challenging, especially when dealing with high volumes of transactions and complex master-detail relationships. While tools like Debezium provide robust change data capture (CDC) capabilities, they can add complexity and overhead to your architecture.
This guide explores how to leverage Kafka Connect's built-in JDBC connectors to achieve near real-time synchronization of Oracle tables. We will cover the necessary configurations, best practices, and considerations to implement this solution effectively.
Architecture Overview
The proposed architecture consists of three main components:
- Source Layer: Oracle database from which data changes are captured using Kafka Connect's JDBC source connector.
- Messaging Layer: Apache Kafka cluster that transports and stores the change events.
- Target Layer: Oracle database where the changes are applied using Kafka Connect's JDBC sink connector.
This architecture leverages Kafka Connect’s managed connectors to simplify implementation, reduce the impact on the source database, and ensure scalability.
Understanding the Capabilities and Limitations
Before proceeding, it's crucial to understand the capabilities and limitations of using Kafka Connect's built-in JDBC connectors for real-time synchronization:
- Data Capture: The JDBC source connector polls the database at regular intervals to capture new data based on specified criteria.
- Change Types: The connector can capture inserts and, with certain configurations, updates. It cannot detect deletes, and it does not provide full change data capture (CDC).
- Latency: The polling mechanism introduces latency determined by the poll.interval.ms setting.
- Consistency: Without CDC, there is a risk of missing changes that occur between polls, especially in high-throughput environments.
For scenarios requiring full CDC with support for inserts, updates, and deletes, consider using tools like Debezium or Oracle GoldenGate.
Source Oracle Database Configuration
Enabling Necessary Database Features
While the JDBC source connector does not require supplemental logging or CDC features on the Oracle database, certain configurations can improve performance and reliability.
-- (Optional) Increase the size of the redo logs to handle load
ALTER DATABASE ADD LOGFILE GROUP 4 (
'/path/to/logfile4a.log',
'/path/to/logfile4b.log'
) SIZE 100M;
-- (Optional) Adjust undo retention to ensure consistent reads
ALTER SYSTEM SET UNDO_RETENTION = 900 SCOPE=BOTH;
Creating a Dedicated User for Kafka Connect JDBC Source Connector
Create a user with the minimal privileges the connector needs:
CREATE USER kafka_user IDENTIFIED BY password;
GRANT CONNECT TO kafka_user;
-- Grant read access only to the tables being replicated
GRANT SELECT ON INVENTORY.CUSTOMERS TO kafka_user;
GRANT SELECT ON INVENTORY.ORDERS TO kafka_user;
GRANT SELECT ON SALES.PRODUCTS TO kafka_user;
These permissions allow the Kafka Connect JDBC source connector to read the replicated tables. Granting SELECT ANY TABLE also works, but it is far broader than the connector requires.
Kafka Connect JDBC Source Connector Configuration
The JDBC source connector imports data from the source Oracle database into Kafka topics. Because the connector's schema.pattern setting accepts a single schema pattern, the example below covers the tables in the INVENTORY schema; a second connector covering the SALES schema is shown after the configuration notes.
{
  "name": "oracle-source-inventory-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:oracle:thin:@source-oracle-host:1521/ORCLCDB",
    "connection.user": "kafka_user",
    "connection.password": "password",
    "table.whitelist": "CUSTOMERS,ORDERS",
    "schema.pattern": "INVENTORY",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "LAST_UPDATE_TIME",
    "incrementing.column.name": "ID",
    "topic.prefix": "oracle-source-INVENTORY.",
    "poll.interval.ms": "5000",
    "validate.non.null": "true",
    "numeric.mapping": "best_fit"
  }
}
Key Configuration Parameters:
- table.whitelist: The tables to capture data from, resolved within the schema selected by schema.pattern.
- schema.pattern: The schema whose table metadata is fetched. This takes a single pattern, not a comma-separated list, which is why tables spread across several schemas are handled with one connector per schema.
- mode: Set to "timestamp+incrementing" to capture inserts and updates based on a timestamp column combined with an incrementing column.
- timestamp.column.name: The column used to detect new or updated rows (e.g., LAST_UPDATE_TIME).
- incrementing.column.name: The unique, strictly incrementing column (usually the primary key).
- topic.prefix: Prefix prepended to each table name to form the Kafka topic name; here it yields topics such as oracle-source-INVENTORY.CUSTOMERS, matching the sink configuration below.
- poll.interval.ms: Interval in milliseconds at which the connector polls for new data.
Note: Ensure that the timestamp.column.name and incrementing.column.name columns are properly indexed for optimal performance.
Kafka Connect JDBC Sink Connector Configuration
The JDBC sink connector writes Kafka topic data to the target Oracle database. Below is an example configuration:
{
  "name": "oracle-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "oracle-source-INVENTORY.CUSTOMERS,oracle-source-INVENTORY.ORDERS,oracle-source-SALES.PRODUCTS",
    "connection.url": "jdbc:oracle:thin:@target-oracle-host:1521/ORCLCDB",
    "connection.user": "kafka_user",
    "connection.password": "password",
    "auto.create": "true",
    "auto.evolve": "true",
    "insert.mode": "upsert",
    "delete.enabled": "false",
    "pk.mode": "record_value",
    "pk.fields": "ID",
    "table.name.format": "${topic}",
    "batch.size": "3000"
  }
}
Key Configuration Parameters:
- insert.mode: Set to "upsert" to handle both inserts and updates.
- pk.mode: Specifies how the primary key is determined (record_value uses fields from the record value).
- pk.fields: The fields used as primary keys in the target database.
- auto.create and auto.evolve: Automatically create tables and evolve their schemas in the target database.
- delete.enabled: Set to false, since deletes are not captured by the JDBC source connector in this setup.
Note: The table.name.format parameter controls how topics are mapped to target table names. With the schema-qualified topic names used here, ${topic} would yield unwieldy table names such as oracle-source-INVENTORY.CUSTOMERS, so it is common to rename the topic with a transform before it reaches the sink, as shown below.
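One way to handle this mapping is the RegexRouter single message transform (SMT) that ships with Kafka Connect. The fragment below is a sketch to merge into the sink connector's config block; it renames a topic such as oracle-source-INVENTORY.CUSTOMERS to CUSTOMERS before the sink resolves table.name.format, so the target table becomes CUSTOMERS.
{
  "transforms": "stripPrefix",
  "transforms.stripPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.stripPrefix.regex": "oracle-source-[^.]+\\.(.*)",
  "transforms.stripPrefix.replacement": "$1",
  "table.name.format": "${topic}"
}
If the same table name exists in more than one schema, keep the schema qualifier in the replacement or run separate sink connectors per schema so two sources are not written into one target table.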
Handling Master-Detail Table Relationships
Correctly handling master-detail relationships requires careful configuration to preserve referential integrity on the target.
Ensuring Data Consistency
- Ordering: Kafka preserves ordering only within a single topic partition, so there is no guarantee that a master row is applied before its detail rows when they travel through separate topics. In practice, either allow a brief window of inconsistency (upserts converge once both sides have been applied) or relax foreign-key enforcement on the target, for example with deferrable constraints.
- Transactions: Kafka Connect does not apply changes in multi-statement transactions that span topics. The batch.size parameter only controls how many records the sink writes per batch; it does not provide atomicity across tables.
Custom Query Modes
If necessary, use the query setting to control exactly what the connector retrieves, for example to denormalize a master-detail join into a single topic. When query is combined with an incremental mode, the connector appends its own WHERE clause containing the timestamp and incrementing criteria, so the supplied query must not contain a WHERE clause of its own (wrap joins or filters in a subquery instead):
"query": "SELECT * FROM INVENTORY.CUSTOMERS"
This lets you fine-tune the data capture process to suit complex relationships; a fuller sketch follows.
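As a fuller illustration, the sketch below defines a dedicated connector (query cannot be combined with table.whitelist) that publishes a denormalized view of orders together with their customer. The join column CUSTOMER_ID and the NAME column are assumptions for illustration, and the outer subquery keeps the generated WHERE clause unambiguous. Note that when query is used, topic.prefix is taken as the complete topic name rather than a prefix.
{
  "name": "oracle-source-orders-enriched",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:oracle:thin:@source-oracle-host:1521/ORCLCDB",
    "connection.user": "kafka_user",
    "connection.password": "password",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "LAST_UPDATE_TIME",
    "incrementing.column.name": "ID",
    "query": "SELECT * FROM (SELECT O.ID, O.LAST_UPDATE_TIME, O.CUSTOMER_ID, C.NAME AS CUSTOMER_NAME FROM INVENTORY.ORDERS O JOIN INVENTORY.CUSTOMERS C ON O.CUSTOMER_ID = C.ID)",
    "topic.prefix": "oracle-source-ORDERS_ENRICHED",
    "poll.interval.ms": "5000"
  }
}
Add the resulting topic to the sink connector's topics list if the denormalized view should also be replicated to the target database.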
Performance Optimization and Monitoring
Optimizing Kafka Connect Source Connector
- Increase tasks.max: Parallelize data ingestion across multiple tasks if the connector processes multiple tables.
- Proper Indexing: Ensure that timestamp.column.name and incrementing.column.name are indexed.
- Adjust poll.interval.ms: Balance latency and processing load by configuring an appropriate polling interval (a combined example follows this list).
Optimizing Kafka Connect Sink Connector
- Batch Processing: Adjust the batch.size parameter to optimize writes to the target database (an illustrative fragment follows this list).
- Connection Pooling: Configure the JDBC driver to use connection pooling for better performance.
Monitoring
- Kafka Connect REST API: Monitor connector status, task configurations, and runtime errors (an example status payload follows this list).
- JMX Metrics: Enable JMX to collect detailed metrics for connectors and tasks.
- Logging: Configure logging levels appropriately to capture necessary information without overwhelming the logging system.
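For example, querying a connector's status endpoint (GET /connectors/oracle-source-inventory-connector/status) returns a JSON payload roughly shaped like the sketch below; the worker addresses are placeholders. A state other than RUNNING for the connector or any task is the signal to inspect the failing task's trace field and the worker logs.
{
  "name": "oracle-source-inventory-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "connect-worker-1:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "connect-worker-1:8083"
    }
  ],
  "type": "source"
}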
Deployment Considerations
Infrastructure Requirements
- Kafka Connect Cluster: Deploy in distributed mode for scalability and fault tolerance.
- Kafka Brokers: Use a multi-broker Kafka cluster to handle throughput and provide resilience.
- Database Resources: Ensure that both source and target Oracle databases have sufficient resources to handle additional load.
Best Practices
- Staging Environment: Test configurations in a non-production environment to validate behavior.
- Security: Use secure connections (e.g., SSL/TLS) for database and Kafka connections.
- High Availability: Implement appropriate backup and failover mechanisms for the databases and Kafka cluster.
Conclusion
Leveraging Kafka Connect's built-in JDBC connectors provides a lightweight solution for synchronizing Oracle tables in near real time. While this approach has limitations—particularly in capturing deletes and providing full CDC—it can be effective for scenarios that primarily involve inserts and updates.
By carefully configuring the connectors and understanding the operational considerations, you can achieve reliable data replication between Oracle databases, maintaining data consistency and supporting your organization's real-time data needs.
Additional Resources
- Kafka Connect JDBC Connector Documentation
- Oracle JDBC Driver Documentation
- Kafka Connect Best Practices
Disclaimer: The configurations and examples provided in this guide are for illustrative purposes. Please refer to the official Apache Kafka and Oracle documentation for the most up-to-date and detailed information.