MySQL HeatWave: Replication Channels and Troubleshooting

Replication channels are the cornerstone of MySQL HeatWave inbound replication, establishing the connection between source databases and replica DB systems. Understanding channel states, operations, and troubleshooting techniques is essential for maintaining reliable data replication. This guide covers channel management, lifecycle states, common errors, and proven troubleshooting strategies.

Understanding Replication Channels

A replication channel represents a path through which changes produced by transactions on a source MySQL server or DB system are replicated to a target DB system.

What Channels Do

Each channel identifies:

  • A replication source (hostname, port, credentials)
  • Configuration required to connect to that source
  • How to copy transactions from source to replica
  • Replication positioning (GTID or binary log)

Important: Channels have their own lifecycle and can be enabled, disabled, reset, and resumed independently.
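
For scripted inspection, the OCI CLI can list channels and their states; a minimal sketch, assuming a configured CLI and the standard kebab-case response fields:

oci mysql channel list \
  --compartment-id <COMPARTMENT_OCID> \
  --query 'data[].{name:"display-name",state:"lifecycle-state"}' \
  --output table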

Channel States

MySQL HeatWave channels exist in either final states (stable) or transient states (temporary during operations).

Final States

1. ACTIVE

Channel is replicating data successfully.

Characteristics:

  • Both IO and SQL threads running
  • Transactions flowing from source to replica
  • May have slight replication lag (normal)
  • No errors reported

Console Indicator: Green icon with "Active" text

2. INACTIVE

Channel is not replicating; replication has been stopped.

Common Reasons:

  • Channel manually disabled by administrator
  • Created with "Enable automatically" unchecked
  • DB system is stopped

Console Indicator: Gray icon with "Inactive" text

To Resume: Enable the channel through the console or CLI

3. NEEDS_ATTENTION

Channel is not replicating due to an error.

Common Causes:

  • Source server unreachable
  • Authentication failure
  • Binary log files purged on source
  • SSL/TLS configuration mismatch
  • Incompatible transactions
  • DB system is in INACTIVE state

Console Indicator: Red/orange icon with "Needs Attention" text

Action Required: Check the channel details page for error messages, resolve the issue, then resume the channel
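
The error details can also be pulled from the CLI; a sketch, assuming the channel resource exposes the usual lifecycle fields:

oci mysql channel get \
  --channel-id <CHANNEL_OCID> \
  --query 'data.{state:"lifecycle-state",details:"lifecycle-details"}'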

4. DELETED

Channel has been permanently removed.

Characteristics:

  • Appears briefly before removal from list
  • Cannot be recovered
  • All channel configuration deleted

Transient States

These are temporary states during channel operations:

1. CREATING

Channel is being created and configured.

Duration: Typically 1-3 minutes

Next State: ACTIVE (if auto-enabled) or INACTIVE

2. UPDATING

Channel is performing an update operation.

Occurs During:

  • Enabling or disabling channel
  • Modifying channel configuration
  • Resuming after error resolution
  • DB system operations affecting channel

Duration: Typically 30 seconds to 2 minutes

3. DELETING

Channel deletion is in progress.

Duration: Typically 1-2 minutes

Next State: DELETED
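
Automation that drives channel operations usually polls until the channel settles into a final state; a small bash sketch built on the CLI (the field name is an assumption based on the standard OCI response shape):

# Poll every 30 seconds until the channel leaves its transient state
while true; do
  state=$(oci mysql channel get \
    --channel-id <CHANNEL_OCID> \
    --query 'data."lifecycle-state"' --raw-output)
  echo "Channel state: ${state}"
  case "${state}" in
    CREATING|UPDATING|DELETING) sleep 30 ;;
    *) break ;;
  esac
done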

Channel Operations

Enabling a Channel

Starts replication from the source if no errors exist.

Console:

  1. Navigate to Databases → HeatWave MySQL → Channels
  2. Select channel → Actions → Enable
  3. Channel transitions: INACTIVE → UPDATING → ACTIVE

CLI:

oci mysql channel update \
  --channel-id <CHANNEL_OCID> \
  --is-enabled true

Disabling a Channel

Stops replication and puts channel into INACTIVE state.

When to Disable:

  • Maintenance on source server
  • Temporary halt of replication
  • Troubleshooting issues
  • Before configuration changes

Console: Channel → Actions → Disable

CLI:

oci mysql channel update \
  --channel-id <CHANNEL_OCID> \
  --is-enabled false

Resuming a Channel

Restarts replication after error is resolved. Used when channel is in NEEDS_ATTENTION state.

Console: Channel → Actions → Resume

CLI:

oci mysql channel resume \
  --channel-id <CHANNEL_OCID>

Behavior: Channel transitions to UPDATING, then ACTIVE if error resolved

Resetting a Channel

Clears all replication position data while keeping the channel configuration. The channel must be INACTIVE first.

Warning: Resets the replication position; transactions may be re-replicated once the channel is re-enabled

Console: Channel → Actions → Reset

CLI:

oci mysql channel reset \
  --channel-id <CHANNEL_OCID>

Use Cases:

  • Recovering from corrupted replication position
  • Starting replication from beginning
  • After major source database changes

Deleting a Channel

Permanently removes channel and all associated configuration.

Console: Channel details → Delete

CLI:

oci mysql channel delete \
  --channel-id <CHANNEL_OCID> \
  --force

Troubleshooting Inbound Replication

Step 1: Identify Source of Failure

When channel enters NEEDS_ATTENTION state, determine the root cause:

Check Channel Details Page

  1. Navigate to channel in console
  2. Click channel name to view details
  3. Review Error section for messages
  4. Check Last Error timestamp
  5. Note error code and description

Check Replication Status on Replica

Connect to replica DB system and query:

SHOW REPLICA STATUS\G

Key Fields:

  • Replica_IO_Running: Should be "Yes" (checks source connection)
  • Replica_SQL_Running: Should be "Yes" (checks transaction application)
  • Last_IO_Errno: IO thread error code
  • Last_IO_Error: IO thread error message
  • Last_SQL_Errno: SQL thread error code
  • Last_SQL_Error: SQL thread error message
  • Seconds_Behind_Source: Replication lag

Performance Schema Query:

SELECT * FROM performance_schema.replication_connection_status\G
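
The applier side has a matching set of views; LAST_ERROR_MESSAGE in the worker view often carries more detail than SHOW REPLICA STATUS:

SELECT CHANNEL_NAME, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker\G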

Step 2: Identify Communication Failures

Verify connectivity between replica and source:

Test Network Connectivity

From the replica DB system or a compute instance in the same VCN:

# Test if source is reachable
telnet <source_hostname> 3306

# Or use netcat
nc -zv <source_hostname> 3306

# Check DNS resolution
nslookup <source_hostname>

Test Replication User Authentication

mysql -h <source_hostname> \
      -P 3306 \
      -u <repl_user> \
      -p

If this succeeds, authentication works and the issue lies elsewhere.

Common Error Codes and Solutions

Error 2003: Can't Connect to MySQL Server

Error Message: "Can't connect to MySQL server on '<host>' (<errno>)"

Meaning: Replica cannot establish network connection to source

Common Causes:

  1. Source MySQL server not running
  2. Firewall blocking port 3306
  3. Network path doesn't exist
  4. Wrong hostname or IP address
  5. Source server not listening on expected port

Solutions:

Verify source is running:

-- On source server
SHOW GLOBAL STATUS LIKE 'Uptime';

Check firewall rules:

# On source server (Linux)
sudo iptables -L | grep 3306

# Or for firewalld
sudo firewall-cmd --list-all | grep 3306

Verify MySQL listening on port:

# On source server
sudo netstat -tlnp | grep 3306
# Or
sudo ss -tlnp | grep 3306

Check bind-address:

-- On source server
SHOW VARIABLES LIKE 'bind_address';

Should be 0.0.0.0 or a specific routable IP (not 127.0.0.1) when remote connections are required
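
If it is bound to 127.0.0.1, update the server configuration and restart MySQL (for example, sudo systemctl restart mysqld); a sketch, noting that the config file path varies by distribution:

# /etc/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
bind_address = 0.0.0.0  # or the address of the interface the replica reaches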

Test from replica network:

telnet <source_ip> 3306

Error 1045: Access Denied

Error Message: "Access denied for user '<user>'@'<host>' (using password: YES/NO)"

Meaning: Authentication failed; the credentials are wrong or the user is not allowed to connect from the replica's IP

Common Causes:

  1. Incorrect username or password
  2. User not granted access from replica IP
  3. User exists but lacks REPLICATION SLAVE privilege
4. Anonymous (blank) users on the source shadowing the replication user

Solutions:

Verify replication user exists:

-- On source server
SELECT User, Host FROM mysql.user WHERE User = 'repl_user';

Check user privileges:

SHOW GRANTS FOR 'repl_user'@'%';

Should include: GRANT REPLICATION SLAVE ON *.*
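
If the user is missing or lacks the grant, a minimal sketch of creating it (the % host pattern is an example; restrict it to the replica's subnet where possible):

-- On source server
CREATE USER 'repl_user'@'%' IDENTIFIED BY '<strong_password>';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';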

Verify replica IP is allowed:

-- On source
SELECT User, Host FROM mysql.user WHERE User = 'repl_user';

Host should be %, replica subnet (e.g., 10.0.1.%), or specific IP
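
If the account exists but with the wrong host, RENAME USER moves it and its privileges follow the account; the host values here are examples:

-- On source
RENAME USER 'repl_user'@'localhost' TO 'repl_user'@'10.0.1.%';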

Remove anonymous users (a common issue):

-- On source
SELECT User, Host FROM mysql.user WHERE User = '';
DROP USER ''@'localhost';
DROP USER ''@'<hostname>';
FLUSH PRIVILEGES;

Test authentication:

mysql -h <source> -u repl_user -p

Error 3159: Insecure Connection

Error Message: "Connections using insecure transport are prohibited while --require_secure_transport=ON"

Meaning: Source requires SSL/TLS but channel not configured for it

Solutions:

Option 1: Configure SSL in channel (recommended)

  1. Edit channel configuration
  2. Set SSL mode to "REQUIRED" or higher
  3. Provide SSL certificates if using VERIFY_CA or VERIFY_IDENTITY
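
To confirm the source accepts TLS for the replication user before touching the channel, a quick client-side check (Ssl_cipher is empty when the session is unencrypted):

mysql -h <source_hostname> -u repl_user -p --ssl-mode=REQUIRED \
  -e "SHOW SESSION STATUS LIKE 'Ssl_cipher';"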

Option 2: Disable require_secure_transport on source (not recommended)

-- On source server
SET GLOBAL require_secure_transport = OFF;

Verify SSL configuration:

-- On source
SHOW VARIABLES LIKE 'require_secure_transport';
SHOW VARIABLES LIKE 'have_ssl';

Error 13114: Binary Log Files Purged

Error Message: "Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'" or "The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires"

Meaning: Binary log files needed by replica have been deleted from source

Common Causes:

  1. GTID not supported or configured
  2. Binary log retention too short
  3. Long replication outage
  4. Binary logs manually purged

Solution 1: Fix GTID-Purged (For GTID-based replication)

Step 1: Retrieve gtid_executed from logical dump

When exporting source data with MySQL Shell:

mysqlsh user@source -- util dump-instance /path/to/dump \
  --ocimds=true

Capture gtid_executed value from dump metadata.

Or query source directly:

-- On source
SHOW GLOBAL VARIABLES LIKE 'gtid_executed';

Step 2: Set gtid_purged on replica

Connect to replica and execute:

-- On replica
CALL sys.SET_GTID_PURGED('<gtid_executed_value>');

Example:

CALL sys.SET_GTID_PURGED('3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5');

Step 3: Resume channel

After setting gtid_purged, resume replication:

oci mysql channel resume --channel-id <CHANNEL_OCID>

Solution 2: Rebuild Replication (If GTID fix fails)

For severe cases:

Step 1: Delete channel and replica DB system

oci mysql channel delete --channel-id <CHANNEL_OCID> --force
oci mysql db-system delete --db-system-id <DB_SYSTEM_OCID> --force

Step 2: Take new snapshot of source

mysqlsh user@source -- util dump-instance /path/to/new-dump \
  --excludeSchemas=mysql,sys,performance_schema,information_schema \
  --ocimds=true

Step 3: Create new DB system

Create DB system and import data during creation or after.

Step 4: Create and enable channel

Create new channel with proper GTID positioning from new dump.

Best Practices for Channel Management

1. Monitor Channel Health

Set up alerts:

  • Channel state changes to NEEDS_ATTENTION
  • Replication lag exceeds threshold (e.g., 60 seconds)
  • IO or SQL thread stops

Regular checks:

SHOW REPLICA STATUS\G
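
For fleet-level checks, the channel list can be filtered by state; a sketch, assuming your CLI version supports the --lifecycle-state filter:

oci mysql channel list \
  --compartment-id <COMPARTMENT_OCID> \
  --lifecycle-state NEEDS_ATTENTION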

2. Maintain Adequate Binary Log Retention

On source server:

-- Set binary log expiration (seconds)
SET GLOBAL binlog_expire_logs_seconds = 172800; -- 48 hours

For AWS RDS:

CALL mysql.rds_set_configuration('binlog retention hours', 48);
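
Verify the effective retention on the source and see which log files still exist:

-- On source server
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
SHOW BINARY LOGS;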

3. Document Channel Configuration

Maintain records of:

  • Source connection details
  • Replication user credentials
  • SSL configuration
  • GTID settings
  • Network paths used

4. Test Before Production

  • Create test channel in non-production
  • Verify connectivity
  • Test failover scenarios
  • Document lessons learned

5. Plan for Maintenance Windows

Before source maintenance:

  1. Verify replication lag is minimal
  2. Optionally disable channel
  3. Perform maintenance
  4. Re-enable channel
  5. Monitor for catch-up

6. Use GTID When Possible

Benefits:

  • Auto-positioning simplifies management
  • Easier recovery from failures
  • Better failover capabilities
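
Before creating a channel, confirm GTID is fully enabled on the source:

-- On source server
SHOW VARIABLES LIKE 'gtid_mode';                -- should be ON
SHOW VARIABLES LIKE 'enforce_gtid_consistency'; -- should be ON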

Advanced Troubleshooting

Replication Lag Increasing

Symptoms: Seconds_Behind_Source continuously growing

Diagnose:

-- Check what's being executed
SHOW REPLICA STATUS\G

-- Check running queries on replica
SHOW PROCESSLIST;

-- Check for lock waits
SELECT * FROM sys.innodb_lock_waits;

Solutions:

  1. Increase replica resources (upgrade shape)
  2. Optimize slow queries on source
  3. Break large transactions into smaller batches
  4. Check for long-running transactions

Channel Constantly Failing

Symptoms: Channel repeatedly enters NEEDS_ATTENTION

Diagnose:

  1. Check error patterns (same error repeating?)
  2. Review source server logs
  3. Check network stability
  4. Verify source server load

Solutions:

  • Fix root cause (network, authentication, binary logs)
  • Consider increasing connection timeouts
  • Upgrade source server resources if overloaded

MySQL Schema Changes Causing Failures

Remember: Changes to the mysql schema are not replicated and can cause replication to stop

Solution:

  • Manage user accounts independently
  • Skip the problematic transaction (binary log positioning only; GTID channels reject the skip counter, see the sketch below):

  STOP REPLICA;
  SET GLOBAL sql_replica_skip_counter = 1;
  START REPLICA;

  • Better: Avoid mysql schema changes during active replication
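
With GTID auto-positioning, which HeatWave channels typically use, the usual workaround is instead to commit an empty transaction for the failing GTID. A minimal sketch; substitute the UUID:sequence reported in Last_SQL_Error, and note that on a managed replica you may need to disable and resume the channel rather than run STOP/START REPLICA directly:

STOP REPLICA;
-- GTID of the failing transaction, from Last_SQL_Error / Executed_Gtid_Set
SET GTID_NEXT = '<source_uuid>:<transaction_id>';
BEGIN;
COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;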

Effective channel management and troubleshooting are crucial for reliable MySQL HeatWave inbound replication. By understanding channel states, recognizing common error codes, and following best practices, you can maintain healthy replication and quickly resolve issues.

Key takeaways:

  • Monitor channel states: ACTIVE, INACTIVE, NEEDS_ATTENTION are key indicators
  • Check channel details: Error messages provide crucial troubleshooting information
  • Common errors have common fixes: Network (2003), authentication (1045), SSL (3159), binary logs (13114)
  • Use GTID: Simplifies management and recovery
  • Maintain binary log retention: Prevent purged log issues
  • Test thoroughly: Validate connectivity and authentication before production
