MySQL HeatWave: Replication Channels and Troubleshooting

Replication channels are the cornerstone of MySQL HeatWave inbound replication, establishing the connection between source databases and replica DB systems. Understanding channel states, operations, and troubleshooting techniques is essential for maintaining reliable data replication. This guide covers channel management, lifecycle states, common errors, and proven troubleshooting strategies.

Understanding Replication Channels

A replication channel represents a path through which changes produced by transactions on a source MySQL server or DB system are replicated to a target DB system.

What Channels Do

Each channel identifies:

  • A replication source (hostname, port, credentials)
  • Configuration required to connect to that source
  • How to copy transactions from source to replica
  • Replication positioning (GTID or binary log)

Important: Channels have their own lifecycle and can be enabled, disabled, reset, and resumed independently.
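
For scripted inspection, the OCI CLI can list channels and their states; a minimal sketch, assuming a configured CLI and the standard kebab-case response fields:

oci mysql channel list \
  --compartment-id <COMPARTMENT_OCID> \
  --query 'data[].{name:"display-name",state:"lifecycle-state"}' \
  --output table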

Channel States

MySQL HeatWave channels exist in either final states (stable) or transient states (temporary during operations).

Final States

1. ACTIVE

Channel is replicating data successfully.

Characteristics:

  • Both IO and SQL threads running
  • Transactions flowing from source to replica
  • May have slight replication lag (normal)
  • No errors reported

Console Indicator: Green icon with "Active" text

2. INACTIVE

Channel is not replicating; replication has been stopped.

Common Reasons:

  • Channel manually disabled by administrator
  • Created with "Enable automatically" unchecked
  • DB system is stopped

Console Indicator: Gray icon with "Inactive" text

To Resume: Enable the channel through the console or CLI

3. NEEDS_ATTENTION

Channel is not replicating due to an error.

Common Causes:

  • Source server unreachable
  • Authentication failure
  • Binary log files purged on source
  • SSL/TLS configuration mismatch
  • Incompatible transactions
  • DB system is in INACTIVE state

Console Indicator: Red/orange icon with "Needs Attention" text

Action Required: Check the channel details page for error messages, resolve the issue, then resume the channel
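
The error details can also be pulled from the CLI; a sketch, assuming the channel resource exposes the usual lifecycle fields:

oci mysql channel get \
  --channel-id <CHANNEL_OCID> \
  --query 'data.{state:"lifecycle-state",details:"lifecycle-details"}'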

4. DELETED

Channel has been permanently removed.

Characteristics:

  • Appears briefly before removal from list
  • Cannot be recovered
  • All channel configuration deleted

Transient States

These are temporary states during channel operations:

1. CREATING

Channel is being created and configured.

Duration: Typically 1-3 minutes

Next State: ACTIVE (if auto-enabled) or INACTIVE

2. UPDATING

Channel is performing an update operation.

Occurs During:

  • Enabling or disabling channel
  • Modifying channel configuration
  • Resuming after error resolution
  • DB system operations affecting channel

Duration: Typically 30 seconds to 2 minutes

3. DELETING

Channel deletion is in progress.

Duration: Typically 1-2 minutes

Next State: DELETED
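
Automation that drives channel operations usually polls until the channel settles into a final state; a small bash sketch built on the CLI (the field name is an assumption based on the standard OCI response shape):

# Poll every 30 seconds until the channel leaves its transient state
while true; do
  state=$(oci mysql channel get \
    --channel-id <CHANNEL_OCID> \
    --query 'data."lifecycle-state"' --raw-output)
  echo "Channel state: ${state}"
  case "${state}" in
    CREATING|UPDATING|DELETING) sleep 30 ;;
    *) break ;;
  esac
done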

Channel Operations

Enabling a Channel

Starts replication from the source if no errors exist.

Console:

  1. Navigate to Databases → HeatWave MySQL → Channels
  2. Select channel → Actions → Enable
  3. Channel transitions: INACTIVE → UPDATING → ACTIVE

CLI:

oci mysql channel update \
  --channel-id <CHANNEL_OCID> \
  --is-enabled true

Disabling a Channel

Stops replication and puts channel into INACTIVE state.

When to Disable:

  • Maintenance on source server
  • Temporary halt of replication
  • Troubleshooting issues
  • Before configuration changes

Console: Channel → Actions → Disable

CLI:

oci mysql channel update \
  --channel-id <CHANNEL_OCID> \
  --is-enabled false

Resuming a Channel

Restarts replication after error is resolved. Used when channel is in NEEDS_ATTENTION state.

Console: Channel → Actions → Resume

CLI:

oci mysql channel resume \
  --channel-id <CHANNEL_OCID>

Behavior: Channel transitions to UPDATING, then ACTIVE if error resolved

Resetting a Channel

Clears all replication position data while keeping the channel configuration. The channel must be INACTIVE first.

Warning: Resets the replication position; transactions may be re-replicated once the channel is re-enabled

Console: Channel → Actions → Reset

CLI:

oci mysql channel reset \
  --channel-id <CHANNEL_OCID>

Use Cases:

  • Recovering from corrupted replication position
  • Starting replication from beginning
  • After major source database changes

Deleting a Channel

Permanently removes channel and all associated configuration.

Console: Channel details → Delete

CLI:

oci mysql channel delete \
  --channel-id <CHANNEL_OCID> \
  --force

Troubleshooting Inbound Replication

Step 1: Identify Source of Failure

When channel enters NEEDS_ATTENTION state, determine the root cause:

Check Channel Details Page

  1. Navigate to channel in console
  2. Click channel name to view details
  3. Review Error section for messages
  4. Check Last Error timestamp
  5. Note error code and description

Check Replication Status on Replica

Connect to replica DB system and query:

SHOW REPLICA STATUS\G

Key Fields:

  • Replica_IO_Running: Should be "Yes" (checks source connection)
  • Replica_SQL_Running: Should be "Yes" (checks transaction application)
  • Last_IO_Errno: IO thread error code
  • Last_IO_Error: IO thread error message
  • Last_SQL_Errno: SQL thread error code
  • Last_SQL_Error: SQL thread error message
  • Seconds_Behind_Source: Replication lag

Performance Schema Query:

SELECT * FROM performance_schema.replication_connection_status\G
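
The applier side has a matching set of views; LAST_ERROR_MESSAGE in the worker view often carries more detail than SHOW REPLICA STATUS:

SELECT CHANNEL_NAME, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker\G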

Step 2: Identify Communication Failures

Verify connectivity between replica and source:

Test Network Connectivity

From the replica DB system or a compute instance in the same VCN:

# Test if source is reachable
telnet <source_hostname> 3306

# Or use netcat
nc -zv <source_hostname> 3306

# Check DNS resolution
nslookup <source_hostname>

Test Replication User Authentication

mysql -h <source_hostname> \
      -P 3306 \
      -u <repl_user> \
      -p

If this succeeds, authentication works and the issue lies elsewhere.

Common Error Codes and Solutions

Error 2003: Can't Connect to MySQL Server

Error Message: "Can't connect to MySQL server on '<host>' (<errno>)"

Meaning: Replica cannot establish network connection to source

Common Causes:

  1. Source MySQL server not running
  2. Firewall blocking port 3306
  3. Network path doesn't exist
  4. Wrong hostname or IP address
  5. Source server not listening on expected port

Solutions:

Verify source is running:

-- On source server
SHOW GLOBAL STATUS LIKE 'Uptime';

Check firewall rules:

# On source server (Linux)
sudo iptables -L | grep 3306

# Or for firewalld
sudo firewall-cmd --list-all | grep 3306

Verify MySQL listening on port:

# On source server
sudo netstat -tlnp | grep 3306
# Or
sudo ss -tlnp | grep 3306

Check bind-address:

-- On source server
SHOW VARIABLES LIKE 'bind_address';

Should be 0.0.0.0 or a specific routable IP (not 127.0.0.1) when remote connections are required
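
If it is bound to 127.0.0.1, update the server configuration and restart MySQL (for example, sudo systemctl restart mysqld); a sketch, noting that the config file path varies by distribution:

# /etc/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
bind_address = 0.0.0.0  # or the address of the interface the replica reaches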

Test from replica network:

telnet <source_ip> 3306

Error 1045: Access Denied

Error Message: "Access denied for user '<user>'@'<host>' (using password: YES/NO)"

Meaning: Authentication failed; the credentials are wrong or the user is not allowed to connect from the replica's IP

Common Causes:

  1. Incorrect username or password
  2. User not granted access from replica IP
  3. User exists but lacks REPLICATION SLAVE privilege
4. Anonymous (blank) users on the source shadowing the replication user

Solutions:

Verify replication user exists:

-- On source server
SELECT User, Host FROM mysql.user WHERE User = 'repl_user';

Check user privileges:

SHOW GRANTS FOR 'repl_user'@'%';

Should include: GRANT REPLICATION SLAVE ON *.*
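
If the user is missing or lacks the grant, a minimal sketch of creating it (the % host pattern is an example; restrict it to the replica's subnet where possible):

-- On source server
CREATE USER 'repl_user'@'%' IDENTIFIED BY '<strong_password>';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';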

Verify replica IP is allowed:

-- On source
SELECT User, Host FROM mysql.user WHERE User = 'repl_user';

Host should be %, replica subnet (e.g., 10.0.1.%), or specific IP
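
If the account exists but with the wrong host, RENAME USER moves it and its privileges follow the account; the host values here are examples:

-- On source
RENAME USER 'repl_user'@'localhost' TO 'repl_user'@'10.0.1.%';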

Remove anonymous users (a common issue):

-- On source
SELECT User, Host FROM mysql.user WHERE User = '';
DROP USER ''@'localhost';
DROP USER ''@'<hostname>';
FLUSH PRIVILEGES;

Test authentication:

mysql -h <source> -u repl_user -p

Error 3159: Insecure Connection

Error Message: "Connections using insecure transport are prohibited while --require_secure_transport=ON"

Meaning: Source requires SSL/TLS but channel not configured for it

Solutions:

Option 1: Configure SSL in channel (recommended)

  1. Edit channel configuration
  2. Set SSL mode to "REQUIRED" or higher
  3. Provide SSL certificates if using VERIFY_CA or VERIFY_IDENTITY
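
To confirm the source accepts TLS for the replication user before touching the channel, a quick client-side check (Ssl_cipher is empty when the session is unencrypted):

mysql -h <source_hostname> -u repl_user -p --ssl-mode=REQUIRED \
  -e "SHOW SESSION STATUS LIKE 'Ssl_cipher';"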

Option 2: Disable require_secure_transport on source (not recommended)

-- On source server
SET GLOBAL require_secure_transport = OFF;

Verify SSL configuration:

-- On source
SHOW VARIABLES LIKE 'require_secure_transport';
SHOW VARIABLES LIKE 'have_ssl';

Error 13114: Binary Log Files Purged

Error Message: "Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'" or "The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires"

Meaning: Binary log files needed by replica have been deleted from source

Common Causes:

  1. GTID not supported or configured
  2. Binary log retention too short
  3. Long replication outage
  4. Binary logs manually purged

Solution 1: Fix GTID-Purged (For GTID-based replication)

Step 1: Retrieve gtid_executed from logical dump

When exporting source data with MySQL Shell:

mysqlsh user@source -- util dump-instance /path/to/dump \
  --ocimds=true

Capture gtid_executed value from dump metadata.

Or query source directly:

-- On source
SHOW GLOBAL VARIABLES LIKE 'gtid_executed';

Step 2: Set gtid_purged on replica

Connect to replica and execute:

-- On replica
CALL sys.SET_GTID_PURGED('<gtid_executed_value>');

Example:

CALL sys.SET_GTID_PURGED('3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5');

Step 3: Resume channel

After setting gtid_purged, resume replication:

oci mysql channel resume --channel-id <CHANNEL_OCID>

Solution 2: Rebuild Replication (If GTID fix fails)

For severe cases:

Step 1: Delete channel and replica DB system

oci mysql channel delete --channel-id <CHANNEL_OCID> --force
oci mysql db-system delete --db-system-id <DB_SYSTEM_OCID> --force

Step 2: Take new snapshot of source

mysqlsh user@source -- util dump-instance /path/to/new-dump \
  --excludeSchemas=mysql,sys,performance_schema,information_schema \
  --ocimds=true

Step 3: Create new DB system

Create DB system and import data during creation or after.

Step 4: Create and enable channel

Create new channel with proper GTID positioning from new dump.

Best Practices for Channel Management

1. Monitor Channel Health

Set up alerts:

  • Channel state changes to NEEDS_ATTENTION
  • Replication lag exceeds threshold (e.g., 60 seconds)
  • IO or SQL thread stops

Regular checks:

SHOW REPLICA STATUS\G
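
For fleet-level checks, the channel list can be filtered by state; a sketch, assuming your CLI version supports the --lifecycle-state filter:

oci mysql channel list \
  --compartment-id <COMPARTMENT_OCID> \
  --lifecycle-state NEEDS_ATTENTION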

2. Maintain Adequate Binary Log Retention

On source server:

-- Set binary log expiration (seconds)
SET GLOBAL binlog_expire_logs_seconds = 172800; -- 48 hours

For AWS RDS:

CALL mysql.rds_set_configuration('binlog retention hours', 48);
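
Verify the effective retention on the source and see which log files still exist:

-- On source server
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
SHOW BINARY LOGS;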

3. Document Channel Configuration

Maintain records of:

  • Source connection details
  • Replication user credentials
  • SSL configuration
  • GTID settings
  • Network paths used

4. Test Before Production

  • Create test channel in non-production
  • Verify connectivity
  • Test failover scenarios
  • Document lessons learned

5. Plan for Maintenance Windows

Before source maintenance:

  1. Verify replication lag is minimal
  2. Optionally disable channel
  3. Perform maintenance
  4. Re-enable channel
  5. Monitor for catch-up

6. Use GTID When Possible

Benefits:

  • Auto-positioning simplifies management
  • Easier recovery from failures
  • Better failover capabilities
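
Before creating a channel, confirm GTID is fully enabled on the source:

-- On source server
SHOW VARIABLES LIKE 'gtid_mode';                -- should be ON
SHOW VARIABLES LIKE 'enforce_gtid_consistency'; -- should be ON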

Advanced Troubleshooting

Replication Lag Increasing

Symptoms: Seconds_Behind_Source continuously growing

Diagnose:

-- Check what's being executed
SHOW REPLICA STATUS\G

-- Check running queries on replica
SHOW PROCESSLIST;

-- Check for lock waits
SELECT * FROM sys.innodb_lock_waits;

Solutions:

  1. Increase replica resources (upgrade shape)
  2. Optimize slow queries on source
  3. Break large transactions into smaller batches
  4. Check for long-running transactions

Channel Constantly Failing

Symptoms: Channel repeatedly enters NEEDS_ATTENTION

Diagnose:

  1. Check error patterns (same error repeating?)
  2. Review source server logs
  3. Check network stability
  4. Verify source server load

Solutions:

  • Fix root cause (network, authentication, binary logs)
  • Consider increasing connection timeouts
  • Upgrade source server resources if overloaded

MySQL Schema Changes Causing Failures

Remember: Changes to the mysql schema are not replicated and can cause replication to stop

Solution:

  • Manage user accounts independently
  • Skip the problematic transaction (binary log positioning only; GTID channels reject the skip counter, see the sketch below):

  STOP REPLICA;
  SET GLOBAL sql_replica_skip_counter = 1;
  START REPLICA;

  • Better: Avoid mysql schema changes during active replication
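
With GTID auto-positioning, which HeatWave channels typically use, the usual workaround is instead to commit an empty transaction for the failing GTID. A minimal sketch; substitute the UUID:sequence reported in Last_SQL_Error, and note that on a managed replica you may need to disable and resume the channel rather than run STOP/START REPLICA directly:

STOP REPLICA;
-- GTID of the failing transaction, from Last_SQL_Error / Executed_Gtid_Set
SET GTID_NEXT = '<source_uuid>:<transaction_id>';
BEGIN;
COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;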

Effective channel management and troubleshooting are crucial for reliable MySQL HeatWave inbound replication. By understanding channel states, recognizing common error codes, and following best practices, you can maintain healthy replication and quickly resolve issues.

Key takeaways:

  • Monitor channel states: ACTIVE, INACTIVE, NEEDS_ATTENTION are key indicators
  • Check channel details: Error messages provide crucial troubleshooting information
  • Common errors have common fixes: Network (2003), authentication (1045), SSL (3159), binary logs (13114)
  • Use GTID: Simplifies management and recovery
  • Maintain binary log retention: Prevent purged log issues
  • Test thoroughly: Validate connectivity and authentication before production
