HPE Morpheus Enterprise Disaster Recovery with MySQL InnoDB ClusterSet: A Production-Grade Guide
In a previous build-out we deployed two separate MySQL InnoDB Clusters, one on the Master site and one on the DR site, as the backing database for our Morpheus Enterprise platform. This guide is the disaster-recovery follow-up: we merge those two clusters into a single InnoDB ClusterSet so Morpheus Enterprise keeps serving traffic even if an entire site is lost. It covers TLS-encrypted async replication between the sites, re-bootstrapping MySQL Router so the Morpheus app layer doesn't need any connection-string changes, and both planned switchovers and emergency failovers, including the nasty errant-GTID cleanup that most guides skip. Every step is executed without interrupting live Morpheus traffic.
Introduction: Why InnoDB ClusterSet?
MySQL InnoDB Cluster gives you solid high availability inside a single data center. Group Replication handles node failures transparently. What it doesn't give you is protection against a site level disaster: if the DC goes dark, a fiber gets cut, or a regional outage hits, your cluster vanishes with it.
InnoDB ClusterSet fills that gap:
- A Primary Cluster serves write traffic (Master Site).
- A Replica Cluster is fed via async replication (DR Site).
- MySQL Router picks up topology changes automatically, so there are no application side connection string changes after failover.
- Planned switchovers give you zero data loss; emergency failovers keep RPO in the single-digit-seconds range.
This article walks through the exact steps I run in production, with the error messages you will hit, the security posture you should aim for, and the operational tips most tutorials skip. It is specifically written to layer ClusterSet on top of two existing InnoDB Clusters rather than a greenfield install.
One caveat worth stating up front: ClusterSet is not a replacement for backups. Logical corruption (a `DROP TABLE`, a bad migration, an application bug) replicates to DR just fine. You still need MySQL Enterprise Backup or Percona XtraBackup on an independent schedule.
Starting Point and Target Architecture
Current State
| Component | Master Site | DR Site |
|---|---|---|
| InnoDB Cluster | Installed and running | Installed (will be dropped) |
| MySQL Router | Registered and active | Registered (will be re-bootstrapped) |
| Database | Preserved (source of truth) | Doesn't matter (will be overwritten by clone) |
Requirements and Pre-Flight Checks
Software Versions
| Component | Version | Notes |
|---|---|---|
| MySQL Server | 8.0.27+ (8.4 LTS recommended) | Identical version across every node |
| MySQL Shell | Same major as server | Required on every node |
| MySQL Router | 8.0.x (8.2+ for R/W splitting) | At least one instance per site |
Firewall Port List
Cross-site (Master site ↔ DR site, both directions):
| Port | Protocol | Service | Purpose |
|---|---|---|---|
| 3306 | TCP | MySQL Classic | ClusterSet async replication channel (clusterset_replication) + AdminAPI |
ClusterSet uses standard async replication on port 3306 between the primary and replica clusters. Group Replication's internal port (33061) is intra-cluster only and does not carry cross-site traffic.
Intra-cluster (within the same site, between GR members):
| Port | Protocol | Service | Purpose |
|---|---|---|---|
| 3306 | TCP | MySQL Classic | Client + recovery donor connections |
| 33061 | TCP | Group Replication | Inter-node GR local_address (configurable via group_replication_local_address) |
| 33062 | TCP | XCom (recommended to keep free) | Used when `communicationStack: 'XCOM'`. On the default MYSQL stack it isn't required, but reserving it avoids surprises if you ever switch stacks |
Operator workstation → any MySQL node:
| Port | Protocol | Purpose |
|---|---|---|
| 3306 | TCP | `mysqlsh` classic connection |
| 33060 | TCP | `mysqlsh` X Protocol (default for newer Shells) |
A common mistake: port 33060 is not required between MySQL nodes (cross-site or intra-cluster). It is only needed from the operator/DBA workstation. Leaving it open cross-site is unnecessary attack surface.
Application → MySQL Router:
| Port | Protocol | Purpose |
|---|---|---|
| 6446 | TCP | Classic R/W, routes to primary |
| 6447 | TCP | Classic R/O, load-balanced across secondaries |
| 6448 / 6449 | TCP | X Protocol R/W and R/O (optional) |
| 6450 | TCP | R/W splitting endpoint (MySQL Router 8.2+, optional) |
Verify connectivity before you start:
# Cross-site: only 3306 is required
nc -zv <remote_site_node_ip> 3306
# Intra-cluster (within same site): 3306 and 33061
nc -zv <local_peer_ip> 3306
nc -zv <local_peer_ip> 33061
Server-Level Consistency Checks (often forgotten)
Any of the following drifting between Master and DR will eventually break replication or corrupt query semantics. Run on one node per cluster and diff the output.
-- Character set and collation
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';
-- Timezone (critical for TIMESTAMP columns)
SELECT @@global.time_zone, @@global.system_time_zone;
-- SQL mode
SELECT @@global.sql_mode;
-- Lower case table names (cannot be changed after initialization)
SELECT @@global.lower_case_table_names;
-- Clone plugin availability (required for recoveryMethod: 'clone')
SHOW PLUGINS WHERE Name = 'clone';
-- Expected: Status = ACTIVE. If missing:
-- INSTALL PLUGIN clone SONAME 'mysql_clone.so';
Watch out for `lower_case_table_names`. If Master and DR were initialized with different values, replication will break on any mixed-case table name. The value is set at `mysqld --initialize` time and cannot be changed afterwards, so the only fix is a full re-init of the divergent side.
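To make the diff step concrete, here is a minimal shell sketch. The `dump_vars` helper name is my own invention, and it assumes the `mysql` client is on the operator host with clusterAdmin credentials in `~/.my.cnf`:

```shell
# Hypothetical helper: dump the drift-prone server variables for one node
# as "name<TAB>value" lines, suitable for diffing between sites.
dump_vars() {  # usage: dump_vars <host>
  mysql -h "$1" -N -B -e "
    SELECT 'character_set_server', @@character_set_server
    UNION ALL SELECT 'collation_server',       @@collation_server
    UNION ALL SELECT 'time_zone',              @@global.time_zone
    UNION ALL SELECT 'sql_mode',               @@global.sql_mode
    UNION ALL SELECT 'lower_case_table_names', @@global.lower_case_table_names;"
}

# Compare one node per cluster; any diff output means drift to fix first:
# diff <(dump_vars master_node1) <(dump_vars dr_node1) && echo 'No drift'
```

Running the commented `diff` before creating the ClusterSet turns "eyeball the SHOW VARIABLES output" into a one-line pass/fail check.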
Step 1: Configure server_id and report_host on the Existing Cluster
ClusterSet requires every MySQL node to have a unique server_id (across both sites) and a report_host that advertises its real address. These parameters don't take effect until MySQL restarts, so on a live cluster you perform a rolling restart. Never shut all nodes down at once.
Critical: restart secondaries first and the primary last. Otherwise you risk losing quorum.
1.1 Inspect Current Values
mysql -u clusterAdmin -p \
-e "SELECT @@server_id, @@report_host, @@report_port, @@hostname;"
Target values:
| Node | server_id | report_host |
|---|---|---|
| master_node1 (Primary) | 1 | master_node1 IP |
| master_node2 (Secondary) | 2 | master_node2 IP |
| master_node3 (Secondary) | 3 | master_node3 IP |
| dr_node1 | 4 | dr_node1 IP |
| dr_node2 | 5 | dr_node2 IP |
| dr_node3 | 6 | dr_node3 IP |
Why this matters: `server_id` must be unique across the entire ClusterSet, not just per cluster. Two nodes sharing an id will drop replication events or create a silent data divergence.
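Uniqueness across six nodes is easy to get wrong by hand, so it is worth scripting the check. A sketch with a hypothetical `check_unique_ids` helper (the commented collection loop assumes clusterAdmin credentials in `~/.my.cnf`):

```shell
# Hypothetical check: read "host server_id" pairs on stdin and fail loudly
# on any duplicate id. Exit status is non-zero if a duplicate is found.
check_unique_ids() {
  awk '{ if (seen[$2]++) { print "DUPLICATE server_id " $2 ": " $1; bad = 1 } }
       END { exit bad }'
}

# Collect the pairs from every node in both sites and pipe them in:
# for h in master_node1 master_node2 master_node3 dr_node1 dr_node2 dr_node3; do
#   echo "$h $(mysql -h "$h" -N -e 'SELECT @@server_id')"
# done | check_unique_ids || echo 'FIX server_id BEFORE creating the ClusterSet'
```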
1.2 Edit my.cnf
[mysqld]
# ── Required for ClusterSet ──────────────────────────────────────
server_id = 1 # Must be UNIQUE across BOTH sites
report_host = 10.10.1.11 # Real IP of this node
report_port = 3306
# ── Should already be present on an InnoDB Cluster ───────────────
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
log_replica_updates = ON # Renamed from log_slave_updates in 8.0.26
plugin_load_add = group_replication.so
# ── Auto-rejoin after restart (optional but recommended) ─────────
loose-group_replication_start_on_boot = ON
A note on hostnames: use IP addresses for `report_host`. Hostname-based setups frequently hit `Server address configuration error` when DNS is inconsistent.
1.3 Rolling Restart
The sub-steps below follow the safe order: identify the primary, restart each secondary in turn, restart the primary last, move the primary role back to master_node1, then verify Router.
1.3.1 Identify the Primary
mysqlsh clusterAdmin@master_node1:3306
var cluster = dba.getCluster()
cluster.status()
The node with memberRole: "PRIMARY" is restarted last.
1.3.2 Restart Secondaries One at a Time
# A) (Optional) Gracefully remove from cluster
mysqlsh clusterAdmin@<secondary_node>:3306
# >> dba.getCluster().removeInstance('<secondary_node>:3306', {force: false})
# B) Restart MySQL
systemctl restart mysqld
# C) Verify new parameters
mysql -u clusterAdmin -p -e "SELECT @@server_id, @@report_host;"
# D) Re-add
mysqlsh clusterAdmin@master_node1:3306
# >> dba.getCluster().addInstance('<secondary_node>:3306', {recoveryMethod: 'incremental'})
Shortcut: with `loose-group_replication_start_on_boot = ON`, a plain `systemctl restart mysqld` is enough. The node rejoins on its own within 30 to 60 seconds.
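That 30-to-60-second rejoin window is worth scripting rather than eyeballing. A minimal polling sketch with a hypothetical `wait_for` helper (the example MySQL query assumes clusterAdmin credentials in `~/.my.cnf` on the restarted node):

```shell
# Hypothetical helper: poll a command until it prints the expected string,
# or give up after a timeout. Returns 0 on success, 1 on timeout.
wait_for() {  # usage: wait_for <expected> <timeout_seconds> <command...>
  local expected=$1 timeout=$2 elapsed=0
  shift 2
  until [ "$("$@" 2>/dev/null)" = "$expected" ]; do
    elapsed=$((elapsed + 2))
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 2
  done
}

# Usage after restarting a secondary (run on the restarted node):
# wait_for ONLINE 120 mysql -N -e \
#   "SELECT MEMBER_STATE FROM performance_schema.replication_group_members
#    WHERE MEMBER_HOST = @@report_host;"
```

Gating each secondary's restart on this check keeps the rolling restart honest: you never touch the next node until the previous one is back `ONLINE`.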
1.3.3 Restart the Primary Last
# Confirm every secondary is ONLINE first
systemctl restart mysqld
When mysqld exits, Group Replication performs a view change and promotes a healthy secondary (5–15 seconds).
1.3.4 Move the Primary Role Back
mysqlsh clusterAdmin@<any_node>:3306
dba.getCluster().setPrimaryInstance('master_node1:3306')
This is a coordinated primary handover, not an election. The current primary drains in-flight transactions, sets the target node writable, and transfers the role. Router picks up the change via its metadata TTL (5 to 15 seconds in practice).
1.3.5 Verify Router Routing
mysql -h <router_ip> -P 6446 -u clusterAdmin -p \
-e "SELECT @@hostname, @@report_host;"
Output should show master_node1.
1.4 Full Verification
| Check | Expected |
|---|---|
| `MEMBER_STATE` | `ONLINE` for every node |
| `@@server_id` | Unique (1–6) |
| `@@report_host` | Real IP |
| `cluster.status()` | `statusText: "Cluster is ONLINE and can tolerate up to ONE failure"` |
Step 2: Verify Server Certificates Are Present (Pre-Flight for TLS)
Cross-site async replication without TLS is a serious exposure in production. Before you create the ClusterSet, confirm that every node already has working server certificates so the replication channel can negotiate TLS as soon as it's created in §4.1. The actual enforcement option (clusterSetReplicationSslMode) is set at ClusterSet creation time, not here.
2.1 Confirm Certificates Exist
MySQL auto-generates self-signed certs under the datadir on first init. For production, replace them with certs issued by your internal CA. Verify they exist on every node:
SHOW VARIABLES LIKE 'ssl_%';
-- ssl_ca, ssl_cert, ssl_key should all point to valid files
Test TLS is actually offered:
mysql -u clusterAdmin -p -h master_node1 \
--ssl-mode=REQUIRED \
-e "SHOW STATUS LIKE 'Ssl_cipher';"
# Ssl_cipher should be non-empty (e.g., TLS_AES_256_GCM_SHA384)
2.2 How TLS Is Enforced (Preview of §4.1)
TLS on the ClusterSet replication channel is controlled by the clusterSetReplicationSslMode option, set at ClusterSet creation time via cluster.createClusterSet() (see §4.1). Possible values:
| Value | Behavior |
|---|---|
| `AUTO` (default) | Enables encryption when the server supports it, disables otherwise |
| `REQUIRED` | Enables encryption for all ClusterSet replication channels |
| `DISABLED` | Disables encryption. Avoid in production |
| `VERIFY_CA` (Shell 8.0.33+) | Like `REQUIRED`, plus verifies the server certificate against the CA |
| `VERIFY_IDENTITY` (Shell 8.0.33+) | Like `VERIFY_CA`, plus verifies the certificate identity matches the server hostname |
The internal replication user (typically mysql_innodb_cs_<hex>) is created by AdminAPI and the channel is configured to match this option. Optionally you can also restrict the account server-side post-creation as defense-in-depth:
-- Defense in depth; the channel is already configured per clusterSetReplicationSslMode
SELECT User, Host FROM mysql.user WHERE User LIKE 'mysql_innodb_cs_%';
ALTER USER 'mysql_innodb_cs_xxxx'@'<scoped_host>' REQUIRE SSL;
TLS version note: MySQL 8.0 negotiates TLSv1.2 and TLSv1.3 by default. Pinning to `tls_version = TLSv1.3` is good hardening only when every client supports it; keeping `TLSv1.2,TLSv1.3` is the safer default.
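If you do decide to pin, the change is a my.cnf edit plus a restart. A sketch of the relevant fragment (values are illustrative, not a recommendation for every environment):

```ini
[mysqld]
# Safer default: allow both modern versions (MySQL 8.0 default behavior)
tls_version = TLSv1.2,TLSv1.3

# Hardened variant: enable only if EVERY client, Router instance, and
# replication peer is known to support TLSv1.3
# tls_version = TLSv1.3
```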
Step 3: Remove the Existing DR Cluster
The legacy InnoDB Cluster on DR cannot be added to a ClusterSet as a replica. It has to be fully dismantled first.
Data loss warning: this wipes DR. Clone re-seeds from Master afterwards. Confirm nothing on DR needs to be preserved before continuing.
3.1 Method A: dissolve() (Recommended)
mysqlsh clusterAdmin@dr_node1:3306
var oldCluster = dba.getCluster()
oldCluster.status()
oldCluster.dissolve({force: true})
dissolve() connects to every node, stops Group Replication, and drops the mysql_innodb_cluster_metadata schema. force: true lets it proceed even with OFFLINE nodes.
Verify on every DR node:
SELECT @@group_replication_group_name; -- empty
SHOW STATUS LIKE 'group_replication_primary_member'; -- empty
dba.getCluster() // should throw
3.2 Method B: dropMetadataSchema() (Fallback)
If dissolve() fails:
mysqlsh clusterAdmin@dr_node1:3306
\sql
STOP GROUP_REPLICATION;
-- RESET REPLICA ALL is a no-op for a pure InnoDB Cluster (GR uses its own
-- group_replication_recovery channel, not SQL-level replication). It's only
-- useful here if the node was *previously* a ClusterSet replica or had some
-- manual async channel configured. Skip unless `SHOW REPLICA STATUS\G` shows
-- a channel. Running it on a clean node is harmless but unnecessary.
-- RESET REPLICA ALL;
\js
dba.dropMetadataSchema({force: true})
Repeat on dr_node2 and dr_node3. If you still see errant GTIDs blocking clone later, a controlled reset is sometimes necessary:
-- Use only if you're 100% sure DR data is disposable
RESET MASTER; -- MySQL 8.0.x
-- RESET BINARY LOGS AND GTIDS; -- MySQL 8.4+ (new spelling; RESET MASTER is deprecated there)
3.3 Clean Up the DR Router
systemctl stop mysqlrouter
rm -rf /etc/mysqlrouter/*
rm -rf /var/lib/mysqlrouter/*
3.4 Confirm DR Nodes Are Standalone
mysqlsh clusterAdmin@<dr_nodeX>:3306
dba.getCluster() // ERROR expected
dba.checkInstanceConfiguration('clusterAdmin@<dr_nodeX>:3306') // "ok"
Step 4: Create the ClusterSet and Join DR
4.1 Create ClusterSet from Master (TLS hardened)
Both TLS enforcement and the replication user host scope are configured at ClusterSet creation time. They belong on cluster.createClusterSet(), not on createReplicaCluster() later. Setting them here means every replica cluster you add inherits the secure defaults.
Assumed naming convention used throughout this guide (substitute your own names if different):
- Primary cluster name (created earlier when you ran `dba.createCluster('morpheus-cluster')` on Master): `morpheus-cluster`
- ClusterSet name (created in this step): `MorpheusClusterSet`
- Replica cluster name (created in §4.2): `MorphDrCluster`
You can confirm the primary cluster name with `dba.getCluster().getName()` before proceeding.
mysqlsh clusterAdmin@master_node1:3306
var cluster = dba.getCluster()
// Sanity check: this should return 'morpheus-cluster' in this guide
cluster.getName()
var clusterSet = cluster.createClusterSet('MorpheusClusterSet', {
clusterSetReplicationSslMode: 'REQUIRED', // Force TLS on the ClusterSet replication channel
replicationAllowedHost: '10.0.0.0/8' // MUST cover BOTH primary and replica subnets (see note below)
})
clusterSet.status()
Version requirements: `clusterSetReplicationSslMode` is available in MySQL Shell 8.0.27 and later. The values `VERIFY_CA` and `VERIFY_IDENTITY` (stronger than `REQUIRED`) were added in 8.0.33. The `replicationAllowedHost` option requires MySQL Shell 8.0.28 or later.
⚠️ `replicationAllowedHost` is bidirectional. Per the official docs, the host pattern you set must match the nodes in both the primary and replica clusters. The internal replication user (`mysql_innodb_cs_*`) is granted with this host pattern and will be used by:
- The replica cluster when it connects to the primary (normal-state replication)
- The original primary's nodes if they later need to connect to a new primary after failover or switchover
Scoping the pattern to only the DR subnet (for example `10.20.0.0/24`) will work as long as the primary never changes, but `setPrimaryCluster()` or `forcePrimaryCluster()` will break replication for the old primary's nodes because they can't satisfy the host grant. Use a CIDR or wildcard that covers every MySQL node in every cluster, for example `10.0.0.0/8` covering both sites, or a pair of wildcards like `10.10.%.%` combined with `10.20.%.%`.
Host pattern syntax: per the official docs, `replicationAllowedHost` accepts CIDR notation (for example `192.0.2.0/24`). The MySQL `Host` column also accepts wildcards like `10.%.%.%` and the netmask form `10.20.0.0/255.255.0.0` if your network requires those. Avoid `'%'` in production, since it lets the replication user connect from any IP.
Already created? If you see `MYSQLSH 51601 — function is not available … already belongs to a ClusterSet`, fetch the existing ClusterSet instead:
var clusterSet = dba.getClusterSet()
// To change the host scope after the fact:
clusterSet.setOption('replicationAllowedHost', '10.0.0.0/8')
4.2 Add DR as Replica Cluster
TLS and host scope were already set at the ClusterSet level in §4.1, so the replica cluster inherits them automatically. Here you only need the provisioning options that belong on createReplicaCluster():
var clusterSet = dba.getClusterSet()
var replicaCluster = clusterSet.createReplicaCluster(
'clusterAdmin@dr_node1:3306',
'MorphDrCluster',
{
recoveryMethod: 'clone', // AUTO | CLONE | INCREMENTAL
recoveryProgress: 1, // 0 = silent, 1 = static, 2 = progress bars
timeout: 600 // Seconds to wait for post-provisioning sync
// communicationStack: 'MYSQL' // Default on 8.0.27+; set only to pin explicitly
}
)
What's actually happening under the hood, per the official docs: `createReplicaCluster()` creates the `clusterset_replication` asynchronous channel, generates an internal replication user with a random password, and configures encryption on the channel according to the `clusterSetReplicationSslMode` you set on the ClusterSet in §4.1. It also sets `skip_replica_start = ON` and `super_read_only = ON`, and enables `mysql_start_failover_channels_if_primary` for async failover on the channel.
Add remaining DR nodes:
replicaCluster.addInstance('clusterAdmin@dr_node2:3306', {recoveryMethod: 'clone'})
replicaCluster.addInstance('clusterAdmin@dr_node3:3306', {recoveryMethod: 'clone'})
Note on `cloneDonor`: by default AdminAPI picks a secondary from the primary cluster as clone donor (and falls back to the primary if no secondary is available). For cross-WAN clones of large datasets, consider pinning an explicit donor with `cloneDonor: 'host:port'` to control which node sources the snapshot, and therefore which link the traffic flows through.
4.3 Common Errors
| Error | Cause | Fix |
|---|---|---|
| `Server address configuration error` | `report_host` missing/wrong | Add to my.cnf, restart mysqld |
| `Target instance already part of an InnoDB Cluster` | DR not cleaned up | Re-run Step 3 |
| `Errant GTIDs detected` | Old transactions on DR | Use `recoveryMethod: 'clone'` (overwrites) |
| `Can't connect to primary cluster` | Firewall / `report_host` | Test with `nc -zv` |
| clone plugin not installed | Missing plugin | `INSTALL PLUGIN clone SONAME 'mysql_clone.so';` on both sides |
| `Access denied for user 'mysql_innodb_cs_*'` | TLS mismatch or host restriction | Verify `replicationAllowedHost` matches donor IP |
Step 5: Re-bootstrap MySQL Router
DR's metadata was rebuilt, so Router on both sites must re-learn the new topology.
5.1 DR Router
Always bootstrap against a Master site node. Router reads metadata from the primary.
systemctl stop mysqlrouter
rm -rf /etc/mysqlrouter/*
rm -rf /var/lib/mysqlrouter/*
mysqlrouter --bootstrap clusterAdmin@master_node1:3306 \
--account routeruser \
--user=mysqlrouter \
--conf-use-gr-notifications \
--force
# Note: bootstrap enables the Router REST API by default; add --disable-rest
# here if you do NOT want the metrics/health endpoints exposed
systemctl start mysqlrouter
5.2 Master Router
systemctl stop mysqlrouter
mysqlrouter --bootstrap clusterAdmin@master_node1:3306 \
--account routeruser \
--user=mysqlrouter \
--conf-use-gr-notifications \
--force
systemctl start mysqlrouter
--conf-use-gr-notifications makes Router react to GR state changes without waiting for a metadata TTL, which meaningfully shortens failover detection.
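For reference, the bootstrap writes its choices into `mysqlrouter.conf`. A sketch of the relevant generated section (the section suffix, `router_id`, and cluster name are illustrative and vary per deployment):

```ini
# Excerpt of a generated /etc/mysqlrouter/mysqlrouter.conf (illustrative)
[metadata_cache:bootstrap]
cluster_type=gr
router_id=1
user=routeruser
metadata_cluster=morpheus-cluster
ttl=0.5                  # metadata refresh interval in seconds
use_gr_notifications=1   # written by --conf-use-gr-notifications
```

With `use_gr_notifications=1`, the `ttl` refresh becomes a fallback rather than the primary detection mechanism.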
5.3 Keyring Error (Keyring decryption failed)
systemctl stop mysqlrouter
find / -name '*.keyring' 2>/dev/null
rm -f /var/lib/mysqlrouter/*.keyring
rm -f /var/lib/mysqlrouter/keyring*
rm -f /etc/mysqlrouter/*.key
mysqlrouter --bootstrap clusterAdmin@master_node1:3306 \
--account routeruser --user=mysqlrouter --force
5.4 mysql_native_password Warning
`Failed changing the authentication plugin: mysql_native_password is deprecated` is a warning, not an error. In MySQL 8.4 the plugin is disabled by default. Upgrade the account cleanly:
ALTER USER 'routeruser'@'%' IDENTIFIED WITH caching_sha2_password BY '<password>';
Step 6: Verification and Monitoring
6.1 ClusterSet Health
mysqlsh clusterAdmin@master_node1:3306
var clusterSet = dba.getClusterSet()
clusterSet.status({extended: 1})
Expected:
{
"primaryCluster": "morpheus-cluster",
"status": "HEALTHY",
"replicaClusters": {
"MorphDrCluster": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"transactionSetConsistencyStatus": "OK"
}
}
}
If transactionSetConsistencyStatus is anything other than OK (especially INCONSISTENT), stop and investigate before letting traffic continue.
6.2 Replication Channel Queries
A handful of queries cover the vast majority of troubleshooting on a running ClusterSet:
-- 1) ClusterSet replication channel status (run on DR primary)
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_connection_status
WHERE CHANNEL_NAME = 'clusterset_replication';
-- 2) Applier worker status (look for LAST_ERROR on any worker)
SELECT CHANNEL_NAME, WORKER_ID, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'clusterset_replication';
-- 3) GTID gap check: the DR executed set should be a subset of Master's
SELECT @@global.gtid_executed; -- run on both, compare
-- 4) Classic lag view (still useful for a quick eyeball)
SHOW REPLICA STATUS\G
-- Seconds_Behind_Source (8.0.22+) / Seconds_Behind_Master (older)
-- 5) Group Replication member view
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE
FROM performance_schema.replication_group_members
ORDER BY MEMBER_ROLE;
6.3 Router Registration and Ports
clusterSet.listRouters()
| Router | Hostname | R/W Port | R/O Port |
|---|---|---|---|
| Master Router | `<master_server>` | 6446 | 6447 |
| DR Router | `<dr_server>` | 6446 | 6447 |
ss -tlnp | grep mysqlrouter
# LISTEN 0 70 0.0.0.0:6446 # R/W
# LISTEN 0 70 0.0.0.0:6447 # R/O
6.4 Metrics to Pipe Into Prometheus
⚠️ Exporter version caveat: the metric names below are indicative and follow `prometheus/mysqld_exporter` historical naming. Metric names have changed across versions (e.g. `slave` → `replica` in newer releases, and some Group Replication metrics moved between collector flags). Before copy-pasting into your alerts, verify the exact names against the exporter version you run. Query your Prometheus with `{__name__=~"mysql_.*(replica|slave).*"}` and `{__name__=~"mysql_.*group_replication.*"}` and adjust. You may also need to enable collectors like `--collect.slave_status` / `--collect.replica_status` and `--collect.perf_schema.replication_group_member_stats`.
| Metric (indicative name; verify against your exporter) | Threshold | Action |
|---|---|---|
| `mysql_slave_status_seconds_behind_master` (or `mysql_replica_status_seconds_behind_source` on newer exporters) | > 10 s sustained | Check WAN bandwidth, DR apply throughput |
| `mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"}` or equivalent | Must match expected primary | Alert on unexpected primary change |
| `mysql_up` on any node | = 0 for > 60 s | Page on-call |
| `transactionSetConsistencyStatus` (via custom exporter scraping `clusterSet.status()`) | != `OK` | Freeze writes, investigate |
The last row does not have a built-in exporter. A common pattern is a small sidecar that periodically runs mysqlsh --js -e "print(JSON.stringify(dba.getClusterSet().status({extended:1})))" and exposes the parsed fields as a Prometheus textfile.
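A minimal sketch of that sidecar's parsing step, assuming `jq` is installed and the JSON shape shown in §6.1. The metric name and helper name are made up for this sketch; pick your own conventions:

```shell
# Hypothetical sidecar step: turn clusterSet.status() JSON (on stdin) into
# Prometheus textfile-collector lines, one per replica cluster.
clusterset_to_textfile() {
  jq -r '
    (.replicaClusters // {}) | to_entries[] |
    "mysql_clusterset_replication_ok{cluster=\"\(.key)\"} " +
    (if .value.clusterSetReplicationStatus == "OK" then "1" else "0" end)
  '
}

# Cron / systemd-timer usage (path assumes node_exporter textfile collector):
# mysqlsh --js -e 'print(JSON.stringify(dba.getClusterSet().status({extended:1})))' \
#   | clusterset_to_textfile > /var/lib/node_exporter/textfile/clusterset.prom
```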
Step 7: Failover Procedures
7.1 Planned Switchover
For maintenance windows, DC migrations, and version upgrades. Zero data loss.
var cs = dba.getClusterSet()
// Promote DR
cs.setPrimaryCluster('MorphDrCluster')
// Switch back
cs.setPrimaryCluster('morpheus-cluster')
Realistic RTO is roughly 30 seconds to 2 minutes, dominated by Router metadata refresh and the in-flight transaction drain. With --conf-use-gr-notifications enabled, the Router side is typically under 10 seconds.
7.2 Emergency Failover (forcePrimaryCluster)
Use this when Master is completely unreachable.
Warning: `forcePrimaryCluster()` skips consistency checks. Transactions committed on Master but not yet replicated to DR will be lost. Expect a small RPO measured in seconds, not zero.
mysqlsh clusterAdmin@dr_node1:3306
var cs = dba.getClusterSet()
cs.forcePrimaryCluster('MorphDrCluster')
Critical: Fence the Old Master
Before or immediately after the force failover, make sure the old Master cannot accept writes when it comes back up. Otherwise the application may briefly hit the old Master through its Router and create divergent data. The preferred approach is to use the AdminAPI fencing commands (MySQL Shell 8.0.27 and later). They coordinate the fence across every node in the old Master cluster and instruct Router to stop routing to it, rather than relying on each DBA to toggle super_read_only by hand.
// Connect to ANY reachable node of the old (ex-primary) cluster
mysqlsh clusterAdmin@master_node1:3306
var cluster = dba.getCluster()
// Hard fence: blocks BOTH writes and reads via Router. Use when you want
// the old cluster completely out of traffic while you investigate.
cluster.fenceAllTraffic()
// Softer alternative: only blocks writes, keeps the cluster available for R/O.
// Useful if the old Master can safely keep serving reads from stale data.
// cluster.fenceWrites()
// After you've repaired the old Master and are ready to rejoin it as a REPLICA:
// cluster.unfenceWrites() // only needed if you used fenceWrites()
What the AdminAPI fencing actually does, per the official Shell docs:
- `fenceAllTraffic()` sets `super_read_only = ON` on every member, stops Group Replication on every member, and updates the metadata so Router stops routing any traffic to the cluster.
- `fenceWrites()` sets `super_read_only = ON` on the primary and updates metadata so Router stops sending R/W traffic but still allows R/O. The cluster keeps replicating from the new primary through ClusterSet.
- `unfenceWrites()` reverses `fenceWrites()`, so the cluster comes back online as a healthy replica.
If AdminAPI is unreachable (for example the old Master cluster is completely partitioned away), fall back in this order:
- Remove the old Master's VIP, DNS entry, or load balancer registration so the application can't reach it.
- Firewall-block ports 3306 and 6446 at the old site from the application subnet.
- As a last resort, `SET GLOBAL super_read_only = ON;` manually on each old Master node. This is a per-node toggle, easy to forget on one host, and it does not update ClusterSet metadata.
Skipping fencing is how you end up with a split-brain when the old Master briefly reconnects to the network.
After Master Recovers, Rejoin It
// Run from the DR primary (now the ClusterSet primary)
var cs = dba.getClusterSet()
cs.rejoinCluster('morpheus-cluster')
7.3 The Most Common Rejoin Failure: Errant GTIDs on the Old Master
If the old Master committed transactions during the outage that never made it to DR, rejoinCluster() will fail with an errant-GTID error. In practice this is by far the most common post-failover pain point.
Diagnose:
-- On old Master primary
SELECT @@global.gtid_executed;
-- On new primary (DR)
SELECT @@global.gtid_executed;
-- Compute the diff:
SELECT GTID_SUBTRACT(
'<old_master_executed>',
'<dr_executed>'
) AS errant_gtids;
Any non-empty errant_gtids means the old Master has transactions DR never saw. You have three options, ordered from safest to fastest:
- Extract and replay manually. Use `mysqlbinlog` on the old Master to dump only the errant GTIDs and then replay them against the new primary (DR). The slowest option, but no data loss.
# Step 1: On the OLD Master, find the binlog files covering the outage window
ls -lh /var/lib/mysql/binlog.*
# Step 2: Dump only the errant GTIDs to a SQL file.
# --include-gtids takes the exact GTID set printed by GTID_SUBTRACT above.
mysqlbinlog \
--read-from-remote-server \
--host=<old_master_ip> --port=3306 \
--user=clusterAdmin --password \
--include-gtids='<errant_gtids_from_GTID_SUBTRACT>' \
--skip-gtids=false \
binlog.000123 binlog.000124 binlog.000125 \
> errant.sql
# Step 3: Inspect errant.sql. Confirm the statements are safe to re-apply
# (no DROP/TRUNCATE you didn't expect, no migrations already run on DR).
less errant.sql
# Step 4: On the NEW primary (DR), replay as clusterAdmin
mysql -h dr_node1 -u clusterAdmin -p < errant.sql
# Step 5: Re-run the GTID diff from §7.3. errant_gtids should now be empty.
Why `--skip-gtids=false`: it preserves the original GTIDs so the new primary records them as executed. Otherwise a later `rejoinCluster()` of the old Master will still see them as errant.
- Accept the loss and re-clone. Treat the old Master cluster as disposable, `dissolve()` it, and re-add it as a new replica cluster against the current primary (DR). The fastest option, but loses the errant transactions.
- Forcibly reset and rejoin. On each old Master node:
STOP GROUP_REPLICATION;
RESET MASTER; -- MySQL 8.0.x; on 8.4+ use RESET BINARY LOGS AND GTIDS
-- Then rejoinCluster() and let clone re-seed
Only safe if you have the original data preserved and audited elsewhere. This wipes the entire GTID history on the target node.
This is why backups are non-negotiable. Option 1 depends on binary log retention that covers the outage window. Set `binlog_expire_logs_seconds` accordingly. Seven days is a reasonable floor for most shops.
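As a concrete starting point, a my.cnf fragment for that seven-day floor (tune upward for your binlog volume and disk budget):

```ini
[mysqld]
# 7 days = 604800 seconds. Retention must cover your worst-case
# outage-plus-recovery window, or option 1 in §7.3 stops being available.
binlog_expire_logs_seconds = 604800
```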
7.4 Router Behavior After Failover
| State | Master Router | DR Router |
|---|---|---|
| Normal | Master Cluster → R/W (6446) | Master Cluster → R/W (WAN) |
| After failover | DR Cluster → R/W (auto) | DR Cluster → R/W (local) |
No application connection string change is needed. Router reacts to ClusterSet topology changes via GR notifications and its metadata cache.
Architecture Summary
| Component | Master Site Role | DR Site Role |
|---|---|---|
| Cluster type | Primary (R/W) | Replica (R/O until failover) |
| Intra-cluster replication | Group Replication (sync) | Group Replication (sync) |
| Cross-site replication | Source (donor) | Async + TLS from Master |
| Failover method | `setPrimaryCluster()` (planned) | `forcePrimaryCluster()` (emergency) |
| Router behavior | Always routes to the primary | Auto-updates after failover |
| Realistic RTO | 30 s to 2 min (planned) | 2 to 5 min (forced, includes fence + rejoin) |
| RPO | Zero (planned) | Near-zero; non-zero on force (typically 1 to 5 s) |
Operational Tips and Best Practices
A switchover drill every month is the single highest-leverage habit. Failover should never be tested for the first time during a real disaster; a monthly setPrimaryCluster() round-trip proves the procedure still works and keeps on-call muscle memory sharp.
Log clusterSet.status({extended: 1}) every 60 seconds to a file. A simple cron-scraped JSON log often beats "enterprise" monitoring when you are trying to reconstruct what happened during an incident.
Put Routers behind a load balancer. A single Router IP is a single point of failure. At minimum, run two Routers per site behind HAProxy, Keepalived, or a cloud load balancer.
Back up independently of the cluster. ClusterSet protects against infrastructure failure, not logical failure. A DROP TABLE replicates. Use MySQL Enterprise Backup or Percona XtraBackup on an independent schedule, ideally to a third location that is neither Master nor DR.
Keep the rolling approach for MySQL version upgrades: upgrade DR first, observe for a week, switch over, upgrade the old Master, then switch back.
Watch cross-site RTT. Async replication tolerates latency well, but GTID apply throughput drops meaningfully above roughly 50 ms RTT for write-heavy workloads. Benchmark before assuming it will work for your traffic shape.
Rotate the internal mysql_innodb_cs_* replication account credentials on a schedule. AdminAPI creates them with strong random passwords but never rotates them on its own. Build a runbook for the rotation.
Keep binlog_expire_logs_seconds generous. Seven days is a reasonable floor for most production environments. It is the difference between manual recovery (option 1 in §7.3) and data loss (option 2).
Don't share credentials between clusterAdmin and routeruser. They have different privilege needs, and scoping them tightly limits blast radius.
Document your fencing procedure before you need it. When a real emergency failover happens, you do not want to be figuring out "how do we stop the old Master from accepting writes" on the fly.
Wrap-Up
InnoDB ClusterSet is a production-grade multi-site DR solution that can be introduced into a live environment without downtime, provided you respect three disciplines:
- Rolling restart discipline, so you never break quorum.
- Clean DR teardown followed by a TLS-hardened ClusterSet creation. Use `dissolve()` (or the fallback `dropMetadataSchema()`) to clean DR, then create the ClusterSet with `clusterSetReplicationSslMode: 'REQUIRED'` (or `VERIFY_IDENTITY` on Shell 8.0.33+) and a scoped `replicationAllowedHost`. Never use `%` in production.
Get those three right and your application keeps talking to the same Router address while your infrastructure is ready for the worst case: a full data center loss.