Greg Goodman

Elastic Migration of a Multi-Region CockroachDB Cluster

In this blog post, my colleague Drew Deally and I are going to work through the exercise of migrating a MultiRegion CockroachDB cluster from one set of cloud Regions to another using the elastic nature of a CockroachDB cluster. That is, we’re going to move the cluster to a whole new set of nodes by expanding the cluster (adding nodes during runtime) and then contracting it (removing nodes from the live cluster). And we are going to do it without suffering any period of data unavailability or even an appreciable drop in cluster performance.

In the Last Episode

This is a follow-up to a previous blog post, in which Drew and I worked through the same exercise - migrating a CockroachDB cluster from one set of cloud Regions to another - but with a cluster that did not make use of CockroachDB’s MultiRegion SQL abstractions. We encourage you to read that post, if you haven’t already.

To summarize that exercise, we used the elastic property of a CockroachDB cluster to

  • add nodes in the regions we want the cluster to occupy
  • decommission nodes in the regions we want to vacate

In between these two steps, we used zone configurations to apply replica placement constraints to our databases, forcing our application data out of the regions we’re evacuating. When we decommissioned the nodes in the evacuated regions, we did it one node at a time, allowing the nodes to shed any remaining replicas (of system ranges) they were hosting.

It’s worth noting that we didn’t have to push the application data off the retired nodes before decommissioning them. Decommissioning a node causes it to migrate its replicas away to other nodes before shutting down. Since a node in the process of being decommissioned is ineligible to receive new replicas, it’s possible to decommission multiple nodes at once, and let the cluster move their replicas en masse to eligible nodes. We moved the application data in a separate step for two reasons:

  1. We wanted things to happen one operation at a time, so we could monitor the progress of each step.
  2. We wanted to be able to roll back the migration if necessary, an option that’s often required by customers’ IT departments to minimize risk.

Why a MultiRegion cluster is different

The procedure we outlined in the last article works for a CockroachDB cluster with databases that do NOT use the MultiRegion capabilities introduced in version 21.1, but it’s not quite the right mechanism for a MultiRegion database. Why not?

A MultiRegion database is assigned to occupy specific Database Regions: one Primary Region and zero or more additional Regions. Those assignments will have to be adjusted to match the changing network topology. Changing the Database Regions has much the same effect as applying replica placement constraints through zone configurations did in the previous exercise: it forces replicas out of Regions they’re no longer allowed to occupy and into Regions where they are.

Elastic Migration of a MultiRegion Database

We’re going to repeat the exercise from the previous article, but with a difference. We’ll migrate a CockroachDB cluster from one set of nodes to another by adding new nodes to the cluster and then retiring the old ones. But this time, the cluster hosts a MultiRegion database, so we’ll use the SQL MultiRegion commands to relocate the database to the new regions before retiring the old ones.

As before, the process outlined here allows the migration to take place without database downtime, without changing the cluster’s identity, and without relying on any tools outside of CockroachDB’s native functionality, or even on any CockroachDB Enterprise features.

Starting and Ending States

In our example, we start with a CockroachDB cluster deployed across 3 regions: us-east1, us-east2 and us-central1, with us-central1 serving as the Primary Region. The desired end result is to have the cluster occupy a completely different (non-overlapping) set of regions: us-east4, us-west4, us-west3, with us-east4 serving as Primary Region.

You may have noticed another small difference between this exercise and the one we undertook in the last article: We’re not keeping any of the original regions in the final cluster. Because our Primary Region will be going away, we'll have to change the Primary Region for the database(s) in the cluster.

This chart from the DB Console illustrates the current distribution of replicas across Regions in our cluster:

Initial Replica Distribution

And this one represents the desired state this exercise will produce:

Final Replica Distribution

(Note: The higher number of replicas in the final distribution of the TPCC database is due to changing the Replication Factor from 3 to 5, a side effect of making our database Multi-Region.)

Overview of the Process

As illustrated above, we’re starting with 9 nodes in 3 regions. We’ll add another 9 nodes in 3 more regions, for a total of 18 nodes. Then we’ll manage the transfer of data out of the 3 regions we want to vacate, and into the 3 regions we’ve added. When we retire the 9 nodes in the vacated regions, we’ll leave the cluster with a new set of 9 nodes in a new set of 3 regions.

Here’s a high-level view of the steps we’re going to take.

  1. Verify all schema changes and upgrades have been finalized and backup the cluster
  2. Prepare all the new nodes with the appropriate CockroachDB binaries
  3. Update cluster certificates
  4. Add new nodes to the cluster
  5. (Optional) Modify cluster configuration to speed up range migration
  6. Add new Regions to the database(s)
  7. Change the database Region assignments, and monitor the migration of replicas to the new nodes
  8. Manually fail the application over to the new primary region
  9. Decommission nodes to be retired, one at a time

Getting Started

We created a cluster of 9 nodes, and used the cockroach workload ... tpcc command to initialize and run the TPCC workload on the cluster. We can see in the DB Console that our workload is generating roughly 3,379 queries per second with a p99 latency of 15 ms, which is expected under normal circumstances:

Cluster Performance Summary
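The exact cockroach workload invocation isn’t shown above; a rough sketch, with a hypothetical connection URL and warehouse count:

```shell
# Hypothetical URL and warehouse count; adjust for your own topology
cockroach workload init tpcc --warehouses=100 'postgresql://root@<load-balancer>:26257?sslmode=disable'
cockroach workload run tpcc --warehouses=100 --duration=24h 'postgresql://root@<load-balancer>:26257?sslmode=disable'
```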

Note that our TPCC database was created as a cluster-spanning database, not a MultiRegion database. We’ll turn it into a MultiRegion database now, by assigning it a Primary Region, and adding the other regions that can host its data.

Make the database Multi-Region
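The commands captured in the screenshots aren’t reproduced here, but a minimal sketch looks like the following, assuming the database is named tpcc and the node localities match the region names we’ve been using:

```sql
-- Assigning a primary region converts tpcc into a MultiRegion database
ALTER DATABASE tpcc SET PRIMARY REGION "us-central1";

-- Add the other two regions that may host its data
ALTER DATABASE tpcc ADD REGION "us-east1";
ALTER DATABASE tpcc ADD REGION "us-east2";
```

You can verify the result with SHOW REGIONS FROM DATABASE tpcc; and inspect the generated zone configuration with SHOW ZONE CONFIGURATION FROM DATABASE tpcc;.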

We can see the new database zone configuration (automatically adjusted as a result of the MultiRegion commands we executed) that distributes the replicas across the 3 regions and places 2 voting replicas, including the leaseholder, in the primary region.

Initial Multi-Region Zone Configuration

Over the course of about 70 minutes, the database moves the replicas to their designated locations. This is performed in the background, and doesn’t interfere with the system’s ability to service application requests.

Replicas Relocate

Uninterrupted SQL Performance

We’ve completed step 1 of the process we laid out above; our initial cluster and MultiRegion database are now set up, and we can start our migration.

We’ll gloss over steps 2-4, and assume that we’ve successfully added 9 more nodes in 3 new regions to the cluster:

Expanded Cluster

At this point, we can see that our TPCC database still occupies only the regions designated for it, while the system ranges are distributed across the whole cluster.

Replica Distribution

Replica Distribution (expanded)
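If you prefer SQL to the DB Console, one way to check the distribution is to group the database’s ranges by replica locality; a sketch using the internal ranges table (again assuming the database is named tpcc):

```sql
SELECT replica_localities, count(*) AS ranges
FROM crdb_internal.ranges
WHERE database_name = 'tpcc'
GROUP BY replica_localities
ORDER BY ranges DESC;
```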

This is another minor difference from our previous exercise. In that case, we used a zone configuration with replica placement constraints to keep the system from taking advantage of the new nodes and distributing our database replicas more widely. (Recall that CockroachDB’s default policy is to spread replicas geographically, both to balance load and to protect against localized failures). We didn’t do that this time because our MultiRegion database is already constrained to occupy only specific regions, and cannot currently place replicas in the regions where our new nodes are located.

Changing the Database Regions

The next steps are (#6) to add the new regions to the database, (#7) change the Primary Region, and drop the old regions. Note that, as soon as the database is permitted to place replicas in a new region, the system will start rebalancing replicas to take advantage of the newly available resources.

As in the last exercise, we could adjust these two cluster settings to speed up replica migration and reduce the time until the system reaches a new stable distribution of data:

  • kv.snapshot_rebalance.max_rate
  • kv.snapshot_recovery.max_rate

(That's the optional step #5 in our process overview.) However, as we’ve already covered that in detail, we’ll skip it for purposes of this demonstration.
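For completeness, adjusting those settings is a one-liner each; the values here are illustrative:

```sql
-- Illustrative values; tune to what your network and disks can absorb
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '64 MiB';
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '64 MiB';
```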

First, let’s add the region us-east4 to the database, and make it the Primary Region.

Change the Primary Region
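In SQL terms (assuming the database is named tpcc):

```sql
ALTER DATABASE tpcc ADD REGION "us-east4";
ALTER DATABASE tpcc SET PRIMARY REGION "us-east4";
```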

Now we can drop the us-central1 region from the database, forcing the system to move all of the replicas off those nodes, since us-central1 is no longer a legitimate region for this database’s replicas.

Drop a region
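The corresponding statement (database name assumed to be tpcc):

```sql
ALTER DATABASE tpcc DROP REGION "us-central1";
```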

Monitoring the DB Console, we can watch the replicas and workload shift to the new region. Eventually, all the replicas and leaseholders have moved out of us-central1, and us-east4 has assumed its share of the load and hosts all of the leaseholders.

Queries per node

Leaseholder Locality

At this point, our application is still connected to the old primary region, us-central1. The application continues to run, but at somewhat lower performance. The nodes of us-central1 can still accept SQL connections and serve queries, but the leaseholders and data are all now in another region; all the queries take longer because the gateway node in us-central1 has to communicate with leaseholder(s) in us-east4. Assuming we’re going to keep running the application where it’s currently deployed, we just need to update our application connection to point to our nodes in us-east4 (presumably via a load balancer for that region).

Now’s the time to make that change (step #8), in order to minimize the window during which our application will see higher latency.
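What that change looks like depends on your deployment; with hypothetical load-balancer hostnames, it’s just a matter of swapping the host in the application’s connection string:

```
postgresql://tpcc_user@lb-us-central1.example.com:26257/tpcc?sslmode=verify-full   <- before
postgresql://tpcc_user@lb-us-east4.example.com:26257/tpcc?sslmode=verify-full      <- after
```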

Finish adding and dropping Regions

Now let’s add the other 2 new regions to the database, and drop the two regions that we want to evacuate:

Add and drop regions
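Sketched in SQL (database name assumed to be tpcc):

```sql
ALTER DATABASE tpcc ADD REGION "us-west4";
ALTER DATABASE tpcc ADD REGION "us-west3";
ALTER DATABASE tpcc DROP REGION "us-east1";
ALTER DATABASE tpcc DROP REGION "us-east2";
```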

Again, we verify the database configuration is what we expect:

Updated Zone Configuration

Monitoring the database distribution, we see after a time (about 2 hours, in this case) that all the replicas have completed their migration from the old regions to the new:

Replicas Migrating to new Regions

Updated Replica Distribution

Shrinking the Cluster

Our database is completely migrated, the nodes in our original 3 regions are empty (except for replicas of system ranges), and we’re ready to retire those nodes (step #9).

Following the documentation for how to decommission nodes from a CockroachDB cluster, we first drain the nodes we’re going to retire with the cockroach node drain command, then decommission each node with cockroach node decommission. We expect this to go quickly, as nearly all replicas and leaseholders have already been moved off of these nodes.

Draining Nodes

Decommissioning Nodes
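Sketched as CLI invocations, with illustrative node IDs, host, and certificate directory:

```shell
# Repeat for each node being retired, one at a time
cockroach node drain 10 --host=<any-live-node>:26257 --certs-dir=certs
cockroach node decommission 10 --host=<any-live-node>:26257 --certs-dir=certs
```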

We now have the end result we were aiming for: the cluster occupies a completely new set of nodes, and during the migration did not experience any downtime, only a short period of slightly degraded performance.

Final Node Map

With the two cluster migration exercises we’ve documented, we’ve demonstrated CockroachDB’s ability to handle some demanding real-world customer requirements:

  • scale a cluster up or down by adding or removing nodes
  • add or remove a cloud Region to or from an existing cluster
  • scale or migrate the cluster while it’s live and serving data, with no downtime

And we’ve illustrated some of the features that make CockroachDB attractive to Application developers, Database administrators, and Site Reliability Engineers alike:

  • high availability, even during cloud provisioning
  • versatile and responsive data placement via SQL commands
  • DB Console monitoring of runtime behavior at different levels of the software stack:
    • client connections,
    • SQL execution,
    • inter-node network traffic,
    • data replication and relocation,
    • disk operations,
    • CPU and memory usage,
    • etc.

We invite you to learn more about CockroachDB from the documentation and from our online courses at Cockroach University, to read about the launch of version 22.2, and to try out CockroachDB if you haven’t already. (You can get started with a free cloud account!)
