This is a brief extension to the Node Shutdown section discussed in the blog post Repaving CockroachDB cluster node VMs the easy way. In this post, we go through some hands-on experiments that will help you observe the behavior of the client, the load balancer, and the CockroachDB nodes.
Create a local 9-node cluster
Start a 3x3 node cluster locally: three regions with three nodes each.
Run each of these commands in its own terminal window.
# NY region
cockroach start --insecure --store=node1 --listen-addr=localhost:26291 --http-addr=localhost:8091 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=NY,dc=a
cockroach start --insecure --store=node2 --listen-addr=localhost:26292 --http-addr=localhost:8092 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=NY,dc=b
cockroach start --insecure --store=node3 --listen-addr=localhost:26293 --http-addr=localhost:8093 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=NY,dc=c
# TX region
cockroach start --insecure --store=node4 --listen-addr=localhost:26281 --http-addr=localhost:8081 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=TX,dc=a
cockroach start --insecure --store=node5 --listen-addr=localhost:26282 --http-addr=localhost:8082 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=TX,dc=b
cockroach start --insecure --store=node6 --listen-addr=localhost:26283 --http-addr=localhost:8083 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=TX,dc=c
# CA region
cockroach start --insecure --store=node7 --listen-addr=localhost:26271 --http-addr=localhost:8071 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=CA,dc=a
cockroach start --insecure --store=node8 --listen-addr=localhost:26272 --http-addr=localhost:8072 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=CA,dc=b
cockroach start --insecure --store=node9 --listen-addr=localhost:26273 --http-addr=localhost:8073 --join=localhost:26291,localhost:26281,localhost:26271 --locality=region=CA,dc=c
Init the cluster, then start a SQL prompt
cockroach init --insecure --port 26291
cockroach sql --insecure --port 26291
Open the DB Console at http://localhost:8091/#/reports/localities to review the topology.
We configure cluster settings so that the server pauses for 45s before initiating the drain, and then waits up to 70s for the connection pool to close all existing connections.
SET CLUSTER SETTING server.shutdown.initial_wait = '45s';
SET CLUSTER SETTING server.shutdown.connections.timeout = '70s';
HAProxy configuration
Check the HAProxy .cfg files.
# haproxy-ny.cfg
global
maxconn 4096
defaults
mode tcp
retries 3
timeout connect 10s
timeout client 10m
timeout server 10m
option clitcpka
listen psql
bind :26290
mode tcp
balance roundrobin
option httpchk GET /health?ready=1
default-server inter 10s fall 3 rise 2
server cockroach1 127.0.0.1:26291 check port 8091
server cockroach2 127.0.0.1:26292 check port 8092
server cockroach3 127.0.0.1:26293 check port 8093
# Secondary (TX LB)
server tx_lb 127.0.0.1:26280 check port 8080 backup
# Tertiary (CA LB)
server ca_lb 127.0.0.1:26270 check port 8070 backup
listen http
bind :8090
mode tcp
balance roundrobin
server cockroach1 127.0.0.1:8091
server cockroach2 127.0.0.1:8092
server cockroach3 127.0.0.1:8093
# haproxy-tx.cfg
global
maxconn 4096
defaults
mode tcp
retries 3
timeout connect 10s
timeout client 10m
timeout server 10m
option clitcpka
listen psql
bind :26280
mode tcp
balance roundrobin
option httpchk GET /health?ready=1
default-server inter 10s fall 3 rise 2
server cockroach1 127.0.0.1:26281 check port 8081
server cockroach2 127.0.0.1:26282 check port 8082
server cockroach3 127.0.0.1:26283 check port 8083
# Secondary (NY LB)
server ny_lb 127.0.0.1:26290 check port 8090 backup
# Tertiary (CA LB)
server ca_lb 127.0.0.1:26270 check port 8070 backup
listen http
bind :8080
mode tcp
balance roundrobin
server cockroach1 127.0.0.1:8081
server cockroach2 127.0.0.1:8082
server cockroach3 127.0.0.1:8083
# haproxy-ca.cfg
global
maxconn 4096
defaults
mode tcp
retries 3
timeout connect 10s
timeout client 10m
timeout server 10m
option clitcpka
listen psql
bind :26270
mode tcp
balance roundrobin
option httpchk GET /health?ready=1
default-server inter 10s fall 3 rise 2
server cockroach1 127.0.0.1:26271 check port 8071
server cockroach2 127.0.0.1:26272 check port 8072
server cockroach3 127.0.0.1:26273 check port 8073
# Secondary (TX LB)
server tx_lb 127.0.0.1:26280 check port 8080 backup
# Tertiary (NY LB)
server ny_lb 127.0.0.1:26290 check port 8090 backup
listen http
bind :8070
mode tcp
balance roundrobin
server cockroach1 127.0.0.1:8071
server cockroach2 127.0.0.1:8072
server cockroach3 127.0.0.1:8073
Notice that we have lines:
option httpchk GET /health?ready=1
default-server inter 10s fall 3 rise 2
server tx_lb 127.0.0.1:26280 check port 8080 backup
HAProxy will:
- probe every 10s and declare a server down after 3 failed attempts.
- probe the /health?ready=1 endpoint for server health.
- fall back to the backup servers if all servers in the pool are down.
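If you want to see what HAProxy's health check sees, you can probe the same readiness endpoint yourself. The small Python snippet below is just a convenience check (it is not part of the blog's files) and targets node 1's HTTP port as an example; it prints 200 while the node is ready and 503 once the node is draining.
# probe the readiness endpoint that HAProxy checks (node 1's HTTP port as an example)
import urllib.error
import urllib.request

try:
    resp = urllib.request.urlopen("http://localhost:8091/health?ready=1")
    print(resp.status)   # 200 while the node is ready to accept SQL connections
except urllib.error.HTTPError as err:
    print(err.code)      # 503 once the node is draining or otherwise not ready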
Start the Load Balancers, running each command in its own terminal window.
# NY LB
haproxy -f haproxy-ny.cfg
# TX LB
haproxy -f haproxy-tx.cfg
# CA LB
haproxy -f haproxy-ca.cfg
Running the Client App
The simple Python app test.py creates a connection pool, then repeatedly queries the cluster and prints the elapsed time, the node locality, and the connection object details.
This is sufficient to test connection health, and it lets us observe which connection we are using and which server node it is connected to.
Notice that in the test.py file the pool is initialized with max_lifetime=60 seconds.
We have also set connections to be validated before being handed out, using check=ConnectionPool.check_connection.
This app only connects to the NY LB; however, the connection string also lists the TX and CA load balancers as failover options.
DB_DSN = "postgres://root@localhost:26290,localhost:26280,localhost:26270/defaultdb?sslmode=disable"
Start the app.
# you first need to install the dependencies
pip install psycopg[pool] psycopg-binary
# now you can run the app
python test.py
Let the app run continuously.
Notice that we are connected to the NY LB, identifiable by port 26290, and are correctly reaching a CockroachDB NY-based node, region=NY,dc=a.
Finally, keep an eye on the 0x1041... memory addresses, which identify the connection objects.
You will see there are 5 distinct objects, but after 60s, these will be replaced by a new set of connections.
5 ('region=NY,dc=a',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104161090>
6 ('region=NY,dc=b',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104147310>
6 ('region=NY,dc=c',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104146e10>
11 ('region=NY,dc=a',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104161090>
11 ('region=NY,dc=a',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x101d55210>
11 ('region=NY,dc=b',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104147310>
16 ('region=NY,dc=c',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104146810>
16 ('region=NY,dc=c',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104146e10>
16 ('region=NY,dc=a',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x104161090>
If the tests below succeed, you will NOT see any error messages; instead, all events will have been handled gracefully.
Test 1 - node shutdown
We test a node going down for maintenance.
- Pick any of the terminals where you started a NY node, and press Ctrl+C.
- The node will start a graceful shutdown.
- The node sets its health check endpoint to HTTP 503, signaling to the LB that the server is unhealthy.
- The initial_wait timeout starts.
- The server is, however, still working: both existing and new connections succeed.
- After about 30s, the LB has declared this server down and removed it from the LB pool.
[WARNING] (42103) : Server psql/cockroach1 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 2 active and 2 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Therefore, new connection requests will be routed elsewhere.
- After 45s, the initial_wait is done and the node starts to drain. The connections_timeout of 70s kicks in.
- Existing connections are allowed to keep working, waiting for the connection pool to retire them.
- After about 60s, the connection pool will have retired all existing connections and recreated them elsewhere.
- The connections_timeout expires, and any remaining connections are severed.
- Eventually, node shutdown completes, gracefully.
The app won't have noticed any disruption.
Bring the node back up.
Test 2 - datacenter shutdown
In this test, all nodes in a given datacenter go down. The NY load balancer should route all traffic to TX LB.
Note that in the HAProxy config files I have added backup servers in case the 3 primary servers fail; the 2 backup servers are the LBs of the other regions.
- Start a graceful shutdown of all 3 NY servers, using Ctrl+C.
- The initial_wait timeout kicks in, and the health check endpoint returns HTTP 503.
- After about 30s, the NY HAProxy will mark all NY nodes as unavailable, and route new connections to the TX LB.
[WARNING] (90981) : Server psql/cockroach1 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 2 active and 2 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (90981) : Server psql/cockroach2 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 1 active and 2 backup servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (90981) : Server psql/cockroach3 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 0 active and 2 backup servers left. Running on backup. 2 sessions active, 0 requeued, 0 remaining in queue.
- The initial_wait is completed, and the connections_timeout kicks in. New connections will not be accepted.
- After about 60s, all connections have been closed by the connection pool.
- Everything proceeds as before. Despite all NY nodes going down, the app still found a way to the CockroachDB cluster, in this case to TX. See below the transition from NY to TX, still via the NY LB (port 26290).
125 ('region=NY,dc=b',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x10b940490>
125 ('region=NY,dc=c',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x10b925590>
125 ('region=TX,dc=b',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x10b91f110>
130 ('region=NY,dc=b',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x10b91e390>
130 ('region=TX,dc=c',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x10b91ce90>
130 ('region=TX,dc=b',) <psycopg.Connection [IDLE] (host=localhost port=26290 user=root database=defaultdb) at 0x10b9405d0>
Bring all 3 nodes back up. Soon, the LB will start rerouting new connections to the NY nodes again.
Test 3 - LB goes down
In this test, we simulate the NY LB crashing.
When the NY LB goes down, all its connections will be broken, so they need to be recreated.
I have instantiated the connection pool with the check parameter, so that every connection is health-checked before it is given to the caller.
The pool will verify the connection, identify that it is broken, delete it, and create a new one.
However, as the NY LB is down, the pool will use the 2nd host listed in the connection string, which is the TX LB.
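To watch this multi-host failover in isolation, outside of the pool, you can open a single connection with the same DSN; psycopg (via libpq) tries the hosts in order and uses the first one that accepts the connection. The snippet below is only an illustration and is not part of test.py.
# illustration: connect with the multi-host DSN and report which LB accepted the connection
import psycopg

DB_DSN = "postgres://root@localhost:26290,localhost:26280,localhost:26270/defaultdb?sslmode=disable"

with psycopg.connect(DB_DSN) as conn:
    print(conn.info.host, conn.info.port)             # 26290 while the NY LB is up, 26280 after it goes down
    print(conn.execute("SHOW LOCALITY").fetchone())    # locality of the node behind that LB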
- Stop the NY LB by using Ctrl+C.
- Check the app output. Notice that, without throwing any error, connections are now established against port 26280, which is the TX LB.
- Bring the NY LB back up.
- After all connections have been recycled, within 60s, you will see that new connections are routed back to 26290, the NY LB.
Conclusion
With careful configuration, you can handle all of these service disruptions gracefully; even if you don't use HAProxy as your LB solution, you will find very similar configuration options in your LB.
It is important to understand that all 3 services (client, load balancer, and database) need to be configured, not just CockroachDB.
The example tests in this blog should help you validate your own setup, even if you don't use psycopg_pool as your connection pool. For Java apps, HikariCP can do exactly the same things.
You can now stop all processes and delete the node* directories from your filesystem.