Roman Tsypuk for AWS Community Builders

Posted on Apr 5 • Originally published at tsypuk.github.io

From Redis to Valkey: pre-migration Reconnaissance — detect all apps & connections in realtime

#aws #database #devops #monitoring

Abstract

Abraham Lincoln: "Give me six hours to chop down a tree and I will spend the first four sharpening the axe"

Valkey is becoming more popular thanks to its performance improvements over classic Redis, so I'm starting a series of posts
about migrating from one vendor's Redis implementation to AWS Valkey.

To choose the proper migration technique, the most important step is pre-migration reconnaissance. In this post I'll explain how native Redis features can help identify all services that connect to Redis. This is really hard in a distributed, enterprise-level infrastructure that was built by multiple generations of engineers with a periodically changing stack, languages, and SDKs.

Valkey project brief history

The first version of Redis was released in 2009. Since then it has grown from a key-value cache into Pub/Sub, Streams, and a database, and it is used in a lot of projects.

As a result, cloud providers such as AWS, GCP, Azure, and Oracle started offering Redis as a managed service, allowing engineers to offload cluster management and all the heavy lifting to the provider.

But the Redis company also runs its own cloud, Redis Cloud, with Redis managed services. To attract more clients to Redis Cloud and increase monetization, starting from Redis version 7.4 the license was changed so that other cloud providers have to pay for Redis if they offer it as a managed service.

At that moment, a fork of Redis was created under the name Valkey (https://github.com/valkey-io/valkey). It is maintained by both the open-source community and cloud providers, since the fork kept the original open-source license. Redis is written in C.

Today there are 2 different repositories and projects, each having its own release version and code name:

Redis (https://github.com/redis/redis)
Valkey (https://github.com/valkey-io/valkey)

At this moment the latest release versions are:

Redis v8.6.2
Valkey v9.0.3 (on top of Redis v7.2.4)

Both projects now implement features independently and in parallel, such as the recently added multithreading.

Pre-migration Reconnaissance

I'm preparing a migration of Redis instances from Redis Cloud to AWS Valkey, and before migrating I need information about data access patterns and all Redis producers/consumers.

Identify clients that are reading/writing to Redis:

There are different techniques that can be used to identify all writers/readers of Redis: tracing and monitoring tools
like Datadog and X-Ray, or analysis of the ENV variables set for services that point to the Redis endpoint.

Things get more complex in real life. Do not be surprised to see the following in your enterprise-level stack:

distributed environment
AWS multi-account deployments, PrivateLink, established VPC peerings, etc.
tons of running services that are written in multiple languages (Go, Java, TS, Ruby)
lack of documentation, stakeholders, etc.

Here I will show a technique that I found extra useful: it is natively supported by Redis out of the box and does
not require installing any third-party agents, monitoring stack, etc.

redis-cli

Redis has out-of-the-box functionality for gathering information about clients. You need to establish
a connection to your Redis server through the CLI and execute commands.

CLIENT LIST

The CLIENT LIST command returns information and statistics about the client connections to the server in a mostly human-readable
format.

You can use one of the optional subcommands to filter the list. The TYPE type subcommand filters the list by clients'
type, where type is one of normal, master, replica, and pubsub. Note that clients blocked by the MONITOR command
belong to the normal class.

The ID filter only returns entries for clients with IDs matching the client-id arguments.

redis-cxxxxx.us-east-1.ec2.cloud.xxxx.com:6379> client list
id=3066004040000 addr=xx.xx.xx.135:30746 laddr=xx.20.4.124:18585 fd=4789 name= age=18701 idle=827 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3081428040000 addr=xx.xx.xx.145:31848 laddr=xx.20.4.124:18585 fd=4959 name= age=538 idle=221 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name= lib-ver=
id=2956190040001 addr=xx.xx.xx.118:1604 laddr=xx.20.4.124:18585 fd=5117 name= age=148140 idle=5529 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=set user=xxx resp=2 lib-name= lib-ver=
id=3072343040001 addr=xx.xx.xx.126:46494 laddr=xx.20.4.124:18585 fd=4048 name= age=11493 idle=428 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3078769040001 addr=xx.xx.xx.173:9717 laddr=xx.20.4.124:18585 fd=4184 name= age=3714 idle=1 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3080824040001 addr=xx.xx.xx.158:35937 laddr=xx.20.4.124:18585 fd=5035 name= age=1267 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3081894040001 addr=xx.xx.xx.245:36487 laddr=xx.20.4.124:18585 fd=3686 name= age=26 idle=5 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name= lib-ver=
...

The most interesting data:

id: a unique 64-bit client ID.
addr: address/port of the client.
laddr: address/port of local address client connected to (bind address).
name: the name set by the client with CLIENT SETNAME.
age: total duration of the connection in seconds.
idle: idle time of the connection in seconds.
db: current database ID.
cmd: last command played.
user: the authenticated username of the client.
lib-name: the name of the client library that is being used.
lib-ver: the version of the client library.

Having this information from a live Redis instance lets you detect all clients, distinguish them by client ID, and track
additional information such as the library name, library version, and client name, along with other details about
the session.
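The field list above maps naturally onto a dictionary per connection. Here is a minimal sketch of parsing raw CLIENT LIST text and counting connections per client library. The function names and the sample addresses are made up for illustration; this is not the tool described later in the post.

```python
from collections import Counter


def parse_client_list(raw):
    """Turn CLIENT LIST text into a list of dicts, one per connection."""
    clients = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value data
        fields = {}
        # Fields are space-separated key=value pairs; values may be empty.
        for token in line.split(" "):
            key, sep, value = token.partition("=")
            if sep:
                fields[key] = value
        clients.append(fields)
    return clients


def summarize_by_library(clients):
    """Count connections per (lib-name, lib-ver) pair."""
    return Counter((c.get("lib-name", ""), c.get("lib-ver", "")) for c in clients)


# Hypothetical sample in the same shape as the output shown above.
sample = (
    "id=1 addr=10.0.0.5:30746 name= age=18701 idle=827 db=0 cmd=get "
    "user=app lib-name=go-redis(,go1.24.13) lib-ver=9.17.2\n"
    "id=2 addr=10.0.0.6:31848 name= age=538 idle=221 db=0 cmd=get "
    "user=app lib-name= lib-ver=\n"
)
clients = parse_client_list(sample)
for (lib, ver), count in summarize_by_library(clients).items():
    print(f"{lib or '<unknown>'} {ver or '-'}: {count}")
```

Connections with an empty lib-name end up in their own bucket, which makes it easy to see how many clients report no library metadata at all.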

MONITOR

MONITOR is a debugging command that streams back every command processed by the Redis server. It can help in
understanding what is happening to the database. This command can both be used via redis-cli and via telnet.

Because MONITOR streams back all commands, its use comes at a cost. The (totally unscientific) benchmark in the Redis
documentation shows that a single MONITOR client can cut throughput by more than 50%, so run it only for short periods.

redis-cxxxxx.us-east-1.ec2.cloud.xxxx.com:6379> monitor
OK
1774787120.638084 [0 xx.xx.xx.222:40277] "get" "prefixkey:namespacea:default_data"
1774787120.649084 [0 xx.xx.xx.243:58512] "get" "prefixkey:namespace2:default_data"
1774787120.652084 [0 xx.xx.xx.222:40277] "get" "prefixkey:namespace3:default_data"
1774787120.687083 [0 xx.xx.xx.50:64040] "zrange" "prefixkey:namespace4:allowed_data" "0" "-1"
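Each MONITOR line has a fixed shape: timestamp, database and client address in brackets, then the quoted command and its arguments. A minimal parsing sketch, assuming that layout, could look like this. The function name and the sample address are illustrative, and escape sequences inside quoted arguments are kept as-is rather than decoded.

```python
import re

# timestamp [db addr] "cmd" "arg1" ...
LINE_RE = re.compile(r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<addr>[^\]]+)\] (?P<rest>.*)$')
# A double-quoted token that may contain backslash escapes.
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_monitor_line(line):
    """Parse one MONITOR output line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. the initial "OK" line
    args = ARG_RE.findall(match.group("rest"))
    if not args:
        return None
    return {
        "ts": float(match.group("ts")),
        "db": int(match.group("db")),
        "addr": match.group("addr"),
        "command": args[0].upper(),
        "args": args[1:],
    }


event = parse_monitor_line(
    '1774787120.687083 [0 10.0.0.50:64040] "zrange" "prefixkey:namespace4:allowed_data" "0" "-1"'
)
print(event["addr"], event["command"], event["args"])
```

The addr value has the same ip:port form as the addr field of CLIENT LIST, so the two outputs can be joined on it.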

CLIENTNAME feature

The output from these commands is enough in most cases. If you host your apps on EC2, EKS,
ECS, or Lambda functions, you can easily map the IP addresses to specific pods/containers.
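For Kubernetes workloads, the mapping can be sketched as a lookup from pod IP to pod name. The example below assumes you have the PodList JSON that `kubectl get pods -A -o json` produces, already loaded into a dict. The function names and the sample pod are hypothetical.

```python
def build_ip_index(pod_list):
    """Map pod IP -> 'namespace/name' from a Kubernetes PodList document."""
    index = {}
    for item in pod_list.get("items", []):
        ip = item.get("status", {}).get("podIP")
        if not ip:
            continue  # pods that are not scheduled yet have no IP
        meta = item.get("metadata", {})
        index[ip] = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
    return index


def resolve_addr(addr, ip_index):
    """Resolve a CLIENT LIST addr value (ip:port) to a pod, if known."""
    ip = addr.rsplit(":", 1)[0]
    return ip_index.get(ip, "<unknown>")


# Hypothetical PodList fragment; a real one would come from json.load().
pods = {
    "items": [
        {"metadata": {"namespace": "payments", "name": "api-7d9f"},
         "status": {"podIP": "10.0.1.15"}},
        {"metadata": {"namespace": "payments", "name": "pending-pod"},
         "status": {}},
    ]
}
ip_index = build_ip_index(pods)
print(resolve_addr("10.0.1.15:30746", ip_index))
```

This only helps when Redis sees the pod IP itself. If the traffic is translated to the node address on the way out, every lookup resolves to the same node and a different signal is needed.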

But I hit a corner case: the source Redis is running in RedisLabs, a cloud environment that provides
Redis as a service (under the hood it deploys to cloud infrastructure, in this case AWS EC2 instances).

The consumer apps, however, run in a Kubernetes cluster, and since RedisLabs is hosted outside the AWS account, the network
traffic flows through the k8s cluster nodes.

Every Kubernetes node runs many workloads in its pods, but when they connect to Redis all of them have
the same IP address: the IP address of the cluster node on which they are hosted.

This information narrows down the set of candidate services, but since node affinity is not in use and the landscape of
services is large, it is still hard to identify Redis writers/readers: all pods share the same set of cluster node addresses.

And here comes CLIENT SETNAME. This command is available since Redis Open Source 2.6.9, and it assigns a name to the
current connection.

The assigned name is displayed in the output of CLIENT LIST so that it is possible to identify the client that
performed a given connection.

However, it is not possible to use spaces in the connection name as this would violate the format of the CLIENT LIST
reply.

Every new connection starts without an assigned name.

Setting names on connections is a good way to debug connection leaks due to bugs in the application using Redis.
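Since spaces are not allowed in a connection name, it helps to build names from a fixed pattern and clean them before use. The sketch below is one possible convention, not a Redis requirement: the `service-role` pattern and the 64-character cap are my own assumptions, and restricting names to printable non-space ASCII is a deliberately conservative choice.

```python
import re

# Assumption: keep only printable, non-space ASCII ('!' through '~').
_INVALID = re.compile(r"[^!-~]+")
_REPEATED_DASH = re.compile(r"-{2,}")


def make_client_name(service, role, max_len=64):
    """Build a connection name such as 'service1-writer' with no spaces."""
    raw = f"{service}-{role}"
    name = _INVALID.sub("-", raw)          # replace spaces, newlines, non-ASCII
    name = _REPEATED_DASH.sub("-", name)   # collapse runs of dashes
    return name.strip("-")[:max_len]


print(make_client_name("service1", "writer"))
print(make_client_name("billing api", "reader"))
```

Including the role (reader or writer) in the name makes the later CLIENT LIST output self-describing.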

Once clients are instrumented and deployed, using redis-cli we can check the data.

Writing a tool to track clients and operations on Redis:

To present this information in a table format, I created a Python-based tool that talks to Redis using the RESP
protocol and renders the information:

┌────────────────┬─────────────────┬───────────────────────┬─────────┬─────────┬──────────────────────────────────┬─────┐
│ Client IP      │ Name            │ Lib                   │ Lib Ver │ User    │ Full Key                         │ GET │
├────────────────┼─────────────────┼───────────────────────┼─────────┼─────────┼──────────────────────────────────┼─────┤
│ 10.xx.xx.53    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.173   │                 │ go-redis(,go1.21.1)   │ 8.0.2   │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│                │                 │                       │         │         │ prefixkey:namespace2:default_data│   1 │
│ 10.100.238.99  │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.100.244.76  │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.100.75.149  │                 │ go-redis(,go1.24.13)  │ 9.17.2  │ default │ prefixkey:namespace1:default_data│   1 │
│                │                 │                       │         │         │ prefixkey:namespace1:default_data│   1 │
│ 10.104.127.44  │                 │                       │         │ default │                                  │   0 │
│ 10.104.136.207 │                 │ go-redis(,go1.24.13)  │ 9.17.2  │ default │                                  │   0 │
│ 10.104.94.31   │                 │                       │         │ default │                                  │   0 │
│ 10.170.3.253   │ monitor-v2-prod │ python-socket-monitor │ 1.0     │ default │                                  │   0 │
└────────────────┴─────────────────┴───────────────────────┴─────────┴─────────┴──────────────────────────────────┴─────┘

For some period, we can track in real time which operations are performed on the Redis server and who the clients are.
Once all of them are identified and there are no unknown areas left, it's time to analyze the Redis instance and plan the migration.
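The core of such a table is a join between CLIENT LIST entries and MONITOR events on the client address. The following is a rough sketch of that join, not the actual tool: it assumes both inputs are already parsed into dicts with the keys shown in the docstring, and it counts only GET commands.

```python
from collections import defaultdict


def build_rows(clients, events):
    """Join CLIENT LIST entries with MONITOR events on the addr field.

    clients: dicts with 'addr' and optional 'name', 'lib-name', 'lib-ver', 'user'
    events:  dicts with 'addr', 'command', 'args' (first argument taken as the key)
    """
    get_counts = defaultdict(lambda: defaultdict(int))
    for ev in events:
        if ev["command"].upper() == "GET" and ev["args"]:
            get_counts[ev["addr"]][ev["args"][0]] += 1

    rows = []
    for c in sorted(clients, key=lambda c: c["addr"]):
        ip = c["addr"].rsplit(":", 1)[0]
        base = (ip, c.get("name", ""), c.get("lib-name", ""),
                c.get("lib-ver", ""), c.get("user", ""))
        keys = get_counts.get(c["addr"])
        if not keys:
            rows.append(base + ("", 0))  # connected, but no GET seen yet
            continue
        for key, count in sorted(keys.items()):
            rows.append(base + (key, count))
    return rows


# Hypothetical parsed inputs.
clients = [
    {"addr": "10.0.0.173:9717", "name": "", "lib-name": "go-redis(,go1.24.13)",
     "lib-ver": "9.17.2", "user": "app"},
    {"addr": "10.0.0.53:1604", "user": "app"},
]
events = [
    {"addr": "10.0.0.173:9717", "command": "get", "args": ["prefixkey:namespace1:default_data"]},
    {"addr": "10.0.0.173:9717", "command": "get", "args": ["prefixkey:namespace1:default_data"]},
    {"addr": "10.0.0.173:9717", "command": "set", "args": ["other:key", "1"]},
]
rows = build_rows(clients, events)
for row in rows:
    print(" | ".join(str(col) for col in row))
```

Joining on the full ip:port value rather than the IP alone keeps connections from different pods on the same node apart for as long as each connection stays open.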

Instrument client with clientName

Redis lets you configure the Redis SDK with a client name so that the client can be identified on every connection. Here is a Go code
snippet:

// Requires: import "github.com/redis/go-redis/v9"
rdb := redis.NewClient(&redis.Options{
    Addr:       "REDIS_HOST:REDIS_PORT",
    Password:   "REDIS_PASSWORD",
    DB:         0,
    ClientName: "service1-writer", // shown in the name= field of CLIENT LIST
})

The same can be done in the client libraries of every programming language, or even at the low level of a raw socket speaking the Redis RESP
protocol:

def send_command(sock, *args):
  """Send a command encoded as a RESP array of bulk strings."""
  command = f"*{len(args)}\r\n".encode("utf-8")
  for arg in args:
    data = str(arg).encode("utf-8")
    # RESP bulk-string lengths are byte counts, not character counts.
    command += f"${len(data)}\r\n".encode("utf-8") + data + b"\r\n"
  sock.sendall(command)


def read_line(sock):
  """Read one CRLF-terminated reply line (minimal helper used by the functions below)."""
  line = bytearray()
  while not line.endswith(b"\r\n"):
    chunk = sock.recv(1)
    if not chunk:
      raise ConnectionError("Connection closed while reading reply")
    line += chunk
  return line[:-2].decode("utf-8")


def set_client_name(sock, client_name):
  """Assign a client name to the current Redis connection."""
  if not client_name:
    return
  send_command(sock, "CLIENT", "SETNAME", client_name)
  response = read_line(sock)
  if not response.startswith("+OK"):
    raise ConnectionError(f"CLIENT SETNAME failed: {response}")


def set_client_info(sock, lib_name=None, lib_version=None):
  """Assign client library metadata to the current Redis connection."""
  for attribute, value in (("LIB-NAME", lib_name), ("LIB-VER", lib_version)):
    if not value:
      continue
    send_command(sock, "CLIENT", "SETINFO", attribute, value)
    # Redis < 7.2 may not support CLIENT SETINFO: read the reply to keep
    # the stream in sync, but ignore an error response.
    read_line(sock)
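The functions above only check single-line replies such as +OK. The CLIENT LIST reply arrives as a RESP bulk string, so reading it needs a slightly fuller reply reader. Here is a minimal RESP2 sketch that works on any binary file-like object, for example the result of `sock.makefile("rb")`. The function name is my own, and RESP3 types are not handled.

```python
import io


def read_reply(stream):
    """Read one RESP2 reply from a binary file-like object."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ConnectionError("incomplete reply from server")
    prefix, payload = line[:1], line[1:-2]
    if prefix == b"+":  # simple string
        return payload.decode("utf-8")
    if prefix == b"-":  # error reply
        raise RuntimeError(payload.decode("utf-8"))
    if prefix == b":":  # integer
        return int(payload)
    if prefix == b"$":  # bulk string: length line, then payload plus CRLF
        length = int(payload)
        if length == -1:
            return None  # null bulk string
        data = stream.read(length + 2)
        if len(data) != length + 2:
            raise ConnectionError("truncated bulk string")
        return data[:-2].decode("utf-8")
    if prefix == b"*":  # array of nested replies
        count = int(payload)
        if count == -1:
            return None
        return [read_reply(stream) for _ in range(count)]
    raise ValueError(f"unsupported RESP type: {prefix!r}")


# Simulated server bytes, so the example runs without a Redis connection.
print(read_reply(io.BytesIO(b"+OK\r\n")))
print(read_reply(io.BytesIO(b"$10\r\nid=1 name=\r\n")))
```

Reading the bulk payload by its declared length, instead of line by line, is what allows a multi-line CLIENT LIST body to come back as one string.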

Challenge 1: too many connections to this Redis database

With our custom monitor tool we are now tracking all clients in detail, and suddenly we see that there are too many
clients and too many keys being accessed.
Since the Redis instance is used by multiple services and we are interested in extracting only keys with a specific prefix
pattern, we can modify the monitor tool to track access to those prefixes only.

Add a predicate for the key prefix. After that, you will see only the client names, IPs and operations for the limited subset
of the keystore that is the subject of extraction.
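Such a predicate can be very small. A possible sketch, assuming monitor events are dicts with an `args` list whose first element is the key (true for commands like GET, SET and ZRANGE, but not for every command):

```python
def make_prefix_filter(prefixes):
    """Return a predicate that keeps events whose key starts with one of the prefixes."""
    prefixes = tuple(prefixes)

    def matches(event):
        # Assumption: the first argument is the key.
        # str.startswith accepts a tuple of candidate prefixes.
        return any(arg.startswith(prefixes) for arg in event["args"][:1])

    return matches


# Hypothetical events and prefix.
events = [
    {"addr": "10.0.0.222:40277", "command": "GET", "args": ["prefixkey:namespace1:default_data"]},
    {"addr": "10.0.0.243:58512", "command": "GET", "args": ["sessions:abc"]},
    {"addr": "10.0.0.50:64040", "command": "PING", "args": []},
]
keep = make_prefix_filter(["prefixkey:namespace1:"])
tracked = [ev for ev in events if keep(ev)]
print([ev["addr"] for ev in tracked])
```

Applying the filter before any counting keeps memory use bounded when the instance serves many unrelated keyspaces.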

Challenge 2: client versions are too old and will not support Valkey

It's important to check SDK client versions. Do not be surprised to find that you are using an outdated Redis and outdated
libraries.
In this case, if you provision the latest AWS Valkey and just change the endpoint URL in the apps' config, it
will not work: your apps will get connection errors at the RESP protocol and command level.

Upgrade and align client versions to the latest supported Redis SDK.
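The lib-name and lib-ver fields from CLIENT LIST make this check easy to automate. Below is a small sketch that flags clients below a minimum version or with no library metadata at all. The `9.0.0` threshold is a placeholder I picked for the example, not a documented compatibility boundary; take the real value from your library's release notes.

```python
import re


def parse_version(text):
    """Extract up to three numeric components, e.g. '9.17.2' -> (9, 17, 2)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def find_outdated(clients, minimums):
    """Return (addr, reason) pairs for clients that need attention.

    minimums maps a lib-name prefix to a minimum version string.
    """
    flagged = []
    for c in clients:
        lib, ver = c.get("lib-name", ""), c.get("lib-ver", "")
        if not lib or not ver:
            flagged.append((c["addr"], "no library info reported"))
            continue
        for prefix, minimum in minimums.items():
            if lib.startswith(prefix) and parse_version(ver) < parse_version(minimum):
                flagged.append((c["addr"], f"{lib} {ver} is older than {minimum}"))
    return flagged


# Hypothetical threshold and parsed CLIENT LIST entries.
MINIMUMS = {"go-redis": "9.0.0"}
clients = [
    {"addr": "10.0.0.173:9717", "lib-name": "go-redis(,go1.21.1)", "lib-ver": "8.0.2"},
    {"addr": "10.0.0.149:4000", "lib-name": "go-redis(,go1.24.13)", "lib-ver": "9.17.2"},
    {"addr": "10.0.0.53:1604", "lib-name": "", "lib-ver": ""},
]
outdated = find_outdated(clients, MINIMUMS)
for addr, reason in outdated:
    print(addr, reason)
```

Clients that report nothing deserve a manual look: an empty lib-name usually means the library never sent CLIENT SETINFO, which tends to point at an older SDK.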

Completed redis client versions and clientnames alignment

After upgrading all client code to the latest Go Redis library version (9.18.0 as of today) and adding client
names to Redis sessions, we can now review active connections, see which keys are being accessed, and start
planning the actual migration:

┌────────────────┬─────────────────┬───────────────────────┬─────────┬─────────┬──────────────────────────────────┬─────┐
│ Client IP      │ Name            │ Lib                   │ Lib Ver │ User    │ Full Key                         │ GET │
├────────────────┼─────────────────┼───────────────────────┼─────────┼─────────┼──────────────────────────────────┼─────┤
│ 10.xx.xx.53    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.173   │ service1-writer │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│                │                 │                       │         │         │ prefixkey:namespace2:default_data│   1 │
│ 10.xx.xx.149   │ service2-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.142   │ service3-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.141   │ service4-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.129   │ service5-writer │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.159   │ service3-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.44    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.207   │ service2-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │                                  │   0 │
│ 10.xx.xx.31    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.253   │ monitor-v2-prod │ python-socket-monitor │ 1.0     │ xxxx    │                                  │   0 │
└────────────────┴─────────────────┴───────────────────────┴─────────┴─────────┴──────────────────────────────────┴─────┘

Read/write I/O measurement per client, key access detection, or any other analytic
operations for troubleshooting and refactoring can be added on top of this tool. Its use is not limited to pre-migration analysis.
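As one example of such an extension, read and write operations per client can be counted from monitor events. The sketch below uses two small hand-picked command sets, which are an assumption for illustration and far from complete. The command flags reported by Redis's COMMAND INFO could serve as a more reliable source.

```python
from collections import Counter

# Hand-picked, deliberately incomplete sets (an assumption for illustration).
READ_COMMANDS = frozenset(
    {"GET", "MGET", "EXISTS", "TTL", "HGET", "HGETALL", "ZRANGE", "SMEMBERS", "LRANGE"}
)
WRITE_COMMANDS = frozenset(
    {"SET", "DEL", "EXPIRE", "INCR", "HSET", "ZADD", "SADD", "LPUSH", "RPUSH"}
)


def classify(command):
    """Label a command name as 'read', 'write' or 'other'."""
    name = command.upper()
    if name in READ_COMMANDS:
        return "read"
    if name in WRITE_COMMANDS:
        return "write"
    return "other"


def ops_per_client(events):
    """Count operations per (client address, kind)."""
    return Counter((ev["addr"], classify(ev["command"])) for ev in events)


# Hypothetical monitor events.
events = [
    {"addr": "10.0.0.173:9717", "command": "get"},
    {"addr": "10.0.0.173:9717", "command": "zrange"},
    {"addr": "10.0.0.129:5000", "command": "set"},
    {"addr": "10.0.0.129:5000", "command": "ping"},
]
stats = ops_per_client(events)
for (addr, kind), count in sorted(stats.items()):
    print(addr, kind, count)
```

A client named as a reader that shows up with write operations here is exactly the kind of surprise worth finding before the migration.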

Reconnaissance step output summary

updated and aligned library versions of all Redis SDKs
properly trackable client names and all connections
fully identified all writers/readers that are planned to be migrated to AWS Valkey
clear understanding of data access patterns (read/write to keystore)

In the next post, I’ll dive into Redis topologies and the critical components to consider when preparing and running a migration, as they depend on the chosen topology.
