daniel jeong

Posted on May 29 • Originally published at manoit.co.kr

Valkey 9.1 Deep Dive — Database-Level ACLs, Lua-as-a-Module, and a New I/O Threading Model Hitting 2.1M RPS

Plus HGETDEL/MSETEX/CLUSTERSCAN, Redefining the 2026 In-Memory Datastore Operations Standard

TL;DR

Valkey 9.1.0 (May 19, 2026) is the first minor release after the 9.0 GA, with 80+ contributors hardening security, observability, performance, efficiency, and tooling all at once.

Numbered database-level ACLs let you scope a user's permissions to specific databases (db=0,1), making single-cluster multi-tenant isolation practical.

Lua moved to a module (libvalkeylua.so): pure cache workloads can leave it unloaded and remove the scripting attack surface entirely.

A redesigned I/O threading model pushes a single server to 2.1M RPS (512-byte payloads, 9 IO threads, pipeline depth 10) and gives up to 17% more throughput.

New commands HGETDEL / MSETEX / CLUSTERSCAN, JSON logging (log-format json), main/IO thread usage metrics, and TLS auto-reload + SAN-URI mTLS round it out.

The Valkey community shipped Valkey 9.1.0 on May 19, 2026. Forked from Redis 7.2.4 under the Linux Foundation after the 2024 Redis license change (SSPL/RSALv2), Valkey crossed from "a Redis-compatible layer" into "a project with its own roadmap" at the 9.0 GA in October 2025. 9.1 builds on that foundation: 80+ contributors advanced security, observability, performance, efficiency, and tooling simultaneously. In short: (1) numbered database-level ACLs split per-tenant permissions at db granularity inside a single instance, (2) Lua scripting moved into its own module so you can leave it unloaded when unused, and (3) a new I/O threading model hit 2.1M RPS on a single server (512-byte payload, 9 IO threads, pipeline depth 10). On top of that come (4) the HGETDEL/MSETEX/CLUSTERSCAN commands, (5) JSON logging plus main/IO thread usage metrics, and (6) TLS certificate auto-reload and SAN-URI mTLS. This article explains the operational motivation behind each change, revisits the 9.0 foundation (Atomic Slot Migration, Hash Field Expiration, cluster-mode numbered DBs), and lays out the step-by-step upgrade, validation, and rollback playbook ManoIT applied to its internal cache/session clusters.

1. Why May 2026's Valkey 9.1 matters

Valkey's significance isn't a version number; it's the maturity the project has reached roughly two years after the fork. Right after the fork, the yardstick was "how Redis-compatible is it?" But once 9.0 added features not present in Redis OSS (Atomic Slot Migration, Hash Field Expiration, cluster-mode numbered DBs), the axis shifted to "the operational value of independent features." 9.1 continues that arc by concentrating on the two areas operators touch most: security and observability.

Date	Release / Event	Operational meaning
2024.03	Redis license change → Valkey fork (Redis 7.2.4 base)	BSD-3-Clause retained, Linux Foundation governance
2024.09	Valkey 8.0: multithreaded I/O	Per-core throughput gains begin
2025.04	Valkey 8.1: new hash table implementation, I/O improvements	Lower per-key memory overhead, lower latency
2025.10.21	Valkey 9.0 GA: Atomic Slot Migration, Hash Field Expiration, cluster numbered DBs, 1B RPS	Inflection beyond Redis compatibility
2026.05.19	Valkey 9.1.0: DB-level ACLs, Lua-as-a-module, new I/O threading (2.1M RPS), HGETDEL/MSETEX/CLUSTERSCAN, JSON logging	Security/observability/efficiency become operational defaults

The two operational messages of 9.1: (1) you can solve multi-tenant isolation at db granularity without adding instances — DB-level ACLs directly cut the cost of instance separation; and (2) observability is now provided directly by the core, without a sidecar — JSON logging and thread usage metrics absorb gaps you previously filled with exporters and log parsers.

2. Security — Database-Level ACLs, Lua-as-a-Module, TLS Improvements

In-memory datastores traditionally said "we're fast, but security belongs upstream (app/network policy)." 9.1 pushes back on that assumption by hardening the core itself.

2.1 Numbered database-level ACLs — the new multi-tenant isolation standard

Classic ACLs controlled which commands a user could run and which keys they could touch — but those rules applied to every database identically. Even if you split db 0 and db 5, permissions weren't split, so numbered DBs were hard to use as a multi-tenancy boundary. 9.1 adds a db= selector to scope a user's permissions to specific databases.

# Allow app-user only on db 0 and 1
> ACL SETUSER app-user on >secretpass +@all ~* db=0,1
OK

# After auth, db 0 works
> SELECT 0
OK
> SET mykey "hello"
OK

# db 2 is blocked
> SELECT 2
(error) NOPERM No permissions to access database

The operational payoff is large. The pattern of "one instance (or cluster) per tenant for isolation" can become per-tenant db + per-db ACL inside a single cluster when combined with 9.0's cluster-mode numbered DBs. Fewer instances → less memory overhead and operational burden. Caveat: numbered DBs are logical, not physical isolation, so for strongly regulated data (PII, payments) keep instance separation as well.
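
As a rough sketch of how a tenant-to-database mapping might be turned into ACL rules, the Python helper below only builds command strings and never talks to a server. The `db=` selector syntax is copied from the example above; the tenant names, the placeholder password, and the `acl_setuser_command` helper are hypothetical.

```python
def acl_setuser_command(user, password, dbs, rules=("+@all", "~*")):
    """Build an ACL SETUSER command string scoped to the given databases.

    Assumption: the db= selector takes a comma-separated list of database
    numbers, as in the article's example (db=0,1).
    """
    if not dbs:
        raise ValueError("at least one database number is required")
    db_list = ",".join(str(db) for db in sorted(set(dbs)))
    return " ".join(
        ["ACL", "SETUSER", user, "on", f">{password}", *rules, f"db={db_list}"]
    )


# Hypothetical tenant -> database mapping, for illustration only.
TENANT_DBS = {"tenant-a": [0, 1], "tenant-b": [2]}

commands = [
    acl_setuser_command(f"{tenant}-app", "change-me", dbs)
    for tenant, dbs in TENANT_DBS.items()
]
for command in commands:
    print(command)
```

Keeping the mapping in one reviewed data structure makes it easier to audit which tenant can reach which database. In a real deployment the `+@all ~*` rules would likely be narrowed per tenant as well.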

2.2 Lua scripting engine moved to a module — attack surface reduction

9.1 extracts the Lua scripting engine from the core server into its own module (libvalkeylua.so). Running arbitrary Lua via EVAL/EVALSHA is powerful but also a well-known attack vector (sandbox escape, resource exhaustion). The point of modularization is "don't load it if you don't need it." A pure cache workload with no scripting can leave the Lua module unloaded and remove the scripting attack surface entirely. Check which scripting engines are loaded via the new Scripting Engines section of INFO.

> INFO scripting_engines
# Scripting Engines
engine_lua:loaded=1,libname=libvalkeylua.so
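
For an automated audit, the output above could be checked with a few lines of Python. This is a minimal sketch that assumes each engine is reported as `name:key=value,key=value`, exactly as in the sample; the helper names are hypothetical and the real INFO format may differ.

```python
def parse_scripting_engines(info_text):
    """Parse the 'Scripting Engines' INFO section into {engine: {field: value}}.

    Assumption: one engine per line, shaped like the sample output above.
    """
    engines = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the section header
        name, _, fields = line.partition(":")
        engines[name] = dict(
            pair.split("=", 1) for pair in fields.split(",") if "=" in pair
        )
    return engines


def lua_is_loaded(info_text):
    """Return True if an engine_lua entry reports loaded=1."""
    engines = parse_scripting_engines(info_text)
    return engines.get("engine_lua", {}).get("loaded") == "1"


sample = "# Scripting Engines\nengine_lua:loaded=1,libname=libvalkeylua.so\n"
print(lua_is_loaded(sample))
```

A check like this fits naturally into a fleet inventory job that flags clusters where Lua is loaded but no scripting is in use.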

2.3 TLS auto-reload and SAN-URI-based mTLS

9.1 directly tackles two chronic TLS operations pains — "an expired cert nobody noticed caused an outage" and "rotating certs requires a restart."

Improvement	Through 9.0	9.1
Cert expiry visibility	External monitoring only	`INFO` exposes TLS cert expiration dates
Cert rotation	Restart required (downtime)	Background auto-reload (zero downtime)
mTLS identity	CN-based	SAN-URI-based authentication

SAN-URI authentication integrates directly with workload-identity systems like SPIFFE/SPIRE, simplifying mTLS in service-mesh / zero-trust environments.
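
To make the identity format concrete: a SPIFFE ID is a URI of the form `spiffe://<trust-domain>/<path>`. The sketch below shows how such a SAN URI could be checked against an expected trust domain and path prefix. It illustrates the identity string only; how Valkey itself maps a SAN URI to a user is not described here, and the function names and example IDs are hypothetical.

```python
from urllib.parse import urlsplit


def spiffe_trust_domain(san_uri):
    """Return the trust domain of a SPIFFE ID, or None if the URI is not one."""
    parts = urlsplit(san_uri)
    # SPIFFE IDs use the spiffe scheme and carry no query or fragment.
    if parts.scheme != "spiffe" or not parts.netloc or parts.query or parts.fragment:
        return None
    return parts.netloc


def is_allowed_client(san_uri, trust_domain, path_prefix="/"):
    """Accept a client only if its SAN URI is in the expected domain and path."""
    domain = spiffe_trust_domain(san_uri)
    if domain is None or domain != trust_domain:
        return False
    return urlsplit(san_uri).path.startswith(path_prefix)


client_id = "spiffe://example.org/ns/cache/sa/session-api"
print(is_allowed_client(client_id, "example.org", "/ns/cache/"))
```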

3. New Commands — HGETDEL / MSETEX / CLUSTERSCAN

9.1 folds "common patterns that used to need multiple round trips or a transaction" into single commands.

3.1 HGETDEL — atomically get and delete hash fields

For queue patterns (read data and remove it immediately), you previously had to wrap HGET + HDEL in MULTI. HGETDEL does it in one shot.

> HSET job:42 status "pending" payload '{"action":"send_email"}' retries "3"
(integer) 3
> HGETDEL job:42 FIELDS 2 status payload
1) "pending"
2) "{\"action\":\"send_email\"}"
> HGETALL job:42
1) "retries"
2) "3"
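
The semantics shown above can be modeled in a few lines of Python, which may help when writing unit tests against a fake store. This is an in-memory model, not a client call. Two behaviors are assumptions rather than something the example confirms: a missing field yields `None`, and a hash left with no fields is removed.

```python
def hgetdel(store, key, fields):
    """Model HGETDEL on a dict-of-dicts: return each field's value, then delete it.

    Assumptions: a missing field yields None, and a hash with no remaining
    fields is dropped from the store.
    """
    hash_ = store.get(key, {})
    values = [hash_.pop(field, None) for field in fields]
    if key in store and not hash_:
        del store[key]
    return values


store = {
    "job:42": {
        "status": "pending",
        "payload": '{"action":"send_email"}',
        "retries": "3",
    }
}
taken = hgetdel(store, "job:42", ["status", "payload"])
print(taken, store)
```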

3.2 MSETEX — set multiple keys with a shared TTL

Setting many keys with the same TTL used to require multiple SETEX calls or a SET+EXPIRE pipeline. MSETEX cuts round trips and supports idempotent sets via NX.
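
When the key set is produced by application code, the argument list can be assembled programmatically. The sketch below follows the argument order used in this section's CLI session (numkeys, key/value pairs, optional NX, then EX seconds); the `msetex_args` helper is hypothetical and only builds the list, leaving the actual send to whatever generic command method your client offers.

```python
def msetex_args(pairs, ttl_seconds, only_if_absent=False):
    """Build the MSETEX argument list: numkeys, key/value pairs, [NX], EX ttl."""
    if not pairs:
        raise ValueError("at least one key is required")
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    args = ["MSETEX", str(len(pairs))]
    for key, value in pairs.items():
        args += [key, value]
    if only_if_absent:
        args.append("NX")
    args += ["EX", str(ttl_seconds)]
    return args


sessions = {"session:abc": "user:1", "session:def": "user:2", "session:ghi": "user:3"}
args = msetex_args(sessions, 3600)
print(" ".join(args))
```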

# Set 3 session keys, all expiring in 3600s
> MSETEX 3 session:abc "user:1" session:def "user:2" session:ghi "user:3" EX 3600
OK
> TTL session:abc
(integer) 3600

# NX: only set keys that don't already exist
> MSETEX 2 session:abc "user:99" session:xyz "user:4" NX EX 3600
OK
> GET session:abc
"user:1"

3.3 CLUSTERSCAN — cluster-wide key scanning

Iterating all keys in a cluster previously meant clients independently SCAN-ing each node and merging results. CLUSTERSCAN offers a single interface to traverse all nodes, with MATCH/TYPE/SLOT filters.

# Iterate all cluster keys (repeat until cursor returns 0)
> CLUSTERSCAN 0
1) "3"
2) 1) "user:1001"
   2) "user:1002"
   3) "session:abc"

# Filter by pattern
> CLUSTERSCAN 0 MATCH "session:*"
# Filter by type
> CLUSTERSCAN 0 TYPE hash
# Scan a specific slot
> CLUSTERSCAN 0 SLOT 7638
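
The "repeat until cursor returns 0" loop can be sketched as a Python generator. It assumes the reply shape shown above (`[next_cursor, [keys...]]`) with string replies, and it takes the transport as a plain callable so that no specific client API is implied. `fake_execute` and its two pages exist purely for illustration.

```python
def clusterscan_all(execute, match=None):
    """Yield every key by following CLUSTERSCAN cursors until the cursor is 0.

    `execute` is any callable that sends a command (a list of strings) and
    returns [next_cursor, [keys...]]. The cursor is treated as opaque.
    """
    cursor = "0"
    while True:
        command = ["CLUSTERSCAN", cursor]
        if match is not None:
            command += ["MATCH", match]
        cursor, keys = execute(command)
        yield from keys
        if str(cursor) == "0":
            break


# Fake server for illustration: two pages, then cursor 0.
PAGES = {"0": ("3", ["user:1001", "user:1002"]), "3": ("0", ["session:abc"])}


def fake_execute(command):
    return PAGES[command[1]]


all_keys = list(clusterscan_all(fake_execute))
print(all_keys)
```

If CLUSTERSCAN follows SCAN's guarantees, a key may be returned more than once during a full iteration, so callers that need uniqueness should deduplicate.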

4. Performance — New I/O Threading Model Hits 2.1M RPS on a Single Server

9.1 pushes single-server throughput to 2.1M RPS under 512-byte payloads, 9 IO threads, pipeline depth 10. The core change is a redesigned inter-IO-thread communication model.

Improvement	Detail	Effect
New I/O threading model	Redesigned IO thread communication	Up to 17% higher throughput across workloads
Faster stream ops	`XRANGE`/`XREVRANGE` hot-path optimization	Up to 30% faster
Higher-throughput GETs	Raised string embedding size threshold	Up to 30% higher for string GET
Faster sorted-set queries	Skiplist query processing improvements	`ZRANGEBYSCORE`/`ZRANGEBYLEX` faster
Cached COMMAND responses	`COMMAND` responses are cached	Shorter client-init connection time
Hardware clock by default	Less time-syscall overhead	Up to 3% overall GET/SET improvement

Enabling the hardware clock by default looks minor but is global: it swaps the time lookup that every command makes from a syscall to a hardware counter. On virtualized or otherwise unusual environments, validate monotonic-clock behavior before rolling out.
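
A quick back-of-the-envelope calculation puts the 2.1M RPS headline in context. The inputs below are the benchmark parameters quoted in this section. The result counts payload bytes only, so protocol framing and TCP/IP overhead would come on top.

```python
RPS = 2_100_000        # headline requests per second
PAYLOAD_BYTES = 512    # payload size used in the benchmark
IO_THREADS = 9         # IO threads used in the benchmark

payload_bytes_per_sec = RPS * PAYLOAD_BYTES
payload_gbit_per_sec = payload_bytes_per_sec * 8 / 1e9
rps_per_io_thread = RPS / IO_THREADS

print(f"payload bandwidth: {payload_gbit_per_sec:.2f} Gbit/s")
print(f"per IO thread:     {rps_per_io_thread:,.0f} req/s")
```

At roughly 8.6 Gbit/s of payload alone, a 10 GbE link would be close to saturated, so reproducing the figure likely depends on the network as much as on the server.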

5. Efficiency — Memory Reduction and Rehashing / Bulk-Delete Optimization

As important as throughput is "the same data in less memory." 9.1 delivers meaningful savings on small strings and sorted sets.

Optimization	Target	Effect
String memory reduction	Strings < 128 bytes (internal pointer optimization)	Up to 20% less memory
Sorted-set memory reduction	Skiplist optimization	Up to 10% less memory
Rehashing performance	Internal hash-table rehashing on keyspace growth	Reduced latency impact during rehashing
Bulk delete	Pause resizing during `SREM`/`ZREM`/`HDEL`	Removes needless rehashing → faster bulk deletes
Replica creation	Reuse received RDB as AOF base when AOF enabled	No initial snapshot regeneration
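
To size what the string row could mean for a given cluster, a best-case estimate is simple arithmetic. The sketch below treats the table's "up to 20%" as an upper bound, not a prediction. The key count and per-key footprint are hypothetical inputs you would replace with measured values.

```python
def estimated_savings_bytes(key_count, bytes_per_key_before, saving_percent=20):
    """Best-case bytes saved on small strings.

    saving_percent=20 is the "up to 20%" figure from the table, so the
    result is an upper bound. bytes_per_key_before is the measured per-key
    footprint (key + value + overhead) before the upgrade.
    """
    if not 0 <= saving_percent <= 100:
        raise ValueError("saving_percent must be between 0 and 100")
    return key_count * bytes_per_key_before * saving_percent // 100


# Hypothetical cluster: 50 million session keys at about 100 bytes each.
saved = estimated_savings_bytes(50_000_000, 100)
print(f"best case: {saved / 1e9:.1f} GB saved")
```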

"20% savings on sub-128-byte strings" is very tangible for services that store huge numbers of small strings — session tokens, flags, short cache values. A cache holding tens of millions of small keys cuts memory cost from the upgrade alone.

6. Observability — JSON Logging and Thread Usage Metrics

A long-standing Valkey/Redis ops weakness was "logs are human-readable plain text, awkward for observability tools to parse." 9.1 emits structured logs directly from the core via log-format json.

# valkey.conf
log-format json

Output is one JSON object per line — Loki/Elastic/CloudWatch can parse it immediately without custom grok patterns.

{"pid":14500,"role":"primary","timestamp":"14 May 2026 14:13:02.921","level":"notice","message":"oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"}
{"pid":14500,"role":"primary","timestamp":"14 May 2026 14:13:02.928","level":"warning","message":"WARNING: The TCP backlog setting of 511 cannot be enforced..."}
{"pid":14500,"role":"primary","timestamp":"14 May 2026 14:13:02.930","level":"notice","message":"Ready to accept connections tcp"}
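
Because each line is a standalone JSON object, consuming the log needs nothing beyond the standard library. The sketch below parses two of the sample lines above and filters by level. The timestamp format string is inferred from the sample and yields a naive datetime, since the sample carries no timezone.

```python
import json
from datetime import datetime

LOG_LINES = [
    '{"pid":14500,"role":"primary","timestamp":"14 May 2026 14:13:02.921",'
    '"level":"notice","message":"oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"}',
    '{"pid":14500,"role":"primary","timestamp":"14 May 2026 14:13:02.928",'
    '"level":"warning","message":"WARNING: The TCP backlog setting of 511 cannot be enforced..."}',
]


def parse_log_line(line):
    """Decode one JSON log line and convert its timestamp to a datetime."""
    record = json.loads(line)
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d %b %Y %H:%M:%S.%f"
    )
    return record


records = [parse_log_line(line) for line in LOG_LINES]
warnings = [r for r in records if r["level"] == "warning"]
print(len(warnings), warnings[0]["timestamp"].isoformat())
```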

The other key item is main/IO thread usage metrics. Valkey's threads busy-loop while waiting for work, so CPU can appear near 100% even when relatively idle — plain CPU metrics couldn't reveal true load. 9.1 adds cumulative usage metrics for the main and IO threads so you can measure "how busy is it really?" and tune accordingly. It's a direct basis for deciding whether to add IO threads (scale up).
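
Since the metrics are cumulative, a utilization figure comes from the difference between two samples. The sketch below shows that calculation in isolation. The metric names and units are not given here, so the inputs are plain numbers and the sample values are hypothetical.

```python
def utilization(busy_before, busy_after, wall_before, wall_after):
    """Fraction of wall-clock time a thread spent working between two samples.

    All four values must share one time unit. The counters are cumulative,
    so only the deltas between samples are meaningful.
    """
    wall_delta = wall_after - wall_before
    if wall_delta <= 0:
        raise ValueError("samples must be taken at increasing times")
    busy_delta = busy_after - busy_before
    return max(0.0, min(1.0, busy_delta / wall_delta))


# Hypothetical samples taken 10 seconds apart, in microseconds.
io_thread_util = utilization(4_000_000, 11_500_000, 0, 10_000_000)
print(f"IO thread utilization: {io_thread_util:.0%}")
```

A sustained value near 1.0 on the IO threads while the main thread stays low would be one signal that adding IO threads could help. The reverse pattern suggests the main thread is the bottleneck.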

7. Revisiting the 9.0 Foundation — Atomic Slot Migration, Hash Field Expiration, Cluster Numbered DBs

To use 9.1 well you must know the 9.0 foundation beneath it. The three pillars of 9.0 (2025-10-21) tie straight into operational stability.

7.1 Atomic Slot Migration — from key-by-key to slot-by-slot

Pre-9.0 cluster resharding was key-by-key move-then-delete. If a client touched a key mid-migration, it didn't know which node held it, adding hops; in multi-key ops with keys split across two nodes, the client had to retry; and a huge collection key could exceed the target node's input buffer and block the migration outright. 9.0 atomically moves an entire slot (of 16,384) in AOF format. The source node keeps all keys until the slot migration fully completes, so redirects, retries, and giant-key blocking disappear structurally. 9.1's valkey-cli uses this directly via the --cluster-use-atomic-slot-migration flag on --cluster rebalance/--cluster reshard.
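
To see which of the 16,384 slots a key travels with during a slot migration, the mapping can be computed locally. Redis-compatible clusters hash the key with CRC16 (XMODEM variant) modulo 16384, and a non-empty `{hash tag}` restricts the hash to the tag. The sketch uses `binascii.crc_hqx` from the Python standard library, which computes that CRC variant when seeded with 0.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key):
    """Map a key to its cluster slot: CRC16 (XMODEM) of the key, modulo 16384.

    If the key contains a non-empty {hash tag}, only the tag is hashed, so
    keys sharing a tag land in the same slot and migrate together.
    """
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # ignore a missing "}" and empty tags such as "{}"
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


print(key_slot("foo"))
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Keys that must move together, such as those touched by one multi-key operation, can share a hash tag so they always sit in the same slot.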

7.2 Hash Field Expiration — per-field TTL

Hashes bundle many fields under one key, but pre-9.0 expiry was all-or-nothing at the key level. Expiring only some fields required multi-key hacks, adding complexity and memory. 9.0 added a per-field TTL command family: HEXPIRE, HPEXPIRE, HTTL, HGETEX, HSETEX, HPERSIST, and more. Combined with 9.1's HGETDEL, hash-based job queues and session stores become far cleaner.

7.3 Other 9.0 improvements

9.0 also delivered: 1B RPS on a 2,000-node cluster (large-cluster resilience), pipeline memory prefetch (up to 40% higher throughput), zero-copy responses (up to 20% higher throughput), Multipath TCP (up to 25% lower latency), SIMD for BITCOUNT and HyperLogLog (up to 200% higher throughput), polygon-based geospatial queries, conditional delete DELIFEQ, CLIENT LIST filtering, and un-deprecation of 25 previously deprecated commands. If 9.0 was "a leap in performance and features," 9.1 is "the security/observability/efficiency finish on top."

8. Operational Decisions — Upgrade / Migration Flow

Below is the 9.1 adoption decision flow used by ManoIT's platform team.

flowchart TD
    A[Current in-memory store] --> B{Engine?}
    B -->|Redis 7.2 or older OSS| C[Evaluate drop-in migration<br/>to BSD-3 Valkey 9.1]
    B -->|Valkey 8.x| D[Upgrade to 9.1]
    B -->|Redis 7.4+ SSPL| E[Decide after license policy review]
    C --> F{Use scripting?}
    D --> F
    F -->|No| G[Don't load Lua module<br/>shrink attack surface]
    F -->|Yes| H[Keep Lua module loaded]
    G --> I{Multi-tenant?}
    H --> I
    I -->|Yes| J[Consolidate instances via<br/>numbered DBs + per-db ACLs]
    I -->|No| K[Single-db operation]
    J --> L[Wire JSON logging + thread metrics<br/>into observability pipeline]
    K --> L
    L --> M[Validate in staging 2 weeks → gradual prod rollout]
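
The same decision flow can be written as a small function, which is handy when generating a per-cluster plan from an inventory. This is a sketch: the engine labels (`redis-oss`, `valkey-8.x`, `redis-7.4+`) and the step wording are paraphrases chosen for illustration, not identifiers from any tool.

```python
def adoption_plan(engine, uses_scripting, multi_tenant):
    """Return the ordered adoption steps for one cluster, following the flowchart."""
    if engine == "redis-7.4+":
        return ["Decide after license policy review"]
    if engine == "redis-oss":
        steps = ["Evaluate drop-in migration to Valkey 9.1"]
    elif engine == "valkey-8.x":
        steps = ["Upgrade to 9.1"]
    else:
        raise ValueError(f"unknown engine: {engine}")
    steps.append(
        "Keep Lua module loaded" if uses_scripting else "Don't load Lua module"
    )
    steps.append(
        "Consolidate via numbered DBs + per-db ACLs"
        if multi_tenant
        else "Single-db operation"
    )
    steps.append("Wire JSON logging + thread metrics into observability pipeline")
    steps.append("Validate in staging 2 weeks, then gradual prod rollout")
    return steps


for step in adoption_plan("valkey-8.x", uses_scripting=False, multi_tenant=True):
    print("-", step)
```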

9. ManoIT Internal Adoption Checklist

The checklist below turns the above into an internal operations procedure. ManoIT runs cache/session/ranking clusters in three tiers (dev/stage/prod) and validates even minor releases for two weeks in staging before prod.

#	Item	Owner	Done criteria
1	Inventory engine/version across all clusters (incl. Redis/Valkey mix)	Platform	Version matrix PR
2	Audit Lua scripting usage — trace `EVAL`/`EVALSHA` calls	Service owners	Identify scripting-free clusters
3	Upgrade dev cluster to 9.1 (keep Lua loaded, default config)	Platform	`INFO server` = 9.1.0
4	Client compatibility regression in dev — verify new commands/response changes	Service owners	Client SDK compatibility report
5	Design numbered DBs + per-db ACLs on multi-tenant candidate clusters	Platform	Tenant↔db↔ACL mapping doc
6	Drop Lua module on scripting-free clusters	Platform + Security	`INFO scripting_engines` = empty
7	Enable JSON logging (`log-format json`) → wire to Loki	Observability	Structured log collection + dashboard
8	Expose main/IO thread usage metrics to Prometheus + alarms	Observability	IO-thread-saturation alarm fire/resolve test
9	Validate TLS auto-reload + cert-expiry metric, pilot SAN-URI mTLS	Security	Zero-downtime cert rotation verified
10	Staging 9.1 upgrade + load test (`valkey-benchmark --warmup --duration`)	Platform	Zero throughput/latency regression report
11	Standardize `--cluster-use-atomic-slot-migration` on resharding	Platform	Resharding runbook updated
12	Gradual prod upgrade (replica → primary, slot-level verification)	Platform	All prod nodes 9.1.0 + zero-downtime availability
13	Measure real memory savings post-upgrade (small-string / sorted-set heavy clusters)	Platform	Before/after `used_memory` comparison
14	Validate rollback — 8.x downgrade path when new 9.1 commands are unused	Platform	Rollback rehearsal passed
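
Item 13's before/after comparison can be scripted against two saved `INFO memory` snapshots. The sketch below parses the standard `field:value` INFO layout and reports the percent change in `used_memory`. The snapshot values are hypothetical.

```python
def parse_info(info_text):
    """Parse INFO output ('field:value' lines) into a dict, skipping headers."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            name, value = line.split(":", 1)
            fields[name] = value
    return fields


def memory_change_percent(info_before, info_after):
    """Percent change in used_memory between two snapshots (negative = saved)."""
    before = int(parse_info(info_before)["used_memory"])
    after = int(parse_info(info_after)["used_memory"])
    return (after - before) * 100 / before


# Hypothetical snapshots taken before and after the upgrade.
before = "# Memory\nused_memory:8000000000\nused_memory_human:7.45G\n"
after = "# Memory\nused_memory:6800000000\nused_memory_human:6.33G\n"
change = memory_change_percent(before, after)
print(f"used_memory change: {change:+.1f}%")
```

For a fair comparison the two snapshots should be taken at a similar key count and point in the traffic cycle, since `used_memory` moves with the dataset as well as with the engine version.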

10. Conclusion — A Minor Release That Made Security, Observability, and Efficiency the Operational Default

Sum up 9.1 in one line: "on top of the performance and feature leap 9.0 laid down, it adds the operational finish: security, observability, and efficiency." Numbered DB-level ACLs open a path to multi-tenant isolation at db granularity without adding instances; Lua-as-a-module applies the hardening principle "turn off what you don't use" at the core. The new I/O threading model hit 2.1M RPS on a single server, and up to 20% memory savings on small strings cut cache cost directly. JSON logging and thread usage metrics absorb the observability gap you previously filled with sidecars and exporters, while HGETDEL/MSETEX/CLUSTERSCAN collapse common patterns' round trips and transactions into single commands.

Three things to remember operationally. (1) Audit Lua usage first — dropping the module shrinks the attack surface, but a careless removal breaks features that relied on EVAL. (2) Remember numbered DBs are logical isolation — per-db ACLs are powerful but not physical isolation, so keep instance separation for regulated data. (3) Don't skip staging validation, even for a minor release — global-behavior changes like the hardware clock default and new I/O threading are included, so pre-validate on special virtualization environments. The shortest one-line recommendation in this article: "Upgrade dev to 9.1 this week, and start the security PR to drop the Lua module on your scripting-free clusters first."

ℹ️ This article was written by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent), analyzing the official Valkey 9.1.0 release blog (valkey.io, May 19, 2026), the Valkey 9.0 GA blog (Oct 21, 2025), the Linux Foundation 9.1 release announcement (PRNewswire), Phoronix's 9.1 review, and the valkey-io/valkey GitHub release notes as primary sources. Command syntax, performance figures, and flag names reflect official docs as of publication (2026-05-29) and may change in future patches. Verify current status at valkey.io/commands and GitHub Releases before applying to production. The internal adoption example is adapted from ManoIT platform team's operational procedures.
