Michael Smith

Posted on May 4

Redis Array: The Long Road to a Powerful Data Structure

#discuss #news #tech #ai



TL;DR

Redis didn't arrive at its array-handling capabilities overnight. From simple string-based workarounds in the early days to the rich, production-ready data structures available today, the journey of managing array-like data in Redis is a story of pragmatic engineering, community-driven iteration, and hard-won lessons. This article walks you through that evolution, explains the current best practices, and gives you actionable guidance on choosing the right Redis data structure for your use case.

Key Takeaways

Redis has never had a native "array" type — developers have historically used Lists, Sets, Sorted Sets, and Hashes to approximate array behavior
The introduction of RedisJSON (now part of Redis Stack) was the closest thing to true array support Redis has ever offered
Performance tradeoffs between List, Hash, and JSON approaches are significant — choosing wrong can cost you at scale
Redis 7.x and the Redis Stack modules (as of 2026) represent the most mature, production-ready state of array-like data handling in Redis history
Serialization strategies remain one of the most underappreciated sources of latency in Redis-heavy applications

Introduction: A Data Structure That Wasn't There (Until It Kind Of Was)

If you've worked with Redis for any meaningful length of time, you've probably asked yourself: "Why doesn't Redis just have a proper array type?"

It's a fair question. Arrays are arguably the most fundamental compound data structure in programming. Yet Redis, one of the world's most widely deployed in-memory data stores, took a famously winding path to provide anything resembling native array support. It is the story of how a tool built for speed and simplicity had to grow up without losing either quality.

Understanding that journey isn't just historical trivia. It directly informs how you should structure your data today.


The Early Days: Strings, Serialization, and Suffering

The Original Workaround

When Redis launched in 2009, Salvatore Sanfilippo (antirez) was solving a very specific problem: making a fast, persistent key-value store that could handle real-time data. The initial data model was intentionally minimal.

Early adopters who needed to store array-like data had two choices:

Serialize the whole array as a string — JSON-encode your array, store it as a single Redis string, retrieve the whole thing, deserialize it in your application, modify it, then write it back
Use Redis Lists — a linked-list implementation that offered O(1) push/pop at both ends but O(n) random access

Neither was ideal. The serialization approach had an obvious problem: you couldn't atomically update a single element. Every update required a full read-modify-write cycle, creating race conditions in concurrent environments and introducing unnecessary network overhead.
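The race is easy to reproduce without a server. The sketch below is only an illustration: a plain Python dict stands in for Redis, and `store`, `read_scores`, and `write_scores` are hypothetical names, not part of any client library. Two clients both follow the read-modify-write cycle, and both read before either writes:

```python
import json

# A plain dict stands in for Redis here; this is an illustration, not a client.
store = {"player:12345:scores": json.dumps([10, 20])}


def read_scores(key):
    """GET + deserialize: the 'read' half of the cycle."""
    return json.loads(store[key])


def write_scores(key, scores):
    """Serialize + SET: the 'write' half of the cycle."""
    store[key] = json.dumps(scores)


key = "player:12345:scores"

# Both clients read before either one writes.
client_a_view = read_scores(key)
client_b_view = read_scores(key)

client_a_view.append(30)
write_scores(key, client_a_view)

client_b_view.append(40)
write_scores(key, client_b_view)  # overwrites client A's update

final_scores = read_scores(key)
print(final_scores)  # [10, 20, 40]: the 30 appended by client A is lost
```

Neither client did anything wrong in isolation; the update is lost because the whole array is the unit of writing.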

The List approach was better for queue-like patterns but awkward for random-access array semantics. If you needed the 47th element of a 10,000-item list, Redis had to traverse the list from one end — not exactly the O(1) behavior you'd expect from an array index.

Why This Mattered in Production

Consider a real-world example: a leaderboard system storing player scores. In 2011, a typical implementation might store each player's score history as a serialized JSON string. A simple "append new score" operation meant:

GET player:12345:scores (fetch ~2KB of data)
Deserialize in application memory
Append new score
Re-serialize
SET player:12345:scores (write ~2KB back)

At 10,000 requests per second, this pattern generates enormous unnecessary bandwidth and CPU overhead — both on the Redis server and the application tier.
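To put a rough number on that, here is a back-of-the-envelope calculation using the figures from the example. It assumes 2 KB means 2,000 bytes and ignores protocol framing; `round_trip_bytes_per_second` is a hypothetical helper name:

```python
def round_trip_bytes_per_second(payload_bytes: int, requests_per_second: int) -> int:
    """Bytes moved per second when every request reads and rewrites the full payload."""
    # Each append does one GET and one SET of the whole serialized array.
    return 2 * payload_bytes * requests_per_second


# Figures from the example above: ~2 KB payload, 10,000 requests per second.
total = round_trip_bytes_per_second(payload_bytes=2_000, requests_per_second=10_000)
print(f"{total / 1_000_000:.0f} MB/s")  # 40 MB/s
```

Almost all of that traffic is the existing history moving back and forth; the new score itself is only a few bytes.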


The Middle Period: Hashes, Sorted Sets, and Clever Workarounds

Hashes as Pseudo-Arrays

Redis Hashes (available since Redis 2.0) gave developers a more flexible tool. A Hash maps string field names to string values within a single key. Developers quickly realized you could fake array indexing by using numeric field names:

HSET myarray 0 "value_a"
HSET myarray 1 "value_b"
HSET myarray 2 "value_c"
HGETALL myarray

This was genuinely useful. You could now update a single "element" with O(1) complexity:

HSET myarray 1 "updated_value_b"

No full read-modify-write cycle required. The tradeoff? You lost ordering guarantees. HGETALL doesn't reliably return fields in insertion order, and there is no built-in notion of array length: HLEN counts fields, which matches the length only when the indices are contiguous from 0.
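On the client side, the ordering can be restored by sorting on the numeric field name. The helpers below are a minimal sketch, assuming a redis-py style client whose `hset` and `hgetall` methods behave like the HSET and HGETALL commands; `hash_array_set` and `hash_array_to_list` are hypothetical names chosen for illustration:

```python
def hash_array_set(client, key, index, value):
    """Write one element in place; maps to HSET key <index> <value>."""
    client.hset(key, str(index), value)


def hash_array_to_list(client, key):
    """Rebuild the ordered list from HGETALL by sorting on the numeric field name."""
    fields = client.hgetall(key)  # dict of field -> value, order not guaranteed
    # Sort numerically so that field "10" comes after "2", not before it.
    return [fields[name] for name in sorted(fields, key=int)]
```

The sort has to be numeric: a plain string sort would place field "10" before field "2". Gaps in the indices are silently skipped, so this suits dense arrays better than sparse ones.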

Sorted Sets: The Unsung Hero

For ordered array-like data, Sorted Sets (ZSets) emerged as an unexpectedly powerful tool. By using the array index as the score:

ZADD myarray 0 "value_a"
ZADD myarray 1 "value_b"
ZADD myarray 2 "value_c"
ZRANGE myarray 0 -1

You got ordered retrieval, O(log n) insertion, and range queries essentially for free. The catch: member values must be unique. You can't have two identical values at different positions, which limits the pattern for general-purpose array storage.
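The uniqueness catch is easy to miss until it silently moves an element. The following sketch models only the relevant ZADD and ZRANGE semantics with a plain dict; it is not a Redis client, and `zadd` and `zrange_all` are illustrative stand-ins:

```python
def zadd(zset: dict, score: float, member: str) -> None:
    """Model of ZADD: a member appears once, so re-adding it only updates its score."""
    zset[member] = score


def zrange_all(zset: dict) -> list:
    """Model of ZRANGE key 0 -1: members ordered by score, then lexicographically."""
    return [m for m, _ in sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))]


myarray = {}
zadd(myarray, 0, "value_a")
zadd(myarray, 1, "value_b")
zadd(myarray, 2, "value_a")  # intended: a second "value_a" at index 2

result = zrange_all(myarray)
print(result)  # ['value_b', 'value_a']: the element at index 0 is gone
```

Instead of a three-element array, the set ends up with two members, and "value_a" has moved from position 0 to the end.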

Data Structure	Random Access	Ordered	Duplicates	Atomic Updates	Ideal Use Case
String (serialized)	❌ Full read	✅	✅	❌	Small, infrequently updated arrays
List	O(n)	✅	✅	Push/Pop only	Queues, stacks
Hash (numeric keys)	O(1)	❌	✅	✅	Sparse arrays, record fields
Sorted Set	O(log n)	✅	❌	O(log n)	Ranked/ordered unique data
RedisJSON Array	O(n) path	✅	✅	✅ (path-based)	True nested arrays

The Turning Point: RedisJSON and the Arrival of Real Array Support

What RedisJSON Changed

The release of RedisJSON (originally a Redis Labs module, later integrated into Redis Stack) was the closest thing to a genuine paradigm shift in how Redis handles array-like data. For the first time, you could store actual JSON documents — including nested arrays — and manipulate individual elements using JSONPath syntax.

JSON.SET user:1 $ '{"name": "Alice", "scores": [95, 87, 91]}'
JSON.ARRAPPEND user:1 $.scores 88
JSON.GET user:1 $.scores

This was transformative. You could now:

Append to a nested array without fetching the entire document
Get the length of an array with JSON.ARRLEN
Pop elements with JSON.ARRPOP
Insert at specific indices with JSON.ARRINSERT
Search within arrays using JSONPath filter expressions

The Performance Reality Check

RedisJSON arrays aren't magic. The underlying implementation stores JSON documents as a tree structure in memory, and operations that modify array elements still require internal tree traversal. For very large arrays (tens of thousands of elements), operations can become noticeably slower than equivalent Hash or Sorted Set operations.

A 2024 benchmark study by the Redis community found that for arrays under ~1,000 elements, RedisJSON's path-based operations were competitive with Hash-based approaches. Beyond that threshold, the overhead of JSONPath evaluation became measurable.

Practical recommendation: if an array will exceed roughly 5,000 elements and needs frequent random access, benchmark a Hash with numeric field names against the RedisJSON array before committing. The Hash may well serve you better.


Redis in 2026: Where Things Stand Today

Redis Stack and the Unified Module Ecosystem

As of 2026, Redis Stack bundles RedisJSON, RediSearch, RedisTimeSeries, and RedisBloom into a single, cohesive offering. The array story has matured considerably:

JSONPath support is now fully compliant with the RFC 9535 specification
Index integration means you can create secondary indexes on array elements via RediSearch, enabling queries like "find all users whose scores array contains a value over 90"
RESP3 protocol improvements have reduced serialization overhead for complex data types

Managed Redis Services: The Practical Choice

For most teams in 2026, running Redis yourself is increasingly rare. The managed services have matured significantly:

Redis Cloud — The official managed offering from Redis Ltd. Excellent integration with Redis Stack modules, including full RedisJSON support. Best choice if you're heavily invested in the module ecosystem. Pricing can be steep at scale.
Upstash — Serverless Redis with per-request pricing. Excellent for variable workloads and edge deployments. RedisJSON support is available but check current module compatibility for your specific use case.
AWS ElastiCache for Redis — Solid operational reliability, but module support (including RedisJSON) has historically lagged behind Redis Cloud. Verify current module availability before committing.

Practical Guide: Choosing Your Redis Array Strategy

Decision Framework

Use this framework when deciding how to store array-like data in Redis:

Use a Redis List when:

Your primary operations are push/pop from either end
You need queue or stack semantics
Array length is bounded and manageable
You don't need random access by index

Use a Redis Hash (numeric keys) when:

You need O(1) random access to individual elements
Array elements are independent (updating one doesn't affect others)
You're comfortable managing your own "length" counter
Array size could exceed a few thousand elements

Use a Sorted Set when:

Elements have a natural numeric ranking
All values are unique
You need range queries (e.g., "elements at positions 10-20")

Use RedisJSON Arrays when:

Your data is naturally nested or document-shaped
You need to store arrays alongside other structured data
Array size stays under ~5,000 elements for frequent-access patterns
You want to leverage RediSearch indexing on array contents
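The checklists above can be condensed into a small helper. This is a sketch, not an authoritative rule: the function name, the parameter names, and especially the order in which the checks run are assumptions made for illustration, and the 5,000-element cutoff is simply the rough figure used in this article.

```python
def suggest_structure(
    *,
    push_pop_only: bool = False,        # queue/stack semantics at the ends
    needs_random_access: bool = False,  # read or update by index
    values_unique: bool = False,        # no duplicate elements
    needs_range_queries: bool = False,  # e.g. "positions 10-20"
    document_shaped: bool = False,      # array lives inside nested data
    expected_size: int = 0,             # approximate element count
) -> str:
    """Return a starting point, checking the criteria in a fixed (assumed) order."""
    if push_pop_only and not needs_random_access:
        return "List"
    if document_shaped and expected_size < 5_000:
        return "RedisJSON array"
    if values_unique and needs_range_queries:
        return "Sorted Set"
    if needs_random_access:
        return "Hash (numeric keys)"
    return "benchmark List, Hash, and RedisJSON against your workload"


print(suggest_structure(needs_random_access=True, expected_size=50_000))
# Hash (numeric keys)
```

Treat the result as a first candidate to benchmark rather than a final answer.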

Code Example: The Right Pattern for Score Histories

Here's a pattern for storing a user's score history, using RedisJSON through the redis-py client:

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Initialize user with an empty scores array.
# nx=True creates the document only if the key does not exist yet,
# so re-running this step cannot wipe an existing history.
r.json().set('user:1001', '$', {
    'username': 'alice',
    'scores': [],
    'metadata': {'created': '2026-01-15'}
}, nx=True)

# Append new scores atomically - no read-modify-write needed
r.json().arrappend('user:1001', '$.scores', 94)
r.json().arrappend('user:1001', '$.scores', 87)

# Get array length. With a '$' path the reply is a list,
# one entry per matching path.
length = r.json().arrlen('user:1001', '$.scores')

# Get last 5 scores using array slicing (Redis Stack 2.4+)
recent_scores = r.json().get('user:1001', '$.scores[-5:]')


Common Mistakes and How to Avoid Them

Mistake 1: Storing Large Arrays as Serialized Strings in 2026

This pattern should be extinct by now, but it persists in legacy codebases. If you're still doing SET mykey (json.dumps(my_array)), you're leaving performance on the table and creating concurrency hazards. Migrate to RedisJSON or a Hash-based approach.

Mistake 2: Using Lists for Random Access

Lists are O(n) for index-based access. If you find yourself doing LINDEX mylist 847, your data model needs rethinking.

Mistake 3: Ignoring Memory Implications

Redis stores everything in RAM. A 10,000-element RedisJSON array with complex nested objects can easily consume several megabytes per key. Use MEMORY USAGE keyname regularly to audit your largest keys.
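A periodic audit can be scripted. The helper below is a minimal sketch, assuming a redis-py style client that exposes `scan_iter` and `memory_usage`; `largest_keys` and its parameters are hypothetical names:

```python
def largest_keys(client, pattern="*", top_n=10):
    """Return the top_n (key, bytes) pairs matching pattern, largest first."""
    sizes = []
    # SCAN iterates incrementally, so it avoids blocking the server the way KEYS can.
    for key in client.scan_iter(match=pattern, count=1000):
        size = client.memory_usage(key)  # MEMORY USAGE key; None if the key vanished
        if size is not None:
            sizes.append((key, size))
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top_n]
```

On a large keyspace this still issues one MEMORY USAGE call per key, so it is better run against a replica or during quiet hours.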

Mistake 4: Not Setting Expiry on Temporary Arrays

If you're using Redis arrays for temporary computation or session data, always set a TTL. Memory leaks from forgotten keys are a silent killer in production Redis deployments.
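One way to make the TTL hard to forget is to set it in the same round trip as the write. The sketch below assumes a redis-py style client whose `pipeline()` wraps the queued commands in MULTI/EXEC by default; `append_with_ttl` and its one-hour default are illustrative choices, not a recommendation from the original text:

```python
def append_with_ttl(client, key, value, ttl_seconds=3600):
    """RPUSH a value and refresh the key's TTL in one MULTI/EXEC round trip."""
    pipe = client.pipeline()  # transactional by default in redis-py
    pipe.rpush(key, value)
    pipe.expire(key, ttl_seconds)
    new_length, _ttl_was_set = pipe.execute()
    return new_length
```

Because every append refreshes the TTL, the key expires the configured number of seconds after the last write rather than after the first.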

The Honest Assessment: Redis Arrays in 2026

Redis has come a remarkably long way from its "serialize everything to a string" origins. The story is ultimately one of pragmatic evolution. Each intermediate solution (Lists, Hashes, Sorted Sets) solved real problems while creating new constraints. RedisJSON finally provided something close to first-class array support, but it came with its own performance envelope that developers need to understand.

The good news: in 2026, you have genuinely excellent options. The bad news: there's still no single "Redis array" that works optimally for every use case. Understanding the tradeoffs remains essential.

Conclusion

The evolution of array handling in Redis mirrors the broader maturation of the entire ecosystem — from a scrappy, opinionated key-value store to a multi-model data platform. Whether you're maintaining a legacy application that still serializes arrays to strings or building a new service on Redis Stack, understanding this history helps you make better architectural decisions today.

Ready to modernize your Redis data model? Start by auditing your current usage with redis-cli --bigkeys to identify oversized serialized arrays, then evaluate whether RedisJSON or a Hash-based approach better fits your access patterns. The migration path is more straightforward than you might think — and the performance gains are real.


Frequently Asked Questions

Q1: Does Redis have a native array data type?

No, Redis does not have a built-in "array" primitive in the way that programming languages do. However, RedisJSON (part of Redis Stack) supports JSON arrays as a first-class document type, and Redis Lists, Hashes, and Sorted Sets can all be used to approximate array behavior depending on your access patterns.

Q2: What's the difference between a Redis List and a Redis array?

A Redis List is a doubly-linked list that supports O(1) push/pop operations at both ends but O(n) random access. A true "array" would offer O(1) random access by index. For O(1) random access in Redis, a Hash with numeric string keys is the closest native approximation, while RedisJSON arrays offer path-based access with JSONPath.

Q3: Is RedisJSON production-ready for large-scale applications?

Yes, as of 2026, RedisJSON is mature and production-ready. It's used by major enterprises in high-traffic environments. The main caveat is performance at very large array sizes (5,000+ elements with frequent random access), where Hash-based approaches may outperform it. Always benchmark with your specific data shape and access patterns.

Q4: How does Redis handle concurrent writes to an array?

Redis is single-threaded for command execution, so individual commands are inherently atomic. For multi-step operations (like read-modify-write sequences), use Redis Transactions (MULTI/EXEC) or Lua scripts to ensure atomicity. RedisJSON's path-based commands (like JSON.ARRAPPEND) are atomic by default, which is one of their key advantages over serialized string approaches.
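For a legacy key that still holds a serialized JSON array, the Lua route might look like the sketch below. It assumes a redis-py style client with an `eval(script, numkeys, *keys_and_args)` method and relies on the `cjson` library available inside Redis scripts; `APPEND_SCORE_LUA` and `append_score_atomically` are hypothetical names:

```python
APPEND_SCORE_LUA = """
local raw = redis.call('GET', KEYS[1])
local scores = raw and cjson.decode(raw) or {}
scores[#scores + 1] = tonumber(ARGV[1])
redis.call('SET', KEYS[1], cjson.encode(scores))
return #scores
"""


def append_score_atomically(client, key, score):
    """Run the whole read-modify-write cycle server-side as one atomic script."""
    # EVAL <script> <numkeys> <key> <arg>: one key, one argument.
    return client.eval(APPEND_SCORE_LUA, 1, key, score)
```

The script still rewrites the whole value on each call, so it removes the race but not the cost of re-serializing the array.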

Q5: Should I use Redis Cloud or self-hosted Redis for RedisJSON in production?

For most teams, a managed service like Redis Cloud is the pragmatic choice — you get automatic failover, module updates, and operational support without the overhead of managing Redis yourself. Self-hosted Redis makes sense if you have strict data residency requirements, very high scale where managed pricing becomes prohibitive, or deep internal Redis expertise. For teams under ~50GB of data with standard availability requirements, managed services almost always win on total cost of ownership.
