<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: JosephAkayesi</title>
    <description>The latest articles on DEV Community by JosephAkayesi (@josephakayesi).</description>
    <link>https://dev.to/josephakayesi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F135512%2F9f1a5b8a-5601-4012-8690-bb7435e51a43.jpeg</url>
      <title>DEV Community: JosephAkayesi</title>
      <link>https://dev.to/josephakayesi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/josephakayesi"/>
    <language>en</language>
    <item>
      <title>Design Consistent Hashing</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Wed, 18 Mar 2026 22:09:44 +0000</pubDate>
      <link>https://dev.to/josephakayesi/design-consistent-hashing-3ap4</link>
      <guid>https://dev.to/josephakayesi/design-consistent-hashing-3ap4</guid>
      <description>&lt;h1&gt;
  
  
  Consistent Hashing — System Design Deep Dive
&lt;/h1&gt;

&lt;p&gt;A &lt;strong&gt;consistent hashing&lt;/strong&gt; algorithm is a technique that allows distributed systems to evenly distribute requests and data across a cluster of servers.&lt;/p&gt;

&lt;p&gt;At scale, consistent hashing becomes a fundamental building block of reliable, horizontally scalable systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Do We Need Consistent Hashing?
&lt;/h2&gt;

&lt;p&gt;Consistent hashing provides several important benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Minimizes data redistribution when servers join or leave&lt;/li&gt;
&lt;li&gt;✅ Promotes even distribution of load across servers&lt;/li&gt;
&lt;li&gt;✅ Protects systems from cascading failures during topology changes&lt;/li&gt;
&lt;li&gt;✅ Reduces unnecessary cache misses at scale&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real-World Examples
&lt;/h2&gt;

&lt;p&gt;Consistent hashing appears everywhere in modern distributed systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt; uses it to partition and replicate data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Cassandra&lt;/strong&gt; uses it to distribute data across nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Akamai CDN&lt;/strong&gt; uses it to route requests to edge servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In reality, the exact implementation depends entirely on your system's access patterns and infrastructure requirements.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rehashing Problem
&lt;/h2&gt;

&lt;p&gt;Before understanding consistent hashing, we need to understand the problem it solves.&lt;/p&gt;

&lt;p&gt;Suppose there are &lt;code&gt;n&lt;/code&gt; servers in your cluster. The server index for any key is computed as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;server_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;

&lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Hash&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="nb"&gt;all&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;
&lt;span class="n"&gt;n&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Size&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpu8rxzjcv5hen4tvakz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpu8rxzjcv5hen4tvakz1.png" alt=" " width="683" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To find which server stores a key in a pool of four servers, we compute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hash(key) % 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is the server on which that key is stored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l9nz387weg9nnevy6nd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l9nz387weg9nnevy6nd.png" alt=" " width="681" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach works fine when the server pool is fixed and data is distributed evenly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem: Server Changes
&lt;/h3&gt;

&lt;p&gt;When a server is added or removed, nearly all keys must be remapped.&lt;/p&gt;

&lt;p&gt;Below is how server indexes are affected when we remove a server from the pool:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fjkdq8d0582pxs77ea6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fjkdq8d0582pxs77ea6.png" alt=" " width="681" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The new distribution of keys after the server is removed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonk0u8jdycsa9ny1iwy8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonk0u8jdycsa9ny1iwy8.png" alt=" " width="676" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most keys are redistributed — not only those that were previously stored on the server that went offline.&lt;/p&gt;
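&lt;p&gt;The remapping cost is easy to demonstrate. Below is a minimal sketch that counts how many of 1,000 hypothetical keys change servers when the pool shrinks from four servers to three under modulo hashing (the key names and the choice of MD5 are illustrative):&lt;/p&gt;

```python
# Sketch: measure how many keys move when the pool shrinks from 4 to 3
# servers under modulo hashing. MD5 is used here only as a stable,
# reproducible hash; a real system might use MurmurHash or similar.
import hashlib

def server_idx(key, n):
    # Hash the key, then take the index modulo the pool size.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

keys = [f"key{i}" for i in range(1000)]
before = {k: server_idx(k, 4) for k in keys}
after = {k: server_idx(k, 3) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys remapped")  # typically ~75%
```

&lt;p&gt;Roughly three quarters of all keys land on a different server, even though only one server left the pool.&lt;/p&gt;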




&lt;h2&gt;
  
  
  Core Consistent Hashing Concepts
&lt;/h2&gt;

&lt;p&gt;Most consistent hashing implementations are built on a few foundational ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash Space and Hash Ring&lt;/li&gt;
&lt;li&gt;Hash Servers&lt;/li&gt;
&lt;li&gt;Hash Keys&lt;/li&gt;
&lt;li&gt;Server Lookup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's walk through each one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hash Space and Hash Ring
&lt;/h2&gt;

&lt;p&gt;Imagine a hash function whose output ranges over the values &lt;code&gt;x0, x1, ..., xn&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, if our hash function is &lt;code&gt;hash(key) % 100&lt;/code&gt;, the hash space runs from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;99&lt;/code&gt;. In real systems, the space is far larger — SHA-1 runs from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;2^160 - 1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We then connect the two ends of the hash space to form a &lt;strong&gt;hash ring&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxet91lwzthki73cbz9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxet91lwzthki73cbz9y.png" alt=" " width="684" height="451"&gt;&lt;/a&gt;&lt;/p&gt;
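&lt;p&gt;The size of a real hash space is easy to see in code. A minimal sketch, assuming SHA-1 and an illustrative key name:&lt;/p&gt;

```python
# Sketch: SHA-1 places any key somewhere in the space 0 to 2**160 - 1,
# which is the hash space we then bend into a ring.
import hashlib

def sha1_position(key):
    # Interpret the 160-bit digest as an integer position.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

pos = sha1_position("server0")
print(pos % 2**160 == pos)  # True: the position fits in the 160-bit space
```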




&lt;h2&gt;
  
  
  Hash Servers
&lt;/h2&gt;

&lt;p&gt;We use our hashing function to map servers onto the ring using either the server name or IP address.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qpg2g3qge47ynwgeoiz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qpg2g3qge47ynwgeoiz.png" alt=" " width="693" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Some real-world systems use a different hash function for servers than for keys, though many simply reuse the same function for both.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Hash Keys
&lt;/h2&gt;

&lt;p&gt;Keys are hashed and placed onto the same ring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoks0nw7jnxopdusmcpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoks0nw7jnxopdusmcpn.png" alt=" " width="697" height="404"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Server Lookup
&lt;/h2&gt;

&lt;p&gt;To determine which server a key belongs to, we move &lt;strong&gt;clockwise&lt;/strong&gt; from the key's position on the ring until we hit the nearest server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vhcnhqbs1rf75kc9wac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vhcnhqbs1rf75kc9wac.png" alt=" " width="639" height="337"&gt;&lt;/a&gt;&lt;/p&gt;
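&lt;p&gt;The clockwise lookup above can be sketched with a sorted list and binary search (server names, the key, and the choice of MD5 are illustrative):&lt;/p&gt;

```python
# Sketch of clockwise lookup on a hash ring using binary search.
import bisect
import hashlib

def ring_hash(name):
    # Position on the ring; real systems often use SHA-1 or MurmurHash.
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

servers = ["server0", "server1", "server2", "server3"]
ring = sorted((ring_hash(s), s) for s in servers)
positions = [pos for pos, _ in ring]

def lookup(key):
    # First server at or clockwise of the key's position;
    # wrap around to the start of the ring if we fall off the end.
    idx = bisect.bisect_left(positions, ring_hash(key))
    return ring[idx % len(ring)][1]

print(lookup("user:42"))  # one of the four servers
```

&lt;p&gt;Keeping the positions sorted makes each lookup an O(log n) binary search rather than a full scan of the ring.&lt;/p&gt;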




&lt;h2&gt;
  
  
  Adding and Removing Servers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Add a Server
&lt;/h3&gt;

&lt;p&gt;When a new server is added, only a subset of keys needs to be redistributed: those whose clockwise walk now reaches the new server first.&lt;/p&gt;

&lt;p&gt;In the example below, only &lt;code&gt;key0&lt;/code&gt; was redistributed. All other keys remain intact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2m1p60zrtnhlmzecqi9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2m1p60zrtnhlmzecqi9.png" alt=" " width="709" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Remove a Server
&lt;/h3&gt;

&lt;p&gt;When a server is removed, only the keys that were mapped to it are redistributed to the next server clockwise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrmg6oqwtkesgsiur9lm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrmg6oqwtkesgsiur9lm.png" alt=" " width="743" height="515"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Issues in the Basic Approach
&lt;/h2&gt;

&lt;p&gt;The basic consistent hashing approach introduces two problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Uneven partition sizes&lt;/strong&gt; — When servers are added or removed, partitions can become imbalanced, leading to uneven load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eyvggl4y1y5j5rzrn99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eyvggl4y1y5j5rzrn99.png" alt=" " width="714" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Non-uniform key distribution&lt;/strong&gt; — Keys may cluster on certain servers, leaving others underutilized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbes7xfs02ran0fmps3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbes7xfs02ran0fmps3e.png" alt=" " width="709" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A technique called &lt;strong&gt;virtual nodes&lt;/strong&gt; is used to solve both of these problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Virtual Nodes
&lt;/h2&gt;

&lt;p&gt;Instead of mapping each server onto the ring once, we map it onto the ring at many positions; each of these positions is a &lt;strong&gt;virtual node&lt;/strong&gt;. Each server is then responsible for multiple partitions of the ring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyxi9zb10eyk6m25as6l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyxi9zb10eyk6m25as6l.png" alt=" " width="723" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To find a key's server, we locate the nearest virtual node clockwise from the key's position.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qssbwdgmebu6vtu69px.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qssbwdgmebu6vtu69px.png" alt=" " width="717" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the number of virtual nodes increases, key distribution becomes more balanced.&lt;/p&gt;
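&lt;p&gt;A minimal sketch of virtual nodes, extending the lookup idea above; the replica count, server names, and hash choice are illustrative:&lt;/p&gt;

```python
# Sketch: virtual nodes smooth out load. Each physical server is hashed
# onto the ring many times ("server0#0", "server0#1", ...).
import bisect
import hashlib
from collections import Counter

def ring_hash(name):
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

VNODES = 100  # more virtual nodes yields a more even spread
servers = ["server0", "server1", "server2"]
ring = sorted(
    (ring_hash(f"{s}#{v}"), s) for s in servers for v in range(VNODES)
)
positions = [pos for pos, _ in ring]

def lookup(key):
    # Nearest virtual node clockwise of the key; map it back to its server.
    idx = bisect.bisect_left(positions, ring_hash(key))
    return ring[idx % len(ring)][1]

load = Counter(lookup(f"key{i}") for i in range(30000))
print(load)  # counts should be roughly equal across the three servers
```

&lt;p&gt;Raising &lt;code&gt;VNODES&lt;/code&gt; trades memory and lookup-table size for a tighter balance across servers.&lt;/p&gt;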




&lt;h2&gt;
  
  
  Finding Affected Keys
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When a server is added:
&lt;/h3&gt;

&lt;p&gt;Move &lt;strong&gt;anticlockwise&lt;/strong&gt; from the new server's position to the nearest existing server. All keys in that range are redistributed onto the new server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk8xskqalq6hc8tr8jel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk8xskqalq6hc8tr8jel.png" alt=" " width="722" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When a server is removed:
&lt;/h3&gt;

&lt;p&gt;The affected range starts at the removed server's position and moves &lt;strong&gt;anticlockwise&lt;/strong&gt; until the next server is found. Keys in that range are redistributed to the next server clockwise of the removed one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9ypono8lgjkr6eds0lv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9ypono8lgjkr6eds0lv.png" alt=" " width="727" height="463"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Algorithm Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Redistribution on Change&lt;/th&gt;
&lt;th&gt;Handles Hotspots&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple Modulo Hashing&lt;/td&gt;
&lt;td&gt;All keys remapped&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic Consistent Hashing&lt;/td&gt;
&lt;td&gt;Subset of keys&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consistent Hashing + Virtual Nodes&lt;/td&gt;
&lt;td&gt;Subset of keys&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Request Flow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Client sends a request with a key&lt;/li&gt;
&lt;li&gt;Key is hashed onto the ring&lt;/li&gt;
&lt;li&gt;System moves clockwise to find the nearest server (or virtual node)&lt;/li&gt;
&lt;li&gt;Request is forwarded to that server&lt;/li&gt;
&lt;li&gt;On topology change, only affected keys are redistributed&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Consistent Hashing in Distributed Systems
&lt;/h2&gt;

&lt;p&gt;Scaling introduces new challenges even with consistent hashing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hotspot Problem
&lt;/h3&gt;

&lt;p&gt;Even with virtual nodes, certain keys (e.g. celebrity data in social networks) can generate disproportionate traffic to one server.&lt;/p&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Further subdivide hotspot partitions&lt;/li&gt;
&lt;li&gt;Add dedicated servers for high-traffic keys&lt;/li&gt;
&lt;li&gt;Apply application-level caching in front of the ring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Replication
&lt;/h3&gt;

&lt;p&gt;For fault tolerance, keys are typically replicated to the next &lt;code&gt;N&lt;/code&gt; servers clockwise on the ring. This ensures data survives individual node failures without a full redistribution.&lt;/p&gt;
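&lt;p&gt;Replica placement can be sketched as a clockwise walk that collects distinct servers (the names and the replication factor are illustrative):&lt;/p&gt;

```python
# Sketch: replicate each key to the next N distinct servers clockwise
# on the ring, starting from the key's primary server.
import bisect
import hashlib

def ring_hash(name):
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

servers = ["server0", "server1", "server2", "server3", "server4"]
ring = sorted((ring_hash(s), s) for s in servers)
positions = [pos for pos, _ in ring]

def replicas(key, n=3):
    # Walk clockwise from the key, collecting n distinct servers. The
    # distinctness check matters once virtual nodes are in play, since
    # adjacent ring positions can belong to the same physical server.
    idx = bisect.bisect_left(positions, ring_hash(key))
    chosen = []
    while len(chosen) != n:
        server = ring[idx % len(ring)][1]
        if server not in chosen:
            chosen.append(server)
        idx += 1
    return chosen

print(replicas("user:42"))  # three distinct servers, primary first
```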




&lt;h2&gt;
  
  
  Monitoring and Observability
&lt;/h2&gt;

&lt;p&gt;After deployment, monitoring consistent hashing behavior is critical.&lt;/p&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load distribution across nodes&lt;/li&gt;
&lt;li&gt;Redistribution events on topology changes&lt;/li&gt;
&lt;li&gt;Virtual node count and balance metrics&lt;/li&gt;
&lt;li&gt;Replication lag across replicas&lt;/li&gt;
&lt;li&gt;Hotspot frequency and severity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consistent hashing is not a "set and forget" mechanism — it requires continuous tuning of virtual node counts and replication factors as your cluster evolves.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Consistent hashing is more than just a smarter way to distribute keys. It is a core scalability mechanism that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stabilizes systems during cluster topology changes,&lt;/li&gt;
&lt;li&gt;ensures fair distribution of load across nodes,&lt;/li&gt;
&lt;li&gt;and enables systems to scale horizontally with confidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right configuration of virtual nodes and replication strategy depends heavily on your traffic patterns, data size, and fault tolerance requirements.&lt;/p&gt;

&lt;p&gt;Design it carefully — because at scale, &lt;strong&gt;consistent hashing becomes part of your system's scalability strategy&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>distributedsystems</category>
      <category>backend</category>
      <category>programming</category>
    </item>
    <item>
      <title>Design a Rate Limiter</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Mon, 02 Mar 2026 20:21:29 +0000</pubDate>
      <link>https://dev.to/josephakayesi/design-a-rate-limiter-4bkf</link>
      <guid>https://dev.to/josephakayesi/design-a-rate-limiter-4bkf</guid>
      <description>&lt;h1&gt;
  
  
  Rate Limiting — System Design Deep Dive
&lt;/h1&gt;

&lt;p&gt;A &lt;strong&gt;rate limiter&lt;/strong&gt; is a piece of software that regulates how much traffic a client can send to a server within a given period of time.&lt;/p&gt;

&lt;p&gt;At scale, rate limiting becomes a fundamental building block of reliable and cost-efficient systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Do We Need Rate Limiting?
&lt;/h2&gt;

&lt;p&gt;Rate limiters provide several important benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Prevent denial-of-service (DoS) attacks
&lt;/li&gt;
&lt;li&gt;✅ Promote fair usage of shared resources
&lt;/li&gt;
&lt;li&gt;✅ Protect backend services from overload
&lt;/li&gt;
&lt;li&gt;✅ Reduce infrastructure and operational costs
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real-World Examples
&lt;/h2&gt;

&lt;p&gt;Rate limiting appears everywhere in modern applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users can share &lt;strong&gt;up to 150 posts per day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Users can post &lt;strong&gt;300 tweets within 2 hours&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Users can make &lt;strong&gt;two withdrawal transactions within 15 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In reality, rate limits depend entirely on your application's access patterns and business rules.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Can Rate Limiters Live?
&lt;/h2&gt;

&lt;p&gt;Rate limiters can be deployed in different parts of the system:&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Client-Side Rate Limiting
&lt;/h3&gt;

&lt;p&gt;Client-side rate limiting happens within the application itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces unnecessary requests early&lt;/li&gt;
&lt;li&gt;Improves perceived responsiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less secure&lt;/li&gt;
&lt;li&gt;Clients can tamper with requests and bypass restrictions&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2️⃣ Server-Side Rate Limiting
&lt;/h3&gt;

&lt;p&gt;Server-side rate limiting enforces rules centrally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong enforcement&lt;/li&gt;
&lt;li&gt;Cannot be bypassed easily&lt;/li&gt;
&lt;li&gt;Reliable tracking of usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2s87hjubkvkgxeyncl2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2s87hjubkvkgxeyncl2.png" alt="Server-side limiter 1" width="649" height="122"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  3️⃣ API Gateway / Middleware Layer
&lt;/h3&gt;

&lt;p&gt;A very common approach is placing the rate limiter at the &lt;strong&gt;API gateway&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This allows all incoming traffic to be evaluated before reaching backend services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2w8bqu1acxbwq4ojpoym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2w8bqu1acxbwq4ojpoym.png" alt="Server-side limiter 2" width="648" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49v1qykgi65bxcd9k03a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49v1qykgi65bxcd9k03a.png" alt="Gateway limiter" width="648" height="186"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Rate Limiting Algorithms
&lt;/h2&gt;

&lt;p&gt;Most industry rate limiters are based on a few well-known algorithms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixed Window Counter&lt;/li&gt;
&lt;li&gt;Sliding Window Counter&lt;/li&gt;
&lt;li&gt;Token Bucket&lt;/li&gt;
&lt;li&gt;Leaky Bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s walk through each one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fixed Window Counter
&lt;/h2&gt;

&lt;p&gt;In a fixed window counter, clients can make a specific number of requests within a fixed time interval.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100 requests per minute&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Problem: Burstiness
&lt;/h3&gt;

&lt;p&gt;A client could send:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100 requests at &lt;code&gt;00:59&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Another 100 requests at &lt;code&gt;01:00&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: &lt;strong&gt;200 requests within seconds&lt;/strong&gt;, even though the limit is 100 per minute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffunho0u4qgx0xupq5oou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffunho0u4qgx0xupq5oou.png" alt="Fixed window" width="703" height="329"&gt;&lt;/a&gt;&lt;/p&gt;
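&lt;p&gt;A minimal fixed window counter, together with the boundary burst described above (the client name, limit, and window size are illustrative):&lt;/p&gt;

```python
# Minimal fixed window counter sketch: 100 requests per 60-second window,
# with window boundaries aligned to the epoch.
import time
from collections import defaultdict

LIMIT = 100
WINDOW = 60  # seconds

counters = defaultdict(int)

def allow(client_id, now=None):
    now = time.time() if now is None else now
    # All requests in the same fixed window share one counter.
    window_start = int(now // WINDOW) * WINDOW
    key = (client_id, window_start)
    if counters[key] >= LIMIT:
        return False
    counters[key] += 1
    return True

# The burst problem: 100 requests just before a window boundary and 100
# just after are all allowed, i.e. 200 requests within seconds.
late = sum(allow("alice", now=59.9) for _ in range(100))
early = sum(allow("alice", now=60.1) for _ in range(100))
print(late + early)  # 200
```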




&lt;h2&gt;
  
  
  Sliding Window Counter
&lt;/h2&gt;

&lt;p&gt;Instead of resetting counters at fixed intervals, the sliding window evaluates requests relative to the &lt;strong&gt;current time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When a request arrives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system counts how many requests occurred in the window of time ending at the current moment.&lt;/li&gt;
&lt;li&gt;If admitting the request would exceed the limit, it is rejected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly reduces burst traffic compared to fixed windows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzh622jel9kgmc9shqkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzh622jel9kgmc9shqkw.png" alt="Sliding window" width="671" height="279"&gt;&lt;/a&gt;&lt;/p&gt;
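&lt;p&gt;A minimal sketch of the sliding window idea, using the log variant that stores per-client timestamps (the names and limits are illustrative):&lt;/p&gt;

```python
# Sketch of the sliding window log variant: keep per-client request
# timestamps and count only those inside the last WINDOW seconds.
import time
from collections import defaultdict, deque

LIMIT = 100
WINDOW = 60  # seconds

logs = defaultdict(deque)

def allow(client_id, now=None):
    now = time.time() if now is None else now
    log = logs[client_id]
    # Evict timestamps that have slid out of the window.
    while log and now - log[0] >= WINDOW:
        log.popleft()
    if len(log) >= LIMIT:
        return False
    log.append(now)
    return True

# Unlike the fixed window, a burst across a boundary is rejected:
allowed = sum(allow("alice", now=59.9) for _ in range(100))
allowed += sum(allow("alice", now=60.1) for _ in range(100))
print(allowed)  # 100
```

&lt;p&gt;The log variant is exact but stores one timestamp per request; the counter variant approximates it with far less memory.&lt;/p&gt;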




&lt;h2&gt;
  
  
  Token Bucket
&lt;/h2&gt;

&lt;p&gt;The token bucket is one of the most widely used rate limiting algorithms.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A bucket contains tokens.&lt;/li&gt;
&lt;li&gt;Each token allows one request.&lt;/li&gt;
&lt;li&gt;Tokens refill at a constant rate.&lt;/li&gt;
&lt;li&gt;Requests consume tokens.&lt;/li&gt;
&lt;li&gt;If no tokens remain → request is rejected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Burst traffic is allowed as long as tokens are available.&lt;/p&gt;

&lt;p&gt;More expensive operations can consume multiple tokens.&lt;/p&gt;

&lt;p&gt;This makes token buckets ideal for high-traffic APIs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyeajvlcjsaxfg1ilpuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyeajvlcjsaxfg1ilpuy.png" alt="Token bucket" width="569" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wd3i752awlpkzr05p0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wd3i752awlpkzr05p0f.png" alt="Token bucket flow" width="569" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqe7siqkmjed2m48ym9as.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqe7siqkmjed2m48ym9as.png" alt="Token bucket example" width="551" height="527"&gt;&lt;/a&gt;&lt;/p&gt;
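&lt;p&gt;A minimal token bucket sketch (the capacity, refill rate, and the explicit &lt;code&gt;now&lt;/code&gt; parameter used for testability are illustrative choices):&lt;/p&gt;

```python
# Token bucket sketch: capacity 10 tokens, refilled at 5 tokens/second.
# Expensive operations may consume more than one token.
import time

class TokenBucket:
    def __init__(self, capacity=10, refill_rate=5.0, start=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = float(capacity)   # bucket starts full
        self.last = start

    def allow(self, cost=1, now=None):
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket()
burst = sum(bucket.allow(now=0.0) for _ in range(12))
print(burst)  # 10: bursts are allowed until the bucket is empty
```

&lt;p&gt;Computing the refill lazily on each request avoids a background timer per bucket.&lt;/p&gt;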




&lt;h2&gt;
  
  
  Leaky Bucket
&lt;/h2&gt;

&lt;p&gt;The leaky bucket processes requests at a constant rate.&lt;/p&gt;

&lt;p&gt;Think of it as a FIFO queue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests enter the queue&lt;/li&gt;
&lt;li&gt;Requests are processed steadily&lt;/li&gt;
&lt;li&gt;When the queue is full, new requests are dropped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This smooths traffic spikes and ensures consistent processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq9yo335lkt5qn88pq7s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq9yo335lkt5qn88pq7s.png" alt="Leaky bucket" width="701" height="205"&gt;&lt;/a&gt;&lt;/p&gt;
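&lt;p&gt;A minimal leaky bucket sketch as a bounded FIFO queue with a fixed outflow rate (the queue size and rate are illustrative):&lt;/p&gt;

```python
# Leaky bucket sketch: requests join a bounded FIFO queue if there is
# room, and a worker drains them at a constant rate per tick.
from collections import deque

class LeakyBucket:
    def __init__(self, queue_size=4, leak_rate=2):
        self.queue = deque()
        self.queue_size = queue_size
        self.leak_rate = leak_rate  # requests processed per tick

    def submit(self, request):
        # Drop the request when the queue is full.
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        # Process at a constant rate regardless of arrival bursts.
        processed = []
        for _ in range(min(self.leak_rate, len(self.queue))):
            processed.append(self.queue.popleft())
        return processed

bucket = LeakyBucket()
accepted = sum(bucket.submit(f"req{i}") for i in range(6))
print(accepted)       # 4: the burst overflows the queue
print(bucket.leak())  # ['req0', 'req1'] drained steadily
```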




&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;

&lt;p&gt;A rate limiter typically acts as middleware between clients and servers.&lt;/p&gt;

&lt;p&gt;Every incoming request is evaluated before reaching the API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyk8ye2webeinqq6ikcdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyk8ye2webeinqq6ikcdz.png" alt="Architecture" width="697" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If a request exceeds the limit, the server responds with:&lt;br&gt;
HTTP 429 — Too Many Requests&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuadbcv4plnqyzobas075.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuadbcv4plnqyzobas075.png" alt="429 response" width="648" height="186"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Helpful Rate Limit Headers
&lt;/h2&gt;

&lt;p&gt;Servers often return headers to help clients behave correctly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;X-RateLimit-Remaining&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Remaining allowed requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;X-RateLimit-Limit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Maximum allowed requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;X-RateLimit-Retry-After&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Seconds before retrying&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
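&lt;p&gt;As a rough sketch, a server might populate these headers like this. The function name and arguments are illustrative; real services vary in exact header names and semantics.&lt;/p&gt;

```python
def rate_limit_headers(limit, remaining, retry_after_seconds):
    """Build the response headers from the table above.
    All names here are illustrative, not a specific API."""
    remaining = max(0, remaining)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
    }
    if remaining == 0:
        # Throttled: tell the client when it is safe to retry.
        headers["X-RateLimit-Retry-After"] = str(retry_after_seconds)
    return headers

# A throttled request: 0 of 100 calls remaining, retry in 30 seconds.
print(rate_limit_headers(100, 0, 30))
```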

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaqhb90mxj9053q826d7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaqhb90mxj9053q826d7.png" alt="Headers" width="697" height="242"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Rule Configuration
&lt;/h2&gt;

&lt;p&gt;Rate limiting rules define what is allowed.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum &lt;strong&gt;5 marketing messages per day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Maximum &lt;strong&gt;5 login attempts per minute&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rules are typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stored on disk or configuration services&lt;/li&gt;
&lt;li&gt;Loaded into cache by workers&lt;/li&gt;
&lt;li&gt;Evaluated in middleware during requests&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Request Flow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Client sends request&lt;/li&gt;
&lt;li&gt;Request reaches rate limiter middleware&lt;/li&gt;
&lt;li&gt;Rules are loaded from cache&lt;/li&gt;
&lt;li&gt;Counters and timestamps are checked&lt;/li&gt;
&lt;li&gt;Request is either forwarded or throttled&lt;/li&gt;
&lt;/ol&gt;
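&lt;p&gt;The five steps above can be sketched as a fixed-window middleware function. Everything here (the rule name, the in-process &lt;code&gt;counters&lt;/code&gt; dict) is illustrative; a production limiter would keep this state in a shared store such as Redis.&lt;/p&gt;

```python
import time

# Step 3: rules loaded from cache (hard-coded here for illustration).
RULES = {"login": {"limit": 5, "window_seconds": 60}}
counters = {}  # Step 4: counter and window start per (client, rule)

def handle(client_id, rule_name, now=None):
    """Steps 2-5 of the flow above, as a fixed-window sketch."""
    now = now if now is not None else time.time()
    rule = RULES[rule_name]
    count, window_start = counters.get((client_id, rule_name), (0, now))
    if now - window_start >= rule["window_seconds"]:
        count, window_start = 0, now  # window expired: reset counter
    if count >= rule["limit"]:
        return "429 Too Many Requests"  # Step 5: throttled
    counters[(client_id, rule_name)] = (count + 1, window_start)
    return "forwarded"  # Step 5: passed on to the API

for _ in range(6):
    print(handle("alice", "login", now=1000.0))
# The first five requests are forwarded; the sixth is throttled.
```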




&lt;h2&gt;
  
  
  Rate Limiting in Distributed Systems
&lt;/h2&gt;

&lt;p&gt;Scaling introduces new challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Race Conditions
&lt;/h3&gt;

&lt;p&gt;Multiple concurrent requests may update counters simultaneously.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limit = 3 requests/sec&lt;/li&gt;
&lt;li&gt;Two threads read counter value = 2&lt;/li&gt;
&lt;li&gt;Both allow requests → limit exceeded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atomic operations&lt;/li&gt;
&lt;li&gt;Redis sorted sets&lt;/li&gt;
&lt;li&gt;Distributed locks (with performance tradeoffs)&lt;/li&gt;
&lt;/ul&gt;
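&lt;p&gt;To see why atomicity fixes the race, here is a minimal sketch that performs the check and the increment as one critical section. A thread lock stands in for what Redis provides with its atomic &lt;code&gt;INCR&lt;/code&gt; command; the class name is illustrative.&lt;/p&gt;

```python
import threading

class AtomicLimiter:
    """Check-and-increment under a lock, so two concurrent requests can
    never both observe the same counter value (the race described above)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:  # read and update happen as one atomic step
            if self.count >= self.limit:
                return False
            self.count += 1
            return True

limiter = AtomicLimiter(limit=3)
results = []
threads = [threading.Thread(target=lambda: results.append(limiter.allow()))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(results.count(True))  # exactly 3 requests allowed, never more
```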




&lt;h3&gt;
  
  
  Synchronization Problems
&lt;/h3&gt;

&lt;p&gt;In distributed systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests may hit different servers&lt;/li&gt;
&lt;li&gt;Replication lag causes stale counters&lt;/li&gt;
&lt;li&gt;Limits become inconsistent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sticky sessions can help but are usually avoided due to operational complexity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8xb8wxaq8yvmiqn5uo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8xb8wxaq8yvmiqn5uo4.png" alt="Sticky sessions" width="630" height="221"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Centralized Rate Limiting (Global Cache)
&lt;/h2&gt;

&lt;p&gt;A common solution is using a centralized datastore like Redis.&lt;/p&gt;

&lt;p&gt;All nodes read and update shared counters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bwpbp82oikol8z4hjgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bwpbp82oikol8z4hjgu.png" alt="Global cache" width="630" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Potential single point of failure&lt;/li&gt;
&lt;li&gt;Increased latency for global users&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;p&gt;A better large-scale solution is a &lt;strong&gt;multi-data-center architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy rate limiter nodes close to users&lt;/li&gt;
&lt;li&gt;Maintain regional counters&lt;/li&gt;
&lt;li&gt;Synchronize data using eventual consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced latency&lt;/li&gt;
&lt;li&gt;Improved user experience&lt;/li&gt;
&lt;li&gt;Better global scalability&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Monitoring and Observability
&lt;/h2&gt;

&lt;p&gt;After deployment, monitoring is critical.&lt;/p&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limit hit frequency&lt;/li&gt;
&lt;li&gt;False positives&lt;/li&gt;
&lt;li&gt;Traffic patterns&lt;/li&gt;
&lt;li&gt;Algorithm effectiveness&lt;/li&gt;
&lt;li&gt;User impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rate limiting is not a “set and forget” system — it requires continuous tuning.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Rate limiting is more than just protecting APIs from abuse. It is a core reliability mechanism that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stabilizes systems under load,&lt;/li&gt;
&lt;li&gt;ensures fairness,&lt;/li&gt;
&lt;li&gt;and controls operational costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right algorithm and architecture depends heavily on your traffic patterns, scale, and consistency requirements.&lt;/p&gt;

&lt;p&gt;Design it carefully — because at scale, &lt;strong&gt;rate limiting becomes part of your system’s resilience strategy&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>security</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>AWS Secrets Manager Agent</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Thu, 12 Feb 2026 10:51:36 +0000</pubDate>
      <link>https://dev.to/josephakayesi/aws-secrets-manager-agent-2hpg</link>
      <guid>https://dev.to/josephakayesi/aws-secrets-manager-agent-2hpg</guid>
      <description>&lt;h2&gt;
  
  
  AWS Secrets Manager Agent
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvete0fpd69nqod2559k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvete0fpd69nqod2559k.png" alt="AWS Secrets Manager Agent Simple Architecture" width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the AWS Secrets Manager Agent?
&lt;/h2&gt;

&lt;p&gt;When building applications, you often need to provide developers with access to sensitive information such as database credentials, API keys, or authentication tokens. However, you do not want these secrets shared insecurely (for example, via email or hardcoded in source code).&lt;/p&gt;

&lt;p&gt;AWS Secrets Manager allows you to securely store and manage secrets. Your application can then retrieve these secrets at runtime to connect to services such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon RDS
&lt;/li&gt;
&lt;li&gt;Amazon DocumentDB
&lt;/li&gt;
&lt;li&gt;Third-party APIs
&lt;/li&gt;
&lt;li&gt;Internal services
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typically, the application calls AWS Secrets Manager directly to retrieve the secret whenever it needs it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Scaling Problem
&lt;/h2&gt;

&lt;p&gt;For applications with a small user base, retrieving secrets directly from AWS Secrets Manager works well.&lt;/p&gt;

&lt;p&gt;However, as your system scales, this approach can become inefficient:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your application retrieves a secret on every request
&lt;/li&gt;
&lt;li&gt;And your system handles a large number of requests (for example, hundreds of thousands or millions)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You may end up making an extremely high number of API calls to Secrets Manager.&lt;/p&gt;

&lt;p&gt;This can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased latency
&lt;/li&gt;
&lt;li&gt;Higher costs
&lt;/li&gt;
&lt;li&gt;API rate limiting
&lt;/li&gt;
&lt;li&gt;Potential throttling
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How the AWS Secrets Manager Agent Solves This
&lt;/h2&gt;

&lt;p&gt;The AWS Secrets Manager Agent addresses this issue by acting as a local caching layer.&lt;/p&gt;

&lt;p&gt;According to the official documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The AWS Secrets Manager Agent is a local HTTP service that you can install and use in your compute environments to read secrets from Secrets Manager and cache them in memory.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of your application calling AWS Secrets Manager directly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The application makes a request to a local HTTP endpoint (for example, &lt;code&gt;localhost&lt;/code&gt;).
&lt;/li&gt;
&lt;li&gt;The agent retrieves the secret from AWS Secrets Manager.
&lt;/li&gt;
&lt;li&gt;The agent caches the secret in memory.
&lt;/li&gt;
&lt;li&gt;Subsequent requests are served from the in-memory cache.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This significantly reduces the number of direct calls to AWS Secrets Manager and helps prevent rate limiting.&lt;/p&gt;
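&lt;p&gt;The caching idea can be sketched in a few lines. This is an in-process illustration of the pattern, not the agent itself; here &lt;code&gt;fetch&lt;/code&gt; stands in for the real call to AWS Secrets Manager, and all names are illustrative.&lt;/p&gt;

```python
import time

class SecretCache:
    """Sketch of the caching behavior the agent provides: fetch a secret
    once, then serve it from memory until the TTL expires."""

    def __init__(self, fetch, ttl_seconds=300):
        self.fetch = fetch          # stand-in for the Secrets Manager call
        self.ttl = ttl_seconds
        self.cache = {}             # secret_id -) (value, fetched_at)

    def get(self, secret_id, now=None):
        now = now if now is not None else time.time()
        entry = self.cache.get(secret_id)
        if entry is not None and self.ttl > now - entry[1]:
            return entry[0]             # cache hit: served from memory
        value = self.fetch(secret_id)   # cache miss: one backing-store call
        self.cache[secret_id] = (value, now)
        return value

calls = []
cache = SecretCache(fetch=lambda sid: calls.append(sid) or f"value-of-{sid}")
cache.get("db-password", now=0)
cache.get("db-password", now=10)  # served from memory, no second call
print(len(calls))  # 1: only one call reached the backing store
```

&lt;p&gt;Under high request volume, this is what turns millions of secret reads into a handful of API calls per TTL window.&lt;/p&gt;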




&lt;h2&gt;
  
  
  Where It Can Be Used
&lt;/h2&gt;

&lt;p&gt;The Secrets Manager Agent is a client-side HTTP service that standardizes secret retrieval across compute environments, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EC2
&lt;/li&gt;
&lt;li&gt;Amazon ECS
&lt;/li&gt;
&lt;li&gt;Amazon EKS
&lt;/li&gt;
&lt;li&gt;AWS Lambda
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because it exposes an HTTP interface, it is language-agnostic and works with any application stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuration Options
&lt;/h2&gt;

&lt;p&gt;The Secrets Manager Agent can be configured with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum number of connections
&lt;/li&gt;
&lt;li&gt;Cache time-to-live (TTL)
&lt;/li&gt;
&lt;li&gt;Localhost HTTP port
&lt;/li&gt;
&lt;li&gt;Cache size
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows you to control performance, memory usage, and secret refresh behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benefits of Using the Secrets Manager Agent
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Client-side HTTP caching layer
&lt;/li&gt;
&lt;li&gt;Standardized secret consumption across compute types
&lt;/li&gt;
&lt;li&gt;Works with EC2, ECS, EKS, and Lambda
&lt;/li&gt;
&lt;li&gt;Language-agnostic and open source
&lt;/li&gt;
&lt;li&gt;Can fetch live credentials, reducing the need for container restarts when using static environment variables
&lt;/li&gt;
&lt;li&gt;Built-in protection against server-side request forgery (SSRF)
&lt;/li&gt;
&lt;li&gt;Post-quantum TLS enabled by default
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When Should You Use the Secrets Manager Agent?
&lt;/h2&gt;

&lt;p&gt;Use the Secrets Manager Agent when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your application frequently retrieves secrets
&lt;/li&gt;
&lt;li&gt;You are operating at scale and want to avoid rate limiting
&lt;/li&gt;
&lt;li&gt;You want to reduce latency caused by repeated API calls
&lt;/li&gt;
&lt;li&gt;You want a standardized way to consume secrets across multiple compute environments
&lt;/li&gt;
&lt;li&gt;You want to avoid restarting containers when secrets rotate
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When You May Not Need It
&lt;/h2&gt;

&lt;p&gt;You may not need the Secrets Manager Agent if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your application retrieves secrets only once at startup
&lt;/li&gt;
&lt;li&gt;Your traffic is low and rate limits are not a concern
&lt;/li&gt;
&lt;li&gt;You are already using another secure caching mechanism
&lt;/li&gt;
&lt;li&gt;You inject secrets at deployment time and do not require runtime retrieval
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Example Usage Scenario
&lt;/h2&gt;

&lt;p&gt;Imagine an application running on Amazon ECS that connects to an RDS database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Without the Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each container retrieves database credentials directly from AWS Secrets Manager.
&lt;/li&gt;
&lt;li&gt;Under high load, this can result in excessive API calls.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With the Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The container queries the local Secrets Manager Agent endpoint.
&lt;/li&gt;
&lt;li&gt;The agent retrieves and caches the database credentials.
&lt;/li&gt;
&lt;li&gt;Subsequent requests are served from memory.
&lt;/li&gt;
&lt;li&gt;API calls to AWS Secrets Manager are significantly reduced.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The AWS Secrets Manager Agent is a local HTTP caching service that improves scalability, reduces latency, and helps prevent rate limiting when retrieving secrets from AWS Secrets Manager.&lt;/p&gt;

&lt;p&gt;It is particularly useful for high-traffic, distributed systems where secrets are accessed frequently and must be securely managed without compromising performance.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>A Framework for System Design Interviews</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Sat, 07 Feb 2026 09:37:06 +0000</pubDate>
      <link>https://dev.to/josephakayesi/a-framework-for-system-design-interviews-gde</link>
      <guid>https://dev.to/josephakayesi/a-framework-for-system-design-interviews-gde</guid>
      <description>&lt;p&gt;System design interviews often feel vague at a high level. You may be asked to design a large-scale system, one that took years to build, in under an hour. Clearly, it is impossible to design such a system in full detail within that time. So what, then, is the real purpose of a system design interview?&lt;/p&gt;

&lt;p&gt;A system design interview is not about building a production-ready system. It is an exercise designed to evaluate your ability to reason about complex problems, structure a solution, and explain design decisions along with their tradeoffs. Interviewers are far more interested in how you think than in the final architecture you propose.&lt;/p&gt;

&lt;p&gt;Every system design problem is different, and there is no one-size-fits-all solution. That said, there are common steps you can follow to approach most system design interviews effectively.&lt;/p&gt;

&lt;p&gt;Below is a simple and practical template for navigating system design interviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  A 4-step process for effective system design interviews
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Understand the problem and establish design scope
&lt;/h3&gt;

&lt;p&gt;Do not rush into designing the system before fully understanding the problem. Start by clarifying the requirements and constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key questions to ask include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does the system do?&lt;/li&gt;
&lt;li&gt;How many users will the system support?&lt;/li&gt;
&lt;li&gt;What guarantees must the system provide?

&lt;ul&gt;
&lt;li&gt;Availability?&lt;/li&gt;
&lt;li&gt;Consistency?&lt;/li&gt;
&lt;li&gt;Scalability?&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;What is the expected growth pattern?&lt;/li&gt;

&lt;li&gt;Are there existing services or components we can leverage?&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These questions help define the scope of the system and guide your design decisions and tradeoffs.&lt;/p&gt;

&lt;p&gt;Note that these are only sample questions. The exact questions you ask will depend on the system you are asked to design. Sometimes the interviewer will provide direct answers, and other times you will be expected to make reasonable assumptions.&lt;/p&gt;

&lt;p&gt;As you go, clearly document all requirements and assumptions. This ensures you do not miss important details and gives you a solid reference point to design against.&lt;/p&gt;




&lt;p&gt;Example:&lt;br&gt;
Say you are asked to design a news feed system.&lt;/p&gt;

&lt;p&gt;The conversation between you and your interviewer might look something like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interviewer:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Design a news feed system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Candidate:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Before I start designing, I would like to clarify the core requirements. What are the main features we want to support?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interviewer:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Users should be able to see a personalized feed of posts from accounts they follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Candidate:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Understood. To size the system properly, how many users do we expect to support?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interviewer:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Around 100 million monthly active users, with about 20 percent daily active users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Candidate:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
So roughly 20 million daily active users. How frequently do users create posts?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interviewer:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Each active user creates about one post per day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Candidate:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
How often do users read or refresh their feed compared to posting?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interviewer:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Users read their feed much more frequently, about 50 feed reads per post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Candidate:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
What are the latency and freshness requirements?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interviewer:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Feed loads should be under 200 milliseconds, and new posts should appear in followers’ feeds within a few seconds.&lt;/p&gt;
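&lt;p&gt;The numbers agreed above are already enough for a quick back-of-envelope estimate of read and write throughput:&lt;/p&gt;

```python
# Back-of-envelope estimates from the figures in the conversation above.
dau = 20_000_000           # 20 percent of 100M monthly active users
posts_per_user_per_day = 1
reads_per_post = 50
seconds_per_day = 86_400

write_qps = dau * posts_per_user_per_day / seconds_per_day
read_qps = write_qps * reads_per_post

print(f"write QPS ~ {write_qps:.0f}")  # roughly 231 posts per second
print(f"read QPS  ~ {read_qps:.0f}")   # roughly 11,574 feed reads per second
```

&lt;p&gt;Roughly 230 writes per second versus about 11,500 reads per second: the system is heavily read-dominated, which should shape the design that follows.&lt;/p&gt;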

&lt;h3&gt;
  
  
  Step 2: Propose a high-level design and get buy-in
&lt;/h3&gt;

&lt;p&gt;Once the problem and scope are clear, propose a high-level design. At this stage, treat the interviewer as a teammate and align on the overall approach before going deeper.&lt;/p&gt;

&lt;p&gt;Ask yourself what the main components of the system are and how they interact to solve the problem.&lt;/p&gt;

&lt;p&gt;For the news feed example, the system can be broken down into two main flows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feed publishing:&lt;/strong&gt; A user creates a post, which is validated and persisted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feed building:&lt;/strong&gt; A user’s feed is generated by aggregating posts from followed accounts, typically in reverse chronological order.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This level of abstraction ensures you and the interviewer agree on the system’s structure before diving into details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9oodk5gxwsyxc4g4h14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9oodk5gxwsyxc4g4h14.png" alt=" " width="479" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9vwej5291zmy5x3n4tl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9vwej5291zmy5x3n4tl.png" alt=" " width="497" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Deep dive into key components
&lt;/h3&gt;

&lt;p&gt;After agreeing on the high-level design, prioritize the most critical components and dive deeper into them. Focus on areas that are central to the system’s functional requirements.&lt;/p&gt;

&lt;p&gt;This is where you discuss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data models&lt;/li&gt;
&lt;li&gt;Read versus write tradeoffs&lt;/li&gt;
&lt;li&gt;Scalability strategies&lt;/li&gt;
&lt;li&gt;Caching&lt;/li&gt;
&lt;li&gt;Bottlenecks and failure modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explain why you choose certain approaches and what tradeoffs they introduce. This step is often the core of the interview.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1xmkwjoetvtk8g4k12f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1xmkwjoetvtk8g4k12f.png" alt=" " width="569" height="585"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk18gsnbd0u6euc0ksukv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk18gsnbd0u6euc0ksukv.png" alt=" " width="585" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Wrap up and discuss improvements
&lt;/h3&gt;

&lt;p&gt;In the final stage, the interviewer may ask follow-up questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How would you improve the system?&lt;/li&gt;
&lt;li&gt;What happens if the user base grows by 100 times?&lt;/li&gt;
&lt;li&gt;What are the main bottlenecks?&lt;/li&gt;
&lt;li&gt;Where does the system fall short?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No system is perfect. Being open about limitations and discussing how you would address them leaves a strong impression and demonstrates maturity in system design thinking.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>cloudnative</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Prefix sums and range queries</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Mon, 26 Jan 2026 14:52:23 +0000</pubDate>
      <link>https://dev.to/josephakayesi/prefix-sums-and-range-queries-260f</link>
      <guid>https://dev.to/josephakayesi/prefix-sums-and-range-queries-260f</guid>
      <description>&lt;p&gt;Discovering prefix sums is one of those moments where you go &lt;em&gt;aha!&lt;/em&gt;&lt;br&gt;&lt;br&gt;
It’s staggering how memory-efficient this technique is.&lt;/p&gt;

&lt;p&gt;Prefix sums are used when you want to compute values over a range —&lt;br&gt;&lt;br&gt;
i.e. &lt;strong&gt;range queries&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Let’s say you want to compute the number of YouTube views over different time periods.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;views&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;periods&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;i&lt;/code&gt; represents the &lt;em&gt;i-th day&lt;/em&gt; in the &lt;code&gt;views&lt;/code&gt; array
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;views[i]&lt;/code&gt; is the number of views on that day
&lt;/li&gt;
&lt;li&gt;Each element in &lt;code&gt;periods&lt;/code&gt; represents a range over which we want the cumulative sum
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, &lt;code&gt;periods[0] = [0, 1]&lt;/code&gt; means:&lt;br&gt;&lt;br&gt;
compute the cumulative views from day &lt;code&gt;0&lt;/code&gt; to day &lt;code&gt;1&lt;/code&gt; (inclusive).&lt;/p&gt;


&lt;h2&gt;
  
  
  A naïve approach
&lt;/h2&gt;

&lt;p&gt;A valid approach would be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For each period, find the sub-array for that range
&lt;/li&gt;
&lt;li&gt;Iterate through it and sum all the elements
&lt;/li&gt;
&lt;li&gt;Append the result to an output array
&lt;/li&gt;
&lt;/ol&gt;
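&lt;p&gt;The steps above can be sketched as follows (the function name is mine, chosen for illustration):&lt;/p&gt;

```python
def range_sums_naive(views, periods):
    """Steps 1-3 above: sum each range independently."""
    output = []
    for l, r in periods:
        output.append(sum(views[l:r + 1]))  # re-walks the sub-array each time
    return output

views = [7, 1, 2, 5, 2, 6, 8]
periods = [[0, 1], [0, 5], [1, 4], [3, 3]]
print(range_sums_naive(views, periods))  # [8, 23, 10, 5]
```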

&lt;p&gt;The problem with this approach is that we recompute &lt;strong&gt;many overlapping ranges&lt;/strong&gt; without keeping track of previous work.&lt;/p&gt;

&lt;p&gt;Can we do better?&lt;/p&gt;


&lt;h2&gt;
  
  
  Prefix sums to the rescue 🚀
&lt;/h2&gt;

&lt;p&gt;Yes. An optimal approach is to use a &lt;strong&gt;prefix sum&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compute a cumulative sum array once
&lt;/li&gt;
&lt;li&gt;Each element at index &lt;code&gt;i&lt;/code&gt; stores the sum of values from index &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;i&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s call this array &lt;code&gt;prefix&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prefix[0] = views[0]
prefix[1] = views[0] + views[1]
prefix[2] = views[0] + views[1] + views[2]
prefix[3] = views[0] + views[1] + views[2] + views[3]
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For our example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
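&lt;p&gt;Building the prefix array takes a single O(n) pass, for example with &lt;code&gt;itertools.accumulate&lt;/code&gt; from the standard library:&lt;/p&gt;

```python
from itertools import accumulate

views = [7, 1, 2, 5, 2, 6, 8]
prefix = list(accumulate(views))  # one running-sum pass over the array
print(prefix)  # [7, 8, 10, 15, 17, 23, 31]
```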






&lt;h2&gt;
  
  
  Querying ranges with prefix sums
&lt;/h2&gt;

&lt;p&gt;For any period &lt;code&gt;[l, r]&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the range starts at day &lt;code&gt;0&lt;/code&gt;, the answer is simply:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nb"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because each &lt;code&gt;prefix[r]&lt;/code&gt; already represents the cumulative sum from day &lt;code&gt;0&lt;/code&gt; to day &lt;code&gt;r&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What if the range does &lt;strong&gt;not&lt;/strong&gt; start at 0?
&lt;/h2&gt;

&lt;p&gt;Let’s say the period is &lt;code&gt;[1, 4]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We know:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prefix[4] = views[0] + views[1] + views[2] + views[3] + views[4]
prefix[0] = views[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So to get the sum from day &lt;code&gt;1&lt;/code&gt; to day &lt;code&gt;4&lt;/code&gt;, we subtract what came before the range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nb"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In general:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nb"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;prefix[r]&lt;/code&gt; gives the total up to the end of the range
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefix[l - 1]&lt;/code&gt; gives everything before the range
&lt;/li&gt;
&lt;li&gt;Subtracting removes what we don’t care about
&lt;/li&gt;
&lt;/ul&gt;
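
&lt;p&gt;The subtraction above can be sketched in a few lines of Python (the &lt;code&gt;views&lt;/code&gt; values here are made up for illustration); note the &lt;code&gt;l == 0&lt;/code&gt; edge case, where there is nothing before the range to subtract:&lt;/p&gt;

```python
# Build a prefix-sum array once, then answer range-sum queries in O(1).
views = [5, 3, 2, 5, 8]  # hypothetical daily view counts

prefix = []
running = 0
for v in views:
    running += v
    prefix.append(running)

def range_sum(l, r):
    """Sum of views[l..r] inclusive, using the prefix array."""
    if l == 0:
        return prefix[r]  # nothing before the range to subtract
    return prefix[r] - prefix[l - 1]

print(range_sum(1, 4))  # views[1] + views[2] + views[3] + views[4] → 18
```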




&lt;h2&gt;
  
  
  Intuition (what’s really happening)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prefix[4] = views[0] + views[1] + views[2] + views[3] + views[4]
prefix[0] = views[0]
-----------------------------------------------
result    =          views[1] + views[2] + views[3] + views[4]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything before the range is discarded. We only keep what’s inside &lt;code&gt;[l, r]&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Single-day ranges
&lt;/h2&gt;

&lt;p&gt;For a period like &lt;code&gt;[3, 3]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prefix[3] = views[0] + views[1] + views[2] + views[3]
prefix[2] = views[0] + views[1] + views[2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nb"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nb"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="nb"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is exactly &lt;code&gt;views[3]&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;That’s the magic of prefix sums:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;precompute once, answer every range query in O(1).&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>prefixsum</category>
      <category>python</category>
      <category>datastructures</category>
    </item>
    <item>
      <title>DAGs &amp; Topological Sorting</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Sat, 24 Jan 2026 13:26:19 +0000</pubDate>
      <link>https://dev.to/josephakayesi/dags-topological-sorting-1o9k</link>
      <guid>https://dev.to/josephakayesi/dags-topological-sorting-1o9k</guid>
      <description>&lt;h2&gt;
  
  
  Directed acyclic graph (DAG)
&lt;/h2&gt;

&lt;p&gt;A directed acyclic graph is a directed graph with no cycles. &lt;br&gt;
Because there are no cycles, it can be topologically sorted. &lt;br&gt;
When the graph is laid out from its starting nodes, every edge points in one direction. &lt;br&gt;
Nodes never lead back to their predecessors; edges only point to successors. &lt;/p&gt;

&lt;h2&gt;
  
  
  Topological sorting
&lt;/h2&gt;

&lt;p&gt;DAGs come up a lot when we speak about topological sorting. &lt;/p&gt;

&lt;p&gt;Topological sorting is a really cool technique for solving a myriad of problems. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites for courses in college&lt;/li&gt;
&lt;li&gt;CI/CD pipelines where one stage needs to run before another&lt;/li&gt;
&lt;li&gt;Data engineering pipelines&lt;/li&gt;
&lt;li&gt;Git commit log graph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern that comes up for DAGs is that all nodes depend forward. &lt;/p&gt;

&lt;p&gt;Any problem that has this pattern can be solved using topological sorting. &lt;/p&gt;
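
&lt;p&gt;A minimal sketch of one such solution is Kahn's algorithm, shown here on a hypothetical course-prerequisite graph (the course names are made up): repeatedly take a node with no remaining prerequisites, then release its successors.&lt;/p&gt;

```python
from collections import deque

# A hypothetical prerequisite DAG: an edge a -> b means "a comes before b".
edges = {
    "intro": ["data structures"],
    "data structures": ["algorithms"],
    "discrete math": ["algorithms"],
    "algorithms": [],
}

# Count incoming edges for every node.
indegree = {node: 0 for node in edges}
for node in edges:
    for successor in edges[node]:
        indegree[successor] += 1

# Start from the nodes that depend on nothing.
queue = deque(node for node in edges if indegree[node] == 0)
order = []
while queue:
    node = queue.popleft()
    order.append(node)
    for successor in edges[node]:
        indegree[successor] -= 1
        if indegree[successor] == 0:
            queue.append(successor)

# If len(order) != len(edges), the graph has a cycle and is not a DAG.
print(order)
```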

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvdnxxkaetkqm7dzw48k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvdnxxkaetkqm7dzw48k.png" alt=" " width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that all arrows move in one direction. &lt;br&gt;
None of the arrows points backwards. &lt;br&gt;
All nodes depend forward. &lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>🌲 Finding the Longest Aligned Chain in a Binary Tree</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Sat, 24 Jan 2026 09:56:32 +0000</pubDate>
      <link>https://dev.to/josephakayesi/finding-the-longest-aligned-chain-in-a-binary-tree-h28</link>
      <guid>https://dev.to/josephakayesi/finding-the-longest-aligned-chain-in-a-binary-tree-h28</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6ilu00y1645snqb0tbv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6ilu00y1645snqb0tbv.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Given a binary tree, we say a node is &lt;strong&gt;aligned&lt;/strong&gt; if its value is the same as its depth (where the root is at depth 0). Our goal is to return the length of the longest descendant chain of aligned nodes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The chain must follow a parent-to-child path, but it does not need to start at the root.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Thought Process
&lt;/h2&gt;

&lt;p&gt;To find the longest chain, we need to traverse the tree while tracking our current depth. A &lt;strong&gt;Bottom-Up Depth First Search (DFS)&lt;/strong&gt; works best here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;State Tracking:&lt;/strong&gt; Pass the current &lt;code&gt;depth&lt;/code&gt; down as we move through the tree.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive Step:&lt;/strong&gt; For any node, the longest aligned chain starting at that node depends on the results of its left and right children.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Logic:&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;node.value == depth&lt;/code&gt;, this node is aligned. Its chain length is &lt;code&gt;1 + max(left_child_chain, right_child_chain)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If &lt;code&gt;node.value != depth&lt;/code&gt;, the chain breaks. We return &lt;code&gt;0&lt;/code&gt; to the parent; the best chain seen so far is already recorded in the global maximum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Global Max:&lt;/strong&gt; Use a member variable to track the maximum chain length encountered during the entire traversal.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Input Array: &lt;code&gt;[7, 1, 3, 2, 8, 2, None, 4, 3, None, None, 3, 3]&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Depth 1: Node(1) is aligned.&lt;/li&gt;
&lt;li&gt;Depth 2: Node(2) is aligned.&lt;/li&gt;
&lt;li&gt;Depth 3: Node(3) is aligned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; 3 (Chain: 1 → 2 → 3)&lt;/li&gt;
&lt;/ul&gt;
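
&lt;p&gt;The steps above can be sanity-checked with a compact standalone sketch. The &lt;code&gt;build_tree&lt;/code&gt; helper and the LeetCode-style level-order layout (where &lt;code&gt;None&lt;/code&gt; marks a missing child) are assumptions for illustration, not part of the problem statement:&lt;/p&gt;

```python
from typing import Optional

class TreeNode:
    def __init__(self, value: int):
        self.value = value
        self.left: Optional["TreeNode"] = None
        self.right: Optional["TreeNode"] = None

def build_tree(values):
    """Build a tree from a level-order list with None gaps (assumed layout)."""
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    queue = [root]
    i = 1
    while queue:
        node = queue.pop(0)
        for side in ("left", "right"):
            if i >= len(values):
                return root
            if values[i] is not None:
                child = TreeNode(values[i])
                setattr(node, side, child)
                queue.append(child)
            i += 1
    return root

def aligned_chain(root) -> int:
    """Length of the longest parent-to-child chain where value == depth."""
    best = [0]
    def dfs(node, depth):
        if node is None:
            return 0
        left = dfs(node.left, depth + 1)
        right = dfs(node.right, depth + 1)
        if node.value == depth:
            chain = 1 + max(left, right)
            best[0] = max(best[0], chain)
            return chain
        return 0  # chain breaks at this node
    dfs(root, 0)
    return best[0]

root = build_tree([7, 1, 3, 2, 8, 2, None, 4, 3, None, None, 3, 3])
print(aligned_chain(root))  # → 3, the chain 1 → 2 → 3
```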




&lt;h2&gt;
  
  
  Final Solution
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TreeNode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TreeNode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TreeNode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Solution&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;alignedChained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TreeNode&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TreeNode&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

            &lt;span class="c1"&gt;# Post-order traversal: Get values from children first
&lt;/span&gt;            &lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;right&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Node matches depth, extend the chain
&lt;/span&gt;                &lt;span class="n"&gt;current_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_chain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;current_chain&lt;/span&gt;

            &lt;span class="c1"&gt;# Chain breaks here
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="nf"&gt;dfs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In tree problems, always identify if you need information from parents (Top-down) or children (Bottom-up). Here, we needed both (depth from parent, chain length from children).&lt;/li&gt;
&lt;li&gt;The "chain" logic is very similar to the "Longest Path" or "Diameter" problems on LeetCode.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Related Problems
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Binary Tree Maximum Path Sum&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Longest Univalue Path&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diameter of Binary Tree&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tags
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#algorithms&lt;/code&gt; &lt;code&gt;#trees&lt;/code&gt; &lt;code&gt;#dfs&lt;/code&gt; &lt;code&gt;#recursion&lt;/code&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>coding</category>
      <category>computerscience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Scale from Zero to Millions of Users</title>
      <dc:creator>JosephAkayesi</dc:creator>
      <pubDate>Mon, 19 Jan 2026 09:23:43 +0000</pubDate>
      <link>https://dev.to/josephakayesi/scale-from-zero-to-millions-of-users-3d5m</link>
      <guid>https://dev.to/josephakayesi/scale-from-zero-to-millions-of-users-3d5m</guid>
      <description>&lt;h1&gt;
  
  
  Scale from Zero to Millions of Users
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By Joseph Akayesi&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Originally published on Medium:&lt;/em&gt; &lt;a href="https://medium.com/@josephakayesi/scale-from-zero-to-millions-of-users-3e91daa771d9" rel="noopener noreferrer"&gt;https://medium.com/@josephakayesi/scale-from-zero-to-millions-of-users-3e91daa771d9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Designing systems to support millions of users can be challenging. It requires refinement and continuous improvement. In this article, we’ll walk through how to design a system that scales from &lt;strong&gt;zero users to millions of users&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Single Server Setup
&lt;/h2&gt;

&lt;p&gt;At the beginning, a &lt;strong&gt;single-server setup&lt;/strong&gt; is enough to serve a small number of users. Every component required to service requests exists on this single server, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web server
&lt;/li&gt;
&lt;li&gt;Database
&lt;/li&gt;
&lt;li&gt;Cache
&lt;/li&gt;
&lt;li&gt;Other supporting services
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Request Flow in a Single Server Setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;A user makes a request to your website using a &lt;strong&gt;URL&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DNS resolution&lt;/strong&gt; occurs and translates the URL to an IP address.&lt;/li&gt;
&lt;li&gt;The user sends a request to the web server using that IP address.&lt;/li&gt;
&lt;li&gt;The web server processes the request and responds with an &lt;strong&gt;HTML page&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This setup works well initially but quickly becomes insufficient as traffic grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Separating the Database Layer
&lt;/h2&gt;

&lt;p&gt;A single-server setup cannot handle increasing traffic efficiently. As the user base grows, the &lt;strong&gt;web application layer must be separated from the database layer&lt;/strong&gt; so both can scale independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Which Databases Should You Use?
&lt;/h2&gt;

&lt;p&gt;There are generally two main types of databases:&lt;/p&gt;

&lt;h3&gt;
  
  
  Relational Databases
&lt;/h3&gt;

&lt;p&gt;Relational databases maintain &lt;strong&gt;strict referential integrity&lt;/strong&gt;. They enforce relationships between tables and organize data into rows and columns.&lt;/p&gt;

&lt;p&gt;Popular relational databases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL
&lt;/li&gt;
&lt;li&gt;PostgreSQL
&lt;/li&gt;
&lt;li&gt;OracleDB
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Non-Relational (NoSQL) Databases
&lt;/h3&gt;

&lt;p&gt;Non-relational databases provide &lt;strong&gt;looser referential integrity&lt;/strong&gt; and are designed to store unstructured or semi-structured data, often as documents or key-value pairs.&lt;/p&gt;

&lt;p&gt;Popular non-relational databases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MongoDB
&lt;/li&gt;
&lt;li&gt;DynamoDB
&lt;/li&gt;
&lt;li&gt;CouchDB
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Vertical Scaling vs Horizontal Scaling
&lt;/h2&gt;

&lt;p&gt;As your user base grows, your system must handle increased traffic. There are two primary scaling strategies:&lt;/p&gt;

&lt;h3&gt;
  
  
  Vertical Scaling
&lt;/h3&gt;

&lt;p&gt;Vertical scaling increases the capacity of a single server by adding more resources such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU
&lt;/li&gt;
&lt;li&gt;RAM
&lt;/li&gt;
&lt;li&gt;Storage
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vertical scaling is simple and works well for &lt;strong&gt;low to moderate traffic&lt;/strong&gt;, but it has a hard upper limit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Horizontal Scaling
&lt;/h3&gt;

&lt;p&gt;Horizontal scaling adds &lt;strong&gt;more servers&lt;/strong&gt; to form a cluster. This approach allows the system to scale beyond the limits of a single machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Load Balancer
&lt;/h2&gt;

&lt;p&gt;When many users access your system concurrently, servers can become overloaded. A &lt;strong&gt;load balancer&lt;/strong&gt; solves this problem.&lt;/p&gt;

&lt;p&gt;A load balancer sits between clients and servers and distributes incoming requests to the &lt;strong&gt;next available server&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Types of Load Balancers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Application Load Balancer (ALB)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Network Load Balancer (NLB)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a load balancer, the web tier becomes &lt;strong&gt;highly available&lt;/strong&gt;, as requests can be routed across multiple servers.&lt;/p&gt;




&lt;h2&gt;
  
  
  What About the Data Tier?
&lt;/h2&gt;

&lt;p&gt;While the web tier may now be redundant, the data tier often still consists of a &lt;strong&gt;single database&lt;/strong&gt;, making it a single point of failure.&lt;/p&gt;

&lt;p&gt;This problem is addressed using &lt;strong&gt;database replication&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Database Replication
&lt;/h2&gt;

&lt;p&gt;Database replication distributes copies of data across multiple machines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of Database Replication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Better performance
&lt;/li&gt;
&lt;li&gt;High availability
&lt;/li&gt;
&lt;li&gt;Improved reliability
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Typical Request Flow with Replication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A user gets the load balancer IP from DNS.&lt;/li&gt;
&lt;li&gt;The user connects to the load balancer.&lt;/li&gt;
&lt;li&gt;The request is routed to one of the web servers.&lt;/li&gt;
&lt;li&gt;Read operations go to &lt;strong&gt;replica (slave) databases&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Write, update, and delete operations go to the &lt;strong&gt;master database&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
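
&lt;p&gt;The read/write split above can be sketched as a tiny routing function (the hostnames here are hypothetical placeholders for real connection strings):&lt;/p&gt;

```python
import random

# Route writes to the master and spread reads across replicas.
# Hostnames are illustrative, not real infrastructure.
MASTER = "db-master:5432"
REPLICAS = ["db-replica-1:5432", "db-replica-2:5432"]

def pick_host(operation: str) -> str:
    """Writes (insert/update/delete) go to the master; reads go to a replica."""
    if operation in ("insert", "update", "delete"):
        return MASTER
    return random.choice(REPLICAS)

print(pick_host("insert"))
print(pick_host("select"))
```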




&lt;h2&gt;
  
  
  Cache
&lt;/h2&gt;

&lt;p&gt;A typical web application consists of a &lt;strong&gt;web tier&lt;/strong&gt; and a &lt;strong&gt;data tier&lt;/strong&gt;. To improve performance and reduce latency, we introduce a &lt;strong&gt;cache&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A cache stores frequently accessed or expensive-to-compute data in memory for fast retrieval.&lt;/p&gt;

&lt;p&gt;Popular caching systems include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis
&lt;/li&gt;
&lt;li&gt;Memcached
&lt;/li&gt;
&lt;li&gt;Valkey
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Cache Tier
&lt;/h2&gt;

&lt;p&gt;The cache tier sits between the web servers and the database.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frequently accessed data is stored in the cache&lt;/li&gt;
&lt;li&gt;Subsequent requests are served directly from the cache&lt;/li&gt;
&lt;li&gt;Database load is significantly reduced&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Caching Strategies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Read-through
&lt;/li&gt;
&lt;li&gt;Write-around
&lt;/li&gt;
&lt;li&gt;Write-back
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caches typically store data as &lt;strong&gt;key-value pairs&lt;/strong&gt; and support expiration times.&lt;/p&gt;
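
&lt;p&gt;As a minimal sketch of the key-value-with-expiration idea, here is the common cache-aside pattern. The dict-based cache and &lt;code&gt;fake_db&lt;/code&gt; below are stand-ins for a real cache like Redis or Memcached and a real database:&lt;/p&gt;

```python
import time

# Cache-aside: check the cache first, fall back to the database on a
# miss, and store the result with an expiration time.
cache = {}  # key -> (value, expires_at)
fake_db = {"user:42": {"name": "Ada"}}  # stand-in for the data tier

TTL_SECONDS = 60

def get_user(key):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]  # cache hit: no database round trip
    value = fake_db.get(key)  # cache miss: read the database
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value

print(get_user("user:42"))  # first call hits the database
print(get_user("user:42"))  # second call is served from the cache
```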




&lt;h2&gt;
  
  
  Cache Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use cache only for &lt;strong&gt;temporary data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Maintain consistency between cache and database&lt;/li&gt;
&lt;li&gt;Use expiration policies to avoid stale data&lt;/li&gt;
&lt;li&gt;Avoid single points of failure by using cache clusters&lt;/li&gt;
&lt;li&gt;Choose appropriate eviction policies (LRU, LFU, MRU)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Content Delivery Network (CDN)
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Content Delivery Network (CDN)&lt;/strong&gt; is a globally distributed network of servers that serve &lt;strong&gt;static content&lt;/strong&gt;, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTML files
&lt;/li&gt;
&lt;li&gt;Images and videos
&lt;/li&gt;
&lt;li&gt;JavaScript and CSS files
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CDNs store copies of static content closer to users, reducing latency and improving performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  CDN Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CDNs cost money due to data replication&lt;/li&gt;
&lt;li&gt;Cache expiration must be carefully configured&lt;/li&gt;
&lt;li&gt;Always provide a fallback in case the CDN fails&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Stateless Web Tier
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stateful Architecture
&lt;/h3&gt;

&lt;p&gt;In a stateful architecture, user session data is stored on the server. This forces users to connect to the &lt;strong&gt;same server&lt;/strong&gt; throughout their session, which limits scalability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stateless Architecture
&lt;/h3&gt;

&lt;p&gt;In a stateless architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User session data is stored externally (database or cache)&lt;/li&gt;
&lt;li&gt;Any server in the cluster can handle any request&lt;/li&gt;
&lt;li&gt;Scalability and fault tolerance improve significantly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Data Centers
&lt;/h2&gt;

&lt;p&gt;Large-scale systems often operate across &lt;strong&gt;multiple data centers&lt;/strong&gt; to enable automatic failover and global availability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Message Queues
&lt;/h2&gt;

&lt;p&gt;Message queues enable &lt;strong&gt;asynchronous processing&lt;/strong&gt; and further scalability.&lt;/p&gt;

&lt;p&gt;They support &lt;strong&gt;event-driven architectures&lt;/strong&gt;, where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Producers publish events&lt;/li&gt;
&lt;li&gt;Consumers process events asynchronously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This decouples services and improves reliability, especially for long-running or heavy operations.&lt;/p&gt;
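
&lt;p&gt;A minimal in-process sketch of that producer/consumer decoupling, using Python's standard library; a real deployment would use a broker such as RabbitMQ, Kafka, or SQS rather than an in-memory queue:&lt;/p&gt;

```python
import queue
import threading

tasks = queue.Queue()
processed = []

def worker():
    """Consumer: pull events off the queue and process them asynchronously."""
    while True:
        event = tasks.get()
        if event is None:        # sentinel: shut the consumer down
            break
        processed.append(event)  # stand-in for the heavy operation
        tasks.task_done()

consumer = threading.Thread(target=worker)
consumer.start()

# Producers publish events without waiting for them to be processed.
for event in ("resize-image", "send-email", "generate-report"):
    tasks.put(event)

tasks.put(None)
consumer.join()
print(processed)
```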




&lt;h2&gt;
  
  
  Logs, Metrics, and Automation
&lt;/h2&gt;

&lt;p&gt;As systems grow, observability becomes critical.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logs&lt;/strong&gt; help debug issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt; provide insight into performance and capacity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt; ensures reliability and consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are essential in large-scale systems with many moving parts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Database Scaling
&lt;/h2&gt;

&lt;p&gt;As data volume grows, the database layer must also scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vertical Database Scaling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add more CPU, memory, and storage&lt;/li&gt;
&lt;li&gt;Simple but limited by hardware constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Horizontal Database Scaling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add more database instances&lt;/li&gt;
&lt;li&gt;Distribute data across instances&lt;/li&gt;
&lt;li&gt;Scales well beyond the limits of any single machine&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Sharding
&lt;/h2&gt;

&lt;p&gt;Horizontal database scaling is commonly achieved through &lt;strong&gt;sharding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Sharding breaks data into smaller partitions distributed across multiple database instances.&lt;/p&gt;

&lt;p&gt;Requests are routed to the correct shard using techniques such as &lt;strong&gt;consistent hashing&lt;/strong&gt;.&lt;/p&gt;
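
&lt;p&gt;As a toy sketch of that routing step, here is a bare-bones hash ring: each shard owns a position on the ring, and a key is routed to the first shard clockwise from its hash. The shard names and the MD5-based hash are illustrative choices; production rings also add virtual nodes for better balance:&lt;/p&gt;

```python
import hashlib
from bisect import bisect

def ring_hash(value: str) -> int:
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# Place each shard on the ring, sorted by its hash position.
shards = ["shard-a", "shard-b", "shard-c"]
ring = sorted((ring_hash(name), name) for name in shards)

def route(key: str) -> str:
    """Walk clockwise from the key's hash to the first shard on the ring."""
    position = bisect(ring, (ring_hash(key),))
    if position == len(ring):  # wrap around past the last shard
        position = 0
    return ring[position][1]

print(route("user:42"))
```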




&lt;h2&gt;
  
  
  Drawbacks of Sharding
&lt;/h2&gt;

&lt;p&gt;While powerful, sharding introduces complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resharding data is difficult&lt;/li&gt;
&lt;li&gt;Celebrity (hot key) problem&lt;/li&gt;
&lt;li&gt;Joins and normalization become harder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Careful planning is required before adopting sharding.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Scaling from zero to millions of users is an &lt;strong&gt;incremental journey&lt;/strong&gt;. By evolving your architecture step by step and applying the right techniques at the right time, you can build systems that are scalable, resilient, and performant.&lt;/p&gt;

</description>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
