Madhukar Vissapragada

Posted on Jun 26 • Edited on Jul 16

Caching — From Fundamentals to Production Systems

#systemdesign #interview #softwareengineering #showdev

Section 1: What is Caching?

Caching means keeping a small amount of frequently needed data in faster storage so it can be retrieved quickly, which lowers latency and raises throughput.

Caching is used in two scenarios:

High read frequency: when a particular piece of data is requested a very high number of times
Expensive computation: when fetching data requires complex joins or aggregations, caching the result avoids recomputing it every time
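
Both scenarios can be sketched with Python's built-in functools.lru_cache. This is a minimal illustration, not a production setup: expensive_report is a hypothetical stand-in for a slow query, and call_count exists only to show how often the real work runs.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive function really runs


@lru_cache(maxsize=128)
def expensive_report(user_id: int) -> int:
    """Hypothetical stand-in for a slow query with joins and aggregations."""
    global call_count
    call_count += 1
    return sum(i * user_id for i in range(1000))


first = expensive_report(7)   # computed once
second = expensive_report(7)  # same arguments, answered from the cache
```

The second call never reaches the function body, which is the whole point of caching a hot or expensive result.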

Computer memory hierarchy

To understand why we cache only a small amount of data, we need to look at how computer memory is structured. The higher up in the hierarchy, the faster the read — but the less space available.

Level	Speed	Size
Registers	Ultra fast	Bytes
CPU Cache	Very fast	Kilobytes to megabytes
RAM	Fast	Gigabytes
Flash / SSD	Slower than RAM	Terabytes
Hard Disk	Slowest	Terabytes

The key insight — the smaller the storage, the faster the retrieval. Caching works by moving frequently accessed data up this hierarchy, closer to the CPU and away from the slow disk.

Section 2: Caching at Various Levels of the Application

Caching does not happen at just one place in a system. It happens at multiple levels — from the browser all the way to the backend database layer.

Frontend caching

As a frontend developer, you can cache data directly in the browser using:

Local Storage: persists even after the browser is closed
Session Storage: cleared when the browser tab is closed
IndexedDB: a full browser-side database for storing large structured data

Third party caching — CDN

Just like DNS is a third party service that resolves domain names, a CDN is a third party service that offers caching as a service.

CDN providers like Akamai, Cloudflare, and Amazon CloudFront have set up edge nodes around the globe. Static content is served directly from these edge nodes, close to the user, instead of travelling all the way to the origin server.

What can be cached in edge nodes? Mostly static content: images, videos, code files, web pages, audio files.

CDN providers either own or rent space in data centers around the world to set up these edge nodes. By some industry estimates, around 70% of internet traffic is handled by CDNs, which makes them the backbone of the modern internet.

Backend caching

On the backend, we have three types of cache:

Local Cache — cache lives inside the application server's memory. Inherently distributed since every app server has its own cache.
Distributed Global Cache — a separate dedicated cache cluster shared by all app servers. Data is either sharded or replicated across cache servers.
Single Global Cache — one dedicated cache server shared by all app servers. No routing algorithm needed.
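
For the distributed global cache, every app server needs a rule that maps a key to a cache server. The sketch below uses plain modulo sharding as one possible routing rule; the server names and the pick_server function are assumptions made for illustration.

```python
import hashlib

CACHE_SERVERS = ["cache-0", "cache-1", "cache-2"]  # hypothetical node names


def pick_server(key: str, servers: list[str] = CACHE_SERVERS) -> str:
    """Map a key to one cache server using a stable hash and a modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


server = pick_server("user:42")  # every app server computes the same answer
```

Modulo sharding remaps most keys whenever the number of servers changes, which is why real clusters usually prefer consistent hashing. With a single global cache this routing step disappears entirely.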

Section 3: CDN Deep Dive — How Facebook Serves Videos

Let's understand how CDNs work with a real world example.

Upload flow: Madhukar uploads a video

Madhukar sends a video upload request to Facebook's application servers
Facebook's application servers upload the video to a file storage service like S3 and receive back an S3 URL
Facebook's application servers send the S3 URL to a CDN provider like Akamai to cache the video across their edge nodes
The application server stores all the metadata in the database: video_id, s3_url, cdn_url

View flow: Gopi watches the video

Gopi sends a GET request to Facebook's application servers
Facebook's application servers respond with the CDN URL
Gopi's device sends a DNS request to resolve the hostname in the CDN URL to an IP address

How does DNS find the nearest edge node?

Geo DNS
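The CDN's authoritative DNS servers look at the IP address of the user's DNS resolver and return the IP of the edge node nearest to it. The weakness: if a user switches to a public resolver such as 8.8.8.8, the authoritative server sees the resolver's IP instead of the user's actual location and may return a far away edge node. The EDNS Client Subnet extension lets a resolver pass along part of the user's address to reduce this problem, but it only helps when both sides support it, so Geo DNS on its own is not a reliable way to find the nearest edge node.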
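
The Geo DNS decision can be sketched as a nearest-neighbour lookup. Everything here is assumed for illustration: the edge node locations, the IPs (taken from a documentation-only range), and the idea that the resolver's IP has already been mapped to coordinates by some geo-IP lookup.

```python
import math

# Hypothetical edge nodes: name -> (latitude, longitude, IP)
EDGE_NODES = {
    "mumbai": (19.08, 72.88, "192.0.2.10"),
    "frankfurt": (50.11, 8.68, "192.0.2.20"),
    "virginia": (38.95, -77.45, "192.0.2.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def resolve_edge(resolver_lat, resolver_lon):
    """Return the IP of the edge node closest to the resolver's estimated location."""
    nearest = min(
        EDGE_NODES,
        key=lambda name: haversine_km(resolver_lat, resolver_lon, *EDGE_NODES[name][:2]),
    )
    return EDGE_NODES[nearest][2]


local_answer = resolve_edge(17.39, 78.49)     # resolver near the user, in Hyderabad
remote_answer = resolve_edge(37.40, -122.10)  # same user, resolver located in California
```

The two calls model the same user in India: with a nearby resolver the answer is the Mumbai node, while a resolver located in California produces the Virginia node, which is the failure mode described above.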

Anycast
All edge nodes announce the same IP address. When Gopi's request is sent to that IP, BGP routing on the internet delivers it along the shortest available route, which usually ends at the nearest edge node in network terms (BGP compares AS-path length and routing policy rather than counting individual hops). Gopi's DNS settings do not affect this choice.

Gopi's request is routed to the nearest edge node via BGP
The video is served from the edge node

Do we cache on the first request?

No. CDNs often skip caching on the first and second request. This is called 2 hit caching.

Over time CDN providers observed that many requests are one-hit wonders: a file gets requested once and never again. Caching those wastes precious edge node storage. So a threshold of 2 is set: the first two requests are passed to the origin without caching, the content is stored at the edge when the third request arrives, and later requests are served from the edge node.
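
The admission rule can be sketched as a small wrapper around an origin fetch. This is an illustration of the threshold idea under simplifying assumptions, not how any particular CDN implements it; TwoHitCache and fetch_from_origin are hypothetical names.

```python
from collections import Counter


class TwoHitCache:
    """Store an object only once it has been requested more than `threshold` times."""

    def __init__(self, fetch_from_origin, threshold: int = 2):
        self.fetch_from_origin = fetch_from_origin
        self.threshold = threshold
        self.request_counts = Counter()  # requests seen per key while uncached
        self.store = {}                  # objects admitted to the edge cache

    def get(self, key):
        """Return (value, source), where source is 'edge' or 'origin'."""
        if key in self.store:
            return self.store[key], "edge"
        self.request_counts[key] += 1
        value = self.fetch_from_origin(key)
        if self.request_counts[key] > self.threshold:
            self.store[key] = value       # third request: admit to the cache
            del self.request_counts[key]  # the counter is no longer needed
        return value, "origin"


edge = TwoHitCache(lambda key: f"bytes-of-{key}")
sources = [edge.get("video.mp4")[1] for _ in range(4)]
```

With the default threshold of 2, the first three requests are answered from the origin and the fourth is the first one served from the edge. A file requested only once never takes up cache space.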

Can the backend server access the CDN?

Technically yes — but CDNs are client facing. They are designed to serve content to end users, not to backend servers.

Section 4: Cache Eviction

Cache memory is limited. When the cache is full and a new record needs to be inserted, the system must remove something to make space. This is called cache eviction.

There are several eviction policies:

LRU — Least Recently Used
Removes the item that has not been accessed for the longest time. The assumption is that if something hasn't been used recently it is unlikely to be used again soon. LRU is the most common default and a reasonable starting point for most applications.
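
A minimal LRU sketch, using collections.OrderedDict to track recency. The LRUCache class is an illustration written for this post, not a library API.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used first, most recent last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a read makes the key the most recent
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now more recent than "b"
lru.put("c", 3)  # cache is full, so "b" is evicted
```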

LFU — Least Frequently Used
Removes the item that has been accessed the fewest number of times overall. Useful when a small set of items stays popular over the long term, because a short burst of requests for other items does not push the popular ones out.
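
A minimal LFU sketch that keeps one access counter per key. It finds the eviction victim with a linear scan, which is fine for illustration but too slow for a large cache; LFUCache is a hypothetical class and assumes a capacity of at least 1.

```python
class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}
        self.counts = {}  # key -> number of accesses so far

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the key with the lowest count; ties go to the oldest insert
            victim = min(self.counts, key=self.counts.get)
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1


lfu = LFUCache(capacity=2)
lfu.put("a", 1)
lfu.put("b", 2)
lfu.get("a")
lfu.get("a")     # "a" now has a higher count than "b"
lfu.put("c", 3)  # "b" has the lowest count and is evicted
```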

FIFO — First In First Out
Removes the item that was inserted first regardless of how often it was accessed. Simple but often inefficient — a very popular item that was cached early could get evicted even if it is still heavily used.
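
A minimal FIFO sketch, with a deque remembering insertion order. FIFOCache is a hypothetical class; the usage lines reproduce the weakness described above, where a heavily read key is still the first to go.

```python
from collections import deque


class FIFOCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}
        self.order = deque()  # keys in insertion order, oldest on the left

    def get(self, key):
        return self.values.get(key)  # reads do not change the eviction order

    def put(self, key, value):
        if key not in self.values:
            if len(self.values) >= self.capacity:
                oldest = self.order.popleft()
                del self.values[oldest]
            self.order.append(key)
        self.values[key] = value


fifo = FIFOCache(capacity=2)
fifo.put("a", 1)
fifo.put("b", 2)
for _ in range(3):
    fifo.get("a")  # "a" is popular, but FIFO ignores that
fifo.put("c", 3)   # "a" was inserted first, so it is evicted
```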

Policy	Evicts	Best for
LRU	Least recently accessed	General purpose — industry standard
LFU	Least frequently accessed	Data with stable long term access patterns
FIFO	Oldest inserted item	Simple systems where recency does not matter

Section 5: Cache Invalidation Patterns

Cache invalidation means the data in the cache is no longer valid and needs to be updated or removed. There are four main patterns to handle this.

TTL — Time To Live (Cache Aside)

TTL provides eventual consistency and is ideal for simple DB queries.

Every cached item has an expiry time. After the TTL expires the data is no longer valid.

All write requests go directly to the DB
All read requests go to the cache
- Cache hit → serve from cache
- Cache miss → fetch from DB, asynchronously write to cache

The system can serve stale data until the TTL expires. This is acceptable for most use cases where eventual consistency is fine.

Write  →  DB only
Read   →  Cache → hit: serve | miss: fetch from DB → populate cache
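
The read path above can be sketched as follows. Two simplifications are assumed: the cache is populated synchronously on a miss rather than asynchronously, and the clock is injectable so expiry can be shown without waiting. TTLCache and load_from_db are hypothetical names, and a plain dict stands in for the DB.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, load_from_db, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.load_from_db = load_from_db
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                 # cache hit, entry still valid
        value = self.load_from_db(key)      # miss or expired: go to the DB
        self.entries[key] = (value, self.clock() + self.ttl)
        return value


database = {"user:1": "Gopi"}  # stand-in for the real DB
now = [0.0]                    # fake clock, in seconds
ttl_cache = TTLCache(60, database.__getitem__, clock=lambda: now[0])

fresh = ttl_cache.get("user:1")    # miss, loaded from the DB
database["user:1"] = "Madhukar"    # the write goes to the DB only
stale = ttl_cache.get("user:1")    # still the old value until the TTL expires
now[0] = 61.0
updated = ttl_cache.get("user:1")  # expired, reloaded from the DB
```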

Write Around

Write Around is ideal for scenarios where fetching data from the DB is very complex — involving many joins and aggregations.

All write requests go directly to the DB
All read requests go to the cache
A cron job runs periodically, pulls data from the DB and writes it to the cache
Until the cron job runs again, the cache may serve stale data
The cache is never filled on demand, so a read for data the cron job has not loaded yet finds nothing and a 404 is returned to the client

Write  →  DB only
Read   →  Cache → hit: serve | miss: 404
Cron   →  DB → Cache (periodic sync)
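
The three paths can be sketched with two dicts standing in for the DB and the cache. The function names are assumptions, and sync_job would be triggered by a scheduler such as cron rather than called by hand.

```python
db = {}     # stand-in for the real DB
cache = {}  # stand-in for the real cache


def write(key, value):
    """Writes bypass the cache and go to the DB only."""
    db[key] = value


def read(key):
    """Reads never touch the DB; a miss is reported as a 404."""
    if key in cache:
        return 200, cache[key]
    return 404, None


def sync_job():
    """Periodic job: rebuild the cache from the DB."""
    cache.clear()
    cache.update(db)


write("report", 10)
before_sync = read("report")  # nothing cached yet
sync_job()
after_sync = read("report")
write("report", 20)
stale_read = read("report")   # old value until the next sync
```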

Write Through

Write Through provides immediate consistency.

Every write request goes to both the cache and the DB atomically. All read requests go to the cache — which is always fresh.

On a single server: atomic writes are achievable using locks, since the cache and the DB live on the same machine and one lock can guard both writes
On a distributed system: atomic writes require a protocol such as 2 Phase Commit, which increases latency and adds rollback complexity

Single server implementation:

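import threading

class CacheError(Exception):
    """Raised when the cache write fails."""

class DBError(Exception):
    """Raised when the DB write fails."""

# In-memory stand-ins for the real cache and DB clients (assumed for illustration)
cache, db = {}, {}
lock = threading.Lock()
log = print

def write_to_cache(key, value):
    cache[key] = value

def write_to_db(key, value):
    db[key] = value

def write_through(key, value):
    with lock:  # released automatically, like try/finally with lock.release()
        had_old, old_value = key in cache, cache.get(key)
        try:
            write_to_cache(key, value)
            write_to_db(key, value)
        except CacheError:
            # Cache write failed: the DB write was never reached, nothing to roll back
            log("cache write failed")
            return False
        except DBError:
            # DB write failed: undo the cache write
            if had_old:
                cache[key] = old_value
            else:
                cache.pop(key, None)
            log("db write failed, cache rolled back")
            return False
        # Both writes succeeded
        return True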

Write  →  Cache + DB atomically
Read   →  Cache always (always fresh)

Write Back

In Write Back all reads and writes go directly to the cache. A background cron job periodically fetches data from the cache and dumps it into the DB.

Weak DB consistency: the DB lags behind the cache until the next dump
No stale reads: all reads come from the cache, which has the latest data
Data loss is possible: if the cache crashes before the cron job runs, the writes from that window are lost
Ideal for use cases where intermediate data points are not critical and some data loss is acceptable

Write  →  Cache only
Read   →  Cache only (always fresh)
Cron   →  Cache → DB (periodic dump)
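
A minimal sketch of the same flow, again with dicts as stand-ins. The dirty set, which tracks keys changed since the last dump, is an assumption added so the job only writes what changed; flush_job would be run by a scheduler.

```python
cache = {}     # stand-in for the real cache
db = {}        # stand-in for the real DB
dirty = set()  # keys written since the last dump


def write(key, value):
    """Writes touch the cache only."""
    cache[key] = value
    dirty.add(key)


def read(key):
    """Reads are served from the cache only."""
    return cache.get(key)


def flush_job():
    """Periodic job: copy changed keys from the cache into the DB."""
    for key in list(dirty):
        db[key] = cache[key]
    dirty.clear()


write("views", 1)
write("views", 2)
latest = read("views")               # the cache always has the newest value
in_db_before_flush = "views" in db   # the DB has not seen any write yet
flush_job()
```

Only the final value reaches the DB, so the intermediate value 1 is never persisted. If the cache had crashed before flush_job ran, both writes would have been lost.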

Conclusion

Caching is not a single technique — it is a layered strategy that operates at every level of a modern application, from the browser to the CDN to the backend database layer.

The choice of cache pattern is always a tradeoff:

If your system can tolerate stale data — TTL or Write Around keeps things simple and scalable
If your system requires immediate consistency — Write Through gives you that guarantee but at the cost of latency and distributed complexity
If your system is write-heavy and can tolerate data loss — Write Back gives you the highest write throughput

The next time your app loads a video in milliseconds, a feed in under a second, or a product page instantly — caching is quietly doing its job at every layer of the stack.

Credits:

DEV Community