DEV Community

Madhukar Vissapragada

Caching — From Fundamentals to Production Systems

Section 1: What is Caching?

Caching is the technique of keeping a small amount of frequently needed data in a faster storage layer so that it can be retrieved quickly, which lowers latency and raises throughput.

Caching is used in two scenarios:

  1. High read frequency: a particular piece of data is requested a very high number of times
  2. Expensive computation: fetching the data requires complex joins or aggregations, so caching the result avoids recomputing it every time
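
The second scenario can be sketched with Python's built-in `functools.lru_cache`. This is only an illustration: `monthly_report` and its arithmetic are hypothetical stand-ins for a query with heavy joins and aggregations, and the `calls` counter exists purely to show that the function body runs once.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive function body actually runs


@lru_cache(maxsize=128)
def monthly_report(customer_id: int) -> int:
    """Hypothetical stand-in for an expensive join/aggregation query."""
    global calls
    calls += 1
    return sum(i * customer_id for i in range(100_000))


first = monthly_report(7)   # computed: the body runs
second = monthly_report(7)  # served from the cache: the body does not run again
```

After the two calls, `calls` is still 1, and `monthly_report.cache_info()` reports one hit and one miss.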

Computer memory hierarchy

To understand why we cache only a small amount of data, we need to look at how computer memory is structured. The higher up in the hierarchy, the faster the read — but the less space available.

Level        | Speed           | Size
Registers    | Ultra fast      | Bytes
CPU Cache    | Very fast       | Kilobytes
RAM          | Fast            | Gigabytes
Flash / SSD  | Slower than RAM | Terabytes
Hard Disk    | Slowest         | Terabytes

The key insight — the smaller the storage, the faster the retrieval. Caching works by moving frequently accessed data up this hierarchy, closer to the CPU and away from the slow disk.


Section 2: Caching at Various Levels of the Application

Caching does not happen at just one place in a system. It happens at multiple levels — from the browser all the way to the backend database layer.

Frontend caching

As a frontend developer, you can cache data directly in the browser using:

  • Local Storage: persists even after the browser is closed
  • Session Storage: cleared when the browser tab is closed
  • IndexedDB: a full browser-side database for storing large structured data

Third party caching — CDN

Just like DNS is a third party service that resolves domain names, a CDN is a third party service that offers caching as a service.

CDN providers like Akamai, Cloudflare, and Amazon CloudFront have set up edge nodes around the globe. Static content is served directly from these edge nodes, close to the user, instead of travelling all the way to the origin server.

What can be cached in edge nodes? Any static content larger than 10KB — images, videos, code files, web pages, audio files.

CDN providers either own or rent space in data centers around the world to set up these edge nodes. CDNs are often estimated to carry around 70% of internet traffic, which makes them part of the backbone of the modern internet.

Backend caching

On the backend, we have three types of cache:

  • Local Cache — cache lives inside the application server's memory. Inherently distributed since every app server has its own cache.
  • Distributed Global Cache — a separate dedicated cache cluster shared by all app servers. Data is either sharded or replicated across cache servers.
  • Single Global Cache — one dedicated cache server shared by all app servers. No routing algorithm needed.
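
For the distributed global cache, each app server needs a routing rule that maps a key to one cache server. Below is a minimal sketch of the sharded variant only, assuming simple hash-modulo routing. The server names and the `shard_for` helper are hypothetical, and a stable hash is used because Python's built-in `hash()` for strings differs between processes.

```python
import hashlib

CACHE_SERVERS = ["cache-0", "cache-1", "cache-2"]  # hypothetical server names


def shard_for(key: str, servers: list[str] = CACHE_SERVERS) -> str:
    """Map a key to one cache server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Every app server computes the same answer for the same key.
target = shard_for("user:42")
```

Hash-modulo routing remaps most keys when the number of servers changes, which is why consistent hashing is a common alternative. With a single global cache the list has one entry and no routing decision remains.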


Section 3: CDN Deep Dive — How Facebook Serves Videos

Let's understand how CDNs work with a real world example.

Upload flow: Madhukar uploads a video

  1. Madhukar sends a video upload request to Facebook's application servers
  2. Facebook's application servers upload the video to a file storage service like S3 and receive back an S3 URL
  3. Facebook's application servers send the S3 URL to a CDN provider like Akamai to cache the video across their edge nodes
  4. The application server stores all the metadata in the database: video_id, s3_url, cdn_url

View flow: Gopi watches the video

  5. Gopi sends a GET request to Facebook's application servers
  6. Facebook's application servers respond with the CDN URL
  7. Gopi's device sends a DNS request to resolve the hostname in the CDN URL to an IP address

How does DNS find the nearest edge node?

Geo DNS
The authoritative DNS servers of the CDN look at the IP address of the user's DNS resolver and return the IP of the nearest edge node. But there is a problem: if a user changes their DNS server to something like 8.8.8.8, the authoritative server sees Google's IP instead of the user's actual location and may return a far away edge node. This makes Geo DNS on its own less reliable, although resolvers that pass along part of the client's address through the EDNS Client Subnet extension reduce the problem.

Anycast
All edge nodes advertise the same IP address. When Gopi's request is sent to that IP, BGP routing decides which edge node receives it. Routers generally prefer the shortest path, measured in networks (autonomous systems) crossed rather than physical distance, so the request usually lands on a nearby edge node. Gopi's DNS settings are irrelevant here.

  8. Gopi's request is routed to the nearest edge node via BGP
  9. The video is served from the edge node

Do we cache on the first request?

No. Many CDNs do not cache an object on the first or second request. This is called 2 hit caching.

Over time CDN providers observed that many requests are one-hit wonders: a file gets requested once and never again. Caching those wastes precious edge node memory. So a threshold of 2 is set: the first two requests are served from the origin without caching, and only when a third request arrives is the content cached and then served from the edge node. The exact threshold varies by provider and configuration.
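
The admission rule can be sketched as a toy edge node. This is an illustration under assumptions, not how any particular CDN implements it: `EdgeCache`, its `threshold` parameter, and the dictionary standing in for the origin server are all hypothetical.

```python
from collections import Counter


class EdgeCache:
    """Toy edge node that caches an object only after it has been
    requested more than `threshold` times."""

    def __init__(self, origin: dict, threshold: int = 2):
        self.origin = origin        # stand-in for the origin server
        self.threshold = threshold
        self.requests = Counter()   # request count per URL (unbounded in this sketch)
        self.store = {}             # objects admitted to the edge cache

    def get(self, url: str):
        if url in self.store:
            return self.store[url], "edge"
        self.requests[url] += 1
        body = self.origin[url]     # fetch from the origin on every miss
        if self.requests[url] > self.threshold:
            self.store[url] = body  # the third request admits the object
        return body, "origin"


edge = EdgeCache({"/video.mp4": b"bytes"})
sources = [edge.get("/video.mp4")[1] for _ in range(4)]
```

With the default threshold, the first three requests are answered by the origin and the fourth by the edge. A real edge node would also bound the memory used by the request counters.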

Can the backend server access the CDN?

Technically yes — but CDNs are client facing. They are designed to serve content to end users, not to backend servers.


Section 4: Cache Eviction

Cache memory is limited. When the cache is full and a new record needs to be inserted, the system must remove something to make space. This is called cache eviction.

There are several eviction policies:

LRU — Least Recently Used
Removes the item that has not been accessed for the longest time. The assumption is that if something hasn't been used recently it is unlikely to be used again soon. LRU is the industry standard and is used by the majority of applications.
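
A minimal LRU sketch, assuming Python's `collections.OrderedDict` as the backing store. The `LRUCache` class and its method names are illustrative and not taken from any specific library.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest access first, newest access last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now the most recently used
lru.put("c", 3)  # evicts "b"
```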

LFU — Least Frequently Used
Removes the item that has been accessed the fewest number of times overall. Useful when popularity is stable over time, because a short burst of accesses to rarely used items does not push out items that are popular in the long run.
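
A minimal LFU sketch for comparison. It assumes a plain access counter per key and a linear scan to find the victim, which is simple but O(n) per eviction. The `LFUCache` class is illustrative only.

```python
class LFUCache:
    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.values = {}
        self.counts = {}  # key -> number of accesses (reads and writes)

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Linear scan for the lowest count; ties go to the earliest inserted key.
            victim = min(self.counts, key=self.counts.get)
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1


lfu = LFUCache(2)
lfu.put("a", 1)
lfu.put("b", 2)
lfu.get("a")
lfu.get("a")
lfu.put("c", 3)  # evicts "b", the least frequently used key
```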

FIFO — First In First Out
Removes the item that was inserted first regardless of how often it was accessed. Simple but often inefficient — a very popular item that was cached early could get evicted even if it is still heavily used.
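
The weakness described above shows up in a short FIFO sketch. It assumes a `deque` that records insertion order, and the `FIFOCache` class is illustrative only.

```python
from collections import deque


class FIFOCache:
    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.values = {}
        self.order = deque()  # keys in insertion order, oldest on the left

    def get(self, key):
        return self.values.get(key)  # reads never change the eviction order

    def put(self, key, value):
        if key not in self.values:
            if len(self.values) >= self.capacity:
                oldest = self.order.popleft()
                del self.values[oldest]
            self.order.append(key)
        self.values[key] = value


fifo = FIFOCache(2)
fifo.put("a", 1)
fifo.put("b", 2)
for _ in range(3):
    fifo.get("a")  # "a" is popular, but FIFO ignores that
fifo.put("c", 3)   # evicts "a" because it was inserted first
```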

Policy | Evicts                    | Best for
LRU    | Least recently accessed   | General purpose, industry standard
LFU    | Least frequently accessed | Data with stable long term access patterns
FIFO   | Oldest inserted item      | Simple systems where recency does not matter


Section 5: Cache Invalidation Patterns

Cache invalidation means the data in the cache is no longer valid and needs to be updated or removed. There are four main patterns to handle this.


TTL — Time To Live (Cache Aside)

TTL provides eventual consistency and is ideal for simple DB queries.

Every cached item has an expiry time. After the TTL expires the data is no longer valid.

  • All write requests go directly to the DB
  • All read requests go to the cache
    • Cache hit → serve from cache
    • Cache miss → fetch from DB, asynchronously write to cache

The system can serve stale data until the TTL expires. This is acceptable for most use cases where eventual consistency is fine.

Write  →  DB only
Read   →  Cache → hit: serve | miss: fetch from DB → populate cache
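
A minimal sketch of this pattern, with assumptions stated up front: the `TTLCacheAside` class is hypothetical, a dictionary stands in for the DB, the cache is populated synchronously rather than asynchronously to keep the example short, and the clock is injectable so the expiry can be demonstrated without sleeping.

```python
import time


class TTLCacheAside:
    def __init__(self, db: dict, ttl_seconds: float, clock=time.monotonic):
        self.db = db              # stand-in for the real database
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be simulated
        self.cache = {}           # key -> (value, expires_at)

    def write(self, key, value):
        self.db[key] = value      # writes go to the DB only

    def read(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]       # cache hit, possibly stale
        value = self.db[key]      # miss or expired: fetch from the DB
        self.cache[key] = (value, self.clock() + self.ttl)
        return value


now = [0.0]                       # fake clock for the demonstration
store = TTLCacheAside({"k": "v1"}, ttl_seconds=10, clock=lambda: now[0])
fresh = store.read("k")           # miss: loaded from the DB
store.write("k", "v2")
stale = store.read("k")           # hit: still the old value
now[0] = 11.0                     # the TTL has passed
updated = store.read("k")         # expired: reloaded from the DB
```

The middle read returns the old value, which is the stale window the TTL bounds.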

Write Around

Write Around is ideal for scenarios where fetching data from the DB is very complex — involving many joins and aggregations.

  • All write requests go directly to the DB
  • All read requests go to the cache
  • A cron job runs periodically, pulls data from the DB and writes it to the cache
  • Until the cron job runs, the cache will serve stale data
  • Before the first cron run there is no data in the cache, so a read that misses returns a 404 to the client
Write  →  DB only
Read   →  Cache → hit: serve | miss: 404
Cron   →  DB → Cache (periodic sync)
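
The flow above can be sketched as follows. The `WriteAroundCache` class is hypothetical, a dictionary stands in for the DB, and `refresh()` is what the cron job would call. A plain copy replaces the expensive joins and aggregations here.

```python
class WriteAroundCache:
    def __init__(self, db: dict):
        self.db = db      # stand-in for the real database
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value             # writes go to the DB only

    def read(self, key):
        if key in self.cache:
            return 200, self.cache[key]  # hit, possibly stale
        return 404, None                 # miss: reads never fall back to the DB

    def refresh(self):
        """Run by the cron job; a plain copy stands in for the expensive query."""
        self.cache = dict(self.db)


reports = WriteAroundCache({})
reports.write("daily", 1)
before_sync = reports.read("daily")  # cron has not run yet
reports.refresh()
after_sync = reports.read("daily")
reports.write("daily", 2)
stale_read = reports.read("daily")   # still the value from the last sync
```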

Write Through

Write Through provides immediate consistency.

Every write request goes to both the cache and the DB atomically. All read requests go to the cache — which is always fresh.

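  • On a single server: atomic writes are achievable using locks, since the cache and the DB sit on the same machine and one lock can guard both writes
  • On a distributed system: atomic writes require 2 Phase Commit, which increases latency and adds rollback complexity

Single server implementation:

import logging
import threading

log = logging.getLogger(__name__)

class CacheError(Exception):
    """Raised when the cache write fails."""

class DBError(Exception):
    """Raised when the DB write fails."""

cache, db = {}, {}        # in-memory stand-ins for the real cache and DB
lock = threading.Lock()
_MISSING = object()       # marks "key was not in the cache before this write"

def write_to_cache(key, value):
    cache[key] = value    # a real cache client would raise CacheError on failure

def write_to_db(key, value):
    db[key] = value       # a real DB client would raise DBError on failure

def write_through(key, value):
    with lock:                                # same effect as acquire() ... finally: release()
        previous = cache.get(key, _MISSING)   # remembered so the cache write can be undone
        try:
            write_to_cache(key, value)        # if this fails, the DB write is never reached
            write_to_db(key, value)
        except CacheError:
            log.error("cache write failed")   # nothing to roll back
            raise
        except DBError:
            # DB write failed: undo the cache write so cache and DB agree again
            if previous is _MISSING:
                cache.pop(key, None)
            else:
                cache[key] = previous
            log.error("db write failed, cache rolled back")
            raise
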
Write  →  Cache + DB atomically
Read   →  Cache always (always fresh)

Write Back

In Write Back all reads and writes go directly to the cache. A background cron job periodically fetches data from the cache and dumps it into the DB.

  • No data consistency — DB is always behind
  • No stale reads — since all reads come from cache which has the latest data
  • Data loss is possible — if the cache crashes before the cron job runs, that window of data is lost
  • Ideal for use cases where intermediate data points are not critical and some data loss is acceptable
Write  →  Cache only
Read   →  Cache only (always fresh)
Cron   →  Cache → DB (periodic dump)
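
A minimal sketch of Write Back, including the data loss window. The `WriteBackCache` class is hypothetical and a dictionary stands in for the DB. Tracking a `dirty` set of changed keys is an assumption of this sketch, since the description above only says the cron job dumps cache data into the DB.

```python
class WriteBackCache:
    def __init__(self, db: dict):
        self.db = db          # stand-in for the real database
        self.cache = {}
        self.dirty = set()    # keys changed since the last flush

    def write(self, key, value):
        self.cache[key] = value   # writes touch the cache only
        self.dirty.add(key)

    def read(self, key):
        return self.cache.get(key)  # reads come from the cache only

    def flush(self):
        """Run by the cron job: copy changed entries into the DB."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def crash(self):
        """Simulate losing the cache before the next flush."""
        self.cache.clear()
        self.dirty.clear()


db = {}
counters = WriteBackCache(db)
counters.write("views", 10)
db_before_flush = dict(db)  # the DB is behind the cache
counters.flush()
counters.write("views", 25)
counters.crash()            # the update to 25 never reached the DB
```

After the crash the DB still holds 10, which is the lost window described above.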



Conclusion

Caching is not a single technique — it is a layered strategy that operates at every level of a modern application, from the browser to the CDN to the backend database layer.

The choice of cache pattern is always a tradeoff:

  • If your system can tolerate stale data — TTL or Write Around keeps things simple and scalable
  • If your system requires immediate consistency — Write Through gives you that guarantee but at the cost of latency and distributed complexity
  • If your system is write-heavy and can tolerate data loss — Write Back gives you the highest write throughput

The next time your app loads a video in milliseconds, a feed in under a second, or a product page instantly — caching is quietly doing its job at every layer of the stack.


Credits

This blog is based on the HLD curriculum taught at Scaler Academy by instructor Pragy Agarwal.
