Eyal Gilstron, Senior Manager
Caching Approaches

Intro

Caching means storing the result of an operation so that future requests are served faster. Anything that is slow to retrieve can benefit: store the data once, then reuse the already-computed result instead of fetching or recomputing it.
In this article I will briefly cover the major types of caching, with a focus on server-side caching.

When do we need to cache?

• When a computation runs multiple times.
• When a computation is slow.
• When your hosting provider charges per database access, so caching saves money.
• When we can predict that the same input will produce the same output, so we can reuse it instead of recomputing every time.
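That last case, reusing the same output for the same input, is exactly what memoization gives you. A minimal sketch in Python (the `slow_square` function and the sleep are stand-ins I made up for an expensive computation):

```python
import functools
import time

call_count = 0

@functools.lru_cache(maxsize=None)
def slow_square(n):
    """Pretend this is an expensive computation or database read."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulate slow work
    return n * n

slow_square(4)  # first call: computed
slow_square(4)  # second call: served from the cache, no recomputation
print(call_count)  # → 1 (the expensive body ran only once)
```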

Major types of caching (in a nutshell)

Server Cache

Stores queries, content, and code on one or more servers. It's controlled by the server, not by the client.

CDN Caching - A CDN (Content Delivery Network) is a cluster of servers that caches content and serves it from the server closest to the end user, for faster loading times.

Object Caching: storing database query results in the back end for quick retrieval on page loads.

Client Cache

I want to focus here on Web Caching: Site, Browser, Proxy and Gateway.

A site cache, also known as a page cache, is a mechanism that temporarily stores data such as web pages or media content when a web page is loaded for the first time.

A browser cache is a type of site cache. It works the same way, but the cache mechanism is built into the browser, using stores such as Local Storage, Session Storage and cookies.

Browser caching is controlled at the individual user level; proxy and gateway caching operate on a much larger scale.

Distributed Cache

Used by large web sites such as YouTube, Google web apps, Facebook and more. A distributed cache:

• Keeps cache memory synchronized across the distributed servers.
• Lets web servers pull data from, and store data in, the distributed servers' memory.
• Lets each web server simply serve pages and avoid issues such as 'out of memory' errors, since the cache is made up of a cluster of cheaper machines that only serve up memory.

Common in-memory data stores

Redis and Memcached are the most common open-source in-memory data stores.
Memcached is a high-performance distributed memory caching service, and Redis is an open-source key-value store. Like Memcached, Redis keeps most of its data in memory.

You should read this blog before you pick one.

Let's now focus on server-side caching.


A very naive and simple example of caching, to understand cache hit/miss:

```csharp
// Using a Dictionary<string, string> as a simple in-memory cache
var request = "someRequest";
string result;
if (myCache.ContainsKey(request))
{
    result = myCache[request]; // Cache hit - faster
}
else
{
    var newVal = dbRead();     // Cache miss - slower
    myCache.Add(request, newVal);
    result = newVal;
}
return result;
```

Caching Approaches

  1. No caching – perform a database read every time we need to run the query.

  2. Naive caching – performs a database read only on cache misses. Be careful, as this method can introduce bugs, for example when the cached front page becomes out of date.

  3. Clear cache – a database read is needed only on cache misses, and the cache is cleared whenever the underlying data changes. No bugs expected (if it's done right).

  4. Refresh cache – rarely performs database reads on page views. The cache is empty only at application initialization, so the first page view reads from the database and the following page views are cached; each submission triggers one database read to refresh the cache.
    With this approach a page view rarely, if ever, hits the database, and that's a nice thing to have.

  5. Update cache – the fastest option: zero database reads, ever, because writes update the cache directly. This is slightly better than the fourth one.

I created a snippet to emphasize the fifth approach.


The trade-off, then, is faster reads with more complex inserts, versus simpler inserts with slower database reads.
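A minimal sketch of the fifth approach (update cache) in Python: every write updates both the database and the cache in the same operation, so reads never touch the database. The `fake_db` dict is my own stand-in for a real database.

```python
fake_db = {}   # stands in for the real database
cache = {}     # in-memory cache, kept in sync on every write

def save(key, value):
    fake_db[key] = value   # write to the database...
    cache[key] = value     # ...and update the cache in the same operation

def load(key):
    # Reads are served from the cache alone: zero database reads.
    return cache[key]

save("front_page", "<html>...</html>")
print(load("front_page"))  # served from the cache, no database read
```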

Caching Models:

Cache Aside


Pros: cache only what we need.
Cons: cache misses are expensive and implementation is complex.
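In cache-aside, the application itself checks the cache and populates it on a miss. A minimal sketch, assuming a dict-backed cache and a hypothetical `db_read` function of my own invention:

```python
cache = {}
db_reads = 0

def db_read(key):
    global db_reads
    db_reads += 1
    return f"value-for-{key}"  # stand-in for a real database query

def get(key):
    if key in cache:          # cache hit: the application found it itself
        return cache[key]
    value = db_read(key)      # cache miss: go to the database...
    cache[key] = value        # ...and put the result into the cache
    return value

get("user:1")  # miss: hits the database
get("user:1")  # hit: served from the cache
print(db_reads)  # → 1
```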

Read Through


Pros: cache only what we need, transparent to the application.
Cons: cache misses are expensive, and the application's reliability now depends on the cache layer.
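In read-through, the application always asks the cache, and the cache itself loads from the database on a miss. A sketch with a hypothetical loader function wired into a small cache class (both names are mine, for illustration):

```python
class ReadThroughCache:
    """The cache component owns the database access, so it is
    transparent to the application code that calls get()."""
    def __init__(self, loader):
        self._loader = loader   # called on a miss to fetch from the DB
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # cache loads the data itself
        return self._store[key]

def load_from_db(key):          # hypothetical database loader
    return f"row-{key}"

cache = ReadThroughCache(load_from_db)
print(cache.get("42"))  # first call loads through to the "database"
print(cache.get("42"))  # second call is a pure cache hit
```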

Write Through

Data is written to the cache and to the real DB at the same time; I/O completion is confirmed only once the data has been written to both places.

Advantage: fast retrieval while making sure the data is in the real DB and is not lost in case the cache is disrupted.

Disadvantage: writes experience higher latency, since every write goes to two places.

When should I use it?

Good for apps that write and then frequently reread data. Writes are slightly slower, but reads are fast: spend a bit longer writing once, then benefit from frequent low-latency reads.


Pros: data is always up to date.
Cons: writes are expensive, redundant data.
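A write-through sketch: the write call returns only after both the cache and the (stand-in) database hold the new value, so a cache failure cannot lose acknowledged data.

```python
fake_db = {}
cache = {}

def write_through(key, value):
    cache[key] = value     # write to the cache...
    fake_db[key] = value   # ...and to the database
    return True            # completion confirmed only after both writes

def read(key):
    return cache.get(key)  # fast read; the data is also safe in fake_db

write_through("order:7", "shipped")
print(read("order:7"), fake_db["order:7"])  # both copies agree
```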

Write Behind

Data is written to the cache, and I/O completion is confirmed at that point. The data is then typically also written to the real DB in the background, but completion is not blocked on that write.

Advantage: Low latency and high throughput for write-intensive applications.

Disadvantage: a risk to data availability, because the cache could fail before the data is persisted to the real DB, in which case the data is lost.

When should I use it?

To achieve the best performance for mixed workloads, where read and write I/O have similar response times. In practice, you can add resiliency by duplicating writes to reduce the likelihood of data loss.


Pros: no write penalty, reduced load on storage.
Cons: reliability, lack of consistency.
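A write-behind sketch: the write is confirmed as soon as the cache has it, and a background-style flush persists to the stand-in database later. Real implementations use a worker thread or queue consumer; this sketch compresses that into an explicit `flush()` for clarity.

```python
import queue

fake_db = {}
cache = {}
pending = queue.Queue()     # writes waiting to be persisted

def write_behind(key, value):
    cache[key] = value      # confirmed immediately: low write latency
    pending.put((key, value))
    return True             # the database write has NOT happened yet

def flush():
    # In a real system this runs in a background worker; if the cache
    # dies before flush, the pending writes are lost (the data-loss risk).
    while not pending.empty():
        key, value = pending.get()
        fake_db[key] = value

write_behind("k", "v")
print("k" in fake_db)  # → False (not persisted yet)
flush()
print(fake_db["k"])    # → v
```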

Write Around

Data is written only to the real DB without writing to the cache.
I/O completion is confirmed as soon as the data is written to the real DB.

Advantage: Good for not flooding the cache with data that may not subsequently be reread.

Disadvantage: Reading recently written data will result in a cache miss (hence a higher latency) because the data can only be read from the slower real DB.

When should I use it?

For applications that don't frequently re-read recently written data. This gives lower write latency but higher read latency, which is an acceptable trade-off for these scenarios.
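A write-around sketch: writes skip the cache entirely, so the first read after a write misses and falls back to the stand-in database, after which the value can be cached.

```python
fake_db = {}
cache = {}
cache_misses = 0

def write_around(key, value):
    fake_db[key] = value       # write only to the database; cache untouched

def read(key):
    global cache_misses
    if key in cache:
        return cache[key]
    cache_misses += 1          # recently written data always misses first
    cache[key] = fake_db[key]  # populate the cache on the way out
    return cache[key]

write_around("log:1", "entry")
read("log:1")  # miss: recently written data is not in the cache
read("log:1")  # hit
print(cache_misses)  # → 1
```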

I hope these help you folks (:
