you got memcached!!!

Before getting into the article, you should know what caching is...

A cache is a high-speed data storage layer that stores a subset of data, so that future requests for that data are served faster than is possible by accessing the data's primary storage location.

So caching allows you to effectively reuse previously retrieved or computed data.
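For instance, a typical "cache-aside" pattern looks like this. A minimal sketch, using a plain dict as the cache and a hypothetical fetch_from_db function standing in for any expensive query or computation:

cache = {}

def fetch_from_db(key):
    # hypothetical stand-in for an expensive query against the primary store
    return f"value-for-{key}"

def get_with_cache(key):
    if key in cache:            # cache hit: served straight from memory
        return cache[key]
    value = fetch_from_db(key)  # cache miss: take the slow path
    cache[key] = value          # store it so future requests are fast
    return value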

Memcached

Memcached is a high performance multithreaded event-based key/value store intended to be used in a distributed system.

This is how GitHub defines Memcached. Let's start by understanding the definition and then look at how each point is achieved.

The aim of this article is to understand how memcached works, and in particular the system design behind a distributed caching system.

High Performance

It is expected of a caching service like memcached to be fast. Almost all memcached operations are O(1); we will see why. Memcached can serve requests in less than a millisecond.
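The reason is that memcached stores items in a hash table: the key is hashed straight to a bucket, so lookup cost does not grow with the number of items. Here is a simplified sketch of the idea (not memcached's actual implementation):

NUM_BUCKETS = 1024
buckets = [[] for _ in range(NUM_BUCKETS)]  # each bucket holds (key, value) pairs

def ht_set(key, value):
    bucket = buckets[hash(key) % NUM_BUCKETS]  # O(1) jump to the right bucket
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)  # overwrite an existing entry
            return
    bucket.append((key, value))

def ht_get(key):
    bucket = buckets[hash(key) % NUM_BUCKETS]
    for k, v in bucket:
        if k == key:
            return v
    return None  # cache miss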

Multi-threaded

Memcached is thread-safe, meaning multiple threads can safely access a single instance of memcached without any problem. Memcached uses a multithreaded architecture to handle incoming requests: when a new request comes in, a worker thread from the thread pool handles it. All threads in the pool share the same hash table, which contains the cached data.

Here is an example of using memcached from a multithreaded Python program (with the python-memcached client library):

import threading
import memcache  # the python-memcached client library

# a single client shared by all threads, pointing at a local memcached instance
mem_client = memcache.Client(['127.0.0.1:11211'])
mem_client.set('key', 'hello from memcached')

def worker_thread():
    value = mem_client.get('key')

    print(f"value for 'key' is: {value}")


if __name__ == '__main__':
    threads = []
    for i in range(10):
        t = threading.Thread(target=worker_thread)
        threads.append(t)
        t.start()

    # wait for all threads to finish
    for t in threads:
        t.join()

In this example, we create a memcache.Client that connects to the instance running on 127.0.0.1:11211. The worker_thread function retrieves a value from the cache using the get method. In main, multiple threads are started, and each thread executes worker_thread concurrently. Because memcached is thread-safe, concurrent access to the cache is not a problem. To achieve this, memcached uses "lock striping": when a thread wants to access a particular item in the hash table, it first computes the hash code of the key and then locks the appropriate partition of the table. This ensures that each partition can only be modified or accessed by one thread at a time, while threads working on different partitions proceed in parallel.
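A minimal sketch of lock striping (the stripe count and dict-based partitions are illustrative, not memcached's actual internals):

import threading

NUM_STRIPES = 16
stripe_locks = [threading.Lock() for _ in range(NUM_STRIPES)]
partitions = [{} for _ in range(NUM_STRIPES)]

def striped_set(key, value):
    stripe = hash(key) % NUM_STRIPES  # pick the partition from the key's hash
    with stripe_locks[stripe]:        # lock only that partition
        partitions[stripe][key] = value

def striped_get(key):
    stripe = hash(key) % NUM_STRIPES
    with stripe_locks[stripe]:
        return partitions[stripe].get(key)

Two threads calling striped_set on keys that hash to different stripes never contend for the same lock, which is what makes the shared hash table scale across worker threads.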

Key-value store

Memcached is an in-memory key-value storage system. An item consists of a key and a value. A key is a string of up to 250 characters; a value can be of any type, up to 1 MB by default (configurable). Memcached describes itself as a simple caching service, so configuration limits like these are expected of it. Keys have an expiration time (TTL, Time To Live), but LRU eviction can kick in first: if an item is not being used, it may be evicted even before its TTL expires. Memcached also describes itself as non-persistent, so while it will help you avoid expensive queries, you should know that these values will not always be there, i.e., they are not persistent.
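For example, with the same python-memcached client as above, the expiration time is passed to set in seconds; once it elapses, the key is simply gone:

import time
import memcache

mem_client = memcache.Client(['127.0.0.1:11211'])

# store an item with a 2-second TTL
mem_client.set('session:42', {'user': 'alice'}, time=2)

print(mem_client.get('session:42'))  # {'user': 'alice'}
time.sleep(3)
print(mem_client.get('session:42'))  # None: the TTL has expired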

How does Memcached manage memory?

Generally, allocating items at arbitrary locations in memory results in memory fragmentation. To avoid this, memcached uses slab allocation for memory management. Memory is allocated in pages of 1 MB each by default (configurable). Each page is divided into chunks of equal size, and all pages with the same chunk size collectively form a slab class.

[Image: memory pages divided into fixed-size chunks, grouped into slab classes]

As you can see in the image above, pages consist of chunks of a fixed size. Items are stored in chunks, and since each slab class has a fixed chunk size, this avoids memory fragmentation.
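To make the mechanism concrete, here is a sketch of how slab class chunk sizes could be derived. The starting size and growth factor below mirror memcached's configurable defaults (the -n and -f options, with a growth factor of 1.25), but the exact numbers are illustrative:

GROWTH_FACTOR = 1.25
PAGE_SIZE = 1024 * 1024  # pages are 1 MB by default

def chunk_sizes(start=80, factor=GROWTH_FACTOR, limit=PAGE_SIZE):
    # each slab class has chunks ~1.25x larger than the previous class
    sizes = []
    size = start
    while size <= limit:
        sizes.append(size)
        size = int(size * factor)
    return sizes

def pick_slab_class(item_size, sizes):
    # an item goes into the smallest slab class whose chunks can hold it
    for s in sizes:
        if item_size <= s:
            return s
    return None  # too large to cache

sizes = chunk_sizes()
print(sizes[:5])                    # [80, 100, 125, 156, 195]
print(pick_slab_class(110, sizes))  # a 110-byte item lands in the 125-byte class

The trade-off is some wasted space inside each chunk (a 110-byte item occupies a 125-byte chunk), in exchange for allocation that never fragments the page.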
