Nithin Bharadwaj

Build a Memcached-Compatible Cache Server in Go: Binary Protocol, LRU Eviction, and Concurrent Design

Imagine you run a busy library. Every time someone asks for a popular book, a librarian has to run down to the basement archives, find it on a giant shelf, and bring it back. This takes time, and the librarians get tired. Now, picture a small, smart desk right at the entrance. We put the top 100 most-requested books on it. When someone asks for one, the front desk hands it over instantly. The basement is only bothered for books no one has asked for lately. That front desk is a cache. It makes everything faster by keeping copies of frequently used data in a place that's quick to reach.

Building such a system as a network service is a fantastic way to understand performance, memory, and concurrency. I want to walk you through creating a server in Go that speaks the same language as Memcached, a widely-used caching system. This means existing applications can use our server without changing a line of their code.

The goal is simple: store key-value pairs in the server's memory and retrieve them at lightning speed. We must handle many clients at once, manage limited memory wisely, and understand a specific way of talking over the network called the binary protocol.

Let's look at the foundation. Our server needs to listen for connections, manage the stored data, and track its own performance.

```go
type MemcachedServer struct {
    listener   net.Listener
    cache      *CacheStore
    stats      ServerStats
    workers    int
    casCounter uint64 // monotonic source of CAS values for stored items
}
```

The CacheStore is the heart. It's a protected map that holds our data. We guard it with a read-write mutex: a lock that gives writers exclusive access while letting many readers in at the same time. We also track how much memory we're using.

```go
type CacheStore struct {
    mu        sync.RWMutex
    items     map[string]*CacheItem
    lruList   *LRUList
    maxBytes  int64
    usedBytes int64
}
```

Each item we store is more than just a value. It needs a key to find it, the actual data, some flags, an expiration time, and a unique identifier called a CAS (Compare-And-Swap) value.

```go
type CacheItem struct {
    key     string
    value   []byte
    flags   uint32
    expTime int64
    cas     uint64
    next    *CacheItem
    prev    *CacheItem
}
```

Notice the next and prev? Those are for our LRU list. LRU stands for "Least Recently Used." It's our strategy for when the desk gets full. If we need to make space for a new book, we remove the one that hasn't been asked for the longest time. A doubly-linked list helps us track this order efficiently.
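
The LRUList type itself never appears above, so here's a minimal sketch of the shape the rest of the code assumes: a head (most recently used), a tail (the eviction candidate), a size, and the two helpers the later snippets need.

```go
// A minimal doubly-linked list for LRU ordering. head is the most
// recently used item; tail is the next eviction candidate.
type LRUList struct {
    head *CacheItem
    tail *CacheItem
    size int
}

// PushFront links a brand-new item in at the head.
func (ll *LRUList) PushFront(item *CacheItem) {
    item.prev = nil
    item.next = ll.head
    if ll.head != nil {
        ll.head.prev = item
    }
    ll.head = item
    if ll.tail == nil {
        ll.tail = item
    }
    ll.size++
}

// Remove unlinks an item from wherever it sits in the list.
func (ll *LRUList) Remove(item *CacheItem) {
    if item.prev != nil {
        item.prev.next = item.next
    } else {
        ll.head = item.next
    }
    if item.next != nil {
        item.next.prev = item.prev
    } else {
        ll.tail = item.prev
    }
    item.prev, item.next = nil, nil
    ll.size--
}
```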

Starting the server involves setting up a network listener and preparing our cache.

```go
func NewMemcachedServer(addr string, maxMemory int64, workers int) (*MemcachedServer, error) {
    listener, err := net.Listen("tcp", addr)
    if err != nil {
        return nil, err
    }

    return &MemcachedServer{
        listener: listener,
        cache: &CacheStore{
            items:    make(map[string]*CacheItem),
            lruList:  &LRUList{},
            maxBytes: maxMemory,
        },
        workers: workers,
    }, nil
}
```

When the server starts, it launches several worker goroutines. These are like extra receptionists. They all wait for new clients to walk in.

```go
func (ms *MemcachedServer) Start() error {
    log.Printf("Server started on %s", ms.listener.Addr())
    var wg sync.WaitGroup
    for i := 0; i < ms.workers; i++ {
        wg.Add(1)
        go ms.acceptConnections(&wg)
    }
    wg.Wait()
    return nil
}
```
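
acceptConnections is referenced above but never defined. A plausible version loops on Accept, counts the connection, and hands each client to its own goroutine; a sketch under those assumptions:

```go
// acceptConnections loops forever accepting clients and spawning a
// goroutine per connection. wg.Done is deferred so Start can unblock
// once the listener is closed.
func (ms *MemcachedServer) acceptConnections(wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        conn, err := ms.listener.Accept()
        if err != nil {
            return // listener closed; let this worker exit
        }
        atomic.AddUint64(&ms.stats.connections, 1)
        go ms.handleConnection(conn)
    }
}
```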

Each connection from a client is handled in its own goroutine, so one slow client doesn't block everyone else. It's like giving each visitor their own librarian. Because TCP is a byte stream, a single read may return only part of a packet, so we read the fixed header in full first and then exactly the body it announces.

```go
func (ms *MemcachedServer) handleConnection(conn net.Conn) {
    defer conn.Close()
    header := make([]byte, 24)
    for {
        // Read the fixed 24-byte header in full; a lone conn.Read
        // could hand us a partial packet.
        if _, err := io.ReadFull(conn, header); err != nil {
            break
        }
        bodyLen := binary.BigEndian.Uint32(header[8:12])
        body := make([]byte, bodyLen)
        if _, err := io.ReadFull(conn, body); err != nil {
            break
        }
        ms.processRequest(conn, append(header, body...))
    }
}
```

Now, how do we talk? Memcached's binary protocol is a strict format. Every client request is a packet with a fixed 24-byte header. You have to read the header to know what the client wants to do.

The header tells us the operation (get, set, delete), the length of the key, and the length of any extra data. We parse it like this:

```go
func (ms *MemcachedServer) processRequest(conn net.Conn, data []byte) {
    atomic.AddUint64(&ms.stats.commands, 1)
    magic := data[0]
    opcode := data[1]
    keyLen := binary.BigEndian.Uint16(data[2:4])
    extrasLen := data[4]
    keyStart := 24 + int(extrasLen)
    key := string(data[keyStart : keyStart+int(keyLen)])
    // ... dispatch on opcode to handleGet, handleSet, handleDelete, and so on
}
```

binary.BigEndian.Uint16 reads two bytes in "Big Endian" order, which is the network standard. The key is then pulled out from its specific position in the data slice.
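
For reference, here is where each field of the 24-byte header lives (the constant names are mine; the layout is the protocol's):

```go
// Byte offsets within the 24-byte binary-protocol header.
const (
    offMagic     = 0  // 0x80 for requests, 0x81 for responses
    offOpcode    = 1  // e.g. 0x00 GET, 0x01 SET, 0x04 DELETE, 0x10 STAT
    offKeyLen    = 2  // uint16, big endian
    offExtrasLen = 4  // uint8
    offDataType  = 5  // currently always 0x00
    offStatus    = 6  // uint16: vbucket id in requests, status in responses
    offBodyLen   = 8  // uint32: extras + key + value
    offOpaque    = 12 // uint32 echoed back to the client
    offCAS       = 16 // uint64 version for compare-and-swap
)
```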

Let's handle a GET command. The client sends a key. We look it up in our cache.

```go
func (ms *MemcachedServer) handleGet(conn net.Conn, key string, magic byte) {
    item := ms.cache.Get(key)
    if item == nil {
        atomic.AddUint64(&ms.stats.misses, 1)
        ms.sendNotFound(conn, magic, key)
        return
    }
    atomic.AddUint64(&ms.stats.hits, 1)
    ms.sendValue(conn, magic, item)
}
```
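
sendNotFound, used above, is just a bare response header with the status field set to "key not found". A sketch matching the call site:

```go
// sendNotFound replies with no body and status 0x0001 ("key not found").
// The key parameter is unused here but kept to match the call site.
func (ms *MemcachedServer) sendNotFound(conn net.Conn, magic byte, key string) {
    header := make([]byte, 24)
    header[0] = 0x81 // response magic
    header[1] = 0x00 // GET opcode
    binary.BigEndian.PutUint16(header[6:8], 0x0001) // status: key not found
    conn.Write(header)
}
```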

The cache.Get method does the real work. It takes a read lock for the lookup, which lets many GET operations run simultaneously. If the item exists and isn't expired, we briefly take the write lock to move it to the front of the LRU list (the move rewires shared pointers, so a read lock isn't enough) before returning it.

```go
func (cs *CacheStore) Get(key string) *CacheItem {
    cs.mu.RLock()
    item, exists := cs.items[key]
    cs.mu.RUnlock()

    if !exists {
        return nil
    }
    if item.expTime > 0 && time.Now().Unix() > item.expTime {
        cs.Delete(key)
        return nil
    }
    // Moving the node mutates list pointers, so this step needs the write lock.
    cs.mu.Lock()
    cs.lruList.MoveToFront(item)
    cs.mu.Unlock()
    return item
}
```
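
Get relies on a Delete method that never appears in the article. A minimal version under the same locking scheme, reusing the Remove helper sketched earlier:

```go
// Delete removes an item under the write lock and reclaims its bytes,
// using the same 64-byte overhead estimate as Set below.
func (cs *CacheStore) Delete(key string) {
    cs.mu.Lock()
    defer cs.mu.Unlock()
    item, exists := cs.items[key]
    if !exists {
        return
    }
    cs.lruList.Remove(item)
    delete(cs.items, key)
    cs.usedBytes -= int64(len(item.key) + len(item.value) + 64)
}
```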

A SET command is more involved. The client sends the key, value, flags, and an expiration time. We need to create a new CacheItem and find a place for it in memory.

```go
func (ms *MemcachedServer) handleSet(conn net.Conn, data []byte, key string, magic byte, extrasLen byte, keyLen uint16) {
    extrasStart := 24
    flags := binary.BigEndian.Uint32(data[extrasStart : extrasStart+4])
    expTime := binary.BigEndian.Uint32(data[extrasStart+4 : extrasStart+8])

    bodyLen := binary.BigEndian.Uint32(data[8:12])
    valueStart := extrasStart + int(extrasLen) + int(keyLen)
    valueLen := int(bodyLen) - int(extrasLen) - int(keyLen)
    value := data[valueStart : valueStart+valueLen]

    // An expiration of zero means "never expire"; only convert nonzero
    // values into an absolute Unix timestamp.
    var exp int64
    if expTime > 0 {
        exp = time.Now().Add(time.Duration(expTime) * time.Second).Unix()
    }

    item := &CacheItem{
        key:     key,
        value:   value,
        flags:   flags,
        expTime: exp,
        cas:     atomic.AddUint64(&ms.casCounter, 1),
    }
    if ms.cache.Set(item) {
        ms.sendStored(conn, magic, key)
    } else {
        ms.sendError(conn, magic, "SERVER_ERROR out of memory")
    }
}
```

The cache.Set method is where memory management happens. We take a full, exclusive lock because we're changing the state. First, we check if adding this new item would push us over our memory limit.

```go
func (cs *CacheStore) Set(item *CacheItem) bool {
    cs.mu.Lock()
    defer cs.mu.Unlock()

    // Rough cost of an entry: key + value plus ~64 bytes of struct
    // and map overhead.
    itemSize := int64(len(item.key) + len(item.value) + 64)
    for cs.usedBytes+itemSize > cs.maxBytes && cs.lruList.size > 0 {
        cs.evictOldest()
    }
    if cs.usedBytes+itemSize > cs.maxBytes {
        return false
    }
    // ... store the item: insert into the map, push to the LRU front,
    // and add itemSize to usedBytes
}
```

If we're out of space, we enter a loop: evictOldest() removes the item at the tail of the LRU list (the least recently used one) until we have enough room or the cache is empty. This is automatic, silent cleanup.
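evictOldest isn't shown in the article either. Here's a sketch consistent with the fields above; the evictions counter is an assumed extra uint64 field on CacheStore, and wiring it into the server-wide stat is left out.

```go
// evictOldest unlinks the tail of the LRU list and drops it from the map.
// The caller must already hold cs.mu.
func (cs *CacheStore) evictOldest() {
    victim := cs.lruList.tail
    if victim == nil {
        return
    }
    cs.lruList.Remove(victim)
    delete(cs.items, victim.key)
    cs.usedBytes -= int64(len(victim.key) + len(victim.value) + 64)
    atomic.AddUint64(&cs.evictions, 1) // assumed counter field, see above
}
```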

The LRU list operations are classic pointer manipulation. To move an item to the front when it's accessed, we adjust its neighbors' pointers and then place it at the head.

```go
func (ll *LRUList) MoveToFront(item *CacheItem) {
    if ll.head == item {
        return
    }
    // Remove from current position
    if item.prev != nil {
        item.prev.next = item.next
    }
    if item.next != nil {
        item.next.prev = item.prev
    }
    if ll.tail == item {
        ll.tail = item.prev
    }
    // Add to front
    item.next = ll.head
    item.prev = nil
    if ll.head != nil {
        ll.head.prev = item
    }
    ll.head = item
}
```

Sending a response back to the client also follows the binary protocol format. For a successful GET we send a header, the flags (as "extras"), and the value. A plain GET response omits the key; only the GETK variant echoes it back.

```go
func (ms *MemcachedServer) sendValue(conn net.Conn, magic byte, item *CacheItem) {
    header := make([]byte, 24)
    header[0] = 0x81 // responses always use magic 0x81, whatever the request carried
    header[1] = 0x00 // GET opcode
    header[4] = 4    // 4 bytes of extras (the flags)
    // No key in a plain GET response, so the total body is extras + value.
    binary.BigEndian.PutUint32(header[8:12], uint32(4+len(item.value)))
    binary.BigEndian.PutUint64(header[16:24], item.cas)

    extras := make([]byte, 4)
    binary.BigEndian.PutUint32(extras, item.flags)

    conn.Write(header)
    conn.Write(extras)
    conn.Write(item.value)
}
```

We also want to know how our server is performing. The stats command lets a client query us. We track counts atomically so we can read them safely from any goroutine.

```go
type ServerStats struct {
    connections uint64
    commands    uint64
    gets        uint64
    sets        uint64
    deletes     uint64
    hits        uint64
    misses      uint64
    evictions   uint64
}
// Increment with: atomic.AddUint64(&ms.stats.hits, 1)
```
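
GetStats, which handleStats and main rely on, isn't shown; presumably it snapshots each counter with an atomic load so readers never see torn values. A sketch:

```go
// GetStats returns a consistent-enough copy of the counters, reading
// each one atomically.
func (ms *MemcachedServer) GetStats() ServerStats {
    return ServerStats{
        connections: atomic.LoadUint64(&ms.stats.connections),
        commands:    atomic.LoadUint64(&ms.stats.commands),
        gets:        atomic.LoadUint64(&ms.stats.gets),
        sets:        atomic.LoadUint64(&ms.stats.sets),
        deletes:     atomic.LoadUint64(&ms.stats.deletes),
        hits:        atomic.LoadUint64(&ms.stats.hits),
        misses:      atomic.LoadUint64(&ms.stats.misses),
        evictions:   atomic.LoadUint64(&ms.stats.evictions),
    }
}
```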

When a stats request comes in, we format these numbers into key-value pairs the client expects.

```go
func (ms *MemcachedServer) handleStats(conn net.Conn, magic byte) {
    stats := ms.GetStats()
    responses := []struct{ key, value string }{
        {"curr_connections", fmt.Sprintf("%d", stats.connections)},
        {"cmd_get", fmt.Sprintf("%d", stats.gets)},
        {"cmd_set", fmt.Sprintf("%d", stats.sets)},
        {"get_hits", fmt.Sprintf("%d", stats.hits)},
        {"get_misses", fmt.Sprintf("%d", stats.misses)},
        {"evictions", fmt.Sprintf("%d", stats.evictions)},
    }
    for _, stat := range responses {
        ms.sendStat(conn, magic, stat.key, stat.value)
    }
    ms.sendStatEnd(conn, magic)
}
```
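
The two helpers here aren't shown in the article. In the binary protocol, each stat travels as its own STAT (0x10) response with the name as the key and the number as the value, and a packet with an empty key and value marks the end. A sketch under the naming above:

```go
// sendStat writes one statistic as a binary STAT response.
func (ms *MemcachedServer) sendStat(conn net.Conn, magic byte, key, value string) {
    header := make([]byte, 24)
    header[0] = 0x81 // response magic
    header[1] = 0x10 // STAT opcode
    binary.BigEndian.PutUint16(header[2:4], uint16(len(key)))
    binary.BigEndian.PutUint32(header[8:12], uint32(len(key)+len(value)))
    conn.Write(header)
    conn.Write([]byte(key))
    conn.Write([]byte(value))
}

// sendStatEnd terminates the stats stream: same opcode, empty key and value.
func (ms *MemcachedServer) sendStatEnd(conn net.Conn, magic byte) {
    header := make([]byte, 24)
    header[0] = 0x81
    header[1] = 0x10
    conn.Write(header)
}
```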

Finally, our main function brings it all together. We create the server with a 100MB limit and 4 worker goroutines. We might also start a background ticker to print a simple performance log.

```go
func main() {
    server, err := NewMemcachedServer(":11211", 100*1024*1024, 4)
    if err != nil {
        log.Fatal(err)
    }
    go func() {
        ticker := time.NewTicker(5 * time.Second)
        defer ticker.Stop()
        for range ticker.C {
            stats := server.GetStats()
            hitRate := 0.0
            if stats.hits+stats.misses > 0 {
                hitRate = float64(stats.hits) / float64(stats.hits+stats.misses) * 100
            }
            memUsage := float64(server.cache.UsedBytes()) / float64(server.cache.maxBytes) * 100
            fmt.Printf("Conns: %d | Gets: %d (%.1f%% hit) | Mem: %.1f%% used\n",
                stats.connections, stats.gets, hitRate, memUsage)
        }
    }()
    log.Fatal(server.Start())
}
```
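
UsedBytes, called by the ticker above, is a tiny accessor the article assumes; under the read lock it would look like:

```go
// UsedBytes returns the current memory accounting under the read lock.
func (cs *CacheStore) UsedBytes() int64 {
    cs.mu.RLock()
    defer cs.mu.RUnlock()
    return cs.usedBytes
}
```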

Running this server provides a real, usable cache. You can point a web application's Memcached client at localhost:11211, and it will store and retrieve data without knowing it isn't the official software, as long as the client speaks the binary protocol (many libraries default to the older text protocol, so check your client's configuration). The LRU eviction keeps the active dataset in memory. The concurrent design handles many requests. The binary protocol parsing is efficient.
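
For a dependency-free sanity check, you can hand-assemble a request yourself. This sketch sends a binary GET for a key and prints the response status, assuming the server above is running locally:

```go
package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "log"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:11211")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // Build a binary GET request: 24-byte header, then the key as the body.
    key := "hello"
    req := make([]byte, 24+len(key))
    req[0] = 0x80 // request magic
    req[1] = 0x00 // GET opcode
    binary.BigEndian.PutUint16(req[2:4], uint16(len(key)))
    binary.BigEndian.PutUint32(req[8:12], uint32(len(key))) // body = key only
    copy(req[24:], key)
    conn.Write(req)

    // Read the response header, then however much body it announces.
    resp := make([]byte, 24)
    if _, err := io.ReadFull(conn, resp); err != nil {
        log.Fatal(err)
    }
    status := binary.BigEndian.Uint16(resp[6:8])
    bodyLen := binary.BigEndian.Uint32(resp[8:12])
    body := make([]byte, bodyLen)
    io.ReadFull(conn, body)
    fmt.Printf("status=%d bodyLen=%d\n", status, bodyLen)
}
```

A status of 0 means a hit; 1 means the key wasn't found.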

Building this teaches you about network programming, protocol design, concurrent data structures, and system resource management. You see how a simple map isn't enough for a production cache; you need expiration, eviction, and observability. It's a project that starts simple but introduces the real complexities of building a reliable system component. You end up with a deep, practical understanding of how a fundamental piece of infrastructure works.
