DEV Community

Tarang Kishor


I Built a Leaderboard API Using a Skip List From Scratch (No Redis)

Every time you see a leaderboard in a game, a coding contest, or a trading app — something has to answer the question "what rank is this player?" in milliseconds, even with millions of entries.

The standard answer is Redis sorted sets. But I wanted to understand why Redis sorted sets are fast — so I built the underlying data structure from scratch: a skip list.

This is the story of building a full production-grade leaderboard REST API, deploying it live, and benchmarking it at 1 million players.

Live demo: https://leaderboard-api-gu4v.onrender.com/docs
GitHub: https://github.com/tar-ang-2004/Leaderboard-API


What even is a skip list?

A skip list is a probabilistic layered linked list. The bottom layer (level 0) is a complete sorted linked list of all nodes. Each higher level is a "fast lane" that skips over groups of nodes — like express lanes on a highway.

level 3:  HEAD ─────────────────────────────────────→ [alice:500] → None
level 2:  HEAD ────────────→ [carol:650] ──────────→ [alice:500] → None
level 1:  HEAD → [bob:900] → [carol:650] ──────────→ [alice:500] → None
level 0:  HEAD → [bob:900] → [carol:650] ──────────→ [alice:500] → None
                  rank 1        rank 2                 rank 3

When you search for a player, you start at the highest level and take giant leaps forward, then drop down to finer levels as you close in on the target. This gives O(log n) search on average — the same as a balanced binary search tree, but far simpler to implement.

Each node is promoted to higher levels with probability 0.5 — like flipping a coin. This randomness is what makes the math work out to O(log n) expected height.
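The coin-flip promotion is only a few lines. A sketch (the `MAX_LEVEL = 16` cap is an assumption, matching the 16 levels mentioned later in the post):

```python
import random

MAX_LEVEL = 16  # cap the tower height; enough for millions of nodes

def random_level(p: float = 0.5, max_level: int = MAX_LEVEL) -> int:
    """Keep flipping a coin; each heads promotes the node one level higher."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```

On average a node ends up about 2 levels tall, so the whole structure costs only roughly twice the pointers of a plain linked list.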


Why not just use a sorted array or a BST?

Operation    Skip List   Sorted Array   Balanced BST
Insert       O(log n)    O(n)           O(log n)
Delete       O(log n)    O(n)           O(log n)
Rank query   O(log n)    O(log n)       O(log n)
Implement    Simple      Trivial        Hard

A sorted array is O(n) on insert — every new score means shifting elements. A balanced BST (like a red-black tree) has the right complexity but is notoriously hard to implement correctly. A skip list hits the sweet spot.
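To make the sorted-array row concrete: `bisect.insort` finds the slot with an O(log n) binary search, but the insertion itself shifts every element after it, and that shift is the O(n) cost:

```python
import bisect

scores = [100, 300, 500, 700]
bisect.insort(scores, 400)  # binary search finds index 2; the shift is O(n)
print(scores)  # → [100, 300, 400, 500, 700]
```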


The core implementation

The key insight is the _find_update method — it walks from the top level down to level 0 and records the predecessor at each level. This predecessor array is then used for both insert and delete:

def _find_update(self, score: float, ts: float) -> list:
    update  = [None] * MAX_LEVEL
    current = self.head

    for i in range(self.level - 1, -1, -1):
        while (
            current.forward[i] is not None
            and self._comes_before(current.forward[i], score, ts)
        ):
            current = current.forward[i]
        update[i] = current

    return update

Insert splices the new node in at every level it participates in:

def insert(self, player_id: str, score: float, timestamp: float) -> None:
    update    = self._find_update(score, timestamp)
    new_level = self._random_level()

    # If the new node is taller than the list, its predecessor at the
    # extra levels is the head — and the list's level must grow, or
    # update[i] would still be None from _find_update.
    if new_level > self.level:
        for i in range(self.level, new_level):
            update[i] = self.head
        self.level = new_level

    node = SkipNode(player_id, score, timestamp, [None] * new_level)

    for i in range(new_level):
        node.forward[i]      = update[i].forward[i]
        update[i].forward[i] = node

    self.size += 1
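Delete, which isn't shown above, reuses the same predecessor array: unsplice the node at every level it appears on. A self-contained sketch, simplified to ascending score-only keys (class and method names here are mine, not the repo's):

```python
import random

MAX_LEVEL = 16
P = 0.5

class Node:
    def __init__(self, score, level):
        self.score = score
        self.forward = [None] * level  # next node at each level

class MiniSkipList:
    """Simplified sketch: keyed by score only, ascending, no duplicates."""

    def __init__(self):
        self.head = Node(None, MAX_LEVEL)
        self.level = 1
        self.size = 0

    def _random_level(self):
        lvl = 1
        while random.random() < P and lvl < MAX_LEVEL:
            lvl += 1
        return lvl

    def _find_update(self, score):
        # Predecessor at every level; defaults to head above the list's level.
        update = [self.head] * MAX_LEVEL
        cur = self.head
        for i in range(self.level - 1, -1, -1):
            while cur.forward[i] is not None and cur.forward[i].score < score:
                cur = cur.forward[i]
            update[i] = cur
        return update

    def insert(self, score):
        update = self._find_update(score)
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(score, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        self.size += 1

    def delete(self, score):
        update = self._find_update(score)
        target = update[0].forward[0]
        if target is None or target.score != score:
            return False
        for i in range(self.level):      # unsplice at every level it's on
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1              # shrink empty top levels
        self.size -= 1
        return True
```

Note the symmetry: both operations are "find predecessors, then rewire pointers"; only the rewiring direction differs.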

And get_rank counts how many nodes come before the target:

def get_rank(self, player_id, score, timestamp):
    rank    = 0
    current = self.head

    for i in range(self.level - 1, -1, -1):
        while (
            current.forward[i] is not None
            and self._comes_before(current.forward[i], score, timestamp)
        ):
            rank   += 1
            current = current.forward[i]

    candidate = current.forward[0]
    if candidate and candidate.player_id == player_id:
        return rank + 1
    return None
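One caveat: in the version above, each forward move adds 1 to rank, but a move at a high level can skip several level-0 nodes, so the count can come up short on taller lists. The standard fix, and what Redis's zskiplist does, is to store a span on every forward pointer: the number of bottom-level nodes that hop jumps. A minimal, self-contained sketch with plain keys (names are mine, not the repo's):

```python
import random

MAX_LEVEL = 16
P = 0.5

class SpanNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # next node per level
        self.span = [0] * level        # level-0 nodes each forward hop jumps

class IndexedSkipList:
    """Skip list whose pointers carry spans, so rank stays O(log n)."""

    def __init__(self):
        self.head = SpanNode(None, MAX_LEVEL)
        self.level = 1
        self.size = 0

    def _random_level(self):
        lvl = 1
        while random.random() < P and lvl < MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * MAX_LEVEL
        rank = [0] * MAX_LEVEL  # rank[i]: nodes strictly before update[i]
        x = self.head
        for i in range(self.level - 1, -1, -1):
            rank[i] = rank[i + 1] if i + 1 < self.level else 0
            while x.forward[i] is not None and x.forward[i].key < key:
                rank[i] += x.span[i]
                x = x.forward[i]
            update[i] = x
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level, lvl):
                update[i].span[i] = self.size  # head spans the whole list here
            self.level = lvl
        node = SpanNode(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
            # split the predecessor's old span around the new node
            node.span[i] = update[i].span[i] - (rank[0] - rank[i])
            update[i].span[i] = (rank[0] - rank[i]) + 1
        for i in range(lvl, self.level):
            update[i].span[i] += 1  # taller pointers now jump one more node
        self.size += 1

    def rank(self, key):
        """1-based rank of key, or None if absent."""
        r, x = 0, self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key <= key:
                r += x.span[i]  # add however many level-0 nodes this hop skips
                x = x.forward[i]
        return r if x is not self.head and x.key == key else None
```

The query shape is identical to get_rank above; the only change is `r += x.span[i]` instead of `rank += 1`, plus the span bookkeeping in insert.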

The LRU cache layer

The skip list is fast — but get_top(100) still has to walk 100 nodes every time. If the same leaderboard is queried thousands of times per second, that adds up.

The solution: cache the results of hot reads. But a plain dict grows unbounded and never evicts stale data.

Enter the LRU (Least-Recently-Used) cache — backed by Python's OrderedDict:

class LRUCache:
    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self._cache   = OrderedDict()

    def get(self, key: str):
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)   # promote to MRU end
        return self._cache[key]

    def put(self, key: str, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict LRU end

The critical piece: every write invalidates all cached reads for that leaderboard:

def invalidate_prefix(self, prefix: str) -> int:
    stale = [k for k in self._cache if k.startswith(prefix)]
    for k in stale:
        del self._cache[k]
    return len(stale)  # number of entries flushed

So a score update flushes lb_abc:top:10:0, lb_abc:range:1:100, etc. — ensuring correctness while keeping reads fast.
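Putting eviction and prefix invalidation together in one runnable snippet (the class is reproduced from above so it runs standalone; the keys and capacity are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, key):
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)   # promote to MRU end
        return self._cache[key]

    def put(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict LRU end

    def invalidate_prefix(self, prefix: str) -> int:
        stale = [k for k in self._cache if k.startswith(prefix)]
        for k in stale:
            del self._cache[k]
        return len(stale)

cache = LRUCache(capacity=2)
cache.put("lb_abc:top:10", [("alice", 4200)])
cache.put("lb_abc:rank:alice", 1)
cache.get("lb_abc:top:10")       # touch: now most-recently-used
cache.put("lb_xyz:top:10", [])   # over capacity: evicts lb_abc:rank:alice
flushed = cache.invalidate_prefix("lb_abc:")  # a write to lb_abc lands
```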


The speedup is real

I benchmarked this with bench.py at 10k and 100k players:

n players   top-100 cache MISS   top-100 cache HIT   speedup
10,000      ~315 µs              ~2.8 µs             114x
100,000     ~340 µs              ~2.8 µs             123x

The cache hit time stays flat (~2.8 µs) regardless of n — it's just a dict lookup. The miss time barely grows with n: the walk is O(log n + k) with k fixed at 100, not O(n).


The API layer

The skip list and LRU cache are wrapped in a LeaderboardStore, which is wired into a FastAPI app with 15 endpoints:

Try it in 3 curl commands

# 1. Create a leaderboard
curl -X POST https://leaderboard-api-gu4v.onrender.com/v1/leaderboards \
  -H "Content-Type: application/json" \
  -d '{"name": "my-game", "order": "desc"}'
# → {"id": "lb_a1b2c3d4", "name": "my-game", ...}

# 2. Submit a score
curl -X POST https://leaderboard-api-gu4v.onrender.com/v1/leaderboards/lb_a1b2c3d4/scores \
  -H "Content-Type: application/json" \
  -d '{"player_id": "alice", "score": 4200, "metadata": {"country": "IN"}}'
# → {"player_id": "alice", "score": 4200, "rank": 1, "percentile": 100.0}

# 3. Get top 10
curl "https://leaderboard-api-gu4v.onrender.com/v1/leaderboards/lb_a1b2c3d4/top?k=10"

Other useful endpoints:

# Player rank + 3 players above and below
curl ".../players/alice/rank?window=3"

# Everyone in the 2000-2999 score range (e.g. a tier system)
curl ".../search?min_score=2000&max_score=2999"

# Paginated full rankings
curl ".../rankings?page=1&page_size=20"

# Bulk submit 500 scores at once
curl -X POST ".../scores/bulk" -d '{"entries": [...]}'

Confirming O(log n) empirically

One thing I love about this project — you can see the O(log n) growth in the benchmark output:

Complexity check — get_rank (avg)
           n        avg_ms    growth vs prev
      10,000      0.002835               —
     100,000      0.004653           1.64x
   1,000,000      0.007120           1.53x

Going from 10k to 100k players (10x more data), the rank query only gets 1.64x slower — not 10x. That's O(log n) in action. log(100000) / log(10000) ≈ 1.25 — close to the measured 1.64x (the gap is real-world overhead like cache misses and memory access patterns).
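The theoretical ratio takes two lines to reproduce (pure arithmetic, nothing from the repo):

```python
import math

# If get_rank is O(log n), going from 10k to 100k players should cost
# about log(100_000) / log(10_000) = 5/4 more per query.
ratio = math.log(100_000) / math.log(10_000)
print(round(ratio, 2))  # → 1.25
```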


Architecture overview

Request → FastAPI router
              ↓
        LeaderboardStore
              ↓
    ┌─────────┴──────────┐
    SkipList          LRUCache
    O(log n) ops      O(1) hot reads
    ↑
  HashMap
  O(1) score lookup

Each leaderboard owns its own skip list, hash map, and LRU cache. The store layer isolates all business logic — swapping to Redis only requires changing store.py.


What I learned

1. The bug that took an hour to find
My first skip list sorted ascending instead of descending. The comparator _comes_before was flipped. All 12 tests failed. Classic off-by-one-direction bug.

2. Cache invalidation is the hard part
Getting the LRU cache right was trickier than the skip list. The edge case: a score update must invalidate all cached queries for that leaderboard, not just the specific key that changed. The invalidate_prefix method handles this cleanly.

3. O(log n) is only fast if your constant is small
A skip list with 16 levels has a large constant factor. At small n (< 1000), a sorted array is actually faster in practice. The skip list wins at scale.


Use it in your project

The API is free and live — use it for:

  • Gaming leaderboards
  • Coding contest rankings
  • Typing speed competitions
  • Hackathon scoreboards
  • Any app with scores
# Full source code
git clone https://github.com/tar-ang-2004/Leaderboard-API

# Or hit the live API directly
https://leaderboard-api-gu4v.onrender.com/docs

Note: Free tier on Render sleeps after 15 min inactivity — first request may take ~50s to wake up.


What's next

  • WebSocket support — push rank changes to connected clients in real time
  • Redis backend — swap the in-memory store for persistence and horizontal scaling
  • React frontend — a visual leaderboard UI on top of this API

If you found this useful, star the repo and let me know what you're building with it!

GitHub: https://github.com/tar-ang-2004/Leaderboard-API
Live API: https://leaderboard-api-gu4v.onrender.com/docs
