Jerry Satpathy

Building a Cricket Trivia Game Was Easy. Normalising 7,000+ Players Was Hard.

When I started building Stumped!, a cricketer guessing game, I thought the hard part would be coming up with clever clues.

I was wrong.

The real hard part was turning thousands of raw, ball-by-ball cricket scorecards into clean, human-readable player profiles.

Here is how a simple trivia game helped me learn more about normalising data.

Stumped! The Almanac Project By Code Media Labs

The Dream vs. The Reality

I wanted to generate rich, dynamic clues for players, like:

"This batter scored 573 runs in the death overs at a strike rate of 135.8."

To do that, I turned to the amazing open datasets at Cricsheet. They provide incredible ball-by-ball archives. But there's a catch: raw match data and game-ready player profiles speak entirely different languages.

Cricsheet tells you what happened on delivery 4.2 of Match X. It does not tell you a player's career stats. Everything had to be derived from scratch.

1. Turning Matches into Careers

Step one was flipping the data architecture from match-centric to player-centric. I built a pipeline that ingests every single delivery and progressively updates a player’s lifetime accumulator object:

# One accumulator per scorecard name, created the first time a player appears
players[name] = {
    "matches": 0,
    "bat_runs": 0,
    "bat_balls": 0,
    "bowl_runs": 0,
    "bowl_wickets": 0,
    # ...you get the idea
}

Instead of querying a massive database of matches every time a user plays, we build the careers once beforehand.
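A minimal sketch of that build-once step, assuming each delivery has already been reduced to a small flat dict. The field names (`batter_runs`, `extras_kind`, `extras_runs`, `dismissal`) and the helpers `new_career` and `ingest_match` are hypothetical, not Cricsheet's actual schema or the real pipeline, and the sample deliveries are made up:

```python
from collections import defaultdict

# Dismissal kinds that are conventionally not credited to the bowler.
NOT_BOWLER_WICKETS = {"run out", "retired hurt", "retired out", "obstructing the field"}


def new_career():
    """Fresh lifetime accumulator with the same keys as the object above."""
    return {"matches": 0, "bat_runs": 0, "bat_balls": 0,
            "bowl_runs": 0, "bowl_wickets": 0}


def ingest_match(players, deliveries):
    """Fold one match's deliveries into the lifetime accumulators."""
    appeared = set()
    for d in deliveries:
        batter, bowler = players[d["batter"]], players[d["bowler"]]
        appeared.update((d["batter"], d["bowler"]))

        extras_kind = d.get("extras_kind")  # None, "wides", "noballs", "byes", "legbyes"
        batter["bat_runs"] += d["batter_runs"]
        if extras_kind != "wides":  # a wide does not count as a ball faced
            batter["bat_balls"] += 1

        # Byes and leg-byes are not charged to the bowler; wides and no-balls are.
        bowler["bowl_runs"] += d["batter_runs"]
        if extras_kind in ("wides", "noballs"):
            bowler["bowl_runs"] += d.get("extras_runs", 0)

        dismissal = d.get("dismissal")
        if dismissal and dismissal not in NOT_BOWLER_WICKETS:
            bowler["bowl_wickets"] += 1

    # Simplification: only players who batted or bowled get a match counted.
    for name in appeared:
        players[name]["matches"] += 1


players = defaultdict(new_career)
ingest_match(players, [
    {"batter": "Batter A", "bowler": "Bowler B", "batter_runs": 4},
    {"batter": "Batter A", "bowler": "Bowler B", "batter_runs": 0,
     "extras_kind": "wides", "extras_runs": 1},
    {"batter": "Batter A", "bowler": "Bowler B", "batter_runs": 0,
     "dismissal": "bowled"},
])
```

Run over every match once, this leaves `players` holding finished careers, so the game never has to touch ball-by-ball data at play time.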

2. The "Who TF is V Kohli?" Problem

Then came the initials. Cricket scorecards love abbreviations.

EJG Morgan

RG Sharma

V Kohli

Humans see "V Kohli" and know it's Virat. Computers see it and shrug. Worse, multiple players share the same initials.

I tried scraping external sports sites to map these messy strings to unique humans, but between rate limits, anti-bot shields, and wildly inconsistent formatting, the scrapers failed hard. Right now, I am back to the drawing board, trying to figure out how to reliably enrich these player profiles without losing my sanity.

(Pro tip: This is exactly why Stumped! asks users to guess surnames. Surnames are way more reliable than ambiguous initials.)
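Here is a rough sketch of how surname-only guessing could sidestep the initials problem. The helpers `split_scorecard_name` and `is_correct_guess` are hypothetical, and the "leading all-caps token is the initials" rule is a heuristic assumption, not the game's actual code:

```python
def split_scorecard_name(name):
    """Split a scorecard name such as 'RG Sharma' into (initials, surname).

    Heuristic: if the first token is all upper case, treat it as the
    initials and everything after it as the surname. Otherwise fall back
    to using the last token as the surname.
    """
    tokens = name.split()
    if not tokens:
        return "", ""
    if len(tokens) > 1 and tokens[0].isupper():
        return tokens[0], " ".join(tokens[1:])
    return "", tokens[-1]


def is_correct_guess(guess, scorecard_name):
    """Compare a user's guess against the surname only, ignoring case."""
    _, surname = split_scorecard_name(scorecard_name)
    return guess.strip().casefold() == surname.casefold()


for raw in ("EJG Morgan", "RG Sharma", "V Kohli"):
    print(raw, "->", split_scorecard_name(raw))
```

Matching on the surname means the game never needs to know what "V" stands for, though two players sharing a surname would still need some other tie-breaker.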

3. Extracting the Spicy Stats

Basic career averages are boring trivia. To make the game fun, I needed to slice and dice the data into highly specific cricket archetypes:

  1. Batting Phases: Grouping deliveries into Powerplays (overs 0–5), Middle (6–15), and Death (16–19, counting overs from zero) to find the clutch finishers.
  2. The Psychology of the Chase: Tracking performance when setting a target vs. chasing one.
  3. Nemesis vs Favourite Bowler: Who dismisses this batter the most? Who do they absolutely smash for fun? (Rule #1 of the pipeline: Your nemesis cannot also be your favourite victim. The logic got messy here, but it made the clues feel remarkably human.)
  4. The Weird Stuff: Tracking golden ducks, diamond ducks, maiden overs, and dot-ball percentages.
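Two of these slices can be sketched in a few lines. This assumes zero-indexed over numbers and a hypothetical head-to-head dict per batter. The function names, the tie-break on dismissals, and the "most runs scored" definition of a favourite bowler are illustrative guesses, not the pipeline's real rules:

```python
def phase_of(over_index):
    """Map a zero-indexed over number to a batting phase."""
    if over_index <= 5:
        return "powerplay"
    if over_index <= 15:
        return "middle"
    return "death"


def phase_totals(balls):
    """Sum runs and balls per phase from (over_index, batter_runs) pairs."""
    totals = {p: {"runs": 0, "balls": 0} for p in ("powerplay", "middle", "death")}
    for over_index, runs in balls:
        bucket = totals[phase_of(over_index)]
        bucket["runs"] += runs
        bucket["balls"] += 1
    return totals


def strike_rate(runs, balls):
    """Runs per 100 balls, or None when no balls were faced."""
    return round(100 * runs / balls, 1) if balls else None


def nemesis_and_favourite(head_to_head):
    """Pick (nemesis, favourite) from {bowler: {"runs": int, "dismissals": int}}.

    Nemesis: most dismissals, ties broken by fewer runs conceded.
    Favourite: most runs scored against, excluding the nemesis.
    """
    dismissers = {b: s for b, s in head_to_head.items() if s["dismissals"] > 0}
    nemesis = max(
        dismissers,
        key=lambda b: (dismissers[b]["dismissals"], -dismissers[b]["runs"]),
        default=None,
    )
    candidates = {b: s for b, s in head_to_head.items() if b != nemesis}
    favourite = max(candidates, key=lambda b: candidates[b]["runs"], default=None)
    return nemesis, favourite


# Made-up sample: Bowler A tops both lists, so the favourite falls to Bowler B.
sample = {
    "Bowler A": {"runs": 120, "dismissals": 4},
    "Bowler B": {"runs": 90, "dismissals": 0},
}
print(nemesis_and_favourite(sample))
```

Removing the nemesis from the candidate pool before picking the favourite is one simple way to enforce Rule #1 without special-casing ties afterwards.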

4. Flattening the Monster

Deeply nested JSON objects are a pain to consume on the frontend. The final step of the pipeline takes all those complex, deep career structures and flattens them into a clean, single-level profile:

{
  "bat_runs": 2443,
  "bat_average": 29.08,
  "bat_strike_rate": 136.7,
  "fielding_catches": 47,
  "nemesis_bowler_name": "Hardik Pandya",
  "favorite_bowler_name": "Umar Gul"
}

Now, generating a clue is as simple as reading a single key-value pair.
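For illustration, a flattening pass like this can be a short recursive function. The nested `career` shape below is an assumption chosen so that the flattened keys match the profile above, and `flatten` is a hypothetical helper rather than the pipeline's real one:

```python
def flatten(nested, parent_key="", sep="_"):
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested career, using the values from the profile above.
career = {
    "bat": {"runs": 2443, "average": 29.08, "strike_rate": 136.7},
    "fielding": {"catches": 47},
    "nemesis": {"bowler_name": "Hardik Pandya"},
    "favorite": {"bowler_name": "Umar Gul"},
}

profile = flatten(career)
clue = (
    f"This batter has scored {profile['bat_runs']} runs "
    f"at a strike rate of {profile['bat_strike_rate']}."
)
print(clue)
```

With every stat sitting at the top level, a clue template only needs to know a key name, not the shape of the career object behind it.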

Lesson Learned

Building the game took a fraction of the time it took to clean records and handle weird edge cases. Under the hood, a single clue represents thousands of rows of raw match data processed into something actually readable.
If I started this project again, I'd invest in the name normalisation layer first, before writing/generating a single line of stat-aggregation code.

We just launched the game! If you want to check out this and other fun side projects we’ve been hacking on, take a look at The Almanac Project or test your cricket trivia knowledge directly at Stumped!.

And as always, happy coding!
