DEV Community

mitonton310

I built an anime ranking site that measures real buzz, not marketing hype

Most anime ranking sites show you what's being promoted. The big seasonal titles, the ones with the biggest ad budgets, the ones every site agrees on. I wanted to see something different: what are fans actually watching and talking about right now?

So I built AnimeBuzz — an anime discovery site that ranks shows by a single number I call the Buzz Score, computed from real public engagement data instead of marketing.

Here's how it works under the hood.

The idea: a "Buzz Score" from real signals

The core problem is turning messy, multi-source engagement data into one comparable 0–100 number. I blend four public signals:

  • YouTube view counts (60%) — trailers, clips, reaction videos. The strongest "people are actually watching this" signal.
  • YouTube likes (15%) — positive reactions, not just views.
  • AniList popularity (20%) — how many community members track the title.
  • AniList favourites (5%) — the most dedicated fans.

Put together, that is 75% YouTube buzz blended with 25% AniList fan support.

The scoring math (and why naive normalization fails)

You can't just add raw numbers — millions of views versus a few thousand favourites would make views completely dominate. Each metric has to be normalized to 0–1 first.

The naive approach is to divide by the maximum value. But that breaks the moment one mega-hit (say, a Demon Slayer) shows up: it becomes the 1.0 reference and flattens everything else to near-zero.

The fix: normalize against the 95th percentile instead of the max.

def percentile_95(values):
    """Use the 95th-percentile value as the 'full marks' line,
    so a single viral outlier can't flatten the whole distribution."""
    if not values:
        return 1
    s = sorted(values)
    idx = int(len(s) * 0.95)
    return max(s[min(idx, len(s) - 1)], 1)  # avoid div-by-zero
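To make the difference concrete, here is a small sketch that normalizes the same mid-table title both ways. The view counts are made up for illustration (39 ordinary titles plus one mega-hit), and `percentile_95` is repeated so the snippet runs on its own.

```python
def percentile_95(values):
    """95th-percentile value, floored at 1 to avoid division by zero."""
    if not values:
        return 1
    s = sorted(values)
    idx = int(len(s) * 0.95)
    return max(s[min(idx, len(s) - 1)], 1)


# Hypothetical view counts: 39 ordinary titles plus one mega-hit.
views = [10_000 * i for i in range(1, 40)] + [50_000_000]

max_ref = max(views)            # naive reference: the outlier itself
p95_ref = percentile_95(views)  # percentile reference: an ordinary strong title

typical = 200_000  # a mid-table title from the list above
naive_norm = typical / max_ref
p95_norm = typical / p95_ref

print(f"max-normalized: {naive_norm:.3f}")
print(f"p95-normalized: {p95_norm:.3f}")
```

With the max as the reference, the mid-table title lands well under 0.01. Against the 95th percentile it sits around the middle of the 0–1 range, which is much closer to how it compares with its peers.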

Then the score is just a weighted sum of the normalized metrics:

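# max_* hold the 95th-percentile references (see percentile_95 above), not true maxima,
# so an outlier's ratio can exceed 1.0; the final min() caps the total at 100.
score = (
    (views / max_views)   * 60 +
    (likes / max_likes)   * 15 +
    (pop   / max_popular) * 20 +
    (favs  / max_favs)    *  5
)
score = round(min(score, 100.0), 1)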

One more detail: the normalization basis is calculated on the current season only, so the scale reflects "what's hot right now" and doesn't get dragged around by all-time classics in the back catalog.
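Here is a rough end-to-end sketch of that idea: build the references from the current season only, then score every title against them. The record layout (`id`, `season`, and the four metric keys) and the `buzz_scores` helper are assumptions for illustration, not the site's actual schema.

```python
# Weights from the post; the metric key names are hypothetical.
WEIGHTS = {"views": 60, "likes": 15, "popularity": 20, "favourites": 5}


def percentile_95(values):
    """95th-percentile value, floored at 1 to avoid division by zero."""
    if not values:
        return 1
    s = sorted(values)
    idx = int(len(s) * 0.95)
    return max(s[min(idx, len(s) - 1)], 1)


def buzz_scores(titles, season):
    """Score all titles against references built from one season only."""
    current = [t for t in titles if t["season"] == season]
    refs = {m: percentile_95([t[m] for t in current]) for m in WEIGHTS}
    scores = {}
    for t in titles:
        raw = sum(t[m] / refs[m] * w for m, w in WEIGHTS.items())
        scores[t["id"]] = round(min(raw, 100.0), 1)
    return scores


# Made-up data: two current-season titles and one back-catalog classic.
titles = [
    {"id": 1, "season": "2025-SPRING", "views": 1_000_000, "likes": 50_000,
     "popularity": 80_000, "favourites": 4_000},
    {"id": 2, "season": "2025-SPRING", "views": 200_000, "likes": 10_000,
     "popularity": 20_000, "favourites": 500},
    {"id": 3, "season": "2019-FALL", "views": 90_000_000, "likes": 2_000_000,
     "popularity": 700_000, "favourites": 60_000},
]
scores = buzz_scores(titles, "2025-SPRING")
```

The classic (id 3) never enters `refs`, so it simply hits the 100 cap instead of stretching the scale for everyone else.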

The data pipeline

No database — everything is plain JSON files in git. A set of Python scripts run on a schedule:

  • fetch_anilist.py — seasonal anime, studios, genres, streaming links (AniList GraphQL API, free)
  • fetch_youtube.py — view/like counts (YouTube Data API, quota-limited so it fills in a batch per day)
  • fetch_themes.py — opening/ending songs from the wonderful AnimeThemes.moe API, keyed by AniList ID
  • calc_buzz_score.py — computes scores and emits all the ranking JSON files
  • generate_sitemap.py — rebuilds the sitemap
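The post only says the YouTube step is quota-limited and fills in a batch per day. One plausible way to choose that batch is "stalest first"; the sketch below assumes each title record may carry a `youtube.fetched_at` ISO-8601 timestamp, which is a hypothetical field, as is the `pick_daily_batch` helper.

```python
def pick_daily_batch(titles, batch_size):
    """Return up to batch_size titles, never-fetched first, then oldest fetch first."""
    def staleness_key(title):
        fetched = title.get("youtube", {}).get("fetched_at")
        # False sorts before True, so titles without a timestamp come first.
        # ISO-8601 strings in the same format sort chronologically.
        return (fetched is not None, fetched or "")

    return sorted(titles, key=staleness_key)[:batch_size]


# Made-up records for illustration.
titles = [
    {"id": 1, "youtube": {"fetched_at": "2025-05-02T00:00:00Z"}},
    {"id": 2},
    {"id": 3, "youtube": {"fetched_at": "2025-05-01T00:00:00Z"}},
]
batch = pick_daily_batch(titles, 2)
batch_ids = [t["id"] for t in batch]
```

Run daily, a selection like this cycles through the whole catalog without ever exceeding a fixed number of API calls per run.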

A gotcha worth sharing: when re-fetching from AniList, I have to preserve keys that other scripts wrote (YouTube data, themes, etc.), or a daily refresh would wipe them:

PRESERVE_KEYS = [
    "youtube", "buzz_score", "voice_actors", "streaming",
    "themes",  # ← learned this one the hard way
]
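A minimal sketch of how such a merge might look. The `merge_refetched` helper is hypothetical, and it assumes that for a preserved key the existing value always wins over whatever the fresh AniList record contains.

```python
PRESERVE_KEYS = ["youtube", "buzz_score", "voice_actors", "streaming", "themes"]


def merge_refetched(existing, fresh):
    """Start from the fresh AniList record, then carry over preserved keys."""
    merged = dict(fresh)
    for key in PRESERVE_KEYS:
        if key in existing:
            merged[key] = existing[key]
    return merged


# Made-up records for illustration.
existing = {
    "id": 1,
    "title": "Old Title",
    "youtube": {"views": 120_000},
    "themes": ["OP1"],
}
fresh = {"id": 1, "title": "New Title", "popularity": 5_000}
merged = merge_refetched(existing, fresh)
```

Everything AniList owns (`title`, `popularity`) is refreshed, while data written by the other scripts survives the daily re-fetch.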

Frontend: static all the way

The site is Vike (vike-react) + React + TypeScript, fully prerendered to static HTML at build time, then served from Cloudflare Workers as static assets. No server, no runtime — every page is a file.

For ~3,000 pages this is fantastic: instant loads, trivial scaling, basically free hosting. The trade-off is build time and the discipline of treating data as a build input.

Automation

A GitHub Actions cron runs daily: fetch → score → regenerate → commit → build → deploy. The site refreshes itself every day with zero manual work, including recording each title's score into a history file so I can chart how buzz rises and falls over time.
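The history step could be as simple as appending one dated entry per title. The sketch below assumes a history structure keyed by title ID with a list of `{date, score}` entries; that layout and the `record_history` helper are illustrative guesses, not the site's actual file format.

```python
def record_history(history, scores, day):
    """Add today's score for each title; a re-run on the same day overwrites."""
    for title_id, score in scores.items():
        series = history.setdefault(str(title_id), [])  # JSON object keys are strings
        if series and series[-1]["date"] == day:
            series[-1]["score"] = score
        else:
            series.append({"date": day, "score": score})
    return history


history = {}
record_history(history, {101: 42.5}, "2025-05-01")
record_history(history, {101: 43.0}, "2025-05-01")  # same-day re-run
record_history(history, {101: 40.0}, "2025-05-02")
```

Making the write idempotent per day matters for a cron job: a retried or manually re-triggered run should not add a duplicate point to the chart.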

What got built along the way

Once the core ranking worked, the dataset opened up a bunch of features almost for free:

  • Best anime openings & endings — ranked, and you can watch the actual OP/ED
  • Voice actor (seiyuu) filmographies, aggregated from the per-anime cast data
  • Hidden gems (high community score, low buzz), all-time rankings, airing calendar
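As an example of how cheap these features become once the scores exist, a "hidden gems" list can be a simple filter. The thresholds and the `community_score` / `buzz_score` field names below are assumptions for illustration.

```python
def hidden_gems(titles, min_community=80, max_buzz=30.0):
    """Titles rated highly by the community but with a low Buzz Score."""
    gems = [
        t for t in titles
        if t["community_score"] >= min_community and t["buzz_score"] <= max_buzz
    ]
    # Best-rated first.
    return sorted(gems, key=lambda t: t["community_score"], reverse=True)


# Made-up records for illustration.
titles = [
    {"id": 1, "community_score": 88, "buzz_score": 95.0},  # popular hit
    {"id": 2, "community_score": 84, "buzz_score": 12.5},  # hidden gem
    {"id": 3, "community_score": 61, "buzz_score": 8.0},   # just obscure
    {"id": 4, "community_score": 90, "buzz_score": 22.0},  # hidden gem
]
gem_ids = [t["id"] for t in hidden_gems(titles)]
```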

Lessons learned

  • Percentile normalization >>> max normalization for any "score from mixed signals" problem. Outliers are the enemy.
  • JSON-in-git is underrated for read-heavy sites. No DB to run, full version history, trivial to deploy statically.
  • A brand-new domain is slow to get indexed — no amount of clever engineering substitutes for time and real links. (Still very much in that phase!)

If you're into anime, take a look and tell me where the Buzz Score fails your gut-check: anime.douga-summary.jp

Happy to answer any questions about the stack in the comments 👇
