Building a GraphQL Video Discovery API with Strawberry and FastAPI

#graphql #python #fastapi #api

Our REST endpoint for trending videos was returning a 14-field JSON object to every client, and almost no client wanted all 14 fields. The mobile feed needed id, title, thumbnail, and view_velocity. The embeddable widget needed id, title, and embed_url. The internal trend dashboard needed the full analytics payload plus per-region breakdowns. So we did what every team does first: we added ?fields= query params, then ?include=channel, then ?expand=stats.regions, and within a quarter the endpoint had grown into a small, undocumented query language nobody on the team fully understood.

The breaking point was the European region filter. At ViralVidVault we track viral video velocity across European markets, and a GDPR-compliant analytics view means clients should be able to ask for exactly the regional aggregates they're allowed to see — and nothing more. Bolting that onto an ad-hoc REST sublanguage was a losing battle. GraphQL solves the over-fetching and under-fetching problem properly, and Strawberry plus FastAPI gives you a typed, async, Python-native way to ship it. This is how we rebuilt the discovery layer.

Why Strawberry and not Graphene

If you've touched Python GraphQL before, you probably reached for Graphene. We did too, years ago. The problem with the older code-first libraries is that the schema and your Python types drift apart — you declare a graphene.String() here and a type hint there, and nothing forces them to agree. Strawberry flips this around: you write ordinary Python dataclasses with type annotations and a decorator, and the schema is derived from those annotations. Your IDE, mypy, and the GraphQL schema all see the same source of truth.

The second reason is async. Our stack already leans heavily on async I/O: the public site runs on PHP 8.4 with SQLite in WAL mode behind LiteSpeed, but the discovery and analytics services are Python, and they spend most of their time waiting on the database and on Cloudflare Workers KV lookups. Strawberry resolvers can be async def natively, and an ASGI server such as uvicorn runs them concurrently on a single event loop rather than parking each request on a thread. For an API whose job is to fan out and aggregate, that matters.

Let's start with the types. Here is the core schema for a video and its trend metrics.

import strawberry
from enum import Enum
from datetime import datetime


@strawberry.enum
class Region(Enum):
    DE = "DE"
    FR = "FR"
    GB = "GB"
    ES = "ES"
    IT = "IT"
    NL = "NL"
    PL = "PL"


@strawberry.type
class TrendMetrics:
    view_velocity: float  # views per hour, last 6h window
    rank_change_24h: int   # positive = climbing
    peak_region: Region
    sampled_at: datetime


@strawberry.type
class Video:
    id: strawberry.ID
    title: str
    channel: str
    thumbnail_url: str
    embed_url: str
    published_at: datetime
    duration_seconds: int

    @strawberry.field
    async def metrics(self, info: strawberry.Info) -> TrendMetrics:
        # Resolved lazily — only computed if the client asks for it.
        loader = info.context["metrics_loader"]
        return await loader.load(self.id)

The important line is the metrics field resolver. It's an async method, and it only runs when a client actually selects metrics in their query. The mobile feed that wants four flat fields never triggers the metrics lookup at all. That single property — resolvers run on demand — is what kills the over-fetching problem at the root, without you maintaining a ?fields= whitelist by hand.

Wiring Strawberry into FastAPI

Strawberry ships an ASGI router built for exactly this. You construct a Schema, wrap it in GraphQLRouter, and mount it on your FastAPI app. The router handles the GraphQL HTTP protocol, the GraphiQL explorer in development, and — critically — lets you inject a per-request context.

import strawberry
from fastapi import FastAPI, Request
from strawberry.fastapi import GraphQLRouter

from .loaders import build_metrics_loader
from .resolvers import Query


async def get_context(request: Request) -> dict:
    # One DataLoader instance per request, never shared across requests.
    return {
        "request": request,
        "metrics_loader": build_metrics_loader(),
        "region": request.headers.get("X-VVV-Region", "GB"),
    }


schema = strawberry.Schema(query=Query)
graphql_app = GraphQLRouter(
    schema,
    context_getter=get_context,
    graphiql=True,  # disable in production
)

app = FastAPI(title="VVV Discovery API")
app.include_router(graphql_app, prefix="/graphql")

Two things deserve emphasis. First, the context is built fresh per request. This is non-negotiable for the DataLoader pattern we'll get to next — a loader that batches and caches must not leak one user's results into another request. Second, the region is pulled from a header rather than from query arguments. We set that header in a Cloudflare Worker at the edge based on the visitor's country, which keeps the region-resolution logic out of every client and gives us one consistent place to enforce GDPR data-residency rules.
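
One wrinkle in get_context is that the header arrives as a raw string, while the schema models regions as the Region enum. Below is a minimal sketch of normalizing the value at the context boundary. The parse_region helper and DEFAULT_REGION constant are hypothetical names, and quietly falling back to GB for unknown values is an assumption carried over from the default above; a stricter service might reject the request instead.

```python
from enum import Enum
from typing import Optional


class Region(Enum):
    DE = "DE"
    FR = "FR"
    GB = "GB"
    ES = "ES"
    IT = "IT"
    NL = "NL"
    PL = "PL"


# Hypothetical constant mirroring the "GB" fallback used in get_context.
DEFAULT_REGION = Region.GB


def parse_region(header_value: Optional[str]) -> Region:
    """Turn the raw X-VVV-Region header into a Region, tolerating junk."""
    if not header_value:
        return DEFAULT_REGION
    try:
        return Region(header_value.strip().upper())
    except ValueError:
        # Country outside the supported markets: fall back (assumption).
        return DEFAULT_REGION


from_edge = parse_region("fr")
missing_header = parse_region(None)
unsupported = parse_region("US")
```

With something like this in place, the context would carry a Region rather than a string, so resolvers and repositories only ever see one type.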

The N+1 problem is not optional to solve

GraphQL makes the N+1 query problem trivially easy to create. Consider a query that asks for the top 50 trending videos and, for each, its metrics. The naive implementation fires one query for the video list, then 50 separate queries for the metrics — 51 round trips to satisfy one request. Under load, this is how you take a database down.

The fix is the DataLoader pattern. A DataLoader collects all the individual .load(id) calls made within a single tick of the event loop, then dispatches them as one batched query. Strawberry has first-class support for it.

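from datetime import datetime
from typing import Sequence

import aiosqlite
from strawberry.dataloader import DataLoader

# Assumed module name: wherever the Region and TrendMetrics types above live.
from .types import Region, TrendMetrics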


async def _batch_load_metrics(ids: Sequence[str]) -> list["TrendMetrics"]:
    placeholders = ",".join("?" for _ in ids)
    query = f"""
        SELECT video_id, view_velocity, rank_change_24h,
               peak_region, sampled_at
        FROM trend_metrics
        WHERE video_id IN ({placeholders})
    """
    async with aiosqlite.connect("discovery.db") as db:
        db.row_factory = aiosqlite.Row
        rows = await db.execute_fetchall(query, tuple(ids))

    by_id = {
        r["video_id"]: TrendMetrics(
            view_velocity=r["view_velocity"],
            rank_change_24h=r["rank_change_24h"],
            peak_region=Region(r["peak_region"]),
            sampled_at=datetime.fromisoformat(r["sampled_at"]),
        )
        for r in rows
    }
    # CRITICAL: return results in the SAME ORDER as the input ids.
    return [by_id[i] for i in ids]


def build_metrics_loader() -> DataLoader:
    return DataLoader(load_fn=_batch_load_metrics)

That IN (...) query turns 50 round trips into one. The non-obvious rule that trips people up: the batch function must return one result per input id, in the same order as the ids it was given. DataLoader maps the returned list back to the original .load() calls positionally. If your SQL returns rows in a different order, or drops rows for missing ids, the mapping silently corrupts. Building a dict and re-indexing by the input order, as above, is the safe pattern. If a metric might genuinely be missing, return None in that slot and make the field Optional rather than letting the dict lookup raise.
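
To make both rules concrete, here is a toy loader built on nothing but asyncio. It is not Strawberry's implementation, only a sketch of the idea: TinyLoader, FAKE_TABLE, and batch_load are hypothetical names, and the in-memory dict stands in for the trend_metrics table. It shows three .load() calls collapsing into one batch, and a missing id coming back as None in its own slot.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence, Tuple

BatchFn = Callable[[Sequence[str]], Awaitable[List[Optional[float]]]]


class TinyLoader:
    """Toy batching loader: collects keys, then dispatches them together."""

    def __init__(self, batch_fn: BatchFn) -> None:
        self._batch_fn = batch_fn
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._task: Optional[asyncio.Task] = None

    def load(self, key: str) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        if not self._pending:
            # First key of this batch: schedule a single dispatch. It runs
            # only after the code currently executing yields to the loop.
            self._task = loop.create_task(self._dispatch())
        self._pending.append((key, future))
        return future

    async def _dispatch(self) -> None:
        batch, self._pending = self._pending, []
        results = await self._batch_fn([key for key, _ in batch])
        # Positional mapping: result i is handed to the caller of key i.
        for (_, future), value in zip(batch, results):
            future.set_result(value)


# Hypothetical stand-in for the trend_metrics table: video_id -> view_velocity.
FAKE_TABLE = {"a": 1.5, "b": 9.0}
batches_seen: List[List[str]] = []


async def batch_load(ids: Sequence[str]) -> List[Optional[float]]:
    batches_seen.append(list(ids))
    found = {i: FAKE_TABLE[i] for i in ids if i in FAKE_TABLE}
    # One slot per input id, in input order; None where nothing was found.
    return [found.get(i) for i in ids]


async def main() -> List[Optional[float]]:
    loader = TinyLoader(batch_load)
    return await asyncio.gather(
        loader.load("b"), loader.load("missing"), loader.load("a")
    )


results = asyncio.run(main())
```

Swapping found.get(i) for found[i] would raise KeyError and fail all three loads because of one missing row, which is the failure mode the None slot avoids.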

Because our analytics tables live in SQLite with WAL mode enabled, concurrent reads don't block the writer that ingests fresh view counts every few minutes. The batched IN read is a single fast index scan, and WAL means the cron job populating trend_metrics never stalls the API.
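
For readers who have not used WAL before, the following sketch shows the behaviour being relied on, using only the standard-library sqlite3 module and a throwaway file. The two-column table is a trimmed-down, hypothetical version of trend_metrics, and the file name is made up for the demo.

```python
import os
import sqlite3
import tempfile

# Throwaway database file; WAL is not available for in-memory databases.
db_path = os.path.join(tempfile.mkdtemp(), "discovery_demo.db")

# isolation_level=None gives autocommit, so the explicit BEGIN/COMMIT
# statements below are the only transaction boundaries.
writer = sqlite3.connect(db_path, isolation_level=None)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute(
    "CREATE TABLE trend_metrics (video_id TEXT PRIMARY KEY, view_velocity REAL)"
)

# The ingest job opens a write transaction and has not committed yet.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO trend_metrics VALUES ('vid_1', 812.5)")

# A separate reader is not blocked; it sees the last committed state.
reader = sqlite3.connect(db_path)
rows_during_write = reader.execute(
    "SELECT COUNT(*) FROM trend_metrics"
).fetchone()[0]

writer.execute("COMMIT")
rows_after_commit = reader.execute(
    "SELECT COUNT(*) FROM trend_metrics"
).fetchone()[0]

reader.close()
writer.close()
```

The journal mode is stored in the database file, so it only needs to be set once rather than on every connection.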

Designing the query for discovery, not CRUD

A discovery API is not a CRUD API. The root query should express the questions people actually ask: "what's trending in France right now," "what's climbing fastest in the gaming category," "give me videos similar to this one." Here's the root Query type with a paginated, filterable trending feed.

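from datetime import datetime
from typing import Optional

import strawberry

# Assumed module name: wherever the Region and Video types above live.
from .types import Region, Video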


@strawberry.input
class TrendingFilter:
    region: Optional[Region] = None
    category: Optional[str] = None
    min_velocity: float = 0.0
    published_after: Optional[datetime] = None


@strawberry.type
class VideoEdge:
    cursor: str
    node: Video


@strawberry.type
class VideoConnection:
    edges: list[VideoEdge]
    has_next_page: bool
    total_count: int


@strawberry.type
class Query:
    @strawberry.field
    async def trending(
        self,
        info: strawberry.Info,
        first: int = 20,
        after: Optional[str] = None,
        filters: Optional[TrendingFilter] = None,
    ) -> VideoConnection:
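        first = max(1, min(first, 100))  # never let a client ask for 10k rows
        # Fall back to the edge-resolved region when no filter region is given,
        # including when filters is present but filters.region is None.
        region = (filters.region if filters else None) or info.context["region"]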
        repo = info.context["request"].app.state.video_repo
        return await repo.trending(
            first=first, after=after, region=region, filters=filters
        )

A few decisions are baked in here that I'd defend in review. The first argument (the page size) is clamped server-side to 100. GraphQL gives clients enormous flexibility, and you must put a ceiling on it or someone will request first: 100000 and make your database materialize the whole table. We use cursor-based pagination (after plus an opaque cursor) rather than offset/limit, because the trending feed reorders constantly; offset pagination on a moving dataset skips and duplicates rows. And the region defaults to the edge-resolved context value when the client doesn't override it, so the GDPR-correct regional scope is the path of least resistance rather than something each client has to remember.
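
The repository behind trending is not shown here, so the following is only a sketch of what an opaque cursor and the page-size clamp could look like. It assumes the feed is ordered by view velocity with the video id as a tiebreaker; that ordering, the helper names, and the MAX_PAGE_SIZE constant are all hypothetical.

```python
import base64
import json
from typing import Tuple

MAX_PAGE_SIZE = 100  # matches the server-side ceiling described above


def clamp_first(first: int) -> int:
    """Keep the requested page size within 1..MAX_PAGE_SIZE."""
    return max(1, min(first, MAX_PAGE_SIZE))


def encode_cursor(view_velocity: float, video_id: str) -> str:
    """Pack the sort key of the last row into an opaque, URL-safe token."""
    payload = json.dumps([view_velocity, video_id], separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[float, str]:
    """Recover the sort key; raises ValueError on a malformed token."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    view_velocity, video_id = json.loads(raw)
    return float(view_velocity), str(video_id)


cursor = encode_cursor(812.5, "vid_42")
round_trip = decode_cursor(cursor)
page_size = clamp_first(100000)
```

Because the cursor carries the sort key of the last row rather than a row offset, the next page can be fetched with a "strictly after this key" condition, which stays stable when rows above it move.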

Query depth and cost limits keep you alive

The flip side of GraphQL's flexibility is that a single malicious or careless query can be catastrophically expensive. A client can nest video { related { related { related { ... } } } } and force exponential work. Before you expose a GraphQL endpoint publicly, you need two guards: a maximum query depth and a complexity/cost budget.

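Strawberry handles the first through a schema extension: QueryDepthLimiter rejects queries nested past a threshold. A cost analyzer assigns a weight to each field and rejects queries whose total estimated cost exceeds a budget; as far as I know Strawberry does not bundle one, so it has to be plugged in as a custom validation rule through AddValidationRules, the same hook the snippet below uses to block introspection.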

import strawberry
from graphql import GraphQLError
from graphql.validation import ValidationRule
from strawberry.extensions import AddValidationRules, QueryDepthLimiter


class DisableIntrospectionInProd(ValidationRule):
    # Block schema introspection on the public endpoint.
    def enter_field(self, node, *_args):
        if node.name.value in ("__schema", "__type"):
            self.report_error(
                GraphQLError("Introspection is disabled.", node)
            )


schema = strawberry.Schema(
    query=Query,
    extensions=[
        QueryDepthLimiter(max_depth=8),
        AddValidationRules([DisableIntrospectionInProd]),
    ],
)

Depth 8 is generous for a discovery API — most real queries are two or three levels deep. Anything past eight is either a mistake or an attack, and rejecting it at validation time means it never touches a resolver or the database. We disable introspection on the public endpoint too: there's no reason to hand an attacker a full map of the schema, and our trusted clients ship with a generated, version-pinned schema document anyway. In staging we keep introspection and GraphiQL on, because the developer experience is genuinely excellent.
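
The snippet above covers the depth guard only. For the cost budget, here is a sketch of the arithmetic on a plain nested dict rather than a real GraphQL AST; in practice the same walk would live inside a validation rule. The weights, the budget, and the (page_size, children) encoding of a selection are all made up for illustration.

```python
from typing import Dict

# Hypothetical per-field weights; any field not listed costs 1.
FIELD_WEIGHTS: Dict[str, int] = {"metrics": 5}
COST_BUDGET = 1000


def estimate_cost(selection: dict, parent_rows: int = 1) -> int:
    """Sum field weights, multiplied by how many parent rows resolve them.

    `selection` maps a field name to (page_size, child_selection), where
    page_size is the number of rows the field can return (1 for scalars).
    """
    total = 0
    for name, (page_size, children) in selection.items():
        total += parent_rows * FIELD_WEIGHTS.get(name, 1)
        if children:
            total += estimate_cost(children, parent_rows * page_size)
    return total


# Top 50 trending videos with title and metrics.
feed_query = {"trending": (50, {"title": (1, {}), "metrics": (1, {})})}

# 50 videos, each with 20 related videos, each with 20 more related videos.
nested_query = {
    "trending": (50, {"related": (20, {"related": (20, {"title": (1, {})})})})
}

feed_cost = estimate_cost(feed_query)
nested_cost = estimate_cost(nested_query)
feed_allowed = feed_cost <= COST_BUDGET
nested_allowed = nested_cost <= COST_BUDGET
```

The nested query is only four levels deep, so a depth limit of eight would let it through; the multiplication by page size is what catches it.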

Persisted queries are the next step beyond depth limiting, and worth mentioning even though we treat them as a follow-up rather than a day-one requirement. Instead of accepting arbitrary query strings, the client sends a hash; the server only executes queries from a pre-registered allowlist built at the client's deploy time. That converts your public GraphQL surface from "anyone can ask anything" into "clients can run exactly the queries we shipped," which is a far smaller attack surface and lets the edge cache responses by hash.
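
A minimal sketch of the allowlist lookup, assuming SHA-256 over the exact query text as the hash. PersistedQueryStore and query_hash are hypothetical names, and the wire format for sending the hash is left out.

```python
import hashlib
from typing import Iterable, Optional


def query_hash(query: str) -> str:
    """Hash the exact query text the client was built with."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """Allowlist of query documents, registered at client deploy time."""

    def __init__(self, allowed_queries: Iterable[str]) -> None:
        self._by_hash = {query_hash(q): q for q in allowed_queries}

    def resolve(self, digest: str) -> Optional[str]:
        """Return the registered query for a hash, or None if unknown."""
        return self._by_hash.get(digest)


TRENDING_FEED = "query Feed { trending(first: 20) { edges { node { id title } } } }"
store = PersistedQueryStore([TRENDING_FEED])

known = store.resolve(query_hash(TRENDING_FEED))
unknown = store.resolve(
    query_hash("query { trending(first: 100000) { totalCount } }")
)
```

The server would execute the resolved text when the hash is known and reject the request otherwise, so an unregistered query never reaches validation.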

Caching at the edge

GraphQL has a reputation for being un-cacheable because everything is a POST to a single URL. That reputation is half-true and half-laziness. You absolutely can cache GraphQL, you just have to be deliberate. Our trending feed changes on the order of minutes, not seconds, so a short edge cache is safe and enormously effective.

We run the cache in a Cloudflare Worker sitting in front of the FastAPI origin. The Worker normalizes the query, hashes it together with the resolved region, and caches the JSON response for a short TTL. The snippet below shows the shape of the cache-key derivation. It is written in Go for a clear, typed illustration; the production Worker is the JavaScript equivalent.

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "strings"
)

// cacheKey derives a stable key from the GraphQL query body and the
// edge-resolved region. Whitespace is collapsed so cosmetic
// differences don't fragment the cache.
func cacheKey(query, region string) string {
    normalized := strings.Join(strings.Fields(query), " ")
    raw := region + "\x00" + normalized
    sum := sha256.Sum256([]byte(raw))
    return "gql:" + hex.EncodeToString(sum[:])
}

// cacheable returns true only for read queries. Mutations and any
// query carrying an auth token bypass the cache entirely.
func cacheable(method, query, authHeader string) bool {
    if method != "POST" {
        return false
    }
    if authHeader != "" {
        return false
    }
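    // Conservative substring check: any request that mentions "mutation"
    // anywhere, even inside a string argument, skips the cache.
    return !strings.Contains(query, "mutation")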
}

The two rules that keep this honest: the region is part of the cache key, so a French visitor never sees a cached German feed, and anything carrying an auth header bypasses the cache so personalized or rights-restricted data never lands in a shared edge cache. With a 90-second TTL, the overwhelming majority of anonymous trending requests are served from Cloudflare's edge and never reach Python at all. The origin only does real work when the cache is cold or a query is novel.
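
One caveat on the key derivation: it hashes the query text and the region. If clients pass pagination or filters as GraphQL variables instead of inlining them, the variables need to go into the key as well, otherwise page two could be answered with page one's cached response. A Python sketch of the same derivation extended that way follows; the variables parameter is an assumption about how clients call the API.

```python
import hashlib
import json
from typing import Optional


def cache_key(query: str, variables: Optional[dict], region: str) -> str:
    """Stable key from region, whitespace-normalized query, and variables."""
    normalized = " ".join(query.split())
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same.
    canonical_vars = json.dumps(
        variables or {}, sort_keys=True, separators=(",", ":")
    )
    raw = "\x00".join([region, normalized, canonical_vars])
    return "gql:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


compact = "query { trending { edges { cursor } } }"
pretty = "query {\n  trending { edges { cursor } }\n}"

key_a = cache_key(compact, {"first": 20, "after": "abc"}, "FR")
key_b = cache_key(pretty, {"after": "abc", "first": 20}, "FR")
key_other_region = cache_key(compact, {"first": 20, "after": "abc"}, "DE")
key_next_page = cache_key(compact, {"first": 20, "after": "xyz"}, "FR")
```

Cosmetic differences in whitespace and variable order map to the same key, while a different region or a different cursor produces a different one.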

What GraphQL actually bought us

The migration wasn't free — DataLoaders, depth limits, and cache-key discipline are real work, and a naive GraphQL deployment is genuinely more dangerous than a boring REST endpoint. But six months in, the wins are concrete:

The ?fields= sublanguage is gone. Clients select exactly what they need, and adding a field to the schema doesn't bloat anyone's payload.
One round trip replaces many. The trend dashboard used to make four REST calls (videos, metrics, channel, regional breakdown) and stitch them client-side. It now makes one GraphQL query, and DataLoaders keep the backend query count flat.
The schema is the contract. Strawberry derives it from typed Python, so the API documentation, the client codegen, and the server can't silently disagree.
Regional scoping lives in one place. The edge resolves the region, the context carries it, and the default path is the GDPR-correct one.

If you're sitting on a REST endpoint that has slowly grown its own query language, that's the signal you've outgrown REST for that surface. Start with the read path only — leave writes on REST if you like — put a DataLoader behind every list-of-children field, clamp pagination and depth before you go public, and cache aggressively at the edge with the region in the key. Strawberry and FastAPI make the typed, async core of that pleasant to write; the discipline around limits and caching is what makes it safe to run.