Building a Production Video Metadata Service with Litestar and Async Python

#python #litestar #async #performance

The 9 ms wall PHP can't break

Every viral video we surface at ViralVidVault carries about 14 fields of metadata - view counts, channel handles, duration, age-restriction flags, regional availability, the music license shape that Cloudflare Workers in our edge filter cares about. The PHP 8.4 + SQLite WAL stack serves a category page in 9-12 ms on warm OPcache, which is honestly faster than most of the JavaScript frameworks I keep being told to migrate to. The bottleneck was never SQL. The bottleneck was the metadata pipeline behind it: 30-40 outbound YouTube Data API calls every 90 seconds, plus geo-replicated rechecks for European availability rules, plus a side channel that signs IndexNow pings to Bing and Yandex.

PHP can do this. PHP did do this for ten months. But pcntl_fork and a swoole-shaped event loop made the operational surface area uncomfortable for a one-person ops team. curl_multi with 40 handles per worker melted memory in unpredictable ways under LiteSpeed's process model, and the few times it OOMed it took down the page cache warm-up job with it. I went looking for an async runtime that I could put in front of a thin worker queue, give three CPUs to, and forget about. After a month of evaluation I chose Litestar (the project formerly known as Starlite). This article walks through how the metadata service is built, why Litestar beat FastAPI for our shape of workload, the parts that were genuinely painful, and the parts that turned out to be free.

Why Litestar over FastAPI

I want to be precise about this comparison because it gets dragged into tribal arguments that don't help anyone. FastAPI is excellent. It's also where most of the async Python ecosystem standardized. The reason Litestar fit better for a video metadata service has nothing to do with raw benchmarks - both are within noise of each other for a single endpoint - and everything to do with three concrete architectural choices.

First, Litestar's dependency injection treats lifecycle scopes as first-class. I have an aiosqlite connection pool that wants to live for the lifetime of the app, a per-request unit-of-work, and a per-handler YouTube API quota tracker. In FastAPI I ended up writing helper functions or context managers to wire these together. In Litestar the DI container handles app, request, and connection scopes natively, and they nest correctly when a route depends on a service that depends on a pool.

Second, Litestar's structured concurrency around before_send and after_response hooks is more predictable than FastAPI's middleware ordering. When you're flushing telemetry events to a sidecar after the response is sent, you don't want to wonder which middleware layer is going to swallow your exception. The lifecycle is documented as a sequence of well-known points in the ASGI flow, and once you internalize it you can reason about exactly when your code will run.

Third, Litestar has built-in support for msgspec, which serializes our 14-field metadata struct around 5x faster than pydantic v2 in our internal benchmarks. For a service whose hot path is "deserialize JSON from YouTube, transform, serialize to our internal protocol," that matters. We measured 38% lower CPU on a synthetic 50k req/min load test purely from the serializer change.

The migration cost was about three days of code changes plus a week of running both stacks side by side behind a load balancer split. I did not have to retrain my brain. If you know ASGI, you know Litestar.

Designing the metadata service boundary

The metadata service is a separate Python process from the PHP webapp. They talk over a Unix domain socket with msgpack-encoded requests. This boundary is the most important design decision in the whole project because it means the PHP side stays synchronous, simple, and cache-friendly, while the async machinery is sealed inside a single binary that I can restart without touching the website. The blast radius of a Python bug stops at the socket, and the blast radius of a LiteSpeed worker recycle stops before it touches my YouTube quota state.

Here is the core handler that the PHP layer hits. It batches a request for multiple video IDs and returns enriched metadata, hitting our SQLite read replica first and falling back to YouTube only for stale or missing rows:

from __future__ import annotations
import asyncio
from typing import Annotated
import msgspec
from litestar import Litestar, post
from litestar.di import Provide
from litestar.params import Body

from app.deps import get_pool, get_youtube_client, get_quota_tracker
from app.models import VideoMetadata, MetadataBatchRequest

class MetadataBatchResponse(msgspec.Struct):
    fresh: list[VideoMetadata]
    stale_refreshed: list[VideoMetadata]
    quota_remaining: int

@post(
    "/v1/metadata/batch",
    dependencies={
        "pool": Provide(get_pool, sync_to_thread=False),
        "yt": Provide(get_youtube_client, sync_to_thread=False),
        "quota": Provide(get_quota_tracker, sync_to_thread=False),
    },
)
async def batch_metadata(
    data: Annotated[MetadataBatchRequest, Body()],
    pool,
    yt,
    quota,
) -> MetadataBatchResponse:
    rows = await pool.fetch_many(data.video_ids, max_age_seconds=3600)
    fresh = [r for r in rows if not r.is_stale]
    stale_ids = [r.video_id for r in rows if r.is_stale]
    known = {r.video_id for r in rows}
    missing_ids = [vid for vid in data.video_ids if vid not in known]

    to_fetch = stale_ids + missing_ids
    refreshed: list[VideoMetadata] = []

    if to_fetch:
        async with quota.reserve(units=len(to_fetch)) as reservation:
            if reservation.granted:
                chunks = [to_fetch[i:i + 50] for i in range(0, len(to_fetch), 50)]
                results = await asyncio.gather(
                    *(yt.fetch_chunk(c) for c in chunks),
                    return_exceptions=True,
                )
                for result in results:
                    if isinstance(result, Exception):
                        continue
                    refreshed.extend(result)
                if refreshed:
                    await pool.upsert_many(refreshed)

    return MetadataBatchResponse(
        fresh=fresh,
        stale_refreshed=refreshed,
        quota_remaining=quota.remaining(),
    )

app = Litestar(route_handlers=[batch_metadata])

Three things worth pointing out. The quota.reserve async context manager is a critical correctness detail - YouTube's Data API quota is global per project, so two coroutines racing to spend the last 50 units will get one HTTP 403 each unless something coordinates them. We learned this the hard way in week two of production. Second, return_exceptions=True on the gather is intentional - one YouTube chunk failing should not poison the whole batch, the dead chunk just stays stale until next request. Third, we never await pool.upsert_many inside the quota reservation, because writing to SQLite WAL should not block the API call slot.

Async all the way down with httpx and aiosqlite

The pool implementation is where Python's async story gets opinionated. aiosqlite is a thin wrapper around the synchronous sqlite3 module that runs queries in a background thread, which sounds wrong but is actually correct: SQLite I/O is already serialized at the file level under WAL mode, and the wrapper gives us the await syntax we need to compose with httpx without blocking the event loop.

Our pool is hand-rolled because the off-the-shelf options either don't expose WAL pragmas or insist on connection-per-request semantics that are wasteful for a long-lived process. Here's the meaningful part:

import asyncio
import aiosqlite
import msgspec
from contextlib import asynccontextmanager
from typing import AsyncIterator

class SQLitePool:
    def __init__(self, path: str, size: int = 4):
        self._path = path
        self._size = size
        self._queue: asyncio.Queue[aiosqlite.Connection] = asyncio.Queue(maxsize=size)
        self._initialized = False

    async def init(self) -> None:
        for _ in range(self._size):
            conn = await aiosqlite.connect(self._path, isolation_level=None)
            await conn.execute("PRAGMA journal_mode=WAL")
            await conn.execute("PRAGMA synchronous=NORMAL")
            await conn.execute("PRAGMA temp_store=MEMORY")
            await conn.execute("PRAGMA mmap_size=268435456")
            await conn.execute("PRAGMA busy_timeout=5000")
            conn.row_factory = aiosqlite.Row
            await self._queue.put(conn)
        self._initialized = True

    @asynccontextmanager
    async def acquire(self) -> AsyncIterator[aiosqlite.Connection]:
        conn = await self._queue.get()
        try:
            yield conn
        finally:
            await self._queue.put(conn)

    async def fetch_many(self, video_ids: list[str], max_age_seconds: int):
        placeholders = ",".join("?" * len(video_ids))
        sql = (
            f"SELECT video_id, payload, fetched_at, "
            f"(strftime('%s','now') - fetched_at) AS age "
            f"FROM video_metadata WHERE video_id IN ({placeholders})"
        )
        async with self.acquire() as conn:
            cursor = await conn.execute(sql, video_ids)
            rows = await cursor.fetchall()
            await cursor.close()
        return [_row_to_metadata(r, max_age_seconds) for r in rows]

    async def upsert_many(self, items) -> None:
        sql = (
            "INSERT INTO video_metadata (video_id, payload, fetched_at) "
            "VALUES (?, ?, strftime('%s','now')) "
            "ON CONFLICT(video_id) DO UPDATE SET "
            "payload=excluded.payload, fetched_at=excluded.fetched_at"
        )
        async with self.acquire() as conn:
            await conn.execute("BEGIN IMMEDIATE")
            try:
                await conn.executemany(
                    sql,
                    [(i.video_id, msgspec.json.encode(i)) for i in items],
                )
                await conn.execute("COMMIT")
            except Exception:
                await conn.execute("ROLLBACK")
                raise

A few pragmas earn their keep here. The 256 MB mmap and synchronous=NORMAL settings are not safe for financial workloads, but for a viral-video metadata cache where the worst-case loss is "we refetch from YouTube," they are exactly right. They cut p99 SQLite latency from 14 ms to under 2 ms. busy_timeout=5000 is a cheap insurance policy against the rare contention spike when the cron-driven backfill job runs at the same time as a burst of organic traffic.

The other choice that matters is isolation_level=None on the aiosqlite connection. By default, Python's sqlite3 module wraps DML statements in implicit transactions and commits on the next non-DML statement, which interacts badly with explicit BEGIN IMMEDIATE. Setting it to None gives us autocommit semantics, and we manage transactions explicitly where we need them.

Caching layers and Cloudflare Workers coordination

The metadata service does not directly serve traffic to browsers. Three layers sit in front of it. Cloudflare Workers handle the edge, LiteSpeed page cache holds the rendered HTML for category and watch pages, and the PHP layer has its own data cache for sub-second hits. Each layer has a different invalidation story, and the metadata service is the source of truth that the Workers layer consults when it needs to decide whether a cached entry should be revalidated.

The Workers integration is small but load-bearing. When a viral video first appears on a trending list, the Worker fetches metadata directly from our service (bypassing the PHP layer) so it can attach geo-availability hints to the cache key. That avoids the situation where a French viewer and a German viewer get the same cached response for a video that is blocked in Germany. The Worker code is straightforward:

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const videoId = url.pathname.split('/').pop();
    const country = request.cf?.country || 'XX';

    const cacheKey = new Request(
      `https://cache.viralvidvault.com/v/${videoId}/${country}`,
      request,
    );
    const cache = caches.default;

    let response = await cache.match(cacheKey);
    if (response) return response;

    const metaResp = await fetch(
      `${env.METADATA_ORIGIN}/v1/metadata/batch`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/msgpack' },
        body: encodeMsgpack({ video_ids: [videoId] }),
      },
    );

    if (!metaResp.ok) {
      return new Response('upstream error', { status: 502 });
    }

    const meta = await decodeMsgpack(metaResp);
    const item = meta.fresh[0] || meta.stale_refreshed[0];

    if (!item || (item.blocked_countries || []).includes(country)) {
      return new Response('not available', { status: 451 });
    }

    response = new Response(JSON.stringify(item), {
      headers: {
        'Content-Type': 'application/json',
        'Cache-Control': `public, max-age=${item.ttl}, stale-while-revalidate=600`,
        'X-Vault-Edge-Country': country,
      },
    });

    await cache.put(cacheKey, response.clone());
    return response;
  },
};

The stale-while-revalidate is what lets us treat the metadata service as a slow-but-correct dependency without hurting tail latency. Workers serve the stale entry instantly and refresh in the background. The metadata service's job is to make sure the background refresh has fresh upstream data, which is exactly what Litestar's async batch handler provides.

On the PHP side, the integration is even simpler - we call the metadata service over a Unix socket using a thin wrapper:

<?php
declare(strict_types=1);

final class MetadataClient
{
    private const SOCKET_PATH = '/run/metadata.sock';
    private const TIMEOUT_SECONDS = 2;

    public function __construct(private readonly LoggerInterface $logger) {}

    /**
     * @param list<string> $videoIds
     * @return array<string, array<string, mixed>>
     */
    public function fetchBatch(array $videoIds): array
    {
        if ($videoIds === []) {
            return [];
        }

        $sock = stream_socket_client(
            'unix://' . self::SOCKET_PATH,
            $errno,
            $errstr,
            self::TIMEOUT_SECONDS,
        );

        if ($sock === false) {
            $this->logger->warning('metadata.sock unreachable', [
                'errno' => $errno,
                'errstr' => $errstr,
            ]);
            return [];
        }

        stream_set_timeout($sock, self::TIMEOUT_SECONDS);

        $payload = msgpack_pack(['video_ids' => array_values($videoIds)]);
        fwrite($sock, pack('N', strlen($payload)) . $payload);

        $header = fread($sock, 4);
        if ($header === false || strlen($header) !== 4) {
            fclose($sock);
            return [];
        }
        $len = unpack('N', $header)[1];
        $body = '';
        while (strlen($body) < $len) {
            $chunk = fread($sock, $len - strlen($body));
            if ($chunk === false || $chunk === '') break;
            $body .= $chunk;
        }
        fclose($sock);

        $decoded = msgpack_unpack($body);
        $out = [];
        foreach (($decoded['fresh'] ?? []) as $item) {
            $out[$item['video_id']] = $item;
        }
        foreach (($decoded['stale_refreshed'] ?? []) as $item) {
            $out[$item['video_id']] = $item;
        }
        return $out;
    }
}

This client is what every PHP controller talks to when it needs metadata that is not already in the LiteSpeed page cache. The 2-second timeout is generous - 95% of calls finish in under 8 ms because the metadata service responds from its in-memory queue or warm SQLite pages. The remaining 5% are first-time-seen video IDs that trigger an actual YouTube call, and the timeout cushion handles that without taking down the PHP request.

GDPR-compliant telemetry without sacrificing observability

Running a service that touches European visitors means I cannot ship raw request payloads to a third-party APM. I can ship aggregated metrics, latency histograms, error rates - but anything that identifies a viewer needs to be either anonymized at the edge or stay on my own hardware. Litestar's lifecycle hooks made this surprisingly clean to enforce.

The shape that works for us:

Every request gets a correlation_id (a random 96-bit value with no user-derived entropy).
The after_response hook ships an event to a local in-memory ring buffer.
A separate background task flushes 1-minute aggregates to a self-hosted VictoriaMetrics instance.
No IPs, no user agents, no referer headers leave the box. Just request counts, p50/p95/p99 latency, quota burn rate, and error class.

import secrets
import time
from collections import deque
from litestar import Request
from litestar.types import Scope

_event_buffer: deque = deque(maxlen=10000)

async def before_request(scope: Scope) -> None:
    scope["state"]["correlation_id"] = secrets.token_hex(12)
    scope["state"]["start_ns"] = time.monotonic_ns()

async def after_response(request: Request) -> None:
    elapsed_ms = (time.monotonic_ns() - request.scope["state"]["start_ns"]) / 1_000_000
    _event_buffer.append({
        "ts": int(time.time()),
        "route": request.scope["route_handler"].handler_name,
        "status": request.scope.get("response_status", 0),
        "elapsed_ms": elapsed_ms,
        "cid": request.scope["state"]["correlation_id"],
    })

The cid is the only thing that could theoretically be correlated to a request, and it lives in memory for at most 60 seconds before the aggregation task collapses it into bucket counts. We use it during incident investigation - a 502 from a Worker carries the cid in a response header, and we can grep the in-memory buffer for it before it ages out. After that, the only trace is anonymous histograms.

This pattern would have been more invasive in FastAPI, where you typically use middleware classes that have to negotiate ordering with the auth and CORS middlewares. Litestar's lifecycle hooks happen at well-defined points in the ASGI flow and do not fight with anything else.

Deployment, observability, and what I would do differently

The metadata service runs as a single systemd unit with three uvicorn workers behind a Unix socket. uvicorn binds the socket, systemd restricts its file mode to 0660, and the LiteSpeed PHP process runs as a group member that can read and write it. There is no TCP socket exposed anywhere, which removes a category of security concerns and makes the firewall config trivially small. Restarts use systemd's ExecReload with a SIGHUP to uvicorn's master, which drains in-flight requests before swapping workers - the PHP client sees at most a single 2-second timeout during a deploy, and the Cloudflare Worker's stale-while-revalidate covers that gap from the viewer's perspective.

For observability beyond the GDPR-clean histograms, I rely on three things:

Structured JSON logs to stdout, which systemd-journald captures and rotates.
A /healthz endpoint that checks SQLite pool depth and last successful YouTube call timestamp.
A manual metadata-cli command (a tiny click-based binary in the same venv) that I run from SSH when something feels wrong.

The CLI dumps the in-memory buffer, the quota tracker state, and the connection pool stats. Nine times out of ten that is enough to diagnose any operational anomaly without needing to attach a debugger or ship logs anywhere else.

If I were starting over today I would do three things differently. I would put the YouTube quota tracker in Redis from day one instead of in-process, because the day we needed two metadata-service instances for failover we had to retrofit a distributed quota implementation in a hurry. I would use a proper migration tool from the start instead of hand-rolled ALTER TABLE calls - I picked yoyo-migrations eventually and it works fine, but the first month of schema drift between dev and prod was unpleasant. And I would write the msgpack client in PHP as a phpext, because the userland implementation we use loses about 0.4 ms per call to PHP loop overhead, and at our request volumes that adds up to noticeable CPU on the LiteSpeed workers.

The Litestar choice itself has aged extremely well. We have not had a single framework-level incident in eleven months of production. The maintainer community is responsive, the docs are honest about what is and is not supported, and the upgrade path from 2.x has been smooth. For anyone building a service-boundary process in Python that has to coordinate async I/O across two or three different upstream APIs, I would unhesitatingly recommend it over the alternatives I evaluated. The PHP + SQLite + LiteSpeed + Cloudflare front end remains beautifully boring, and the boring part of the stack is exactly where I want my edge.