Using SurrealDB Graph Queries for Video Recommendation at Scale

#surrealdb #graphdb #php #backend

Last quarter I hit a wall I had been avoiding for two years. The recommendation layer on TrendVidStream — a multi-region streaming discovery site running on PHP 8.4, SQLite FTS5, and a fleet of cron jobs shipping content over FTP across eight regions (US, GB, JP, KR, TW, SG, VN, TH, HK) — was producing recommendations that were technically correct and emotionally dead. A user who watched a Korean indie thriller would be served Korean indie thrillers. A user who watched three K-pop reaction videos would be drowned in K-pop reaction videos. The relevance metric looked great in the dashboard. The session length was collapsing.

The problem was structural. SQLite FTS5 is brilliant for text search and acceptable for tag overlap, but the moment you ask a question like what do people who watched this also watch, weighted by region affinity, decayed over the last 14 days, and filtered through creators they have not yet seen, you are no longer doing search. You are doing graph traversal. I had been simulating graph traversal with three nested subqueries and a materialized co-occurrence table that we rebuilt nightly. It worked. It also took 380ms per recommendation on the warm path and 2.1 seconds on the cold path. For a discovery site where the first page must render under 200ms, that is unacceptable.

This article is about what happened when I moved the recommendation layer to SurrealDB and used its graph query syntax to express the traversals natively. I am not going to pitch SurrealDB as a replacement for your primary database — SQLite still owns the content catalog and the FTS index on TrendVidStream. But for the recommendation graph specifically, the impact was large enough that I want to write it down while it is fresh.

Why a graph database, and why SurrealDB specifically

The recommendation problem on a video discovery site decomposes into a handful of relationships: users watched videos, videos belong to categories, videos were uploaded by creators, videos are popular in regions, and videos are similar to other videos via embedding distance. Every one of those is an edge. Every meaningful recommendation query is a traversal over two or three of those edges with weights attached.

You can model this in a relational store. I did, for two years. The cost is that every query becomes a join pyramid, and every join pyramid forces the planner to make choices that are correct on average and wrong on the long tail. The cold-path 2.1-second query was the planner picking the wrong join order for a user with an unusual viewing history.

I evaluated three options before settling. Neo4j is the obvious choice but the licensing and operational footprint were heavier than I wanted for a project that runs on shared LiteSpeed hosting. Dgraph has a beautiful query language but the community has thinned out and I did not want to bet on it. SurrealDB sat in a sweet spot: it speaks a SQL-adjacent dialect that my brain already understands, it supports graph edges as first-class citizens with the -> and <- traversal operators, it ships as a single binary, and it has a PHP client that does not feel like an afterthought.

The specific feature that sold me was the RELATE statement. In SurrealDB you do not model edges as a separate table you have to remember to join. You write RELATE user:alice -> watched -> video:abc123 SET weight = 0.8, region = 'JP', at = time::now() and the edge exists as a queryable, traversable, attribute-bearing thing. The query language then lets you walk it: SELECT ->watched->video->similar_to->video AS recs FROM user:alice. That single line replaces a 40-line query in my old setup.

Modeling the graph

Before writing any traversal queries I had to decide what the graph actually looked like. The temptation when you first touch a graph database is to model everything as edges. Resist it. Edges are expensive to write and only worth it when you will actually traverse them. Attributes that you only ever read alongside the node should stay on the node.

Here is the schema I settled on, expressed as the bootstrap script I run when standing up a fresh SurrealDB instance for a new region:

import asyncio
from surrealdb import Surreal

SCHEMA = """
DEFINE TABLE video SCHEMAFULL;
DEFINE FIELD title ON video TYPE string;
DEFINE FIELD duration_sec ON video TYPE int;
DEFINE FIELD region ON video TYPE string ASSERT $value IN ['US','GB','JP','KR','TW','SG','VN','TH','HK'];
DEFINE FIELD published_at ON video TYPE datetime;
DEFINE FIELD view_count ON video TYPE int DEFAULT 0;
DEFINE INDEX video_region ON video FIELDS region;
DEFINE INDEX video_published ON video FIELDS published_at;

DEFINE TABLE creator SCHEMAFULL;
DEFINE FIELD handle ON creator TYPE string;
DEFINE FIELD primary_region ON creator TYPE string;

DEFINE TABLE category SCHEMAFULL;
DEFINE FIELD slug ON category TYPE string;

DEFINE TABLE user SCHEMAFULL;
DEFINE FIELD anon_id ON user TYPE string;
DEFINE FIELD home_region ON user TYPE string;
DEFINE FIELD created_at ON user TYPE datetime DEFAULT time::now();

DEFINE TABLE watched SCHEMAFULL TYPE RELATION FROM user TO video;
DEFINE FIELD weight ON watched TYPE float DEFAULT 1.0;
DEFINE FIELD region ON watched TYPE string;
DEFINE FIELD at ON watched TYPE datetime DEFAULT time::now();

DEFINE TABLE uploaded_by SCHEMAFULL TYPE RELATION FROM video TO creator;
DEFINE TABLE in_category SCHEMAFULL TYPE RELATION FROM video TO category;
DEFINE TABLE similar_to SCHEMAFULL TYPE RELATION FROM video TO video;
DEFINE FIELD score ON similar_to TYPE float;
\"""

async def bootstrap(region: str):
    async with Surreal(f"ws://surreal-{region}.internal:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "redacted"})
        await db.use("trendvid", region)
        for stmt in SCHEMA.strip().split(";"):
            stmt = stmt.strip()
            if stmt:
                await db.query(stmt + ";")
        print(f"bootstrapped {region}")

if __name__ == "__main__":
    for r in ["US","GB","JP","KR","TW","SG","VN","TH","HK"]:
        asyncio.run(bootstrap(r))

Two things worth flagging. First, I run a separate SurrealDB namespace per region. The recommendation graph for Japan is genuinely different from the graph for Germany — different creators, different category weights, different viewing patterns. Cramming them into one database meant every query needed a WHERE region = ? filter and the planner had to work harder. Separate databases per region mean smaller indexes and faster traversals. The trade-off is that cross-region recommendations require a federated query at the application layer, but those are rare enough on a discovery site that I do not mind.

Second, the similar_to edge is precomputed offline from video embeddings. I do not generate embeddings inside SurrealDB. I use a nightly job that pulls the latest video corpus, runs it through a sentence-transformer model, computes cosine similarity, and writes the top 50 neighbors per video back as edges. SurrealDB is the wrong place to do vector math at scale. It is the right place to traverse the results.

The traversal that changed everything

Here is the core recommendation query, the one that replaced 40 lines of nested SQL and a materialized table:

SELECT
  ->watched->video->similar_to->video AS direct_similar,
  ->watched->video<-watched<-user->watched->video AS collaborative,
  ->watched->video->in_category->category<-in_category<-video AS category_neighbors
FROM user:⟨anon_id⟩
FETCH direct_similar, collaborative, category_neighbors;

Read it left to right. Starting from the user node, walk the watched edge out to videos. From those videos, walk three different paths in parallel: direct similarity edges, other users who watched the same videos and what else they watched, and the categories of those videos and other videos in the same categories. The FETCH clause pulls the full video records back instead of just IDs.

This is a single query. It returns three lists of candidate videos. The application layer then merges, dedupes, scores, and ranks them. On warm cache the query returns in 18-30ms for users with under 200 watched videos, and 60-90ms for heavy users with 2000+ watched videos. On cold cache the worst I have seen is 240ms, and that was for a user whose watch history spans all eight regions.

The scoring layer is in PHP because that is where the rest of TrendVidStream lives. It looks like this:

<?php
declare(strict_types=1);

final class RecommendationScorer
{
    private const WEIGHT_DIRECT = 0.55;
    private const WEIGHT_COLLAB = 0.30;
    private const WEIGHT_CATEGORY = 0.15;
    private const REGION_BONUS = 0.20;
    private const DECAY_HALFLIFE_DAYS = 14.0;

    public function __construct(
        private readonly string $userRegion,
        private readonly array $alreadyWatched,
    ) {}

    public function score(array $surrealResult): array
    {
        $candidates = [];
        $this->absorb($candidates, $surrealResult['direct_similar'] ?? [], self::WEIGHT_DIRECT);
        $this->absorb($candidates, $surrealResult['collaborative'] ?? [], self::WEIGHT_COLLAB);
        $this->absorb($candidates, $surrealResult['category_neighbors'] ?? [], self::WEIGHT_CATEGORY);

        foreach ($candidates as $id => &$entry) {
            if (isset($this->alreadyWatched[$id])) {
                unset($candidates[$id]);
                continue;
            }
            if (($entry['video']['region'] ?? null) === $this->userRegion) {
                $entry['score'] += self::REGION_BONUS;
            }
            $ageDays = $this->ageInDays($entry['video']['published_at'] ?? null);
            if ($ageDays !== null) {
                $entry['score'] *= 2 ** (-$ageDays / self::DECAY_HALFLIFE_DAYS);
            }
        }

        uasort($candidates, fn($a, $b) => $b['score'] <=> $a['score']);
        return array_slice($candidates, 0, 24, preserve_keys: true);
    }

    private function absorb(array &$bucket, array $videos, float $weight): void
    {
        foreach ($videos as $video) {
            $id = $video['id'] ?? null;
            if ($id === null) continue;
            if (!isset($bucket[$id])) {
                $bucket[$id] = ['video' => $video, 'score' => 0.0];
            }
            $bucket[$id]['score'] += $weight;
        }
    }

    private function ageInDays(?string $iso): ?float
    {
        if ($iso === null) return null;
        $then = strtotime($iso);
        if ($then === false) return null;
        return max(0.0, (time() - $then) / 86400.0);
    }
}

The scorer is intentionally boring. All the cleverness happens in the graph traversal. The PHP layer just merges the three buckets with fixed weights, applies a region bonus when the video matches the user's home region, and decays by published age with a 14-day half-life. I tried fancier scoring — logistic regression on click-through, learned weights per region — and the difference in session length was inside the noise. A boring scorer over a smart graph beats a smart scorer over a flat table.

Backfilling the watch edges from existing data

The migration was the part I was most nervous about. We had two years of watch events in SQLite, roughly 180 million rows. Naive insertion into SurrealDB at 1000 edges per second would take two and a half days. I needed it to take under six hours so I could do it in a single overnight maintenance window.

The trick was batching RELATE statements and shipping them as a single multi-statement query per network round-trip. SurrealDB happily executes 500 statements per call without complaint, and the bottleneck moves from the network to the disk. Here is the backfill worker I wrote:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "strings"
    "sync"
    "time"

    _ "github.com/mattn/go-sqlite3"
    "github.com/surrealdb/surrealdb.go"
)

const batchSize = 500
const workerCount = 8

type watchRow struct {
    UserID  string
    VideoID string
    Region  string
    Weight  float64
    At      time.Time
}

func main() {
    sqliteDB, err := sql.Open("sqlite3", "file:catalog.db?mode=ro&_journal=WAL")
    if err != nil {
        log.Fatal(err)
    }
    defer sqliteDB.Close()

    jobs := make(chan []watchRow, workerCount*2)
    var wg sync.WaitGroup

    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            sdb, err := surrealdb.New("ws://surreal.internal:8000/rpc")
            if err != nil {
                log.Fatalf("worker %d: %v", id, err)
            }
            if _, err := sdb.Signin(map[string]any{"user": "root", "pass": "redacted"}); err != nil {
                log.Fatal(err)
            }
            sdb.Use("trendvid", "global")
            for batch := range jobs {
                if err := flush(sdb, batch); err != nil {
                    log.Printf("worker %d batch error: %v", id, err)
                }
            }
        }(i)
    }

    rows, err := sqliteDB.QueryContext(context.Background(),
        `SELECT user_anon_id, video_id, region, weight, watched_at FROM watch_events ORDER BY watched_at`)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    batch := make([]watchRow, 0, batchSize)
    count := 0
    start := time.Now()
    for rows.Next() {
        var r watchRow
        if err := rows.Scan(&r.UserID, &r.VideoID, &r.Region, &r.Weight, &r.At); err != nil {
            log.Fatal(err)
        }
        batch = append(batch, r)
        if len(batch) >= batchSize {
            jobs <- batch
            batch = make([]watchRow, 0, batchSize)
        }
        count++
        if count%100000 == 0 {
            rate := float64(count) / time.Since(start).Seconds()
            log.Printf("%d rows shipped, %.0f/sec", count, rate)
        }
    }
    if len(batch) > 0 {
        jobs <- batch
    }
    close(jobs)
    wg.Wait()
    log.Printf("done: %d rows in %s", count, time.Since(start))
}

func flush(sdb *surrealdb.DB, batch []watchRow) error {
    var sb strings.Builder
    for _, r := range batch {
        fmt.Fprintf(&sb,
            "RELATE user:⟨%s⟩ -> watched -> video:⟨%s⟩ SET weight=%f, region='%s', at='%s';\n",
            r.UserID, r.VideoID, r.Weight, r.Region, r.At.UTC().Format(time.RFC3339))
    }
    _, err := sdb.Query(sb.String(), nil)
    return err
}

Eight workers, 500 statements per batch, sustained throughput of about 14,000 edges per second on a single SurrealDB instance with NVMe storage. The full 180 million row backfill took four hours and eleven minutes. The string interpolation for IDs uses the SurrealDB record bracket syntax ⟨...⟩ because some of our user IDs contain colons and dashes that would otherwise need escaping.

I know what you are thinking and you are right: string interpolation into a query is a classic injection foot-gun. In production this code runs against IDs that have already been validated as UUIDs and SHA1 hashes upstream. If you are adapting this for fields that come from user input directly, use the parameterized form: RELATE $user -> watched -> $video SET ... with a separate parameters map. I went with interpolation in the backfill specifically because the IDs are trusted and the throughput gain from skipping parameter binding on 180M rows was substantial.

What I would do differently

Three things, in order of how much pain they caused.

First, I should have set up the per-region namespaces from day one instead of starting with a single global database and splitting later. The split took a weekend and a careful re-export. If you are starting fresh and you know your data has regional structure, model it in the database topology, not just in a region field.

Second, I underestimated how aggressively the similar_to edges needed to be pruned. My first version stored the top 200 neighbors per video. The graph ballooned, the traversals slowed down, and the recommendations got worse because the long tail of weak similarities drowned out the strong ones. Cutting to top 50 made the graph 75% smaller and the recommendations noticeably better. More edges is not more signal.

Third, I built a sophisticated cache invalidation system for the recommendation results and then realized it was pointless. The graph changes constantly — every watch event is a new edge — but the recommendations for a given user only need to update every few minutes for the experience to feel responsive. A flat 300-second TTL on the application-side cache is fine. The fancy invalidation logic is now deleted and I do not miss it.

Conclusion

The move to SurrealDB cut warm-path recommendation latency from 380ms to 25ms and cold-path from 2.1s to 240ms. More importantly, the queries are now expressions I can read in one breath, which means I can change them. The old SQL version was a fortress I was afraid to touch. The graph version invites experimentation: I have shipped four new recommendation strategies in the last two months because trying a new traversal takes an hour instead of a day.

SurrealDB is not the right tool for every problem. It is not where you put your source of truth if your source of truth is already happy in Postgres or SQLite. But for the recommendation graph specifically — the bit of your application that is genuinely about relationships between things rather than facts about things — moving it to a database that treats relationships as first-class is one of the highest leverage changes I have made to TrendVidStream in years.