ahmet gedik

Posted on May 27

Building Multi-Hop Video Recommendations With SurrealDB Graph Queries

#surrealdb #php #database #performance

For about a year, the recommendation slot on every watch page at TopVideoHub was powered by a SQLite query I'm not proud of. Six joins, two correlated subqueries, a GROUP BY over co_watch counts, and a final ORDER BY mixing recency decay with region affinity. When the page cache was warm it didn't matter — LiteSpeed shipped the gzipped HTML in 12 ms. When it was cold and we had a trending video from Japan pulling in 60 concurrent first-time viewers across HK, TW, and SG, the same query took 380–700 ms and stacked PHP-FPM workers behind a single SQLite write lock.

The recommendation problem itself isn't unusual. Given a video the user is currently watching, find the next five videos most likely to keep them on the site, weighted by language, region, and (in our case) Asia-Pacific category overlap. What is unusual is doing it in a stack like ours — PHP 8.4 on LiteSpeed, SQLite with FTS5 and a CJK tokenizer for search, Cloudflare in front. We've intentionally avoided MySQL and Redis to keep the deploy surface small. The trade-off is that any time the access pattern wants a graph, you end up writing self-joins.

I spent two weekends moving the recommendation layer to SurrealDB, kept everything else on SQLite, and the p95 dropped from 410 ms to 28 ms. This post is the version of that work I wish I'd had when I started — with the SurrealQL queries that actually shipped, the failure modes that took a day to debug, and the PHP integration code that talks to SurrealDB over its HTTP endpoint.

Why my SQLite recommender hit a wall

Before the graph rewrite, the schema looked roughly like this:

videos(id, title, channel_id, region, language, category_id, published_at, view_count)
watch_sessions(session_hash, video_id, watched_at, region) — append-only, ~4M rows
channels(id, name, region, language)
categories(id, slug, parent_id)

To get "viewers who watched X also watched Y," I was doing a self-join on watch_sessions keyed by session_hash, then aggregating. SQLite handled it, but I had to keep adding indexes that bloated the file, and any query that wanted "videos watched by people who watched videos by channels in the same language family as the current channel" became a four-level CTE that the planner sometimes refused to optimize. SQLite's planner is good, but it's optimized for relational scans, not transitive closures.

The real breaking point was when I tried to add multi-hop traversal — "viewers who watched videos by channels similar to this channel." Two hops in, and the join exploded. I added a materialized co_watch_pairs table refreshed by cron, and now I had a stale-data problem on top of a slow-query problem.

What SurrealQL graph syntax actually buys you

SurrealDB is a multi-model database written in Rust. The pitch is that it's document, relational, and graph in one engine, with a query language (SurrealQL) that borrows SQL syntax but adds graph traversal operators -> and <-. You can run it embedded (RocksDB or SurrealKV backend) or as a server. I run it as a server on the same VM as PHP-FPM, talking over the HTTP /sql endpoint with bearer auth.

The operator that mattered for me is the arrow. SELECT ->watched->video AS next FROM session:abc123 reads as: from the session record, follow watched edges outward, and return the video records they point to. You can chain it: ->watched->video<-watched<-session->watched->video walks "videos watched by people who watched what I watched." In SQL that's three self-joins and a GROUP BY. In SurrealQL it's one expression and the planner handles deduplication.

A few things to internalize before writing any code:

Edges are first-class records. When you RELATE session:x->watched->video:y, you create an actual row in a watched table. You can attach properties to it (timestamp, watch duration, region) and filter on them like any other table.
IDs are typed. video:tt_5fJk2 is a record ID where video is the table and tt_5fJk2 is the key. ULIDs, UUIDs, or arbitrary strings all work.
SurrealQL is transactional per statement. No BEGIN / COMMIT for single queries; multi-statement transactions exist if you need them.
The HTTP /sql endpoint returns an array of results, one per statement. This trips up everyone the first time.

Modeling video watches as graph edges

The schema I shipped is intentionally small. SurrealDB lets you go schemaless, but I wanted enough structure to catch typos in ingestion. Here's the definition that runs in migrations/001_init.surql:

DEFINE TABLE video SCHEMAFULL;
DEFINE FIELD title     ON video TYPE string;
DEFINE FIELD region    ON video TYPE string ASSERT $value IN ['US','JP','KR','HK','TW','SG','VN','TH','CN','GB'];
DEFINE FIELD language  ON video TYPE string;
DEFINE FIELD category  ON video TYPE string;
DEFINE FIELD channel   ON video TYPE record<channel>;
DEFINE FIELD published ON video TYPE datetime;
DEFINE FIELD views     ON video TYPE int DEFAULT 0;
DEFINE INDEX video_region_lang ON video FIELDS region, language;

DEFINE TABLE channel SCHEMAFULL;
DEFINE FIELD name     ON channel TYPE string;
DEFINE FIELD region   ON channel TYPE string;
DEFINE FIELD language ON channel TYPE string;

DEFINE TABLE session SCHEMAFULL;
DEFINE FIELD region     ON session TYPE string;
DEFINE FIELD first_seen ON session TYPE datetime DEFAULT time::now();

DEFINE TABLE watched SCHEMAFULL TYPE RELATION FROM session TO video;
DEFINE FIELD at         ON watched TYPE datetime DEFAULT time::now();
DEFINE FIELD duration_s ON watched TYPE int DEFAULT 0;
DEFINE FIELD region     ON watched TYPE string;
DEFINE INDEX watched_at_idx ON watched FIELDS at;

A few choices I made and almost regretted:

I originally let region be a free string. Two weeks in, I had JP, jp, and Japan in production from three different ingestion paths. The ASSERT $value IN [...] clause is cheap and catches this at write time.
The watched table is defined as TYPE RELATION FROM session TO video. This restricts which records can be the endpoints. Skip it and SurrealDB will happily relate anything to anything, then your traversal queries silently return empty sets when a typo creeps in.
time::now() is the SurrealQL function for the current timestamp. There's also time::group() and time::floor(), which are useful for windowed aggregations.

Walking the graph for viewers who also watched

This is the query that replaced the six-join SQLite monstrosity. The idea: from the current video, walk backward to all sessions that watched it, then forward to other videos those sessions watched, then aggregate and rank by overlap.

LET $target = video:⟨tt_5fJk2⟩;
LET $window = time::now() - 30d;

SELECT
    out AS video,
    count() AS overlap
FROM watched
WHERE in IN (
    SELECT VALUE in FROM watched WHERE out = $target
)
  AND out != $target
  AND at > $window
GROUP BY out
ORDER BY overlap DESC
LIMIT 50;

A walkthrough, because this took me a while to get right:

LET $target = video:⟨tt_5fJk2⟩ binds a parameter. The angle brackets ⟨ ⟩ are SurrealQL's way of escaping record IDs that contain characters outside [a-zA-Z0-9_]. YouTube IDs frequently have hyphens, so I escape them as a habit.
The inner SELECT VALUE in FROM watched WHERE out = $target collects every session that watched the target video. VALUE flattens the result into a plain list of IDs instead of objects with an in field — without it, the outer IN check compares an object to an ID and silently returns nothing.
The outer WHERE in IN (...) filters edges where the session belongs to that set, and out != $target removes the current video from its own recommendation list. Forgetting that gives you a beautiful but useless top result.
GROUP BY out aggregates by the other-video endpoint; count() is the overlap signal.

On my dataset (~4.2M watched edges, ~280K videos, ~1.1M sessions), this query runs in 18–25 ms cold and 3–6 ms warm. The equivalent SQLite query was 280–410 ms cold.

Region-aware scoring with CJK affinity

The Asia-Pacific angle matters here because a user watching a Korean variety show probably doesn't want a Vietnamese cooking video next, even if the raw co-watch count is high — drive-by viewers from algorithmic recommendation can inflate weak signals. The shipped query weights overlap by language and region match using inline arithmetic; you'll see it embedded in the PHP recommendations method below. SurrealDB's IF / ELSE inside a SELECT is awkward to read but cheap to evaluate, and FETCH lets us inline channel records in a single round trip instead of N+1.

One thing I had to learn the hard way: math::mean() returns NONE on an empty list, which then breaks any arithmetic that consumes it. I coalesce with ?? directly inside the query rather than dealing with it in PHP.

Talking to SurrealDB from PHP 8.4

There's no official PHP SDK for SurrealDB. The HTTP /sql endpoint is straightforward — bearer token, POST a SurrealQL statement, get back JSON. I wrote a minimal client:

<?php
declare(strict_types=1);

final class SurrealClient
{
    public function __construct(
        private readonly string $endpoint,
        private readonly string $namespace,
        private readonly string $database,
        private readonly string $token,
    ) {}

    /**
     * @param array<string, mixed> $vars
     * @return list<array<string, mixed>>
     */
    public function query(string $sql, array $vars = []): array
    {
        $body = $sql;
        if ($vars !== []) {
            $decls = [];
            foreach ($vars as $name => $value) {
                $decls[] = sprintf(
                    'LET $%s = %s;',
                    $name,
                    json_encode($value, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE),
                );
            }
            $body = implode("\n", $decls) . "\n" . $sql;
        }

        $ch = curl_init($this->endpoint . '/sql');
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => $body,
            CURLOPT_TIMEOUT        => 2,
            CURLOPT_HTTPHEADER     => [
                'Accept: application/json',
                'Content-Type: text/plain',
                'NS: ' . $this->namespace,
                'DB: ' . $this->database,
                'Authorization: Bearer ' . $this->token,
            ],
        ]);

        $raw    = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($status !== 200 || $raw === false) {
            throw new RuntimeException("SurrealDB query failed: HTTP {$status}");
        }

        $decoded = json_decode((string) $raw, true, flags: JSON_THROW_ON_ERROR);
        $last    = end($decoded);

        if (!is_array($last) || ($last['status'] ?? '') !== 'OK') {
            $detail = is_array($last) ? ($last['result'] ?? 'unknown') : 'unknown';
            throw new RuntimeException("SurrealDB statement error: {$detail}");
        }

        return $last['result'];
    }
}

Three details that aren't obvious from the docs:

The HTTP endpoint returns one result object per statement, each with its own status field. When you send LET $foo = ...; SELECT ...; you get back a two-element array. end($decoded) grabs the last one, which is what the caller actually wants.
I use text/plain for the content type. SurrealDB accepts application/json too, but then you have to wrap the SQL in a JSON envelope. Plain text is simpler and the request body is more grep-friendly in logs.
The 2-second timeout is intentional. If SurrealDB can't answer a recommendation query in 2 seconds, I'd rather fall back to "popular in your region" from SQLite than block the page render. The recommendation rail is below the fold; a blank slot beats a hung page.

Wiring it into the watch page

The watch page controller composes the SurrealDB call with our existing SQLite-backed video fetch and the LiteSpeed page cache layer. The relevant code path looks like this:

<?php
public function recommendations(string $videoId, string $region): array
{
    $cacheKey = "reco:{$videoId}:{$region}";
    $cached   = $this->cache->get($cacheKey);
    if ($cached !== null) {
        return $cached;
    }

    try {
        $rows = $this->surreal->query(
            <<<'SURQL'
            LET $tgt    = type::thing('video', $vid);
            LET $lang   = (SELECT VALUE language FROM ONLY $tgt);
            LET $reg    = (SELECT VALUE region   FROM ONLY $tgt);
            LET $window = time::now() - 30d;

            SELECT
                meta::id(out)        AS video_id,
                out.title            AS title,
                out.channel.name     AS channel_name,
                out.region           AS region,
                out.language         AS language,
                count()              AS overlap,
                (math::mean(duration_s) ?? 0) AS avg_duration,
                (count() * 1.0)
                    * (IF out.language = $lang THEN 1.5 ELSE 1.0 END)
                    * (IF out.region   = $reg  THEN 1.3 ELSE 1.0 END)
                    AS score
            FROM watched
            WHERE in IN (
                SELECT VALUE in FROM watched WHERE out = $tgt
            )
              AND out != $tgt
              AND at > $window
            GROUP BY out
            FETCH out, out.channel
            ORDER BY score DESC
            LIMIT 8;
            SURQL,
            ['vid' => $videoId],
        );
    } catch (RuntimeException $e) {
        error_log("surreal reco failed for {$videoId}: " . $e->getMessage());
        return $this->fallbackPopularInRegion($region);
    }

    $this->cache->set($cacheKey, $rows, ttl: 1800);
    return $rows;
}

The type::thing('video', $vid) function is SurrealDB's safe way to construct a record ID from a string. Don't string-concat the ID into the query — record IDs can contain characters that need escaping, and type::thing handles it. It also enforces table type, so if $vid doesn't exist in video, the comparison just returns false rather than throwing.

The 1800-second cache TTL (30 minutes) is shorter than our typical category cache because recommendations decay faster than category listings. For trending Asia-Pacific videos during a release window — a new K-pop MV, for example — recommendations shift hourly as the watch graph fills in. The cache invalidates implicitly by TTL; we don't try to push updates from the watch ingestion path.

Streaming watch events into the graph

Ingestion runs from the watch page itself, async. When a user clicks play, we fire a POST /events/watch request from JavaScript. The PHP handler queues it; a cron job every 60 seconds drains the queue into SurrealDB in batches:

<?php
public function flushWatchQueue(int $batchSize = 500): int
{
    $pending = $this->queue->take($batchSize);
    if ($pending === []) {
        return 0;
    }

    $statements = [];
    foreach ($pending as $event) {
        $session  = $this->escapeId($event['session_hash']);
        $video    = $this->escapeId($event['video_id']);
        $region   = $event['region'];
        $duration = (int) $event['duration_s'];

        $statements[] = sprintf(
            "UPSERT session:⟨%s⟩ SET region = '%s', first_seen = time::now();",
            $session, $region,
        );
        $statements[] = sprintf(
            "RELATE session:⟨%s⟩->watched->video:⟨%s⟩ SET region = '%s', duration_s = %d;",
            $session, $video, $region, $duration,
        );
    }

    $this->surreal->query(implode("\n", $statements));
    $this->queue->ack($pending);

    return count($pending);
}

private function escapeId(string $id): string
{
    return str_replace(['⟩', "\n", "\r", "'"], '', $id);
}

Batching matters: at our peak (~80 watch events/second across all four sites) one query per event would saturate the HTTP loop. Five hundred events per batch with sub-second latency keeps the queue empty under normal load and degrades gracefully under spikes — the queue grows, the cron catches up.

UPSERT is non-destructive. If the session already exists, the SET clause updates it; if not, it creates it. That avoids a separate existence check round trip.

The escapeId helper is a paranoid guard against malformed session hashes or video IDs that contain closing angle brackets, newlines, or single quotes. Session hashes are SHA-256 hex strings in our pipeline so they shouldn't, but the day they do, I'd rather get a stripped ID than a SurrealQL injection.

CJK considerations I didn't expect

Our SQLite FTS5 setup uses a custom CJK tokenizer — unicode61 can't segment Chinese or Japanese properly. SurrealDB is on a different code path (we don't do full-text search inside SurrealDB), but the recommendation results still flow through templates that render Chinese, Japanese, and Korean titles, and there are two gotchas:

Use JSON_UNESCAPED_UNICODE when serializing parameters. Without it, a title like "新海誠" becomes \u65b0\u6d77\u8aa0 in the SurrealQL request body. SurrealDB parses it fine, but log lines become unreadable when you're trying to figure out which query exploded.
Mind Cloudflare workers and charset headers. I had one weekend where a Worker layer down-converted UTF-8 to Latin-1 because of a missing header on the origin response. Recommendation titles showed up as ç¥žæµ· in the browser. The fix was on the PHP response, not on SurrealDB.

What I don't recommend SurrealDB for

This is a focused use case — graph traversal for recommendations. I'm not migrating the rest of the stack. Honest negatives:

The SurrealDB binary needs ~200 MB resident under load. On a low-tier VPS, that's real RAM you're spending alongside PHP-FPM and SQLite's page cache.
The query planner is young. I had a query that worked fine on 100K edges and OOM-ed at 5M because it expanded an intermediate result set. Adding LIMIT inside the subquery fixed it, but I shouldn't have had to.
The Rust and JS clients are solid; the PHP story is "write your own HTTP client." That's a 50-line file, but it's still a file you have to maintain.
WebSocket-based live queries are interesting, but I haven't found a use for them on a server-rendered PHP site.

If you're already running Neo4j or Memgraph happily, SurrealDB probably isn't a reason to switch. If you're like me — a small stack that just sprouted a graph-shaped problem — it's a clean way to add graph traversal without standing up a JVM.

Conclusion

The pattern that worked: keep SQLite as the source of truth for everything relational (videos, channels, categories, blog posts, search), and use SurrealDB as a specialized index for the one access pattern that didn't fit — multi-hop graph traversal for recommendations. Writes flow into both stores from the same queue. Reads are split by query shape.

The numbers that matter: p95 recommendation latency dropped from 410 ms to 28 ms, PHP-FPM worker time on watch pages dropped by ~30%, and the materialized co_watch_pairs table I was refreshing every 15 minutes is gone. The cron flush runs in under 200 ms per batch. Total infrastructure cost increase: one extra systemd service on the same VM.

The thing I'd tell past-me: don't try to do graph traversal in a relational database past two hops. The query plan reads fine in isolation, but the moment your dataset crosses a few million rows, the optimizer makes choices you don't expect, and you rebuild the same workaround everyone has built before — materialized denormalized tables refreshed by cron. A purpose-built graph engine is a fairer fight.

DEV Community