Search Autocomplete Across Scripts and Diacritics
A user in Dubai types Arabic into the TrendVidStream search bar. Someone in Copenhagen types "nyhe", meaning to find "nyheder" (news). PostgreSQL's pg_trgm extension handles both cases with the same kind of index, a trigram GIN index.
Enable pg_trgm and unaccent
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;
Indexing Multi-Script Titles
CREATE INDEX idx_videos_title_trgm ON videos USING GIN (title gin_trgm_ops);
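-- Unaccent-aware index for CE and Nordic regions. unaccent() is only STABLE, so it must be
-- wrapped in an IMMUTABLE function before it can be indexed (f_unaccent is an illustrative
-- name; the body assumes the extension is installed in the public schema).
CREATE OR REPLACE FUNCTION f_unaccent(text) RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
AS $$ SELECT public.unaccent('public.unaccent', $1) $$;
-- To use this index, swap unaccent(title) for f_unaccent(title) in the queries that follow.
CREATE INDEX idx_videos_title_unaccent_trgm
ON videos USING GIN (f_unaccent(title) gin_trgm_ops)
WHERE region IN ('CZ', 'FI', 'DK', 'BE', 'CH');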
CREATE INDEX idx_videos_channel_trgm ON videos USING GIN (channel_title gin_trgm_ops);
Standard Autocomplete Query
SELECT video_id, title, channel_title, thumbnail_url, region,
word_similarity($1, title) AS wsim
FROM videos
WHERE region = $2 AND $1 <% title
ORDER BY wsim DESC, view_count DESC
LIMIT 8;
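To make the score behind the <% operator concrete, here is a rough Python sketch of how word_similarity can be approximated: split each string into padded trigrams, then take the best overlap between the query's trigram set and any continuous run of the title's trigrams. The function names are hypothetical, and the brute-force loop is a simplification of the extension's C implementation, so scores may differ in edge cases.

```python
import re


def trigrams(text: str) -> list[str]:
    """Ordered trigrams; each word is padded with two leading spaces and one trailing space."""
    out = []
    # Lowercase and split on anything that is not a letter or digit.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        out.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return out


def word_similarity_sketch(query: str, target: str) -> float:
    """Best Jaccard overlap between the query trigrams and any continuous extent of the target."""
    q = set(trigrams(query))
    t = trigrams(target)
    if not q or not t:
        return 0.0
    best = 0.0
    for start in range(len(t)):
        extent = set()
        for end in range(start, len(t)):
            extent.add(t[end])
            shared = len(q & extent)
            best = max(best, shared / (len(q) + len(extent) - shared))
    return best


if __name__ == "__main__":
    print(trigrams("nyhe"))
    print(word_similarity_sketch("nyhe", "nyheder"))
```

Under this sketch a prefix such as "nyhe" shares four of its five trigrams with "nyheder", which is why a partial word can still clear the threshold.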
Diacritic-Insensitive Search for CE and Nordic Regions
Czech and Nordic users expect "zivot" to find "život" (life):
SELECT video_id, title, thumbnail_url, region,
word_similarity(unaccent($1), unaccent(title)) AS wsim
FROM videos
WHERE region = $2 AND unaccent($1) <% unaccent(title)
ORDER BY wsim DESC, view_count DESC
LIMIT 8;
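For intuition about what the unaccent() calls do to both sides of the comparison, the following Python sketch strips combining marks after Unicode decomposition. It is only an approximation under stated assumptions: strip_diacritics is a hypothetical helper, and PostgreSQL's unaccent works from a rules file rather than Unicode decomposition, so results can differ for letters such as ø that have no decomposed form.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after NFKD decomposition (approximation of unaccent)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # The query and the stored title collapse to the same string before trigram matching.
    print(strip_diacritics("život") == strip_diacritics("zivot"))
    # ø has no Unicode decomposition, so this approach leaves it untouched.
    print(strip_diacritics("smørrebrød"))
```

Normalizing both the query and the title is what lets "zivot" and "život" produce identical trigrams.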
Arabic Trigram Behaviour
Trigram overlap behaves differently for Arabic than for Latin scripts, and relevant titles can fall below the threshold. Lower pg_trgm.word_similarity_threshold (default 0.6) to 0.15 for better recall:
SET pg_trgm.word_similarity_threshold = 0.15;
SELECT video_id, title, word_similarity($1, title) AS wsim
FROM videos WHERE region = 'AE' AND $1 <% title
ORDER BY wsim DESC LIMIT 8;
Python FastAPI Endpoint with Per-Region Logic
import os
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI, Query
from pydantic import BaseModel

pool = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create the pool at startup; DATABASE_URL is an assumed environment variable.
    global pool
    pool = await asyncpg.create_pool(dsn=os.environ['DATABASE_URL'])
    yield
    await pool.close()


app = FastAPI(lifespan=lifespan)

# Regions not listed here (including BE and CH) fall back to 'default'.
THRESHOLDS = {'AE': 0.15, 'CZ': 0.20, 'FI': 0.20, 'DK': 0.20, 'default': 0.25}
UNACCENT_REGIONS = {'CZ', 'FI', 'DK', 'BE', 'CH'}


class AutocompleteItem(BaseModel):
    video_id: str
    title: str
    channel: str
    thumbnail: str
    region: str


@app.get('/api/autocomplete', response_model=list[AutocompleteItem])
async def autocomplete(
    q: str = Query(min_length=2, max_length=100),
    # pattern= replaces the deprecated regex= argument in recent FastAPI releases.
    region: str = Query(default='GB', pattern='^[A-Z]{2}$'),
):
    threshold = THRESHOLDS.get(region, THRESHOLDS['default'])
    use_unaccent = region in UNACCENT_REGIONS
    if use_unaccent:
        sql = """
            SELECT video_id, title, channel_title, thumbnail_url, region
            FROM videos WHERE region = $2 AND unaccent($1) <% unaccent(title)
            ORDER BY word_similarity(unaccent($1), unaccent(title)) DESC, view_count DESC
            LIMIT 8
        """
    else:
        sql = """
            SELECT video_id, title, channel_title, thumbnail_url, region
            FROM videos WHERE region = $2 AND $1 <% title
            ORDER BY word_similarity($1, title) DESC, view_count DESC LIMIT 8
        """
    async with pool.acquire() as conn:
        # Set the threshold for this session as a bound parameter instead of formatting it into the SQL.
        await conn.execute(
            "SELECT set_config('pg_trgm.word_similarity_threshold', $1, false)",
            str(threshold),
        )
        rows = await conn.fetch(sql, q, region)
    return [AutocompleteItem(video_id=r['video_id'], title=r['title'],
                             channel=r['channel_title'], thumbnail=r['thumbnail_url'],
                             region=r['region'])
            for r in rows]
Frontend: RTL-Aware Dropdown
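function renderAutocomplete(results, isRTL = false) {
  const list = document.getElementById('autocomplete-list');
  list.innerHTML = '';
  list.setAttribute('dir', isRTL ? 'rtl' : 'ltr');
  results.forEach(item => {
    const li = document.createElement('li');
    // Build nodes with textContent so titles and channel names are never parsed as HTML.
    const img = document.createElement('img');
    img.src = item.thumbnail;
    img.width = 40;
    img.alt = '';
    const title = document.createElement('span');
    title.textContent = item.title;
    const channel = document.createElement('small');
    channel.textContent = item.channel;
    li.append(img, title, channel);
    li.onclick = () => navigate(`/watch/${encodeURIComponent(item.video_id)}`);
    list.appendChild(li);
  });
}
renderAutocomplete(results, currentRegion === 'AE');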
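Keying the direction off the region alone means a user in the AE region who types a Latin-script query still gets a right-to-left list. One alternative, sketched here as an assumption rather than something TrendVidStream is known to do, is to derive the direction on the server from the first strongly directional character of the query. The helper name is_rtl is hypothetical.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """Return True if the first strongly directional character is right-to-left."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return True
        if bidi == "L":  # left-to-right letter
            return False
    # Digits, spaces and punctuation carry no strong direction; default to LTR.
    return False


if __name__ == "__main__":
    print(is_rtl("أخبار"))
    print(is_rtl("nyheder"))
```

The endpoint could return this flag next to the suggestions so the dropdown sets its dir attribute from the text the user actually typed.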
On TrendVidStream, trigram autocomplete serves results in under 10 ms for queries of 5 to 40 characters from European and Middle Eastern users.
This article is part of the Building TrendVidStream series. Check out TrendVidStream to see these techniques in action.