How I Built a Streaming Search Engine in a Week

#showdev

Ever googled "where can I watch [movie]?" and gotten a wall of SEO spam, affiliate links, and outdated info? Yeah, me too.

So I built WhereCanIWatch.tv — a free streaming availability search engine that actually works.

The Problem

There are 200+ streaming services now. Finding where a specific movie or show is available shouldn't require opening six tabs and checking each service manually. The existing solutions tend to be one or more of:

Paywalled
Covered in ads
Outdated within days
Missing half the services

The Stack

Next.js 16 with App Router and ISR (Incremental Static Regeneration)
Supabase (PostgreSQL) for the database
TMDB API for metadata, ratings, trailers, and streaming provider data
OMDb API for IMDb/Rotten Tomatoes/Metacritic scores
DigitalOcean droplet ($12/mo) with Coolify for container management
Cloudflare free tier for CDN, SSL, and DDoS protection
Traefik v3 as reverse proxy

Architecture Decisions

Three-Tier Rendering Strategy

Not every page deserves the same treatment:

Database has everything — 15,000+ titles with full metadata
Pages render on-demand via ISR — first request builds the page, then it's cached for 24h
Sitemap is gated — only the top 5,000 titles (by quality score) get into the sitemap

This keeps Google focused on our best content while the full catalog is still accessible.
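
A minimal sketch of the sitemap gating step, not the site's actual code. The `Title` shape, the `qualityScore` field, the `/title/<slug>` URL pattern, and the `selectSitemapEntries` helper are all assumptions for illustration. The 24h page cache itself is usually a one-liner in an App Router page file (`export const revalidate = 86400`).

```typescript
// Hypothetical row shape; the real schema is not shown in this post.
interface Title {
  slug: string;
  qualityScore: number;
  updatedAt: string; // ISO date string
}

interface SitemapEntry {
  url: string;
  lastModified: string;
}

// "Top 5,000 titles" comes from the post; everything else here is assumed.
const SITEMAP_LIMIT = 5000;

// Sort by quality score (highest first) and keep only the top `limit` titles.
// The input array is copied so the caller's data is not reordered.
function selectSitemapEntries(
  titles: Title[],
  baseUrl: string,
  limit: number = SITEMAP_LIMIT,
): SitemapEntry[] {
  return titles
    .slice()
    .sort((a, b) => b.qualityScore - a.qualityScore)
    .slice(0, limit)
    .map((t) => ({
      url: `${baseUrl}/title/${t.slug}`,
      lastModified: t.updatedAt,
    }));
}
```

Titles that miss the cut still render on request; they are only left out of what Google is explicitly told to crawl.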

Provider Data Pipeline

The biggest challenge was keeping streaming availability data fresh. Services add and remove titles constantly. Our pipeline runs on cron:

Smart ingest pulls new titles from TMDB every hour
Provider refresh pulls watch/providers data from TMDB every 3 hours
OMDb enrichment backfills IMDb/RT/MC ratings
Deep link refresh improves outbound links to streaming services

All running on the same $12/mo server. Total API costs: ~$1/mo (OMDb).
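
To make the provider refresh step more concrete, here is a rough sketch of flattening a TMDB watch/providers response (from `GET /3/movie/{movie_id}/watch/providers`) into rows that could be upserted into a database. The `ProviderRow` shape and the `toProviderRows` helper are hypothetical, and only the `flatrate`, `rent`, and `buy` offer types are handled; the real pipeline may well differ.

```typescript
// Subset of the TMDB watch/providers response that this sketch reads.
interface TmdbProvider {
  provider_id: number;
  provider_name: string;
}

type OfferType = "flatrate" | "rent" | "buy";

type TmdbRegionOffers = { link?: string } & Partial<Record<OfferType, TmdbProvider[]>>;

interface TmdbWatchProviders {
  id: number;
  results: Record<string, TmdbRegionOffers>; // keyed by country code, e.g. "US"
}

// Hypothetical row shape for a providers table.
interface ProviderRow {
  tmdbId: number;
  region: string;
  providerId: number;
  providerName: string;
  offerType: OfferType;
}

const OFFER_TYPES: OfferType[] = ["flatrate", "rent", "buy"];

// Flatten one region of the response into one row per (provider, offer type).
function toProviderRows(response: TmdbWatchProviders, region: string): ProviderRow[] {
  const offers = response.results[region];
  if (!offers) return []; // no availability data for this region

  const rows: ProviderRow[] = [];
  for (const offerType of OFFER_TYPES) {
    for (const provider of offers[offerType] ?? []) {
      rows.push({
        tmdbId: response.id,
        region,
        providerId: provider.provider_id,
        providerName: provider.provider_name,
        offerType,
      });
    }
  }
  return rows;
}
```

Keeping the transform separate from the fetch and the database write makes it easy to test without hitting the API on every run.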

Image Optimization

Movie posters are the biggest performance bottleneck. TMDB serves multiple sizes (w342, w500, w780), so we use Next.js <Image> with responsive srcsets to serve the right size for each viewport. Combined with Cloudflare edge caching, this got our Lighthouse performance score to 88-90.
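
One possible way to map a requested width onto the TMDB poster sizes is a custom loader function of the shape that the `loader` prop of `next/image` accepts. This is a sketch under assumptions: the post does not say whether a custom loader is used, and `tmdbPosterLoader` is a hypothetical name. The size list is the one mentioned above.

```typescript
// Poster widths mentioned in the post, in ascending order.
const TMDB_POSTER_WIDTHS = [342, 500, 780];
const TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p";

// Same argument shape that next/image passes to a custom loader.
interface LoaderArgs {
  src: string; // TMDB poster path, e.g. "/abc123.jpg"
  width: number; // width requested for the current srcset entry
  quality?: number; // unused: TMDB sizes are fixed
}

// Pick the smallest TMDB size that is at least as wide as requested,
// falling back to the largest available size.
function tmdbPosterLoader({ src, width }: LoaderArgs): string {
  const largest = TMDB_POSTER_WIDTHS[TMDB_POSTER_WIDTHS.length - 1];
  const size = TMDB_POSTER_WIDTHS.find((w) => w >= width) ?? largest;
  const path = src.startsWith("/") ? src : `/${src}`;
  return `${TMDB_IMAGE_BASE}/w${size}${path}`;
}
```

It would be passed as `loader={tmdbPosterLoader}` alongside a `sizes` attribute, so the browser picks an entry that matches the viewport.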

SEO Strategy

For a content site, SEO is everything. Some things that worked:

Structured data — Movie/TVSeries schema with ratings, cast, and BreadcrumbList
Dynamic titles — "Watch Grey's Anatomy (2005) Online - All Seasons on Hulu" (includes the primary streaming service)
Video indexing — YouTube trailer embeds with proper thumbnails got us 1,500+ videos indexed in Google
Quality gating — only pages with poster + overview + year + IMDb rating get into the sitemap
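SearchAction schema — WebSite markup describing the site's search URL, originally used for Google's sitelinks search box (Google retired that feature in late 2024)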
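
For illustration, the structured data and dynamic title items could look roughly like this. It is a sketch, not the site's code: `MovieInput`, `buildMovieJsonLd`, and `buildSeriesTitle` are hypothetical names, the schema.org properties shown are a small subset, and the fallback title without a service is an assumption.

```typescript
// Hypothetical input shape assembled from TMDB and OMDb data.
interface MovieInput {
  title: string;
  year: number;
  overview: string;
  posterUrl: string;
  cast: string[];
  imdbRating?: number; // 0-10 scale
  imdbVotes?: number;
}

// Build a schema.org Movie object, ready for JSON.stringify into a
// <script type="application/ld+json"> tag.
function buildMovieJsonLd(m: MovieInput): Record<string, unknown> {
  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Movie",
    name: m.title,
    description: m.overview,
    image: m.posterUrl,
    datePublished: String(m.year), // year only; a full date is better if known
    actor: m.cast.map((name) => ({ "@type": "Person", name })),
  };
  // Only emit a rating when both the value and the vote count are known.
  if (m.imdbRating !== undefined && m.imdbVotes !== undefined) {
    jsonLd.aggregateRating = {
      "@type": "AggregateRating",
      ratingValue: m.imdbRating,
      bestRating: 10,
      ratingCount: m.imdbVotes,
    };
  }
  return jsonLd;
}

// Title pattern modeled on the Grey's Anatomy example above.
function buildSeriesTitle(name: string, year: number, primaryService?: string): string {
  const base = `Watch ${name} (${year}) Online`;
  return primaryService ? `${base} - All Seasons on ${primaryService}` : base;
}
```

Leaving `aggregateRating` out when the data is missing is safer than emitting a placeholder, since incomplete ratings tend to fail rich result validation.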

Within 5 days of launch, Google had indexed 3,200+ pages and we were getting organic traffic from 10+ countries.

What I'd Do Differently

Start with provider data, not metadata — We built a beautiful catalog but only had streaming availability for 1% of titles at launch. Should have prioritized the TMDB watch/providers API from day one.
Don't migrate hosting on day 4 — We moved from Vercel to self-hosted and accidentally noindex'd 98% of the site for 13 hours. Lesson: have a deployment checklist.
ISR cache invalidation matters — When you ship a fix, pages that are already cached keep serving the old output until their ISR window expires (24h in this setup). We had to restart the container to flush the stale pages.
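
Next.js also supports on-demand revalidation through `revalidatePath` from `next/cache`, which might avoid the container restart. Below is a minimal sketch of that idea, with the revalidation function injected so the helper runs outside Next.js. The `revalidateTitles` name and the `/title` path prefix are hypothetical, and this is not something the setup above is known to use.

```typescript
// Compatible with `revalidatePath` from `next/cache`, which takes a path
// as its first argument.
type Revalidate = (path: string) => void;

// Hypothetical helper: after a fix ships, mark the affected title pages as
// stale so the next request rebuilds them instead of waiting for expiry.
function revalidateTitles(
  slugs: string[],
  revalidate: Revalidate,
  prefix: string = "/title",
): string[] {
  // De-duplicate so each page is invalidated once.
  const paths = Array.from(new Set(slugs)).map((slug) => `${prefix}/${slug}`);
  for (const path of paths) {
    revalidate(path);
  }
  return paths;
}
```

In a Next.js app this would typically be called from a protected route handler or server action, passing `revalidatePath` as the second argument.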