Jason Peterson

From Prototype to Production: Building a Multimodal Video Search Engine

In my last post, I wrote about the unreasonable effectiveness of model stacking for media search—combining CLIP, Whisper, and ArcFace to find video content through visual descriptions, dialog, and faces. Over the holidays I expanded that afternoon hack into something more production-like.

Live demo: fennec.jasongpeterson.com
Starter code: github.com/JasonMakes801/fennec-search

Try This

  1. Go to fennec.jasongpeterson.com (desktop browser)
  2. Enter "older man on phone, harbor background" in Visual Content → click +
  3. Click the face of the older guy with glasses sitting with the harbor at his back
  4. Enter "the Americans had launched their missiles" in Dialog (Semantic mode) → click +
  5. Play the clip

You've drilled down to an exact shot without metadata, timecodes, or remembering exact words. The semantic search is fuzzy—he actually says "What it was telling him was that the US had launched their ICBMs," but that's close enough.

Search result showing the scene

What's Under the Hood

  • Containerized architecture: Vue/Nginx frontend, FastAPI backend, standalone ingest worker, Postgres+pgvector—all via docker-compose
  • Background enrichment: Polling-based worker that handles drive mounting/unmounting gracefully (Watchdog relies on OS file-change events, which don't arrive reliably on NFS/network shares)
  • Semantic dialog search: Sentence-transformer embeddings so "Americans launched missiles" finds "US fired rockets"
  • Frame-accurate playback: HTML5 video decode to canvas using requestVideoFrameCallback()
  • EDL export: Queue scenes and export CMX 3600 for NLE roundtrip
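
To make the EDL export bullet concrete, here is a minimal sketch of writing queued scenes out as CMX 3600 text. It is not the code from the repo: the Scene class, its field names, the "AX" reel name, and the integer, non-drop-frame frame rate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical queued scene, in frames relative to the start of its clip."""
    clip_name: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive, like an EDL out point


def frames_to_timecode(frames: int, fps: int) -> str:
    """Convert a frame count to non-drop-frame HH:MM:SS:FF timecode."""
    total_seconds, ff = divmod(frames, fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def export_edl(title: str, scenes: list[Scene], fps: int = 24) -> str:
    """Lay the scenes end to end on one video track and return the EDL text."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    record_in = 0  # timeline position in frames (assumed to start at zero)
    for event, scene in enumerate(scenes, start=1):
        record_out = record_in + (scene.end_frame - scene.start_frame)
        timecodes = " ".join(
            frames_to_timecode(f, fps)
            for f in (scene.start_frame, scene.end_frame, record_in, record_out)
        )
        # Event number, reel ("AX" is a placeholder), video track V, cut C,
        # then source in/out and record in/out.
        lines.append(f"{event:03d}  AX       V     C        {timecodes}")
        lines.append(f"* FROM CLIP NAME: {scene.clip_name}")
        record_in = record_out
    return "\n".join(lines) + "\n"


# Two queued scenes cut back to back on the timeline.
queue = [Scene("ep1.mp4", 48, 96), Scene("ep1.mp4", 0, 24)]
print(export_edl("Fennec selects", queue))
```

Real footage at 23.976 or 29.97 fps needs more careful timecode handling than this integer-fps version, and many NLEs expect the record timeline to start at 01:00:00:00 rather than zero.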

The Postgres + pgvector setup turned out cleaner than expected—vector similarity combined with metadata filtering in a single query just works.
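
As a rough illustration of that single-query pattern, the sketch below builds a parameterized query that ranks scenes by pgvector cosine distance (the `<=>` operator) while filtering on ordinary columns. The table name `scenes` and the columns `clip_embedding`, `duration_s`, and `path` are hypothetical, not the project's actual schema.

```python
def build_scene_query(query_embedding, min_duration_s=None, path_prefix=None, limit=20):
    """Return (sql, params) for a nearest-neighbour search with optional filters."""
    # pgvector accepts a vector as a bracketed text literal, e.g. '[0.1,0.2]'.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    filters, params = [], [vector_literal]
    if min_duration_s is not None:
        filters.append("duration_s >= %s")
        params.append(min_duration_s)
    if path_prefix is not None:
        filters.append("path LIKE %s")
        params.append(path_prefix + "%")
    where = f"WHERE {' AND '.join(filters)}\n" if filters else ""
    sql = (
        # Hypothetical schema: scenes(id, path, duration_s, clip_embedding vector)
        "SELECT id, path, clip_embedding <=> %s::vector AS distance\n"
        "FROM scenes\n"
        f"{where}"
        "ORDER BY distance\n"
        "LIMIT %s"
    )
    params.append(limit)
    return sql, params


sql, params = build_scene_query([0.1, 0.2, 0.3], min_duration_s=2.0, limit=5)
print(sql)
```

The `%s` placeholders follow the psycopg convention, so the pair can be passed straight to `cursor.execute(sql, params)`. The metadata filter and the similarity ranking run in the same statement, so no second lookup is needed to join results.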

Demo footage from Pioneer One, a Creative Commons-licensed Canadian drama. Built with significant help from Claude Code.
