DEV Community

Tomson Lee

How I Built a Chrome Extension to Automate Grok AI Video Generation

If you've used Grok AI for image and video generation, you've probably hit the same frustrations I did — no batch download, no way to queue multiple video generations, and no tools to manage your growing media library.

So I built GrokMediaDownloader, a Chrome extension that turns Grok into a proper media production pipeline.

The Problem

Grok's image and video generation is impressive, but the platform gives you almost zero workflow tooling:

  • No batch operations — downloading 50 favorites means 50 individual clicks
  • No video queuing — you sit there babysitting each generation
  • No story workflow — chaining multiple video clips into a sequence is entirely manual
  • No organization — everything dumps into a flat favorites list

I kept thinking "this should be an extension," and eventually I just built it.

What It Does

Video Gen Queue — Automated Batch Generation

This was the feature that started it all. Instead of manually clicking "generate" one at a time, the Video Gen Queue lets you:

  1. Browse your Grok favorites and batch-add items to a queue
  2. Configure generation settings (duration, resolution, style mode)
  3. Hit start — it processes the entire queue automatically
  4. Walk away and come back to finished results

The tricky part was rate limit detection. Grok throttles generation requests, so the extension uses a fetch interceptor (a wrapper around the page's fetch calls) to detect rate-limit responses and pause and resume the queue automatically. There's also a 90-second timeout fallback for edge cases where the API response hangs.
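The pause/resume behavior can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not the extension's actual code: the state shape, the event names, and the choice to mark a timed-out item as failed are all hypothetical. Only the 90-second figure comes from the description above.

```javascript
const GENERATION_TIMEOUT_MS = 90 * 1000; // the 90-second fallback described above

// Hypothetical queue state: items waiting, the item in flight, and a status flag.
function createQueue(items) {
  return { status: "idle", pending: [...items], current: null, done: [], failed: [] };
}

// Move the next pending item into `current` if the queue is running and idle.
function startNext(state) {
  if (state.status !== "running" || state.current !== null) return state;
  if (state.pending.length === 0) return { ...state, status: "finished" };
  const [next, ...rest] = state.pending;
  return { ...state, current: next, pending: rest };
}

// Pure transition function: returns the next state for a given event.
function reduce(state, event) {
  switch (event.type) {
    case "start":
    case "rateLimitCleared":
      return startNext({ ...state, status: "running" });
    case "rateLimited":
      // Put the in-flight item back at the front of the queue and wait.
      return {
        ...state,
        status: "paused",
        pending: state.current !== null ? [state.current, ...state.pending] : state.pending,
        current: null,
      };
    case "itemDone":
      if (state.current === null) return state;
      return startNext({ ...state, done: [...state.done, state.current], current: null });
    case "timeout":
      // Assumption: a hung request is recorded as failed and the queue moves on.
      if (state.current === null) return state;
      return startNext({ ...state, failed: [...state.failed, state.current], current: null });
    default:
      return state;
  }
}

// Arms the fallback timer; `dispatch` is whatever feeds events into `reduce`.
function armTimeout(dispatch) {
  return setTimeout(() => dispatch({ type: "timeout" }), GENERATION_TIMEOUT_MS);
}
```

Keeping the transitions pure means the pause and resume logic can be exercised without touching Grok at all; the fetch interceptor only has to emit `rateLimited` and `rateLimitCleared` events.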

// Simplified flow
Browse Favorites → Add to Queue → Configure Settings → Start → Collect Results

Four generation modes are supported (Custom, Normal, Fun, and Spicy), matching Grok's built-in options.

Story Mode — Visual Storyboard for Video Sequences

Story Mode is where things get interesting from a technical perspective. It's a visual storyboard where you can:

  • Drag-and-drop clips to arrange your sequence
  • Auto-extract first/last frames from each clip
  • Send extracted frames back to Grok for continuation generation (so each new clip picks up where the last one ended)
  • Merge everything locally using WebAssembly FFmpeg

That last point is key — all video processing happens in your browser. No uploads to any server. I used FFmpeg compiled to WebAssembly (FFmpeg.js/WASM) to handle the merging entirely client-side.
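For the merge step, one common approach is FFmpeg's concat demuxer with stream copy. The sketch below only builds the list file text and the argument array that would be handed to whichever FFmpeg WASM build is in use. The function names are hypothetical, and stream copy assumes every clip shares the same codec and resolution, which is an assumption about Grok's output rather than something confirmed here.

```javascript
// Quote a file name for FFmpeg's concat list syntax.
// Inside single quotes, a literal quote is written as '\''.
function quoteForConcat(name) {
  return "'" + name.replace(/'/g, "'\\''") + "'";
}

// Build the text of the list file the concat demuxer reads, one clip per line.
function buildConcatList(clipNames) {
  return clipNames.map((name) => `file ${quoteForConcat(name)}`).join("\n") + "\n";
}

// Arguments for a merge that copies streams instead of re-encoding.
function buildMergeArgs(listFile, outputFile) {
  return ["-f", "concat", "-safe", "0", "-i", listFile, "-c", "copy", outputFile];
}
```

Copying streams avoids a re-encode, which matters in the browser where WASM encoding is slow. If clips ever differed in format, a re-encoding filter graph would be needed instead.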

The workspace lives in Chrome's Side Panel API, so it stays open while you browse Grok — no popup that disappears when you click away.
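As a rough illustration of the Side Panel wiring, the sketch below enables the panel only on Grok tabs using `chrome.sidePanel.setOptions`. The origin `https://grok.com`, the file name `sidepanel.html`, and both function names are assumptions, and limiting the panel to Grok tabs is an illustrative choice rather than a confirmed behavior of the extension.

```javascript
const GROK_ORIGIN = "https://grok.com"; // assumed host
const PANEL_PATH = "sidepanel.html"; // hypothetical file name

// Decide the side panel options for a tab based on its URL.
function panelOptionsFor(tabId, tabUrl) {
  let onGrok = false;
  try {
    onGrok = new URL(tabUrl).origin === GROK_ORIGIN;
  } catch {
    // Missing or unparsable URL, for example a tab the extension cannot read.
  }
  return onGrok ? { tabId, path: PANEL_PATH, enabled: true } : { tabId, enabled: false };
}

// Service worker wiring; `chromeApi` defaults to the real `chrome` global.
function registerSidePanel(chromeApi = globalThis.chrome) {
  chromeApi.tabs.onUpdated.addListener((tabId, _changeInfo, tab) => {
    chromeApi.sidePanel.setOptions(panelOptionsFor(tabId, tab.url));
  });
}
```

Reading `tab.url` requires the `tabs` permission or matching host permissions, and the manifest needs the `sidePanel` permission.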

Storage: Breaking Chrome's 10MB Limit

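By default, chrome.storage.local is capped at 10MB (chrome.storage.sync is far smaller, around 100KB), and that cap only goes away if you request the unlimitedStorage permission. 10MB is nothing when you're storing video metadata, frame extractions, and project data.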

The solution was IndexedDB. It's a browser-native database whose quota scales with available disk space rather than stopping at a fixed 10MB, and it works well in extension contexts. All Story Mode data, project folders, and queue state live in IndexedDB.
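A minimal sketch of what the IndexedDB layer could look like, using promise wrappers around the standard API. The database name, store names, and `id` key path are hypothetical; the text above only says which kinds of data are stored, not how they are laid out.

```javascript
// Hypothetical store names, one per kind of data mentioned above.
const STORES = ["storyClips", "projects", "queueState"];

// Open (and if needed create) the database. `idb` defaults to the browser global.
function openMediaDb(idb = globalThis.indexedDB, name = "media-db", version = 1) {
  return new Promise((resolve, reject) => {
    const request = idb.open(name, version);
    // Runs only when the database is created or its version is bumped.
    request.onupgradeneeded = () => {
      const db = request.result;
      for (const store of STORES) {
        if (!db.objectStoreNames.contains(store)) {
          db.createObjectStore(store, { keyPath: "id" });
        }
      }
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Insert or overwrite one record; resolves once the transaction commits.
function putRecord(db, storeName, record) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, "readwrite");
    tx.objectStore(storeName).put(record);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```

Records can hold Blobs directly, so extracted frames would not need base64 encoding the way they would in chrome.storage.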

Other Features Worth Mentioning

  • Stream Capture — a real-time DOM observer that auto-captures images as you scroll through Grok. It watches for new image nodes and queues them for batch download.
  • HD Video Upgrade — auto-scans your favorites for 480p videos and sends upgrade requests to get 720p versions.
  • Project-Based Download — organizes downloads into folders by date and prompt, with metadata JSON export.
  • Date & History Filter — filter by date range with automatic deduplication against download history.
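The Stream Capture and deduplication ideas above can be sketched with a `MutationObserver` feeding a queue that remembers what it has already seen. This is an illustrative sketch only: the function names are hypothetical, and it watches for added nodes, so images whose `src` is set later through an attribute change would need an extra `attributes` observer.

```javascript
// Queue that accepts each URL once. `history` seeds it with URLs that
// were already downloaded, so they are never queued again.
function createCaptureQueue(history = []) {
  const seen = new Set(history);
  const queue = [];
  return {
    offer(url) {
      if (!url || seen.has(url)) return false;
      seen.add(url);
      queue.push(url);
      return true;
    },
    // Hand over everything queued so far and empty the queue.
    drain() {
      return queue.splice(0, queue.length);
    },
  };
}

// Watch `root` for newly added <img> elements and offer their URLs to the queue.
function watchForImages(root, captureQueue, ObserverCtor = globalThis.MutationObserver) {
  const observer = new ObserverCtor((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        if (node.nodeType !== 1) continue; // skip text and comment nodes
        const images = node.tagName === "IMG" ? [node] : node.querySelectorAll("img");
        for (const img of images) captureQueue.offer(img.src);
      }
    }
  });
  observer.observe(root, { childList: true, subtree: true });
  return observer;
}
```

In a content script this would be started with something like `watchForImages(document.body, createCaptureQueue(previouslyDownloadedUrls))`, with `drain()` called periodically to hand URLs to the downloader.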

Architecture Decisions

A few technical choices that shaped the extension:

Pure client-side processing. No backend server, no cloud uploads. Everything runs in the browser. This was a deliberate privacy decision — users shouldn't have to trust a third party with their generated content.

Manifest V3. Chrome is deprecating Manifest V2, so I built on V3 from the start. The Side Panel API (used for Story Mode) is V3-only anyway.

No framework for the marketing site. The website is pure static HTML/CSS/JS with Tailwind via CDN. No build step, no bundler. It supports 4 languages (English, Traditional Chinese, Japanese, Korean) with hardcoded body text per language — no JS dependency for translated content, which keeps it SEO-friendly.

Pricing

The extension has a generous free tier that covers basic downloads and 50 Video Gen Queue uses. The Pro license is a one-time $4.99 payment (not a subscription) that unlocks everything — Story Mode, unlimited queue, Stream Capture, HD upgrades, and all future features.

I went with one-time pricing because subscription fatigue is real, and this is a tool, not a service.

What I Learned

  1. WebAssembly FFmpeg is production-ready for browser video processing. The compiled binary is large (~25MB), but for an extension that handles video merging, it's worth it.

  2. IndexedDB is underrated for extension storage. Most tutorials still show chrome.storage.local, but IndexedDB handles complex data and large payloads much better.

  3. The Side Panel API is great for persistent UI. Unlike popups or new tabs, the side panel stays open while the user interacts with the host page — perfect for tools that work alongside a web app.

  4. Rate limit handling makes or breaks automation tools. Without proper detection and backoff, the Video Gen Queue would be useless. The Fetch Interceptor approach (running in the MAIN world to intercept actual API responses) was more reliable than guessing from HTTP status codes.
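To make the interceptor idea in the last point concrete, here is a hedged sketch of a fetch wrapper suitable for a MAIN-world script (for example a content script declared with `"world": "MAIN"`). The detection rule is an assumption: Grok's real rate-limit response format is not described here, so `looksRateLimited` checks a status code and some guessed body text purely for illustration. All function names are hypothetical.

```javascript
// Decide whether a response signals throttling.
// Both checks are guesses; the real marker would come from observing the API.
function looksRateLimited(status, bodyText) {
  if (status === 429) return true;
  return /rate.?limit|too many requests/i.test(bodyText);
}

// Wrap a fetch function so responses are inspected without being consumed.
function wrapFetch(originalFetch, onRateLimit) {
  return async function interceptedFetch(...args) {
    const response = await originalFetch(...args);
    // Inspect a clone in the background so the page gets its response
    // immediately and can still read the original body.
    response
      .clone()
      .text()
      .then((text) => {
        if (looksRateLimited(response.status, text)) onRateLimit(response.status);
      })
      .catch(() => {
        // Never let inspection errors affect the page's own request.
      });
    return response;
  };
}

// Installation in the page, shown as a comment because it needs a browser:
// window.fetch = wrapFetch(window.fetch.bind(window), (status) =>
//   window.postMessage({ source: "hypothetical-ext", type: "rate-limited", status },
//     window.location.origin));
```

Because a MAIN-world script cannot call extension APIs directly, the callback would typically relay the signal with `window.postMessage` to an isolated-world content script, which then forwards it to the queue.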

Try It Out

If you use Grok for image or video generation and want to stop babysitting the process, give it a try. Happy to answer questions in the comments.
