Day 11: The Photo Booth AI Application - Real‑Time Filters, Spatial Intelligence & Subagents
What if you could build a full AR‑style photo booth - camera access, face detection, real‑time filters, capture, download, and QR sharing - all in a single day? And what if you didn’t have to build it alone?
That’s exactly what Day 11 challenged me to do.
Using goose subagents, I built a complete Fun House Photo Booth web app with festive filters, MediaPipe face tracking, mobile support, and a full capture pipeline. It feels like having a small engineering team working in parallel - because that’s exactly what subagents simulate.
Day 11: Photo Booth AI Application 📸
The Challenge: Build a Real-Time Filter App in One Day
The festival director wanted a magical selfie booth:
- Open on your phone
- See yourself with fun filters
- Filters track your face
- Switch between effects
- Capture the photo
- Download it
- Share it
This is where subagents shine.
Enter: The Fun House Photo Booth (Built with Subagents)
I split the work into specialized subagents - just like a real dev team:
Subagent 1 - Core App Builder
- Built the HTML/CSS/JS structure
- Implemented camera access
- Created the live video preview
- Added capture + download
- Made everything mobile‑responsive
Subagent 2 - Filter Engineer
- Integrated MediaPipe Face Landmarker
- Implemented 468‑point face mesh
- Built the real‑time filter system
- Anchored filters to specific landmarks
- Added filter switching
Optional Subagents I Added
- Stylist - polished the UI (FilterSense branding)
- Documentation Writer - created usage notes
- Performance Optimizer - ensured smooth tracking
Subagents let me parallelize the work and keep the build clean and modular.
Tech Stack
- goose Subagents - task orchestration
- Claude Sonnet 4.5 by Anthropic - powering goose
- HTML/CSS/JS - core app
- MediaPipe Face Landmarker - local spatial intelligence
- Canvas API - rendering filters + mesh
- SessionStorage - storing captured images
- QR workflow - sharing
- Mobile‑first UI - responsive layout
No backend. No server. Everything runs locally.
My Experience (From Camera to AR Filters)
I started by building a clean UI - a glowing camera icon, a “FilterSense” title, and an Enter button. Once inside, the app activates the camera, loads MediaPipe, and begins tracking the user’s face in real time.
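Camera startup is the standard getUserMedia flow. Here's a minimal sketch of what that step looks like (the element id and constraints are illustrative, not my exact code):

```javascript
// Minimal camera startup sketch; the element id and constraints are illustrative.
const video = document.getElementById('camera-preview');

async function startCamera() {
  // Prefer the front-facing camera on phones, with a reasonable HD resolution.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'user', width: { ideal: 1280 }, height: { ideal: 720 } },
    audio: false
  });
  video.srcObject = stream;
  await video.play();
}

startCamera().catch(err => console.error('Camera unavailable or permission denied:', err));
```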
Then the fun begins:
- Select a filter
- Watch it attach to your face
- Move, tilt, smile - it follows
- Capture the moment
- Download or share
The entire experience feels like a lightweight AR app running directly in the browser.
What My Application Does
- Opens the camera instantly
- Tracks the face using MediaPipe
- Renders a 468‑point mesh
- Applies filters anchored to landmarks
- Lets users switch filters
- Captures a clean photo
- Stores it safely
- Redirects to an export page
- Supports download + QR sharing
- Works smoothly on mobile
It’s a complete photo booth system.
Spatial Intelligence (MediaPipe Face Landmarker)
One of the most advanced parts of this build is the spatial intelligence. Instead of sending video frames to a server, the entire face‑tracking pipeline runs in the browser using MediaPipe’s Face Landmarker.
Why this matters
- Real‑time performance
- Low latency
- Offline capability
- Privacy‑preserving
- No external compute required
How it works
I load the FaceLandmarker and FilesetResolver modules, which give me:
- 468 face landmarks
- 3D positional data
- Stable tracking across movement
- Mesh topology
- Mesh can be removed at any time

These landmarks drive the entire filter system.
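In sketch form, initialization and per-frame tracking look roughly like this (the CDN path and model URL follow MediaPipe's documentation; double-check the current versions before copying):

```javascript
import { FaceLandmarker, FilesetResolver } from '@mediapipe/tasks-vision';

let faceLandmarker;

async function initFaceTracking() {
  // Load the WASM runtime, then the Face Landmarker model (URLs per MediaPipe's docs).
  const vision = await FilesetResolver.forVisionTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
  );
  faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath:
        'https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task',
      delegate: 'GPU'
    },
    runningMode: 'VIDEO',
    numFaces: 1
  });
}

// Called once per animation frame with the live <video> element.
function trackFrame(video) {
  const results = faceLandmarker.detectForVideo(video, performance.now());
  // results.faceLandmarks[0] is an array of normalized {x, y, z} landmarks.
  return results.faceLandmarks?.[0] ?? null;
}
```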
Mesh Rendering
I implemented a full tessellation renderer using the MediaPipe FACEMESH_TESSELATION array. It draws:
- glowing neon nodes
- connecting edges
- animated mesh movement
This visualizes the underlying AI in real time.
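Stripped down, the renderer looks something like the sketch below. In the tasks-vision API the same topology is exposed as FaceLandmarker.FACE_LANDMARKS_TESSELATION, an array of { start, end } index pairs (the older face_mesh package calls it FACEMESH_TESSELATION):

```javascript
// Sketch of the mesh renderer: edges from the tessellation, glowing nodes on top.
function drawMesh(ctx, landmarks, width, height) {
  ctx.strokeStyle = 'rgba(0, 255, 200, 0.35)';
  ctx.lineWidth = 1;
  for (const { start, end } of FaceLandmarker.FACE_LANDMARKS_TESSELATION) {
    const a = landmarks[start];
    const b = landmarks[end];
    ctx.beginPath();
    ctx.moveTo(a.x * width, a.y * height);   // landmarks are normalized 0..1
    ctx.lineTo(b.x * width, b.y * height);
    ctx.stroke();
  }

  // Neon nodes with a soft glow.
  ctx.fillStyle = '#00ffc8';
  ctx.shadowColor = '#00ffc8';
  ctx.shadowBlur = 6;
  for (const p of landmarks) {
    ctx.beginPath();
    ctx.arc(p.x * width, p.y * height, 1.5, 0, Math.PI * 2);
    ctx.fill();
  }
  ctx.shadowBlur = 0;
}
```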
Filter Anchoring
Each filter is mapped to a specific landmark:
```javascript
'Crown': { landmark: 10, offsetY: -60 }
'Beard': { landmark: 152, offsetY: 40 }
'Reindeer Eyelashes': { landmark: 159, offsetY: -10 }
```
This ensures perfect alignment as the user moves.
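Turning that mapping into a screen position is just a matter of scaling the normalized landmark to canvas pixels and applying the offset, roughly like this (FILTER_ANCHORS and drawFilterSprite are stand-in names, not my exact code):

```javascript
// Positioning sketch: FILTER_ANCHORS is the mapping shown above (name is illustrative).
function drawFilter(ctx, landmarks, filterName, canvasWidth, canvasHeight) {
  const { landmark, offsetY } = FILTER_ANCHORS[filterName];
  const point = landmarks[landmark];             // normalized {x, y, z} from MediaPipe
  const x = point.x * canvasWidth;               // scale to canvas pixels
  const y = point.y * canvasHeight + (offsetY ?? 0); // apply per-filter vertical offset
  drawFilterSprite(ctx, filterName, x, y);       // hypothetical renderer for the crown/beard/etc.
}
```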
Clean Capture Pipeline
To avoid tainted canvases, I built a safe capture flow:
- Create a fresh canvas
- Draw the video frame
- Draw only the mesh (no external PNGs)
- Export as PNG
- Store in sessionStorage
- Redirect to export page
This guarantees consistent captures across browsers.
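In code, the capture step boils down to something like this (the storage key and export page name are illustrative):

```javascript
// Capture sketch: fresh canvas -> video frame -> mesh only -> PNG -> sessionStorage.
function capturePhoto(video, landmarks) {
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext('2d');

  ctx.drawImage(video, 0, 0, canvas.width, canvas.height); // current video frame
  drawMesh(ctx, landmarks, canvas.width, canvas.height);   // mesh only, no external PNGs

  const dataUrl = canvas.toDataURL('image/png');           // safe: nothing taints the canvas
  sessionStorage.setItem('capturedPhoto', dataUrl);        // key name is illustrative
  window.location.href = 'export.html';                    // page name is illustrative
}
```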
Technical Highlights
The app uses structured subagents to divide responsibilities cleanly. The Core App Builder handles UI, camera access, capture, and mobile responsiveness. The Filter Engineer manages MediaPipe initialization, mesh rendering, and filter anchoring. The system uses a clean canvas pipeline to avoid CORS issues and ensures safe PNG export.
Spatial intelligence runs entirely on‑device, enabling real‑time AR effects without external compute. Filters follow the user’s face with sub‑pixel accuracy thanks to landmark‑driven positioning. The UI is fully responsive, and the workflow supports capture, download, and QR‑based sharing.
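The export page itself is tiny: it reads the PNG back out of sessionStorage, wires up a download link, and hands the shareable URL to a QR library. A sketch with illustrative element ids and key names:

```javascript
// Export-page sketch: restore the captured PNG and offer download/sharing.
const dataUrl = sessionStorage.getItem('capturedPhoto');  // same illustrative key as above

const img = document.getElementById('captured-photo');    // <img> preview
img.src = dataUrl;

const link = document.getElementById('download-link');    // <a> element
link.href = dataUrl;
link.download = 'funhouse-photo.png';                     // filename is illustrative

// QR sharing is delegated to whatever QR library you prefer; the call below is a placeholder.
// renderQrCode(document.getElementById('qr'), shareUrl);
```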
Insights
- Subagents feel like having a real dev team, even though it was just me, my code, and my design
- MediaPipe’s local inference is incredibly powerful
- Clean capture pipelines matter
- Spatial intelligence unlocks AR‑level experiences
- Declarative workflows scale beautifully
- Mobile‑first design is essential for real‑world use
Powered By
- goose by Block, powered by Claude Sonnet 4.5 by Anthropic
- MediaPipe by Google
- HTML/CSS/JS
- My own design + engineering workflow
My Final Thoughts
This was one of the most fun builds so far. Using subagents, I created a full AR‑style photo booth with real‑time filters, spatial intelligence, and a polished UI all running locally in the browser. The combination of MediaPipe, canvas rendering, and goose orchestration made it possible to build something that feels magical.
Day 11: Solved. FilterSense Photo Booth: Delivered. Festival magic: Activated.
This post is part of my Advent of AI journey: AI Engineering - Advent of AI with goose, Day 11.
Follow along for more AI adventures with Eri!


