This is a submission for the Google AI Studio Multimodal Challenge
What I Built
The Actor's Dojo is a sophisticated, AI-powered web application designed to be a personal acting coach for actors of all levels. It solves a critical problem for actors: the difficulty of receiving objective, specific, and immediate feedback on their work.
The app creates a private rehearsal space where an actor can upload a video of a performance: a monologue, a scene, or an audition tape. They then select a specific acting methodology (like Stanislavski's System, the Meisner Technique, or a general analysis) to serve as an analytical "lens." The AI, powered by the Gemini API, analyzes the performance through this lens and produces a report that includes quantitative scores, qualitative critiques, and actionable advice. This lets actors refine their craft with coach-style feedback that is available 24/7.
Demo
Deployed Applet:
https://the-actor-s-dojo-755083487039.us-west1.run.app
Screenshots & Walkthrough:
Step 1: Setup & Upload - The user is presented with a clean interface to (1) choose their analytical lens from a list of famous acting techniques, (2) upload their performance video, and (3) describe which actor in the scene to focus on.
Step 2: Analysis Results - After processing, the app displays a rich, detailed report. Key features include an overall score, a "Coach's Summary" with the single most important takeaway, a radar chart visualizing scores across different metrics, and a list of detailed critiques.
Video: Interactive Feedback in Action
A short video demonstrating the full user flow. It shows a user uploading a scene, receiving the analysis, and then clicking on a "Key Moments Timeline" marker (e.g., [00:32] - Improvement). The video player instantly jumps to the 32-second mark, creating a seamless and effective learning loop.
➡️ Watch the Full Video Demo Here
How I Used Google AI Studio
I leveraged Google AI Studio and the Gemini API as the intelligent core of The Actor's Dojo. The application's primary functionality is built around the gemini-2.5-flash model's powerful multimodal capabilities.
My implementation uses the @google/genai SDK to send a composite prompt to the model. This prompt consists of two distinct parts:
- A video file: The user's uploaded performance.
- A text prompt: A detailed set of instructions that tells the model to act as an expert acting coach, specifies which acting technique to use for the analysis, and describes which actor to focus on.
Furthermore, I configured the API call to require a structured JSON output by defining a strict response schema. This ensures the data returned from the model is consistent, reliable, and can be easily parsed and rendered into the various UI components of the analysis report, such as the radar chart, scorecards, and interactive timeline.
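To make the request shape concrete, here is a minimal sketch of how the two-part prompt and the response schema could be assembled. It is not the app's actual code: the function name buildAnalysisRequest, the prompt wording, and every schema field other than timelineMarkers are assumptions for illustration. The sketch only builds a plain object in the shape I would expect to pass to the SDK's generateContent call, so it runs without an API key.

```typescript
// Hypothetical types describing the request; only the model name and the
// two-part prompt (video + text) come from the write-up above.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

interface AnalysisRequest {
  model: string;
  contents: { role: string; parts: Part[] }[];
  config: { responseMimeType: string; responseSchema: object };
}

// Assumed schema. Field names other than `timelineMarkers` are illustrative.
// Type names are written as string literals here to keep the sketch
// dependency-free.
const analysisSchema = {
  type: "OBJECT",
  properties: {
    overallScore: { type: "NUMBER" },
    coachSummary: { type: "STRING" },
    timelineMarkers: {
      type: "ARRAY",
      items: {
        type: "OBJECT",
        properties: {
          timestamp: { type: "STRING" },
          type: { type: "STRING", enum: ["kudos", "improvement"] },
          comment: { type: "STRING" },
        },
        required: ["timestamp", "type", "comment"],
      },
    },
  },
  required: ["overallScore", "coachSummary", "timelineMarkers"],
};

// Builds the composite prompt: one video part plus one text part.
function buildAnalysisRequest(
  videoBase64: string,
  mimeType: string,
  technique: string,
  actorFocus: string
): AnalysisRequest {
  const instructions = [
    "Act as an expert acting coach.",
    `Analyze the performance using this technique: ${technique}.`,
    `Focus on this actor: ${actorFocus}.`,
  ].join("\n");

  return {
    model: "gemini-2.5-flash",
    contents: [
      {
        role: "user",
        parts: [
          { inlineData: { mimeType, data: videoBase64 } },
          { text: instructions },
        ],
      },
    ],
    config: {
      responseMimeType: "application/json",
      responseSchema: analysisSchema,
    },
  };
}
```

In a real app, an object like this would presumably be handed to the SDK client, and larger videos would likely need to be uploaded separately rather than inlined as base64.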
Multimodal Features
The key multimodal feature of The Actor's Dojo is its ability to perform a context-aware video and text analysis to generate interactive, time-stamped feedback.
Functionality: When you upload a video and provide text instructions (e.g., "Analyze the actor in the red shirt using the Meisner Technique"), the Gemini model processes the visual and auditory data from the video in conjunction with the text-based context. It doesn't just "watch" the video; it understands who to watch and what to look for. The model then identifies specific, crucial moments in the performance and returns them as an array of timelineMarkers, each with a timestamp (e.g., "01:15"), a feedback type ("kudos" or "improvement"), and a concise comment.
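As an illustration of the marker structure described above, the following sketch types a single marker and defensively filters the model's JSON before it reaches the UI. The property names timestamp, type, and comment, and the helper names isTimelineMarker and parseTimelineMarkers, are assumptions; the write-up only confirms the timelineMarkers array and the three pieces of information each entry carries.

```typescript
type FeedbackType = "kudos" | "improvement";

// Assumed shape of one entry in `timelineMarkers`.
interface TimelineMarker {
  timestamp: string; // "MM:SS", e.g. "01:15"
  type: FeedbackType;
  comment: string;
}

// Runtime check, since a schema-constrained response can still be worth
// validating before rendering.
function isTimelineMarker(value: unknown): value is TimelineMarker {
  if (typeof value !== "object" || value === null) return false;
  const { timestamp, type, comment } = value as Record<string, unknown>;
  return (
    typeof timestamp === "string" &&
    /^\d{1,2}:[0-5]\d$/.test(timestamp) &&
    (type === "kudos" || type === "improvement") &&
    typeof comment === "string"
  );
}

// Extracts the well-formed markers from the raw JSON text of a response.
// Malformed entries are dropped rather than throwing.
function parseTimelineMarkers(json: string): TimelineMarker[] {
  const parsed: unknown = JSON.parse(json);
  const raw = (parsed as { timelineMarkers?: unknown } | null)?.timelineMarkers;
  if (!Array.isArray(raw)) return [];
  return raw.filter(isTimelineMarker);
}
```

Dropping malformed entries instead of failing the whole report is one possible design choice; a stricter app might surface an error instead.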
User Experience Enhancement: This multimodal approach transforms the user experience from a passive reading of a report into an active and interactive learning session.
- Unprecedented Specificity: Instead of generic feedback like "show more emotion," the AI can provide highly specific, actionable notes tied to a precise moment, such as:
  [00:45] - Improvement: This emotional peak felt slightly forced. Try connecting to a specific 'Emotional Recall' to ground this moment in authentic feeling.
- Seamless Feedback Loop: The UI makes these timeline markers clickable. When an actor clicks on a piece of feedback, the video player automatically seeks to that exact timestamp. This allows the performer to instantly review their choice in that moment and fully grasp the AI coach's note, dramatically accelerating the learning process. This direct, contextual link between feedback and performance is only possible through a deeply integrated multimodal system.
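The click-to-seek behavior can be sketched in a few lines. This is an assumed implementation, not the app's own code: timestampToSeconds, seekToMarker, and the Seekable interface are hypothetical names, and the sketch relies only on the standard currentTime property that browser video elements expose.

```typescript
// Minimal shape of the player this sketch needs; a browser
// HTMLVideoElement satisfies it through its `currentTime` property.
interface Seekable {
  currentTime: number;
}

// Converts an "MM:SS" timestamp such as "00:32" into seconds.
function timestampToSeconds(timestamp: string): number {
  const match = /^(\d{1,2}):([0-5]\d)$/.exec(timestamp);
  if (!match) {
    throw new Error(`Unrecognized timestamp: ${timestamp}`);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// Moves the player to the moment a timeline marker refers to.
function seekToMarker(player: Seekable, timestamp: string): void {
  player.currentTime = timestampToSeconds(timestamp);
}
```

In the browser, a marker's click handler could call seekToMarker with the page's video element and the marker's timestamp, so clicking "[00:32] - Improvement" would move playback to the 32-second mark.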