This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built an AI Fitness and Form Coach, a personal-trainer app that uses multimodal AI to check that users maintain proper form during exercises. It tackles the problem of unsupervised training, which often leads to injury or ineffective workouts. The app offers two modes of operation: a real-time live webcam session for immediate coaching, and an upload mode for pre-recorded videos. By analyzing the user's form and listening to their breathing, the app delivers personalized, voice-based feedback, effectively bringing the experience of a personal coach into the user's home. After a real-time session, it also provides a detailed post-workout analysis.
Demo
Deployed Applet: https://ai-fitness-and-form-coach-506537616410.us-west1.run.app/
Video Demo:
Screenshots:
How I Used Google AI Studio
I used Google AI Studio as the central hub for this project, from prototyping to deployment. It allowed me to rapidly develop the application's frontend and integrate it with the Gemini API. The ability to define a detailed System Instruction was crucial for shaping the AI's persona as an expert, encouraging, and safe personal trainer.
The most critical feature of Google AI Studio for this project was the seamless Cloud Run deployment. This enabled me to transform my prototype into a live, scalable web service that can handle video and audio streaming, without the complexity of managing a separate backend infrastructure.
Multimodal Features
My app demonstrates Gemini's multimodal capabilities by combining three key features into a practical user experience:
Real-time Video Analysis (Gemini 2.5 Pro/Flash): The app uses the webcam to capture a continuous video stream of the user's workout. The AI analyzes this visual data in real time, identifying key body parts and assessing joint angles and posture to determine the quality of the user's form. This core functionality is what enables the immediate, on-the-spot coaching.
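To ground what "assessing joint angles" means, here is a small sketch of the underlying geometry. In the app itself, Gemini performs this assessment directly from raw frames; the client-side helpers, keypoint format, and squat-depth threshold below are purely illustrative assumptions.

```typescript
// Illustrative only: the app sends raw frames to Gemini rather than
// computing angles client-side. The keypoint format is an assumption.
interface Keypoint { x: number; y: number; }

// Angle in degrees at joint b, formed by the segments b->a and b->c.
function jointAngle(a: Keypoint, b: Keypoint, c: Keypoint): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const mag = Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y);
  return (Math.acos(dot / mag) * 180) / Math.PI;
}

// Example cue: at the bottom of a squat, a knee angle around 90 degrees
// (hip-knee-ankle) is a common depth check. Threshold is hypothetical.
function squatDepthOk(hip: Keypoint, knee: Keypoint, ankle: Keypoint): boolean {
  return jointAngle(hip, knee, ankle) <= 100;
}
```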
Audio Understanding (Live API): The app uses the device's microphone to listen to the user during their workout. The Live API processes the audio stream, understanding the rhythm of their breathing, grunts of exertion, or even a verbal request. This provides an additional layer of data to the AI's analysis, allowing it to give more informed and empathetic feedback.
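The Live API expects input audio as raw 16-bit PCM at 16 kHz. Below is a hedged sketch of converting Web Audio microphone samples into that format and streaming a chunk; the `sendRealtimeInput` shape follows my reading of the @google/genai docs and may need adjustment against the current SDK.

```typescript
// Convert Web Audio float samples (-1..1) to 16-bit PCM, the input format
// the Live API expects (16 kHz mono).
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale into int16 range
  }
  return out;
}

// Streaming sketch (not executed here; call shape per @google/genai docs).
async function streamMicChunk(session: any, chunk: Float32Array) {
  const pcm = floatTo16BitPCM(chunk);
  const bytes = new Uint8Array(pcm.buffer);
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  session.sendRealtimeInput({
    media: { data: btoa(bin), mimeType: "audio/pcm;rate=16000" },
  });
}
```

Sending small, frequent chunks like this is what lets the model react to breathing rhythm and spoken requests with low latency.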
Real-time Voice Feedback: Based on the AI's multimodal analysis, the app synthesizes a human-like voice to deliver real-time coaching. This instant, hands-free feedback loop is a core innovation: it makes the session dynamic and responsive, so it feels like working with a human coach.