This is a submission for the Google AI Studio Multimodal Challenge
## What I Built
AI Personal Trainer is an experimental fitness app with a "voice-first" approach that turns your smartphone into an interactive workout partner. The app is primarily controlled by voice commands, allowing you to focus on exercises rather than the screen.
**The problem I'm exploring:**
- Distractions during workouts: The need to constantly interact with the phone screen
- Lack of personalization: Most apps offer one-size-fits-all solutions
- Passive interaction: Apps work as trackers rather than assistants
**Implemented features:**
- **Voice program creation:** a dialog with the AI that builds a personalized workout program
- **Real-time audio interaction:** two-way voice communication during workouts
- **Database system:** stores programs, sessions, and progress history
- **Analytics dashboard:** visual progress tracking and performance insights
- **Google Calendar integration:** workouts are added to your calendar automatically
- **Hybrid architecture:** combines the speed of a dialog model with the accuracy of an analysis model
## Demo
- Live Applet
- View in AI Studio
- GitHub Repository: ai-personal-trainer
## How I Used Google AI Studio
Development started directly in Google AI Studio, where I experimented with different approaches to multimodal interaction.
**Development process:**
- Prototyping in Google AI Studio: Creating user interface and initial system setup
- Export and development: Downloaded the project for local development
- Extended development: Used Gemini CLI to integrate complex functions
- Final deployment: Uploaded the finished project and used Deploy App
**Two-model architecture**

**Main model: Real-time dialog**
```typescript
// Connect to the live audio dialog
sessionRef.current = await clientRef.current.live.connect({
  model: 'gemini-2.5-flash-preview-native-audio-dialog',
  callbacks: {
    onopen: () => setConnectionStatus('connected'),
    onmessage: async (message) => {
      // Process the user's speech transcription
      if (message.serverContent?.inputTranscription) {
        const userText = message.serverContent.inputTranscription.text;
        onTranscript(userText);
      }
      // Play the AI's audio response
      const audio = message.serverContent?.modelTurn?.parts?.[0]?.inlineData;
      if (audio && outputAudioContextRef.current) {
        await playAudioResponse(audio);
      }
    }
  },
  config: {
    systemInstruction: createDynamicPrompt(),
    responseModalities: [Modality.AUDIO],
    outputAudioTranscription: {}, // enable transcription of the model's speech
    inputAudioTranscription: {},  // enable transcription of the user's speech
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Orus' } }
    }
  }
});
```
**Analytics model: Data extraction**
```typescript
// Precise interpretation of user commands
export const interpretWorkoutCommand = async (
  transcript: string
): Promise<{
  command: 'log_set' | 'get_form_tip' | 'chat_message',
  data: { reps?: number, weight?: number, text?: string } | null
}> => {
  const prompt = `You are an AI assistant interpreting voice commands from a user during a workout. The user's voice transcript is: "${transcript}".

Your task is to analyze the transcript and classify it into one of the following commands, extracting relevant data.

POSSIBLE COMMANDS:
1. 'log_set': The user is reporting the completion of a set. They might mention repetitions (reps) and/or weight.
   - Keywords: "done", "finished", "log it", "reps", "weight", "kilos", numbers.
   - Example Transcripts: "Okay, 12 reps at 50 kilos", "I'm done", "8 reps", "log 90 pounds".
2. 'get_form_tip': The user is asking for advice on their exercise form.
   - Keywords: "form", "technique", "how do I do this", "am I doing it right".
   - Example Transcripts: "check my form", "what's the technique for this".
3. 'chat_message': The user is saying something else, likely a question or comment for the AI coach. This is the default if no other command fits.
   - Example Transcripts: "how many sets left", "I'm feeling tired", "what's the next exercise".

Respond in JSON format with "command" and optional "data".
- For 'log_set', 'data' should be an object with optional 'reps' and 'weight' numbers.
- For 'get_form_tip', 'data' should be null.
- For 'chat_message', 'data' should be an object with the original transcript as 'text'.

Return ONLY the JSON object.

Example Responses:
- Transcript: "10 reps at 80 kg" -> { "command": "log_set", "data": { "reps": 10, "weight": 80 } }
- Transcript: "how do I do this right?" -> { "command": "get_form_tip", "data": null }
- Transcript: "what's the next exercise?" -> { "command": "chat_message", "data": { "text": "what's the next exercise?" } }
`;
  try {
    const result = await ai.models.generateContent({
      model: "gemini-2.5-flash",
      contents: prompt,
      // ...
```
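The snippet is truncated, but one practical detail worth noting is response handling: the model is instructed to return only JSON, yet replies sometimes carry stray text around it. A defensive parse with a `chat_message` fallback might look like this (the `parseCommandResponse` helper and `WorkoutCommand` type are illustrative, not from the original code):

```typescript
// Illustrative types mirroring the command schema described in the prompt.
type WorkoutCommand =
  | { command: 'log_set'; data: { reps?: number; weight?: number } | null }
  | { command: 'get_form_tip'; data: null }
  | { command: 'chat_message'; data: { text: string } };

// Hypothetical helper: extract the outermost JSON object from the reply,
// validate the command name, and fall back to chat_message on any failure.
function parseCommandResponse(raw: string, transcript: string): WorkoutCommand {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start !== -1 && end > start) {
    try {
      const parsed = JSON.parse(raw.slice(start, end + 1));
      if (['log_set', 'get_form_tip', 'chat_message'].includes(parsed.command)) {
        return parsed as WorkoutCommand;
      }
    } catch {
      // malformed JSON: fall through to the fallback below
    }
  }
  return { command: 'chat_message', data: { text: transcript } };
}
```

The fallback matters in practice: treating an unparseable reply as a plain chat message keeps the conversation flowing instead of surfacing an error mid-set.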
## Multimodal Capabilities
### 1. Seamless audio interaction

**Implemented:** a persistent connection via `client.live.connect` from the Gemini Live API SDK, with continuous bidirectional audio streaming.

**Uniqueness:** it works like a phone conversation: you can interrupt the model and get instant responses.
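The `playAudioResponse` helper used in the connection snippet above isn't shown; a minimal sketch of its decoding step, assuming the audio arrives as base64-encoded 16-bit little-endian PCM (mono, 24 kHz — these format details are assumptions, not confirmed by the original code):

```typescript
// Illustrative: convert a base64-encoded PCM16 chunk (as delivered in
// inlineData.data) into Float32 samples for the Web Audio API.
function pcm16ToFloat32(base64Data: string): Float32Array {
  const bytes = Uint8Array.from(atob(base64Data), c => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 2);
  const floats = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) floats[i] = samples[i] / 32768; // scale to [-1, 1)
  return floats;
}
// The resulting samples would then be copied into an AudioBuffer and each
// chunk scheduled to start where the previous one ends, so playback is gapless.
```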
### 2. Hybrid command processing

**Architecture:**
- Dialog Model: Maintains natural conversation
- Analysis Model: Extracts precise data from speech
**Processing example:**

User: "Did eight reps with sixty kilos, felt pretty easy"

Dialog model → "Great! Logged 8 reps with 60 kg. Should we increase the weight?"

Analysis model →

```json
{
  "command": "log_set",
  "data": {
    "reps": 8,
    "weight": 60
  }
}
```
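The hand-off from interpretation to app state can be sketched as a small dispatcher (the `dispatchCommand` helper and handler names are illustrative, not from the original code):

```typescript
// Illustrative result shape matching the interpreter's contract.
interface InterpretedCommand {
  command: 'log_set' | 'get_form_tip' | 'chat_message';
  data: { reps?: number; weight?: number; text?: string } | null;
}

// Hypothetical dispatcher: the dialog model has already answered aloud;
// here we act on the structured interpretation of the same transcript.
function dispatchCommand(
  result: InterpretedCommand,
  handlers: {
    logSet: (reps?: number, weight?: number) => void;
    formTip: () => void;
    chat: (text: string) => void;
  }
): void {
  switch (result.command) {
    case 'log_set':
      handlers.logSet(result.data?.reps, result.data?.weight);
      break;
    case 'get_form_tip':
      handlers.formTip();
      break;
    case 'chat_message':
      handlers.chat(result.data?.text ?? '');
      break;
  }
}
```

Keeping the dispatcher separate from both models means either model can be swapped without touching the app-state logic.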
### 3. Full-featured data system

**Implemented architecture:**

Workout programs (`/programs/{programId}`):
```json
{
  "name": "Strength program, 12 weeks",
  "createdBy": "userId",
  "workouts": {
    "day1": {
      "dayName": "Chest and triceps",
      "exercises": [
        {
          "exerciseId": "bench_press",
          "name": "Barbell bench press",
          "sets": [{"reps": 8, "weight": 60}],
          "rest": 120
        }
      ]
    }
  }
}
```
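For reference, the program document above corresponds to TypeScript types roughly like these (the type names are my own, not from the project):

```typescript
// Illustrative types mirroring the /programs/{programId} document shape.
interface SetTarget { reps: number; weight: number; }
interface Exercise {
  exerciseId: string;
  name: string;
  sets: SetTarget[];
  rest: number; // rest between sets, in seconds
}
interface WorkoutDay { dayName: string; exercises: Exercise[]; }
interface Program {
  name: string;
  createdBy: string;
  workouts: Record<string, WorkoutDay>; // keyed by "day1", "day2", ...
}

const program: Program = {
  name: 'Strength program, 12 weeks',
  createdBy: 'userId',
  workouts: {
    day1: {
      dayName: 'Chest and triceps',
      exercises: [{
        exerciseId: 'bench_press',
        name: 'Barbell bench press',
        sets: [{ reps: 8, weight: 60 }],
        rest: 120,
      }],
    },
  },
};
```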
Detailed training sessions (`/sessions/{sessionId}`):
```json
{
  "userId": "user123",
  "date": "2024-01-15T10:00:00Z",
  "programId": "strength_program_001",
  "workoutId": "day1_chest",
  "duration": 5400, // seconds
  "voiceTranscript": "Complete log of conversation with AI...",
  "performedSets": {
    "set001": {
      "exerciseId": "bench_press",
      "setNumber": 1,
      "reps": 8,
      "weight": 62.5,
      "timestamp": "2024-01-15T10:15:30Z"
    }
  }
}
```
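Sessions stored in this shape make aggregate metrics straightforward; for example, total volume (reps × weight) per exercise could be computed like this (the helper is illustrative, not from the project):

```typescript
// Illustrative type mirroring one entry of performedSets.
interface PerformedSet {
  exerciseId: string;
  setNumber: number;
  reps: number;
  weight: number;
  timestamp: string;
}

// Hypothetical helper: sum reps * weight per exercise across a session.
function volumePerExercise(
  performedSets: Record<string, PerformedSet>
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const set of Object.values(performedSets)) {
    totals[set.exerciseId] = (totals[set.exerciseId] ?? 0) + set.reps * set.weight;
  }
  return totals;
}
```

This kind of per-exercise rollup is the raw material for the progress charts in the analytics dashboard.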
### 4. Automatic calendar integration

**Implemented:** direct integration with the Google Calendar API
```typescript
export const scheduleWorkouts = async (workouts: Workout[], accessToken: string): Promise<void> => {
  if (!workouts || workouts.length === 0) {
    throw new Error("No workouts to schedule.");
  }
  const schedulePromises = workouts.map(workout => {
    const startTime = getNextWorkoutDate(workout.dayOfWeek);
    const endTime = new Date(startTime.getTime() + 60 * 60 * 1000); // Assume 1-hour duration
    const event = {
      'summary': `Workout: ${workout.dayName}`,
      'description': `Your scheduled workout session.\n\nExercises:\n- ${workout.exercises.map(e => e.name).join('\n- ')}`,
      'start': {
        'dateTime': startTime.toISOString(),
        'timeZone': Intl.DateTimeFormat().resolvedOptions().timeZone,
      },
      'end': {
        'dateTime': endTime.toISOString(),
        'timeZone': Intl.DateTimeFormat().resolvedOptions().timeZone,
      },
    };
    return fetch('https://www.googleapis.com/calendar/v3/calendars/primary/events', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(event),
    });
  });
  await Promise.all(schedulePromises);
};
```
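The `getNextWorkoutDate` helper referenced above isn't shown; a minimal sketch, assuming `dayOfWeek` is a number from 0 (Sunday) to 6 (Saturday), that today counts if it matches, and an assumed default start time of 18:00 local:

```typescript
// Illustrative helper: returns the next calendar date falling on dayOfWeek,
// at 18:00 local time (the start hour is an assumption).
function getNextWorkoutDate(dayOfWeek: number, from: Date = new Date()): Date {
  const result = new Date(from);
  const delta = (dayOfWeek - from.getDay() + 7) % 7; // 0 means the workout is today
  result.setDate(from.getDate() + delta);
  result.setHours(18, 0, 0, 0);
  return result;
}
```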
### 5. Contextual understanding of fitness terminology

The AI understands domain-specific vocabulary:

- **Set logging (`log_set`):** the AI scans the user's speech for numbers and keywords (such as "reps", "times", "weight", "pounds") to automatically fill in data about completed sets.
- **Requests and comments (`get_form_tip`, `chat_message`):** phrases that contain no loggable data are treated as questions for the trainer or simple comments.
### 6. Comprehensive analytics dashboard

**Implemented:** a dedicated analytics section with detailed insights into workout performance and progress.

Features include:
- Visual progress charts and graphs
- Historical workout data visualization
- Performance metrics and trends analysis
## Current MVP Limitations

**Main challenges:**

- **Speech recognition accuracy:** the AI doesn't always interpret commands correctly in live dialog, especially with background noise
- **Command execution:** the model sometimes "forgets" to trigger the corresponding in-app action after responding
## Why Multimodality Matters for Fitness
Traditional fitness apps force a choice: either data tracking or workout focus. A multimodal approach resolves this dilemma:
- Voice interface allows you to stay focused on exercises
- Intelligent speech analysis structures data automatically
- Real-time feedback creates the feeling of a personal trainer
- Automatic workout scheduling integrates fitness into daily life
The result is a fitness companion that understands natural speech and adapts to each user's unique style.
Note: my AI trainer is quite smart and motivating, but I still strongly recommend keeping common sense in the loop, especially where your health is concerned.
## Acknowledgments
I express deep gratitude to the organizers of the Google AI Studio Multimodal Challenge for the unique opportunity to experiment with cutting-edge artificial intelligence technologies.
Special thanks to:
- The Google AI Studio team for the intuitive platform that makes complex technologies accessible
- Gemini Live Audio API developers for the revolutionary real-time voice interaction technology
- The Dev.to community for providing an inspiring platform for innovative projects
This project was made possible by an ecosystem of cutting-edge tools and the supportive developer community that Google AI creates.
MVP developed with React, TypeScript, Firebase, the Google Calendar API, and Google Gemini's multimodal capabilities.

Built with ❤️ by Premananda