This is a submission for the Google AI Studio Multimodal Challenge
What I Built
AI Personal Trainer is an experimental fitness app with a "voice-first" approach that turns your smartphone into an interactive workout partner. The app is primarily controlled by voice commands, allowing you to focus on exercises rather than the screen.
The problem I'm exploring:
- Distractions during workouts: The need to constantly interact with the phone screen
- Lack of personalization: Most apps offer one-size-fits-all solutions
- Passive interaction: Apps work as trackers rather than assistants
Implemented features:
- 🎤 Voice program creation: a dialog with AI to create personalized workout programs
- 🎧 Real-time audio interaction: two-way voice communication during workouts
- 📊 Comprehensive database system: storage for programs, sessions, and progress
- 📈 Analytics dashboard: visual progress tracking and performance insights
- 📅 Google Calendar integration: workouts are added to your calendar automatically
- 🎯 Hybrid architecture: combining dialog speed with analysis accuracy
Demo
⚡ Live Applet
📱 View in AI Studio
💻 GitHub Repository: ai-personal-trainer
Thanks @aquascript-team for help with the video!
How I Used Google AI Studio
Development started directly in Google AI Studio, where I experimented with different approaches to multimodal interaction.
Development process:
- Prototyping in Google AI Studio: Creating user interface and initial system setup
- Export and development: Downloaded the project for local development
- Extended development: Used Gemini CLI to integrate complex functions
- Final deployment: Uploaded the finished project and used Deploy App
Two-model architecture:
Main model: Real-time dialog
```typescript
// Connect to the live audio dialog
sessionRef.current = await clientRef.current.live.connect({
  model: 'gemini-2.5-flash-preview-native-audio-dialog',
  callbacks: {
    onopen: () => setConnectionStatus('connected'),
    onmessage: async (message) => {
      // Process the user's transcribed speech
      if (message.serverContent?.inputTranscription) {
        const userText = message.serverContent.inputTranscription.text;
        onTranscript(userText);
      }
      // Play the AI's audio response
      const audio = message.serverContent?.modelTurn?.parts?.[0]?.inlineData;
      if (audio && outputAudioContextRef.current) {
        await playAudioResponse(audio);
      }
    }
  },
  config: {
    systemInstruction: createDynamicPrompt(),
    responseModalities: [Modality.AUDIO],
    outputAudioTranscription: {}, // enable transcription of the model's speech
    inputAudioTranscription: {},  // enable transcription of the user's speech
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Orus' } }
    }
  }
});
```
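The `playAudioResponse` helper referenced above isn't shown in the snippet. A minimal sketch of what it might do, assuming the Live API's native audio output format (base64-encoded 16-bit PCM, mono, 24 kHz) and an existing `AudioContext`; the names and structure here are illustrative, not the app's actual code:

```typescript
// Convert a base64-encoded 16-bit PCM chunk into normalized Float32 samples.
export function decodePcm16(base64: string): Float32Array {
  const bytes = Uint8Array.from(atob(base64), c => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer); // little-endian 16-bit PCM
  const floats = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    floats[i] = samples[i] / 32768; // scale to the [-1, 1) range Web Audio expects
  }
  return floats;
}

// Play one decoded chunk through Web Audio.
// ctx is an AudioContext; typed loosely so this sketch stays self-contained.
export async function playAudioResponse(
  ctx: any,
  inlineData: { data: string }
): Promise<void> {
  const floats = decodePcm16(inlineData.data);
  const buffer = ctx.createBuffer(1, floats.length, 24000); // mono, 24 kHz (assumed)
  buffer.copyToChannel(floats, 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}
```

A real implementation would also need to queue chunks back-to-back so consecutive audio messages don't overlap.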
Analytics model: Data extraction
```typescript
// Precise interpretation of user commands
export const interpretWorkoutCommand = async (
  transcript: string
): Promise<{
  command: 'log_set' | 'get_form_tip' | 'chat_message';
  data: { reps?: number; weight?: number; text?: string } | null;
}> => {
  const prompt = `You are an AI assistant interpreting voice commands from a user during a workout. The user's voice transcript is: "${transcript}".

Your task is to analyze the transcript and classify it into one of the following commands, extracting relevant data.

POSSIBLE COMMANDS:
1. 'log_set': The user is reporting the completion of a set. They might mention repetitions (reps) and/or weight.
   - Keywords: "done", "finished", "log it", "reps", "weight", "kilos", numbers.
   - Example Transcripts: "Okay, 12 reps at 50 kilos", "I'm done", "8 reps", "log 90 pounds".
2. 'get_form_tip': The user is asking for advice on their exercise form.
   - Keywords: "form", "technique", "how do I do this", "am I doing it right".
   - Example Transcripts: "check my form", "what's the technique for this".
3. 'chat_message': The user is saying something else, likely a question or comment for the AI coach. This is the default if no other command fits.
   - Example Transcripts: "how many sets left", "I'm feeling tired", "what's the next exercise".

Respond in JSON format with "command" and optional "data".
- For 'log_set', 'data' should be an object with optional 'reps' and 'weight' numbers.
- For 'get_form_tip', 'data' should be null.
- For 'chat_message', 'data' should be an object with the original transcript as 'text'.

Return ONLY the JSON object.

Example Responses:
- Transcript: "10 reps at 80 kg" -> { "command": "log_set", "data": { "reps": 10, "weight": 80 } }
- Transcript: "how do I do this right?" -> { "command": "get_form_tip", "data": null }
- Transcript: "what's the next exercise?" -> { "command": "chat_message", "data": { "text": "what's the next exercise?" } }
`;

  try {
    const result = await ai.models.generateContent({
      model: "gemini-2.5-flash",
      contents: prompt,
      // ...
```
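Even when a model is told to return ONLY a JSON object, the raw text still benefits from defensive parsing: models occasionally wrap the output in a markdown fence or return something malformed. A minimal sketch of such a parser; the helper name and fallback behavior are my assumptions, not the app's actual code:

```typescript
type WorkoutCommand = {
  command: 'log_set' | 'get_form_tip' | 'chat_message';
  data: { reps?: number; weight?: number; text?: string } | null;
};

// Strip an optional ```json fence, parse the JSON, and fall back to treating
// the whole transcript as a chat message when parsing fails.
export function parseCommandResponse(raw: string, transcript: string): WorkoutCommand {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```$/, '')
    .trim();
  try {
    const parsed = JSON.parse(cleaned);
    if (parsed && typeof parsed.command === 'string') {
      return parsed as WorkoutCommand;
    }
  } catch (e) {
    // Malformed JSON: fall through to the chat_message default below.
  }
  return { command: 'chat_message', data: { text: transcript } };
}
```

The `chat_message` fallback matches the prompt's own rule that it is the default when nothing else fits, so a bad model response degrades gracefully instead of crashing the workout session.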
Multimodal Capabilities
1. Seamless audio interaction
Implemented: a connection via `client.live.connect` from the Gemini Live API SDK, with continuous bidirectional audio streaming
Uniqueness: it works like a phone conversation; you can interrupt the model and get instant responses
2. Hybrid command processing
Architecture:
- Dialog Model: Maintains natural conversation
- Analysis Model: Extracts precise data from speech
Processing example:
User: "Did eight reps with sixty kilos, felt pretty easy"
Dialog Model → "Great! Logged 8 reps with 60 kg. Should we increase the weight?"
Analysis Model →

```json
{
  "command": "log_set",
  "data": {
    "reps": 8,
    "weight": 60
  }
}
```
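Once the analysis model returns this structured object, it can be routed to the matching app action while the dialog model keeps the conversation going. A sketch of such a dispatcher; the handler names are illustrative, not the app's actual functions:

```typescript
interface WorkoutCommandResult {
  command: 'log_set' | 'get_form_tip' | 'chat_message';
  data: { reps?: number; weight?: number; text?: string } | null;
}

// Route the analysis model's structured output to the matching app action.
// logSet / requestFormTip / appendChat are hypothetical handlers.
export function dispatchCommand(
  result: WorkoutCommandResult,
  handlers: {
    logSet: (reps?: number, weight?: number) => void;
    requestFormTip: () => void;
    appendChat: (text: string) => void;
  }
): void {
  switch (result.command) {
    case 'log_set':
      handlers.logSet(result.data?.reps, result.data?.weight);
      break;
    case 'get_form_tip':
      handlers.requestFormTip();
      break;
    case 'chat_message':
      handlers.appendChat(result.data?.text ?? '');
      break;
  }
}
```

Keeping dispatch separate from parsing means the app's actions stay deterministic even though the upstream classification is probabilistic.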
3. Full-featured data system
Implemented architecture:
Workout programs (`/programs/{programId}`):
```json
{
  "name": "Strength program, 12 weeks",
  "createdBy": "userId",
  "workouts": {
    "day1": {
      "dayName": "Chest and triceps",
      "exercises": [
        {
          "exerciseId": "bench_press",
          "name": "Barbell bench press",
          "sets": [{ "reps": 8, "weight": 60 }],
          "rest": 120
        }
      ]
    }
  }
}
```
Detailed training sessions (`/sessions/{sessionId}`):
```json
{
  "userId": "user123",
  "date": "2024-01-15T10:00:00Z",
  "programId": "strength_program_001",
  "workoutId": "day1_chest",
  "duration": 5400, // seconds
  "voiceTranscript": "Complete log of conversation with AI...",
  "performedSets": {
    "set001": {
      "exerciseId": "bench_press",
      "setNumber": 1,
      "reps": 8,
      "weight": 62.5,
      "timestamp": "2024-01-15T10:15:30Z"
    }
  }
}
```
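The two documents above map naturally onto TypeScript types. Roughly how the app could model them, with field names taken directly from the JSON examples (the type names themselves are my own):

```typescript
export interface ProgramSet {
  reps: number;
  weight: number;
}

export interface ProgramExercise {
  exerciseId: string;
  name: string;
  sets: ProgramSet[];
  rest: number; // rest between sets, in seconds
}

export interface WorkoutDay {
  dayName: string;
  exercises: ProgramExercise[];
}

export interface WorkoutProgram {
  name: string;
  createdBy: string;
  workouts: Record<string, WorkoutDay>; // keyed by "day1", "day2", ...
}

export interface PerformedSet {
  exerciseId: string;
  setNumber: number;
  reps: number;
  weight: number;
  timestamp: string; // ISO 8601
}

export interface TrainingSession {
  userId: string;
  date: string;
  programId: string;
  workoutId: string;
  duration: number; // seconds
  voiceTranscript: string;
  performedSets: Record<string, PerformedSet>; // keyed by "set001", ...
}
```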
4. Automatic calendar integration
Implemented: Direct integration with Google Calendar API
```typescript
export const scheduleWorkouts = async (workouts: Workout[], accessToken: string): Promise<void> => {
  if (!workouts || workouts.length === 0) {
    throw new Error("No workouts to schedule.");
  }
  const schedulePromises = workouts.map(workout => {
    const startTime = getNextWorkoutDate(workout.dayOfWeek);
    const endTime = new Date(startTime.getTime() + 60 * 60 * 1000); // assume a 1-hour duration
    const event = {
      'summary': `Workout: ${workout.dayName}`,
      'description': `Your scheduled workout session.\n\nExercises:\n- ${workout.exercises.map(e => e.name).join('\n- ')}`,
      'start': {
        'dateTime': startTime.toISOString(),
        'timeZone': Intl.DateTimeFormat().resolvedOptions().timeZone,
      },
      'end': {
        'dateTime': endTime.toISOString(),
        'timeZone': Intl.DateTimeFormat().resolvedOptions().timeZone,
      },
    };
    return fetch('https://www.googleapis.com/calendar/v3/calendars/primary/events', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(event),
    });
  });
  await Promise.all(schedulePromises);
};
```
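`scheduleWorkouts` relies on a `getNextWorkoutDate` helper that isn't shown. A plausible sketch, under two assumptions of mine: `dayOfWeek` uses JavaScript's `Date.getDay()` convention (0 = Sunday through 6 = Saturday), and workouts start at a fixed 10:00 local time:

```typescript
// Find the next occurrence of the given weekday at 10:00 local time.
// dayOfWeek: 0 (Sunday) ... 6 (Saturday), matching Date.getDay().
// The 10:00 start time is an assumption for this sketch.
export function getNextWorkoutDate(dayOfWeek: number, from: Date = new Date()): Date {
  const result = new Date(from);
  // "|| 7" pushes a same-day match a full week out, so the slot is always in the future.
  const delta = (dayOfWeek - from.getDay() + 7) % 7 || 7;
  result.setDate(from.getDate() + delta);
  result.setHours(10, 0, 0, 0);
  return result;
}
```

Whether "today" should schedule for later today or for next week is a product decision; this sketch always picks next week to avoid creating events in the past.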
5. Contextual understanding of fitness terminology
AI understands specific vocabulary:
- Set logging (`log_set`): the AI scans the user's speech for numbers and keywords (such as "reps", "times", "weight", "pounds") to automatically fill in data about completed sets.
- Requests and comments (`get_form_tip`, `chat_message`): phrases that don't contain loggable data are handled as questions for the trainer or simple comments.
6. Comprehensive Analytics Dashboard
Implemented: A dedicated analytics section that provides detailed insights into workout performance and progress tracking.
Features include:
- Visual progress charts and graphs
- Historical workout data visualization
- Performance metrics and trends analysis
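One example of a metric such a dashboard can derive from the session documents shown earlier is training volume (reps × weight, summed per exercise). A sketch of that calculation; this is one plausible dashboard metric, not necessarily the app's exact implementation:

```typescript
interface PerformedSet {
  exerciseId: string;
  setNumber: number;
  reps: number;
  weight: number;
  timestamp: string;
}

// Sum reps × weight per exercise across a session's performed sets.
export function volumeByExercise(
  performedSets: Record<string, PerformedSet>
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const set of Object.values(performedSets)) {
    totals[set.exerciseId] = (totals[set.exerciseId] ?? 0) + set.reps * set.weight;
  }
  return totals;
}
```

Comparing these per-exercise totals across sessions is enough to plot a basic progress trend line.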
Current MVP Limitations
Main challenges:
- Speech recognition accuracy: the AI doesn't always correctly interpret commands in live dialog, especially with background noise
- Command execution: the model sometimes "forgets" to execute specific actions in the app after responding
Why Multimodality Matters for Fitness
Traditional fitness apps force you to choose: EITHER data tracking OR workout focus. The multimodal approach solves this dilemma:
- Voice interface allows you to stay focused on exercises
- Intelligent speech analysis structures data automatically
- Real-time feedback creates the feeling of a personal trainer
- Automatic workout scheduling integrates fitness into daily life
The result is a fitness companion that understands natural speech and adapts to each user's unique style.
Note: even though my AI trainer is quite smart and motivating, I strongly recommend keeping common sense in charge, especially when it comes to health matters. 💪
Acknowledgments
I express deep gratitude to the organizers of the Google AI Studio Multimodal Challenge for the unique opportunity to experiment with cutting-edge artificial intelligence technologies.
Special thanks to:
- The Google AI Studio team for the intuitive platform that makes complex technologies accessible
- Gemini Live Audio API developers for the revolutionary real-time voice interaction technology
- The Dev.to community for providing an inspiring platform for innovative projects
This project was made possible by an ecosystem of cutting-edge tools and the supportive developer community that Google AI creates.
MVP developed with React, TypeScript, Firebase, Google Calendar API and Google Gemini multimodal capabilities
Built with ❤️ by Premananda