Rahan Judes Michael

Posted on Jun 6

How We Built an AI Gym Trainer Using MediaPipe, React, and RAG 💪🤖

#ai #webdev #rag #programming

We've all been there.

You start doing a workout, feel incredibly confident about your form, and then realize you've been doing squats that look more like an interpretive dance routine.

That got me and my friend thinking:

What if a personal trainer could fit inside your laptop and tell you when you're doing an exercise correctly?

So we built an AI Gym Trainer that can analyze exercise form in real time, count repetitions, and answer fitness-related questions through a RAG-powered assistant.

Rather than training a machine learning model from scratch, we used MediaPipe's pose estimation capabilities and combined them with custom angle calculations and rule-based logic to create an intelligent workout companion.

The Problem

Many beginners, including us, don't always know whether an exercise is being performed correctly. A workout might feel effective, but poor form can reduce results and even increase the risk of injury.

We wanted to explore whether computer vision and AI could provide instant feedback and help users train with more confidence, all from a simple webcam.

Project Overview

Application Dashboard

Exercise Tracking

Fitness Assistant

System Architecture

The project is split into two main parts:

- A real-time exercise tracking system that analyzes user movements.
- A RAG-powered fitness assistant that answers workout-related questions.

Real-Time Exercise Analysis

This is where the magic happens, or where the application politely tells you that your "perfect squat" isn't actually perfect. 😅

The exercise analysis pipeline processes webcam input in real time and continuously monitors the user's movements to provide instant feedback.

Pose Detection with MediaPipe

The first step is getting the system to understand what the user is doing.

We use MediaPipe Pose to detect 33 body landmarks in real time, including key joints such as the shoulders, elbows, hips, knees, and ankles.

Think of it as giving the computer a very basic understanding of human anatomy without requiring it to survive medical school.

Angle Calculation

Once the landmarks are detected, we calculate joint angles between different body parts.

For example, during a bicep curl, the elbow angle changes as the arm moves up and down. By tracking these changes, the system can determine which stage of the exercise the user is currently in.

Form Validation

Now comes the part where the application becomes your strict gym buddy.

Instead of training a custom model, we use rule-based logic and predefined angle thresholds for each exercise.

If the movement meets the expected criteria, everything looks good. If not, the system provides feedback to help the user correct their form before they accidentally invent a brand-new exercise.

Repetition Counting

Finally, the system keeps track of exercise stages such as "up" and "down".

Whenever a complete movement cycle is detected, the repetition counter increases automatically.

No more arguing with yourself about whether that last rep counted—it does the counting for you.

How the System Works

Pose Detection: Teaching a Computer to Recognize a Human 🕺

The first challenge was getting the application to understand what it was looking at.

MediaPipe Pose solves this for us: on every video frame it returns the 33 body landmarks mentioned earlier, from the shoulders and elbows down to the knees and ankles.

These landmarks act as the "skeleton" that the rest of the system uses for analysis.
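
For reference, the landmark indices that show up in the snippets in this post map to the joints below in MediaPipe Pose's 33-point model. This is a small Python sketch for illustration only: `Joint` and `joint_triplet` are hypothetical names, not part of MediaPipe, and the helper assumes each landmark exposes `x` and `y` attributes.

```python
from enum import IntEnum


class Joint(IntEnum):
    """Subset of MediaPipe Pose landmark indices (arms and hips only)."""
    LEFT_SHOULDER = 11
    RIGHT_SHOULDER = 12
    LEFT_ELBOW = 13
    RIGHT_ELBOW = 14
    LEFT_WRIST = 15
    RIGHT_WRIST = 16
    LEFT_HIP = 23
    RIGHT_HIP = 24


def joint_triplet(landmarks, a, b, c):
    """Return (x, y) points for joints a, b, c, in that order.

    Hypothetical helper: `landmarks` is assumed to be an indexable
    sequence of objects with `x` and `y` attributes.
    """
    return tuple((landmarks[j].x, landmarks[j].y) for j in (a, b, c))
```

So `lm[16], lm[14], lm[12]` reads as right wrist, right elbow, right shoulder, which are the three points needed for the right elbow angle.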

Angle Calculation: Turning Movements into Numbers 📐

With the landmarks in hand, we compute a handful of angles on every frame: one elbow angle per arm, a back angle, and a measure of how level the shoulders are. In a bicep curl, for instance, the elbow angle tells the system where the user is in the movement.

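const lm = pose.getLandmarks();

if (lm) {
  // Elbow angles from wrist, elbow, and shoulder
  // (right: 16, 14, 12; left: 15, 13, 11)
  const right = pose.calculateAngle(lm[16], lm[14], lm[12]);
  const left = pose.calculateAngle(lm[11], lm[13], lm[15]);

  // Back angle from right hip (24), right shoulder (12), left shoulder (11)
  const back = pose.calculateAngle(lm[24], lm[12], lm[11]);

  // Shoulder levelness: difference between the left-side and right-side
  // shoulder/hip angles
  const shoulderLevel = Math.abs(
    pose.calculateAngle(lm[11], lm[23], lm[24]) -
      pose.calculateAngle(lm[12], lm[24], lm[23])
  );
  // ...form checks follow
}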

At this point, the application doesn't know whether you're exercising correctly—it just knows a bunch of angles.
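
The post doesn't show `calculateAngle` itself. One common way to get the angle at a joint from three 2D points is to take the difference of two `atan2` directions, sketched here in Python. Treat this as an assumption about how such a helper could work, not the project's actual implementation: it returns the interior angle at the middle point (about 180° for a straight arm), and the project may use a different convention or 3D coordinates.

```python
import math


def calculate_angle(a, b, c):
    """Interior angle at vertex b, in degrees (0 to 180), for points a-b-c.

    Each point is an (x, y) pair. Illustrative sketch only.
    """
    # Direction of each limb segment as seen from the joint b
    ang_ba = math.atan2(a[1] - b[1], a[0] - b[0])
    ang_bc = math.atan2(c[1] - b[1], c[0] - b[0])

    deg = abs(math.degrees(ang_bc - ang_ba))
    # Fold reflex angles back into the 0-180 range
    return 360.0 - deg if deg > 180.0 else deg
```

For example, `calculate_angle((1, 0), (0, 0), (0, 1))` is a right angle, and three collinear points with `b` in the middle give 180.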

Form Validation & Rep Counting: The Strict Gym Buddy 💪

This is where the fun begins.

Instead of training a custom machine learning model, we use rule-based logic and angle thresholds to evaluate each exercise.

For example, during a bicep curl:

let newMsg = "";

// Checks run in priority order; only the first failing check is reported.
if (back < 70 || back > 110) {
  newMsg = "KEEP YOUR BACK STRAIGHT!";
} else if (shoulderLevel > 20) {
  newMsg = "SHOULDERS NOT LEVEL!";
} else if (right <= 15 && left > 15) {
  newMsg = "Straighten LEFT arm!";
} else if (left <= 15 && right > 15) {
  newMsg = "Straighten RIGHT arm!";
}

If the movement follows the expected pattern, the repetition is counted. If not, the system lets the user know that their form needs improvement.

Think of it as having a gym buddy who's constantly watching your workout—but instead of saying "one more rep bro," it's checking whether that rep actually counts.
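
The snippet above only covers the form checks, so here is a rough Python sketch of how the "up"/"down" stage tracking and rep counting could look. Everything in it is an assumption for illustration: the `RepCounter` class, the 50° and 150° thresholds, and the convention that the elbow angle is near 180° when the arm is straight. The project's real thresholds and angle convention may differ.

```python
from dataclasses import dataclass


@dataclass
class RepCounter:
    """Two-threshold state machine for counting curl reps (illustrative)."""
    up_below: float = 50.0     # below this elbow angle the arm counts as curled
    down_above: float = 150.0  # above this elbow angle the arm counts as extended
    stage: str = "down"
    reps: int = 0
    clean: bool = True         # False once any frame of the current rep fails a form check

    def update(self, elbow_angle: float, form_ok: bool = True) -> int:
        if not form_ok:
            self.clean = False

        if self.stage == "down" and elbow_angle < self.up_below:
            self.stage = "up"
        elif self.stage == "up" and elbow_angle > self.down_above:
            # Back at full extension: one complete down -> up -> down cycle
            self.stage = "down"
            if self.clean:
                self.reps += 1
            self.clean = True  # the next rep starts with a clean slate
        return self.reps
```

Using two separate thresholds rather than one means small jitters around a single cutoff don't get counted as extra reps. Calling `update()` once per frame with the current elbow angle is enough to keep the count moving.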

RAG-Powered Fitness Assistant 🤖

While building the exercise tracker, we realized that users might have questions beyond just counting reps.

Questions like:

- How do I improve my squat form?
- Which muscles does a deadlift target?
- How many sets should a beginner perform?

Rather than hardcoding answers or relying solely on a general-purpose chatbot, we integrated a Retrieval-Augmented Generation (RAG) pipeline to provide more relevant and context-aware responses.

The idea is simple: before generating an answer, the system first retrieves relevant information from a fitness knowledge base and then passes that context to the LLM.
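
To make the retrieve-then-generate idea concrete, here is a toy Python sketch. It is not the project's pipeline: retrieval here is plain word overlap instead of embeddings or a vector store, the knowledge base is three placeholder sentences, and `retrieve`, `build_prompt`, and `answer_with_rag` are hypothetical names. The LLM is passed in as a plain function so the sketch doesn't assume any particular provider.

```python
import re

# Placeholder knowledge base; a real one would hold many more documents.
KNOWLEDGE_BASE = [
    "Squat: keep knees aligned with toes and maintain a neutral spine.",
    "Deadlift: primarily targets the hamstrings, glutes, and lower back.",
    "Bicep curl: keep elbows close to the torso and avoid swinging.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return up to k docs sharing at least one word with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d)]


def build_prompt(question, context):
    """Put the retrieved context in front of the user's question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


def answer_with_rag(question, llm, docs=KNOWLEDGE_BASE):
    """Retrieve first, then hand the grounded prompt to the LLM callable."""
    return llm(build_prompt(question, retrieve(question, docs)))
```

Passing `llm=lambda prompt: prompt` is a handy way to inspect exactly what context the model would see for a given question.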

Example Interaction

User: How do I improve my squat form?

Assistant: Keep your knees aligned with your toes, maintain a neutral spine, and aim to reach at least parallel depth while keeping your heels grounded.

Core RAG Workflow

# Excerpt from the WebSocket handler; `websocket` is the accepted connection.
try:
    while True:
        data = await websocket.receive_json()
        user_text = data["message"]
        assistantId = data["assistantId"]

        # Route by intent; anything unrecognized goes to the RAG pipeline.
        intent = detect_intent(user_text)
        if intent == "recommend":
            response = recommend_workout()
        elif intent == "exercise":
            response = launch_exercise(user_text)
        else:
            response = get_rag_answer(user_text)

        if intent == "exercise":
            # Exercise responses are sent as a single JSON payload.
            await websocket.send_json(response)
        else:
            # Text answers are streamed one character at a time.
            for char in response:
                await asyncio.sleep(0.02)
                await websocket.send_json({
                    "type": "stream",
                    "token": char,
                    "assistantId": assistantId,
                })
        # Tell the client this message is complete.
        await websocket.send_json({"type": "done", "assistantId": assistantId})
except WebSocketDisconnect:
    pass

class Profile(BaseModel):
    height: int
    weight: float
    targetWeight: float
    fitnessGoal: str
    age: int

@app.post("/generate_plan")
async def generate_plan(profile: Profile):
    try:
        age = profile.age
        height = profile.height
        weight = profile.weight
        target_weight = profile.targetWeight
        fitness_goal = profile.fitnessGoal

        prompt = f"""
        You are an expert fitness coach. Your task is to create a structured weekly workout plan tailored to the user's profile and goals.

        STRICT CONSTRAINTS:
        1. Schedule workouts ONLY for Monday through Friday. Saturday and Sunday MUST be rest days.
        2. Assign 4 exercises per workout day.
        3. You MUST ONLY select exercises from the "Available Exercises" list provided below. Do NOT hallucinate, recommend, or include any exercises outside of this list under any circumstances.

        USER PROFILE:
        - Age: {age}
        - Height: {height}
        - Weight: {weight}
        - Target Weight: {target_weight}
        - Fitness Goal: {fitness_goal}

        AVAILABLE EXERCISES:
        mentioned in the fitness trainer's knowledge base
        """
        # The rest of the handler is omitted from this excerpt.

This approach helps the assistant generate responses grounded in fitness-specific information rather than relying entirely on the model's general knowledge.

In simple terms, instead of saying "Trust me, bro," the assistant actually looks up relevant information before answering.

The Biggest Challenge We Faced: Making It Truly Real-Time ⚡

When we first built the project, everything worked perfectly... until we connected it to the frontend.

Our initial implementation ran MediaPipe entirely in Python. The React frontend would capture video frames, send them to the Python backend for processing, and then wait for the results to be sent back.

Initial Architecture

While this approach was functional, it introduced noticeable latency. Every frame had to travel from the browser to the backend and back again before feedback could be displayed.

The result?

- Laggy pose tracking
- Delayed feedback
- A user experience that felt far from real-time

Not exactly what you want from a virtual fitness coach.

The Solution

After some experimentation, we decided to move MediaPipe directly into the React application.

Instead of sending video frames to the backend, pose detection could now run directly in the browser.

Optimized Architecture

This eliminated the constant round-trip communication between the frontend and backend.
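
A quick back-of-the-envelope calculation shows why the round trip hurts so much. The Python sketch below assumes frames are handled one at a time, and every number in the usage note is made up for illustration; we are not reporting measurements here. `max_fps` is a hypothetical helper.

```python
def max_fps(inference_ms, network_round_trip_ms=0.0, encode_decode_ms=0.0):
    """Upper bound on frames per second when frames are processed one by one.

    All inputs are per-frame costs in milliseconds.
    """
    per_frame_ms = inference_ms + network_round_trip_ms + encode_decode_ms
    return 1000.0 / per_frame_ms
```

With an illustrative 20 ms of pose inference per frame, running in the browser gives `max_fps(20)`, which is 50 frames per second. Add a hypothetical 60 ms network round trip and 20 ms of frame encoding and decoding, and `max_fps(20, 60, 20)` drops to 10, even though the model itself is no slower.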

The Impact

The improvement was immediately noticeable:

- Smoother pose tracking
- Faster feedback
- Reduced backend workload
- A much more responsive user experience

This challenge taught us an important lesson: building AI applications isn't just about models and algorithms. Sometimes the biggest performance gains come from making better architectural decisions.

Results 🎯

After weeks of debugging, tweaking angle thresholds, and asking ourselves "why isn't this working?", we finally had a system that could track exercises in real time and answer fitness-related questions through a RAG-powered assistant.

Exercise Tracking in Action

The application was tested across multiple exercises, including:

💪 Bicep Curl

🏋️ Hammer Curl

Using MediaPipe landmarks and custom rule-based logic, the system can:

- Track body movements in real time

- Count repetitions automatically

- Identify correct and incorrect form

- Provide instant workout feedback

And perhaps most importantly, it never loses count halfway through a workout. 😄

Meet the Fitness Assistant 🤖

We also integrated a RAG-powered fitness assistant that can answer workout and fitness-related questions.

Whether it's improving squat technique, understanding muscle groups, or learning about training concepts, the assistant retrieves relevant information before generating a response.

Chatbot Demo

What We're Most Proud Of

Looking back, the most satisfying part wasn't the rep counter or even the chatbot.

It was taking several different technologies—Computer Vision, MediaPipe, React, Python, and RAG—and combining them into a single application that feels practical and interactive.

There's still plenty of room for improvement, but seeing the system analyze exercises in real time and have meaningful fitness conversations made all the debugging sessions worth it.

Future Improvements

- Support for additional exercises
- Fine-tuning the speech recognition model
- Mobile application support

Conclusion

Building this AI Gym Trainer was an exciting experience that combined computer vision, real-time processing, and generative AI into a single project.

We're continuing to improve the project and would love to hear your thoughts, suggestions, and feedback.

Project Links

🔗 GitHub Repository

If you found this project interesting, feel free to star the repository or share any ideas for future improvements!

Top comments (1)

Karthik K Pradeep • Jun 6

Great project and a very interesting architectural decision to move MediaPipe directly into the frontend for real-time performance. I'm curious about your choice to use rule-based angle thresholds for form validation instead of training an exercise classification model. Was this primarily for simplicity, interpretability, or data availability? Also, did you compare the accuracy and robustness of the rule-based approach against any ML-based alternatives, especially across users with different body types and movement patterns?