If you've ever done squats wrong for months and only found out when your knees started hurting — this article is for you.
I built a fitness app that watches you exercise through your phone camera and tells you in real time when your form is off. Not after the rep. Not in a post-workout summary. Right now, while you're mid-squat.
Here's how the technical stack works — and how you can build something similar.
## The Core Problem
Most fitness apps track reps. Count sets. Log weights. That's useful data, but it doesn't answer the most important question: are you actually doing the exercise correctly?
Bad form leads to injuries, slower progress, and wasted time. A personal trainer would catch these issues instantly. A fitness app historically could not — until pose estimation models became fast enough to run on mobile.
## How Pose Estimation Works
Modern pose estimation models (MediaPipe, MoveNet, PoseNet) detect key body landmarks from a video frame and return their 2D/3D coordinates. For a squat, you'd track:
- Hip (x, y, z)
- Knee (x, y, z)
- Ankle (x, y, z)
From those three points, you can calculate the knee angle. Standing puts it near 180°; a full-depth squat brings it down to roughly 90° or below at the bottom of the rep. Simple geometry, powerful feedback.
```javascript
// Calculate the angle at pointB formed by the segments pointB→pointA and pointB→pointC
function calculateAngle(pointA, pointB, pointC) {
  const radians =
    Math.atan2(pointC.y - pointB.y, pointC.x - pointB.x) -
    Math.atan2(pointA.y - pointB.y, pointA.x - pointB.x);
  let angle = Math.abs(radians * 180.0 / Math.PI);
  if (angle > 180) angle = 360 - angle; // Normalize to [0°, 180°]
  return angle;
}

// Example usage for knee angle
const kneeAngle = calculateAngle(
  landmarks.HIP,
  landmarks.KNEE,
  landmarks.ANKLE
);

// Checked at the bottom of the rep: a knee angle still above ~90°
// means the squat never reached parallel
if (kneeAngle > 90) {
  triggerFeedback('Go deeper — squat below parallel');
}
```
## The Stack
Here's what I used to build this:
- MediaPipe Pose — runs at 30fps on mid-range Android phones, gives 33 body landmarks
- TensorFlow Lite — for a custom exercise classifier (push-up vs. squat vs. deadlift detection)
- Kotlin + CameraX — Android camera pipeline with real-time frame analysis
- Claude API — natural language form coaching ("your left shoulder is dropping")
```kotlin
// Kotlin: Process camera frames with ML Kit's pose detector (built on MediaPipe)
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.common.PointF3D
import com.google.mlkit.vision.pose.Pose
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.PoseLandmark
import com.google.mlkit.vision.pose.defaults.PoseDetectorOptions
import kotlin.math.abs
import kotlin.math.atan2

class PoseAnalyzer : ImageAnalysis.Analyzer {

    private val poseDetector = PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build()
    )

    @ExperimentalGetImage
    override fun analyze(imageProxy: ImageProxy) {
        val inputImage = InputImage.fromMediaImage(
            imageProxy.image!!,
            imageProxy.imageInfo.rotationDegrees
        )
        poseDetector.process(inputImage)
            .addOnSuccessListener { pose -> analyzePose(pose) }
            .addOnCompleteListener { imageProxy.close() } // Always release the frame
    }

    private fun analyzePose(pose: Pose) {
        val leftHip = pose.getPoseLandmark(PoseLandmark.LEFT_HIP)
        val leftKnee = pose.getPoseLandmark(PoseLandmark.LEFT_KNEE)
        val leftAnkle = pose.getPoseLandmark(PoseLandmark.LEFT_ANKLE)
        if (leftHip != null && leftKnee != null && leftAnkle != null) {
            val kneeAngle = calculateAngle(
                leftHip.position3D,
                leftKnee.position3D,
                leftAnkle.position3D
            )
            checkSquatForm(kneeAngle)
        }
    }

    // Same math as the JavaScript version, applied to ML Kit's PointF3D
    private fun calculateAngle(a: PointF3D, b: PointF3D, c: PointF3D): Double {
        val radians = atan2((c.y - b.y).toDouble(), (c.x - b.x).toDouble()) -
            atan2((a.y - b.y).toDouble(), (a.x - b.x).toDouble())
        var angle = abs(Math.toDegrees(radians))
        if (angle > 180) angle = 360 - angle
        return angle
    }

    private fun checkSquatForm(kneeAngle: Double) { /* feedback logic, see below */ }
}
```
## The AI Coaching Layer
Raw angle data is useful for engineers. But users need human language. That's where the Claude API comes in.
Instead of showing `kneeAngle: 78.3°`, I send the landmark data to Claude with a structured prompt and get back actual coaching language:
```javascript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Generate human-readable form feedback
async function generateFormFeedback(landmarkData, exercise) {
  const response = await anthropic.messages.create({
    model: 'claude-haiku-4-5-20251001', // Fast, cheap, good enough for this
    max_tokens: 150,
    system: 'You are a personal trainer giving real-time form feedback. Be specific, brief, and actionable. Max 1-2 sentences.',
    messages: [{
      role: 'user',
      content: `Exercise: ${exercise}\nBody angles: ${JSON.stringify(landmarkData)}\nWhat is wrong with their form?`
    }]
  });

  // e.g. "Your right knee is caving inward. Push your knees out over your pinky toes."
  return response.content[0].text;
}
```
The key insight: use a fast, cheap model (Claude Haiku) for real-time feedback and only invoke it when you detect a form deviation, not on every frame.
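That trigger-only-on-deviation logic can be sketched as a small state holder. The class and names here are illustrative, not from the app; it fires once per sustained error episode and resets when the form recovers:

```javascript
// Fire AI feedback only after a form error persists for 2+ seconds
const ERROR_HOLD_MS = 2000;

class FeedbackThrottle {
  constructor(now = () => Date.now()) {
    this.now = now;           // injectable clock, handy for testing
    this.errorSince = null;   // timestamp when the current error started
    this.fired = false;       // already gave feedback for this episode?
  }

  // Call once per analyzed frame; returns true when the caller
  // should invoke the Claude API
  update(hasError) {
    const t = this.now();
    if (!hasError) {          // form recovered: reset the episode
      this.errorSince = null;
      this.fired = false;
      return false;
    }
    if (this.errorSince === null) this.errorSince = t;
    if (!this.fired && t - this.errorSince >= ERROR_HOLD_MS) {
      this.fired = true;      // one API call per sustained error
      return true;
    }
    return false;
  }
}
```

A momentary wobble never reaches the API; only a deviation held for the full two seconds does, and it costs exactly one call.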
## Counting Reps Accurately
Rep counting sounds simple but is surprisingly tricky. The naive approach — count angle crossings past a threshold — breaks down with partial reps, pauses, and slow movements.
A better approach: track the full angle curve and detect complete oscillations using a state machine.
```javascript
const RepState = { UP: 'UP', DOWN: 'DOWN' };

class RepCounter {
  constructor() {
    this.state = RepState.UP;
    this.repCount = 0;
    this.angleHistory = [];
  }

  update(kneeAngle) {
    this.angleHistory.push(kneeAngle);
    if (this.state === RepState.UP && kneeAngle < 100) {
      this.state = RepState.DOWN; // Started going down
    } else if (this.state === RepState.DOWN && kneeAngle > 160) {
      this.state = RepState.UP; // Came back up = completed rep
      this.repCount++;
      this.analyzeRepQuality();
    }
  }

  analyzeRepQuality() {
    // Look at the last ~30 frames (about one second at 30fps)
    const minAngle = Math.min(...this.angleHistory.slice(-30));
    this.angleHistory = []; // Reset for next rep
    return minAngle > 90 ? 'shallow' : 'full';
  }
}
```
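To see why the two thresholds (below 100° to arm, above 160° to count) matter, compare a naive single-threshold counter against the hysteresis approach on a synthetic knee-angle trace with jitter around the threshold:

```javascript
// One real squat, with sensor noise hovering around the 100° line
const trace = [170, 150, 102, 99, 101, 98, 85, 90, 120, 158, 162, 170];

// Naive: count every downward crossing of a single 100° threshold
let naive = 0;
for (let i = 1; i < trace.length; i++) {
  if (trace[i - 1] >= 100 && trace[i] < 100) naive++;
}

// Hysteresis: arm below 100°, count only on recovery past 160°
let reps = 0;
let down = false;
for (const angle of trace) {
  if (!down && angle < 100) down = true;
  else if (down && angle > 160) { down = false; reps++; }
}

console.log(naive, reps); // 2 1 — jitter double-counts; hysteresis sees one rep
```

The wide gap between the two thresholds means noise near either line can never flip the state machine back and forth.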
## Performance on Real Devices
This is where most tutorials skip the hard part. Here's what I measured:
| Device | MediaPipe FPS | Full pipeline FPS | AI feedback latency |
|---|---|---|---|
| Pixel 7 | 30fps | 28fps | 180ms |
| Samsung A54 | 24fps | 20fps | 240ms |
| Pixel 4a | 18fps | 14fps | 320ms |
Tips for keeping it fast:
- Run pose detection on a background thread — never on the UI thread
- Skip AI feedback if confidence < 0.7 — low confidence landmarks give garbage angles
- Throttle Claude API calls — trigger only on sustained form errors (2+ seconds), not momentary glitches
- Cache exercise models — load TFLite models at app start, not at workout start
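The confidence gate from the second tip can be as simple as taking the worst score among the joints an angle needs, since one occluded joint ruins the whole calculation. The per-joint `visibility` score follows MediaPipe's convention; the function and landmark names are illustrative:

```javascript
// Skip angle math when any required landmark is low-confidence
const MIN_CONFIDENCE = 0.7;

function jointConfidence(landmarks, names) {
  // Worst score among the joints we need for this angle
  return Math.min(...names.map(n => landmarks[n].visibility));
}

function shouldAnalyze(landmarks) {
  return jointConfidence(landmarks, ['HIP', 'KNEE', 'ANKLE']) >= MIN_CONFIDENCE;
}
```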
## What's Hard That Nobody Talks About
Camera placement is brutal. Pose estimation assumes a full-body view from a specific angle. Users prop their phone at weird heights and angles. You need to detect poor camera placement and guide users to reposition before the workout starts.
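A minimal pre-workout framing check, assuming normalized landmark coordinates in [0, 1] (the function, margin, and landmark names are illustrative): if any required joint is missing or hugging the frame edge, prompt the user to reposition before counting a single rep.

```javascript
// Reject camera placements that cut off required joints
const EDGE_MARGIN = 0.05; // fraction of frame width/height

function framingProblems(landmarks, required = ['NOSE', 'HIP', 'KNEE', 'ANKLE']) {
  const problems = [];
  for (const name of required) {
    const lm = landmarks[name];
    if (!lm ||
        lm.x < EDGE_MARGIN || lm.x > 1 - EDGE_MARGIN ||
        lm.y < EDGE_MARGIN || lm.y > 1 - EDGE_MARGIN) {
      problems.push(name); // joint missing or too close to the frame edge
    }
  }
  return problems; // empty array = good placement, start the workout
}
```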
Exercise detection is unsolved. Telling squats from lunges from step-ups reliably is hard. I ended up letting users select the exercise rather than auto-detecting it — not as smooth, but far more accurate.
Clothing matters. Dark pants on a dark floor, baggy hoodies — the model struggles with occlusion. Adding visual contrast suggestions ("wear contrasting clothing") to onboarding helped significantly.
## The Result
After building all this from scratch and testing with real users, I turned it into IronCore Fit — an AI fitness app that watches your form in real time, counts reps, and coaches you like a trainer would. It handles squats, push-ups, deadlifts, lunges, planks, and more.
Try IronCore Fit on Android.
Or if you want to build your own pose-based app, the techniques above will get you started. The hardest part is not the ML — it is the UX of giving feedback at exactly the right moment without annoying users. But that's a whole other article.
What exercise would you want AI form correction for first? Drop it in the comments — I'm adding new exercises every sprint.