If you've ever done squats wrong for months and only found out when your knees started hurting — this article is for you.
I built a fitness app that watches you exercise through your phone camera and tells you in real time when your form is off. Not after the rep. Not in a post-workout summary. Right now, while you're mid-squat.
Here's how the technical stack works — and how you can build something similar.
## The Core Problem
Most fitness apps track reps. Count sets. Log weights. That's useful data, but it doesn't answer the most important question: are you actually doing the exercise correctly?
Bad form leads to injuries, slower progress, and wasted time. A personal trainer would catch these issues instantly. A fitness app historically could not — until pose estimation models became fast enough to run on mobile.
## How Pose Estimation Works
Modern pose estimation models (MediaPipe, MoveNet, PoseNet) detect key body landmarks from a video frame and return their 2D/3D coordinates. For a squat, you'd track:
- Hip (x, y, z)
- Knee (x, y, z)
- Ankle (x, y, z)
From those three points, you can calculate the knee angle. Standing puts it near 180°; a full-depth squat brings it down to roughly 90° or below at the bottom of the rep. Simple geometry, powerful feedback.
```javascript
// Calculate the angle at pointB formed by the segments pointB→pointA and pointB→pointC
function calculateAngle(pointA, pointB, pointC) {
  const radians =
    Math.atan2(pointC.y - pointB.y, pointC.x - pointB.x) -
    Math.atan2(pointA.y - pointB.y, pointA.x - pointB.x);
  let angle = Math.abs(radians * 180.0 / Math.PI);
  if (angle > 180) angle = 360 - angle; // Normalize to [0°, 180°]
  return angle;
}

// Example usage for knee angle
const kneeAngle = calculateAngle(
  landmarks.HIP,
  landmarks.KNEE,
  landmarks.ANKLE
);

// Checked at the bottom of the rep: a knee angle still above ~90°
// means the squat never reached parallel
if (kneeAngle > 90) {
  triggerFeedback('Go deeper — squat below parallel');
}
```
## The Stack
Here's what I used to build this:
- MediaPipe Pose — runs at 30fps on mid-range Android phones, gives 33 body landmarks
- TensorFlow Lite — for a custom exercise classifier (push-up vs. squat vs. deadlift detection)
- Kotlin + CameraX — Android camera pipeline with real-time frame analysis
- Claude API — natural language form coaching ("your left shoulder is dropping")
```kotlin
// Kotlin: Process camera frames with ML Kit's pose detector (built on MediaPipe)
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.common.PointF3D
import com.google.mlkit.vision.pose.Pose
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.PoseLandmark
import com.google.mlkit.vision.pose.defaults.PoseDetectorOptions
import kotlin.math.abs
import kotlin.math.atan2

class PoseAnalyzer : ImageAnalysis.Analyzer {

    private val poseDetector = PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build()
    )

    @ExperimentalGetImage
    override fun analyze(imageProxy: ImageProxy) {
        val inputImage = InputImage.fromMediaImage(
            imageProxy.image!!,
            imageProxy.imageInfo.rotationDegrees
        )
        poseDetector.process(inputImage)
            .addOnSuccessListener { pose -> analyzePose(pose) }
            .addOnCompleteListener { imageProxy.close() } // Always release the frame
    }

    private fun analyzePose(pose: Pose) {
        val leftHip = pose.getPoseLandmark(PoseLandmark.LEFT_HIP)
        val leftKnee = pose.getPoseLandmark(PoseLandmark.LEFT_KNEE)
        val leftAnkle = pose.getPoseLandmark(PoseLandmark.LEFT_ANKLE)
        if (leftHip != null && leftKnee != null && leftAnkle != null) {
            val kneeAngle = calculateAngle(
                leftHip.position3D,
                leftKnee.position3D,
                leftAnkle.position3D
            )
            checkSquatForm(kneeAngle)
        }
    }

    // Same math as the JavaScript version, applied to ML Kit's PointF3D
    private fun calculateAngle(a: PointF3D, b: PointF3D, c: PointF3D): Double {
        val radians = atan2((c.y - b.y).toDouble(), (c.x - b.x).toDouble()) -
            atan2((a.y - b.y).toDouble(), (a.x - b.x).toDouble())
        var angle = abs(Math.toDegrees(radians))
        if (angle > 180) angle = 360 - angle
        return angle
    }

    private fun checkSquatForm(kneeAngle: Double) { /* feedback logic, see below */ }
}
```
## The AI Coaching Layer
Raw angle data is useful for engineers. But users need human language. That's where the Claude API comes in.
Instead of showing `kneeAngle: 78.3°`, I send the landmark data to Claude with a structured prompt and get back actual coaching language:
```javascript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Generate human-readable form feedback
async function generateFormFeedback(landmarkData, exercise) {
  const response = await anthropic.messages.create({
    model: 'claude-haiku-4-5-20251001', // Fast, cheap, good enough for this
    max_tokens: 150,
    system: 'You are a personal trainer giving real-time form feedback. Be specific, brief, and actionable. Max 1-2 sentences.',
    messages: [{
      role: 'user',
      content: `Exercise: ${exercise}\nBody angles: ${JSON.stringify(landmarkData)}\nWhat is wrong with their form?`
    }]
  });

  // e.g. "Your right knee is caving inward. Push your knees out over your pinky toes."
  return response.content[0].text;
}
```
The key insight: use a fast, cheap model (Claude Haiku) for real-time feedback and only invoke it when you detect a form deviation, not on every frame.
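That trigger-only-on-deviation logic can be sketched as a small state holder. The class and names here are illustrative, not from the app; it fires once per sustained error episode and resets when the form recovers:

```javascript
// Fire AI feedback only after a form error persists for 2+ seconds
const ERROR_HOLD_MS = 2000;

class FeedbackThrottle {
  constructor(now = () => Date.now()) {
    this.now = now;           // injectable clock, handy for testing
    this.errorSince = null;   // timestamp when the current error started
    this.fired = false;       // already gave feedback for this episode?
  }

  // Call once per analyzed frame; returns true when the caller
  // should invoke the Claude API
  update(hasError) {
    const t = this.now();
    if (!hasError) {          // form recovered: reset the episode
      this.errorSince = null;
      this.fired = false;
      return false;
    }
    if (this.errorSince === null) this.errorSince = t;
    if (!this.fired && t - this.errorSince >= ERROR_HOLD_MS) {
      this.fired = true;      // one API call per sustained error
      return true;
    }
    return false;
  }
}
```

A momentary wobble never reaches the API; only a deviation held for the full two seconds does, and it costs exactly one call.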
## Counting Reps Accurately
Rep counting sounds simple but is surprisingly tricky. The naive approach — count angle crossings past a threshold — breaks down with partial reps, pauses, and slow movements.
A better approach: track the full angle curve and detect complete oscillations using a state machine.
```javascript
const RepState = { UP: 'UP', DOWN: 'DOWN' };

class RepCounter {
  constructor() {
    this.state = RepState.UP;
    this.repCount = 0;
    this.angleHistory = [];
  }

  update(kneeAngle) {
    this.angleHistory.push(kneeAngle);
    if (this.state === RepState.UP && kneeAngle < 100) {
      this.state = RepState.DOWN; // Started going down
    } else if (this.state === RepState.DOWN && kneeAngle > 160) {
      this.state = RepState.UP; // Came back up = completed rep
      this.repCount++;
      this.analyzeRepQuality();
    }
  }

  analyzeRepQuality() {
    // Look at the last ~30 frames (about one second at 30fps)
    const minAngle = Math.min(...this.angleHistory.slice(-30));
    this.angleHistory = []; // Reset for next rep
    return minAngle > 90 ? 'shallow' : 'full';
  }
}
```
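To see why the two thresholds (below 100° to arm, above 160° to count) matter, compare a naive single-threshold counter against the hysteresis approach on a synthetic knee-angle trace with jitter around the threshold:

```javascript
// One real squat, with sensor noise hovering around the 100° line
const trace = [170, 150, 102, 99, 101, 98, 85, 90, 120, 158, 162, 170];

// Naive: count every downward crossing of a single 100° threshold
let naive = 0;
for (let i = 1; i < trace.length; i++) {
  if (trace[i - 1] >= 100 && trace[i] < 100) naive++;
}

// Hysteresis: arm below 100°, count only on recovery past 160°
let reps = 0;
let down = false;
for (const angle of trace) {
  if (!down && angle < 100) down = true;
  else if (down && angle > 160) { down = false; reps++; }
}

console.log(naive, reps); // 2 1 — jitter double-counts; hysteresis sees one rep
```

The wide gap between the two thresholds means noise near either line can never flip the state machine back and forth.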
## Performance on Real Devices
This is where most tutorials skip the hard part. Here's what I measured:
| Device | MediaPipe FPS | Full pipeline FPS | AI feedback latency |
|---|---|---|---|
| Pixel 7 | 30fps | 28fps | 180ms |
| Samsung A54 | 24fps | 20fps | 240ms |
| Pixel 4a | 18fps | 14fps | 320ms |
Tips for keeping it fast:
- Run pose detection on a background thread — never on the UI thread
- Skip AI feedback if confidence < 0.7 — low confidence landmarks give garbage angles
- Throttle Claude API calls — trigger only on sustained form errors (2+ seconds), not momentary glitches
- Cache exercise models — load TFLite models at app start, not at workout start
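The confidence gate from the second tip can be as simple as taking the worst score among the joints an angle needs, since one occluded joint ruins the whole calculation. The per-joint `visibility` score follows MediaPipe's convention; the function and landmark names are illustrative:

```javascript
// Skip angle math when any required landmark is low-confidence
const MIN_CONFIDENCE = 0.7;

function jointConfidence(landmarks, names) {
  // Worst score among the joints we need for this angle
  return Math.min(...names.map(n => landmarks[n].visibility));
}

function shouldAnalyze(landmarks) {
  return jointConfidence(landmarks, ['HIP', 'KNEE', 'ANKLE']) >= MIN_CONFIDENCE;
}
```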
## What's Hard That Nobody Talks About
Camera placement is brutal. Pose estimation assumes a full-body view from a specific angle. Users prop their phone at weird heights and angles. You need to detect poor camera placement and guide users to reposition before the workout starts.
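A minimal pre-workout framing check, assuming normalized landmark coordinates in [0, 1] (the function, margin, and landmark names are illustrative): if any required joint is missing or hugging the frame edge, prompt the user to reposition before counting a single rep.

```javascript
// Reject camera placements that cut off required joints
const EDGE_MARGIN = 0.05; // fraction of frame width/height

function framingProblems(landmarks, required = ['NOSE', 'HIP', 'KNEE', 'ANKLE']) {
  const problems = [];
  for (const name of required) {
    const lm = landmarks[name];
    if (!lm ||
        lm.x < EDGE_MARGIN || lm.x > 1 - EDGE_MARGIN ||
        lm.y < EDGE_MARGIN || lm.y > 1 - EDGE_MARGIN) {
      problems.push(name); // joint missing or too close to the frame edge
    }
  }
  return problems; // empty array = good placement, start the workout
}
```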
Exercise detection is unsolved. Telling squats from lunges from step-ups reliably is hard. I ended up letting users select the exercise rather than auto-detecting it — not as smooth, but far more accurate.
Clothing matters. Dark pants on a dark floor, baggy hoodies — the model struggles with occlusion. Adding visual contrast suggestions ("wear contrasting clothing") to onboarding helped significantly.
## The Result
After building all this from scratch and testing with real users, I turned it into IronCore Fit — an AI fitness app that watches your form in real time, counts reps, and coaches you like a trainer would. It handles squats, push-ups, deadlifts, lunges, planks, and more.
Try IronCore Fit on Android.
Or if you want to build your own pose-based app, the techniques above will get you started. The hardest part is not the ML — it is the UX of giving feedback at exactly the right moment without annoying users. But that's a whole other article.
What exercise would you want AI form correction for first? Drop it in the comments — I'm adding new exercises every sprint.