DEV Community

Sanya Prabhakar

Moving Beyond the Black Box: How I Built a Real-Time Voice Fitness Coach using Next.js 15, Convex, & Vapi.ai


I've used a lot of fitness apps. I've followed their calorie numbers, done their workout splits, and trusted their recommendations - all without ever understanding where any of it came from. The number just appeared, authoritative and unexplained, and I was supposed to trust it.

Eventually, I stopped. Not because the numbers were wrong. Because I had no reason to believe they were right.

That frustration became FitExplain - a full-stack web application that generates personalized fitness and nutrition plans through a voice conversation, and then shows you exactly how every number was calculated. No black box. No mystery algorithm. Just metabolic science, made visible.

This post is a complete technical breakdown of how I built it.


🧠 The Core Problem I Was Solving

Most fitness apps share a quiet design flaw: they treat the user like they can't handle the truth.

You open the app, enter your weight and height, tap through a few goal screens, and get back a daily calorie target and a workout plan. The numbers look precise. They come with decimal points. But if you ask why 1,940 calories - why not 1,800, why not 2,100 - the app has no answer. It never does.

This isn't accidental. It's a design assumption: that authority is enough, that users will follow a number simply because the app said so.

Behavioral research consistently shows this assumption is wrong. People follow health guidance longer and more faithfully when they understand the reasoning behind it. Unexplained AI output, however accurate, produces shallow trust - and shallow trust doesn't survive the friction of real life.

FitExplain was built around the opposite assumption: show your work.


๐Ÿ—๏ธ Architecture Overview: The Triple-Cloud Handshake

Before diving into each piece, here's how the system fits together. I call it the Triple-Cloud Handshake:

User's Browser
    │
    ├──── speaks to ──────► Vapi.ai (Voice AI)
    │                           │
    │                           └── webhook ──► Convex Backend
    │                                               │
    ├──── subscribes to ─────────────────────────► Convex (Reactive DB)
    │                                               │
    └──── authenticated by ──► Clerk ──webhook──►  Convex

Four independent cloud services, each best-in-class for its specific role:

| Layer | Technology | Role |
|---|---|---|
| Frontend | Next.js 15 + Vercel | UI, routing, dashboard |
| Voice AI | Vapi.ai | Conversational data collection |
| Backend + DB | Convex | Persistence, AI orchestration, real-time push |
| Auth | Clerk | User identity, session tokens |

The browser talks to Vapi for voice and to Convex for data. Clerk and Vapi each communicate with Convex via authenticated server-side webhooks - invisible to the user, but the backbone of the system.


🎙️ Layer 1: Vapi.ai - Voice as the Intake Form

The first question I had to answer was: why voice at all?

Most fitness apps use forms. Forms are fine - until you realize what they actually produce. A dropdown asking "How active are you?" with options like "Sedentary," "Lightly Active," and "Moderately Active" doesn't capture useful data. Users don't know what "moderately active" means in caloric terms, so they pick whatever sounds right and end up with a generic output.

Voice is different. A conversation can ask follow-up questions. It can confirm ambiguous answers. It can be designed to elicit specific, structured information - in a format that feels completely natural to the user.

The Vapi Workflow: 10 Nodes, 8 Parameters

I built a ten-node conversation workflow in Vapi that collects exactly eight parameters before allowing the session to close:

  1. Age - integer
  2. Current weight - numeric, with unit confirmation
  3. Height - numeric, with unit confirmation
  4. Existing injuries - free text, normalized
  5. Primary fitness goal - categorized (fat loss / muscle gain / endurance / general fitness)
  6. Preferred weekly training sessions - integer (1-7)
  7. Current fitness level - categorized (beginner / intermediate / advanced)
  8. Dietary restrictions or allergies - free text, normalized

The workflow uses guard conditions - Vapi will not advance to plan generation unless every required parameter has been captured and confirmed. If a user says something ambiguous like "I work out sometimes," the workflow probes: "About how many days a week would you say?" The session runs between 30 and 77 seconds across all tested users.
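
To make the guard idea concrete, here's a minimal TypeScript sketch of the same check done in code. This is not the Vapi workflow itself, which lives in Vapi's dashboard; the `IntakeParameters` shape, its field names, and `missingParameters` are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shape for the eight intake parameters. Field names are
// illustrative and are not taken from the actual Vapi workflow.
type Goal = "fat_loss" | "muscle_gain" | "endurance" | "general_fitness";
type FitnessLevel = "beginner" | "intermediate" | "advanced";

interface IntakeParameters {
  age: number;
  weightKg: number;
  heightCm: number;
  injuries: string; // normalized free text, e.g. "none"
  goal: Goal;
  weeklySessions: number; // integer from 1 to 7
  fitnessLevel: FitnessLevel;
  dietaryRestrictions: string; // normalized free text, e.g. "none"
}

const REQUIRED_KEYS: (keyof IntakeParameters)[] = [
  "age",
  "weightKg",
  "heightCm",
  "injuries",
  "goal",
  "weeklySessions",
  "fitnessLevel",
  "dietaryRestrictions",
];

// Returns the parameters that are still missing or out of range, so the
// conversation knows what to ask about next. An empty result means the
// guard passes and plan generation may start.
function missingParameters(
  partial: Partial<IntakeParameters>,
): (keyof IntakeParameters)[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = partial[key];
    if (value === undefined || value === "") return true;
    if (key === "weeklySessions") {
      const sessions = Number(value);
      return !(Number.isInteger(sessions) && sessions >= 1 && sessions <= 7);
    }
    return false;
  });
}

// Example: only four of the eight parameters have been captured so far.
const stillNeeded = missingParameters({
  age: 24,
  weightKg: 85,
  heightCm: 178,
  goal: "fat_loss",
});
console.log(stillNeeded);
```

The useful property is that "ready to generate" is a computed fact rather than something the conversation has to remember: the guard passes only when the list comes back empty.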

The Race Condition I Spent an Hour Debugging

Here's something that cost me a full debugging session: Vapi's call.ended webhook fires before the call analysis is complete. I had built my Convex action to extract parameters from the analysis object - which wasn't populated yet when the webhook arrived.

The fix was to listen for end-of-call-report instead of call.ended. The report event fires after analysis completes and includes the full structured output. A one-line change, but I only found it after reading the Vapi docs three times and adding console logs to every webhook handler.

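// convex/http.ts - webhook handler
import { httpAction } from "./_generated/server";
import { internal } from "./_generated/api";

export const vapiWebhook = httpAction(async (ctx, request) => {
  const payload = await request.json();

  // Use end-of-call-report, NOT call.ended: the report is sent after
  // analysis completes, so it is the event that carries structuredData.
  if (payload.message?.type === "end-of-call-report") {
    const analysis = payload.message.analysis;
    const structuredData = analysis?.structuredData;

    if (structuredData) {
      // Hand the extracted parameters to the plan-generation action.
      await ctx.runAction(internal.fitplan.generatePlan, {
        callId: payload.message.call.id,
        userData: structuredData,
      });
    }
  }

  // Acknowledge every event, including the ones we ignore.
  return new Response("OK", { status: 200 });
});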
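
If you want the compiler to help with this, one option is a small type guard in front of the handler logic. The sketch below models only the fields the handler above reads; `EndOfCallReport` and `isEndOfCallReport` are hypothetical names, and the real Vapi payload carries more fields than shown here.

```typescript
// Assumed minimal shape, based only on the fields the handler reads.
interface EndOfCallReport {
  type: "end-of-call-report";
  call: { id: string };
  analysis?: { structuredData?: Record<string, unknown> };
}

// Narrows an unknown webhook message to an end-of-call report.
function isEndOfCallReport(message: unknown): message is EndOfCallReport {
  if (typeof message !== "object" || message === null) return false;
  const m = message as { type?: unknown; call?: { id?: unknown } | null };
  return m.type === "end-of-call-report" && typeof m.call?.id === "string";
}

// Example: other event types are ignored, the report is accepted.
const events: unknown[] = [
  { type: "status-update" },
  { type: "end-of-call-report", call: { id: "call_123" } },
];
const reports = events.filter(isEndOfCallReport);
console.log(reports.length);
```

With a guard like this, any code path that touches `analysis` can only be reached for the report event, which is the mistake that caused the race in the first place.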

⚡ Layer 2: Convex - The Reactive Backend

This was the most interesting architectural decision in the project, and the one I'd recommend most strongly to other developers.

Why Not Just Use a REST API?

The standard approach would be: Vapi webhook hits an API route → API route calls Gemini → API route writes to a database → user refreshes the page to see results.

That works. But it means the user has to do something to see their plan. They have to know to refresh. The experience has a gap in it.

Convex eliminates the gap entirely.

How Convex Reactivity Works

In Convex, queries are subscriptions. When the frontend runs a query through the Convex React SDK, it opens a persistent WebSocket connection. Convex tracks which database documents that query depends on. The moment any of those documents change - from any mutation, anywhere in the system - Convex re-executes the query and pushes the updated result to all subscribed clients.

The developer writes a query function. Convex handles subscription, invalidation, and delivery automatically.

// convex/fitplans.ts
import { query } from "./_generated/server";
import { v } from "convex/values";

// Returns the most recently created plan for a user, or null if none exists.
export const getUserPlan = query({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    return await ctx.db
      .query("fitPlans")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .order("desc") // newest first within the by_user index
      .first();
  },
});

// app/dashboard/page.tsx - React component
// (api is imported from the generated convex/_generated/api module)
import { useQuery } from "convex/react";

const plan = useQuery(api.fitplans.getUserPlan, { userId: user.id });

// 'plan' updates automatically when Convex pushes a new value.
// No polling. No useEffect with fetch. No page reload.

For FitExplain, this meant: the moment the Vapi webhook triggers a plan creation mutation in Convex, the user's dashboard re-renders with the new plan. Latency between database write and screen update is WebSocket round-trip time plus Convex re-execution time - under 200ms combined.
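
If the "queries are subscriptions" idea is new to you, a toy model may help. The sketch below is a deliberately tiny in-memory store, not Convex's implementation: `TinyReactiveStore`, `Plan`, and `latestCaloriesFor` are hypothetical names, and it naively re-runs every subscribed query on every write instead of tracking which documents each query read.

```typescript
type Query<Doc, T> = (docs: readonly Doc[]) => T;

interface Subscription<Doc> {
  query: Query<Doc, unknown>;
  listener: (value: unknown) => void;
}

class TinyReactiveStore<Doc> {
  private docs: Doc[] = [];
  private subs: Subscription<Doc>[] = [];

  // Registers a query, delivers its current result, and returns an
  // unsubscribe function.
  subscribe<T>(query: Query<Doc, T>, listener: (value: T) => void): () => void {
    const sub: Subscription<Doc> = {
      query,
      listener: (value) => listener(value as T),
    };
    this.subs.push(sub);
    sub.listener(sub.query(this.docs));
    return () => {
      this.subs = this.subs.filter((s) => s !== sub);
    };
  }

  // A "mutation": write the document, then push fresh results to subscribers.
  insert(doc: Doc): void {
    this.docs.push(doc);
    for (const sub of this.subs) sub.listener(sub.query(this.docs));
  }
}

interface Plan {
  userId: string;
  createdAt: number;
  targetCalories: number;
}

// Latest plan's calorie target for one user, or undefined if none exists.
const latestCaloriesFor =
  (userId: string): Query<Plan, number | undefined> =>
  (docs) =>
    docs
      .filter((d) => d.userId === userId)
      .sort((a, b) => b.createdAt - a.createdAt)[0]?.targetCalories;

const store = new TinyReactiveStore<Plan>();
const seen: (number | undefined)[] = [];
const unsubscribe = store.subscribe(latestCaloriesFor("user_1"), (calories) => {
  seen.push(calories);
});

// Simulates the webhook-triggered write; the subscriber hears about it
// without asking.
store.insert({ userId: "user_1", createdAt: 1, targetCalories: 2364 });
unsubscribe();
store.insert({ userId: "user_1", createdAt: 2, targetCalories: 2400 });
console.log(seen);
```

The dashboard component plays the role of the listener here: it never asks whether a plan exists yet, it just renders whatever value was pushed last.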

The Schema: Keeping Gemini Honest

One subtle but important decision: I defined a strict Convex schema for fitness plans, then validated all Gemini output against it before writing to the database. This caught a bug that took me a while to find - Gemini was returning "2200" (string) instead of 2200 (integer) for calorie values. The Convex validator rejected it, which surfaced the issue immediately rather than letting corrupt data reach the frontend.

// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // Other tables (for example, users) are omitted from this excerpt.
  fitPlans: defineTable({
    userId: v.string(),
    callId: v.string(),
    createdAt: v.number(),
    userProfile: v.object({
      age: v.number(),
      weightKg: v.number(),
      heightCm: v.number(),
      goal: v.string(),
      fitnessLevel: v.string(),
      weeklySessionTarget: v.number(),
      injuries: v.string(),
      dietaryRestrictions: v.string(),
    }),
    metabolics: v.object({
      bmr: v.number(),            // Mifflin-St Jeor result
      tdee: v.number(),           // BMR × activity multiplier
      targetCalories: v.number(), // TDEE adjusted for goal
      proteinGrams: v.number(),
      carbGrams: v.number(),
      fatGrams: v.number(),
    }),
    workoutPlan: v.array(v.object({
      day: v.string(),
      focus: v.string(),
      exercises: v.array(v.object({
        name: v.string(),
        sets: v.number(),
        reps: v.string(),
        rest: v.string(),
      })),
    })),
    mealPlan: v.array(v.object({
      meal: v.string(),
      foods: v.array(v.string()),
      calories: v.number(),
    })),
  }).index("by_user", ["userId"]),
});

🤖 Layer 3: Google Gemini 1.5 Flash - Structured AI Output

Most LLM integrations treat the model as a conversational partner. FitExplain treats Gemini as a structured data generator.

The distinction matters. A conversational response ("Here's your plan! On Monday, you should do...") cannot be stored cleanly in a database or rendered in a structured dashboard UI. Schema-compliant JSON can.
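
On the receiving side, treating the model as a data generator usually means parsing defensively, because a model can still wrap its JSON in a Markdown fence or a stray sentence despite instructions. Here's a minimal sketch of that step; `parseModelJson` is a hypothetical helper, not part of any SDK, and it only extracts and parses, leaving schema validation as a separate step.

```typescript
// Extracts the outermost JSON object from a model response, tolerating an
// accidental Markdown fence or leading prose. Throws if no object is found
// or if the extracted text is not valid JSON.
function parseModelJson(raw: string): unknown {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("Model response contains no JSON object");
  }
  return JSON.parse(raw.slice(start, end + 1));
}

// Example: a response with chatty preamble still yields usable data.
const parsedPlan = parseModelJson(
  'Here is your plan: {"targetCalories": 2364, "meals": ["oats", "rice"]}',
) as { targetCalories: number; meals: string[] };
console.log(parsedPlan.targetCalories);
```

Failing loudly matters more than being clever: a response that cannot be parsed should stop the pipeline before anything is written to the database.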

The Prompt Engineering

The system prompt does three things:

  1. Instructs Gemini to calculate BMR using Mifflin-St Jeor explicitly
  2. Instructs Gemini to derive TDEE using the standard activity multiplier table
  3. Requires output as valid JSON matching a predefined schema - no prose, no markdown, no preamble

// convex/fitplan.ts - Gemini prompt construction
// planSchema (defined elsewhere) describes the expected JSON shape.
const systemPrompt = `You are a fitness and nutrition AI that generates structured plans.
You MUST respond with ONLY valid JSON. No markdown. No explanation. No preamble.

Calculate BMR using Mifflin-St Jeor:
- Men: (10 × weight_kg) + (6.25 × height_cm) - (5 × age) + 5
- Women: (10 × weight_kg) + (6.25 × height_cm) - (5 × age) - 161

Apply TDEE multiplier:
- Sedentary (1-2 days/week): BMR × 1.2
- Lightly active (3 days/week): BMR × 1.375
- Moderately active (4-5 days/week): BMR × 1.55
- Very active (6-7 days/week): BMR × 1.725

Adjust target calories for goal:
- Fat loss: TDEE - 500
- Muscle gain: TDEE + 300
- Endurance/General: TDEE

Output JSON matching this exact schema: ${JSON.stringify(planSchema)}`;

The apiVersion Bug

The Gemini Node.js SDK was returning HTTP 404 errors on every call. The fix was non-obvious: I needed to set apiVersion: 'v1' explicitly. In the @google/generative-ai SDK this goes in the request options passed to getGenerativeModel, not in the GoogleGenerativeAI constructor, which only takes the API key. The default was pointing to a preview endpoint (v1beta) that wasn't available in my region.

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

const model = genAI.getGenerativeModel(
  { model: "gemini-1.5-flash" },
  { apiVersion: "v1" }, // Required - default causes 404 in some regions
);

One line. Forty-five minutes to find it.


🔐 Layer 4: Clerk - Authentication Done Right

I chose Clerk over NextAuth for one reason: its webhook system integrates cleanly with Convex. When a user signs up, Clerk fires a user.created webhook to Convex, which creates a user document in the database. Session tokens from Clerk are verified in every Convex query and mutation, so user data is always scoped correctly.

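// convex/http.ts - Clerk webhook
import { httpAction } from "./_generated/server";
import { internal } from "./_generated/api";

export const clerkWebhook = httpAction(async (ctx, request) => {
  // validateClerkWebhook is defined elsewhere in the project; it is assumed
  // to verify the webhook signature and return the parsed event.
  const event = await validateClerkWebhook(request);

  if (event.type === "user.created") {
    const { id, email_addresses, first_name, last_name } = event.data;
    await ctx.runMutation(internal.users.createUser, {
      clerkId: id,
      email: email_addresses[0]?.email_address ?? "",
      // first_name and last_name can be null; avoid storing "null null".
      name: [first_name, last_name].filter(Boolean).join(" "),
    });
  }

  return new Response("OK", { status: 200 });
});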

📊 The XAI Part: Making the Math Visible

The Explainable AI component of FitExplain isn't a separate feature - it's built into the dashboard display. Every number shown to the user has a derivation trace:

Your Daily Calorie Target: 2,364 kcal

How we calculated this:
├── BMR (Mifflin-St Jeor): 1,847.5 kcal
│   └── (10 × 85kg) + (6.25 × 178cm) - (5 × 24) + 5
├── TDEE (activity multiplier): 2,864 kcal
│   └── BMR × 1.55 (4 sessions/week = moderately active)
└── Fat loss adjustment: -500 kcal
    └── Standard 500kcal deficit targets ~0.5kg/week loss

This is the core design principle: explainability as engineering input, not post-hoc annotation. I didn't generate a plan and then try to explain it afterward. I started from the explanation - the formula, the multiplier table, the adjustment logic - and used Gemini to instantiate that explanation for each individual user's parameters.

The AI doesn't replace the science. It applies it.
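
Because the formulas are fixed, the same numbers can also be computed deterministically, which gives you something to cross-check the model's output against. The sketch below follows the Mifflin-St Jeor formulas, activity multipliers, and goal adjustments from the system prompt. Two things in it are assumptions: the `sex` field, which the formula needs but the eight intake parameters listed earlier do not include, and all of the names (`Profile`, `computeMetabolics`, and so on), which are hypothetical.

```typescript
type Sex = "male" | "female";
type Goal = "fat_loss" | "muscle_gain" | "endurance" | "general_fitness";

interface Profile {
  sex: Sex; // assumption: not one of the eight collected parameters
  ageYears: number;
  weightKg: number;
  heightCm: number;
  weeklySessions: number;
  goal: Goal;
}

// Mifflin-St Jeor basal metabolic rate in kcal/day.
function bmrMifflinStJeor(p: Profile): number {
  const base = 10 * p.weightKg + 6.25 * p.heightCm - 5 * p.ageYears;
  return p.sex === "male" ? base + 5 : base - 161;
}

// Activity multiplier keyed on weekly sessions, as in the system prompt.
function activityMultiplier(weeklySessions: number): number {
  if (weeklySessions <= 2) return 1.2;
  if (weeklySessions === 3) return 1.375;
  if (weeklySessions <= 5) return 1.55;
  return 1.725;
}

// Calorie adjustment for the goal, as in the system prompt.
function goalAdjustment(goal: Goal): number {
  switch (goal) {
    case "fat_loss":
      return -500;
    case "muscle_gain":
      return 300;
    default:
      return 0;
  }
}

// Returns every intermediate value so the UI can show the derivation.
function computeMetabolics(p: Profile) {
  const bmr = bmrMifflinStJeor(p);
  const multiplier = activityMultiplier(p.weeklySessions);
  const tdee = bmr * multiplier;
  const adjustment = goalAdjustment(p.goal);
  return { bmr, multiplier, tdee, adjustment, targetCalories: tdee + adjustment };
}

const example = computeMetabolics({
  sex: "male",
  ageYears: 24,
  weightKg: 85,
  heightCm: 178,
  weeklySessions: 4,
  goal: "fat_loss",
});
console.log(example);
```

For an 85 kg, 178 cm, 24-year-old male training four times a week with a fat-loss goal, this works out to a BMR of 1,847.5 kcal, a TDEE of about 2,864 kcal, and a target of about 2,364 kcal. A check like this can run before the database write and flag any plan whose numbers drift from the formula.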


🧪 Testing Results

I ran 13 live voice sessions across users with different profiles:

| Metric | Result |
|---|---|
| Voice-to-dashboard latency | 5-11 seconds |
| Convex cached query performance | < 5ms |
| Sessions completed without error | 13/13 |
| Schema validation failures | 2 (caught before DB write) |
| Contextually differentiated plans | 13/13 |

The two schema validation failures were both the string/integer type mismatch on calorie values - Gemini occasionally returns numeric values as strings. I added a normalization step in the Convex action that coerces all numeric fields before validation, which resolved both failures.
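
A normalization step along these lines can be quite small. The sketch below is one way to write it, not the project's actual code: `coerceNumericFields` is a hypothetical helper that converts numeric-looking strings only for an explicit list of keys. Blanket coercion would be risky with the schema shown earlier, because a field like `reps` is declared as a string and a value such as "12" must stay a string.

```typescript
// Matches plain decimal numbers such as "2200", "-5", or "182.5".
const NUMERIC_STRING = /^-?\d+(\.\d+)?$/;

// Converts a numeric-looking string to a number; returns anything else as is.
function toNumberIfNumeric(value: unknown): unknown {
  if (typeof value === "string") {
    const trimmed = value.trim();
    if (NUMERIC_STRING.test(trimmed)) return Number(trimmed);
  }
  return value;
}

// Returns a shallow copy of `record` with the listed keys coerced.
// Keys that are not listed are left untouched.
function coerceNumericFields(
  record: Record<string, unknown>,
  numericKeys: readonly string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...record };
  for (const key of numericKeys) {
    if (key in out) out[key] = toNumberIfNumeric(out[key]);
  }
  return out;
}

// Example: the model returned one calorie value as a string.
const rawMetabolics = { bmr: 1847.5, tdee: "2864", targetCalories: "2364", note: "12" };
const normalized = coerceNumericFields(rawMetabolics, ["bmr", "tdee", "targetCalories"]);
console.log(normalized);
```

Values that do not look numeric, such as "about 2200", are left unchanged on purpose, so the schema validator still rejects them instead of the coercion step guessing.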


🚀 The Full Tech Stack

Frontend:     Next.js 15 (App Router), TypeScript, Tailwind CSS
Deployment:   Vercel (automatic from GitHub)
Voice AI:     Vapi.ai (10-node workflow, guard-conditioned)
Database:     Convex (reactive, serverless, TypeScript-native)
AI Model:     Google Gemini 1.5 Flash (structured JSON output)
Auth:         Clerk (webhooks + session verification)

💡 What I'd Do Differently

1. Schema validation from day one. I added the Convex schema validator after encountering the string/integer bug. I should have defined it before writing any Gemini integration code. Strict schema-first development would have caught that issue in local testing rather than in a live session.

2. Vapi workflow versioning. The Vapi workflow is configured in their dashboard UI, which makes version control awkward. I'd build a script to export and commit the workflow JSON alongside the codebase from the start.

3. Gemini output temperature. I used the default temperature (1.0) throughout development. In retrospect, setting temperature to 0.3-0.5 for structured JSON generation would have produced more consistent outputs and fewer edge cases in the schema normalization layer.
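
For the third point, the change itself is small. In the @google/generative-ai SDK, temperature is a field of `generationConfig`, which `getGenerativeModel` accepts alongside the model name. The sketch below shows only the config object so it stays self-contained; the value 0.4 is simply an illustrative pick from the range mentioned above, not a tested setting.

```typescript
// Plain object in the shape the SDK's generationConfig accepts.
const generationConfig = {
  temperature: 0.4, // illustrative value inside the 0.3-0.5 range
};

// It would be passed when creating the model, along the lines of:
//   genAI.getGenerativeModel({ model: "gemini-1.5-flash", generationConfig })
console.log(generationConfig);
```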


🔭 What's Next

The current system generates a plan once per voice session. The obvious next step is longitudinal tracking - storing multiple plans over time, detecting progress, and adjusting recommendations based on what's changed. Convex's reactive queries make this architecturally straightforward; it's primarily a UI and prompt engineering problem.

Other directions I want to explore:

  • Wearable integration - pulling actual activity data from Apple Health or Garmin rather than relying on self-reported activity levels
  • Plan revision via voice - letting users update specific parameters (new injury, changed schedule) without redoing the full intake conversation
  • Nutrition logging - tracking actual meals against the generated plan, with Convex reactively updating macro targets based on what's been logged

Wrapping Up

The thing I kept thinking about while building FitExplain is that the transparency problem in fitness apps isn't really a hard technical problem. It's a design choice. Someone decided that showing users the formula would confuse them, or slow them down, or make the app feel less magical.

I think that's wrong. Showing the formula is what builds real trust - the kind that survives a bad week and a missed workout, instead of evaporating the moment the user questions the number.

With cloud-native tools available today - Vapi for voice, Convex for reactive data, Gemini for structured AI generation - you can build a system that is both intelligent and transparent without trading one for the other.

That's what FitExplain does. And it's what I think a lot more fitness software should do.


The full project is documented in my final year report at Manipal University Jaipur. If you have questions about any specific part of the implementation - the Vapi workflow design, the Convex schema, the Gemini prompt structure - drop them in the comments.


Tags: #ai #gemini #frontend #database
