DHRUVA WANI
From Webcam to Wellness: Building a Real-Time AI Assistant for Students

This is a submission for the DEV Weekend Challenge: Community

The Community

This project is built for my college community.
In my college, students spend 8–12 hours daily in front of screens — coding, studying, preparing for placements, and meeting deadlines.
Over time, this leads to:

  • Chronic poor posture
  • Fatigue
  • Stress-related breathing patterns
  • Reduced physical awareness

The problem? Most students don’t realize it until discomfort becomes pain.

I wanted to build something proactive — a system that checks in before the damage happens.

What I Built

I built a Real-Time AI Medical Wellness Assistant that transforms a simple video call into a quick wellness check.

In just 8–10 seconds, the assistant:

  • Analyzes posture (slouching, shoulder imbalance, head tilt)
  • Estimates breathing rate using chest movement
  • Detects visible fatigue indicators
  • Provides empathetic verbal feedback
  • Generates a structured PDF wellness report

It is not a medical diagnostic tool — it is a preventive awareness system designed to help students self-correct early.

The goal is simple:
Make wellness accessible, instant, and frictionless.

Demo

Code

AI Medical Wellness Assistant 🩺

VitalsAI (the Wellness Assistant) is a real-time, AI-powered Medical Wellness Video Assistant. It provides empathetic, non-diagnostic wellness insights by analyzing a user's physical, respiratory, and emotional markers in a live video feed using multimodal AI models.

Built with FastAPI, vision-agents, and WebSockets, VitalsAI acts as a proactive wellness companion, capable of observing posture, estimating breathing patterns, and providing instant, conversational voice feedback.


✨ Features

  • Real-Time Video Analysis: Uses WebRTC and WebSockets to process live camera feeds.
  • Posture & Kinematics Assessment: Leverages YOLOv11 (yolo11n-pose.pt) to detect spinal alignment, shoulder symmetry, and physical strain.
  • Multimodal AI Companion:
    • Vision: Google Gemini & Ultralytics for visual reasoning and pose estimation.
    • Speech-to-Text: Deepgram for real-time transcription.
    • Text-to-Speech: ElevenLabs for a calm, clinical, and friendly voice assistant.
  • Live Dashboard: Real-time insights displayed in a unified HTML/JS dashboard.
  • Session Reports: Automatically generates a downloadable PDF summary of the wellness session.
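The posture checks boil down to geometry on the keypoints a pose model returns. A minimal sketch, assuming COCO-format keypoint indices (as used by yolo11n-pose) and illustrative thresholds of my own choosing:

```python
import math

# COCO keypoint indices (assumption: the pose model uses this layout).
NOSE, L_SHOULDER, R_SHOULDER = 0, 5, 6

def posture_flags(kpts, tilt_deg=10.0, slouch_ratio=0.4):
    """kpts: list of (x, y) image coordinates, y growing downward."""
    lx, ly = kpts[L_SHOULDER]
    rx, ry = kpts[R_SHOULDER]
    nx, ny = kpts[NOSE]
    width = abs(lx - rx) or 1.0
    # Shoulder line angle relative to horizontal.
    tilt = math.degrees(math.atan2(ly - ry, width))
    # How far the nose sits above the shoulder line, normalized by
    # shoulder width; a small value suggests a head-forward slouch.
    mid_y = (ly + ry) / 2
    head_height = (mid_y - ny) / width
    return {
        "shoulder_tilt_deg": round(tilt, 1),
        "shoulders_uneven": abs(tilt) > tilt_deg,
        "possible_slouch": head_height < slouch_ratio,
    }
```

Normalizing by shoulder width keeps the flags roughly independent of how far the student sits from the webcam.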



The code includes:

  • Real-time video orchestration
  • Pose estimation pipeline
  • Multimodal AI reasoning logic
  • Speech-to-text and text-to-speech integration
  • PDF report generation
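The report step is essentially "collect metrics, render a summary." A hypothetical assembly sketch is below; field names are illustrative, and the actual PDF rendering (which the project does via a PDF library) is swapped here for plain text to keep the sketch dependency-free.

```python
from datetime import datetime, timezone

def build_report(metrics):
    """metrics: {name: (value, note)} gathered during the session."""
    lines = [
        "Wellness Session Report",
        f"Generated: {datetime.now(timezone.utc):%Y-%m-%d %H:%M UTC}",
        "-" * 40,
    ]
    for name, (value, note) in metrics.items():
        lines.append(f"{name:<20} {value:<10} {note}")
    lines.append("-" * 40)
    # Mirrors the project's framing: awareness, not diagnosis.
    lines.append("Not a medical diagnosis - awareness only.")
    return "\n".join(lines)
```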

How I Built It

This project is powered by a real-time multimodal AI architecture.

Core stack:

  1. Vision Agents SDK (agent orchestration layer)
  2. GetStream (WebRTC video communication)
  3. YOLO Pose Estimation (for skeletal keypoints)
  4. Gemini Multimodal LLM (reasoning over visual + text data)
  5. Deepgram (speech-to-text)
  6. ElevenLabs (text-to-speech)
  7. FastAPI (backend server)
  8. React (frontend interface)

Vision Agents handled the event-driven coordination between video, audio, pose extraction, and AI reasoning — allowing me to focus on designing the wellness intelligence layer instead of managing low-level streaming and inference pipelines.
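To show the shape of that coordination (not the Vision Agents API itself, which handles this for you), here is a minimal asyncio sketch of the pattern: frames flow in, a pose stage turns them into features, and a reasoning stage turns features into feedback, all running concurrently through queues. Stage names and messages are illustrative.

```python
import asyncio

async def producer(q):
    for i in range(3):          # stand-in for incoming video frames
        await q.put(f"frame-{i}")
    await q.put(None)           # end-of-stream sentinel

async def pose_stage(inq, outq):
    while (frame := await inq.get()) is not None:
        await outq.put(f"keypoints({frame})")
    await outq.put(None)        # propagate shutdown downstream

async def reasoning_stage(inq, results):
    while (feat := await inq.get()) is not None:
        results.append(f"feedback for {feat}")

async def main():
    frames, feats, results = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        producer(frames),
        pose_stage(frames, feats),
        reasoning_stage(feats, results),
    )
    return results
```

The point of offloading this to an SDK is exactly what the queues hint at: backpressure, shutdown, and stage wiring are tedious to get right by hand.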

If a simple 10-second check-in can improve a student’s posture, focus, or stress awareness — it’s worth building.
Technology should reduce friction, not add to it.
This is my step toward making AI truly helpful.
