This is a submission for the DEV Weekend Challenge: Community
The Community
This project is built for my college community.
In my college, students spend 8–12 hours daily in front of screens — coding, studying, preparing for placements, and meeting deadlines.
Over time, this leads to:
- Chronic poor posture
- Fatigue
- Stress-related breathing patterns
- Reduced physical awareness

The problem? Most students don’t realize it until discomfort becomes pain.
I wanted to build something proactive — a system that checks in before the damage happens.
What I Built
I built a Real-Time AI Medical Wellness Assistant that transforms a simple video call into a quick wellness check.
In just 8–10 seconds, the assistant:
- Analyzes posture (slouching, shoulder imbalance, head tilt)
- Estimates breathing rate using chest movement
- Detects visible fatigue indicators
- Provides empathetic verbal feedback
- Generates a structured PDF wellness report
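To illustrate, the posture checks above can be approximated from pose-model keypoints. A minimal sketch, assuming COCO-style (x, y) keypoints; the function names and thresholds are my own illustrations, not the project's actual logic:

```python
import math

# Illustrative posture indicators from (x, y) keypoints, as produced by
# pose models such as yolo11n-pose. Thresholds are examples, not clinical.

def shoulder_imbalance(left_shoulder, right_shoulder):
    """Vertical offset between shoulders, in pixels (0 = level)."""
    return abs(left_shoulder[1] - right_shoulder[1])

def head_tilt_deg(left_ear, right_ear):
    """Angle of the ear-to-ear line relative to horizontal, in degrees."""
    dx = right_ear[0] - left_ear[0]
    dy = right_ear[1] - left_ear[1]
    return abs(math.degrees(math.atan2(dy, dx)))

def is_slouching(nose, mid_hip, forward_ratio=0.15, frame_width=640):
    """Flag slouching when the nose drifts far forward of the hips."""
    return abs(nose[0] - mid_hip[0]) > forward_ratio * frame_width

print(shoulder_imbalance((100, 200), (300, 215)))        # 15
print(round(head_tilt_deg((100, 200), (300, 210)), 1))   # 2.9
```

Each metric is cheap enough to run on every frame, which is what makes an 8–10 second check feasible.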
It is not a medical diagnostic tool — it is a preventive awareness system designed to help students self-correct early.
The goal is simple:
Make wellness accessible, instant, and frictionless.
Demo
Code
AI Medical Wellness Assistant 🩺
Wellness Assistant is a real-time, AI-powered Medical Wellness Video Assistant. It provides empathetic, non-diagnostic wellness insights by analyzing a user's physical, respiratory, and emotional markers through a live video feed using multimodal AI models.
Built with FastAPI, vision-agents, and WebSockets, VitalsAI acts as a proactive wellness companion, capable of observing posture, estimating breathing patterns, and providing instant, conversational voice feedback.
✨ Features
- Real-Time Video Analysis: Uses WebRTC and WebSockets to process live camera feeds.
- Posture & Kinematics Assessment: Leverages YOLOv11 (yolo11n-pose.pt) to detect spinal alignment, shoulder symmetry, and physical strain.
- Multimodal AI Companion:
- Vision: Google Gemini & Ultralytics for visual reasoning and pose estimation.
- Speech-to-Text: Deepgram for real-time transcription.
- Text-to-Speech: ElevenLabs for a calm, clinical, and friendly voice assistant.
- Live Dashboard: Real-time insights displayed in a unified HTML/JS dashboard.
- Session Reports: Automatically generates a downloadable PDF summary of the wellness session.
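The real project emits a PDF session report; as a hedged sketch of that step, here is a minimal plain-text analogue using only the standard library (the field names are assumptions, not the project's schema):

```python
from datetime import datetime, timezone

# Minimal session-report sketch. The actual project renders a PDF; this
# standard-library version only shows the shape of the summary.

def build_report(metrics: dict) -> str:
    lines = [
        "Wellness Session Report",
        f"Generated: {datetime.now(timezone.utc):%Y-%m-%d %H:%M} UTC",
        "-" * 30,
    ]
    for key, value in metrics.items():
        # Turn snake_case metric names into readable labels.
        lines.append(f"{key.replace('_', ' ').title()}: {value}")
    return "\n".join(lines)

print(build_report({
    "posture": "mild slouch detected",
    "breathing_rate_bpm": 16,
    "fatigue_indicators": "none visible",
}))
```

A PDF library (e.g. reportlab or fpdf) would consume the same metrics dict to produce the downloadable file.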
The code includes:
- Real-time video orchestration
- Pose estimation pipeline
- Multimodal AI reasoning logic
- Speech-to-text and text-to-speech integration
- PDF report generation
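The breathing-estimation step above can be sketched as peak counting on a chest-movement signal. This is my own simplified illustration, assuming a per-frame series of chest y-coordinates (e.g. the midpoint of the shoulder keypoints) at a known FPS; the real pipeline and any smoothing it applies may differ:

```python
import math

# Sketch: estimate breathing rate by counting inhalation peaks in the
# vertical chest motion sampled from pose keypoints.

def estimate_breaths_per_minute(chest_y, fps):
    """Count local maxima in the chest signal and scale to a per-minute rate."""
    peaks = 0
    for i in range(1, len(chest_y) - 1):
        if chest_y[i] > chest_y[i - 1] and chest_y[i] >= chest_y[i + 1]:
            peaks += 1
    duration_s = len(chest_y) / fps
    return peaks * 60 / duration_s

# Synthetic signal: 2 breath cycles over 10 s at 10 FPS -> 12 breaths/min.
signal = [math.sin(2 * math.pi * 0.2 * (t / 10)) for t in range(100)]
print(round(estimate_breaths_per_minute(signal, fps=10)))  # 12
```

In practice the raw keypoint signal is noisy, so a real implementation would low-pass filter it before counting peaks.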
How I Built It
This project is powered by a real-time multimodal AI architecture.
Core stack:
- Vision Agents SDK (agent orchestration layer)
- GetStream (WebRTC video communication)
- YOLO Pose Estimation (for skeletal keypoints)
- Gemini Multimodal LLM (reasoning over visual + text data)
- Deepgram (speech-to-text)
- ElevenLabs (text-to-speech)
- FastAPI (backend server)
- React (frontend interface)
Vision Agents handled the event-driven coordination between video, audio, pose extraction, and AI reasoning — allowing me to focus on designing the wellness intelligence layer instead of managing low-level streaming and inference pipelines.
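The SDK provides this event-driven coordination for me; as a toy standard-library sketch of the same shape (all names here are hypothetical, not the Vision Agents API), producers push events onto a queue and a coordinator reacts to them:

```python
import asyncio

# Toy event-driven pipeline: a producer emits pose events per frame,
# a coordinator consumes them concurrently, as an agent layer would.

async def pose_worker(frames, events):
    for frame in frames:
        await events.put(("pose", frame))
    await events.put(("done", None))

async def coordinator(events):
    seen = []
    while True:
        kind, payload = await events.get()
        if kind == "done":
            return seen
        seen.append(f"analyzed {payload}")

async def main():
    events = asyncio.Queue()
    results, _ = await asyncio.gather(
        coordinator(events),
        pose_worker(["frame-1", "frame-2"], events),
    )
    return results

print(asyncio.run(main()))  # ['analyzed frame-1', 'analyzed frame-2']
```

The real system adds more producers (audio transcription, LLM responses) on the same bus, which is what keeps the streaming and reasoning layers decoupled.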
If a simple 10-second check-in can improve a student’s posture, focus, or stress awareness — it’s worth building.
Technology should reduce friction, not add to it.
This is my step toward making AI truly helpful.