This is a submission for the DEV Weekend Challenge: Community
The Community
This project is built for my college community.
In my college, students spend 8–12 hours daily in front of screens — coding, studying, preparing for placements, and meeting deadlines.
Over time, this leads to:
- Chronic poor posture
- Fatigue
- Stress-related breathing patterns
- Reduced physical awareness

The problem? Most students don’t realize it until discomfort becomes pain.
I wanted to build something proactive — a system that checks in before the damage happens.
What I Built
I built a Real-Time AI Medical Wellness Assistant that transforms a simple video call into a quick wellness check.
In just 8–10 seconds, the assistant:
- Analyzes posture (slouching, shoulder imbalance, head tilt)
- Estimates breathing rate using chest movement
- Detects visible fatigue indicators
- Provides empathetic verbal feedback
- Generates a structured PDF wellness report
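To illustrate, the posture checks above can be approximated from pose-model keypoints. A minimal sketch, assuming COCO-style (x, y) keypoints; the function names and thresholds are my own illustrations, not the project's actual logic:

```python
import math

# Illustrative posture indicators from (x, y) keypoints, as produced by
# pose models such as yolo11n-pose. Thresholds are examples, not clinical.

def shoulder_imbalance(left_shoulder, right_shoulder):
    """Vertical offset between shoulders, in pixels (0 = level)."""
    return abs(left_shoulder[1] - right_shoulder[1])

def head_tilt_deg(left_ear, right_ear):
    """Angle of the ear-to-ear line relative to horizontal, in degrees."""
    dx = right_ear[0] - left_ear[0]
    dy = right_ear[1] - left_ear[1]
    return abs(math.degrees(math.atan2(dy, dx)))

def is_slouching(nose, mid_hip, forward_ratio=0.15, frame_width=640):
    """Flag slouching when the nose drifts far forward of the hips."""
    return abs(nose[0] - mid_hip[0]) > forward_ratio * frame_width

print(shoulder_imbalance((100, 200), (300, 215)))        # 15
print(round(head_tilt_deg((100, 200), (300, 210)), 1))   # 2.9
```

Each metric is cheap enough to run on every frame, which is what makes an 8–10 second check feasible.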
It is not a medical diagnostic tool — it is a preventive awareness system designed to help students self-correct early.
The goal is simple:
Make wellness accessible, instant, and frictionless.
Demo
Code
AI Medical Wellness Assistant 🩺
Wellness Assistant is a real-time, AI-powered Medical Wellness Video Assistant. It provides empathetic, non-diagnostic wellness insights by analyzing a user's physical, respiratory, and emotional markers through a live video feed using multimodal AI models.
Built with FastAPI, vision-agents, and WebSockets, VitalsAI acts as a proactive wellness companion, capable of observing posture, estimating breathing patterns, and providing instant, conversational voice feedback.
✨ Features
- Real-Time Video Analysis: Uses WebRTC and WebSockets to process live camera feeds.
- Posture & Kinematics Assessment: Leverages YOLOv11 (yolo11n-pose.pt) to detect spinal alignment, shoulder symmetry, and physical strain.
- Multimodal AI Companion:
- Vision: Google Gemini & Ultralytics for visual reasoning and pose estimation.
- Speech-to-Text: Deepgram for real-time transcription.
- Text-to-Speech: ElevenLabs for a calm, clinical, and friendly voice assistant.
- Live Dashboard: Real-time insights displayed in a unified HTML/JS dashboard.
- Session Reports: Automatically generates a downloadable PDF summary of the wellness session.
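The real project emits a PDF session report; as a hedged sketch of that step, here is a minimal plain-text analogue using only the standard library (the field names are assumptions, not the project's schema):

```python
from datetime import datetime, timezone

# Minimal session-report sketch. The actual project renders a PDF; this
# standard-library version only shows the shape of the summary.

def build_report(metrics: dict) -> str:
    lines = [
        "Wellness Session Report",
        f"Generated: {datetime.now(timezone.utc):%Y-%m-%d %H:%M} UTC",
        "-" * 30,
    ]
    for key, value in metrics.items():
        # Turn snake_case metric names into readable labels.
        lines.append(f"{key.replace('_', ' ').title()}: {value}")
    return "\n".join(lines)

print(build_report({
    "posture": "mild slouch detected",
    "breathing_rate_bpm": 16,
    "fatigue_indicators": "none visible",
}))
```

A PDF library (e.g. reportlab or fpdf) would consume the same metrics dict to produce the downloadable file.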
The code includes:
- Real-time video orchestration
- Pose estimation pipeline
- Multimodal AI reasoning logic
- Speech-to-text and text-to-speech integration
- PDF report generation
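The breathing-estimation step above can be sketched as peak counting on a chest-movement signal. This is my own simplified illustration, assuming a per-frame series of chest y-coordinates (e.g. the midpoint of the shoulder keypoints) at a known FPS; the real pipeline and any smoothing it applies may differ:

```python
import math

# Sketch: estimate breathing rate by counting inhalation peaks in the
# vertical chest motion sampled from pose keypoints.

def estimate_breaths_per_minute(chest_y, fps):
    """Count local maxima in the chest signal and scale to a per-minute rate."""
    peaks = 0
    for i in range(1, len(chest_y) - 1):
        if chest_y[i] > chest_y[i - 1] and chest_y[i] >= chest_y[i + 1]:
            peaks += 1
    duration_s = len(chest_y) / fps
    return peaks * 60 / duration_s

# Synthetic signal: 2 breath cycles over 10 s at 10 FPS -> 12 breaths/min.
signal = [math.sin(2 * math.pi * 0.2 * (t / 10)) for t in range(100)]
print(round(estimate_breaths_per_minute(signal, fps=10)))  # 12
```

In practice the raw keypoint signal is noisy, so a real implementation would low-pass filter it before counting peaks.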
How I Built It
This project is powered by a real-time multimodal AI architecture.
Core stack:
- Vision Agents SDK (agent orchestration layer)
- GetStream (WebRTC video communication)
- YOLO Pose Estimation (for skeletal keypoints)
- Gemini Multimodal LLM (reasoning over visual + text data)
- Deepgram (speech-to-text)
- ElevenLabs (text-to-speech)
- FastAPI (backend server)
- React (frontend interface)
Vision Agents handled the event-driven coordination between video, audio, pose extraction, and AI reasoning — allowing me to focus on designing the wellness intelligence layer instead of managing low-level streaming and inference pipelines.
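The SDK provides this event-driven coordination for me; as a toy standard-library sketch of the same shape (all names here are hypothetical, not the Vision Agents API), producers push events onto a queue and a coordinator reacts to them:

```python
import asyncio

# Toy event-driven pipeline: a producer emits pose events per frame,
# a coordinator consumes them concurrently, as an agent layer would.

async def pose_worker(frames, events):
    for frame in frames:
        await events.put(("pose", frame))
    await events.put(("done", None))

async def coordinator(events):
    seen = []
    while True:
        kind, payload = await events.get()
        if kind == "done":
            return seen
        seen.append(f"analyzed {payload}")

async def main():
    events = asyncio.Queue()
    results, _ = await asyncio.gather(
        coordinator(events),
        pose_worker(["frame-1", "frame-2"], events),
    )
    return results

print(asyncio.run(main()))  # ['analyzed frame-1', 'analyzed frame-2']
```

The real system adds more producers (audio transcription, LLM responses) on the same bus, which is what keeps the streaming and reasoning layers decoupled.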
If a simple 10-second check-in can improve a student’s posture, focus, or stress awareness — it’s worth building.
Technology should reduce friction, not add to it.
This is my step toward making AI truly helpful.