DEV Community

abhinav pal

Building a Voice-First Assessment Platform for Visually Impaired Students with Sarvam AI

Computer-based assessments have a quiet accessibility problem. Most platforms assume the user can read text on a screen, click through options, and type their responses. For visually impaired students, particularly in India, that assumption shuts them out entirely.

I wanted to fix that. Not with a workaround, but with an experience designed around voice from the ground up.

The Problem

Screen readers exist, but they're clunky, require separate setup, and often mispronounce Indian names and words or flatten the rhythm of Indian English sentences in ways that feel jarring and unnatural. The experience breaks down fast. What visually impaired Indian students actually need is a system that speaks to them the way the people around them speak: in a familiar accent, at a natural pace, without sounding like a robot reading out a manual.

That's what led me to Sarvam AI.

Why Sarvam

I had tried other TTS APIs before. They worked, technically. But something was always off: a flatness to the voice, a slightly Western lilt, a way of pronouncing common Hindi-origin words that made it obvious the model had never really heard Indian English spoken naturally.

Sarvam's TTS was different. The first time I ran a test question through it, the output sounded like a real person speaking. The accent was warm and familiar, the kind of voice an Indian student would actually trust and follow without friction. That moment changed how I thought about the project. This wasn't just a convenience feature anymore; it was the core of the experience.

What I Built

The platform is a full-stack web app: React and Tailwind CSS on the frontend, Express.js on the backend, and PostgreSQL for storing user data and scores.

The interaction model is deliberately simple. A single click anywhere on the screen has Sarvam TTS read the current question aloud. A double click starts listening and transcribes the user's spoken answer with Sarvam STT. No keyboard, no precise mouse targeting: two gestures are enough to navigate the entire assessment.
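One detail the two-gesture model has to handle: browsers fire two `click` events before a `dblclick`, so a naive click handler would start reading the question on every double click. Below is a minimal sketch of one way to separate the gestures, assuming a 300 ms window (the post does not say what threshold the app uses). `ClickGestureRouter`, `GestureHandlers`, and the injected `Scheduler` are hypothetical names, not code from the project.

```typescript
// Hypothetical sketch: route a single click to TTS and a double click to STT
// without the single-click action also firing during a double click.
// The 300 ms default window is an assumption, not the app's actual value.

// Timer functions are injected so the router works in the browser and in tests.
interface Scheduler {
  setTimeout(fn: () => void, ms: number): number;
  clearTimeout(id: number): void;
}

interface GestureHandlers {
  onSingleClick: () => void; // e.g. ask Sarvam TTS to read the current question
  onDoubleClick: () => void; // e.g. start recording the answer for Sarvam STT
}

class ClickGestureRouter {
  private pendingTimer: number | null = null;
  private readonly handlers: GestureHandlers;
  private readonly scheduler: Scheduler;
  private readonly windowMs: number;

  constructor(handlers: GestureHandlers, scheduler: Scheduler, windowMs = 300) {
    this.handlers = handlers;
    this.scheduler = scheduler;
    this.windowMs = windowMs;
  }

  // Call from one "click" listener attached to the whole page.
  handleClick(): void {
    if (this.pendingTimer !== null) {
      // Second click inside the window: cancel the deferred single-click action.
      this.scheduler.clearTimeout(this.pendingTimer);
      this.pendingTimer = null;
      this.handlers.onDoubleClick();
      return;
    }
    // First click: defer the action until we know no second click follows.
    this.pendingTimer = this.scheduler.setTimeout(() => {
      this.pendingTimer = null;
      this.handlers.onSingleClick();
    }, this.windowMs);
  }
}
```

In the browser you would pass thin wrappers around `window.setTimeout` and `window.clearTimeout` and call `router.handleClick()` from a single `click` listener on `document`. The trade-off is that a single click waits out the window before speech starts; reading `MouseEvent.detail` (the click count) is an alternative, but it still needs a delay to be sure no second click is coming.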

For the demo, I built a psychometric test that estimates stress level. Users log in, the system reads each question aloud, they speak their answer, and at the end their stress score and full response history are saved to the backend.
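The post doesn't describe the questionnaire or how it is scored, so the following is only a sketch under stated assumptions: each question is answered on a hypothetical five-point frequency scale (never to always, scored 0 to 4), positively worded items are reverse-scored, and the STT transcript is matched to an option by whole-word lookup. `matchOption`, `scoreAnswer`, `totalStressScore`, and the record shape are illustrative names, not the project's code.

```typescript
// Hypothetical scoring sketch: the post does not name the questionnaire or its
// scoring rules, so this five-point scale and reverse-scoring flag are assumptions.
const SCALE = ["never", "rarely", "sometimes", "often", "always"] as const;
type ScaleOption = (typeof SCALE)[number];

interface Question {
  id: string;
  text: string;           // what TTS reads aloud
  reverseScored: boolean; // true for positively worded items
}

interface ResponseRecord {
  questionId: string;
  transcript: string;         // raw STT output, kept for the response history
  option: ScaleOption | null; // null when no single option was recognized
  points: number;
}

// Map a transcript to exactly one scale option by whole-word match.
// Returns null when zero or several options are mentioned.
function matchOption(transcript: string): ScaleOption | null {
  const words = new Set(transcript.toLowerCase().match(/[a-z]+/g) ?? []);
  const hits = SCALE.filter((option) => words.has(option));
  return hits.length === 1 ? hits[0] : null;
}

// Score one answer; unmatched answers score 0 in this sketch.
function scoreAnswer(question: Question, transcript: string): ResponseRecord {
  const option = matchOption(transcript);
  if (option === null) {
    return { questionId: question.id, transcript, option, points: 0 };
  }
  const raw = SCALE.indexOf(option); // 0..4
  const points = question.reverseScored ? SCALE.length - 1 - raw : raw;
  return { questionId: question.id, transcript, option, points };
}

// Total stress score: sum of per-question points.
function totalStressScore(history: ResponseRecord[]): number {
  return history.reduce((sum, record) => sum + record.points, 0);
}
```

The array of `ResponseRecord`s is the kind of response history the backend could store alongside the total. Two choices here are worth revisiting in a real app: an unmatched or ambiguous answer scores 0, where re-reading the options and listening again would be friendlier; and the keyword list only covers English, so answers given in an Indic language would need their own vocabulary.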

The Surprising Part

Integrating Sarvam's API was genuinely the smoothest part of the build. Clean endpoints, predictable responses, minimal setup. But the real surprise was how much the voice quality changed the feel of the product. A good accent is not a small detail. It is the difference between a tool a user tolerates and one they actually trust.
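For a sense of what the TTS integration involves, here is a minimal sketch of a server-side helper the Express backend could use. The endpoint URL, the `api-subscription-key` header, the JSON field names, and the `audios` response field are assumptions here and should be checked against Sarvam's current API reference, since field names can differ between API versions. `buildTtsRequest`, `synthesizeQuestion`, and `FetchLike` are hypothetical names, and the speaker and model values are left to the caller.

```typescript
// Sketch of a server-side helper that asks Sarvam TTS to speak a question.
// ASSUMPTIONS (verify against Sarvam's current API reference): the endpoint URL,
// the "api-subscription-key" header, the JSON fields below, and a response
// containing an "audios" array of base64-encoded clips.

const SARVAM_TTS_URL = "https://api.sarvam.ai/text-to-speech"; // assumed endpoint

interface TtsOptions {
  languageCode: string; // e.g. "en-IN"
  speaker: string;      // a voice name taken from Sarvam's docs
  model: string;        // a TTS model id taken from Sarvam's docs
}

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// The subset of fetch this helper needs; pass the global fetch in production.
type FetchLike = (
  url: string,
  init: HttpRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Pure function, so the request shape can be tested without network access.
function buildTtsRequest(text: string, apiKey: string, opts: TtsOptions): HttpRequest {
  return {
    url: SARVAM_TTS_URL,
    init: {
      method: "POST",
      headers: {
        "api-subscription-key": apiKey, // assumed header name
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text, // assumed field name for the input text
        target_language_code: opts.languageCode,
        speaker: opts.speaker,
        model: opts.model,
      }),
    },
  };
}

// Sends the request and returns the first base64-encoded audio clip.
async function synthesizeQuestion(
  fetchFn: FetchLike,
  text: string,
  apiKey: string,
  opts: TtsOptions,
): Promise<string> {
  const { url, init } = buildTtsRequest(text, apiKey, opts);
  const response = await fetchFn(url, init);
  if (!response.ok) {
    throw new Error(`Sarvam TTS request failed with HTTP ${response.status}`);
  }
  const data = (await response.json()) as { audios?: string[] };
  const audio = data.audios?.[0];
  if (audio === undefined) {
    throw new Error("Sarvam TTS response contained no audio");
  }
  return audio;
}
```

Keeping the call on the Express side means the subscription key never ships to the browser, which only receives the base64 audio to decode and play. Separating request construction from the network call also lets the request shape be checked in tests without hitting the API.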

Indian English has its own rhythm. Sarvam's TTS captures that. For an accessibility use case where the voice is the entire interface, that matters more than almost any other technical decision.

What You Could Build Next

This stack opens up a lot of directions. A voice-first learning platform for rural students with low literacy. An Indic-language medical intake form for patients who cannot read. A voice-driven government form assistant that guides citizens through complex paperwork. An accessibility layer for existing web apps: plug Sarvam TTS and STT on top and make them usable for millions more people.

The infrastructure is simple. The impact is not. If you are building for India, Sarvam's models are worth a serious look — not because they support Indian languages, but because they actually understand how Indian users speak and listen. That is a different thing entirely.
