Building a Voice-First Assessment Platform for Visually Impaired Students with Sarvam AI

#a11y #ai #learning #showdev

Computer-based assessments have a quiet accessibility problem. Most platforms assume the user can read text on a screen, click through options, and type their responses. For visually impaired students, particularly in India, this assumption shuts them out.

I wanted to fix that. Not with a workaround, but with an experience that feels native to voice from the ground up.

The Problem

Screen readers exist, but they're clunky, require separate setup, and often mispronounce Indian names and words and flatten the rhythm of sentences in ways that feel jarring and unnatural. The experience breaks down fast. What visually impaired Indian students actually need is a system that speaks to them the way people around them speak: in a familiar accent, at a natural pace, without sounding like a robot reading out a manual.

That's what led me to Sarvam AI.

Why Sarvam

I had tried other TTS APIs before. They worked, technically. But there was always something off — a flatness to the voice, a slightly Western lilt, a pronunciation of common Hindi-origin words that made it obvious the model had never really heard Indian English spoken naturally.

Sarvam's TTS was different. The first time I ran a test question through it, the output sounded like something a real person would say. The accent was warm and familiar — the kind of voice an Indian student would actually trust and follow without friction. That moment changed how I thought about the project. This wasn't just a convenience feature anymore. It was the core of the experience.

What I Built

The platform is a full-stack web app built with React and Tailwind on the frontend, Express.js on the backend, and PostgreSQL for storing user data and scores. The interaction model is deliberately simple. A single click anywhere on the screen triggers Sarvam TTS to read the current question aloud. A double click starts listening and transcribes the user's spoken answer using Sarvam STT. No keyboard required. No mouse precision required. Just two gestures, and the entire assessment is navigable.
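One practical wrinkle with this model is that browsers fire two click events before a double click is recognised, so a naive handler would start reading the question and then immediately start listening. Below is a minimal sketch of one way to tell the two gestures apart with a short wait window. It is not the platform's actual code: `createGestureHandler`, `Timer`, `browserTimer`, and the 300 ms window are all hypothetical names and values chosen for illustration, and the two callbacks stand in for whatever calls the TTS and STT services.

```typescript
// Hypothetical sketch: none of these names come from the platform's source.
// A small timer interface so the logic can be exercised without a real clock.
type Timer = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const browserTimer: Timer = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a click handler that calls onSingle after a lone click,
// or onDouble when a second click arrives within windowMs.
function createGestureHandler(
  onSingle: () => void, // e.g. read the current question aloud
  onDouble: () => void, // e.g. start listening for the spoken answer
  windowMs = 300,       // assumed wait window, tune for real users
  timer: Timer = browserTimer,
): () => void {
  let pending: unknown = null;
  let waiting = false;

  return () => {
    if (waiting) {
      // Second click inside the window: cancel the pending single click.
      timer.clear(pending);
      waiting = false;
      onDouble();
      return;
    }
    // First click: wait to see whether a second one follows.
    waiting = true;
    pending = timer.set(() => {
      waiting = false;
      onSingle();
    }, windowMs);
  };
}
```

The returned function can be attached as a single `click` listener on the page (or as an `onClick` prop in React). The trade-off is that a lone click is delayed by the wait window before the question is read, so the window should be as short as users can comfortably double click within.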

For the demo, I built a stress-level detection psychometric test. Users log in, the system reads each question aloud, they speak their answer, and at the end their stress score and full response history are saved to the backend.
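Spoken answers arrive as free-form transcripts, so something has to turn "umm, sometimes" into a number before a score can be computed. The sketch below shows one possible approach under stated assumptions: the five-point scale, its weights, and the names `SCALE`, `parseAnswer`, and `totalScore` are all hypothetical, since the post does not describe the real test's options or scoring.

```typescript
// Hypothetical five-point scale. The actual options and weights used by
// the demo test are not given in the post.
const SCALE = new Map<string, number>([
  ["never", 0],
  ["rarely", 1],
  ["sometimes", 2],
  ["often", 3],
  ["always", 4],
]);

// Map a transcript to a scale value, or null if the answer is
// missing or ambiguous (more than one distinct option was spoken).
function parseAnswer(transcript: string): number | null {
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ") // drop punctuation and digits
    .split(/\s+/);
  const hits = Array.from(new Set(words.filter((w) => SCALE.has(w))));
  if (hits.length !== 1) return null;
  return SCALE.get(hits[0]) ?? null;
}

// Sum the parsed answers and report which question indexes need a retry.
function totalScore(transcripts: string[]): { score: number; unanswered: number[] } {
  let score = 0;
  const unanswered: number[] = [];
  transcripts.forEach((t, i) => {
    const value = parseAnswer(t);
    if (value === null) unanswered.push(i);
    else score += value;
  });
  return { score, unanswered };
}
```

Returning `null` for ambiguous transcripts, rather than guessing, lets the app read the question again and ask the student to repeat the answer.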

The Surprising Part

Integrating Sarvam's API was genuinely the smoothest part of the build. Clean endpoints, predictable responses, minimal setup. But the real surprise was how much the voice quality changed the feel of the product. A good accent is not a small detail. It is the difference between a tool a user tolerates and one they actually trust.

Indian English has its own rhythm. Sarvam's TTS captures that. For an accessibility use case where the voice is the entire interface, that matters more than almost any other technical decision.

What You Could Build Next

This stack opens up a lot of directions. A voice-first learning platform for rural students with low literacy. An Indic-language medical intake form for patients who cannot read. A voice-driven government form assistant that guides citizens through complex paperwork. An accessibility layer for any existing web app where you plug Sarvam TTS and STT on top and instantly make it usable for millions more people.

The infrastructure is simple. The impact is not. If you are building for India, Sarvam's models are worth a serious look — not because they support Indian languages, but because they actually understand how Indian users speak and listen. That is a different thing entirely.

Top comments (1)

Luis Cruz • Jun 11

This is a fantastic example of accessibility-first design meeting practical AI implementation. You’ve clearly identified a real gap — visually impaired students in India need an experience that’s naturally voice-driven, not just a screen reader workaround. The way you built the platform with just two gestures and leveraged Sarvam AI for TTS/STT shows an elegant balance of simplicity, usability, and technical robustness. Highlighting how the accent and rhythm of Indian English fundamentally change trust and engagement is a powerful insight that many developers overlook.
I’d love to learn more about user feedback. Have you had a chance to test the platform with actual visually impaired students? I’d be happy to help brainstorm additional use cases or ways to expand the voice-first experience across other accessibility scenarios.