Ilja Nevolin

Posted on Dec 14, 2020

Speech-to-text in browser

#dohackathon #machinelearning #node #javascript

What I built

Voicer transcribes speech to text in your browser (Google Chrome only).
It is designed to help hearing-impaired people communicate with their friends more easily, or to follow an audio conversation without any sound.

Category Submission:

Program for the People: communication assistance

App Link

https://nevolin.be/voicer/?room=dohackathon

https://voicer-jofm9.ondigitalocean.app/?room=dohackathon

Screenshots

Description

Voicer takes your microphone input, transcribes it to text, and broadcasts the text to your connected friends. It uses the Web Speech API, which is currently only available in Google Chrome. It's secured through HTTPS/SSL and respects everyone's privacy: no data is stored or shared with third parties.

Open the app link in your Chrome browser, allow microphone access, enter your username and submit. Now you can start talking and you'll see your words/sentences appear on screen.
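
To make the transcription step more concrete, here is a minimal sketch of how the Web Speech API can be wired up in Chrome. It is not taken from the Voicer source: the `createRecognizer` helper and its `onText` callback are hypothetical names, and the `scope` parameter exists only so the function can be exercised outside a browser.

```javascript
// Sketch only: assumes a Chrome-like browser that exposes webkitSpeechRecognition.
// createRecognizer and onText are illustrative names, not part of Voicer.
function createRecognizer(onText, scope = globalThis) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) return null; // browser without speech recognition support

  const recognizer = new Recognition();
  recognizer.continuous = true;     // keep listening after each phrase
  recognizer.interimResults = true; // report partial results while speaking

  recognizer.onresult = (event) => {
    // Only walk the results that changed since the previous event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onText(result[0].transcript, result.isFinal);
    }
  };
  return recognizer;
}

// Browser usage (microphone permission is requested on start):
// const recognizer = createRecognizer((text, isFinal) => console.log(text, isFinal));
// if (recognizer) recognizer.start();
```

Returning `null` when the constructor is missing lets the page show a "please use Chrome" message instead of throwing.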

Link to Source Code

https://github.com/healzer/voicer

Permissive License

MIT

Background

Many months ago I was building a music bot for Discord with voice-enabled controls (e.g. play next, pause, shuffle, play random, play ). That bot got some traction, and I started getting attention from people with hearing conditions. Unfortunately, that bot has to be configured and hosted, which may be a little too hard for non-technical people. So I started looking into simpler solutions, and Voicer was born. It only needs Google Chrome to work.

Other browsers such as Safari, Edge and Firefox have their Speech API in development, so hopefully they'll be compatible soon.

How I built it

The front-end is plain JavaScript/jQuery/HTML, nothing too fancy.
The back-end is Node.js.
It uses WebSockets for server-client communication to keep latency low.
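
As a rough illustration of the client side of that connection, the sketch below sends one transcript over an already open socket. The message shape (`type`, `username`, `text`) and the `sendTranscript` name are assumptions made for this example, not the format Voicer actually uses.

```javascript
// Sketch only: the JSON message shape here is hypothetical.
function sendTranscript(socket, username, text) {
  const OPEN = 1; // same value as WebSocket.OPEN in browsers
  if (socket.readyState !== OPEN) return false; // not connected yet, or closed

  socket.send(JSON.stringify({ type: "transcript", username, text }));
  return true;
}

// Browser usage, assuming the page is served over HTTPS:
// const socket = new WebSocket("wss://" + location.host);
// socket.onopen = () => sendTranscript(socket, "alice", "hello there");
```

Because the socket stays open, each phrase costs one small frame rather than a full HTTP request, which is where the latency benefit comes from.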

The beautiful part is that it allows you to join "rooms", so many people can use it with just a single server running. My app runs on a basic $5 DigitalOcean cloud app.
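
The room idea can be sketched as a small in-memory registry that relays each message to the other members of the same room. This is a guess at the general shape, not the code from the repository: `RoomBroker` and its methods are hypothetical, and a client is assumed to be any object with a `send` method, such as a WebSocket connection.

```javascript
// Hypothetical in-memory room registry; names are illustrative, not from Voicer.
class RoomBroker {
  constructor() {
    this.rooms = new Map(); // room name -> Set of connected clients
  }

  join(room, client) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(client);
  }

  leave(room, client) {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(client);
    if (members.size === 0) this.rooms.delete(room); // drop empty rooms
  }

  // Relay a message to everyone else in the room and return the delivery count.
  // The message itself is not kept anywhere after the loop finishes.
  broadcast(room, sender, message) {
    const members = this.rooms.get(room);
    if (!members) return 0;
    let delivered = 0;
    for (const client of members) {
      if (client === sender) continue; // assumption: the sender is not echoed
      client.send(message);
      delivered++;
    }
    return delivered;
  }
}
```

Since rooms are just keys in a map, one process can serve many independent conversations at once.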

I did struggle for a few minutes to get it up and running, because the port wasn't set to 8080, but that was my fault :)
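
One way to avoid that kind of port mismatch is to read the port from the environment and fall back to 8080. The sketch below assumes the hosting platform passes the port in a `PORT` environment variable, which is an assumption on my part here; `resolvePort` and `start` are hypothetical helper names.

```javascript
const http = require("node:http");

// Assumption: the platform may provide the port through the PORT variable.
// Falls back to 8080 when PORT is missing or not a positive integer.
function resolvePort(env = process.env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 ? port : 8080;
}

// Placeholder server; a real app would attach its WebSocket handling here.
function start() {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
  });
  const port = resolvePort();
  server.listen(port, () => console.log("listening on port " + port));
  return server;
}

// Call start() to run the server.
```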

Additional Resources/Info

You can use the app as is, or you can host it yourself. The server component does not store any sensitive information about the conversations. The speech-to-text part is done by Google Chrome, in your browser. The server component is nothing more than a broker for all the connected users.

You can use third-party software to keep your browser window or tab on top of all your other windows, so you can keep following the conversation while working or gaming. This won't work with full-screen apps, so gamers need to play in windowed mode.
