Ifeanyi Idiaye
Transcription & Translation App Powered by AssemblyAI & Google Gemini

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I built a web application that records live audio through the microphone in the browser, transcribes the recording, and then translates the transcript into any of 15 languages.

Demo

https://transcribe-and-translate.netlify.app/

Journey

I used the API for AssemblyAI's Universal-2 Speech-to-Text model to transcribe the audio recording, with an API key from my AssemblyAI account dashboard. I wrote an audio transcriber function that takes an audio file and passes it to AssemblyAI's transcriber (aai.Transcriber()), which turns the speech into text.
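A transcriber function along these lines might look like the sketch below. It is not the code from the repository: the function name transcribe_audio and the ASSEMBLYAI_API_KEY environment variable are assumptions made for illustration, and the SDK is imported inside the function so the module still loads where the assemblyai package is not installed.

```python
import os
from typing import Optional


def transcribe_audio(audio_path: str, api_key: Optional[str] = None) -> str:
    """Transcribe an audio file with the AssemblyAI Python SDK.

    Hypothetical helper: the name and the environment variable used for
    the key are illustrative assumptions, not taken from the project.
    """
    # Resolve the API key first so a missing key fails fast,
    # before any SDK import or network call.
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise ValueError("An AssemblyAI API key is required.")

    # Lazy import: only needed once we actually transcribe.
    import assemblyai as aai

    aai.settings.api_key = key

    # aai.Transcriber() is the entry point mentioned in the post.
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_path)

    # Surface API-side failures instead of returning an empty result.
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)

    return transcript.text
```

With a key available, a call such as transcribe_audio("recording.wav") would return the transcript text for that file, where "recording.wav" stands in for whatever the microphone capture was saved as.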

Alongside the audio transcription, I implemented a translation feature using Google's Gemini 1.5 Pro 002 model. This feature leverages the multimodal capability of Google's Gemini models to translate the transcript into any of 15 languages, including Spanish, Hindi, Yoruba, and Dutch.
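The translation step could be sketched roughly as follows, assuming the google-generativeai Python package. The prompt wording, the helper names, the GOOGLE_API_KEY environment variable, and the model ID string "gemini-1.5-pro-002" are all assumptions for illustration. Only the four languages named above are listed, since the post does not enumerate the other eleven.

```python
import os
from typing import Optional

# Only the four languages named in the post; the app itself offers 15.
KNOWN_LANGUAGES = ("Spanish", "Hindi", "Yoruba", "Dutch")


def build_translation_prompt(transcript: str, target_language: str) -> str:
    """Build a plain-text translation prompt (hypothetical wording)."""
    cleaned = transcript.strip()
    if not cleaned:
        raise ValueError("The transcript is empty.")
    return (
        f"Translate the following transcript into {target_language}. "
        "Return only the translated text.\n\n"
        f"{cleaned}"
    )


def translate_transcript(
    transcript: str,
    target_language: str,
    api_key: Optional[str] = None,
    model_name: str = "gemini-1.5-pro-002",  # assumed model ID
) -> str:
    """Send the prompt to a Gemini model and return the translation."""
    prompt = build_translation_prompt(transcript, target_language)

    key = api_key or os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise ValueError("A Google API key is required.")

    # Lazy import so the prompt helper works without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text.strip()
```

Keeping the prompt builder separate from the API call makes the prompt easy to inspect and adjust without spending API quota.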

You can find all the code on GitHub: https://github.com/Ifeanyi55/Transcribe-and-Translate
