DEV Community

Ifeanyi Idiaye
Ifeanyi Idiaye

Posted on

1

Transcription & Translation App Powered by Assembly AI & Google Gemini

This is a submission for the AssemblyAI Challenge : Sophisticated Speech-to-Text.

What I Built

I built a web application that captures live audio recording, via a web microphone; transcribes the recording, and then translates the transcript into any of 15 languages.

Demo

https://transcribe-and-translate.netlify.app/

AudioTranscriber

Journey

I used AssemblyAI's Universal-2 Speech-to-Text model's api to transcribe the audio recording. I got the API key from my AssemblyAI account dashboard. I built an audio transcriber function, which takes an audio file and passes that to AssemblyAI's transcriber function (aai.Transcriber()), which turns the speech into text.

Along with the audio transcription, I also implemented a translation feature using Google's Gemini 1.5 pro 002 model. This feature leverages the multi-modal capability of Google Gemini models to translate the audio transcript into any of 15 languages, including Spanish, Hindi, Yoruba, and Dutch.

You can find all the code on github: https://github.com/Ifeanyi55/Transcribe-and-Translate

Image of Timescale

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read more →

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more