This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.
What I Built
I built AudioWhisperer, a web application that includes five tools providing speech-to-text functionality for everyday use:
Blog Tutorial Generator: Transforms audio transcripts into Markdown-formatted tutorial articles using the LeMUR LLM. The front end displays the article both rendered and as raw Markdown.
Fluency Analyzer: According to several research papers, one way to improve fluency and build the ability to give impromptu speeches is to practice talking about a random topic for 1–3 minutes. This tool builds on that idea: it generates five random topics, lets users record or upload a 1-minute audio sample, and uses LeMUR to analyze the speaker's fluency. It returns an analysis of the transcript that highlights pain points and suggests improvements.
Content Moderation Tool: Uses AssemblyAI's content moderation parameter to identify potentially harmful or sensitive content. The front end visualizes the results in a bar chart and a table.
Transcript Translation: Translates transcripts into multiple languages.
Subtitle Generation: Lets users generate and download subtitle files in multiple languages.
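Subtitle files for translated transcripts can be assembled from timed text segments. The sketch below is a minimal illustration, not AudioWhisperer's actual code: it assumes each translated segment keeps the start and end times (in milliseconds) of the original utterance, and the Segment class, format_srt_timestamp, and build_srt are hypothetical names. For untranslated transcripts, AssemblyAI can also export captions directly through its SRT and VTT export endpoints.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical subtitle cue: start/end in milliseconds plus (translated) text."""
    start_ms: int
    end_ms: int
    text: str


def format_srt_timestamp(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp format HH:MM:SS,mmm."""
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(segments: list[Segment]) -> str:
    """Number cues from 1 and separate them with a blank line, as SRT expects."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_timestamp(seg.start_ms)} --> {format_srt_timestamp(seg.end_ms)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up French segments standing in for a translated transcript.
    translated = [
        Segment(0, 2_500, "Bonjour à tous."),
        Segment(2_500, 61_040, "Aujourd'hui, nous parlons de la transcription."),
    ]
    print(build_srt(translated))
```

Writing the SRT text yourself keeps the translated wording and the original timing together; the same segments could be rendered as WebVTT by switching the millisecond separator to a period and adding a WEBVTT header.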
Demo
Web link
Server code
Client code
Rendered Blog Page
Markdown Blog page
Content Analyzer Data Page
Subtitle Generator
Translation Generator
Topics Generator
Fluency Analysis
Journey
AssemblyAI's models let me build a user-friendly application that uses speech-to-text across all five tools. In my code, I set the content moderation parameter to true (content_safety: true) to detect and list sensitive content in media files. I added disfluency tracking by setting disfluencies: true. I also enabled language_detection: true, which detects the original language of each audio file; I needed this to translate transcripts into other languages efficiently.
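A minimal sketch of the transcription request described above, plus one way to summarize content moderation results for a bar chart. It assumes AssemblyAI's REST endpoint POST https://api.assemblyai.com/v2/transcript (sent with your API key in an authorization header) and a response whose content_safety_labels.results[].labels[].label fields name each detected category. The helper names are hypothetical, and the counting step is an illustration rather than how AudioWhisperer necessarily does it; no network call is made here.

```python
import json
from collections import Counter


def build_transcript_request(audio_url: str) -> dict:
    """Request body for POST https://api.assemblyai.com/v2/transcript."""
    return {
        "audio_url": audio_url,
        "content_safety": True,      # flag sensitive content (Content Moderation)
        "disfluencies": True,        # keep filler words such as "um" and "uh"
        "language_detection": True,  # detect the spoken language automatically
    }


def count_content_safety_labels(transcript: dict) -> Counter:
    """Count how often each content-safety label appears (assumed response shape)."""
    counts: Counter = Counter()
    results = transcript.get("content_safety_labels", {}).get("results", [])
    for result in results:
        for label in result.get("labels", []):
            counts[label["label"]] += 1
    return counts


if __name__ == "__main__":
    print(json.dumps(build_transcript_request("https://example.com/talk.mp3"), indent=2))
    # Abbreviated, made-up response for illustration only.
    sample = {
        "language_code": "en",
        "content_safety_labels": {
            "results": [
                {"text": "...", "labels": [{"label": "profanity", "confidence": 0.9}]},
                {"text": "...", "labels": [{"label": "profanity", "confidence": 0.8},
                                           {"label": "weapons", "confidence": 0.7}]},
            ]
        },
    }
    # The detected language (language_code) is what a translation step would start from.
    print(sample["language_code"], dict(count_content_safety_labels(sample)))
```

The resulting label-to-count mapping maps directly onto bar-chart categories and heights, while the per-result text and confidence values suit the table view.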
I used LeMUR to generate blog content and to analyze transcript fluency.
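Both LeMUR features can be driven by the same kind of request with different prompts. This is a sketch under stated assumptions: it targets AssemblyAI's LeMUR task endpoint (POST https://api.assemblyai.com/lemur/v3/generate/task), which takes a list of completed transcript IDs and a free-form prompt and returns the generated text in its response field. The PROMPTS wording and the build_lemur_task helper are hypothetical, since the post does not publish the app's actual prompts.

```python
import json

# AssemblyAI's LeMUR task endpoint; send with an "authorization: <API key>" header.
LEMUR_TASK_URL = "https://api.assemblyai.com/lemur/v3/generate/task"

# Hypothetical prompts for the two LeMUR-backed tools.
PROMPTS = {
    "blog": (
        "Rewrite this transcript as a step-by-step tutorial article in Markdown, "
        "with a title, section headings, and code blocks where relevant."
    ),
    "fluency": (
        "The speaker talked about a random topic for about one minute. "
        "Assess their fluency, list the main pain points (filler words, long pauses, "
        "repetition), and suggest concrete improvements."
    ),
}


def build_lemur_task(transcript_id: str, tool: str) -> dict:
    """Build a LeMUR task request body for one completed transcript."""
    if tool not in PROMPTS:
        raise ValueError(f"unknown tool: {tool!r}")
    return {"transcript_ids": [transcript_id], "prompt": PROMPTS[tool]}


if __name__ == "__main__":
    body = build_lemur_task("your-transcript-id", "fluency")
    print(LEMUR_TASK_URL)
    print(json.dumps(body, indent=2))
```

Keeping the prompts in one table means the blog and fluency tools share a single request path, and the fluency prompt can lean on the filler words preserved by disfluencies: true.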
My use of LeMUR for the blog content generator and the fluency analysis qualifies my submission for the "No More Monkey Business" prompt.