Whisper-Groq-Transcriber: Speech-to-Text with Enhanced Features

Whisper-Groq-Transcriber is a speech-to-text application that combines OpenAI's Whisper model with Groq integration. Developed by Prateek Mohan, the project aims to speed up transcription tasks through flexible recording options, customizable hotkeys, and a Gradio UI.

Key Features of Whisper-Groq-Transcriber

Uses OpenAI's Whisper model to transcribe speech directly from your microphone.
Integrates with Groq's API to handle JSON data and respond to queries.
Supports several recording modes: voice activity detection, press-to-toggle, and hold-to-record.
Lets you define custom hotkeys for specific actions.
Provides a Gradio UI with tabs for chatting with the bot, managing hotkeys, adding URLs or PDFs, and updating context.
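
To make the difference between the three recording modes concrete, here is a minimal sketch of how a recorder could decide whether it is currently recording. It is not taken from the project: the names RecordingMode, RecorderState, and the on_* callbacks are hypothetical, and a real implementation would wire them to a keyboard listener and a voice activity detector.

```python
from enum import Enum


class RecordingMode(Enum):
    # Hypothetical names for the three modes described above.
    VOICE_ACTIVITY = "voice_activity_detection"
    PRESS_TO_TOGGLE = "press_to_toggle"
    HOLD_TO_RECORD = "hold_to_record"


class RecorderState:
    """Tracks whether recording is active under a given mode (illustrative only)."""

    def __init__(self, mode: RecordingMode) -> None:
        self.mode = mode
        self.recording = False

    def on_key_down(self) -> None:
        if self.mode is RecordingMode.PRESS_TO_TOGGLE:
            # Each press flips the state; releasing the key changes nothing.
            self.recording = not self.recording
        elif self.mode is RecordingMode.HOLD_TO_RECORD:
            # Recording lasts only while the key is held.
            self.recording = True

    def on_key_up(self) -> None:
        if self.mode is RecordingMode.HOLD_TO_RECORD:
            self.recording = False

    def on_voice_activity(self, speech_detected: bool) -> None:
        # In voice activity mode the detector, not the keyboard, drives the state.
        if self.mode is RecordingMode.VOICE_ACTIVITY:
            self.recording = speech_detected
```

In this sketch, key events are ignored in voice activity mode and detector events are ignored in the two key-driven modes, so switching modes only requires constructing a new RecorderState.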

Main Components Overview

The main components of Whisper-Groq-Transcriber include:

transcription.py: Manages audio recording, transcription using local models or the OpenAI API, and post-processing of transcriptions.
json_handler.py: Handles loading, saving, and updating JSON data, including specialized functions for managing resume content.
status_window.py: Provides a status window using Tkinter to display recording and transcription status.
embedding_utils.py: Handles document loading, processing, chunking, and loading into a vector store.
groq_integration.py: Manages Groq API interactions, JSON data handling, and response generation.
main.py: The entry point of the application, setting up configurations, initializing models, and managing the recording and transcription process.
helpers.py: Provides utility functions for managing hotkeys, handling transcription, and interacting with the Gradio UI.
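
As an illustration of the load, update, and save cycle that json_handler.py is described as handling, the following sketch uses only the Python standard library. The function names and the file layout are assumptions made for this example, not the project's actual API.

```python
import json
from pathlib import Path
from typing import Any


def load_json(path: Path) -> dict[str, Any]:
    """Return the stored data, or an empty dict if the file does not exist yet."""
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def save_json(path: Path, data: dict[str, Any]) -> None:
    """Write the data back as human-readable JSON."""
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def update_json(path: Path, key: str, value: Any) -> dict[str, Any]:
    """Set one top-level key and persist the result (hypothetical helper)."""
    data = load_json(path)
    data[key] = value
    save_json(path, data)
    return data
```

Reading the file on every update keeps the sketch simple; an application that updates frequently might instead keep the data in memory and save it periodically.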

Together, these components provide speech-to-text transcription with Groq integration and customizable hotkeys.
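
The chunking step attributed to embedding_utils.py can be sketched as follows. This is a generic fixed-size chunker with overlap, written under the assumption that documents are split by character count before being embedded; the article does not state the project's actual strategy, and chunk_text and its parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and stored in the vector store; the overlap trades some storage for better recall near chunk boundaries.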

Screenshots and Videos

Gradio Interface showcases the main interface where users can interact with the application, including tabs for managing hotkeys, adding URLs or PDFs, and updating context.

Hotkey Creation demonstrates the process of setting up custom hotkeys for specific actions within the Gradio UI.

Changing Recording Mode displays the interface for modifying the recording mode, such as voice activity detection, press-to-toggle, or hold-to-record.

Hotkey in Action video shows a custom hotkey being triggered in real time.

Recording Output video shows the transcribed text being written automatically to the active window.

Use Cases for Whisper-Groq-Transcriber

Streamline meeting transcriptions, making it easier to keep track of discussions and action items.
Enhance accessibility by providing real-time transcriptions for individuals with hearing impairments during live events, meetings, or online content.
Transcribe lectures, seminars, and webinars, ensuring students and professionals don't miss any important information.
Assist content creators in transcribing their spoken content into text for editing and publishing as articles, blogs, or social media posts.
Improve customer support by transcribing customer calls and chats, enabling analysis and maintaining accurate records.
Facilitate legal and medical transcriptions, such as court proceedings and patient consultations.
Simplify research by transcribing interviews and focus group discussions for easier qualitative data analysis.
Integrate with home automation systems or other software to execute commands based on voice inputs, enhancing productivity and convenience.
