Prateek Mohan

Whisper-Groq-Transcriber: Speech-to-Text with Enhanced Features

Whisper-Groq-Transcriber is a speech-to-text application that combines OpenAI's Whisper model for transcription with Groq integration for handling JSON data and answering queries. Developed by Prateek Mohan, the project aims to streamline transcription tasks through flexible recording options, dynamic hotkeys, and a Gradio UI.

Key Features of Whisper-Groq-Transcriber

  • Uses OpenAI's Whisper model for speech-to-text transcription directly from your microphone.
  • Integrates with Groq's API to handle JSON data and respond to queries.
  • Supports various recording modes, including voice activity detection, press-to-toggle, and hold-to-record, providing flexibility to suit different preferences.
  • Allows customization of hotkeys for specific actions, streamlining your workflow and enhancing productivity.
  • Provides a user-friendly Gradio UI for easy interaction with the application, including tabs for chatting with the bot, managing hotkeys, adding URLs or PDFs, and updating context.
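
The three recording modes differ mainly in which input events start and stop a recording. The following is a minimal sketch of that logic, not the project's code: the `Recorder` class and the event names (`key_down`, `key_up`, `speech_start`, `speech_end`) are assumptions made for illustration.

```python
from enum import Enum


class RecordingMode(Enum):
    VOICE_ACTIVITY = "voice_activity_detection"
    PRESS_TO_TOGGLE = "press_to_toggle"
    HOLD_TO_RECORD = "hold_to_record"


class Recorder:
    """Tracks whether audio should be captured (hypothetical class)."""

    def __init__(self, mode: RecordingMode) -> None:
        self.mode = mode
        self.recording = False

    def handle(self, event: str) -> bool:
        """Update state for one input event and return the new recording state."""
        if self.mode is RecordingMode.PRESS_TO_TOGGLE:
            # Each key press flips the state; key releases are ignored.
            if event == "key_down":
                self.recording = not self.recording
        elif self.mode is RecordingMode.HOLD_TO_RECORD:
            # Record only while the key is held down.
            if event == "key_down":
                self.recording = True
            elif event == "key_up":
                self.recording = False
        elif self.mode is RecordingMode.VOICE_ACTIVITY:
            # A voice activity detector (not shown) would emit these events.
            if event == "speech_start":
                self.recording = True
            elif event == "speech_end":
                self.recording = False
        return self.recording
```

Keeping the mode-specific rules in one `handle` method means that switching modes only changes how events are interpreted, not how audio is captured.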

Main Components Overview
The main components of Whisper-Groq-Transcriber include:

  • transcription.py: Manages audio recording, transcription using local models or the OpenAI API, and post-processing of transcriptions.
  • json_handler.py: Handles loading, saving, and updating JSON data, including specialized functions for managing resume content.
  • status_window.py: Provides a status window using Tkinter to display recording and transcription status.
  • embedding_utils.py: Handles document loading, processing, chunking, and loading into a vector store.
  • groq_integration.py: Manages Groq API interactions, JSON data handling, and response generation.
  • main.py: The entry point of the application, setting up configurations, initializing models, and managing the recording and transcription process.
  • helpers.py: Provides utility functions for managing hotkeys, handling transcription, and interacting with the Gradio UI.
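
To illustrate the chunking step attributed to embedding_utils.py, the sketch below splits text into fixed-size, overlapping chunks before they would be embedded. The function name `chunk_text`, the character-based sizing, and the default sizes are assumptions; the project may chunk documents differently.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that content near a
    chunk boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`.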

Together, these components handle recording, transcription, and output, with Groq integration and dynamic hotkeys built on top.

Screenshots and Videos

  • Gradio Interface shows the main interface, where users interact with the application through tabs for managing hotkeys, adding URLs or PDFs, and updating context.
  • Hotkey Creation demonstrates setting up a custom hotkey for a specific action within the Gradio UI.
  • Changing Recording Mode shows the interface for switching between voice activity detection, press-to-toggle, and hold-to-record.
  • Hotkey in Action video shows a custom hotkey being triggered in real time.
  • Recording Output video presents the transcribed text being automatically written to the active window, highlighting the accuracy and efficiency of the transcription process.
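
The custom hotkeys shown above amount to a mapping from key combinations to actions. Below is a minimal sketch of such a registry using only the standard library. `HotkeyRegistry`, the `ctrl+shift+r` combo syntax, and the example action are hypothetical, and a real application would need a keyboard-listener library to supply the pressed keys.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional


def parse_combo(combo: str) -> FrozenSet[str]:
    """Normalize a combo like 'Ctrl + Shift + R' into a set of lowercase key names."""
    keys = frozenset(part.strip().lower() for part in combo.split("+") if part.strip())
    if not keys:
        raise ValueError("empty hotkey combination")
    return keys


class HotkeyRegistry:
    """Maps key combinations to actions (hypothetical class)."""

    def __init__(self) -> None:
        self._actions: Dict[FrozenSet[str], Callable[[], str]] = {}

    def register(self, combo: str, action: Callable[[], str]) -> None:
        """Bind an action to a combo, replacing any existing binding."""
        self._actions[parse_combo(combo)] = action

    def unregister(self, combo: str) -> None:
        """Remove a binding; unknown combos are ignored."""
        self._actions.pop(parse_combo(combo), None)

    def dispatch(self, pressed: Iterable[str]) -> Optional[str]:
        """Run the action bound to exactly the pressed keys, if there is one."""
        action = self._actions.get(frozenset(key.lower() for key in pressed))
        return action() if action else None
```

Storing combos as sets makes the lookup independent of key order and capitalization, so `Ctrl + Shift + R` and `shift+ctrl+r` refer to the same binding.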

Use Cases for Whisper-Groq-Transcriber

  • Streamline meeting transcriptions, making it easier to keep track of discussions and action items.
  • Enhance accessibility by providing real-time transcriptions for individuals with hearing impairments during live events, meetings, or online content.
  • Transcribe lectures, seminars, and webinars, ensuring students and professionals don't miss any important information.
  • Assist content creators in transcribing their spoken content into text for editing and publishing as articles, blogs, or social media posts.
  • Improve customer support by transcribing customer calls and chats, enabling analysis and maintaining accurate records.
  • Facilitate legal and medical transcriptions, such as court proceedings and patient consultations.
  • Simplify research by transcribing interviews and focus group discussions for easier qualitative data analysis.
  • Integrate with home automation systems or other software to execute commands based on voice inputs, enhancing productivity and convenience.
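
For the last use case, executing commands from voice input, one simple approach is to match the transcribed text against a table of known phrases. The sketch below is an assumption about how such glue code might look and is not part of the project; the phrases and handlers are hypothetical.

```python
import re
from typing import Callable, Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def dispatch_command(
    transcript: str, commands: Dict[str, Callable[[], str]]
) -> Optional[str]:
    """Run the handler of the first registered phrase found in the transcript."""
    spoken = f" {normalize(transcript)} "
    for phrase, handler in commands.items():
        # Padding with spaces matches whole words only, so "lights on"
        # does not fire on "lights only".
        if f" {normalize(phrase)} " in spoken:
            return handler()
    return None  # no known phrase was spoken


# Hypothetical handlers; a real setup would call a home automation API here.
commands = {
    "lights on": lambda: "lights:on",
    "lights off": lambda: "lights:off",
}
```

Exact phrase matching is brittle against transcription errors, so a practical integration would likely need fuzzier matching or a confirmation step.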
