🌟 Speech Assistant: AI-Powered Audio-to-Text & MoM Generator

#python #openai #product

Hi Everyone! 👋

I’m excited to share a new project I’ve been working on: Speech Assistant. This application leverages cutting-edge technologies like Generative AI, OpenAI, and Streamlit to simplify audio-to-text conversion and automate the creation of Minutes of Meeting (MoM).

With Speech Assistant, you can effortlessly convert meeting recordings, call logs, or any audio file into text in multiple languages. It doesn’t stop there—it can generate insightful MoMs with key points, action items, and sentiment analysis, and even transform text back into audio!

Check out the app here: Smart-Speech-Assistant

In this article, I’ll walk you through the features, tech stack, and how you can use or contribute to the project. Let’s dive in! 🚀

Speech Assistant turns audio files into actionable insights. Built with Python, OpenAI, and Streamlit, it provides audio-to-text conversion, text-to-audio synthesis, and automated Minutes of Meeting (MoM) creation.

✨ Features

🎙️ Audio-to-Text Conversion: Convert meetings, call recordings, and other audio files into text in multiple languages.
🔁 Text-to-Audio: Convert text back into audio in multiple languages.
📝 Minutes of Meeting Generator: Automatically generate MoM for your audio recordings, including:
- Call Sentiment Analysis
- Key Points
- Summaries
- Action Points
📥 Download Options: Export MoM as a downloadable file for easy sharing.
🌍 Multilingual Support: Handle multiple languages for both audio and text.
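
To make the MoM feature more concrete, here is a minimal sketch of how a transcript could be turned into the four MoM parts listed above. It is not taken from the project: the field names in `MOM_FIELDS`, the helper names, the prompt wording, and the default model name are all assumptions for illustration. The client is passed in, so any object exposing the `chat.completions.create` call of the official `openai` Python client can be used.

```python
import json

# Hypothetical field names, chosen to mirror the feature list above.
MOM_FIELDS = ("sentiment", "key_points", "summary", "action_points")
LIST_FIELDS = ("key_points", "action_points")


def build_mom_messages(transcript):
    """Build chat messages that ask the model for a JSON-formatted MoM."""
    system = (
        "You write minutes of meeting. Reply with a single JSON object "
        "containing exactly these keys: " + ", ".join(MOM_FIELDS) + ". "
        "Use a string for sentiment and summary, and a list of strings "
        "for key_points and action_points."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]


def parse_mom(reply):
    """Parse the model reply, filling any missing field with an empty value."""
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    mom = {}
    for field in MOM_FIELDS:
        default = [] if field in LIST_FIELDS else ""
        mom[field] = data.get(field, default)
    return mom


def generate_mom(transcript, client, model="gpt-4o-mini"):
    """Request a MoM through an injected OpenAI-style client (model name is an assumption)."""
    response = client.chat.completions.create(
        model=model,
        messages=build_mom_messages(transcript),
    )
    return parse_mom(response.choices[0].message.content)
```

Keeping prompt building and reply parsing in separate functions makes both easy to check without calling the API; a real app would also want to handle replies that are not valid JSON.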

🛠️ Tech Stack

Language: Python
Frontend: Streamlit
Backend: OpenAI APIs for Generative AI tasks
Libraries:
- Speech-to-text: OpenAI Whisper
- Text-to-speech: OpenAI
- Sentiment Analysis: OpenAI GPT
- File Handling: pandas, `os`
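
As a rough illustration of the speech-to-text and text-to-speech pieces of this stack, the sketch below wraps the audio endpoints of the official `openai` Python client. The function names, the default model names (`whisper-1`, `tts-1`), and the `alloy` voice are assumptions, not details confirmed by the project. The client is injected, for example `OpenAI()` from the `openai` package.

```python
def transcribe_audio(path, client, model="whisper-1", language=None):
    """Send an audio file to the transcription endpoint and return its text."""
    kwargs = {"model": model}
    if language:
        # Optional language hint as an ISO-639-1 code such as "en".
        kwargs["language"] = language
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **kwargs)
    return result.text


def synthesize_speech(text, out_path, client, model="tts-1", voice="alloy"):
    """Turn text into speech and write the returned audio bytes to out_path."""
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    with open(out_path, "wb") as out_file:
        out_file.write(response.content)
    return out_path
```

Passing the client as a parameter keeps the helpers free of global state and lets them be exercised with a stand-in object during development.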

🚀 Getting Started

Prerequisites

- Python 3.8+
- An OpenAI API key

### Installation  

1. Clone this repository and move into it:

       git clone https://github.com/r123singh/speech-assistant.git
       cd speech-assistant

2. Install the dependencies:

       pip install -r requirements.txt

3. Run the Streamlit application:

       streamlit run app.py
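
The app needs the OpenAI API key at startup. The article does not say how the project loads it, so the sketch below assumes the `OPENAI_API_KEY` environment variable, which the official `openai` Python client reads by default. `get_openai_api_key` is a hypothetical helper that fails early with a readable message.

```python
import os


def get_openai_api_key(env=None):
    """Return the API key from the environment or raise a clear error."""
    # `env` defaults to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the app."
        )
    return key
```

Calling this once at startup surfaces a missing key immediately instead of at the first API request.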




### Usage  

1. Upload an audio file (supported formats: `.wav`, `.mp3`, `.m4a`).  
2. Select the desired operation:  
   - Convert audio to text.  
   - Generate MoM.  
   - Convert text to audio.  
3. View and download the generated outputs.  
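
The steps above can be sketched as a small validation layer that runs before any API call. The helper names and the exact operation labels are assumptions for illustration; only the three file extensions come from the list above.

```python
from pathlib import Path

# Extensions taken from the usage notes; labels below are hypothetical UI strings.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
OPERATIONS = ("Convert audio to text", "Generate MoM", "Convert text to audio")


def is_supported_audio(filename):
    """Check the file extension case-insensitively against the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def validate_request(filename, operation):
    """Return an error message for a bad request, or None when it is acceptable."""
    if operation not in OPERATIONS:
        return f"Unknown operation: {operation}"
    if not is_supported_audio(filename):
        return f"Unsupported file type: {Path(filename).suffix or '(none)'}"
    return None
```

In a Streamlit app, `st.file_uploader(..., type=["wav", "mp3", "m4a"])` can enforce the same restriction at upload time; a check like this one is still useful if files arrive by another route.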

## 📂 Project Structure  

speech-assistant/  
│  
├── app.py               # Main Streamlit application  
├── requirements.txt     # Python dependencies  
├── utils.py             # Utility functions for processing  
├── assets/              # Example input/output files  
└── README.md            # Project documentation  

## 📖 Example  

### Input:  

- **Audio File**: Team meeting recording.  

### Output:  

- **Transcription**:  
  Welcome to our Q4 planning meeting. Today we'll discuss key objectives and allocate action items...  

- **Generated MoM**:  

  - **Call Sentiments**: Positive with constructive feedback.  
  - **Summary**: Discussed Q4 objectives, marketing strategy, and resource allocation.  
  - **Action Points**:  
    - Complete budget analysis by Dec 20.  
    - Finalize campaign design by Jan 5.  
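
For the download option, output like the example above has to be flattened into a text file at some point. Here is one possible way to do that, assuming the MoM is held in a dictionary; the key names and `format_mom` are hypothetical and not taken from the project.

```python
def format_mom(mom):
    """Render a MoM dictionary as plain text suitable for a downloadable file."""
    lines = ["Minutes of Meeting", ""]
    lines.append(f"Call Sentiments: {mom.get('sentiment', '')}")
    lines.append(f"Summary: {mom.get('summary', '')}")
    # List sections are only written when they contain at least one item.
    for title, key in (("Key Points", "key_points"), ("Action Points", "action_points")):
        items = mom.get(key, [])
        if items:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines) + "\n"


example = {
    "sentiment": "Positive with constructive feedback.",
    "summary": "Discussed Q4 objectives, marketing strategy, and resource allocation.",
    "action_points": [
        "Complete budget analysis by Dec 20.",
        "Finalize campaign design by Jan 5.",
    ],
}
print(format_mom(example))
```

In Streamlit, the resulting string could be handed to `st.download_button("Download MoM", data=text, file_name="mom.txt")` to offer it as a file.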

## 🤝 Contributing  

We welcome contributions! Please follow these steps:  

1. Fork the repository.  
2. Create a feature branch: `git checkout -b feature-name`.  
3. Commit your changes: `git commit -m 'Add some feature'`.  
4. Push to the branch: `git push origin feature-name`.  
5. Open a pull request.  

⚡ **Transform your conversations into insights with Speech Assistant!**
