Tahsin Abrar

Speech Recognition from MP3 to Text

This project demonstrates how to convert an MP3 audio file to WAV format and then use Google Speech Recognition to transcribe the audio into Bengali text. The code uses two libraries: pydub handles the audio conversion, and SpeechRecognition sends the audio to Google's speech recognition service.

Installation

Before running the code, install the required packages. In a Google Colab notebook, use pip for the Python libraries and apt-get for ffmpeg, which pydub relies on to decode MP3 files:

!pip install SpeechRecognition pydub
!apt-get install -y ffmpeg
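
Before moving on, it can help to confirm that the installation worked. The snippet below is a minimal sketch, not part of the original project: missing_dependencies is a hypothetical helper that uses only the standard library to report which Python modules and command-line tools cannot be found. Note that it checks import names, so SpeechRecognition appears as speech_recognition.

```python
import importlib.util
import shutil


def missing_dependencies(modules, binaries):
    """Return the names of modules and executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(name) is None:
            missing.append(name)
    return missing


# Import names (not pip names): SpeechRecognition is imported as speech_recognition
missing = missing_dependencies(["speech_recognition", "pydub"], ["ffmpeg"])
print("Missing:", missing if missing else "nothing")
```

If anything is listed as missing, rerun the installation cell before uploading a file.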

Usage

  1. Upload Your MP3 File: When prompted by the script, upload an MP3 audio file.
  2. Conversion: The script will automatically convert the MP3 file to WAV format.
  3. Transcription: It will then use Google Speech Recognition to transcribe the audio into Bengali text.
  4. Output: The transcribed text will be printed in the output section of the notebook.

Example

Simply run the script, upload an MP3 file, and view the printed output text in Bengali.

Code Explanation

Here's a breakdown of the code:

  1. Library Imports:

    • The necessary libraries are imported: files from google.colab for file upload, AudioSegment from pydub for audio processing, and speech_recognition (aliased as sr) for speech-to-text conversion.
  2. File Upload:

    • The script lets users upload an MP3 file with files.upload(). If several files are uploaded, only the first one is used.
  3. Audio Conversion:

    • The uploaded MP3 file is loaded with AudioSegment.from_mp3() and then exported as converted.wav with export().
  4. Speech Recognition:

    • An sr.Recognizer instance opens the WAV file through sr.AudioFile and reads all of it with record().
    • The recognize_google() method is called with language='bn-BD' to transcribe the audio into Bengali text.
    • The recognized text is printed. If the speech cannot be understood (sr.UnknownValueError) or the request to the service fails (sr.RequestError), an error message is printed instead.
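
One detail of the file upload step is worth a small illustration. files.upload() returns a dict that maps each uploaded filename to its contents, and the script takes the first key whatever its extension. The sketch below shows one possible guard; pick_mp3 is a hypothetical helper that is not in the original code, and the uploaded dict here is a stand-in with dummy bytes so the example runs outside Colab.

```python
def pick_mp3(uploaded):
    """Return the name of the first uploaded file ending in .mp3."""
    for name in uploaded:
        # Compare case-insensitively so "speech.MP3" is accepted too
        if name.lower().endswith(".mp3"):
            return name
    raise ValueError("No .mp3 file found among: " + ", ".join(uploaded))


# Stand-in for the dict returned by files.upload(); the contents are dummy bytes
uploaded = {"notes.txt": b"hello", "speech.MP3": b"\x00\x01"}
mp3_file = pick_mp3(uploaded)
print(mp3_file)
```

In the notebook, the same call would replace next(iter(uploaded)) so that a wrongly chosen file produces a clear error instead of a decoding failure later on.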

Reference Code

Here is the full reference code for the project:

# Import necessary libraries
from google.colab import files
from pydub import AudioSegment
import speech_recognition as sr

# Upload an MP3 file
uploaded = files.upload()

# Convert MP3 to WAV
mp3_file = next(iter(uploaded))
wav_file = "converted.wav"

# Load the MP3 file
audio = AudioSegment.from_mp3(mp3_file)
# Export as WAV
audio.export(wav_file, format="wav")

# Initialize the recognizer
recognizer = sr.Recognizer()

# Perform speech recognition on the WAV file
with sr.AudioFile(wav_file) as source:
    audio_data = recognizer.record(source)
    try:
        # Recognize speech using Google Speech Recognition
        text = recognizer.recognize_google(audio_data, language='bn-BD')
        print("Bengali Text:", text)
    except sr.UnknownValueError:
        print("Sorry, I could not understand the audio.")
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
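
The reference code sends the whole recording in a single request. If a long recording fails or comes back incomplete (an assumption about how the service handles long clips, not something this post has tested), one option is to transcribe it in shorter pieces. The following is only a sketch under that assumption: chunk_ranges and transcribe_in_chunks are hypothetical helpers, and the 30-second chunk length is an arbitrary choice.

```python
def chunk_ranges(total_ms, chunk_ms=30_000):
    """Return (start, end) millisecond pairs covering [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


def transcribe_in_chunks(wav_file, language="bn-BD", chunk_ms=30_000):
    """Transcribe a WAV file piece by piece; needs pydub and SpeechRecognition."""
    # Imported here so chunk_ranges can be used without these packages installed
    from pydub import AudioSegment
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    audio = AudioSegment.from_wav(wav_file)
    pieces = []
    # len(audio) is the duration in milliseconds
    for i, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        part_name = f"chunk_{i}.wav"
        audio[start:end].export(part_name, format="wav")  # pydub slices by milliseconds
        with sr.AudioFile(part_name) as source:
            audio_data = recognizer.record(source)
        try:
            pieces.append(recognizer.recognize_google(audio_data, language=language))
        except sr.UnknownValueError:
            pieces.append("")  # nothing intelligible in this chunk
    return " ".join(p for p in pieces if p)


print(chunk_ranges(70_000))  # → [(0, 30000), (30000, 60000), (60000, 70000)]
```

In the notebook, transcribe_in_chunks("converted.wav") would stand in for the single recognize_google() call. Fixed-length cuts can split a word in two, so the joined text may contain small errors at chunk boundaries, and sr.RequestError is left to propagate so that network problems stay visible.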

Requirements

  • Python 3.x
  • Google Colab (for easy execution of the code)
  • Audio file in MP3 format
