Mercy

Sophisticated Speech-to-Text Submission Template, The AssemblyAI challenge.

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

A Speech-to-Text Transcription Web Application using Flask for the backend and AssemblyAI's API for real-time audio transcription. The frontend, built with HTML, CSS, and jQuery, offers an interactive interface for users to control the transcription process and view transcribed text in real-time.

Demo

Here is the link to my app

Journey

Key Features

Real-Time Transcription:

  • Utilizes AssemblyAI's real-time API to process live audio input from the user's microphone and convert it to text.
  • Supports both partial and final transcripts.
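
As a rough illustration of how partial and final transcripts can be combined for display, here is a minimal sketch. The `TranscriptBuffer` class and its method names are hypothetical and not part of the app or the AssemblyAI SDK. It assumes that each partial replaces the previous one and that a final transcript closes the current utterance.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: finalized lines plus the latest in-progress partial."""

    final_lines: list[str] = field(default_factory=list)
    partial: str = ""

    def add_partial(self, text: str) -> None:
        # A partial replaces the previous partial; it is never appended.
        self.partial = text

    def add_final(self, text: str) -> None:
        # A final transcript closes the utterance, so the partial is cleared.
        self.final_lines.append(text)
        self.partial = ""

    def render(self) -> str:
        # Finalized lines come first, followed by the live partial (if any).
        lines = list(self.final_lines)
        if self.partial:
            lines.append(self.partial)
        return "\n".join(lines)
```

Feeding it "hello wor" as a partial, "Hello world." as a final, and then "how are" as a partial would render the finalized line followed by the live partial on the next line.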

Web Interface:

  • Clean and intuitive design with buttons to start and stop transcription.
  • Displays the transcribed text dynamically in a formatted <pre> block.

Flask Backend:

  • Handles routes for starting (/start), stopping (/stop), and retrieving the transcript (/transcript).
  • Runs transcription in a separate thread to ensure non-blocking operations.
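
The background-thread idea can be sketched with the standard library alone. This is a simplified stand-in, not the app's code: `TranscriptionWorker` is a hypothetical name, the `chunks` iterable stands in for transcribed text, and a `threading.Event` plus a `threading.Lock` are one possible way to stop the loop and share the transcript safely between threads.

```python
import threading


class TranscriptionWorker:
    """Hypothetical sketch: runs a producer loop in a background thread."""

    def __init__(self, chunks):
        self._chunks = chunks            # any iterable standing in for transcribed text
        self._stop = threading.Event()   # set by stop() to end the loop
        self._lock = threading.Lock()    # guards _text across threads
        self._text = ""
        self._thread = None

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        for chunk in self._chunks:
            if self._stop.is_set():
                break
            with self._lock:
                self._text += chunk + "\n"

    def join(self, timeout=None) -> None:
        # Wait for the background loop to finish (or for the timeout to pass).
        if self._thread is not None:
            self._thread.join(timeout)

    def stop(self) -> None:
        # Ask the loop to end, then wait briefly for the thread to exit.
        self._stop.set()
        self.join(timeout=2)

    def transcript(self) -> str:
        with self._lock:
            return self._text
```

In this arrangement a request handler only calls `start()`, `stop()`, or `transcript()` and returns immediately, while the loop itself runs off the request thread.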

Polling Mechanism:

  • Implements a JavaScript-based polling system using jQuery to fetch the latest transcribed text every second.
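
The same polling pattern, written in Python for illustration. This is a sketch rather than the app's jQuery code: `poll_transcript`, `fetch`, and `on_update` are hypothetical names, `fetch` is assumed to return a dict shaped like the `/transcript` JSON response, and skipping unchanged text is a small refinement that the original loop does not have.

```python
import time
from typing import Callable, Optional


def poll_transcript(
    fetch: Callable[[], dict],
    on_update: Callable[[str], None],
    interval: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() every `interval` seconds and report transcript changes.

    Returns the number of polls performed.
    """
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        payload = fetch()
        polls += 1
        text = payload.get("transcript", "")
        if text != last:
            # Only notify when the transcript actually changed.
            on_update(text)
            last = text
        sleep(interval)
    return polls
```

Passing `fetch` and `sleep` in as parameters keeps the loop testable without a running server or real delays.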

Customizable Word Boost:

  • Boosts recognition accuracy for specific words like "AWS," "Azure," and "Google Cloud."
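
One possible way to make the boost list configurable is to read it from a comma-separated string, for example from an environment variable. The helper below is a hypothetical sketch. `parse_word_boost` is not part of the app or the SDK, and lower-casing the words is an assumption based on the lower-case list used in the backend code.

```python
def parse_word_boost(raw: str) -> list[str]:
    """Split a comma-separated string into a clean, de-duplicated word list."""
    seen = set()
    words = []
    for item in raw.split(","):
        # Trim, collapse inner whitespace, and lower-case each entry.
        word = " ".join(item.split()).lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words
```

The resulting list could then be passed as the `word_boost` argument in place of the hard-coded words.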

Responsive Design:

  • Ensures usability across devices with a centered, easy-to-use layout.

Technology Stack

Backend:

  • Python (Flask): Manages the web server and API interactions.
  • AssemblyAI API: Handles speech-to-text transcription.
import assemblyai as aai
from flask import Flask, render_template, jsonify
import os
from dotenv import load_dotenv
import threading

app = Flask(__name__)
load_dotenv()

aai.settings.api_key = os.getenv('API_KEY')

transcriber = None
transcribed_text = ""

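def on_open(session_opened: aai.RealtimeSessionOpened):
    # The SDK passes the session details to this callback
    print("Transcription started! Session ID:", session_opened.session_id)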

def on_data(transcript: aai.RealtimeTranscript):
    global transcribed_text
    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        transcribed_text += transcript.text + "\n"
        print("Transcribed:", transcript.text)  # Verify text here
    else:
        print("Received partial:", transcript.text)


def on_error(error):
    print("Error:", error)

def on_close():
    print("Transcription stopped!")

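def start_transcription():
    global transcriber
    microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
    transcriber = aai.RealtimeTranscriber(
        # MicrophoneStream yields 16-bit PCM, so the encoding must match
        encoding=aai.AudioEncoding.pcm_s16le,
        sample_rate=16_000,
        word_boost=["aws", "azure", "google cloud"],
        end_utterance_silence_threshold=500,
        on_open=on_open,
        on_data=on_data,
        on_error=on_error,
        on_close=on_close,
    )
    # Open the WebSocket session before streaming any audio
    transcriber.connect()

    for audio_data in microphone_stream:
        # Take a local reference so /stop cannot swap in None mid-iteration
        current = transcriber
        if current is None:
            break
        current.stream(audio_data)

    # Release the microphone once streaming ends
    microphone_stream.close()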

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/start')
def start():
    global transcribed_text
    transcribed_text = ""  # Clear previous transcript
    threading.Thread(target=start_transcription).start()
    return jsonify({"message": "Transcription started!"})


@app.route('/stop')
def stop():
    global transcriber
    if transcriber is not None:
        transcriber.close()
        transcriber = None
        print("Transcriber closed")
    return jsonify({"message": "Transcription stopped!"})

@app.route('/transcript')
def transcript():
    global transcribed_text
    return jsonify({"transcript": transcribed_text})


if __name__ == "__main__":
    app.run(debug=True)

Frontend:

  • HTML & CSS: Provides structure and styling for the user interface.
  • jQuery: Handles AJAX requests for starting, stopping, and polling the transcription.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech to Text App</title>
    <script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
    <style>
        body {
            margin: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh; /* Full viewport height */
            font-family: Arial, sans-serif;
            background-color: #f4f4f4; /* Light background for better readability */
        }

        #container {
            text-align: center;
            background: #ffffff;
            padding: 20px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            border-radius: 8px;
        }

        button {
            margin: 10px;
            padding: 10px 20px;
            font-size: 16px;
            border: none;
            border-radius: 5px;
            background-color: #007bff;
            color: white;
            cursor: pointer;
        }

        button:hover {
            background-color: #0056b3;
        }

        pre {
            padding: 10px;
            background-color: #e9ecef;
            border-radius: 5px;
            overflow: auto;
        }
    </style>
</head>
<body>
    <div id="container">
        <h1>Speech-to-Text Transcription</h1>
        <button id="start">Start Transcription</button>
        <button id="stop">Stop Transcription</button>
        <h2>Transcribed Text:</h2>
        <pre id="transcript"></pre>
    </div>

    <script>
        $(document).ready(function() {
            let pollInterval; // Variable to hold the interval ID

            // Start transcription
            $('#start').click(function() {
                $.get('/start', function(data) {
                    console.log(data.message);

                    // Start polling for transcripts if not already polling
                    if (!pollInterval) {
                        pollInterval = setInterval(function() {
                            $.ajax({
                                type: 'GET',
                                url: '/transcript',
                                dataType: 'json',
                                success: function(data) {
                                    console.log(data);
                                    if (data && data.transcript) {
                                        $('#transcript').text(data.transcript);
                                    } else {
                                        $('#transcript').text('No transcription available yet.');
                                    }
                                },
                                error: function(err) {
                                    console.error('Error fetching transcript:', err);
                                }
                            });
                        }, 1000);
                    }
                });
            });

            // Stop transcription
            $('#stop').click(function() {
                $.get('/stop', function(data) {
                    console.log(data.message);

                    // Stop polling for transcripts
                    if (pollInterval) {
                        clearInterval(pollInterval);
                        pollInterval = null; // Reset the interval variable
                    }
                });
            });
        });
    </script>

</body>
</html>

Audio Input:

  • AssemblyAI's MicrophoneStream: Streams audio data for real-time processing.
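
To give a feel for how much data the audio stream carries, here is a small worked calculation. It uses the 16 kHz sample rate from the backend code and assumes 16-bit mono PCM. The sample width and channel count are assumptions, and `chunk_size_bytes` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 16_000   # matches sample_rate=16_000 in the backend code
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM samples
CHANNELS = 1              # assumption: mono microphone input


def chunk_size_bytes(duration_ms: int) -> int:
    """Return the number of bytes in an audio chunk of the given duration."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return bytes_per_second * duration_ms // 1000
```

Under these assumptions one second of audio is 32,000 bytes, so a 100 ms chunk is 3,200 bytes.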

I utilized additional prompts to enhance the project. I used the Flask web framework to render templates and return JSON responses, and the python-dotenv library to load environment variables from the .env file. On the frontend, I used CSS to style the user interface.
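
Since the API key comes from the environment, it can help to fail fast when it is missing. The following is a minimal sketch using only the standard library. `require_env` is a hypothetical helper, not something the app defines, and it assumes the variables have already been loaded into the process environment (for example by `load_dotenv()`).

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a required environment variable or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

With this helper, a missing or empty `API_KEY` would surface as an error at startup rather than as a failed transcription request later.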

Lastly, I want to thank my team, @devnenyasha and @lindiwe09, for their UI idea. If not for them, my UI would have been a mess.
