git-leo-here

Posted on Dec 22, 2024

TDoC '24 Day 6: Building a Web Interface for Vocalshift with Flask

Welcome to TDoC 2024! In Part 6, we explored how to create a web interface using the Flask framework. This interface serves as the frontend for the Voice Changer AI, enabling users to input text, upload audio files, and download processed results. This guide explains Flask fundamentals, analyzes the provided code, and helps you build your first Flask application.

What is Flask?

Flask is a lightweight web framework for Python that allows developers to build web applications quickly and efficiently. It’s an excellent choice for small to medium-sized projects.

Key Features of Flask:

Minimalistic: Keeps the core simple and lets you add features as needed.
Flexible: Provides freedom in structuring your application.
Extensible: Supports a wide range of extensions for authentication, databases, and more.

Setting Up Flask

Installation

Install Flask using pip:

pip install flask

Basic Flask App

Here’s a simple Flask application:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return "Hello, Flask!"

if __name__ == '__main__':
    app.run(debug=True)

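@app.route('/'): Registers the function below it as the handler for the URL path /.
app.run(debug=True): Starts Flask's built-in development server in debug mode, which reloads the app when the code changes and shows an interactive debugger on errors. Use debug mode only during development.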
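
Routes can also capture parts of the URL. Here is a minimal sketch that extends the app above with a hypothetical /hello/<name> route (not part of Vocalshift) and exercises both routes with Flask's test client, so no server needs to be running:

```python
from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route('/')
def home():
    return "Hello, Flask!"

# Hypothetical extra route for illustration: <name> is captured from the URL
@app.route('/hello/<name>')
def hello(name):
    # escape() keeps user-supplied text from being interpreted as HTML
    return f"Hello, {escape(name)}!"

# The test client sends requests to the app without starting a server
client = app.test_client()
home_response = client.get('/')
hello_response = client.get('/hello/TDoC')

print(home_response.status_code, home_response.get_data(as_text=True))
print(hello_response.get_data(as_text=True))
```

The test client is also a handy way to check routes while you are still writing them.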

Implementing the Web Interface with Flask

In this web interface, the Flask app handles the Voice Changer AI workflow:

Receiving User Input: Accepts audio or text along with optional speaker sample audio.
Processing Input: Passes input to the Vocalshift backend.
Providing Output: Sends the generated audio back to the user.
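
To see what Flask actually receives from such a form, the sketch below posts a text field and a fake audio file through Flask's test client. The /inspect route and its JSON response are hypothetical and exist only for this illustration; the field names text, language, and audio match the form used later in this post.

```python
import io
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical endpoint that only reports what arrived in the request
@app.route('/inspect', methods=['POST'])
def inspect_upload():
    audio = request.files.get('audio')  # uploaded files live in request.files
    return jsonify(
        text=request.form.get('text'),  # ordinary fields live in request.form
        language=request.form.get('language', 'en'),
        audio_name=audio.filename if audio else None,
    )

client = app.test_client()
response = client.post(
    '/inspect',
    data={
        'text': 'hello',
        # (file object, filename) tuples are sent as file uploads
        'audio': (io.BytesIO(b'fake audio bytes'), 'clip.wav'),
    },
    content_type='multipart/form-data',
)
result = response.get_json()
print(result)
```

Because no language field is sent here, the route falls back to the default 'en', just like the real handler does.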

Step 1: Configuring the Application

from flask import Flask, render_template, request, send_file, redirect, url_for, flash
from werkzeug.utils import secure_filename
import os
from main import process_tts
from vocalshift import vocal_shift

app = Flask(__name__)
# Signs the session cookie that flash() relies on; use a private random value outside of demos
app.secret_key = 'supersecretkey'
UPLOAD_FOLDER = 'uploads'
OUTPUT_FOLDER = 'output'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
os.makedirs(OUTPUT_FOLDER, exist_ok=True)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['OUTPUT_FOLDER'] = OUTPUT_FOLDER

Uploads and Outputs: Separate directories store uploaded files and generated audio.
os.makedirs(..., exist_ok=True): Creates each directory if it is missing and does nothing if it already exists.
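
A hard-coded secret key is fine for a demo, but outside of one you may prefer to read it from the environment and to cap upload sizes. The following is a minimal sketch under assumptions: the VOCALSHIFT_SECRET_KEY variable name and the 16 MB limit are illustrative choices, not part of the original project.

```python
import os
import secrets
from flask import Flask

app = Flask(__name__)

# VOCALSHIFT_SECRET_KEY is a hypothetical variable name chosen for this sketch.
# The random fallback changes on every restart, which invalidates old sessions.
app.secret_key = os.environ.get('VOCALSHIFT_SECRET_KEY') or secrets.token_hex(32)

# Flask rejects larger request bodies with a 413 error (16 MB is an arbitrary choice)
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024

print(app.config['MAX_CONTENT_LENGTH'])
```

MAX_CONTENT_LENGTH is a standard Flask setting, so oversized uploads are refused before your view function runs.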

Step 2: Handling the Homepage Backend

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        text = request.form.get('text')
        language = request.form.get('language', 'en')
        speaker_file = request.files.get('speaker')
        audio_file = request.files.get('audio')
        output_filename = 'output.wav'
        output_path = os.path.join(app.config['OUTPUT_FOLDER'], output_filename)

        speaker_path = None
        if speaker_file:
            speaker_filename = secure_filename(speaker_file.filename)
            speaker_path = os.path.join(app.config['UPLOAD_FOLDER'], speaker_filename)
            speaker_file.save(speaker_path)

        if audio_file:
            audio_filename = secure_filename(audio_file.filename)
            audio_path = os.path.join(app.config['UPLOAD_FOLDER'], audio_filename)
            audio_file.save(audio_path)

            success = vocal_shift(
                input_audio=audio_path,
                output_audio=output_path,
                stt_model_size='base',
                speaker=speaker_path,
                effect=None,
                effect_level=1.0
            )
        else:
            if not text:
                flash('Text is required!', 'danger')
                return redirect(url_for('index'))

            # Perform TTS conversion using main.py
            success = process_tts(text, output_path, speaker_path, language)

        if success:
            return redirect(url_for('download_file', filename=output_filename))
        else:
            flash('Conversion failed', 'danger')
            return redirect(url_for('index'))

    return render_template('index.html')

GET: Displays the homepage with the HTML form.
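POST: Saves any uploaded files, then runs voice conversion if an audio file was uploaded, or text-to-speech on the submitted text otherwise. On success, it redirects to the download route.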
render_template(): Renders the HTML file for the user interface.
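
Two things you might want to tighten in the handler above: it accepts any file type, and every request writes to the same output.wav, so two users converting at once could overwrite each other's results. Here is a small sketch of two helpers that could address this. Both function names and the list of extensions are assumptions for illustration; check which formats your backend really reads.

```python
import os
import uuid

# Assumed whitelist; adjust to the formats the backend actually supports
ALLOWED_EXTENSIONS = {'.wav', '.mp3', '.flac', '.ogg'}

def allowed_audio(filename):
    """Return True if the filename ends with an allowed audio extension."""
    ext = os.path.splitext(filename or '')[1].lower()
    return ext in ALLOWED_EXTENSIONS

def unique_output_name(ext='.wav'):
    """Build a per-request output filename so results are not overwritten."""
    return f"output_{uuid.uuid4().hex}{ext}"

print(allowed_audio('sample.WAV'))   # extension check is case-insensitive
print(allowed_audio('notes.txt'))
print(unique_output_name())
```

In the view, you could call allowed_audio(audio_file.filename) before saving the upload, and use unique_output_name() in place of the fixed 'output.wav'.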

Step 3: File Download

@app.route('/download/<filename>')
def download_file(filename):
    return send_file(os.path.join(app.config['OUTPUT_FOLDER'], filename), as_attachment=True)

send_file(): Sends the output audio file for download.
as_attachment=True: Ensures the file is downloaded instead of played in the browser.
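
Since the filename comes straight from the URL, an alternative worth considering is Flask's send_from_directory(), which only serves files that resolve inside the given folder and returns a 404 otherwise. This is a sketch, not the original implementation; a temporary folder and a placeholder file stand in for the real output directory.

```python
import os
import tempfile
from flask import Flask, send_from_directory

app = Flask(__name__)
# A temporary folder stands in for the tutorial's 'output' directory
app.config['OUTPUT_FOLDER'] = tempfile.mkdtemp()

@app.route('/download/<filename>')
def download_file(filename):
    # Serves the file only if it resolves inside OUTPUT_FOLDER
    return send_from_directory(app.config['OUTPUT_FOLDER'], filename, as_attachment=True)

# Write a placeholder file so there is something to download
with open(os.path.join(app.config['OUTPUT_FOLDER'], 'output.wav'), 'wb') as f:
    f.write(b'RIFF')

client = app.test_client()
ok = client.get('/download/output.wav')
missing = client.get('/download/nope.wav')

print(ok.status_code, ok.headers.get('Content-Disposition'))
print(missing.status_code)
ok.close()
```

Passing an absolute folder path, as here, also avoids surprises when the app is started from a different working directory.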

Finally, we start the development server when the file is run directly with Python:

if __name__ == '__main__':
    app.run(debug=True)
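
Debug mode exposes an interactive debugger, so you may want to switch it on only when you ask for it. A minimal sketch, assuming a hypothetical VOCALSHIFT_DEBUG environment variable that is not part of the original project:

```python
import os

def debug_enabled(value):
    """Interpret common truthy strings ('1', 'true', 'yes', 'on') as True."""
    return (value or '').strip().lower() in {'1', 'true', 'yes', 'on'}

# VOCALSHIFT_DEBUG is a hypothetical variable name chosen for this sketch
DEBUG = debug_enabled(os.environ.get('VOCALSHIFT_DEBUG'))

# In app.py this would replace the hard-coded flag:
# if __name__ == '__main__':
#     app.run(debug=DEBUG)
print(DEBUG)
```

With this in place, the server runs without the debugger unless the variable is set.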

Creating the HTML Interface

Here’s an example index.html file for the user interface:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>VOCALSHIFT</title>
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
    <style>
        body {
            background-color: #f8f9fa;
        }
        .container {
            max-width: 600px;
            margin-top: 50px;
            padding: 20px;
            background-color: #ffffff;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
        }
        .progress {
            display: none;
            margin-top: 20px;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1 class="mt-3 mb-4 text-center">VOCALSHIFT</h1>
        <form method="post" enctype="multipart/form-data" id="tts-form">
            <div class="form-group">
                <label for="text">Text</label>
                <textarea class="form-control" id="text" name="text" rows="3"></textarea>
            </div>
            <div class="form-group">
                <label for="language">Language</label>
                <input type="text" class="form-control" id="language" name="language" value="en">
            </div>
            <div class="form-group">
                <label for="speaker">Speaker Voice Sample (optional)</label>
                <input type="file" class="form-control-file" id="speaker" name="speaker">
            </div>
            <div class="form-group">
                <label for="audio">Upload Audio for Transformation (optional)</label>
                <input type="file" class="form-control-file" id="audio" name="audio">
            </div>
            <button type="submit" class="btn btn-primary btn-block">Convert</button>
        </form>
        <div class="progress">
            <div class="progress-bar progress-bar-striped progress-bar-animated" role="progressbar" style="width: 100%"></div>
        </div>
        {% with messages = get_flashed_messages(with_categories=true) %}
            {% if messages %}
                <div class="mt-3">
                    {% for category, message in messages %}
                        <div class="alert alert-{{ category }}">{{ message }}</div>
                    {% endfor %}
                </div>
            {% endif %}
        {% endwith %}
    </div>
    <script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
    <script>
        $(document).ready(function() {
            $('#tts-form').on('submit', function() {
                $('.progress').show();
            });
        });
    </script>
</body>
</html>

Features:

Bootstrap Integration: For styling and responsiveness.
Form Elements: Accepts text, a language code, an optional speaker voice sample, and an optional audio file to transform.
Flash Messages: Displays validation and error messages.
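
To see the flash message round trip in isolation, here is a small self-contained sketch. It uses render_template_string with a trimmed copy of the template's message loop instead of index.html, and a placeholder secret key; both are simplifications for illustration.

```python
from flask import Flask, flash, redirect, render_template_string, request, url_for

app = Flask(__name__)
app.secret_key = 'dev-only-key'  # placeholder; flash() stores messages in the signed session

# Trimmed copy of the message loop from index.html
TEMPLATE = """
{% with messages = get_flashed_messages(with_categories=true) %}
{% for category, message in messages %}
<div class="alert alert-{{ category }}">{{ message }}</div>
{% endfor %}
{% endwith %}
"""

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        if not request.form.get('text'):
            flash('Text is required!', 'danger')  # category becomes the Bootstrap alert class
        return redirect(url_for('index'))
    return render_template_string(TEMPLATE)

# Submit an empty form and follow the redirect back to the page
client = app.test_client()
page = client.post('/', data={'text': ''}, follow_redirects=True)
html = page.get_data(as_text=True)
print(html.strip())
```

The message is stored during the POST and shown once on the next page load, which is why the handler redirects instead of rendering directly.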

Running the Application

Start the Flask Server

Run the Flask app:

python app.py

Visit http://127.0.0.1:5000 in your browser to access the interface.

What We Achieved Today

By the end of Part 6, you:

Understood the basics of Flask and how to configure routes.
Built a web interface for text input and file uploads.
Integrated the TTS and voice conversion backends with Flask to process and serve user requests.
Provided a seamless download option for generated files.

Looking Ahead

This completes the Vocalshift project! From Python basics to building a fully functional web app, you’ve covered a lot of ground. Moving forward, consider:

Hosting: Deploy your app using platforms like Heroku or AWS.
Enhancing the UI: Use advanced frameworks like React or Vue.js.
Adding Features: Implement real-time voice playback.

Your Feedback Matters!

We’d love to hear about your experience! Share your questions, suggestions, or feedback in the comments below. Let’s keep innovating! 🚀
