DEV Community

Smooth Code

🎙️ Building a Text-to-Speech (TTS) GUI with Python

Have you ever wanted to turn text into natural-sounding speech directly from your computer? With Python, it's easier than ever! By combining Microsoft Edge's neural voices (via the edge-tts library) and Python's built-in tkinter GUI framework, we can create a simple yet powerful Text-to-Speech (TTS) application.


This project lets you input text (or upload a file), select a voice, adjust the speaking speed, and save the output as an MP3 audio file.


✨ Features

  • 🎤 Multiple Voice Options

    Supports various neural voices such as US English, British English, Australian English, Canadian English, Spanish, and more.

  • ⚡ Customizable Speech Rate

    Adjust speed from -50% (slower) to +50% (faster) using a slider.

  • 📝 Flexible Text Input

    Enter text directly or upload a text file.

  • 💾 Export as MP3

    Save the generated speech to your preferred location.

  • 🖥️ Clean GUI

    Built with tkinter, offering a simple and user-friendly interface.
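
The speech-rate slider produces a number, while the script passes edge-tts a signed percentage string such as "+10%" or "-25%". Here is a minimal sketch of that conversion. The helper name format_rate is hypothetical, and the clamping to the slider's -50 to +50 range is an added assumption rather than something the original script does.

```python
def format_rate(percent: float) -> str:
    """Convert a slider value into a signed percentage string such as "+10%"."""
    # A ttk.Scale can report floats, so round first, then clamp to the slider's range.
    value = max(-50, min(50, round(percent)))
    # The "+" format flag always writes the sign, so zero becomes "+0%".
    return f"{value:+d}%"


print(format_rate(12.6))  # +13%
print(format_rate(-50))   # -50%
print(format_rate(0))     # +0%
```

Keeping the conversion in its own function makes it easy to check without opening a window.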


Dependencies

Ensure you have Python 3.7+ installed on your system. Then, install the edge-tts module:

pip install edge-tts
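
Before launching the app, it can help to confirm that both modules are importable. tkinter ships with most Python installers, but some Linux distributions package it separately (for example as python3-tk on Debian and Ubuntu). The check below is a small sketch; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["tkinter", "edge_tts"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies found.")
```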

📦 Installation

  1. Clone the repository:
git clone https://github.com/smoothcoode/edge-tts-gui
cd edge-tts-gui
  2. Install dependencies:
pip install edge-tts
  3. Run the application:
python main.py

🗣️ Listing Available Voices

To see all available neural voices, run:

python -m edge_tts --list-voices

You'll find a variety of voices you can experiment with.
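
The same list can also be fetched from Python and narrowed to one language, which is handy for filling the voice dropdown. The sketch below assumes edge_tts.list_voices() returns dictionaries with "ShortName" and "Locale" keys. The helper names and the sample entries are illustrative, not taken from the project.

```python
import asyncio


def filter_voices(voices, locale_prefix):
    """Return the sorted short names of voices whose locale starts with locale_prefix."""
    return sorted(
        v["ShortName"] for v in voices if v.get("Locale", "").startswith(locale_prefix)
    )


async def fetch_english_voices():
    # Imported lazily so filter_voices can be tried without edge-tts installed.
    import edge_tts

    voices = await edge_tts.list_voices()  # needs a network connection
    return filter_voices(voices, "en-")


# Illustrative sample shaped like the entries list_voices() is assumed to return.
sample = [
    {"ShortName": "en-US-AriaNeural", "Locale": "en-US"},
    {"ShortName": "es-ES-ElviraNeural", "Locale": "es-ES"},
    {"ShortName": "en-GB-SoniaNeural", "Locale": "en-GB"},
]
print(filter_voices(sample, "en-"))  # ['en-GB-SoniaNeural', 'en-US-AriaNeural']

# To query the live service instead:
# print(asyncio.run(fetch_english_voices()))
```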


🧑‍💻 Source Code

Here's the complete Python script that powers the application:

import edge_tts
import tkinter as tk
from tkinter import ttk, messagebox, filedialog
import asyncio

async def generate_audio(text, voice, rate, output_file):
    communicate = edge_tts.Communicate(text=text, voice=voice, rate=rate)
    await communicate.save(output_file)
    messagebox.showinfo("Success", "File saved successfully: " + output_file)

# Initialize main window
root = tk.Tk()
root.title("Text to Speech")
root.geometry("600x400")

# Voice selection
VOICES = [
    "en-US-AndrewNeural", "en-US-AriaNeural", "en-US-AshTurboMultilingualNeural",
    "en-US-AshleyNeural", "en-US-AvaMultilingualNeural", "en-US-AvaNeural"
]
ttk.Label(root, text="Select a Voice:").grid(column=0, row=0, padx=10, pady=10, sticky="w")
voice_var = tk.StringVar(value=VOICES[0])
voice_dropdown = ttk.Combobox(root, values=VOICES, textvariable=voice_var, state="readonly")
voice_dropdown.grid(row=0, column=1, padx=10, pady=10, sticky="ew")
root.columnconfigure(1, weight=3)

# Speed slider
ttk.Label(root, text="Select a speed rate:").grid(row=1, column=0, padx=10, pady=10, sticky="w")
speed_var = tk.IntVar(value=0)
speed_slider = ttk.Scale(root, from_=-50, to=50, orient="horizontal", variable=speed_var)
speed_slider.grid(row=1, column=1, padx=10, pady=10, sticky="ew")

# Text input
ttk.Label(root, text="Enter text:").grid(row=2, column=0, padx=10, pady=10, sticky="w")
text_box = tk.Text(root, wrap="word")
text_box.grid(row=2, column=1, padx=10, pady=10, sticky="nsew")
root.rowconfigure(2, weight=1)

# Upload file button
def on_upload():
    file_path = filedialog.askopenfilename(filetypes=[("Text File", "*.txt")])
    if file_path:
        # An explicit encoding avoids platform-dependent decoding of the uploaded file.
        with open(file_path, "r", encoding="utf-8") as f:
            content = f.read()
            text_box.delete("1.0", tk.END)
            text_box.insert("1.0", content)

upload_button = ttk.Button(root, text="Upload a text file", command=on_upload)
upload_button.grid(row=3, column=1, padx=10, pady=10, sticky="e")

# Generate audio
def on_generate_audio():
    voice = voice_var.get()
    rate = speed_var.get()
    rate_str = f"+{rate}%" if rate >= 0 else f"{rate}%"
    text = text_box.get("1.0", tk.END).strip()
    if not text:
        messagebox.showwarning("Warning", "No text provided")
        return
    output_file = filedialog.asksaveasfilename(defaultextension=".mp3", filetypes=[("MP3", "*.mp3")])
    if not output_file:
        return
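    try:
        asyncio.run(generate_audio(text=text, voice=voice, rate=rate_str, output_file=output_file))
    except Exception as exc:
        # Network or service errors would otherwise only show up on the console.
        messagebox.showerror("Error", f"Could not generate audio: {exc}")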

generate_button = ttk.Button(root, text="Generate Audio", command=on_generate_audio)
generate_button.grid(row=4, column=1, padx=10, pady=10)

# Run the GUI
root.mainloop()
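
One limitation of the script is that asyncio.run blocks the Tk event loop, so the window stops responding until the MP3 has been written. A possible workaround, sketched below, runs the job on a worker thread and lets the main loop poll a queue for the outcome. The helpers run_in_background and poll_results are hypothetical names, and fake_job stands in for generate_audio so the example runs without edge-tts or a window.

```python
import asyncio
import queue
import threading


def run_in_background(coro_factory, results):
    """Run an async job on a worker thread and put its outcome on `results`."""
    def worker():
        try:
            asyncio.run(coro_factory())
            results.put(("ok", None))
        except Exception as exc:  # report failures instead of losing them in the thread
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(root, results, on_done, interval_ms=100):
    """Check the queue from the Tk main loop; reschedule until a result arrives."""
    try:
        status, error = results.get_nowait()
    except queue.Empty:
        root.after(interval_ms, poll_results, root, results, on_done, interval_ms)
    else:
        on_done(status, error)


# Stand-in for generate_audio (an assumption made so the sketch is self-contained).
async def fake_job():
    await asyncio.sleep(0.01)


results = queue.Queue()
run_in_background(fake_job, results).join()
print(results.get_nowait())  # ('ok', None)
```

In the real app, the button handler would start the thread and then call poll_results(root, results, on_done). With this approach the success message box belongs in on_done rather than inside generate_audio, since Tk widgets should only be touched from the main thread.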

🚀 Conclusion

This project is a great starting point for anyone exploring Text-to-Speech applications in Python. By leveraging edge-tts and tkinter, you can create a fully functional GUI tool that makes text come alive as natural-sounding speech.

Whether you want to narrate articles, build accessibility tools, or experiment with voice synthesis, this Python TTS GUI is a practical and fun project to try out.

