DEV Community

Puffer

AI Twin — Voice Cloning with Text-to-Speech

An open-source project for creating AI voice clones using Coqui TTS (XTTS v2). This project enables you to generate natural-sounding speech in any voice by providing a sample audio file and text input.

🎯 Features

  • Voice Cloning: Clone any voice from a sample audio file
  • Multilingual Support: Works with multiple languages (English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Japanese, and more)
  • High-Quality Output: Powered by Coqui TTS XTTS v2 model for natural-sounding speech
  • Easy to Use: Simple notebook-based interface for quick voice generation
  • GPU Support: Automatically uses CUDA if available for faster processing

📋 Requirements

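  • Python 3.9+ (recent Coqui TTS releases no longer support 3.7; check the Coqui TTS README for the exact supported range)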
  • PyTorch
  • CUDA-capable GPU (optional, but recommended for faster processing)
  • Google Colab or Jupyter Notebook environment
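
A quick way to confirm these requirements before installing anything is a small check script. This is a minimal sketch, not part of the project: the `check_environment` helper and its `(3, 9)` default minimum are assumptions for illustration, so adjust the minimum to whatever your Coqui TTS version documents.

```python
import importlib.util
import sys

# Assumed minimum Python version; verify against the Coqui TTS README.
MIN_PYTHON = (3, 9)


def check_environment(min_python=MIN_PYTHON):
    """Report whether Python, PyTorch, and CUDA look usable (hypothetical helper)."""
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Import lazily so the check still runs when PyTorch is missing.
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report
```

Running `print(check_environment())` in a notebook cell returns a small dictionary. If `cuda_available` is `False`, generation should still work on CPU, only more slowly.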

🚀 Installation

  1. Clone this repository:
git clone https://github.com/yourusername/ai-twin.git
cd ai-twin
  2. Install dependencies:
pip install -U scipy torch
  3. Install Coqui TTS:
git clone https://github.com/idiap/coqui-ai-TTS.git
cd coqui-ai-TTS
pip install -e .

💻 Usage

  1. Open TorTTS_API.ipynb in Jupyter Notebook or Google Colab
  2. Run the first cell to install dependencies and clone Coqui TTS
  3. Run the second cell to initialize the TTS model
  4. Upload a voice sample (MP3 or WAV format) - this will be used as the reference voice
  5. Upload a text file containing the text you want to convert to speech
  6. Run the final cell to generate the audio output
  7. Download the generated audio file
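
Steps 4 to 6 can be sketched as a single helper. This is a hypothetical function, not code taken from the notebook: `synthesize_from_files` and its parameter names are assumptions, and it relies only on the `tts_to_file` call shown in the Example Workflow.

```python
from pathlib import Path


def synthesize_from_files(tts, text_path, speaker_wav, language="en", out_path="output.wav"):
    """Read text from a file and speak it in the reference voice.

    `tts` is any object exposing a Coqui-style `tts_to_file` method,
    for example the TTS instance created in the Example Workflow.
    """
    sample = Path(speaker_wav)
    if not sample.is_file():
        raise FileNotFoundError(f"Voice sample not found: {sample}")

    text = Path(text_path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"Text file is empty: {text_path}")

    # Same keyword arguments as the Example Workflow call.
    tts.tts_to_file(
        text=text,
        speaker_wav=str(sample),
        language=language,
        file_path=str(out_path),
    )
    return Path(out_path)
```

With a `tts` object initialized as in the Example Workflow, a call such as `synthesize_from_files(tts, "script.txt", "voice_sample.mp3")` would write `output.wav`; both file names here are placeholders. Checking the inputs first means a missing sample or empty text file fails immediately rather than partway through generation.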

Example Workflow

# Initialize TTS
from TTS.api import TTS
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Generate speech
tts.tts_to_file(
    text="Your text here",
    speaker_wav="path/to/voice_sample.mp3",
    language="en",
    file_path="output.wav"
)

📝 Supported Languages

English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Japanese, Hungarian, Korean, and Hindi.
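
The `language` argument in the Example Workflow takes a short code rather than a language name. The lookup below is a sketch, assuming the codes listed on the XTTS v2 model card; the `language_code` helper is hypothetical, and the codes should be verified against the model version you have installed.

```python
# Language codes as listed on the XTTS v2 model card (an assumption here;
# verify against your installed model before relying on them).
XTTS_LANGUAGE_CODES = {
    "english": "en",
    "spanish": "es",
    "french": "fr",
    "german": "de",
    "italian": "it",
    "portuguese": "pt",
    "polish": "pl",
    "turkish": "tr",
    "russian": "ru",
    "dutch": "nl",
    "czech": "cs",
    "arabic": "ar",
    "chinese (simplified)": "zh-cn",
    "japanese": "ja",
    "hungarian": "hu",
    "korean": "ko",
    "hindi": "hi",
}


def language_code(name):
    """Map a language name to the code passed as `language=` (hypothetical helper)."""
    key = name.strip().lower()
    if key == "chinese":
        key = "chinese (simplified)"
    if key not in XTTS_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name!r}")
    return XTTS_LANGUAGE_CODES[key]
```

For example, `language_code("Spanish")` can be passed directly as the `language` argument of `tts_to_file`.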

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📄 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

  • Coqui TTS - The amazing text-to-speech library that powers this project
  • XTTS v2 - The voice cloning model used in this project

⚠️ Disclaimer

This tool is for educational and research purposes. Please ensure you have proper authorization before cloning voices, especially for commercial use. Always respect privacy and consent when working with voice data.


Made with ❤️ by the open source community
