owly

Clone Any Voice with Just 10 Seconds of Audio: No Restrictions, No Gatekeepers

Tired of the corporate muzzle? ElevenLabs and its clones won't let you replicate anyone's voice but your own. This project doesn't play by those rules. It's raw, local, and totally under your control. No cloud. No identity checks. Just pure voice cloning power.

If you've got a clean 5-10 second .wav file, you've got a voice model. Let's build it.


AI voice cloner skill

This skill requires at least 2 GB of RAM to load the model.

The AI doesn't need a massive dataset, just a clean 5-10 second .wav file of someone speaking.
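
To confirm a sample really falls in that 5-10 second range before using it, a quick check like the one below may help. It is a minimal sketch that uses only Python's standard wave module and assumes an uncompressed PCM .wav; the function names wav_duration_seconds and is_usable_sample are illustrative and not part of the cloner.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_usable_sample(path: str, min_s: float = 5.0, max_s: float = 10.0) -> bool:
    """Check a sample against the 5-10 second guideline from this post."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Calling is_usable_sample("my_voice.wav") returns True when the clip length is inside the range. Compressed or non-PCM files will make wave.open raise an error, which is itself a hint that the sample should be re-exported.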

Outline:

  • You add the cloner file and voice sample to their respective places.
  • You run the cloner once to install dependencies and verify it works.
  • Then the LivinGrimoire skill works as long as it correctly imports the cloner and points to the sample.

Phase 1 (one-time setup): clone a model for the voice

Step 1: Create New Project

  1. Open PyCharm → New Project → Name it voice_cloner_app → Create

Step 2: Create JUST ONE FILE

Right-click project → New → Python File → Name it voice_cloner.py

Paste this complete code:

import os
import subprocess
import sys

# The +cu118 (CUDA 11.8) builds of torch are served from the PyTorch wheel
# index rather than PyPI, so pip needs this extra index to find them.
TORCH_INDEX = "https://download.pytorch.org/whl/cu118"


class SelfInstallingVoiceCloner:
    def __init__(self):
        self.check_dependencies()
        # Import only after the check, so a missing package triggers the
        # installer instead of an ImportError at the top of the file.
        import torch
        from TTS.api import TTS

        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading AI model on {self.device}...")
        self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
        self.reference_audio = None
        self.voice_loaded = False

    def check_dependencies(self):
        """Install everything automatically if missing"""
        try:
            import torch  # noqa: F401
            import TTS  # noqa: F401
            print("All dependencies already installed.")
        except ImportError:
            print("Installing required packages...")
            requirements = [
                "torch==2.0.1+cu118",
                "torchvision==0.15.2+cu118",
                "torchaudio==2.0.2+cu118",
                "TTS==0.20.2",
                "soundfile==0.12.1",
                "librosa==0.10.1"
            ]
            for package in requirements:
                subprocess.check_call([
                    sys.executable, "-m", "pip", "install",
                    "--extra-index-url", TORCH_INDEX, package
                ])
            print("Installation complete. Please restart the script.")
            sys.exit(1)

    def load_voice(self, audio_path):
        """Load a voice from audio file"""
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        self.reference_audio = audio_path
        self.voice_loaded = True
        print(f"Voice loaded: {os.path.basename(audio_path)}")

    def speak(self, text, output_path="output.wav", language="en"):
        """Make cloned voice say anything"""
        if not self.voice_loaded:
            raise ValueError("Load a voice first with load_voice()")
        self.tts.tts_to_file(
            text=text,
            speaker_wav=self.reference_audio,
            language=language,
            file_path=output_path
        )
        print(f"Audio saved: {output_path}")
        return output_path


# Create global instance (this loads the model as soon as the module is imported)
cloner = SelfInstallingVoiceCloner()

if __name__ == "__main__":
    # Demo usage
    cloner.load_voice("my_voice.wav")  # ← Put your audio file in project folder
    cloner.speak("Voice cloning setup complete!", "test.wav")

Step 3: Add Your Voice Sample

  1. Get your .wav audio file
  2. Drag and drop it into the PyCharm project folder
  3. In the demo block at the bottom of voice_cloner.py, change "my_voice.wav" to your actual filename

Step 4: RUN IT

Just run the file directly in PyCharm:

  • Right-click voice_cloner.py → Run 'voice_cloner'

What happens:

  1. First run: Installs all dependencies automatically
  2. Asks you to restart the script (just run it again)
  3. Second run: Loads AI model + your voice + generates test audio
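
The install-then-restart flow above can be sketched on its own, separate from the cloner. This is an illustrative version, not the cloner's exact code: it checks for modules with importlib.util.find_spec instead of a try/except import, and the helper names missing_modules and install_if_missing are made up for this example. It expects top-level module names only.

```python
import importlib.util
import subprocess
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def install_if_missing(module_names, pip_packages):
    """Install pip_packages when any required module is absent.

    Returns True if an install ran, in which case the caller should
    ask the user to restart the script.
    """
    if not missing_modules(module_names):
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", *pip_packages])
    return True
```

A script would call install_if_missing(["torch", "TTS"], [...]) first and exit if it returns True, which matches the "first run installs, second run works" behavior described above.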

Optional: test the model from a separate .py file in the same folder:

from voice_cloner import cloner  # The cloner is already set up and ready to use

cloner.speak("This is so much simpler", "output1.wav")
cloner.speak("No separate installation needed", "output2.wav")

Phase 2: use the cloner as a LivinGrimoire (software design pattern) skill

Copy voice_cloner.py into the DLC folder of your LivinGrimoire project.

Put your voice sample (e.g., my_voice.wav) in the same folder.

LivinGrimoireProject/
├── main.py
├── DLC/
│   ├── DiTTS_clone.py
│   ├── my_voice.wav
│   └── voice_cloner.py
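
With this layout, a plain "from voice_cloner import cloner" only resolves if the DLC folder is on Python's import path. If the import fails when you run main.py, one possible workaround is to add the folder explicitly. This is an assumption about your setup, not something LivinGrimoire is known to require, and the helper name add_to_import_path is hypothetical.

```python
import sys
from pathlib import Path


def add_to_import_path(folder: Path) -> str:
    """Put folder at the front of sys.path (once) and return it as a string."""
    folder_str = str(folder.resolve())
    if folder_str not in sys.path:
        sys.path.insert(0, folder_str)
    return folder_str


# Near the top of main.py: make DLC/voice_cloner.py importable as `voice_cloner`.
# Path.cwd() keeps this sketch runnable anywhere; inside a real main.py,
# Path(__file__).parent is the sturdier base directory.
add_to_import_path(Path.cwd() / "DLC")
```

The membership check keeps repeated calls from stacking duplicate entries onto sys.path.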
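import os
import platform
import subprocess
from voice_cloner import cloner  # assuming voice_cloner is importable
from LivinGrimoire import Skill


def play_wav(path: str):
    # voice_cloner.py defines no play() method, so playback is handled here.
    # Assumption: winsound on Windows, afplay on macOS, aplay on Linux.
    system = platform.system()
    if system == "Windows":
        import winsound
        winsound.PlaySound(path, winsound.SND_FILENAME)
    elif system == "Darwin":
        subprocess.run(["afplay", path], check=False)
    else:
        subprocess.run(["aplay", path], check=False)


class DiTTS_clone(Skill):
    def __init__(self):
        super().__init__()
        self.set_skill_type(3)  # continuous skill
        self.set_skill_lobe(2)  # output lobe
        # Path to voice sample relative to this file
        self.voice_sample = os.path.join(os.path.dirname(__file__), "my_voice.wav")
        # Generated phrases are cached here, one .wav per phrase
        self.sounds_dir = os.path.join(os.path.dirname(__file__), "sounds")
        os.makedirs(self.sounds_dir, exist_ok=True)
        cloner.load_voice(self.voice_sample)

    def input(self, ear: str, skin: str, eye: str):
        if not ear:
            return
        filename = self.__sanitize_filename(ear)
        path = os.path.join(self.sounds_dir, f"{filename}.wav")
        if not os.path.isfile(path):
            cloner.speak(ear, path)  # synthesize once, then reuse the cached file
        play_wav(path)

    def __sanitize_filename(self, txt: str) -> str:
        # Drop ? ' : , and newlines, then turn spaces into underscores
        return txt.translate(str.maketrans('', '', "?':,\n")).replace(" ", "_")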
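
The skill caches each spoken phrase as a .wav named after the text, so the filename helper decides which inputs share a cached file. As a standalone illustration, here is the same logic next to a stricter variant. strict_sanitize and its 80-character cap are hypothetical additions for this example, not part of the skill; the variant keeps only ASCII letters, digits, underscores, and hyphens, which also handles characters such as / and * that the original leaves in place.

```python
import re


def sanitize_filename(txt: str) -> str:
    """Same logic as the skill: drop ? ' : , and newlines, spaces to underscores."""
    return txt.translate(str.maketrans("", "", "?':,\n")).replace(" ", "_")


def strict_sanitize(txt: str, max_len: int = 80) -> str:
    """Hypothetical stricter variant: keep only ASCII letters, digits, _ and -."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", txt).strip("_")
    return cleaned[:max_len] or "untitled"


print(sanitize_filename("Hello, what's up?"))  # Hello_whats_up
print(strict_sanitize("a/b: c*d?"))            # a_b_c_d
```

Whichever version you use, two phrases that sanitize to the same name will share one cached file, so the rule should stay stable once the sounds folder has content.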
