owly

Posted on Sep 20

Clone Any Voice with Just 10 Seconds of Audio — No Restrictions, No Gatekeepers

#ai #clone #python #designpatterns

Tired of the corporate muzzle? ElevenLabs and its clones won’t let you replicate anyone’s voice but your own. This project doesn’t play by those rules. It’s raw, local, and totally under your control. No cloud. No identity checks. Just pure voice cloning power.

If you’ve got a clean 5–10 second .wav file, you’ve got a voice model. Let’s build it.

AI voice cloner skill

This skill requires at least 2 GB of RAM to load the model.

The AI doesn’t need a massive dataset—just a clean 5–10 second .wav file of someone speaking.
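Because everything depends on that one reference clip, it can help to check its length before using it. The following is a minimal sketch using only Python's standard wave module. The helper names wav_duration_seconds and is_good_sample_length are illustrative and not part of this project, and the sketch assumes an uncompressed PCM .wav file.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_sample_length(path: str, min_s: float = 5.0, max_s: float = 10.0) -> bool:
    """True if the clip falls inside the 5-10 second range suggested above."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Calling is_good_sample_length("my_voice.wav") before loading the voice gives a quick sanity check on the sample. The bounds are only defaults and can be passed explicitly.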

Outline:

You add the cloner file and voice sample to their respective places.
You run the cloner once to install dependencies and verify it works.
Then the LivinGrimoire skill works as long as it correctly imports the cloner and points to the sample.

Phase 1 (one-time setup): clone a model for the voice

Step 1: Create New Project

Open PyCharm → New Project → Name it voice_cloner_app → Create

Step 2: Create JUST ONE FILE

Right-click project → New → Python File → Name it voice_cloner.py

Paste this complete code:

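import importlib.util
import os
import subprocess
import sys

class SelfInstallingVoiceCloner:
    def __init__(self):
        self.check_dependencies()
        # Import the heavy packages only after the dependency check. Importing
        # them at the top of the file would raise ImportError on a fresh
        # environment before the installer ever gets a chance to run.
        import torch
        from TTS.api import TTS
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading AI model on {self.device}...")
        self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
        self.reference_audio = None
        self.voice_loaded = False

    def check_dependencies(self):
        """Install everything automatically if missing"""
        needed = ("torch", "TTS", "soundfile", "librosa", "sounddevice")
        if all(importlib.util.find_spec(name) is not None for name in needed):
            print("All dependencies already installed.")
            return
        print("Installing required packages...")
        requirements = [
            "torch==2.0.1+cu118",
            "torchvision==0.15.2+cu118",
            "torchaudio==2.0.2+cu118",
            "TTS==0.20.2",
            "soundfile==0.12.1",
            "librosa==0.10.1",
            "sounddevice",  # added for play(); not in the original list
        ]
        # The +cu118 builds are hosted on the PyTorch wheel index, not on PyPI,
        # so pip needs the extra index URL to find them.
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "--extra-index-url", "https://download.pytorch.org/whl/cu118",
            *requirements,
        ])
        print("Installation complete. Please restart the script.")
        sys.exit(0)

    def load_voice(self, audio_path):
        """Load a voice from audio file"""
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        self.reference_audio = audio_path
        self.voice_loaded = True
        print(f"Voice loaded: {os.path.basename(audio_path)}")

    def speak(self, text, output_path="output.wav", language="en"):
        """Make cloned voice say anything"""
        if not self.voice_loaded:
            raise ValueError("Load a voice first with load_voice()")
        self.tts.tts_to_file(
            text=text,
            speaker_wav=self.reference_audio,
            language=language,
            file_path=output_path
        )
        print(f"Audio saved: {output_path}")
        return output_path

    def play(self, audio_path):
        """Play a .wav file on the default output device (blocks until done).

        The LivinGrimoire skill in phase 2 calls cloner.play(); this version
        is one possible implementation, using soundfile and sounddevice.
        """
        import sounddevice as sd
        import soundfile as sf
        data, sample_rate = sf.read(audio_path)
        sd.play(data, sample_rate)
        sd.wait()

# Create global instance (this loads the model as soon as the module is imported)
cloner = SelfInstallingVoiceCloner()

if __name__ == "__main__":
    # Demo usage
    cloner.load_voice("my_voice.wav")  # ← Put your audio file in project folder
    cloner.speak("Voice cloning setup complete!", "test.wav")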

Step 3: Add Your Voice Sample

Get your .wav audio file
Drag and drop it into the PyCharm project folder
Edit the demo block at the bottom of the code (the cloner.load_voice(...) line): change "my_voice.wav" to your actual filename

Step 4: RUN IT

Just run the file directly in PyCharm:

Right-click voice_cloner.py → Run 'voice_cloner'

What happens:

First run: Installs all dependencies automatically
Asks you to restart the script (just run it again)
Second run: Loads AI model + your voice + generates test audio
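To see ahead of time which of the two runs you are about to get, you can check which packages are already importable. This is a small sketch, separate from voice_cloner.py; the helper name missing_packages is hypothetical, and the list uses import names, which are not always identical to pip package names.

```python
import importlib.util


def missing_packages(module_names):
    """Return the subset of module_names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by voice_cloner.py.
    print(missing_packages(["torch", "TTS", "soundfile", "librosa"]))
```

An empty list means the script should skip installation and go straight to loading the model.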

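Optional: test the cloner from a separate .py file in the same project folder:

from voice_cloner import cloner  # importing builds the cloner and loads the model

# The demo block in voice_cloner.py does not run on import, so load the voice here.
cloner.load_voice("my_voice.wav")
cloner.speak("This is so much simpler", "output1.wav")
cloner.speak("No separate installation needed", "output2.wav")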

Phase 2: use that TTS as a LivinGrimoire (software design pattern) skill

Copy voice_cloner.py into the new project directory.

Include your voice sample (e.g., my_voice.wav) in the same folder.

LivinGrimoireProject/
├── main.py
├── DLC/
│   ├── DiTTS_clone.py
│   ├── my_voice.wav
│   └── voice_cloner.py
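With this layout, from voice_cloner import cloner only resolves if the DLC folder is on Python's import path. One way to arrange that, sketched below, is to add the folder from main.py before the skill is imported. The helper name add_to_import_path is hypothetical, and this post does not confirm whether LivinGrimoire already does something equivalent for DLC files.

```python
import os
import sys


def add_to_import_path(folder: str) -> str:
    """Put folder at the front of sys.path (once) and return its absolute path."""
    absolute = os.path.abspath(folder)
    if absolute not in sys.path:
        sys.path.insert(0, absolute)
    return absolute


# Example call from main.py, assuming DLC sits next to it as in the tree above:
# add_to_import_path(os.path.join(os.path.dirname(os.path.abspath(__file__)), "DLC"))
```

After that call, modules inside DLC can import each other by bare name.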

import os
import re
from voice_cloner import cloner  # assuming voice_cloner is importable
from LivinGrimoire import Skill

class DiTTS_clone(Skill):
    def __init__(self):
        super().__init__()
        self.set_skill_type(3)  # continuous skill
        self.set_skill_lobe(2)  # output lobe
        # Path to voice sample relative to this file
        self.voice_sample = os.path.join(os.path.dirname(__file__), "my_voice.wav")
        # Generated audio is cached here, one .wav per distinct phrase
        self.sounds_dir = os.path.join(os.path.dirname(__file__), "sounds")
        os.makedirs(self.sounds_dir, exist_ok=True)
        cloner.load_voice(self.voice_sample)

    def input(self, ear: str, skin: str, eye: str):
        if not ear:
            return
        filename = self.__sanitize_filename(ear)
        path = os.path.join(self.sounds_dir, f"{filename}.wav")
        # Synthesize only if this phrase has not been generated before
        if not os.path.isfile(path):
            cloner.speak(ear, path)
        cloner.play(path)

    def __sanitize_filename(self, txt: str) -> str:
        # Keep letters, digits, underscores and spaces; drop everything else
        # (including path separators), then turn spaces into underscores.
        cleaned = re.sub(r"[^\w ]", "", txt).strip().replace(" ", "_")
        return cleaned or "utterance"
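The skill above reuses a generated file whenever the same text comes in again, so each phrase is synthesized only once. Filenames built directly from the text can get very long, and two different phrases can end up with the same name once punctuation is stripped. An alternative, sketched here, is to key the cache on a hash of the text. The helper cache_path_for is hypothetical and not part of LivinGrimoire or the skill as posted.

```python
import hashlib
import os


def cache_path_for(text: str, sounds_dir: str) -> str:
    """Map text to a stable, filesystem-safe .wav path inside sounds_dir."""
    # 16 hex characters of SHA-256 keep names short while making
    # accidental collisions between phrases very unlikely.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return os.path.join(sounds_dir, f"{digest}.wav")
```

The trade-off is that the cached files are no longer human-readable, so this fits best when the sounds folder is treated purely as a cache.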

DEV Community