Clone Any Voice with Just 10 Seconds of Audio: No Restrictions, No Gatekeepers
Tired of the corporate muzzle? ElevenLabs and its clones won't let you replicate anyone's voice but your own. This project doesn't play by those rules. It's raw, local, and totally under your control. No cloud. No identity checks. Just pure voice cloning power.
If you've got a clean 5-10 second .wav file, you've got a voice model. Let's build it.
AI voice cloner skill
This skill requires at least 2 GB of RAM to load the model.
The AI doesn't need a massive dataset, just a clean 5-10 second .wav file of someone speaking.
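Before handing a sample to the cloner, it can help to confirm the clip really falls in the 5-10 second range. Here is a minimal sketch using only Python's standard wave module. The function names wav_duration_seconds and is_usable_sample and the exact bounds are illustrative assumptions, not part of the project, and the wave module reads only uncompressed PCM .wav files.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_usable_sample(path: str, min_s: float = 5.0, max_s: float = 10.0) -> bool:
    """Hypothetical check: True if the clip length lies within [min_s, max_s]."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Calling is_usable_sample("my_voice.wav") before load_voice() gives an early warning when a clip is too short or too long; it says nothing about how clean the recording is.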
Outline:
- You add the cloner file and voice sample to their respective places.
- You run the cloner once to install dependencies and verify it works.
- Then the LivinGrimoire skill works as long as it correctly imports the cloner and points to the sample.
Phase 1 (one-time setup): build the voice cloner
Step 1: Create New Project
- Open PyCharm → New Project → Name it voice_cloner_app → Create
Step 2: Create JUST ONE FILE
Right-click project → New → Python File → Name it voice_cloner.py
Paste this complete code:
import subprocess
import sys
import os


class SelfInstallingVoiceCloner:
    def __init__(self):
        self.check_dependencies()
        # Import the heavy packages only after the dependency check, so a
        # fresh environment reaches the installer instead of failing on import.
        import torch
        from TTS.api import TTS

        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading AI model on {self.device}...")
        self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
        self.reference_audio = None
        self.voice_loaded = False

    def check_dependencies(self):
        """Install everything automatically if missing"""
        try:
            import TTS  # noqa: F401
            print("All dependencies already installed.")
        except ImportError:
            print("Installing required packages...")
            requirements = [
                "torch==2.0.1+cu118",
                "torchvision==0.15.2+cu118",
                "torchaudio==2.0.2+cu118",
                "TTS==0.20.2",
                "soundfile==0.12.1",
                "librosa==0.10.1"
            ]
            for package in requirements:
                # The +cu118 builds are served from the PyTorch wheel index, not PyPI
                subprocess.check_call([
                    sys.executable, "-m", "pip", "install", package,
                    "--extra-index-url", "https://download.pytorch.org/whl/cu118",
                ])
            print("Installation complete. Please restart the script.")
            sys.exit(1)

    def load_voice(self, audio_path):
        """Load a voice from audio file"""
        if not os.path.exists(audio_path):
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        self.reference_audio = audio_path
        self.voice_loaded = True
        print(f"Voice loaded: {os.path.basename(audio_path)}")

    def speak(self, text, output_path="output.wav", language="en"):
        """Make cloned voice say anything"""
        if not self.voice_loaded:
            raise ValueError("Load a voice first with load_voice()")
        self.tts.tts_to_file(
            text=text,
            speaker_wav=self.reference_audio,
            language=language,
            file_path=output_path
        )
        print(f"Audio saved: {output_path}")
        return output_path


# Create global instance (this loads the model as soon as the module is imported)
cloner = SelfInstallingVoiceCloner()

if __name__ == "__main__":
    # Demo usage
    cloner.load_voice("my_voice.wav")  # Put your audio file in the project folder
    cloner.speak("Voice cloning setup complete!", "test.wav")
Step 3: Add Your Voice Sample
- Get your .wav audio file
- Drag and drop it into the PyCharm project folder
- In the demo block at the bottom of voice_cloner.py (under if __name__ == "__main__":), change "my_voice.wav" to your actual filename
Step 4: RUN IT
Just run the file directly in PyCharm:
- Right-click voice_cloner.py → Run 'voice_cloner'
What happens:
- First run: Installs all dependencies automatically
- Asks you to restart the script (just run it again)
- Second run: Loads AI model + your voice + generates test audio
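The first-run and second-run behaviour above hinges on whether the packages can be imported. Here is a small sketch of checking that up front without importing anything heavy, using importlib.util.find_spec from the standard library. The helper name missing_modules is hypothetical and the module list is only an example.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "TTS", "soundfile", "librosa"])
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All dependencies found; the next run should load the model.")
```

If this still reports missing modules after the first run, the pip install step most likely failed, and the console output from that run is the place to look.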
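Optional: test the cloner from a separate .py file in the same project:
from voice_cloner import cloner  # importing builds the cloner and loads the model

# The demo block in voice_cloner.py does not run on import, so load the voice here
cloner.load_voice("my_voice.wav")
cloner.speak("This is so much simpler", "output1.wav")
cloner.speak("No separate installation needed", "output2.wav")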
Phase 2: use that TTS as a LivinGrimoire (software design pattern) skill:
Copy voice_cloner.py into the DLC directory of the new project.
Include your voice sample (e.g., my_voice.wav) in the same folder.
LivinGrimoireProject/
├── main.py
├── DLC/
│   ├── DiTTS_clone.py
│   ├── my_voice.wav
│   └── voice_cloner.py
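import os
import subprocess
import sys

from voice_cloner import cloner  # assuming voice_cloner is importable
from LivinGrimoire import Skill


def play_wav(path: str) -> None:
    """Blocking playback sketch. voice_cloner.py defines no play() method, so this
    helper uses OS tools (assumed available: afplay on macOS, aplay on Linux)."""
    if sys.platform.startswith("win"):
        import winsound
        winsound.PlaySound(path, winsound.SND_FILENAME)
    elif sys.platform == "darwin":
        subprocess.run(["afplay", path], check=False)
    else:
        subprocess.run(["aplay", path], check=False)


class DiTTS_clone(Skill):
    def __init__(self):
        super().__init__()
        self.set_skill_type(3)  # continuous skill
        self.set_skill_lobe(2)  # output lobe
        # Path to voice sample relative to this file
        self.voice_sample = os.path.join(os.path.dirname(__file__), "my_voice.wav")
        self.sounds_dir = os.path.join(os.path.dirname(__file__), "sounds")
        os.makedirs(self.sounds_dir, exist_ok=True)
        cloner.load_voice(self.voice_sample)

    def input(self, ear: str, skin: str, eye: str):
        if not ear:
            return
        filename = self.__sanitize_filename(ear)
        path = os.path.join(self.sounds_dir, f"{filename}.wav")
        # Synthesize only on a cache miss; repeated phrases reuse the saved file
        if not os.path.isfile(path):
            cloner.speak(ear, path)
        play_wav(path)

    def __sanitize_filename(self, txt: str) -> str:
        return txt.translate(str.maketrans('', '', "?':,\n")).replace(" ", "_")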
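The skill's input method is effectively a cache lookup: each phrase maps to a file name, and synthesis runs only when that file does not exist yet. Here is a self-contained sketch of that pattern with a stand-in callback in place of cloner.speak. The names sanitize_filename and cached_wav_path, the synthesize callback, and the stricter character filter are illustrative assumptions rather than the project's code.

```python
import os
import re


def sanitize_filename(text: str) -> str:
    """Turn spaces into underscores, then drop everything except ASCII letters, digits, '_' and '-'."""
    return re.sub(r"[^A-Za-z0-9_-]", "", text.strip().replace(" ", "_"))


def cached_wav_path(text: str, sounds_dir: str, synthesize) -> str:
    """Return the .wav path for `text`, calling synthesize(text, path) only on a cache miss."""
    os.makedirs(sounds_dir, exist_ok=True)
    path = os.path.join(sounds_dir, f"{sanitize_filename(text)}.wav")
    if not os.path.isfile(path):
        synthesize(text, path)
    return path
```

Passing cloner.speak as the synthesize argument would reproduce the skill's behaviour. The ASCII-only filter reduces non-Latin phrases to an empty name, so for multilingual use a hash of the text may be a safer cache key.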