DEV Community

Alex

Automating Faceless Shorts Videos for YouTube and TikTok Using OpenAI and ElevenLabs

Creating short videos for YouTube and TikTok can be time-consuming, but with the right tools, you can automate the process. In this guide, we’ll show you how to use OpenAI, ElevenLabs, and MoviePy to automatically generate faceless videos from a script—no camera or microphone required.

Let’s break it down step-by-step.

1. Setting Up the APIs

You’ll need API keys for OpenAI (for generating images) and ElevenLabs (for voiceovers). Get these from their respective websites.

import openai
from elevenlabs import ElevenLabs

openai.api_key = "your_openai_api_key"
elevenlabs_client = ElevenLabs(api_key="your_elevenlabs_api_key")

Replace "your_openai_api_key" and "your_elevenlabs_api_key" with your actual API keys.
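Hardcoded keys are easy to leak when you share or commit a script, so one option is to read them from environment variables instead. Here is a minimal sketch, assuming the variables are named `OPENAI_API_KEY` and `ELEVENLABS_API_KEY`; the `require_env` helper is a hypothetical name used only for illustration.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value

# Usage (assumed variable names; uncomment once the variables are set):
# openai.api_key = require_env("OPENAI_API_KEY")
# elevenlabs_client = ElevenLabs(api_key=require_env("ELEVENLABS_API_KEY"))
```

Failing early with a readable error is usually nicer than discovering a missing key halfway through a batch of API calls.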

2. Prepare the Script

Your video content starts with a script. For example, here’s a quick one about Dogecoin:

story_script = """
Dogecoin began as a joke in 2013, inspired by the popular 'Doge' meme. It eventually evolved into a legitimate cryptocurrency with support from figures like Elon Musk.
"""

This script will be used to generate images and voiceovers for each sentence.
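Since everything downstream works sentence by sentence, the script has to be split first. Below is a rough sketch using only the standard library; `split_into_sentences` is a hypothetical helper, and the splitting rule is deliberately naive (it will mis-split abbreviations such as "e.g." or "Dr.").

```python
import re

def split_into_sentences(script: str) -> list[str]:
    """Split a script into sentences on '.', '!' or '?' followed by whitespace."""
    # Collapse newlines and repeated spaces so the triple-quoted string becomes one line.
    text = " ".join(script.split())
    # Split right after sentence-ending punctuation, keeping the punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part for part in parts if part]

story_script = """
Dogecoin began as a joke in 2013, inspired by the popular 'Doge' meme. It eventually evolved into a legitimate cryptocurrency with support from figures like Elon Musk.
"""

sentences = split_into_sentences(story_script)
for idx, sentence in enumerate(sentences):
    print(idx, sentence)
```

For the Dogecoin script this yields two sentences, so the video will have two image-plus-voiceover segments.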

3. Generate Images with OpenAI’s DALL-E

For each sentence in your script, we’ll generate a corresponding image. Here’s how you can do it:

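import base64
import os

def generate_image_from_text(sentence, context, idx):
    # Pair the sentence with a context string so the images stay on-topic.
    prompt = f"Generate an image that describes: {sentence}. Context: {context}"
    response = openai.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1792",  # portrait format, suited to vertical Shorts/TikTok
        response_format="b64_json"
    )

    os.makedirs("images", exist_ok=True)  # make sure the output folder exists
    # DALL-E returns PNG data, so save with a matching extension.
    image_filename = f"images/image_{idx}.png"
    with open(image_filename, "wb") as f:
        f.write(base64.b64decode(response.data[0].b64_json))
    return image_filename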

This function sends each sentence, together with a context string, to DALL-E 3, saves the returned image to disk, and returns the file path.
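Image generation is a network call, so it can fail for transient reasons such as rate limits or timeouts. One way to make a batch run more robust is a small retry wrapper. This is only a sketch: `with_retries` is a hypothetical helper, and it catches every exception for brevity, whereas in practice you would likely narrow it to the SDK's error classes.

```python
import time

def with_retries(func, attempts=3, base_delay=1.0):
    """Call func(); on an exception, wait and retry, up to `attempts` calls in total."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of retries, so surface the last error
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))

# Usage with the function above:
# image_path = with_retries(lambda: generate_image_from_text(sentence, context, idx))
```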

4. Create Voiceovers with ElevenLabs

Next, we’ll generate voiceovers for each sentence using ElevenLabs.

import os
from elevenlabs import VoiceSettings

def generate_audio_from_text(sentence, idx):
    # convert() yields the audio as a stream of byte chunks.
    audio = elevenlabs_client.text_to_speech.convert(
        voice_id="pqHfZKP75CvOlQylNhV4",
        model_id="eleven_multilingual_v2",
        text=sentence,
        voice_settings=VoiceSettings(stability=0.2, similarity_boost=0.8)
    )
    os.makedirs("audio", exist_ok=True)  # make sure the output folder exists
    audio_filename = f"audio/audio_{idx}.mp3"
    with open(audio_filename, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return audio_filename

This converts each sentence into a corresponding voiceover file.
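With both generators in place, a small driver loop can produce one image and one voiceover per sentence. The sketch below takes the two generator functions as arguments so it can be tried with stand-ins before spending API credits; `build_assets` is a hypothetical name, not part of any library.

```python
def build_assets(sentences, context, make_image, make_audio):
    """Return a list of (image_path, audio_path) pairs, one per sentence."""
    assets = []
    for idx, sentence in enumerate(sentences):
        image_path = make_image(sentence, context, idx)  # e.g. generate_image_from_text
        audio_path = make_audio(sentence, idx)           # e.g. generate_audio_from_text
        assets.append((image_path, audio_path))
    return assets

# Dry run with stand-in functions that only build file names.
demo = build_assets(
    ["First sentence.", "Second sentence."],
    "demo context",
    lambda sentence, context, idx: f"images/image_{idx}.png",
    lambda sentence, idx: f"audio/audio_{idx}.mp3",
)
print(demo)
```

In the real pipeline you would pass `generate_image_from_text` and `generate_audio_from_text` in place of the two lambdas.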

5. Sync Images and Audio Using MoviePy

Now, we’ll combine the images and audio into video clips. Here’s how:

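from moviepy.editor import ImageClip, AudioFileClip  # MoviePy 1.x import path

video_clips = []
# image_paths and audio_paths are assumed to hold the files generated above, in order.
for image_path, audio_path in zip(image_paths, audio_paths):
    audio_clip = AudioFileClip(audio_path)
    # Show the image for exactly as long as its narration lasts.
    image_clip = ImageClip(image_path, duration=audio_clip.duration)
    image_clip = image_clip.set_audio(audio_clip)
    video_clips.append(image_clip.set_fps(30))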

Each image will be displayed for the duration of its associated audio.
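Because each clip lasts as long as its audio, the final timeline is just a running sum of the audio durations. The sketch below makes that explicit; `clip_timeline` is a hypothetical helper and the durations are made-up sample values, not measurements.

```python
def clip_timeline(durations):
    """Return (start, end) times in seconds for clips played back to back."""
    timeline = []
    start = 0.0
    for duration in durations:
        end = start + duration
        timeline.append((start, end))
        start = end  # the next clip begins where this one ends
    return timeline

# Sample audio durations in seconds (illustrative values only).
timeline = clip_timeline([2.5, 4.0])
print(timeline)  # [(0.0, 2.5), (2.5, 6.5)]
```

This is handy for checking the total length before rendering, since the last end time equals the duration of the finished video.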

6. Add Video Effects

To make the video more engaging, we’ll apply effects like zoom and fade. Here’s a basic zoom-in effect:

def apply_zoom_in_center(image_clip, duration):
    # Scale grows linearly: 1.0 at t=0, plus 4% per second (duration is currently unused).
    return image_clip.resize(lambda t: 1 + 0.04 * t)

These effects keep the visuals dynamic and interesting without too much effort.
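The zoom is a linear function of time, so a long clip zooms further than a short one at the same rate. If you would rather have every clip end at the same zoom level, you can derive the rate from the clip's duration. The helper names below are hypothetical, and 1.2 is just an example target.

```python
def zoom_scale(t: float, rate: float = 0.04) -> float:
    """Scale factor at time t (seconds): 1.0 at the start, growing by `rate` per second."""
    return 1 + rate * t

def rate_for_final_scale(final_scale: float, duration: float) -> float:
    """Per-second rate that reaches `final_scale` exactly at the end of the clip."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return (final_scale - 1) / duration

# A 5-second clip at the default rate ends about 20% larger than it started.
end_scale = zoom_scale(5.0)
# Rate needed for a 10-second clip to end at the same 1.2x zoom.
slow_rate = rate_for_final_scale(1.2, 10.0)
```

With MoviePy 1.x this could be plugged in as `image_clip.resize(lambda t: zoom_scale(t, slow_rate))`.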

7. Assemble the Final Video

Once all the clips are ready, we’ll concatenate them into a single video.

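from moviepy.editor import concatenate_videoclips

output_video_path = "final_video.mp4"  # example file name; use any path you like
final_video = concatenate_videoclips(video_clips, method="compose")
final_video.write_videofile(output_video_path, codec="libx264", audio_codec="aac", fps=30)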

This outputs your final video, ready for upload.

8. Add Captions (Optional)

Captions make videos more accessible. We use Captacity to automatically add them.

import captacity

captacity.add_captions(
    video_file=output_video_path,
    output_file="captioned_video.mp4",
    font_size=130,
    font_color="yellow"
)

9. Add Background Music

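To finish, we’ll add background music to the video. The track is trimmed to the video’s length and mixed in at low volume beneath the narration. Do this before calling write_videofile, or export again afterward, so the music ends up in the output file.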

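from moviepy.editor import AudioFileClip, CompositeAudioClip

# Trim the music to the video's length and lower it to 20% volume.
background_music = AudioFileClip(music_filename).subclip(0, final_video.duration).volumex(0.2)
narration_audio = final_video.audio.volumex(1.5)
combined_audio = CompositeAudioClip([narration_audio, background_music])
# set_audio returns a new clip, so keep the result.
final_video = final_video.set_audio(combined_audio)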
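Trimming only works when the music track is at least as long as the video. If it might be shorter, one approach is to repeat the track enough times to cover the video and then trim. The sketch below only does the arithmetic; `loops_needed` is a hypothetical helper and the durations are sample values.

```python
import math

def loops_needed(video_duration: float, music_duration: float) -> int:
    """How many back-to-back copies of the music are needed to cover the video."""
    if music_duration <= 0:
        raise ValueError("music_duration must be positive")
    # Round up so the last, partial repeat is still covered; always at least one copy.
    return max(1, math.ceil(video_duration / music_duration))

# A 45-second video with a 20-second track needs three copies (60 s, then trim to 45 s).
copies = loops_needed(45, 20)
print(copies)  # 3
```

In MoviePy 1.x the repeated track could then be built with something like `concatenate_audioclips([music] * copies)` before applying `subclip`.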

See the GitHub Project for this post!

This process powers our Faceless Shorts Video service on Robopost, where we generate short-form videos automatically. By leveraging OpenAI for visuals and ElevenLabs for narration, we’ve created an efficient, scalable system for producing content without manual editing.

Now, you can create high-quality, faceless videos for YouTube or TikTok without spending hours in front of a camera. This approach works for educational videos, storytelling, or viral content—whatever suits your needs.
