Auto-Generate Lyric Music Videos from MP3 with Python, Whisper, and Tuidra

Tuidra — Sun, 22 Jun 2025 11:11:02 +0000

Overview

This Python-based tool automatically generates lyric-synced music videos from MP3 files. It combines high-accuracy speech recognition from OpenAI Whisper with real-time background rendering using GLSL shaders to produce music videos with synchronized lyrics.

https://github.com/tuidra/tuidra-musicvideo-maker

🎤 About Tuidra

This tool was created by Tuidra, an AI-driven virtual singer-songwriter. Tuidra represents the creative fusion of humans and AI, promoting the integration of music, visuals, and technology.

X (formerly Twitter): @Tuidra_AI
YouTube: Tuidra Official Channel

Experience a new form of creative expression, born from the collaboration of AI and humanity.

Key Features

🎵 Automatic Lyric Sync via Speech Recognition

High-accuracy Japanese speech recognition using OpenAI Whisper
Automatic matching between recognition results and user-provided lyrics
Improved phonetic matching accuracy using Romaji conversion

🎨 Real-time Shader Backgrounds

Dynamic background rendering with GLSL shaders
GPU acceleration via wgpu-shadertoy

🎬 Flexible Video Composition

New video generation with a shader background (Mode A)
Lyric overlay on an existing video (Mode B)
Support for watermark and title display

Technical Implementation Details

1. Speech Recognition Pipeline

Use Whisper's large model for the best accuracy. For faster test runs, the small model is a practical alternative.
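As a rough illustration of this step, the sketch below loads a Whisper model and flattens its output into timed rows. It assumes the openai-whisper package; the function names `transcribe` and `segments_to_rows` are hypothetical and are not taken from the repository.

```python
from typing import Any, Dict, List, Tuple


def transcribe(audio_path: str, model_name: str = "small") -> Dict[str, Any]:
    """Run Whisper on an audio file and return its raw result dict."""
    import whisper  # openai-whisper package, imported lazily (assumed installed)

    model = whisper.load_model(model_name)  # e.g. "small" for tests, "large" for accuracy
    # language="ja" skips language auto-detection for Japanese lyrics.
    return model.transcribe(audio_path, language="ja")


def segments_to_rows(result: Dict[str, Any]) -> List[Tuple[float, float, str]]:
    """Flatten Whisper segments into (start, end, text) rows, dropping empty ones."""
    rows = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        if text:
            rows.append((float(seg["start"]), float(seg["end"]), text))
    return rows
```

Calling `segments_to_rows(transcribe("audio.mp3"))` would give a list of timed text rows that a later matching step can compare against the supplied lyrics.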

2. Lyric Matching

The system supports exact matching, phonetic matching via Romaji conversion, and partial matching with order constraints, so that lyric lines align naturally with the recognized text.
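One way to picture order-constrained fuzzy matching is the greedy sketch below. It is not the repository's algorithm: it swaps in `difflib.SequenceMatcher` as the similarity measure, and the names `align_lyrics`, `normalize`, `to_romaji`, and the `threshold` value are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, List, Optional


def to_romaji(text: str) -> str:
    """Convert Japanese text to Hepburn romaji with pykakasi (lazy import)."""
    import pykakasi

    return "".join(item["hepburn"] for item in pykakasi.kakasi().convert(text))


def normalize(text: str) -> str:
    """Lowercase and drop whitespace and punctuation before comparing."""
    return re.sub(r"[\W_]+", "", text.lower())


def align_lyrics(
    lyrics: List[str],
    segments: List[str],
    key: Callable[[str], str] = normalize,
    threshold: float = 0.6,
) -> List[Optional[int]]:
    """For each lyric line, return the index of its best-matching segment.

    Order constraint: a line may only match a segment that comes after the
    segment matched by the previous line. Unmatched lines map to None.
    """
    seg_keys = [key(s) for s in segments]
    matches: List[Optional[int]] = []
    cursor = 0  # first segment index still available
    for line in lyrics:
        line_key = key(line)
        best_idx, best_score = None, 0.0
        for idx in range(cursor, len(seg_keys)):
            score = SequenceMatcher(None, line_key, seg_keys[idx]).ratio()
            if score > best_score:  # ties keep the earliest segment
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            matches.append(best_idx)
            cursor = best_idx + 1
        else:
            matches.append(None)
    return matches
```

For phonetic matching, passing `key=lambda s: normalize(to_romaji(s))` compares romanized forms, so kanji in the lyrics can still match kana in the recognition output. The greedy search is simple but can skip segments if an early line matches too far ahead; a lookahead window would limit that.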

3. GLSL Shader Integration

Shaders are rendered on the GPU, so high-resolution backgrounds for your music videos can be generated efficiently.
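A minimal sketch of offscreen frame rendering follows. It assumes the `Shadertoy` class from wgpu-shadertoy accepts `resolution` and `offscreen` arguments and exposes `snapshot(time_float=...)`, as its README describes; check this against the installed version. The helper names `frame_times` and `render_background` are hypothetical.

```python
from typing import Any, Iterator, List


def frame_times(duration: float, fps: int) -> List[float]:
    """Timestamps in seconds for every frame of a clip of the given length."""
    return [i / fps for i in range(int(round(duration * fps)))]


def render_background(
    shader_code: str, width: int, height: int, duration: float, fps: int = 30
) -> Iterator[Any]:
    """Yield one image array per frame, rendered offscreen on the GPU."""
    # Lazy imports: both packages are assumed to be installed.
    import numpy as np
    from wgpu_shadertoy import Shadertoy

    shader = Shadertoy(shader_code, resolution=(width, height), offscreen=True)
    for t in frame_times(duration, fps):
        # Assumption: snapshot() renders the shader at time t and returns pixel data.
        yield np.asarray(shader.snapshot(time_float=t))
```

Driving the shader with explicit timestamps rather than wall-clock time keeps the output deterministic, which matters when the frames must line up with the audio.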

4. Video Composition

MoviePy handles composition: lyric clip creation, scene merging, and audio integration all happen in a single process.
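Before any clips are built, the matched lyric timings have to become start times and durations. The sketch below shows one plausible way to do that in plain Python; `ClipWindow` and `lyric_windows` are hypothetical names, and the rule that each line stays visible until the next begins is an assumption rather than the tool's documented behavior.

```python
from typing import List, NamedTuple, Tuple


class ClipWindow(NamedTuple):
    text: str
    start: float     # seconds from the beginning of the audio
    duration: float  # seconds the line stays on screen


def lyric_windows(
    timed_lines: List[Tuple[float, str]], total_duration: float
) -> List[ClipWindow]:
    """Keep each line on screen until the next one starts or the audio ends."""
    ordered = sorted(timed_lines)  # sort by start time
    windows = []
    for i, (start, text) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else total_duration
        if end > start:  # skip zero-length or negative windows
            windows.append(ClipWindow(text, start, end - start))
    return windows
```

Each `ClipWindow` maps directly onto a text clip's start time and duration in MoviePy, whose method names differ between the 1.x and 2.x releases.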

How to Use

Basic Usage

./make-mv-simple.sh audio.mp3 lyrics.txt watermark.png

Advanced Options

python video_generator.py input.mp3 \
  --mode A \
  --shader shader/animated_galaxy.glsl \
  --width 1920 \
  --height 1080 \
  --lyric-lines 3 \
  --lyric-position center \
  --watermark logo.png \
  --watermark-opacity 0.8

Dependencies

Python 3.8+
OpenAI Whisper
MoviePy
wgpu-shadertoy
pykakasi
pandas / numpy / Pillow

Implementation Notes

Japanese fonts are auto-detected and used for display
Lyric display size is dynamically adjusted based on character count
Includes fallback and error handling for Whisper recognition failures
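The character-count-based sizing mentioned above could look something like the sketch below. The function name `lyric_font_size`, the size bounds, and the assumption that each glyph is roughly as wide as the font size (reasonable for full-width Japanese characters) are all illustrative choices, not values from the repository.

```python
def lyric_font_size(
    text: str,
    frame_width: int,
    max_size: int = 72,
    min_size: int = 28,
    margin: float = 0.9,
) -> int:
    """Pick a font size in [min_size, max_size] so the line roughly fits the frame."""
    if not text:
        return max_size
    # Assumption: one character occupies about one font-size of horizontal space.
    fitted = int(frame_width * margin / len(text))
    return max(min_size, min(max_size, fitted))
```

Short lines are capped at `max_size`, while very long lines bottom out at `min_size` and would need wrapping to stay inside the frame.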

Summary

This system automates the generation of lyric-synced music videos for YouTube and social media by combining speech recognition with real-time visual rendering. It is developed by Tuidra, a new kind of singer-songwriter powered by AI.

License

MIT License

Contributions

Pull requests are welcome. Please open an issue for major changes so we can discuss them first.

DEV Community: Tuidra