Overview
This is a Python-based tool that automatically generates lyric-synced music videos from MP3 files. By combining high-precision speech recognition via OpenAI Whisper with real-time background rendering using GLSL shaders, you can easily create music videos with synchronized lyrics.
https://github.com/tuidra/tuidra-musicvideo-maker
🎤 About Tuidra
This tool was created by Tuidra, an AI-driven virtual singer-songwriter. Tuidra represents the creative fusion of humans and AI, promoting the integration of music, visuals, and technology.
- X (formerly Twitter): @Tuidra_AI
- YouTube: Tuidra Official Channel
Experience a new form of creative expression, born from the collaboration of AI and humanity.
Key Features
🎵 Automatic Lyric Sync via Speech Recognition
- High-accuracy Japanese speech recognition using OpenAI Whisper
- Automatic matching between recognition results and user-provided lyrics
- Improved phonetic matching accuracy using Romaji conversion
🎨 Real-time Shader Backgrounds
- Dynamic background rendering with GLSL shaders
- GPU acceleration via wgpu-shadertoy
🎬 Flexible Video Composition
- Lyric overlay on existing video (Mode B)
- New video generation with shader background (Mode A)
- Support for watermark and title display
Technical Implementation Details
1. Speech Recognition Pipeline
Use Whisper's large model for the best accuracy. The small model runs faster and is a practical choice while testing.
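As a rough illustration of this step, the sketch below runs Whisper and flattens its segments into timed rows. The function names `transcribe` and `segments_to_rows` are hypothetical and not taken from the repository. The sketch assumes the `openai-whisper` package and its `load_model` / `transcribe` calls, and it fixes the language to Japanese as an assumption based on the feature list.

```python
from typing import Any, Dict, List


def transcribe(audio_path: str, model_size: str = "small") -> Dict[str, Any]:
    """Run Whisper on an audio file and return its raw result dict."""
    # Imported lazily so segments_to_rows() can be used without Whisper installed.
    import whisper

    # "small" for quick tests, "large" for final runs.
    model = whisper.load_model(model_size)
    # language="ja" skips automatic language detection (assumption: Japanese songs).
    return model.transcribe(audio_path, language="ja")


def segments_to_rows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten Whisper segments into start/end/text rows, dropping empty text."""
    rows = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        if text:
            rows.append(
                {"start": float(seg["start"]), "end": float(seg["end"]), "text": text}
            )
    return rows
```

Calling `segments_to_rows(transcribe("audio.mp3"))` would give a list of timed text rows that the lyric matching stage can consume.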
2. Lyric Matching
Recognition results are aligned to the user-provided lyrics in three ways: exact matching, phonetic matching via Romaji conversion, and partial matching with order constraints that keep the alignment natural.
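The idea of order-constrained fuzzy matching can be sketched as follows. This is not the tool's actual algorithm: it is a minimal greedy version that uses `difflib.SequenceMatcher` from the standard library, and the names `align_lyrics`, `default_normalize`, `threshold`, and `window` are all hypothetical. A Romaji converter (the post lists pykakasi) could be passed in as `normalize` to get phonetic matching; the default here only lowercases and strips whitespace.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List


def default_normalize(text: str) -> str:
    """Placeholder normalizer: lowercase and remove all whitespace."""
    return "".join(text.lower().split())


def align_lyrics(
    segments: List[Dict[str, Any]],
    lyrics: List[str],
    normalize: Callable[[str], str] = default_normalize,
    threshold: float = 0.6,
    window: int = 5,
) -> List[Dict[str, Any]]:
    """Assign each lyric line the start time of its best-matching segment.

    The cursor only moves forward, so matches always follow lyric order.
    """
    results = []
    cursor = 0
    for line in lyrics:
        target = normalize(line)
        best_idx, best_score = None, 0.0
        # Only look at the next `window` unused segments.
        for idx in range(cursor, min(cursor + window, len(segments))):
            candidate = normalize(segments[idx]["text"])
            score = SequenceMatcher(None, target, candidate).ratio()
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            start = segments[best_idx]["start"]
            cursor = best_idx + 1  # order constraint: never go back
        else:
            start = None  # left unmatched; a later stage must fill this in
        results.append({"lyric": line, "start": start, "score": best_score})
    return results
```

Lines that come back with `start` set to `None` would need a fallback, for example interpolating between neighboring matched lines.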
3. GLSL Shader Integration
GLSL shaders are rendered on the GPU through wgpu-shadertoy, which makes it practical to generate high-resolution backgrounds for your music videos.
4. Video Composition
Videos are composed using MoviePy, including lyric clip creation, scene merging, and audio integration in one streamlined process.
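The timing side of lyric clip creation might look like the sketch below. It only computes when each lyric clip should start and how long it should stay on screen, and deliberately leaves out the MoviePy calls themselves. The function name `lyric_clip_intervals` and the `max_hold` cap are assumptions made for illustration.

```python
from typing import List, Tuple


def lyric_clip_intervals(
    timed_lyrics: List[Tuple[float, str]],
    audio_duration: float,
    max_hold: float = 8.0,
) -> List[Tuple[str, float, float]]:
    """Return (text, start, duration) for each lyric line.

    Each line stays visible until the next line starts or until the audio
    ends, but never longer than max_hold seconds.
    """
    ordered = sorted(timed_lyrics, key=lambda item: item[0])
    clips = []
    for i, (start, text) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
        else:
            next_start = audio_duration
        duration = min(next_start - start, max_hold)
        if duration > 0:  # skip lines with no room to be shown
            clips.append((text, start, duration))
    return clips
```

Each resulting tuple maps onto one text clip with a start time and a duration, which is the information a compositing library needs to place it on the timeline.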
How to Use
Basic Usage
./make-mv-simple.sh audio.mp3 lyrics.txt watermark.png
Advanced Options
python video_generator.py input.mp3 \
--mode A \
--shader shader/animated_galaxy.glsl \
--width 1920 \
--height 1080 \
--lyric-lines 3 \
--lyric-position center \
--watermark logo.png \
--watermark-opacity 0.8
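For readers who want to see how flags like these are typically parsed, here is a minimal `argparse` sketch that mirrors the options in the command above. It is not the actual parser from `video_generator.py`: the defaults, the `choices` for `--mode`, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Lyric-synced music video generator")
    parser.add_argument("audio", help="input MP3 file")
    # Mode A: shader background, Mode B: overlay on an existing video.
    parser.add_argument("--mode", choices=["A", "B"], default="A")
    parser.add_argument("--shader", help="path to a GLSL shader file")
    parser.add_argument("--width", type=int, default=1920)
    parser.add_argument("--height", type=int, default=1080)
    parser.add_argument("--lyric-lines", type=int, default=3)
    parser.add_argument("--lyric-position", default="center")
    parser.add_argument("--watermark", help="path to a watermark image")
    parser.add_argument("--watermark-opacity", type=float, default=1.0)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` converts dashes to underscores, so `--lyric-lines` is read back as `args.lyric_lines`.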
Dependencies
- Python 3.8+
- OpenAI Whisper
- MoviePy
- wgpu-shadertoy
- pykakasi
- pandas / numpy / Pillow
Implementation Notes
- Japanese fonts are auto-detected and used for display
- Lyric display size is dynamically adjusted based on character count
- Includes fallback and error handling for Whisper recognition failures
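One simple way to size lyrics by character count is sketched below. The actual rule used by the tool is not described in this post, so the function name `lyric_font_size` and every constant here are assumptions. The sketch treats each character as roughly as wide as the font size, which is a reasonable approximation for full-width Japanese glyphs.

```python
def lyric_font_size(
    text: str,
    video_width: int,
    base_size: int = 72,
    min_size: int = 28,
    margin_ratio: float = 0.9,
) -> int:
    """Pick a font size so the line fits within the usable video width."""
    chars = max(len(text), 1)
    # Largest size at which `chars` square glyphs fit in the usable width.
    fitting = int(video_width * margin_ratio / chars)
    # Clamp between the minimum readable size and the preferred base size.
    return max(min_size, min(base_size, fitting))
```

Short lines keep the base size, long lines shrink, and very long lines stop shrinking at `min_size`, at which point wrapping onto a second line would be the better option.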
Summary
This system automates the creation of lyric-synced music videos for YouTube and social media by combining speech recognition with real-time visual rendering. It was developed under the name Tuidra, which represents a new form of singer-songwriter powered by AI.
License
MIT License
Contributions
Pull requests are welcome. Please open an issue for major changes so we can discuss them first.
