Tuidra

Auto-Generate Lyric Music Videos from MP3 with Python, Whisper, and Tuidra

Overview

This is a Python-based tool that automatically generates lyric-synced music videos from MP3 files. By combining high-precision speech recognition via OpenAI Whisper with real-time background rendering using GLSL shaders, you can easily create music videos with synchronized lyrics.

https://github.com/tuidra/tuidra-musicvideo-maker


🎤 About Tuidra

This tool was created by Tuidra, an AI-driven virtual singer-songwriter. Tuidra represents the creative fusion of humans and AI, promoting the integration of music, visuals, and technology.

Experience a new form of creative expression, born from the collaboration of AI and humanity.


Key Features

🎵 Automatic Lyric Sync via Speech Recognition

  • High-accuracy Japanese speech recognition using OpenAI Whisper
  • Automatic matching between recognition results and user-provided lyrics
  • Improved phonetic matching accuracy using Romaji conversion

🎨 Real-time Shader Backgrounds

  • Dynamic background rendering with GLSL shaders
  • GPU acceleration via wgpu-shadertoy

🎬 Flexible Video Composition

  • New video generation with shader background (Mode A)
  • Lyric overlay on existing video (Mode B)
  • Support for watermark and title display

Technical Implementation Details

1. Speech Recognition Pipeline

Whisper transcribes the audio. Its large model gives the best accuracy; for faster test runs, the small model is a practical choice.
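As a rough illustration, a transcription step built on the openai-whisper package might look like the sketch below. The helper names `transcribe` and `segments_to_rows` are hypothetical and are not taken from the repository; only `whisper.load_model` and `model.transcribe` are Whisper's own API.

```python
def transcribe(audio_path, model_size="small"):
    """Run Whisper on an audio file and return its raw result dict.

    model_size is an assumption for this sketch: "small" for quick tests,
    "large" when accuracy matters more than speed.
    """
    import whisper  # imported lazily so the helper below works without it

    model = whisper.load_model(model_size)
    # language="ja" skips language detection for Japanese lyrics
    return model.transcribe(audio_path, language="ja")


def segments_to_rows(result):
    """Flatten Whisper's segments into (start, end, text) tuples."""
    return [
        (round(seg["start"], 2), round(seg["end"], 2), seg["text"].strip())
        for seg in result["segments"]
    ]
```

Calling `segments_to_rows(transcribe("audio.mp3"))` would then give a list of timed text segments that a later matching step can work with.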

2. Lyric Matching

Matching goes beyond exact string matches: the system also uses phonetic matching via Romaji conversion, plus partial matching with order constraints, so that the alignment follows the song naturally.
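The idea can be sketched as follows. This is not the repository's actual algorithm: it swaps in `difflib.SequenceMatcher` from the standard library for fuzzy scoring, and the names `align_lyrics`, `to_romaji`, `threshold`, and `window` are assumptions made for the example. The Romaji conversion uses pykakasi's `convert` method.

```python
from difflib import SequenceMatcher


def to_romaji(text):
    """Convert Japanese text to Hepburn Romaji with pykakasi (lazy import)."""
    import pykakasi

    kks = pykakasi.kakasi()
    return "".join(item["hepburn"] for item in kks.convert(text))


def similarity(a, b):
    """Fuzzy similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def align_lyrics(lyrics, segments, normalize=str.lower, threshold=0.6, window=5):
    """Assign each lyric line the timing of its best-matching segment.

    segments is a list of (start, end, text) tuples. The search only moves
    forward (the order constraint) and looks at most `window` segments ahead.
    Lines with no match above `threshold` get (line, None, None).
    """
    aligned = []
    cursor = 0  # index of the first segment still available for matching
    for line in lyrics:
        best_idx, best_score = None, threshold
        for idx in range(cursor, min(cursor + window, len(segments))):
            score = similarity(normalize(line), normalize(segments[idx][2]))
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is None:
            aligned.append((line, None, None))
        else:
            start, end, _ = segments[best_idx]
            aligned.append((line, start, end))
            cursor = best_idx + 1  # never match an earlier segment again
    return aligned
```

For Japanese lyrics, passing `normalize=to_romaji` compares the sounds rather than the characters, which helps when Whisper writes a word in kana and the lyric sheet uses kanji.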

3. GLSL Shader Integration

Efficient GPU-based rendering enables the generation of high-resolution backgrounds for your music videos.

4. Video Composition

Videos are composed using MoviePy, including lyric clip creation, scene merging, and audio integration in one streamlined process.


How to Use

Basic Usage

./make-mv-simple.sh audio.mp3 lyrics.txt watermark.png

Advanced Options

python video_generator.py input.mp3 \
  --mode A \
  --shader shader/animated_galaxy.glsl \
  --width 1920 \
  --height 1080 \
  --lyric-lines 3 \
  --lyric-position center \
  --watermark logo.png \
  --watermark-opacity 0.8
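If you have a whole folder of songs, a small wrapper around the command above could render them in one go. This is only a sketch: `build_command` and `render_all` are hypothetical helpers, it uses only flags shown in the command above, and it assumes it is run from the repository root where `video_generator.py` lives.

```python
import subprocess
from pathlib import Path


def build_command(mp3_path, shader="shader/animated_galaxy.glsl",
                  width=1920, height=1080):
    """Build the Mode A command line for a single MP3 file."""
    return [
        "python", "video_generator.py", str(mp3_path),
        "--mode", "A",
        "--shader", shader,
        "--width", str(width),
        "--height", str(height),
    ]


def render_all(folder):
    """Run the generator once per MP3 in `folder`, stopping on the first error."""
    for mp3 in sorted(Path(folder).glob("*.mp3")):
        subprocess.run(build_command(mp3), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.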

Dependencies

  • Python 3.8+
  • OpenAI Whisper
  • MoviePy
  • wgpu-shadertoy
  • pykakasi
  • pandas / numpy / Pillow

Implementation Notes

  • Japanese fonts are auto-detected and used for display
  • Lyric display size is dynamically adjusted based on character count
  • Includes fallback and error handling for Whisper recognition failures
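To give a feel for the dynamic sizing mentioned above, here is one simple way to shrink the font as a lyric line gets longer. The function name and the numbers (`base_size`, `min_size`, `comfortable_chars`) are assumptions chosen for illustration, not values from the repository.

```python
def lyric_font_size(text, base_size=72, min_size=32, comfortable_chars=12):
    """Pick a font size that shrinks as the lyric line gets longer.

    Lines up to `comfortable_chars` characters use `base_size`; longer lines
    scale down in proportion to their length, but never below `min_size`.
    """
    n = len(text)
    if n <= comfortable_chars:
        return base_size
    scaled = int(base_size * comfortable_chars / n)
    return max(min_size, scaled)
```

With these defaults, a 24-character line is drawn at half the base size, and very long lines bottom out at the minimum so they stay readable.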

Summary

This system automates the generation of lyric-synced music videos for YouTube and social media by combining speech recognition with real-time visual rendering. It was developed under the name Tuidra, a new kind of singer-songwriter powered by AI.


License

MIT License


Contributions

Pull requests are welcome. Please open an issue for major changes so we can discuss them first.

