This is a simplified guide to an AI model called Singing_voice_conversion maintained by Lucataco. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.
Model overview
The singing_voice_conversion model converts a source singer's voice so that it sounds like a chosen target singer while keeping the original melody and lyrics. It is built on the Amphion framework using DiffWaveNetSVC, and it relies on diverse semantic-based feature fusion to extract speaker-independent representations from the source audio. Rather than depending on a single feature extractor, this implementation combines multiple pretrained models that capture complementary knowledge about melody, lyrics, and acoustic characteristics. The model supports 15 target singers, including popular artists like Taylor Swift, Adele, and Bruno Mars, as well as several Chinese vocalists. Created by lucataco, this tool differs from text-to-speech systems such as whisperspeech-small, which convert speech patterns: it targets singing and aims to preserve the musical and emotional nuances of the performance.
Model inputs and outputs
The model processes audio files and converts the singing voice to match a selected target singer while preserving musical elements like pitch, timing, and lyrical content. Users can control various aspects of the conversion process including pitch shifting and inference quality.
Inputs
- source_audio: Input audio file containing the original singing voice to be converted
- target_singer: Selection from 15 available singers including Western artists (Taylor Swift, Adele, Beyonce, Bruno Mars, John Mayer, Michael Jackson) and Chinese vocalists (张学友, 李健, 汪峰, 王菲, 石倚洁, 蔡琴, 那英, 陈奕迅, 陶喆)
- pitch_shift_control: Choose between "Auto Shift" for automatic pitch adjustment or "Key Shift" for manual control
- key_shift_mode: Manual pitch adjustment range from -6 to +6 semitones when using Key Shift mode
- diffusion_inference_steps: Quality control parameter from 0 to 1000 steps, with higher values producing better quality but requiring more processing time
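To make the inputs above concrete, here is a minimal sketch of how a request payload could be assembled and checked before it is sent. The parameter names and ranges come from the list above; the helper names `build_input` and `semitone_ratio`, the default values, and the exact spelling of the singer strings are assumptions made for illustration, not part of the model's documented interface.

```python
# Hypothetical helpers: the names, defaults, and exact singer strings are assumptions.
TARGET_SINGERS = (
    "Taylor Swift", "Adele", "Beyonce", "Bruno Mars", "John Mayer", "Michael Jackson",
    "张学友", "李健", "汪峰", "王菲", "石倚洁", "蔡琴", "那英", "陈奕迅", "陶喆",
)
PITCH_MODES = ("Auto Shift", "Key Shift")


def build_input(source_audio, target_singer, diffusion_inference_steps,
                pitch_shift_control="Auto Shift", key_shift_mode=0):
    """Validate the documented ranges and return a plain dict of model inputs."""
    if target_singer not in TARGET_SINGERS:
        raise ValueError(f"unknown target singer: {target_singer!r}")
    if pitch_shift_control not in PITCH_MODES:
        raise ValueError(f"pitch_shift_control must be one of {PITCH_MODES}")
    if not -6 <= key_shift_mode <= 6:
        raise ValueError("key_shift_mode must be between -6 and +6 semitones")
    if not 0 <= diffusion_inference_steps <= 1000:
        raise ValueError("diffusion_inference_steps must be between 0 and 1000")
    return {
        "source_audio": source_audio,  # passed through unchanged (path, URL, or file handle)
        "target_singer": target_singer,
        "pitch_shift_control": pitch_shift_control,
        "key_shift_mode": key_shift_mode,
        "diffusion_inference_steps": diffusion_inference_steps,
    }


def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament: 2^(n/12)."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    payload = build_input("song.wav", "Adele", 100,
                          pitch_shift_control="Key Shift", key_shift_mode=2)
    print(payload)
    print(semitone_ratio(2))  # how much a +2 semitone shift scales frequency
```

A dict like this could then be handed to whichever client you use to run the model. The `semitone_ratio` helper is only there to show what a key shift means musically: each semitone scales frequency by a factor of 2^(1/12), so the full -6 to +6 range spans half an octave in either direction.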
Outputs
- Audio file: Converted singing voice audio in the target singer's style while maintaining the original song structure
Capabilities
This model excels at maintaining music...