This is a simplified guide to an AI model called TangoFlux maintained by declare-lab. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.
Model Overview
Created by declare-lab, TangoFlux is a text-to-audio generation model that uses flow matching and preference optimization to create high-quality audio at 44.1kHz. Building on advancements from Tango, it generates audio clips up to 30 seconds long in about 3.7 seconds on a single A40 GPU.
Model Inputs and Outputs
The model takes a text prompt and converts it into a stereo audio file using FluxTransformer blocks. It learns audio patterns in three training stages: pre-training, fine-tuning, and preference optimization.
Inputs
- Text Prompt - Description of desired audio content
- Duration - Audio length in seconds (1-30)
- Steps - Number of inference steps (1-200)
- Guidance Scale - Controls adherence to prompt (1-20)
Outputs
- Audio File - 44.1kHz stereo WAV file matching the text description
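The input ranges and output format above can be sketched as a small validation helper. This is a minimal illustration, not the TangoFlux API: `GenerationRequest`, its field names, and its default values are hypothetical, and only the allowed ranges, the 44.1kHz sample rate, and the stereo channel count come from the lists above.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100  # output sample rate stated above
NUM_CHANNELS = 2         # stereo output


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for the four inputs listed above."""

    prompt: str
    duration: float = 10.0       # seconds, allowed range 1-30 (default is an assumption)
    steps: int = 50              # inference steps, allowed range 1-200 (default is an assumption)
    guidance_scale: float = 4.5  # allowed range 1-20 (default is an assumption)

    def __post_init__(self) -> None:
        # Reject values outside the ranges documented in the Inputs list.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.duration <= 30:
            raise ValueError("duration must be between 1 and 30 seconds")
        if not 1 <= self.steps <= 200:
            raise ValueError("steps must be between 1 and 200")
        if not 1 <= self.guidance_scale <= 20:
            raise ValueError("guidance_scale must be between 1 and 20")

    def expected_output_shape(self) -> tuple[int, int]:
        """Return (channels, samples) of the stereo waveform for this duration."""
        return NUM_CHANNELS, round(self.duration * SAMPLE_RATE_HZ)


request = GenerationRequest(prompt="Rain on a tin roof", duration=30)
print(request.expected_output_shape())  # (2, 1323000)
```

At the maximum duration, a clip holds 30 × 44,100 = 1,323,000 samples per channel, which is why longer durations produce proportionally larger WAV files.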
Capabilities
The system excels at generating faithfu...