This is a simplified guide to an AI model called Stable-Audio-2 maintained by Ardianfe. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.
Model overview
stable-audio-2 is a music generation model created by Replicate user ardianfe. It builds upon the capabilities of similar models like stable-audio-prod, music-gen-fn-200e, and stable-audio-open-1.0 to generate short audio samples, sound effects, and production elements from text prompts. The model is trained on a large dataset of music and audio.
Model inputs and outputs
stable-audio-2 takes in a text prompt describing the desired audio, along with additional parameters such as clip length, diffusion sampler settings, and batch size. It then generates the corresponding audio clip, which can be saved in either WAV or MP3 format. A minimal usage sketch follows the input and output lists below.
Inputs
- Prompt: A text description of the desired audio output
- Seconds Total: The total length of the generated audio in seconds
- Seconds Start: The starting time point for the generated audio
- Cfg Scale: The "classifier-free guidance scale", which controls how closely the output follows the text prompt versus the model's unconditioned generation
- Steps: The number of sampling steps to use during generation
- Sampler Type: The type of diffusion sampler to use, such as "dpmpp-3m-sde"
- Sigma Min: The minimum noise level for the diffusion process
- Sigma Max: The maximum noise level for the diffusion process
- Init Noise Level: The initial noise level for the diffusion process
- Batch Size: The number of audio clips to generate at once
- Output Format: The file format for the generated audio, either WAV or MP3
- Song Id: An optional identifier to use when saving the generated audio
Outputs
- Audio Clip: The generated audio clip in the requested format (WAV or MP3)
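To make the parameters concrete, here is a minimal sketch of calling the model through Replicate's Python client. The model reference string and the snake_case input names (prompt, seconds_total, cfg_scale, and so on) are assumptions inferred from the parameter list above; check the model's page on Replicate for the exact identifiers and defaults.

```python
import replicate

# Minimal sketch of generating a clip with stable-audio-2 via Replicate's
# Python client. The model reference and the snake_case input names below
# are assumptions inferred from the parameter list above; consult the
# model page on Replicate for the exact identifiers and defaults.
output = replicate.run(
    "ardianfe/stable-audio-2",  # assumed model reference (version hash omitted)
    input={
        "prompt": "warm lo-fi drum loop with vinyl crackle, 90 BPM",
        "seconds_total": 30,      # length of the generated clip in seconds
        "cfg_scale": 7,           # prompt adherence vs. free generation
        "steps": 100,             # diffusion sampling steps
        "sampler_type": "dpmpp-3m-sde",
        "batch_size": 1,
        "output_format": "wav",
    },
)
print(output)  # URL(s) pointing to the generated audio file(s)
```

Raising steps or cfg_scale generally trades generation speed for closer adherence to the prompt; the remaining diffusion parameters can usually be left at their defaults.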
Capabilities
stable-audio-2 can generate a variety of short audio samples, sound effects, and production elements from descriptive text prompts.