This is a simplified guide to an AI model called Llasa-3b-Long maintained by Kjjk10. If you like this kind of analysis, join AImodels.fyi or follow us on Twitter.
Model Overview
llasa-3b-long delivers zero-shot voice cloning and text-to-speech capabilities with flexible multi-speaker support. Created by kjjk10, this model stands alongside others in the voice synthesis space like kokoro-82m and f5-tts, but distinguishes itself with podcast-focused features and long-form text processing.
Model Inputs and Outputs
The model processes text and voice samples to generate synthesized speech. It handles long-form content through an efficient chunking system, making it suitable for extended audio generation tasks.
Inputs
- Voice Sample - 16kHz audio file for voice cloning reference
- Text - Content to be converted to speech
- Prompt Text - Optional guide text (auto-extracted from voice sample if not provided)
- Chunk Length - Text segment size for processing (default 250)
Outputs
- Audio File - URI to the generated speech audio
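To make the Chunk Length input more concrete, here is a minimal sketch of how long text might be split into segments before synthesis. This is not the model's actual chunking code: the function name `chunk_text` is hypothetical, and the sketch assumes the default of 250 is measured in characters and that splitting at sentence boundaries is preferred, neither of which is confirmed here.

```python
import re


def chunk_text(text: str, chunk_length: int = 250) -> list[str]:
    """Split text into chunks of at most chunk_length characters.

    Hypothetical helper: assumes the limit is counted in characters and
    that chunks should end on sentence boundaries where possible.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > chunk_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_length])
            sentence = sentence[chunk_length:]
        # Greedily pack sentences into the current chunk.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= chunk_length:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Welcome to the show. Today we talk about speech synthesis. Stay tuned!"
    for i, chunk in enumerate(chunk_text(script, chunk_length=40)):
        print(i, repr(chunk))
```

Under these assumptions, each chunk would be synthesized separately and the resulting audio segments joined into one file. A smaller chunk length means more, shorter synthesis calls; a larger one means fewer, longer calls.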
Capabilities
The system supports three model sizes (...