Let's briefly refer back to this part of our system design.
We have three parts:
- Fetch relevant CMS data
- Fetch the text stream from OpenAI
- Feed the WebSocket audio stream API with the text stream from OpenAI

ElevenLabs uses a WebSocket API to enable audio streaming, while OpenAI uses a REST API that returns a text stream.
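To make the later snippets concrete, here is a minimal sketch of opening the ElevenLabs text-to-speech WebSocket (the `socket` used below). The voice ID and model ID are placeholder assumptions, not the post's exact setup; check the ElevenLabs docs for the current endpoint parameters.

```ts
// A minimal sketch, assuming a Node environment with the "ws" package.
// voiceId and model_id are placeholder assumptions.
import WebSocket from "ws";

const voiceId = "<your-voice-id>";
const uri = `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_monolingual_v1`;
const socket = new WebSocket(uri);
```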
What is important to achieve here is that we do not want to wait for the full OpenAI response before we can start streaming audio from ElevenLabs.
This means we need a way to send the first chunk of the OpenAI stream to ElevenLabs as soon as possible.
But we have a problem: OpenAI, like other LLM APIs (Groq as well), returns chunks that do not align with word boundaries, while ElevenLabs needs meaningful words in order to work as intended.
This means we need to buffer the OpenAI chunks into words, and only when we have our first word buffered do we send a message to the ElevenLabs WebSocket and start streaming audio.
For this purpose I wrote textChunker, a TS version of the Python example ElevenLabs has in their docs.
🔗🔗🔗 Click here to see the Python version
🔗🔗🔗 Click here for the textChunker code
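To give a rough idea of what the chunker does, here is a simplified sketch along the lines of the ElevenLabs Python example (not the full linked implementation): it buffers incoming LLM chunks and only yields text at natural boundaries such as spaces and punctuation.

```ts
// Characters treated as safe flush boundaries (mirrors the splitters
// in the ElevenLabs Python example).
const SPLITTERS = [".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " "];

async function* textChunker(
  chunks: AsyncIterable<string>
): AsyncGenerator<string> {
  let buffer = "";
  for await (const chunk of chunks) {
    if (SPLITTERS.includes(buffer.slice(-1))) {
      // The buffer already ends on a boundary: flush it, start fresh.
      yield buffer + " ";
      buffer = chunk;
    } else if (chunk.length > 0 && SPLITTERS.includes(chunk[0])) {
      // The chunk starts on a boundary: flush the buffer plus that
      // boundary character, keep the rest of the chunk.
      yield buffer + chunk[0] + " ";
      buffer = chunk.slice(1);
    } else {
      buffer += chunk;
    }
  }
  // Flush whatever is left once the LLM stream ends.
  if (buffer) {
    yield buffer + " ";
  }
}
```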
Example of textChunker usage:
```ts
socket.onopen = async function (_event) {
  console.log("OPEN SOCKET");

  const answerSource = await getAnswerSource();
  const answerChunks = await getAnswerChunks(answerSource, question);

  // BOS (beginning-of-stream) message: a single space primes the
  // connection with the voice settings and the API key.
  const bosMessage = {
    text: " ",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.5,
    },
    xi_api_key: process.env.ELEVEN_LABS_API_KEY,
  };
  socket.send(JSON.stringify(bosMessage));

  // Stream word-aligned text chunks to ElevenLabs as they arrive.
  for await (const text of textChunker(answerChunks)) {
    socket.send(JSON.stringify({ text: text, try_trigger_generation: true }));
  }

  // EOS (end-of-stream) message: an empty string tells ElevenLabs
  // the input is finished.
  const eosMessage = {
    text: "",
  };
  socket.send(JSON.stringify(eosMessage));
};
```
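The `getAnswerSource` and `getAnswerChunks` helpers come from the full implementation linked below. As a hedged sketch, `getAnswerChunks` can be an async generator over the OpenAI SDK's streamed chat completion; the model name and prompt wiring here are assumptions, not the post's exact code.

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// A sketch under assumptions: stream a chat completion and yield the
// raw text deltas as they arrive. Model name and prompt shape are
// placeholders.
async function* getAnswerChunks(
  answerSource: string,
  question: string
): AsyncGenerator<string> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model
    stream: true,
    messages: [
      { role: "system", content: `Answer using this context:\n${answerSource}` },
      { role: "user", content: question },
    ],
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) yield delta; // raw, word-unaligned chunks go to textChunker
  }
}
```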
🔗🔗🔗 For the full implementation of OpenAI with ElevenLabs, click here
❤️ If you would like to stay in touch, please feel free to connect ❤️