This is a great breakdown of the Web Speech API! In our project, MindCare AI, we found that the biggest hurdle for an AI counselor wasn't the LLM logic but the 'human feel' of the voice response. We had to optimize our WebSocket streaming so the AI doesn't hit the user with an awkward 3-second 'thinking' delay, which matters even more in a sensitive counseling context. One thing we discovered is that chunking the TTS output noticeably improves perceived empathy, simply because the user is never left sitting in silence. Have you experimented with any specific streaming libraries to reduce the 'robotic' pause between sentences?
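For anyone curious what sentence-level chunking can look like in practice, here's a minimal sketch of the idea: as LLM tokens arrive over the WebSocket, accumulate them in a buffer and flush complete sentences to TTS as soon as they close, instead of waiting for the full reply. The function name `flushSentences` and the splitting regex are illustrative assumptions, not the actual MindCare AI implementation.

```javascript
// Sketch: split a streaming text buffer into sentence-sized chunks
// so TTS can start speaking before the full LLM response arrives.
// (Illustrative only - not the MindCare AI production code.)
function flushSentences(buffer) {
  const chunks = [];
  // Match a run of non-terminator characters, then sentence-ending
  // punctuation, then trailing whitespace or end of buffer.
  const re = /[^.!?]+[.!?]+(?:\s+|$)/g;
  let consumed = 0;
  let m;
  while ((m = re.exec(buffer)) !== null) {
    chunks.push(m[0].trim());
    consumed = re.lastIndex;
  }
  // Complete sentences go to TTS now; the unfinished tail stays
  // buffered until the next WebSocket message arrives.
  return { chunks, rest: buffer.slice(consumed) };
}
```

On each incoming token you'd append to the buffer, call `flushSentences`, and hand each returned chunk to `speechSynthesis.speak()` (or your TTS backend) immediately, carrying `rest` forward. The trade-off is that sentence boundaries from a regex are crude (abbreviations like "Dr." will split early), but even this naive version closes most of the silent gap.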