Hey everyone!
Finally had some more time to work on my project during this vacation. I spent it debugging the concurrency issue and figured out the underlying cause.
Apparently, I can't add new items to the stream (RxPY-based) while the pipeline is executing, and the pipeline takes a loooong time for every buffer because generating the transcription is slow.
I've explored some options, but none of them address this specific use case, and I don't want to create a separate thread for every buffer just to transcribe it; in my opinion, that can lead to unnecessary complexity. For now, I'll explore creating an audio chunk queue (buffer) per client, plus a thread that keeps feeding queued chunks once the transcription for the previous buffer is done. I think that should do the job for now.
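The per-client queue idea above can be sketched with just the standard library. This is a minimal, hypothetical sketch (the `ClientTranscriber` class, `transcribe` stub, and all names are mine, not the project's): each client gets a FIFO chunk queue and one worker thread that pulls the next chunk only after the previous transcription finishes, so producers can enqueue audio at any time without touching the running pipeline.

```python
import queue
import threading
import time

def transcribe(chunk):
    # Placeholder for the real transcription call (hypothetical) --
    # in the real pipeline this would run Whisper on the audio buffer.
    time.sleep(0.01)  # simulate a slow transcription
    return f"text for {chunk}"

class ClientTranscriber:
    """One chunk queue + one worker thread per client (hypothetical names)."""

    def __init__(self):
        self.chunks = queue.Queue()
        self.results = []
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def add_chunk(self, chunk):
        # Producers can enqueue at any time; the worker drains the queue
        # one buffer at a time, so a new chunk is only picked up once the
        # previous transcription has finished.
        self.chunks.put(chunk)

    def _run(self):
        while True:
            chunk = self.chunks.get()
            if chunk is None:  # sentinel to shut the worker down
                break
            self.results.append(transcribe(chunk))
            self.chunks.task_done()

    def close(self):
        self.chunks.put(None)
        self.worker.join()

client = ClientTranscriber()
for i in range(3):
    client.add_chunk(f"chunk-{i}")
client.close()
print(client.results)  # chunks are transcribed strictly in order
```

Because the worker is the only consumer, ordering is guaranteed per client, and transcription back-pressure stays inside the queue instead of spawning a thread per buffer.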
Later on, I need to implement faster-whisper in this project for performance, and that should be about it for this version of the project, which is all about using Diart to demonstrate how speaker diarization can be combined with Whisper. A lot of the credit goes to Juan, the creator of Diart, whose article addresses this exact use case and saved me tons of time thinking through the transcription logic. He used concepts I hadn't thought of applying to my own application, and he did it a loooot better.
That's it for today, hoping to finalize this tomorrow!