Hey everyone!
I skipped yesterday's post: it was a long vacation day, and I was too tired to even look at the computer.
Today, I worked on finalizing the real-time transcription app I've been building for a long time now. Once this project is done, the next ones should flow a lot more smoothly. I solved the concurrency problem (somewhat) by using asyncio to offload the chunk-receiving process, which used to hit a blocking line whenever a transcription was in progress. The transcription process itself isn't offloaded yet, but that doesn't seem to affect how audio chunks are received. I still need to test this further, though, because the results from my tests so far weren't very promising, so I must have missed something in the processing stage. That's what I'll review tomorrow, the last day of my vacation. After that, it'll be nothing but work and a steady stream of new projects.
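Here is a minimal sketch of that receive/transcribe split, assuming a hypothetical async audio source (`fake_audio_source`) and a stand-in `transcribe_blocking` function in place of the real Whisper call. The receiver only enqueues chunks, so it never waits on transcription. It also goes one step further than the post: the blocking transcription call is pushed into a worker thread with `asyncio.to_thread` (Python 3.9+), which is one common way to keep the event loop free. This is not necessarily how the app itself does it.

```python
import asyncio
import time


def transcribe_blocking(chunk: bytes) -> str:
    # Hypothetical stand-in for a blocking Whisper call.
    time.sleep(0.05)
    return f"<{len(chunk)} bytes transcribed>"


async def receive_chunks(source, queue: asyncio.Queue) -> None:
    # Receive audio chunks and enqueue them without waiting on transcription.
    async for chunk in source:
        await queue.put(chunk)
    await queue.put(None)  # sentinel: end of stream


async def transcribe_worker(queue: asyncio.Queue, results: list) -> None:
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        # Run the blocking call in a worker thread so the event loop
        # (and therefore the receiver task) keeps running meanwhile.
        text = await asyncio.to_thread(transcribe_blocking, chunk)
        results.append(text)


async def fake_audio_source(n: int):
    # Hypothetical source: n chunks of 3200 bytes (100 ms of 16 kHz, 16-bit mono audio).
    for _ in range(n):
        await asyncio.sleep(0.01)
        yield b"\x00" * 3200


async def main(n_chunks: int = 5) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    await asyncio.gather(
        receive_chunks(fake_audio_source(n_chunks), queue),
        transcribe_worker(queue, results),
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because there is a single worker, transcripts come back in the order the chunks arrived. More workers would add throughput but would require re-ordering the results.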
This project is definitely demanding: many people are working on accurate speaker diarization with Whisper, and it's hard to get right. There's still room to refine the current version, which uses Diart. For example, at the end of the stream, all the speaker embeddings could be re-compared to make sure the same speaker hasn't been counted twice.
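One way that end-of-stream check could look is sketched below, assuming you can collect one centroid embedding per speaker label after the stream ends. That collection step, the label names, the 0.8 threshold, and both helper functions are hypothetical and are not Diart's API. Each speaker is compared against the ones already kept, and it is merged into the first one whose centroid is similar enough by cosine similarity.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat empty/zero embeddings as dissimilar
    return dot / (norm_a * norm_b)


def merge_duplicate_speakers(centroids: dict, threshold: float = 0.8) -> dict:
    """Map each speaker label to a canonical label.

    centroids: {label: embedding vector}. Labels are visited in sorted order;
    each one is merged into the first already-kept speaker whose centroid is
    at least `threshold` cosine-similar, otherwise it is kept as distinct.
    """
    mapping = {}
    kept = []  # labels that survived as distinct speakers
    for label in sorted(centroids):
        for other in kept:
            if cosine_similarity(centroids[label], centroids[other]) >= threshold:
                mapping[label] = other
                break
        else:
            kept.append(label)
            mapping[label] = label
    return mapping


def relabel(segments, mapping):
    # segments: list of (start_seconds, end_seconds, speaker_label) tuples.
    return [(start, end, mapping[label]) for start, end, label in segments]


if __name__ == "__main__":
    centroids = {"speaker0": [1.0, 0.0], "speaker1": [0.0, 1.0], "speaker2": [0.99, 0.05]}
    mapping = merge_duplicate_speakers(centroids)
    print(mapping)
    print(relabel([(0.0, 1.5, "speaker2"), (1.5, 3.0, "speaker1")], mapping))
```

This greedy pass is a simple choice. A threshold that is too low merges genuinely different voices, so it would need tuning against real recordings.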
First, though, I gotta make sure the transcriptions are generated smoothly, which, based on my testing, isn't the case yet. I'm hoping to find some time tomorrow to test everything and release a demo, but I doubt it since I'll be out all day. I'll try to get in as much as possible!
I also replaced the print statements with Python's logging library and made the app a bit faster by using language-specific transcription settings and models.
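Here is a small sketch of both changes, assuming the app uses the openai-whisper package. That package ships English-only variants of some models (`tiny.en`, `base.en`, `small.en`, `medium.en`, with no `large.en`). The helper `choose_whisper_model` and the logger name are hypothetical. When the language is known in advance, passing it to Whisper's `transcribe` (for example `language="en"`) also skips automatic language detection. The sketch only shows the model choice and the logging setup, not the Whisper calls themselves.

```python
import logging
from typing import Optional

# Module-level logger instead of print(); "transcriber" is a hypothetical name.
logger = logging.getLogger("transcriber")

# Sizes for which openai-whisper publishes an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def choose_whisper_model(size: str, language: Optional[str]) -> str:
    """Pick the English-only model when the language is known to be English."""
    if language == "en" and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    name = choose_whisper_model("base", "en")
    # Lazy %-style arguments: the string is only built if INFO is enabled.
    logger.info("Loading Whisper model %s", name)
```

Unlike print statements, the log level can be raised to WARNING in production without touching every call site.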
Happy coding everyone!