Hey everyone!
Today I finished everything I believe needs to be in the initial upgrade of the real-time transcription app project I've been working on. Hoping it gets merged!
I took yesterday off because of some math homework, and that will probably happen a few more times this month.
I implemented "sequential" transcription, meaning the entire audio data is re-transcribed every set number of seconds. This allows for constant refinement and much more accurate transcriptions. The speaker diarization isn't all that good though, since I'm simply using pyannote's diarization pipeline. Occasional speaker swapping may occur, where Speaker 1 becomes Speaker 2 and so on. This is because the pipeline alone doesn't give me any control over the speakers; I'd have to use pyannote's building blocks to get that. I also refactored the entire codebase to accommodate this change. That's it! Hoping to finally move on to more exciting projects and keep this one in maintenance mode with the usual improvements :)
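For anyone curious what "sequential" transcription looks like in practice, here is a minimal sketch of the idea, not the project's actual code. The class name `SequentialTranscriber`, the `feed` method, the parameters `sample_rate` and `interval_s`, and the `fake_transcribe` stand-in are all hypothetical names made up for illustration; in a real app the `transcribe` callable would wrap whatever speech-to-text model you use.

```python
from typing import Callable, List


class SequentialTranscriber:
    """Re-transcribes the whole accumulated audio at a fixed interval (illustrative sketch)."""

    def __init__(
        self,
        transcribe: Callable[[List[float]], str],  # hypothetical: any audio -> text function
        sample_rate: int = 16000,                  # assumed sample rate
        interval_s: float = 2.0,                   # assumed refresh interval in seconds
    ) -> None:
        self.transcribe = transcribe
        self.interval_samples = int(sample_rate * interval_s)
        self.buffer: List[float] = []      # every sample received so far
        self.samples_since_last = 0        # samples received since the last full pass
        self.transcript = ""               # latest full transcript

    def feed(self, chunk: List[float]) -> bool:
        """Append new audio; return True if the transcript was refreshed."""
        self.buffer.extend(chunk)
        self.samples_since_last += len(chunk)
        if self.samples_since_last >= self.interval_samples:
            # Re-run the model on the ENTIRE buffer, not just the new chunk,
            # so earlier text can be refined with the later context.
            self.transcript = self.transcribe(self.buffer)
            self.samples_since_last = 0
            return True
        return False


def fake_transcribe(samples: List[float]) -> str:
    """Stand-in for a real model: just reports how much audio it was given."""
    return f"{len(samples)} samples"
```

The trade-off of this approach is that each pass costs more as the buffer grows, since the whole recording is transcribed again every time; the payoff is that earlier words can still be corrected once more context arrives.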
Happy coding!