Today, I improved the transcription accuracy of my real-time transcription app built with Whisper (demo available in yesterday's post) and tried to improve the accuracy of the Speaker Re-identification step in the Speaker Diarization feature.
To do that, I chose to use a different library for the Speaker Re-identification part.
I tried to install pykaldi but failed miserably, so I switched to DeepSpeaker, but that didn't go all that well either. I didn't iterate much on integrating it into my project, so I'll put more effort into that tomorrow. If it works accurately, there'll be no need to install pykaldi.
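For context, speaker re-identification usually comes down to comparing a new speaker embedding against embeddings of speakers already heard. Here's a minimal sketch of that matching step, assuming embeddings are plain lists of floats (in a real app they would come from a model such as DeepSpeaker). The function names, the 0.75 threshold, and the toy vectors are all hypothetical and only for illustration; this is not the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, known_speakers, threshold=0.75):
    """Return the best-matching known speaker, or None if nobody clears the threshold.

    The threshold is an assumed value; it would need tuning on real data.
    """
    best_name, best_score = None, threshold
    for name, reference in known_speakers.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
known = {
    "speaker_0": [1.0, 0.0, 0.0],
    "speaker_1": [0.0, 1.0, 0.0],
}

print(identify_speaker([0.9, 0.1, 0.0], known))  # close to speaker_0
print(identify_speaker([0.0, 0.0, 1.0], known))  # matches nobody
```

Returning `None` for an unmatched embedding gives the caller a place to register a new speaker instead of forcing a wrong match.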
I improved the transcription accuracy by switching to Whisper's large model, and I avoided sacrificing execution time by running my server on Google Colab (where copying and pasting files over was a nightmare).
Not much for today, hoping for a finished product tomorrow! Happy coding everyone :)