Today, I improved the "Diarizator" I created yesterday to add speaker diarization to my real-time transcription app project. Since I'm using Whisper, I optimized it to work specifically with Whisper transcriptions. A typical pipeline has to find the segments of speech in a recording, identify the speaker of each one, and then somehow link that back to the transcription text before it can determine who said what. Whisper transcriptions already include the segments it transcribed, each paired with its text.
The approach I took was to identify the speaker in those provided segments, and that was enough. I already knew what was said; I just needed to find the speaker, which wasn't that hard.
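As a rough illustration of that idea (a minimal sketch, not the actual Diarizator code): Whisper's result contains a list of segments with `start`, `end`, and `text`, so each segment's audio can be cut out and compared against known speakers. The `embed` callable and the `known_speakers` reference vectors are hypothetical placeholders for whatever speaker-embedding model is used, and picking the closest speaker by cosine similarity is an assumption on my part.

```python
import math
from typing import Callable, Dict, List, Sequence

SAMPLE_RATE = 16_000  # Whisper operates on 16 kHz mono audio


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_segments(
    segments: List[dict],
    audio: Sequence[float],
    embed: Callable[[Sequence[float]], Sequence[float]],  # hypothetical embedding model
    known_speakers: Dict[str, Sequence[float]],  # hypothetical reference embeddings
) -> List[dict]:
    """Attach a "speaker" key to each Whisper-style segment."""
    labelled = []
    for seg in segments:
        # Convert the segment's timestamps (seconds) into sample indices.
        start = int(seg["start"] * SAMPLE_RATE)
        end = int(seg["end"] * SAMPLE_RATE)
        clip = audio[start:end]
        if len(clip) == 0:
            speaker = "unknown"
        else:
            emb = embed(clip)
            # Pick the known speaker whose reference embedding is closest.
            speaker = max(
                known_speakers,
                key=lambda name: cosine_similarity(emb, known_speakers[name]),
            )
        labelled.append({**seg, "speaker": speaker})
    return labelled
```

The transcription text is carried through untouched, so the output is the same segment list with one extra field per segment.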
It's not that accurate, but that's something I'll work on when I improve the project. Right now, what I need is a starting point where all components work together in real-time with a decent amount of accuracy that I'll improve from there.
So the Diarizator worked across multiple recordings, but not in real time, since no transcriptions are being made in real time yet. That was enough of a simulation, so in theory it should work. However, real-time transcription will produce shorter segments, which means less audio to identify each speaker with, and I still don't know the effect this will have on accuracy.
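One way to start measuring that effect could be to group segments by duration and compare accuracy per group. The sketch below assumes each segment dict carries a predicted `speaker` and a hand-labelled `true_speaker`; both field names and the one-second bucket size are illustrative choices, not something the project defines yet.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_duration(segments: List[dict], bucket_seconds: float = 1.0) -> Dict[float, float]:
    """Return {bucket start in seconds: fraction of correctly labelled segments}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for seg in segments:
        duration = seg["end"] - seg["start"]
        bucket = int(duration // bucket_seconds)
        totals[bucket] += 1
        # "true_speaker" is a hypothetical hand-labelled ground truth.
        if seg["speaker"] == seg["true_speaker"]:
            hits[bucket] += 1
    return {b * bucket_seconds: hits[b] / totals[b] for b in sorted(totals)}
```

If the shortest buckets score clearly worse than the longer ones, that would suggest buffering a little more audio per speaker before committing to a label.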
Having completed the Diarization part, I had to move on to receiving audio in real-time and transcribing it. That was the part I got stuck on yesterday.
Today, I didn't have much luck either. To be completely honest, that was a part I tried doing entirely with ChatGPT, as I understood the logic but didn't want to write it all myself. All I needed was a starting point, and then I could've worked from there.
However, every time, something didn't work quite right. So tomorrow, I'll be "less productive" and write it by myself. Right now, what matters is that I achieve a real-time audio stream to the server.
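For reference, the server side of such a stream can be reduced to one small piece: collecting incoming audio bytes, which arrive in arbitrary sizes, into fixed-length windows that can be handed to the transcriber. This is only a sketch under assumptions (16-bit mono PCM at 16 kHz, five-second windows); the `AudioChunker` name and its parameters are made up for illustration, and the network transport itself is left out.

```python
from typing import List


class AudioChunker:
    """Accumulate raw PCM bytes and emit fixed-length windows."""

    def __init__(self, sample_rate: int = 16_000, bytes_per_sample: int = 2,
                 window_seconds: float = 5.0) -> None:
        # Size of one window in bytes, always a whole number of samples.
        self.window_bytes = int(sample_rate * window_seconds) * bytes_per_sample
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add newly received bytes; return every complete window now available."""
        self._buffer.extend(data)
        windows = []
        while len(self._buffer) >= self.window_bytes:
            windows.append(bytes(self._buffer[:self.window_bytes]))
            del self._buffer[:self.window_bytes]
        return windows

    def flush(self) -> bytes:
        """Return whatever is left (for example, when the client disconnects)."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

Each window returned by `feed` could then be transcribed on its own, while `flush` handles the final partial window at the end of a session.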
Achieving good real-time audio streaming is quite important, since I plan on making this an easily usable API for developers looking to implement real-time transcription in their apps with Whisper.
Today didn't feel too productive, but those days happen. Tomorrow, I'll make up for it and more! I'm hoping and aiming to get at least one project or simple demo on GitHub.
That's it for today,
Happy coding everyone!