Today, I finished the performance work on my real-time transcription app.
Tomorrow, I'll design the client and make the application super configurable for a better playground experience :)
Both raw and diarized transcriptions are offered! However, there's clearly a tradeoff for the time being. The diarized transcriptions are only refined when there's a speaker change, whereas the raw transcriptions are refined when the buffer is silent (this depends on the buffer duration; in my application, that's 2 seconds of silence). So, after every 2 seconds of silence, it refines the raw transcription. Of course, this could be implemented for the diarized transcriptions too. Pretty simply, actually. I don't know if I really want to do it, as it may affect the context, but if the user is silent, it's probably a good time to refine.
However, it's a super small detail that can be added to the user's config in the client, where they'll be able to choose whether to refine every time there's silence or only when there's a speaker change.
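To make that option a bit more concrete, here's a minimal sketch of how the choice could look in code. This is not the app's actual implementation: `RefineTrigger`, `RefineConfig`, and `should_refine` are hypothetical names I made up for illustration, and the only value taken from the app is the 2-second silence threshold.

```python
from dataclasses import dataclass
from enum import Enum


class RefineTrigger(Enum):
    """Hypothetical setting: which event triggers a refinement pass."""
    SPEAKER_CHANGE = "speaker_change"
    SILENCE = "silence"
    BOTH = "both"


@dataclass
class RefineConfig:
    """Hypothetical per-user config the client could send to the server."""
    trigger: RefineTrigger = RefineTrigger.SPEAKER_CHANGE
    silence_seconds: float = 2.0  # matches the 2-second buffer silence above


def should_refine(config: RefineConfig, speaker_changed: bool,
                  silence_duration: float) -> bool:
    """Decide whether to refine, given what happened in the latest buffer."""
    silent = silence_duration >= config.silence_seconds
    if config.trigger is RefineTrigger.SPEAKER_CHANGE:
        return speaker_changed
    if config.trigger is RefineTrigger.SILENCE:
        return silent
    # BOTH: refine on whichever event happens first
    return speaker_changed or silent


# Example: a user who opted into refining on silence
config = RefineConfig(trigger=RefineTrigger.SILENCE)
print(should_refine(config, speaker_changed=False, silence_duration=2.3))
```

Keeping the decision in one small function means the diarized and raw paths could share it, with the client only flipping the `trigger` value.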
I also plan to make the refinement prompt configurable, to make this an actual playground where users can experiment with real-time transcription with Whisper.
I also got my first interview on Upwork today, hoping to get the job! It's a very small and quick one, but a step in the right direction toward freelancing :)
That's about it, happy coding everyone!