Today, I worked on partial transcription refinement with ChatGPT in my real-time transcription app. I refactored the code that handles duplicate speaker detection, as it was a bit of a mess and didn't work all that well with the refinement. I also iterated on the prompt again and modified the application overall.
It's super difficult to decide how to make use of it. On one hand, I could make the application completely GPT-dependent and provide it with the full transcription, including the speakers, so that it has the maximum amount of context and can even handle duplicate speaker detections. I could just give it the entire transcription and say "here is the transcription, Speaker 2 has been detected as a duplicate of Speaker 1, modify it", but in my opinion, this introduces a lot of issues.
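For comparison, the duplicate-speaker merge can also be done in plain code, so the model only ever sees text that is already relabeled. The snippet below is a minimal sketch of that idea, not the app's actual code: the segment shape (a list of dicts with "speaker" and "text" keys) and the function name `merge_duplicate_speaker` are assumptions made for illustration.

```python
def merge_duplicate_speaker(segments, duplicate, original):
    """Relabel every segment spoken by `duplicate` as `original`, then join
    neighbouring segments that end up with the same speaker.

    `segments` is assumed to be a list of {"speaker": str, "text": str} dicts.
    The input list is left untouched; a new list is returned.
    """
    merged = []
    for seg in segments:
        speaker = original if seg["speaker"] == duplicate else seg["speaker"]
        if merged and merged[-1]["speaker"] == speaker:
            # Same speaker as the previous segment: extend it instead of
            # starting a new one.
            merged[-1]["text"] += " " + seg["text"]
        else:
            merged.append({"speaker": speaker, "text": seg["text"]})
    return merged


segments = [
    {"speaker": "Speaker 1", "text": "Hi"},
    {"speaker": "Speaker 2", "text": "there"},
    {"speaker": "Speaker 3", "text": "Hello"},
]
result = merge_duplicate_speaker(segments, "Speaker 2", "Speaker 1")
```

With this approach the speaker labels never depend on the model's output, at the cost of the model not being able to use its own judgment about who said what.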
Other than that, I've noticed that ChatGPT has trouble providing the output exactly as requested. It can also fall back on the examples you give it if it's unsure. That's why the prompt can require so many iterations. For reference, I'm using the "text-davinci-003" engine for this.
I honestly haven't checked whether it's possible to get JSON answers from ChatGPT. That would allow me to keep some structure in the application, rather than having an app built around a single string that ChatGPT keeps rewriting and that I have no control over.
I have seen someone get JSON-formatted responses from ChatGPT when using LangChain with GPT-4, so this could be worth looking into.
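To make the JSON idea concrete, here is a rough sketch of what the application side could look like. It is untested against a real model and makes several assumptions: the transcription is held as a list of {"speaker", "text"} dicts, and `build_prompt` and `parse_refinement` are hypothetical helper names. The actual API call is left out on purpose; `raw_reply` stands for whatever string the model returns.

```python
import json


def build_prompt(segments):
    """Build a prompt that asks for the refined segments as a JSON array only."""
    payload = json.dumps(segments, ensure_ascii=False)
    return (
        "Refine the following transcription segments. Fix mistranscriptions "
        "only; do not add, remove, or reorder segments.\n"
        'Reply with a JSON array of objects with the keys "speaker" and '
        '"text", and nothing else.\n\n'
        f"Segments:\n{payload}\n"
    )


def parse_refinement(raw_reply, original):
    """Return the refined segments, or `original` if the reply is unusable."""
    # The reply may contain extra prose around the JSON, so keep only the
    # part between the first "[" and the last "]".
    start, end = raw_reply.find("["), raw_reply.rfind("]")
    if start == -1 or end <= start:
        return original
    try:
        data = json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return original
    # Reject replies that changed the number of segments.
    if not isinstance(data, list) or len(data) != len(original):
        return original
    refined = []
    for item, source in zip(data, original):
        if not isinstance(item, dict) or not isinstance(item.get("text"), str):
            return original
        # Speaker labels are copied from the original, so they stay under
        # the application's control whatever the model returns.
        refined.append({"speaker": source["speaker"], "text": item["text"]})
    return refined


original = [{"speaker": "Speaker 1", "text": "helo world"}]
prompt = build_prompt(original)
reply = 'Sure! [{"speaker": "Speaker 9", "text": "Hello world."}]'
refined = parse_refinement(reply, original)
```

The point of the fallback is that a malformed or off-format reply leaves the transcription unchanged instead of corrupting it, which is one way to deal with the model not always following the requested output format.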
The thing is, I could've released this a long time ago. I just chose to keep making improvements. I feel like quality matters more than the time it takes to make the project. However, the quality of the project probably won't ever fully satisfy me, because many things are out of my control: the ASR lacking context when working with buffers that are too short, mistranscriptions, and incorrect contextual understanding on ChatGPT's part (which my prompt would be partially responsible for). Those things might not happen at all, but they could. I want to release a version that's just good enough to be improved on: get that first commit out, then iterate from there.
Tomorrow, I'll finalize my approach for the raw and diarized transcriptions with the ChatGPT transcription refinement, and look into what I mentioned above (JSON-formatted responses) to see how the application could be improved in the future.
That's it for today,
Happy coding everyone!