Not a bad idea. Funny enough, we talked about something similar at work a couple of weeks back: generating transcripts from meetings in any app (Discourse, Zoom, etc.), so we'd keep a written version of the meeting to refer back to later.
One thing that we thought might be a bit complex was how to tell apart the different people talking and transcribe each of them properly.
Instead of getting:
Yeah what do you mean i was talking about roger oh okay
We would've liked to get something like:
P1: Yeah
P2: what do you mean?
P3: I was talking about roger
P2: Oh, okay
I think this could probably be solved using AI, but it's not a trivial problem I feel like.
Might need to experiment a bit with the idea. It would be really cool to have a generic tool to generate transcripts/captions/translations in real time from any video/audio...
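For what it's worth, the "who said what" part is usually called speaker diarization. Here is a rough sketch of the last step only, assuming you already have two things from some other tools: speaker turns with timestamps and transcribed words with timestamps. The `Turn` and `Word` types, the `label_words` function, and all the timings are made up for illustration; they don't come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A stretch of audio attributed to one speaker (hypothetical diarizer output)."""
    speaker: str
    start: float
    end: float


@dataclass
class Word:
    """A transcribed word with timestamps (hypothetical speech-to-text output)."""
    text: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words: list[Word], turns: list[Turn]) -> list[str]:
    """Assign each word to the turn it overlaps most, then merge consecutive
    words from the same speaker into one "speaker: text" line.
    Assumes `turns` is not empty."""
    lines: list[tuple[str, list[str]]] = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end))
        if lines and lines[-1][0] == best.speaker:
            lines[-1][1].append(w.text)
        else:
            lines.append((best.speaker, [w.text]))
    return [f"{speaker}: {' '.join(texts)}" for speaker, texts in lines]


# Invented timings for the example conversation above.
turns = [
    Turn("P1", 0.0, 0.5),
    Turn("P2", 0.5, 1.8),
    Turn("P3", 1.8, 3.5),
    Turn("P2", 3.5, 4.5),
]
words = [
    Word("Yeah", 0.0, 0.4),
    Word("what", 0.6, 0.8), Word("do", 0.8, 0.9),
    Word("you", 0.9, 1.0), Word("mean", 1.0, 1.3),
    Word("i", 1.9, 2.0), Word("was", 2.0, 2.2), Word("talking", 2.2, 2.6),
    Word("about", 2.6, 2.9), Word("roger", 2.9, 3.3),
    Word("oh", 3.6, 3.8), Word("okay", 3.8, 4.2),
]

for line in label_words(words, turns):
    print(line)
```

The hard parts (working out the turns from raw audio, handling people talking over each other, adding punctuation) are not covered here; this only shows how the two outputs could be stitched together once you have them.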
OOoo I love the idea of generating transcripts for notes of a call. That's a great usage!
I think this could probably be solved using AI, but it's not a trivial problem I feel like.
First off, non-developer talking so if I say anything off, please don't hesitate to correct me. 😅
But yes, that does sound tricky. Maybe there is some sort of algorithm out there that has already been trained to understand different voices that you could apply to this problem?
I was recently reading an article about how Get Back, the Beatles documentary, restored audio and video footage using machine learning, and there was a section that talked about how they trained the algorithm to tell the difference between Paul's and John's voices.
I don't imagine that they opened any of their breakthroughs in this area up for free use, but ya never know!
Ohh that's really interesting, I might need to look into that documentary and see if I can find a starting point!!