I'm a friendly, non-dev, cisgender guy from NC who enjoys playing music/making noise, hiking, eating veggies, and hanging out with my best friend/wife + our 3 kitties + 1 greyhound.
Not long ago, I had an idea for creating a sing-along styled transcript/captions service. I threw this thought out when we were hosting the Deepgram Hackathon...
But yeah, something that would be incredibly helpful to me is if there was a good transcript/captions service available to turn on when using a voice channel on Discord. I've been having mod meetups with trusted users and some of the folks don't speak English as their primary language, so it can be a bit difficult for them to keep up with the conversation and I know that captioning would be a huge help!
I actually looked for a solution and kinda found one outlined here but I'd love it if there was an app specifically for this!
Not a bad idea. Funny enough, we talked about a similar thing at work a couple of weeks back: generating transcripts from meetings in any app (discourse, zoom, etc...), so as to keep a written version of the meeting for referencing later.
One thing that we thought might be a bit complex was how to handle and differentiate the different people talking, and transcribe it properly.
Instead of getting:
Yeah what do you mean i was talking about roger oh okay
We would've liked to get something like:
P1: Yeah
P2: what do you mean?
P3: I was talking about roger
P2: Oh, okay
I think this could probably be solved using AI, but it's not a trivial problem I feel like.
Might need to experiment a bit with the idea, it would be really cool to have a generic tool to generate transcripts/captions/translation in real time from any video/audio...
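To make the target format concrete, here is a minimal Python sketch of just the last step: turning words that already carry a speaker label into "P1: ..." lines. It assumes some earlier speech-to-text step has already tagged each word with a speaker number (that is the hard, AI part and is not shown here). The `Word` class and `format_turns` function are hypothetical names made up for this illustration, not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    speaker: int  # hypothetical speaker index, assumed to come from an earlier diarization step
    text: str


def format_turns(words):
    """Group consecutive words from the same speaker into 'P<n>: ...' lines."""
    lines = []
    current_speaker = None
    current_words = []
    for word in words:
        # When the speaker changes, flush the turn collected so far.
        if word.speaker != current_speaker and current_words:
            lines.append(f"P{current_speaker}: {' '.join(current_words)}")
            current_words = []
        current_speaker = word.speaker
        current_words.append(word.text)
    # Flush the final turn, if there is one.
    if current_words:
        lines.append(f"P{current_speaker}: {' '.join(current_words)}")
    return lines


if __name__ == "__main__":
    # Hand-labelled sample matching the example conversation above.
    sample = [
        Word(1, "Yeah"),
        Word(2, "what"), Word(2, "do"), Word(2, "you"), Word(2, "mean?"),
        Word(3, "I"), Word(3, "was"), Word(3, "talking"), Word(3, "about"), Word(3, "roger"),
        Word(2, "Oh,"), Word(2, "okay"),
    ]
    for line in format_turns(sample):
        print(line)
```

The grouping itself is the easy part. Deciding which speaker said each word, and adding punctuation, would still have to come from whatever transcription service or model is used.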
OOoo I love the idea of generating transcripts for notes of a call. That's a great usage!
I think this could probably be solved using AI, but it's not a trivial problem I feel like.
First off, non-developer talking so if I say anything off, please don't hesitate to correct me. 😅
But yes, that does sound tricky. Maybe there is some sort of algorithm out there that has already been trained to understand different voices that you could apply to this problem?
I was recently reading an article about how Get Back, the Beatles documentary, restored audio and video footage using machine learning, and there was a section that talked about how they trained the algorithm to know the difference between Paul's and John's voices.
I don't imagine that they opened any of their breakthroughs in this area up for free use, but ya never know!
Oooo, I got a couple suggestions for ya!
Ohh that's really interesting, I might need to look into that documentary and see if I can find any starting point!!