DEV Community

Discussion on: I have the weekend free | Suggest weird ideas and I'll try to code them!!!

Michael Tharrington • Edited

Oooo, I got a couple suggestions for ya!

Not long ago, I had an idea for creating a sing-along styled transcript/captions service. I threw this thought out when we were hosting the Deepgram Hackathon...

But yeah, something that would be incredibly helpful to me is if there was a good transcript/captions service available to turn on when using a voice channel on Discord. I've been having mod meetups with trusted users and some of the folks don't speak English as their primary language, so it can be a bit difficult for them to keep up with the conversation and I know that captioning would be a huge help!

I actually looked for a solution and kinda found one outlined here but I'd love it if there was an app specifically for this!

Keff

Not a bad idea. Funnily enough, we talked about a similar thing at work a couple of weeks back: generating transcripts from meetings in any app (discourse, zoom, etc.), so as to keep a written version of the meeting for later reference.

One thing that we thought might be a bit complex was how to differentiate between the different people talking and transcribe each of them properly.

Instead of getting:

Yeah what do you mean i was talking about roger oh okay

We would've liked to get something like:

P1: Yeah
P2: what do you mean?
P3: I was talking about roger
P2: Oh, okay

I think this could probably be solved using AI, but it's not a trivial problem I feel like.
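To make the target format concrete, here is a rough Python sketch of just the formatting half of the problem. It assumes some earlier speech-to-text step has already tagged every word with a speaker index, which is the hard, AI-flavoured part and is not shown here. The `Word` class and the `format_turns` function are made-up names for illustration, not part of any real library.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    text: str
    speaker: int  # hypothetical speaker index, assumed to come from an earlier step


def format_turns(words):
    """Group consecutive words by speaker and emit one line per speaker turn."""
    lines = []
    # groupby only merges *adjacent* items with the same key, so a speaker
    # who talks again later gets a new line, which is what we want here.
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        sentence = " ".join(w.text for w in group)
        lines.append(f"P{speaker}: {sentence}")
    return "\n".join(lines)


# Hand-labelled version of the example conversation (labels are assumed).
words = [
    Word("Yeah", 1),
    Word("what", 2), Word("do", 2), Word("you", 2), Word("mean", 2),
    Word("I", 3), Word("was", 3), Word("talking", 3), Word("about", 3), Word("roger", 3),
    Word("oh", 2), Word("okay", 2),
]
print(format_turns(words))
```

Punctuation and capitalization would still need a separate pass, and everything depends on how reliable the speaker labels are in the first place.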

Might need to experiment a bit with the idea. It would be really cool to have a generic tool to generate transcripts/captions/translations in real time from any video/audio...

Michael Tharrington

OOoo I love the idea of generating transcripts for notes of a call. That's a great usage!

I think this could probably be solved using AI, but it's not a trivial problem I feel like.

First off, non-developer talking so if I say anything off, please don't hesitate to correct me. 😅

But yes, that does sound tricky. Maybe there is some sort of algorithm out there that has already been trained to understand different voices that you could apply to this problem?

I was recently reading an article about how Get Back, the Beatles documentary, restored audio and video footage using machine learning, and there was a section that talked about how they trained the algorithm to know the difference between Paul's and John's voices.

I don't imagine that they opened any of their breakthroughs in this area up for free use, but ya never know!

Keff

Ohh that's really interesting, I might need to look into that documentary and see if I can find a starting point!!