I have reached the stage in my programming journey where I truly hold in high respect those developers who meet deadlines, most especially when those deadlines are set by non-programmers based on their “idea” of how long a project ought to take.
As a voluntary project for my church, I am building a multi-functional application that is a text editor, an audio transcriber, and a search interface capable of both keyword and semantic search (and by His Grace, a text summarization capability will soon be added).
In a bid to spread the Truth of the Gospel of our Lord Jesus Christ, which has been almost overshadowed by false doctrines as forewarned by the apostle Paul in Galatians 1:6–9; 3:1–3 [the whole book of Galatians was written as a warning to the Church against believing people who preach false doctrines with fair speeches, see Romans 16:17–18], we are working on transcribing audio/video messages so that we can share them with more people.
Manually transcribing an audio/video message is tasking and extremely time consuming [an hour of audio/video can take over 6 hours, or for most people a whole day, to transcribe, and there are audio/video messages that are over 10 hours long, so try imagining how long that will take], even when Google Gboard's speech-to-text functionality is used.
In an attempt to make the process easier, I made my developer skills and intentions known. I was tasked with building the application, but told to focus only on implementing the audio/video transcriber function, as that is what is sorely needed, and I was given 2 weeks to complete it. I had it completed in 3 weeks, though I have recurring nightmares about hidden bugs in the application.
Most of the application's buttons are unresponsive, as their functionality has not been implemented yet, but I am seriously excited about my achievement: the application was able to transcribe a 10-hour audio in 2 hours, leaving to others the easy task of just punctuating the transcripts [something I am planning to automate using the interesting punctuation restoration models].
I would like to thank Wanderson M. Pimenta for making their wonderful PyQt GUI interface publicly available, without which I would definitely have been unable to build the interface quickly enough, and Martin Fitzpatrick for his amazing books on PyQt5, which served as excellent reference guides when implementing the transcriber functionality.
Finally, I thank Google and Houndify, because the application makes use of both the Google and Houndify speech-to-text APIs. I discovered that, while being two times slower than Google, Houndify tends to be more accurate. As a longer-term goal, though, I am aiming to make the application work completely offline, using deep learning models to power the speech-to-text conversion.
The application is truly amazing [at least from my perspective]! It converts any audio/video to .flac format. Since neither Google nor Houndify will let you dump a 10-hour or 5-hour video on their API to transcribe at once, I had to loop through the audio file, sending 100 seconds of audio to the API at a time and saving the returned text.
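For anyone curious, that kind of loop can be sketched roughly as below. This is not the application's actual code, only a minimal illustration: `chunk_spans`, `transcribe_in_chunks` and the `recognize_chunk` callback are hypothetical names, and the callback stands in for whatever function reads one slice of the .flac file and sends it to Google or Houndify.

```python
from typing import Callable, Iterator, Tuple


def chunk_spans(total_seconds: float,
                chunk_seconds: float = 100) -> Iterator[Tuple[float, float]]:
    """Yield (offset, duration) pairs that cover the whole recording."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    offset = 0.0
    while offset < total_seconds:
        # The last chunk is usually shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - offset)
        yield offset, duration
        offset += chunk_seconds


def transcribe_in_chunks(total_seconds: float,
                         recognize_chunk: Callable[[float, float], str],
                         chunk_seconds: float = 100) -> str:
    """Call recognize_chunk(offset, duration) per chunk and join the text.

    recognize_chunk is a hypothetical callback (an assumption, not a real
    library function): it should send that slice of audio to a
    speech-to-text API and return the recognized text.
    """
    pieces = []
    for offset, duration in chunk_spans(total_seconds, chunk_seconds):
        pieces.append(recognize_chunk(offset, duration))
    # Skip empty results (for example, silent chunks) when joining.
    return " ".join(piece for piece in pieces if piece)
```

Keeping the offset arithmetic separate from the API call makes it easy to check the chunking with a fake recognizer before spending any real API quota.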
Okay, I fibbed a little, unintentionally: the application won’t just transcribe your 10-hour audio/video for you either. It will let you dump it, but it won’t transcribe it. You have to break it into multiple 1-hour audio/video files before it will transcribe them. This is for memory concerns, as audio in .flac format can become extremely large and might crash systems that don’t have enough memory.
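The splitting itself does not have to be done by hand. As a rough sketch (not something the application currently does), the FFmpeg command lines for cutting a long recording into 1-hour .flac pieces could be generated like this. The function name `hour_split_commands`, the `part_001.flac` naming scheme and the example file name are all made up for illustration, and the total duration is assumed to be known already.

```python
from typing import List


def hour_split_commands(source: str, total_seconds: int,
                        piece_seconds: int = 3600,
                        stem: str = "part") -> List[List[str]]:
    """Build one ffmpeg command per piece; nothing is executed here."""
    if piece_seconds <= 0:
        raise ValueError("piece_seconds must be positive")
    commands = []
    start = 0
    index = 1
    while start < total_seconds:
        length = min(piece_seconds, total_seconds - start)
        commands.append([
            "ffmpeg",
            "-ss", str(start),    # seek to the start of this piece
            "-i", source,
            "-t", str(length),    # keep at most this many seconds
            "-vn",                # drop any video stream
            "-c:a", "flac",       # encode the audio as FLAC
            f"{stem}_{index:03d}.flac",  # hypothetical output name
        ])
        start += piece_seconds
        index += 1
    return commands


# Example: a 2.5-hour recording becomes three pieces.
for command in hour_split_commands("sermon.mp4", 9000):
    print(" ".join(command))
```

Each command list can then be passed to `subprocess.run(command, check=True)`, assuming the `ffmpeg` binary is available on the PATH.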
Whew, the issues I encountered! If not for Stack Overflow (Praise Be Upon It, PBUI), this application would not exist. The most stressful part was trying to compile my scripts to an .exe, and that was because of the FFmpeg binary I was using to convert audio/video files to the standard format. I solved this by modifying a third-party package I was using to convert the files programmatically.
Although the GUI leaves much to be desired, the primary goal is accomplished, and that is what matters! Now I am further tasked with building and hosting a database, plus a backup offline database [I will use PostgreSQL, it is the only DBMS I know], that will contain the transcripts, and with implementing a search functionality that can search the database using keyword/semantic search [now, I have no idea how to do that!]. I have been given another 2 weeks for this.
Hopefully, by the Grace of God, I will post about my progress in the next two weeks. Now let me get back to crazily coding and stalking Stack Overflow (PBUI).