Nkosi Phillip

Speech Recognition in action

I often listen to the Shop Talk podcast. It's one of my favourites. Episode 299 saw Paige Bailey on as a guest to speak about machine learning. The talk was inspiring and she advised beginners to jump right in with APIs. So, that's what I did.

I had the task of building a comment section for a page and decided to have users speak their comments instead of typing them. The Web Speech API was the perfect tool for the job. The API has two parts: Speech to Text (speech recognition) and Text to Speech (speech synthesis). We'll put the former to use. At the time of writing, the MDN docs state that this is experimental technology, which means browser support is limited and it may not work in every browser.

Comment Section

How it went down:

All of my logic lives inside a React component. First I need to create a speech recognition object. Chrome exposes the constructor under the webkit prefix, hence the fallback.

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = new SpeechRecognition()
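Since the API is experimental, the constructor may be missing altogether, in which case `new SpeechRecognition()` throws. Below is a minimal sketch of a guarded setup. The helper names `getSpeechRecognitionCtor` and `createRecognition` are hypothetical, and the `'en-US'` language is only an example value; `lang`, `interimResults` and `maxAlternatives` are standard properties of the recognition object.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor,
// or null when the environment does not provide one.
// `root` defaults to the global object so the helper can be tried outside a browser.
function getSpeechRecognitionCtor(root = globalThis) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null
}

// Hypothetical helper: creates a configured recognition object, or null if unsupported.
function createRecognition(root = globalThis) {
  const Ctor = getSpeechRecognitionCtor(root)
  if (!Ctor) return null

  const recognition = new Ctor()
  recognition.lang = 'en-US'          // assumption: example language tag
  recognition.interimResults = false  // only deliver final results
  recognition.maxAlternatives = 1     // one transcript per result is enough here
  return recognition
}
```

In the component, a `null` return value could be used to hide the button or show a "not supported" message instead of crashing.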

I am now able to call methods on this object when I need to. So, to start it up I'll attach an onClick handler to the button.

<button onClick={handleClick}>Click to speak</button>

Inside my handleClick I'll call the following method:

recognition.start()
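One thing to watch for: calling `start()` on a recognition object that is already listening throws an `InvalidStateError`, so a double click on the button can raise an exception. A minimal sketch of a guarded click handler follows. The factory name `makeHandleClick` is hypothetical and not part of the original component; `start()` and `onend` are standard.

```javascript
// Hypothetical factory: builds a click handler that ignores clicks
// while a recognition session is still running.
function makeHandleClick(recognition) {
  let listening = false

  // The end event fires when the service disconnects, whether or not it produced a result.
  recognition.onend = () => {
    listening = false
  }

  return function handleClick() {
    if (listening) return false // already recording, ignore the click
    try {
      recognition.start()
      listening = true
      return true
    } catch (err) {
      // start() can throw if the session is in an unexpected state
      console.error('Could not start speech recognition', err)
      return false
    }
  }
}
```

The handler returns `true` when a session was started, which makes the behaviour easy to check; the button would simply use it as `onClick={handleClick}`.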

Calling start() starts the speech recognition service, which begins listening for incoming audio. The next thing we need to do is define the onresult event handler. It does exactly what the name implies: when the user has finished speaking, the service returns a result and passes it to onresult as an event. The event holds a list of results, and each result holds a list of alternatives, so to get the speech as a string I go a few levels deep until I find 'transcript'.

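recognition.onresult = (e) => {
    // Index of the newest result in the results list
    const current = e.resultIndex
    // First (most likely) alternative of that result
    const transcript = e.results[current][0].transcript
    // Capitalise the first letter
    const upperCase = transcript.charAt(0).toUpperCase() + transcript.substring(1)
    postComment(upperCase)
    fetchComments()
}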
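The extraction and capitalisation steps in the handler above can also be pulled out into small pure functions, which makes them easy to try without a microphone. This is only a sketch: `transcriptFromEvent` and `capitalise` are hypothetical names, and the `trim()` call is an assumption on my part to guard against stray whitespace around the recognised text.

```javascript
// Hypothetical helper: reads the top alternative of the newest result.
function transcriptFromEvent(e) {
  const result = e.results[e.resultIndex]
  return result[0].transcript
}

// Hypothetical helper: trims the text and upper-cases its first character.
function capitalise(text) {
  const trimmed = text.trim() // assumption: surrounding whitespace is not wanted
  return trimmed.charAt(0).toUpperCase() + trimmed.slice(1)
}

// A hand-built object shaped like a result event, for illustration only.
const exampleEvent = {
  resultIndex: 0,
  results: [[{ transcript: ' hello from the microphone', confidence: 0.9 }]],
}

console.log(capitalise(transcriptFromEvent(exampleEvent)))
// prints "Hello from the microphone"
```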

Once I have the string, I capitalise its first letter and pass it to my postComment function, which sends it to the Firebase DB.

const postComment = (comment) => {
    // Create document object
    const doc = {
        title: comment
    }
    // Send object to Firebase DB and return the promise so callers can wait for the write
    return db.collection('comments').add(doc).then(() => {
        console.log(`${comment} added successfully to the database`)
    })
}

And there we are, almost done. The last thing we need to do is make sure the user can see their comment after recording it, without having to refresh the page. For this I call my fetchComments function, which sets state. And as we know, a state change triggers a re-render.

const fetchComments = () => {
    // Get all comments from Firebase DB
    db.collection('comments').get().then(snapshot => {
        const dbComments = snapshot.docs.map(item => item.data())
        setComments([...dbComments])
    })
}
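In the onresult handler shown earlier, fetchComments is called right after postComment without waiting for the write to finish. If the new comment ever fails to appear, one option is to wait for the write before reading. The sketch below shows that ordering with async/await. `postThenFetch` is a hypothetical name, and `createFakeDb` is a made-up in-memory stand-in that only imitates the three calls used in this post (`collection`, `add`, `get`), so the example runs without Firebase.

```javascript
// Hypothetical in-memory stand-in for the `db` object, for illustration only.
// It imitates just the calls used in this post; it is not the Firebase API.
function createFakeDb() {
  const collections = {}
  return {
    collection(name) {
      const docs = collections[name] || (collections[name] = [])
      return {
        // Resolves after the document has been stored
        add: (doc) => Promise.resolve().then(() => {
          docs.push(doc)
        }),
        // Resolves with a snapshot-like object exposing docs[i].data()
        get: () => Promise.resolve({
          docs: docs.map(d => ({ data: () => d })),
        }),
      }
    },
  }
}

// Hypothetical helper: write the comment first, then read the list back.
async function postThenFetch(db, comment) {
  await db.collection('comments').add({ title: comment })
  const snapshot = await db.collection('comments').get()
  return snapshot.docs.map(item => item.data())
}

postThenFetch(createFakeDb(), 'Hello there').then(comments => {
  console.log(comments.map(c => c.title))
})
```

In the real component the returned array would be handed to setComments instead of being logged.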

Problems I encountered:

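Working in JS you're probably accustomed to most names being camel case. Notice above that the speech recognition event handlers aren't: it's onresult, all lowercase, not onResult. Assigning a function to the wrong name throws no error; the handler simply never runs. This had me stuck for a while, not knowing why it wasn't working. So, be sure to write the names exactly as they appear in the docs.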
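Because a misspelled handler fails silently, a small guard can turn the mistake into a loud error during development. This is a sketch under my own assumptions: `attachHandlers` is a hypothetical helper, and the list contains the lowercase handler names documented for the speech recognition object.

```javascript
// Lowercase event handler names documented for the recognition object.
const KNOWN_HANDLERS = [
  'onaudiostart', 'onaudioend', 'onend', 'onerror', 'onnomatch', 'onresult',
  'onsoundstart', 'onsoundend', 'onspeechstart', 'onspeechend', 'onstart',
]

// Hypothetical helper: assigns handlers, but throws on unknown names
// such as the camel-cased "onResult".
function attachHandlers(recognition, handlers) {
  for (const [name, fn] of Object.entries(handlers)) {
    if (!KNOWN_HANDLERS.includes(name)) {
      const lower = name.toLowerCase()
      const hint = KNOWN_HANDLERS.includes(lower) ? ` Did you mean "${lower}"?` : ''
      throw new TypeError(`Unknown speech recognition handler "${name}".${hint}`)
    }
    recognition[name] = fn
  }
  return recognition
}
```

With this in place, `attachHandlers(recognition, { onResult: handler })` throws straight away with a hint pointing at `onresult`, rather than leaving a handler that never fires.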

Github repo

Try posting a comment...
Khabib Tribute

Happy Hacking!!!