Virtual Voice Assistant

Have you ever wondered how cool it would be to have your own virtual voice assistant? Imagine how much easier it would be to send emails without typing a single word, do Wikipedia searches without opening a web browser, and perform many other daily tasks, like playing music, with a single voice command.

A voice assistant is a digital assistant that uses voice recognition, language processing algorithms, and voice synthesis to listen to specific voice commands and return relevant information or perform specific functions as requested by the user.

Here we focus only on the operations the voice assistant performs in response to specific commands.

Based on specific commands spoken by the user, voice assistants can return relevant information by listening for specific keywords and filtering out ambient noise.

There are multiple approaches to building a voice assistant, but here we will focus on the task-oriented approach.

THE BEGINNING🙌

So let's start with some of the operations that can be performed by our voice assistant.

THE takeCommand()

import speech_recognition as sr

def takeCommand():
    # Listen on the microphone and return the recognized text.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1          # seconds of silence that end a phrase
        audio = r.listen(source)

    try:
        print("Recognizing...")
        # Send the audio to the Google Web Speech API (Indian English)
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")
    except Exception:
        print("Say that again please...")
        return "None"
    return query

In the takeCommand() function we give input to our voice assistant through the microphone, for which we use the SpeechRecognition and PyAudio modules in Python. The mechanics behind speech recognition are that speech must first be converted from physical sound to an electrical signal with a microphone, and then to digital data with an analog-to-digital converter. Once digitized, several models can be used to transcribe the audio to text.

In Python, multiple modules are available for speech recognition, such as:
1. apiai
2. assemblyai
3. google-cloud-speech
4. pocketsphinx
5. SpeechRecognition
6. watson-developer-cloud
7. wit

Some of these packages—such as wit and apiai—offer built-in features, like natural language processing for identifying a speaker’s intent, which go beyond basic speech recognition. Others, like google-cloud-speech, focus solely on speech-to-text conversion.

There is one package that stands out in terms of ease-of-use: SpeechRecognition.

Recognizing speech requires audio input, and SpeechRecognition makes retrieving this input really easy. Instead of having to build scripts for accessing microphones and processing audio files from scratch, SpeechRecognition will have you up and running in just a few minutes.
The SpeechRecognition library acts as a wrapper for several popular speech APIs and is thus extremely flexible. One of these, the Google Web Speech API, supports a default API key that is hard-coded into the SpeechRecognition library.
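
As a quick illustration of that flexibility, the same recognizer can transcribe a pre-recorded file instead of the microphone (the file name sample.wav below is a placeholder):

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:
    audio = r.record(source)              # read the whole file into memory

try:
    print(r.recognize_google(audio))      # uses the built-in default API key
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print(f"API request failed: {e}")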

Through the takeCommand() function, the user's query is extracted, and on that basis different functions are called and tasks get performed.

speak()

import pyttsx3

engine = pyttsx3.init()    # create the text-to-speech engine

def speak(audio):
    engine.say(audio)      # queue the text to be spoken
    engine.runAndWait()    # block until speaking finishes

The first and foremost thing for a virtual voice assistant is that it should be able to speak. To make our virtual assistant talk, we will make a function called speak(). This function takes audio as an argument (the text we intend our virtual assistant to say) and then pronounces it.
Here we have used the pyttsx3 module, a text-to-speech conversion library.

The above code initializes the pyttsx3 package. The instance of the initialized package is stored in the engine variable; we call it engine because it works as the engine that converts text to speech whenever we execute functions from the package.
There is a built-in say() function in the pyttsx3 package that takes a string value and speaks it out.
The engine.runAndWait() call blocks while the engine processes the queued commands and prevents the engine from closing early. Without it, the engine might not work properly, as the processes would not be synchronized.
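
pyttsx3 also lets you tune the voice before speaking; for example (the property values here are illustrative):

import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 150)            # speaking speed in words per minute
engine.setProperty('volume', 0.9)          # volume between 0.0 and 1.0
voices = engine.getProperty('voices')      # voices installed on the system
engine.setProperty('voice', voices[0].id)  # pick the first available voice
engine.say("Hello, I am your virtual assistant")
engine.runAndWait()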

wishMe() and location()

def wishMe():
    # Greet the user according to the current hour of the day.
    hour = datetime.datetime.now().hour
    if hour >= 0 and hour < 12:
        speak("Good Morning!")
        date()
    elif hour >= 12 and hour < 18:
        speak("Good Afternoon!")
        date()
    else:
        speak("Good Evening!")
        date()

    speak("Hello, I am your Virtual Assistant. Please tell me how may I help you")

These are some of the tasks performed by our voice assistant, which include greeting the user according to the computer's time. (This was implemented using the datetime module.)

One of the classes defined in the datetime module is the datetime class. We use its now() method to create a datetime object containing the current local date and time.
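
wishMe() also calls a date() helper that is not shown in this post; a minimal sketch of such a helper, using the same datetime class, could be:

def date():
    # Speak today's date, e.g. "Today is Monday, January 01, 2024"
    now = datetime.datetime.now()
    speak("Today is " + now.strftime("%A, %B %d, %Y"))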

def location(query):
    # Everything after the word "is" (as in "where is ...") is the place name.
    index = query.lower().split().index("is")
    location = query.split()[index + 1:]
    webbrowser.open("https://www.google.com/maps/place/" + "+".join(location))
    speak("Opening " + " ".join(location) + " on google maps")



The assistant can also show a location on Google Maps according to the search query.
Here we take the location from the query by finding the words said after "where is". From this we extract the location and search for it on the web.
Commonly used classes in the datetime module (illustrated after the list) are:

  • date Class
  • time Class
  • datetime Class
  • timedelta Class
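
A quick demonstration of all four classes:

import datetime

today = datetime.date.today()                  # date: today's calendar date
now = datetime.datetime.now()                  # datetime: date plus time
current_time = now.time()                      # time: just the time portion
tomorrow = today + datetime.timedelta(days=1)  # timedelta: a duration
print(today, current_time, now, tomorrow)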

Additional Tasks

  • search_yt()
  • search_google()
  • screenshot()
  • screen_record()
def search_yt(query):
    # Everything after the word "youtube" is the search term.
    index = query.lower().split().index("youtube")
    search = query.split()[index + 1:]
    webbrowser.open("http://www.youtube.com/results?search_query=" + "+".join(search))
    speak("Opening " + " ".join(search) + " on youtube")

def search_google(query):
    # Everything after the word "google" is the search term.
    index = query.lower().split().index("google")
    search = query.split()[index + 1:]
    webbrowser.open("https://www.google.com/search?q=" + "+".join(search))
    speak("Opening " + " ".join(search) + " on google")

Other tasks include searching for a playlist or video on YouTube, and searching for a location, book, celebrity, etc. on Google. For these operations we use the functions search_yt() and search_google(); both are browser-based operations.
These tasks are performed in a similar way to location().

To open any website, we need to import a module called webbrowser. It is a built-in module, so we do not need to install it with pip; we can import it directly into our program with an import statement.
The webbrowser module provides a basic interface to the system's standard web browser. It provides an open() function which takes a filename or a URL and displays it in the browser. If you call open() again, it attempts to display the new page in the same browser window.
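
For example:

import webbrowser

webbrowser.open("https://www.google.com")             # opens in the default browser
webbrowser.open_new_tab("https://stackoverflow.com")  # explicitly asks for a new tab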

def screenshot():
    # Grab the whole screen and save it; PIL's save() needs a file name
    # (the name "screenshot.png" here is our own choice).
    img = pyautogui.screenshot()
    img.save("screenshot.png")

Some of the OS-based operations include taking a screenshot, emptying the recycle bin, and recording the screen. These tasks were mostly done using the PyAutoGUI library.
For screen recording we take a (theoretically) infinite number of screenshots and combine them into a video. Here we use 60 frames per second for our video.

import cv2
import numpy as np
import pyautogui

def screen_record():
    resolution = (1920, 1080)                # must match the screen size
    codec = cv2.VideoWriter_fourcc(*"XVID")
    filename = "Recording.avi"
    fps = 60.0
    out = cv2.VideoWriter(filename, codec, fps, resolution)

    # Optional: a small preview window of the recording
    cv2.namedWindow("Live", cv2.WINDOW_NORMAL)
    cv2.resizeWindow("Live", 480, 270)

    while True:
        img = pyautogui.screenshot()
        frame = np.array(img)
        # pyautogui returns RGB; OpenCV expects BGR
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        out.write(frame)

        cv2.imshow('Live', frame)
        # Stop recording when 'q' is pressed
        if cv2.waitKey(1) == ord('q'):
            break

    out.release()
    cv2.destroyAllWindows()

PyAutoGUI lets your Python scripts control the mouse and keyboard to automate interactions with other applications.

PyAutoGUI has several features:

1. Moving the mouse and clicking or typing in the windows of other applications.
2. Sending keystrokes to applications (for example, to fill out forms).
3. Taking screenshots and, given an image (for example, of a button or checkbox), finding it on the screen.
4. Locating an application's window and moving, resizing, maximizing, minimizing, or closing it (Windows-only, currently).
5. Displaying message boxes for user interaction while your GUI automation script runs.
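
A few illustrative one-liners (the coordinates and text here are arbitrary examples):

import pyautogui

pyautogui.moveTo(100, 100, duration=0.5)  # glide the mouse to (100, 100)
pyautogui.click()                         # click at the current position
pyautogui.write("hello", interval=0.1)    # type into the focused window
img = pyautogui.screenshot()              # grab the whole screen as an image
pyautogui.alert("Automation finished!")   # pop up a simple message box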

Main()

if 'wikipedia' in query:
    speak('Searching Wikipedia...')
    query = query.replace("wikipedia", "")
    results = wikipedia.summary(query, sentences=2)
    speak("According to Wikipedia")
    print(results)
    speak(results)

elif 'open youtube' in query:
    webbrowser.open("youtube.com")

elif 'open google' in query:
    webbrowser.open("google.com")

elif 'open stackoverflow' in query:
    webbrowser.open("stackoverflow.com")

elif 'search in youtube' in query:
    search_yt(query)

elif 'search in google' in query:
    search_google(query)

elif 'play music' in query:
    music_dir = 'songs'    # replace with your own music directory
    songs = os.listdir(music_dir)
    print(songs)
    os.startfile(os.path.join(music_dir, songs[0]))
....continues


In the main function we have some additional operations, like searching for something on Wikipedia, which is executed with the help of the wikipedia Python module.

This Python library, called wikipedia, allows us to easily access and parse data from Wikipedia. In other words, you can also use it as a little scraper to pull limited information from Wikipedia.
Here we have used the summary() function for getting results from Wikipedia.
Executing wikipedia.summary(query, sentences=2) returns the summary of the desired article as a string.
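
For instance (the article title here is just an example):

import wikipedia

print(wikipedia.summary("Python (programming language)", sentences=2))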

It also includes a note function, which remembers important details to be accomplished later and reminds you of them. (This could be improved using the Google Calendar API.)
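
The note function itself is not shown in this post; here is a minimal sketch, assuming notes are simply appended to a local text file (the file name notes.txt is a placeholder):

def note(query):
    # Strip the trigger word and store the rest with a timestamp
    text = query.replace("note", "").strip()
    with open("notes.txt", "a") as f:
        f.write(f"{datetime.datetime.now()} :: {text}\n")
    speak("Noted. I will remind you about it.")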

Apart from this, we have also used the pyjokes module, which returns a joke when called, based on the chosen category and language.
Choose a language from the supported set below; you can also choose a category of joke.

Languages Supported By Pyjokes:

  • English – ‘en’
  • German – ‘de’
  • Spanish – ‘es’
  • Galician – ‘gl’
  • Basque – ‘eu’
  • Italian – ‘it’

Categories Included In Pyjokes:

  • For geeky jokes – 'neutral' (chosen by default)
  • For Chuck Norris jokes – 'chuck'
  • If you want all types of jokes – 'all'
  • There is one more category, 'twister', which only works for German ('de'). It mostly includes tongue twisters.
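
Putting the language and category options together, fetching a joke is a one-liner:

import pyjokes

print(pyjokes.get_joke(language="en", category="neutral"))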

Some APIs used

  • Twilio: used for sending and receiving text messages.
  • News API: used to get the current news from more than 50 countries.
  • Wolfram Alpha API: used for answering all kinds of queries.
  • Weather API: used to get the current weather of any city.
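
As an illustration of one of these, here is a minimal sketch of a Wolfram Alpha query, assuming the wolframalpha Python client and your own APP_ID from the developer portal:

import wolframalpha

client = wolframalpha.Client("YOUR_APP_ID")  # replace with your own app id
res = client.query("What is the capital of France?")
print(next(res.results).text)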

Conclusion

So the main function is a combination of several "elif" statements, which get executed based on particular keywords in the query.

Many people will argue that the virtual assistant we have created is not A.I., but just the output of a bunch of if/elif statements. But at a fundamental level, the sole purpose of A.I. is to develop machines that can perform human tasks with the same effectiveness as humans, or even more effectively.

We have finally built our own voice assistant. Going forward, we will add more functionality so it can perform more tasks.

That is pretty much it.

I hope this helps. Cheers!

For the complete code of the virtual voice assistant, visit my GitHub profile.
