Dmitriy Zub ☀️ for SerpApi

Posted on • Originally published at serpapi.com

Build Simple CLI-Based Voice Assistant with PyAudio, Speech Recognition, pyttsx3 and SerpApi

Intro

As the title says, this is a demo project: a very basic voice-assistant script that answers your questions in the terminal based on Google Search results.

You can find the full code in the GitHub repository: dimitryzub/serpapi-demo-projects/speech-recognition/cli-based/

The follow-up blog post(s) will be about:

  • Web-based solution using Flask, some HTML, CSS and Javascript.
  • Android & Windows based solution using Flutter and Dart.

What we will build in this blog post

💡Click on the image to open demo video.


SerpApi Voice Assistant Demo Usage Example

Prerequisites

First, let's make sure we are working in a separate virtual environment and have properly installed the libraries the project needs. The hardest part will (possibly) be installing pyaudio.

Virtual Environment and Libraries Installation

Before we start installing libraries, we need to create and activate a new environment for this project:

# if you're on Linux based systems
$ python -m venv env && source env/bin/activate
$ (env) <path>

# if you're on Windows and using Bash terminal
$ python -m venv env && source env/Scripts/activate
$ (env) <path>

# if you're on Windows and using CMD
python -m venv env && .\env\Scripts\activate
$ (env) <path>
Explanation

  • python -m venv env tells Python to run the module (-m) venv and create a virtual environment in a folder called env.
  • && chains two commands: the second one runs only if the first one succeeds.
  • source <venv_name>/bin/activate activates your environment, so libraries are installed only in that environment.
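
To confirm that the environment is actually active before installing anything, a quick check from Python can help. This is a minimal sketch and not part of the original project. It relies only on the standard library: inside an environment created by venv, sys.prefix and sys.base_prefix differ. The function name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment detected; pip would install outside an isolated environment.")
```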

Now install all needed libraries:

pip install rich pyttsx3 SpeechRecognition google-search-results python-dotenv

Now to pyaudio. Keep in mind that pyaudio may throw an error during installation, so some additional research may be needed on your end.

If you're on Linux (the command below uses apt-get, so a Debian-based distribution such as Ubuntu), you need to install some development dependencies before installing pyaudio:

$ sudo apt-get install -y libasound-dev portaudio19-dev
$ pip install pyaudio

If you're on Windows, it's simpler (tested with CMD and Git Bash):

pip install pyaudio
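
Since pyaudio (or any other package) may fail to install silently in the wrong environment, it can be worth checking that every module the script imports is available. The sketch below is an illustration only: the helper missing_modules and the REQUIRED_MODULES list are hypothetical names, and the list simply mirrors the imports used in this post plus pyaudio.

```python
from importlib.util import find_spec

# Import names differ from the pip package names in a few cases:
# SpeechRecognition -> speech_recognition, google-search-results -> serpapi,
# python-dotenv -> dotenv.
REQUIRED_MODULES = ["rich", "pyttsx3", "speech_recognition", "serpapi", "dotenv", "pyaudio"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported as missing, re-run the matching pip install command with the virtual environment activated.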

Full Code

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

load_dotenv('.env')
console = Console()

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')

    recognizer = speech_recognition.Recognizer()

    while True:
        with console.status(status='Listening to you...', spinner='point') as progress_bar:
            try:
                with speech_recognition.Microphone() as mic:
                    recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                    audio = recognizer.listen(mic)

                    text = recognizer.recognize_google(audio_data=audio).lower()
                    console.print(f'[bold]Recognized text[/bold]: {text}')

                    progress_bar.update(status='Looking for answers...', spinner='line')
                    params = {
                        'api_key': os.getenv('API_KEY'),
                        'device': 'desktop',
                        'engine': 'google',
                        'q': text,
                        'google_domain': 'google.com',
                        'gl': 'us',
                        'hl': 'en'
                    }

                    search = GoogleSearch(params)
                    results = search.get_dict()

                    try:
                        if 'answer_box' in results:
                            answer_box = results['answer_box']
                            # try the 'answer' field first, then 'result', then 'list'
                            if 'answer' in answer_box:
                                primary_answer = answer_box['answer']
                            elif 'result' in answer_box:
                                primary_answer = answer_box['result']
                            else:
                                primary_answer = answer_box['list']  # raises KeyError if absent
                            console.print(f'[bold]The answer is[/bold]: {primary_answer}')

                        elif 'knowledge_graph' in results:
                            secondary_answer = results['knowledge_graph']['description']
                            console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
                        else:
                            # neither block is present -> handled by `except KeyError` below
                            raise KeyError('no answer found in results')
                        progress_bar.stop() # if the answer was found -> stop progress bar.

                        user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

                        if user_prompt_to_continue_if_answer_is_success == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
                    except KeyError:
                        progress_bar.stop()

                        error_user_prompt = input("Sorry, didn't find the answer. Would you like to rephrase it? (y/n) ")

                        if error_user_prompt == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break

            except speech_recognition.UnknownValueError:
                progress_bar.stop()
                user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

                if user_prompt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue # run speech recognition again until the user answers 'n'
                else:
                    progress_bar.stop()
                    console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                    break


if __name__ == '__main__':
    main()

Code Explanation

Import libraries:

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

| Library | Purpose |
|---------|---------|
| rich | Python library for beautiful formatting in the terminal. |
| pyttsx3 | Python text-to-speech converter that works offline. |
| SpeechRecognition | Python library to convert speech to text. |
| google-search-results | SerpApi's Python API wrapper that parses data from 15+ search engines. |
| os | To read the secret environment variable. In this case it's the SerpApi API key. |
| dotenv | To load your environment variable(s) (the SerpApi API key) from the .env file. The file can be given another name, for example .napoleon, as long as the same name is passed to load_dotenv(). The leading dot only makes the file hidden on Unix-like systems. |
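
The full code imports pyttsx3 but never calls it, so the answers are only printed. As a minimal sketch of how an answer could also be read aloud, the helper below uses the standard pyttsx3 calls init(), say() and runAndWait(). The function name speak and its optional engine parameter are assumptions added for illustration; they are not part of the original script.

```python
def speak(text, engine=None):
    """Read `text` aloud; `engine` is anything exposing say() and runAndWait()."""
    if engine is None:
        # Imported lazily so the helper can be exercised without audio hardware.
        import pyttsx3
        engine = pyttsx3.init()
    engine.say(text)      # queue the sentence
    engine.runAndWait()   # block until the queued speech has been played
```

In the script this could be called right after an answer is printed, for example speak(f'The answer is {primary_answer}').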

Create a rich Console() instance. It will be used to prettify the terminal output (animations, etc.):

console = Console()

Define the main function, where everything happens:

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')

    recognizer = speech_recognition.Recognizer()

At the beginning of the function we create a speech_recognition.Recognizer() instance, and console.rule produces the following output:

───────────────────────────────────── SerpApi Voice Assistant Demo Project ─────────────────────────────────────

The next step is to create a while loop that will be constantly listening for microphone input to recognize the speech:

while True:
    with console.status(status='Listening to you...', spinner='point') as progress_bar:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                audio = recognizer.listen(mic)

                text = recognizer.recognize_google(audio_data=audio).lower()
                console.print(f'[bold]Recognized text[/bold]: {text}')

| Code | Explanation |
|------|-------------|
| console.status | A rich status indicator (spinner); it's used only for cosmetic purposes. |
| speech_recognition.Microphone() | Starts picking up input from the microphone. |
| recognizer.adjust_for_ambient_noise | Calibrates the energy threshold to the ambient noise level. |
| recognizer.listen | Records the user's speech. |
| recognizer.recognize_google | Performs speech recognition using the Google Speech Recognition API. lower() converts the recognized text to lowercase. |
| console.print | A rich print statement that supports text styling, such as bold, italic and so on. |
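
The three recognizer calls from the table can be wrapped into one small function, which keeps the main loop shorter and makes the step easy to try with stand-in objects. This is a sketch under assumptions: the name transcribe_once is hypothetical, and the default of 0.5 seconds for calibration is a choice made here (the article's code uses 0.1).

```python
def transcribe_once(recognizer, source, calibrate_seconds=0.5):
    """Calibrate, record one phrase from `source`, and return lowercase text."""
    # Sample ambient noise first so the energy threshold fits the room.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    audio = recognizer.listen(source)
    # recognize_google raises UnknownValueError when the speech is unintelligible,
    # so the caller still needs the except block used in the article.
    return recognizer.recognize_google(audio_data=audio).lower()
```

With the real library it would be used as transcribe_once(speech_recognition.Recognizer(), mic) inside the with speech_recognition.Microphone() as mic: block.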

spinner='point' will produce the following output (use python -m rich.spinner to see list of spinners):

rich-loading-progress

After that, we need to define the SerpApi search parameters:

progress_bar.update(status='Looking for answers...', spinner='line')
params = {
    'api_key': os.getenv('API_KEY'),  # serpapi api key
    'device': 'desktop',              # device used for the search
    'engine': 'google',               # serpapi parsing engine: https://serpapi.com/status
    'q': text,                        # search query
    'google_domain': 'google.com',    # google domain:          https://serpapi.com/google-domains
    'gl': 'us',                       # country of the search:  https://serpapi.com/google-countries
    'hl': 'en'                        # language of the search: https://serpapi.com/google-languages
    # other parameters such as locations: https://serpapi.com/locations-api
}

search = GoogleSearch(params)         # where data extraction happens on the SerpApi backend
results = search.get_dict()           # JSON -> Python dict

progress_bar.update will, well, update progress_bar with a new status (text printed in the console), and spinner='line' will produce the following animation:

rich-loading-line-progress

After that, the data extraction happens from Google search using SerpApi's Google Search Engine API.
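
Note that os.getenv('API_KEY') returns None when the variable is missing, and the request would then be sent without a key. A minimal sketch of building the same parameters with an early check is shown below. The helper name build_search_params and the RuntimeError message are assumptions for illustration; the parameter values are the ones used in this post.

```python
import os


def build_search_params(query, api_key=None):
    """Return SerpApi Google search parameters for `query`."""
    api_key = api_key or os.getenv("API_KEY")
    if not api_key:
        # Fail early instead of sending a request without a key.
        raise RuntimeError("API_KEY is not set; add it to .env or the environment")
    return {
        "api_key": api_key,
        "device": "desktop",
        "engine": "google",
        "q": query,
        "google_domain": "google.com",
        "gl": "us",
        "hl": "en",
    }
```

The returned dict can be passed to GoogleSearch(...) exactly like params in the code above.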

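The following part of the code looks for an answer in the answer box first and in the knowledge graph second, prints it, and asks whether to search again. If no answer is found, the KeyError handler asks whether to rephrase the question: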

try:
    if 'answer_box' in results:
        answer_box = results['answer_box']
        # try the 'answer' field first, then 'result', then 'list'
        if 'answer' in answer_box:
            primary_answer = answer_box['answer']
        elif 'result' in answer_box:
            primary_answer = answer_box['result']
        else:
            primary_answer = answer_box['list']  # raises KeyError if absent
        console.print(f'[bold]The answer is[/bold]: {primary_answer}')

    elif 'knowledge_graph' in results:
        secondary_answer = results['knowledge_graph']['description']
        console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
    else:
        # neither block is present -> handled by `except KeyError` below
        raise KeyError('no answer found in results')
    progress_bar.stop()  # if the answer was found -> stop progress bar

    user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

    if user_prompt_to_continue_if_answer_is_success == 'y':
        recognizer = speech_recognition.Recognizer()
        continue         # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
except KeyError:
    progress_bar.stop()  # if the answer wasn't found -> stop progress bar

    error_user_prompt = input("Sorry, didn't find the answer. Would you like to rephrase it? (y/n) ")

    if error_user_prompt == 'y':
        recognizer = speech_recognition.Recognizer()
        continue         # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
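
The answer lookup can also be written as a small pure function that returns None instead of relying on KeyError, which makes it easy to test with sample dictionaries. This is a sketch, not the author's implementation: extract_answer is a hypothetical name, and it only uses the result fields that appear in this post's code ('answer', 'result', 'description', 'list'), tried in that order.

```python
def extract_answer(results):
    """Return the first available answer from a SerpApi results dict, or None."""
    answer_box = results.get("answer_box", {})
    knowledge_graph = results.get("knowledge_graph", {})
    candidates = (
        answer_box.get("answer"),
        answer_box.get("result"),
        knowledge_graph.get("description"),
        answer_box.get("list"),
    )
    # Pick the first candidate that is present and non-empty.
    return next((candidate for candidate in candidates if candidate), None)
```

The caller then only needs one check: if the function returns None, ask the user to rephrase the question.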

The final step is to handle the error raised when the speech could not be recognized, for example when no intelligible sound was picked up from the microphone:

# while True:
#     with console.status(status='Listening to you...', spinner='point') as progress_bar:
#         try:
#             speech recognition code
#             data extraction code
        except speech_recognition.UnknownValueError:
            progress_bar.stop()         # if the speech wasn't understood -> stop progress bar
            user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

            if user_prompt_to_continue == 'y':
                recognizer = speech_recognition.Recognizer()
                continue                # run speech recognition again until the user answers 'n'
            else:
                progress_bar.stop()     # if the user wants to quit -> stop progress bar
                console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                break

console.rule() will provide the following output:

───────────────────── Thank you for checking SerpApi Voice Assistant Demo Project ─────────────────────
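
The script asks the same kind of y/n question in three places. One possible way to avoid repeating that logic is a small helper like the one below. It is a sketch with assumed names (ask_yes_no, input_fn), and it behaves slightly differently from the code above: instead of treating every reply other than 'y' as "no", it keeps asking until it gets 'y' or 'n'.

```python
def ask_yes_no(question, input_fn=input):
    """Ask `question` until the reply is 'y' or 'n'; return True for 'y'."""
    while True:
        reply = input_fn(f"{question} (y/n) ").strip().lower()
        if reply in ("y", "n"):
            return reply == "y"
        print("Please answer with 'y' or 'n'.")
```

In the main loop this would read as: if ask_yes_no('Would you like to search for something again?'): continue, followed by the closing console.rule() call and break. Passing input_fn makes the helper testable without a keyboard.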

Finally, add the if __name__ == '__main__' idiom, which prevents main from running when the file is imported as a module rather than executed directly, and call the main function, which runs the whole script:

if __name__ == '__main__':
    main()

Links

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
