<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dilek Karasoy</title>
    <description>The latest articles on DEV Community by Dilek Karasoy (@dilek).</description>
    <link>https://dev.to/dilek</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F873456%2Fb4a48728-c252-4618-b77c-0e08a4a77a05.jpeg</url>
      <title>DEV Community: Dilek Karasoy</title>
      <link>https://dev.to/dilek</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dilek"/>
    <language>en</language>
    <item>
      <title>Build your own Krisp App: Noise Cancellation with Python</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Wed, 16 Aug 2023 18:43:45 +0000</pubDate>
      <link>https://dev.to/picovoice/build-your-own-krisp-app-noise-cancellation-with-python-34fi</link>
      <guid>https://dev.to/picovoice/build-your-own-krisp-app-noise-cancellation-with-python-34fi</guid>
<description>&lt;p&gt;Krisp became popular with COVID-19: removing background noise improves the quality of virtual meetings. It's easy to download the Krisp app and start using it. How about building one yourself?&lt;/p&gt;

&lt;p&gt;Enter &lt;a href="https://picovoice.ai/platform/koala/"&gt;Koala Noise Suppression&lt;/a&gt;.&lt;br&gt;
It takes only a few lines of Python to embed it into your app. The &lt;a href="https://picovoice.ai/docs/quick-start/koala-python/"&gt;Koala Noise Suppression Python SDK&lt;/a&gt; supports Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson. Koala can also run on mobile and web!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the SDK
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pvkoala
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Import the package
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pvkoala
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Create an instance with your &lt;code&gt;AccessKey&lt;/code&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handle = pvkoala.create(access_key)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your &lt;code&gt;AccessKey&lt;/code&gt; from &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt; for free and replace the placeholder with it!&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Suppress Noise!
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;enhanced_pcm = handle.process(pcm)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Voila! You're done!&lt;/p&gt;
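The four steps above can be combined into a minimal file-based sketch. This is an illustration, not official sample code: the file names and the `frames` helper are ours, it assumes a 16-bit mono WAV input, and `pvkoala`'s `frame_length`, `sample_rate`, `process`, and `delete` are used as described in the Koala Python quick start.

```python
import struct
import wave


def frames(pcm, frame_length):
    # Split a list of PCM samples into fixed-size frames, dropping the ragged tail.
    return [pcm[i:i + frame_length]
            for i in range(0, len(pcm) - frame_length + 1, frame_length)]


def denoise_wav(input_path, output_path, access_key):
    # Illustrative end-to-end flow; assumes a 16-bit mono WAV input.
    import pvkoala  # deferred so the helper above works without the SDK installed

    koala = pvkoala.create(access_key=access_key)
    try:
        with wave.open(input_path, 'rb') as f:
            pcm = list(struct.unpack('%dh' % f.getnframes(),
                                     f.readframes(f.getnframes())))
        enhanced = []
        for frame in frames(pcm, koala.frame_length):
            enhanced.extend(koala.process(frame))
        with wave.open(output_path, 'wb') as f:
            f.setparams((1, 2, koala.sample_rate, 0, 'NONE', 'NONE'))
            f.writeframes(struct.pack('%dh' % len(enhanced), *enhanced))
    finally:
        koala.delete()
```

Usage would look like `denoise_wav('noisy.wav', 'clean.wav', access_key)` with the `AccessKey` from your Console account.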




&lt;p&gt;Additional Resources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/Picovoice/koala/tree/main/demo/python"&gt;Open-source demo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Other Koala &lt;a href="https://picovoice.ai/docs/koala/"&gt;Noise Suppression SDKs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;More about &lt;a href="https://picovoice.ai/company/"&gt;Picovoice&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Speaker Identification for Streaming with Python</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Fri, 11 Aug 2023 18:52:12 +0000</pubDate>
      <link>https://dev.to/picovoice/speaker-identification-for-streaming-with-python-c8n</link>
      <guid>https://dev.to/picovoice/speaker-identification-for-streaming-with-python-c8n</guid>
<description>&lt;p&gt;Unless they work for large enterprises, developers have had open-source speaker recognition as their only option. Recently, at Picovoice we made our &lt;a href="https://picovoice.ai/blog/speaker-recognition-for-developers/"&gt;internal tool for Speaker Recognition public&lt;/a&gt;. So, those who prefer a production-ready solution now have one! &lt;/p&gt;

&lt;p&gt;Let's get started!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install &lt;a href="https://picovoice.ai/platform/eagle/"&gt;Picovoice Eagle Speaker Recognition&lt;/a&gt;&lt;/strong&gt; using &lt;code&gt;pip&lt;/code&gt;. We will be using &lt;a href="https://picovoice.ai/company/#tools"&gt;pvrecorder&lt;/a&gt; for cross-platform audio capture, so install that as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 install pveagle pvrecorder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Grab your &lt;code&gt;AccessKey&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
If you haven't already, create an account on &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt; for free and grab your &lt;code&gt;AccessKey&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Enroll Speakers&lt;/strong&gt;&lt;br&gt;
Import &lt;code&gt;pveagle&lt;/code&gt; and create an instance of the &lt;code&gt;EagleProfiler&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pveagle

access_key = "{YOUR_ACCESS_KEY}"
try:
    eagle_profiler = pveagle.create_profiler(access_key=access_key)
except pveagle.EagleError as e:
    # Handle error
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't forget to replace the placeholder with your &lt;code&gt;AccessKey&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;Now, import &lt;code&gt;pvrecorder&lt;/code&gt; and create an instance of the recorder as well. Use the &lt;code&gt;EagleProfiler&lt;/code&gt;'s &lt;code&gt;.min_enroll_samples&lt;/code&gt; as the &lt;code&gt;frame_length&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pvrecorder import PvRecorder

DEFAULT_DEVICE_INDEX = -1
recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle_profiler.min_enroll_samples)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.enroll()&lt;/code&gt; function returns a percentage value indicating enrollment progress. Once it reaches 100%, the profile is complete and another speaker can be enrolled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recorder.start()

enroll_percentage = 0.0
while enroll_percentage &amp;lt; 100.0:
    audio_frame = recorder.read()
    enroll_percentage, feedback = eagle_profiler.enroll(audio_frame)

recorder.stop()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Export the &lt;code&gt;Speaker Profile&lt;/code&gt;&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;We'll need it in the next step to identify / verify the speaker!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;speaker_profile = eagle_profiler.export()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can reuse the &lt;code&gt;speaker_profile&lt;/code&gt; object. Check out the &lt;a href="https://picovoice.ai/docs/api/eagle-python/"&gt;docs&lt;/a&gt; for details. &lt;/p&gt;

&lt;p&gt;Add more speakers by creating additional profiles: call the &lt;code&gt;.reset()&lt;/code&gt; function on the &lt;code&gt;EagleProfiler&lt;/code&gt; and repeat the &lt;code&gt;.enroll()&lt;/code&gt; step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Clean up used resources:&lt;/strong&gt;&lt;br&gt;
Once you create profiles for all speakers, let's clean up used resources!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recorder.delete()
eagle_profiler.delete()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Recognize Speakers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pveagle

access_key = "{YOUR_ACCESS_KEY}"
profiles = [speaker_profile_1, speaker_profile_2]
try:
    eagle = pveagle.create_recognizer(
        access_key=access_key,
        speaker_profiles=profiles)
except pveagle.EagleError as e:
    # Handle error
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Set up `pvrecorder` to use with Eagle Speaker Recognition:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle.frame_length)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pass audio frames into the `eagle.process()` function get back speaker scores:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;while True:
    audio_frame = recorder.read()
    scores = eagle.process(audio_frame)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
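Each call to `eagle.process()` returns one score per enrolled profile. As a hedged sketch, a small helper can turn that score list into a decision; the `best_speaker` name and the 0.5 threshold are illustrative choices of ours, not part of the Eagle SDK.

```python
def best_speaker(scores, threshold=0.5):
    # Pick the index of the highest-scoring enrolled speaker;
    # return None when no score clears the (illustrative) threshold.
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None
```

Inside the recognition loop you could then map `best_speaker(scores)` back to the matching enrolled speaker's name.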

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When finished, again clean up used resources:


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recorder.delete()
eagle.delete()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


## Connect them All Together

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pveagle
from pvrecorder import PvRecorder

DEFAULT_DEVICE_INDEX = -1
access_key = "{YOUR_ACCESS_KEY}"

# Step 1: Enrollment
try:
    eagle_profiler = pveagle.create_profiler(access_key=access_key)
except pveagle.EagleError as e:
    # Handle error
    pass

enroll_recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle_profiler.min_enroll_samples)

enroll_recorder.start()

enroll_percentage = 0.0
while enroll_percentage &amp;lt; 100.0:
    audio_frame = enroll_recorder.read()
    enroll_percentage, feedback = eagle_profiler.enroll(audio_frame)

enroll_recorder.stop()

speaker_profile = eagle_profiler.export()

enroll_recorder.delete()
eagle_profiler.delete()

# Step 2: Recognition
try:
    eagle = pveagle.create_recognizer(
        access_key=access_key,
        speaker_profiles=[speaker_profile])
except pveagle.EagleError as e:
    # Handle error
    pass

recognizer_recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle.frame_length)

recognizer_recorder.start()

while True:
    audio_frame = recognizer_recorder.read()
    scores = eagle.process(audio_frame)
    print(scores)

recognizer_recorder.stop()

recognizer_recorder.delete()
eagle.delete()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


---
For more information:

- [Check the original article],(https://picovoice.ai/blog/speaker-recognition-in-python/)
- Learn more about Speaker [Recognition](https://picovoice.ai/blog/speaker-recognition/), [Identification](https://picovoice.ai/blog/speaker-identification/) and [Verification](https://picovoice.ai/blog/voice-biometrics/),
- [Check out other Eagle Speaker Recognition SDKs](https://picovoice.ai/docs/eagle/)
- Visit [Eagle Speaker Recognition GitHub repository](https://github.com/Picovoice/eagle) for open-source demos and to create issues. While on GitHub, if you like building with Eagle Speaker Recognition, give it a star and help fellow devs find it easily.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>machinelearning</category>
      <category>programming</category>
      <category>python</category>
      <category>challenge</category>
    </item>
    <item>
      <title>Live Audio Transcription with Python For Free</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Thu, 10 Aug 2023 15:07:57 +0000</pubDate>
      <link>https://dev.to/picovoice/live-audio-transcription-with-python-for-free-jff</link>
      <guid>https://dev.to/picovoice/live-audio-transcription-with-python-for-free-jff</guid>
<description>&lt;p&gt;Unlike cloud speech-to-text APIs, &lt;a href="https://picovoice.ai/platform/cheetah/"&gt;Cheetah Streaming Speech-to-Text&lt;/a&gt; processes speech data locally, on-device. Thus, it has an inherent speed advantage over cloud speech-to-text APIs: no cloud API can eliminate network &lt;a href="https://picovoice.ai/blog/latency-in-speech-recognition/"&gt;latency&lt;/a&gt;. They can be fast, but only to a certain degree. &lt;/p&gt;

&lt;p&gt;Let's learn to convert live audio to text using the Picovoice &lt;a href="https://picovoice.ai/docs/quick-start/cheetah-python/"&gt;Cheetah Streaming Speech-to-Text Python SDK&lt;/a&gt;, so you can see for yourself. &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;1. Install &lt;a href="https://picovoice.ai/docs/quick-start/cheetah-python/"&gt;Cheetah Streaming Speech-to-Text Python SDK&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pvcheetah
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Grab your &lt;code&gt;AccessKey&lt;/code&gt; from &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
If you do not have an existing Picovoice Console Account, create one in minutes. No credit card is required. You can enjoy the Forever-Free Plan, as the name suggests, forever!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Import &lt;a href="https://picovoice.ai/platform/cheetah/"&gt;Cheetah Streaming Speech-to-Text&lt;/a&gt; package:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pvcheetah
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Create an instance of the speech-to-text object with your &lt;code&gt;AccessKey&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handle = pvcheetah.create(access_key)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't forget to replace the placeholder with your &lt;code&gt;AccessKey&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Implement audio recording.&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://picovoice.ai/platform/cheetah/"&gt;Cheetah Streaming Speech-to-Text&lt;/a&gt; processes audio whether it comes from a microphone or another program. &lt;/p&gt;

&lt;p&gt;For the following, we assume there is a function available that provides the next audio chunk (frame), as below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_next_audio_frame():
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Convert live audio to text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;while True:
    partial_transcript, is_endpoint = handle.process(get_next_audio_frame())
    if is_endpoint:
        final_transcript = handle.flush()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! In 5 simple steps, you can get live audio converted into text!&lt;/p&gt;
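To turn the loop above into a runnable script, a microphone source such as `pvrecorder` can stand in for `get_next_audio_frame()`. The sketch below is an illustration under stated assumptions: the device index, the endpoint handling, and the `assemble_transcript` helper are ours, while `pvcheetah.create`, `frame_length`, `process`, `flush`, and `delete` follow the Cheetah Python docs.

```python
def assemble_transcript(segments):
    # Join finalized utterance segments into one transcript, skipping empties.
    return " ".join(s for s in segments if s)


def transcribe_live(access_key):
    # Illustrative wiring of pvcheetah + pvrecorder; imports are deferred
    # so the pure helper above stays usable without the SDKs installed.
    import pvcheetah
    from pvrecorder import PvRecorder

    handle = pvcheetah.create(access_key=access_key)
    recorder = PvRecorder(device_index=-1, frame_length=handle.frame_length)
    recorder.start()
    segments = []
    try:
        while True:
            partial_transcript, is_endpoint = handle.process(recorder.read())
            print(partial_transcript, end='', flush=True)
            if is_endpoint:
                segments.append(handle.flush())
    finally:
        recorder.delete()
        handle.delete()
    return assemble_transcript(segments)
```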




&lt;p&gt;For more information, you can check&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://picovoice.ai/blog/real-time-transcription-in-python/"&gt;the original article&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;&lt;a href="https://picovoice.ai/blog/real-time-transcription-live-streaming/"&gt;how to evaluate live audio transcription engines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://picovoice.ai/blog/on-prem-speech-to-text/"&gt;on-prem deployment options&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://picovoice.ai/blog/top-transcription-engines/"&gt;the comparison of popular speech-to-text engines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://picovoice.ai/blog/local-speech-to-text-with-cloud-level-accuracy/"&gt;advantages of local speech-to-text&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>challenge</category>
    </item>
    <item>
      <title>Voice AI with Raspberry Pi - ReSpeaker</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Wed, 15 Mar 2023 02:53:48 +0000</pubDate>
      <link>https://dev.to/picovoice/voice-ai-with-raspberry-pi-respeaker-46n2</link>
      <guid>https://dev.to/picovoice/voice-ai-with-raspberry-pi-respeaker-46n2</guid>
      <description>&lt;p&gt;For day 44 we'll use &lt;a href="https://picovoice.ai/platform/porcupine/"&gt;Porcupine Wake Word&lt;/a&gt; and Seeed Studio ReSpeaker. &lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/GqxWHoQ560g"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Let's get started!&lt;br&gt;
&lt;strong&gt;1. Installation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follow the instructions on &lt;a href="https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/"&gt;Seeed Studio&lt;/a&gt; to install and set up the microphone array.&lt;/li&gt;
&lt;li&gt;Install the demo:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo pip3 install ppnrespeakerdemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://picovoice.ai/platform/porcupine/"&gt;Porcupine Wake Word&lt;/a&gt; requires a valid &lt;code&gt;AccessKey&lt;/code&gt; at initialization If you haven't already, grab your AccessKey from &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt; for free. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Usage&lt;/strong&gt;&lt;br&gt;
Below are the colors associated with supported wake words for this demo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;#ffff33 Alexa&lt;/li&gt;
&lt;li&gt;#ff8000 Bumblebee&lt;/li&gt;
&lt;li&gt;#ffffff Computer&lt;/li&gt;
&lt;li&gt;#ff0000 Hey Google&lt;/li&gt;
&lt;li&gt;#800080 Hey Siri&lt;/li&gt;
&lt;li&gt;#ff3399 Jarvis&lt;/li&gt;
&lt;li&gt;#00ff00 Picovoice&lt;/li&gt;
&lt;li&gt;#0000ff Porcupine&lt;/li&gt;
&lt;li&gt;#000000 Terminator&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the demo. Do not forget to replace the placeholder with your &lt;code&gt;AccessKey&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;porcupine_respeaker_demo --access_key ${ACCESS_KEY}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the demo to initialize and print &lt;code&gt;[Listening]&lt;/code&gt; in the terminal. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Say &lt;code&gt;Picovoice&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
The demo outputs:&lt;br&gt;
detected &lt;code&gt;'Picovoice'&lt;/code&gt;&lt;br&gt;
The lights are now set to green. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Say &lt;code&gt;Alexa&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
The lights are set to yellow now. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Say &lt;code&gt;Terminator&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
The lights are now turned off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Happy Pi Day!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>100daysofcode</category>
      <category>challenge</category>
      <category>raspberrypi</category>
      <category>ai</category>
    </item>
    <item>
      <title>No Alexa, No Google: Custom Hotword Detection Arduino</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Thu, 09 Mar 2023 22:04:21 +0000</pubDate>
      <link>https://dev.to/picovoice/no-alexa-no-google-custom-hotword-detection-arduino-4ofe</link>
      <guid>https://dev.to/picovoice/no-alexa-no-google-custom-hotword-detection-arduino-4ofe</guid>
      <description>&lt;p&gt;Are you interested in calling your Arduino board with its own name? All you need is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arduino IDE  (free)&lt;/li&gt;
&lt;li&gt;Arduino Nano 33 BLE &lt;/li&gt;
&lt;li&gt;Picovoice Console  Account (free)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1. Train a custom hotword&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign up for &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt; for free&lt;/li&gt;
&lt;li&gt;Go to the &lt;a href="https://console.picovoice.ai/ppn"&gt;Porcupine Page&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Select the language for your model&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Type in the phrase, i.e., the name you'd like to give your board, and click &lt;code&gt;Train&lt;/code&gt;.&lt;br&gt;
[PS: You can read &lt;a href="https://picovoice.ai/docs/tips/choosing-a-wake-word/"&gt;our tips&lt;/a&gt; for choosing a hotword.]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pick Arm Cortex-M as the platform and Arduino Nano 33 BLE as the board type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paste your board UUID.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open Arduino IDE&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download &lt;code&gt;Porcupine_EN&lt;/code&gt; library&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the &lt;code&gt;GetUUID&lt;/code&gt; example from &lt;code&gt;Porcupine_EN&lt;/code&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the Serial Monitor&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the UUID and paste it on Picovoice Console.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on Download.&lt;br&gt;
Congrats! You trained your hotword!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Embed the Hotword&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the file &lt;code&gt;pv_porcupine_params.h&lt;/code&gt; with a text editor. The &lt;code&gt;KEYWORD_ARRAY[]&lt;/code&gt; contains your newly-trained model!&lt;/li&gt;
&lt;li&gt;Run the &lt;code&gt;PorcupineExample&lt;/code&gt; from the &lt;code&gt;Porcupine_EN&lt;/code&gt; library.&lt;/li&gt;
&lt;li&gt;Replace the &lt;code&gt;KEYWORD_ARRAY&lt;/code&gt; in &lt;code&gt;pv_porcupine_params.h&lt;/code&gt; with yours.&lt;/li&gt;
&lt;li&gt;Copy your &lt;code&gt;AccessKey&lt;/code&gt; from &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add your AccessKey in PorcupineExample.ino. Don't forget to wrap it in &lt;code&gt;"&lt;/code&gt; and terminate the line with &lt;code&gt;;&lt;/code&gt;!&lt;/li&gt;
&lt;li&gt;Upload and run the example&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Voila!&lt;/p&gt;

&lt;p&gt;Looking for more examples?&lt;br&gt;
Check out &lt;a href="https://picovoice.ai/blog/arduino-voice-recognition-in-ten-minutes-or-less/"&gt;Picovoice Blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>100daysofcode</category>
      <category>challenge</category>
      <category>arduino</category>
      <category>ai</category>
    </item>
    <item>
      <title>End-to-End Speech Recognition with Python</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Wed, 08 Mar 2023 04:54:10 +0000</pubDate>
      <link>https://dev.to/picovoice/end-to-end-speech-recognition-with-python-1ide</link>
      <guid>https://dev.to/picovoice/end-to-end-speech-recognition-with-python-1ide</guid>
      <description>&lt;p&gt;Let's start with why you should use &lt;a href="https://picovoice.ai/docs/quick-start/picovoice-python/"&gt;Picovoice Python SDK&lt;/a&gt; when there are alternative libraries and in-depth tutorials on speech recognition with Python. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Private - processes voice data on the device&lt;/li&gt;
&lt;li&gt;Cross-platform — Linux, macOS, Windows, Raspberry Pi, …&lt;/li&gt;
&lt;li&gt;Real-time - zero latency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I do not need to say "accurate," I guess; I haven't seen any vendor claiming mediocre accuracy 🙃 &lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/MCWij6lZyP4"&gt;
&lt;/iframe&gt;
&lt;br&gt;
Now, let's get started!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1 — Install Picovoice&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 install picovoice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2 — Create a Picovoice Instance&lt;/strong&gt;&lt;br&gt;
The Picovoice SDK consists of &lt;a href="https://picovoice.ai/platform/porcupine/"&gt;Porcupine Wake Word&lt;/a&gt;, enabling custom hotwords, and &lt;a href="https://picovoice.ai/platform/rhino/"&gt;Rhino Speech-to-Intent&lt;/a&gt;, enabling custom voice commands. Jointly they enable hands-free experiences.&lt;br&gt;
&lt;code&gt;Porcupine, set an alarm for 1 hour and 13 seconds.&lt;/code&gt;&lt;br&gt;
Porcupine detects the hotword "Porcupine", then Rhino captures the user’s intent and returns the intent and its details, as seen below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    is_understood: true,
    intent: setAlarm,
    slots: {
        hours: 1,
        seconds: 13
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create a Picovoice instance, we need Porcupine and Rhino models, the paths to those models, and callbacks for hotword detection and inference completion. For simplicity, we'll use pre-trained &lt;a href="https://github.com/Picovoice/porcupine/tree/master/resources/keyword_files"&gt;Porcupine&lt;/a&gt; and &lt;a href="https://github.com/Picovoice/rhino/tree/master/resources/contexts"&gt;Rhino&lt;/a&gt; models; however, you can train custom ones on the &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt;. While exploring the Picovoice Console, grab your &lt;code&gt;AccessKey&lt;/code&gt;, too! Signing up for Picovoice Console is free; no credit card required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from picovoice import Picovoice
keyword_path = ...  # path to Porcupine wake word file (.PPN)
def wake_word_callback():
    pass
context_path = ...  # path to Rhino context file (.RHN)
def inference_callback(inference):
    print(inference.is_understood)
    if inference.is_understood:
        print(inference.intent)
        for k, v in inference.slots.items():
            print(f"{k} : {v}")

pv = Picovoice(
    access_key="${YOUR_ACCESS_KEY}",
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not forget to replace the model path and &lt;code&gt;AccessKey&lt;/code&gt; placeholders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3 — Process Audio with Picovoice&lt;/strong&gt;&lt;br&gt;
Pass frames of audio to the engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pv.process(audio_frame)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4 — Read audio from the Microphone&lt;/strong&gt;&lt;br&gt;
Install &lt;a href="https://pypi.org/project/pvrecorder/"&gt;&lt;code&gt;pvrecorder&lt;/code&gt;&lt;/a&gt; and read the audio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pvrecoder import PvRecoder
# `-1` is the default input audio device.
recorder = PvRecoder(device_index=-1)
recorder.start()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read audio frames from the recorder and pass them to the &lt;code&gt;.process&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pcm = recorder.read()
pv.process(pcm)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5— Create a GUI with Tkinter&lt;/strong&gt;&lt;br&gt;
Tkinter is the standard GUI framework shipped with Python. Create a frame, add a label showing the remaining time to it, then launch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;window = tk.Tk()
time_label = tk.Label(window, text='00 : 00 : 00')
time_label.pack()

window.protocol('WM_DELETE_WINDOW', on_close)

window.mainloop()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
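To make the label tick, the remaining time can be formatted and refreshed with Tkinter's `after` scheduler. The sketch below is illustrative: the `HH : MM : SS` format matches the label above, but the `format_time` and `start_countdown` names and the once-per-second update loop are our assumptions about the alarm logic, not the tutorial's exact code.

```python
def format_time(total_seconds):
    # Render a number of seconds as the 'HH : MM : SS' label text used above.
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f'{hours:02d} : {minutes:02d} : {seconds:02d}'


def start_countdown(window, time_label, remaining):
    # Refresh the label once per second until the alarm reaches zero.
    time_label.config(text=format_time(remaining))
    if remaining > 0:
        window.after(1000, start_countdown, window, time_label, remaining - 1)
```

The `setAlarm` inference callback would compute `remaining` from the captured `hours` and `seconds` slots and call `start_countdown(window, time_label, remaining)`.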



&lt;p&gt;Some resources:&lt;br&gt;
&lt;a href="https://github.com/Picovoice/picovoice/tree/master/demo"&gt;Source code for the tutorial&lt;/a&gt; &lt;br&gt;
&lt;a href="https://medium.com/picovoice/end-to-end-voice-recognition-with-python-41f01c2d4346"&gt;Original Medium Article&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/docs/picovoice/"&gt;Picovoice SDK&lt;/a&gt;&lt;br&gt;
&lt;a href="https://console.picovoice.ai/login"&gt;Picovoice Console&lt;/a&gt;&lt;/p&gt;

</description>
      <category>100daysofcode</category>
      <category>challenge</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Podcast Transcription Software with Express.js</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Tue, 07 Mar 2023 04:03:34 +0000</pubDate>
      <link>https://dev.to/picovoice/podcast-transcription-software-with-expressjs-2nd7</link>
      <guid>https://dev.to/picovoice/podcast-transcription-software-with-expressjs-2nd7</guid>
<description>&lt;p&gt;We've had several tutorials with &lt;a href="https://picovoice.ai/platform/cat/"&gt;Leopard Speech-to-Text&lt;/a&gt;. Leopard offers fully on-device audio transcription. Today we'll build podcast transcription software that uses Leopard Speech-to-Text and downloads new items directly from an RSS feed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Backend Set up with Express.js&lt;/strong&gt;&lt;br&gt;
The backend setup will be straightforward: a single endpoint for transcription!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const express = require('express');

const app = express();
app.use(express.json());
app.use(express.urlencoded({ extended: false }));
app.use(express.static(path.join(__dirname, 'public')));

app.get('/', function (req, res) {
  res.redirect('/index')
});

app.post('/rss-transcribe', async (req, res) =&amp;gt; {
  console.log("RSS feed = " + req.body.rss)
});

module.exports = app;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test page allows entering a podcast RSS feed URL manually, which is sent to the &lt;code&gt;/rss-transcribe&lt;/code&gt; endpoint for processing.&lt;br&gt;
&lt;strong&gt;2. Parsing the RSS Feed&lt;/strong&gt;&lt;br&gt;
For parsing, use this great &lt;a href="https://www.npmjs.com/package/rss-parser"&gt;RSS parser from Robert Brennan&lt;/a&gt;. It takes the URL and provides a JSON representation of the feed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.post('/rss-transcribe', async (req, res) =&amp;gt; {
  console.log("Parsing RSS feed " + req.body.rss)
  let parser = new Parser();
  let feed = await parser.parseURL(req.body.rss)
  console.log("Parse complete.")
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
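To make the shape of the parsed feed concrete, here is a small pure-JS sketch using a mocked feed object. The structure mirrors what rss-parser returns for a typical podcast feed; the sample titles and URLs are ours, not real data:

```javascript
// Mocked parse result: rss-parser returns a plain object whose items
// carry an `enclosure` with the audio URL for podcast feeds.
// The sample data below is illustrative, not a real feed.
const feed = {
  title: 'Example Podcast',
  items: [
    { title: 'Newest episode', enclosure: { url: 'https://example.com/ep2.mp3' } },
    { title: 'Older episode', enclosure: { url: 'https://example.com/ep1.mp3' } },
  ],
};

// Podcast feeds list the newest item first, so its enclosure URL
// is the one we download in the next step.
const podcastAudioUrl = feed.items[0].enclosure.url;
console.log(podcastAudioUrl); // "https://example.com/ep2.mp3"
```

This is why the next step can simply read `feed.items[0].enclosure.url`.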



&lt;p&gt;&lt;strong&gt;3. Fetching the Audio&lt;/strong&gt;&lt;br&gt;
Once you have the JSON representation of the feed, you can locate the podcast audio link in the object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const fs = require('fs');
const path = require('path');
const axios = require('axios');

app.post('/rss-transcribe', async (req, res) =&amp;gt; {

  // .. parse feed

  const podcastAudioUrl = feed.items[0].enclosure.url
  console.log("Fetching file from " + podcastAudioUrl)
  let dlResponse = await axios.get(podcastAudioUrl, { responseType: "arraybuffer" })
  console.log("File obtained.")

  console.log("Writing data to local file...")
  const fileName = `${Math.random().toString(36).slice(2, 7)}.mp3`
  fs.writeFileSync(fileName, dlResponse.data)
  console.log("File write complete")

});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Transcribing the Podcast&lt;/strong&gt;&lt;br&gt;
Once you have the audio file, it's time to feed it into Leopard Speech-to-Text. Grab your &lt;code&gt;AccessKey&lt;/code&gt; from the &lt;a href="https://console.picovoice.ai/login"&gt;Picovoice Console&lt;/a&gt; for free if you haven't already. Replace the placeholder with your &lt;code&gt;AccessKey&lt;/code&gt; and run the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { Leopard } = require('@picovoice/leopard-node')

app.post('/rss-transcribe', async (req, res) =&amp;gt; {
  // .. parse feed

  // .. get audio file

  console.log("Transcribing audio...")
  const leo = new Leopard("${YOUR ACCESS KEY HERE}")
  const transcript = leo.processFile(fileName)
  leo.release()
  fs.unlinkSync(fileName)
  console.log("Transcription complete")

  res.send(transcript)
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the transcription is sent back in the response.&lt;br&gt;
What's next?&lt;br&gt;
You can take it from here and enrich the solution. Some options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the transcript to a text file in the browser for download&lt;/li&gt;
&lt;li&gt;Add automation with a solution like Zapier&lt;/li&gt;
&lt;li&gt;Build a front end with a search bar and a new endpoint for queries to make transcriptions searchable&lt;/li&gt;
&lt;/ol&gt;
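As a sketch of the third option, making transcriptions searchable boils down to keeping transcripts in some store and filtering them by query. A minimal in-memory version might look like this (the helper names `addTranscript` and `searchTranscripts` are ours; a real app would use a database behind a new Express endpoint):

```javascript
// Minimal in-memory sketch of option 3: store transcripts and search them.
// In a real app the store would be a database and this logic would sit
// behind a new Express endpoint; helper names are ours.
const transcripts = [];

function addTranscript(title, text) {
  transcripts.push({ title, text });
}

function searchTranscripts(query) {
  const q = query.toLowerCase();
  return transcripts
    .filter((t) => t.text.toLowerCase().includes(q))
    .map((t) => t.title);
}

addTranscript('Episode 1', 'Today we talk about speech-to-text engines.');
addTranscript('Episode 2', 'A deep dive into wake word detection.');
console.log(searchTranscripts('wake word')); // [ 'Episode 2' ]
```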

&lt;p&gt;Resource:&lt;br&gt;
&lt;a href="https://medium.com/picovoice/making-a-podcast-transcription-server-with-express-js-e73861f10660"&gt;Original Medium Article&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/Picovoice/leopard/tree/master/demo/expressjs"&gt;Tutorial Source Code&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/platform/cat/"&gt;Picovoice Leopard&lt;/a&gt;&lt;br&gt;
&lt;a href="https://console.picovoice.ai/login"&gt;Picovoice Console&lt;/a&gt;&lt;/p&gt;

</description>
      <category>100daysofcode</category>
      <category>node</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>"Pico Chess, start a new game": .NET Speech Recognition Tutorial</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Wed, 01 Mar 2023 04:35:34 +0000</pubDate>
      <link>https://dev.to/picovoice/pico-chess-start-a-new-game-net-speech-recognition-tutorial-2jj1</link>
      <guid>https://dev.to/picovoice/pico-chess-start-a-new-game-net-speech-recognition-tutorial-2jj1</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/3583Bytes/ChessCore" rel="noopener noreferrer"&gt;ChessCore&lt;/a&gt; is an open-source cross-platform chess engine in .NET Core with a text-based interface. Our team decided to voice enable it to showcase how easy it is to work with &lt;a href="https://picovoice.ai/docs/quick-start/picovoice-dotnet/" rel="noopener noreferrer"&gt;Picovoice .NET SDK&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/bdaowr0j0fU"&gt;
&lt;/iframe&gt;
&lt;br&gt;
Let's get started:&lt;br&gt;
&lt;strong&gt;1. Remove the text-based interface and replace it with a voice user interface:&lt;/strong&gt;&lt;br&gt;
ChessCore keeps the chess-playing engine separate from the interface, allowing developers to replace the text-based interface with a voice user interface easily.&lt;/p&gt;

&lt;p&gt;Once you extract the useful items from the original &lt;code&gt;Program.cs&lt;/code&gt;, you will have something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Program
{
    static readonly Engine gameEngine = new Engine();

    static void Main(string[] _)
    {
        // so we can see actual chess pieces on the board!
        Console.OutputEncoding = Encoding.UTF8;

        // start game loop
        RunGame();  
    }

    // game control
    static void RunGame() { ... }
    static void NewGame() { ... }
    static void QuitGame() { ... }  

    // piece movement
    static string MakePlayerMove(string srcSide, string srcFile, string srcRank, 
        string dstSide, string dstFile, string dstRank) { ... }
    static string MakeOpponentMove() { ... }
    static void UndoMove() { ... }

    // end game logic
    static bool CheckEndGame() { ... }
    static string GetEndGameReason() { ... }

    // translation functions
    static byte GetRow(string move) { ... }
    static string GetRow(byte row) { ... }
    static string GetColumn(byte col) { ... }
    static byte GetColumn(string side, string file) { ... }
    static string GetPieceSymbol(ChessPieceColor color, ChessPieceType type) { ... }

    // UTF-8 board to console
    static void DrawBoard(string aboveBoardText) { ... }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Design the Voice User Interface&lt;/strong&gt;&lt;br&gt;
To build a hands-free app that understands commands like "Pico Chess, start a new game", we need engines to detect the hotword "Pico Chess" and to infer the user's intent, such as starting a new game. The former is powered by &lt;a href="https://picovoice.ai/platform/porcupine/" rel="noopener noreferrer"&gt;Porcupine Wake Word&lt;/a&gt;, and the latter by &lt;a href="https://picovoice.ai/platform/rhino/" rel="noopener noreferrer"&gt;Rhino Speech-to-Intent&lt;/a&gt;. &lt;br&gt;
&lt;strong&gt;2.1. Train a custom hotword&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sign up for the &lt;a href="https://console.picovoice.ai/" rel="noopener noreferrer"&gt;Picovoice Console&lt;/a&gt; for free if you haven't and go to the Porcupine section. Simply type “Pico Chess” or a hotword of your choice (several languages are supported). Then select the platforms you want - for a cross-platform experience, train one model each for Windows, Linux, and macOS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.2. Train a context to understand follow-up commands&lt;/strong&gt;&lt;br&gt;
Go to the Rhino section on the &lt;a href="https://console.picovoice.ai/" rel="noopener noreferrer"&gt;Picovoice Console&lt;/a&gt; and create a new context. You can design your own model from scratch, but for the sake of simplicity, just download this &lt;a href="https://github.com/laves/ChessCore/tree/master/ChessCore" rel="noopener noreferrer"&gt;YAML file&lt;/a&gt; and import it. It will be easier to adjust the existing context especially if this is your first project. Then train and download the model.&lt;/p&gt;

&lt;p&gt;PS: Grab your AccessKey from the Picovoice Console, while you're there. You'll need it shortly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Wire it up!&lt;/strong&gt;&lt;br&gt;
Now it's time to add the Picovoice NuGet package and voice AI model files to the ChessCore project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;static string _platform =&amp;gt; RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? "mac" :
               RuntimeInformation.IsOSPlatform(OSPlatform.Linux) ? "linux" :
               RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "windows" : 
               "";

static void RunGame()
{
    // init picovoice platform
    string accessKey = "..."; // replace with your Picovoice AccessKey
    string keywordPath = $"pico_chess_{_platform}.ppn";
    string contextPath = $"chess_{_platform}.rhn";

    using Picovoice picovoice = Picovoice.Create(
        accessKey,
        keywordPath,
        WakeWordCallback,
        contextPath,
        InferenceCallback);

    DrawBoard("");

    // start play
    // ...
}

static void WakeWordCallback()
{
    Console.WriteLine("\n Listening for command...");
}

static void InferenceCallback(Inference inference)
{
    // logic for when Rhino infers an intent
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The .NET SDK &lt;code&gt;Inference&lt;/code&gt; class has three immutable properties:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;IsUnderstood&lt;/code&gt;: whether Rhino Speech-to-Intent matched one of the commands or not&lt;br&gt;
&lt;code&gt;Intent&lt;/code&gt;: if understood, which intent was inferred&lt;br&gt;
&lt;code&gt;Slots&lt;/code&gt;: if understood, a dictionary with data relating to the intent&lt;/p&gt;

&lt;p&gt;If you used the existing YAML file, the intents are &lt;code&gt;move&lt;/code&gt;, &lt;code&gt;newGame&lt;/code&gt;, &lt;code&gt;undo&lt;/code&gt; and &lt;code&gt;quit&lt;/code&gt;. &lt;br&gt;
Note that &lt;code&gt;Slots&lt;/code&gt; (a dictionary holding the source and destination coordinates) is only used with &lt;code&gt;move&lt;/code&gt;, and will be empty for the other intents. So the inference callback will look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;static void InferenceCallback(Inference inference)
{
    if (inference.IsUnderstood)
    {           
        if (inference.Intent.Equals("move"))
        {
            if (CheckEndGame()) 
                return;

            // get source coordinates
            string srcSide = inference.Slots["srcSide"];
            string srcRank = inference.Slots["srcRank"];
            string srcFile = inference.Slots.ContainsKey("srcFile") ? 
            inference.Slots["srcFile"] : "";

            // get destination cooordinates
            string dstSide = inference.Slots["dstSide"];
            string dstRank = inference.Slots["dstRank"];
            string dstFile = inference.Slots.ContainsKey("dstFile") ? 
            inference.Slots["dstFile"] : "";

            // try to make player move
            string playerMove = MakePlayerMove(srcSide, srcFile, srcRank, 
                                               dstSide, dstFile, dstRank);
            if (playerMove.Equals("Invalid Move"))
            {
                DrawBoard($" {playerMove}\n");
                return;
            }

            // make opponent move if player move was valid
            string theirMove = MakeOpponentMove();
            DrawBoard($" \u2654  {playerMove}\n \u265A  {theirMove}");

            // end game if necessary
            if (CheckEndGame())
            {
                Console.WriteLine($"\n {GetEndGameReason()}");
                Console.WriteLine($" Say 'new game' to play again.");
            }
        }
        else if (inference.Intent.Equals("undo"))
        {
            UndoMove();
        }
        else if (inference.Intent.Equals("newGame"))
        {
            NewGame();
        }
        else if (inference.Intent.Equals("quit"))
        {
            QuitGame();
        }
    }
    else
    {
        DrawBoard(" Didn't understand move.\n");
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Get PicoChess to listen to commands:&lt;/strong&gt;&lt;br&gt;
If you think cross-platform microphone control is challenging in .NET, you're not alone. That's why Picovoice built &lt;code&gt;PvRecorder&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;static bool _quitGame = false;

static void RunGame()
{       
    // init picovoice platform
    string accessKey = "..."; // replace with your Picovoice AccessKey
    string keywordPath = $"pico_chess_{_platform}.ppn";
    string contextPath = $"chess_{_platform}.rhn";

    using Picovoice picovoice = Picovoice.Create(
        accessKey,
        keywordPath,
        WakeWordCallback,
        contextPath,
        InferenceCallback);

    DrawBoard("");

    // create and start recording
    using (PvRecorder recorder = PvRecorder.Create(-1, picovoice.FrameLength))
    {
        recorder.Start();

        Console.WriteLine($"Using device: {recorder.SelectedDevice}");
        Console.WriteLine("Listening...");

        while (!_quitGame)
        {
            short[] pcm = recorder.Read();
            picovoice.Process(pcm);

            Thread.Yield();
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Voila!&lt;/p&gt;

&lt;p&gt;Resources:&lt;br&gt;
&lt;a href="https://medium.com/picovoice/talking-chess-adding-offline-cross-platform-voice-controls-to-chess-in-net-core-f74712b379c3" rel="noopener noreferrer"&gt;Original Medium Article&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/laves/ChessCore" rel="noopener noreferrer"&gt;Tutorial Source Code GitHub&lt;/a&gt;&lt;br&gt;
&lt;a href="https://console.picovoice.ai/" rel="noopener noreferrer"&gt;Picovoice Console&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>career</category>
      <category>productivity</category>
    </item>
    <item>
      <title>React Native Speech Recognition Tutorial</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Tue, 28 Feb 2023 05:00:11 +0000</pubDate>
      <link>https://dev.to/picovoice/react-native-speech-recognition-tutorial-3bih</link>
      <guid>https://dev.to/picovoice/react-native-speech-recognition-tutorial-3bih</guid>
      <description>&lt;p&gt;Google's and Apple's native apps process voice data on the device; however, they do not offer that capability to other developers. Luckily, Picovoice does. On day 40, we'll go over how to process voice data on-device using the &lt;a href="https://picovoice.ai/docs/quick-start/picovoice-react-native/"&gt;Picovoice React Native SDK&lt;/a&gt;. The Picovoice SDK combines the &lt;a href="https://picovoice.ai/platform/porcupine/"&gt;Porcupine Wake Word&lt;/a&gt; and &lt;a href="https://picovoice.ai/platform/rhino/"&gt;Rhino Speech-to-Intent&lt;/a&gt; engines, enabling commands like &lt;code&gt;"Alexa, set a timer for 5 minutes"&lt;/code&gt; - but even better: we'll use a custom hotword instead of Alexa, and voice commands will be processed with zero latency.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Eal9vRYUBuk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the latest Picovoice packages:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm i @picovoice/react-native-voice-processor
npm i @picovoice/porcupine-react-native
npm i @picovoice/rhino-react-native
npm i @picovoice/picovoice-react-native
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Initialize the Speech Recognition Platform&lt;/strong&gt;&lt;br&gt;
First, grab your &lt;code&gt;AccessKey&lt;/code&gt; for free from the &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt;.&lt;br&gt;
To keep things straightforward, we’re going to use the pre-trained hotword model &lt;code&gt;Pico Clock&lt;/code&gt; and the pre-trained context model &lt;code&gt;Clock&lt;/code&gt; for this tutorial. You can &lt;a href="https://github.com/Picovoice/picovoice/tree/master/demo/react-native-clock"&gt;download the pre-trained models here&lt;/a&gt;. Alternatively, you can train custom wake words and contexts on the &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now you should have an &lt;code&gt;AccessKey&lt;/code&gt;, a &lt;code&gt;Porcupine model&lt;/code&gt; (.ppn file), and a &lt;code&gt;Rhino model&lt;/code&gt; (.rhn file). &lt;br&gt;
Let's initialize a &lt;code&gt;PicovoiceManager&lt;/code&gt; in your React Native app.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import {PicovoiceManager} from '@picovoice/picovoice-react-native';

async createPicovoiceManager() {
    const accessKey = "..."; // your Picovoice AccessKey
    try {
        this._picovoiceManager = await PicovoiceManager.create(
            accessKey,
            '/path/to/keyword.ppn',
            this._wakeWordCallback,
            '/path/to/context.rhn',
            this._inferenceCallback,
            (error) =&amp;gt; {
              this._errorCallback(error.message);
            }
        );
    } catch (err) {
        // handle error
    }
}

_wakeWordCallback() {
    // wake word detected!
}

_inferenceCallback(inference) {
    // `inference` has the following fields:
    // (1) isUnderstood
    // (2) intent
    // (3) slots      
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Get Permission to Record Audio&lt;/strong&gt;&lt;br&gt;
To get the permission on &lt;code&gt;iOS&lt;/code&gt;, open your &lt;code&gt;Info.plist&lt;/code&gt; and add the following line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;key&amp;gt;NSMicrophoneUsageDescription&amp;lt;/key&amp;gt;
&amp;lt;string&amp;gt;[Permission explanation]&amp;lt;/string&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Android, open your &lt;code&gt;AndroidManifest.xml&lt;/code&gt; and add the following line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;uses-permission android:name="android.permission.RECORD_AUDIO" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then check for permission before proceeding with audio capture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let recordAudioRequest;
if (Platform.OS == 'android') {
    // For Android, we need to explicitly ask
    recordAudioRequest = this._requestRecordAudioPermission();
} else {
    // iOS automatically asks for permission
    recordAudioRequest = new Promise(function (resolve, _) {
    resolve(true);
    });
}

recordAudioRequest.then((hasPermission) =&amp;gt; {
    if (!hasPermission) {
        console.error('Required microphone permission was not granted.');        
        return;
      }

    // start feeding Picovoice
    this._picovoiceManager?.start().then((didStart) =&amp;gt; {
    if (didStart) {
      // let app know we're ready to go
    }
  });

});

async _requestRecordAudioPermission() {
    const granted = await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
    {
        title: 'Microphone Permission',
        message: '[Permission explanation]',
        buttonNeutral: 'Ask Me Later',
        buttonNegative: 'Cancel',
        buttonPositive: 'OK',
    }
    );
    return (granted === PermissionsAndroid.RESULTS.GRANTED)
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once &lt;code&gt;.start()&lt;/code&gt; is called, Picovoice listens for the hotword “Pico Clock” and follow-up commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Controlling the App With Voice Inputs&lt;/strong&gt;&lt;br&gt;
In the source code, you can find a simple clock app with three main components: a clock that shows the time, a timer, and a stopwatch.&lt;br&gt;
Let's connect these three components to the Voice User Interface (VUI):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_wakeWordCallback(keywordIndex){    
  // turn mic blue to show we're listening
  this.setState({    
    isListening: true
  });
}

_inferenceCallback(inference) {
  var tab = this.state.activeTab;  
  if (inference.isUnderstood) {         
    if (inference.intent === 'clock') {
      // show clock
      tab = 'clock';
    } else if (inference.intent === 'timer') {
      // control timer operation
      this._performTimerCommand(inference.slots);
      tab = 'timer';
    } else if (inference.intent === 'setTimer') {
      // set timer duration
      this._setTimer(inference.slots);
      tab = 'timer';
    } else if (inference.intent === 'alarm') {
      // control alarm operation
      this._performAlarmCommand(inference.slots);
      tab = 'clock';
    } else if (inference.intent === 'setAlarm') {
      // set alarm time
      this._setAlarm(inference.slots);
      tab = 'clock';
    } else if (inference.intent === 'stopwatch') {
      // control stopwatch operation
      this._performStopwatchCommand(inference.slots);
      tab = 'stopwatch';
    }
  }

  // change active tab and show we've stopped listening
  this.setState({
    activeTab: tab,
    isListening: false,
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect each &lt;code&gt;intent&lt;/code&gt; to a specific action in the app, passing the intent's &lt;code&gt;slots&lt;/code&gt; as arguments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_setTimer(slots) {
  var hours = 0;
  var minutes = 0;
  var seconds = 0;

  // parse duration
  if (slots['hours'] != null) {
    hours = Number.parseInt(slots['hours']);
  }
  if (slots['minutes'] != null) {
    minutes = Number.parseInt(slots['minutes']);
  }
  if (slots['seconds'] != null) {
    seconds = Number.parseInt(slots['seconds']);
  }

  // set timer
  this.setState({
    timerCurrentTime: moment.duration({
      hour: hours,
      minute: minutes,
      second: seconds,
      millisecond: 0,
    }),
    isTimerRunning: true,
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
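The duration parsing in `_setTimer` above can be distilled into a pure function, which makes the slot handling easy to unit-test on its own (the helper name `slotsToSeconds` is ours, not part of the tutorial):

```javascript
// Pure sketch of the duration parsing in _setTimer: convert Rhino slot
// strings into a total number of seconds. Helper name is ours.
function slotsToSeconds(slots) {
  const hours = slots['hours'] != null ? Number.parseInt(slots['hours'], 10) : 0;
  const minutes = slots['minutes'] != null ? Number.parseInt(slots['minutes'], 10) : 0;
  const seconds = slots['seconds'] != null ? Number.parseInt(slots['seconds'], 10) : 0;
  return hours * 3600 + minutes * 60 + seconds;
}

console.log(slotsToSeconds({ minutes: '5' }));              // 300
console.log(slotsToSeconds({ hours: '1', seconds: '30' })); // 3630
```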



&lt;p&gt;Voila! Once you connect all the functions with the VUI, you have a hands-free and cross-platform clock app.&lt;/p&gt;

&lt;p&gt;Resources:&lt;br&gt;
&lt;a href="https://medium.com/picovoice/add-voice-recognition-to-react-native-without-adding-the-cloud-af9e299336e4"&gt;Original article&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/Picovoice/picovoice/tree/master/demo/react-native-clock"&gt;Source Code&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/docs/quick-start/picovoice-react-native/"&gt;Picovoice React Native SDK&lt;/a&gt;&lt;/p&gt;

</description>
      <category>100daysofcode</category>
      <category>challenge</category>
      <category>reactnative</category>
      <category>mobile</category>
    </item>
    <item>
      <title>No More "Hey Google"! Add your Wake Phrase to an Android app</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Fri, 24 Feb 2023 17:27:00 +0000</pubDate>
      <link>https://dev.to/picovoice/no-more-hey-google-add-your-wake-phrase-to-an-android-app-5g1l</link>
      <guid>https://dev.to/picovoice/no-more-hey-google-add-your-wake-phrase-to-an-android-app-5g1l</guid>
      <description>&lt;p&gt;You cannot change the Google hotwords "Hey Google" and "OK Google", but you can give your Android app its own wake phrase with &lt;a href="https://picovoice.ai/platform/porcupine/"&gt;Porcupine Wake Word&lt;/a&gt;.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Qpen_Y4sjV8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Let's get started&lt;br&gt;
&lt;strong&gt;Add the Porcupine Wake Word Library&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Make sure you have a reference to Maven Central in your project’s &lt;code&gt;build.gradle&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repositories {
  mavenCentral()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the following reference to your app’s &lt;code&gt;build.gradle&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dependencies {
  implementation 'ai.picovoice:porcupine-android:${LATEST_VERSION}'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create a Background Service&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class PorcupineService extends Service {

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        return super.onStartCommand(intent, flags, startId);
    }

    @Nullable
    @Override
    public IBinder onBind(Intent intent) {
        return null;
    }

    @Override
    public void onDestroy() {        
        super.onDestroy();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your &lt;code&gt;MainActivity&lt;/code&gt;, add code to start and stop the &lt;code&gt;PorcupineService&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private void startService() {
    Intent serviceIntent = new Intent(this, PorcupineService.class);
    ContextCompat.startForegroundService(this, serviceIntent);
}

private void stopService() {
    Intent serviceIntent = new Intent(this, PorcupineService.class);
    stopService(serviceIntent);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Request Audio Permissions&lt;/strong&gt;&lt;br&gt;
In &lt;code&gt;AndroidManifest.xml&lt;/code&gt;, add this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;uses-permission android:name="android.permission.RECORD_AUDIO"/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check if you have permission to record audio in the &lt;code&gt;MainActivity&lt;/code&gt; and if not, ask the user for it. Add the following code to achieve this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private boolean hasRecordPermission() {
    return ActivityCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO) 
        == PackageManager.PERMISSION_GRANTED;
}

private void requestRecordPermission() {
    ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.RECORD_AUDIO}, 0);
}

@Override
public void onRequestPermissionsResult(int requestCode, 
                                       @NonNull String[] permissions, 
                                       @NonNull int[] grantResults) {    
    if (grantResults.length == 0 || 
        grantResults[0] == PackageManager.PERMISSION_DENIED) {
        // handle permission denied
    } else {
        startService();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Launch Wake Word Engine from a Service&lt;/strong&gt;&lt;br&gt;
For this demo, we'll use one of the built-in keywords, ‘Computer’. However, you can also train a custom wake phrase on the &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt; by signing up for free.&lt;/p&gt;

&lt;p&gt;You also need an &lt;code&gt;AccessKey&lt;/code&gt; which can be found on your &lt;a href="https://console.picovoice.ai/"&gt;Picovoice Console&lt;/a&gt; dashboard.&lt;/p&gt;

&lt;p&gt;In our &lt;code&gt;PorcupineService&lt;/code&gt; class, we’ll create an instance of &lt;code&gt;PorcupineManager&lt;/code&gt; to handle audio capture and processing. The service class now looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ai.picovoice.porcupine.Porcupine;
import ai.picovoice.porcupine.PorcupineException;
import ai.picovoice.porcupine.PorcupineManager;

public class PorcupineService extends Service {
  private String accessKey = "..."; // your Picovoice AccessKey
  private PorcupineManager porcupineManager;  

  @Override
  public int onStartCommand(Intent intent, int flags, int startId) {
    try {
        porcupineManager = new PorcupineManager.Builder()
                .setAccessKey(accessKey)
                .setKeyword(Porcupine.BuiltInKeyword.COMPUTER)
                .setSensitivity(0.7f)
                .build(getApplicationContext(),
                        (keywordIndex) -&amp;gt; {
                          // wake word detected!
                        });
        porcupineManager.start();
    } catch (PorcupineException e) {
        Log.e("PORCUPINE_SERVICE", e.toString());
    }
    return super.onStartCommand(intent, flags, startId);
  }

  @Nullable
  @Override
  public IBinder onBind(Intent intent) {
      return null;
  }

  @Override
  public void onDestroy() {
      try {
          porcupineManager.stop();
          porcupineManager.delete();
      } catch (PorcupineException e) {
          Log.e("PORCUPINE", e.toString());
      }
      super.onDestroy();
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Voila!&lt;/p&gt;

&lt;p&gt;Resources:&lt;br&gt;
The tutorial was originally published on &lt;a href="https://medium.com/picovoice/no-way-google-build-your-own-wake-word-service-on-android-339a0189ff4c"&gt;Medium&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://picovoice.ai/docs/quick-start/porcupine-android/"&gt;Porcupine Android SDK&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/Picovoice/porcupine/tree/master/demo/android/Service"&gt;Tutorial Source Code&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/"&gt;Picovoice.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>100daysofcode</category>
      <category>challenge</category>
      <category>android</category>
      <category>mobile</category>
    </item>
    <item>
      <title>Stuck with Java Speech API?</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Thu, 23 Feb 2023 16:59:24 +0000</pubDate>
      <link>https://dev.to/picovoice/stuck-with-java-speech-api-2ppf</link>
      <guid>https://dev.to/picovoice/stuck-with-java-speech-api-2ppf</guid>
      <description>&lt;p&gt;If you find adding voice inputs to a Java application difficult, you're not alone. JDK's Speech API relies on outdated products and third-party cloud providers. &lt;/p&gt;

&lt;p&gt;But we have good news: on day 37, we'll add custom voice commands, like &lt;code&gt;Jarvis, turn off the lights&lt;/code&gt;, with the &lt;a href="https://picovoice.ai/docs/quick-start/picovoice-java/" rel="noopener noreferrer"&gt;Picovoice Java SDK&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Picovoice SDK combines the &lt;a href="https://picovoice.ai/platform/porcupine/" rel="noopener noreferrer"&gt;Porcupine Wake Word&lt;/a&gt; and &lt;a href="https://picovoice.ai/platform/rhino/" rel="noopener noreferrer"&gt;Rhino Speech-to-Intent&lt;/a&gt; engines: wake words like &lt;code&gt;Jarvis&lt;/code&gt; are powered by Porcupine, and follow-up commands like &lt;code&gt;turn off the lights&lt;/code&gt; by Rhino.&lt;/p&gt;

&lt;p&gt;Let's get started&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get the latest version of the SDK from the Maven Central Repository:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ai.picovoice:picovoice-java:${version}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Design and train models on the Picovoice Platform
Developers can train and adapt custom voice AI models on the Picovoice Console, but you can use pre-trained ones, too! For this tutorial, we'll use the pre-trained &lt;a href="https://github.com/Picovoice/porcupine/tree/master/resources/keyword_files" rel="noopener noreferrer"&gt;Jarvis&lt;/a&gt; wake word and the &lt;a href="https://github.com/Picovoice/rhino/tree/master/resources/contexts" rel="noopener noreferrer"&gt;Smart Lighting&lt;/a&gt; context, which understands commands that change the color/state of lights.&lt;/li&gt;
&lt;li&gt;Get a Picovoice AccessKey
If you still haven't, create an account on the &lt;a href="https://console.picovoice.ai/" rel="noopener noreferrer"&gt;Picovoice Console&lt;/a&gt; for free and grab your &lt;code&gt;AccessKey&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Initialize the Picovoice Platform
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ai.picovoice.picovoice.*;

final String accessKey = "..."; // your Picovoice AccessKey
final String keywordPath = "res/path/to/jarvis.ppn";
final String contextPath = "res/path/to/smart_lighting.rhn";

PicovoiceWakeWordCallback wakeWordCallback = () -&amp;gt; { 
  System.out.println("Wake word detected!");
  // let user know wake word was detected
};
PicovoiceInferenceCallback inferenceCallback = inference -&amp;gt; { 
  if (inference.getIsUnderstood()) {
    final String intent = inference.getIntent();
    final Map&amp;lt;String, String&amp;gt; slots = inference.getSlots();
    // use intent and slots to trigger action
  }
}; 

Picovoice picovoice;
try {
    picovoice = new Picovoice.Builder()
            .setAccessKey(accessKey)
            .setKeywordPath(keywordPath)
            .setWakeWordCallback(wakeWordCallback)
            .setContextPath(contextPath)
            .setInferenceCallback(inferenceCallback)
            .build();
} catch (PicovoiceException e) {
    System.err.println("Failed to initialize Picovoice: " + e.getMessage());
    return;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not forget to replace the &lt;code&gt;accessKey&lt;/code&gt;, &lt;code&gt;keywordPath&lt;/code&gt;, and &lt;code&gt;contextPath&lt;/code&gt; placeholders. Copy your &lt;code&gt;AccessKey&lt;/code&gt; from the Picovoice Console; the paths to the downloaded Porcupine keyword and Rhino context files depend on where you saved them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read and Process Microphone Audio
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import javax.sound.sampled.*;

// get default audio capture device
AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
DataLine.Info dataLineInfo = new DataLine.Info(TargetDataLine.class, format);
TargetDataLine micDataLine;
try {
    micDataLine = (TargetDataLine) AudioSystem.getLine(dataLineInfo);
    micDataLine.open(format);
} catch (LineUnavailableException e) {
    System.err.println("Failed to get a valid audio capture device.");    
    return;
}

// start audio capture
micDataLine.start();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
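As a sanity check on the buffer sizes used below: Picovoice consumes fixed-length frames of 16-bit samples at 16 kHz, and the frame length is reported by &lt;code&gt;picovoice.getFrameLength()&lt;/code&gt; (typically 512 samples). A quick sketch of the arithmetic, assuming those values:

```java
public class FrameMath {
    // duration of one audio frame in milliseconds
    static double frameMillis(int frameLength, int sampleRate) {
        return 1000.0 * frameLength / sampleRate;
    }

    public static void main(String[] args) {
        // a 512-sample frame at 16 kHz covers 32 ms of audio,
        // and occupies 512 * 2 = 1024 bytes as 16-bit PCM
        System.out.println(frameMillis(512, 16000)); // prints 32.0
        System.out.println(512 * 2);                 // prints 1024
    }
}
```

This is why the capture buffer below is allocated as &lt;code&gt;getFrameLength() * 2&lt;/code&gt; bytes: two bytes per 16-bit sample.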



&lt;ol&gt;
&lt;li&gt;Create a loop that reads microphone data and passes it to Picovoice:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// buffers for processing audio
short[] picovoiceBuffer = new short[picovoice.getFrameLength()];
ByteBuffer captureBuffer = ByteBuffer.allocate(picovoice.getFrameLength() * 2);
captureBuffer.order(ByteOrder.LITTLE_ENDIAN);

int numBytesRead;
boolean recordingCancelled = false;
while (!recordingCancelled) {

    // read a buffer of audio
    numBytesRead = micDataLine.read(captureBuffer.array(), 0, captureBuffer.capacity());

    // don't pass to Picovoice if we don't have a full buffer
    if (numBytesRead != picovoice.getFrameLength() * 2) {
        continue;
    }

    // copy into 16-bit buffer
    captureBuffer.asShortBuffer().get(picovoiceBuffer);

    // process with picovoice
    picovoice.process(picovoiceBuffer);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
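The &lt;code&gt;ByteBuffer&lt;/code&gt; step in the loop above converts the little-endian 16-bit PCM bytes read from the microphone into the &lt;code&gt;short[]&lt;/code&gt; frame Picovoice expects. Here is a self-contained sketch of just that conversion (the sample bytes are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmConvert {
    // interpret little-endian 16-bit PCM bytes as shorts,
    // mirroring captureBuffer.asShortBuffer().get(picovoiceBuffer)
    static short[] bytesToShorts(byte[] pcm) {
        ByteBuffer buffer = ByteBuffer.wrap(pcm);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        short[] out = new short[pcm.length / 2];
        buffer.asShortBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        // byte pairs 0x01 0x00 and 0xFF 0x7F decode (little-endian)
        // to the samples 1 and 32767
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, 0x7F};
        short[] samples = bytesToShorts(pcm);
        System.out.println(samples[0]); // prints 1
        System.out.println(samples[1]); // prints 32767
    }
}
```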



&lt;p&gt;This tutorial was originally published on &lt;a href="https://medium.com/picovoice/prioritizing-privacy-add-offline-speech-recognition-to-a-java-application-1c864574fb7e" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Resources:&lt;br&gt;
&lt;a href="https://github.com/Picovoice/picovoice/tree/master/demo/java-swing" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/" rel="noopener noreferrer"&gt;Picovoice&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/docs/quick-start/picovoice-java/" rel="noopener noreferrer"&gt;Picovoice Java SDK&lt;/a&gt;&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>Hotword Activated Angular Application with Porcupine and Web Speech API</title>
      <dc:creator>Dilek Karasoy</dc:creator>
      <pubDate>Wed, 22 Feb 2023 17:32:07 +0000</pubDate>
      <link>https://dev.to/picovoice/hotword-activated-angular-application-with-porcupine-and-web-speech-api-5b27</link>
      <guid>https://dev.to/picovoice/hotword-activated-angular-application-with-porcupine-and-web-speech-api-5b27</guid>
      <description>&lt;p&gt;On day 36, Let's build an Angular app that is activated by a hotword to run the Web Speech API. We'll use &lt;a href="https://picovoice.ai/docs/quick-start/porcupine-angular/" rel="noopener noreferrer"&gt;Porcupine Angular SDK&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install the packages&lt;/strong&gt;&lt;br&gt;
Set up a new Angular project and install the following packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yarn add @picovoice/porcupine-angular @picovoice/web-voice-processor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;web-voice-processor&lt;/code&gt; accesses the microphone and converts the audio stream into a format suitable for speech recognition.&lt;br&gt;
&lt;code&gt;porcupine-angular&lt;/code&gt; provides the Angular &lt;code&gt;PorcupineService&lt;/code&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone the &lt;a href="https://github.com/Picovoice/porcupine/tree/master/demo/angular-stt" rel="noopener noreferrer"&gt;demo repository&lt;/a&gt; from GitHub and run the demo:
&lt;code&gt;git clone https://github.com/Picovoice/porcupine.git&lt;/code&gt;
&lt;code&gt;cd porcupine/demo/angular-stt&lt;/code&gt;
&lt;code&gt;yarn&lt;/code&gt;
&lt;code&gt;yarn start&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This will start a server on &lt;code&gt;http://localhost:4200&lt;/code&gt;. Upon loading, allow microphone permission. You will need to enter your &lt;code&gt;AccessKey&lt;/code&gt; from the &lt;a href="https://console.picovoice.ai/login" rel="noopener noreferrer"&gt;Picovoice Console&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now, the Angular app is able to start with the hotword “Okay Google” and transcribe speech to text with the Web Speech API. The Web Speech API is only available in Chrome. If you're looking for a transcription engine that runs in all modern browsers, check out &lt;a href="https://picovoice.ai/platform/cat/" rel="noopener noreferrer"&gt;Picovoice STT&lt;/a&gt; engines.&lt;/p&gt;

&lt;p&gt;Resources:&lt;br&gt;
&lt;a href="https://medium.com/picovoice/voice-enabling-an-angular-app-with-wake-words-dae4c9f26f9f" rel="noopener noreferrer"&gt;Original Artcile on Medium&lt;/a&gt;&lt;br&gt;
&lt;a href="https://picovoice.ai/platform/porcupine/" rel="noopener noreferrer"&gt;Picovoice Porcupine&lt;/a&gt;&lt;br&gt;
&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API" rel="noopener noreferrer"&gt;WebSpeech API&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/Picovoice/porcupine/tree/master/demo/angular-stt" rel="noopener noreferrer"&gt;GitHub Repository for the Tutorial&lt;/a&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>resources</category>
      <category>gratitude</category>
      <category>developer</category>
    </item>
  </channel>
</rss>
