Real-time Speaker Identification in Python

August 22, 2024 · 2 min read

Speaker Recognition (or Speaker Identification) analyzes distinctive voice characteristics to identify and verify speakers. It is the technology behind voice authentication, speaker-based personalization, and speaker spotting. However, many applications of Speaker Recognition suffer from the high latency of cloud-based services, leading to poor user experience. That is where Picovoice's Eagle Speaker Recognition SDK comes in, offering on-device Speaker Recognition without sacrificing accuracy. What's more, Eagle Speaker Recognition makes it so easy, you can add Speaker Recognition to your app in just a few lines of Python.

Speaker Recognition typically requires two steps. The first step is speaker Enrollment, where a speaker's voice is registered using a short clip of audio to produce a Speaker Profile. The second step is Recognition, where the Speaker Profile is used to detect when that speaker is speaking given an audio stream.

Let's see how to use the Eagle Speaker Recognition Python SDK / API to implement a speaker recognition app!

Setup
Install pveagle using pip. We will be using pvrecorder to get cross-platform audio, so install that as well:

pip3 install pveagle pvrecorder
Lastly, you will need a Picovoice AccessKey, which can be obtained with a free Picovoice Console account.
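
Hard-coding the AccessKey in a source file makes it easy to leak by accident. One possible alternative is to read it from the environment. This is a minimal sketch: the variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are assumptions made for this example, not something the SDK defines.

```python
import os


def load_access_key(env_var: str = "PICOVOICE_ACCESS_KEY") -> str:
    """Return the AccessKey stored in the given environment variable.

    The variable name is a hypothetical convention chosen for this sketch.
    """
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AccessKey")
    return access_key
```

The returned string can then be passed wherever the snippets in this post use access_key.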

Enroll a speaker
Import pveagle and create an instance of the EagleProfiler class:

import pveagle

access_key = "{YOUR_ACCESS_KEY}"
try:
    eagle_profiler = pveagle.create_profiler(access_key=access_key)
except pveagle.EagleError as e:
    # Handle error
    pass

Now, import pvrecorder and create an instance of the recorder as well. Use the EagleProfiler's .min_enroll_samples as the frame_length:

from pvrecorder import PvRecorder

DEFAULT_DEVICE_INDEX = -1
recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle_profiler.min_enroll_samples)

Now it's time to enroll a speaker. The .enroll() function takes in frames of audio and provides feedback on the audio quality and Enrollment percentage. Use the percentage value to know when Enrollment is done and another speaker can be enrolled:

recorder.start()

enroll_percentage = 0.0
while enroll_percentage < 100.0:
    audio_frame = recorder.read()
    enroll_percentage, feedback = eagle_profiler.enroll(audio_frame)

recorder.stop()
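
The loop above ignores the feedback value. A small sketch of how it might be turned into a hint for the person enrolling follows. The member names are those of pveagle's EagleProfilerEnrollFeedback enum as recalled from the Python demo; treat them as an assumption and check them against the installed version. The helper describe_feedback is hypothetical.

```python
from enum import Enum

# Messages keyed by enum member name. The names are assumed to match
# pveagle.EagleProfilerEnrollFeedback; verify against your installed SDK.
FEEDBACK_MESSAGES = {
    "AUDIO_OK": "Good audio",
    "AUDIO_TOO_SHORT": "Insufficient audio length",
    "UNKNOWN_SPEAKER": "Different speaker in audio",
    "NO_VOICE_FOUND": "No voice found in audio",
    "QUALITY_ISSUE": "Low audio quality due to bad microphone or environment",
}


def describe_feedback(feedback: Enum) -> str:
    """Map a feedback enum member to a human-readable hint."""
    return FEEDBACK_MESSAGES.get(
        feedback.name, f"Unrecognized feedback: {feedback.name}")
```

Inside the enrollment loop, a line such as print(f"{enroll_percentage:.0f}% - {describe_feedback(feedback)}") would show progress together with the hint.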

Once Enrollment reaches 100%, export the speaker profile to use in the next step, Speaker Recognition:

speaker_profile = eagle_profiler.export()

The speaker_profile object can be saved and reused; see the docs for more details. Profiles can be made for additional users by calling the .reset() function on the EagleProfiler, and repeating the .enroll() step.
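
As a rough illustration of saving and reusing a profile, the sketch below writes and reads raw bytes. It assumes the profile object can be serialized with a to_bytes() method and restored with pveagle.EagleProfile.from_bytes(); confirm both in the docs before relying on them. The helper names and the file name are hypothetical.

```python
from pathlib import Path


def save_profile_bytes(path: str, profile_bytes: bytes) -> None:
    """Write a serialized speaker profile to disk."""
    Path(path).write_bytes(profile_bytes)


def load_profile_bytes(path: str) -> bytes:
    """Read a serialized speaker profile back from disk."""
    return Path(path).read_bytes()


# Assumed usage with pveagle (not executed here):
#   save_profile_bytes("alice.eagle", speaker_profile.to_bytes())
#   profile = pveagle.EagleProfile.from_bytes(load_profile_bytes("alice.eagle"))
```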

Once profiles have been created for all speakers, don't forget to clean up used resources:

recorder.delete()
eagle_profiler.delete()
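
The multi-speaker flow described above (enroll, export, reset, repeat) could be wrapped in a helper along these lines. This is only a sketch: enroll_speakers is a hypothetical function, it relies solely on the .enroll(), .export() and .reset() calls already mentioned, and it would have to run before the cleanup calls.

```python
from typing import Any, Callable, Dict, Iterable, Sequence


def enroll_speakers(
        profiler: Any,
        read_frame: Callable[[], Sequence[int]],
        names: Iterable[str]) -> Dict[str, Any]:
    """Enroll each named speaker in turn and return {name: profile}.

    `profiler` is expected to offer enroll(), export() and reset();
    `read_frame` returns one frame of audio samples per call.
    """
    profiles = {}
    for name in names:
        print(f"Enrolling {name}: please start speaking")
        percentage = 0.0
        while percentage < 100.0:
            percentage, _feedback = profiler.enroll(read_frame())
        profiles[name] = profiler.export()
        profiler.reset()  # clear enrollment state before the next speaker
    return profiles
```

With a started recorder, a call might look like enroll_speakers(eagle_profiler, recorder.read, ["alice", "bob"]).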

Perform recognition
Import pveagle and create an instance of the Eagle class, using the speaker profiles created by the Enrollment step:

import pveagle

access_key = "{YOUR_ACCESS_KEY}"
profiles = [speaker_profile_1, speaker_profile_2]
try:
    eagle = pveagle.create_recognizer(
        access_key=access_key,
        speaker_profiles=profiles)
except pveagle.EagleError as e:
    # Handle error
    pass

Now set up pvrecorder to use with Eagle:

recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle.frame_length)

Pass audio frames into the eagle.process() function to get back speaker scores:

recorder.start()

while True:
    audio_frame = recorder.read()
    scores = eagle.process(audio_frame)
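
To turn the scores into a decision, one simple approach is to pick the highest score and accept it only above a cut-off. The sketch below assumes scores holds one float per enrolled profile, in the same order the profiles were passed in, with higher meaning more similar. The helper identify_speaker and the 0.5 threshold are illustrative choices, not values from the SDK.

```python
from typing import Optional, Sequence


def identify_speaker(
        scores: Sequence[float],
        names: Sequence[str],
        threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring name, or None if no score reaches the threshold."""
    if len(scores) != len(names):
        raise ValueError("Expected one name per score")
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    return names[best] if scores[best] >= threshold else None
```

The threshold trades false accepts against false rejects, so it is worth tuning on recordings from the target environment.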

When finished, stop the recorder and clean up the resources used:

recorder.stop()
recorder.delete()
eagle.delete()
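
Because the recognition loop runs until it is interrupted, cleanup placed after it is easy to skip. One way to guarantee it is a small context manager that calls .delete() on exit. This is a sketch built on the standard library only; the name released is hypothetical and it assumes each resource exposes a .delete() method, as the objects above do.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def released(*resources):
    """Yield the resources, then call .delete() on each in reverse order.

    ExitStack runs every registered callback even if the body raises.
    """
    with ExitStack() as stack:
        for resource in resources:
            stack.callback(resource.delete)
        yield resources
```

For example, wrapping the loop in "with released(eagle, recorder):" would release both objects even when the loop ends through Ctrl+C.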

Putting It All Together
Here is an example program bringing together everything that has been shown so far:

import pveagle
from pvrecorder import PvRecorder

DEFAULT_DEVICE_INDEX = -1
access_key = "{YOUR_ACCESS_KEY}"

# Step 1: Enrollment
try:
    eagle_profiler = pveagle.create_profiler(access_key=access_key)
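except pveagle.EagleError as e:
    # Nothing below can run without a profiler, so stop here
    raise SystemExit(f"Failed to create EagleProfiler: {e}")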

enroll_recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle_profiler.min_enroll_samples)

enroll_recorder.start()

enroll_percentage = 0.0
while enroll_percentage < 100.0:
    audio_frame = enroll_recorder.read()
    enroll_percentage, feedback = eagle_profiler.enroll(audio_frame)

enroll_recorder.stop()

speaker_profile = eagle_profiler.export()

enroll_recorder.delete()
eagle_profiler.delete()

# Step 2: Recognition
try:
    eagle = pveagle.create_recognizer(
        access_key=access_key,
        speaker_profiles=[speaker_profile])
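except pveagle.EagleError as e:
    # Nothing below can run without a recognizer, so stop here
    raise SystemExit(f"Failed to create Eagle: {e}")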

recognizer_recorder = PvRecorder(
    device_index=DEFAULT_DEVICE_INDEX,
    frame_length=eagle.frame_length)

recognizer_recorder.start()

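try:
    while True:
        audio_frame = recognizer_recorder.read()
        scores = eagle.process(audio_frame)
        print(scores)
except KeyboardInterrupt:
    # Ctrl+C ends the loop
    pass
finally:
    # Runs however the loop ends, so resources are always released
    recognizer_recorder.stop()
    recognizer_recorder.delete()
    eagle.delete()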

It takes just 2 minutes to get up and running.

Next Steps
See the GitHub Python Demo for a more complete example, including how to handle Enrollment feedback, save Speaker Profiles to disk and use files as the audio input. You can also view the Python API docs for details on the package.