Ensuring legal and policy compliance is a critical issue for the folks managing and leading a call center operation. In the following post, we'll dig into how Deepgram's speech AI platform can integrate into monitoring and compliance workflows.
Whenever an agent speaks with a customer, it can be helpful to get a call transcript in real time and detect whether the agent is complying with standards. For example, a common phrase that everyone has likely heard when calling customer service is “this call may be recorded for quality assurance purposes”. Often, the customer service agent is legally required to inform the customer that the call is being recorded.
We’ll use Python and Deepgram's speech-to-text API to see how simple it is to receive a transcript with live streaming in real time. We’ll also tap into some features that will recognize each speaker in the conversation, quickly search through the transcript for a phrase and recognize words that the model hasn’t been trained on or hasn’t encountered frequently.
Before You Start with Compliance Monitoring in Python
In this post, I’m using Python 3.10, so if you want to follow along, make sure you have that version installed. You will also need to grab a Deepgram API Key, which you can get here.
Next, create a directory. I called mine monitor_compliance.
Then, go to that directory and create a virtual environment inside it so that all of the Python libraries are installed there instead of globally on your computer. To create the virtual environment, run the following command inside your directory in the terminal:
python3 -m venv venv
Now activate it:
source venv/bin/activate
Installing Python Packages for Compliance Monitoring with Speech to Text
You’ll need to install some Python packages inside your virtual environment for the project to work properly. You can use Python’s pip command to install these packages. Make sure your virtual environment is active. Then, from your terminal, install the following:
pip install PyAudio
pip install websockets
You’ll only need two Python libraries, PyAudio and websockets. The PyAudio library allows you to get sound from your computer’s microphone. The websockets library is needed because we’re streaming the audio live over a WebSocket connection. Deepgram also has a Python SDK, but in this post we’ll hit the API endpoint directly.
Python Code Dependencies and File Setup
Create an empty Python file called monitor.py and add the following import statements:
import pyaudio
import asyncio
import websockets
import os
import json
from pprint import pprint
Next, add your Deepgram API Key:
DEEPGRAM_API_KEY = 'REPLACE_WITH_YOUR_DEEPGRAM_API_KEY'
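Hardcoding the key is fine for a quick experiment, but a common alternative is to read it from an environment variable, which also puts the os import to use. Here is a minimal sketch, assuming you have exported a variable named DEEPGRAM_API_KEY in your shell. The variable name and the load_api_key helper are my own choices for illustration, not something Deepgram requires.

```python
import os

def load_api_key(env_var="DEEPGRAM_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name and this helper are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this in place, the assignment could become DEEPGRAM_API_KEY = load_api_key(), and the script fails early with a clear message if the key is missing.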
Define the Python Variables
Below the DEEPGRAM_API_KEY, you’ll need to define some Python variables. The constants are PyAudio-related, and audio_queue is an asynchronous queue that we’ll use throughout our code.
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000
audio_queue = asyncio.Queue()
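To get a feel for what these constants mean in practice, the short calculation below works out how much audio each chunk holds. It is only a sketch: BYTES_PER_SAMPLE is a name I introduced, based on paInt16 being 16-bit audio.

```python
RATE = 16000          # samples per second, as in the post
CHUNK = 8000          # frames per buffer, as in the post
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # paInt16 is 16-bit audio, i.e. 2 bytes per sample

chunk_seconds = CHUNK / RATE
chunk_bytes = CHUNK * CHANNELS * BYTES_PER_SAMPLE

print(f"{chunk_seconds} s of audio per chunk, {chunk_bytes} bytes per message")
# prints: 0.5 s of audio per chunk, 16000 bytes per message
```

So with these settings, each item placed on audio_queue carries half a second of audio. A smaller CHUNK would send audio more often in smaller pieces.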
The Python Callback Code for Compliance Monitoring with Speech to Text
We pass this callback as an argument when we open the PyAudio stream. PyAudio calls it each time a new chunk of audio is available.
def callback(input_data, frame_count, time_info, status_flags):
    # Put an item into the queue without blocking.
    audio_queue.put_nowait(input_data)
    return (input_data, pyaudio.paContinue)
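One thing worth knowing: PyAudio invokes the callback from its own thread, while asyncio.Queue is documented as not thread-safe. The callback above tends to work for a demo, but a more defensive variant hands each chunk to the event loop with call_soon_threadsafe. The sketch below shows the idea without requiring PyAudio. The make_callback helper and the continue_flag parameter (standing in for pyaudio.paContinue) are hypothetical names for illustration.

```python
import asyncio
import threading

def make_callback(loop, queue, continue_flag=0):
    """Build a PyAudio-style callback that enqueues audio on the event loop thread.

    continue_flag stands in for pyaudio.paContinue so the sketch runs without PyAudio.
    """
    def callback(input_data, frame_count, time_info, status_flags):
        # Schedule the put on the loop's thread instead of touching the queue here.
        loop.call_soon_threadsafe(queue.put_nowait, input_data)
        return (input_data, continue_flag)
    return callback

async def demo():
    queue = asyncio.Queue()
    callback = make_callback(asyncio.get_running_loop(), queue)
    # Simulate the audio thread delivering one chunk.
    worker = threading.Thread(target=callback, args=(b"\x00\x01", 1, None, 0))
    worker.start()
    chunk = await asyncio.wait_for(queue.get(), timeout=1)
    worker.join()
    return chunk
```

In the real script, you would create the callback inside microphone() with asyncio.get_running_loop() and pass pyaudio.paContinue as the flag.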
Getting the Microphone Audio in Python
We connect right away to the microphone in this asynchronous function, create our PyAudio object and open a stream.
async def microphone():
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format = FORMAT,
        channels = CHANNELS,
        rate = RATE,
        input = True,
        frames_per_buffer = CHUNK,
        stream_callback = callback
    )
    stream.start_stream()
    while stream.is_active():
        await asyncio.sleep(0.1)
    stream.stop_stream()
    stream.close()
Open the Websocket and Connect to Deepgram Real Time Speech to Text
This code authenticates with Deepgram and opens the WebSocket to allow real-time audio streaming. We pass some of the Deepgram features as query parameters in the API call:
diarize - identifies each speaker in the transcript and gives them an ID.
search - searches the transcript for the phrase "this call may be recorded for quality and training purposes".
keywords - helps the model correctly recognize the participant's last name and other terminology.
async def process():
    extra_headers = {
        'Authorization': 'token ' + DEEPGRAM_API_KEY
    }
    # Note: newer releases of the websockets library name this argument additional_headers.
    async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1' \
                                  '&punctuate=true' \
                                  '&diarize=true' \
                                  '&search=this+call+may+be+recorded+for+quality+and+training+purposes' \
                                  '&keywords=Warrens:2' \
                                  '&keyword_boost=standard',
                                  extra_headers = extra_headers) as ws:
        async def sender(ws):  # sends the microphone audio
            try:
                while True:
                    data = await audio_queue.get()
                    await ws.send(data)
            except Exception as e:
                print('Error while sending: ' + str(e))
                raise
        async def receiver(ws):  # receives the transcript
            async for msg in ws:
                msg = json.loads(msg)
                pprint(msg)
                # Skip messages that carry no transcript data.
                if 'channel' not in msg:
                    continue
                transcript = msg['channel']['alternatives'][0]['transcript']
                words = msg['channel']['alternatives'][0]['words']
                # Label the transcript with the speaker of its first word.
                if words:
                    print(f"Speaker {words[0]['speaker']}: {transcript} ")
        await asyncio.gather(sender(ws), receiver(ws))
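Concatenating the query string by hand makes it easy to slip in a stray ampersand or forget to encode a space. As an alternative sketch, the standard library's urlencode can assemble the same URL from a list of parameters. The build_listen_url helper and its arguments are hypothetical, and the parameter values are simply the ones used in this post.

```python
from urllib.parse import urlencode

def build_listen_url(search_phrase, keyword, boost=2):
    """Assemble the streaming URL with the same parameters used in this post."""
    params = [
        ("encoding", "linear16"),
        ("sample_rate", 16000),
        ("channels", 1),
        ("punctuate", "true"),
        ("diarize", "true"),
        ("search", search_phrase),            # spaces become '+'
        ("keywords", f"{keyword}:{boost}"),   # keyword:boost, as in the post
        ("keyword_boost", "standard"),
    ]
    # safe=":" keeps the colon in keyword:boost unescaped.
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params, safe=":")

url = build_listen_url(
    "this call may be recorded for quality and training purposes", "Warrens"
)
print(url)
```

The resulting string can be passed to websockets.connect in place of the hand-built one, and changing the search phrase no longer means editing plus signs by hand.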
Run the Python Code for Compliance Monitoring
Finally, we get to run the code for the project. Add the lines below to monitor.py, then run python3 monitor.py from your terminal:
async def run():
    await asyncio.gather(microphone(), process())
if __name__ == '__main__':
    asyncio.run(run())
Depending on the streaming audio used, you can expect to get a response like the following:
Diarization
Speaker 0: Hello.
Speaker 0: Can you hear me?
Speaker 0: Hello, and thank you for calling Premier phone service.
Speaker 0: Be aware that this call may be recorded for quality and training purposes. My name is Beth and will be assisting you today.
Speaker 0: How are you doing?
Speaker 1: Not too bad.
Speaker 1: How are you today?
Speaker 0: I'm doing well. Thank you. May I please have your name?
Speaker 1: My name is Blake Warren.
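The receiver labels each transcript with the speaker of its first word, which works well when a message contains a single speaker. If one message ever spans two speakers, you could group the words list into turns instead. The sketch below assumes each word entry carries a word field (and a punctuated_word field when punctuation is enabled) alongside speaker. The speaker_turns helper and the sample data are hand-written for illustration, not real API output.

```python
def speaker_turns(words):
    """Collapse a list of word dicts into (speaker, text) turns."""
    turns = []
    for word in words:
        # Prefer the punctuated form when it is present.
        text = word.get("punctuated_word") or word["word"]
        if turns and turns[-1][0] == word["speaker"]:
            turns[-1][1].append(text)
        else:
            turns.append((word["speaker"], [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]

# Hand-written sample in the assumed shape of the 'words' list.
sample_words = [
    {"word": "how", "punctuated_word": "How", "speaker": 0},
    {"word": "are", "punctuated_word": "are", "speaker": 0},
    {"word": "you", "punctuated_word": "you", "speaker": 0},
    {"word": "doing", "punctuated_word": "doing?", "speaker": 0},
    {"word": "not", "punctuated_word": "Not", "speaker": 1},
    {"word": "too", "punctuated_word": "too", "speaker": 1},
    {"word": "bad", "punctuated_word": "bad.", "speaker": 1},
]

for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```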
Search
'search': [{'hits': [{'confidence': 0.8900703,
                      'end': 15.27,
                      'snippet': 'this call may be recorded for '
                                 'quality and training purposes '
                                 'my name is',
                      'start': 11.962303},
                     {'confidence': 0.3164375,
                      'end': 17.060001,
                      'snippet': 'and training purposes my name '
                                 'is beth and i will be assisting '
                                 'you today',
                      'start': 13.546514}],
            'query': 'this call may be recorded for quality and '
                     'training purposes'}]},
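To turn these search hits into a yes/no compliance signal, one option is to check whether any hit clears a confidence threshold. The sketch below assumes the search list sits under the message's channel key, next to alternatives, and uses 0.8 as an arbitrary cutoff that you would want to tune on your own calls. The phrase_detected helper is hypothetical, and the sample message reuses the values shown above.

```python
def phrase_detected(message, query, min_confidence=0.8):
    """Return True if any search hit for `query` meets the confidence threshold."""
    for result in message.get("channel", {}).get("search", []):
        if result.get("query") != query:
            continue
        if any(hit["confidence"] >= min_confidence for hit in result.get("hits", [])):
            return True
    return False

QUERY = "this call may be recorded for quality and training purposes"

# Trimmed sample message; the placement of 'search' under 'channel' is an assumption.
sample_message = {
    "channel": {
        "search": [
            {
                "query": QUERY,
                "hits": [
                    {"confidence": 0.8900703, "start": 11.962303, "end": 15.27},
                    {"confidence": 0.3164375, "start": 13.546514, "end": 17.060001},
                ],
            }
        ]
    }
}

print(phrase_detected(sample_message, QUERY))  # prints: True
```

Calling this inside the receiver for each message would let you flag a call as soon as the disclosure is heard, or raise an alert if the call ends without it.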
Extending the Compliance Monitoring Project with Speech to Text
Hopefully, you had fun working on this project. Monitoring compliance in call centers with Python and Deepgram can be simple and straightforward. You can extend the project further by using some of Deepgram’s other features for streaming.
The final code for this project is as follows:
import pyaudio
import asyncio
import websockets
import os
import json
from pprint import pprint
DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000
audio_queue = asyncio.Queue()
def callback(input_data, frame_count, time_info, status_flags):
    # Put an item into the queue without blocking.
    audio_queue.put_nowait(input_data)
    return (input_data, pyaudio.paContinue)
async def microphone():
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format = FORMAT,
        channels = CHANNELS,
        rate = RATE,
        input = True,
        frames_per_buffer = CHUNK,
        stream_callback = callback
    )
    stream.start_stream()
    while stream.is_active():
        await asyncio.sleep(0.1)
    stream.stop_stream()
    stream.close()
async def process():
    extra_headers = {
        'Authorization': 'token ' + DEEPGRAM_API_KEY
    }
    # Note: newer releases of the websockets library name this argument additional_headers.
    async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1' \
                                  '&punctuate=true' \
                                  '&diarize=true' \
                                  '&search=this+call+may+be+recorded+for+quality+and+training+purposes' \
                                  '&keywords=Warrens:2' \
                                  '&keyword_boost=standard',
                                  extra_headers = extra_headers) as ws:
        async def sender(ws):  # sends the microphone audio
            try:
                while True:
                    data = await audio_queue.get()
                    await ws.send(data)
            except Exception as e:
                print('Error while sending: ' + str(e))
                raise
        async def receiver(ws):  # receives the transcript
            async for msg in ws:
                msg = json.loads(msg)
                pprint(msg)
                # Skip messages that carry no transcript data.
                if 'channel' not in msg:
                    continue
                transcript = msg['channel']['alternatives'][0]['transcript']
                words = msg['channel']['alternatives'][0]['words']
                # Label the transcript with the speaker of its first word.
                if words:
                    print(f"Speaker {words[0]['speaker']}: {transcript} ")
        await asyncio.gather(sender(ws), receiver(ws))
async def run():
    await asyncio.gather(microphone(), process())
if __name__ == '__main__':
    asyncio.run(run())
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.