Introduction
Voicemail has remained largely unchanged for decades. In a world where real-time communication is key, users often find voicemail tedious and inefficient. What if AI could transcribe, summarize, and even rank voicemail messages to help users stay in control?
That’s exactly what Vernon AI does! In this guide, we’ll build an AI-powered voicemail assistant using Twilio for call forwarding and voicemail handling and OpenAI’s Whisper and GPT APIs for transcription and intelligent summaries.
By the end of this tutorial, you’ll know how to:
• Set up Twilio to receive and store voicemails.
• Use OpenAI’s Whisper API to transcribe voicemails into text.
• Leverage GPT-4 to generate concise voicemail summaries.
• Store and retrieve data using MongoDB.
Let’s dive in!
Prerequisites
To follow along, you’ll need:
•A Twilio account (sign up at Twilio).
•An OpenAI API key (get one from OpenAI).
•A MongoDB database for storing voicemails.
•Python 3.9+ and Flask (for backend API development).
•Basic knowledge of REST APIs and webhooks.
Step 1: Setting Up Twilio to Receive Voicemails
1.1 Buy a Twilio Phone Number
Twilio provides virtual phone numbers that can receive calls and record voicemails. After signing up:
1.Go to Twilio Console > Phone Numbers.
2.Buy a local or toll-free number.
3.Under Voice & Fax, set the Webhook URL to your server (e.g., https://yourdomain.com/twilio/answer).
1.2 Twilio Webhook for Answering Calls
When a call is received, our webhook will play a greeting and start recording the voicemail.
from flask import Flask, request, Response
from twilio.twiml.voice_response import VoiceResponse
app = Flask(__name__)
@app.route("/twilio/answer", methods=["POST"])
def answer_call():
response = VoiceResponse()
response.say("Hi, you've reached Vernon AI. Please leave a message after the beep.")
response.record(action="/twilio/voicemail_callback", max_length=120, finish_on_key="#")
response.say("Thank you. Goodbye.")
return Response(str(response), mimetype="text/xml")
if __name__ == "__main__":
app.run(port=5000, debug=True)
This function:
•Greets the caller and instructs them to leave a message.
•Records the voicemail and sends it to /twilio/voicemail_callback.
•Says goodbye once recording is complete.
Step 2: Handling Voicemail Callbacks
Twilio will send the voicemail recording URL and caller information to our /twilio/voicemail_callback endpoint.
import os
import requests
from pymongo import MongoClient
from dotenv import load_dotenv
load_dotenv()
MONGODB_URI = os.getenv("MONGODB_URI")
TWILIO_ACCOUNT_SID = os.getenv("TWILIO_ACCOUNT_SID")
TWILIO_AUTH_TOKEN = os.getenv("TWILIO_AUTH_TOKEN")
client = MongoClient(MONGODB_URI)
db = client["voicemail_db"]
voicemails = db["voicemails"]
@app.route("/twilio/voicemail_callback", methods=["POST"])
def voicemail_callback():
recording_url = request.form.get("RecordingUrl")
caller_number = request.form.get("From")
if not recording_url:
return "No Recording URL", 400
voicemail_entry = {
"caller": caller_number,
"audio_url": f"{recording_url}.mp3",
"transcript": "",
"summary": ""
}
voicemails.insert_one(voicemail_entry)
return "Voicemail recorded", 200
This stores voicemail metadata (caller, audio URL) into MongoDB.
Step 3: Transcribing Voicemails with OpenAI Whisper
Now, let’s transcribe the voicemail using OpenAI Whisper.
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
def transcribe_voicemail(audio_url):
response = requests.get(audio_url)
with open("voicemail.mp3", "wb") as f:
f.write(response.content)
with open("voicemail.mp3", "rb") as f:
transcript = openai.Audio.transcribe("whisper-1", f)
return transcript["text"]
This function downloads the voicemail audio and transcribes it.
Step 4: Generating a Summary Using GPT-4
After transcription, we can summarize the voicemail with OpenAI’s GPT-4.
def summarize_voicemail(transcript):
prompt = f"""
Summarize this voicemail in a professional and concise way:
"{transcript}"
"""
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response["choices"][0]["message"]["content"].strip()
Now, let’s update our MongoDB entry with the transcription and summary.
def process_voicemail(voicemail_id, audio_url):
transcript = transcribe_voicemail(audio_url)
summary = summarize_voicemail(transcript)
voicemails.update_one(
{"_id": voicemail_id},
{"$set": {"transcript": transcript, "summary": summary}}
)
Step 5: Displaying Voicemails in a Web Dashboard
You can now build a frontend to display the summarized voicemails, with:
•Caller ID
•Timestamp
•Transcript & Summary
•Audio Playback
You can use React, Next.js, or any frontend framework to fetch and display this data.
Conclusion
In this guide, we built an AI-powered voicemail assistant that:
✅ Answers calls and records voicemails using Twilio
✅ Transcribes messages with OpenAI Whisper
✅ Generates intelligent summaries with GPT-4
✅ Stores and retrieves voicemails using MongoDB
This is just the beginning! You can expand this project by:
• Adding voicemail categorization (urgent, spam, etc.).
• Enabling SMS/email notifications with summaries.
• Creating a voice-based chatbot to interact with callers.
Let me know what you think! Would you use an AI-powered voicemail assistant?
https://www.vernonaisolutions.com/
Top comments (0)