Integrating OpenAI's LLM with Twilio Using Vercel AI SDK
In this guide, we'll walk through integrating OpenAI's language models with Twilio's Conversation Relay using the Vercel AI SDK. This integration lets you build a virtual voice assistant that handles user queries and provides information over a phone call. We'll cover setting up the project, configuring Redis, and running the project. We'll also explain how the bufferTransform function sends larger chunks of text to Twilio, avoiding the inefficiency of sending one token at a time.
Prerequisites
- Node.js and npm installed on your machine.
- A Twilio account.
- An OpenAI API key.
- A Redis instance for managing conversation state.
Step 1: Setting Up the Project
First, create a new directory for your project and initialize it with npm:
mkdir twilio-openai-integration
cd twilio-openai-integration
npm init -y
Install the necessary dependencies:
npm install ai express express-ws redis twilio @ai-sdk/openai uuid ws dotenv
npm install --save-dev typescript @types/node @types/ws @types/express-ws @types/express
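The build step in Step 6 assumes a TypeScript compiler config and a build script, which npm init doesn't create. Here's a minimal sketch (adjust the compiler options to your project): create a tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "outDir": "dist",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  },
  "include": ["**/*.ts"]
}

and add a build script to package.json:

"scripts": {
  "build": "tsc"
}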
Step 2: Project Structure
Create the following file structure:
twilio-openai-integration/
│
├── managers/
│ └── ConversationManager.ts
│
├── types/
│ └── twilio.ts
│
├── utils/
│ └── bufferTransform.ts
│
├── .env
└── index.ts
Step 3: Environment Configuration
Create a .env file in the root of your project and add your environment variables:
OPENAI_API_KEY=your-openai-api-key
PORT=5000
REDIS_URL=redis://localhost:6379
SERVER_DOMAIN=localhost:5000
TAVILY_API_KEY=your-tavily-api-key
Note that SERVER_DOMAIN is a bare host with no scheme, because the code prepends wss:// and https:// itself; for Twilio to reach your WebSocket endpoint, it must be publicly reachable (for local development, a tunnel such as ngrok works). Also note that TAVILY_API_KEY is a Tavily search key, not a Twilio credential; the code below doesn't use it, so include it only if you add a search tool.
Step 4: Implementing the Server
In index.ts, implement the server logic:
import express from "express";
import ExpressWs from "express-ws";
import VoiceResponse from "twilio/lib/twiml/VoiceResponse";
import { CoreMessage, streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { v4 as uuid } from "uuid";
import { type WebSocket } from "ws";
import "dotenv/config";
import { ConversationManager } from "./managers/ConversationManager";
import { EventMessage } from "./types/twilio";
import { bufferTransform } from "./utils/bufferTransform";
const app = ExpressWs(express()).app;
const PORT = parseInt(process.env.PORT || "5000");
const welcomeGreeting = "Hi there! How can I help you today?";
const systemInstructions =
"You are a virtual voice assistant. You can help the user with their questions and provide information.";
app.use(express.urlencoded({ extended: false }));
app.post("/call/incoming", async (_, res) => {
const response = new VoiceResponse();
response.connect().conversationRelay({
url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
welcomeGreeting,
});
res.writeHead(200, { "Content-Type": "text/xml" });
res.end(response.toString());
});
app.ws("/call/connection", (ws: WebSocket) => {
const sessionId = uuid();
// One transcript manager per call session, shared across all messages on this socket.
const conversation = new ConversationManager(sessionId);
ws.on("message", async (data: string) => {
const event: EventMessage = JSON.parse(data);
if (event.type === "setup") {
// Add welcome message to conversation transcript
const welcomeMessage: CoreMessage = {
role: "assistant",
content: welcomeGreeting,
};
await conversation.addMessage(welcomeMessage);
} else if (event.type === "prompt") {
// Add user message to conversation and retrieve all messages
const message: CoreMessage = { role: "user", content: event.voicePrompt };
await conversation.addMessage(message);
const messages = await conversation.getMessages();
const controller = new AbortController();
// Nothing aborts this controller in this guide; a fuller implementation would
// call controller.abort() when the caller interrupts the assistant.
// Stream text from the OpenAI model (streamText returns synchronously in AI SDK v4+)
const { textStream, text: completeText } = streamText({
abortSignal: controller.signal,
experimental_transform: bufferTransform,
model: openai("gpt-4o-mini"),
messages,
maxSteps: 10,
system: systemInstructions,
});
// Iterate over text stream and send messages to Twilio
for await (const text of textStream) {
if (controller.signal.aborted) {
break;
}
ws.send(
JSON.stringify({
type: "text",
token: text,
last: false,
})
);
}
// Send last message to Twilio
if (!controller.signal.aborted) {
ws.send(
JSON.stringify({
type: "text",
token: "",
last: true,
})
);
}
// Add complete text to conversation transcript
const agentMessage: CoreMessage = {
role: "assistant",
content: await completeText,
};
void conversation.addMessage(agentMessage);
} else if (event.type === "end") {
// Clear conversation transcript when call ends
void conversation.clearMessages();
}
});
ws.on("error", console.error);
});
app.listen(PORT, () => {
console.log(`Local: http://localhost:${PORT}`);
console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
});
Explanation
- Express and WebSocket Setup: We use express-ws to handle WebSocket connections, which are essential for real-time communication with Twilio's Conversation Relay.
- Twilio VoiceResponse: This sets up the Twilio call and connects it to our WebSocket endpoint.
- WebSocket Handling: We handle the different event types (setup, prompt, end) to manage the conversation state and interact with the OpenAI model.
- OpenAI Integration: We use the Vercel AI SDK to stream text from OpenAI's model, transforming it with bufferTransform to send larger chunks.
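The server also imports two support modules we haven't written yet. Their exact implementation is up to you; below are minimal sketches to make the project runnable. First, types/twilio.ts declares the shapes of the Conversation Relay WebSocket messages the handler switches on. This is a sketch: the real payloads carry additional fields (call SID, language info, and so on), so extend these as needed.

// types/twilio.ts
export interface SetupMessage {
  type: "setup";
}
export interface PromptMessage {
  type: "prompt";
  voicePrompt: string;
}
export interface EndMessage {
  type: "end";
}
// Union of the events index.ts handles.
export type EventMessage = SetupMessage | PromptMessage | EndMessage;

Second, managers/ConversationManager.ts persists the transcript in Redis with the node-redis client installed in Step 1. This sketch assumes one Redis list per call session, with messages stored as JSON strings:

// managers/ConversationManager.ts
import { createClient } from "redis";
import type { CoreMessage } from "ai";

// One shared client for the process, connected lazily on first use.
const redis = createClient({ url: process.env.REDIS_URL });
redis.on("error", console.error);
let connecting: Promise<unknown> | undefined;
const ensureConnected = () => (connecting ??= redis.connect());

export class ConversationManager {
  // One Redis list per call session holds the transcript in order.
  private readonly key: string;

  constructor(sessionId: string) {
    this.key = `conversation:${sessionId}`;
  }

  async addMessage(message: CoreMessage): Promise<void> {
    await ensureConnected();
    await redis.rPush(this.key, JSON.stringify(message));
  }

  async getMessages(): Promise<CoreMessage[]> {
    await ensureConnected();
    const entries = await redis.lRange(this.key, 0, -1);
    return entries.map((entry) => JSON.parse(entry) as CoreMessage);
  }

  async clearMessages(): Promise<void> {
    await ensureConnected();
    await redis.del(this.key);
  }
}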
Step 5: Implementing bufferTransform
In utils/bufferTransform.ts, implement the buffer transformation logic:
import { StreamTextTransform, TextStreamPart } from "ai";
export const bufferTransform: StreamTextTransform<any> = () => {
let buffer = "";
let threshold = 200;
return new TransformStream<TextStreamPart<any>, TextStreamPart<any>>({
transform(chunk, controller) {
if (chunk.type === "text-delta") {
buffer += chunk.textDelta;
if (buffer.length >= threshold) {
controller.enqueue({ ...chunk, textDelta: buffer });
buffer = "";
if (threshold < 5000) {
threshold += 200;
}
}
} else {
controller.enqueue(chunk);
}
},
flush(controller) {
if (buffer.length > 0) {
controller.enqueue({ type: "text-delta", textDelta: buffer });
}
},
});
};
Explanation
- Buffering: The bufferTransform function accumulates text tokens into a buffer. Once the buffer reaches a certain size (threshold), it sends the accumulated text as a single chunk.
- Dynamic Threshold: The threshold grows by 200 characters after each flush (capped at 5000) to optimize the size of the chunks being sent, improving efficiency by reducing the number of WebSocket messages.
Step 6: Running the Project
Ensure your Redis instance is running and accessible. Then compile and start your server (this uses the build script and tsconfig.json from Step 1; the compiled output lands in dist/):
npm run build
node dist/index.js
Your server should now be running, ready to handle incoming calls and relay conversations through Twilio.
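Before wiring up a real Twilio number, you can sanity-check the webhook by POSTing to the incoming-call endpoint and inspecting the TwiML it returns (the exact XML attributes may vary with the Twilio SDK version):

curl -X POST http://localhost:5000/call/incoming

You should see something like <Response><Connect><ConversationRelay url="wss://..." welcomeGreeting="Hi there! How can I help you today?"/></Connect></Response>.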
Conclusion
By following these steps, you've set up a system that integrates OpenAI's language models with Twilio's Conversation Relay, using the Vercel AI SDK. This setup allows for efficient communication by buffering text tokens and sending them in larger chunks, enhancing the performance of your virtual voice assistant.
Full Code on GitHub
You can view the full code for this project on GitHub: GitHub Repository