Integrating OpenAI's LLM with Twilio Using Vercel AI SDK
In this guide, we'll walk through integrating OpenAI's language models with Twilio's Conversation Relay using the Vercel AI SDK. This integration lets you build a virtual voice assistant that handles user queries and provides information over a phone call. We'll cover setting up the project, configuring Redis, and running the project. We'll also explain how the bufferTransform function sends larger chunks of text to Twilio, avoiding the inefficiency of sending one token at a time.
Prerequisites
- Node.js and npm installed on your machine.
- A Twilio account.
- An OpenAI API key.
- A Redis instance for managing conversation state.
Step 1: Setting Up the Project
First, create a new directory for your project and initialize it with npm:
mkdir twilio-openai-integration
cd twilio-openai-integration
npm init -y
Install the necessary dependencies:
npm install ai express express-ws redis twilio @ai-sdk/openai uuid ws dotenv
npm install --save-dev typescript @types/node @types/ws @types/express-ws @types/express
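The build step in Step 6 assumes a TypeScript compiler config and a build script, which npm init doesn't create. Here's a minimal sketch (adjust the compiler options to your project): create a tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "outDir": "dist",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  },
  "include": ["**/*.ts"]
}

and add a build script to package.json:

"scripts": {
  "build": "tsc"
}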
Step 2: Project Structure
Create the following file structure:
twilio-openai-integration/
│
├── managers/
│ └── ConversationManager.ts
│
├── types/
│ └── twilio.ts
│
├── utils/
│ └── bufferTransform.ts
│
├── .env
└── index.ts
Step 3: Environment Configuration
Create a .env file in the root of your project and add your environment variables:
OPENAI_API_KEY=your-openai-api-key
PORT=5000
REDIS_URL=redis://localhost:6379
SERVER_DOMAIN=localhost:5000
TAVILY_API_KEY=your-tavily-api-key
Note that SERVER_DOMAIN is a bare host with no scheme, because the code prepends wss:// and https:// itself; for Twilio to reach your WebSocket endpoint, it must be publicly reachable (for local development, a tunnel such as ngrok works). Also note that TAVILY_API_KEY is a Tavily search key, not a Twilio credential; the code below doesn't use it, so include it only if you add a search tool.
Step 4: Implementing the Server
In index.ts, implement the server logic:
import express from "express";
import ExpressWs from "express-ws";
import VoiceResponse from "twilio/lib/twiml/VoiceResponse";
import { CoreMessage, streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { v4 as uuid } from "uuid";
import { type WebSocket } from "ws";
import "dotenv/config";
import { ConversationManager } from "./managers/ConversationManager";
import { EventMessage } from "./types/twilio";
import { bufferTransform } from "./utils/bufferTransform";
const app = ExpressWs(express()).app;
const PORT = parseInt(process.env.PORT || "5000");
const welcomeGreeting = "Hi there! How can I help you today?";
const systemInstructions =
"You are a virtual voice assistant. You can help the user with their questions and provide information.";
app.use(express.urlencoded({ extended: false }));
app.post("/call/incoming", async (_, res) => {
const response = new VoiceResponse();
response.connect().conversationRelay({
url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
welcomeGreeting,
});
res.writeHead(200, { "Content-Type": "text/xml" });
res.end(response.toString());
});
app.ws("/call/connection", (ws: WebSocket) => {
const sessionId = uuid();
// One transcript manager per call session, shared across all messages on this socket.
const conversation = new ConversationManager(sessionId);
ws.on("message", async (data: string) => {
const event: EventMessage = JSON.parse(data);
if (event.type === "setup") {
// Add welcome message to conversation transcript
const welcomeMessage: CoreMessage = {
role: "assistant",
content: welcomeGreeting,
};
await conversation.addMessage(welcomeMessage);
} else if (event.type === "prompt") {
// Add user message to conversation and retrieve all messages
const message: CoreMessage = { role: "user", content: event.voicePrompt };
await conversation.addMessage(message);
const messages = await conversation.getMessages();
const controller = new AbortController();
// Nothing aborts this controller in this guide; a fuller implementation would
// call controller.abort() when the caller interrupts the assistant.
// Stream text from the OpenAI model (streamText returns synchronously in AI SDK v4+)
const { textStream, text: completeText } = streamText({
abortSignal: controller.signal,
experimental_transform: bufferTransform,
model: openai("gpt-4o-mini"),
messages,
maxSteps: 10,
system: systemInstructions,
});
// Iterate over text stream and send messages to Twilio
for await (const text of textStream) {
if (controller.signal.aborted) {
break;
}
ws.send(
JSON.stringify({
type: "text",
token: text,
last: false,
})
);
}
// Send last message to Twilio
if (!controller.signal.aborted) {
ws.send(
JSON.stringify({
type: "text",
token: "",
last: true,
})
);
}
// Add complete text to conversation transcript
const agentMessage: CoreMessage = {
role: "assistant",
content: await completeText,
};
void conversation.addMessage(agentMessage);
} else if (event.type === "end") {
// Clear conversation transcript when call ends
void conversation.clearMessages();
}
});
ws.on("error", console.error);
});
app.listen(PORT, () => {
console.log(`Local: http://localhost:${PORT}`);
console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
});
Explanation
- Express and WebSocket Setup: We use express-ws to handle WebSocket connections, which are essential for real-time communication with Twilio's Conversation Relay.
- Twilio VoiceResponse: This sets up the Twilio call and connects it to our WebSocket endpoint.
- WebSocket Handling: We handle the different event types (setup, prompt, end) to manage the conversation state and interact with the OpenAI model.
- OpenAI Integration: We use the Vercel AI SDK to stream text from OpenAI's model, transforming it with bufferTransform to send larger chunks.
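The server also imports two support modules we haven't written yet. Their exact implementation is up to you; below are minimal sketches to make the project runnable. First, types/twilio.ts declares the shapes of the Conversation Relay WebSocket messages the handler switches on. This is a sketch: the real payloads carry additional fields (call SID, language info, and so on), so extend these as needed.

// types/twilio.ts
export interface SetupMessage {
  type: "setup";
}
export interface PromptMessage {
  type: "prompt";
  voicePrompt: string;
}
export interface EndMessage {
  type: "end";
}
// Union of the events index.ts handles.
export type EventMessage = SetupMessage | PromptMessage | EndMessage;

Second, managers/ConversationManager.ts persists the transcript in Redis with the node-redis client installed in Step 1. This sketch assumes one Redis list per call session, with messages stored as JSON strings:

// managers/ConversationManager.ts
import { createClient } from "redis";
import type { CoreMessage } from "ai";

// One shared client for the process, connected lazily on first use.
const redis = createClient({ url: process.env.REDIS_URL });
redis.on("error", console.error);
let connecting: Promise<unknown> | undefined;
const ensureConnected = () => (connecting ??= redis.connect());

export class ConversationManager {
  // One Redis list per call session holds the transcript in order.
  private readonly key: string;

  constructor(sessionId: string) {
    this.key = `conversation:${sessionId}`;
  }

  async addMessage(message: CoreMessage): Promise<void> {
    await ensureConnected();
    await redis.rPush(this.key, JSON.stringify(message));
  }

  async getMessages(): Promise<CoreMessage[]> {
    await ensureConnected();
    const entries = await redis.lRange(this.key, 0, -1);
    return entries.map((entry) => JSON.parse(entry) as CoreMessage);
  }

  async clearMessages(): Promise<void> {
    await ensureConnected();
    await redis.del(this.key);
  }
}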
Step 5: Implementing bufferTransform
In utils/bufferTransform.ts, implement the buffer transformation logic:
import { StreamTextTransform, TextStreamPart } from "ai";
export const bufferTransform: StreamTextTransform<any> = () => {
let buffer = "";
let threshold = 200;
return new TransformStream<TextStreamPart<any>, TextStreamPart<any>>({
transform(chunk, controller) {
if (chunk.type === "text-delta") {
buffer += chunk.textDelta;
if (buffer.length >= threshold) {
controller.enqueue({ ...chunk, textDelta: buffer });
buffer = "";
if (threshold < 5000) {
threshold += 200;
}
}
} else {
controller.enqueue(chunk);
}
},
flush(controller) {
if (buffer.length > 0) {
controller.enqueue({ type: "text-delta", textDelta: buffer });
}
},
});
};
Explanation
- Buffering: The bufferTransform function accumulates text tokens into a buffer. Once the buffer reaches a certain size (threshold), it sends the accumulated text as a single chunk.
- Dynamic Threshold: The threshold grows by 200 characters after each flush (capped at 5000) to optimize the size of the chunks being sent, improving efficiency by reducing the number of WebSocket messages.
Step 6: Running the Project
Ensure your Redis instance is running and accessible. Then compile and start your server (this uses the build script and tsconfig.json from Step 1; the compiled output lands in dist/):
npm run build
node dist/index.js
Your server should now be running, ready to handle incoming calls and relay conversations through Twilio.
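Before wiring up a real Twilio number, you can sanity-check the webhook by POSTing to the incoming-call endpoint and inspecting the TwiML it returns (the exact XML attributes may vary with the Twilio SDK version):

curl -X POST http://localhost:5000/call/incoming

You should see something like <Response><Connect><ConversationRelay url="wss://..." welcomeGreeting="Hi there! How can I help you today?"/></Connect></Response>.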
Conclusion
By following these steps, you've set up a system that integrates OpenAI's language models with Twilio's Conversation Relay, using the Vercel AI SDK. This setup allows for efficient communication by buffering text tokens and sending them in larger chunks, enhancing the performance of your virtual voice assistant.
Full Code on GitHub
You can view the full code for this project on GitHub: GitHub Repository