Ben James

How to build a WhatsApp AI assistant

Introduction

In this tutorial, you’ll learn how to create an AI assistant with its own phone number that you can text on WhatsApp! You’ll also learn how to write background jobs in TypeScript and how to set up and run your own WhatsApp API.

Why?

Personally, I wanted to have custom AI agents in my pocket that I can text via WhatsApp just like I text my friends and family. Here’s a short list of reasons you might want to build this:

  • 🤖 AI Agents
  • ⚔️ Role-playing
  • 💁‍♀️ AI Waifus/Husbands

Overview

Here’s what we’ll be using for this project:

  • Hono - This will be our API layer. I like to think of it as Express.js for the 21st century: it's TypeScript-ready, works with Bun, and can be deployed on Cloudflare Workers. If you haven’t used Cloudflare Workers before, you only need to know 3 things:
    • It’s fast (⚡ less than ~200ms away from every internet user).
    • It’s cheap (100k free requests daily and $0.15/million requests per month).
    • It’s serverless (No infrastructure to set up and maintain and no cold starts).
  • Trigger.dev - A powerful TypeScript framework for running background jobs, handling webhooks and scheduling cron jobs. Just like the bouncer at an exclusive nightclub, we want to make sure that messages are queued in the order they arrive and processed one at a time by our AI assistant, and that the response is sent back to the user even if the OpenAI assistant takes longer than expected to generate it. We also get 10,000 free job runs to test our assistant.
  • OpenAI Assistants API - The Assistants API is great for managing conversation threads. We won’t have to manually store each message ID in our database to track which user is talking to the assistant. Instead, we’ll simply store the user’s phone number alongside the thread ID we get from the Assistants API.
  • Node.js WhatsApp API - Since WhatsApp only offers an API for business accounts that requires multiple verification and review steps to get access, we’ll put together our own API using the whatsapp-web.js library. It will forward user messages to our API and send responses from the assistant to the user.

🖐️ Before we dive into the code, let’s take a second to look at how all the pieces fit together.

Architecture diagram

  1. When a user sends a message, it’s received by the WhatsApp API.
  2. The WhatsApp API forwards that message to our server, the Hono API.
  3. The Hono API creates a new background job to process the message using trigger.dev.
  4. Trigger.dev invokes the OpenAI API to generate a response for the message.
  5. Once the response has been generated, it is sent back to the user via the WhatsApp API.
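To make the hops above concrete, here is a minimal TypeScript sketch of the two JSON payloads that travel through the system. The field names come from the code later in this tutorial (`message.from` and `message.body` on the way in, `chatId` and `message` on the way out). The type names, the `toJobPayload` helper and the sample values are hypothetical and only there for illustration.

```typescript
// Body the WhatsApp API POSTs to the Hono API (steps 1-2)
interface IncomingWhatsAppMessage {
  message: {
    from: string // WhatsApp chat ID of the sender
    body: string // Text the user typed
  }
}

// Payload handed to the background job (step 3). The same two fields are
// later sent to the WhatsApp API's /send-message route (step 5).
interface AssistantPayload {
  chatId: string
  message: string
}

// Hypothetical helper: map the incoming body to the job payload
function toJobPayload(incoming: IncomingWhatsAppMessage): AssistantPayload {
  return { chatId: incoming.message.from, message: incoming.message.body }
}

// Sample values are made up for illustration
const example = toJobPayload({
  message: { from: "15551234567@c.us", body: "Who is Batman?" },
})
console.log(example)
```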

Now that we can visualize how messages flow through our application, let’s get cooking! 🍳

WhatsApp API setup

Let’s set up our Node.js WhatsApp API and connect it to our server. In order to do this, we’ll use the whatsapp-web.js library.

🤔 At this point you might be asking:

“If this library is compatible with Node.js, why the heck do we need to use it in a separate server?”

Great question! Here’s why:

This library works by creating an instance of WhatsApp Web running inside an instance of headless Chrome automated by Puppeteer. In my testing, I ran into tons of compatibility issues when trying to use these dependencies inside anything other than a bare-bones Node.js + Express server. Also, we can’t spin up a new instance of Chrome and WhatsApp Web each time a user sends a message: this would exhaust our allowed WhatsApp connections (4 max), not to mention make the response times painfully slow.

Now that we understand the constraints, we have 3 options:

  1. Run an express.js server and write your whole app using JavaScript. (Yeah… no, thanks).
  2. Try to set up TypeScript inside your Express.js app. If you’ve tried setting up ts-node before, you’ll know how soul-crushing this experience can be. How do I configure import paths? ESM or CJS modules? Why won’t it compile? Don’t take my word for it, here’s what some internet strangers have to say about using Node.js with TypeScript:
    1. Why is this so hard?
    2. Feeling fed up right now
  3. Run a Node.js server as a “dumb” server which is only responsible for forwarding messages between our API and the user. A tiny piece of infrastructure in exchange for simplicity, performance and a great development experience? Sign me up 👍

Now that we understand why we need this server, let’s set it up.

  1. Create a new Node.js project

    mkdir whatsapp-api && cd whatsapp-api
    
  2. Create a new package.json file with the default configuration

    npm init -y
    
  3. Install the necessary dependencies

    npm install axios body-parser express qrcode-terminal whatsapp-web.js
    
  4. Create the Node.js server

    ./index.js

    const { Client, LocalAuth } = require("whatsapp-web.js")
    const express = require("express")
    const qrcode = require("qrcode-terminal")
    const { default: axios } = require("axios")
    const bodyParser = require("body-parser")
    
    // Set up express
    const app = express()
    app.use(bodyParser.json())
    
    // Health-check route
    app.get("/", (_, res) => {
      res.send("WhatsApp API is running!")
    })
    
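    // Forward messages to the user
    app.post("/send-message", async (req, res) => {
      const { chatId, message } = req.body
      try {
        const result = await client.sendMessage(chatId, message)
        res.status(200).json(result)
      } catch (error) {
        // Report failures to the caller instead of leaving the error unhandled
        res.status(500).json({ error: String(error) })
      }
    })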
    
    // Start the server
    const startServer = () => {
      const port = 3000
      app.listen(port, () => {
        console.log(`✔ Server is running on port ${port}`)
      })
    }
    
    // Initialize the WhatsApp web client
    const client = new Client({
      puppeteer: {
        // Run chromium in headless mode
        headless: true,
        args: ["--no-sandbox"],
      },
      // Save session to disk so you don't need to authenticate each time you start the server.
      authStrategy: new LocalAuth(),
    })
    
    // Print QR code in terminal
    client.on("qr", (qr) => {
      console.log("👇 Scan the QR code below to authenticate")
      qrcode.generate(qr, { small: true })
    })
    
    // Listen for client authentication
    client.on("authenticated", () => {
      console.log("✔ Client is authenticated!")
    })
    
    // Listen for when client is ready to start receiving/sending messages
    client.on("ready", () => {
      console.log("✔ Client is ready!")
      startServer()
    })
    
    // Listen for incoming messages
    client.on("message", (message) => {
      console.log("💬 New message received:", JSON.stringify(message.body))
    })
    
    // Start WhatsApp client
    console.log("◌ Starting WhatsApp client...")
    client.initialize()
    
  5. Run the node.js server

    node index.js
    
  6. On your phone, navigate to “linked devices” then scan the QR code in the terminal to connect your WhatsApp phone number to this instance. The session will be saved locally so you only have to do this once.
    Scan QR code
    Once you’ve successfully logged in, you should see these logs in your terminal:

    ✔ Client is authenticated!
    ✔ Client is ready!
    ✔ Server is running on port 3000
    
  7. Using another phone number, send a test message to the authenticated phone number to verify that everything is wired up properly. You should see the incoming message in your terminal:

    💬 New message received: "Hello world!"
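The /send-message route passes whatever JSON it receives straight to client.sendMessage. A small guard like the sketch below could reject malformed requests first. `parseSendMessageBody` is a hypothetical helper, not part of whatsapp-web.js. It is written in TypeScript to match the rest of this tutorial, so the type annotations would need to be removed before using it in index.js.

```typescript
interface SendMessageBody {
  chatId: string
  message: string
}

// Hypothetical guard: returns a typed body or throws a descriptive error
function parseSendMessageBody(body: unknown): SendMessageBody {
  if (typeof body !== "object" || body === null) {
    throw new Error("Request body must be a JSON object")
  }
  const { chatId, message } = body as Record<string, unknown>
  if (typeof chatId !== "string" || chatId.trim() === "") {
    throw new Error("chatId must be a non-empty string")
  }
  if (typeof message !== "string" || message.trim() === "") {
    throw new Error("message must be a non-empty string")
  }
  return { chatId, message }
}
```

Inside the route handler, the result of `parseSendMessageBody(req.body)` would replace the direct reads from `req.body`, with a 400 response sent when it throws.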
    

That’s all we need to set up our WhatsApp API! 🎉

Hono API setup

  1. Initialize your hono project using bun:

    bunx create-hono
    
  2. Since we want to deploy to Cloudflare workers, we’ll choose cloudflare-workers as our starting template.

    ✔ Which template do you want to use? › cloudflare-workers
    
  3. Install the dependencies

    bun install
    
  4. Start the development server

    bun dev
    
  5. Test the default API route

    curl http://localhost:8787
    

    It should return:

    Hello Hono!
    

We’ve finished setting up our Hono server! 🔥

Create an AI assistant

Our assistant is going to be powered by gpt-3.5-turbo-1106. We want our assistant to provide fast responses with minimal costs. Since we are not going to be asking complicated questions, this level of capability is fine.

  1. For this demo, we’ll create a simple Batman bot. There’s a good chance the model has a lot of knowledge about the Batman universe from its training data. This saves us the effort of writing a lengthy and detailed system prompt.
    Open AI assistant

  2. Grab your OpenAI API key from the dashboard. The API key should look like this: sk-12345abcd.

🚨 Note
Each Cloudflare worker runs in its own isolated environment with its own scope, so it cannot read environment variables from the global scope, i.e. process.env. This was a necessary tradeoff to solve the cold start problem. You can learn more here.
At first glance, this seemed like a huge inconvenience. In practice, however, it only required a slight behavior change: instead of using the global process.env, we can make the environment variables available in the API route context and access them using context.env.

  1. Add your API key and assistant ID to the env variables file

    ./.dev.vars

    OPENAI_API_KEY=sk-123456abc
    OPENAI_ASSISTANT_ID=asst_123456abc
    
  2. Add the corresponding types to your hono App Bindings type.

    ./src/types/AppBindings.ts

    // TS compiler will be sad in the next step if you use an interface here 😭
    export type AppBindings = {
      OPENAI_API_KEY: string
      OPENAI_ASSISTANT_ID: string
    }
    
  3. With Hono, we can bind the env variables to the context object of each API route and get fully typed env variables.

    ./src/index.ts

    import { Hono } from "hono"
    import { AppBindings } from "./types/AppBindings"
    
    const app = new Hono<{ Bindings: AppBindings }>()
    
  4. Then we can access the environment variables in the API routes like this:

    ./src/index.ts

    //...
    
    app.get('/', (c) => {
      return c.text(`Assistant ID: ${c.env.OPENAI_ASSISTANT_ID}`)
    })
    

    If the environment variables have been configured properly, you should see this message when you start up your server:

    -------------------------------------------------------
    Using vars defined in .dev.vars
    Your worker has access to the following bindings:
    - Vars:
      - OPENAI_API_KEY: "(hidden)"
      - OPENAI_ASSISTANT_ID: "(hidden)"
    ⎔ Starting local server...
    [wrangler:inf] Ready on http://localhost:8787
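Because the variables only exist on the request context, a key that is missing from .dev.vars would typically surface as `undefined` at the moment a route reads it. One possible safeguard, sketched below, is to check the required keys before using them. `assertBindings` and `REQUIRED_KEYS` are hypothetical names, not part of Hono or Wrangler.

```typescript
// Same shape as ./src/types/AppBindings.ts at this point in the tutorial
type AppBindings = {
  OPENAI_API_KEY: string
  OPENAI_ASSISTANT_ID: string
}

const REQUIRED_KEYS: (keyof AppBindings)[] = [
  "OPENAI_API_KEY",
  "OPENAI_ASSISTANT_ID",
]

// Hypothetical helper: throws if any required variable is missing or empty
function assertBindings(env: Partial<AppBindings>): AppBindings {
  const missing = REQUIRED_KEYS.filter((key) => !env[key])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`)
  }
  return env as AppBindings
}
```

In a route, calling `assertBindings(c.env)` before reading any variable would turn a silent `undefined` into an error that names the missing key.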
    

Trigger.dev setup

Trigger.dev ships with multiple client adaptors for the most popular TypeScript frameworks (including Hono 🔥). These adaptors make it easy to plug into the trigger.dev infrastructure and create jobs from our API.

  1. Install the necessary dependencies

    bun add @trigger.dev/sdk@latest @trigger.dev/hono@latest
    
  2. Since we are using Cloudflare workers, we need to enable Node.js compatibility mode

    ./wrangler.toml

    compatibility_flags = ["nodejs_compat"]
    
  3. To get your development server API key, login to the Trigger.dev dashboard and select the Project you want to connect to. Then click on the Environments & API Keys tab in the left menu. Copy your development Server API Key. (Your development key will start with tr_dev_).

  4. Let’s set up our development environment variables.

    ./.dev.vars

    ...
    TRIGGER_API_KEY=tr_dev_super_secret_key
    TRIGGER_API_URL=https://api.trigger.dev
    
  5. Add those environment variable names to a types file so we can get fully typed environment variables throughout our app:

    ./src/types/AppBindings.ts

    export type AppBindings = {
      // ...
      TRIGGER_API_KEY: string
      TRIGGER_API_URL: string
    }
    
  6. Next, we’ll initialize the trigger.dev client:

    ./src/utils/triggerClient.ts

    import { TriggerClient } from "@trigger.dev/sdk"
    import { AppBindings } from "../types/AppBindings"
    
    export const triggerClient = (env: AppBindings) => {
      const client = new TriggerClient({
        id: "whatsapp-assistant",
        apiKey: env.TRIGGER_API_KEY,
        apiUrl: env.TRIGGER_API_URL,
      })
    
      return client
    }
    
  7. Finally, we’ll add the trigger.dev client to our API using the addMiddleware helper.

    ./src/index.ts

    import { addMiddleware } from "@trigger.dev/hono"
    import { triggerClient } from "./utils/triggerClient"
    
    addMiddleware(app, (env) => triggerClient(env))
    
    // ...rest of the API routes
    

Create the background job

If you take a look at the assistants API documentation, you’ll see that the flow for interacting with assistants generally looks like this:

  1. Create a thread (the session between the assistant and the user that holds the messages).
  2. Create a message and add it to the thread.
  3. Create a run (this invokes the assistant to start processing the newly added message).
  4. Poll the run every few seconds to check if it’s finished running.
  5. If the run has completed, retrieve the list of messages in the thread. The assistant’s reply should be the latest message in the thread.
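The steps above can be sketched as a single function. This is a minimal illustration written against a hypothetical `AssistantsApi` interface: the method names, the simplified status list and the default polling values are assumptions, and they do not match the official OpenAI SDK one-to-one.

```typescript
// Hypothetical, simplified view of the Assistants API (not the real SDK)
type RunStatus = "queued" | "in_progress" | "completed" | "failed"

interface AssistantsApi {
  createThread(): Promise<{ id: string }>
  addMessage(threadId: string, content: string): Promise<void>
  createRun(threadId: string, assistantId: string): Promise<{ id: string }>
  getRunStatus(threadId: string, runId: string): Promise<RunStatus>
  listMessages(threadId: string): Promise<string[]> // newest first
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

async function askAssistant(
  api: AssistantsApi,
  assistantId: string,
  question: string,
  pollIntervalMs = 2000,
  maxAttempts = 30
): Promise<string> {
  // Create a thread and add the user's message to it
  const thread = await api.createThread()
  await api.addMessage(thread.id, question)

  // Create a run to start processing the message
  const run = await api.createRun(thread.id, assistantId)

  // Poll until the run finishes or we give up
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await api.getRunStatus(thread.id, run.id)
    if (status === "completed") {
      // The assistant's reply is the latest message in the thread
      const [latest] = await api.listMessages(thread.id)
      if (latest === undefined) throw new Error("Thread has no messages")
      return latest
    }
    if (status === "failed") throw new Error("Run failed")
    await sleep(pollIntervalMs)
  }
  throw new Error("Timed out waiting for the run to complete")
}
```

Passing the API in as a parameter keeps the sketch independent of any SDK version and makes it easy to exercise with a fake implementation.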

Trigger.dev provides some really handy integrations built on top of popular services that make it easy to work with some APIs inside your background job. In this section, we’ll be using:

  • The trigger.dev OpenAI integration to make API calls to the assistant.
  • The trigger.dev key-value store to save the user’s phone number and the thread ID. In production, you might consider swapping this out for a database.
  • Zod, a popular TypeScript schema validation library that will let us define the shape of the payload sent to our background job.

  1. Install dependencies

    bun add @trigger.dev/openai@latest zod
    
  2. Create a helper method to send messages to our WhatsApp API

    ./src/utils/whatsappSendMessage.ts

    interface Args {
      whatsappApiUrl: string
      chatId: string
      message: string
    }
    
    export const whatsappSendMessage = async ({
      whatsappApiUrl,
      chatId,
      message,
    }: Args) => {
      const body = { chatId, message }
      const options = {
        body: JSON.stringify(body),
        method: "POST",
        headers: {
          "content-type": "application/json",
        },
      }
    
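      const response = await fetch(`${whatsappApiUrl}/send-message`, options)
      if (!response.ok) {
        // Surface delivery failures instead of silently ignoring them
        throw new Error(`WhatsApp API responded with status ${response.status}`)
      }
      return JSON.stringify(await response.json())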
    }
    
  3. Add the WhatsApp API url to the environment variables

    ./.dev.vars

    ...
    WHATSAPP_API_URL=http://localhost:3000
    
  4. Add the env variable type

    ./src/types/AppBindings.ts

    export type AppBindings = {
      // ...
      WHATSAPP_API_URL: string
    }
    
  5. Create the background job that generates responses

    ./src/jobs/assistant.ts

    import { z } from "zod"
    import { OpenAI } from "@trigger.dev/openai"
    import { eventTrigger, type TriggerClient } from "@trigger.dev/sdk"
    import { type AppBindings } from "../types/AppBindings"
    import { whatsappSendMessage } from "../utils/whatsappSendMessage"
    
    interface Args {
      client: TriggerClient
      env: AppBindings
    }
    
    interface Chat {
      threadId: string
    }
    
    export const assistantJob = ({ client, env }: Args) => {
      // Initialize OpenAI client
      const openai = new OpenAI({
        id: "openai",
        apiKey: env.OPENAI_API_KEY,
      })
    
      // Define the background job
      const job = client.defineJob({
        id: "assistant_generate_response",
        name: "Assistant generate response",
        version: "1.0.0",
        trigger: eventTrigger({
          // The identifier used to trigger this job from the API
          name: "assistant.response",
    
          // Define the schema of the payload
          schema: z.object({
            chatId: z.string(),
            message: z.string(),
          }),
        }),
    
        // Add the OpenAI integration to this job
        integrations: { openai },
        run: async (payload, io, ctx) => {
          const { chatId, message } = payload
    
          // Check if the chat exists in key-value store
          const chatExists = await io.store.job.has("chat-exists", chatId)
    
          let threadId = ""
          if (chatExists) {
            // Get the OpenAI thread ID associated with the WhatsApp chat ID
            const chat = await io.store.job.get<Chat>("get-chat", chatId)
            if (!chat) {
              throw new Error(`No chat found with ID ${chatId}`)
            }
    
            threadId = chat.threadId
          } else {
            // Create a new thread
            const thread = await io.openai.beta.threads.create("create-thread")
    
            // Register the new chat session
            await io.store.job.set("register-chat", chatId, { threadId: thread.id })
    
            threadId = thread.id
          }
    
          // Add the message to the conversation thread
          await io.openai.beta.threads.messages.create("create-message", threadId, {
            role: "user",
            content: message,
          })
    
          // Invoke the assistant to generate a response and wait for it to complete
          const run = await io.openai.beta.threads.runs.createAndWaitForCompletion(
            "create-run",
            threadId,
            { assistant_id: env.OPENAI_ASSISTANT_ID }
          )
    
          // Make sure the assistant has finished generating the response
          if (run?.status !== "completed") {
            throw new Error(
              `Run finished with status ${run?.status}: ${JSON.stringify(
                run?.last_error
              )}`
            )
          }
    
          // List the most recent message in the thread
          const messages = await io.openai.beta.threads.messages.list(
            "list-messages",
            run.thread_id,
            { query: { limit: "1" } }
          )
    
          // Retrieve the latest assistant message
          const content = messages[0].content[0]
    
          // Verify the message contains text and not an image
          if (content.type === "image_file") {
            throw new Error(
              "The OpenAI response was an image but we expected text."
            )
          }
    
          // Send the assistant's response to the WhatsApp API so it can be forwarded to the user
          const responseMessage = content.text.value
          await whatsappSendMessage({
            whatsappApiUrl: env.WHATSAPP_API_URL,
            chatId,
            message: responseMessage,
          })
    
          return { message: responseMessage }
        },
      })
    
      return job
    }
    

🖐️ Let’s take a look at what’s going on in this job.

If the user is messaging our assistant for the first time, we’ll create a new message thread. Otherwise, we’ll append the message to an existing thread. We use the trigger.dev key-value store to save the chatId - threadId pairs so we can track the WhatsApp phone number that corresponds to a specific OpenAI thread ID.

Normally, we would have to create a function that will poll the OpenAI API every few seconds to check if the assistant has completed generating a response. The trigger.dev OpenAI integration exposes a handy helper function called createAndWaitForCompletion that handles everything for us.
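Stripped of the trigger.dev specifics, the chat-to-thread lookup described above is a get-or-create operation. The sketch below shows the idea with a hypothetical `KeyValueStore` interface and an in-memory implementation. Both are assumptions for illustration, not the trigger.dev store API.

```typescript
// Hypothetical async key-value store. Any database client could sit behind it.
interface KeyValueStore<T> {
  get(key: string): Promise<T | undefined>
  set(key: string, value: T): Promise<void>
}

interface Chat {
  threadId: string
}

// In-memory implementation, for local experiments only
class MemoryStore<T> implements KeyValueStore<T> {
  private items = new Map<string, T>()

  async get(key: string) {
    return this.items.get(key)
  }

  async set(key: string, value: T) {
    this.items.set(key, value)
  }
}

// Return the thread ID for a chat, creating and registering one on first contact
async function getOrCreateThreadId(
  store: KeyValueStore<Chat>,
  chatId: string,
  createThread: () => Promise<string>
): Promise<string> {
  const existing = await store.get(chatId)
  if (existing) return existing.threadId

  const threadId = await createThread()
  await store.set(chatId, { threadId })
  return threadId
}
```

Swapping the key-value store for a database in production would then only mean writing another `KeyValueStore` implementation.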

  1. Register the new job in the trigger client

    ./src/utils/triggerClient.ts

    import { TriggerClient } from "@trigger.dev/sdk"
    import { AppBindings } from "../types/AppBindings"
    import { assistantJob } from "../jobs/assistant"
    
    export const triggerClient = (env: AppBindings) => {
      const client = new TriggerClient({
        id: "whatsapp-assistant",
        apiKey: env.TRIGGER_API_KEY,
        apiUrl: env.TRIGGER_API_URL,
      })
    
      // Register jobs
      assistantJob({ client, env })
    
      return client
    }
    
  2. Create a new API route that will receive incoming WhatsApp messages

    ./src/index.ts

    
    // ... 
    app.post("/wa-message-received", async (c) => {
      const { message } = await c.req.json()
    
      // Trigger the job with the message payload
      const event = await triggerClient(c.env).sendEvent({
        name: "assistant.response",
        payload: { chatId: message.from, message: message.body },
      })
    
      return c.json({ event })
    })
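For this route to receive anything, the WhatsApp API server has to POST each incoming message to it, whereas the index.js shown earlier only logs messages to the terminal. A minimal sketch of that forwarding step follows. `forwardToHono`, `PostFn` and the parameter names are hypothetical, and the request body shape is inferred from the `{ message }` destructuring in the route above. The sketch is written in TypeScript like the rest of the API code, so the type annotations would need to be removed before pasting it into index.js.

```typescript
// Only the two fields the /wa-message-received route reads
interface IncomingMessage {
  from: string
  body: string
}

// Minimal fetch-like signature so the transport can be swapped or faked
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>

// Hypothetical helper: POST an incoming WhatsApp message to the Hono API
async function forwardToHono(
  apiUrl: string,
  message: IncomingMessage,
  post: PostFn
): Promise<void> {
  const response = await post(`${apiUrl}/wa-message-received`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    // Send only the fields the route uses, wrapped in a `message` object
    body: JSON.stringify({
      message: { from: message.from, body: message.body },
    }),
  })
  if (!response.ok) {
    throw new Error(`Hono API responded with status ${response.status}`)
  }
}
```

Assuming the Hono dev server is running on http://localhost:8787 as earlier in this tutorial, the existing `client.on("message", ...)` handler could call `forwardToHono("http://localhost:8787", message, fetch)` on a Node.js version that ships a global `fetch`, or pass a thin wrapper around `axios.post` instead.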
    

Test the WhatsApp assistant

  1. Start the WhatsApp API server.

    node index.js
    
  2. We need to make our WhatsApp API accessible on the internet so the trigger.dev cloud service can connect to it. We can do that by running ngrok in a separate terminal.

    ngrok http 3000
    
  3. Update the WhatsApp API URL. We’ll replace it with the https url from ngrok that points to our API.

    ./.dev.vars

    WHATSAPP_API_URL=https://my_ngrok_domain.ngrok-free.app
    
  4. Start the Hono API

    bun dev
    
  5. In a new terminal window, start the trigger.dev tunnel to connect the trigger.dev cloud service.

    bunx @trigger.dev/cli@latest dev --client-id whatsapp-assistant -p 8787 -H localhost
    
  6. Log in to the trigger.dev dashboard and verify that our job has been synced.
    Trigger.dev dashboard

  7. Send a message to the assistant’s phone number. You should receive a response. 🎉

WhatsApp message

✨ Extras

  • Once you have deployed the API to production, you can use an SMS verification service like sms pool to assign a permanent phone number to your assistant. Sign in with the virtual number provided on your phone and scan the QR code printed in your server logs.
  • Check out the full repositories on GitHub to learn how to implement a “typing…” indicator while the assistant generates the response.
  • Create a WhatsApp profile for your assistant

WhatsApp bot profile

Top comments (3)

Juan

Help me pls...

Ben James

Hey Juan, if you're having trouble setting up trigger.dev you might get some help from their discord community here: discord.gg/HcPFSZfruN
