Introduction
In this tutorial, you’ll learn how to create an AI assistant with its own phone number that you can text on WhatsApp! You’ll also learn how to write background jobs in TypeScript and how to set up and run your own WhatsApp API.
Why?
Personally, I wanted to have custom AI agents in my pocket that I can text via WhatsApp just like I text my friends and family. Here’s a short list of reasons you might want to build this:
- 🤖 AI Agents
- ⚔️ Role-playing
- 💁‍♀️ AI Waifus/Husbands
Overview
Here’s what we’ll be using for this project:
- Hono - This will be our API layer. I like to think of it as Express.js for the 21st century: it’s TypeScript-ready, works with Bun, and can be deployed on Cloudflare Workers. If you haven’t used Cloudflare Workers before, you only need to know 3 things:
  - It’s fast (⚡ less than ~200ms away from every internet user).
  - It’s cheap (100k free requests daily and $0.15/million requests per month).
  - It’s serverless (No infrastructure to set up and maintain and no cold starts).
- Trigger.dev - A powerful TypeScript framework for running background jobs, handling webhooks and scheduling cron jobs. Just like the bouncer at an exclusive nightclub, we want to make sure each message is queued in the order it arrives and processed one at a time by our AI assistant, and that the response is sent back to the user even if the OpenAI assistant takes longer than expected to generate it. We also get 10,000 free job runs to test our assistant.
- OpenAI Assistants API - The assistants API is great for managing conversation threads. We won’t have to manually store each message ID in our database to track which user is talking to the assistant. Instead, we’ll simply store the user’s phone number alongside the thread ID we get from the assistants API.
- Node.js WhatsApp API - Since WhatsApp only offers an API for business accounts that requires multiple verification and review steps to get access, we’ll put together our own API using the whatsapp-web.js library. It will forward user messages to our API and send responses from the assistant to the user.
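To picture the “queued in order, processed one at a time” behavior described for Trigger.dev above, here is a minimal sketch built on a promise chain. It is not how Trigger.dev is implemented; `SerialQueue`, `sleep` and `demo` are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch: an in-process serial queue. This is NOT Trigger.dev's
// implementation; it only illustrates "in order, one at a time".
type Task<T> = () => Promise<T>

class SerialQueue {
  // Tail of the promise chain; each new task waits for the previous one.
  private tail: Promise<unknown> = Promise.resolve()

  enqueue<T>(task: Task<T>): Promise<T> {
    // Start the task only after everything queued before it has settled.
    const result = this.tail.then(task)
    // Swallow failures on the chain so one failed task does not block the rest.
    this.tail = result.catch(() => undefined)
    return result
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Two simulated replies: the first is slower, yet it still completes first.
async function demo(): Promise<string[]> {
  const queue = new SerialQueue()
  const log: string[] = []
  const first = queue.enqueue(async () => {
    await sleep(30)
    log.push("reply to message 1")
  })
  const second = queue.enqueue(async () => {
    await sleep(5)
    log.push("reply to message 2")
  })
  await Promise.all([first, second])
  return log
}
```

Calling `demo()` resolves with the replies in arrival order, because the second task does not start until the first has finished. A hosted job runner adds persistence and retries on top of this idea, which a promise chain in memory cannot offer.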
🖐️ Before we dive into the code, let’s take a second to look at how all the pieces fit together.
- When a user sends a message, it’s received by the WhatsApp API.
- The WhatsApp API forwards that message to our server, the Hono API.
- The Hono API creates a new background job to process the message using trigger.dev.
- Trigger.dev invokes the OpenAI API to generate a response for the message.
- Once the response has been generated, it is sent back to the user via the WhatsApp API.
Now that we can visualize how messages flow through our application, let’s get cooking! 🍳
WhatsApp API setup
Let’s set up our Node.js WhatsApp API and connect it to our server. In order to do this, we’ll use the whatsapp-web.js library.
🤔 At this point you might be asking:
“If this library is compatible with Node.js, why the heck do we need to use it in a separate server?”
Great question! Here’s why:
This library works by creating an instance of WhatsApp Web running inside an instance of headless Chrome automated by Puppeteer. In my testing, I ran into tons of compatibility issues when trying to use these dependencies inside anything other than a bare-bones Node.js + express server. Also, we can’t spin up a new instance of Chrome and WhatsApp Web each time a user sends a message: this would exhaust our allowed WhatsApp connections (4 max) and make response times painfully slow.
Now that we understand the constraints, we have 3 options:
- Run an express.js server and write your whole app using JavaScript. (Yeah… no, thanks).
- Try to set up TypeScript inside your express.js app. If you’ve tried setting up ts-node before, you’ll know how soul-crushing this experience can be. How do I configure import paths? ESM or CJS modules? Why won’t it compile?
- Run a Node.js server as a “dumb” server which is only responsible for forwarding messages between our API and the user. A tiny piece of infrastructure in exchange for simplicity, performance and a great development experience? Sign me up 👍
Now that we understand why we need this server, let’s set it up.
- Create a new Node.js project
mkdir whatsapp-api && cd whatsapp-api
- Create a new package.json file with the default configuration
npm init -y
- Install the necessary dependencies
npm install axios body-parser express qrcode-terminal whatsapp-web.js
- Create the Node.js server
./index.js
const { Client, LocalAuth } = require("whatsapp-web.js")
const express = require("express")
const qrcode = require("qrcode-terminal")
const { default: axios } = require("axios")
const bodyParser = require("body-parser")

// Set up express
const app = express()
app.use(bodyParser.json())

// Health-check route
app.get("/", (_, res) => {
  res.send("WhatsApp API is running!")
})

// Forward messages to the user
app.post("/send-message", async (req, res) => {
  const chatId = req.body.chatId
  const message = req.body.message
  const result = await client.sendMessage(chatId, message)
  res.status(200).json(result)
})

// Start the server
const startServer = () => {
  const port = 3000
  app.listen(port, () => {
    console.log(`✔ Server is running on port ${port}`)
  })
}

// Initialize the WhatsApp web client
const client = new Client({
  puppeteer: {
    // Run chromium in headless mode
    headless: true,
    args: ["--no-sandbox"],
  },
  // Save session to disk so you don't need to authenticate each time you start the server.
  authStrategy: new LocalAuth(),
})

// Print QR code in terminal
client.on("qr", (qr) => {
  console.log("👇 Scan the QR code below to authenticate")
  qrcode.generate(qr, { small: true })
})

// Listen for client authentication
client.on("authenticated", () => {
  console.log("✔ Client is authenticated!")
})

// Listen for when client is ready to start receiving/sending messages
client.on("ready", () => {
  console.log("✔ Client is ready!")
  startServer()
})

// Listen for incoming messages
client.on("message", (message) => {
  console.log("💬 New message received:", JSON.stringify(message.body))
})

// Start WhatsApp client
console.log("◌ Starting WhatsApp client...")
client.initialize()
- Run the Node.js server
node index.js
- On your phone, navigate to “linked devices” then scan the QR code in the terminal to connect your WhatsApp phone number to this instance. The session will be saved locally so you only have to do this once.
Once you’ve successfully logged in, you should see these logs in your terminal:
✔ Client is authenticated!
✔ Client is ready!
✔ Server is running on port 3000
- Using another phone number, send a test message to the authenticated phone number to verify that everything is wired up properly. You should see the incoming message in your terminal:
💬 New message received: "Hello world!"
That’s all we need to set up our WhatsApp API! 🎉
Hono API setup
- Initialize your Hono project using Bun:
bunx create-hono
- Since we want to deploy to Cloudflare Workers, we’ll choose cloudflare-workers as our starting template.
✔ Which template do you want to use? › cloudflare-workers
- Install the dependencies
bun install
- Start the development server
bun dev
- Test the default API route
curl http://localhost:8787
It should return:
Hello Hono!
We’ve finished setting up our Hono server! 🔥
Create an AI assistant
Our assistant is going to be powered by gpt-3.5-turbo-1106. We want our assistant to provide fast responses with minimal costs. Since we are not going to be asking complicated questions, this level of capability is fine.
For this demo, we’ll create a simple Batman bot. There’s a good chance the model has a lot of knowledge about the Batman universe from its training data. This saves us the effort of writing a lengthy and detailed system prompt.
Grab your OpenAI API key from the dashboard. The API key should look like this: sk-12345abcd.
🚨 Note
Each Cloudflare worker runs in its own isolated environment with its own scope, so it cannot read environment variables from the global scope, i.e. process.env. This was a necessary tradeoff to solve the cold start problem. You can learn more here.
At first glance, this seemed like a huge inconvenience. In practice, however, it only required a slight behavior change: instead of using the global process.env, we can make the environment variables available in the API route context and access them using context.env.
- Add your API key and assistant ID to the env variables file
./.dev.vars
OPENAI_API_KEY=sk-123456abc
OPENAI_ASSISTANT_ID=asst_123456abc
- Add the corresponding types to your Hono App Bindings type.
./src/types/AppBindings.ts
// The TS compiler will be sad in the next step if you use an interface here 😭
export type AppBindings = {
  OPENAI_API_KEY: string
  OPENAI_ASSISTANT_ID: string
}
- With Hono, we can bind the env variables to the context object of each API route and get fully typed env variables.
./src/index.ts
import { Hono } from "hono"
import { AppBindings } from "./types/AppBindings"

const app = new Hono<{ Bindings: AppBindings }>()
- Then we can access the environment variables in the API routes like this:
./src/index.ts
//...
app.get('/', (c) => {
  return c.text(`Assistant ID: ${c.env.OPENAI_ASSISTANT_ID}`)
})
If the environment variables have been configured properly, you should see this message when you start up your server:
-------------------------------------------------------
Using vars defined in .dev.vars
Your worker has access to the following bindings:
- Vars:
  - OPENAI_API_KEY: "(hidden)"
  - OPENAI_ASSISTANT_ID: "(hidden)"
⎔ Starting local server...
[wrangler:inf] Ready on http://localhost:8787
Trigger.dev setup
Trigger.dev ships with multiple client adaptors for the most popular TypeScript frameworks (including Hono 🔥). These adaptors make it easy to plug into the trigger.dev infrastructure and create jobs from our API.
- Install the necessary dependencies
bun add @trigger.dev/sdk@latest @trigger.dev/hono@latest
- Since we are using Cloudflare Workers, we need to enable Node.js compatibility mode
./wrangler.toml
compatibility_flags = ["nodejs_compat"]
To get your development server API key, log in to the Trigger.dev dashboard and select the project you want to connect to. Then click on the Environments & API Keys tab in the left menu. Copy your development Server API Key. (Your development key will start with tr_dev_).
- Let’s set up our development environment variables.
./.dev.vars
...
TRIGGER_API_KEY=tr_dev_super_secret_key
TRIGGER_API_URL=https://api.trigger.dev
- Add those environment variable names to a types file so we can get fully typed environment variables throughout our app:
./src/types/AppBindings.ts
export type AppBindings = {
  // ...
  TRIGGER_API_KEY: string
  TRIGGER_API_URL: string
}
- Next, we’ll initialize the trigger.dev client:
./src/utils/triggerClient.ts
import { TriggerClient } from "@trigger.dev/sdk"
import { AppBindings } from "../types/AppBindings"

export const triggerClient = (env: AppBindings) => {
  const client = new TriggerClient({
    id: "whatsapp-assistant",
    apiKey: env.TRIGGER_API_KEY,
    apiUrl: env.TRIGGER_API_URL,
  })
  return client
}
- Finally, we’ll add the trigger.dev client to our API using the addMiddleware helper.
./src/index.ts
import { addMiddleware } from "@trigger.dev/hono"
import { triggerClient } from "./utils/triggerClient"

addMiddleware(app, (env) => triggerClient(env))
// ...rest of the API routes
Create the background job
If you take a look at the assistants API documentation, you’ll see that the flow for interacting with assistants generally looks like this:
- Create a message.
- Create a thread (The session between the assistant and the user to manage the messages).
- Add the message to the thread
- Create a run (Invoke the function that starts processing the newly added message).
- Poll the run every few seconds to check if it’s finished running.
- If the run has completed, we retrieve the complete list of messages in the thread. The assistant’s reply should be the latest message in the thread.
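The “poll the run every few seconds” step in the list above could be sketched roughly like this. It is a simplified illustration: `waitForRun`, `PollOptions` and `getStatus` are hypothetical names, `getStatus` stands in for whatever call retrieves the run, and `RunStatus` covers only a subset of the statuses a real run can report.

```typescript
// Simplified subset of run statuses; "queued" and "in_progress" mean "keep waiting".
type RunStatus =
  | "queued"
  | "in_progress"
  | "completed"
  | "failed"
  | "cancelled"
  | "expired"

interface PollOptions {
  intervalMs: number // how long to wait between checks
  maxAttempts: number // give up after this many checks
}

// `getStatus` is a hypothetical stand-in for the API call that retrieves a run.
async function waitForRun(
  getStatus: () => Promise<RunStatus>,
  { intervalMs, maxAttempts }: PollOptions
): Promise<RunStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus()
    // Anything other than queued/in_progress is treated as final in this sketch.
    if (status !== "queued" && status !== "in_progress") {
      return status
    }
    // Wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Run did not finish after ${maxAttempts} attempts`)
}
```

The caller still has to check that the returned status is "completed" before reading the thread, since a run can also end as failed, cancelled or expired.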
Trigger.dev provides some really handy integrations built on top of popular services that make it easy to work with some APIs inside your background job. In this section, we’ll be using:
- The trigger.dev OpenAI integration to make API calls to the assistant.
- The trigger.dev key-value store to save the user’s phone number and the thread ID. In production, you might consider swapping this out for a database.
- Zod, a popular typescript schema validation library that will let us define the shape of the payload sent to our background job.
- Install dependencies
bun add @trigger.dev/openai@latest zod
- Create a helper method to send messages to our WhatsApp API
./src/utils/whatsappSendMessage.ts
interface Args {
  whatsappApiUrl: string
  chatId: string
  message: string
}

export const whatsappSendMessage = async ({
  whatsappApiUrl,
  chatId,
  message,
}: Args) => {
  const body = { chatId, message }
  const options = {
    body: JSON.stringify(body),
    method: "POST",
    headers: {
      "content-type": "application/json",
    },
  }
  const response = await fetch(`${whatsappApiUrl}/send-message`, options)
  return JSON.stringify(await response.json())
}
- Add the WhatsApp API url to the environment variables
./.dev.vars
...
WHATSAPP_API_URL=http://localhost:3000
- Add the env variable type
./src/types/AppBindings.ts
export type AppBindings = {
  // ...
  WHATSAPP_API_URL: string
}
Create the background job that generates responses
./src/jobs/assistant.ts
import { z } from "zod" import { OpenAI } from "@trigger.dev/openai" import { eventTrigger, type TriggerClient } from "@trigger.dev/sdk" import { type AppBindings } from "../types/AppBindings" import { whatsappSendMessage } from "../utils/whatsappSendMessage" interface Args { client: TriggerClient env: AppBindings } interface Chat { threadId: string } export const assistantJob = ({ client, env }: Args) => { // Initialize OpenAI client const openai = new OpenAI({ id: "openai", apiKey: env.OPENAI_API_KEY, }) // Define the background job const job = client.defineJob({ id: "assistant_generate_response", name: "Assistant generate response", version: "1.0.0", trigger: eventTrigger({ // The identifier used to trigger this job from the API name: "assistant.response", // Define the schema of the payload schema: z.object({ chatId: z.string(), message: z.string(), }), }), // Add the OpenAI integration to this job integrations: { openai }, run: async (payload, io, ctx) => { const { chatId, message } = payload // Check if the chat exists in key-value store const chatExists = await io.store.job.has("chat-exists", chatId) let threadId = "" if (chatExists) { // Get the OpenAI thread ID associated with the WhatsApp chat ID const chat = await io.store.job.get<Chat>("get-chat", chatId) if (!chat) { throw new Error(`No chat found with ID ${chatId}`) } threadId = chat.threadId } else { // Create a new thread const thread = await io.openai.beta.threads.create("create-thread") // Register the new chat session await io.store.job.set("register-chat", chatId, { threadId: thread.id }) threadId = thread.id } // Add the message to the conversation thread await io.openai.beta.threads.messages.create("create-message", threadId, { role: "user", content: message, }) // Invoke the assistant to generate a response and wait for it to complete const run = await io.openai.beta.threads.runs.createAndWaitForCompletion( "create-run", threadId, { assistant_id: env.OPENAI_ASSISTANT_ID } ) // Make sure the 
assistant has finished generating the response if (run?.status !== "completed") { throw new Error( `Run finished with status ${run?.status}: ${JSON.stringify( run?.last_error )}` ) } // List the most recent message in the thread const messages = await io.openai.beta.threads.messages.list( "list-messages", run.thread_id, { query: { limit: "1" } } ) // Retrieve the latest assistant message const content = messages[0].content[0] // Verify the message contains text and not an image if (content.type === "image_file") { throw new Error( "The OpenAI response was an image but we expected text." ) } // Send the assistant's response to the WhatsApp API so it can be forwarded to the user const responseMessage = content.text.value await whatsappSendMessage({ whatsappApiUrl: env.WHATSAPP_API_URL, chatId, message: responseMessage, }) return { message: responseMessage } }, }) return job }
🖐️ Let’s take a look at what’s going on in this job.
If the user is messaging our assistant for the first time, we’ll create a new message thread. Otherwise, we’ll append the message to an existing thread. We use the trigger.dev key-value store to save the chatId-threadId pairs so we can track the WhatsApp phone number that corresponds to a specific OpenAI thread ID.
Normally, we would have to create a function that polls the OpenAI API every few seconds to check if the assistant has finished generating a response. The trigger.dev OpenAI integration exposes a handy helper function called createAndWaitForCompletion that handles everything for us.
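The “new thread for a first-time user, existing thread otherwise” logic can also be looked at on its own, outside the job. The sketch below uses an in-memory map in place of the trigger.dev key-value store; `KeyValueStore`, `InMemoryStore`, `getOrCreateThreadId` and `createThread` are hypothetical names, and `createThread` stands in for the call that creates an OpenAI thread.

```typescript
// Hypothetical stand-in for a key-value store (or a database table in production).
interface KeyValueStore {
  get(key: string): Promise<string | undefined>
  set(key: string, value: string): Promise<void>
}

// In-memory implementation, only suitable for local experiments.
class InMemoryStore implements KeyValueStore {
  private data = new Map<string, string>()

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key)
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value)
  }
}

// Returns the thread ID for a chat, creating and saving one on first contact.
async function getOrCreateThreadId(
  store: KeyValueStore,
  chatId: string,
  createThread: () => Promise<string>
): Promise<string> {
  const existing = await store.get(chatId)
  if (existing !== undefined) {
    return existing // returning user: reuse the saved thread
  }
  const threadId = await createThread() // first message: start a new thread
  await store.set(chatId, threadId)
  return threadId
}
```

Hiding the store behind a small interface like this is one way to swap the key-value store for a database later without touching the rest of the job.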
- Register the new job in the trigger client
./src/utils/triggerClient.ts
import { TriggerClient } from "@trigger.dev/sdk"
import { AppBindings } from "../types/AppBindings"
import { assistantJob } from "../jobs/assistant"

export const triggerClient = (env: AppBindings) => {
  const client = new TriggerClient({
    id: "whatsapp-assistant",
    apiKey: env.TRIGGER_API_KEY,
    apiUrl: env.TRIGGER_API_URL,
  })

  // Register jobs
  assistantJob({ client, env })

  return client
}
- Create a new API route that will receive incoming WhatsApp messages
./src/index.ts
// ...
app.post("/wa-message-received", async (c) => {
  const { message } = await c.req.json()

  // Trigger the job with the message payload
  const event = await triggerClient(c.env).sendEvent({
    name: "assistant.response",
    payload: { chatId: message.from, message: message.body },
  })

  return c.json({ event })
})
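The route above reads `message.from` and `message.body` from the request body, so the WhatsApp server has to POST a payload of that shape whenever a message arrives. The index.js shown earlier only logs incoming messages, so here is one way the forwarding step might look. It is a sketch under assumptions: `forwardMessage`, `IncomingMessage` and `FetchLike` are hypothetical names, and `apiUrl` is assumed to be the base URL of the Hono API (for example http://localhost:8787 during development).

```typescript
// The two fields the /wa-message-received route reads from the payload.
interface IncomingMessage {
  from: string
  body: string
}

// Minimal fetch-like signature so the function can be tried without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>

// Forwards one incoming WhatsApp message to the Hono API.
async function forwardMessage(
  apiUrl: string, // assumption: base URL of the Hono API
  message: IncomingMessage,
  fetchImpl: FetchLike
): Promise<void> {
  const response = await fetchImpl(`${apiUrl}/wa-message-received`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    // Send only the fields the route needs, wrapped in a `message` key.
    body: JSON.stringify({
      message: { from: message.from, body: message.body },
    }),
  })
  if (!response.ok) {
    throw new Error(`Hono API responded with status ${response.status}`)
  }
}
```

In the WhatsApp server this would be called from the `client.on("message", ...)` handler, passing the global `fetch` available in recent Node.js versions, or rewritten around the axios dependency installed earlier.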
Test the WhatsApp assistant
- Start the WhatsApp API server.
node index.js
- We need to make our WhatsApp API accessible on the internet so the trigger.dev cloud service can connect to it. We can do that by running ngrok in a separate terminal.
ngrok http 3000
- Update the WhatsApp API URL. We’ll replace it with the https url from ngrok that points to our API.
./.dev.vars
WHATSAPP_API_URL=https://my_ngrok_domain.ngrok-free.app
- Start the Hono API
bun dev
- In a new terminal window, start the trigger.dev tunnel to connect the trigger.dev cloud service.
bunx @trigger.dev/cli@latest dev --client-id whatsapp-assistant -p 8787 -H localhost
Log in to the trigger.dev dashboard and verify that our job has been synced.
Send a message to the assistant’s phone number. You should receive a response. 🎉
✨ Extras
- Once you have deployed the API to production, you can use an SMS verification service like sms pool to assign a permanent phone number to your assistant. Sign in with the virtual number provided on your phone and scan the QR code printed in your server logs.
- Check out the full repositories on GitHub to learn how to implement a “typing…” indicator while the assistant generates the response.
- Create a WhatsApp profile for your assistant