DEV Community

wellallyTech


Never Miss a Pill: Building an AI Vision Agent for Automated Medication Reminders via WhatsApp 💊

We’ve all been there: staring at a box of medicine with tiny, illegible font, trying to remember if it was "two pills twice a day" or "one pill every eight hours." For the elderly or those with complex prescriptions, this isn't just a nuisance; it’s a health risk.

In this tutorial, we are building an Automated Medication Butler. Using a Vision Agent powered by GPT-4o-mini, we will turn a simple photo of a pillbox into an automated reminder system. We will use Node.js for the backend logic, the Twilio API for WhatsApp integration, and Redis to store the reminder schedule. Along the way, the project shows how an AI agent can combine image-based OCR with scheduled messaging to tackle a real-world healthcare problem: taking medication on time.

The Architecture 🏗️

Before we dive into the code, let's visualize how the data flows from a smartphone camera to a timely WhatsApp notification.

graph TD
    A[User takes photo of Pillbox] --> B[Node.js Backend]
    B --> C{GPT-4o-mini Vision Agent}
    C -->|Extracts| D[JSON: Dosage & Frequency]
    D --> E[Redis Task Queue]
    E --> F[Cron Job / Scheduler]
    F -->|Trigger| G[Twilio API]
    G --> H[WhatsApp Reminder Sent to User]
    H --> I{User Confirms Intake?}
    I -->|No Response| J[Escalate to Caregiver]

Prerequisites 🛠️

To follow along, you’ll need:

  • Node.js (v18+) installed.
  • An OpenAI API Key (for GPT-4o-mini).
  • A Twilio Account with the WhatsApp Sandbox activated.
  • Redis (Local or cloud-hosted like Upstash).

Step 1: Parsing the Pillbox with Vision AI 👁️

First, we need to convert the image into structured data. GPT-4o-mini is a good fit for this: it accepts images directly and is cheap enough to call on every photo, which keeps high-volume OCR affordable.

const OpenAI = require('openai');
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Send the pillbox photo to GPT-4o-mini and get back structured medication data
async function parseMedicationImage(imageBuffer) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Identify the medication name, dosage, and frequency. Return ONLY a JSON object with keys: name, dosage, times_per_day, and specific_hours (e.g. [8, 20])." },
          {
            type: "image_url",
            // Inline the image as a base64 data URL (assumes the photo is a JPEG)
            image_url: { url: `data:image/jpeg;base64,${imageBuffer.toString('base64')}` },
          },
        ],
      },
    ],
    // JSON mode makes the reply parseable JSON; it does not enforce these exact keys
    response_format: { type: "json_object" },
  });

  return JSON.parse(response.choices[0].message.content);
}
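JSON mode makes the model's reply parseable. It does not guarantee that every key is present or sensible, and a misread label should never turn silently into a schedule. Below is a minimal sketch of a validation step. It assumes the key names requested in the prompt above. `validateMedication` is a hypothetical helper and is not part of the OpenAI SDK.

```javascript
// Hypothetical helper: sanity-check the object returned by parseMedicationImage()
// before anything is scheduled. Throws an Error describing the first problem found.
function validateMedication(data) {
  if (typeof data !== 'object' || data === null) {
    throw new Error('Expected a JSON object');
  }
  const { name, dosage, times_per_day, specific_hours } = data;

  if (typeof name !== 'string' || name.trim() === '') {
    throw new Error('Missing medication name');
  }
  if (!Number.isInteger(times_per_day) || times_per_day < 1) {
    throw new Error('times_per_day must be a positive integer');
  }
  const validHours = Array.isArray(specific_hours) &&
    specific_hours.every((h) => Number.isInteger(h) && h >= 0 && h <= 23);
  if (!validHours) {
    throw new Error('specific_hours must be whole hours from 0 to 23');
  }
  // Require exactly times_per_day hours, with no duplicates
  if (specific_hours.length !== times_per_day || new Set(specific_hours).size !== times_per_day) {
    throw new Error('specific_hours must list times_per_day distinct hours');
  }

  // Return a normalized copy with hours in ascending order
  return {
    name: name.trim(),
    dosage: dosage == null ? '' : String(dosage).trim(),
    times_per_day,
    specific_hours: [...specific_hours].sort((a, b) => a - b),
  };
}
```

In the flow above, you would call it as `validateMedication(await parseMedicationImage(photoBuffer))`. When it throws, ask the user to retake the photo instead of scheduling anything.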

Step 2: Scheduling Reminders with Redis ⏰

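Once we have the specific_hours, we need to schedule the tasks. The schedule lives in Redis rather than in the Node.js process's memory. As a result, reminders survive a restart of our server, and with Redis persistence (RDB snapshots or AOF) enabled, they also survive a restart of Redis itself.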

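const Redis = require('ioredis');
const redis = new Redis(); // connects to localhost:6379 by default

async function scheduleReminders(userId, medData) {
  const { name, specific_hours } = medData;

  for (const hour of specific_hours) {
    // Zero-pad the hour ("8" -> "08") so keys match the scheduler's HH:mm lookup
    const hh = String(hour).padStart(2, '0');
    // Key format: reminder:HH:mm:userId:medName
    // (including the name stops two medications at the same hour from overwriting each other)
    const taskKey = `reminder:${hh}:00:${userId}:${name}`;
    await redis.set(taskKey, JSON.stringify({
      medName: name,
      message: `Time to take your ${name}! 💊`
    }));
    console.log(`✅ Scheduled ${name} for ${hh}:00`);
  }
}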
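The prompt asks for specific_hours, but a box that only says "twice daily" gives the model no clock times to extract. The sketch below shows one possible fallback that derives hours from times_per_day. It assumes doses should be spread evenly across an 08:00 to 20:00 waking window. `spreadDoses` and its default window are hypothetical choices, and explicit times on a prescription should always take precedence.

```javascript
// Hypothetical fallback: spread `timesPerDay` doses evenly between firstHour
// and lastHour (defaults assume an 08:00-20:00 waking window).
function spreadDoses(timesPerDay, firstHour = 8, lastHour = 20) {
  if (!Number.isInteger(timesPerDay) || timesPerDay < 1) {
    throw new Error('timesPerDay must be a positive integer');
  }
  if (timesPerDay > lastHour - firstHour + 1) {
    throw new Error('Window is too narrow for one dose per distinct hour');
  }
  if (timesPerDay === 1) return [firstHour];

  // Evenly spaced points from firstHour to lastHour, rounded to whole hours
  const step = (lastHour - firstHour) / (timesPerDay - 1);
  return Array.from({ length: timesPerDay }, (_, i) => Math.round(firstHour + i * step));
}
```

For example, the result could be passed to `scheduleReminders` as `specific_hours` whenever the model returned an empty array.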

Step 3: Sending the WhatsApp Hook 📲

We use the Twilio API to push the notification. The reminder arrives in a WhatsApp chat where the user can reply, so the system behaves like an interactive agent rather than a passive app.

const twilio = require('twilio')(process.env.TWILIO_SID, process.env.TWILIO_AUTH_TOKEN);

async function sendWhatsAppMessage(to, body) {
  try {
    await twilio.messages.create({
      from: 'whatsapp:+14155238886', // Twilio Sandbox Number
      to: `whatsapp:${to}`,
      body: body
    });
    console.log("🚀 Reminder sent successfully!");
  } catch (error) {
    console.error("❌ Failed to send WhatsApp", error);
  }
}
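`sendWhatsAppMessage` prefixes whatever it receives with `whatsapp:`. This means `to` must already be in E.164 form: a `+`, the country code, and at most 15 digits in total. A hypothetical `normalizeE164` helper, sketched below, can clean up user-entered numbers first. It assumes the number was typed with a leading `+` and country code, and it does not check that the number actually exists.

```javascript
// Hypothetical helper: clean up a user-entered number such as "+1 (555) 010-4477"
// into E.164 form ("+15550104477") before it is passed to sendWhatsAppMessage().
// Assumes the number is written with a leading "+" and its country code.
function normalizeE164(rawNumber) {
  const trimmed = String(rawNumber).trim();
  if (!trimmed.startsWith('+')) {
    throw new Error(`Expected a leading "+" and country code: ${rawNumber}`);
  }
  const digits = trimmed.replace(/\D/g, ''); // drop spaces, dashes, parentheses and "+"
  // E.164 numbers have at most 15 digits; 8 is only a rough lower bound
  if (digits.length < 8 || digits.length > 15) {
    throw new Error(`Unexpected number of digits: ${rawNumber}`);
  }
  return `+${digits}`;
}
```

Usage: `await sendWhatsAppMessage(normalizeE164(user.phone), body)`, where `user.phone` stands in for however you store the number.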

Taking it Further: The "Official" Way 🥑

While this script works for a weekend project, building a production-ready healthcare agent requires handling edge cases like timezone shifts, medication interactions, and HIPAA-compliant data storage.
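Timezones are the easiest of these to trip over. The Step 4 loop reads the server's local clock with `getHours()`, so on a server running in UTC, an "08:00" reminder goes out at 08:00 UTC regardless of where the user lives. The sketch below formats the current time in the user's zone with the built-in `Intl.DateTimeFormat`. It assumes each user's IANA time zone (for example `"America/New_York"`) is stored alongside their reminders. `localTimeInZone` is a hypothetical helper name.

```javascript
// Hypothetical helper: format `date` as "HH:mm" in the given IANA time zone,
// so a reminder stored for 08:00 fires at 08:00 where the user actually is.
function localTimeInZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone,
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23', // 00-23, never "24:00"
  }).formatToParts(date);

  // Pick the hour and minute fields and make sure both are two digits
  const pick = (type) => parts.find((p) => p.type === type).value.padStart(2, '0');
  return `${pick('hour')}:${pick('minute')}`;
}
```

Daylight-saving changes are handled by the time-zone database, so the same stored "08:00" stays at 08:00 local time all year.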

If you're looking for more production-ready AI patterns, advanced prompt engineering techniques, or enterprise-grade automation strategies, I highly recommend checking out the technical deep dives at the WellAlly Blog. It’s a goldmine for developers looking to scale AI agents beyond simple scripts.

Step 4: Putting it all Together (The Logic Loop)

We can use a simple setInterval or a more robust library like node-cron to check Redis every minute for pending reminders.

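const cron = require('node-cron');

// Uses `redis` (Step 2) and `sendWhatsAppMessage` (Step 3)
// '* * * * *' fires once at the start of every minute
cron.schedule('* * * * *', async () => {
  const now = new Date();
  // Zero-pad so 8:05 becomes "08:05" and matches keys of the form reminder:HH:mm:*
  const hh = String(now.getHours()).padStart(2, '0');
  const mm = String(now.getMinutes()).padStart(2, '0');
  const currentTime = `${hh}:${mm}`;
  console.log(`Checking heartbeat at ${currentTime}...`);

  // KEYS is fine for a demo; switch to SCAN once there are many users
  const keys = await redis.keys(`reminder:${currentTime}:*`);
  for (const key of keys) {
    const userId = key.split(':')[3]; // assumes userId is the recipient's phone number
    const task = JSON.parse(await redis.get(key));
    if (task) await sendWhatsAppMessage(userId, task.message);
  }
});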
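The plain `setInterval` option mentioned above has two problems. Its first tick lands wherever in the minute the process happened to start, and a fixed 60-second interval slowly drifts. The sketch below avoids both by chaining `setTimeout` calls and recomputing the delay to the next whole minute on every tick. `msUntilNextMinute` and `everyMinute` are hypothetical names.

```javascript
// Hypothetical helper: milliseconds from `now` until the next whole minute (hh:mm:00.000).
function msUntilNextMinute(now = new Date()) {
  return 60_000 - (now.getSeconds() * 1000 + now.getMilliseconds());
}

// Run `task` at the start of every minute. Recomputing the delay on each tick
// keeps the loop aligned instead of accumulating drift. Returns a stop function.
function everyMinute(task) {
  let timer;
  const tick = () => {
    task(new Date());
    timer = setTimeout(tick, msUntilNextMinute());
  };
  timer = setTimeout(tick, msUntilNextMinute());
  return () => clearTimeout(timer);
}
```

The body of the cron callback could be passed to `everyMinute` unchanged. The trade-off is one less dependency in exchange for giving up cron's expression syntax.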

Conclusion 🏁

In less than 100 lines of code, we've built a Vision Agent that can read, understand, and act. By combining GPT-4o-mini with Twilio, we've bridged the gap between the physical world (a pillbox) and the digital world (a WhatsApp notification).

What's next?

  1. Feedback Loop: Add a "Confirm" button in WhatsApp to log that the medicine was actually taken.
  2. Safety Check: Cross-reference the medication name with a drug-interaction API.
  3. Multi-user: Expand the Redis schema to handle thousands of users.
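For the feedback loop, the "No Response → Escalate to Caregiver" branch in the architecture diagram needs two decisions. The first is whether a reply counts as a confirmation, and the second is whether enough time has passed without one. The sketch below covers both and rests on three hypothetical choices: a reminder record shaped like `{ sentAt, confirmedAt }`, a fixed list of confirmation words, and a 30-minute grace period.

```javascript
// Hypothetical sketch of the confirmation / escalation decision.
// A reminder record is assumed to look like { sentAt: Date, confirmedAt: Date | null }.
const CONFIRM_WORDS = new Set(['confirm', 'taken', 'done', 'yes']);

// Treat a WhatsApp reply as a confirmation if it is exactly one of the words above.
function isConfirmation(replyBody) {
  return CONFIRM_WORDS.has(String(replyBody).trim().toLowerCase());
}

// Escalate when no confirmation has arrived within `graceMinutes` of sending.
function shouldEscalate(reminder, now, graceMinutes = 30) {
  if (reminder.confirmedAt) return false;
  const elapsedMs = now.getTime() - reminder.sentAt.getTime();
  return elapsedMs >= graceMinutes * 60_000;
}
```

Keeping both functions free of I/O makes the escalation rule easy to test. The Twilio webhook and the scheduler only need to update `confirmedAt` and call `shouldEscalate`.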

If you enjoyed this build, don't forget to heart this post and follow for more "Learning in Public" AI tutorials! 🚀


For more advanced AI Agent architectures, visit wellally.tech/blog.
