From Pixels to Prescriptions: Building an Automated Medication Reminder with YOLOv8 and OCR

#computervision #opencv #python #iot

Forgetfulness is a human trait, but when it comes to medication, a missed dose can be serious. 💊 With computer vision and edge AI, we can now build systems that keep an eye on our health, literally.

In this tutorial, we'll build an automated medication reminder system. Using YOLOv8 for object detection and Tesseract OCR for text extraction, we can monitor a pill box with a camera (for example, one attached to a Raspberry Pi) and trigger an alert if the box hasn't been moved by the scheduled time. Here, a moved box is our proxy for "medication taken." The project combines real-time object detection, IoT messaging, and image processing into one practical health application.

The Architecture 🏗️

The logic flow is straightforward: we capture video frames, detect the pill box, read its label to identify the medication, and use a state machine to decide whether the user has interacted with the box (moved or lifted it) around the scheduled dose time.

graph TD
    A[Camera Feed] --> B{OpenCV Frame Processing}
    B --> C[YOLOv8: Detect Pill Box]
    C --> D[ROI Extraction]
    D --> E[Tesseract OCR: Read Label]
    E --> F{Logic Engine}
    F -- "No movement detected" --> G[MQTT: Trigger Voice Alert]
    F -- "Box moved/Empty" --> H[Log Success]
    G --> I[Mobile Notification]
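The Logic Engine node is where the state machine lives. Below is a minimal sketch of one possible design. It assumes the rest of the pipeline reduces each check to two booleans: whether a dose is currently due, and whether the box has moved. The class and state names (DoseTracker, DoseState) are hypothetical and not part of any library. The key property is that an alert fires once per dose, not on every frame.

```python
from enum import Enum, auto


class DoseState(Enum):
    IDLE = auto()     # waiting for the next scheduled dose
    ALERTED = auto()  # reminder sent, box still untouched
    TAKEN = auto()    # box moved while the dose was due


class DoseTracker:
    """Hypothetical per-dose state machine for the Logic Engine."""

    def __init__(self):
        self.state = DoseState.IDLE

    def update(self, dose_due: bool, box_moved: bool) -> bool:
        """Advance one step; return True only when a new alert should be sent."""
        if self.state is DoseState.TAKEN:
            return False  # dose already handled, stay quiet until reset()
        if box_moved and (dose_due or self.state is DoseState.ALERTED):
            self.state = DoseState.TAKEN
            return False
        if dose_due and self.state is DoseState.IDLE:
            self.state = DoseState.ALERTED
            return True  # first time we notice the overdue dose: alert once
        return False  # nothing new: no repeated nagging

    def reset(self):
        """Call when the next dose window begins."""
        self.state = DoseState.IDLE


tracker = DoseTracker()
for due, moved in [(False, False), (True, False), (True, False), (True, True)]:
    print(tracker.update(due, moved), tracker.state.name)
```

Keeping the state machine free of camera and MQTT code makes it easy to unit-test and to reuse for several medications (one tracker per dose).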

Prerequisites 🛠️

To get started, make sure you have the following in your tech_stack:

Python 3.8+
YOLOv8 (via the ultralytics package)
Tesseract OCR engine (installed on your OS; pytesseract is only a Python wrapper around it)
OpenCV: For image manipulation
MQTT: For lightweight messaging between your camera and the alert system

pip install ultralytics opencv-python pytesseract paho-mqtt
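Because pytesseract only wraps the Tesseract executable, a successful pip install does not guarantee that OCR will work. A small preflight script can confirm each prerequisite before you start debugging the pipeline. This is a sketch: the function name is hypothetical, and the package-to-module mapping (for example, opencv-python is imported as cv2) is the only other assumption.

```python
import importlib.util
import shutil
import sys


def preflight_check():
    """Report which prerequisites appear to be available on this machine."""
    # pip package name -> top-level module name it installs
    modules = {
        "ultralytics": "ultralytics",
        "opencv-python": "cv2",
        "pytesseract": "pytesseract",
        "paho-mqtt": "paho",
    }
    report = {"python>=3.8": sys.version_info >= (3, 8)}
    for pip_name, module_name in modules.items():
        # find_spec locates a top-level module without importing it
        report[pip_name] = importlib.util.find_spec(module_name) is not None
    # pytesseract shells out to this binary, so it must be on PATH
    report["tesseract binary"] = shutil.which("tesseract") is not None
    return report


if __name__ == "__main__":
    for name, ok in preflight_check().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```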

Step 1: Detecting the Pill Box with YOLOv8

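First, we need to locate the pill box in the camera's field of view. The YOLOv8 Nano model (yolov8n) is small enough to run on edge devices like a Raspberry Pi. Frame rates on CPU-only boards are low, but that is fine for a reminder that only needs to check every few seconds.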

from ultralytics import YOLO
import cv2

# Load a pre-trained Nano model (lightweight for Raspberry Pi)
model = YOLO('yolov8n.pt') 

def detect_pill_box(frame):
    # COCO has no "pill box" class; 'bottle' (class 39) is the closest stand-in.
    # For real pill boxes, train a custom model (see "What's next?").
    results = model(frame, conf=0.5, verbose=False)

    for r in results:
        for box in r.boxes:
            if int(box.cls) == 39:  # 39 == 'bottle' in the COCO class list
                x1, y1, x2, y2 = map(int, box.xyxy[0])
                return frame[y1:y2, x1:x2]  # Return the cropped Region of Interest (ROI)
    return None
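detect_pill_box returns the first 'bottle' it encounters and slices the frame directly. A slightly more defensive variant picks the most confident detection of the target class and clamps the coordinates to the frame. It is sketched below as a pure function, so it can be tested without a camera. The (class id, confidence, xyxy) tuple format is an assumption of this sketch, and pick_best_box is a hypothetical name.

```python
def pick_best_box(detections, target_cls, frame_shape):
    """Return clamped (x1, y1, x2, y2) of the most confident target detection, or None.

    detections: iterable of (cls_id, confidence, (x1, y1, x2, y2)) tuples.
    frame_shape: frame.shape, i.e. (height, width[, channels]).
    """
    height, width = frame_shape[:2]
    best = None
    for cls_id, conf, xyxy in detections:
        if int(cls_id) != target_cls:
            continue
        if best is None or conf > best[0]:
            best = (conf, xyxy)
    if best is None:
        return None

    x1, y1, x2, y2 = best[1]
    # Clamp to the frame and convert to int so the box is safe for array slicing
    x1, x2 = max(0, int(x1)), min(width, int(x2))
    y1, y2 = max(0, int(y1)), min(height, int(y2))
    if x2 <= x1 or y2 <= y1:
        return None  # box lies entirely outside the frame
    return x1, y1, x2, y2
```

With Ultralytics results, you could build the input with something like zip(r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.boxes.xyxy.tolist()). Then crop with frame[y1:y2, x1:x2] as before.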

Step 2: Reading the Label with Tesseract OCR

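Once we have the ROI (the cropped image of the pill box), we need to know which medication it is, so we read its label with Tesseract OCR. 🔍 Tesseract works best on large, high-contrast text, so a little preprocessing before extraction goes a long way.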

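import cv2
import pytesseract

def get_medication_name(roi):
    if roi is None or roi.size == 0:
        return ""  # empty crop: nothing to read

    # Pre-processing for better OCR: grayscale, 2x upscale (small crops OCR poorly), Otsu threshold
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

    # Extract text (requires the tesseract binary on PATH)
    text = pytesseract.image_to_string(gray)
    return text.strip()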
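Raw OCR output from a camera crop is rarely clean. Expect misread characters (0 for O, 1 for l) and extra dosage text. If you keep a short list of the medications you actually own, Python's standard difflib can map the noisy string to the closest known name. This is a sketch: match_medication and the 0.6 cutoff are illustrative choices, not tuned values.

```python
import difflib


def match_medication(ocr_text, known_meds, cutoff=0.6):
    """Return the known medication closest to the OCR text, or None if nothing is close."""
    # Labels mix the drug name with dosage and pharmacy text, so score
    # every non-empty line and every individual word as a candidate.
    candidates = []
    for line in ocr_text.lower().splitlines():
        line = line.strip()
        if line:
            candidates.append(line)
            candidates.extend(line.split())

    best_name, best_score = None, 0.0
    for candidate in candidates:
        for name in known_meds:
            score = difflib.SequenceMatcher(None, candidate, name.lower()).ratio()
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= cutoff else None


print(match_medication("Rx 4471\nMETF0RMIN 500mg\n", ["Metformin", "Lisinopril"]))  # prints: Metformin
```

Returning None for weak matches lets the caller fall back to a generic reminder instead of naming the wrong drug.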

Step 3: The Logic Engine and MQTT Alerts

We don't want to nag the user every second. We only want to trigger an alert when a scheduled dose time (say, 9:00 AM) has arrived and the pill box still hasn't moved from its "Resting Zone," the spot where it normally sits.
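"Hasn't moved" needs a concrete definition. One option is to record the box's bounding box once while it sits in its resting zone. Each new detection is then compared against that reference using intersection over union (IoU). This is a minimal sketch under that assumption; the 0.5 threshold and the function names are hypothetical values to tune.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_left_resting_zone(current_box, resting_box, min_iou=0.5):
    """True if the box is missing or overlaps its resting zone less than min_iou."""
    if current_box is None:
        return True  # not detected at all, e.g. lifted out of view
    return iou(current_box, resting_box) < min_iou
```

Treating a missing detection as "moved" is a deliberate choice. Because a single missed frame would count, you may want to require several consecutive misses before marking the dose as taken.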

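import time

import cv2
import paho.mqtt.client as mqtt

# paho-mqtt >= 2.0 requires the callback API version as the first argument
mqtt_client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
# Public test broker: anyone can subscribe to this topic, so use a private broker for real health data
mqtt_client.connect("broker.hivemq.com", 1883, 60)
mqtt_client.loop_start()  # background thread handles network traffic and keepalives

def trigger_alert(med_name):
    message = f"Reminder: It's time to take your {med_name}!"
    mqtt_client.publish("home/medication/reminder", message)
    print(f"🚀 Alert Sent: {message}")

# Main loop (uses detect_pill_box and get_medication_name from Steps 1 and 2)
cap = cv2.VideoCapture(0)  # 0 = first attached camera
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            time.sleep(1)  # camera not ready or frame dropped; retry
            continue

        pill_box_roi = detect_pill_box(frame)
        if pill_box_roi is not None:
            name = get_medication_name(pill_box_roi)
            # Add logic here to check current time vs medication schedule
            # If time matches and movement is zero -> trigger_alert(name)

        time.sleep(10)  # Check every 10 seconds
finally:
    cap.release()
    mqtt_client.loop_stop()
    mqtt_client.disconnect()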
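The placeholder comments in the main loop leave the schedule check open. One way to fill it in is a small lookup function. This sketch assumes each medication has fixed daily dose times and a grace window (30 minutes here, an arbitrary choice). SCHEDULE and due_medications are hypothetical names. datetime.time is imported as dtime so it does not shadow the time module used by the loop.

```python
from datetime import datetime, time as dtime, timedelta

# Hypothetical schedule: medication name -> daily dose times
SCHEDULE = {
    "Metformin": [dtime(9, 0), dtime(21, 0)],
    "Lisinopril": [dtime(9, 0)],
}


def due_medications(now, schedule=SCHEDULE, window=timedelta(minutes=30)):
    """Return medications with a dose time in [dose_time, dose_time + window].

    Simplification: windows that cross midnight are not handled.
    """
    due = []
    for med, dose_times in schedule.items():
        for dose_time in dose_times:
            start = datetime.combine(now.date(), dose_time)
            if start <= now <= start + window:
                due.append(med)
                break  # one match per medication is enough
    return due


print(due_medications(datetime.now()))
```

Inside the loop, you would call due_medications(datetime.now()) and only run the movement check and trigger_alert for the names it returns.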

The "Official" Way to Scale 🥑

While this DIY setup is a great start, building production-ready health-tech requires more robust handling of lighting conditions, multiple medication schedules, and privacy-first data processing.

For advanced patterns on deploying Edge AI models and building highly reliable Vision Systems, I highly recommend checking out the deep-dive articles at WellAlly Tech Blog. They cover production-ready examples of how to integrate multimodal AI with real-world hardware.

Conclusion 🚀

Building a vision-based reminder system is a fantastic "Learning in Public" project. It touches on AI, hardware, and real-world problem-solving. By combining YOLOv8's speed with Tesseract's accessibility, you can create a tool that actually makes a difference.

What's next?

Try training a custom YOLOv8 model on your own pill boxes instead of relying on the generic COCO "bottle" class; it will detect them far more reliably.
Integrate a "Success" sound when the system detects the box being lifted.
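For the "Success" sound, one approach is to treat the box as lifted only after it has been missing for several consecutive checks. That way, a single missed detection does not count. The sketch below assumes the detector runs once per loop iteration. LiftDetector is a hypothetical helper, and the sound playback itself is left to whatever audio tool you use; trigger it when update() returns True.

```python
class LiftDetector:
    """Report a lift only after the box is missing for several consecutive checks."""

    def __init__(self, missing_frames_required=3):
        self.missing_frames_required = missing_frames_required
        self.missing_count = 0
        self.lifted = False

    def update(self, box_detected: bool) -> bool:
        """Feed one detection result; return True exactly once per lift event."""
        if box_detected:
            # Box is back (or never left): reset the counter and re-arm
            self.missing_count = 0
            self.lifted = False
            return False
        self.missing_count += 1
        if not self.lifted and self.missing_count >= self.missing_frames_required:
            self.lifted = True
            return True
        return False


detector = LiftDetector(missing_frames_required=3)
for seen in [True, False, False, False, False, True]:
    if detector.update(seen):
        print("Box lifted: play success sound here")
```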

Let me know in the comments: How are you using AI to improve your daily routine? 👇