DEV Community

wellallyTech

Smart Pillbox 2.0: How I Built a Multi-Prescription Identifier using YOLOv10 and GPT-4o 💊

Managing multiple prescriptions isn't just a chore; it's a safety risk. Every year, thousands of people suffer adverse effects from accidental drug-drug interactions. What if your phone could "see" your pills and warn you before you take them?

In this tutorial, we are building Smart Pillbox 2.0, a multimodal AI application. We will use YOLOv10 object detection to localize pills in real time and GPT-4o vision for fine-grained identification and contraindication analysis. Combined with a PostgreSQL backend that holds verified drug data, the result is a tool that turns pixels into health insights. 🚀

The Architecture 🏗️

The system follows a "Detect then Reason" pipeline. YOLOv10 handles the high-speed detection of individual pills, while GPT-4o acts as the "brain" to identify specific markings and cross-reference them with medical knowledge.

graph TD
    A[Mobile Camera/OpenCV] -->|Capture Image| B(Image Preprocessing)
    B --> C{YOLOv10 Detection}
    C -->|Bounding Boxes| D[Crop Individual Pills]
    D --> E[GPT-4o Vision API]
    E -->|Identify Drug Names| F[PostgreSQL DB Search]
    F --> G{Contraindication Logic}
    G -->|Conflict Found| H[⚠️ Warning Alert]
    G -->|Safe| I[✅ Dosage Instructions]
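As a rough sketch of the "Detect then Reason" flow, the stages can be wired together as plain callables. The names `run_pipeline`, `detect`, `identify`, and `check_conflicts` are placeholders made up for illustration, not part of any library; the real stages are built in the steps that follow.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    image: Any,
    detect: Callable[[Any], List[Any]],
    identify: Callable[[List[Any]], List[str]],
    check_conflicts: Callable[[List[str]], List[str]],
) -> Dict[str, Any]:
    """Detect pills, identify them, then look for conflicts (hypothetical wiring)."""
    crops = detect(image)              # detection stage: one crop per pill
    if not crops:
        return {"pills": [], "warnings": [], "status": "no_pills_detected"}
    names = identify(crops)            # vision stage: crops -> drug names
    warnings = check_conflicts(names)  # database stage: names -> conflicts
    status = "warning" if warnings else "safe"
    return {"pills": names, "warnings": warnings, "status": status}
```

Keeping each stage behind a callable makes it easy to swap in stubs while testing, before any model or database is connected.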

Prerequisites 🛠️

To follow along, you’ll need:

  • YOLOv10: A real-time object detector designed to run without non-maximum suppression (NMS).
  • OpenAI SDK: For access to GPT-4o.
  • PostgreSQL: To store our pharmaceutical database.
  • OpenCV: For image manipulation.
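Before going further, it may help to confirm the Python side of these prerequisites is importable. The snippet below is a small sketch using only the standard library; `missing_modules` is a helper name invented here, and the list holds import names (for example `cv2` for OpenCV), which differ from some pip package names. It does not check that a PostgreSQL server is running.

```python
from importlib.util import find_spec

# Import names (not pip package names) for the Python prerequisites.
REQUIRED_MODULES = ["ultralytics", "openai", "psycopg2", "cv2"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the entries of `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```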

Step 1: Real-time Pill Detection with YOLOv10

We use YOLOv10 because it's incredibly efficient for edge devices. It allows us to isolate each pill from a cluttered background (like a countertop or a palm).

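import cv2
# The YOLOv10 class ships with the THU-MIG/yolov10 fork of ultralytics.
# With the official ultralytics package, use `from ultralytics import YOLO` instead.
from ultralytics import YOLOv10

# Load our pre-trained pill detection model
model = YOLOv10('yolov10n_pills.pt')

def get_pill_crops(image_path, pad=5):
    image = cv2.imread(image_path)
    if image is None:
        # cv2.imread returns None instead of raising when the file is unreadable
        raise FileNotFoundError(f'Could not read image: {image_path}')
    height, width = image.shape[:2]
    results = model(image_path)[0]
    crops = []

    for i, box in enumerate(results.boxes.xyxy.tolist()):
        x1, y1, x2, y2 = map(int, box)
        # Crop the detected pill with a small buffer, clamped to the image
        # bounds (a negative start index would wrap around and break the crop)
        crop = image[max(0, y1 - pad):min(height, y2 + pad),
                     max(0, x1 - pad):min(width, x2 + pad)]
        crops.append(crop)
        cv2.imwrite(f'pill_{i}.jpg', crop)

    return crops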
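On a cluttered background the detector may also fire on crumbs or reflections, so it can help to drop low-confidence boxes before cropping. The sketch below shows one way to do that in plain Python. `filter_detections` and the `0.5` threshold are illustrative assumptions, not values from this project, and the function expects boxes and scores already pulled out of the result (for example via `results.boxes.xyxy.tolist()` and `results.boxes.conf.tolist()`).

```python
def filter_detections(boxes, scores, min_conf=0.5):
    """Keep only boxes whose confidence score is at least `min_conf`.

    `boxes` is a list of (x1, y1, x2, y2) sequences and `scores` the matching
    list of confidences; coordinates are truncated to integers for slicing.
    """
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    return [
        tuple(int(v) for v in box)
        for box, score in zip(boxes, scores)
        if score >= min_conf
    ]
```

The right threshold depends on the model and the lighting, so treat it as something to tune against real photos.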

Step 2: Identification & Contraindication Check via GPT-4o

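Once we have the crops, we send them to GPT-4o. Unlike a fixed-class classifier, GPT-4o can often read the tiny imprints (like "L484" on an acetaminophen tablet) and reason about the medication in context. It can also misread them, so treat its answer as a candidate identification rather than a final one.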

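import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(image_path):
    # Read the image file and return its contents as a base64 string
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def analyze_medications(image_paths):
    # Convert images to base64 for the multimodal prompt
    # In a real app, you'd batch these to optimize tokens
    encoded_images = [encode_image(path) for path in image_paths]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Identify these pills and check for contraindications. Return JSON: {pills: [], warnings: []}"},
                    *[
                        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img}"}}
                        for img in encoded_images
                    ]
                ],
            }
        ],
        response_format={"type": "json_object"}
    )
    # The content is a JSON string; parse it with json.loads before use
    return response.choices[0].message.content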
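The function above returns the model's reply as a raw string, so the next step is usually to parse it and confirm it has the shape the prompt asked for. Here is a minimal sketch of that check; `parse_analysis` is a name introduced for this example, and it only validates the two keys named in the prompt (`pills` and `warnings`).

```python
import json


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and check it has the requested shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    pills = data.get("pills", [])
    warnings = data.get("warnings", [])
    if not isinstance(pills, list) or not isinstance(warnings, list):
        raise ValueError("'pills' and 'warnings' must both be lists")
    return {"pills": pills, "warnings": warnings}
```

Failing loudly on an unexpected shape is a deliberate choice here: in a medication app, silently accepting a malformed reply seems riskier than asking the user to rescan.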

Step 3: Cross-Referencing the "Source of Truth"

While GPT-4o is brilliant, medical applications require a "human-in-the-loop" or a verified database. We use PostgreSQL to fetch official dosage guidelines based on the identified drug names.
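One way to put this into practice is to trust a model-identified name only when it matches an entry in the verified database, and to route everything else to a person. The sketch below assumes the known drug names have already been loaded from PostgreSQL into a list; `split_by_verification` is a hypothetical helper, and the matching is a simple case-insensitive comparison rather than anything clinically robust.

```python
def split_by_verification(identified, known_drugs):
    """Separate model-identified names into verified and needs-review lists.

    `known_drugs` stands in for the names loaded from the verified database.
    """
    known = {name.strip().lower() for name in known_drugs}
    verified, needs_review = [], []
    for name in identified:
        target = verified if name.strip().lower() in known else needs_review
        target.append(name)
    return verified, needs_review
```

Anything in the second list is a candidate for the "human-in-the-loop" path, for example a prompt asking the user to confirm the label on the bottle.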

For those looking to scale this into a production-grade healthcare platform, you'll need advanced patterns for data synchronization and HIPAA-compliant storage. I highly recommend checking out the technical deep-dives at WellAlly Blog for more production-ready examples of AI integration in health-tech. 🥑

Step 4: Putting it All Together 🧩

The final step looks up the identified drugs (by their active_ingredients) in PostgreSQL and checks every pair against a conflict matrix.

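from itertools import combinations
import psycopg2

def check_db_conflicts(pill_list):
    # Look up every unordered pair (e.g. Aspirin + Warfarin) in either column order
    query = ("SELECT conflict_severity FROM drug_interactions "
             "WHERE (drug_a = %s AND drug_b = %s) OR (drug_a = %s AND drug_b = %s)")
    interaction_report = []
    conn = psycopg2.connect("dbname=pharmacy user=admin")
    try:
        with conn.cursor() as cur:
            for a, b in combinations(sorted(set(pill_list)), 2):
                cur.execute(query, (a, b, b, a))
                for (severity,) in cur.fetchall():
                    interaction_report.append({"drugs": (a, b), "severity": severity})
    finally:
        conn.close()  # always release the connection, even on errors
    return interaction_report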
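To make the "conflict matrix" idea concrete without a database, here is a small in-memory sketch. Keying the table by `frozenset` means the order of the two drugs does not matter. The `CONFLICTS` table, its single entry, and its `"high"` label are illustrative placeholders rather than real clinical data; a real system would load these rows from `drug_interactions`.

```python
from itertools import combinations

# Illustrative data only: a real system would load this from drug_interactions.
CONFLICTS = {frozenset({"aspirin", "warfarin"}): "high"}


def find_conflicts(pill_list, conflicts=CONFLICTS):
    """Return (drug_a, drug_b, severity) for every conflicting pair."""
    names = sorted({p.strip().lower() for p in pill_list})
    found = []
    for a, b in combinations(names, 2):
        severity = conflicts.get(frozenset({a, b}))
        if severity is not None:
            found.append((a, b, severity))
    return found
```

Checking all pairs is quadratic in the number of pills, which is fine for the handful of medications a single scan is likely to contain.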

Conclusion & Future Work 🌟

The combination of YOLOv10's speed and GPT-4o's reasoning capabilities makes the "Smart Pillbox 2.0" a reality. We've moved from simple pixel detection toward medication-aware reasoning, checked against a verified database.

What's next?

  1. Edge Deployment: Running the YOLOv10 model on-device using ONNX.
  2. History Tracking: Saving user scan history in PostgreSQL to monitor adherence.
  3. Voice Feedback: Integrating TTS (Text-to-Speech) for visually impaired users.

If you’re interested in exploring more advanced AI patterns or how to deploy these models at scale, don't forget to visit the WellAlly Blog—it’s a goldmine for developers building the future of health-tech.

Have you tried building with YOLOv10 yet? Let me know in the comments! 👇
