DEV Community

Surjeet Singh

Building an AI-Powered Medicine Strip Analyzer using YOLO, EasyOCR, and the Gemini API

Built with Google Gemini: Writing Challenge

Have you ever struggled to read the tiny, faded text on a medicine strip? Or maybe you wanted to quickly know the side effects of a pill but found the medical jargon too complex?

For the Build with Google Gemini API Challenge, I decided to solve this real-world problem. I built an AI pipeline that not only reads the text from a medicine strip but also understands it and provides a clean, easy-to-read summary of the medicine's uses, ingredients, and warnings.

🛠️ The Tech Stack
YOLO (You Only Look Once): For detecting and cropping the exact location of the text on the medicine strip.

EasyOCR: For extracting the raw text from the cropped image.

Google Gemini API (gemini-2.5-flash): The brain of the operation. It takes the messy OCR output and structures it into meaningful medical information.

Python: The glue holding it all together.

🚀 How It Works
Step 1: Text Extraction (Vision)
First, my system uses a camera to capture the medicine strip. YOLO identifies the text regions, and EasyOCR extracts the raw characters.
Example OCR Output: "Paracetamol Tablets IP 500mg Dolo 500 Micro Labs"
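
A minimal sketch of this step, assuming the `ultralytics` YOLO package, OpenCV, and a custom-trained weights file. The post does not say which YOLO version or weights were used, so `strip_text.pt` is a hypothetical file name, and `join_ocr_fragments`, `extract_strip_text`, and the `0.3` confidence cutoff are illustrative choices rather than the project's actual code.

```python
from typing import List, Sequence, Tuple

# EasyOCR's readtext() yields (bounding_box, text, confidence) tuples.
OcrResult = Tuple[Sequence, str, float]


def join_ocr_fragments(results: List[OcrResult], min_conf: float = 0.3) -> str:
    """Keep fragments at or above min_conf and join them into one string."""
    kept = [text.strip() for _, text, conf in results if conf >= min_conf]
    return " ".join(t for t in kept if t)


def extract_strip_text(image_path: str, weights_path: str = "strip_text.pt") -> str:
    """Detect text regions with YOLO, then run OCR on each cropped region."""
    # Imported lazily so join_ocr_fragments works without the heavy dependencies.
    import cv2
    import easyocr
    from ultralytics import YOLO

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    detector = YOLO(weights_path)    # hypothetical custom-trained weights
    reader = easyocr.Reader(["en"])  # English-only OCR model

    fragments: List[OcrResult] = []
    for box in detector(image)[0].boxes.xyxy.tolist():
        x1, y1, x2, y2 = (int(v) for v in box)
        crop = image[y1:y2, x1:x2]
        if crop.size:                # skip degenerate boxes
            fragments.extend(reader.readtext(crop))
    return join_ocr_fragments(fragments)
```

Dropping low-confidence fragments before joining keeps stray glyphs from foil glare or embossing out of the text that is later sent to Gemini.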

Step 2: AI Comprehension (Gemini API)
Raw OCR text is often unstructured and hard for a normal user to understand. This is where the new google-genai SDK shines. I pass this raw text to the Gemini 2.5 Flash model with a specific prompt to act as a medical assistant.

Here is the core code that powers this integration:

```python
from google import genai

# Initialize the client
client = genai.Client(api_key="YOUR_API_KEY")

# Raw text produced by the YOLO + EasyOCR step
ocr_extracted_text = "Paracetamol Tablets IP 500mg Dolo 500 Micro Labs"

prompt = f"""
I have extracted the following text from a medicine strip using an OCR pipeline:
'{ocr_extracted_text}'

Please act as a helpful AI assistant and extract the following details from this text:

  1. Medicine Name
  2. Active Ingredients & Composition
  3. Common Uses
  4. General Warnings or Side Effects

Please keep the response well-formatted, brief, and include a standard medical disclaimer.
"""

# Send the prompt to Gemini and print the generated summary
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
)

print(response.text)
```
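
One way to make this step reusable is to wrap the prompt and the request in functions. The sketch below relies only on the `client.models.generate_content` call shown above. `build_prompt` and `summarize_medicine` are hypothetical names, and the empty-input check is an added assumption about how a failed scan should be handled.

```python
def build_prompt(ocr_text: str) -> str:
    """Build the extraction prompt for one OCR result."""
    return (
        "I have extracted the following text from a medicine strip "
        "using an OCR pipeline:\n"
        f"'{ocr_text}'\n\n"
        "Please act as a helpful AI assistant and extract the following "
        "details from this text:\n\n"
        "  1. Medicine Name\n"
        "  2. Active Ingredients & Composition\n"
        "  3. Common Uses\n"
        "  4. General Warnings or Side Effects\n\n"
        "Please keep the response well-formatted, brief, and include a "
        "standard medical disclaimer.\n"
    )


def summarize_medicine(client, ocr_text: str, model: str = "gemini-2.5-flash") -> str:
    """Send cleaned OCR text to Gemini and return the reply text."""
    cleaned = " ".join(ocr_text.split())  # collapse stray whitespace from OCR
    if not cleaned:
        raise ValueError("OCR returned no text; rescan the strip.")
    response = client.models.generate_content(
        model=model,
        contents=build_prompt(cleaned),
    )
    return response.text


# Example usage (needs the google-genai package and a real API key):
# from google import genai
# client = genai.Client(api_key="YOUR_API_KEY")
# print(summarize_medicine(client, "Paracetamol Tablets IP 500mg Dolo 500 Micro Labs"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub object, so the prompt logic can be checked without spending API quota.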

Step 3: The Output
The Gemini model processes the request quickly and returns a cleanly formatted summary.

(Here is the actual output from my terminal):

  1. Medicine Name: Dolo 500 (Brand Name), Paracetamol Tablets IP (Generic Name & Form)
  2. Active Ingredients & Composition: Paracetamol 500mg (conforming to Indian Pharmacopoeia standards)
  3. Common Uses (Inferred): Pain relief (e.g., headache, muscle aches, toothache) and fever reduction.
  4. General Warnings or Side Effects (Inferred):

Warnings: Do not exceed the recommended dose. Overdose can cause liver damage...

Common Side Effects: Nausea, stomach upset, allergic reactions...

(Disclaimer: This is for informational purposes and uses AI inference, not a substitute for professional medical advice.)
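
If an app needs to show each field separately, the numbered reply can be split into sections. This is a rough sketch that assumes the model keeps the `N. Label: value` shape seen above, which free-form output does not guarantee. `parse_summary` is a hypothetical helper, not part of the original project.

```python
import re
from typing import Dict

# Matches lines such as "  1. Medicine Name: Dolo 500".
_SECTION = re.compile(r"^\s*(\d+)\.\s*([^:]+):\s*(.*)$")


def parse_summary(reply: str) -> Dict[str, str]:
    """Split a numbered reply into {label: text}.

    Non-empty lines that do not start a new numbered section are
    appended to the section that precedes them.
    """
    sections: Dict[str, str] = {}
    current = None
    for line in reply.splitlines():
        match = _SECTION.match(line)
        if match:
            current = match.group(2).strip()
            sections[current] = match.group(3).strip()
        elif current and line.strip():
            sections[current] = (sections[current] + " " + line.strip()).strip()
    return sections
```

Note that any trailing text, such as the disclaimer, ends up attached to the last numbered section. Asking the model for a stricter output format would be more robust than parsing prose like this.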

💡 Why Gemini 2.5 Flash?
I chose the gemini-2.5-flash model because it is incredibly fast, which is crucial for a real-time scanning application. The updated google-genai SDK was also very straightforward to implement.
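
The speed claim is easy to check on your own network. Here is a minimal sketch using only the standard library. `timed` is a hypothetical helper that wraps any callable, so the Gemini request can be passed in as a lambda.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Example usage (needs a configured google-genai client and a prompt):
# response, seconds = timed(lambda: client.models.generate_content(
#     model="gemini-2.5-flash", contents=prompt))
# print(f"Gemini replied in {seconds:.2f}s")
```

A single measurement includes network latency, so averaging several runs gives a fairer picture of whether the round trip is fast enough for live scanning.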

🔮 What's Next?
In the future, I plan to integrate this pipeline into a mobile app or a Raspberry Pi setup using Node-RED for automated sorting!

Thanks for reading, and happy coding!
