Have you ever struggled to read the tiny, faded text on a medicine strip? Or maybe you wanted to quickly know the side effects of a pill but found the medical jargon too complex?
For the Build with Google Gemini API Challenge, I decided to solve this real-world problem. I built an AI pipeline that not only reads the text from a medicine strip but also understands it and provides a clean, easy-to-read summary of the medicine's uses, ingredients, and warnings.
🛠️ The Tech Stack
YOLO (You Only Look Once): For locating the text regions on the medicine strip and cropping them out.
EasyOCR: For extracting the raw text from the cropped image.
Google Gemini API (gemini-2.5-flash): The brain of the operation. It takes the messy OCR output and structures it into meaningful medical information.
Python: The glue holding it all together.
How It Works
Step 1: Text Extraction (Vision)
First, my system uses a camera to capture the medicine strip. YOLO identifies the text regions, and EasyOCR extracts the raw characters.
Example OCR Output: "Paracetamol Tablets IP 500mg Dolo 500 Micro Labs"
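As a rough sketch of this step, the snippet below crops each detected region and runs OCR on it. It assumes the Ultralytics `YOLO` package and a hypothetical custom weights file named `strip_text.pt`; the actual YOLO implementation and weights may differ. `join_ocr_fragments` is an illustrative helper, and its confidence and line-spacing thresholds are guesses that would need tuning on real images.

```python
from typing import List, Sequence, Tuple

# One EasyOCR detection: (four corner points, text, confidence)
Detection = Tuple[Sequence[Sequence[float]], str, float]


def join_ocr_fragments(detections: List[Detection],
                       min_conf: float = 0.4,
                       line_tol: float = 15.0) -> str:
    """Join OCR fragments in reading order, skipping low-confidence ones."""
    # Keep (top-left y, top-left x, text) for fragments above the threshold,
    # sorted top to bottom.
    kept = sorted(
        (box[0][1], box[0][0], text.strip())
        for box, text, conf in detections
        if conf >= min_conf and text.strip()
    )
    rows = []  # each entry: (y of the row's first fragment, [(x, text), ...])
    for y, x, text in kept:
        if rows and abs(y - rows[-1][0]) <= line_tol:
            rows[-1][1].append((x, text))  # same visual line
        else:
            rows.append((y, [(x, text)]))  # start a new line
    # Within each line, order fragments left to right.
    return " ".join(text for _, row in rows for _, text in sorted(row))


def read_strip(image_path: str, weights: str = "strip_text.pt") -> str:
    """Detect text regions with YOLO, then OCR each crop with EasyOCR."""
    # Imported lazily so the helper above works without these packages.
    import easyocr
    from ultralytics import YOLO  # assumed YOLO implementation

    result = YOLO(weights)(image_path)[0]
    reader = easyocr.Reader(["en"])
    parts = []
    for x1, y1, x2, y2 in result.boxes.xyxy.int().tolist():
        crop = result.orig_img[y1:y2, x1:x2]  # BGR NumPy array
        parts.append(join_ocr_fragments(reader.readtext(crop)))
    return " ".join(p for p in parts if p)
```

Calling `read_strip("strip.jpg")` (a placeholder file name) would return a single string similar to the example output above, ready to hand to the next step.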
Step 2: AI Comprehension (Gemini API)
Raw OCR text is often unstructured and hard for a non-specialist to interpret. This is where the new google-genai SDK shines. I pass the raw text to the Gemini 2.5 Flash model with a prompt that asks it to act as a helpful assistant and extract the key medical details.
Here is the core code that powers this integration:
```python
from google import genai

# Initialize the client
client = genai.Client(api_key="YOUR_API_KEY")

# Raw text produced by the YOLO + EasyOCR step
ocr_extracted_text = "Paracetamol Tablets IP 500mg Dolo 500 Micro Labs"

prompt = f"""
I have extracted the following text from a medicine strip using an OCR pipeline:
'{ocr_extracted_text}'
Please act as a helpful AI assistant and extract the following details from this text:
- Medicine Name
- Active Ingredients & Composition
- Common Uses
- General Warnings or Side Effects
Please keep the response well-formatted, brief, and include a standard medical disclaimer.
"""

# Send the prompt to Gemini 2.5 Flash and print the summary
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=prompt
)
print(response.text)
```
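For a scanner that handles many strips, the same call can be wrapped in a reusable function. This is a minimal sketch, not the exact code from my project: `build_prompt` and `summarize_strip` are illustrative names, and reading the key from a `GEMINI_API_KEY` environment variable is an assumption that keeps the key out of the source file.

```python
import os
import re

MODEL_NAME = "gemini-2.5-flash"


def build_prompt(ocr_text: str) -> str:
    """Collapse stray whitespace in the OCR text and embed it in the prompt."""
    cleaned = re.sub(r"\s+", " ", ocr_text).strip()
    if not cleaned:
        raise ValueError("OCR text is empty; nothing to summarize")
    return (
        "I have extracted the following text from a medicine strip "
        "using an OCR pipeline:\n"
        f"'{cleaned}'\n"
        "Please act as a helpful AI assistant and extract the following "
        "details from this text:\n"
        "- Medicine Name\n"
        "- Active Ingredients & Composition\n"
        "- Common Uses\n"
        "- General Warnings or Side Effects\n"
        "Please keep the response well-formatted, brief, and include "
        "a standard medical disclaimer."
    )


def summarize_strip(ocr_text: str) -> str:
    """Send the prompt to Gemini and return the text of the reply."""
    # Imported lazily so build_prompt can be used without the SDK installed.
    from google import genai

    # Assumption: the key is stored in the GEMINI_API_KEY environment variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=MODEL_NAME,
        contents=build_prompt(ocr_text),
    )
    return response.text or ""
```

Rejecting empty OCR text before the request avoids spending an API call on a frame where nothing was read.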
Step 3: The Output
The Gemini model returns a cleanly formatted summary within moments.
(Here is the actual output from my terminal):
- Medicine Name: Dolo 500 (Brand Name), Paracetamol Tablets IP (Generic Name & Form)
- Active Ingredients & Composition: Paracetamol 500mg (conforming to Indian Pharmacopoeia standards)
- Common Uses (Inferred): Pain relief (e.g., headache, muscle aches, toothache) and fever reduction.
- General Warnings or Side Effects (Inferred):
Warnings: Do not exceed the recommended dose. Overdose can cause liver damage...
Common Side Effects: Nausea, stomach upset, allergic reactions...
(Disclaimer: This is for informational purposes and uses AI inference, not a substitute for professional medical advice.)
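The summary above is easy to read but awkward to consume from code, for example if an app wants to show warnings in a separate panel. One option, sketched here under assumptions, is to ask Gemini for JSON and validate it before use. The field names, the `parse_summary` and `request_summary` helpers, and the `GEMINI_API_KEY` variable are illustrative and not part of my original pipeline; `response_mime_type` is the google-genai config option for requesting JSON output.

```python
import json
import os

# Hypothetical field names chosen for this sketch.
FIELDS = ("medicine_name", "composition", "common_uses", "warnings")


def _as_text(value) -> str:
    """Flatten a JSON value (string, list, or null) into display text."""
    if isinstance(value, list):
        return ", ".join(str(v) for v in value)
    return "" if value is None else str(value)


def parse_summary(json_text: str) -> dict:
    """Parse the model's JSON reply and fill any missing field."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {
        field: _as_text(data.get(field)).strip() or "Not found"
        for field in FIELDS
    }


def request_summary(ocr_text: str) -> dict:
    """Ask Gemini for a JSON summary of the OCR text."""
    from google import genai  # lazy import; only needed for the real call

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    prompt = (
        f"Text read from a medicine strip by OCR: '{ocr_text}'. "
        f"Return a JSON object with the keys {', '.join(FIELDS)}."
    )
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
        config={"response_mime_type": "application/json"},
    )
    return parse_summary(response.text or "{}")
```

Because the uses and warnings are still inferred by the model, the disclaimer should be displayed alongside these fields as well.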
💡 Why Gemini 2.5 Flash?
I chose the gemini-2.5-flash model because it responds quickly, which is crucial for a real-time scanning application. The updated google-genai SDK was also straightforward to integrate.
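How fast is fast enough depends on the application, so it is worth measuring on your own setup. Here is a minimal timing sketch using a hypothetical `time_call` helper. `fake_summarize` only stands in for the real Gemini call, and its delay is made up for illustration rather than a measured figure.

```python
import time
from statistics import median


def time_call(fn, *args, repeats: int = 3, **kwargs):
    """Run fn several times and return (last result, median seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, median(durations)


def fake_summarize(ocr_text: str) -> str:
    """Stand-in for the real API call; sleeps briefly to simulate latency."""
    time.sleep(0.01)
    return f"summary of {ocr_text}"
```

Swapping `fake_summarize` for the real request function gives a median latency that can be compared against the frame rate the scanner needs.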
🔮 What's Next?
In the future, I plan to integrate this pipeline into a mobile app or a Raspberry Pi setup using Node-RED for automated sorting!
Thanks for reading, and happy coding!