Let's be honest: navigating healthcare websites to find the right specialist is a nightmare. Between confusing department names and the race to grab a slot before they vanish, it's a high-stress experience. But what if you could just tell an AI, "My stomach has been hurting for three days, find me a top gastroenterologist on Guahao.com for Friday," and let it handle the rest?
In this tutorial, we are building a sophisticated AI Medical Agent using GPT-4o, Playwright, and the revolutionary Browser Use library. We're moving beyond simple scraping and into the realm of Agentic Automation, where the AI looks at the screen, understands the UI, and interacts with it just like a human would.
Keywords: AI Agents, Browser Automation, GPT-4o, Playwright Python, Large Language Models (LLM).
The Architecture
The system consists of three main layers: the Cognitive Layer (GPT-4o) for medical reasoning, the Execution Layer (Browser Use) for navigating the web, and the Interface Layer (Playwright) for browser interaction.
graph TD
A[User Input: Symptoms/Schedule] --> B{GPT-4o Reasoner}
B -->|Identify Department| C[Agent Task Generation]
C --> D[Browser Use Agent]
D --> E[Playwright Browser]
E -->|Interact| F[Medical Booking Site]
F -->|Visual Feedback| D
D -->|Success/Retry| G[User Notification]
Prerequisites
Before we dive into the code, ensure you have the following ready:
- Python 3.10+
- OpenAI API Key (GPT-4o is recommended for vision-capable reasoning)
- Playwright installed: playwright install
- Browser Use library: pip install browser-use
Step 1: Defining the Brain (The Pydantic Schema)
To make our agent reliable, we need structured output. We want the AI to analyze the user's symptoms and return the target department and urgency.
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class MedicalIntent(BaseModel):
    department: str
    urgency: str
    preferred_date: str

def analyze_symptoms(user_query: str) -> MedicalIntent:
    # Ask GPT-4o for a response that is parsed directly into MedicalIntent
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Analyze symptoms and determine the medical department."},
            {"role": "user", "content": user_query}
        ],
        response_format=MedicalIntent,
    )
    return response.choices[0].message.parsed

# Example: "My eyes are blurry and I have a headache."
# Possible output: { "department": "Ophthalmology", "urgency": "High", "preferred_date": "Next available" }
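Because preferred_date is a free-text string, a value such as "Friday" reaches the browser agent unchanged and leaves the model to work out which Friday is meant. One option is to resolve weekday names to a concrete date before building the task. The sketch below is a hypothetical helper (resolve_preferred_date is not part of the original code) that uses only the standard library and passes any other text, such as "Next available", through untouched.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def resolve_preferred_date(text: str, today: date) -> str:
    """Turn a weekday name into the next matching ISO date; pass other text through."""
    key = text.strip().lower()
    if key in WEEKDAYS:
        # Days until the next occurrence; a match on today's weekday rolls to next week.
        delta = (WEEKDAYS.index(key) - today.weekday()) % 7 or 7
        return (today + timedelta(days=delta)).isoformat()
    return text.strip()

# 2024-05-01 was a Wednesday, so "Friday" resolves to 2024-05-03.
print(resolve_preferred_date("Friday", date(2024, 5, 1)))
print(resolve_preferred_date("Next available", date(2024, 5, 1)))
```

Passing today explicitly keeps the function easy to test; in the agent you would call it with date.today().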
Step 2: The Agentic Execution (Browser Use)
Selector-based Playwright scripts tend to break as soon as a website changes its CSS classes. Browser Use addresses this by letting GPT-4o "see" the page via the DOM tree and screenshots, so it decides what to click at run time instead of relying on hardcoded selectors.
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

# MedicalIntent and analyze_symptoms come from Step 1
async def book_appointment(intent: MedicalIntent):
    # Initialize the LLM that drives the browser
    llm = ChatOpenAI(model="gpt-4o")

    # Define the mission
    task = (
        f"Navigate to guahao.com. Search for the '{intent.department}' department. "
        f"Find a highly-rated doctor available on {intent.preferred_date}. "
        "Stop before the final payment/confirmation page and show me the summary."
    )

    agent = Agent(
        task=task,
        llm=llm,
    )
    history = await agent.run()
    print(history.final_result())

if __name__ == "__main__":
    user_symptoms = "I've had persistent lower back pain for a week."
    intent = analyze_symptoms(user_symptoms)
    asyncio.run(book_appointment(intent))
Advanced Patterns & Production Safety
While this demo shows the "happy path," building production-ready AI agents requires handling edge cases like CAPTCHAs, session persistence, and multi-step verification.
If you are interested in deep-dives into Agentic Design Patterns, Stateful AI Workflows, or scaling Playwright in the cloud, I highly recommend checking out the Official WellAlly Tech Blog. They offer incredible resources on building robust, production-grade automation systems that go far beyond basic tutorials.
Implementation Details: Why "Browser Use"?
- Visual Reasoning: Unlike traditional scrapers, GPT-4o can interpret icons (like a "Calendar" icon) even if they don't have clear id or class attributes.
- Self-Correction: If the agent clicks a link and hits a 404 or a "No Results" page, it can click the "Back" button and try a different search term automatically.
- Natural Language Control: You don't need to write page.click('.btn-submit'). You simply tell the agent to "Click the blue submit button at the bottom right."
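To make the contrast concrete, here is a toy comparison of targeting a button by CSS class versus by its visible label. This is only an analogy and not how Browser Use works internally: the text lookup stands in for "targeting by meaning", and the HTML snippets and helper names (ButtonFinder, by_class, by_text) are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional

class ButtonFinder(HTMLParser):
    """Collect (class attribute, visible text) for every <button> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.buttons: list[tuple[str, str]] = []
        self._current_class: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._current_class = dict(attrs).get("class") or ""
            self._text = []

    def handle_data(self, data):
        if self._current_class is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._current_class is not None:
            self.buttons.append((self._current_class, "".join(self._text).strip()))
            self._current_class = None

def find_buttons(html: str) -> list[tuple[str, str]]:
    parser = ButtonFinder()
    parser.feed(html)
    parser.close()
    return parser.buttons

def by_class(html: str, css_class: str) -> bool:
    # Selector-style lookup: depends on the exact class name.
    return any(css_class in cls.split() for cls, _ in find_buttons(html))

def by_text(html: str, label: str) -> bool:
    # Label-style lookup: depends only on what the user sees.
    return any(label.lower() in text.lower() for _, text in find_buttons(html))

before = '<form><button class="btn-submit">Book appointment</button></form>'
after = '<form><button class="css-1x9k2a">Book appointment</button></form>'

print(by_class(before, "btn-submit"), by_class(after, "btn-submit"))  # True False
print(by_text(before, "book appointment"), by_text(after, "book appointment"))  # True True
```

After the class rename, the selector-style lookup fails while the label-style lookup still finds the button, which is the kind of breakage an agent that reads the page can avoid.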
Handling Authentication
For sites like DXY or Guahao, you'll likely need to maintain a logged-in state. You can pass a browser_context to the agent to use your existing cookies:
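# Assumes a browser-use 0.1.x-style API; later releases renamed these classes,
# so check the documentation for the version you have installed.
from browser_use import Agent, Browser, BrowserConfig
from browser_use.browser.context import BrowserContext, BrowserContextConfig

browser = Browser(config=BrowserConfig(headless=False))  # Watch the magic happen!
context = BrowserContext(
    browser=browser,
    config=BrowserContextConfig(cookies_file="./cookies.json"),  # Persistent login
)

# task and llm are defined as in Step 2
agent = Agent(task=task, llm=llm, browser_context=context)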
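A saved cookie file only helps while the session is still valid, so it can be worth checking it before launching the browser. The sketch below assumes that ./cookies.json holds a JSON list of cookies in Playwright's format, where expires is a Unix timestamp in seconds and -1 marks a session cookie. That file layout is an assumption, and live_cookies is a hypothetical helper.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

def live_cookies(path: str, now: Optional[float] = None) -> list[dict]:
    """Return the cookies from a JSON file that have not expired yet."""
    file = Path(path)
    if not file.exists():
        return []
    now = time.time() if now is None else now
    cookies = json.loads(file.read_text(encoding="utf-8"))
    # expires == -1 (or missing) marks a session cookie, which has no fixed expiry.
    return [c for c in cookies if c.get("expires", -1) == -1 or c["expires"] > now]

# Demo with a throwaway file: one session cookie and one that expired at t=1000.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cookies.json")
    Path(demo_path).write_text(json.dumps([
        {"name": "session", "value": "a", "domain": ".example.com", "path": "/", "expires": -1},
        {"name": "token", "value": "b", "domain": ".example.com", "path": "/", "expires": 1000},
    ]), encoding="utf-8")
    print([c["name"] for c in live_cookies(demo_path, now=2000)])  # ['session']
```

If the list comes back empty, logging in manually and saving fresh cookies is probably simpler than letting the agent hit a login wall mid-task.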
Conclusion: The Future of Interaction
We are shifting from a world where we "use" software to a world where we "instruct" software. By combining the reasoning of GPT-4o with the browsing capabilities of Playwright, we've built a tool that saves hours of manual searching.
A word of caution: Always use automation ethically and respect the robots.txt of medical platforms. This project is for educational purposes to demonstrate the power of Agentic AI.
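Checking robots.txt can itself be automated with Python's standard library. The sketch below parses a made-up robots.txt (the SAMPLE rules are illustrative, not those of any real medical platform) and asks whether a URL may be fetched; is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the rules in a robots.txt document."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Illustrative rules only.
SAMPLE = """\
User-agent: *
Disallow: /account/
Allow: /
"""

print(is_allowed(SAMPLE, "https://example.com/doctors"))         # True
print(is_allowed(SAMPLE, "https://example.com/account/orders"))  # False
```

Against a live site, RobotFileParser can fetch the file itself via set_url() followed by read(). A permissive robots.txt does not replace reading the platform's terms of service.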
What are you planning to automate next? Let me know in the comments!
For more advanced tutorials on LLM engineering and automation, visit *wellally.tech/blog*.