DEV Community

wellallyTech


Goodbye Manual Booking: Building an AI Medical Concierge with GPT-4o and Browser Use 🏥🤖

Let's be honest: navigating healthcare websites to find the right specialist is a nightmare. Between confusing department names and the race to grab a slot before they vanish, it's a high-stress experience. But what if you could just tell an AI, "My stomach has been hurting for three days, find me a top gastroenterologist on Guahao.com for Friday," and let it handle the rest?

In this tutorial, we are building a sophisticated AI Medical Agent using GPT-4o, Playwright, and the revolutionary Browser Use library. We're moving beyond simple scraping and into the realm of Agentic Automation, where the AI looks at the screen, understands the UI, and interacts with it just like a human would.

Keywords: AI Agents, Browser Automation, GPT-4o, Playwright Python, Large Language Models (LLM).


The Architecture 🏗️

The system consists of three main layers: the Cognitive Layer (GPT-4o) for medical reasoning, the Execution Layer (Browser Use) for navigating the web, and the Interface Layer (Playwright) for browser interaction.

graph TD
    A[User Input: Symptoms/Schedule] --> B{GPT-4o Reasoner}
    B -->|Identify Department| C[Agent Task Generation]
    C --> D[Browser Use Agent]
    D --> E[Playwright Browser]
    E -->|Interact| F[Medical Booking Site]
    F -->|Visual Feedback| D
    D -->|Success/Retry| G[User Notification]
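
The flow in the diagram can be tried out in plain Python before any real browser is involved. This is a minimal sketch, not the implementation used later in this post: `Intent`, `run_concierge`, and the injected `reason`, `browse`, and `notify` callables are hypothetical names that stand in for GPT-4o, the Browser Use agent, and the user notification step.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    """Hypothetical stand-in for the reasoner's structured output."""
    department: str
    preferred_date: str


def run_concierge(
    query: str,
    reason: Callable[[str], Intent],         # cognitive layer (GPT-4o in this post)
    browse: Callable[[str], Optional[str]],  # execution layer; returns None on failure
    notify: Callable[[str], None],           # user notification
    max_attempts: int = 2,
) -> bool:
    """Run reasoner -> task generation -> browser agent -> notification."""
    intent = reason(query)
    task = f"Find a {intent.department} doctor available on {intent.preferred_date}."
    for attempt in range(1, max_attempts + 1):
        summary = browse(task)
        if summary is not None:
            notify(f"Found after {attempt} attempt(s): {summary}")
            return True
    notify("No appointment found; please try different dates.")
    return False
```

Passing the layers in as callables keeps the Success/Retry branch testable with fakes, and the real GPT-4o and browser calls can be swapped in later without changing the control flow.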

Prerequisites 🛠️

Before we dive into the code, ensure you have the following ready:

  • Python 3.10+
  • OpenAI API Key (GPT-4o is recommended for vision-capable reasoning)
  • Playwright installed: playwright install
  • Browser Use library: pip install browser-use

Step 1: Defining the Brain (The Pydantic Schema)

To make our agent reliable, we need structured output. We want the AI to analyze the user's symptoms and return the target department, the urgency, and the preferred date.

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class MedicalIntent(BaseModel):
    department: str
    urgency: str
    preferred_date: str

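def analyze_symptoms(user_query: str) -> MedicalIntent | None:
    """Map a free-text symptom description to a structured MedicalIntent."""
    # parse() asks the model for output that matches the Pydantic schema
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Analyze symptoms and determine the medical department."},
            {"role": "user", "content": user_query}
        ],
        response_format=MedicalIntent,
    )
    # .parsed is a MedicalIntent instance, or None if the model refused
    return response.choices[0].message.parsed

# Example: "My eyes are blurry and I have a headache."
# Illustrative output: { "department": "Ophthalmology", "urgency": "High", "preferred_date": "Next available" }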
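
Because `preferred_date` comes back as free text ("Friday", "Next available"), it can be ambiguous by the time it reaches a booking form. One option is to resolve weekday names into concrete dates before building the agent task. A minimal sketch, assuming a hypothetical helper `resolve_preferred_date` that is not part of the code above, and assuming that unrecognized text should be passed through unchanged:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_preferred_date(text: str, today: date) -> str:
    """Turn a weekday name into the next matching ISO date; pass other text through."""
    key = text.strip().lower()
    if key not in WEEKDAYS:
        return text  # e.g. "Next available" is left for the agent to interpret
    # Days until the next occurrence; a same-day match rolls over to next week.
    delta = (WEEKDAYS.index(key) - today.weekday()) % 7 or 7
    return (today + timedelta(days=delta)).isoformat()
```

Taking `today` as a parameter rather than calling `date.today()` inside the function keeps the helper deterministic and easy to test.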

Step 2: The Agentic Execution (Browser Use) 🥑

Playwright scripts that rely on hard-coded selectors tend to break when a website changes its CSS classes. Browser Use addresses this by letting GPT-4o "see" the page via the DOM tree and screenshots and decide what to do at each step.

import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def book_appointment(intent: MedicalIntent):
    # Initialize the LLM that drives the browser
    llm = ChatOpenAI(model="gpt-4o")

    # Define the mission
    task = (
        f"Navigate to guahao.com. Search for the '{intent.department}' department. "
        f"Find a highly-rated doctor available on {intent.preferred_date}. "
        "Stop before the final payment/confirmation page and show me the summary."
    )

    agent = Agent(
        task=task,
        llm=llm,
    )

    history = await agent.run()
    print(history.final_result())

if __name__ == "__main__":
    user_symptoms = "I've had persistent lower back pain for a week."
    intent = analyze_symptoms(user_symptoms)
    asyncio.run(book_appointment(intent))
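
Since the department string is produced by one model call and then interpolated into another prompt, it may be worth validating it before the agent runs. The sketch below shows one way to do that. `ALLOWED_DEPARTMENTS` and `build_task` are hypothetical names, and the allow-list contents are illustrative only.

```python
ALLOWED_DEPARTMENTS = {"Gastroenterology", "Ophthalmology", "Orthopedics"}  # illustrative


def build_task(department: str, preferred_date: str) -> str:
    """Build the agent prompt, rejecting departments outside the allow-list."""
    dept = department.strip()
    if dept not in ALLOWED_DEPARTMENTS:
        raise ValueError(f"Unexpected department: {dept!r}")
    return (
        f"Navigate to guahao.com. Search for the '{dept}' department. "
        f"Find a highly-rated doctor available on {preferred_date}. "
        "Stop before the final payment/confirmation page and show me the summary."
    )
```

Keeping prompt construction in a pure function also makes the "stop before payment" instruction easy to assert on in a unit test.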

Advanced Patterns & Production Safety 🛡️

While this demo shows the "happy path," building production-ready AI agents requires handling edge cases like CAPTCHAs, session persistence, and multi-step verification.
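
One conservative way to deal with CAPTCHAs and verification prompts is to not have the agent solve them at all, and instead pause and hand control back to the user. A minimal sketch of that check, assuming the page's visible text is available as a string; `HANDOFF_MARKERS` and `needs_human` are hypothetical names, and the marker list is illustrative rather than exhaustive:

```python
HANDOFF_MARKERS = ("captcha", "verification code", "sms code")  # illustrative markers


def needs_human(page_text: str) -> bool:
    """Return True when the page text suggests a step the agent should not attempt."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in HANDOFF_MARKERS)
```

A keyword check like this is only a heuristic; in practice it would likely be combined with a hard limit on the number of agent steps.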

If you are interested in deep-dives into Agentic Design Patterns, Stateful AI Workflows, or scaling Playwright in the cloud, I highly recommend checking out the Official WellAlly Tech Blog. They offer incredible resources on building robust, production-grade automation systems that go far beyond basic tutorials.


Implementation Details: Why "Browser Use"?

  1. Visual Reasoning: Unlike traditional scrapers, GPT-4o can interpret icons (like a "Calendar" icon) even if they don't have clear id or class attributes.
  2. Self-Correction: If the agent clicks a link and hits a 404 or a "No Results" page, it can click the "Back" button and try a different search term automatically.
  3. Natural Language Control: You don't need to write page.click('.btn-submit'). You simply tell the agent to "Click the blue submit button at the bottom right."
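
The self-correction in point 2 happens inside the agent, but the same idea can be illustrated outside it. A minimal sketch, assuming a hypothetical `search` callable that returns a list of results (empty for a "No Results" page); none of these names come from Browser Use:

```python
from typing import Callable, Iterable, List, Optional, Tuple


def search_with_fallback(
    terms: Iterable[str],
    search: Callable[[str], List[str]],
) -> Optional[Tuple[str, List[str]]]:
    """Try each search term in order and return the first one that yields results."""
    for term in terms:
        results = search(term)
        if results:
            return term, results
    return None
```

For example, a caller might pass `["Gastroenterology", "Digestive Medicine", "Internal Medicine"]` so that a dead end on the first term does not end the whole run.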

Handling Authentication

For sites like DXY or Guahao, you'll likely need to maintain a logged-in state. You can pass a browser_context to the agent to use your existing cookies:

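from browser_use import Agent, Browser, BrowserConfig
from browser_use.browser.context import BrowserContext, BrowserContextConfig

# NOTE: these names follow the browser-use 0.1.x API. Later releases reorganized
# browser configuration, so check the docs for your installed version.
browser = Browser(config=BrowserConfig(headless=False))  # Watch the magic happen!
context = BrowserContext(
    browser=browser,
    config=BrowserContextConfig(cookies_file="./cookies.json"),  # Persistent login
)
agent = Agent(task=task, llm=llm, browser_context=context)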
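
Before launching the agent with saved cookies, it can help to check that the cookie file is actually usable, since an expired session usually shows up only as a confusing login redirect. A minimal sketch, assuming the file is a JSON list of Playwright-style cookie objects with an `expires` field in Unix seconds; `has_valid_cookies` is a hypothetical helper, not part of Browser Use:

```python
import json
import time
from pathlib import Path
from typing import Optional


def has_valid_cookies(path: str, now: Optional[float] = None) -> bool:
    """Return True if the cookie file exists and holds at least one unexpired cookie."""
    file = Path(path)
    if not file.is_file():
        return False
    try:
        cookies = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    if not isinstance(cookies, list):
        return False
    now = time.time() if now is None else now
    # Playwright marks session cookies with expires == -1; treat those as usable.
    return any(
        c.get("expires", -1) == -1 or c.get("expires", -1) > now
        for c in cookies
        if isinstance(c, dict)
    )
```

If the check fails, the script could fall back to a manual login in a visible browser window rather than starting the agent in a logged-out state.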

Conclusion: The Future of Interaction 🚀

We are shifting from a world where we "use" software to a world where we "instruct" software. By combining the reasoning of GPT-4o with the browsing capabilities of Playwright, we've built a tool that can save hours of manual searching.

A word of caution: Always use automation ethically and respect the robots.txt of medical platforms. This project is for educational purposes to demonstrate the power of Agentic AI.

What are you planning to automate next? Let me know in the comments! 👇


For more advanced tutorials on LLM engineering and automation, visit *wellally.tech/blog*. 🥑💻
