DEV Community

wellallyTech

The AI Doctor’s Assistant: Automating Hospital Bookings and Research with Selenium & GPT-4

Ever felt like you need a PhD just to book a hospital appointment? Between decoding which department fits your symptoms, finding a doctor who isn't just "good on paper," and navigating archaic registration portals, the process is a nightmare. This is exactly where autonomous AI agents and web automation come to the rescue.

In this tutorial, we are building a "Medical Concierge Agent" using OpenAI Tool Calling, Selenium, and Tavily Search. This agent doesn't just chat; it executes. It researches a doctor's academic background via Tavily, checks real-time availability using Selenium, and organizes your medical history—all through a single natural language prompt. If you've been looking to master AI automation and LLM orchestration, you're in the right place.

The Architecture: How the Agent Thinks and Acts

Before we dive into the code, let’s look at the logic flow. Our agent acts as a "Brain" that dispatches tasks to specific "Limbs" (Tools).

graph TD
    A[User Prompt: 'I have sharp back pain...'] --> B{OpenAI Tool Calling}
    B -->|Search Dept/Doctor| C[Tavily Search API]
    B -->|Check Availability| D[Selenium Web Driver]
    B -->|Analyze Style| E[Research Agent]
    C --> F[Data Aggregator]
    D --> F
    E --> F
    F --> G[Final Recommendation & Booking Link]

Prerequisites

To follow along, you’ll need:

  • Python 3.10+
  • OpenAI API Key (for GPT-4o tool calling)
  • Tavily API Key (for high-quality web search)
  • Selenium WebDriver and Chrome (Selenium 4.6+ downloads a matching ChromeDriver automatically; older versions need it installed manually)

Step 1: Defining the Tools (OpenAI Tool Calling)

We need to give our agent "skills." We'll define a Pydantic model for each tool's arguments, so GPT knows exactly which parameters a call needs and what they mean.

from pydantic import BaseModel, Field

class DoctorResearchInput(BaseModel):
    doctor_name: str = Field(description="The name of the doctor to research")
    hospital: str = Field(description="The hospital where the doctor works")

class BookingInput(BaseModel):
    hospital_url: str = Field(description="The URL of the registration portal")
    department: str = Field(description="The targeted department")
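
The models above describe arguments, but the Chat Completions API expects each tool as a JSON "function" definition. Here is a minimal sketch of that wrapping step. The helper `to_openai_tool`, the tool names, and the tool descriptions are my own assumptions, and the schema dicts are written out by hand to keep the snippet dependency-free; with Pydantic v2 you would likely pass `DoctorResearchInput.model_json_schema()` instead.

```python
import json

def to_openai_tool(name: str, description: str, schema: dict) -> dict:
    """Wrap a JSON Schema dict in the Chat Completions 'function' tool format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
        },
    }

# Hand-written equivalents of the two Pydantic models (assumption: Pydantic v2's
# model_json_schema() produces roughly this shape, plus some "title" keys).
doctor_research_schema = {
    "type": "object",
    "properties": {
        "doctor_name": {"type": "string", "description": "The name of the doctor to research"},
        "hospital": {"type": "string", "description": "The hospital where the doctor works"},
    },
    "required": ["doctor_name", "hospital"],
}

booking_schema = {
    "type": "object",
    "properties": {
        "hospital_url": {"type": "string", "description": "The URL of the registration portal"},
        "department": {"type": "string", "description": "The targeted department"},
    },
    "required": ["hospital_url", "department"],
}

# Tool names and descriptions here are illustrative, not fixed by any API.
my_defined_tools = [
    to_openai_tool("research_doctor",
                   "Research a doctor's publications and reviews",
                   doctor_research_schema),
    to_openai_tool("check_registration_slots",
                   "List open appointment slots for a department",
                   booking_schema),
]

print(json.dumps(my_defined_tools[0], indent=2))
```

Naming each tool after the Python function it triggers keeps the dispatch step simple later: the name the model returns can be looked up directly.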

Step 2: The "Researcher" (Tavily + GPT)

Most hospital websites only show a doctor's title. But is the doctor research-oriented or clinical-heavy? We use Tavily Search to find academic papers and patient reviews, then let the LLM summarize what comes back.

import os
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def research_doctor(doctor_name, hospital):
    query = f"{doctor_name} {hospital} publications clinical style reviews"
    # Search for deep context
    context = tavily.search(query=query, search_depth="advanced")

    # Return context to the LLM to summarize
    return context['results']
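
Raw search results can be long, so it may help to trim them before handing them to the model. The sketch below is one possible way to do that, assuming each result is a dict with `title`, `url`, and `content` keys (the shape Tavily returns). The helper name `format_results`, its limits, and the sample data are made up for illustration.

```python
def format_results(results: list[dict], max_results: int = 3, max_chars: int = 300) -> str:
    """Turn search hits into a short numbered digest for the LLM prompt."""
    blocks = []
    for i, hit in enumerate(results[:max_results], start=1):
        # Truncate each snippet so one long page cannot dominate the prompt
        snippet = hit.get("content", "")[:max_chars]
        header = f"[{i}] {hit.get('title', 'Untitled')} ({hit.get('url', 'no url')})"
        blocks.append(f"{header}\n{snippet}")
    return "\n\n".join(blocks)

# Placeholder data, not real search output
sample = [
    {"title": "Example profile", "url": "https://example.com/a", "content": "Placeholder snippet text."},
    {"title": "Example review", "url": "https://example.com/b", "content": "Another placeholder."},
]

digest = format_results(sample, max_results=1)
print(digest)
```

Numbering the hits also gives the model something to cite when it explains why it recommends a particular doctor.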

Step 3: The "Executor" (Selenium Automation)

This is the "heavy lifting" part. Selenium will navigate the hospital's complex SPA (Single Page Application) to find open slots.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def check_registration_slots(hospital_url, department):
    options = Options()
    options.add_argument("--headless")  # Run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    # Implicit wait: every find_element(s) call below polls for up to 10 seconds
    # before giving up, which gives the SPA time to render.
    driver.implicitly_wait(10)

    try:
        driver.get(hospital_url)

        # Example logic to find the department button. This XPath breaks if the
        # department name contains a single quote.
        dept_element = driver.find_element(By.XPATH, f"//*[contains(text(), '{department}')]")
        dept_element.click()

        # Scrape available dates ("available-slot" is an example class name;
        # an empty list means no matching element appeared within the wait)
        slots = driver.find_elements(By.CLASS_NAME, "available-slot")
        available_dates = [slot.text for slot in slots]

        return available_dates
    finally:
        driver.quit()  # Always close the browser, even if a lookup fails
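
One caveat with the XPath in `check_registration_slots`: the department name comes from the model and is pasted straight into the expression, so a name like "Women's Health" produces an invalid XPath. XPath 1.0 has no escape character, so the usual workaround is to pick the other quote style or fall back to `concat()`. A small sketch, with `xpath_literal` and `department_xpath` as hypothetical helper names:

```python
def xpath_literal(value: str) -> str:
    """Return value as a valid XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: split on single quotes and glue the pieces
    # back together with concat(), inserting a quoted ' between them.
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"

def department_xpath(department: str) -> str:
    """Build the 'element containing this text' XPath used in Step 3."""
    return f"//*[contains(text(), {xpath_literal(department)})]"

print(department_xpath("Women's Health"))
```

The result can be passed to `driver.find_element(By.XPATH, ...)` in place of the f-string.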

Step 4: Putting it All Together

Using the OpenAI Assistants API or a simple loop, we feed the tool outputs back to the model until it produces a final answer.

import openai

def run_medical_agent(user_query):
    messages = [{"role": "user", "content": user_query}]

    # 1. GPT identifies the need to search for a doctor/department
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=my_defined_tools  # tool schemas describing the functions defined above
    )

    # 2. Logic to execute the tool_calls and return results...
    # (Abbreviated for brevity)

    # Hard-coded sample output standing in for the model's final answer
    return "Agent successfully found Dr. Smith (Specialist in Spinal Surgery) with 3 slots available this Friday!"
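
For the curious, here is roughly what the abbreviated step 2 could look like. This is a sketch of the common tool-calling loop, not the exact code behind this post: `run_tool_loop`, `max_turns`, and the `FakeClient` stand-in are my own assumptions. The fake client mimics the shape of an OpenAI chat completion response so the example runs without an API key; in real use you would pass an `openai.OpenAI()` client instead.

```python
import json
from types import SimpleNamespace

def run_tool_loop(client, messages, tools, tool_functions, model="gpt-4o", max_turns=5):
    """Call the model until it stops requesting tools, then return its reply."""
    for _ in range(max_turns):
        response = client.chat.completions.create(model=model, messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no tool requested: this is the final answer
        messages.append(message)  # keep the assistant turn that requested the tools
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            result = tool_functions[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,  # ties the result to the request
                "content": json.dumps(result),
            })
    raise RuntimeError("Model kept requesting tools past max_turns")

class FakeClient:
    """Test stand-in: asks for one tool call, then answers with its output."""
    def __init__(self):
        self.calls = 0
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        self.calls += 1
        if self.calls == 1:
            call = SimpleNamespace(
                id="call_1",
                function=SimpleNamespace(
                    name="check_registration_slots",
                    arguments='{"hospital_url": "https://example.com", "department": "Orthopedics"}',
                ),
            )
            message = SimpleNamespace(content=None, tool_calls=[call])
        else:
            tool_output = kwargs["messages"][-1]["content"]
            message = SimpleNamespace(content=f"Slots found: {tool_output}", tool_calls=None)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

def fake_check_slots(hospital_url, department):
    """Placeholder for the Selenium function; returns made-up data."""
    return ["Fri 09:00"]

reply = run_tool_loop(
    FakeClient(),
    [{"role": "user", "content": "I have sharp back pain"}],
    tools=[],
    tool_functions={"check_registration_slots": fake_check_slots},
)
print(reply)
```

The `max_turns` cap is a cheap safety net: without it, a model that keeps asking for tools would keep launching browsers and burning API credits.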

The "Official" Way: Scaling to Production 🚀

While this script works for personal use, building a production-grade medical agent involves handling rate limits, proxy rotation for Selenium, and HIPAA-compliant data handling.
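
To make the rate-limit point concrete, here is a minimal retry helper with exponential backoff and a little jitter. It is a generic sketch under my own assumptions (the name `with_backoff`, the delays, and the retry count are arbitrary), and it catches every exception for brevity; in a real agent you would probably catch only the specific rate-limit error raised by your API client.

```python
import random
import time

def with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on failure with exponentially growing delays."""
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            # 1s, 2s, 4s, ... plus up to 0.5s of jitter so that parallel
            # workers do not all retry at the same moment
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)

# Demo with a function that fails twice before succeeding
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

delays = []
outcome = with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(outcome, len(delays))
```

Passing `sleep` as a parameter keeps the helper easy to test, since the demo records the delays instead of actually waiting.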

For more advanced patterns on building robust AI agents and handling complex state management in automation, I highly recommend checking out the deep-dive articles at WellAlly Blog. They cover production-ready AI architectures that go far beyond basic scripts, specifically focusing on reliability and scalability in healthcare-adjacent tech.

Conclusion

By combining Selenium for the "doing" and GPT-4o for the "thinking," we've turned a tedious, multi-step chore into a task that starts from a single prompt. This agent-centric approach is the future of how we interact with the web.

What's next?

  1. Add OCR: Use GPT-4o's vision capabilities to read physical medical reports.
  2. Voice Interface: Use Whisper to talk to your agent while you're on the go.

What would you automate with an AI agent? Let me know in the comments! 👇
