Ever felt like you need a PhD just to book a hospital appointment? Between decoding which department fits your symptoms, finding a doctor who isn't just "good on paper," and navigating archaic registration portals, the process is a nightmare. This is exactly where autonomous AI agents and web automation come to the rescue.
In this tutorial, we are building a "Medical Concierge Agent" using OpenAI Tool Calling, Selenium, and Tavily Search. This agent doesn't just chat; it executes. It researches a doctor's academic background via Tavily, checks real-time availability using Selenium, and organizes your medical history—all through a single natural language prompt. If you've been looking to master AI automation and LLM orchestration, you're in the right place.
The Architecture: How the Agent Thinks and Acts
Before we dive into the code, let’s look at the logic flow. Our agent acts as a "Brain" that dispatches tasks to specific "Limbs" (Tools).
graph TD
A[User Prompt: 'I have sharp back pain...'] --> B{OpenAI Tool Calling}
B -->|Search Dept/Doctor| C[Tavily Search API]
B -->|Check Availability| D[Selenium Web Driver]
B -->|Analyze Style| E[Research Agent]
C --> F[Data Aggregator]
D --> F
E --> F
F --> G[Final Recommendation & Booking Link]
Prerequisites
To follow along, you’ll need:
- Python 3.10+
- OpenAI API Key (for GPT-4o tool calling)
- Tavily API Key (for high-quality web search)
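- Selenium 4 and Chrome (Selenium 4.6+ ships with Selenium Manager, which downloads a matching ChromeDriver automatically; older versions need ChromeDriver installed manually)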
Step 1: Defining the Tools (OpenAI Tool Calling)
We need to give our agent "skills." We'll define a Pydantic schema for our tools so GPT knows exactly how to call them.
from pydantic import BaseModel, Field


class DoctorResearchInput(BaseModel):
    """Arguments for the doctor-research tool."""
    doctor_name: str = Field(description="The name of the doctor to research")
    hospital: str = Field(description="The hospital where the doctor works")


class BookingInput(BaseModel):
    """Arguments for the slot-checking tool."""
    hospital_url: str = Field(description="The URL of the registration portal")
    department: str = Field(description="The targeted department")
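The models above describe the arguments, but the Chat Completions API expects each tool as a JSON object with a name, a description, and a JSON Schema for its parameters. Here is a minimal sketch of that list, written by hand so it needs only the standard library. The tool names and descriptions are assumptions, chosen to match the Python functions in Steps 2 and 3. With Pydantic v2, `DoctorResearchInput.model_json_schema()` can generate the `parameters` part instead.

```python
import json


def make_tool(name, description, properties):
    """Wrap a flat set of string parameters in the Chat Completions tool format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    key: {"type": "string", "description": text}
                    for key, text in properties.items()
                },
                "required": list(properties),
            },
        },
    }


# Assumption: tool names mirror the Python functions from Steps 2 and 3.
my_defined_tools = [
    make_tool(
        "research_doctor",
        "Research a doctor's publications and patient reviews",
        {
            "doctor_name": "The name of the doctor to research",
            "hospital": "The hospital where the doctor works",
        },
    ),
    make_tool(
        "check_registration_slots",
        "List open appointment slots for a department",
        {
            "hospital_url": "The URL of the registration portal",
            "department": "The targeted department",
        },
    ),
]

if __name__ == "__main__":
    print(json.dumps(my_defined_tools[0], indent=2))
```

Keeping the tool name identical to the Python function name makes it easy to look up the right function when the model asks for a tool.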
Step 2: The "Researcher" (Tavily + GPT)
Most hospital websites only show a doctor's title. But is the doctor research-oriented or clinical-heavy? We use Tavily Search to pull in academic publications and patient reviews from across the web, then let GPT summarize what it finds.
import os
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])


def research_doctor(doctor_name, hospital):
    query = f"{doctor_name} {hospital} publications clinical style reviews"
    # Search for deep context
    context = tavily.search(query=query, search_depth="advanced")
    # Return context to the LLM to summarize
    return context["results"]
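Raw search hits can be long, so it usually pays to trim them before handing them to the model. The helper below is a rough sketch of one way to do that. It assumes each hit is a dict with `title`, `url`, `content`, and `score` keys. The function name, the limits, and the sample data are all hypothetical.

```python
def format_search_context(results, max_results=5, max_chars=500):
    """Turn search hits into a compact, numbered text block for the LLM."""
    # Highest-scoring hits first; a hit without a score sorts last.
    ranked = sorted(results, key=lambda hit: hit.get("score", 0.0), reverse=True)
    blocks = []
    for i, hit in enumerate(ranked[:max_results], start=1):
        # Collapse runs of whitespace, then cap the snippet length.
        snippet = " ".join(hit.get("content", "").split())[:max_chars]
        header = f"[{i}] {hit.get('title', 'Untitled')} ({hit.get('url', '')})"
        blocks.append(f"{header}\n{snippet}")
    return "\n\n".join(blocks)


# Made-up sample hits for illustration only, not real search output.
sample = [
    {"title": "Clinic profile", "url": "https://example.com/a",
     "content": "Sees  patients\nweekly.", "score": 0.4},
    {"title": "Paper list", "url": "https://example.com/b",
     "content": "Twelve publications.", "score": 0.9},
]

if __name__ == "__main__":
    print(format_search_context(sample))
```

Capping both the number of hits and the snippet length keeps the prompt size predictable, at the cost of dropping some detail.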
Step 3: The "Executor" (Selenium Automation)
This is the "heavy lifting" part. Selenium will navigate the hospital's complex SPA (Single Page Application) to find open slots.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options


def check_registration_slots(hospital_url, department):
    options = Options()
    options.add_argument("--headless")  # Run without a UI
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(hospital_url)
        # Let element lookups poll for up to 10 seconds while the page renders
        driver.implicitly_wait(10)
        # Example logic to find department buttons; a department name that
        # contains a single quote would break this XPath string
        dept_element = driver.find_element(By.XPATH, f"//*[contains(text(), '{department}')]")
        dept_element.click()
        # Scrape available dates; the class name is site-specific
        slots = driver.find_elements(By.CLASS_NAME, "available-slot")
        available_dates = [slot.text for slot in slots]
        return available_dates
    finally:
        # Always release the browser, even if a lookup fails
        driver.quit()
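The XPath in the example interpolates the department name directly into a quoted string, so a name like "Women's Health" produces an invalid expression. XPath 1.0 has no escape character. The usual workaround is to pick the other quote style, or to fall back to `concat()` when both appear. A small sketch follows; the helper names are hypothetical.

```python
def xpath_literal(text):
    """Return `text` as a valid XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types present: split on single quotes and rejoin the
    # pieces with concat(), emitting each single quote as "'".
    parts = text.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if part:
            pieces.append(f"'{part}'")
        if i < len(parts) - 1:
            pieces.append('"\'"')
    return "concat(" + ", ".join(pieces) + ")"


def department_xpath(department):
    """Build the 'element whose text contains the department' XPath safely."""
    return f"//*[contains(text(), {xpath_literal(department)})]"


if __name__ == "__main__":
    print(department_xpath("Women's Health"))
```

The result can be passed to `driver.find_element(By.XPATH, department_xpath(department))` in place of the inline f-string.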
Step 4: Putting it All Together
Using a simple loop around the Chat Completions API (the OpenAI Assistants API is an alternative), we feed the tool outputs back to the model until it produces a final answer.
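import json
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment
# Maps tool names to the functions defined above; keys must match the tool schemas
TOOL_FUNCTIONS = {
    "research_doctor": research_doctor,
    "check_registration_slots": check_registration_slots,
}


def run_medical_agent(user_query):
    messages = [{"role": "user", "content": user_query}]
    while True:
        # 1. GPT identifies the need to search for a doctor/department
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=my_defined_tools,  # tool schemas for the functions defined above
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # Final answer, e.g. a recommended doctor and the open slots found
            return message.content
        # 2. Execute each requested tool and return the result to the model
        messages.append(message)
        for call in message.tool_calls:
            result = TOOL_FUNCTIONS[call.function.name](**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})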
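The tool-execution step is where things tend to go wrong in practice: the model may name a tool that does not exist, send malformed JSON, or trigger a Selenium timeout. One way to handle this is a small dispatcher that turns every failure into a message the model can read, instead of crashing the run. The sketch below is illustrative only. `execute_tool_call`, the registry, and the stub tool are all hypothetical names.

```python
import json


def execute_tool_call(name, arguments_json, registry):
    """Run one tool call and return a JSON string for the `tool` message."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps({"result": func(**kwargs)})
    except Exception as exc:
        # Report the failure to the model instead of aborting the whole run.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})


# Stub standing in for the Selenium function; it returns canned data.
def fake_check_slots(hospital_url, department):
    return ["Friday 09:00", "Friday 10:30"]


registry = {"check_registration_slots": fake_check_slots}

if __name__ == "__main__":
    args = json.dumps({"hospital_url": "https://example.com", "department": "Orthopedics"})
    print(execute_tool_call("check_registration_slots", args, registry))
```

Returning errors as tool output gives the model a chance to retry with corrected arguments or explain the problem to the user.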
The "Official" Way: Scaling to Production 🚀
While this script works for personal use, building a production-grade medical agent involves handling rate limits, proxy rotation for Selenium, and HIPAA-compliant data handling.
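Rate limits are the easiest of these to start on. A common pattern is to retry with exponential backoff, as in the minimal sketch below. The function name, default delays, and the injectable `sleep` parameter are assumptions for illustration. In real code you would likely catch only the specific rate-limit exception (for example `openai.RateLimitError`) rather than every `Exception`.

```python
import random
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            # Small random jitter keeps parallel clients from retrying in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request():
        # Simulated API call that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("rate limited")
        return "ok"

    # A no-op sleep keeps the demo instant.
    print(call_with_backoff(flaky_request, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter makes the helper easy to test without waiting for the delays.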
For more advanced patterns on building robust AI agents and handling complex state management in automation, I highly recommend checking out the deep-dive articles at WellAlly Blog. They cover production-ready AI architectures that go far beyond basic scripts, specifically focusing on reliability and scalability in healthcare-adjacent tech.
Conclusion
By combining Selenium for the "doing" and GPT-4o for the "thinking," we've turned a 30-minute chore into a single automated request. This agent-centric approach is the future of how we interact with the web.
What's next?
- Add OCR: Use GPT-4o's vision capabilities to read physical medical reports.
- Voice Interface: Use Whisper to talk to your agent while you're on the go.
What would you automate with an AI agent? Let me know in the comments! 👇