We've all been there: waking up at 6:00 AM, frantically refreshing a hospital's booking page, only to find that all the slots were snatched up by bots in milliseconds. It's frustrating, repetitive, and quite frankly, a task perfectly suited for an AI Agent.
In this tutorial, we are building Auto-Doc-Scheduler, an intelligent agent that leverages the viral Browser-use framework, GPT-4o, and the Google Calendar API. This agent doesn't just scrape data; it actively navigates complex web interfaces, logs into portals, and finds an appointment slot that perfectly fits your existing schedule.
If you're interested in AI Agents, browser automation, or LLM-driven workflows, you're in the right place. Let's dive into how we can turn "pixels into appointments."
The Architecture
Before we write a single line of code, let's look at how these pieces fit together. We aren't just writing a script; we're building a reasoning loop where the LLM sees the browser state and decides the next click.
graph TD
A[User Trigger] --> B[Google Calendar API]
B --> C{Find Free Slots}
C --> D[GPT-4o Decision Engine]
D --> E[Browser-use Agent]
E --> F[Playwright / Chromium]
F --> G[Hospital Booking Portal]
G --> H{Slot Available?}
H -- Yes --> I[Execute Booking & Auth]
H -- No --> J[Retry/Wait]
I --> K[Update Google Calendar]
K --> L[Notify User via SMS/Email]
Prerequisites
To follow along, you'll need the following tech stack:
- Python 3.10+
- Playwright: For the underlying browser control.
- LangChain / GPT-4o: To provide the "brain" for our agent.
- Browser-use: The high-level library that makes agentic web navigation easy.
- Google Calendar API: To check your availability.
pip install browser-use langchain-openai playwright google-api-python-client google-auth-httplib2 google-auth-oauthlib
playwright install chromium
Step 1: Checking Your Availability
We don't want our agent to book an appointment during your big presentation. First, we'll fetch your "free time" using the Google Calendar API.
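from googleapiclient.discovery import build

# ... (standard Google OAuth boilerplate) ...

def get_free_slots(service, start_time, end_time):
    """Returns a list of busy periods to avoid."""
    # Despite the name, this returns the *busy* periods as (start, end) pairs;
    # free time is whatever falls between them.
    # start_time and end_time must be RFC 3339 timestamps with a UTC offset.
    events_result = service.events().list(
        calendarId='primary', timeMin=start_time,
        timeMax=end_time, singleEvents=True,
        orderBy='startTime'
    ).execute()
    events = events_result.get('items', [])
    # All-day events carry 'date' instead of 'dateTime', so they yield (None, None).
    return [(e['start'].get('dateTime'), e['end'].get('dateTime')) for e in events]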
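Because the function above returns busy periods, the free time still has to be derived from them. Here is a minimal sketch of one way to do that. `free_windows`, `day_start`, and `day_end` are hypothetical names that are not part of the original code, and the sketch assumes the timestamps are ISO 8601 strings with a UTC offset and that `day_start` and `day_end` are timezone-aware.

```python
from datetime import datetime


def _parse(ts):
    # datetime.fromisoformat on Python 3.10 rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def free_windows(busy, day_start, day_end):
    """Return the (start, end) gaps between busy periods inside [day_start, day_end]."""
    # Skip entries without timed bounds (for example all-day events).
    intervals = sorted((_parse(s), _parse(e)) for s, e in busy if s and e)

    gaps = []
    cursor = day_start
    for start, end in intervals:
        if end <= cursor or start >= day_end:
            continue  # entirely before the cursor or after the working day
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping events extend the busy block
    if cursor < day_end:
        gaps.append((cursor, day_end))
    return gaps
```

Overlapping meetings are merged implicitly because the cursor only ever moves forward, so two back-to-back or overlapping events never produce a zero-length or negative gap.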
Step 2: Defining the Browser-use Agent
The core of this project is the browser-use framework. Unlike traditional Selenium scripts that break when a CSS class changes, this agent uses Computer Vision and DOM tree analysis to understand the page.
from browser_use import Agent
from langchain_openai import ChatOpenAI
import asyncio

async def run_appointment_agent(target_date, available_windows):
    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o")

    # Define the complex task
    task = f"""
    1. Go to 'https://hospital-portal.example.com/login'.
    2. Login with credentials found in environment variables.
    3. Navigate to the 'Physical Examination' or 'General Practitioner' department.
    4. Search for appointments on {target_date}.
    5. Cross-reference available slots with user free windows: {available_windows}.
    6. If a match is found, click 'Book' and complete the form.
    7. If a CAPTCHA appears, alert me or try to solve it if it's a simple checkbox.
    """

    agent = Agent(
        task=task,
        llm=llm,
    )

    history = await agent.run()
    print(history.final_result())

if __name__ == "__main__":
    asyncio.run(run_appointment_agent("2023-11-25", "9:00 AM - 12:00 PM"))
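The example call passes `available_windows` as a hand-written string. If your free time is held as `(start, end)` datetime pairs instead, a small helper can render it for the prompt. This is only a sketch: `format_windows` and its `min_minutes` threshold are hypothetical and not part of the original code.

```python
from datetime import datetime


def format_windows(windows, min_minutes=30):
    """Render (start, end) datetime pairs as a compact string for the task prompt."""
    parts = []
    for start, end in windows:
        # Drop gaps too short to fit an appointment.
        if (end - start).total_seconds() < min_minutes * 60:
            continue
        # 24-hour times avoid AM/PM ambiguity and locale-dependent formatting.
        parts.append(f"{start:%H:%M} - {end:%H:%M}")
    return "; ".join(parts)


if __name__ == "__main__":
    windows = [
        (datetime(2023, 11, 25, 9, 0), datetime(2023, 11, 25, 12, 0)),
        (datetime(2023, 11, 25, 13, 0), datetime(2023, 11, 25, 13, 15)),
    ]
    print(format_windows(windows))
```

Filtering out short gaps before they reach the prompt keeps the model from proposing a slot that technically fits but leaves no time to get there.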
Deep Dive: Why Browser-use?
Traditional automation (like pure Playwright) is brittle. If the "Book Now" button changes from a <div> to a <span>, your script dies.
By using GPT-4o as the navigator, the agent reads the rendered page much as a human would. It sees the text "Schedule Appointment," identifies the matching element, and tells Playwright to click it. This is the future of LLM-driven RPA (Robotic Process Automation).
Pro Tip: When building production-grade agents, you often need more than just a simple script. For advanced patterns on managing long-running agent states and error handling, check out the deep-dive articles at WellAlly Tech Blog. They cover excellent strategies for scaling AI workflows that I used as inspiration for this scheduler!
Step 3: Handling Authentication and Security
Never hardcode your passwords! Use environment variables or a secret manager.
import os
from dotenv import load_dotenv
load_dotenv()
# Pass these contextually to the agent
USERNAME = os.getenv("HOSPITAL_USER")
PASSWORD = os.getenv("HOSPITAL_PASS")
The agent is smart enough to find the input fields labeled "Username" or "Email" and fill them accordingly without you needing to provide the exact XPath.
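The snippet above loads the credentials but does not show how they reach the agent. The browser-use documentation describes a `sensitive_data` argument that maps placeholder names to real values, so the model only ever sees the placeholders. Treat that parameter as an assumption and check it against the version you have installed. The helper below, `build_login_context`, and the placeholder names `x_username` and `x_password` are hypothetical; the sketch only prepares the inputs and fails early if a variable is missing.

```python
import os


def build_login_context(env=os.environ):
    """Return (task_snippet, sensitive_data) for the login step."""
    # Placeholder name -> environment variable that holds the real value.
    required = {"x_username": "HOSPITAL_USER", "x_password": "HOSPITAL_PASS"}

    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    sensitive_data = {name: env[var] for name, var in required.items()}
    # The prompt mentions only the placeholders, never the secrets themselves.
    task_snippet = (
        "Log in using x_username as the username and x_password as the password."
    )
    return task_snippet, sensitive_data


# Assumed usage (verify that your browser-use version accepts sensitive_data):
# snippet, secrets = build_login_context()
# agent = Agent(task=snippet + " Then ...", llm=llm, sensitive_data=secrets)
```

Failing fast on a missing variable is deliberate: it is cheaper to stop before the browser launches than to let the agent reach the login form with empty credentials.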
Challenges & Solutions
- CAPTCHAs: While GPT-4o is good, some "harsher" CAPTCHAs require specialized solvers (like 2Captcha) integrated into the Playwright flow.
- Concurrency: Booking platforms often have high traffic. You can run multiple instances of the agent using a TaskGroup in Python's asyncio (note that TaskGroup requires Python 3.11 or later).
- State Management: If the browser crashes mid-booking, you need to save the state. browser-use allows for session persistence.
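The concurrency point can be sketched as follows. Since `asyncio.TaskGroup` only exists from Python 3.11 and the prerequisites allow 3.10, this sketch swaps in `asyncio.gather` with a semaphore instead. `run_many` and `fake_agent` are hypothetical names; `fake_agent` stands in for a real agent coroutine that takes a date and returns a result.

```python
import asyncio


async def run_many(agent_fn, dates, max_parallel=2):
    """Run agent_fn(date) for every date, at most max_parallel at a time."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(date):
        async with semaphore:
            try:
                return date, await agent_fn(date)
            except Exception as exc:
                # Capture the error so one failed run does not hide the others.
                return date, exc

    # gather preserves input order, so results line up with dates.
    return await asyncio.gather(*(guarded(d) for d in dates))


async def fake_agent(date):
    """Stand-in for a real agent run; replace with your own coroutine."""
    await asyncio.sleep(0.01)
    if date.endswith("26"):
        raise RuntimeError("portal timed out")
    return f"booked {date}"


if __name__ == "__main__":
    for date, outcome in asyncio.run(run_many(fake_agent, ["2023-11-25", "2023-11-26"])):
        print(date, outcome)
```

Keeping `max_parallel` small limits how many browsers run at once. If several runs target the same day, you would also want a guard against double booking, which this sketch does not include.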
Conclusion: The Era of Personal AI Assistants
The Auto-Doc-Scheduler is just one example of how AI agents can reclaim our time. By combining the reasoning of GPT-4o with the browsing capabilities of Playwright, we've moved beyond simple chatbots into the world of Actionable AI.
Next Steps:
- Integrate Twilio to get an SMS confirmation once the slot is booked.
- Deploy this as a cron job on a VPS so it runs every morning at the "booking drop" time.
What would you automate with a browser agent? Let me know in the comments!
For more production-ready AI agent architectures and enterprise-level automation tips, don't forget to visit the WellAlly Blog. Stay curious and keep building!