Beck_Moulton

Posted on Mar 14

From Dialogue to Execution: Building an AI Healthcare Agent with GPT-4o-mini and Playwright

#openai #automation #typescript #webdev

We’ve all been there: you feel a nagging pain or a sudden fever, and the last thing you want to do is navigate a clunky hospital portal, find an open slot, and manually type out your symptoms for the tenth time. What if your AI could do it for you?

In this tutorial, we are building an autonomous healthcare agent with GPT-4o-mini and Playwright. The agent checks your availability via the Cal.com API, summarizes your symptoms, and books an appointment on a hospital portal through browser automation. GPT-4o-mini function calling decides which of those steps to run and with what arguments.

The Vision: AI That Acts, Not Just Talks

Most LLM implementations stop at "giving advice." We’re taking it a step further into the realm of Action-Oriented AI. Our agent won't just say "you should see a doctor"; it will actually find a time that works for you and fill out the forms.

The Architecture

The flow involves a multi-step orchestration where the LLM acts as the brain, determining which "tool" to use based on the user's intent.

graph TD
    A[User: I have a headache, book a GP for tomorrow] --> B{GPT-4o-mini Agent}
    B -->|Check Availability| C[Cal.com API]
    C -->|Return Free Slots| B
    B -->|Reasoning| D[Determine Best Slot & Summarize Symptoms]
    D --> E[Playwright Browser Tool]
    E -->|Automated Navigation| F[Hospital Appointment Portal]
    F -->|Submission| G[Confirmation Sent to User]
    G --> A
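
One way to picture the "brain picks a tool" step in the diagram is a lookup table from tool name to handler. The sketch below is illustrative only: ToolHandler, toolHandlers, and dispatchToolCall are hypothetical names, and the stub handlers stand in for the real Cal.com and Playwright calls.

```typescript
// Hypothetical sketch: route a model-issued tool call to a local handler.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// Stub handlers standing in for the Cal.com and Playwright tools.
const toolHandlers = new Map<string, ToolHandler>([
  ["checkAvailability", async (args) => `free slots for ${String(args.date)}`],
  ["bookAppointment", async (args) => `booked ${String(args.timeSlot)}`],
]);

// The model returns a tool name plus a JSON string of arguments.
async function dispatchToolCall(
  toolName: string,
  rawArguments: string,
  handlers: Map<string, ToolHandler> = toolHandlers,
): Promise<string> {
  const handler = handlers.get(toolName);
  if (!handler) {
    // Report unknown tools back as text so the model can recover.
    return `Unknown tool: ${toolName}`;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArguments);
  } catch {
    return `Invalid JSON arguments for ${toolName}`;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return `Invalid arguments for ${toolName}`;
  }
  return handler(parsed as Record<string, unknown>);
}
```

Returning an error string instead of throwing is a design choice: the text can be sent back to the model as the tool result, giving it a chance to correct itself.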

Prerequisites

To follow along, you'll need:

Node.js installed (v18+)
An OpenAI API Key (for GPT-4o-mini)
A Cal.com API Key
Basic knowledge of TypeScript and Playwright

Step 1: Defining the Agent Tools

We need to provide GPT-4o-mini with "functions" it can call. We’ll define two main capabilities: checking the calendar and executing the browser automation.

import { z } from "zod";

// Schema for checking availability
export const CheckAvailabilitySchema = z.object({
  date: z.string().describe("The ISO date to check for availability"),
});

// Schema for booking the appointment
export const BookAppointmentSchema = z.object({
  timeSlot: z.string().describe("The chosen time slot"),
  symptoms: z.string().describe("A concise summary of user symptoms"),
  doctorType: z.string().describe("e.g., General Practitioner, Cardiologist"),
});
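
For reference, function calling ultimately needs each tool's parameters as JSON Schema rather than as a Zod object. A hand-written equivalent of CheckAvailabilitySchema might look roughly like the object below. The constant name and the tool description text are assumptions added for illustration, and in practice a conversion helper would generate this for you.

```typescript
// Hand-written JSON Schema roughly equivalent to CheckAvailabilitySchema.
// The constant name and the description string are illustrative assumptions.
const checkAvailabilityTool = {
  type: "function" as const,
  function: {
    name: "checkAvailability",
    description: "Check the user's calendar for free slots on a given date",
    parameters: {
      type: "object",
      properties: {
        date: {
          type: "string",
          description: "The ISO date to check for availability",
        },
      },
      // Every field the handler depends on is listed as required.
      required: ["date"],
      additionalProperties: false,
    },
  },
};
```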

Step 2: Connecting to Cal.com API

Before the agent goes to the hospital site, it needs to know when you are free.

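async function getMyAvailability(date: string) {
  // Endpoint, query parameters, and response shape follow this tutorial; verify them against the Cal.com API docs.
  const params = new URLSearchParams({ apiKey: process.env.CAL_API_KEY ?? "", date });
  const response = await fetch(`https://api.cal.com/v1/availability?${params}`);
  if (!response.ok) {
    throw new Error(`Cal.com request failed with status ${response.status}`);
  }
  const data = await response.json();
  // Return the slots reported for that date
  return data.slots;
}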
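
Once the slots come back, something has to choose one. The model can do that, but a small deterministic helper is easier to test. Here is a minimal sketch, assuming the slots arrive as ISO 8601 strings with an explicit offset or "Z". The function name pickEarliestSlot and that slot format are assumptions, not part of the Cal.com response contract.

```typescript
// Hypothetical helper: pick the earliest free slot at or after a given time.
// Assumes slots are ISO 8601 strings with an explicit offset or "Z".
function pickEarliestSlot(slots: string[], notBefore: string): string | undefined {
  const threshold = Date.parse(notBefore);
  const candidates = slots
    .map((slot) => ({ slot, time: Date.parse(slot) }))
    // Drop unparseable entries and anything earlier than the threshold.
    .filter(({ time }) => !Number.isNaN(time) && time >= threshold)
    .sort((a, b) => a.time - b.time);
  // Returns undefined when nothing qualifies.
  return candidates[0]?.slot;
}
```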

Step 3: The "Hands" – Playwright Browser Execution

This is where the magic happens. We use Playwright to simulate a human navigating a medical portal. Note how we pass the symptoms generated by the AI directly into the form.

import { chromium } from 'playwright';

async function executeHospitalBooking(details: z.infer<typeof BookAppointmentSchema>) {
  const browser = await chromium.launch({ headless: false }); // Watch the magic happen!
  try {
    const page = await browser.newPage();

    await page.goto('https://mock-hospital-portal.com/login');

    // Fill in credentials (stored in env)
    await page.fill('#username', process.env.PATIENT_ID!);
    await page.click('button#login');

    // Select Department & Time
    await page.selectOption('#dept-select', details.doctorType);
    await page.click(`text=${details.timeSlot}`);

    // The AI-generated symptom summary
    await page.fill('#symptom-textarea', details.symptoms);

    await page.click('#submit-appointment');
    // The form was submitted; this does not verify that the portal accepted it
    console.log("✅ Appointment form submitted!");
  } finally {
    // Close the browser even if one of the steps above throws
    await browser.close();
  }
}
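
It can help to see the booking as data before anything is clicked. The sketch below builds the post-login steps as a plain list that mirrors the selectors used above, so the plan can be shown to the user in a confirmation prompt. BookingDetails, BookingStep, planBookingSteps, and describeSteps are hypothetical names introduced for this illustration.

```typescript
// Shape mirrors BookAppointmentSchema from Step 1.
interface BookingDetails {
  timeSlot: string;
  symptoms: string;
  doctorType: string;
}

// One browser action, described as data instead of executed.
type BookingStep =
  | { action: "select"; selector: string; value: string }
  | { action: "click"; selector: string }
  | { action: "fill"; selector: string; value: string };

// Build the post-login steps in the same order as executeHospitalBooking.
function planBookingSteps(details: BookingDetails): BookingStep[] {
  return [
    { action: "select", selector: "#dept-select", value: details.doctorType },
    { action: "click", selector: `text=${details.timeSlot}` },
    { action: "fill", selector: "#symptom-textarea", value: details.symptoms },
    { action: "click", selector: "#submit-appointment" },
  ];
}

// Render the plan as readable lines for a confirmation prompt.
function describeSteps(steps: BookingStep[]): string[] {
  return steps.map((step, index) => {
    const suffix = step.action === "click" ? "" : ` with "${step.value}"`;
    return `${index + 1}. ${step.action} ${step.selector}${suffix}`;
  });
}
```

Because the plan contains the symptom summary, it belongs in a prompt shown to the patient, not in application logs.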

Step 4: Orchestrating with GPT-4o-mini

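Now we tie it all together. GPT-4o-mini decides which tool to request: first checkAvailability, which our code maps to getMyAvailability, and then bookAppointment, which maps to executeHospitalBooking. The snippet below shows the first request. A complete agent repeats this request-and-respond cycle in a loop until the model stops asking for tools.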

Pro-Tip: For more production-ready patterns and advanced agentic workflows, check out the specialized deep-dives at WellAlly Tech Blog. They cover how to handle edge cases like session timeouts and multi-factor authentication in AI-driven browsers.

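import OpenAI from "openai";
import { zodFunction } from "openai/helpers/zod";

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

async function runMedicalAgent(userPrompt: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: userPrompt }],
    tools: [
      // The API expects JSON Schema, so convert each Zod schema with zodFunction
      // (available in recent versions of the openai package)
      zodFunction({ name: "checkAvailability", parameters: CheckAvailabilitySchema }),
      zodFunction({ name: "bookAppointment", parameters: BookAppointmentSchema }),
    ],
  });

  const toolCall = response.choices[0].message.tool_calls?.[0];

  if (toolCall?.type === "function" && toolCall.function.name === "checkAvailability") {
    // 1. Get availability (arguments arrive as a JSON string, so validate them first)
    const { date } = CheckAvailabilitySchema.parse(JSON.parse(toolCall.function.arguments));
    const slots = await getMyAvailability(date);
    // 2. Feed `slots` back to GPT as a tool message
    // 3. GPT confirms with user or proceeds to bookAppointment
  }
}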
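
The numbered comments above describe a loop: call the model, run whatever tool it asks for, append the result, and repeat until it answers in plain text. Here is a minimal sketch of that loop with the model and the tools injected as functions, so it can run against a fake model in tests. The message shapes, runAgentLoop, ModelFn, ToolFn, and the maxTurns cap are all assumptions made for illustration and are simpler than the real chat completions types.

```typescript
// Minimal message and tool-call shapes, loosely modeled on chat completions.
interface AgentToolCall {
  id: string;
  toolName: string;
  rawArguments: string; // JSON string produced by the model
}

type ChatMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls: AgentToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

// Injected dependencies so the loop can run against a real or a fake model.
type ModelReply = { content: string | null; toolCalls: AgentToolCall[] };
type ModelFn = (messages: ChatMessage[]) => Promise<ModelReply>;
type ToolFn = (args: unknown) => Promise<unknown>;

async function runAgentLoop(
  userPrompt: string,
  callModel: ModelFn,
  tools: Map<string, ToolFn>,
  maxTurns = 5,
): Promise<string> {
  const messages: ChatMessage[] = [{ role: "user", content: userPrompt }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.content, toolCalls: reply.toolCalls });

    // No tool calls means the model has produced its final answer.
    if (reply.toolCalls.length === 0) {
      return reply.content ?? "";
    }

    // Run each requested tool and append its result for the next turn.
    for (const call of reply.toolCalls) {
      const tool = tools.get(call.toolName);
      let result: unknown;
      try {
        result = tool
          ? await tool(JSON.parse(call.rawArguments))
          : { error: `Unknown tool ${call.toolName}` };
      } catch (error) {
        // Surface failures to the model instead of crashing the loop.
        result = { error: String(error) };
      }
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(result ?? null) });
    }
  }
  // The cap keeps a confused model from looping forever.
  throw new Error(`Agent did not finish within ${maxTurns} turns`);
}
```

In a real agent, callModel would wrap openai.chat.completions.create and translate between these simplified shapes and the SDK's message types.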

Why GPT-4o-mini?

You might wonder why we aren't using the full GPT-4o.

Latency: For browser automation, we need quick decisions to avoid session timeouts.
Cost: Agentic loops often require multiple "thought" cycles. At launch, OpenAI described GPT-4o-mini as more than 60% cheaper than GPT-3.5 Turbo, and its list price per token is far below GPT-4o's. It is still plenty smart for structured tool calling.
Efficiency: It supports function calling and reliably produces JSON arguments that match our tool schemas, which our code then hands to Playwright.
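
To make the cost point concrete, here is a rough back-of-the-envelope estimator. The default prices are the GPT-4o-mini launch list prices (USD per million tokens) and are an assumption here, since pricing changes. The names TokenPricing, assumedMiniPricing, and estimateRunCost are hypothetical. The model is also simplified: real loops resend the growing conversation, so input tokens per turn usually rise over time.

```typescript
// Per-million-token prices in USD.
interface TokenPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Assumption: launch list prices for gpt-4o-mini; check current pricing.
const assumedMiniPricing: TokenPricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };

// Estimate the cost of an agent run in which every turn sends roughly the
// same number of input tokens and receives the same number of output tokens.
function estimateRunCost(
  turns: number,
  inputTokensPerTurn: number,
  outputTokensPerTurn: number,
  pricing: TokenPricing = assumedMiniPricing,
): number {
  const inputCost = (turns * inputTokensPerTurn * pricing.inputPerMillion) / 1e6;
  const outputCost = (turns * outputTokensPerTurn * pricing.outputPerMillion) / 1e6;
  return inputCost + outputCost;
}
```

Under those assumed prices, a three-turn booking with about 2,000 input and 200 output tokens per turn costs a fraction of a cent.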

Ethical Considerations & Security

Automating healthcare tasks comes with responsibility:

HIPAA/GDPR: Never log PII (Personally Identifiable Information) in your console or send sensitive medical history to the LLM if not necessary.
Human-in-the-loop: Always include a "Confirm?" step before the AI clicks the final "Submit" button.
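
Both points can be sketched in a few lines. The confirmation gate below only runs the submit action after an explicit "yes", and the redaction helper masks sensitive fields before anything is logged. AskFn, confirmThenSubmit, and redactForLog are hypothetical names, and how the question reaches the user (terminal prompt, chat message, UI dialog) is left to the injected ask function.

```typescript
// Hypothetical confirmation gate: submit only runs on an explicit "yes".
type AskFn = (question: string) => Promise<string>;

async function confirmThenSubmit(
  summary: string,
  ask: AskFn,
  submit: () => Promise<void>,
): Promise<boolean> {
  const answer = (await ask(`${summary}\nConfirm? (yes/no) `)).trim().toLowerCase();
  const approved = answer === "y" || answer === "yes";
  if (approved) {
    await submit();
  }
  // Anything other than an explicit yes is treated as a refusal.
  return approved;
}

// Mask sensitive fields before logging so symptoms never reach the console.
function redactForLog(
  record: Record<string, unknown>,
  sensitiveKeys: string[],
): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...record };
  for (const key of sensitiveKeys) {
    if (key in copy) {
      copy[key] = "[REDACTED]";
    }
  }
  return copy;
}
```

In the Playwright flow, submit would wrap the final click on the submit button, so the browser can fill the form but cannot send it without approval.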

Conclusion

We’ve just turned a conversational AI into a functional assistant that can manage your schedule and interact with real-world web interfaces. By combining GPT-4o-mini with Playwright, the gap between "thinking" and "doing" is virtually gone.

Ready to take your AI agents to the next level? Head over to wellally.tech/blog to discover how to scale these agents for enterprise environments and secure your automation pipelines.

What will you automate next? Let me know in the comments below! 👇

DEV Community