We’ve all been there. You get your annual health check-up PDF, scroll through pages of medical jargon, and find a result highlighted in red. Now comes the fun part: Googling which department you need, logging into a clunky hospital portal, and trying to find an available slot.
What if an Autonomous Health Agent could do this for you?
In this tutorial, we build a LangGraph agent that uses GPT-4 Turbo for medical reasoning and Playwright for browser automation. Unstructured.io parses the medical PDF, and the agent turns that raw data into a confirmed doctor's appointment. By the end of this post, you'll understand how to bridge the gap between "AI reasoning" and "Real-world action."
The Architecture: Reasoning + Action
Unlike simple linear chains, an autonomous agent needs to maintain state and potentially loop back if a department is full or a login fails. This is where LangGraph shines.
graph TD
A[Start: Upload PDF] --> B[PDF Parsing: Unstructured.io]
B --> C{Agent: Analyze Results}
C -->|Normal| D[Generate Summary & End]
C -->|Abnormality Found| E[Map to Hospital Department]
E --> F[Playwright: Search Availability]
F -->|Slot Found| G[Playwright: Book Appointment]
F -->|No Slot| H[Try Alternative Dept/Date]
H --> F
G --> I[Final Confirmation Email]
I --> J[End]
Prerequisites
To follow along, you’ll need:
- Python 3.10+
- Tech Stack: langgraph, playwright, unstructured, openai
- An OpenAI API Key (using GPT-4 Turbo for high-reasoning tasks)
Step 1: Parsing the Medical PDF
Medical reports are notoriously messy. We use Unstructured.io because it handles tables and layouts much better than a generic PDF reader.
from unstructured.partition.pdf import partition_pdf

def extract_medical_data(file_path):
    # Extracts elements while maintaining table structure
    elements = partition_pdf(filename=file_path, strategy="hi_res")
    text_content = "\n".join([str(el) for el in elements])
    return text_content

# Example usage
# raw_report = extract_medical_data("health_checkup_2024.pdf")
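Joining every element into one string flattens lab tables together with the surrounding prose. Here is a minimal sketch of keeping the two apart. It assumes each parsed element exposes `category` and `text` attributes, and that table elements may carry an HTML rendering in `metadata.text_as_html` (which Unstructured only populates when table inference is enabled). `split_report_elements` is a hypothetical helper for this post, not part of the library, and the demo elements are stand-ins so the sketch runs without a PDF.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def split_report_elements(elements: Iterable[Any]) -> Tuple[List[str], List[str]]:
    """Separate table elements from everything else.

    Returns (tables, prose). Falls back to a table's plain text when no
    HTML rendering is available on its metadata.
    """
    tables: List[str] = []
    prose: List[str] = []
    for el in elements:
        text = (getattr(el, "text", "") or "").strip()
        if getattr(el, "category", None) == "Table":
            metadata = getattr(el, "metadata", None)
            html = getattr(metadata, "text_as_html", None)
            tables.append(html or text)
        elif text:
            prose.append(text)
    return tables, prose


# Stand-in elements with made-up values (assumption: real elements look similar).
_demo_elements = [
    SimpleNamespace(category="Title", text="Annual Health Check-up", metadata=None),
    SimpleNamespace(
        category="Table",
        text="Uric Acid 8.1 mg/dL",
        metadata=SimpleNamespace(
            text_as_html="<table><tr><td>Uric Acid</td><td>8.1 mg/dL</td></tr></table>"
        ),
    ),
]
demo_tables, demo_prose = split_report_elements(_demo_elements)
```

Passing the tables to the model as a separate block of the prompt makes it easier to tell which value belongs to which test name.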
Step 2: Defining the Agent State
In LangGraph, we define a TypedDict to keep track of our agent's state across different nodes.
from typing import TypedDict, List, Optional

class AgentState(TypedDict):
    report_text: str
    abnormalities: List[str]
    target_department: Optional[str]
    appointment_status: str
    retry_count: int
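Each node returns only the keys it changes, and the graph merges that partial update into the running state. A rough sketch of the idea, assuming the default behaviour where an updated key simply overwrites the old value: `initial_state` and `apply_update` are hypothetical helpers, `apply_update` is only a stand-in for what the framework does internally, and the `"pending"` status is a placeholder value chosen for this example.

```python
from typing import List, Optional, TypedDict


class AgentState(TypedDict):
    report_text: str
    abnormalities: List[str]
    target_department: Optional[str]
    appointment_status: str
    retry_count: int


def initial_state(report_text: str) -> AgentState:
    """Build a fully populated starting state so no node hits a missing key."""
    return AgentState(
        report_text=report_text,
        abnormalities=[],
        target_department=None,
        appointment_status="pending",  # placeholder value, an assumption
        retry_count=0,
    )


def apply_update(state: AgentState, update: dict) -> AgentState:
    """Illustrative merge step: keys in `update` overwrite the old values."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


state = initial_state("Uric Acid: 8.1 mg/dL (high)")
state = apply_update(
    state,
    {"abnormalities": ["High uric acid"], "target_department": "Rheumatology"},
)
```

Starting from a complete state means later nodes can read `state["retry_count"]` or `state["appointment_status"]` without guarding against a `KeyError`.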
Step 3: The Brain (GPT-4 Turbo)
Our agent needs to decide which department matches the abnormal findings. For example, if "Uric Acid" is high, it should look for "Rheumatology" or "Internal Medicine."
import json
import openai

def analyze_report_node(state: AgentState):
    prompt = f"""
    Analyze the following medical report: {state['report_text']}
    Identify abnormalities and recommend the specific hospital department for a follow-up.
    Return JSON: {{"abnormalities": [], "department": ""}}
    """
    # Call GPT-4 Turbo in JSON mode (the client reads OPENAI_API_KEY from the environment)
    response = openai.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    # Parse the JSON reply and update state
    result = json.loads(response.choices[0].message.content)
    return {
        "abnormalities": result.get("abnormalities", []),
        "target_department": result.get("department"),
    }
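JSON mode gives us syntactically valid JSON, but it does not guarantee the keys or types we asked for. Below is a small defensive parser, sketched under the assumption that the reply should contain an `abnormalities` list and a `department` string as requested in the prompt. `parse_analysis` is a hypothetical helper name; anything malformed collapses to an empty result rather than raising.

```python
import json
from typing import Any, Dict


def parse_analysis(raw_reply: str) -> Dict[str, Any]:
    """Turn the model's JSON reply into a partial state update.

    Malformed or incomplete replies yield an empty result instead of raising.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    abnormalities = data.get("abnormalities")
    if not isinstance(abnormalities, list):
        abnormalities = []

    department = data.get("department")
    if not isinstance(department, str) or not department.strip():
        department = None

    return {
        "abnormalities": [str(item) for item in abnormalities],
        "target_department": department.strip() if department else None,
    }


good = parse_analysis('{"abnormalities": ["High uric acid"], "department": "Rheumatology"}')
bad = parse_analysis("Sorry, I cannot help with that.")
```

An empty result lets the graph treat an unusable reply the same way as a normal report, instead of crashing halfway through a run.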
Step 4: Browser Automation with Playwright
This is where the magic happens. The agent uses Playwright to interact with the hospital's web portal.
from playwright.sync_api import sync_playwright

def book_appointment_node(state: AgentState):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # headless=False to watch it work!
        page = browser.new_page()
        page.goto("https://hospital-portal.example.com/login")

        # Log in and navigate to the department
        page.fill("#username", "my_user_id")
        page.fill("#password", "secure_password")
        page.click("text=Login")
        page.click(f"text={state['target_department']}")

        # Logic to find the first available slot...
        slots = page.query_selector_all(".available-slot")
        if slots:
            slots[0].click()
            page.click("#confirm-booking")
            return {"appointment_status": "Success"}
        else:
            return {"appointment_status": "No Slots Found"}
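Hardcoding portal credentials in the node is fine for a demo, but they are easy to leak once the script lands in a repo. One possible approach is to read them from environment variables. In this sketch, `PORTAL_USERNAME` and `PORTAL_PASSWORD` are hypothetical variable names chosen for this post, and `load_portal_credentials` is an illustrative helper.

```python
import os
from typing import Mapping, Tuple


def load_portal_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read portal credentials from a mapping (the process environment by default).

    PORTAL_USERNAME and PORTAL_PASSWORD are hypothetical names for this sketch.
    """
    username = env.get("PORTAL_USERNAME", "")
    password = env.get("PORTAL_PASSWORD", "")
    if not username or not password:
        raise RuntimeError(
            "Set PORTAL_USERNAME and PORTAL_PASSWORD before running the agent"
        )
    return username, password


# Demonstration with an explicit mapping instead of the real environment.
demo_user, demo_password = load_portal_credentials(
    {"PORTAL_USERNAME": "my_user_id", "PORTAL_PASSWORD": "secure_password"}
)
```

Inside the booking node, the two values would then be passed to `page.fill("#username", ...)` and `page.fill("#password", ...)` in place of the string literals.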
The "Official" Way: Advanced Agent Patterns
Building a local script is fun, but productionizing health-tech agents requires handling data privacy (HIPAA), complex error recovery, and human-in-the-loop validation.
For a deeper dive into production-ready agentic architectures and how to handle edge cases in automated medical workflows, check out the comprehensive guides at WellAlly Blog. They provide excellent resources on scaling LLM applications and securing sensitive health data.
Step 5: Putting it all Together with LangGraph
Now, we wire the nodes into a graph.
from langgraph.graph import StateGraph, END
workflow = StateGraph(AgentState)
# Add Nodes
workflow.add_node("analyze", analyze_report_node)
workflow.add_node("book", book_appointment_node)
# Define Edges
workflow.set_entry_point("analyze")
workflow.add_edge("analyze", "book")
workflow.add_edge("book", END)
# Compile
app = workflow.compile()
# Run the Agent
inputs = {"report_text": raw_report, "retry_count": 0}
for output in app.stream(inputs):
    print(output)
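The wiring above is a straight line: analyze, book, end. The branches and the retry loop from the architecture diagram could be sketched with conditional edges, as below. This is one possible arrangement, not the only one: `MAX_RETRIES`, the routing function names, and the `retry` node are assumptions made for this example, and `build_workflow` imports `langgraph` lazily so the routing logic can be exercised on its own.

```python
from typing import Any, Callable, Dict

MAX_RETRIES = 3  # hypothetical cap on booking attempts


def route_after_analysis(state: Dict[str, Any]) -> str:
    """Book only when the analysis found something and named a department."""
    if state.get("abnormalities") and state.get("target_department"):
        return "book"
    return "end"


def route_after_booking(state: Dict[str, Any]) -> str:
    """Retry while no slot was found and the retry budget is not used up."""
    no_slots = state.get("appointment_status") == "No Slots Found"
    if no_slots and state.get("retry_count", 0) < MAX_RETRIES:
        return "retry"
    return "end"


def retry_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Count the attempt; a fuller version would also switch department or date."""
    return {"retry_count": state.get("retry_count", 0) + 1}


def build_workflow(state_schema: type, analyze_node: Callable, book_node: Callable):
    """Wire the nodes with conditional edges and return the compiled graph."""
    from langgraph.graph import END, StateGraph  # imported lazily on purpose

    workflow = StateGraph(state_schema)
    workflow.add_node("analyze", analyze_node)
    workflow.add_node("book", book_node)
    workflow.add_node("retry", retry_node)
    workflow.set_entry_point("analyze")
    workflow.add_conditional_edges(
        "analyze", route_after_analysis, {"book": "book", "end": END}
    )
    workflow.add_conditional_edges(
        "book", route_after_booking, {"retry": "retry", "end": END}
    )
    workflow.add_edge("retry", "book")
    return workflow.compile()
```

With the nodes from the earlier steps, `app = build_workflow(AgentState, analyze_report_node, book_appointment_node)` would replace the linear wiring. Capping the retries keeps a fully booked department from turning the loop into an endless one.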
Conclusion
By combining the reasoning power of GPT-4 Turbo with the structural control of LangGraph and the "hands" of Playwright, we've turned a static PDF into a confirmed medical appointment.
Key Takeaways:
- LangGraph allows for non-linear, stateful logic that simple chains can't handle.
- Unstructured.io handles the tables and layouts in medical PDFs better than a generic PDF reader.
- Playwright gives the agent "hands": it turns the model's decisions into clicks and form fills in a real browser.
What are you planning to automate next? Let me know in the comments below! 👇
Love this? Subscribe for more "Learning in Public" content or check out more advanced tutorials at wellally.tech/blog.