DEV Community

Simon


The GUI Agent Living in Your Web Page


Most AI agent frameworks need a server, a headless browser, and a whole automation stack just to click a button on a web page. The page itself has no say in the process.

PageAgent takes a different approach. It's a JavaScript library that runs directly in your page. Add it, and users can give natural language commands — the AI reads the live DOM, understands the UI, and acts. No server, no external process, no automation stack.

This means your web app isn't being automated; it's doing the automating. You control what the AI sees, how it behaves, and which LLM powers it. The intelligence lives in your page, not on someone else's server.

Star PageAgent on GitHub — MIT licensed, open source, 600+ stars.

Zero Infrastructure


For npm projects, PageAgent also offers a programmatic API:

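import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-5.1',
  baseURL: 'https://api.openai.com/v1',
  apiKey: 'YOUR_API_KEY', // placeholder: replace with your own LLM API key
})

// Top-level await needs an ES module context (e.g. <script type="module"> or a bundler)
await agent.execute('Fill the expense report for last Friday')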

No screenshots, no OCR, no vision models: PageAgent works on a text-based representation of the DOM, which keeps it fast and lightweight. See the integration docs for all setup options.
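The post doesn't spell out PageAgent's actual DOM format, so the following is only a hypothetical sketch of the general idea behind "text-based DOM": walk the tree, keep the interactive elements, and give each one an index the model can refer to in its answer. To run outside a browser, nodes are plain objects (`{ tag, text, attrs, children }`); `serializeInteractive` and the output format are illustrative, not PageAgent's.

```javascript
// Hypothetical sketch: flatten interactive elements into an indexed text list.
// Nodes are plain objects here so the example runs without a browser;
// in a real page you would walk DOM elements instead.
const INTERACTIVE = new Set(['a', 'button', 'input', 'select', 'textarea'])

function serializeInteractive(root) {
  const lines = []
  const walk = (node) => {
    if (INTERACTIVE.has(node.tag)) {
      // Prefer visible text, then placeholder, then name attribute as the label
      const label = (node.text || node.attrs?.placeholder || node.attrs?.name || '').trim()
      lines.push(`[${lines.length}] <${node.tag}> ${label}`)
    }
    for (const child of node.children || []) walk(child)
  }
  walk(root)
  return lines.join('\n')
}

const page = {
  tag: 'form',
  children: [
    { tag: 'label', text: 'Amount' },
    { tag: 'input', attrs: { placeholder: 'e.g. 42.00' } },
    { tag: 'button', text: 'Submit report' },
  ],
}

console.log(serializeInteractive(page))
// [0] <input> e.g. 42.00
// [1] <button> Submit report
```

A model that answers "click [1]" can then be mapped back to the second element, which is far cheaper than sending a screenshot.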

Bring Your Own LLM

OpenAI, Claude, DeepSeek, Qwen, Gemini, Grok, or fully offline via Ollama. PageAgent has no backend of its own and calls no external service other than the LLM you configure: data flows directly from the page to that model. MIT-licensed, fully auditable.
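Switching providers mostly means changing `model`, `baseURL`, and `apiKey`. Here is a minimal sketch, assuming PageAgent accepts any OpenAI-compatible `baseURL` (the option suggests this, but the post doesn't state it). The `llmConfig` helper and the model names are illustrative examples only; check what your provider actually serves.

```javascript
// Hypothetical helper: build a { model, baseURL, apiKey } config for a few
// providers that expose OpenAI-compatible endpoints. Model names are examples.
const PROVIDERS = {
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-5.1' },
  deepseek: { baseURL: 'https://api.deepseek.com/v1', model: 'deepseek-chat' },
  // Ollama serves an OpenAI-compatible API on localhost, so nothing leaves the machine
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'qwen2.5' },
}

function llmConfig(provider, apiKey) {
  const preset = PROVIDERS[provider]
  if (!preset) throw new Error(`Unknown provider: ${provider}`)
  if (provider !== 'ollama' && !apiKey) throw new Error(`${provider} needs an API key`)
  // Ollama ignores the key, but OpenAI-style clients usually require a non-empty string
  return { ...preset, apiKey: apiKey || 'ollama' }
}

// Usage, assuming the PageAgent constructor from the npm example:
// const agent = new PageAgent(llmConfig('ollama'))
```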

Going Cross-Page

PageAgent runs inside your web page — ideal for SPAs where the agent has full context of the app state.

But some tasks span multiple pages. An optional browser extension adds multi-tab awareness for those cases. It's a power-up, not a dependency.

Extension Bridge

What's different here: your page drives the browser, not the other way around.

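// window.PAGE_AGENT_EXT is only expected when the PageAgent extension is installed
if (!window.PAGE_AGENT_EXT) throw new Error('PageAgent extension not available')

const result = await window.PAGE_AGENT_EXT.execute(
  'Compare the top 3 results for "wireless keyboard" on Amazon',
  {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'YOUR_API_KEY', // placeholder: replace with your own LLM API key
    model: 'gpt-5.1',
    // Real-time progress; updateUI stands for your own app's rendering function
    onStatusChange: (status) => updateUI(status),
  }
)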

Your page initiates tasks, controls the LLM, and receives real-time callbacks. Access requires explicit user authorization via token.
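Because the extension is a power-up rather than a dependency, a page might feature-detect it and fall back to the in-page agent. The sketch below is a hypothetical pattern, not part of PageAgent: `pickRunner` is an illustrative name, `win` is normally `window`, and `inPageAgent` is any object with an async `execute(task)` method (such as the `PageAgent` instance from the npm example). The post doesn't say whether the global appears before the user authorizes access, so a real integration should also handle a failing `execute` call.

```javascript
// Hypothetical sketch: prefer the extension (multi-page) when present,
// otherwise fall back to the in-page agent (single page).
function pickRunner(win, inPageAgent, llmOptions) {
  const ext = win && win.PAGE_AGENT_EXT
  if (ext && typeof ext.execute === 'function') {
    // Extension path: the page passes its own LLM options along with the task
    return { kind: 'extension', run: (task) => ext.execute(task, llmOptions) }
  }
  // In-page path: the agent was already configured with its LLM at construction
  return { kind: 'in-page', run: (task) => inPageAgent.execute(task) }
}

// Usage in a page: const runner = pickRunner(window, agent, llmOptions)
//                  await runner.run('Reorder last month’s supplies')
```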

Here's the key: because PageAgent runs in the user's real browser, it operates within their authenticated sessions. No credential sharing, no cookie management, no server-side login flows. The user is already logged in — the agent just acts.

This unlocks scenarios that server-side agents can't touch:

  • A procurement tool that reorders supplies from the company's supplier portal — the user is logged in, the agent navigates the ordering flow directly
  • A travel tool that books trips through the user's corporate booking system, operating the actual booking flow rather than crawling public fares
  • A project tracker that creates tasks in the team's project board — no API integration, the agent uses the same UI the user does

Who Is This For?

SaaS developers — ship an AI copilot without rewriting the backend.

Enterprise teams — let users describe what they want in plain language instead of navigating 20-click workflows in ERP, CRM, and admin systems.

AI builders — use @page-agent/core as a tool inside your existing agent, or plug it behind a customer service bot so it operates the UI instead of just giving instructions.
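As an illustration of the "tool inside your existing agent" idea, here is a hypothetical wrapper that exposes a page agent as one function in an OpenAI-style tool-calling loop. It assumes only that the agent has an async `execute(task)` method like the `PageAgent` class above; the exports of `@page-agent/core` and what `execute` resolves to aren't documented in this post, so the handler simply serializes whatever comes back. `makePageTool` and `operate_page` are illustrative names.

```javascript
// Hypothetical sketch: wrap a page agent as a tool for an OpenAI-style
// function-calling agent. `pageAgent` is any object with async execute(task).
function makePageTool(pageAgent) {
  return {
    definition: {
      type: 'function',
      function: {
        name: 'operate_page',
        description: 'Carry out a task by operating the current web page UI',
        parameters: {
          type: 'object',
          properties: {
            task: { type: 'string', description: 'Natural-language instruction' },
          },
          required: ['task'],
        },
      },
    },
    // Called by your agent loop when the model requests this tool;
    // tool-call arguments arrive as a JSON string
    async handle(argsJson) {
      const { task } = JSON.parse(argsJson)
      const result = await pageAgent.execute(task)
      return JSON.stringify({ ok: true, result })
    },
  }
}
```

This keeps the outer agent's reasoning separate from UI operation: the outer model decides *what* to do, and the page agent works out *how* to do it on the current screen.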

Modular and Extensible


Use the full package for a turnkey solution, import the headless core for a custom UI, or use individual packages (DOM controller, LLM client, UI panel) à la carte. Custom tools, lifecycle hooks, prompt customization, and data masking are all built in.
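Data masking is listed as built in, but its API isn't shown here. As a hypothetical illustration of the concept, the sketch below redacts email addresses and card-like numbers from page text before it would reach the model. `maskText` and the patterns are illustrative, not PageAgent's, and deliberately simple rather than a complete PII filter.

```javascript
// Hypothetical sketch: redact sensitive substrings from page text before
// sending it to the LLM. Patterns are simple examples, not a full PII filter.
const MASKS = [
  { name: 'email', re: /[\w.+-]+@[\w-]+(\.[\w-]+)+/g, replacement: '[EMAIL]' },
  // 13 to 16 digits, optionally separated by spaces or dashes
  { name: 'card', re: /\b(?:\d[ -]?){13,16}\b/g, replacement: '[CARD]' },
]

function maskText(text, masks = MASKS) {
  // Apply each pattern in order; replace() with a /g regex rewrites every match
  return masks.reduce((out, m) => out.replace(m.re, m.replacement), text)
}

console.log(maskText('Contact ana@example.com, card 4111 1111 1111 1111'))
// Contact [EMAIL], card [CARD]
```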

Get Started

⭐ Star on GitHub — and help us grow.

Try the live demo — no sign-up needed. Or drag the bookmarklet to try it on any site.

Read the docs — CDN, npm, and programmatic setup guides.

Install the extension — for multi-page tasks.


PageAgent is open source under the MIT license. The free testing API on the demo site is for evaluation only; for production use, bring your own LLM API key. See the Terms of Use.
