Simon

Posted on Feb 27 • Edited on Mar 4

The GUI Agent Living in Your Web Page

#webdev #javascript #ai #opensource

Most AI agent frameworks need a server, a headless browser, and a whole automation stack just to click a button on a web page. The page itself has no say in the process.

PageAgent takes a different approach. It's a JavaScript library that runs directly in your page. Add it, and users can give natural language commands — the AI reads the live DOM, understands the UI, and acts. No server, no external process, no automation stack.

This means your web app isn't being automated — it's doing the automating. You control what the AI sees, how it behaves, which LLM powers it. The intelligence lives in your page, not on someone else's server.

⭐ Star PageAgent on GitHub — MIT licensed, open source, 600+ stars.

Zero Infrastructure

For npm projects, the programmatic API takes only a few lines:

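import { PageAgent } from 'page-agent'

// Configure which LLM powers the agent.
const agent = new PageAgent({
  model: 'gpt-5.1',
  baseURL: 'https://api.openai.com/v1',
  apiKey: YOUR_KEY, // placeholder: supply your own API key
})

// Describe the task in plain language; the agent reads the live DOM and acts.
await agent.execute('Fill the expense report for last Friday')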

No screenshots, no OCR, no vision models. PageAgent works with text-based DOM — fast and lightweight. See the integration docs for all setup options.
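To make the text-based idea concrete, here is a minimal sketch of how a page could be flattened into short, indexed lines that an LLM can read and refer back to. This is an illustration only, not PageAgent's actual serializer: `serializeInteractive`, the node shape, and the `[index]<tag>` output format are all assumptions, and the function walks plain objects instead of real DOM nodes so it runs outside a browser too.

```javascript
// Hypothetical sketch: NOT PageAgent's real serializer.
// Tags treated as "things the agent could act on" (an assumed list).
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea'])

// Walk a DOM-like tree and emit one indexed text line per interactive element.
function serializeInteractive(root) {
  const lines = []
  const walk = (node) => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const label = node.text || node.placeholder || ''
      // The index lets a model answer with "click element 1" instead of a selector.
      lines.push(`[${lines.length}]<${node.tag}>${label}</${node.tag}>`)
    }
    for (const child of node.children || []) walk(child)
  }
  walk(root)
  return lines.join('\n')
}

// A stand-in for a small expense form.
const page = {
  tag: 'form',
  children: [
    { tag: 'label', text: 'Amount' },
    { tag: 'input', placeholder: 'Amount' },
    { tag: 'button', text: 'Submit' },
  ],
}

console.log(serializeInteractive(page))
// [0]<input>Amount</input>
// [1]<button>Submit</button>
```

A snapshot like this is a few dozen bytes of text, which is the kind of saving a screenshot-free approach is after.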

Bring Your Own LLM

OpenAI, Claude, DeepSeek, Qwen, Gemini, Grok, or fully offline via Ollama. PageAgent has no backend of its own and calls no external service other than the LLM you configure; data flows directly from the page to that endpoint. MIT-licensed, fully auditable.
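Switching providers should mostly come down to changing three options. The sketch below shows one way to keep those options in presets. The helper `llmConfig`, the preset names, the Ollama model name, and the dummy key are assumptions for illustration; the Ollama URL is its default OpenAI-compatible endpoint, and whether PageAgent needs a non-empty key for a local endpoint is not stated here.

```javascript
// Hypothetical helper: preset names and the Ollama model are illustrative.
const LLM_PRESETS = {
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-5.1', requiresKey: true },
  // Ollama serves an OpenAI-compatible API on localhost:11434 by default.
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'llama3.1', requiresKey: false },
}

// Build an options object with the same shape as the PageAgent options shown earlier.
function llmConfig(provider, apiKey) {
  const preset = LLM_PRESETS[provider]
  if (!preset) throw new Error(`Unknown provider: ${provider}`)
  if (preset.requiresKey && !apiKey) throw new Error(`${provider} needs an API key`)
  const { requiresKey, ...options } = preset
  // Assumption: a local endpoint ignores the key, so a dummy value is passed.
  return { ...options, apiKey: apiKey || 'not-needed' }
}

// Fully offline: nothing leaves the machine.
const offline = llmConfig('ollama')
console.log(offline.baseURL) // http://localhost:11434/v1
```

The result can be passed straight to `new PageAgent(...)` in place of the literal options object.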

Going Cross-Page

PageAgent runs inside your web page — ideal for SPAs where the agent has full context of the app state.

But some tasks span multiple pages. An optional browser extension adds multi-tab awareness for those cases. It's a power-up, not a dependency.

What's different here: your page drives the browser, not the other way around.

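const result = await window.PAGE_AGENT_EXT.execute(
  'Compare the top 3 results for "wireless keyboard" on Amazon',
  {
    baseURL: 'https://api.openai.com/v1',
    apiKey: YOUR_KEY, // placeholder: supply your own API key
    model: 'gpt-5.1',
    // updateUI is your own function; it receives each status update as the task runs.
    onStatusChange: (status) => updateUI(status),
  }
)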

Your page initiates tasks, controls the LLM, and receives real-time callbacks. Access requires explicit user authorization via token.
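Because the extension is optional, calling code probably wants to check for it before use. Here is a minimal sketch of such a guard. It assumes that a missing extension shows up as an undefined `PAGE_AGENT_EXT` on the host object; the wrapper name `runCrossPageTask` and the `{ ok, reason }` return shape are hypothetical and not part of PageAgent.

```javascript
// Hypothetical wrapper. `host` is whatever object exposes PAGE_AGENT_EXT;
// in a browser you would pass `window`.
async function runCrossPageTask(host, task, llmOptions, onStatusChange) {
  const ext = host && host.PAGE_AGENT_EXT
  if (!ext || typeof ext.execute !== 'function') {
    // Assumption: an absent extension simply leaves the global undefined.
    return { ok: false, reason: 'extension-unavailable' }
  }
  try {
    // Same call shape as the example above: task first, then options.
    const result = await ext.execute(task, { ...llmOptions, onStatusChange })
    return { ok: true, result }
  } catch (error) {
    return { ok: false, reason: 'task-failed', error }
  }
}

// Usage in a page (YOUR_KEY and updateUI are your own):
// const outcome = await runCrossPageTask(window, 'Compare prices', {
//   baseURL: 'https://api.openai.com/v1', apiKey: YOUR_KEY, model: 'gpt-5.1',
// }, updateUI)
// if (!outcome.ok) showSinglePageFallback()
```

Returning a result object instead of throwing lets the page fall back to single-page behavior without a try/catch at every call site.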

Here's the key: because PageAgent runs in the user's real browser, it operates within their authenticated sessions. No credential sharing, no cookie management, no server-side login flows. The user is already logged in — the agent just acts.

This unlocks scenarios that server-side agents can't touch:

A procurement tool that reorders supplies from the company's supplier portal — the user is logged in, the agent navigates the ordering flow directly
A travel assistant that books trips through the user's corporate booking system, operating the actual booking flow rather than crawling public fares
A project tracker that creates tasks in the team's project board — no API integration, the agent uses the same UI the user does

Who Is This For?

SaaS developers — ship an AI copilot without rewriting the backend.

Enterprise teams — let users describe what they want in plain language instead of navigating 20-click workflows in ERP, CRM, and admin systems.

AI builders — use @page-agent/core as a tool inside your existing agent, or plug it behind a customer service bot so it operates the UI instead of just giving instructions.
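For the AI-builder case, one plausible shape is an adapter that exposes the page agent as a single tool to an outer agent. The sketch below uses the OpenAI chat-completions function-tool format for the definition. The adapter `makePageAgentTool`, the tool name `operate_page`, and its single `instruction` parameter are hypothetical; the only thing assumed about the agent object is the `execute(task)` method shown earlier.

```javascript
// Hypothetical adapter: tool name, description, and parameters are illustrative.
function makePageAgentTool(agent) {
  return {
    // Tool definition in the OpenAI chat-completions "function" tool format.
    definition: {
      type: 'function',
      function: {
        name: 'operate_page',
        description: 'Carry out a natural-language instruction in the current web page UI.',
        parameters: {
          type: 'object',
          properties: { instruction: { type: 'string' } },
          required: ['instruction'],
        },
      },
    },
    // Called by the outer agent when the model requests this tool.
    // `argsJson` is the JSON string of arguments produced by the model.
    async run(argsJson) {
      const { instruction } = JSON.parse(argsJson)
      return agent.execute(instruction)
    },
  }
}
```

The outer agent keeps its own planning loop and only hands UI work to the page agent, so a customer service bot can act on the interface instead of describing the steps.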

Modular and Extensible

Use the full package for a turnkey solution, import the headless core for a custom UI, or use individual packages (DOM controller, LLM client, UI panel) à la carte. Custom tools, lifecycle hooks, prompt customization, and data masking are all built in.
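Data masking is the easiest of these to picture: sensitive strings are replaced before any page text reaches the LLM. Below is a minimal sketch of the idea, not PageAgent's built-in masking API, which this post doesn't show. The function `maskSensitive`, the rule list, and both regular expressions are assumptions, and the patterns are deliberately simple.

```javascript
// Hypothetical masking rules; real deployments need patterns tuned to their data.
const MASK_RULES = [
  // Simple email shape: local part, "@", then a dotted domain.
  { name: 'EMAIL', pattern: /[\w.+-]+@[\w-]+(\.[\w-]+)+/g },
  // 13 to 19 digits, optionally separated by single spaces or hyphens.
  { name: 'CARD', pattern: /\b\d(?:[ -]?\d){12,18}\b/g },
]

// Replace every match of every rule with a bracketed label.
function maskSensitive(text) {
  return MASK_RULES.reduce(
    (masked, { name, pattern }) => masked.replace(pattern, `[${name}]`),
    text
  )
}

console.log(maskSensitive('Contact jane@example.com, card 4111 1111 1111 1111'))
// Contact [EMAIL], card [CARD]
```

Running a step like this over the text snapshot, rather than over the live DOM, keeps what the user sees unchanged while limiting what the model sees.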

Get Started

⭐ Star on GitHub — and help us grow.

Try the live demo — no sign-up needed. Or drag the bookmarklet to try it on any site.

Read the docs — CDN, npm, and programmatic setup guides.

Install the extension — for multi-page tasks.

PageAgent is open source under the MIT license. The free testing API on the demo site is for evaluation only — for production use, bring your own LLM API key. Terms of Use
