Most AI agent frameworks need a server, a headless browser, and a whole automation stack just to click a button on a web page. The page itself has no say in the process.
PageAgent takes a different approach. It's a JavaScript library that runs directly in your page. Add it, and users can give natural-language commands: the AI reads the live DOM, understands the UI, and acts. No server, no external process, no automation stack.
This means your web app isn't being automated — it's doing the automating. You control what the AI sees, how it behaves, which LLM powers it. The intelligence lives in your page, not on someone else's server.
⭐ Star PageAgent on GitHub — MIT licensed, open source, 600+ stars.
Zero Infrastructure
For npm projects, the programmatic API takes only a few lines:
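```javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-5.1',
  baseURL: 'https://api.openai.com/v1',
  apiKey: YOUR_KEY, // placeholder: substitute your own LLM API key
})

// Top-level await requires an ES module context
await agent.execute('Fill the expense report for last Friday')
```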
No screenshots, no OCR, no vision models. PageAgent works with the text-based DOM, which keeps it fast and lightweight. See the integration docs for all setup options.
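To make "text-based DOM" concrete, here is a minimal, hypothetical sketch of the general idea: walk the page tree and emit one indexed text line per interactive element, so an LLM can refer to targets by number. This is not PageAgent's actual serializer or format. The node shape `{ tag, text, attrs, children }`, `INTERACTIVE`, and `serialize` are assumptions made so the sketch runs without a browser; real code would walk `document.body` instead.

```javascript
// Hypothetical sketch, not PageAgent's code. Nodes are plain objects
// ({ tag, text, attrs, children }) standing in for real DOM elements.
const INTERACTIVE = new Set(['a', 'button', 'input', 'select', 'textarea'])

// Depth-first walk: interactive elements get an index the LLM can cite
// (e.g. "type 42 into [0]"); other text is kept as plain context lines.
function serialize(root) {
  const lines = []
  const targets = []
  const walk = (node) => {
    if (INTERACTIVE.has(node.tag)) {
      const attrs = Object.entries(node.attrs ?? {})
        .map(([k, v]) => ` ${k}="${v}"`)
        .join('')
      lines.push(`[${targets.length}]<${node.tag}${attrs}>${node.text ?? ''}</${node.tag}>`)
      targets.push(node) // index in `targets` matches the [n] label
    } else if (node.text) {
      lines.push(node.text)
    }
    for (const child of node.children ?? []) walk(child)
  }
  walk(root)
  return { text: lines.join('\n'), targets }
}

const page = {
  tag: 'form',
  children: [
    { tag: 'label', text: 'Amount' },
    { tag: 'input', attrs: { name: 'amount', type: 'number' } },
    { tag: 'button', text: 'Submit' },
  ],
}

console.log(serialize(page).text)
// Amount
// [0]<input name="amount" type="number"></input>
// [1]<button>Submit</button>
```

A text listing like this is far smaller than a screenshot, which is the trade-off the paragraph above describes.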
Bring Your Own LLM
OpenAI, Claude, DeepSeek, Qwen, Gemini, Grok, or fully offline via Ollama. PageAgent itself has no backend: apart from the LLM endpoint you configure, it calls no external service. Data flows directly from the page to that LLM. MIT-licensed, fully auditable.
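Switching providers should mostly come down to the three constructor options shown earlier (`model`, `baseURL`, `apiKey`). The sketch below assumes PageAgent accepts any OpenAI-compatible endpoint through `baseURL`, which "fully offline via Ollama" suggests but the article doesn't spell out. Ollama serves an OpenAI-compatible API locally at `http://localhost:11434/v1`. The `PROVIDERS` table, the `llmConfig` helper, and the `qwen2.5` model tag are illustrative, not part of PageAgent.

```javascript
// Hypothetical helper: maps a provider name to the options the PageAgent
// constructor takes (model, baseURL, apiKey). Both endpoints speak the
// OpenAI-compatible API; the Ollama one runs on your own machine.
const PROVIDERS = {
  openai: { baseURL: 'https://api.openai.com/v1' },
  ollama: { baseURL: 'http://localhost:11434/v1', apiKey: 'ollama' },
}

function llmConfig(provider, model, apiKey) {
  const preset = PROVIDERS[provider]
  if (!preset) throw new Error(`Unknown provider: ${provider}`)
  // Ollama ignores the key, but OpenAI-style clients usually expect a
  // non-empty string, so fall back to the preset's dummy value.
  return { model, baseURL: preset.baseURL, apiKey: apiKey ?? preset.apiKey }
}

// Usage in the page (assumed): new PageAgent(llmConfig('ollama', 'qwen2.5'))
console.log(llmConfig('ollama', 'qwen2.5'))
```

Keeping provider details in one table means moving from a hosted model to a local one is a one-argument change.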
Going Cross-Page
PageAgent runs inside your web page — ideal for SPAs where the agent has full context of the app state.
But some tasks span multiple pages. An optional browser extension adds multi-tab awareness for those cases. It's a power-up, not a dependency.
What's different here: your page drives the browser, not the other way around.
```javascript
const result = await window.PAGE_AGENT_EXT.execute(
  'Compare the top 3 results for "wireless keyboard" on Amazon',
  {
    baseURL: 'https://api.openai.com/v1',
    apiKey: YOUR_KEY, // placeholder: substitute your own LLM API key
    model: 'gpt-5.1',
    onStatusChange: (status) => updateUI(status), // updateUI: your own UI handler
  }
)
```
Your page initiates tasks, controls the LLM, and receives real-time callbacks. Access requires explicit user authorization via a token.
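Because the extension is optional, a page should check for it before relying on it. Here is a minimal sketch of that feature detection, assuming only what the article shows: `window.PAGE_AGENT_EXT.execute(task, options)` exists when the extension is installed. The `runCrossPage` wrapper and its `{ ok, reason }` result shape are hypothetical. In a browser `globalThis` is `window`, so the check is equivalent.

```javascript
// Hypothetical wrapper: use the extension when it is present,
// otherwise report that cross-page tasks are unavailable.
async function runCrossPage(task, options) {
  const ext = globalThis.PAGE_AGENT_EXT // same object as window.PAGE_AGENT_EXT in a browser
  if (!ext || typeof ext.execute !== 'function') {
    // Extension not installed: single-page tasks can still use the in-page agent.
    return { ok: false, reason: 'extension-not-installed' }
  }
  const result = await ext.execute(task, options)
  return { ok: true, result }
}
```

The page can then show an "install the extension" hint on `ok: false` instead of failing with a `TypeError` on an undefined global.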
Here's the key: because PageAgent runs in the user's real browser, it operates within their authenticated sessions. No credential sharing, no cookie management, no server-side login flows. The user is already logged in — the agent just acts.
This unlocks scenarios that server-side agents can't touch:
- A procurement tool that reorders supplies from the company's supplier portal: the user is already logged in, so the agent navigates the ordering flow directly
- A travel tool that books trips through the user's corporate booking system, operating the actual booking flow instead of crawling public fares
- A project tracker that creates tasks on the team's project board with no API integration: the agent uses the same UI the user does
Who Is This For?
SaaS developers — ship an AI copilot without rewriting the backend.
Enterprise teams — let users describe what they want in plain language instead of navigating 20-click workflows in ERP, CRM, and admin systems.
AI builders — use @page-agent/core as a tool inside your existing agent, or plug it behind a customer service bot so it operates the UI instead of just giving instructions.
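As one way to "use it as a tool inside your existing agent", here is a hypothetical sketch that exposes a page agent as a single tool in the OpenAI function-calling format (`{ type: 'function', function: { name, description, parameters } }`, with arguments arriving as a JSON string). It assumes only an object with an `execute(task)` method, like the `PageAgent` instance created earlier. Whether `@page-agent/core` exposes exactly that method is an assumption. `pageTool` and the tool name `operate_page` are illustrative.

```javascript
// Hypothetical: wrap any object with execute(task) as one tool
// for an OpenAI-style function-calling agent loop.
function pageTool(agent) {
  return {
    // Goes into the `tools` array of the chat request.
    definition: {
      type: 'function',
      function: {
        name: 'operate_page',
        description: 'Carry out a task in the current web page UI, described in plain language.',
        parameters: {
          type: 'object',
          properties: { task: { type: 'string', description: 'What to do on the page' } },
          required: ['task'],
        },
      },
    },
    // Called by your loop when the model picks this tool;
    // tool-call arguments arrive as a JSON string.
    async handle(argsJson) {
      const { task } = JSON.parse(argsJson)
      return agent.execute(task)
    },
  }
}
```

With this in place, a customer service bot can act on the page ("open the refund form for order 1042") rather than only describing the steps.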
Modular and Extensible
Use the full package for a turnkey solution, import the headless core for a custom UI, or use individual packages (DOM controller, LLM client, UI panel) à la carte. Custom tools, lifecycle hooks, prompt customization, and data masking are all built in.
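Data masking means redacting sensitive values before page text reaches the LLM. The sketch below shows that general technique with two illustrative regular expressions. It is not PageAgent's data-masking API, and `MASKS` and `maskSensitive` are hypothetical names. Real masking rules would need tuning to your own data; the card pattern, for example, matches any run of 13 to 16 digits.

```javascript
// Hypothetical masking pass: redact obvious sensitive values from page
// text before it is sent to the LLM. Patterns are illustrative only.
const MASKS = [
  [/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, '[EMAIL]'],
  // 13 to 16 digits, optionally separated by spaces or dashes;
  // starts and ends on a digit so surrounding spaces are kept.
  [/\b\d(?:[ -]?\d){12,15}\b/g, '[CARD]'],
]

function maskSensitive(text) {
  return MASKS.reduce((out, [pattern, label]) => out.replace(pattern, label), text)
}

console.log(maskSensitive('Pay jane.doe@example.com with 4111 1111 1111 1111'))
// Pay [EMAIL] with [CARD]
```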
Get Started
⭐ Star on GitHub — and help us grow.
Try the live demo — no sign-up needed. Or drag the bookmarklet to try it on any site.
Read the docs — CDN, npm, and programmatic setup guides.
Install the extension — for multi-page tasks.
PageAgent is open source under the MIT license. The free testing API on the demo site is for evaluation only — for production use, bring your own LLM API key. Terms of Use



