<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simon</title>
    <description>The latest articles on DEV Community by Simon (@simon_luv_pho).</description>
    <link>https://dev.to/simon_luv_pho</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3796972%2F8a59b065-4b59-4aee-80f9-e5dd7c3230f7.jpeg</url>
      <title>DEV Community: Simon</title>
      <link>https://dev.to/simon_luv_pho</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simon_luv_pho"/>
    <language>en</language>
    <item>
      <title>The GUI Agent Living in Your Web Page</title>
      <dc:creator>Simon</dc:creator>
      <pubDate>Fri, 27 Feb 2026 19:23:56 +0000</pubDate>
      <link>https://dev.to/simon_luv_pho/pageagent-the-gui-agent-living-in-your-web-page-1cda</link>
      <guid>https://dev.to/simon_luv_pho/pageagent-the-gui-agent-living-in-your-web-page-1cda</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2imw1ocwuyubvfrkuqu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2imw1ocwuyubvfrkuqu.png" alt="Hero Banner" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most AI agent frameworks need a server, a headless browser, and a whole automation stack just to click a button on a web page. The page itself has no say in the process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/alibaba/page-agent" rel="noopener noreferrer"&gt;&lt;strong&gt;PageAgent&lt;/strong&gt;&lt;/a&gt; takes a different approach. It's a JavaScript library that runs directly in your page. Add it, and users can give natural language commands — the AI reads the live DOM, understands the UI, and acts. No server, no external process, no automation stack.&lt;/p&gt;

&lt;p&gt;This means your web app isn't being automated — it's &lt;em&gt;doing&lt;/em&gt; the automating. You control what the AI sees, how it behaves, which LLM powers it. The intelligence lives in your page, not on someone else's server.&lt;/p&gt;

&lt;p&gt;⭐ &lt;a href="https://github.com/alibaba/page-agent" rel="noopener noreferrer"&gt;&lt;strong&gt;Star PageAgent on GitHub&lt;/strong&gt;&lt;/a&gt; — MIT licensed, open source, 600+ stars.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero Infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r4i2dwl8h626g7ik201.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r4i2dwl8h626g7ik201.png" alt="Zero Infrastructure" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For npm projects, the programmatic API takes only a few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PageAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;page-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PageAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-5.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.openai.com/v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;YOUR_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fill the expense report for last Friday&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No screenshots, no OCR, no vision models. PageAgent works from a text representation of the DOM, which keeps it fast and lightweight. See the &lt;a href="https://alibaba.github.io/page-agent/docs/introduction/quick-start" rel="noopener noreferrer"&gt;integration docs&lt;/a&gt; for all setup options.&lt;/p&gt;
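
&lt;p&gt;As a rough illustration of what "text-based" means here, the sketch below flattens a list of UI elements into indexed text lines that an LLM could read and refer to by number. It is a hypothetical example, not PageAgent's actual serialization format: the &lt;code&gt;UIElement&lt;/code&gt; type and the &lt;code&gt;toTextSnapshot&lt;/code&gt; function are invented for illustration.&lt;/p&gt;

```typescript
// Hypothetical shape for an element already extracted from the page.
// None of these names come from PageAgent itself.
type UIElement = { tag: string; label: string; interactive: boolean }

// Keep only the elements a user could act on, then give each one a
// stable index so a model can answer with something like "click [1]".
function toTextSnapshot(elements: UIElement[]): string {
  return elements
    .filter((el) => el.interactive)
    .map((el, index) => '[' + index + '] ' + el.tag + ' "' + el.label + '"')
    .join('\n')
}
```

&lt;p&gt;A snapshot like this is plain text, so it can be sent to any text-only model without image encoding, which is one plausible reason a DOM-based approach stays lightweight.&lt;/p&gt;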

&lt;h2&gt;
  
  
  Bring Your Own LLM
&lt;/h2&gt;

&lt;p&gt;Use OpenAI, Claude, DeepSeek, Qwen, Gemini, or Grok, or run fully offline via Ollama. PageAgent has no backend of its own: data flows directly from the page to whichever LLM you configure. &lt;a href="https://github.com/alibaba/page-agent" rel="noopener noreferrer"&gt;MIT-licensed, fully auditable.&lt;/a&gt;&lt;/p&gt;
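
&lt;p&gt;Switching providers could then be a matter of swapping three values. The sketch below assumes the same &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;baseURL&lt;/code&gt;, and &lt;code&gt;apiKey&lt;/code&gt; options shown in the constructor call earlier. The &lt;code&gt;llmConfigFor&lt;/code&gt; helper and the preset table are hypothetical, the Ollama model name is a placeholder, and the local URL is Ollama's default OpenAI-compatible endpoint. Whether PageAgent accepts a dummy key for local models is also an assumption.&lt;/p&gt;

```typescript
// Option names mirror the constructor call shown earlier in the article.
type LLMConfig = { model: string; baseURL: string; apiKey: string }
type ProviderPreset = { baseURL: string; model: string }

// Hypothetical preset table; swap in whatever models you actually run.
const PRESETS: { [name: string]: ProviderPreset } = {
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-5.1' },
  // Ollama serves an OpenAI-compatible API on this port by default.
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'qwen3' },
}

// Build a config object for a named provider. Local servers usually
// ignore the key, so a placeholder is used when none is given.
function llmConfigFor(provider: string, apiKey: string = 'not-needed'): LLMConfig {
  const preset = PRESETS[provider]
  if (preset === undefined) {
    throw new Error('Unknown provider: ' + provider)
  }
  return { model: preset.model, baseURL: preset.baseURL, apiKey }
}
```

&lt;p&gt;The result of &lt;code&gt;llmConfigFor('ollama')&lt;/code&gt; could then be passed wherever the earlier example passes its options object.&lt;/p&gt;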

&lt;h2&gt;
  
  
  Going Cross-Page
&lt;/h2&gt;

&lt;p&gt;PageAgent runs inside your web page — ideal for SPAs where the agent has full context of the app state.&lt;/p&gt;

&lt;p&gt;But some tasks span multiple pages. An &lt;a href="https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj" rel="noopener noreferrer"&gt;optional browser extension&lt;/a&gt; adds multi-tab awareness for those cases. It's a power-up, not a dependency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz651vweoz8szayrkdls1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz651vweoz8szayrkdls1.png" alt="Extension Bridge" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's different here: &lt;strong&gt;your page drives the browser&lt;/strong&gt;, not the other way around.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PAGE_AGENT_EXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Compare the top 3 results for "wireless keyboard" on Amazon&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.openai.com/v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;YOUR_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-5.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;onStatusChange&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;updateUI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your page initiates tasks, controls the LLM, and receives real-time callbacks. Access requires explicit user authorization via token.&lt;/p&gt;
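
&lt;p&gt;Because the extension is optional, a page would typically feature-detect it and fall back to the in-page agent. The following is a minimal sketch, assuming the extension exposes &lt;code&gt;PAGE_AGENT_EXT&lt;/code&gt; on &lt;code&gt;window&lt;/code&gt; only when it is installed, as the snippet above suggests. The &lt;code&gt;pickRunner&lt;/code&gt; helper and its types are hypothetical.&lt;/p&gt;

```typescript
// Minimal view of the bridge object used in the snippet above.
// Only `execute` is assumed to exist.
type ExtBridge = { execute: (task: string, options: object) => unknown }
type Host = { PAGE_AGENT_EXT?: ExtBridge }

// Decide which runner to use. The host is passed in as a parameter
// (instead of reading the global directly) so the check is easy to test.
function pickRunner(host: Host): 'extension' | 'in-page' {
  return typeof host.PAGE_AGENT_EXT?.execute === 'function' ? 'extension' : 'in-page'
}
```

&lt;p&gt;In a browser, the host would be the global &lt;code&gt;window&lt;/code&gt; object. Single-page tasks keep working without the extension, and multi-tab tasks are offered only when it is present.&lt;/p&gt;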

&lt;p&gt;Here's the key: because PageAgent runs in the user's real browser, it operates within their authenticated sessions. No credential sharing, no cookie management, no server-side login flows. The user is already logged in — the agent just acts.&lt;/p&gt;

&lt;p&gt;This unlocks scenarios that server-side agents can't touch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A procurement tool that reorders supplies from the company's supplier portal&lt;/strong&gt; — the user is logged in, the agent navigates the ordering flow directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A travel assistant that books trips through the user's corporate booking system&lt;/strong&gt;, operating the actual booking flow rather than crawling public fares&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A project tracker that creates tasks in the team's project board&lt;/strong&gt; — no API integration, the agent uses the same UI the user does&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who Is This For?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SaaS developers&lt;/strong&gt; — ship an AI copilot without rewriting the backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise teams&lt;/strong&gt; — let users describe what they want in plain language instead of navigating 20-click workflows in ERP, CRM, and admin systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI builders&lt;/strong&gt; — use &lt;code&gt;@page-agent/core&lt;/code&gt; as a tool inside your existing agent, or plug it behind a customer service bot so it operates the UI instead of just giving instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modular and Extensible
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyha8vdnor4p9gg0fz7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyha8vdnor4p9gg0fz7e.png" alt="Architecture" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the full package for a turnkey solution, import the headless core for a custom UI, or use individual packages (DOM controller, LLM client, UI panel) à la carte. Custom tools, lifecycle hooks, prompt customization, and data masking are all built in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/alibaba/page-agent" rel="noopener noreferrer"&gt;⭐ Star on GitHub&lt;/a&gt;&lt;/strong&gt; — and help us grow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://alibaba.github.io/page-agent/" rel="noopener noreferrer"&gt;Try the live demo&lt;/a&gt;&lt;/strong&gt; — no sign-up needed. Or drag the bookmarklet to try it on any site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://alibaba.github.io/page-agent/docs/introduction/quick-start" rel="noopener noreferrer"&gt;Read the docs&lt;/a&gt;&lt;/strong&gt; — CDN, npm, and programmatic setup guides.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj" rel="noopener noreferrer"&gt;Install the extension&lt;/a&gt;&lt;/strong&gt; — for multi-page tasks.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;PageAgent is open source under the MIT license. The free testing API on the demo site is for evaluation only — for production use, bring your own LLM API key. &lt;a href="https://github.com/alibaba/page-agent/blob/main/docs/terms-and-privacy.md" rel="noopener noreferrer"&gt;Terms of Use&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
