Implementing Automated Web Interaction: Deep Technical Analysis of the BlackEagle Plugin's WAOP
Author: AndersLiu
Preface
This article provides an in-depth analysis of the core technology behind the BlackEagle browser plugin: a powerful and robust Web Automation Operation Protocol (WAOP). We'll start from the macro architectural design, then dive into each module's implementation details, and finally reveal how it realizes an "interpretable, ordered sequence of automation steps" in the browser to drive complex web interactions.
Macro Architecture: A Multi-layer Collaborative Automation System
My goal is to build an automation system that can accurately simulate user behavior. It should not only perform basic actions like clicks and inputs, but also understand page context and interact intelligently with users (or AI). To that end I designed a layered collaborative architecture:
- Presentation Layer (UI - Sidebar)
  - Responsibility: Serves as the main interface between users and the automation system. Here I present structured information extracted from the webpage and accept user commands.
  - Core Implementation: webcontentprocess.ts is responsible for turning raw webpage HTML and text into concise summaries and interactive element lists, providing high-quality context for decision-making (by either a user or an LLM).
- Orchestration Layer (Background / Service Worker)
  - Responsibility: Acts as the system brain: it manages open tabs, maintains snapshots of each page's content (webcontent), and relays commands and data between the sidebar, content scripts, and external tools.
  - Core Implementation: background.js listens for browser events (tab switches, URL changes) and requests up-to-date page information from content scripts as needed. It also initializes a task module (TaskModule) to process asynchronous tasks from various sources.
- Execution Layer (Content Script)
  - Responsibility: Executes operations directly in the target page's context. This is where automation commands are actually performed.
  - Core Implementation:
    - content.js: Injects the execution engine into the page, listens for commands from background.js, and monitors page changes (URL updates, DOM mutations) to support single-page applications (SPAs).
    - execute.js: The core execution engine of WAOP (Web Automation Operation Protocol). It interprets each protocol step (like click, input, scroll) and converts it into a realistic simulation of user behavior.
- Capabilities Layer (Tools)
  - Responsibility: Provides standardized, callable atomic capabilities such as tab operations. This design allows the core logic to be extended like "function calls."
  - Core Implementation: browserTabsTool.ts exports a standard tool definition with operations like open_tab and close_tab. When chrome.tabs APIs are unavailable in a given context, the tool gracefully falls back to runtime.sendMessage so background.js can perform the operation with higher privileges.
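The fallback described for browserTabsTool.ts can be sketched roughly as follows. This is an illustration, not the plugin's actual code: the request shapes, the BROWSER_TABS message type, the browser_tabs tool name, and the injected TabsBackend interface are assumptions I introduce so the example can run outside an extension. Only the operation names open_tab and close_tab come from the description above.

```typescript
// Hypothetical request shapes; only the op names appear in the article.
type TabsRequest =
  | { op: "open_tab"; url: string }
  | { op: "close_tab"; tabId: number };

// Injected stand-in for the chrome.* surface (an assumption for testability).
interface TabsBackend {
  // Present only in contexts where the tabs API is usable.
  tabs?: {
    create(props: { url: string }): Promise<{ id?: number }>;
    remove(tabId: number): Promise<void>;
  };
  // Stand-in for a wrapper around chrome.runtime.sendMessage.
  sendMessage(message: unknown): Promise<unknown>;
}

interface ToolDefinition {
  name: string;
  description: string;
  execute(request: TabsRequest): Promise<unknown>;
}

function createBrowserTabsTool(backend: TabsBackend): ToolDefinition {
  return {
    name: "browser_tabs",
    description: "Open or close browser tabs.",
    async execute(request: TabsRequest): Promise<unknown> {
      const tabs = backend.tabs;
      if (!tabs) {
        // No direct access: delegate to the background script.
        return backend.sendMessage({ type: "BROWSER_TABS", request });
      }
      if (request.op === "open_tab") {
        return tabs.create({ url: request.url });
      }
      return tabs.remove(request.tabId);
    },
  };
}
```

In an extension you would presumably pass chrome.tabs (when it is defined) and a thin wrapper around chrome.runtime.sendMessage as the backend. Injecting them keeps the tool definition independent of the context it runs in.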
WAOP: The Language of Automation
WAOP (Web Automation Operation Protocol) is the core protocol we designed. It decomposes complex automation tasks into a clear, serializable sequence of steps.
A typical WAOP task looks like this:
{
  "protocol": "WAOP",
  "steps": [
    { "type": "input", "selector": "#search-box", "value": "BlackEagle AI" },
    { "type": "press", "key": "Enter" },
    { "type": "waitForElement", "selector": ".search-results", "timeout": 5000 },
    { "type": "highlightText", "value": "Official Website" }
  ]
}
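Because a task is plain JSON, it can be typed and validated before the engine touches the page. The sketch below infers the step shapes from the sample task only. The real protocol supports more step types and possibly more fields, and parseWaopTask and REQUIRED_FIELDS are hypothetical names, not part of the plugin.

```typescript
// Step shapes inferred from the sample task; the real protocol has more.
type WaopStep =
  | { type: "input"; selector: string; value: string }
  | { type: "press"; key: string }
  | { type: "waitForElement"; selector: string; timeout?: number }
  | { type: "highlightText"; value: string };

interface WaopTask {
  protocol: "WAOP";
  steps: WaopStep[];
}

// Required string fields per step type (assumed from the sample).
const REQUIRED_FIELDS: Record<WaopStep["type"], string[]> = {
  input: ["selector", "value"],
  press: ["key"],
  waitForElement: ["selector"],
  highlightText: ["value"],
};

function parseWaopTask(json: string): WaopTask {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Task must be a JSON object");
  }
  const task = data as { protocol?: unknown; steps?: unknown };
  if (task.protocol !== "WAOP") throw new Error("Unexpected protocol");
  if (!Array.isArray(task.steps)) throw new Error("steps must be an array");

  task.steps.forEach((step: unknown, index: number) => {
    const record = (step ?? {}) as Record<string, unknown>;
    const type = String(record.type);
    if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, type)) {
      throw new Error(`Step ${index}: unknown type "${type}"`);
    }
    for (const field of REQUIRED_FIELDS[type as WaopStep["type"]]) {
      if (typeof record[field] !== "string") {
        throw new Error(`Step ${index}: "${field}" must be a string`);
      }
    }
    if (record.timeout !== undefined && typeof record.timeout !== "number") {
      throw new Error(`Step ${index}: "timeout" must be a number`);
    }
  });
  return data as WaopTask;
}
```

Rejecting a malformed task up front means a typo in a step fails before any page interaction has happened, rather than halfway through a sequence.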
Design Highlights
- Declarative Steps: Each step is an instruction object with a type and all data needed to execute (e.g., selector, value, timeout). This makes tasks easy to read, build, and debug.
- Robust Selector Strategy: The waitForElement function in execute.js accepts CSS selectors and can also look up elements by their visible text (for elements like <a> and <button>), which improves stability on modern front-ends that generate dynamic class names.
- Timeouts & Error Handling: Each step can specify a timeout. The engine captures per-step execution errors, enabling optional steps or graceful task abortion.
- Rich Operation Types: Beyond click and input, WAOP supports scroll, wait, assert, highlightText, press, and more to cover most web interaction scenarios.
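The selector strategy can be sketched as a lookup that tries the CSS selector first and then falls back to visible text. The order of the two stages, the text field on the locator, the whitespace and case normalization, and the exact-then-substring matching are my assumptions for illustration; the description above only says that both lookup modes exist. Polling until the timeout expires is left out, and the structural types are there so the sketch runs outside a browser.

```typescript
// Minimal structural types; a real Document or Element should satisfy them.
interface ElementLike {
  textContent: string | null;
}
interface QueryRoot {
  querySelector(selector: string): ElementLike | null;
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}
interface Locator {
  selector?: string;
  text?: string; // hypothetical field for visible-text lookup
}

// Collapse whitespace and ignore case so rendered text compares reliably.
function normalizeText(value: string | null): string {
  return (value ?? "").replace(/\s+/g, " ").trim().toLowerCase();
}

function findTarget(root: QueryRoot, locator: Locator): ElementLike | null {
  if (locator.selector) {
    try {
      const hit = root.querySelector(locator.selector);
      if (hit) return hit;
    } catch {
      // querySelector throws on invalid selector syntax; try text instead.
    }
  }
  if (locator.text) {
    const wanted = normalizeText(locator.text);
    const candidates = Array.from(root.querySelectorAll("a, button"));
    // Prefer an exact match, then accept a substring match.
    const exact = candidates.find((el) => normalizeText(el.textContent) === wanted);
    if (exact) return exact;
    return candidates.find((el) => normalizeText(el.textContent).includes(wanted)) ?? null;
  }
  return null;
}
```

A step whose generated class name has changed since the task was written can then still resolve through its label, for example findTarget(document, { selector: ".css-1x2y3z", text: "Official Website" }).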
Implementation Details: Inspecting the Code
1. Page Context Awareness & SPA Support (content.js)
To make automation run smoothly in single-page applications, we must detect page changes even when the URL doesn't fully reload.
- Routing Change Detection: We monkey-patch history.pushState and history.replaceState to intercept route changes caused by the History API. We also listen to popstate and hashchange events.
- DOM Mutation Monitoring: We use a MutationObserver to watch the document's DOM tree. When components load or the DOM updates, the observer triggers the content refresh logic.
- Debounce: To avoid flooding background.js with messages during intensive DOM updates, we apply a 600ms debounce so the refresh logic only runs after DOM changes settle.
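A minimal sketch of the History API patch and the debounce might look like this. The names patchHistory, debounce, and HistoryLike are mine, not taken from content.js, and the history object is passed in as a parameter (instead of using window.history directly) so the code can be exercised outside a browser. The MutationObserver wiring is omitted.

```typescript
// Structural stand-in for window.history.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

type StateArgs = [unknown, string, (string | null)?];

// Wrap pushState/replaceState so onChange fires after each call.
// Returns a function that restores the original methods.
function patchHistory(history: HistoryLike, onChange: () => void): () => void {
  const originalPush = history.pushState;
  const originalReplace = history.replaceState;

  history.pushState = function (this: HistoryLike, ...args: StateArgs): void {
    originalPush.apply(this, args);
    onChange();
  };
  history.replaceState = function (this: HistoryLike, ...args: StateArgs): void {
    originalReplace.apply(this, args);
    onChange();
  };

  return () => {
    history.pushState = originalPush;
    history.replaceState = originalReplace;
  };
}

// Trailing-edge debounce: fn runs once, waitMs after the last call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const debounced = (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  return Object.assign(debounced, { cancel });
}
```

In a content script the two pieces would be combined along the lines of patchHistory(window.history, debounce(notifyBackground, 600)), where notifyBackground is a placeholder for whatever sends the refresh message, with the same debounced function also passed to the MutationObserver callback and the popstate and hashchange listeners.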
2. Core Execution Engine (execute.js)
This module translates WAOP into actual operations.
- Event Simulation: We go beyond element.click() for realistic interactions. For typing and key events, we synthesize keydown, keypress, and keyup via new KeyboardEvent(...) to preserve compatibility with frameworks that rely on full event sequences.
- Handling Rich Text Editors & iframes: Finding the true editable surface is challenging. resolveEditableTarget works through the following checks:
  - Check if the target is an iframe; if so, enter its document to find editable elements.
  - Check contentEditable attributes.
  - Handle editor implementations that hide a textarea and expose an iframe or div[contenteditable] as the editing surface.
- Precise Text Highlighting: The highlightText feature uses a TreeWalker to traverse text nodes under a target, locate matching text, and create a Range. The range is wrapped in a highlight <span> and smoothly scrolled to the center of the viewport. A CSS animation briefly highlights the text, after which the highlight is removed so the page is not permanently changed.
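The key event sequence can be sketched as below. pressKey and KeyInit are illustrative names, and the event constructor is injected as a factory so the example does not depend on a browser; execute.js may structure this differently.

```typescript
interface KeyInit {
  key: string;
  bubbles: boolean;
  cancelable: boolean;
}

const KEY_SEQUENCE = ["keydown", "keypress", "keyup"] as const;

// Dispatches keydown, keypress, keyup in order on the target.
// Returns false if any listener cancelled one of the events.
function pressKey<E>(
  target: { dispatchEvent(event: E): boolean },
  key: string,
  createEvent: (type: string, init: KeyInit) => E,
): boolean {
  let allowed = true;
  for (const type of KEY_SEQUENCE) {
    const event = createEvent(type, { key, bubbles: true, cancelable: true });
    // dispatchEvent returns false when a listener called preventDefault().
    if (!target.dispatchEvent(event)) allowed = false;
  }
  return allowed;
}
```

In the page this would be called as pressKey(element, "Enter", (type, init) => new KeyboardEvent(type, init)). Note that synthetic events are untrusted (isTrusted is false), so they notify listeners but do not insert text by themselves; the input step still has to set the value and fire an input event.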
3. Background Orchestration & State Management (background.js)
background.js is the plugin's traffic controller.
- Page Snapshot (webcontent) Management: background.js keeps a webcontent cache holding the active tab's latest snapshot. When a tab changes or the sidebar opens, refreshWebcontent is invoked.
- Graceful Loading State: refreshWebcontent first sends the older snapshot with loading: true to the sidebar so the UI can show a loading indicator immediately instead of stale or empty data. When updated content arrives, it sends a second update with the fresh content.
- API Fallback Strategy: As shown in browserTabsTool.ts, if a module (e.g., sidebar JS) can't call chrome.* APIs directly, it sends a message via chrome.runtime.sendMessage to background.js, which executes the request with the required privileges.
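The loading-state flow of refreshWebcontent might be sketched like this. The message shape, the WEBCONTENT type string, the Snapshot fields, and the injected fetchSnapshot and sendToSidebar callbacks are assumptions for illustration. The check that drops out-of-date responses is an extra safeguard I added, not something the description above mentions.

```typescript
interface Snapshot {
  url: string;
  summary: string;
}

// Hypothetical message sent from the background script to the sidebar.
interface SidebarMessage {
  type: "WEBCONTENT";
  loading: boolean;
  snapshot: Snapshot | null;
}

function createWebcontentCache(
  fetchSnapshot: (tabId: number) => Promise<Snapshot>,
  sendToSidebar: (message: SidebarMessage) => void,
) {
  let cached: Snapshot | null = null;
  let latestRequest = 0;

  async function refreshWebcontent(tabId: number): Promise<void> {
    const requestId = ++latestRequest;
    // 1. Push the previous snapshot right away, flagged as loading.
    sendToSidebar({ type: "WEBCONTENT", loading: true, snapshot: cached });
    // 2. Ask the content script for fresh content.
    const fresh = await fetchSnapshot(tabId);
    // 3. Ignore the result if a newer refresh started in the meantime.
    if (requestId !== latestRequest) return;
    cached = fresh;
    sendToSidebar({ type: "WEBCONTENT", loading: false, snapshot: fresh });
  }

  return { refreshWebcontent };
}
```

Sending the first message synchronously, before any await, is what lets the sidebar switch to its loading indicator immediately on a tab change.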
4. Context Synthesis & Summarization (webcontentprocess.ts)
Raw page content is too noisy for AI or users. WebContentProcess refines it.
- Change Detection: To avoid unnecessary processing and redundant updates, hasContentChanged computes a signature for the page content. We regenerate the summary only when the signature or URL changes.
- Structured Summary: extractStructuralSummary parses HTML and extracts key interactive and structural elements (<a>, <button>, <input>, h1-h6, etc.), producing a compact representation such as:
<input> [class=search-input, id=q, type=text, placeholder=Search...]
<button> [class=btn.btn-primary] text="Confirm"
This high-quality context helps LLMs generate precise selectors or make task decisions.
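Both ideas can be sketched in a few lines. The article does not say how hasContentChanged computes its signature, so the FNV-1a-style hash below is a stand-in, and createChangeDetector, ElementInfo, and formatElement are hypothetical names. The formatter only reproduces the line format shown in the example above; it does not parse HTML.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units: cheap and
// non-cryptographic. A stand-in for the plugin's real signature.
function signature(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Reports a change when either the URL or the content signature differs
// from the previous call.
function createChangeDetector() {
  let lastUrl = "";
  let lastSignature = "";
  return function hasContentChanged(url: string, text: string): boolean {
    const next = signature(text);
    if (url === lastUrl && next === lastSignature) return false;
    lastUrl = url;
    lastSignature = next;
    return true;
  };
}

interface ElementInfo {
  tag: string;
  attrs: Record<string, string>;
  text?: string;
}

// Renders one summary line, joining multiple class names with dots.
function formatElement(info: ElementInfo): string {
  const attrs = Object.entries(info.attrs)
    .map(([name, value]) =>
      name === "class" ? `class=${value.trim().split(/\s+/).join(".")}` : `${name}=${value}`,
    )
    .join(", ");
  let line = `<${info.tag}>`;
  if (attrs) line += ` [${attrs}]`;
  if (info.text) line += ` text="${info.text}"`;
  return line;
}
```

A hash comparison like this can miss a change if two different pages collide, which is an acceptable trade-off when the worst case is a skipped summary refresh.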
Engineering Trade-offs & Practices
- Modularity & Extensibility: Encapsulate capabilities (e.g., browserTabsTool) as independent tools that export definitions. This makes future integration with LLM Function Calling or plugin ecosystems straightforward.
- Resilience: Timeouts, retries, and optional steps are first-class citizens. Assume networks delay, page structures change, and selectors fail; handle these gracefully to improve automation success rates.
- Security Considerations:
  - Least Privilege: Request permissions only when necessary, and use mechanisms like CHECK_ACTIVE_TAB to ensure automation runs only on the user's active page.
  - Cross-origin iframes: Wrap access attempts in try...catch. If cross-origin policies block access, log a warning and continue rather than crashing.
  - Error Guards: Check chrome.runtime.lastError after async API calls to avoid uncaught exceptions that could crash the Service Worker.
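The resilience points (per-step timeouts, optional steps, aborting on a hard failure) can be sketched as a small step runner. The optional flag, the handler map, the StepResult shape, and the 5000ms default are assumptions for illustration, not the engine's real interface. Retries are left out to keep the example short.

```typescript
interface Step {
  type: string;
  timeout?: number;
  optional?: boolean; // hypothetical flag: failure does not abort the task
}

type StepHandler = (step: Step) => Promise<void>;

interface StepResult {
  index: number;
  status: "ok" | "skipped" | "failed";
  error?: string;
}

// Rejects if work has not settled within ms. The underlying work is not
// cancelled; its eventual result is simply ignored.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

async function runSteps(
  steps: Step[],
  handlers: Record<string, StepHandler>,
  defaultTimeout = 5000,
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (let index = 0; index < steps.length; index++) {
    const step = steps[index];
    try {
      const handler = handlers[step.type];
      if (!handler) throw new Error(`No handler for "${step.type}"`);
      await withTimeout(handler(step), step.timeout ?? defaultTimeout, step.type);
      results.push({ index, status: "ok" });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (step.optional) {
        results.push({ index, status: "skipped", error: message });
        continue;
      }
      results.push({ index, status: "failed", error: message });
      break; // abort the remaining steps
    }
  }
  return results;
}
```

Returning a result per step, rather than throwing on the first error, gives the caller (a user or an LLM) enough detail to decide whether to retry, adjust a selector, or give up.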
Conclusion
BlackEagle's web automation is not the result of a single technique but a synthesis of protocol design, DOM manipulation, event simulation, state management, and fault-tolerant strategies. Through a layered architecture and a carefully designed WAOP, we deliver an automation system that is both capable and reasonably reliable. I hope this analysis helps you when building similar browser automation tools.