DEV Community

andersliuyang


# Implementing Automated Web Interaction: Deep Technical Analysis of the BlackEagle Plugin's WAOP


Author: AndersLiu

Preface
This article provides an in-depth analysis of the core technology behind the BlackEagle browser plugin: its Web Automation Operation Protocol (WAOP). We'll start with the overall architecture, then walk through each module's implementation details, and finally show how WAOP turns an interpretable, ordered sequence of automation steps into complex web interactions in the browser.

Macro Architecture: A Multi-layer Collaborative Automation System

My goal is to build an automation system that accurately simulates user behavior. Beyond basic actions like clicks and text input, it should understand page context and interact intelligently with a user or an AI. To that end, I designed a four-layer collaborative architecture:

  1. Presentation Layer (UI - Sidebar)
  • Responsibility: The main interface between the user and the automation system. It presents structured information extracted from the webpage and accepts user commands.
  • Core Implementation: webcontentprocess.ts turns raw webpage HTML and text into concise summaries and lists of interactive elements, giving whoever makes the decisions (a user or an LLM) high-quality context.
  2. Orchestration Layer (Background / Service Worker)
  • Responsibility: The system's brain. It manages open tabs, keeps a snapshot of each page's content (webcontent), and relays commands and data between the sidebar, content scripts, and external tools.
  • Core Implementation: background.js listens for browser events (tab switches, URL changes) and requests up-to-date page information from content scripts as needed. It also initializes a task module (TaskModule) that processes asynchronous tasks from various sources.
  3. Execution Layer (Content Script)
  • Responsibility: Runs operations directly in the target page's context. This is where automation commands are actually carried out.
  • Core Implementation:
    • content.js: Injects the execution engine into the page, listens for commands from background.js, and monitors page changes (URL updates, DOM mutations) to support single-page applications (SPAs).
    • execute.js: The core WAOP execution engine. It interprets each protocol step (such as click, input, or scroll) and converts it into a realistic simulation of user behavior.
  4. Capabilities Layer (Tools)
  • Responsibility: Provides standardized, callable atomic capabilities such as tab operations, so the core logic can be extended in the style of function calls.
  • Core Implementation: browserTabsTool.ts exports a standard tool definition with operations such as open_tab and close_tab. When chrome.tabs APIs are unavailable in a given context, the tool falls back to chrome.runtime.sendMessage so that background.js can perform the operation with its higher privileges.
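The fallback in the Capabilities Layer can be sketched as follows. This is a minimal sketch, not the code of browserTabsTool.ts: the `TabsLike` and `Messenger` interfaces, the `{ action, url }` message shape, and the tool-definition fields are assumptions, and the chrome APIs are passed in as parameters so the logic does not depend on a global `chrome` object. In the extension you would pass `chrome.tabs` (whose `create` returns a Promise under Manifest V3) and `chrome.runtime.sendMessage`.

```typescript
// Hypothetical minimal views of chrome.tabs and chrome.runtime.sendMessage,
// injected so the fallback logic does not depend on a global `chrome` object.
interface TabsLike {
  create(props: { url: string }): Promise<{ id?: number }>;
}
type Messenger = (message: unknown) => Promise<unknown>;

// Hypothetical tool definition in a function-calling style (JSON Schema parameters).
const openTabTool = {
  name: "open_tab",
  description: "Open a URL in a new browser tab",
  parameters: {
    type: "object",
    properties: { url: { type: "string", description: "Absolute URL to open" } },
    required: ["url"],
  },
};

// Execute open_tab: call the tabs API directly when this context has it,
// otherwise ask the background service worker to do it.
async function openTab(
  url: string,
  tabs: TabsLike | undefined,
  sendMessage: Messenger,
): Promise<{ via: "tabs" | "background"; result: unknown }> {
  if (tabs && typeof tabs.create === "function") {
    const tab = await tabs.create({ url });
    return { via: "tabs", result: tab };
  }
  // Fallback path; the { action, url } message shape is an assumption.
  const result = await sendMessage({ action: openTabTool.name, url });
  return { via: "background", result };
}
```

On the receiving side, background.js would register a `chrome.runtime.onMessage` listener that recognizes `action: "open_tab"`, calls `chrome.tabs.create`, and replies through `sendResponse`. Returning `true` from that listener keeps the response channel open for the asynchronous reply. close_tab would follow the same two-path pattern.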

WAOP: The Language of Automation

WAOP (Web Automation Operation Protocol) is the core protocol we designed. It decomposes complex automation tasks into a clear, serializable sequence of steps.

A typical WAOP task looks like this:

```json
{
  "protocol": "WAOP",
  "steps": [
    { "type": "input", "selector": "#search-box", "value": "BlackEagle AI" },
    { "type": "press", "key": "Enter" },
    {
      "type": "waitForElement",
      "selector": ".search-results",
      "timeout": 5000
    },
    { "type": "highlightText", "value": "Official Website" }
  ]
}
```
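Because every step is plain data, a task can be typed and validated before anything touches the page, which matters when an LLM writes the JSON. The sketch below is one way to do that, under assumptions: the field names come from the example task above, but the exact set of step types, the `optional` flag, and the validation rules are hypothetical, not the plugin's published schema.

```typescript
// Hypothetical typing of WAOP steps, inferred from the example task above.
interface StepBase {
  timeout?: number;   // per-step timeout in ms
  optional?: boolean; // assumed flag: failure does not abort the task
}
type WaopStep = StepBase &
  (
    | { type: "input"; selector: string; value: string }
    | { type: "click"; selector: string }
    | { type: "press"; key: string }
    | { type: "waitForElement"; selector: string }
    | { type: "highlightText"; value: string; selector?: string }
  );

interface WaopTask {
  protocol: "WAOP";
  steps: WaopStep[];
}

// Required non-empty string fields per step type (assumed rules).
const REQUIRED_FIELDS: Record<WaopStep["type"], string[]> = {
  input: ["selector", "value"],
  click: ["selector"],
  press: ["key"],
  waitForElement: ["selector"],
  highlightText: ["value"],
};

// Parse untrusted JSON (for example, LLM output) into a task, collecting every problem found.
function parseWaopTask(json: string): { task?: WaopTask; errors: string[] } {
  let raw: any;
  try {
    raw = JSON.parse(json);
  } catch {
    return { errors: ["invalid JSON"] };
  }
  const errors: string[] = [];
  if (raw?.protocol !== "WAOP") errors.push('protocol must be "WAOP"');
  if (!Array.isArray(raw?.steps)) return { errors: [...errors, "steps must be an array"] };

  raw.steps.forEach((step: any, i: number) => {
    const type = step?.type;
    // hasOwnProperty guards against inherited keys such as "constructor".
    if (typeof type !== "string" || !Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, type)) {
      errors.push(`step ${i}: unknown type ${JSON.stringify(type)}`);
      return;
    }
    for (const field of REQUIRED_FIELDS[type as WaopStep["type"]]) {
      if (typeof step[field] !== "string" || step[field] === "") {
        errors.push(`step ${i} (${type}): missing ${field}`);
      }
    }
    if (step.timeout !== undefined && !(Number.isFinite(step.timeout) && step.timeout > 0)) {
      errors.push(`step ${i}: timeout must be a positive number`);
    }
  });
  return errors.length > 0 ? { errors } : { task: raw as WaopTask, errors };
}
```

Returning every error at once, rather than stopping at the first, gives an LLM a complete list of problems to fix in a single retry.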

Design Highlights

  • Declarative Steps: Each step is an instruction object with a type and all the data needed to execute it (e.g., selector, value, timeout). This makes tasks easy to read, build, and debug.
  • Robust Selector Strategy: The waitForElement function in execute.js does not rely on CSS selectors alone. It can also find elements (such as <a> and <button>) by their visible text, which keeps steps working on modern front-ends that generate dynamic class names.
  • Timeouts & Error Handling: Each step can specify a timeout. The engine catches errors per step, so a failing step can either be skipped (if it is optional) or abort the task cleanly.
  • Rich Operation Types: Beyond click and input, WAOP supports scroll, wait, assert, highlightText, press, and more, covering most common web interaction scenarios.
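Per-step timeouts and optional steps can be sketched as a small runner. This is an illustrative sketch, not execute.js: the `Step` shape, the `optional` flag, the `handlers` map, and the 3000 ms default timeout are assumptions. Each handler is raced against a timer with `Promise.race`. A failing optional step is recorded and skipped, while a failing required step stops the task.

```typescript
interface Step {
  type: string;
  timeout?: number;   // per-step timeout in ms
  optional?: boolean; // hypothetical flag: failure does not abort the task
  [key: string]: unknown;
}
type Handler = (step: Step) => Promise<void>;

interface StepResult {
  index: number;
  type: string;
  status: "ok" | "skipped" | "failed";
  error?: string;
}

const DEFAULT_TIMEOUT_MS = 3000; // assumed default

// Reject if `work` has not settled within `ms` milliseconds; always clear the timer.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

async function runSteps(steps: Step[], handlers: Record<string, Handler>): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (let index = 0; index < steps.length; index++) {
    const step = steps[index];
    try {
      const handler = Object.prototype.hasOwnProperty.call(handlers, step.type)
        ? handlers[step.type]
        : undefined;
      if (!handler) throw new Error(`unsupported step type "${step.type}"`);
      await withTimeout(handler(step), step.timeout ?? DEFAULT_TIMEOUT_MS, step.type);
      results.push({ index, type: step.type, status: "ok" });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      if (step.optional) {
        results.push({ index, type: step.type, status: "skipped", error });
        continue; // optional step: record the failure and move on
      }
      results.push({ index, type: step.type, status: "failed", error });
      break; // required step: abort the remaining steps
    }
  }
  return results;
}
```

`Promise.race` only stops waiting; it does not cancel the losing handler. A production engine would also need a way to stop it, for example by passing an `AbortSignal` into each handler.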

Implementation Details: Inspecting the Code

1. Page Context Awareness & SPA Support (content.js)

To make automation run smoothly in single-page applications, we must detect page changes even when the page never performs a full reload.

  • Routing Change Detection: We monkey-patch history.pushState and history.replaceState to intercept route changes made through the History API, and we also listen for popstate and hashchange events.
  • DOM Mutation Monitoring: A MutationObserver watches the document's DOM tree. When components load or the DOM updates, the observer triggers the content refresh logic.
  • Debounce: To avoid flooding background.js with messages during bursts of DOM updates, the refresh is debounced by 600 ms, so it runs only after the DOM has been quiet for that long.
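The route-patching and debounce pieces above can be sketched as two small helpers. This is an illustration under assumptions rather than content.js itself: `watchRouteChanges` and `debounce` are hypothetical names, and the history object is passed in so the patching logic can run without a browser. One caveat: content scripts run in an isolated world, so patching `history.pushState` from a content script does not intercept calls made by the page's own scripts. The patch has to run in the page's main world (for example, a content script declared with `"world": "MAIN"` in Manifest V3) and report back, for instance through `window.postMessage`.

```typescript
// Minimal subset of the History API touched here, so a fake can stand in for window.history.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

// Trailing-edge debounce: `fn` runs once, `waitMs` after the most recent call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Wrap pushState/replaceState so every History API navigation also calls onRouteChange.
// Returns a function that restores the original methods.
function watchRouteChanges(hist: HistoryLike, onRouteChange: (method: string) => void): () => void {
  const originals = { pushState: hist.pushState, replaceState: hist.replaceState };
  for (const method of ["pushState", "replaceState"] as const) {
    const original = originals[method];
    hist[method] = (...args: Parameters<HistoryLike["pushState"]>) => {
      original.apply(hist, args); // keep the real navigation behavior
      onRouteChange(method);
    };
  }
  return () => {
    hist.pushState = originals.pushState;
    hist.replaceState = originals.replaceState;
  };
}
```

In the page, the wiring would look roughly like this: create `const refresh = debounce(sendSnapshot, 600)` (where `sendSnapshot` is a hypothetical function that posts the current page content), pass `refresh` to `watchRouteChanges(window.history, ...)`, register it for `popstate` and `hashchange`, and call it from a `MutationObserver` that observes `document.documentElement` with `{ childList: true, subtree: true, characterData: true }`.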

2. Core Execution Engine (execute.js)

This module translates WAOP into actual operations.

  • Event Simulation: For realistic interactions we go beyond element.click(). For typing and key presses, we synthesize keydown, keypress, and keyup events with new KeyboardEvent(...), preserving compatibility with frameworks that rely on the full event sequence.
  • Handling Rich Text Editors & iframes: Finding the true editable surface is challenging. resolveEditableTarget handles it as follows:
    1. If the target is an iframe, enter its document and look for editable elements there.
    2. Check whether the element is contentEditable.
    3. Handle editors that hide a textarea and expose an iframe or a div[contenteditable] as the actual editing surface.
  • Precise Text Highlighting: highlightText uses a TreeWalker to traverse the text nodes under a target, locate the matching text, and create a Range. The range is wrapped in a highlight <span>, which is smoothly scrolled to the center of the viewport. A brief CSS animation plays, and the highlight is then removed so the page is not permanently changed.
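The subtle part of highlightText is the matching step, because the target phrase can span several text nodes (for example `Official <b>Website</b>`). Below is a sketch of that step alone, under assumptions: the text nodes that a TreeWalker created with `NodeFilter.SHOW_TEXT` would yield are modeled as a plain string array, and `findTextRange` is a hypothetical helper. It returns what `Range.setStart` and `Range.setEnd` need: a node plus an offset for each end.

```typescript
// Position inside the i-th text node, in the form Range.setStart/setEnd expect.
interface TextPoint {
  node: number;   // index into the list of text nodes
  offset: number; // character offset within that node
}

// Find the first occurrence of `needle` in the concatenated text of `nodes`,
// returning start and end points that may fall in different nodes.
function findTextRange(nodes: string[], needle: string): { start: TextPoint; end: TextPoint } | null {
  if (needle === "") return null;
  const full = nodes.join("");
  const at = full.indexOf(needle);
  if (at < 0) return null;

  // Map an absolute offset in `full` back to (node, offset).
  const locate = (abs: number, isEnd: boolean): TextPoint => {
    let consumed = 0;
    for (let i = 0; i < nodes.length; i++) {
      const len = nodes[i].length;
      // An end point may sit exactly at the end of a node; a start point moves on to the next node.
      if (abs < consumed + len || (isEnd && abs === consumed + len)) {
        return { node: i, offset: abs - consumed };
      }
      consumed += len;
    }
    return { node: nodes.length - 1, offset: nodes[nodes.length - 1].length };
  };

  return { start: locate(at, false), end: locate(at + needle.length, true) };
}
```

In the browser, one would then call `range.setStart(textNodes[start.node], start.offset)` and `range.setEnd(textNodes[end.node], end.offset)`. Note that `Range.surroundContents` throws if the range partially selects a non-text node such as that `<b>`, so a range spanning elements has to be wrapped differently, for example with `extractContents` followed by `insertNode`.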

3. Background Orchestration & State Management (background.js)

background.js is the plugin's traffic controller.

  • Page Snapshot (webcontent) Management: background.js keeps a webcontent cache holding the active tab's latest snapshot. Whenever the active tab changes or the sidebar opens, it calls refreshWebcontent.
  • Graceful Loading State: refreshWebcontent first sends the previous snapshot to the sidebar with loading: true, so the UI can show a loading indicator immediately rather than presenting stale data as current or showing an empty panel. When the updated content arrives, it sends the new snapshot.
  • API Fallback Strategy: As in browserTabsTool.ts, when a module (e.g., the sidebar script) cannot call chrome.* APIs directly, it sends a message with chrome.runtime.sendMessage to background.js, which performs the request with the required privileges.
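The loading-state pattern in refreshWebcontent can be sketched like this. It is not the plugin's code: the `Snapshot` shape, the in-memory cache keyed by tab id, and the injected `requestSnapshot` and `postToSidebar` functions are assumptions. In the extension, they would wrap `chrome.tabs.sendMessage` to the content script and a message to the sidebar. What the sketch shows is the ordering: the cached snapshot goes out immediately with `loading: true`, then the fresh one replaces it.

```typescript
interface Snapshot {
  url: string;
  summary: string;
  loading: boolean;
}

// Hypothetical cache: latest known snapshot per tab id.
const webcontentCache = new Map<number, Snapshot>();

async function refreshWebcontent(
  tabId: number,
  requestSnapshot: (tabId: number) => Promise<Omit<Snapshot, "loading">>,
  postToSidebar: (snapshot: Snapshot) => void,
): Promise<void> {
  // 1. Respond immediately: the cached snapshot (or an empty one), flagged as loading.
  const cached = webcontentCache.get(tabId) ?? { url: "", summary: "", loading: false };
  postToSidebar({ ...cached, loading: true });

  try {
    // 2. Ask the content script for fresh content, then cache and publish it.
    const fresh: Snapshot = { ...(await requestSnapshot(tabId)), loading: false };
    webcontentCache.set(tabId, fresh);
    postToSidebar(fresh);
  } catch {
    // 3. Assumed behavior: on failure, clear the loading flag so the UI does not spin forever.
    postToSidebar({ ...cached, loading: false });
  }
}
```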

4. Context Synthesis & Summarization (webcontentprocess.ts)

Raw page content is too noisy to hand directly to an AI or a user; WebContentProcess condenses it.

  • Change Detection: To avoid redundant processing and a stream of duplicate updates, hasContentChanged computes a signature of the page content. The summary is regenerated only when the signature or the URL changes.
  • Structured Summary: extractStructuralSummary parses the HTML and extracts key interactive and structural elements (<a>, <button>, <input>, h1-h6, etc.), producing a compact representation such as:
```text
<input> [class=search-input, id=q, type=text, placeholder=Search...]
<button> [class=btn.btn-primary] text="Confirm"
```

This high-quality context helps LLMs generate precise selectors or make task decisions.
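Both ideas are straightforward to sketch. Below, `contentSignature` uses 32-bit FNV-1a over UTF-16 code units purely as an example of a cheap, non-cryptographic hash, since the source does not say which signature the plugin computes. The `hasContentChanged` signature shown here is hypothetical. `ElementInfo` and `formatElement` are also hypothetical helpers that produce lines in the format shown above from already-extracted element data, so no HTML parser is needed.

```typescript
// 32-bit FNV-1a over UTF-16 code units (equal to standard FNV-1a for ASCII input).
function contentSignature(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

interface PageState {
  url: string;
  signature: string;
}

// Hypothetical shape: report a change when there is no previous state, the URL differs, or the text differs.
function hasContentChanged(prev: PageState | undefined, url: string, text: string): { changed: boolean; state: PageState } {
  const state = { url, signature: contentSignature(text) };
  const changed = !prev || prev.url !== url || prev.signature !== state.signature;
  return { changed, state };
}

interface ElementInfo {
  tag: string;
  attrs: Record<string, string>; // attribute name -> value, in output order
  text?: string;
}

// Render one summary line; class lists are joined with "." as in the sample output.
function formatElement(el: ElementInfo): string {
  const attrs = Object.entries(el.attrs)
    .map(([name, value]) => `${name}=${name === "class" ? value.trim().split(/\s+/).join(".") : value}`)
    .join(", ");
  let line = `<${el.tag}>`;
  if (attrs) line += ` [${attrs}]`;
  const text = el.text?.trim().replace(/\s+/g, " ");
  if (text) line += ` text=${JSON.stringify(text)}`;
  return line;
}
```

A hash keeps the stored state small, at the cost of a negligible chance that two different pages share a signature. For deciding whether to regenerate a summary, that trade-off is usually acceptable.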


Engineering Trade-offs & Practices

  • Modularity & Extensibility: Each capability (e.g., browserTabsTool) is encapsulated as an independent tool that exports its own definition. This makes later integration with LLM function calling or plugin ecosystems straightforward.
  • Resilience: Timeouts, retries, and optional steps are first-class citizens. Assume that networks will be slow, page structures will change, and selectors will fail, and handle these cases gracefully to raise the automation success rate.
  • Security Considerations:
    • Least Privilege: Request permissions only when necessary, and use mechanisms like CHECK_ACTIVE_TAB to ensure automation runs only on the user's active page.
    • Cross-origin iframes: Wrap access attempts in try...catch. If cross-origin policies block access, log a warning and continue rather than crashing.
    • Error Guards: Check chrome.runtime.lastError consistently in asynchronous API callbacks, so failures are handled instead of escaping as uncaught exceptions that could crash the Service Worker.
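The lastError guard is often packaged as a small promise wrapper around callback-style chrome.* calls. The sketch below assumes a hypothetical `callChrome` helper, and the runtime object is passed in (in the extension it would be `chrome.runtime`) so the pattern can be shown outside the browser. Reading `lastError` inside the callback is what marks the error as checked. Without that read, Chrome logs an "Unchecked runtime.lastError" message. Under Manifest V3 many chrome.* APIs also return Promises when the callback is omitted, so this wrapper matters mainly for callback-style code.

```typescript
// The only part of chrome.runtime this helper uses.
interface RuntimeLike {
  lastError?: { message?: string };
}

// Turn a callback-style chrome.* call into a Promise that rejects when lastError is set.
function callChrome<T>(runtime: RuntimeLike, invoke: (callback: (result: T) => void) => void): Promise<T> {
  // A synchronous throw from `invoke` also rejects, because the Promise constructor catches it.
  return new Promise<T>((resolve, reject) => {
    invoke((result) => {
      const err = runtime.lastError; // reading it marks the error as checked
      if (err) reject(new Error(err.message ?? "unknown chrome.runtime.lastError"));
      else resolve(result);
    });
  });
}
```

Usage in the extension might look like `callChrome(chrome.runtime, (cb) => chrome.tabs.query({ active: true, currentWindow: true }, cb))`, with a `.catch` that logs the failure instead of letting it escape.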

Conclusion

BlackEagle's web automation is not the result of a single technique but a synthesis of protocol design, DOM manipulation, event simulation, state management, and fault-tolerant strategies. Through a layered architecture and a carefully designed WAOP, we deliver a powerful yet relatively reliable automation system. I hope this analysis helps you when building similar browser automation tools.

My GitHub: https://github.com/andersliuyang/BlackEagleAI
