Why Verdex Uses CDP Directly

This is probably the most common question I get from engineers about Verdex: "Why use CDP instead of building on top of, or simply extending Playwright?" It's a completely fair question—Playwright is exceptional, widely adopted, and has CDP access through newCDPSession(). So let me walk through the technical reasoning.

The short answer: Verdex is a development-time authoring tool that needs deep, specific control over DOM inspection and JavaScript execution contexts. Playwright is an execution-time test runner optimized for cross-browser reliability. These are fundamentally different use cases with different architectural requirements—and understanding why reveals something interesting about when abstractions help vs. when they create friction.

A Note on Inspiration

Before diving in, I want to be clear: Verdex owes a significant debt to Playwright's design. The accessibility tree implementation, the approach to isolated worlds, the careful attention to element lifecycle—these are all areas where I studied Playwright's codebase extensively and drew inspiration.

Verdex aims for parity with Playwright's level of sophistication, particularly around W3C ARIA-compliant accessibility tree generation, robust handling of frame lifecycles and navigation, and isolated execution contexts. Where Verdex diverges isn't in capability but in architecture: it adds structural exploration primitives (get_ancestors, get_siblings, get_descendants) for authoring-time selector construction rather than execution-time test reliability.

The question isn't "what can Verdex do that Playwright can't?" It's "given these different goals, what foundation makes the most sense?"

Execution-Time vs Authoring-Time: Different Problems, Different Solutions

Here's the core distinction:

| Execution-Time (Playwright) | Authoring-Time (Verdex) |
| --- | --- |
| Re-resolve selectors on every action | Maintain stable references across multiple queries |
| Prevent stale element bugs | Enable multi-step DOM exploration |
| Cross-browser uniformity | Analysis depth |
| Ephemeral element handles | Persistent element mappings during sessions |

Playwright's Locator Philosophy

Playwright's Locators re-resolve on every action—a brilliant defense against stale element bugs that plagued Selenium. From their documentation:

"Every time a locator is used for an action, an up-to-date DOM element is located in the page."

For test execution, this is exactly right.

For authoring-time analysis, where I need to call get_ancestors(e3), then get_siblings(e3), then get_descendants(e3) on the same element, this creates friction. Here's what Verdex needs:

// Sequential queries on the same element during authoring
get_ancestors("e3")     // Walk up from this specific element
get_siblings("e3", 2)   // Examine the SAME element's siblings  
get_descendants("e3")   // Explore the SAME element's children

With Locators, you're re-querying the DOM each time—potentially getting different elements if the page changed. With ElementHandles (Playwright's persistent option), the framework actively discourages their use and auto-disposes them after actions to enforce re-resolution. This is core philosophy, not a technical omission.
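To make the contrast concrete, here's a minimal Playwright sketch (the selectors are illustrative): every Locator action re-queries the live DOM, while an ElementHandle pins one specific node but is the path the framework steers you away from.

// Locator: each action re-resolves against the live DOM
const addToCart = page.getByRole('button', { name: 'Add to Cart' });
await addToCart.click();          // query #1
await addToCart.textContent();    // query #2, may resolve to a different node if the page changed

// ElementHandle: a persistent handle to one specific node, but discouraged by the docs
const handle = await page.$('[data-testid="product-card"]');
const testId = await handle.evaluate(el => el.dataset.testid);  // keeps pointing at that exact node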

Different problems, different solutions.

"But Wait—What About Stale Elements?"

If you're familiar with Selenium's notorious StaleElementReferenceException, you might be wondering: doesn't Verdex's persistent reference approach risk the same problems?

No—and understanding why reveals an important architectural insight.

Selenium's problem occurred at runtime:

element = driver.find_element(By.ID, "submit")
# ... page re-renders during test execution ...
element.click()  # ❌ Stale!

Verdex's persistent references exist only during authoring:

// Authoring session (stable snapshot)
resolve_container("e3")    // Walk up from element
get_siblings("e3", 2)      // Check siblings  
extract_anchors("e3", 1)   // Find unique content

// Output: Pure Playwright code (auto-resolving)
getByTestId("product-card")
  .filter({ hasText: "iPhone 15 Pro" })
  .getByRole("button", { name: "Add to Cart" })

The key differences:

  1. Different lifecycle phase: Verdex refs exist during static analysis of a snapshot, not during dynamic test execution
  2. No page interactions: You're exploring structure, not triggering re-renders
  3. Safe output: The final code uses Playwright's auto-resolving Locators

Verdex gets the best of both worlds: persistent references enable multi-step structural exploration during authoring, while the generated test code uses Playwright's battle-tested auto-resolution during execution.

The CDP Layer: Where Architecture Choices Matter

Both Playwright and Verdex build on CDP. The difference is in how they use it.

Creating an isolated world looks nearly identical:

// Playwright with CDP
const client = await page.context().newCDPSession(page);
await client.send('Page.createIsolatedWorld', {
  frameId: mainFrameId,
  worldName: 'verdex-bridge',
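  // note: "grantUniveralAccess" (sic) is the protocol's own spelling of this parameter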
  grantUniveralAccess: true
});

// Puppeteer with CDP  
const client = await page.createCDPSession();
const { executionContextId } = await client.send('Page.createIsolatedWorld', {
  frameId: mainFrameId,
  worldName: 'verdex-bridge',
  grantUniveralAccess: true
});

The complexity emerges when working with elements across multiple operations.

The Impedance Mismatch

Verdex maintains a persistent Map<string, ElementInfo> that tracks DOM nodes across analysis operations. With Playwright + CDP, you're bridging two object models:

  • Playwright's auto-managed ElementHandles exist in its utility world
  • You need manually-managed CDP objectIds in your isolated world
  • Converting between them requires extra evaluation calls and context switching

You'd end up using Playwright's CDP access to bypass Playwright's abstractions entirely, while carrying the overhead of those abstractions in your bundle.
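For concreteness, here's a hedged sketch of what that bridging can look like (the temporary-attribute trick is just one workaround, and the variable names are illustrative): Playwright's public API never exposes an ElementHandle's CDP objectId, so you end up round-tripping through the DOM to re-find the same node inside your own isolated world.

// 1. Locate the element with Playwright: the handle lives in Playwright's world
const handle = await page.locator('[data-testid="product-card"]').first().elementHandle();

// 2. Tag it so the same node can be re-identified from another execution context
await handle.evaluate(el => el.setAttribute('data-verdex-tmp', 'x1'));

// 3. Re-resolve that node via raw CDP in your isolated world to finally get an objectId
const client = await page.context().newCDPSession(page);
const { result } = await client.send('Runtime.evaluate', {
  expression: `document.querySelector('[data-verdex-tmp="x1"]')`,
  contextId: isolatedWorldContextId  // from Page.createIsolatedWorld, as shown above
});
const objectId = result.objectId;  // only now usable with Runtime.callFunctionOn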

Why Puppeteer Fits Naturally

With Puppeteer, everything operates at the same abstraction level—CDP primitives throughout:

const session = await page.target().createCDPSession();

// Create your isolated world
const { executionContextId } = await session.send('Page.createIsolatedWorld', {...});

// Inject your bridge code
await session.send('Runtime.evaluate', {
  expression: bridgeCode,
  contextId: executionContextId
});

// Use objectId for stable, multi-step operations
const { result } = await session.send('Runtime.evaluate', {
  expression: 'document.querySelector("[data-testid=product]")',
  contextId: executionContextId
});

// Same objectId, multiple operations
const analysis = await session.send('Runtime.callFunctionOn', {
  objectId: result.objectId,  // Stable until you release it
  functionDeclaration: 'function() { return window.verdexBridge.fullAnalysis(this); }',
  returnByValue: true  // objectId alone identifies the execution context, so no contextId is needed
});

// Later: structural exploration using the same reference
const { result: ancestors } = await session.send('Runtime.callFunctionOn', {
  functionDeclaration: 'function() { return window.verdexBridge.get_ancestors(this); }',
  objectId: result.objectId,  // Same stable reference
  returnByValue: true
});

No impedance mismatch, no fighting auto-disposal, no bridging between object models. The puppeteer package is ~2MB and CDP-native, while playwright-core is ~10MB with cross-browser abstractions that Verdex never uses.

Isolated Worlds: Different Purposes, Different Implementations

Both Playwright and Verdex use isolated execution contexts, but for different reasons.

Playwright's approach: Test scripts run in the main world by default (selector engines use utility worlds). This lets tests interact with the page's actual JavaScript context—including any framework quirks or monkey-patched APIs. If your application overrides Element.prototype.getAttribute(), Playwright sees that. This is correct: tests should validate user-facing behavior, warts and all.

Verdex's approach: Analysis code runs entirely in an isolated world. When Verdex traverses the DOM to discover that <div data-testid="product-card"> exists, it uses native, unmodified DOM APIs. No monkey-patches, no framework hooks, no interference.
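A quick sketch of what that buys you (reusing the CDP session and isolated-world executionContextId from the earlier snippets): a monkey-patch applied in the page's main world never leaks into the isolated world, because each world gets its own JavaScript globals and prototypes over the same underlying DOM.

// Main world: the application (or a framework) patches a DOM API
await session.send('Runtime.evaluate', {
  expression: `Element.prototype.getAttribute = () => 'patched';`
});

// Isolated world: still sees the native implementation and the real attribute values
const { result } = await session.send('Runtime.evaluate', {
  expression: `document.body.getAttribute('data-theme')`,  // 'data-theme' is just an example attribute
  contextId: executionContextId  // the isolated world created earlier
});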

This separation strengthens both tools: Verdex discovers clean structural facts during authoring (data-testid="product-card", text content, hierarchy). Playwright executes the generated selector during testing with its battle-tested engine that handles any monkey-patching robustly.

Each tool operates in the execution context that best serves its purpose.

Multi-Role Browser Contexts

Here's where direct CDP access proved especially valuable: multi-role browser contexts.

I was working on a marketplace application with three distinct roles: admin, provider, and customer. The CTO wanted tests that exercised flows spanning roles. One example: the provider adds a product with a discount, the customer loads the product, the provider changes its details, those changes show up in the customer's session, and the customer checks out.

The Playwright MCP Approach:

Playwright MCP's architecture creates an interesting challenge for multi-role testing. Since each MCP server instance manages a single browser context, you need to run multiple server instances—one per role—and orchestrate them in your conversation:

You: "Using playwright-provider, add iPhone 15 Pro at $999"
[Response includes generated code for provider actions]

You: "Using playwright-customer, navigate to products and find iPhone"
[Response includes generated code for customer actions]

You: "Using playwright-provider, change price to $899"
[Response includes generated code for provider update]

You: "Using playwright-customer, refresh and verify new price"
[Response includes generated code for customer verification]

The orchestration overhead is substantial:

  • Server context switching - Explicitly specify which server (role) before every action
  • Code stream assembly - Collect and stitch code snippets from 3+ separate response streams
  • State coordination - Manually track which role has what data across parallel sessions
  • Conversation fragmentation - the LLM must juggle multiple concurrent browser states

For a 50-action multi-role test with 7 role switches, you're effectively conducting two parallel conversations with separate automation sessions, then manually assembling the fragments into a coherent test file.

The Verdex Approach:

With CDP-level control, Verdex manages multiple browser contexts within a single server instance. Each role gets its own incognito context with true browser-level isolation—separate CDP sessions, independent authentication, isolated storage:

select_role("provider");
browser_navigate("/provider/products/new");
browser_click("e1");  // Add product form

select_role("customer");
browser_navigate("/products");
browser_snapshot();  // Customer sees new product immediately

select_role("provider");
browser_navigate("/provider/products/edit");
browser_click("e7");  // Update pricing

select_role("customer");
browser_navigate("/products");
browser_snapshot();  // Changes reflected instantly

The architectural difference:

Single conversation thread, natural narrative flow, browser-level isolation handled at the protocol level. All actions logged in one unified session trace. No server process juggling, no code stream assembly, no manual state synchronization.

The key is that select_role() switches the active CDP session within the same MCP server—not switching between separate server instances. Each role maintains:

  • Independent authentication state (separate auth files)
  • Isolated storage (cookies, localStorage, sessionStorage)
  • Dedicated CDP session for commands
  • Separate accessibility snapshots and element maps
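Here's a rough sketch of how that per-role wiring can look with Puppeteer (the role names and structure are illustrative, not the actual Verdex implementation; newer Puppeteer versions spell the context factory browser.createBrowserContext()):

const roles = new Map();  // role name -> { context, page, session }

async function addRole(browser, name) {
  // Each role gets its own incognito context: separate cookies, storage, and auth
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  // ...and a dedicated CDP session for protocol-level commands
  const session = await page.target().createCDPSession();
  roles.set(name, { context, page, session });
}

let active;
function selectRole(name) {
  active = roles.get(name);  // subsequent commands route to this role's page and session
}

await addRole(browser, 'provider');
await addRole(browser, 'customer');
selectRole('provider');
await active.page.goto('https://example.com/provider/products/new');  // example URL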

Multi-role e2e tests went from tedious orchestration across multiple MCP instances to straightforward linear authoring. Tests that would take an hour of careful coordination took minutes, and they were more reliable because the isolation was handled at the browser protocol level rather than through multi-server configuration management.

Progressive Disclosure and Token Efficiency

Verdex's progressive disclosure design—returning targeted results per query instead of dumping entire DOM trees—depends on complete control over what gets serialized across the protocol boundary.

With direct CDP access, I control the entire serialization pipeline:

// Bridge runs entirely in the isolated world
window.verdexBridge = {
  get_ancestors(el) {
    // Synchronous DOM traversal, keeping only structural facts
    const describe = (node) => ({
      tag: node.tagName.toLowerCase(),
      testId: node.getAttribute('data-testid'),
      role: node.getAttribute('role')
    });
    const ancestors = [];
    for (let node = el.parentElement; node && node !== document.documentElement; node = node.parentElement) {
      // Custom filtering for stable containers could go here
      ancestors.push(describe(node));
    }
    // Minimal JSON output
    return { target: describe(el), ancestors };
  }
};

The bridge performs multi-step DOM analysis in-browser and returns only the structural facts needed for selector construction. This keeps computation close to the data and minimizes serialization costs.
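For a sense of scale, a single get_ancestors response might look something like this (a hypothetical payload shape matching the sketch above, a few hundred bytes instead of a full DOM dump):

{
  "target":    { "tag": "button", "testId": null, "role": null },
  "ancestors": [
    { "tag": "div",     "testId": "product-card", "role": null },
    { "tag": "section", "testId": "product-grid", "role": "list" }
  ]
}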

The Cross-Browser Constraint Doesn't Apply

Playwright's entire value proposition is cross-browser parity. Every feature needs to work consistently across Chromium, Firefox, and WebKit. This is the right design for a test runner.

Verdex is explicitly a development-time tool. You author tests with Verdex during development, then execute those tests with Playwright across all browsers in CI. The output is standard Playwright code that runs anywhere.
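For instance, a generated spec is just a plain @playwright/test file (assuming a configured baseURL). The assertion and the cart-count test id below are hypothetical additions for illustration; the selector chain is the one constructed earlier in this post.

import { test, expect } from '@playwright/test';

test('customer can add the iPhone 15 Pro to the cart', async ({ page }) => {
  await page.goto('/products');
  const card = page.getByTestId('product-card').filter({ hasText: 'iPhone 15 Pro' });
  await card.getByRole('button', { name: 'Add to Cart' }).click();
  await expect(page.getByTestId('cart-count')).toHaveText('1');  // hypothetical assertion
});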

What About Playwright MCP?

Playwright MCP serves a fundamentally different purpose. Understanding this distinction requires looking at how each tool actually works with the browser.

The Critical Difference: Runtime Execution vs Authoring Analysis

Playwright MCP: A runtime execution tool that gives AI agents direct control over live browser sessions for immediate tasks—web scraping, form filling, test execution, automation. The agent issues commands and Playwright MCP executes them in real-time.

Verdex: An authoring-time analysis tool. It doesn't execute tests or care how semantic your HTML is; it is focused solely on helping you write better test code, faster.

When you use Verdex, you first navigate using Verdex's primitives:

browser_navigate("https://example.com")  // Verdex navigation
browser_snapshot()                        // Get accessibility tree with refs
get_ancestors("e3")                      // Explore DOM structure
get_siblings("e3", 2)                    // Examine repeating patterns

Those ref values (like e3) come from Verdex's accessibility snapshot generator, which creates stable references to DOM nodes and maintains a Map<string, ElementInfo>. This mapping enables structural exploration—get_ancestors(ref) can walk up the DOM tree because it has direct access to the underlying element.
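Under the hood, that mapping can be as simple as a ref-to-objectId table (the ElementInfo shape here is an assumption for illustration, not Verdex's actual source):

// ref -> element info captured when the snapshot was generated
const elementMap = new Map();  // e.g. 'e3' -> { objectId, role, name }

async function getAncestors(session, ref) {
  const { objectId } = elementMap.get(ref);
  // The stored objectId lets every follow-up query operate on the exact same DOM node
  const { result } = await session.send('Runtime.callFunctionOn', {
    objectId,
    functionDeclaration: 'function() { return window.verdexBridge.get_ancestors(this); }',
    returnByValue: true
  });
  return result.value;
}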

As of current public documentation, Playwright MCP does not offer primitives to traverse DOM structure. The maintainers closed a request for exposing the raw DOM tree as "not planned." The architecture is fundamentally different—Playwright MCP's refs are for immediate interaction, not structural analysis.

Why Accessibility Snapshots Work for Different Purposes

For Playwright MCP (runtime execution):

button "Add to Cart" [ref=e3]

That's all the agent needs to click the button or extract data.

For Verdex (test authoring):

// Start with the same accessibility snapshot
button "Add to Cart" [ref=e3]

// But then explore structure to write durable selectors
get_ancestors("e3")       // Discovers the product-card container
get_siblings("e3", 2)     // Confirms it's one of many cards
get_descendants("e3", 1)  // Finds unique anchors inside the card

// Result: Container-scoped selector that survives layout changes
getByTestId("product-card")
  .filter({ hasText: "iPhone 15 Pro" })
  .getByRole("button", { name: "Add to Cart" })

Element Lifecycle: Two Philosophies

Playwright MCP (Ephemeral References):

Snapshot 1 → ref=e3 → Element A → Click happens
Snapshot 2 → ref=e3 → Element B (possibly different!)

Refs regenerate each time, preventing stale element bugs but making structural analysis impractical.

Verdex (Persistent References During Authoring):

Snapshot → ref=e3 → CDP objectId → Element A
  ↓
get_ancestors(e3) → Same objectId → Walk parent chain
  ↓
get_siblings(e3, 2) → Same objectId → Analyze siblings
  ↓
get_descendants(e3, 1) → Same objectId → Explore children

Refs remain stable throughout the authoring session. The final test code uses Playwright's standard Locators.

Why Issue #103 Was Closed

Issue #103 in the Playwright MCP repository requested DOM visibility features, but it was closed as "not planned" after the maintainer tested full HTML output on real sites.

From the issue:

"I played around with this, but in my tests on github.com and microsoft.com, the HTML was way too much text for the LLM to make sense of it. In fact, it repeatedly requested the HTML again, I assume because it couldn't comprehend it." — @skn0tt

This is the exact information overload problem I encountered building Verdex. Full DOM dumps on complex pages produce 50k+ tokens that degrade LLM accuracy—the model hallucinates elements, loses context mid-response, and can't distinguish signal from noise.

The architectural constraint: While full HTML dumps don't work, progressive structural exploration isn't compatible with Playwright MCP's current design either. The accessibility snapshot approach—which generates ephemeral refs tied to specific snapshot moments—creates fundamental barriers:

  • No stable references across queries: Each snapshot generates new refs. If you call browser_snapshot() and get ref=e3, then the page updates and you call browser_snapshot() again, that same button might now be ref=e7. You can't call get_ancestors(e3) and then get_siblings(e3) on the same element because e3 only exists within the context of one snapshot.

  • No persistent ref-to-DOM mapping: The accessibility snapshot provides clean semantic views but doesn't maintain the underlying mapping from refs to DOM nodes needed for structural traversal. To answer "what are the ancestors of e3?", you'd need to track which DOM node e3 represents—but refs are intentionally disposable to prevent staleness.

  • Cross-browser constraints: Adding browser-specific CDP features like persistent objectId tracking or isolated world DOM traversal would undermine Playwright's core value proposition of cross-browser uniformity.

Adding structural exploration to Playwright MCP would require fundamental architectural changes:

  1. Persistent ref lifecycle: Refs would need to survive across queries and map to live DOM nodes, contradicting the ephemeral, snapshot-based model designed to prevent staleness

  2. Specific CDP features: Structural primitives like ancestor traversal and sibling inspection require CDP objectId tracking in isolated execution contexts—features that don't exist in Firefox or WebKit

  3. Different abstraction layer: The accessibility-first view intentionally hides structural details (like data-testid attributes and container hierarchies) that test authors need for writing stable selectors

The Complementary Relationship

This creates a natural division of labor:

Use Playwright MCP for:

  • Real-time browser control and automation
  • Web scraping and data extraction
  • Form filling and interaction
  • Exploratory testing and debugging

Use Verdex for:

  • Authoring complex multi-role e2e test flows
  • Creating durable, refactor-resistant selectors
  • Understanding DOM structure and container relationships
  • Managing multiple authenticated sessions with proper isolation

Execute with Playwright's test runner for:

  • Cross-browser reliability in CI
  • Parallel test execution
  • Comprehensive reporting and debugging tools

The tools solve different problems at different stages of the testing workflow.

The Bottom Line: Foundation Matches Purpose

This isn't about capability—Playwright's newCDPSession() provides the same CDP access that Verdex uses. It's about directness and architectural fit.

Playwright abstracts away browser protocols to provide cross-browser uniformity—write once, test everywhere. This abstraction layer is Playwright's superpower for test execution.

Verdex builds on browser protocols to provide specific authoring intelligence—deep structural exploration that helps you write better selectors during development. Working at the CDP layer directly means no impedance mismatches, no fighting auto-disposal semantics, no bridging between execution contexts.

The 2MB footprint, CDP-native APIs, and element lifecycle semantics all align with authoring-time exploration rather than test-time reliability.

You use Verdex during development to write complex e2e tests with better selectors, then execute those tests with Playwright in CI. The refs, structural analysis, and multi-role contexts exist only during authoring—your final test code is pure Playwright, running anywhere Playwright runs.

Different stages of the workflow, different architectural foundations, designed to complement each other.
