DEV Community

Rakibul Yeasin


Browser Automation Protocols: CDP vs WebDriver Deep Dive

A technical perspective on browser automation internals, protocol architectures, and when to use what.


Table of Contents

  1. The Two Protocols
  2. Architecture Comparison
  3. WebDriver Protocol (W3C)
  4. Chrome DevTools Protocol (CDP)
  5. Head-to-Head Comparison
  6. When to Use What
  7. Our CDP Implementation
  8. Production Examples
  9. Final Thoughts

The Two Protocols

Browser automation comes down to two fundamental approaches:

WebDriver - W3C standardized, cross-browser, high-level abstraction over HTTP REST.

CDP - Chrome's native debugging protocol, WebSocket-based, low-level access to browser internals.

Both solve browser automation. Neither is universally "better." Your use case dictates the choice.


Architecture Comparison

WebDriver Architecture

┌──────────────┐    HTTP/REST    ┌────────────────┐    Native    ┌─────────────┐
│    Client    │ ◄─────────────► │     Driver     │ ◄──────────► │   Browser   │
│  (Selenium)  │    Port 4444    │ (chromedriver, │   Protocol   │   (Chrome)  │
└──────────────┘                 │  geckodriver)  │              └─────────────┘
                                 └────────────────┘

Three-tier model:

  1. Client library sends HTTP requests
  2. Driver binary translates to browser-native calls
  3. Browser executes and responds

The middleman tax: Every command pays HTTP overhead + driver process latency.

CDP Architecture

┌──────────────┐    WebSocket    ┌─────────────┐
│    Client    │ ◄────────────► │   Browser   │
│   (Direct)   │   Port 9222    │   (Chrome)  │
└──────────────┘                └─────────────┘

Two-tier model:

  1. Client connects directly to browser
  2. Persistent WebSocket, bidirectional streaming

No middleman. Direct protocol access. Events pushed in real-time.


WebDriver Protocol (W3C)

Overview

WebDriver has been a W3C Recommendation since 2018. It defines a REST-style HTTP API for browser automation, with a focus on cross-browser compatibility.

Transport

HTTP/REST with JSON payloads:

POST /session HTTP/1.1
Content-Type: application/json

{
  "capabilities": {
    "alwaysMatch": {
      "browserName": "chrome",
      "browserVersion": "120"
    }
  }
}

Session Lifecycle

# Create session
POST /session
→ {"value": {"sessionId": "abc123", "capabilities": {...}}}

# All subsequent commands use session ID
POST /session/abc123/url
GET  /session/abc123/title
POST /session/abc123/element
DELETE /session/abc123
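
The lifecycle above follows one rule: after session creation, every command is a path under /session/{id}. Here is a minimal Rust sketch of that mapping. The `Command` enum and `endpoint` function are hypothetical names used for illustration only and do not come from any WebDriver client library.

```rust
/// Hypothetical command set covering the lifecycle calls listed above.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    NewSession,
    Navigate,
    GetTitle,
    FindElement,
    DeleteSession,
}

/// Returns the HTTP method and path for a command.
/// `session_id` is ignored for `NewSession`, which has no session yet.
fn endpoint(cmd: &Command, session_id: &str) -> (&'static str, String) {
    match cmd {
        Command::NewSession => ("POST", "/session".to_string()),
        Command::Navigate => ("POST", format!("/session/{session_id}/url")),
        Command::GetTitle => ("GET", format!("/session/{session_id}/title")),
        Command::FindElement => ("POST", format!("/session/{session_id}/element")),
        Command::DeleteSession => ("DELETE", format!("/session/{session_id}")),
    }
}

fn main() {
    let lifecycle = [
        Command::NewSession,
        Command::Navigate,
        Command::GetTitle,
        Command::FindElement,
        Command::DeleteSession,
    ];
    // Print the method and path each step would use for session "abc123".
    for cmd in &lifecycle {
        let (method, path) = endpoint(cmd, "abc123");
        println!("{method:<6} {path}");
    }
}
```

Because the session ID is part of the path, the driver can stay stateless per HTTP request, but the client must carry that ID through every call.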

Core Endpoints

Endpoint                          | Method | Purpose
/session                          | POST   | Create new session
/session/{id}                     | DELETE | End session
/session/{id}/url                 | POST   | Navigate to URL
/session/{id}/url                 | GET    | Get current URL
/session/{id}/title               | GET    | Get page title
/session/{id}/element             | POST   | Find element
/session/{id}/element/{eid}/click | POST   | Click element
/session/{id}/element/{eid}/value | POST   | Send keys
/session/{id}/screenshot          | GET    | Capture screenshot
/session/{id}/execute/sync        | POST   | Execute JS

Element Location Strategies

{
  "using": "css selector",
  "value": "button.submit"
}

Supported locators:

  • css selector
  • link text
  • partial link text
  • tag name
  • xpath
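
As a rough illustration of how a client might turn these strategies into the request body shown above, here is a small Rust sketch. `Strategy`, `json_escape`, and `locator_body` are hypothetical helpers written for this example. A real client would normally use a JSON library instead of hand-built strings.

```rust
/// The five locator strategies listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    CssSelector,
    LinkText,
    PartialLinkText,
    TagName,
    XPath,
}

impl Strategy {
    /// Wire name sent in the "using" field.
    fn as_str(self) -> &'static str {
        match self {
            Strategy::CssSelector => "css selector",
            Strategy::LinkText => "link text",
            Strategy::PartialLinkText => "partial link text",
            Strategy::TagName => "tag name",
            Strategy::XPath => "xpath",
        }
    }
}

/// Escapes the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the request body for POST /session/{id}/element.
fn locator_body(strategy: Strategy, value: &str) -> String {
    format!(
        r#"{{"using": "{}", "value": "{}"}}"#,
        strategy.as_str(),
        json_escape(value)
    )
}

fn main() {
    println!("{}", locator_body(Strategy::CssSelector, "button.submit"));
    // Quotes inside an XPath expression must be escaped in the JSON body.
    println!("{}", locator_body(Strategy::XPath, r#"//a[@href="/login"]"#));
}
```

The escaping step matters mostly for XPath and link-text locators, which often contain quotes.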

Example: Complete Flow

# 1. Create session
curl -X POST http://localhost:4444/session \
  -H "Content-Type: application/json" \
  -d '{"capabilities": {"alwaysMatch": {"browserName": "chrome"}}}'

# Response: {"value": {"sessionId": "xyz789", ...}}

# 2. Navigate
curl -X POST http://localhost:4444/session/xyz789/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# 3. Find element
curl -X POST http://localhost:4444/session/xyz789/element \
  -H "Content-Type: application/json" \
  -d '{"using": "css selector", "value": "h1"}'

# Response: {"value": {"element-6066-...": "element-id-123"}}

# 4. Get text
curl http://localhost:4444/session/xyz789/element/element-id-123/text

# Response: {"value": "Example Domain"}

# 5. Screenshot
curl http://localhost:4444/session/xyz789/screenshot

# Response: {"value": "iVBORw0KGgo...base64..."}

# 6. Cleanup
curl -X DELETE http://localhost:4444/session/xyz789

Limitations

  1. No network interception - Can't inspect/modify HTTP traffic
  2. No console access - Can't capture console.log output
  3. No performance metrics - No access to rendering/memory data
  4. No real-time events - Polling only, no push notifications
  5. Driver dependency - Requires separate driver binary per browser
  6. Version coupling - Driver version must match browser version
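
Limitation 4 has a practical consequence: with no push events, a client waits for a condition by re-sending a command until the answer changes. Below is a minimal sketch of that loop. `poll_until` is a hypothetical helper, and the closure stands in for an HTTP round trip such as GET /session/{id}/title.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Repeatedly calls `check` until it returns true or `timeout` elapses.
/// Returns the number of attempts made on success, or None on timeout.
fn poll_until<F>(mut check: F, interval: Duration, timeout: Duration) -> Option<u32>
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if check() {
            return Some(attempts);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}

fn main() {
    // Stand-in for a WebDriver round trip: the "page" becomes ready on the third poll.
    let mut calls = 0;
    let ready = poll_until(
        || {
            calls += 1;
            calls >= 3
        },
        Duration::from_millis(10),
        Duration::from_secs(1),
    );
    println!("ready after {ready:?} polls");
}
```

Each poll costs a full HTTP round trip through the driver, so the polling interval trades reaction time against command overhead.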

Chrome DevTools Protocol (CDP)

Overview

CDP is Chrome's native debugging protocol, the same one DevTools uses internally. It gives direct access to 61 domains (the exact count varies by Chrome version) that cover most browser capabilities.

Transport

Bidirectional WebSocket carrying JSON-RPC-style messages (CDP omits the "jsonrpc" version field, so it is not strict JSON-RPC 2.0):

Client                                Browser
   │                                     │
   │──── {"id":1,"method":"Page.navigate", ───►
   │      "params":{"url":"..."}}        │
   │                                     │
   │◄─── {"id":1,"result":{"frameId":...}} ───
   │                                     │
   │◄─── {"method":"Page.loadEventFired", ────
   │      "params":{"timestamp":...}}    │
   │                                     │

Three message types:

  • Request: Client → Browser (has id + method)
  • Response: Browser → Client (has id + result/error)
  • Event: Browser → Client (has method only, no id)
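
The three rules above can be sketched as a small classifier. `Frame` and `Kind` are hypothetical types for this example. A real client would fill `Frame` by deserializing the WebSocket text frame, which is omitted here to keep the sketch dependency-free.

```rust
/// The fields of a CDP frame that matter for classification.
#[derive(Debug, Default)]
struct Frame {
    id: Option<u64>,
    method: Option<String>,
    has_result: bool,
    has_error: bool,
}

#[derive(Debug, PartialEq)]
enum Kind {
    Request,
    Response,
    Event,
    Malformed,
}

/// Applies the rules from the list above:
/// id + method = request, id + result/error = response, method only = event.
fn classify(frame: &Frame) -> Kind {
    let has_outcome = frame.has_result || frame.has_error;
    match (frame.id, frame.method.is_some(), has_outcome) {
        (Some(_), true, false) => Kind::Request,
        (Some(_), false, true) => Kind::Response,
        (None, true, false) => Kind::Event,
        _ => Kind::Malformed,
    }
}

fn main() {
    let request = Frame { id: Some(1), method: Some("Page.navigate".into()), ..Default::default() };
    let response = Frame { id: Some(1), has_result: true, ..Default::default() };
    let event = Frame { method: Some("Page.loadEventFired".into()), ..Default::default() };
    for frame in [&request, &response, &event] {
        println!("{:?} -> {:?}", frame, classify(frame));
    }
}
```

On the client side only responses and events ever arrive, so in practice the check reduces to whether an `id` is present.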

Domain Organization

CDP organizes into domains. Each domain has methods and events.

Core domains:

Domain       | Methods | Events | Purpose
Page         | 25+     | 15+    | Navigation, lifecycle, screenshots
Runtime      | 20+     | 10+    | JS execution, console
DOM          | 30+     | 10+    | Document structure
Network      | 15+     | 20+    | HTTP traffic
Input        | 5+      | 0      | Mouse, keyboard, touch
Emulation    | 20+     | 0      | Device simulation
Target       | 15+     | 5+     | Tab/window management
Debugger     | 25+     | 10+    | JS debugging
Profiler     | 10+     | 5+     | CPU profiling
HeapProfiler | 10+     | 5+     | Memory profiling

HTTP Discovery Endpoints

Before WebSocket, discover targets via HTTP:

# List all debuggable targets
curl http://localhost:9222/json/list
[
  {
    "id": "ABC123",
    "type": "page",
    "title": "New Tab",
    "url": "chrome://newtab/",
    "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/ABC123"
  }
]

# Browser version
curl http://localhost:9222/json/version
{
  "Browser": "Chrome/120.0.0.0",
  "Protocol-Version": "1.3",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/XYZ"
}

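# Create new tab (Chrome 111+ rejects GET here and requires PUT; quote the URL so the shell leaves "?" alone)
curl -X PUT "http://localhost:9222/json/new?https://example.com"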

# Close tab
curl http://localhost:9222/json/close/ABC123
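
The `webSocketDebuggerUrl` values returned above follow a predictable shape: host, port, then /devtools/{page|browser}/{id}. Here is a minimal sketch of splitting one apart with the standard library only. `DebuggerUrl` and `parse_debugger_url` are hypothetical names, and the parser assumes the plain `ws://` form shown in the responses.

```rust
/// Parts of a `webSocketDebuggerUrl` such as
/// ws://localhost:9222/devtools/page/ABC123
#[derive(Debug, PartialEq)]
struct DebuggerUrl {
    host: String,
    port: u16,
    kind: String,   // "page" or "browser"
    target: String, // target or browser id
}

/// Returns None when the URL does not match ws://host:port/devtools/kind/id.
fn parse_debugger_url(url: &str) -> Option<DebuggerUrl> {
    let rest = url.strip_prefix("ws://")?;
    let (authority, path) = rest.split_once('/')?;
    let (host, port) = authority.rsplit_once(':')?;
    let mut segments = path.split('/');
    if segments.next()? != "devtools" {
        return None;
    }
    let kind = segments.next()?;
    let target = segments.next()?;
    // Reject an empty id or any trailing path segments.
    if target.is_empty() || segments.next().is_some() {
        return None;
    }
    Some(DebuggerUrl {
        host: host.to_string(),
        port: port.parse().ok()?,
        kind: kind.to_string(),
        target: target.to_string(),
    })
}

fn main() {
    let url = "ws://localhost:9222/devtools/page/ABC123";
    match parse_debugger_url(url) {
        Some(parts) => println!("{parts:?}"),
        None => println!("not a debugger URL: {url}"),
    }
}
```

The `kind` segment is what separates a page-level connection from a browser-level one, which matters when deciding which domains are available on the socket.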

Protocol Examples

1. Navigation

// Enable Page domain first
 {"id": 1, "method": "Page.enable"}
 {"id": 1, "result": {}}

// Navigate
 {"id": 2, "method": "Page.navigate", "params": {"url": "https://example.com"}}
 {"id": 2, "result": {"frameId": "ABC", "loaderId": "XYZ"}}

// Events fired automatically
 {"method": "Page.frameStartedLoading", "params": {"frameId": "ABC"}}
 {"method": "Page.loadEventFired", "params": {"timestamp": 1234.56}}
 {"method": "Page.frameStoppedLoading", "params": {"frameId": "ABC"}}
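
Since responses and events share one socket, a client needs to match each response to the request that caused it. A minimal sketch of that bookkeeping follows. `Router` and `Incoming` are hypothetical types for illustration and are not taken from any particular CDP client.

```rust
use std::collections::HashMap;

/// Incoming frame, already reduced to what routing needs.
#[derive(Debug)]
enum Incoming {
    Response { id: u64 },
    Event { method: String },
}

/// Tracks in-flight requests so responses can be matched by id.
#[derive(Default)]
struct Router {
    next_id: u64,
    pending: HashMap<u64, String>, // id -> method that was sent
}

impl Router {
    /// Allocates a fresh id and records the request as in flight.
    fn send(&mut self, method: &str) -> u64 {
        self.next_id += 1;
        self.pending.insert(self.next_id, method.to_string());
        self.next_id
    }

    /// Describes what an incoming frame resolves to.
    fn route(&mut self, frame: Incoming) -> String {
        match frame {
            Incoming::Response { id } => match self.pending.remove(&id) {
                Some(method) => format!("response to {method} (id {id})"),
                None => format!("unexpected response id {id}"),
            },
            Incoming::Event { method } => format!("event {method}"),
        }
    }
}

fn main() {
    let mut router = Router::default();
    let enable = router.send("Page.enable");
    let nav = router.send("Page.navigate");

    // Events may arrive between responses; only responses consume a pending id.
    for frame in [
        Incoming::Response { id: enable },
        Incoming::Response { id: nav },
        Incoming::Event { method: "Page.frameStartedLoading".into() },
        Incoming::Event { method: "Page.loadEventFired".into() },
    ] {
        println!("{}", router.route(frame));
    }
    println!("still pending: {}", router.pending.len());
}
```

An async client would typically store a channel sender in the map instead of the method name, so the caller awaiting that request can be woken with the result.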

2. JavaScript Evaluation

 {"id": 1, "method": "Runtime.enable"}
 {"id": 1, "result": {}}

 {"id": 2, "method": "Runtime.evaluate", "params": {
    "expression": "document.title",
    "returnByValue": true
  }}
 {"id": 2, "result": {
    "result": {"type": "string", "value": "Example Domain"}
  }}

// Complex evaluation
 {"id": 3, "method": "Runtime.evaluate", "params": {
    "expression": "(() => { return {width: window.innerWidth, height: window.innerHeight}; })()",
    "returnByValue": true
  }}
 {"id": 3, "result": {
    "result": {"type": "object", "value": {"width": 1920, "height": 1080}}
  }}

3. DOM Operations

 {"id": 1, "method": "DOM.enable"}
 {"id": 1, "result": {}}

 {"id": 2, "method": "DOM.getDocument", "params": {"depth": 0}}
 {"id": 2, "result": {
    "root": {"nodeId": 1, "nodeName": "#document", "childNodeCount": 2}
  }}

 {"id": 3, "method": "DOM.querySelector", "params": {"nodeId": 1, "selector": "h1"}}
 {"id": 3, "result": {"nodeId": 42}}

 {"id": 4, "method": "DOM.getOuterHTML", "params": {"nodeId": 42}}
 {"id": 4, "result": {"outerHTML": "<h1>Example Domain</h1>"}}

4. Network Interception

 {"id": 1, "method": "Network.enable"}
 {"id": 1, "result": {}}

// Events stream automatically
 {"method": "Network.requestWillBeSent", "params": {
    "requestId": "req-1",
    "request": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "headers": {"Accept": "application/json"}
    },
    "timestamp": 1234.56,
    "type": "XHR"
  }}

 {"method": "Network.responseReceived", "params": {
    "requestId": "req-1",
    "response": {
      "status": 200,
      "statusText": "OK",
      "headers": {"content-type": "application/json"},
      "mimeType": "application/json"
    }
  }}

 {"method": "Network.loadingFinished", "params": {
    "requestId": "req-1",
    "encodedDataLength": 1234
  }}

// Get response body (this trace only observes traffic; pausing, modifying, or blocking requests uses the Fetch domain)
 {"id": 2, "method": "Network.getResponseBody", "params": {"requestId": "req-1"}}
 {"id": 2, "result": {"body": "{\"data\": [...]}", "base64Encoded": false}}
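
The three events in this trace share a `requestId`, and the body is generally only complete once `Network.loadingFinished` has arrived. Here is a minimal sketch of correlating them. `NetEvent`, `Exchange`, and `Tracker` are hypothetical types, and the events are assumed to be already parsed.

```rust
use std::collections::HashMap;

/// The three Network events from the trace above, reduced to the fields used here.
enum NetEvent<'a> {
    RequestWillBeSent { request_id: &'a str, url: &'a str },
    ResponseReceived { request_id: &'a str, status: u16 },
    LoadingFinished { request_id: &'a str },
}

/// Everything known so far about one request/response pair.
#[derive(Debug, Default, PartialEq)]
struct Exchange {
    url: String,
    status: Option<u16>,
    finished: bool,
}

#[derive(Default)]
struct Tracker {
    exchanges: HashMap<String, Exchange>,
}

impl Tracker {
    /// Applies one event; returns the request id once its body can be fetched.
    fn apply(&mut self, event: NetEvent<'_>) -> Option<String> {
        match event {
            NetEvent::RequestWillBeSent { request_id, url } => {
                let exchange = Exchange { url: url.to_string(), ..Default::default() };
                self.exchanges.insert(request_id.to_string(), exchange);
                None
            }
            NetEvent::ResponseReceived { request_id, status } => {
                if let Some(exchange) = self.exchanges.get_mut(request_id) {
                    exchange.status = Some(status);
                }
                None
            }
            NetEvent::LoadingFinished { request_id } => {
                let exchange = self.exchanges.get_mut(request_id)?;
                exchange.finished = true;
                Some(request_id.to_string())
            }
        }
    }
}

fn main() {
    let mut tracker = Tracker::default();
    let events = [
        NetEvent::RequestWillBeSent { request_id: "req-1", url: "https://example.com/api/data" },
        NetEvent::ResponseReceived { request_id: "req-1", status: 200 },
        NetEvent::LoadingFinished { request_id: "req-1" },
    ];
    for event in events {
        if let Some(id) = tracker.apply(event) {
            // This is the point where Network.getResponseBody would be sent for `id`.
            println!("body ready for {id}: {:?}", tracker.exchanges[&id]);
        }
    }
}
```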

5. Screenshots

 {"id": 1, "method": "Page.captureScreenshot", "params": {
    "format": "png",
    "fromSurface": true
  }}
 {"id": 1, "result": {"data": "iVBORw0KGgoAAAANSUhEUgAAA..."}}

// Full page screenshot
 {"id": 2, "method": "Page.captureScreenshot", "params": {
    "format": "png",
    "captureBeyondViewport": true
  }}

// Specific region
 {"id": 3, "method": "Page.captureScreenshot", "params": {
    "format": "jpeg",
    "quality": 80,
    "clip": {"x": 0, "y": 0, "width": 800, "height": 600, "scale": 1}
  }}
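
The `data` field comes back as base64, so it has to be decoded before it can be written to disk. The sketch below uses a hand-rolled decoder to stay dependency-free, and both function names are hypothetical. A real project would more likely use the `base64` crate. It also checks the PNG signature, which is why PNG payloads start with `iVBORw0KGgo`.

```rust
/// Decodes standard base64 (RFC 4648 alphabet, optional '=' padding).
/// Returns None on any character outside the alphabet.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() / 4 * 3);
    let mut buffer: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding: no more data follows
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Every PNG file starts with this 8-byte signature.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

fn looks_like_png(bytes: &[u8]) -> bool {
    bytes.starts_with(&PNG_SIGNATURE)
}

fn main() {
    // Prefix of the screenshot payload shown above, padded to decode on its own.
    let bytes = base64_decode("iVBORw0KGgo=").expect("valid base64");
    println!("png: {}", looks_like_png(&bytes));
}
```

Checking the signature before saving is a cheap guard against writing an error payload to a `.png` file.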

6. Input Simulation

// Mouse click
 {"id": 1, "method": "Input.dispatchMouseEvent", "params": {
    "type": "mousePressed",
    "x": 100, "y": 200,
    "button": "left",
    "clickCount": 1
  }}
 {"id": 1, "result": {}}

 {"id": 2, "method": "Input.dispatchMouseEvent", "params": {
    "type": "mouseReleased",
    "x": 100, "y": 200,
    "button": "left",
    "clickCount": 1
  }}

// Type text
 {"id": 3, "method": "Input.insertText", "params": {"text": "Hello World"}}

// Key press
 {"id": 4, "method": "Input.dispatchKeyEvent", "params": {
    "type": "keyDown",
    "key": "Enter",
    "code": "Enter",
    "windowsVirtualKeyCode": 13
  }}
 {"id": 5, "method": "Input.dispatchKeyEvent", "params": {
    "type": "keyUp",
    "key": "Enter",
    "code": "Enter"
  }}
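
A click is always a press/release pair at the same coordinates, so it is natural to generate both messages together. Here is a minimal sketch. `click_messages` is a hypothetical helper, and it assumes finite coordinates, because NaN or infinity would not serialize to valid JSON.

```rust
/// Builds the two Input.dispatchMouseEvent requests that make up a left click.
/// `first_id` is the message id of the press; the release uses `first_id + 1`.
fn click_messages(first_id: u64, x: f64, y: f64) -> [String; 2] {
    // Both events share coordinates, button, and click count; only id and type differ.
    let event = |id: u64, kind: &str| {
        format!(
            r#"{{"id": {id}, "method": "Input.dispatchMouseEvent", "params": {{"type": "{kind}", "x": {x}, "y": {y}, "button": "left", "clickCount": 1}}}}"#
        )
    };
    [
        event(first_id, "mousePressed"),
        event(first_id + 1, "mouseReleased"),
    ]
}

fn main() {
    for message in click_messages(1, 100.0, 200.0) {
        println!("{message}");
    }
}
```

Unlike WebDriver's element click, these events target raw viewport coordinates, so the caller has to resolve the element's position first.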

7. Console Capture

 {"id": 1, "method": "Runtime.enable"}
 {"id": 1, "result": {}}

// Console events stream automatically
 {"method": "Runtime.consoleAPICalled", "params": {
    "type": "log",
    "args": [{"type": "string", "value": "Hello from page"}],
    "timestamp": 1234567890.123
  }}

 {"method": "Runtime.consoleAPICalled", "params": {
    "type": "error",
    "args": [{"type": "string", "value": "Something went wrong"}],
    "stackTrace": {...}
  }}

8. Performance Metrics

 {"id": 1, "method": "Performance.enable"}
 {"id": 1, "result": {}}

 {"id": 2, "method": "Performance.getMetrics"}
 {"id": 2, "result": {
    "metrics": [
      {"name": "Timestamp", "value": 1234.56},
      {"name": "Documents", "value": 1},
      {"name": "Frames", "value": 1},
      {"name": "JSEventListeners", "value": 42},
      {"name": "Nodes", "value": 150},
      {"name": "LayoutCount", "value": 3},
      {"name": "RecalcStyleCount", "value": 5},
      {"name": "JSHeapUsedSize", "value": 10485760},
      {"name": "JSHeapTotalSize", "value": 16777216}
    ]
  }}
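
The metrics arrive as a flat list of name/value pairs, so a derived figure such as heap utilization takes one lookup per metric. A small sketch using the values from the response above follows. `metric` and `heap_utilization` are hypothetical helpers.

```rust
/// Looks up a metric by name in the (name, value) pairs from Performance.getMetrics.
fn metric(metrics: &[(&str, f64)], name: &str) -> Option<f64> {
    metrics.iter().find(|(n, _)| *n == name).map(|(_, v)| *v)
}

/// Fraction of the allocated JS heap currently in use, if both metrics are present.
fn heap_utilization(metrics: &[(&str, f64)]) -> Option<f64> {
    let used = metric(metrics, "JSHeapUsedSize")?;
    let total = metric(metrics, "JSHeapTotalSize")?;
    (total > 0.0).then(|| used / total)
}

fn main() {
    // Values copied from the Performance.getMetrics response above.
    let metrics = [
        ("Nodes", 150.0),
        ("JSHeapUsedSize", 10_485_760.0),
        ("JSHeapTotalSize", 16_777_216.0),
    ];
    if let Some(ratio) = heap_utilization(&metrics) {
        println!("JS heap in use: {:.1}%", ratio * 100.0);
    }
}
```

With the sample values, 10,485,760 / 16,777,216 = 0.625, so 10 MiB of a 16 MiB heap is in use.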

Head-to-Head Comparison

Protocol Level

Aspect         | WebDriver                 | CDP
Specification  | W3C Standard              | Chrome Internal
Transport      | HTTP REST                 | WebSocket
Connection     | Request/Response          | Persistent + Events
Latency        | Higher (HTTP per command) | Lower (single WS)
Message Format | JSON over HTTP            | JSON-RPC over WS

Architecture

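Aspect           | WebDriver                 | CDP
Components       | Client + Driver + Browser | Client + Browser
Driver Required  | Yes (chromedriver, etc.)  | No
Version Coupling | Driver ↔ Browser tight    | Protocol versioned
Port             | 4444 (Selenium Grid and geckodriver default; chromedriver defaults to 9515) | 9222 (by convention, set with --remote-debugging-port)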

Capabilities

Feature               | WebDriver | CDP
Navigation            | ✅        | ✅
Element Interaction   | ✅        | ✅
JavaScript Execution  | ✅        | ✅
Screenshots           | ✅        | ✅
Cookies               | ✅        | ✅
Network Interception  | ❌        | ✅
Console Access        | ❌        | ✅
Performance Metrics   | ❌        | ✅
Real-time Events      | ❌        | ✅
DOM Debugging         | ❌        | ✅
CPU Profiling         | ❌        | ✅
Memory Profiling      | ❌        | ✅
Geolocation Emulation | Limited   | ✅
Device Emulation      | Limited   | ✅
Request Blocking      | ❌        | ✅

Browser Support

Browser | WebDriver | CDP
Chrome  | ✅        | ✅
Edge    | ✅        | ✅ (Chromium)
Firefox | ✅        | Partial
Safari  | ✅        | ❌
Opera   | ✅        | ✅ (Chromium)

Ecosystem

Tool       | WebDriver | CDP
Selenium   | Primary   | Via BiDi
Puppeteer  | ❌        | Primary
Playwright | ❌        | Chromium only (custom protocols for Firefox and WebKit)
Cypress    | ❌        | Primary

When to Use What

Use WebDriver When:

  1. Cross-browser testing - Need Safari, Firefox, Chrome uniformly
  2. Existing Selenium infrastructure - Large test suites already written
  3. Simple automation - Basic click, type, navigate workflows
  4. Compliance requirements - W3C standard may be mandated
  5. Team familiarity - Team knows Selenium well

Use CDP When:

  1. Chrome/Chromium only - Target browser is fixed
  2. Network interception - Mock APIs, block resources, modify requests
  3. Performance profiling - Need rendering metrics, memory analysis
  4. Console monitoring - Capture JS logs, errors, warnings
  5. Real-time events - React to page events as they happen
  6. Speed critical - Minimize automation overhead
  7. AI agents - Need granular control for autonomous browsing
  8. Advanced debugging - JS breakpoints, DOM inspection

Hybrid Approach (Playwright/Selenium 4)

Modern tools mix protocols rather than committing to one:

Playwright:
  - CDP for Chromium
  - Its own protocols for patched Firefox and WebKit builds (not WebDriver)

Selenium 4:
  - WebDriver base protocol
  - CDP access on Chromium, plus WebDriver BiDi for advanced features

Our CDP Implementation

We built a production-ready Rust CDP client with two abstraction layers. Source: github.com/dreygur/cdp-protocol

Project Structure

cdp-protocol/
├── src/
│   ├── lib.rs          # Public exports
│   ├── client.rs       # Low-level CDP client (WebSocket, routing)
│   ├── agent.rs        # High-level BrowserAgent (AI-friendly)
│   ├── config.rs       # Shared configuration (host, port, viewport, screenshots dir)
│   ├── types.rs        # Protocol message types
│   └── error.rs        # Error handling
├── examples/
│   ├── basic.rs        # Low-level usage
│   ├── agent.rs        # High-level AI agent
│   └── industrial.rs   # Parallel scraping
└── Cargo.toml

Dependencies

[dependencies]
tokio = { version = "1", features = ["full"] }
tokio-tungstenite = { version = "0.21", features = ["native-tls"] }
futures-util = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.11", features = ["json"] }
base64 = "0.21"
tracing = "0.1"

Layer 1: CdpClient (Low-Level)

Direct protocol access with convenience wrappers.

use cdp_protocol::{CdpClient, Config, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let cfg = Config::default();
    // Make sure the screenshot directory exists; a failure here is ignored.
    std::fs::create_dir_all(&cfg.screenshots_dir).ok();

    // HTTP discovery: browser version and debuggable targets.
    let version = CdpClient::get_version(&cfg.host, cfg.port).await?;
    println!("Browser: {}", version.browser);

    let targets = CdpClient::list_targets(&cfg.host, cfg.port).await?;
    for target in &targets {
        println!("  - {} [{}]: {}", target.target_type, target.id, target.title);
    }

    // Open the WebSocket connection to a page target.
    let client = CdpClient::connect_to_page(&cfg.host, cfg.port).await?;

    // Enable each domain before relying on its events.
    for domain in ["Page", "Runtime", "DOM", "Network"] {
        client.enable_domain(domain).await?;
    }
    client.set_viewport(cfg.viewport_width, cfg.viewport_height, false).await?;

    let nav = client.navigate("https://example.com").await?;
    println!("Frame ID: {}", nav.frame_id);

    // Fixed delay as a crude load wait; reacting to Page.loadEventFired would be more robust.
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;

    // JavaScript evaluation in the page context.
    let title = client.eval("document.title").await?;
    println!("Title: {}", title);

    let result = client.evaluate("1 + 2 * 3").await?;
    println!("Math: {:?}", result.result.value);

    // DOM lookup: document root, then the first h1, then its outer HTML.
    let doc = client.get_document().await?;
    let h1_id = client.query_selector(doc.node_id, "h1").await?;
    if h1_id > 0 {
        println!("H1: {}", client.get_outer_html(h1_id).await?);
    }

    client.full_page_screenshot_to_file(&format!("{}/example.png", cfg.screenshots_dir)).await?;

    let cookies = client.get_cookies().await?;
    println!("Cookies: {}", cookies.len());

    Ok(())
}

Layer 2: BrowserAgent (High-Level)

AI-friendly interface with JSON action dispatch.

use cdp_protocol::{BrowserAgent, BrowserAction, Config, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let cfg = Config::default();
    std::fs::create_dir_all(&cfg.screenshots_dir).ok();

    let agent = BrowserAgent::connect_with_config(&cfg).await?;

    agent.execute(BrowserAction::Navigate {
        url: "https://example.com".to_string(),
    }).await;

    agent.execute(BrowserAction::GetTitle).await;

    agent.execute_json(r#"{"action": "navigate", "url": "https://rust-lang.org"}"#).await;
    agent.execute_json(r#"{"action": "wait", "ms": 2000}"#).await;
    agent.execute_json(r#"{"action": "screenshot", "path": "screenshots/rust.png"}"#).await;

    Ok(())
}

Action Builder

Fluent API for chaining:

use cdp_protocol::ActionBuilder;

let actions = ActionBuilder::new()
    .navigate("https://www.google.com")
    .wait(1500)
    .fill("input[name='q']", "Rust programming")
    .press_key("Enter")
    .wait(2000)
    .screenshot(Some("search.png"))
    .build();

let results = agent.execute_many(actions).await;

Supported Actions

pub enum BrowserAction {
    Navigate { url: String },
    GoBack,
    GoForward,
    Reload,

    Click { selector: Option<String>, x: Option<f64>, y: Option<f64> },
    Type { text: String, selector: Option<String> },
    Fill { selector: String, value: String },
    Submit { selector: Option<String> },
    PressKey { key: String },

    GetTitle,
    GetUrl,
    GetText,
    GetContent { selector: Option<String> },
    GetLinks,
    GetAttributes { selector: String },
    Exists { selector: String },

    Screenshot { path: Option<String> },
    Evaluate { expression: String },

    Wait { ms: u64 },
    WaitForSelector { selector: String, timeout_ms: u64 },

    Scroll { x: f64, y: f64 },
    SetViewport { width: i32, height: i32, mobile: bool },
    GetMetrics,
}

Production Examples

Form Automation

let search_actions = vec![
    BrowserAction::Navigate {
        url: "https://duckduckgo.com".to_string(),
    },
    BrowserAction::Wait { ms: 1500 },
    BrowserAction::Fill {
        selector: "input[name='q']".to_string(),
        value: "Rust programming language".to_string(),
    },
    BrowserAction::PressKey {
        key: "Enter".to_string(),
    },
    BrowserAction::Wait { ms: 2000 },
    BrowserAction::Screenshot {
        path: Some("search_results.png".to_string()),
    },
    BrowserAction::GetTitle,
];

for action in search_actions {
    let result = agent.execute(action).await;
    if !result.is_success() {
        println!("Failed: {:?}", result);
        break;
    }
}

Data Extraction

let result = agent.execute(BrowserAction::Evaluate {
    expression: r#"
        (() => {
            return {
                viewport: {
                    width: window.innerWidth,
                    height: window.innerHeight
                },
                userAgent: navigator.userAgent,
                language: navigator.language,
                cookiesEnabled: navigator.cookieEnabled,
                platform: navigator.platform
            };
        })()
    "#.to_string(),
}).await;

Industrial Scraping (100 Pages Parallel)

use cdp_protocol::{CdpClient, CdpError, Config, Result};
use std::sync::Arc;
use std::time::Instant;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;

const MAX_CONCURRENT: usize = 5;

const URLS: &[&str] = &[
    "https://slishee.com",
    "https://www.rust-lang.org",
    "https://www.google.com",
    // ... 97 more
];

const NUM_PAGES: usize = URLS.len();

#[tokio::main]
async fn main() -> Result<()> {
    let cfg = Arc::new(Config::default());
    std::fs::create_dir_all(&cfg.screenshots_dir).ok();

    println!("=== Industrial Scraping Demo ===");
    println!("Pages to process: {NUM_PAGES}");
    println!("Max concurrent:   {MAX_CONCURRENT}\n");

    let start = Instant::now();
    let semaphore = Arc::new(Semaphore::new(MAX_CONCURRENT));
    let mut set = JoinSet::new();

    for (i, url) in URLS.iter().enumerate() {
        let url = url.to_string();
        let sem = semaphore.clone();
        let cfg = cfg.clone();

        set.spawn(async move {
            // Hold a permit for the whole page visit to cap concurrency at MAX_CONCURRENT.
            let _permit = sem.acquire().await.unwrap();
            (i, process_page(i, &url, &cfg).await)
        });
    }

    let (mut success, mut failed) = (0usize, 0usize);

    while let Some(res) = set.join_next().await {
        match res {
            Ok((i, Ok((title, elapsed)))) => {
                println!("[{i:3}] ✓ {title} ({elapsed:.1}s)");
                success += 1;
            }
            Ok((i, Err(e))) => {
                println!("[{i:3}] ✗ Error: {e}");
                failed += 1;
            }
            Err(e) => {
                println!("[???] ✗ Panic: {e}");
                failed += 1;
            }
        }
    }

    let total = start.elapsed();
    println!(
        "\nTotal: {:.2}s | Success: {} | Failed: {} | {:.2} pages/sec",
        total.as_secs_f64(), success, failed,
        NUM_PAGES as f64 / total.as_secs_f64()
    );

    Ok(())
}

async fn process_page(id: usize, url: &str, cfg: &Config) -> Result<(String, f64)> {
    let start = Instant::now();

    let target = CdpClient::create_tab(&cfg.host, cfg.port, None).await?;
    let ws_url = target.web_socket_debugger_url
        .ok_or_else(|| CdpError::InvalidUrl(format!("no WS URL for tab {id}")))?;

    let client = CdpClient::connect(&ws_url).await?;
    client.enable_domain("Page").await?;
    client.enable_domain("Runtime").await?;
    client.set_viewport(cfg.viewport_width, cfg.viewport_height, false).await?;
    client.navigate(url).await?;

    tokio::time::sleep(std::time::Duration::from_millis(2000)).await;

    let title = client.eval("document.title").await.unwrap_or_else(|_| "Unknown".into());

    client.full_page_screenshot_to_file(&format!("{}/page_{id:03}.png", cfg.screenshots_dir)).await?;
    client.close().await?;

    Ok((title, start.elapsed().as_secs_f64()))
}

Output:

=== Industrial Scraping Demo ===
Pages to process: 100
Max concurrent:   5

[  2] ✓ Google (2.8s)
[  0] ✓ Slishee - We solve puzzles (3.4s)
[  1] ✓ Rust Programming Language (3.1s)
...
[ 99] ✓ Planet Scale (4.2s)

Total: 87.3s | Success: 97 | Failed: 3 | 1.15 pages/sec

Final Thoughts

Protocol Selection Matrix

Requirement                  | Recommendation
Cross-browser testing        | WebDriver
Chrome-only, max performance | CDP
Network mocking              | CDP
AI agent automation          | CDP
Existing Selenium codebase   | WebDriver (+ BiDi for CDP features)
Console/log capture          | CDP
Performance profiling        | CDP
Simple E2E tests             | Either works

The Future

WebDriver BiDi is bridging the gap - adding CDP-like capabilities to WebDriver. Selenium 4 already supports it. Eventually, you'll get the best of both worlds through a unified spec.

Until then:

  • WebDriver for cross-browser standardization
  • CDP for Chrome power-user features


Top comments (4)

Double CHEN

The middleman tax framing for WebDriver is real — I measured around 50ms of extra RTT per command through geckodriver in some test loops. But the tricky part your architecture diagram shows is that CDP's direct WebSocket connection is also exactly what anti-bot systems fingerprint: the Runtime.enable call sequence, Page.addScriptToEvaluateOnNewDocument for injection, and certain DOM properties create a detectable pattern. browser-act CLI (npx skills add browser-act/skills --skill browser-act) wraps CDP but patches the fingerprinting layer before any page JS runs — randomizes canvas noise, screen dimensions, and the driver property. Same speed advantage of CDP, but harder to detect than raw Playwright.

Rakibul Yeasin • Edited

Valid point, and it's the gap I deliberately left out of this post since the focus was protocol architecture not evasion.

The fingerprint surface with raw CDP is real:

  • navigator.webdriver = true unless patched
  • Missing window.chrome.runtime properties that real Chrome exposes
  • Page.addScriptToEvaluateOnNewDocument injection itself is detectable by timing, since it fires before any page script but after the domain enable sequence
  • Headless-specific canvas/WebGL rendering differences that noise-patching helps but doesn't fully close

The domain enable sequence fingerprint is the subtler one. Anti-bot systems can infer automation from behavioral timing: how fast Runtime.enable + Page.enable fires relative to first navigation, zero human input latency, etc. Patching properties doesn't fix that.

browser-act's approach (patch before page JS runs) is the right layer, same idea as puppeteer-extra-plugin-stealth. The tradeoff is you're now trusting that abstraction's maintenance cadence against detection updates, which is its own arms race.

For our use case (internal tooling, performance profiling, AI agents against controlled environments) raw CDP is fine. For scraping production anti-bot sites, yeah, you need the stealth layer on top. Worth a separate post on the detection vectors.

CloakHQ

Really solid breakdown, the comparison table alone is worth bookmarking.

One thing worth adding to the CDP "industrial scraping" section: when you run 10+ sessions through the same browser instance on port 9222, they all share the same process-level fingerprint. Canvas, WebGL renderer, screen resolution, timezone - all identical across tabs. Detection systems that look at behavioral clustering will flag this pretty fast even if each session has its own cookies and headers.

If you're doing anything beyond simple data collection (logged-in sessions, sites with bot detection), you basically need a separate browser process per session, not just separate targets within one instance. Playwright makes this easier with browserType.launch() per context, but it comes at a memory cost.

The BiDi point at the end is interesting - curious how long until it actually closes the gap with CDP in practice. The spec has been "almost ready" for a while now.

Rakibul Yeasin

Correct on all counts, and it's a limitation baked into the industrial.rs example directly. CdpClient::create_tab() creates targets within one browser process, so every session shares the GPU fingerprint, WebGL renderer string, canvas noise signature, and system timezone. For the "100 URLs, grab titles and screenshots" use case that's fine. For anything with session state or bot detection, you're handing detection systems a behavioral cluster on a plate.

The fix is what you said: separate --remote-debugging-port per process, not separate targets per port. Isolation at the OS process level, not the CDP target level. Memory cost is real, roughly 100-200MB per Chrome instance depending on what's loaded, which is why the semaphore pattern in the example needs rethinking if you scale it with process-per-session.

One addition: --user-data-dir per process matters too. Shared profile directories leak state between "isolated" browser instances through cache, local storage baseline, and extension state even when cookies are cleared.