A technical perspective on browser automation internals, protocol architectures, and when to use what.
Table of Contents
- The Two Protocols
- Architecture Comparison
- WebDriver Protocol (W3C)
- Chrome DevTools Protocol (CDP)
- Head-to-Head Comparison
- When to Use What
- Our CDP Implementation
- Production Examples
- Final Thoughts
The Two Protocols
Browser automation comes down to two fundamental approaches:
WebDriver - W3C standardized, cross-browser, high-level abstraction over HTTP REST.
CDP - Chrome's native debugging protocol, WebSocket-based, low-level access to browser internals.
Both solve browser automation. Neither is universally "better." Your use case dictates the choice.
Architecture Comparison
WebDriver Architecture
┌──────────────┐   HTTP/REST    ┌──────────────┐    Native    ┌─────────────┐
│    Client    │ ◄────────────► │    Driver    │ ◄──────────► │   Browser   │
│  (Selenium)  │   Port 4444    │(chromedriver,│   Protocol   │  (Chrome)   │
└──────────────┘                │ geckodriver) │              └─────────────┘
                                └──────────────┘
Three-tier model:
- Client library sends HTTP requests
- Driver binary translates to browser-native calls
- Browser executes and responds
The middleman tax: Every command pays HTTP overhead + driver process latency.
CDP Architecture
┌──────────────┐   WebSocket    ┌─────────────┐
│    Client    │ ◄────────────► │   Browser   │
│   (Direct)   │   Port 9222    │  (Chrome)   │
└──────────────┘                └─────────────┘
Two-tier model:
- Client connects directly to browser
- Persistent WebSocket, bidirectional streaming
No middleman. Direct protocol access. Events pushed in real-time.
WebDriver Protocol (W3C)
Overview
WebDriver has been a W3C Recommendation since 2018. It defines a REST API for browser automation with a focus on cross-browser compatibility.
Transport
HTTP/REST with JSON payloads:
POST /session HTTP/1.1
Content-Type: application/json
{
"capabilities": {
"alwaysMatch": {
"browserName": "chrome",
"browserVersion": "120"
}
}
}
Session Lifecycle
# Create session
POST /session
→ {"value": {"sessionId": "abc123", "capabilities": {...}}}
# All subsequent commands use session ID
POST /session/abc123/url
GET /session/abc123/title
POST /session/abc123/element
DELETE /session/abc123
Core Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| `/session` | POST | Create new session |
| `/session/{id}` | DELETE | End session |
| `/session/{id}/url` | POST | Navigate to URL |
| `/session/{id}/url` | GET | Get current URL |
| `/session/{id}/title` | GET | Get page title |
| `/session/{id}/element` | POST | Find element |
| `/session/{id}/element/{eid}/click` | POST | Click element |
| `/session/{id}/element/{eid}/value` | POST | Send keys |
| `/session/{id}/screenshot` | GET | Capture screenshot |
| `/session/{id}/execute/sync` | POST | Execute JS |
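The endpoint table maps cleanly onto a small routing function. The sketch below is illustrative only: the `Command` enum and `route` function are hypothetical names, not part of any WebDriver client library, and only a handful of endpoints are covered.

```rust
/// A few WebDriver commands from the table above, modelled as an enum.
/// The type and function names are illustrative, not from any library.
enum Command<'a> {
    NewSession,
    DeleteSession,
    Navigate,
    GetTitle,
    FindElement,
    ClickElement(&'a str), // element id returned by FindElement
}

/// Returns the HTTP method and path for a command within a session.
fn route(cmd: &Command, session_id: &str) -> (&'static str, String) {
    match cmd {
        Command::NewSession => ("POST", "/session".to_string()),
        Command::DeleteSession => ("DELETE", format!("/session/{session_id}")),
        Command::Navigate => ("POST", format!("/session/{session_id}/url")),
        Command::GetTitle => ("GET", format!("/session/{session_id}/title")),
        Command::FindElement => ("POST", format!("/session/{session_id}/element")),
        Command::ClickElement(eid) => {
            ("POST", format!("/session/{session_id}/element/{eid}/click"))
        }
    }
}

fn main() {
    let (method, path) = route(&Command::ClickElement("e1"), "abc123");
    println!("{method} {path}"); // POST /session/abc123/element/e1/click
}
```

Every command except session creation carries the session ID in the path, which is why the client must hold on to it for the whole lifecycle.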
Element Location Strategies
{
"using": "css selector",
"value": "button.submit"
}
Supported locators:
- css selector
- link text
- partial link text
- tag name
- xpath
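As a minimal sketch, the five strategies can be modelled as an enum that serializes to the request body shown above. `Locator` and `to_body` are hypothetical names, and the escaping handles only backslashes and double quotes, which is an assumption that holds for typical selectors but not for arbitrary input.

```rust
/// The five W3C locator strategies listed above.
enum Locator<'a> {
    Css(&'a str),
    LinkText(&'a str),
    PartialLinkText(&'a str),
    TagName(&'a str),
    XPath(&'a str),
}

impl Locator<'_> {
    /// Builds the JSON body for POST /session/{id}/element.
    /// Escapes only backslashes and double quotes.
    fn to_body(&self) -> String {
        let (using, value) = match self {
            Locator::Css(v) => ("css selector", v),
            Locator::LinkText(v) => ("link text", v),
            Locator::PartialLinkText(v) => ("partial link text", v),
            Locator::TagName(v) => ("tag name", v),
            Locator::XPath(v) => ("xpath", v),
        };
        let escaped = value.replace('\\', "\\\\").replace('"', "\\\"");
        format!(r#"{{"using": "{using}", "value": "{escaped}"}}"#)
    }
}

fn main() {
    println!("{}", Locator::Css("button.submit").to_body());
    // {"using": "css selector", "value": "button.submit"}
}
```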
Example: Complete Flow
# 1. Create session
curl -X POST http://localhost:4444/session \
-H "Content-Type: application/json" \
-d '{"capabilities": {"alwaysMatch": {"browserName": "chrome"}}}'
# Response: {"value": {"sessionId": "xyz789", ...}}
# 2. Navigate
curl -X POST http://localhost:4444/session/xyz789/url \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# 3. Find element
curl -X POST http://localhost:4444/session/xyz789/element \
-H "Content-Type: application/json" \
-d '{"using": "css selector", "value": "h1"}'
# Response: {"value": {"element-6066-...": "element-id-123"}}
# 4. Get text
curl http://localhost:4444/session/xyz789/element/element-id-123/text
# Response: {"value": "Example Domain"}
# 5. Screenshot
curl http://localhost:4444/session/xyz789/screenshot
# Response: {"value": "iVBORw0KGgo...base64..."}
# 6. Cleanup
curl -X DELETE http://localhost:4444/session/xyz789
Limitations
- No network interception - Can't inspect/modify HTTP traffic
- No console access - Can't capture `console.log` output
- No performance metrics - No access to rendering/memory data
- No real-time events - Polling only, no push notifications
- Driver dependency - Requires separate driver binary per browser
- Version coupling - Driver version must match browser version
Chrome DevTools Protocol (CDP)
Overview
CDP is Chrome's native debugging protocol. It's what DevTools uses internally. Direct access to 61 domains covering every browser capability.
Transport
Bidirectional WebSocket with JSON-RPC:
Client                                          Browser
  │                                                │
  │──── {"id":1,"method":"Page.navigate", ───────► │
  │      "params":{"url":"..."}}                   │
  │                                                │
  │◄─── {"id":1,"result":{"frameId":...}} ──────── │
  │                                                │
  │◄─── {"method":"Page.loadEventFired", ───────── │
  │      "params":{"timestamp":...}}               │
  │                                                │
Three message types:
- Request: Client → Browser (has `id` + `method`)
- Response: Browser → Client (has `id` + `result`/`error`)
- Event: Browser → Client (has `method` only, no `id`)
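The three shapes can be told apart purely by which fields are present. The sketch below assumes the JSON has already been decoded elsewhere; the `Fields` struct is a hypothetical stand-in for that step, and `classify` is an illustrative name.

```rust
/// The three CDP message shapes, distinguished only by which fields are present.
#[derive(Debug, PartialEq)]
enum Kind {
    Request,
    Response,
    Event,
    Unknown,
}

/// Field-presence flags for one decoded JSON message.
/// A real client would get these from a JSON parser; this struct stands in for that step.
struct Fields {
    has_id: bool,
    has_method: bool,
    has_result_or_error: bool,
}

fn classify(f: &Fields) -> Kind {
    match (f.has_id, f.has_method, f.has_result_or_error) {
        (true, true, false) => Kind::Request,
        (true, false, true) => Kind::Response,
        (false, true, false) => Kind::Event,
        _ => Kind::Unknown,
    }
}

fn main() {
    // {"method": "Page.loadEventFired", "params": {...}} has a method but no id.
    let event = Fields { has_id: false, has_method: true, has_result_or_error: false };
    println!("{:?}", classify(&event)); // Event
}
```

A client's read loop typically branches on this classification: responses go to whoever is waiting on that `id`, events go to subscribers.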
Domain Organization
CDP organizes into domains. Each domain has methods and events.
Core domains:
| Domain | Methods | Events | Purpose |
|---|---|---|---|
| Page | 25+ | 15+ | Navigation, lifecycle, screenshots |
| Runtime | 20+ | 10+ | JS execution, console |
| DOM | 30+ | 10+ | Document structure |
| Network | 15+ | 20+ | HTTP traffic |
| Input | 5+ | 0 | Mouse, keyboard, touch |
| Emulation | 20+ | 0 | Device simulation |
| Target | 15+ | 5+ | Tab/window management |
| Debugger | 25+ | 10+ | JS debugging |
| Profiler | 10+ | 5+ | CPU profiling |
| HeapProfiler | 10+ | 5+ | Memory profiling |
HTTP Discovery Endpoints
Before WebSocket, discover targets via HTTP:
# List all debuggable targets
curl http://localhost:9222/json/list
[
{
"id": "ABC123",
"type": "page",
"title": "New Tab",
"url": "chrome://newtab/",
"webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/ABC123"
}
]
# Browser version
curl http://localhost:9222/json/version
{
"Browser": "Chrome/120.0.0.0",
"Protocol-Version": "1.3",
"webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/XYZ"
}
# Create new tab
# Recent Chrome versions require PUT for this endpoint
curl -X PUT "http://localhost:9222/json/new?https://example.com"
# Close tab
curl http://localhost:9222/json/close/ABC123
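For reference, the discovery calls can be expressed as a small request builder. This is a sketch with hypothetical names (`Discovery`, `discovery_request`); it only constructs the method and URL and leaves the HTTP call to whatever client you use. It assumes a recent Chrome, where opening a tab expects PUT.

```rust
/// HTTP discovery endpoints exposed on the remote debugging port.
enum Discovery<'a> {
    List,
    Version,
    NewTab(&'a str),   // URL to open
    CloseTab(&'a str), // target id
}

/// Returns the HTTP method and full URL for a discovery call.
fn discovery_request(host: &str, port: u16, d: &Discovery) -> (&'static str, String) {
    let base = format!("http://{host}:{port}/json");
    match d {
        Discovery::List => ("GET", format!("{base}/list")),
        Discovery::Version => ("GET", format!("{base}/version")),
        // Assumption: recent Chrome versions reject GET here and expect PUT.
        Discovery::NewTab(url) => ("PUT", format!("{base}/new?{url}")),
        Discovery::CloseTab(id) => ("GET", format!("{base}/close/{id}")),
    }
}

fn main() {
    let (method, url) = discovery_request("localhost", 9222, &Discovery::List);
    println!("{method} {url}"); // GET http://localhost:9222/json/list
}
```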
Protocol Examples
1. Navigation
// Enable Page domain first
→ {"id": 1, "method": "Page.enable"}
← {"id": 1, "result": {}}
// Navigate
→ {"id": 2, "method": "Page.navigate", "params": {"url": "https://example.com"}}
← {"id": 2, "result": {"frameId": "ABC", "loaderId": "XYZ"}}
// Events fired automatically
← {"method": "Page.frameStartedLoading", "params": {"frameId": "ABC"}}
← {"method": "Page.loadEventFired", "params": {"timestamp": 1234.56}}
← {"method": "Page.frameStoppedLoading", "params": {"frameId": "ABC"}}
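Because events arrive interleaved with responses on the same socket, a client has to match responses to requests by `id` rather than by arrival order. A minimal sketch of that bookkeeping follows; the `Pending` type is hypothetical and stores only the method name, where a real client would store a channel or callback.

```rust
use std::collections::HashMap;

/// Tracks in-flight commands so responses can be matched by id.
#[derive(Default)]
struct Pending {
    next_id: u64,
    in_flight: HashMap<u64, String>, // id -> method name
}

impl Pending {
    /// Allocates an id for a command and remembers it until the response arrives.
    fn send(&mut self, method: &str) -> u64 {
        self.next_id += 1;
        self.in_flight.insert(self.next_id, method.to_string());
        self.next_id
    }

    /// Resolves a response by id; returns the method it answers, if any.
    fn resolve(&mut self, id: u64) -> Option<String> {
        self.in_flight.remove(&id)
    }
}

fn main() {
    let mut pending = Pending::default();
    let enable = pending.send("Page.enable");
    let nav = pending.send("Page.navigate");
    // Responses are matched on id, so the order they arrive in does not matter.
    println!("{:?}", pending.resolve(nav));    // Some("Page.navigate")
    println!("{:?}", pending.resolve(enable)); // Some("Page.enable")
}
```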
2. JavaScript Evaluation
→ {"id": 1, "method": "Runtime.enable"}
← {"id": 1, "result": {}}
→ {"id": 2, "method": "Runtime.evaluate", "params": {
"expression": "document.title",
"returnByValue": true
}}
← {"id": 2, "result": {
"result": {"type": "string", "value": "Example Domain"}
}}
// Complex evaluation
→ {"id": 3, "method": "Runtime.evaluate", "params": {
"expression": "(() => { return {width: window.innerWidth, height: window.innerHeight}; })()",
"returnByValue": true
}}
← {"id": 3, "result": {
"result": {"type": "object", "value": {"width": 1920, "height": 1080}}
}}
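One practical detail when building `Runtime.evaluate` messages by hand: the expression is JavaScript embedded in a JSON string, so quotes and backslashes must be escaped. The sketch below uses hypothetical helper names (`json_escape`, `runtime_evaluate`); in a real client a JSON library such as `serde_json` would handle this for you.

```rust
/// Escapes a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a Runtime.evaluate request with returnByValue set.
fn runtime_evaluate(id: u64, expression: &str) -> String {
    format!(
        r#"{{"id": {id}, "method": "Runtime.evaluate", "params": {{"expression": "{}", "returnByValue": true}}}}"#,
        json_escape(expression)
    )
}

fn main() {
    println!("{}", runtime_evaluate(2, "document.title"));
    println!("{}", runtime_evaluate(3, "document.querySelector(\"h1\").textContent"));
}
```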
3. DOM Operations
→ {"id": 1, "method": "DOM.enable"}
← {"id": 1, "result": {}}
→ {"id": 2, "method": "DOM.getDocument", "params": {"depth": 0}}
← {"id": 2, "result": {
"root": {"nodeId": 1, "nodeName": "#document", "childNodeCount": 2}
}}
→ {"id": 3, "method": "DOM.querySelector", "params": {"nodeId": 1, "selector": "h1"}}
← {"id": 3, "result": {"nodeId": 42}}
→ {"id": 4, "method": "DOM.getOuterHTML", "params": {"nodeId": 42}}
← {"id": 4, "result": {"outerHTML": "<h1>Example Domain</h1>"}}
4. Network Interception
→ {"id": 1, "method": "Network.enable"}
← {"id": 1, "result": {}}
// Events stream automatically
← {"method": "Network.requestWillBeSent", "params": {
"requestId": "req-1",
"request": {
"url": "https://example.com/api/data",
"method": "GET",
"headers": {"Accept": "application/json"}
},
"timestamp": 1234.56,
"type": "XHR"
}}
← {"method": "Network.responseReceived", "params": {
"requestId": "req-1",
"response": {
"status": 200,
"statusText": "OK",
"headers": {"content-type": "application/json"},
"mimeType": "application/json"
}
}}
← {"method": "Network.loadingFinished", "params": {
"requestId": "req-1",
"encodedDataLength": 1234
}}
// Get response body
→ {"id": 2, "method": "Network.getResponseBody", "params": {"requestId": "req-1"}}
← {"id": 2, "result": {"body": "{\"data\": [...]}", "base64Encoded": false}}
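The three events for one request share a `requestId`, so a client usually folds them into per-request state. The sketch below uses simplified, hypothetical event types (`NetEvent`, `RequestState`) in place of parsed CDP payloads.

```rust
use std::collections::HashMap;

/// What we know about one request, accumulated across events.
#[derive(Debug, Default, PartialEq)]
struct RequestState {
    url: Option<String>,
    status: Option<u16>,
    finished: bool,
}

/// Simplified stand-ins for the three Network events shown above.
enum NetEvent<'a> {
    RequestWillBeSent { request_id: &'a str, url: &'a str },
    ResponseReceived { request_id: &'a str, status: u16 },
    LoadingFinished { request_id: &'a str },
}

/// Folds a stream of events into per-request state keyed by requestId.
fn correlate(events: &[NetEvent]) -> HashMap<String, RequestState> {
    let mut by_id: HashMap<String, RequestState> = HashMap::new();
    for ev in events {
        match ev {
            NetEvent::RequestWillBeSent { request_id, url } => {
                by_id.entry(request_id.to_string()).or_default().url = Some(url.to_string());
            }
            NetEvent::ResponseReceived { request_id, status } => {
                by_id.entry(request_id.to_string()).or_default().status = Some(*status);
            }
            NetEvent::LoadingFinished { request_id } => {
                by_id.entry(request_id.to_string()).or_default().finished = true;
            }
        }
    }
    by_id
}

fn main() {
    let state = correlate(&[
        NetEvent::RequestWillBeSent { request_id: "req-1", url: "https://example.com/api/data" },
        NetEvent::ResponseReceived { request_id: "req-1", status: 200 },
        NetEvent::LoadingFinished { request_id: "req-1" },
    ]);
    println!("{:?}", state["req-1"]);
}
```

Waiting for `finished` before calling `Network.getResponseBody` is a reasonable rule of thumb, since the body may not be fully available earlier.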
5. Screenshots
→ {"id": 1, "method": "Page.captureScreenshot", "params": {
"format": "png",
"quality": 100,
"fromSurface": true
}}
← {"id": 1, "result": {"data": "iVBORw0KGgoAAAANSUhEUgAAA..."}}
// Full page screenshot
→ {"id": 2, "method": "Page.captureScreenshot", "params": {
"format": "png",
"captureBeyondViewport": true
}}
// Specific region
→ {"id": 3, "method": "Page.captureScreenshot", "params": {
"format": "jpeg",
"quality": 80,
"clip": {"x": 0, "y": 0, "width": 800, "height": 600, "scale": 1}
}}
6. Input Simulation
// Mouse click
→ {"id": 1, "method": "Input.dispatchMouseEvent", "params": {
"type": "mousePressed",
"x": 100, "y": 200,
"button": "left",
"clickCount": 1
}}
← {"id": 1, "result": {}}
→ {"id": 2, "method": "Input.dispatchMouseEvent", "params": {
"type": "mouseReleased",
"x": 100, "y": 200,
"button": "left",
"clickCount": 1
}}
// Type text
→ {"id": 3, "method": "Input.insertText", "params": {"text": "Hello World"}}
// Key press
→ {"id": 4, "method": "Input.dispatchKeyEvent", "params": {
"type": "keyDown",
"key": "Enter",
"code": "Enter",
"windowsVirtualKeyCode": 13
}}
→ {"id": 5, "method": "Input.dispatchKeyEvent", "params": {
"type": "keyUp",
"key": "Enter",
"code": "Enter"
}}
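A click at the protocol level is always a press followed by a release at the same coordinates. As a sketch, a helper can emit both messages with consecutive ids. `click_messages` is a hypothetical name, and the coordinates are assumed to be finite numbers (NaN or infinity would not produce valid JSON).

```rust
/// Builds the mousePressed/mouseReleased pair that makes up one left click.
/// `first_id` is the id for the press; the release uses the next id.
fn click_messages(first_id: u64, x: f64, y: f64) -> [String; 2] {
    let msg = |id: u64, kind: &str| {
        format!(
            r#"{{"id": {id}, "method": "Input.dispatchMouseEvent", "params": {{"type": "{kind}", "x": {x}, "y": {y}, "button": "left", "clickCount": 1}}}}"#
        )
    };
    [msg(first_id, "mousePressed"), msg(first_id + 1, "mouseReleased")]
}

fn main() {
    for m in click_messages(1, 100.0, 200.0) {
        println!("{m}");
    }
}
```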
7. Console Capture
→ {"id": 1, "method": "Runtime.enable"}
← {"id": 1, "result": {}}
// Console events stream automatically
← {"method": "Runtime.consoleAPICalled", "params": {
"type": "log",
"args": [{"type": "string", "value": "Hello from page"}],
"timestamp": 1234567890.123
}}
← {"method": "Runtime.consoleAPICalled", "params": {
"type": "error",
"args": [{"type": "string", "value": "Something went wrong"}],
"stackTrace": {...}
}}
8. Performance Metrics
→ {"id": 1, "method": "Performance.enable"}
← {"id": 1, "result": {}}
→ {"id": 2, "method": "Performance.getMetrics"}
← {"id": 2, "result": {
"metrics": [
{"name": "Timestamp", "value": 1234.56},
{"name": "Documents", "value": 1},
{"name": "Frames", "value": 1},
{"name": "JSEventListeners", "value": 42},
{"name": "Nodes", "value": 150},
{"name": "LayoutCount", "value": 3},
{"name": "RecalcStyleCount", "value": 5},
{"name": "JSHeapUsedSize", "value": 10485760},
{"name": "JSHeapTotalSize", "value": 16777216}
]
}}
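The metrics come back as a flat list of name/value pairs, so derived figures have to be computed client-side. A minimal sketch, assuming the response has already been parsed into `(name, value)` tuples; `metric` and `heap_utilization` are hypothetical helpers.

```rust
/// Looks up one metric by name in the (name, value) pairs from Performance.getMetrics.
fn metric(metrics: &[(&str, f64)], name: &str) -> Option<f64> {
    metrics.iter().find(|(n, _)| *n == name).map(|(_, v)| *v)
}

/// Fraction of the allocated JS heap that is in use, if both metrics are present.
fn heap_utilization(metrics: &[(&str, f64)]) -> Option<f64> {
    let used = metric(metrics, "JSHeapUsedSize")?;
    let total = metric(metrics, "JSHeapTotalSize")?;
    if total > 0.0 { Some(used / total) } else { None }
}

fn main() {
    // Values taken from the sample response above.
    let metrics = [
        ("Nodes", 150.0),
        ("JSHeapUsedSize", 10485760.0),
        ("JSHeapTotalSize", 16777216.0),
    ];
    println!("{:?}", heap_utilization(&metrics)); // Some(0.625)
}
```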
Head-to-Head Comparison
Protocol Level
| Aspect | WebDriver | CDP |
|---|---|---|
| Specification | W3C Standard | Chrome Internal |
| Transport | HTTP REST | WebSocket |
| Connection | Request/Response | Persistent + Events |
| Latency | Higher (HTTP per command) | Lower (single WS) |
| Message Format | JSON over HTTP | JSON-RPC over WS |
Architecture
| Aspect | WebDriver | CDP |
|---|---|---|
| Components | Client + Driver + Browser | Client + Browser |
| Driver Required | Yes (chromedriver, etc.) | No |
| Version Coupling | Driver ↔ Browser tight | Protocol versioned |
| Typical port | 4444 (Selenium Grid, geckodriver); 9515 (chromedriver) | 9222 by convention (set with --remote-debugging-port) |
Capabilities
| Feature | WebDriver | CDP |
|---|---|---|
| Navigation | ✅ | ✅ |
| Element Interaction | ✅ | ✅ |
| JavaScript Execution | ✅ | ✅ |
| Screenshots | ✅ | ✅ |
| Cookies | ✅ | ✅ |
| Network Interception | ❌ | ✅ |
| Console Access | ❌ | ✅ |
| Performance Metrics | ❌ | ✅ |
| Real-time Events | ❌ | ✅ |
| DOM Debugging | ❌ | ✅ |
| CPU Profiling | ❌ | ✅ |
| Memory Profiling | ❌ | ✅ |
| Geolocation Emulation | Limited | ✅ |
| Device Emulation | Limited | ✅ |
| Request Blocking | ❌ | ✅ |
Browser Support
| Browser | WebDriver | CDP |
|---|---|---|
| Chrome | ✅ | ✅ |
| Edge | ✅ | ✅ (Chromium) |
| Firefox | ✅ | Partial (deprecated in favor of WebDriver BiDi) |
| Safari | ✅ | ❌ |
| Opera | ✅ | ✅ (Chromium) |
Ecosystem
| Tool | WebDriver | CDP |
|---|---|---|
| Selenium | Primary | Chromium only (Selenium 4 DevTools API), plus BiDi |
| Puppeteer | ❌ | Primary |
| Playwright | ❌ | Chromium only (own protocols for Firefox and WebKit) |
| Cypress | ❌ | Primary |
When to Use What
Use WebDriver When:
- Cross-browser testing - Need Safari, Firefox, Chrome uniformly
- Existing Selenium infrastructure - Large test suites already written
- Simple automation - Basic click, type, navigate workflows
- Compliance requirements - W3C standard may be mandated
- Team familiarity - Team knows Selenium well
Use CDP When:
- Chrome/Chromium only - Target browser is fixed
- Network interception - Mock APIs, block resources, modify requests
- Performance profiling - Need rendering metrics, memory analysis
- Console monitoring - Capture JS logs, errors, warnings
- Real-time events - React to page events as they happen
- Speed critical - Minimize automation overhead
- AI agents - Need granular control for autonomous browsing
- Advanced debugging - JS breakpoints, DOM inspection
Hybrid Approach (Playwright/Selenium 4)
Modern tools mix protocols:
Playwright:
- CDP for Chromium
- Its own protocols against patched Firefox and WebKit builds (not WebDriver)
Selenium 4:
- WebDriver base protocol
- CDP access (Chromium only) and WebDriver BiDi for advanced features
Our CDP Implementation
We built a production-ready Rust CDP client with two abstraction layers. Source: github.com/dreygur/cdp-protocol
Project Structure
cdp-protocol/
├── src/
│ ├── lib.rs # Public exports
│ ├── client.rs # Low-level CDP client (WebSocket, routing)
│ ├── agent.rs # High-level BrowserAgent (AI-friendly)
│ ├── config.rs # Shared configuration (host, port, viewport, screenshots dir)
│ ├── types.rs # Protocol message types
│ └── error.rs # Error handling
├── examples/
│ ├── basic.rs # Low-level usage
│ ├── agent.rs # High-level AI agent
│ └── industrial.rs # Parallel scraping
└── Cargo.toml
Dependencies
[dependencies]
tokio = { version = "1", features = ["full"] }
tokio-tungstenite = { version = "0.21", features = ["native-tls"] }
futures-util = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.11", features = ["json"] }
base64 = "0.21"
tracing = "0.1"
Layer 1: CdpClient (Low-Level)
Direct protocol access with convenience wrappers.
use cdp_protocol::{CdpClient, Config, Result};
#[tokio::main]
async fn main() -> Result<()> {
let cfg = Config::default();
std::fs::create_dir_all(&cfg.screenshots_dir).ok();
let version = CdpClient::get_version(&cfg.host, cfg.port).await?;
println!("Browser: {}", version.browser);
let targets = CdpClient::list_targets(&cfg.host, cfg.port).await?;
for target in &targets {
println!(" - {} [{}]: {}", target.target_type, target.id, target.title);
}
let client = CdpClient::connect_to_page(&cfg.host, cfg.port).await?;
for domain in ["Page", "Runtime", "DOM", "Network"] {
client.enable_domain(domain).await?;
}
client.set_viewport(cfg.viewport_width, cfg.viewport_height, false).await?;
let nav = client.navigate("https://example.com").await?;
println!("Frame ID: {}", nav.frame_id);
tokio::time::sleep(std::time::Duration::from_secs(2)).await;
let title = client.eval("document.title").await?;
println!("Title: {}", title);
let result = client.evaluate("1 + 2 * 3").await?;
println!("Math: {:?}", result.result.value);
let doc = client.get_document().await?;
let h1_id = client.query_selector(doc.node_id, "h1").await?;
if h1_id > 0 {
println!("H1: {}", client.get_outer_html(h1_id).await?);
}
client.full_page_screenshot_to_file(&format!("{}/example.png", cfg.screenshots_dir)).await?;
let cookies = client.get_cookies().await?;
println!("Cookies: {}", cookies.len());
Ok(())
}
Layer 2: BrowserAgent (High-Level)
AI-friendly interface with JSON action dispatch.
use cdp_protocol::{BrowserAgent, BrowserAction, Config, Result};
#[tokio::main]
async fn main() -> Result<()> {
let cfg = Config::default();
std::fs::create_dir_all(&cfg.screenshots_dir).ok();
let agent = BrowserAgent::connect_with_config(&cfg).await?;
agent.execute(BrowserAction::Navigate {
url: "https://example.com".to_string(),
}).await;
agent.execute(BrowserAction::GetTitle).await;
agent.execute_json(r#"{"action": "navigate", "url": "https://rust-lang.org"}"#).await;
agent.execute_json(r#"{"action": "wait", "ms": 2000}"#).await;
agent.execute_json(r#"{"action": "screenshot", "path": "screenshots/rust.png"}"#).await;
Ok(())
}
Action Builder
Fluent API for chaining:
use cdp_protocol::ActionBuilder;
let actions = ActionBuilder::new()
.navigate("https://www.google.com")
.wait(1500)
.fill("input[name='q']", "Rust programming")
.press_key("Enter")
.wait(2000)
.screenshot(Some("search.png"))
.build();
let results = agent.execute_many(actions).await;
Supported Actions
pub enum BrowserAction {
Navigate { url: String },
GoBack,
GoForward,
Reload,
Click { selector: Option<String>, x: Option<f64>, y: Option<f64> },
Type { text: String, selector: Option<String> },
Fill { selector: String, value: String },
Submit { selector: Option<String> },
PressKey { key: String },
GetTitle,
GetUrl,
GetText,
GetContent { selector: Option<String> },
GetLinks,
GetAttributes { selector: String },
Exists { selector: String },
Screenshot { path: Option<String> },
Evaluate { expression: String },
Wait { ms: u64 },
WaitForSelector { selector: String, timeout_ms: u64 },
Scroll { x: f64, y: f64 },
SetViewport { width: i32, height: i32, mobile: bool },
GetMetrics,
}
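To show how an agent or LLM might emit these actions as JSON, here is a sketch limited to the three wire shapes that actually appear in this post (`navigate`, `wait`, `screenshot`). The `Action` type and `to_json` are hypothetical and separate from the crate's `BrowserAction`; the JSON field names for the other variants are not shown in the post, so they are left out rather than guessed.

```rust
/// A subset of actions, limited to the three JSON shapes that appear in this post.
enum Action<'a> {
    Navigate { url: &'a str },
    Wait { ms: u64 },
    Screenshot { path: &'a str },
}

impl Action<'_> {
    /// Serializes to the JSON form passed to execute_json above.
    /// No string escaping is done, so inputs must not contain quotes or backslashes.
    fn to_json(&self) -> String {
        match self {
            Action::Navigate { url } => {
                format!(r#"{{"action": "navigate", "url": "{url}"}}"#)
            }
            Action::Wait { ms } => format!(r#"{{"action": "wait", "ms": {ms}}}"#),
            Action::Screenshot { path } => {
                format!(r#"{{"action": "screenshot", "path": "{path}"}}"#)
            }
        }
    }
}

fn main() {
    let plan = [
        Action::Navigate { url: "https://rust-lang.org" },
        Action::Wait { ms: 2000 },
        Action::Screenshot { path: "screenshots/rust.png" },
    ];
    for action in &plan {
        println!("{}", action.to_json());
    }
}
```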
Production Examples
Form Automation
let search_actions = vec![
BrowserAction::Navigate {
url: "https://duckduckgo.com".to_string(),
},
BrowserAction::Wait { ms: 1500 },
BrowserAction::Fill {
selector: "input[name='q']".to_string(),
value: "Rust programming language".to_string(),
},
BrowserAction::PressKey {
key: "Enter".to_string(),
},
BrowserAction::Wait { ms: 2000 },
BrowserAction::Screenshot {
path: Some("search_results.png".to_string()),
},
BrowserAction::GetTitle,
];
for action in search_actions {
let result = agent.execute(action).await;
if !result.is_success() {
println!("Failed: {:?}", result);
break;
}
}
Data Extraction
let result = agent.execute(BrowserAction::Evaluate {
expression: r#"
(() => {
return {
viewport: {
width: window.innerWidth,
height: window.innerHeight
},
userAgent: navigator.userAgent,
language: navigator.language,
cookiesEnabled: navigator.cookieEnabled,
platform: navigator.platform
};
})()
"#.to_string(),
}).await;
Industrial Scraping (100 Pages Parallel)
use cdp_protocol::{CdpClient, CdpError, Config, Result};
use std::sync::Arc;
use std::time::Instant;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;
const MAX_CONCURRENT: usize = 5;
const URLS: &[&str] = &[
"https://slishee.com",
"https://www.rust-lang.org",
"https://www.google.com",
// ... 97 more
];
const NUM_PAGES: usize = URLS.len();
#[tokio::main]
async fn main() -> Result<()> {
let cfg = Arc::new(Config::default());
std::fs::create_dir_all(&cfg.screenshots_dir).ok();
println!("=== Industrial Scraping Demo ===");
println!("Pages to process: {NUM_PAGES}");
println!("Max concurrent: {MAX_CONCURRENT}\n");
let start = Instant::now();
let semaphore = Arc::new(Semaphore::new(MAX_CONCURRENT));
let mut set = JoinSet::new();
for (i, url) in URLS.iter().enumerate() {
let url = url.to_string();
let sem = semaphore.clone();
let cfg = cfg.clone();
set.spawn(async move {
let _permit = sem.acquire().await.unwrap();
(i, process_page(i, &url, &cfg).await)
});
}
let (mut success, mut failed) = (0usize, 0usize);
while let Some(res) = set.join_next().await {
match res {
Ok((i, Ok((title, elapsed)))) => {
println!("[{i:3}] ✓ {title} ({elapsed:.1}s)");
success += 1;
}
Ok((i, Err(e))) => {
println!("[{i:3}] ✗ Error: {e}");
failed += 1;
}
Err(e) => {
println!("[???] ✗ Panic: {e}");
failed += 1;
}
}
}
let total = start.elapsed();
println!(
"\nTotal: {:.2}s | Success: {} | Failed: {} | {:.2} pages/sec",
total.as_secs_f64(), success, failed,
NUM_PAGES as f64 / total.as_secs_f64()
);
Ok(())
}
async fn process_page(id: usize, url: &str, cfg: &Config) -> Result<(String, f64)> {
let start = Instant::now();
let target = CdpClient::create_tab(&cfg.host, cfg.port, None).await?;
let ws_url = target.web_socket_debugger_url
.ok_or_else(|| CdpError::InvalidUrl(format!("no WS URL for tab {id}")))?;
let client = CdpClient::connect(&ws_url).await?;
client.enable_domain("Page").await?;
client.enable_domain("Runtime").await?;
client.set_viewport(cfg.viewport_width, cfg.viewport_height, false).await?;
client.navigate(url).await?;
tokio::time::sleep(std::time::Duration::from_millis(2000)).await;
let title = client.eval("document.title").await.unwrap_or_else(|_| "Unknown".into());
client.full_page_screenshot_to_file(&format!("{}/page_{id:03}.png", cfg.screenshots_dir)).await?;
client.close().await?;
Ok((title, start.elapsed().as_secs_f64()))
}
Output:
=== Industrial Scraping Demo ===
Pages to process: 100
Max concurrent: 5
[ 2] ✓ Google (2.8s)
[ 0] ✓ Slishee - We solve puzzles (3.4s)
[ 1] ✓ Rust Programming Language (3.1s)
...
[ 99] ✓ Planet Scale (4.2s)
Total: 87.3s | Success: 97 | Failed: 3 | 1.15 pages/sec
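The summary line can be sanity-checked with a little arithmetic. The sketch below recomputes the throughput from the sample output and derives the average per-page time it implies. The second figure rests on an assumption: that all five semaphore slots stayed busy for the whole run. Both function names are hypothetical.

```rust
/// Pages per second over the whole run.
fn throughput(pages: usize, total_secs: f64) -> f64 {
    pages as f64 / total_secs
}

/// Average wall-clock time per page implied by the run, assuming all
/// `concurrency` slots were busy for the full duration.
fn implied_avg_page_secs(pages: usize, total_secs: f64, concurrency: usize) -> f64 {
    total_secs * concurrency as f64 / pages as f64
}

fn main() {
    // Figures from the sample output: 100 pages, 87.3 s total, 5 concurrent.
    println!("{:.2} pages/sec", throughput(100, 87.3));
    println!("{:.2} s per page", implied_avg_page_secs(100, 87.3, 5));
}
```

Each page includes a fixed two-second sleep, so under these assumptions roughly half of the per-page time is waiting rather than work, which suggests that replacing the sleep with an event-based wait is where most of the headroom is.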
Final Thoughts
Protocol Selection Matrix
| Requirement | Recommendation |
|---|---|
| Cross-browser testing | WebDriver |
| Chrome-only, max performance | CDP |
| Network mocking | CDP |
| AI agent automation | CDP |
| Existing Selenium codebase | WebDriver (+ BiDi for CDP features) |
| Console/log capture | CDP |
| Performance profiling | CDP |
| Simple E2E tests | Either works |
The Future
WebDriver BiDi is bridging the gap - adding CDP-like capabilities to WebDriver. Selenium 4 already supports it. Eventually, you'll get the best of both worlds through a unified spec.
Until then:
- WebDriver for cross-browser standardization
- CDP for Chrome power-user features
Top comments (4)
The middleman tax framing for WebDriver is real — I measured around 50ms of extra RTT per command through geckodriver in some test loops. But the tricky part your architecture diagram shows is that CDP's direct WebSocket connection is also exactly what anti-bot systems fingerprint: the Runtime.enable call sequence, Page.addScriptToEvaluateOnNewDocument for injection, and certain DOM properties create a detectable pattern. browser-act CLI (npx skills add browser-act/skills --skill browser-act) wraps CDP but patches the fingerprinting layer before any page JS runs — randomizes canvas noise, screen dimensions, and the driver property. Same speed advantage of CDP, but harder to detect than raw Playwright.
Valid point, and it's the gap I deliberately left out of this post since the focus was protocol architecture not evasion.
The fingerprint surface with raw CDP is real:
- `navigator.webdriver = true` unless patched
- `window.chrome.runtime` properties that real Chrome exposes
- `Page.addScriptToEvaluateOnNewDocument` injection itself is detectable by timing, since it fires before any page script but after the domain enable sequence
The domain enable sequence fingerprint is the subtler one. Anti-bot systems can infer automation from behavioral timing: how fast `Runtime.enable` + `Page.enable` fires relative to first navigation, zero human input latency, etc. Patching properties doesn't fix that.
browser-act's approach (patch before page JS runs) is the right layer, same idea as `puppeteer-extra-plugin-stealth`. The tradeoff is you're now trusting that abstraction's maintenance cadence against detection updates, which is its own arms race.
For our use case (internal tooling, performance profiling, AI agents against controlled environments) raw CDP is fine. For scraping production anti-bot sites, yeah, you need the stealth layer on top. Worth a separate post on the detection vectors.
Really solid breakdown, the comparison table alone is worth bookmarking.
One thing worth adding to the CDP "industrial scraping" section: when you run 10+ sessions through the same browser instance on port 9222, they all share the same process-level fingerprint. Canvas, WebGL renderer, screen resolution, timezone - all identical across tabs. Detection systems that look at behavioral clustering will flag this pretty fast even if each session has its own cookies and headers.
If you're doing anything beyond simple data collection (logged-in sessions, sites with bot detection), you basically need a separate browser process per session, not just separate targets within one instance. Playwright makes this easier with browserType.launch() per context, but it comes at a memory cost.
The BiDi point at the end is interesting - curious how long until it actually closes the gap with CDP in practice. The spec has been "almost ready" for a while now.
Correct on all counts, and it's a limitation baked into the `industrial.rs` example directly. `CdpClient::create_tab()` creates targets within one browser process, so every session shares the GPU fingerprint, WebGL renderer string, canvas noise signature, and system timezone. For the "100 URLs, grab titles and screenshots" use case that's fine. For anything with session state or bot detection, you're handing detection systems a behavioral cluster on a plate.
The fix is what you said: separate `--remote-debugging-port` per process, not separate targets per port. Isolation at the OS process level, not the CDP target level. Memory cost is real, roughly 100-200MB per Chrome instance depending on what's loaded, which is why the semaphore pattern in the example needs rethinking if you scale it with process-per-session.
One addition: `--user-data-dir` per process matters too. Shared profile directories leak state between "isolated" browser instances through cache, local storage baseline, and extension state even when cookies are cleared.