Marcelo Bairros
Playwright + AI = Stagehand (It's Better Than It Sounds)

From fragile CSS selectors to "click the login button" — how I turbocharged browser automation without throwing Playwright away



I've always valued end-to-end testing immensely.

It's the type of test that catches real-world issues.

Data can be messy and infrastructure shifts, but end-to-end tests are what validate that core features actually work.

But they add considerable overhead.

Even with a great framework like Playwright, and without solid coordination between developers and SDETs, those tests won't add real confidence for the team if they:

  • Are fragile to UI changes
  • Are flaky and unreliable
  • Don't add value for other stakeholders

In this post I'll share a library I found called Stagehand, which gives you a code-first solution for making end-to-end automations more robust and valuable.

Playwright vs. Stagehand


What Is Stagehand (And What It's NOT)

Let me clear something up right away: Stagehand is not a Playwright replacement. It's a Playwright enhancement.

On their website, they position it as:

"We built an OSS alternative to Playwright that's easier to use and lets AI reliably read and write on the web."

But I believe there's a lot more value to gain by using them together. Playwright has a huge ecosystem and a wide range of capabilities.

I don't think it's worthwhile to simply throw that away when both tools can be used together.

See more: https://docs.stagehand.dev/v3/integrations/playwright


The Three AI Powers: Act, Extract, Observe

Stagehand gives you three core AI-powered methods. Each serves a different purpose, and understanding when to use each one is key to getting the most out of the library.

Act: "Just Do the Thing"

act() is the workhorse. You describe what you want to happen in natural language, and Stagehand figures out how to do it.

Traditional Playwright:

// Hope this selector doesn't change...
await page.click('button[data-testid="submit-form"]');
// Or worse, when there's no good selector
await page.click('xpath=//button[contains(@class, "primary") and contains(text(), "Submit")]');

Stagehand:

// This works even if the button's class, ID, or text changes slightly
await stagehand.act("click the submit button");

The magic is in the resilience. If the frontend team changes the button from "Submit" to "Send" or "Confirm", your test doesn't break. The AI understands intent, not just DOM structure.

Extract: "Pull Data Out"

extract() is for when you need to get structured data from a page. Instead of writing complex selectors to scrape text, you describe what you want in plain English.

Traditional Playwright:

// Fragile and breaks if structure changes
const productName = await page.locator('.product-card h2.title').textContent();
const productPrice = await page.locator('.product-card .price-tag span').textContent();
const inStock = await page.locator('.product-card .availability').textContent();

Stagehand:

const productInfo = await stagehand.extract(
  "Extract the product name, price, and availability status",
  z.object({
    name: z.string(),
    price: z.string(),
    inStock: z.boolean(),
  }),
);
console.log(productInfo);
// { name: "Wireless Mouse", price: "$29.99", inStock: true }

The Zod schema integration is particularly nice. You get type-safe extraction with validation built in. If the AI can't find what you're looking for, you get a structured error instead of a silent failure.

Observe: "What Can I Do Here?"

observe() is the most interesting one conceptually. It tells you what actions are possible on the current page. This is incredibly useful for building agents that need to dynamically navigate.

const actions = await stagehand.observe();
console.log(actions);
// [
//   { action: "click the login button", selector: "#login-btn" },
//   { action: "enter text in the search field", selector: "input[name='q']" },
//   { action: "click the shopping cart icon", selector: ".cart-icon" },
//   ...
// ]

This is where Stagehand starts feeling less like a testing tool and more like a foundation for AI agents. You can build systems that explore, adapt, and interact with any website without prior knowledge of its structure.


Self-Healing: The Differentiator That Matters

This is the feature that sold me. Stagehand caches its AI decisions, so the second time you run the same action, it's fast. But here's where it gets clever: if the cached selector breaks, Stagehand automatically re-queries the AI to find the new correct element.

This is self-healing in action. Your tests become resilient to UI changes without you lifting a finger.

Compare this to traditional Playwright where:

  1. Test fails
  2. CI goes red
  3. You get paged (or worse, find out hours later)
  4. You investigate what broke
  5. You update the selector
  6. You push a fix
  7. You wait for CI again

With Stagehand, step 1 becomes "test heals itself and passes" — no human intervention required for cosmetic changes.
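The pattern is easy to picture in code. This is a conceptual sketch of cache-then-heal, not Stagehand's actual internals — the `HealingCache` class and its `Resolver` type are my own illustrative names:

```typescript
// Conceptual sketch of the self-healing pattern: cache the selector the AI
// found; if the cached selector stops working, re-query the "AI" and refresh
// the cache. All names here are hypothetical.

type Resolver = (instruction: string) => string; // stands in for an AI call

class HealingCache {
  private cache = new Map<string, string>();

  constructor(private resolve: Resolver) {}

  // Try the cached selector first; on failure, heal by resolving again.
  act(instruction: string, click: (selector: string) => boolean): string {
    const cached = this.cache.get(instruction);
    if (cached && click(cached)) return cached; // fast path, no AI call

    const fresh = this.resolve(instruction); // cache miss or stale selector
    if (!click(fresh)) throw new Error(`could not perform: ${instruction}`);
    this.cache.set(instruction, fresh);
    return fresh;
  }
}

// Simulated page: only one selector is "live" at a time.
const live = new Set(["#send-btn"]);
let aiCalls = 0;
const demo = new HealingCache(() => {
  aiCalls++;
  return "#send-btn"; // the AI re-reads the page and finds the selector
});

const click = (sel: string) => live.has(sel);
demo.act("click the submit button", click); // miss: AI resolves, result cached
demo.act("click the submit button", click); // hit: cached selector reused
console.log(aiCalls); // 1 — the second run needed no AI call
```

The interesting property is the failure branch: a UI redesign invalidates the cached selector, the next `click` fails, and the resolver runs again — which is exactly the "test heals itself" behavior described above.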


Agent Mode: The Experimental Part (Honest Take)

Stagehand also includes an agent() method that attempts to accomplish complex goals autonomously. You give it a high-level objective, and it chains together observe → act → observe → act until it succeeds.
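That observe → act loop can be sketched in a few lines. This is a hypothetical illustration of the control flow, not Stagehand's `agent()` implementation — the `runAgent` function, its step budget, and the "take the first action" picker are all my own simplifications:

```typescript
// Hypothetical sketch of the loop behind an autonomous agent:
// observe what's possible, act, and repeat until the goal is met
// or the step budget runs out.

type Action = { action: string };

function runAgent(
  goal: string,
  observe: () => Action[],    // what can I do on this page?
  act: (a: Action) => void,   // do one of those things
  done: () => boolean,        // has the goal been reached?
  maxSteps = 10               // budget so the agent can't loop forever
): boolean {
  for (let step = 0; step < maxSteps; step++) {
    if (done()) return true;
    const options = observe();
    if (options.length === 0) return false; // nowhere left to go: give up
    act(options[0]); // a real agent would ask the model to pick; we take the first
  }
  return done();
}

// Toy "site": three pages, goal is reaching checkout.
let page = "home";
const nav: Record<string, string> = { home: "cart", cart: "checkout" };
const ok = runAgent(
  "reach checkout",
  () => (nav[page] ? [{ action: `go to ${nav[page]}` }] : []),
  () => { page = nav[page]; },
  () => page === "checkout"
);
console.log(ok, page); // true "checkout"
```

Even this toy version shows where the failure modes live: a bad action picker wanders, and without the step budget a confused agent never terminates — which matches my experience below.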

Here's the truth: in my testing, results were mixed.

When it works, it's magical. I've seen it successfully navigate complex flows that would take dozens of lines of traditional code. But I've also seen it get confused by popups, take 30 seconds for something a hardcoded test does in 2 seconds, and occasionally just… give up.

My honest assessment:

  • Great for: Exploratory testing, demos, prototyping flows
  • Not ready for: Production CI pipelines where you need predictable timing and costs
  • Getting better: Each Stagehand release improves agent reliability

If you're evaluating Stagehand primarily for agent capabilities, temper your expectations. But if you're using it for the core act/extract/observe methods with occasional agent assists, you'll be happy.

It's also worth noting that Stagehand supports Computer Use models, which is a big differentiator. I think this space has a lot of room to grow.

Learn more: https://docs.stagehand.dev/v3/best-practices/computer-use


100% Playwright Compatibility

This deserves its own section because it's the key to practical adoption.

Stagehand gives you direct access to the underlying Playwright page object. From the Stagehand docs:

import { Stagehand } from "@browserbasehq/stagehand";
import { chromium } from "playwright-core";
import { z } from "zod/v3";

async function main() {
  // Initialize Stagehand
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    model: "openai/gpt-5",
    verbose: 1,
  });

  await stagehand.init();
  console.log("Stagehand initialized");

  // Connect Playwright to Stagehand's browser
  const browser = await chromium.connectOverCDP({
    wsEndpoint: stagehand.connectURL(),
  });

  const pwContext = browser.contexts()[0];
  const pwPage = pwContext.pages()[0];

  // Navigate and interact
  await pwPage.goto("https://example.com");

  // Use Stagehand's AI methods
  const actions = await stagehand.observe("find the main heading", {
    page: pwPage,
  });

  console.log("Found actions:", actions);

  // Extract data
  const heading = await stagehand.extract(
    "extract the main heading text",
    z.object({ heading: z.string() }),
    { page: pwPage }
  );

  console.log("Heading:", heading);

  // Cleanup
  await stagehand.close();
}

main();

This interoperability means you can:

  • Start with your existing Playwright test suite
  • Gradually add Stagehand for the flaky parts
  • Keep precise Playwright control where it makes sense
  • Never be locked into one approach

You're not rewriting anything. You're enhancing selectively.


TL;DR

Here's the cheat sheet:

✅ Stagehand complements Playwright, doesn't replace it

✅ Three core methods: act (do things), extract (get data), observe (discover options)

✅ Self-healing tests that survive UI changes automatically

✅ 100% Playwright compatible — use both in the same test

⚠️ Agent mode is experimental — promising but still limited from my tests

⚠️ Not a silver bullet — use it where fragile selectors are actually your problem


The Bottom Line

Will I rewrite all my tests in Stagehand? No. Will I reach for it every time I'm writing a selector that I know will break in two weeks? Absolutely.

The best part is there's no commitment. Try it on one flaky test. See if it helps. If it does, use it more. If it doesn't fit your workflow, you've lost nothing — your Playwright code is still there, untouched.

That's the kind of low-risk, high-potential tool I love discovering.

Have you tried Stagehand? I'm curious what results others are getting, especially with the agent mode. Drop a comment or reach out — I'm genuinely interested in comparing notes.


And if you like Stagehand, check out Ledda.ai — we're building the next platform for natural-language web automation. It's the easiest way to create QA regression suites and synthetic monitoring for production in minutes instead of hours.
