DEV Community

SIÁN Agency

Posted on • Originally published at apify.com

One Playwright Selector Trick Nobody Talks About: getByRole

Everyone reaches for page.locator(".some-class") first. They shouldn't.

getByRole is the most stable selector in Playwright and almost nobody uses it for scraping. They think it's a testing-library thing. It's not. It's a way of asking the page "what is this element semantically" instead of "what classname does the design system happen to use this week."

That distinction is what kept our Facebook video transcript actor running through three Facebook redesigns this past year.

The 3-item checklist

When does getByRole work? When the site is built by people who care about accessibility. Which is: more sites than you think, especially big ones with legal requirements (US government, EU compliance, large e-commerce).

Check before you skip it:

  1. Open the accessibility tree in Chrome DevTools (Elements → Accessibility tab). If your target element shows a role and an accessible name, getByRole will find it.
  2. Buttons and headings are usually exposed correctly. Even sloppy sites tend to give you a button role (a native <button> or role="button") and proper heading levels, because the design system enforces them.
  3. Forms expose a label even when the visual design hides it. getByLabel("Email") works on inputs that don't visibly show "Email" anywhere, as long as the label is wired up through a <label>, aria-label, or aria-labelledby.
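
The checklist can also be run from code rather than by eye. Below is a minimal sketch, assuming a Playwright page is already open on the target URL. The helper name probeLocators and the three candidate locators are hypothetical and only illustrate the idea; getByRole, getByLabel, and count() are standard Playwright locator calls.

```javascript
// Hypothetical helper: report how many elements each semantic locator matches.
// `page` is assumed to be a Playwright Page already navigated to the target.
async function probeLocators(page) {
  // Illustrative candidates; swap in the roles and names that the DevTools
  // accessibility tree shows for your own target elements.
  const candidates = {
    followButton: page.getByRole('button', { name: /follow/i }),
    mainHeading: page.getByRole('heading', { level: 1 }),
    emailInput: page.getByLabel('Email'),
  };

  const report = {};
  for (const [label, locator] of Object.entries(candidates)) {
    // count() returns the current number of matches without waiting.
    report[label] = await locator.count();
  }
  return report;
}
```

A count of 1 means the locator is unambiguous, 0 means the site does not expose that element semantically, and anything higher means the locator needs a tighter name or a narrower scope.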

The trick

Compare:

// Brittle: tied to generated class names
const followBtnByClass = page.locator('._a9-_._a9-_2._a9-_8._a9-_z');

// getByRole: survives layout changes
const followBtnByRole = page.getByRole('button', { name: /follow/i });

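The first one breaks the day Facebook tweaks their CSS-in-JS hash. The second one keeps working until they remove the button entirely or change its accessible name. One caveat: /follow/i also matches "Unfollow" and "Following", so anchor the pattern (/^follow$/i) or pass { name: 'Follow', exact: true } when only that one button should match.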

Same for headings:

// "Get the post title"
const title = page.getByRole('heading', { level: 1 });

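That works on every site that uses <h1> correctly, which is most of them, because a single descriptive <h1> is long-standing SEO and accessibility advice. If a page has more than one <h1>, the locator matches all of them, so add a name filter or call .first() before reading it.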

Fig. 1 — Selector stability over a 30-day window. getByRole survives layout churn.

Quick case

The Facebook transcript actor extracts video metadata from public posts. Facebook ships A/B tests constantly — class names change every couple of weeks. Selectors built on _a9-_8 chains broke regularly.

I rewrote the extractor to use getByRole for everything that had a meaningful role:

  • Author name → getByRole('link', { name: /^[\w. ]+$/ }) near the post header.
  • Post text → no role, but [data-ad-comet-preview="message"] (a data- attribute, also stable).
  • Video player → getByRole('article') containing a <video> element.
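
Put together, the three lookups above might look roughly like this. It is a sketch under stated assumptions, not the actor's actual code: extractPost and the returned field names are hypothetical, and "near the post header" is approximated by taking the first matching link inside the post.

```javascript
// Sketch of the three lookups, assuming `page` is a Playwright Page that is
// already on a public post. `extractPost` and its field names are hypothetical.
async function extractPost(page) {
  // Scope everything to the article that actually contains a <video>.
  const post = page
    .getByRole('article')
    .filter({ has: page.locator('video') })
    .first();

  // Author: first link in the post whose accessible name looks like a name.
  const author = post.getByRole('link', { name: /^[\w. ]+$/ }).first();

  // Post text has no useful role, so use the data attribute instead.
  const message = post.locator('[data-ad-comet-preview="message"]').first();

  return {
    hasVideo: (await post.count()) > 0,
    author: (await author.count()) ? (await author.innerText()).trim() : null,
    text: (await message.count()) ? (await message.innerText()).trim() : null,
  };
}
```

Note that \w only covers ASCII letters, digits, and the underscore, so author names with accents or non-Latin scripts would need a Unicode-aware pattern such as /^[\p{L}\p{N}. ]+$/u.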

Before: ~8 selector breakages per quarter. After: 1 in the last 6 months, and that one was a real structural change (Facebook moved to a new post type), not a class rename.

The CTA you didn't ask for

getByRole is now the first thing we try in every new actor, including the rebuild of the Facebook AI Transcript Extractor. CSS-class selectors are reserved for cases where the site's accessibility story is genuinely broken (rare in 2026, since most sites have been audited at least once).
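
One way to encode that "role first, class last" order is a small fallback helper. This is a minimal sketch, assuming each candidate is a Playwright locator; firstMatching and the label field are hypothetical names, not part of Playwright.

```javascript
// Hypothetical helper: return the first candidate locator that matches anything.
// Candidates are ordered from most stable (role) to least stable (CSS class).
async function firstMatching(candidates) {
  for (const { label, locator } of candidates) {
    if ((await locator.count()) > 0) {
      return { label, locator };
    }
  }
  return null; // nothing matched: treat this as a selector breakage
}

// Usage, assuming `page` is a Playwright Page:
// const hit = await firstMatching([
//   { label: 'role', locator: page.getByRole('button', { name: /follow/i }) },
//   { label: 'css', locator: page.locator('._a9-_._a9-_2._a9-_8._a9-_z') },
// ]);
// if (hit && hit.label === 'css') console.warn('fell back to a class selector');
```

Logging which candidate won makes it visible when a site starts forcing the class-name fallback, which is usually the first sign that its markup has changed.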

So:

Open your scraper. Run a search for page.locator( with a CSS class chain. How many can you replace with getByRole? Drop the count in the comments — I'll bet it's more than half.
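
If you would rather count than eyeball, here is a rough sketch of that search as a function over source text. findClassChainLocators is a hypothetical helper; it only catches literal page.locator( calls whose selector is a pure chain of classes, so descendant selectors such as '.a .b' and locators built from variables are deliberately out of scope.

```javascript
// Hypothetical audit helper: find page.locator('...') calls whose selector is
// made only of chained CSS classes (for example '.a.b.c' or '._a9-_._a9-_2').
function findClassChainLocators(source) {
  // Captures the quoted first argument of page.locator( ... ).
  const callPattern = /page\.locator\(\s*(['"`])(.*?)\1/g;
  // One or more ".class" tokens and nothing else.
  const classChain = /^(\.[\w-]+)+$/;

  const hits = [];
  for (const match of source.matchAll(callPattern)) {
    const selector = match[2].trim();
    if (classChain.test(selector)) hits.push(selector);
  }
  return hits;
}
```

Feed it the contents of each scraper file (for instance via fs.readFileSync) and the length of the returned array is the number to drop in the comments.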

Agree, disagree, or have a site where getByRole falls apart? Reply.


Written by Nova Chen, Automation Dev Advocate at SIÁN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIÁN Agency.
