Everyone reaches for page.locator(".some-class") first. They shouldn't.
getByRole is the most stable selector in Playwright and almost nobody uses it for scraping. They think it's a testing-library thing. It's not. It's a way of asking the page "what is this element semantically" instead of "what classname does the design system happen to use this week."
That distinction is what kept our Facebook video transcript actor running through three Facebook redesigns this past year.
The 3-item checklist
When does getByRole work? When the site is built by people who care about accessibility. Which is: more sites than you think, especially big ones with legal requirements (US government, EU compliance, large e-commerce).
Check before you skip it:
- Open the accessibility tree in Chrome DevTools (Elements → Accessibility tab). If your target element shows a role and an accessible name, getByRole will find it.
- Buttons and headings are nearly always tagged correctly. Even sloppy sites give you role="button" and proper heading levels because the design system enforced it.
- Forms expose a label even when the visual design hides it. getByLabel("Email") works on inputs that don't visibly show "Email" anywhere.
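The same check can be scripted once you have a few candidates from the accessibility tree. A minimal sketch, assuming a hypothetical `probe` helper and locally declared `ProbePage` / `ProbeLocator` types that mirror only the Playwright methods used here (`getByRole`, `getByLabel`, `count`). A real Playwright `Page` should satisfy them, but treat the types as illustrative:

```typescript
// Structural subset of Playwright's Page/Locator, declared locally so the
// sketch type-checks on its own. ProbePage, ProbeLocator and probe are
// hypothetical names, not part of Playwright.
interface ProbeLocator {
  count(): Promise<number>;
}
interface ProbePage {
  getByRole(role: string, options?: { name?: string | RegExp }): ProbeLocator;
  getByLabel(text: string | RegExp): ProbeLocator;
}

type Candidate =
  | { kind: 'role'; role: string; name?: string | RegExp }
  | { kind: 'label'; text: string | RegExp };

// Report how many elements each candidate currently resolves to.
// A count of exactly 1 is what you want before committing to a selector.
async function probe(page: ProbePage, candidates: Candidate[]): Promise<number[]> {
  const counts: number[] = [];
  for (const c of candidates) {
    const locator =
      c.kind === 'role'
        ? page.getByRole(c.role, c.name === undefined ? {} : { name: c.name })
        : page.getByLabel(c.text);
    counts.push(await locator.count());
  }
  return counts;
}
```

Calling `probe(page, [{ kind: 'role', role: 'button', name: /follow/i }, { kind: 'label', text: 'Email' }])` after navigation gives one count per candidate. A zero means the role or label is not exposed, and that element still needs a CSS or data- attribute selector.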
The trick
Compare:
// Class-name brittle
const followBtnByClass = page.locator('._a9-_._a9-_2._a9-_8._a9-_z');
// getByRole: survives layout changes
const followBtnByRole = page.getByRole('button', { name: /follow/i });
The first one breaks the day Facebook tweaks their CSS-in-JS hash. The second one keeps working until they remove the button entirely.
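One thing to watch with the regex: /follow/i matches any accessible name that contains "follow", so "Following" and "Unfollow" buttons would match too. The sketch below only runs the regular expressions against some made-up button names (the names are assumptions, not taken from Facebook's markup) to show why an anchored pattern is safer:

```typescript
// Hypothetical accessible names a page might expose on its buttons.
const buttonNames = ['Follow', 'Following', 'Unfollow', 'Follow back'];

const loose = /follow/i;      // substring match, as in the example above
const anchored = /^follow$/i; // whole-name match

// Keep only the names the pattern accepts.
function matching(pattern: RegExp, names: string[]): string[] {
  return names.filter((name) => pattern.test(name));
}

const looseHits = matching(loose, buttonNames);       // all four names
const anchoredHits = matching(anchored, buttonNames); // only 'Follow'
```

In Playwright terms that means `page.getByRole('button', { name: /^follow$/i })`, or a string name combined with `exact: true` when the casing is known.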
Same for headings:
// "Get the post title"
const title = page.getByRole('heading', { level: 1 });
That works on every site that uses a single <h1> for the page title, which is most of them, since SEO guidance has pushed that convention for years.
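A hedged sketch of a title helper, assuming a page may contain more than one <h1>. `getPostTitle` and the `HeadingPage` / `HeadingLocator` types are hypothetical and mirror only the Playwright methods used (`getByRole`, `first`, `textContent`). Taking `.first()` sidesteps the strict-mode error Playwright raises when a locator that matches several elements is read directly:

```typescript
// Local structural types so the sketch stands alone; a Playwright Page
// and Locator are expected to fit them.
interface HeadingLocator {
  first(): HeadingLocator;
  textContent(): Promise<string | null>;
}
interface HeadingPage {
  getByRole(role: 'heading', options?: { level?: number }): HeadingLocator;
}

// Return the trimmed text of the first level-1 heading, or null if it is empty.
async function getPostTitle(page: HeadingPage): Promise<string | null> {
  const text = await page.getByRole('heading', { level: 1 }).first().textContent();
  const trimmed = text?.trim() ?? '';
  return trimmed.length > 0 ? trimmed : null;
}
```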
Quick case
The Facebook transcript actor extracts video metadata from public posts. Facebook ships A/B tests constantly — class names change every couple of weeks. Selectors built on _a9-_8 chains broke regularly.
I rewrote the extractor to use getByRole for everything that had a meaningful role:
- Author name → getByRole('link', { name: /^[\w. ]+$/ }) near the post header.
- Post text → no role, but [data-ad-comet-preview="message"] (a data- attribute, also stable).
- Video player → getByRole('article') containing a <video> element.
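Put together, the three lookups might look like the sketch below. This is not the actor's actual code: `extractPost`, `PostFields`, and the `FbPage` / `FbLocator` types are hypothetical, the types mirror only the Playwright methods used, scoping the author link to the article is one assumed reading of "near the post header", and reading the video's `src` attribute is an illustrative choice.

```typescript
// Structural subset of Playwright's Locator, declared locally for the sketch.
interface FbLocator {
  getByRole(role: string, options?: { name?: string | RegExp }): FbLocator;
  locator(selector: string): FbLocator;
  filter(options: { has?: FbLocator }): FbLocator;
  first(): FbLocator;
  textContent(): Promise<string | null>;
  getAttribute(name: string): Promise<string | null>;
}
type FbPage = Pick<FbLocator, 'getByRole' | 'locator'>;

interface PostFields {
  author: string | null;
  text: string | null;
  videoSrc: string | null;
}

async function extractPost(page: FbPage): Promise<PostFields> {
  // Video player: the first article that contains a <video> element.
  const post = page
    .getByRole('article')
    .filter({ has: page.locator('video') })
    .first();

  // Author name: first link in the post whose accessible name looks like a name.
  const author = await post.getByRole('link', { name: /^[\w. ]+$/ }).first().textContent();

  // Post text: no useful role, so use the data- attribute instead.
  const text = await post.locator('[data-ad-comet-preview="message"]').first().textContent();

  // Assumption: the <video> element carries a src attribute worth recording.
  const videoSrc = await post.locator('video').first().getAttribute('src');

  return { author: author?.trim() ?? null, text: text?.trim() ?? null, videoSrc };
}
```

Only the post-text lookup depends on anything other than roles, so a class rename alone should not touch this function.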
Before: ~8 selector breakages per quarter. After: 1 in the last 6 months, and that one was a real structural change (Facebook moved to a new post type), not a class rename.
The CTA you didn't ask for
getByRole is now the first thing every new actor we write tries — including the rebuild of the Facebook AI Transcript Extractor. CSS-class selectors are reserved for the cases where the site's accessibility story is genuinely broken (rare in 2026 — most sites have been audited at least once).
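That policy (role first, CSS only as a last resort) fits in a small helper. A sketch under assumptions: `firstResolving` and `Resolved` are hypothetical names, and `CountableLocator` mirrors only Playwright's `count()` method. Returning the tier makes it easy to log whenever a fallback selector was needed:

```typescript
interface CountableLocator {
  count(): Promise<number>;
}

interface Resolved<T> {
  locator: T;
  tier: number; // position of the candidate that matched; 0 is the preferred one
}

// Walk the candidates in priority order and return the first one that
// currently matches at least one element, or null if none do.
async function firstResolving<T extends CountableLocator>(
  candidates: T[],
): Promise<Resolved<T> | null> {
  let tier = 0;
  for (const locator of candidates) {
    if ((await locator.count()) > 0) return { locator, tier };
    tier += 1;
  }
  return null;
}
```

Typical use would be `await firstResolving([page.getByRole('button', { name: /^follow$/i }), page.locator('._a9-_8')])`. Since `count()` does not wait for elements to appear, call it once the page has settled.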
So:
Open your scraper. Run a search for page.locator( with a CSS class chain. How many can you replace with getByRole? Drop the count in the comments — I'll bet it's more than half.
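If you'd rather not count by hand, a rough pass with a regular expression gets close. This is a heuristic sketch: `countClassLocators` and `CLASS_LOCATOR` are hypothetical names, and the pattern only catches string-literal selectors that start with a class, so it misses template literals and selectors built from variables.

```typescript
// Matches page.locator('...') or page.locator("...") calls whose selector
// starts with a CSS class, e.g. page.locator('._a9-_._a9-_2').
const CLASS_LOCATOR = /page\.locator\(\s*(['"])\.[^'"]*\1/g;

// Count class-based locator calls in a chunk of source code.
function countClassLocators(source: string): number {
  return (source.match(CLASS_LOCATOR) ?? []).length;
}
```

Feed it the contents of each source file (for example via Node's `fs.readFileSync(path, 'utf8')`) and sum the results.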
Agree, disagree, or have a site where getByRole falls apart? Reply.
Written by Nova Chen, Automation Dev Advocate at SIÁN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIÁN Agency.
