Playwright Automation in TypeScript: Reusable Patterns for Screenshot, PDF, and Scraping

#playwright #typescript #webdev #automation

Every browser automation project starts the same way. You open a browser, navigate to a URL, and immediately realize you need retry logic, stealth mode, and session persistence before writing any actual automation.

Here are the TypeScript patterns I reach for every time.

Core Browser Factory

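import type { Page } from 'playwright';

// BrowserConfig, createBrowser, and createContext are not shown here.
// Runs fn against a fresh page and always closes the browser, even if fn throws.
export async function withPage<T>(
  config: BrowserConfig,
  fn: (page: Page) => Promise<T>
): Promise<T> {
  const browser = await createBrowser(config);
  try {
    // Created inside try so a failure here cannot leak the browser process.
    const context = await createContext(browser, config);
    const page = await context.newPage();
    return await fn(page);
  } finally {
    // Closing the browser also closes its contexts and pages.
    await browser.close();
  }
}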
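
The retry logic mentioned in the intro fits naturally around a call like this. Here is a minimal sketch, assuming a hypothetical withRetry helper with exponential backoff; the helper and its option names (retries, baseDelayMs) are my illustration, not something shown in the kit.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
interface RetryOptions {
  retries: number;     // extra attempts allowed after the first failure
  baseDelayMs: number; // wait before the first retry; doubles on each retry
}

const sleepMs = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  { retries, baseDelayMs }: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is coming.
      if (attempt < retries) {
        await sleepMs(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

Wrapping the whole withPage call, as in withRetry(() => withPage(config, fn), { retries: 2, baseDelayMs: 500 }), means each attempt starts from a fresh browser instead of reusing a page that may be in a bad state.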

Stealth patch that most tutorials skip:

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
});

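Without this, navigator.webdriver reports true under automation, which makes headless Chrome trivially detectable. The patch removes only the most obvious signals; dedicated bot-detection services check many more.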

Screenshot with Element Selection

// Full page
const buffer = await captureScreenshot({ url: 'https://example.com', fullPage: true });

// Specific element
const nav = await captureScreenshot({ url: 'https://example.com', selector: 'nav' });

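Supports PNG, JPEG, and WebP output plus custom viewports, all through a single function. Playwright's own screenshot call emits only PNG and JPEG, so WebP implies a conversion step after capture.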
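
One way a single function can cover both cases is to branch on whether a selector was given. A rough sketch, not the kit's actual implementation: takeShot and its option names are hypothetical, and PageLike is a hand-written subset of Playwright's Page (mirroring its screenshot and locator(...).screenshot methods) so the logic can be read without launching a browser.

```typescript
// Hand-written subsets of Playwright's screenshot options and Page.
interface ElementShotOpts { type?: 'png' | 'jpeg'; quality?: number }
interface PageShotOpts extends ElementShotOpts { fullPage?: boolean }
interface PageLike {
  goto(url: string): Promise<unknown>;
  screenshot(options?: PageShotOpts): Promise<Uint8Array>;
  locator(selector: string): {
    screenshot(options?: ElementShotOpts): Promise<Uint8Array>;
  };
}

// Hypothetical options, modeled on the captureScreenshot calls above.
interface CaptureOptions {
  url: string;
  selector?: string;
  fullPage?: boolean;
  format?: 'png' | 'jpeg';
  quality?: number; // 0-100, JPEG only
}

async function takeShot(page: PageLike, opts: CaptureOptions): Promise<Uint8Array> {
  const type = opts.format ?? 'png';
  const base: ElementShotOpts = { type };
  // quality is not applicable to PNG, so only pass it along for JPEG.
  if (type === 'jpeg' && opts.quality !== undefined) {
    base.quality = opts.quality;
  }
  await page.goto(opts.url);
  if (opts.selector) {
    // Element screenshots are clipped to the element, so fullPage does not apply.
    return page.locator(opts.selector).screenshot(base);
  }
  return page.screenshot({ ...base, fullPage: opts.fullPage ?? false });
}
```

Keeping the format and quality handling in one place is what lets the full-page and element paths share everything except the final call.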

Structured Data Extraction

Define a schema, get typed data back:

const jobs = await extractStructured({
  url: 'https://jobboard.example.com',
  schema: {
    title:   { selector: 'h2.job-title', required: true },
    company: { selector: '.company-name' },
    tags:    { selector: '.skill-badge', multiple: true },
    link:    { selector: 'a', attribute: 'href' },
  },
  listSelector: '.job-card',
});
// [{ title: 'Senior Engineer', company: 'Acme', tags: ['Python', 'AWS'], link: '...' }, ...]

No more writing querySelectorAll loops by hand.
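
Under the hood, a schema like this boils down to one loop per card. The sketch below shows that loop under my own assumptions: FieldSpec, extractFromCards, and the choice to drop a record when a required field is missing are all hypothetical, and ElementLike is a small subset of the DOM Element interface so the logic can be tried against plain objects.

```typescript
// Hypothetical field options, mirroring the schema shown above.
interface FieldSpec {
  selector: string;
  attribute?: string; // read this attribute instead of the text content
  multiple?: boolean; // collect every match into an array
  required?: boolean; // assumption: drop the record when nothing matches
}
type Schema = Record<string, FieldSpec>;
type Extracted = Record<string, string | string[] | null>;

// Subset of the DOM Element interface that the loop needs.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

function readValue(el: ElementLike, spec: FieldSpec): string | null {
  const raw = spec.attribute ? el.getAttribute(spec.attribute) : el.textContent;
  const value = raw?.trim();
  return value ? value : null; // treat empty strings as missing
}

function extractFromCards(cards: ArrayLike<ElementLike>, schema: Schema): Extracted[] {
  const records: Extracted[] = [];
  for (const card of Array.from(cards)) {
    const record: Extracted = {};
    let missingRequired = false;
    for (const [key, spec] of Object.entries(schema)) {
      const values = Array.from(card.querySelectorAll(spec.selector))
        .map((el) => readValue(el, spec))
        .filter((v): v is string => v !== null);
      record[key] = spec.multiple ? values : (values[0] ?? null);
      if (spec.required && values.length === 0) missingRequired = true;
    }
    if (!missingRequired) records.push(record);
  }
  return records;
}
```

In a real run this loop would execute in the browser, for example inside the callback passed to Playwright's page.$$eval(listSelector, ...), with the schema passed in as the argument.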

Session Persistence

The biggest time-saver for authenticated scraping:

// First run: log in and save
await loginAndSaveSession({
  loginUrl: 'https://app.example.com/login',
  usernameSelector: '#email',
  passwordSelector: '#password',
  submitSelector: 'button[type="submit"]',
  username: 'you@example.com',
  password: process.env.PASS,
  sessionPath: './session.json',
});

// Every subsequent run: no login needed
await withSavedSession('./session.json', async (context) => {
  const page = await context.newPage();
  await page.goto('https://app.example.com/dashboard');
  // Already authenticated
});
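
Both helpers can be built on Playwright's storageState support: context.storageState({ path }) writes cookies and localStorage to a JSON file, and browser.newContext({ storageState: path }) loads them back. A minimal sketch of that idea follows; saveSession and withStoredSession are hypothetical names, the explicit browser parameter is my assumption, and the two interfaces are hand-written subsets of Playwright's BrowserContext and Browser.

```typescript
// Hand-written subsets of Playwright's BrowserContext and Browser.
interface SessionContextLike {
  storageState(options?: { path?: string }): Promise<unknown>;
  newPage(): Promise<unknown>;
  close(): Promise<void>;
}
interface SessionBrowserLike {
  newContext(options?: { storageState?: string }): Promise<SessionContextLike>;
}

// Call once the login form has been submitted and the app has loaded:
// writes the context's cookies and localStorage to sessionPath.
async function saveSession(
  context: SessionContextLike,
  sessionPath: string
): Promise<void> {
  await context.storageState({ path: sessionPath });
}

// Opens a context pre-loaded with the saved state and always closes it.
async function withStoredSession<T>(
  browser: SessionBrowserLike,
  sessionPath: string,
  fn: (context: SessionContextLike) => Promise<T>
): Promise<T> {
  const context = await browser.newContext({ storageState: sessionPath });
  try {
    return await fn(context);
  } finally {
    await context.close();
  }
}
```

The session file holds live credentials, so it belongs in .gitignore, and it stops working once the site expires the underlying cookies.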

Block Resources for 3-5x Speed

await context.route('**/*', async (route) => {
  if (['image', 'media', 'font'].includes(route.request().resourceType()))
    return route.abort();
  return route.continue();
});

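On content-heavy sites this can cut load time from around 4s to under 1s, though the gain depends on how much of the page weight is images, media, and fonts. Skip it for screenshots and PDFs, where the blocked assets would be missing from the output.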

Page Change Monitoring

Hash-based diffing with persistence:

await startMonitor({
  url: 'https://example.com/product/123',
  selector: '.price, .availability',
  checkIntervalMs: 300_000,
  onChange: async (diff) => {
    console.log(`Changed at ${diff.detectedAt}`);
    console.log(`Was: ${diff.previousContent}`);
    console.log(`Now: ${diff.newContent}`);
    // Send Slack/email alert
  },
});

Good for: price trackers, job monitors, stock alerts, government updates.
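
The hash-based diffing itself is small. Here is a sketch of the comparison step, assuming SHA-256 from Node's crypto module and whitespace normalization before hashing; checkForChange and Snapshot are hypothetical names, while the diff fields match the ones used in onChange above.

```typescript
import { createHash } from 'node:crypto';

interface ChangeDiff {
  previousContent: string;
  newContent: string;
  detectedAt: string; // ISO timestamp
}

// JSON-serializable, so it can be written to disk between runs.
interface Snapshot {
  hash: string;
  content: string;
}

const sha256 = (text: string): string =>
  createHash('sha256').update(text).digest('hex');

// Compares freshly scraped content with the last snapshot.
// Returns the snapshot to persist, plus a diff when the content changed.
function checkForChange(
  previous: Snapshot | null,
  rawContent: string,
  now: Date = new Date()
): { snapshot: Snapshot; diff: ChangeDiff | null } {
  // Collapse whitespace so layout-only changes do not trigger alerts.
  const content = rawContent.replace(/\s+/g, ' ').trim();
  const snapshot: Snapshot = { hash: sha256(content), content };
  // The first run has nothing to compare against, so it never reports a change.
  if (previous === null || previous.hash === snapshot.hash) {
    return { snapshot, diff: null };
  }
  return {
    snapshot,
    diff: {
      previousContent: previous.content,
      newContent: content,
      detectedAt: now.toISOString(),
    },
  };
}
```

A monitor loop would call this on every interval, save the returned snapshot, and invoke onChange whenever diff is not null.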

PDF Generation

// URL to PDF
const pdf = await generatePdf({ source: 'https://example.com' });

// HTML template to PDF (great for invoices)
const invoicePdf = await generatePdf({
  source: INVOICE_HTML(invoiceData),
  sourceType: 'html',
  displayHeaderFooter: true,
});

The kit includes a complete invoice HTML template you can drop your data into.
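
For reference, both modes map onto a few Playwright calls: page.goto for a URL, page.setContent for an HTML string, then page.pdf, which only works in Chromium. The sketch below is my own illustration rather than the kit's code; renderPdf and PdfRequest are hypothetical, PdfPageLike is a hand-written subset of Playwright's Page, and the A4 format, margins, and footer markup are assumptions.

```typescript
// Hand-written subset of Playwright's Page covering the calls used below.
interface PdfOptions {
  format?: string;
  printBackground?: boolean;
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  margin?: { top?: string; bottom?: string };
}
interface PdfPageLike {
  goto(url: string): Promise<unknown>;
  setContent(html: string): Promise<void>;
  pdf(options?: PdfOptions): Promise<Uint8Array>;
}

// Hypothetical request shape, modeled on the generatePdf calls above.
interface PdfRequest {
  source: string;
  sourceType?: 'url' | 'html';
  displayHeaderFooter?: boolean;
}

async function renderPdf(page: PdfPageLike, req: PdfRequest): Promise<Uint8Array> {
  if (req.sourceType === 'html') {
    await page.setContent(req.source); // render the HTML string directly
  } else {
    await page.goto(req.source);
  }
  const options: PdfOptions = { format: 'A4', printBackground: true };
  if (req.displayHeaderFooter) {
    options.displayHeaderFooter = true;
    options.headerTemplate = '<span></span>'; // replace the default header with an empty one
    // Chromium fills in elements with these class names when printing.
    options.footerTemplate =
      '<div style="font-size:9px;width:100%;text-align:center;">' +
      '<span class="pageNumber"></span> / <span class="totalPages"></span></div>';
    // Header and footer render inside the margins, so leave room for them.
    options.margin = { top: '15mm', bottom: '15mm' };
  }
  return page.pdf(options);
}
```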

These are packaged as a starter kit with 20+ runnable TypeScript scripts, MIT license.

Playwright Browser Automation Starter Kit - $19 one-time, instant download. Includes: screenshot capture, PDF generation, data extraction, form automation, login flows, page monitoring, anti-detection, and a full scraper template with pagination and retry.