Erik for Allscreenshots

Posted on Jan 6

Day 6: The Core Engine - Getting Playwright Running

#playwright #puppeteer #programming #webdev

Day 6 of 30. Today we capture our first screenshot. Finally, actual product work.

We've spent the last five days on setup, planning, and infrastructure. That's important work, even if it can feel like time we could have spent building the product.
In our opinion that feeling is a trap: it helps to know what you're building before you build it.

At the same time, planning alone doesn't deliver a product. So let's get to it: today we're writing the code that actually does the thing we're building. Take a URL, render it in a browser, and capture a screenshot! It won't be the full product, but it's a start.

Sounds simple, right? Perhaps, but when something sounds too simple, it's often the case that we don't understand the problem well enough yet.
Let's dive in and find out what we're missing!

Why Playwright over Puppeteer?

There are several ways to take screenshots programmatically: Playwright, Puppeteer, Selenium WebDriver, the Chrome DevTools Protocol (CDP) directly, and older tools like PhantomJS, SlimerJS, and CasperJS.

All of these can drive a browser without a visible window. Some of them, like PhantomJS, SlimerJS, and CasperJS, are unfortunately no longer maintained, which limits the choice but also makes it a little easier.

Based on the capabilities of the tools, we picked Playwright. We looked at the following criteria to make a well-informed decision:

Better auto-waiting. Playwright automatically waits for elements to be ready before interacting with them, where Puppeteer requires more manual wait logic. For screenshots, what we need most is a reliable "wait until the page looks done", and Playwright handles this well.

Multi-browser from the start. Playwright supports Chromium, Firefox, and WebKit out of the box while Puppeteer is Chrome-focused. While we're starting with Chromium only, having the option to support more browsers in the future is a nice feature.

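Cleaner device emulation. Playwright has built-in device profiles for phones and tablets. In the JavaScript and Python bindings, playwright.devices['iPhone 13'] gives us the right viewport, user agent, and pixel density in one lookup. The Java bindings don't expose that registry, so on the JVM we set the same values on the context options ourselves, but viewport, user agent, scale factor, and touch support are all first-class options. That means less config and less room for mistakes.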

Active development. Microsoft backs Playwright, and it's evolving faster than Puppeteer. It's not that Puppeteer is abandoned, but Playwright feels like the future.

Setting up Playwright in Kotlin

Here's where things get interesting. Playwright has official bindings for JavaScript, Python, Java, and C#. We're using Kotlin on the JVM, so the Java bindings work well.

First, we need to add the dependency:

// build.gradle.kts
dependencies {
    implementation("com.microsoft.playwright:playwright:1.50.0")
}

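Then install the browsers (this runs once during Docker build). The Java bindings install browsers through the Playwright CLI class (com.microsoft.playwright.CLI); assuming the build registers a playwright task that runs it, the call looks like this:

./gradlew playwright --args="install --with-deps chromium"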
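
Gradle doesn't ship a playwright task, so the command above only works if the build defines one. A minimal sketch that follows the pattern from the Playwright Java documentation; the task name playwright and the use of the main source set are assumptions:

```kotlin
// build.gradle.kts
// Hypothetical helper task: runs the Playwright CLI that ships inside the
// com.microsoft.playwright:playwright dependency.
tasks.register<JavaExec>("playwright") {
    classpath = sourceSets["main"].runtimeClasspath
    mainClass.set("com.microsoft.playwright.CLI")
}
```

With a task like this in place, CLI arguments are passed through Gradle's --args option, for example --args="install chromium".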

Our first screenshot service

Here's the core class that captures screenshots:

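import com.microsoft.playwright.Browser
import com.microsoft.playwright.BrowserType
import com.microsoft.playwright.Page
import com.microsoft.playwright.Playwright
import com.microsoft.playwright.options.ScreenshotType
import com.microsoft.playwright.options.WaitUntilState
import org.springframework.stereotype.Service

// Illustrative user agents; swap in whatever the emulated devices currently send.
private const val IPHONE_UA =
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 " +
        "(KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1"
private const val IPAD_UA =
    "Mozilla/5.0 (iPad; CPU OS 15_0 like Mac OS X) AppleWebKit/605.1.15 " +
        "(KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1"

@Service
class ScreenshotService {

    private val playwright = Playwright.create()
    private val browser = playwright.chromium().launch(
        BrowserType.LaunchOptions().setHeadless(true)
    )

    fun capture(request: ScreenshotRequest): ByteArray {
        val contextOptions = Browser.NewContextOptions()

        // Apply device emulation if requested. The Java bindings have no
        // playwright.devices registry, so the profile values are set by hand
        // (approximating Playwright's "iPhone 13" and "iPad Pro 11" profiles).
        when (request.device) {
            "mobile" -> contextOptions
                .setViewportSize(390, 664)
                .setDeviceScaleFactor(3.0)
                .setIsMobile(true)
                .setHasTouch(true)
                .setUserAgent(IPHONE_UA)
            "tablet" -> contextOptions
                .setViewportSize(834, 1194)
                .setDeviceScaleFactor(2.0)
                .setIsMobile(true)
                .setHasTouch(true)
                .setUserAgent(IPAD_UA)
            else -> contextOptions.setViewportSize(1920, 1080)
        }

        // A fresh context per request keeps cookies and storage isolated
        val context = browser.newContext(contextOptions)

        try {
            val page = context.newPage()

            // Wait up to 30s for the page to load and the network to go quiet
            page.navigate(request.url, Page.NavigateOptions()
                .setTimeout(30000.0)
                .setWaitUntil(WaitUntilState.NETWORKIDLE)
            )

            return page.screenshot(Page.ScreenshotOptions()
                .setFullPage(request.fullPage)
                .setType(ScreenshotType.PNG)
            )
        } finally {
            // Closing the context also closes its pages
            context.close()
        }
    }
}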

This is the naive version. It works, but we'll need to improve it. More on that below.
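
The service reads request.url, request.device, and request.fullPage, but the request type itself isn't shown in this post. A minimal sketch of what it could look like; the defaults and the scheme check are assumptions for illustration, not the production model:

```kotlin
import java.net.URI

// Hypothetical request model; the field names mirror what capture() reads.
data class ScreenshotRequest(
    val url: String,
    val device: String? = null,   // "mobile", "tablet", or null for desktop
    val fullPage: Boolean = false,
) {
    init {
        // Only plain web URLs make sense for a screenshot service.
        val scheme = runCatching { URI(url).scheme }.getOrNull()?.lowercase()
        require(scheme == "http" || scheme == "https") {
            "url must start with http:// or https://"
        }
    }
}
```

Giving device and fullPage defaults means a request body that only contains a url is still valid, and rejecting non-web schemes up front keeps things like file:// URLs away from the browser.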

The first screenshot

After getting everything wired up, we hit our local endpoint:

curl -X POST http://localhost:8080/api/v1/screenshots \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Fourteen seconds later: a screenshot. It worked!

But 14 seconds is way too slow. Time to figure out why. (Didn't we say it sounds simple?)

What's making it slow?

We added some timing logs:

Browser context creation: 245ms
Page creation: 12ms
Navigation: 11,847ms
Screenshot capture: 1,203ms
Context cleanup: 89ms

It's pretty clear: navigation is the killer. WaitUntilState.NETWORKIDLE waits until there have been no network connections for at least 500ms. On a site with analytics, chat widgets, and lazy-loaded images, that quiet window can take a very long time to arrive.
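
Timing logs like the ones above don't need a profiler. A small helper along these lines would do; timed is a hypothetical name and this is a sketch, not the exact code we ran:

```kotlin
// Hypothetical helper: runs a block, logs how long it took, and returns its result.
fun <T> timed(label: String, log: (String) -> Unit = { println(it) }, block: () -> T): T {
    val start = System.nanoTime()
    try {
        return block()
    } finally {
        // Logged even when the block throws, so slow failures show up too
        val elapsedMs = (System.nanoTime() - start) / 1_000_000
        log("$label: ${elapsedMs}ms")
    }
}

// Usage inside capture(), for example:
// val page = timed("Page creation") { context.newPage() }
```

Wrapping each step separately is what makes it obvious which one dominates.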

Making it faster

1. Smarter wait strategies

Instead of waiting for network idle, we can wait for specific conditions:

// Wait for DOM to be ready, not all resources
page.navigate(request.url, Page.NavigateOptions()
    .setTimeout(30000.0)
    .setWaitUntil(WaitUntilState.DOMCONTENTLOADED)
)

// Then wait a bit for JavaScript rendering
page.waitForTimeout(1000.0)

A simple change like this brought us from 14 seconds to 3-4 seconds on the same URL. That's roughly 4x faster!
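
A fixed one-second pause is too long for simple pages and can be too short for heavy ones. One possible middle ground, sketched here as an assumption rather than something we've measured: navigate on DOMContentLoaded, then wait for network idle with a small time budget and carry on if the budget runs out. The function name and the 2-second default are hypothetical.

```kotlin
import com.microsoft.playwright.Page
import com.microsoft.playwright.TimeoutError
import com.microsoft.playwright.options.LoadState
import com.microsoft.playwright.options.WaitUntilState

// Hypothetical helper: bounded wait for the network to settle after navigation.
fun navigateWithBoundedWait(page: Page, url: String, idleBudgetMs: Double = 2_000.0) {
    page.navigate(
        url,
        Page.NavigateOptions()
            .setTimeout(30_000.0)
            .setWaitUntil(WaitUntilState.DOMCONTENTLOADED)
    )
    try {
        // Returns as soon as the network has been quiet for 500ms...
        page.waitForLoadState(
            LoadState.NETWORKIDLE,
            Page.WaitForLoadStateOptions().setTimeout(idleBudgetMs)
        )
    } catch (e: TimeoutError) {
        // ...but a chatty page costs at most idleBudgetMs; capture what we have.
    }
}
```

Quiet pages return early, and noisy pages are capped instead of holding the request for the full navigation timeout.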

2. Block unnecessary resources

To speed things up even more, let's see if we can block some resources. We don't need analytics, ads, or tracking pixels to render a proper screenshot, so eliminating those calls could save time:

page.route("**/*") { route ->
    val resourceType = route.request().resourceType()
    if (resourceType in listOf("image", "stylesheet", "font", "media")) {
        // Allow these - they affect appearance
        route.resume()
    } else if (route.request().url().contains("analytics") ||
               route.request().url().contains("tracking") ||
               route.request().url().contains("ads")) {
        route.abort()
    } else {
        route.resume()
    }
}

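Wait - this version never blocks images, stylesheets, fonts, or media, which is what we want for accurate screenshots, but matching on substrings like "ads" is far too broad: it would also abort any script or API request whose URL happens to contain "uploads" or "downloads", and that can break rendering. Let's refine with an explicit list of domains: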

page.route("**/*") { route ->
    val url = route.request().url()
    val blockedPatterns = listOf(
        "google-analytics.com",
        "googletagmanager.com",
        "facebook.net",
        "hotjar.com",
        "intercom.io",
        "segment.com"
    )

    if (blockedPatterns.any { url.contains(it) }) {
        route.abort()
    } else {
        route.resume()
    }
}

This shaved off another 500ms on average. We still need to extend the list of blocked patterns, make it configurable, support multi-language versions of it, and much more, but for a proof of concept this will do.
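
One caveat with url.contains(...) is that it matches anywhere in the URL, including the path and query string. A possible refinement, sketched under the assumption that we only ever want to block by domain, is to compare against the hostname instead; isBlocked and blockedHosts are hypothetical names:

```kotlin
import java.net.URI

// Illustrative list: the same domains as above, matched against the hostname only.
val blockedHosts = setOf(
    "google-analytics.com",
    "googletagmanager.com",
    "facebook.net",
    "hotjar.com",
    "intercom.io",
    "segment.com",
)

// True when the URL's host is a blocked domain or one of its subdomains.
fun isBlocked(url: String, blocked: Set<String> = blockedHosts): Boolean {
    // Unparseable URLs and URLs without a host (data:, blob:) are never blocked
    val host = runCatching { URI(url).host }.getOrNull()?.lowercase() ?: return false
    return blocked.any { host == it || host.endsWith(".$it") }
}

// Inside the route handler this would replace the substring check, for example:
// if (isBlocked(route.request().url())) route.abort() else route.resume()
```

Passing the set as a parameter also leaves room for loading it from configuration later.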

3. Browser reuse

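There are more areas where we can optimise things. For example, launching a new browser for each request would be wasteful.
We keep a single browser instance alive for the lifetime of the service and only create a new context per request. We also added a few launch flags that are commonly used when running Chromium in containers: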

@Service
class ScreenshotService {
    private val playwright = Playwright.create()
    private val browser = playwright.chromium().launch(
        BrowserType.LaunchOptions()  // LaunchOptions is nested in BrowserType
            .setHeadless(true)
            .setArgs(listOf(
                "--disable-gpu",
                "--disable-dev-shm-usage",
                "--no-sandbox"
            ))
    )

    // Browser stays warm, contexts are per-request
    fun capture(request: ScreenshotRequest): ByteArray {
        val context = browser.newContext(/* options */)
        try {
            // ... capture logic
        } finally {
            context.close()  // Clean up context, keep browser
        }
    }
}

Browser startup takes ~500ms; context creation takes ~50ms. That difference adds up quickly when handling multiple requests.
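
One thing to watch when sharing a browser: the Playwright Java documentation states that its objects are not thread-safe and should be used from the thread that created them, while a Spring service is called from many request threads. A minimal sketch of one way to deal with that, assuming it's acceptable to run captures one at a time; SingleThreadRunner is a hypothetical helper, not part of Playwright:

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical wrapper: every task runs on one dedicated thread, so objects
// created inside it (Playwright, Browser, contexts) are only touched from there.
class SingleThreadRunner(threadName: String = "playwright-worker") : AutoCloseable {
    private val executor = Executors.newSingleThreadExecutor { r -> Thread(r, threadName) }

    // Blocks the caller until the task has finished on the worker thread.
    // A failure inside the task surfaces as an ExecutionException.
    fun <T> run(task: () -> T): T = executor.submit(Callable { task() }).get()

    override fun close() {
        executor.shutdown()
    }
}

// Usage idea: create the browser and run every capture through the same runner.
// val runner = SingleThreadRunner()
// val browser = runner.run { Playwright.create().chromium().launch() }
```

This serializes captures, which is fine for a proof of concept. For real parallelism, the usual approach is one Playwright instance per worker thread.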

Results after optimization

With the same URL on the same machine, we got the following results:

Before: 14 seconds
After: 2.1 seconds

That's more than a 6x improvement. The end result isn't blazing fast yet, but it's acceptable, and we'll keep working on it.
Complex sites still take 3-5 seconds to load, but simple sites finish in under 2 seconds. That's no issue at all for async calls, and could be acceptable in some situations for sync calls.

Edge cases we discovered

Sites that block headless browsers. Some sites detect and block automated browsers. They see no mouse movements, no scroll events, and they see headless Chrome signatures. We'll need to handle these situations gracefully.

Infinite scroll pages. A "full page" screenshot of an infinite scroll page has no well-defined end, which means we'll need to cap the maximum page height.

Cookie consent popups. A lot of sites show a GDPR consent popup, and it ends up in our screenshots, so this is yet another situation we need to handle.

Sites that require login. Our API won't help here (yet).

Really slow sites. Some sites take 30+ seconds to load. Our timeout handles this, but we need good error messages.

Didn't we mention that we thought making a screenshot would be easy? We haven't even covered mobile devices, dark mode, animations, pagination, or any of the other edge cases we've discovered. Stay tuned for how we tackle these!

What we accomplished today

Got Playwright working in Kotlin/Spring Boot
Captured our first programmatic screenshot
Reduced capture time from 14s to ~2s
Identified major edge cases to handle
Learned a lot about how the web actually behaves

This is the core of our product. Everything else is wrapping paper around this engine.

Tomorrow: week 1 retrospective

Day 7 marks the end of week one. We'll step back and assess: what did we get done, what didn't work, and are we on track for a paying customer by day 15?

Book of the day

Web Scraping with Python by Ryan Mitchell

Okay, we're not using Python, and this book is about scraping, not screenshots. But sometimes it's good to look beyond your own stack, and the principles behind the two overlap significantly.

Mitchell covers how websites work under the hood, how to handle JavaScript-rendered content, dealing with anti-bot measures, and the ethics of automated web access. Chapter 10 on JavaScript execution is particularly relevant - it explains why "waiting for the page to load" is so complicated in the modern web.

Even if you're not doing web scraping, understanding how browsers render pages helps you build better web tools. This book is a practical, readable introduction to that world, and a must-read in our book (ha!) if you want to dive into the world of web automation.

Current stats:

Hours spent: 14 (previous 10 + 4 today)
Lines of code: ~450
Monthly hosting cost: $5.50
Revenue: $0
Paying customers: 0
First screenshot captured: ✓
Capture time: ~2-3 seconds
Edge cases identified: 5
