Hassy Veldstra for Artillery.io

Posted on Jan 31, 2022 • Originally published at artillery.io

Load testing with Playwright

#playwright #loadtesting #performance

Load testing non-trivial dynamic web applications is no walk in the park.

If you've ever had to do it, you know. Creating test scripts is slow, error-prone, cumbersome, brittle, and just downright annoying. You have to deal with forms, track down which on-page actions kick off which request to the backend, and if any part of the flow depends on in-page JavaScript... forget it.

Traditional load testing tools don't work well

The root cause of it all is simple: a mismatch in the level of abstraction. Load testing tools are designed to work with API endpoints, whereas a page is a better and more natural abstraction when testing web apps.

When testing an API, there's usually a spec available (e.g. an OpenAPI spec). Testing a web app? It's unlikely there's even a list of all API endpoints used on different pages. Different APIs may be called depending on in-page actions. Some calls may be made by in-page JS — and these have to be tracked down and emulated. Hours of fun in Chrome DevTools!

There hasn't been a good solution... up until now. We've been busy at Artillery HQ cooking something up.

Launching 10,000 browsers for fun and profit

What if you could run load tests with real browsers, at scale, and just reuse existing end-to-end testing scripts?

A few things to highlight in the demo above:

We're re-using Playwright scripts, which we can also generate with playwright codegen, i.e. we can click around in a browser to create a test script
We get both backend and frontend performance metrics in the report, so we can see how load affects LCP or FCP over time
We're running thousands of browsers with zero infrastructure setup, and they run from your own AWS account, not a hosted cloud service. Did "ridiculous cost efficiency" cross your mind? It's efficient enough that you could run these tests in CI/CD multiple times a day. Perhaps all sorts of nice security and data privacy-related thoughts crossed your mind too, since this would run in your own VPC. Why, yes, that's why we like cloud-native software that we can run ourselves too.

Are we also possibly talking 10x developer productivity for load testing non-trivial applications? Yes, yes, quite possibly.

How does that work?

Playwright provides the browsers. Artillery provides the ability to launch a ridiculous number of them from your own AWS account, with no infra to manage.

Behind the scenes, we expose the Playwright API to Artillery via Artillery's extension interface, which is the same API that powers all of Artillery's many engines. This lets Artillery treat Playwright scripts as a description of what a virtual user scenario is. From that point onwards, as far as Artillery is concerned there is no difference between opening a TCP connection and exchanging Socket.IO messages (if we're testing Socket.IO) or launching a browser and running page-based actions (with the Playwright engine).
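To make the idea concrete, here is a minimal conceptual sketch of a runner that treats every scenario as an opaque callable. It is not Artillery's actual extension interface (which is JavaScript); the names `socket_scenario`, `browser_scenario` and `run_virtual_users` are hypothetical and exist only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List

# A virtual-user scenario is an async callable that records what it did.
Scenario = Callable[[List[str]], Awaitable[None]]


async def socket_scenario(log: List[str]) -> None:
    # Stand-in for "open a connection and exchange messages".
    log.append("socket: connect, emit, disconnect")


async def browser_scenario(log: List[str]) -> None:
    # Stand-in for "launch a browser and run page-based actions".
    log.append("browser: launch, goto, click, close")


async def run_virtual_users(scenario: Scenario, count: int) -> List[str]:
    """Run `count` virtual users concurrently without inspecting the scenario."""
    log: List[str] = []
    await asyncio.gather(*(scenario(log) for _ in range(count)))
    return log


# The same runner drives either kind of scenario.
print(asyncio.run(run_virtual_users(socket_scenario, 2)))
print(asyncio.run(run_virtual_users(browser_scenario, 2)))
```

The point of the sketch is that the runner only schedules callables; whether a callable talks over a socket or drives a browser is invisible to it.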

Artillery Pro can then pick up from there to launch swarms of Artillery workers running the Playwright engine on AWS ECS or Fargate for testing at scale.

OK, so we can, but should we?

Just because we can do something doesn't mean we should, does it? We're running real browsers here, which is obviously going to be resource-intensive. Is it going to scale? Is it worth it?

Does it scale?

Yes, it certainly does. It's 2022, cloud compute is cheap & plentiful, and "just throw more containers at it" can be a perfectly legitimate tactic.

Again, we are launching whole browsers here, which is going to be resource-hungry. Let's look at Fargate with the beefiest configuration per-container we can get: 4 vCPUs and 12GB of memory. Running a container with that spec in us-east-2 for a whole hour is going to cost us a whopping 22 cents. If we run 1,000 of those containers for an hour, we're looking at $220.
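The arithmetic behind those figures can be sketched as follows. The per-vCPU and per-GB hourly rates are assumptions based on AWS's published on-demand Fargate pricing (Linux/x86) and are not stated in this post, so check the current pricing page before relying on them.

```python
# Assumed on-demand Fargate rates in USD (Linux/x86); not taken from the article.
VCPU_PER_HOUR = 0.04048
GB_PER_HOUR = 0.004445


def container_hourly_cost(vcpus: float, memory_gb: float) -> float:
    """Hourly cost of one Fargate task with the given CPU and memory."""
    return vcpus * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR


def fleet_cost(containers: int, hours: float, hourly_cost: float) -> float:
    """Total cost of running `containers` identical tasks for `hours`."""
    return containers * hours * hourly_cost


per_container = container_hourly_cost(vcpus=4, memory_gb=12)
print(f"one container-hour: ${per_container:.2f}")

# Use the rounded 22-cent figure quoted in the text for the fleet estimate.
total = fleet_cost(containers=1000, hours=1, hourly_cost=round(per_container, 2))
print(f"1,000 containers for an hour: ${total:.0f}")
```

Under these assumed rates, one container-hour rounds to 22 cents, and 1,000 containers for an hour comes to about $220, which matches the figures above.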

How much load can those 1,000 containers generate? In our unscientific tests running a simple but non-trivial Playwright flow, we could launch one Chromium instance every other second in each container, making around 20 HTTP requests per second each.

Your particular results will of course vary. Memory is going to be the main bottleneck, which will put a ceiling on how many concurrent browser instances we can run in a single container. The number of those will depend on the web app being tested and your Playwright scenario. The slower the app and the longer the test script, the more concurrent Chromium instances you'll be running in each container.
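One rough way to reason about this is Little's law: average concurrency equals arrival rate times time spent in the system. The sketch below uses the launch rate from our test; the scenario duration and per-browser memory figures are hypothetical placeholders, so substitute your own measurements.

```python
import math


def concurrent_browsers(launches_per_second: float, scenario_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return launches_per_second * scenario_seconds


def max_browsers_by_memory(container_memory_gb: float, memory_per_browser_gb: float) -> int:
    """Upper bound on how many browsers fit in a container's memory."""
    return math.floor(container_memory_gb / memory_per_browser_gb)


launch_rate = 0.5         # from the text: one Chromium instance every other second
scenario_seconds = 30.0   # hypothetical: how long one virtual user's flow takes
memory_per_browser = 0.5  # hypothetical: GB used by one Chromium instance

needed = concurrent_browsers(launch_rate, scenario_seconds)
ceiling = max_browsers_by_memory(12, memory_per_browser)
print(f"steady-state browsers per container: {needed:.0f} (memory ceiling: {ceiling})")
```

If the scenario takes twice as long, the steady-state browser count doubles at the same launch rate, which is why slow apps and long scripts hit the memory ceiling sooner.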

The trade-off is developer productivity vs AWS costs. If you're testing a complex web app, especially on a tight schedule, you can be up and running much faster with a Playwright-based approach. Developer time is usually expensive, and cloud compute isn't - your call.

And hey, you don’t have to go whole hog and load test exclusively with browsers, as fun as that may be. We can now run hybrid tests and smoke tests with browsers too.

Try it yourself!

The engine is open source with the code available on GitHub at artilleryio/artillery-engine-playwright.

You can try an end-to-end example locally by following the README in the e2e example repo we put together - artillery-examples/browser-load-testing-playwright. The official repo includes a Dockerfile for running these tests in CI/CD as well.

We’d love to hear how you might use this. Hit us up on Twitter or on GitHub discussions. ✌️
