If you've been using Playwright for a while on a large test suite, you've probably used the --shard option to parallelize your tests across multiple machines or CI runners. At first, it seems like the perfect solution. But as your test suite grows, you start to notice a frustrating problem: some of your test runners finish in a reasonable amount of time, while others can take significantly longer. Ultimately, you're stuck waiting for the slowest one to complete.
The main reason for this is how Playwright's sharding works: it's static. It splits the test files into even chunks before the tests start and assigns each chunk to a specific machine or CI runner. For example, if you're sharding across 4 runners, it divides your tests into 4 predetermined groups.
# Runner 1 runs the first quarter of tests
npx playwright test --shard=1/4
# Runner 2 runs the second quarter
npx playwright test --shard=2/4
# ...and so on
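To picture what "static" means here, the following is a minimal sketch of a contiguous split. The `shardSlice` helper is hypothetical and only approximates the idea; it is not Playwright's actual implementation, and the test names are placeholders.

```typescript
// Hypothetical helper, not Playwright's code: returns the contiguous chunk of
// `items` that 1-based shard `index` out of `total` would receive.
function shardSlice<T>(items: T[], index: number, total: number): T[] {
  if (total < 1 || index < 1 || index > total) {
    throw new RangeError(`invalid shard ${index}/${total}`);
  }
  // Chunk boundaries depend only on position in the list, never on duration.
  const start = Math.floor(((index - 1) * items.length) / total);
  const end = Math.floor((index * items.length) / total);
  return items.slice(start, end);
}

// Ten placeholder test names, split four ways.
const exampleTests: string[] = [];
for (let i = 1; i <= 10; i++) exampleTests.push(`test-${i}`);

const exampleShards: string[][] = [1, 2, 3, 4].map((n) => shardSlice(exampleTests, n, 4));
exampleShards.forEach((chunk, i) => console.log(`shard ${i + 1}/4: ${chunk.join(", ")}`));
```

The key point of the sketch is that the boundaries are fixed before anything runs, so a chunk that happens to contain the slow tests keeps them no matter how early the other runners finish.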
This approach works perfectly if all your tests take exactly the same amount of time. But in the real world, that never happens. Some tests are short and simple, while others involve complex user flows and take much longer. This static division leads to a significant imbalance:
- Uneven Load Distribution: One runner might get all the "easy" tests and finish early, sitting idle.
- Wasted Resources: While one runner is idle, another is struggling with a long queue of "hard" tests.
- Longer Execution Times: Your total test run time is dictated by your slowest shard, not the average.
A Real-World Example
To make this problem more concrete, let's look at a real GitLab CI pipeline where a suite of 53 tests was distributed across 4 runners.
The job durations were far from balanced:
- Runner 1 (shard 1/4): 46 seconds
- Runner 2 (shard 2/4): 1 minute 45 seconds
- Runner 3 (shard 3/4): 41 seconds
- Runner 4 (shard 4/4): 1 minute 1 second
The total execution time for our test suite is determined by the slowest runner, which is 1 minute and 45 seconds. Meanwhile, Runner 3 finished in just 41 seconds! This means one of our CI runners sat completely idle for over a minute, waiting for the others to catch up. This is an example of static sharding leading to wasted CI/CD resources and longer feedback cycles.
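The cost of that imbalance can be worked out directly from the four job durations above. This is a small sketch of the arithmetic; the variable names are illustrative, and the "even split" figure is only a rough lower bound, since it assumes the work could be divided perfectly and ignores per-runner setup time.

```typescript
// Job durations in seconds, taken from the four shards listed above.
const shardSeconds: number[] = [46, 105, 41, 61];

// The pipeline is done only when the slowest shard is done.
const pipelineSeconds = Math.max(...shardSeconds); // 105

// Total runner time actually spent working.
const totalWorkSeconds = shardSeconds.reduce((a, b) => a + b, 0); // 253

// Time each runner spent waiting for the slowest one.
const idlePerRunner = shardSeconds.map((s) => pipelineSeconds - s); // [59, 0, 64, 44]
const totalIdleSeconds = idlePerRunner.reduce((a, b) => a + b, 0); // 167

// Rough lower bound if the same work were spread perfectly evenly.
const evenSplitSeconds = totalWorkSeconds / shardSeconds.length; // 63.25

console.log({ pipelineSeconds, totalWorkSeconds, idlePerRunner, totalIdleSeconds, evenSplitSeconds });
```

In other words, the four runners spent 167 seconds idle in total, and a perfectly even split of the same work would have taken roughly 63 seconds instead of 105.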
The Solution: Dynamic Test Distribution with Pawdist
I'd like to introduce Pawdist, a high-performance, Rust-based dynamic test distributor I developed to solve this problem. Instead of pre-assigning tests, Pawdist uses a proven Manager-Worker architecture:
- Manager: It scans all your tests and creates a single work queue.
- Workers: They connect to the manager and ask for a test. As soon as a worker finishes a test, it asks for the next one from the queue.
This way, no CI runner ever sits idle. A runner that quickly finishes a short test can immediately grab the next available one, helping to clear the queue faster. This ensures your resources are utilized much more efficiently until the very last test is complete.
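The pull-based idea can be sketched in a few lines. This is a toy simulation with a simulated clock, not Pawdist's code: the `QueueManager` class, the `runWorkers` function, and the test titles and durations are all hypothetical names and values chosen for illustration.

```typescript
interface QueuedTest {
  title: string;
  seconds: number;
}

// Hypothetical manager: owns the single queue and hands out one test per request.
class QueueManager {
  private cursor = 0;
  private readonly queue: QueuedTest[];

  constructor(queue: QueuedTest[]) {
    this.queue = queue;
  }

  nextTest(): QueuedTest | undefined {
    return this.cursor < this.queue.length ? this.queue[this.cursor++] : undefined;
  }
}

// Simulated clock: whichever worker becomes free first asks for the next test.
function runWorkers(manager: QueueManager, workerNames: string[]): { finishedAt: number; trace: string[] } {
  if (workerNames.length === 0) throw new RangeError("at least one worker is required");
  const freeAt: number[] = workerNames.map(() => 0);
  const trace: string[] = [];
  for (let job = manager.nextTest(); job !== undefined; job = manager.nextTest()) {
    let w = 0;
    for (let i = 1; i < freeAt.length; i++) if (freeAt[i] < freeAt[w]) w = i;
    trace.push(`t=${freeAt[w]}s ${workerNames[w]} takes ${job.title}`);
    freeAt[w] += job.seconds;
  }
  return { finishedAt: Math.max(...freeAt), trace };
}

// Made-up durations: two long tests followed by four short ones.
const demoQueue: QueuedTest[] = [
  { title: "long-1", seconds: 25 },
  { title: "long-2", seconds: 20 },
  { title: "short-1", seconds: 1 },
  { title: "short-2", seconds: 1 },
  { title: "short-3", seconds: 2 },
  { title: "short-4", seconds: 1 },
];
const demoRun = runWorkers(new QueueManager(demoQueue), ["worker-1", "worker-2"]);
demoRun.trace.forEach((line) => console.log(line));
console.log(`finished at t=${demoRun.finishedAt}s`);
```

With these made-up durations both workers finish at 25 seconds. A contiguous split of the same six tests into two halves would leave one runner with 46 seconds of work and the other with 4.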
Showdown: Pawdist vs. Playwright Sharding
To demonstrate the real-world impact, I created a sample Playwright project with 100 tests. I deliberately designed it to expose the weakness of static sharding:
- Tests 1-50: Intentionally long, taking 15-25 seconds each.
- Tests 51-100: Intentionally short, taking 1-15 seconds each.
// A snippet from the test file to show the imbalance
import { test, expect } from '@playwright/test';
// ...
test('Distribution Test 4', async ({ page }) => {
  await page.waitForTimeout(25000); // 25 seconds (long test)
  expect(true).toBe(true);
});
// ...
test('Distribution Test 77', async ({ page }) => {
  await page.waitForTimeout(1000); // 1 second (short test)
  expect(true).toBe(true);
});
I ran this test suite using three different methods, keeping the total parallel count at 4 for each of them.
Run 1: The Baseline (No Sharding)
First, I ran all 100 tests on a single machine using 4 parallel workers. This gives us our best-case-scenario time.
The entire test suite finished in 6 minutes and 3 seconds.
Run 2: The Imbalance of Static Sharding
Next, I split the tests across two runners (simulating two CI machines), each running 2 parallel workers.
Because Playwright splits tests in order, the result was a severe imbalance:
- Shard 1/2 (Tests 1-50): Received all the long tests and took 8 minutes and 36 seconds.
- Shard 2/2 (Tests 51-100): Received all the short tests and finished in just 3 minutes and 32 seconds.
The total execution time ballooned to 8 minutes and 36 seconds, dictated by the slowest shard.
Run 3: The Pawdist Solution (Dynamic Distribution)
Finally, I used Pawdist with two workers, each set to 2 parallel runners.
The logs clearly show the magic of dynamic distribution. Instead of being locked into a predefined group of tests, workers pull from a single, shared queue. When a worker finishes a test (whether short or long), it immediately requests the next test from the central queue. This ensures that even if one worker is tied up with a long test, another can complete several shorter tests in the meantime, effectively balancing the load in real time.
The entire test suite finished in 6 minutes and 5 seconds, nearly matching the ideal baseline and eliminating the massive imbalance caused by static sharding.
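A rough back-of-the-envelope model reproduces the shape of these results. It is a sketch under loud assumptions, not a replay of the benchmark: every test is assumed to take the midpoint of its stated range (20 seconds for tests 1-50, 8 seconds for tests 51-100), two machines with 2 parallel workers each are modeled as 4 interchangeable slots, and startup and scheduling overhead are ignored. The `makespan` function is a hypothetical helper.

```typescript
// Greedy simulation: each test, in order, goes to whichever slot is free first.
function makespan(durations: number[], slots: number): number {
  if (slots < 1) throw new RangeError("slots must be >= 1");
  const freeAt: number[] = [];
  for (let s = 0; s < slots; s++) freeAt.push(0);
  for (const d of durations) {
    let earliest = 0;
    for (let s = 1; s < slots; s++) if (freeAt[s] < freeAt[earliest]) earliest = s;
    freeAt[earliest] += d;
  }
  return Math.max(...freeAt);
}

// ASSUMPTION: midpoint durations, 20 s for tests 1-50 and 8 s for tests 51-100.
const modelDurations: number[] = [];
for (let i = 0; i < 50; i++) modelDurations.push(20);
for (let i = 0; i < 50; i++) modelDurations.push(8);

// Static sharding: two fixed halves, each with its own 2 slots.
const staticShard1 = makespan(modelDurations.slice(0, 50), 2); // 500
const staticShard2 = makespan(modelDurations.slice(50), 2); // 200
const staticTotal = Math.max(staticShard1, staticShard2); // 500

// Shared queue: all 4 slots pull from the same ordered list.
const sharedQueueTotal = makespan(modelDurations, 4); // 352

console.log({ staticShard1, staticShard2, staticTotal, sharedQueueTotal });
```

The model gives 500 seconds for the static split and 352 seconds for the shared queue, against measured times of 8 minutes 36 seconds and 6 minutes 5 seconds. The absolute numbers differ because real durations vary and overhead is not modeled, but the gap between the two strategies is of the same order.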
Method | Total Execution Time |
---|---|
Baseline (No Sharding) | 6m 3s |
Playwright Sharding | 8m 36s |
Pawdist | 6m 5s |
The benchmark speaks for itself. Pawdist provides the scalability of distributed testing without the performance penalties of static sharding.
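To put the table in relative terms, here is the arithmetic as a short sketch. The helper names are illustrative, and since each figure comes from a single run, the percentages should be read as indicative rather than precise.

```typescript
function toSeconds(minutes: number, seconds: number): number {
  return minutes * 60 + seconds;
}

// Rounds a ratio to a percentage with one decimal place.
function toPercent(ratio: number): number {
  return Math.round(ratio * 1000) / 10;
}

// Times from the table above.
const baselineSec = toSeconds(6, 3); // 363
const shardedSec = toSeconds(8, 36); // 516
const pawdistSec = toSeconds(6, 5); // 365

const shardedVsBaseline = toPercent(shardedSec / baselineSec - 1); // 42.1
const pawdistVsBaseline = toPercent(pawdistSec / baselineSec - 1); // 0.6
const pawdistVsSharded = toPercent(1 - pawdistSec / shardedSec); // 29.3

console.log({ shardedVsBaseline, pawdistVsBaseline, pawdistVsSharded });
```

In this run, static sharding was about 42% slower than the single-machine baseline, while Pawdist was within about 0.6% of it and roughly 29% faster than static sharding.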
Ready to Speed Up Your Tests?
If you're looking to optimize your Playwright test execution times and make the most of your CI resources, give Pawdist a try!
By switching to a dynamic distribution model, you can achieve:
- Faster Overall Execution: Your suite finishes when the last test completes, not when the slowest shard does.
- Optimal Resource Utilization: No more idle CI runners waiting for others to catch up.
- True Dynamic Load Balancing: Tests are assigned on-demand for maximum efficiency.
For detailed installation and usage instructions, you can check out the project's comprehensive README on GitHub.