Antoine Dubois

Posted on Jun 8

Testing More Without Slowing Releases: A Practical Memo for Engineering Teams

#testing #devops #webdev

Internal note: we need better coverage, but not at the cost of release speed

The goal is not "more tests" in the abstract. The goal is fewer surprises after merge, fewer mystery failures in CI, and less time spent deciding whether a red build is a real problem or just noise. If testing slows the team down, people start working around it. Once that happens, coverage drops in the places that matter most.

So the question is not whether we should automate more. It is which tests deserve to exist, where they should run, and who owns the signal when they fail.

The core decision: protect release speed, not every possible edge case

A healthy test strategy usually has three jobs:

catch high-risk regressions before code merges,
keep feedback fast enough that developers trust it,
preserve enough traceability that failures are actionable.

If a test does not help with one of those jobs, it is a candidate for removal, deferral, or relocation to a slower layer. That sounds blunt, but it is usually the only way to grow coverage without turning the pipeline into a parking lot.

This is where teams often get stuck. They add more end-to-end checks because they feel safer, then CI gets slower and flakier, and nobody wants to touch the suite. A better move is to be explicit about which tests belong in the merge gate and which ones belong in scheduled or pre-release validation.

Put the hardest failures where they are cheapest to understand

A useful rule is to move fast feedback as close to the change as possible, then reserve heavier validation for the risks that need it. Unit tests and small integration tests should explain failures quickly. If a developer changes a form validation rule, they should not have to wait for a full browser run to learn that a boundary condition broke.

That is why techniques like boundary value analysis and equivalence partitioning still matter. They are simple, but they help you choose fewer test cases that cover meaningful behavior instead of spraying inputs at a feature and hoping the right one fails. If you want a clean refresher on when each method fits, the Boundary Value Analysis vs Equivalence Partitioning article is a good practical reference.

The useful part for teams is not the terminology; it is the habit. Decide where the boundaries are, identify equivalence classes, and test the cases most likely to reveal a defect. That keeps suites smaller and more focused.
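To make the habit concrete, here is a minimal sketch of both techniques applied to one rule. The `validate_age` function and its 18 to 65 range are hypothetical, invented only for illustration; the point is that nine deliberately chosen inputs cover the rule.

```python
def validate_age(age: int) -> bool:
    """Hypothetical validator under test: accepts ages 18 through 65 inclusive."""
    return 18 <= age <= 65


def boundary_cases(low: int, high: int) -> list[tuple[int, bool]]:
    """(input, expected) pairs just outside, on, and just inside each boundary."""
    return [
        (low - 1, False), (low, True), (low + 1, True),
        (high - 1, True), (high, True), (high + 1, False),
    ]


def partition_cases(low: int, high: int) -> list[tuple[int, bool]]:
    """One representative per equivalence class: below, inside, and above the range."""
    return [(low - 10, False), ((low + high) // 2, True), (high + 10, False)]


# The expected values come from the stated rule, not from the code under test.
cases = boundary_cases(18, 65) + partition_cases(18, 65)
failures = [(value, expected) for value, expected in cases
            if validate_age(value) != expected]

print(f"{len(cases)} cases, {len(failures)} failures")
```

If someone later changes `<=` to `<` in the validator, the boundary cases for 18 or 65 fail immediately, without a browser run.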

Keep the CI gate narrow, trustworthy, and boring

The merge gate should be boring. If it is exciting, it is probably broken.

A reliable CI gate does not need to run every test you own. It needs to run the tests that are fast, deterministic, and directly tied to merge risk. For frontend work, that usually means a small set of component tests, API contract checks, focused integration tests, and a thin layer of browser coverage around the critical user journeys.

The detailed thinking here is worth reading in How to Build a Reliable CI Test Gate for Frontend Releases. The main idea is simple: pick what belongs in CI, keep the gate fast, and make flaky failures someone’s responsibility instead of everyone’s annoyance.

A few practical rules help:

Run only what is needed to protect merge quality

If a test is useful but not merge-critical, it may be better as a post-merge check, nightly suite, or release candidate validation.
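One way to apply this rule is to record a few facts per test and let a small function decide where each one runs. This is a sketch under assumptions: the `SuiteEntry` fields, the test names, and the 30-second budget are all hypothetical, and real metadata would come from your own test runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteEntry:
    """Hypothetical metadata a team might keep for each automated test."""
    name: str
    avg_seconds: float
    deterministic: bool
    merge_critical: bool


def belongs_in_merge_gate(entry: SuiteEntry, max_seconds: float = 30.0) -> bool:
    """A test gates merges only if it is fast, deterministic, and merge-critical."""
    return (entry.deterministic
            and entry.merge_critical
            and entry.avg_seconds <= max_seconds)


def split_suite(entries: list[SuiteEntry], max_seconds: float = 30.0):
    """Return (merge gate names, names to run post-merge or nightly)."""
    gate = [e.name for e in entries if belongs_in_merge_gate(e, max_seconds)]
    later = [e.name for e in entries if not belongs_in_merge_gate(e, max_seconds)]
    return gate, later


suite = [
    SuiteEntry("login_form_validation", 2.0, True, True),
    SuiteEntry("checkout_contract_check", 8.0, True, True),
    SuiteEntry("full_catalog_visual_diff", 420.0, True, False),
    SuiteEntry("search_autocomplete_e2e", 25.0, False, True),
]

gate, later = split_suite(suite)
print("merge gate:", gate)
print("post-merge or nightly:", later)
```

In this example `search_autocomplete_e2e` is merge-critical but not deterministic, so it drops out of the gate. That is a prompt to fix it, not a permanent home.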

Make failures easy to classify

A red build should quickly answer one question: is this product logic, test setup, data, environment, or infrastructure?
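A rough first pass at this triage can be automated with pattern matching on the failure message. The patterns below are hypothetical placeholders, not taken from any real tool; a team would derive its own from the failures it actually sees. The order is also an assumption: infrastructure is checked first because it tends to mask everything else.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
RULES = [
    ("infrastructure", re.compile(r"runner lost|out of memory|no space left on device", re.I)),
    ("environment", re.compile(r"connection refused|name resolution failed|service unavailable", re.I)),
    ("data", re.compile(r"fixture .* not found|duplicate key", re.I)),
    ("test setup", re.compile(r"setup failed|stale element|selector .* not found", re.I)),
    ("product logic", re.compile(r"assertionerror|expected .* but (got|received)", re.I)),
]


def classify_failure(message: str) -> str:
    """Return the first matching category, or 'unclassified' for a human to triage."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "unclassified"


samples = [
    "AssertionError: expected total 42 but got 40",
    "connect ECONNREFUSED: connection refused by staging-api",
    "Fixture user_admin not found in test database",
    "Runner lost: out of memory while launching browser",
    "Something odd happened",
]
for message in samples:
    print(f"{classify_failure(message):>15} | {message}")
```

The "unclassified" bucket matters as much as the rest. If it keeps growing, the failure messages are not informative enough yet.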

Remove tests that repeat the same signal

If three tests fail for the same root cause, you probably have overlapping coverage, not triple the confidence.
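A cheap way to look for repeated signal is to check which tests fail in exactly the same runs. The sketch below assumes a hypothetical run history keyed by run id; the test names and the `min_failures` threshold are illustrative. Co-failure is only a hint of overlap, not proof, so treat the output as a review list.

```python
from collections import defaultdict


def overlapping_tests(history: dict[str, dict[str, bool]],
                      min_failures: int = 2) -> list[list[str]]:
    """Group tests that failed in exactly the same set of runs.

    history maps run id -> {test name: passed}. Tests with fewer than
    min_failures failures are ignored, to avoid grouping on a single coincidence.
    """
    failed_in: dict[str, set[str]] = defaultdict(set)
    for run_id, results in history.items():
        for test, passed in results.items():
            if not passed:
                failed_in[test].add(run_id)

    groups: dict[frozenset, list[str]] = defaultdict(list)
    for test, runs in failed_in.items():
        if len(runs) >= min_failures:
            groups[frozenset(runs)].append(test)

    return sorted(sorted(tests) for tests in groups.values() if len(tests) > 1)


history = {
    "run-101": {"cart_total_unit": False, "cart_total_api": False,
                "cart_total_e2e": False, "login_e2e": True},
    "run-102": {"cart_total_unit": True, "cart_total_api": True,
                "cart_total_e2e": True, "login_e2e": True},
    "run-103": {"cart_total_unit": False, "cart_total_api": False,
                "cart_total_e2e": False, "login_e2e": False},
    "run-104": {"cart_total_unit": True, "cart_total_api": True,
                "cart_total_e2e": True, "login_e2e": False},
}

print(overlapping_tests(history))
```

Here the three cart tests always fail together, so the question becomes which layer should keep the assertion and which copies can go.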

Flaky tests are not just annoying; they distort decision-making

Flaky tests create a bad habit: teams stop treating failures as useful information. That is a problem even without automated diagnosis. When teams add AI into the debugging loop, the risk can get worse if the underlying test signal is noisy.

The reason is not that AI is magical or bad; it is that an uncertain input can produce an overconfident explanation. If the system sees inconsistent failures, it may propose patterns that sound plausible but are not grounded in repeatable evidence. The result is more guesswork, not less.

The article Why Flaky Tests Get Worse When You Add AI to the Debugging Loop makes this point well, especially around observability, traceability, and ownership. That is the real lesson for teams: before you automate the diagnosis, make the failure traceable.

What this means in practice:

capture logs, screenshots, and request traces for test failures,
tag failures by environment and test ownership,
quarantine flaky tests instead of leaving them in the main gate,
fix nondeterminism at the source, not by retrying forever.

Retries can be useful, but they are a bandage. If a test needs three reruns to pass, it is not a reliable signal yet.
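One simple working definition of flaky is a test that both passed and failed on the same commit. The sketch below uses that definition to build a quarantine list. The `Result` record and the sample data are hypothetical, and other definitions (for example, failure rate over a time window) may suit your team better.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One recorded test outcome; the field names are illustrative."""
    test: str
    commit: str
    passed: bool


def find_flaky(results: list[Result]) -> set[str]:
    """A test is flagged if it both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for r in results:
        outcomes[(r.test, r.commit)].add(r.passed)
    return {test for (test, _commit), seen in outcomes.items()
            if seen == {True, False}}


def plan_gate(all_tests: list[str], flaky: set[str]) -> tuple[list[str], list[str]]:
    """Split tests into (stay in the gate, quarantine until fixed)."""
    gate = [t for t in all_tests if t not in flaky]
    quarantine = [t for t in all_tests if t in flaky]
    return gate, quarantine


results = [
    Result("checkout_e2e", "abc123", False),
    Result("checkout_e2e", "abc123", True),   # same commit, different outcome
    Result("login_unit", "abc123", True),
    Result("login_unit", "def456", False),
    Result("login_unit", "def456", False),    # consistent failure, not flaky
]

flaky = find_flaky(results)
gate, quarantine = plan_gate(["checkout_e2e", "login_unit"], flaky)
print("gate:", gate)
print("quarantine:", quarantine)
```

Note that `login_unit` fails consistently on one commit, so it stays in the gate. That is a real signal, which is exactly what the gate is for.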

Browser coverage should match the product’s risk, not your appetite for infrastructure

A lot of teams overinvest in browser automation infrastructure because they think the problem is scale, when the real problem is test design and ownership. A big Selenium Grid can still give you a weak signal if your tests are too broad, too slow, or too hard to maintain.

If you are feeling that pressure, the Managed Real Browser Testing Platform Buyer Guide for Teams Outgrowing Selenium Grid is useful because it frames the decision around criteria, not just tooling. The real question is whether the platform reduces maintenance overhead, gives you dependable cross-browser execution, and fits the way your team already ships.

My practical take is this: use real browser testing for flows that genuinely need browser behavior, like rendering, navigation, auth, and critical interactions. Do not expand browser coverage just because it is easy to explain in a status meeting. Expand it when the risk justifies the cost.

QA and engineering need one shared release checklist

One reason testing becomes slow is that release readiness lives in people’s heads. Engineering knows part of the story, QA knows another part, and release managers are left reconciling them late in the cycle.

A shared checklist helps, but only if it is lightweight enough that people actually use it. The Frontend Release Readiness Checklist is a good example of the kind of thing that works when it stays concrete: UI regressions, browser checks, accessibility smoke tests, and merge gates before release.

The important pattern is not the exact checklist items. It is the shared agreement on what "ready" means. Once that is clear, teams waste less time debating whether a change is safe and more time fixing the thing that made it unsafe.

Traceability matters more when teams are hybrid

Some teams still have a clean separation between QA and engineering, but many do not. There may be manual exploratory testing, automated checks, product owners doing acceptance review, and engineers owning infrastructure. That mix can work well, but only if test cases, runs, and requirements stay connected.

Without traceability, coverage becomes a spreadsheet exercise. You can say you have tests, but you cannot explain what they protect or what changed when they failed. The article How to Evaluate a Test Case Management Tool for Hybrid QA Teams Without Losing Traceability is a useful guide if your team is trying to keep that linkage intact without drowning in admin work.

The practical point here is simple: tool choice should reduce coordination cost. If a tool adds process but does not improve visibility, it is probably not helping.

A workable team policy, in plain language

If I had to turn this into a team policy, it would look like this:

Keep the merge gate small

Only include tests that are fast, deterministic, and directly tied to merge risk.

Push broader validation later

Use scheduled runs, staging checks, or release candidate validation for heavier coverage.

Treat flakiness as a product problem

Do not normalize retries. Investigate, quarantine, and fix.

Design tests from risk, not from habit

Use boundary-focused test design, user journey mapping, and known failure modes to decide what to automate.

Preserve ownership and traceability

Every important automated test should have an owner, a purpose, and a clear failure path.
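This last rule is easy to state and easy to let slide, so a small audit can help. The sketch below assumes a hypothetical test registry; the `RegistryEntry` fields and the sample entries are invented for illustration, and the real source could be a YAML file, test annotations, or a test management tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record for one important automated test."""
    name: str
    owner: str = ""
    purpose: str = ""
    failure_path: str = ""  # who or what to go to when this test fails


def missing_fields(entry: RegistryEntry) -> list[str]:
    """Names of fields that are empty or whitespace only."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


def audit(registry: list[RegistryEntry]) -> dict[str, list[str]]:
    """Map each incomplete test to the fields it is missing."""
    return {e.name: gaps for e in registry if (gaps := missing_fields(e))}


registry = [
    RegistryEntry("checkout_happy_path", owner="payments-team",
                  purpose="Guards order submission",
                  failure_path="Page the payments on-call"),
    RegistryEntry("legacy_banner_snapshot", purpose="Unknown"),
]

report = audit(registry)
print(report)
```

A test that shows up in this report is a candidate for either an owner or removal. A purpose of "Unknown" passes the check here, which shows the limit of automation: someone still has to read the answers.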

Next steps worth taking this week

If your team wants better coverage without slowing development, do not start by adding more tests. Start by classifying the tests you already have.

Separate merge-critical tests from release-only checks.
Identify the top flaky tests and quarantine them.
Cut duplicate coverage where multiple tests defend the same behavior.
Add or improve logging, screenshots, and trace data for failing browser tests.
Review one critical frontend flow and decide which layer should own each assertion.

That is usually enough to expose the real bottleneck. Most teams do not need a giant automation rewrite. They need cleaner signal, tighter scope, and a clearer agreement on what testing is supposed to protect.

Once that is in place, coverage gets easier to grow, because the team can trust the feedback instead of fighting it.

DEV Community