kol kol

Posted on Jun 9

Our Test Suite Passed 100% — Then Users Found 14 Bugs in One Day

#codcompass #ai #knowledgebase #webdev

We had 847 tests. Green checkmarks across the board. 100% coverage on our critical paths. I was proud of that dashboard.

Then a user reported that our checkout was double-charging on Safari. Another said the password reset emails weren't arriving. Within 24 hours we had 14 confirmed bugs — and our CI pipeline was still proudly green.

That's when I realized: 100% code coverage is a vanity metric that makes you feel safe while your users burn.

The Illusion of Coverage

Here's what our test suite was great at:

Testing individual functions in isolation
Verifying happy paths with clean inputs
Catching regressions in pure utility functions

Here's what it completely missed:

Browser-specific behavior — Safari's date parsing is different from Chrome's. Our test runner used Node.js. No browser, no Safari.
Race conditions — Two API calls firing simultaneously? Our mocked fetch resolved instantly. In production, timing matters.
Integration gaps — Each module had tests. The connections between modules did not.
Real-world data — Our fixtures were clean. User data is never clean.

The Bug That Started It All

A user in Japan reported being charged twice for a single purchase. We couldn't reproduce it locally. Our payment integration tests passed every time.

The root cause: on slow networks, the checkout button stayed clickable while the payment request was in flight. Our mock API responded in 12ms; real networks took 800ms. That gap was enough for impatient fingers to click twice.

The fix was 3 lines of code:

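const [isSubmitting, setIsSubmitting] = useState(false);
// Handler: if (isSubmitting) return; setIsSubmitting(true); then setIsSubmitting(false) once the request settles
// Button: disabled={isSubmitting}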

Three lines. But the test suite — our beautiful 847-test suite — had zero tests for this scenario because nobody wrote a test for "user clicks button twice."
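A disabled button only takes effect once React has re-rendered. One way to make the guard independent of render timing is to dedupe inside the submit handler itself. The sketch below assumes the submit function returns a promise. `preventDoubleSubmit` and `submitPayment` are hypothetical names, not part of React or any payment SDK.

```javascript
// Minimal sketch: while a submission is pending, repeat calls reuse it
// instead of starting a second request. `preventDoubleSubmit` is a
// hypothetical helper, not a React or payment-SDK API.
function preventDoubleSubmit(submit) {
  let inFlight = null; // pending promise, or null when idle

  return function guardedSubmit(...args) {
    if (inFlight) {
      // Second click while the first request is still on the wire:
      // hand back the same promise so only one charge is attempted.
      return inFlight;
    }
    inFlight = Promise.resolve()
      .then(() => submit(...args))
      .finally(() => {
        inFlight = null; // accept new submissions once this one settles
      });
    return inFlight;
  };
}

// Hypothetical payment call that takes 800ms, like the slow networks above.
let charges = 0;
const submitPayment = () =>
  new Promise((resolve) => setTimeout(() => resolve(++charges), 800));

const guardedSubmitPayment = preventDoubleSubmit(submitPayment);
const first = guardedSubmitPayment();
const second = guardedSubmitPayment(); // same pending promise; no second charge
```

In a component, keep the wrapped function stable across renders (for example in a `useRef`). A wrapper recreated on every render would start with a fresh `inFlight` each time.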

The 14-Bug Autopsy

After that incident, we categorized all 14 bugs:

| Bug Category | Count | Why CI Missed It |
|---|---|---|
| Browser compatibility | 4 | ❌ No cross-browser tests |
| Race conditions | 3 | ❌ Mocks too fast |
| Edge-case user input | 3 | ❌ Fixtures too clean |
| Third-party API changes | 2 | ❌ No contract testing |
| Time zone bugs | 2 | ❌ All tests ran in UTC |

14 bugs. Zero caught by CI. The problem wasn't that we didn't have enough tests — we had the wrong kind of tests.

What We Changed

1. Added Integration Tests at Module Boundaries

Unit tests check the bricks. Integration tests check the mortar. We added tests specifically for the connections between services — where most real bugs hide.

2. Started Running Tests in Real Browsers

We added Playwright for critical user flows: checkout, auth, search. These run against real Chrome and Firefox instances. Safari is next.
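A minimal `playwright.config.js` that runs the same flows in several engines might look like the one below. The `testDir` value is an assumption. The `webkit` project is our guess at how the "Safari is next" step could begin: Playwright's WebKit build is close to Safari but is not the same browser.

```javascript
// playwright.config.js: a minimal sketch, not our actual config.
const { defineConfig, devices } = require("@playwright/test");

module.exports = defineConfig({
  testDir: "./e2e", // assumption: where the checkout/auth/search flows live
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    // Playwright's WebKit build is the nearest stand-in for Safari,
    // though it is not identical to Safari itself.
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

Running `npx playwright test --project=firefox` exercises a single engine, which helps when a failure only reproduces in one browser.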

3. Mock Network Latency

Instead of instant mock responses, we randomized delays between 100ms and 2000ms. This surfaced race conditions we never knew existed.
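Here is a minimal sketch of such a latency wrapper, assuming the mocks are plain async functions (such as a fake `fetch`). The `random` option is our own addition. Injecting it lets a test force a specific ordering, such as a slow first request followed by a fast second one. That out-of-order response is exactly what instant mocks hide.

```javascript
// Wrap any async function (e.g. a mocked fetch) so each call waits a random
// delay between minMs and maxMs (inclusive) before running.
// `withRandomLatency` is a hypothetical helper name.
function withRandomLatency(fn, { minMs = 100, maxMs = 2000, random = Math.random } = {}) {
  return async function delayed(...args) {
    // random() is in [0, 1), so delayMs lands in [minMs, maxMs].
    const delayMs = minMs + Math.floor(random() * (maxMs - minMs + 1));
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    return fn(...args);
  };
}
```

In a test setup you might install it with something like `globalThis.fetch = withRandomLatency(mockFetch)`, where `mockFetch` stands in for whatever fake your suite already uses.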

4. Contract Testing for APIs

We used Pact to verify that our frontend's expectations of backend APIs actually match reality. Two bugs disappeared the day we added this.
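The following is not Pact's API. It is a hand-rolled stand-in for the core idea behind consumer-driven contracts: the frontend records the response shape it relies on, and a test on the provider side checks real responses against that shape. The endpoint, field names, and `checkContract` helper are all hypothetical.

```javascript
// Hypothetical consumer contract: the fields (and types) the frontend reads
// from GET /api/orders/:id. Field names are assumptions for illustration.
const orderContract = {
  id: "string",
  totalCents: "number",
  currency: "string",
  status: "string",
};

// Returns a list of human-readable mismatches. An empty list means the
// response satisfies everything the consumer depends on.
function checkContract(response, contract) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in response)) {
      problems.push(`missing field "${field}"`);
    } else if (typeof response[field] !== expectedType) {
      problems.push(`"${field}" is ${typeof response[field]}, expected ${expectedType}`);
    }
  }
  return problems;
}
```

Extra fields are deliberately ignored. The contract covers only what the consumer reads, so the provider can add fields freely but cannot rename or retype the ones in use. Pact adds what this sketch lacks, such as a mock provider for consumer tests and a way to publish contracts so the provider's CI can verify them.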

5. Time Zone Roulette

We randomize the test runner's timezone. Half our date bugs appeared within the first week.
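Here is a minimal sketch of the idea for a Node-based runner, assuming a setup file that executes before any date code loads. The zone list and the `TEST_TZ` override are our own illustrative choices, not a standard. The approach relies on Node honoring changes to `process.env.TZ` at runtime, which recent Node versions do. Setting `TZ` in the test command's environment is the more conservative option.

```javascript
// Zones chosen (by assumption) for awkward offsets and DST rules.
const TIMEZONES = [
  "UTC",
  "America/Los_Angeles", // UTC-8/-7, observes DST
  "Asia/Tokyo",          // UTC+9, no DST
  "Asia/Kolkata",        // UTC+5:30, half-hour offset
  "Pacific/Auckland",    // UTC+12/+13, southern-hemisphere DST
];

// Pick one zone uniformly at random; `random` is injectable for tests.
function pickTimezone(random = Math.random) {
  return TIMEZONES[Math.floor(random() * TIMEZONES.length)];
}

// Hypothetical TEST_TZ override lets a failing run be replayed exactly.
const tz = process.env.TEST_TZ || pickTimezone();
process.env.TZ = tz;
console.log(`Running tests with TZ=${tz}`);
```

Logging the chosen zone and allowing it to be pinned both matter. A failure that only shows up in Asia/Kolkata is hard to chase if you can't rerun in Asia/Kolkata.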

The New Philosophy

Coverage tells you what code runs. It doesn't tell you what breaks.

Now we track different metrics:

Bug escape rate — bugs found by users vs. caught in CI
Mean time to detection — how fast our tests find regressions
Integration test coverage — not line coverage, but scenario coverage
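Once you record where each bug was found, the escape rate is simple arithmetic. The sketch assumes the rate is defined as user-found bugs divided by all bugs found in a period; the exact formula isn't spelled out above. By that definition, the 14-bug day, with zero bugs caught in CI, gives an escape rate of 1 (100%).

```javascript
// Bug escape rate: the share of bugs that reached users instead of being
// caught in CI. The definition (users / (users + CI)) is an assumption.
function bugEscapeRate({ foundByUsers, caughtInCi }) {
  const total = foundByUsers + caughtInCi;
  return total === 0 ? 0 : foundByUsers / total; // no bugs found: report 0
}

console.log(bugEscapeRate({ foundByUsers: 14, caughtInCi: 0 })); // → 1
```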

Our total test count went down (we deleted 200+ redundant unit tests), yet our bug escape rate dropped by 80%.

The dashboard looks less impressive. The product works better.

Have you been burned by "green tests, broken production"? What testing gaps surprised you most? I'd love to hear your war stories in the comments.

DEV Community