artshllaku
Your Tests Are Passing. That Doesn't Mean They're Good

I've worked on projects with 85% test coverage where bugs still made it to production every week. The team would look at the coverage report, see green, and feel safe. But the tests were lying.

Here's what was actually happening:

test('should process user data', async () => {
  const result = await processUser(mockData);
  expect(result).toBeTruthy();
});

This test "covers" the processUser function. Coverage goes up. CI is green. But what does it actually prove? That the function returns something that isn't null, undefined, 0, or false. That's it. The function could return completely wrong data and this test would still pass.

Or this one — common in E2E tests:

test('user signup flow', async ({ page }) => {
  await page.goto('/signup');
  await page.fill('#email', 'test@test.com');
  await page.fill('#password', '12345678');
  await page.click('button[type="submit"]');
  await page.waitForURL('/dashboard');
});

It walks through the whole signup flow but never checks if the user was actually created, if the right page content loaded, or if the form shows errors for bad input. It checks that a redirect happened. That's all.
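For contrast, here's a sketch of the same flow with real verification at each step. The selectors and page copy are assumptions about the app, not part of the original — adapt them to what your dashboard actually renders:

```javascript
import { test, expect } from '@playwright/test';

test('user signup flow', async ({ page }) => {
  await page.goto('/signup');
  await page.fill('#email', 'test@test.com');
  await page.fill('#password', '12345678');
  await page.click('button[type="submit"]');

  // Verify the redirect landed where it should
  await expect(page).toHaveURL('/dashboard');
  // Verify the right content actually loaded, not just any page at that URL
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  // Verify the signed-up user is reflected in the UI
  await expect(page.getByText('test@test.com')).toBeVisible();
});
```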

Coverage tools can't catch this. Coverage measures which lines of code ran during your test. It doesn't care if you actually verified anything.

What's the difference between a good test and a bad one?

A bad test runs your code and checks almost nothing. A good test runs your code and verifies that specific things happened correctly.

Bad:

expect(response).toBeTruthy();
expect(user).toBeDefined();

Good:

expect(response.status).toBe(200);
expect(user.email).toBe('test@test.com');
expect(user.role).toBe('admin');
expect(user.createdAt).toBeInstanceOf(Date);

Bad tests give you confidence that isn't real. You think your code works because tests pass. Then a customer reports a bug that your test suite should have caught — but didn't, because the assertions were too weak.

The real metrics that matter

Instead of asking "how much of my code is tested?", start asking:

How many assertions does each test have? A test with one weak assertion is barely a test.
What kind of matchers am I using? toBeTruthy() is almost always the wrong choice. toBe(), toEqual(), toContain() actually verify values.
Am I testing edge cases? The happy path working doesn't mean your code handles errors, empty inputs, or boundary values.
Am I testing behavior or implementation? Tests that break when you refactor are testing the wrong thing.

I built a tool to automate this

I got tired of reviewing PRs and manually pointing out weak tests, so I built gapix — a CLI tool that analyzes your test quality automatically.

It parses your test files using AST analysis, looks at every single assertion, categorizes your matchers, and gives each test file a quality score from 0 to 100.

npx @artshllaku/gapix analyze ./src

What it checks:

Assertion density — tests with zero or only one assertion get flagged
Matcher strength — using toBeTruthy() where you could check actual values
Edge case coverage — whether you're testing more than just the happy path
Assertion categories — equality checks, mock verifications, error handling, DOM assertions
Overall quality grade — Poor, Fair, Good, or Excellent for each file

It generates an interactive HTML report you can open in the browser:

npx @artshllaku/gapix show-report

You get a dark-themed dashboard showing every test file, its score, individual findings, and suggestions for improvement.

It works with any framework

Jest, Vitest, Playwright, Cypress: it doesn't matter. If your tests are written in TypeScript or JavaScript, gapix can analyze them. It reads the AST, not framework-specific APIs.

Optional AI analysis

If you want deeper analysis, you can connect an AI provider (OpenAI or Ollama) and gapix will give you context-aware suggestions — not generic advice, but specific findings based on your actual code and what your tests are checking.

npx @artshllaku/gapix set-provider openai
npx @artshllaku/gapix set-key sk-your-key
npx @artshllaku/gapix analyze ./src

Without AI, it still runs full rule-based analysis using AST parsing. The AI just adds an extra layer.

Why this matters more than coverage

I've seen this pattern too many times:

  1. Team sets a coverage threshold (80%)
  2. Developers write tests to hit the number
  3. Tests become a checkbox, not a safety net
  4. Bugs get through because tests don't verify real behavior
  5. Team loses trust in the test suite
  6. People stop writing tests or start skipping them

The fix isn't more tests. It's better tests. One well-written test with strong assertions is worth more than ten tests that just call functions and check toBeTruthy().

Get started

# Run it once without installing
npx @artshllaku/gapix analyze ./src

# Or install globally
npm i -g @artshllaku/gapix
gapix analyze ./src

It's free, open source, and takes about 30 seconds to get your first report.

GitHub: https://github.com/artshllk/gapix

I'd love your feedback. If you try it on your codebase and something doesn't work right or the suggestions aren't helpful, open an issue. I'm actively working on this and want it to be useful for real-world projects.
