The Problem: AI Generates Tests Fast, But Redundantly
AI coding assistants (GitHub Copilot, Cursor, ChatGPT) are game-changers for test generation. But there's a critical issue: they don't check if tests duplicate existing coverage.
Here's what happened when I let AI generate tests for a simple calculator:
# AI cheerfully created all 5 of these:
def test_addition():
    assert Calculator().add(2, 3) == 5

def test_add_numbers():
    assert Calculator().add(2, 3) == 5

def test_sum_calculation():
    assert Calculator().add(2, 3) == 5

def test_calculate_sum():
    assert Calculator().add(2, 3) == 5

def test_basic_addition():
    assert Calculator().add(2, 3) == 5
Different function names, slightly different formatting, but identical coverage. Each test executes the exact same code paths.
The Scale of the Problem
I analyzed the complete calculator test suite:
- 54 tests generated by AI
- 39 exact duplicates (same coverage)
- 168 subset duplicates (tests completely covered by others)
- 45 similar test pairs (significant overlap)
Only 15 unique tests were actually needed.
That's a 72% redundancy rate! 🤯
Why Text Similarity Doesn't Work
My first instinct: "Just compare the test code!"
But these tests look different yet do the same thing:
# Test 1: Simple assertion
def test_addition():
    result = Calculator().add(2, 3)
    assert result == 5

# Test 2: unittest style, same coverage
class TestCalculator(unittest.TestCase):
    def test_sum(self):
        calc = Calculator()
        total = calc.add(2, 3)
        self.assertEqual(total, 5)

# Test 3: One-liner, same coverage
def test_calc_add():
    assert Calculator().add(2, 3) == 5
Text similarity would rate these as "different" - but they execute identical code paths.
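To make that concrete, here's a quick back-of-the-envelope check. This is my own illustration using Python's difflib as a stand-in for "compare the test code as text" (it's not something TestIQ ships): the two snippets above score poorly on text similarity even though they exercise exactly the same lines.

# Illustration only: text comparison of two of the tests above.
from difflib import SequenceMatcher

test_1 = (
    "def test_addition():\n"
    "    result = Calculator().add(2, 3)\n"
    "    assert result == 5\n"
)
test_2 = (
    "class TestCalculator(unittest.TestCase):\n"
    "    def test_sum(self):\n"
    "        calc = Calculator()\n"
    "        total = calc.add(2, 3)\n"
    "        self.assertEqual(total, 5)\n"
)

ratio = SequenceMatcher(None, test_1, test_2).ratio()
print(f"text similarity: {ratio:.0%}")
# Well below any typical "duplicate" threshold - yet both tests
# execute exactly the same lines of Calculator.add().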
The Solution: Coverage Analysis
I built TestIQ to analyze execution patterns instead of text:
- Collect coverage - Track which lines each test executes
- Compare patterns - Find tests with identical/overlapping coverage
- Visualize results - Interactive HTML reports with side-by-side comparison
Types of Duplicates Detected
Exact Duplicates - Identical coverage
test_addition → lines [9, 11, 12]
test_add_numbers → lines [9, 11, 12] ❌ DUPLICATE
Subset Duplicates - One test covered by another
test_basic_add → lines [9, 11]
test_comprehensive_add → lines [9, 11, 15, 20, 25] ⚠️ SUBSET
Similar Tests - Significant overlap (configurable)
test_addition → lines [9, 11, 15]
test_subtraction → lines [9, 11, 20] ⚠️ 66% similar
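As a rough sketch of the idea (not TestIQ's actual implementation), classifying a pair of tests from their covered-line sets can look like this; the line numbers are the hypothetical ones from the examples above, and the 0.3 threshold mirrors the default shown in the Configuration section below.

# Sketch of coverage-based duplicate classification (illustrative only).
# Each test is reduced to the set of line numbers it executed.
def classify(cov_a: set[int], cov_b: set[int], threshold: float = 0.3) -> str:
    if cov_a == cov_b:
        return "exact duplicate"
    if cov_a <= cov_b or cov_b <= cov_a:
        return "subset duplicate"
    # Overlap relative to the smaller test, like the ~66% figure above.
    overlap = len(cov_a & cov_b) / min(len(cov_a), len(cov_b))
    return f"similar ({overlap:.0%} overlap)" if overlap >= threshold else "distinct"

print(classify({9, 11, 12}, {9, 11, 12}))        # exact duplicate
print(classify({9, 11}, {9, 11, 15, 20, 25}))    # subset duplicate
print(classify({9, 11, 15}, {9, 11, 20}))        # similar (67% overlap)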
How It Works
1. Install TestIQ
pip install testiq
2. Collect Coverage Data
pytest --testiq-output=coverage.json
TestIQ includes a built-in pytest plugin that tracks per-test coverage. No configuration needed!
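Under the hood, per-test coverage collection can be approximated with coverage.py and an autouse pytest fixture. The sketch below is my own illustration of the idea as a conftest.py you could write yourself; it is not TestIQ's plugin code, and the output file name is made up.

# conftest.py - illustrative per-test coverage collection (not TestIQ's plugin).
import json
import coverage
import pytest

_per_test: dict[str, dict[str, list[int]]] = {}

@pytest.fixture(autouse=True)
def _track_coverage(request):
    # Start a fresh, in-memory coverage measurement for each test.
    cov = coverage.Coverage(data_file=None)
    cov.start()
    yield
    cov.stop()
    data = cov.get_data()
    # Record the lines this test executed, keyed by its pytest node ID.
    _per_test[request.node.nodeid] = {
        filename: sorted(data.lines(filename) or [])
        for filename in data.measured_files()
    }

def pytest_sessionfinish(session, exitstatus):
    # Dump one coverage set per test for later duplicate analysis.
    with open("per-test-coverage.json", "w") as fh:
        json.dump(_per_test, fh, indent=2)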
3. Analyze for Duplicates
testiq analyze coverage.json --format html --output report.html
Get an interactive HTML report showing:
- All duplicate test relationships
- Side-by-side coverage comparison
- Recommended actions (keep/remove)
4. Compare Tests Visually
Click any test pair to see a split-screen comparison.
Real-World Results
Calculator Example (AI-Generated)
- Before: 54 tests, ~10 minute CI run
- After: 15 tests, ~3 minute CI run
- Savings: 70% faster CI, 39 fewer tests to maintain
- Coverage: 100% maintained ✅
Production Django Application
- Before: 847 tests
- After: 612 tests (removed 235 duplicates)
- Savings: 28% faster CI runs
- Coverage: No reduction ✅
CI/CD Integration
Prevent duplicate tests from merging:
# .github/workflows/test-quality.yml
name: Test Quality Check

on: [pull_request]

jobs:
  check-duplicates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install testiq pytest pytest-cov
          pip install -r requirements.txt

      - name: Collect coverage
        run: pytest --testiq-output=coverage.json

      - name: Check for duplicates
        run: |
          testiq analyze coverage.json \
            --quality-gate \
            --max-duplicates 0 \
            --format html \
            --output testiq-report.html

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: testiq-report
          path: testiq-report.html
Builds fail if duplicates are detected. No manual review needed!
Configuration
Fine-tune duplicate detection:
# .testiq.toml
[analysis]
similarity_threshold = 0.3 # 30% overlap = similar
min_coverage_lines = 5 # Ignore trivial tests
max_results = 100 # Limit report size
[reporting]
format = "html"
split_similar_threshold = 0.5 # Split similar tests
[performance]
enable_parallel = true
max_workers = 8
Or use YAML:
# .testiq.yaml
analysis:
  similarity_threshold: 0.3
  min_coverage_lines: 5
reporting:
  format: html
performance:
  enable_parallel: true
Try It Yourself
TestIQ includes a demo with the actual AI-generated calculator example:
pip install testiq
testiq demo
This generates and opens an HTML report showing all 207 redundancies in 54 tests.
Key Takeaways
- AI tools generate tests fast - but don't check for coverage duplication
- Coverage analysis beats text similarity - catches duplicates that look different
- Significant reduction possible - 72% in AI-generated suites, 28% in production
- No coverage loss - Remove duplicates while maintaining 100% coverage
- CI/CD ready - Block PRs with duplicate tests automatically
What's Next?
TestIQ is open source (MIT) and actively developed:
- GitHub: https://github.com/pydevtools/TestIQ
- PyPI: https://pypi.org/project/testiq/
- Documentation: https://github.com/pydevtools/TestIQ/tree/main/README.md
Contributions welcome! Issues, PRs, and feedback appreciated.
Have you dealt with test suite bloat from AI-generated code? Share your experiences in the comments! 👇
Want to try it? pip install testiq && testiq demo

