Your Playwright Tests Are Lying to You (And How I Built a System to Catch It)
Your Playwright tests are passing.
But are they still testing what they were supposed to?
Over time, something subtle happens in most automation suites:
- Assertions get removed
- Requirements evolve
- Tests are modified for quick fixes
- Coverage silently drops
And yet… everything is still “green”.
This is what I call test drift — and most teams don’t even realize it’s happening.
🚨 The Problem: Passing Tests ≠ Correct Tests
In real-world projects, test suites grow quickly. But maintaining their correctness over time is hard.
Some common issues:
- A test that originally validated 3 things now only validates 1
- Jira requirements changed, but tests didn’t
- Quick fixes removed important assertions
- No visibility into what was lost over time
The worst part?
There’s no tool today that tells you your test is semantically incorrect.
🤖 Why Existing Tools Don’t Solve This
Tools like GitHub Copilot or Claude are great at:
- Writing code
- Suggesting improvements
- Reviewing syntax
But they don’t:
- Understand product requirements (from Jira)
- Track historical changes in tests
- Detect semantic drift over time
- Compare intent vs implementation
They work on the code as it is right now, without context that spans time and systems.
💡 The Idea: A Requirement-Aware Test Drift Analyzer
So I built a system that answers a deeper question:
“Is this test still validating what it was originally supposed to?”
The system combines:
- Jira requirements
- Playwright test code
- Git history
- RAG (Retrieval Augmented Generation)
And performs semantic analysis across all of them.
🧠 How It Works (High-Level)
```
Jira         → Requirement Intent
                     ↓
Test Code    → Test Intent
                     ↓
Git History  → Change Analysis
                     ↓
RAG          → Context Retrieval
                     ↓
LLM          → Drift + Coverage Analysis
```
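The whole pipeline boils down to producing one report per test. As a minimal sketch, the result can be modeled as a small dataclass; the field names here are my own assumptions, mirroring the example output later in the post.

```python
from dataclasses import dataclass, field

@dataclass
class DriftReport:
    """One analysis result per test file (field names are illustrative)."""
    test_file: str
    missing_behaviors: list[str] = field(default_factory=list)    # requirement intent not covered
    removed_assertions: list[str] = field(default_factory=list)   # found via git history
    coverage_estimate: float = 0.0                                # requirement coverage, 0-100
    suggested_references: list[str] = field(default_factory=list) # existing specs to copy from
```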
Step 1: Extract Requirement Intent
From Jira tickets, the system extracts:
- Expected behaviors
- Validation scenarios
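For the Jira side, a minimal sketch looks like this: pull the ticket's summary and description over Jira's REST API and turn them into a prompt that asks the LLM for a behavior checklist. The JIRA_BASE URL, the env-var names, and the requirement_to_behaviors helper are illustrative assumptions, not the exact implementation.

```python
import os
import requests

JIRA_BASE = "https://your-domain.atlassian.net"  # assumption: Jira Cloud instance

def fetch_ticket(key: str) -> dict:
    """Fetch summary + description for a Jira issue via the REST API (v2)."""
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/issue/{key}",
        params={"fields": "summary,description"},
        auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
        timeout=30,
    )
    resp.raise_for_status()
    fields = resp.json()["fields"]
    return {"summary": fields["summary"], "description": fields["description"] or ""}

def requirement_to_behaviors(ticket: dict) -> str:
    """Build a prompt asking the LLM to list expected behaviors / validation scenarios."""
    return (
        "From this requirement, list every behavior a test should validate, one per line:\n\n"
        f"Summary: {ticket['summary']}\n\nDescription: {ticket['description']}"
    )
```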
Step 2: Extract Test Intent
From Playwright tests, it identifies:
- What the test is actually validating
- Assertions and flows
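For test intent, a cheap first pass is purely static: grab the test titles and the expect(...) lines from the spec before the LLM sees it. The regex below is a deliberately naive sketch; a real parser (e.g. an AST-based one) would be more robust.

```python
import re
from pathlib import Path

def extract_test_intent(spec_path: str) -> dict:
    """Naive static pass over a Playwright spec: test titles + assertion lines."""
    source = Path(spec_path).read_text(encoding="utf-8")
    titles = re.findall(r"test\(\s*['\"`](.+?)['\"`]", source)
    assertions = [
        line.strip()
        for line in source.splitlines()
        if "expect(" in line  # e.g. await expect(page.locator(...)).toBeVisible()
    ]
    return {"titles": titles, "assertions": assertions}
```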
Step 3: Analyze Git History
It looks at:
- What changed in the test over time
- What assertions were removed
- Whether coverage degraded
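Removed assertions can be found by walking the file's history and comparing the assertion set of older revisions against HEAD. This sketch shells out to plain git; the actual project could just as well use a library like GitPython.

```python
import subprocess

def file_at_revision(repo: str, rev: str, path: str) -> str:
    """Return the file contents at a given commit (empty if it didn't exist yet)."""
    result = subprocess.run(
        ["git", "-C", repo, "show", f"{rev}:{path}"],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else ""

def removed_assertions(repo: str, path: str, max_commits: int = 20) -> list[str]:
    """Assertions present in an older revision of the test but missing from HEAD."""
    revs = subprocess.run(
        ["git", "-C", repo, "log", f"-{max_commits}", "--format=%H", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    current = {line.strip() for line in file_at_revision(repo, "HEAD", path).splitlines() if "expect(" in line}
    removed: list[str] = []
    for rev in revs:
        old = {line.strip() for line in file_at_revision(repo, rev, path).splitlines() if "expect(" in line}
        removed.extend(a for a in old - current if a not in removed)
    return removed
```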
Step 4: Use RAG for Context
The system uses embeddings to:
- Understand the repository semantically
- Retrieve relevant historical and related code
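The retrieval layer can be as small as: embed each spec file with sentence-transformers, store it in a ChromaDB collection, and query by requirement text. The collection name, index path, and one-document-per-file chunking are assumptions for the sketch.

```python
from pathlib import Path
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path=".drift_index")  # local on-disk index
collection = client.get_or_create_collection("playwright_tests")

def index_specs(repo: str) -> None:
    """Embed each spec file and store it with its path as the document id."""
    specs = list(Path(repo).rglob("*.spec.ts"))
    texts = [p.read_text(encoding="utf-8") for p in specs]
    collection.upsert(
        ids=[str(p) for p in specs],
        documents=texts,
        embeddings=model.encode(texts).tolist(),
    )

def related_tests(query: str, k: int = 3) -> list[str]:
    """Retrieve the spec files most relevant to a requirement or missing behavior."""
    hits = collection.query(query_embeddings=model.encode([query]).tolist(), n_results=k)
    return hits["ids"][0]
```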
Step 5: Detect Drift
Finally, it compares:
- Requirement vs Test Intent
- Past vs Current Implementation
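The drift check itself is a single reasoning call: requirement intent, current test intent, and the git-history findings go into one prompt for Claude. The prompt wording and model name are my own placeholders; the call shape is the standard Anthropic Messages API.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def detect_drift(requirements: str, test_intent: str, history: str) -> str:
    """Ask the model to compare requirement intent with what the test validates today."""
    prompt = (
        "Requirement (from Jira):\n" + requirements +
        "\n\nCurrent test (titles + assertions):\n" + test_intent +
        "\n\nAssertions removed over git history:\n" + history +
        "\n\nList behaviors from the requirement that the test no longer validates, "
        "note any removed assertions that caused it, and estimate coverage as a percentage."
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any recent Claude model works here
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```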
📊 Example Output
```
⚠️ Drift Detected

Test: login.spec.ts

Missing:
- Error message validation for invalid login

History:
- Assertion for dashboard visibility removed 2 commits ago

Coverage: 65%

Suggestion:
- Refer to auth/error.spec.ts for the correct implementation
```
🔍 The Most Useful Feature
One thing I found extremely powerful:
When a test is missing something, the system finds another test in the repo that already implements it.
So instead of just saying:
“This is missing…”
It says:
“This is missing, and here’s how it’s already done elsewhere.”
This helps with:
- Faster fixes
- Standardization
- Knowledge reuse
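Under the hood this is just one more retrieval query: use the missing behavior as the query string against the same vector index from Step 4 and surface the closest spec as a reference. A tiny sketch, reusing the hypothetical related_tests helper from the RAG step:

```python
def suggest_reference(missing_behavior: str) -> str | None:
    """Point to an existing spec that already covers the missing behavior, if any."""
    matches = related_tests(missing_behavior, k=1)  # from the RAG sketch above
    return matches[0] if matches else None

# e.g. suggest_reference("error message shown for invalid login")
# -> "auth/error.spec.ts"
```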
⚙️ Tech Stack
- Python
- LangChain
- ChromaDB (vector search)
- sentence-transformers / Ollama (local embeddings)
- Claude API (reasoning)
- Jira API
- Git history analysis
🚀 What Makes This Different
This is not just another AI code reviewer.
It:
- Aligns tests with product requirements
- Detects semantic drift over time
- Uses Git history as context
- Applies RAG for repository-level understanding
- Suggests existing implementations for missing coverage
🧪 What’s Next
Some ideas I’m exploring:
- Running Playwright tests to validate runtime behavior
- Auto-suggesting safe fixes
- Improving retrieval accuracy
- Supporting more frameworks
👇 Final Thoughts
We spend a lot of time making tests pass.
But very little time asking:
“Are they still testing the right thing?”
This project is a step toward answering that.
Would love to hear:
- How do you deal with test drift today?
- Do your tests stay aligned with requirements over time?
Happy to discuss ideas or improvements!