Your Playwright Tests Are Lying to You (And How I Built a System to Catch It)
Your Playwright tests are passing.
But are they still testing what they were supposed to?
Over time, something subtle happens in most automation suites:
- Assertions get removed
- Requirements evolve
- Tests are modified for quick fixes
- Coverage silently drops
And yet… everything is still “green”.
This is what I call test drift — and most teams don’t even realize it’s happening.
🚨 The Problem: Passing Tests ≠ Correct Tests
In real-world projects, test suites grow quickly. But maintaining their correctness over time is hard.
Some common issues:
- A test that originally validated 3 things now only validates 1
- Jira requirements changed, but tests didn’t
- Quick fixes removed important assertions
- No visibility into what was lost over time
The worst part?
There’s no tool today that tells you your test is semantically incorrect.
🤖 Why Existing Tools Don’t Solve This
Tools like GitHub Copilot or Claude are great at:
- Writing code
- Suggesting improvements
- Reviewing syntax
But they don’t:
- Understand product requirements (from Jira)
- Track historical changes in tests
- Detect semantic drift over time
- Compare intent vs implementation
They work on the code as it is right now, without context that spans time and systems.
💡 The Idea: A Requirement-Aware Test Drift Analyzer
So I built a system that answers a deeper question:
“Is this test still validating what it was originally supposed to?”
The system combines:
- Jira requirements
- Playwright test code
- Git history
- RAG (Retrieval Augmented Generation)
And performs semantic analysis across all of them.
🧠 How It Works (High-Level)
```
Jira         → Requirement Intent
                     ↓
Test Code    → Test Intent
                     ↓
Git History  → Change Analysis
                     ↓
RAG          → Context Retrieval
                     ↓
LLM          → Drift + Coverage Analysis
```
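The whole pipeline boils down to producing one report per test. As a minimal sketch, the result can be modeled as a small dataclass; the field names here are my own assumptions, mirroring the example output later in the post.

```python
from dataclasses import dataclass, field

@dataclass
class DriftReport:
    """One analysis result per test file (field names are illustrative)."""
    test_file: str
    missing_behaviors: list[str] = field(default_factory=list)    # requirement intent not covered
    removed_assertions: list[str] = field(default_factory=list)   # found via git history
    coverage_estimate: float = 0.0                                # requirement coverage, 0-100
    suggested_references: list[str] = field(default_factory=list) # existing specs to copy from
```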
Step 1: Extract Requirement Intent
From Jira tickets, the system extracts:
- Expected behaviors
- Validation scenarios
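For the Jira side, a minimal sketch looks like this: pull the ticket's summary and description over Jira's REST API and turn them into a prompt that asks the LLM for a behavior checklist. The JIRA_BASE URL, the env-var names, and the requirement_to_behaviors helper are illustrative assumptions, not the exact implementation.

```python
import os
import requests

JIRA_BASE = "https://your-domain.atlassian.net"  # assumption: Jira Cloud instance

def fetch_ticket(key: str) -> dict:
    """Fetch summary + description for a Jira issue via the REST API (v2)."""
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/issue/{key}",
        params={"fields": "summary,description"},
        auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
        timeout=30,
    )
    resp.raise_for_status()
    fields = resp.json()["fields"]
    return {"summary": fields["summary"], "description": fields["description"] or ""}

def requirement_to_behaviors(ticket: dict) -> str:
    """Build a prompt asking the LLM to list expected behaviors / validation scenarios."""
    return (
        "From this requirement, list every behavior a test should validate, one per line:\n\n"
        f"Summary: {ticket['summary']}\n\nDescription: {ticket['description']}"
    )
```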
Step 2: Extract Test Intent
From Playwright tests, it identifies:
- What the test is actually validating
- Assertions and flows
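For test intent, a cheap first pass is purely static: grab the test titles and the expect(...) lines from the spec before the LLM sees it. The regex below is a deliberately naive sketch; a real parser (e.g. an AST-based one) would be more robust.

```python
import re
from pathlib import Path

def extract_test_intent(spec_path: str) -> dict:
    """Naive static pass over a Playwright spec: test titles + assertion lines."""
    source = Path(spec_path).read_text(encoding="utf-8")
    titles = re.findall(r"test\(\s*['\"`](.+?)['\"`]", source)
    assertions = [
        line.strip()
        for line in source.splitlines()
        if "expect(" in line  # e.g. await expect(page.locator(...)).toBeVisible()
    ]
    return {"titles": titles, "assertions": assertions}
```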
Step 3: Analyze Git History
It looks at:
- What changed in the test over time
- What assertions were removed
- Whether coverage degraded
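Removed assertions can be found by walking the file's history and comparing the assertion set of older revisions against HEAD. This sketch shells out to plain git; the actual project could just as well use a library like GitPython.

```python
import subprocess

def file_at_revision(repo: str, rev: str, path: str) -> str:
    """Return the file contents at a given commit (empty if it didn't exist yet)."""
    result = subprocess.run(
        ["git", "-C", repo, "show", f"{rev}:{path}"],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else ""

def removed_assertions(repo: str, path: str, max_commits: int = 20) -> list[str]:
    """Assertions present in an older revision of the test but missing from HEAD."""
    revs = subprocess.run(
        ["git", "-C", repo, "log", f"-{max_commits}", "--format=%H", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    current = {line.strip() for line in file_at_revision(repo, "HEAD", path).splitlines() if "expect(" in line}
    removed: list[str] = []
    for rev in revs:
        old = {line.strip() for line in file_at_revision(repo, rev, path).splitlines() if "expect(" in line}
        removed.extend(a for a in old - current if a not in removed)
    return removed
```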
Step 4: Use RAG for Context
The system uses embeddings to:
- Understand the repository semantically
- Retrieve relevant historical and related code
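The retrieval layer can be as small as: embed each spec file with sentence-transformers, store it in a ChromaDB collection, and query by requirement text. The collection name, index path, and one-document-per-file chunking are assumptions for the sketch.

```python
from pathlib import Path
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path=".drift_index")  # local on-disk index
collection = client.get_or_create_collection("playwright_tests")

def index_specs(repo: str) -> None:
    """Embed each spec file and store it with its path as the document id."""
    specs = list(Path(repo).rglob("*.spec.ts"))
    texts = [p.read_text(encoding="utf-8") for p in specs]
    collection.upsert(
        ids=[str(p) for p in specs],
        documents=texts,
        embeddings=model.encode(texts).tolist(),
    )

def related_tests(query: str, k: int = 3) -> list[str]:
    """Retrieve the spec files most relevant to a requirement or missing behavior."""
    hits = collection.query(query_embeddings=model.encode([query]).tolist(), n_results=k)
    return hits["ids"][0]
```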
Step 5: Detect Drift
Finally, it compares:
- Requirement vs Test Intent
- Past vs Current Implementation
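The drift check itself is a single reasoning call: requirement intent, current test intent, and the git-history findings go into one prompt for Claude. The prompt wording and model name are my own placeholders; the call shape is the standard Anthropic Messages API.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def detect_drift(requirements: str, test_intent: str, history: str) -> str:
    """Ask the model to compare requirement intent with what the test validates today."""
    prompt = (
        "Requirement (from Jira):\n" + requirements +
        "\n\nCurrent test (titles + assertions):\n" + test_intent +
        "\n\nAssertions removed over git history:\n" + history +
        "\n\nList behaviors from the requirement that the test no longer validates, "
        "note any removed assertions that caused it, and estimate coverage as a percentage."
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any recent Claude model works here
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```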
📊 Example Output
```
⚠️ Drift Detected

Test: login.spec.ts

Missing:
- Error message validation for invalid login

History:
- Assertion for dashboard visibility removed 2 commits ago

Coverage: 65%

Suggestion:
- Refer to auth/error.spec.ts for the correct implementation
```
🔍 The Most Useful Feature
One thing I found extremely powerful:
When a test is missing something, the system finds another test in the repo that already implements it.
So instead of just saying:
“This is missing…”
It says:
“This is missing, and here’s how it’s already done elsewhere.”
This helps with:
- Faster fixes
- Standardization
- Knowledge reuse
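Under the hood this is just one more retrieval query: use the missing behavior as the query string against the same vector index from Step 4 and surface the closest spec as a reference. A tiny sketch, reusing the hypothetical related_tests helper from the RAG step:

```python
def suggest_reference(missing_behavior: str) -> str | None:
    """Point to an existing spec that already covers the missing behavior, if any."""
    matches = related_tests(missing_behavior, k=1)  # from the RAG sketch above
    return matches[0] if matches else None

# e.g. suggest_reference("error message shown for invalid login")
# -> "auth/error.spec.ts"
```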
⚙️ Tech Stack
- Python
- LangChain
- ChromaDB (vector search)
- sentence-transformers / Ollama (local embeddings)
- Claude API (reasoning)
- Jira API
- Git history analysis
🚀 What Makes This Different
This is not just another AI code reviewer.
It:
- Aligns tests with product requirements
- Detects semantic drift over time
- Uses Git history as context
- Applies RAG for repository-level understanding
- Suggests existing implementations for missing coverage
🧪 What’s Next
Some ideas I’m exploring:
- Running Playwright tests to validate runtime behavior
- Auto-suggesting safe fixes
- Improving retrieval accuracy
- Supporting more frameworks
👇 Final Thoughts
We spend a lot of time making tests pass.
But very little time asking:
“Are they still testing the right thing?”
This project is a step toward answering that.
Would love to hear:
- How do you deal with test drift today?
- Do your tests stay aligned with requirements over time?
Happy to discuss ideas or improvements!