Saurav Ghosh

Intelligent RAG-powered Playwright code reviewer

Your Playwright Tests Are Lying to You (And How I Built a System to Catch It)

Your Playwright tests are passing.

But are they still testing what they were supposed to?

Over time, something subtle happens in most automation suites:

  • Assertions get removed
  • Requirements evolve
  • Tests are modified for quick fixes
  • Coverage silently drops

And yet… everything is still “green”.

This is what I call test drift — and most teams don’t even realize it’s happening.


🚨 The Problem: Passing Tests ≠ Correct Tests

In real-world projects, test suites grow quickly. But maintaining their correctness over time is hard.

Some common issues:

  • A test originally validated 3 things, now only validates 1
  • Jira requirements changed, but tests didn’t
  • Quick fixes removed important assertions
  • No visibility into what was lost over time

The worst part?

There’s no tool today that tells you your test is semantically incorrect.


🤖 Why Existing Tools Don’t Solve This

Tools like GitHub Copilot or Claude are great at:

  • Writing code
  • Suggesting improvements
  • Reviewing syntax

But they don’t:

  • Understand product requirements (from Jira)
  • Track historical changes in tests
  • Detect semantic drift over time
  • Compare intent vs implementation

They work on the code as it is now, not on context that spans time and systems.


💡 The Idea: A Requirement-Aware Test Drift Analyzer

So I built a system that answers a deeper question:

“Is this test still validating what it was originally supposed to?”

The system combines:

  • Jira requirements
  • Playwright test code
  • Git history
  • RAG (Retrieval Augmented Generation)

And performs semantic analysis across all of them.


🧠 How It Works (High-Level)

```
Jira → Requirement Intent
        ↓
Test Code → Test Intent
        ↓
Git History → Change Analysis
        ↓
RAG → Context Retrieval
        ↓
LLM → Drift + Coverage Analysis
```

Step 1: Extract Requirement Intent

From Jira tickets, the system extracts:

  • Expected behaviors
  • Validation scenarios
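
Here's a minimal sketch of what this step could look like, assuming Jira Cloud basic auth and the official `anthropic` SDK. The environment variable names and the prompt are illustrative, not the exact implementation:

```python
import os

import anthropic
import requests

JIRA_BASE = os.environ["JIRA_BASE_URL"]  # e.g. https://yourteam.atlassian.net
AUTH = (os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"])

def fetch_requirement(issue_key: str) -> str:
    """Pull the summary and description of a ticket via Jira's REST API."""
    resp = requests.get(f"{JIRA_BASE}/rest/api/2/issue/{issue_key}", auth=AUTH)
    resp.raise_for_status()
    fields = resp.json()["fields"]
    return f"{fields['summary']}\n\n{fields.get('description') or ''}"

def extract_requirement_intent(issue_key: str) -> str:
    """Ask the LLM to turn a ticket into a list of behaviors a test must validate."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    prompt = (
        "From this requirement, list every behavior a test should validate, "
        "one per line:\n\n" + fetch_requirement(issue_key)
    )
    msg = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```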

Step 2: Extract Test Intent

From Playwright tests, it identifies:

  • What the test is actually validating
  • Assertions and flows
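
A simplified sketch of that extraction. The line-based `expect(` heuristic below is deliberately naive (an AST or LLM pass would be more robust), but it shows the shape of the step:

```python
import re
from pathlib import Path

def extract_test_intent(spec_path: str) -> dict:
    """Collect the test titles and assertion lines a Playwright spec contains."""
    source = Path(spec_path).read_text(encoding="utf-8")
    # Test titles: test('user can log in', async ({ page }) => { ... })
    titles = re.findall(r"test\(\s*['\"]([^'\"]+)['\"]", source)
    # Assertions: any line calling expect(...), e.g. await expect(page).toHaveURL(...)
    assertions = [line.strip() for line in source.splitlines() if "expect(" in line]
    return {"file": spec_path, "tests": titles, "assertions": assertions}
```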

Step 3: Analyze Git History

It looks at:

  • What changed in the test over time
  • What assertions were removed
  • Whether coverage degraded
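
A rough sketch of this analysis, shelling out to `git log -p` and flagging deleted assertion lines; the commit depth and the `expect(` filter are assumptions, not the system's exact heuristics:

```python
import subprocess

def removed_assertions(repo_dir: str, spec_path: str, max_commits: int = 20) -> list:
    """Scan recent diffs of one spec file for Playwright assertions that were deleted."""
    log = subprocess.run(
        ["git", "-C", repo_dir, "log", f"-{max_commits}", "-p", "--", spec_path],
        capture_output=True, text=True, check=True,
    ).stdout
    # In a diff, lines starting with a single '-' (not the '---' header) were removed.
    return [
        line[1:].strip()
        for line in log.splitlines()
        if line.startswith("-") and not line.startswith("---") and "expect(" in line
    ]
```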

Step 4: Use RAG for Context

The system uses embeddings to:

  • Understand the repository semantically
  • Retrieve relevant historical and related code
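
A minimal sketch of the retrieval layer, using ChromaDB with a local sentence-transformers model; the collection name and storage path are made up for illustration:

```python
import chromadb
from chromadb.utils import embedding_functions

# all-MiniLM-L6-v2 runs fully locally, so no source code ever leaves the machine.
embed = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
client = chromadb.PersistentClient(path=".drift_index")
collection = client.get_or_create_collection(
    "playwright_tests", embedding_function=embed
)

def index_tests(specs: dict) -> None:
    """Embed every spec file (path -> source) so the repo is searchable semantically."""
    collection.upsert(ids=list(specs.keys()), documents=list(specs.values()))

def retrieve_context(query: str, k: int = 3) -> list:
    """Return the k spec files most relevant to a requirement or behavior."""
    return collection.query(query_texts=[query], n_results=k)["documents"][0]
```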

Step 5: Detect Drift

Finally, it compares:

  • Requirement vs Test Intent
  • Past vs Current Implementation
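
A sketch of that comparison, feeding the outputs of the earlier sketches into Claude; the prompt wording is illustrative:

```python
import anthropic

def detect_drift(requirement_intent: str, test_intent: dict, removed: list) -> str:
    """Ask the LLM to diff what the ticket requires against what the test checks."""
    client = anthropic.Anthropic()
    prompt = f"""A requirement says a test must validate:
{requirement_intent}

The current test contains these assertions:
{test_intent['assertions']}

These assertions were removed in recent commits:
{removed}

List each required behavior that is no longer validated, and estimate coverage."""
    msg = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```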

📊 Example Output

```
⚠️ Drift Detected

Test: login.spec.ts

Missing:
- Error message validation for invalid login

History:
- Assertion for dashboard visibility removed 2 commits ago

Coverage: 65%

Suggestion:
- Refer to auth/error.spec.ts for correct implementation
```

🔍 The Most Useful Feature

One thing I found extremely powerful:

When a test is missing something, the system finds another test in the repo that already implements it.

So instead of just saying:

“This is missing…”

It says:

“This is missing, and here’s how it’s already done elsewhere.”

This helps with:

  • Faster fixes
  • Standardization
  • Knowledge reuse
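
Under the same assumptions as the Step 4 sketch, this lookup is a small extension of the vector index: query the missing behavior and return the closest existing spec (`collection` is the hypothetical ChromaDB collection from earlier):

```python
def suggest_existing_implementation(missing_behavior: str):
    """Point to a spec file that already validates a behavior this test lost."""
    result = collection.query(query_texts=[missing_behavior], n_results=1)
    ids = result["ids"][0]
    return ids[0] if ids else None  # e.g. "auth/error.spec.ts"
```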

⚙️ Tech Stack

  • Python
  • LangChain
  • ChromaDB (vector search)
  • sentence-transformers / Ollama (local embeddings)
  • Claude API (reasoning)
  • Jira API
  • Git history analysis

🚀 What Makes This Different

This is not just another AI code reviewer.

It:

  • Aligns tests with product requirements
  • Detects semantic drift over time
  • Uses Git history as context
  • Applies RAG for repository-level understanding
  • Suggests existing implementations for missing coverage

🧪 What’s Next

Some ideas I’m exploring:

  • Running Playwright tests to validate runtime behavior
  • Auto-suggesting safe fixes
  • Improving retrieval accuracy
  • Supporting more frameworks

👇 Final Thoughts

We spend a lot of time making tests pass.

But very little time asking:

“Are they still testing the right thing?”

This project is a step toward answering that.


Would love to hear:

  • How do you deal with test drift today?
  • Do your tests stay aligned with requirements over time?

Happy to discuss ideas or improvements!

#ai #testing #playwright #rag #python
