<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saurav Ghosh</title>
    <description>The latest articles on DEV Community by Saurav Ghosh (@automationwithsaurav).</description>
    <link>https://dev.to/automationwithsaurav</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873999%2F33193ef3-7b49-41cd-90bc-681d1b4c4f65.jpeg</url>
      <title>DEV Community: Saurav Ghosh</title>
      <link>https://dev.to/automationwithsaurav</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/automationwithsaurav"/>
    <language>en</language>
    <item>
      <title>Intelligent RAG-powered Playwright code reviewer</title>
      <dc:creator>Saurav Ghosh</dc:creator>
      <pubDate>Tue, 05 May 2026 07:41:22 +0000</pubDate>
      <link>https://dev.to/automationwithsaurav/intelligent-rag-powered-playwright-code-reviewer-2219</link>
      <guid>https://dev.to/automationwithsaurav/intelligent-rag-powered-playwright-code-reviewer-2219</guid>
      <description>&lt;h1&gt;
  
  
  Your Playwright Tests Are Lying to You (And How I Built a System to Catch It)
&lt;/h1&gt;

&lt;p&gt;Your Playwright tests are passing.&lt;/p&gt;

&lt;p&gt;But are they still testing what they were supposed to?&lt;/p&gt;

&lt;p&gt;Over time, something subtle happens in most automation suites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assertions get removed&lt;/li&gt;
&lt;li&gt;Requirements evolve&lt;/li&gt;
&lt;li&gt;Tests are modified for quick fixes&lt;/li&gt;
&lt;li&gt;Coverage silently drops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yet… everything is still “green”.&lt;/p&gt;

&lt;p&gt;This is what I call &lt;strong&gt;test drift&lt;/strong&gt; — and most teams don’t even realize it’s happening.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚨 The Problem: Passing Tests ≠ Correct Tests
&lt;/h2&gt;

&lt;p&gt;In real-world projects, test suites grow quickly. But maintaining their &lt;em&gt;correctness&lt;/em&gt; over time is hard.&lt;/p&gt;

&lt;p&gt;Some common issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A test that originally validated three things now validates only one&lt;/li&gt;
&lt;li&gt;Jira requirements changed, but tests didn’t&lt;/li&gt;
&lt;li&gt;Quick fixes removed important assertions&lt;/li&gt;
&lt;li&gt;No visibility into what was lost over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worst part?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There’s no tool today that tells you your test is &lt;em&gt;semantically incorrect&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤖 Why Existing Tools Don’t Solve This
&lt;/h2&gt;

&lt;p&gt;Tools like GitHub Copilot or Claude are great at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing code&lt;/li&gt;
&lt;li&gt;Suggesting improvements&lt;/li&gt;
&lt;li&gt;Reviewing syntax&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But they don’t:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand product requirements (from Jira)&lt;/li&gt;
&lt;li&gt;Track historical changes in tests&lt;/li&gt;
&lt;li&gt;Detect semantic drift over time&lt;/li&gt;
&lt;li&gt;Compare intent vs implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They work on &lt;em&gt;current code&lt;/em&gt;, not &lt;strong&gt;context across time and systems&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 The Idea: A Requirement-Aware Test Drift Analyzer
&lt;/h2&gt;

&lt;p&gt;So I built a system that answers a deeper question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is this test still validating what it was originally supposed to?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Jira requirements&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Playwright test code&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Git history&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RAG (Retrieval Augmented Generation)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And performs &lt;strong&gt;semantic analysis across all of them&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 How It Works (High-Level)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Jira → Requirement Intent  
        ↓
Test Code → Test Intent  
        ↓
Git History → Change Analysis  
        ↓
RAG → Context Retrieval  
        ↓
LLM → Drift + Coverage Analysis  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
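
&lt;p&gt;In code, the whole flow is just these steps chained together. Here’s a minimal orchestration sketch (the step functions are hypothetical stand-ins for the components described below, not the project’s actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal orchestration sketch. Each step function below is a hypothetical
# stand-in for the corresponding component described in Steps 1-5.

def analyze_test(jira_key, spec_path, repo_path):
    requirement_intent = extract_requirement_intent(jira_key)       # Step 1: Jira
    test_intent = extract_test_intent(spec_path)                    # Step 2: Playwright code
    removed = find_removed_assertions(repo_path, spec_path)         # Step 3: Git history
    related = retrieve_related_code(str(test_intent))               # Step 4: RAG retrieval
    return detect_drift(requirement_intent, test_intent, removed, related)  # Step 5: LLM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;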






&lt;h3&gt;
  
  
  Step 1: Extract Requirement Intent
&lt;/h3&gt;

&lt;p&gt;From Jira tickets, the system extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expected behaviors&lt;/li&gt;
&lt;li&gt;Validation scenarios&lt;/li&gt;
&lt;/ul&gt;
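
&lt;p&gt;A minimal sketch of this step, assuming the Jira Cloud REST API v2 and an API token (the environment variable names are placeholders, not the project’s actual config):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import requests

# Sketch only: fetch one ticket from the Jira Cloud REST API v2.
def extract_requirement_intent(issue_key):
    base_url = os.environ["JIRA_BASE_URL"]   # e.g. https://yourorg.atlassian.net
    auth = (os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"])
    resp = requests.get(f"{base_url}/rest/api/2/issue/{issue_key}", auth=auth, timeout=30)
    resp.raise_for_status()
    fields = resp.json()["fields"]
    # Summary plus description is usually enough for the LLM to infer expected
    # behaviors; acceptance criteria often live in a project-specific custom field.
    return {
        "summary": fields.get("summary", ""),
        "description": fields.get("description") or "",
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;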




&lt;h3&gt;
  
  
  Step 2: Extract Test Intent
&lt;/h3&gt;

&lt;p&gt;From Playwright tests, it identifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the test is actually validating&lt;/li&gt;
&lt;li&gt;Assertions and flows&lt;/li&gt;
&lt;/ul&gt;
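
&lt;p&gt;A rough sketch of how test intent can be pulled out of a spec file with plain regexes; a real implementation would likely use a TypeScript-aware parser, but this is enough to illustrate the idea:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from pathlib import Path

# Sketch only: collect test titles and expect() assertions from a Playwright
# spec as a cheap "test intent" summary.
TEST_RE = re.compile(r"test\(\s*['\"](.+?)['\"]")
ASSERT_RE = re.compile(r"(?:await\s+)?expect\(.+?\)\.\w+\(.*?\)", re.DOTALL)

def extract_test_intent(spec_path):
    source = Path(spec_path).read_text(encoding="utf-8")
    return {
        "titles": TEST_RE.findall(source),
        "assertions": [m.group(0) for m in ASSERT_RE.finditer(source)],
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;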




&lt;h3&gt;
  
  
  Step 3: Analyze Git History
&lt;/h3&gt;

&lt;p&gt;It looks at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What changed in the test over time&lt;/li&gt;
&lt;li&gt;What assertions were removed&lt;/li&gt;
&lt;li&gt;Whether coverage degraded&lt;/li&gt;
&lt;/ul&gt;
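
&lt;p&gt;A minimal sketch of the history check, using plain &lt;code&gt;git log -p&lt;/code&gt; to spot commits where assertions were deleted (an illustration, not the exact analysis the project runs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess

# Sketch only: walk the file's patch history and collect deleted lines that
# contained an expect() assertion.
def find_removed_assertions(repo_path, spec_path, max_commits=50):
    log = subprocess.run(
        ["git", "log", "-p", f"-{max_commits}", "--follow", "--", spec_path],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    removed, commit = [], None
    for line in log.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1][:10]
        # A single leading '-' (but not the '---' file header) marks a deleted line.
        elif line.startswith("-") and not line.startswith("---") and "expect(" in line:
            removed.append((commit, line[1:].strip()))
    return removed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;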




&lt;h3&gt;
  
  
  Step 4: Use RAG for Context
&lt;/h3&gt;

&lt;p&gt;The system uses embeddings to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand the repository semantically&lt;/li&gt;
&lt;li&gt;Retrieve relevant historical and related code&lt;/li&gt;
&lt;/ul&gt;
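
&lt;p&gt;A minimal sketch of the indexing and retrieval side with ChromaDB. For brevity it relies on Chroma’s default local embedding model; the project lists sentence-transformers or Ollama embeddings as options:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import chromadb
from pathlib import Path

# Sketch only: index every spec file, then retrieve the most relevant ones
# for a natural-language or code query.
client = chromadb.PersistentClient(path=".drift_index")
collection = client.get_or_create_collection("playwright_specs")

def index_repo(repo_path):
    for spec in Path(repo_path).rglob("*.spec.ts"):
        collection.upsert(ids=[str(spec)], documents=[spec.read_text(encoding="utf-8")])

def retrieve_related_code(query, n_results=3):
    hits = collection.query(query_texts=[query], n_results=n_results)
    return list(zip(hits["ids"][0], hits["documents"][0]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;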




&lt;h3&gt;
  
  
  Step 5: Detect Drift
&lt;/h3&gt;

&lt;p&gt;Finally, it compares:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requirement vs Test Intent&lt;/li&gt;
&lt;li&gt;Past vs Current Implementation&lt;/li&gt;
&lt;/ul&gt;
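
&lt;p&gt;A minimal sketch of the comparison step, assuming the Anthropic Python SDK with an API key in the environment; the prompt and output format here are illustrative only:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import anthropic

# Sketch only: hand all four signals to Claude and ask for a drift report.
llm = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

def detect_drift(requirement_intent, test_intent, removed_assertions, related_code):
    prompt = (
        "You are reviewing a Playwright test for drift against its requirement.\n"
        f"Requirement (from Jira): {json.dumps(requirement_intent)}\n"
        f"Current test intent: {json.dumps(test_intent)}\n"
        f"Assertions removed over time: {json.dumps(removed_assertions)}\n"
        f"Related tests from the repo: {json.dumps(related_code)[:4000]}\n"
        "Report missing validations, an estimated coverage percentage, and which "
        "related test already implements anything that is missing."
    )
    reply = llm.messages.create(
        model="claude-sonnet-4-20250514",   # any recent Claude model works here
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.content[0].text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;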




&lt;h2&gt;
  
  
  📊 Example Output
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;⚠️ Drift Detected

Test: login.spec.ts

Missing:
- Error message validation for invalid login

History:
- Assertion for dashboard visibility removed 2 commits ago

Coverage: 65%

Suggestion:
- Refer to auth/error.spec.ts for the correct implementation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔍 The Most Useful Feature
&lt;/h2&gt;

&lt;p&gt;One thing I found extremely powerful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When a test is missing something, the system finds another test in the repo that already implements it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So instead of just saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This is missing…”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This is missing, and here’s how it’s already done elsewhere.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This helps with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster fixes&lt;/li&gt;
&lt;li&gt;Standardization&lt;/li&gt;
&lt;li&gt;Knowledge reuse&lt;/li&gt;
&lt;/ul&gt;
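
&lt;p&gt;Mechanically, this lookup is just another similarity query against the same vector index. A sketch, reusing the ChromaDB collection from the RAG step (the function name and example strings are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only: given a missing validation described in plain language,
# surface the existing spec that is semantically closest to it.
def suggest_existing_implementation(missing_validation):
    hits = collection.query(query_texts=[missing_validation], n_results=1)
    return hits["ids"][0][0] if hits["ids"][0] else None

# Hypothetical usage:
# suggest_existing_implementation("error message shown for invalid login")
# could return a path like "auth/error.spec.ts"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;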




&lt;h2&gt;
  
  
  ⚙️ Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;LangChain&lt;/li&gt;
&lt;li&gt;ChromaDB (vector search)&lt;/li&gt;
&lt;li&gt;sentence-transformers / Ollama (local embeddings)&lt;/li&gt;
&lt;li&gt;Claude API (reasoning)&lt;/li&gt;
&lt;li&gt;Jira API&lt;/li&gt;
&lt;li&gt;Git history analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 What Makes This Different
&lt;/h2&gt;

&lt;p&gt;This is not just another AI code reviewer.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aligns tests with &lt;strong&gt;product requirements&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Detects &lt;strong&gt;semantic drift over time&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;Git history as context&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Applies &lt;strong&gt;RAG for repository-level understanding&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Suggests &lt;strong&gt;existing implementations for missing coverage&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧪 What’s Next
&lt;/h2&gt;

&lt;p&gt;Some ideas I’m exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running Playwright tests to validate runtime behavior&lt;/li&gt;
&lt;li&gt;Auto-suggesting safe fixes&lt;/li&gt;
&lt;li&gt;Improving retrieval accuracy&lt;/li&gt;
&lt;li&gt;Supporting more frameworks&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  👇 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;We spend a lot of time making tests pass.&lt;/p&gt;

&lt;p&gt;But very little time asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Are they still testing the right thing?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This project is a step toward answering that.&lt;/p&gt;




&lt;p&gt;Would love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do you deal with test drift today?&lt;/li&gt;
&lt;li&gt;Do your tests stay aligned with requirements over time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to discuss ideas or improvements!&lt;/p&gt;

&lt;p&gt;#ai #testing #playwright #rag #python&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
