<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raghav Chamadiya</title>
    <description>The latest articles on DEV Community by Raghav Chamadiya (@raghav_builds).</description>
    <link>https://dev.to/raghav_builds</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1769627%2Ff0561aef-260b-447c-a551-33ab21f4c895.png</url>
      <title>DEV Community: Raghav Chamadiya</title>
      <link>https://dev.to/raghav_builds</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raghav_builds"/>
    <language>en</language>
    <item>
      <title>I tested whether a code health score actually predicts bugs. Here's the benchmark</title>
      <dc:creator>Raghav Chamadiya</dc:creator>
      <pubDate>Sat, 06 Jun 2026 07:57:41 +0000</pubDate>
      <link>https://dev.to/raghav_builds/i-tested-whether-a-code-health-score-actually-predicts-bugs-heres-the-benchmark-10dl</link>
      <guid>https://dev.to/raghav_builds/i-tested-whether-a-code-health-score-actually-predicts-bugs-heres-the-benchmark-10dl</guid>
      <description>&lt;p&gt;Most code health scores are vibes. A number goes up, a number goes down, and nobody checks whether the files it flags are the files that actually break later. I wanted to know if the score I built does better than that, so I ran it against a defect corpus and put it head to head with the leading commercial code-health tool.&lt;/p&gt;

&lt;p&gt;On the same 2,770 files across 9 languages, scored at the same leakage-free commit against the same defect labels, the score surfaces 2.3x the defects under a fixed review budget.&lt;/p&gt;

&lt;p&gt;This post is how that works, and the four other layers sitting next to it in &lt;a href="https://github.com/repowise-dev/repowise" rel="noopener noreferrer"&gt;repowise&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the score is
&lt;/h2&gt;

&lt;p&gt;Every file gets a 1 to 10 score from 25 deterministic biomarkers: McCabe complexity, deep nesting, brain methods, class cohesion (LCOM4), god classes, native Rabin-Karp clone detection, untested hotspots, function-level churn, code-age volatility, ownership dispersion, change entropy, co-change scatter, prior-defect history, test-quality smells, and more.&lt;/p&gt;

&lt;p&gt;No LLM calls. No cloud. No new runtime dependency. It is pure Python over tree-sitter and git data, and it finishes in under 30 seconds on a 3,000-file repo.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;repowise health                       &lt;span class="c"&gt;# KPIs + lowest-scoring files&lt;/span&gt;
repowise health &lt;span class="nt"&gt;--coverage&lt;/span&gt; cov.lcov   &lt;span class="c"&gt;# ingest LCOV/Cobertura/Clover&lt;/span&gt;
repowise health &lt;span class="nt"&gt;--refactoring-targets&lt;/span&gt; &lt;span class="c"&gt;# ranked by impact / effort&lt;/span&gt;
repowise health &lt;span class="nt"&gt;--trend&lt;/span&gt;               &lt;span class="c"&gt;# snapshots + declining alerts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The biomarker weights are calibrated against a real defect corpus instead of hand-tuned. Only the learned constants ship. The runtime itself stays fully deterministic, so the same file produces the same score every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does the score find bugs
&lt;/h2&gt;

&lt;p&gt;The validation setup avoids the usual leakage trap. Health scores are collected at a historical commit (call it T0). Bug-fixing commits are counted over the following 6 months. Then the two get correlated. The score never sees the future it is being graded on.&lt;/p&gt;

&lt;p&gt;Across 21 open-source repositories spanning all 9 Full-tier languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-project mean ROC AUC of 0.74 [95% CI 0.68 to 0.79] at identifying files that go on to receive bug fixes. Up to 0.90 on individual repos.&lt;/li&gt;
&lt;li&gt;It survives controlling for file size (partial Spearman rho = -0.16). It is not just flagging the big files.&lt;/li&gt;
&lt;li&gt;It out-discriminates recent churn by +0.10 AUC and prior-defect history by +0.12 AUC, DeLong p &amp;lt; 1e-9.&lt;/li&gt;
&lt;li&gt;It holds on an external published dataset it has never seen (PROMISE/jEdit CK-metrics: AUC 0.76 to 0.78, within about 0.03 of that dataset's own tuned model).&lt;/li&gt;
&lt;/ul&gt;
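
&lt;p&gt;For a concrete sense of what the AUC figure measures, here is a minimal sketch in plain Python. It is not the benchmark harness. It treats AUC as the chance that a randomly picked file that later received a bug fix ranks as riskier than one that did not, with ties counting as half. The function name &lt;code&gt;roc_auc&lt;/code&gt; and the six data points are made up for illustration, and negating the health score to get a risk score is an assumption about direction (low health means high risk).&lt;/p&gt;

```python
def roc_auc(risk_scores, labels):
    """Chance that a random buggy file is ranked riskier than a random clean one.

    Ties count as half a win (the Mann-Whitney U view of ROC AUC).
    """
    buggy = [s for s, y in zip(risk_scores, labels) if y == 1]
    clean = [s for s, y in zip(risk_scores, labels) if y == 0]
    if not buggy or not clean:
        raise ValueError("need at least one buggy and one clean file")
    wins = 0.0
    for b in buggy:
        for c in clean:
            if b > c:
                wins += 1.0
            elif b == c:
                wins += 0.5
    return wins / (len(buggy) * len(clean))


# Hypothetical data: health at T0, and whether the file received a
# bug fix during the following 6 months (1 = yes, 0 = no).
health_at_t0 = [2.1, 3.4, 8.9, 7.5, 6.0, 5.0]
fixed_later = [1, 1, 0, 0, 1, 0]

# Assumption: low health means high risk, so negate before ranking.
risk = [-h for h in health_at_t0]
auc = roc_auc(risk, fixed_later)
print(round(auc, 4))
```

&lt;p&gt;The pairwise loop is quadratic, which is fine for a toy example. On a real corpus a rank-based formula or a library routine would be the more practical choice.&lt;/p&gt;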

&lt;h2&gt;
  
  
  Head to head
&lt;/h2&gt;

&lt;p&gt;Same files, same commit, same labels, paired tests against the leading commercial code-health tool:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;repowise&lt;/th&gt;
&lt;th&gt;Commercial tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recall @ 20%-of-lines budget&lt;/td&gt;
&lt;td&gt;0.173&lt;/td&gt;
&lt;td&gt;0.074&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort-aware ranking (Popt)&lt;/td&gt;
&lt;td&gt;0.607&lt;/td&gt;
&lt;td&gt;0.462&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defect density, size-normalized (Alert:Healthy)&lt;/td&gt;
&lt;td&gt;2.18x&lt;/td&gt;
&lt;td&gt;0.56x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discrimination (ROC AUC)&lt;/td&gt;
&lt;td&gt;0.731&lt;/td&gt;
&lt;td&gt;0.705&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ranking files by repowise health surfaces 2.3x the defects under a fixed review budget. Popt delta +0.144, recall delta +0.098, density delta p = 0.003, all paired and significant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/repowise-dev/repowise-bench/blob/master/health-defect/COMPARISON_REPORT.md" rel="noopener noreferrer"&gt;Full methodology and CIs&lt;/a&gt;.&lt;/p&gt;
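
&lt;p&gt;To make the "fixed review budget" idea concrete, here is one plausible reading of recall at a 20%-of-lines budget, sketched in Python. The exact definition used in the benchmark is in the linked report and may differ (for example in how it treats a file that only partly fits the budget). The function name &lt;code&gt;recall_at_line_budget&lt;/code&gt; and the four sample files are hypothetical.&lt;/p&gt;

```python
def recall_at_line_budget(files, budget_fraction=0.20):
    """files: list of (health_score, line_count, defect_count) tuples.

    Review the least healthy files first and stop before the cumulative
    line count would exceed the budget. Return the share of defects found.
    """
    total_lines = sum(lines for _, lines, _ in files)
    total_defects = sum(defects for _, _, defects in files)
    if total_defects == 0:
        raise ValueError("no defects in the corpus")
    budget = budget_fraction * total_lines
    reviewed_lines = 0
    found = 0
    for _, lines, defects in sorted(files, key=lambda f: f[0]):
        if reviewed_lines + lines > budget:
            break  # assumption: a file that does not fit ends the review
        reviewed_lines += lines
        found += defects
    return found / total_defects


# Hypothetical corpus: (health score, lines of code, later bug fixes).
sample = [(2.0, 80, 3), (3.5, 100, 1), (6.0, 320, 1), (9.0, 500, 0)]
recall = recall_at_line_budget(sample)
print(recall)
```

&lt;p&gt;Measuring the budget in lines rather than files is what keeps a ranking from looking good just by pointing at the largest files.&lt;/p&gt;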

&lt;h2&gt;
  
  
  The four other layers
&lt;/h2&gt;

&lt;p&gt;Code health is one of five layers. The point of the other four is that your AI coding agent reads files but knows nothing about how the codebase got there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph.&lt;/strong&gt; A real tree-sitter dependency graph across 15 languages. File and symbol nodes, 3-tier call resolution, Leiden communities, PageRank, framework-aware route-to-handler edges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git.&lt;/strong&gt; Behavioral signals static analysis cannot see. Hotspots (churn times complexity), ownership percentages, co-change pairs that expose hidden coupling, bus factor, reviewer suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docs.&lt;/strong&gt; An LLM-generated wiki per module and file, rebuilt incrementally on every commit with freshness and confidence scoring, searchable via hybrid RAG (full-text plus vector through reciprocal rank fusion).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decisions.&lt;/strong&gt; Architectural decisions mined from 8 sources, evidence-backed, linked to graph nodes, connected by supersedes / refines / conflicts_with edges. Most tools do not capture this layer at all.&lt;/p&gt;
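
&lt;p&gt;The Docs layer mentions reciprocal rank fusion, so here is a generic sketch of that technique in Python. It is not repowise's implementation. Each result list contributes 1 / (k + rank) per document and the sums decide the final order. The function name, the file names, and &lt;code&gt;k=60&lt;/code&gt; (a commonly used default) are all assumptions for illustration.&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical hits from a full-text index and a vector index.
full_text_hits = ["auth.md", "session.md", "tokens.md"]
vector_hits = ["tokens.md", "auth.md", "oauth.md"]
fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

&lt;p&gt;Because only ranks are used, the two retrievers never need comparable score scales, which is the usual reason for picking this fusion method.&lt;/p&gt;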

&lt;h2&gt;
  
  
  The agent angle
&lt;/h2&gt;

&lt;p&gt;All five layers are exposed through nine MCP tools shaped around tasks, not data entities. You pass multiple files or symbols in one call and get complete context back, instead of chaining 30 greps and reads.&lt;/p&gt;

&lt;p&gt;Paired SWE-QA runs on real repos, same model and harness, with and without the MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;70% fewer tool calls&lt;/li&gt;
&lt;li&gt;89% fewer file reads&lt;/li&gt;
&lt;li&gt;36% lower cost per query&lt;/li&gt;
&lt;li&gt;answer quality at parity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feeding an agent a commit through &lt;code&gt;get_context&lt;/code&gt; costs 2,391 tokens versus 64,039 for the raw changed files. About 27x fewer.&lt;/p&gt;

&lt;p&gt;There is also &lt;code&gt;repowise distill&lt;/code&gt;, which compresses noisy command output before the agent reads it, putting errors first and keeping every omission reversible:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Raw to distilled tokens&lt;/th&gt;
&lt;th&gt;Saved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;pytest -q&lt;/code&gt; (11 failures)&lt;/td&gt;
&lt;td&gt;3,374 to 1,317&lt;/td&gt;
&lt;td&gt;61%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git log -50&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3,064 to 331&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;git diff&lt;/code&gt; (30 commits)&lt;/td&gt;
&lt;td&gt;62,833 to 8,635&lt;/td&gt;
&lt;td&gt;86%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
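
&lt;p&gt;The percentages and the "about 27x" figure follow directly from the token counts quoted above. A short Python check, using only those numbers (the helper name &lt;code&gt;percent_saved&lt;/code&gt; is made up for this sketch):&lt;/p&gt;

```python
def percent_saved(raw_tokens, distilled_tokens):
    """Share of tokens removed, as a whole-number percentage."""
    return round(100 * (1 - distilled_tokens / raw_tokens))


# (raw tokens, distilled tokens) as listed in the table.
rows = {
    "pytest -q": (3374, 1317),
    "git log -50": (3064, 331),
    "git diff": (62833, 8635),
}
savings = {cmd: percent_saved(raw, small) for cmd, (raw, small) in rows.items()}
print(savings)

# Raw changed files versus get_context for one commit.
ratio = 64039 / 2391
print(round(ratio, 1))
```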

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;repowise
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
repowise init        &lt;span class="c"&gt;# builds all five layers&lt;/span&gt;
repowise serve       &lt;span class="c"&gt;# MCP server + local dashboard&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The graph, git, dead-code, and health layers build in minutes with zero LLM calls. Run &lt;code&gt;repowise init --index-only&lt;/code&gt; for a queryable index almost immediately. After that, every commit-triggered update takes under 30 seconds and only regenerates the pages your change touched.&lt;/p&gt;

&lt;p&gt;100% local, bring your own API key, AGPL-3.0.&lt;/p&gt;

&lt;p&gt;Repo, benchmarks, and live demo: &lt;a href="https://github.com/repowise-dev/repowise" rel="noopener noreferrer"&gt;github.com/repowise-dev/repowise&lt;/a&gt; and &lt;a href="https://www.repowise.dev" rel="noopener noreferrer"&gt;repowise.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you run the health-defect benchmark on your own repos, I want to see the numbers. The whole harness is public so you can reproduce or break it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Why Grinding LeetCode Randomly Stops Working After a Point</title>
      <dc:creator>Raghav Chamadiya</dc:creator>
      <pubDate>Fri, 26 Dec 2025 07:46:04 +0000</pubDate>
      <link>https://dev.to/raghav_builds/why-grinding-leetcode-randomly-stops-working-after-a-point-37mc</link>
      <guid>https://dev.to/raghav_builds/why-grinding-leetcode-randomly-stops-working-after-a-point-37mc</guid>
      <description>&lt;p&gt;Most people don’t fail technical interviews because they didn’t solve enough problems.&lt;/p&gt;

&lt;p&gt;They fail because their practice stopped compounding.&lt;/p&gt;

&lt;p&gt;If you’ve done 200, 300, sometimes even 500 LeetCode problems and still feel shaky in interviews, this post is for you. Not because you’re bad at DSA, but because random practice quietly plateaus.&lt;/p&gt;

&lt;h2&gt;
  
  
  The illusion of progress
&lt;/h2&gt;

&lt;p&gt;Early on, any practice works.&lt;/p&gt;

&lt;p&gt;You solve two sum, reverse a linked list, maybe your first BFS. Every new concept feels like progress because it is. Your brain is laying down basic patterns.&lt;/p&gt;

&lt;p&gt;But after a while, something weird happens.&lt;/p&gt;

&lt;p&gt;You solve more problems, but interviews don’t feel easier. New questions still feel unfamiliar. Under pressure, your mind goes blank even for things you “know”.&lt;/p&gt;

&lt;p&gt;That’s not a motivation problem. It’s a structure problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems are not the unit of learning
&lt;/h2&gt;

&lt;p&gt;We treat problems as the unit of progress because they’re countable.&lt;/p&gt;

&lt;p&gt;One problem done. Two problems done. Streaks maintained.&lt;/p&gt;

&lt;p&gt;But interviews don’t test whether you’ve seen a problem before. They test whether you can recognize a pattern, adapt it, and explain your thinking clearly under ambiguity.&lt;/p&gt;

&lt;p&gt;Patterns, not problems, are the real unit of learning.&lt;/p&gt;

&lt;p&gt;If you solve ten problems that all secretly rely on the same idea, but you never abstract that idea, you didn’t learn ten things. You learned one thing ten times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Random practice breaks pattern recognition
&lt;/h2&gt;

&lt;p&gt;Random problem solving does three harmful things over time.&lt;/p&gt;

&lt;p&gt;First, it fragments your mental model. You remember solutions, not strategies.&lt;/p&gt;

&lt;p&gt;Second, it hides gaps. You might be strong at sliding window but weak at tree recursion, and random practice won’t surface that clearly.&lt;/p&gt;

&lt;p&gt;Third, it trains recall instead of reasoning. Interviews rarely reward recalling the exact solution you practiced last night.&lt;/p&gt;

&lt;p&gt;This is why people say things like “I knew this problem, but I couldn’t do it in the interview.”&lt;/p&gt;

&lt;p&gt;They knew the answer. They didn’t own the pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  What structured practice looks like
&lt;/h2&gt;

&lt;p&gt;Structured practice is not about rigid schedules or fancy roadmaps. It’s about sequencing learning so that each problem reinforces an idea.&lt;/p&gt;

&lt;p&gt;A simple example.&lt;/p&gt;

&lt;p&gt;Instead of solving random array problems, you group problems by technique. Two pointers. Sliding window. Prefix sums.&lt;/p&gt;

&lt;p&gt;You solve a few problems back to back that use the same idea, with increasing difficulty.&lt;/p&gt;

&lt;p&gt;After each set, you pause and ask:&lt;/p&gt;

&lt;p&gt;What was the invariant here?&lt;br&gt;&lt;br&gt;
What made this pattern applicable?&lt;br&gt;&lt;br&gt;
What variations could break my approach?&lt;/p&gt;

&lt;p&gt;That reflection is where learning actually happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interviews test clarity, not cleverness
&lt;/h2&gt;

&lt;p&gt;Interviewers are not impressed by clever tricks. They’re assessing whether you can reason clearly, communicate tradeoffs, and recover when stuck.&lt;/p&gt;

&lt;p&gt;Structured practice naturally builds these skills because you’re constantly asking why something works, not just whether it passes.&lt;/p&gt;

&lt;p&gt;Over time, you stop memorizing solutions and start recognizing shapes of problems.&lt;/p&gt;

&lt;p&gt;That’s when interviews start feeling familiar again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real takeaway
&lt;/h2&gt;

&lt;p&gt;If your prep feels busy but not effective, don’t add more hours.&lt;/p&gt;

&lt;p&gt;Change the structure.&lt;/p&gt;

&lt;p&gt;Stop counting problems. Start owning patterns.&lt;/p&gt;

&lt;p&gt;This shift takes slightly more effort upfront, but it’s the difference between endless grinding and actual confidence.&lt;/p&gt;

&lt;p&gt;It’s also the difference between hoping the interview question matches something you’ve seen and knowing you can handle whatever shows up.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>dsa</category>
      <category>career</category>
      <category>interview</category>
    </item>
  </channel>
</rss>
