<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Creatman</title>
    <description>The latest articles on DEV Community by Creatman (@creatman).</description>
    <link>https://dev.to/creatman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3806804%2F78d84e4b-1735-4590-92c5-8b6e5513e038.jpg</url>
      <title>DEV Community: Creatman</title>
      <link>https://dev.to/creatman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/creatman"/>
    <language>en</language>
    <item>
      <title>Playwright MCP burns 1.5M tokens. CLI does it in 27k. So I built the skill that splits the phases.</title>
      <dc:creator>Creatman</dc:creator>
      <pubDate>Sun, 03 May 2026 14:16:12 +0000</pubDate>
      <link>https://dev.to/creatman/playwright-mcp-burns-15m-tokens-cli-does-it-in-27k-so-i-built-the-skill-that-splits-the-phases-50j7</link>
      <guid>https://dev.to/creatman/playwright-mcp-burns-15m-tokens-cli-does-it-in-27k-so-i-built-the-skill-that-splits-the-phases-50j7</guid>
      <description>&lt;p&gt;I wanted to test my web app. That's it. A Next.js portfolio and a SaaS chat — run some accessibility checks, catch console errors, verify nothing's broken on mobile. The kind of thing you do before pushing to production.&lt;/p&gt;

&lt;p&gt;I opened Claude Code, connected Playwright MCP, typed "test the app" and watched it burn through tokens like there was no tomorrow. Then &lt;code&gt;/compact&lt;/code&gt; fired at 18% text context. Then I discovered the invisible image budget. Then I spent three days building the tool I wished existed.&lt;/p&gt;

&lt;p&gt;This is the story of how a routine testing session turned into &lt;a href="https://github.com/CreatmanCEO/webtest-orch" rel="noopener noreferrer"&gt;webtest-orch&lt;/a&gt; — an open-source Claude Code skill that does e2e testing without bankrupting your token budget or hitting invisible context limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: MCP is great at exploring, terrible at replaying
&lt;/h2&gt;

&lt;p&gt;In November 2025, &lt;a href="https://scrolltest.medium.com/playwright-mcp-burns-114k-tokens-per-test-the-new-cli-uses-27k-heres-when-to-use-each-65dabeaac7a0" rel="noopener noreferrer"&gt;Pramod Dutta published an analysis&lt;/a&gt; that went around the AI-testing corner of the internet: Playwright MCP burns ~114k tokens &lt;em&gt;per single test&lt;/em&gt;. The &lt;a href="https://github.com/microsoft/playwright-mcp/issues/889" rel="noopener noreferrer"&gt;Özal benchmark&lt;/a&gt; on Microsoft's own issue tracker shows e-commerce verify workflows hitting &lt;strong&gt;~1.5M tokens&lt;/strong&gt; on MCP. The Playwright CLI? Still ~27k.&lt;/p&gt;

&lt;p&gt;At the 1.5M end, that's a 50–60× asymmetry (1.5M ÷ 27k ≈ 55×); even the single-test figure of ~114k is roughly 4× the CLI cost. The cause is architectural. MCP keeps the LLM in the browser loop for every action (navigate, click, wait, screenshot, reason, repeat). Great for discovering a UI you've never seen. Catastrophically expensive for replaying the same flow a second time.&lt;/p&gt;

&lt;p&gt;Microsoft's own README has since been updated to recommend &lt;strong&gt;CLI + Skills over MCP&lt;/strong&gt; for coding agents. The &lt;a href="https://playwright.dev/docs/test-agents" rel="noopener noreferrer"&gt;official Test Agents documentation&lt;/a&gt; now ships a &lt;code&gt;Planner / Generator / Healer&lt;/code&gt; triplet as the supported architecture — not "agent stays inside MCP for the whole session."&lt;/p&gt;

&lt;p&gt;The fix isn't "use less Playwright MCP." It's &lt;strong&gt;split exploration from replay.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The second problem: the invisible image budget
&lt;/h2&gt;

&lt;p&gt;While debugging my token usage, I found something worse. Claude Code has a &lt;strong&gt;second context limit&lt;/strong&gt; that nobody documents — an inline-image budget of roughly 50–100 blocks per session. No counter. No warning. No &lt;code&gt;--show-image-budget&lt;/code&gt; flag.&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;Playwright:browser_take_screenshot&lt;/code&gt; returns one image block. Fifty screenshots in, you've used 0.4% of your text budget and 100% of your image budget. Then &lt;code&gt;/compact&lt;/code&gt; fires while your text context is 80% empty. The agent loses everything that wasn't disk-backed.&lt;/p&gt;

&lt;p&gt;I tried three fixes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Just take fewer screenshots"&lt;/strong&gt; — discipline drifts by turn 30 in any real exploratory session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md rule "never take screenshots"&lt;/strong&gt; — soft rules survive about 30 turns under pressure. The agent reaches for a screenshot when stuck on a modal and rationalises the rule violation in the same turn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;context: fork&lt;/code&gt; in skill frontmatter&lt;/strong&gt; — the documented official field, which silently failed to register the skill on my Claude Code 2.1.x Windows build. 90 minutes of debugging before I gave up and put the protection in the skill body instead.&lt;/p&gt;

&lt;p&gt;What actually works: &lt;strong&gt;Task subagents have isolated image budgets.&lt;/strong&gt; Whatever a subagent reads doesn't count against the parent. I verified this empirically: a subagent I dispatched read 6 PNGs and returned 6 text descriptions, and the parent's image count stayed at zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three days later: webtest-orch
&lt;/h2&gt;

&lt;p&gt;By day three I had a working skill built around one architectural invariant: &lt;strong&gt;the parent chat never receives an image, ever.&lt;/strong&gt; Four patterns enforce it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern A (90% of work):&lt;/strong&gt; ARIA-tree exploration via &lt;code&gt;Playwright:browser_snapshot&lt;/code&gt; — returns the page as text, not images. Same locator information as a screenshot, except the agent can grep and diff it. Zero image-budget cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern B (3–5 per run):&lt;/strong&gt; When vision is genuinely needed (pixel-diff fired, layout check on a zero-baseline page), a Task subagent reads ONE image and returns ONE text line. The subagent burns its own budget; the parent stays clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern C:&lt;/strong&gt; Playwright's built-in &lt;code&gt;toHaveScreenshot()&lt;/code&gt; returns &lt;code&gt;diff%&lt;/code&gt; as JSON when run through &lt;code&gt;npx playwright test&lt;/code&gt;. Text the whole way down. Vision tokens only burn when the diff genuinely fires — and even then it's Pattern B.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern D:&lt;/strong&gt; Screenshots on disk (failure artifacts, MCP cache) cost zero unless explicitly &lt;code&gt;Read&lt;/code&gt;. Don't conflate "file exists" with "file costs context."&lt;/p&gt;

&lt;p&gt;The skill flow: on the first run, Playwright MCP walks the app via ARIA snapshots (in a subagent) and generates &lt;code&gt;*.spec.ts&lt;/code&gt;. Every run after that calls &lt;code&gt;npx playwright test&lt;/code&gt; directly: deterministic, ~zero ongoing LLM cost. Each bug gets a fingerprint built from a SHA-256 composite key, and a run-diff classifies it as &lt;code&gt;new&lt;/code&gt; / &lt;code&gt;regression&lt;/code&gt; / &lt;code&gt;persisting&lt;/code&gt; / &lt;code&gt;fixed&lt;/code&gt;.&lt;/p&gt;
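
&lt;p&gt;A minimal sketch of how fingerprinting and run-diff could fit together. The record fields, the &lt;code&gt;fingerprint&lt;/code&gt; and &lt;code&gt;classifyRun&lt;/code&gt; names, and the exact meaning of each status are assumptions for illustration; the skill's real implementation may differ.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the fields the skill really hashes are not shown in this post.
type BugRecord = { spec: string; kind: string; rule: string; target: string };
type Status = "new" | "regression" | "persisting" | "fixed";

// Composite key: join the stable fields, then hash them with SHA-256.
function fingerprint(bug: BugRecord): string {
  const key = [bug.spec, bug.kind, bug.rule, bug.target].join("|");
  return createHash("sha256").update(key).digest("hex");
}

// previous:  fingerprints reported by the last run
// current:   fingerprints reported by this run
// everFixed: fingerprints that disappeared in some earlier run (assumed bookkeeping)
function classifyRun(
  previous: string[],
  current: string[],
  everFixed: string[],
): { [fingerprint: string]: Status } {
  const prev = new Set(previous);
  const cur = new Set(current);
  const fixedBefore = new Set(everFixed);
  const result: { [fingerprint: string]: Status } = {};

  for (const fp of current) {
    if (prev.has(fp)) result[fp] = "persisting";
    else if (fixedBefore.has(fp)) result[fp] = "regression";
    else result[fp] = "new";
  }
  for (const fp of previous) {
    if (!cur.has(fp)) result[fp] = "fixed";
  }
  return result;
}
```

&lt;p&gt;Hashing only stable fields (no timestamps, no run ids) is what would let the same bug keep the same key from one run to the next.&lt;/p&gt;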

&lt;h2&gt;
  
  
  The issues[] collector pattern
&lt;/h2&gt;

&lt;p&gt;Every generated spec collects all soft checks into one array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
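&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;AxeBuilder&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@axe-core/playwright&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;home page baseline&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;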
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;consoleErrors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;failedRequests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pageerror&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;consoleErrors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`pageerror: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;console&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;consoleErrors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`console: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;failedRequests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;a11y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AxeBuilder&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withTags&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wcag2a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wcag2aa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wcag21aa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wcag22aa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;a11y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;violations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`a11y[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;impact&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;help&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;x nodes)`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="c1"&gt;// overflow, heading hierarchy, touch targets, html-lang — all push into issues[]&lt;/span&gt;

  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; issues found:\n  - &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;  - &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;consoleErrors&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;failedRequests&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a test fails, you get &lt;strong&gt;every&lt;/strong&gt; problem in one message. A post-run script parses the output and emits one bug record per issue, each with a stable fingerprint for cross-run diffing.&lt;/p&gt;
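
&lt;p&gt;The parsing step can be sketched as follows. It relies only on the message format built by the &lt;code&gt;expect(issues, ...)&lt;/code&gt; call in the spec above; the &lt;code&gt;parseIssues&lt;/code&gt; name and the assumption that the reporter may append extra lines after the list are mine, not taken from the skill.&lt;/p&gt;

```typescript
// Rebuilds the issues[] array from a failure message shaped like
// "N issues found:\n  - first\n  - second", which is what the spec's custom
// expect() message produces. Assumes each issue is a single line.
function parseIssues(message: string): string[] {
  const parts = message.split("\n  - ");
  const header = parts[0] ?? "";
  // Ignore failures that did not come from the issues[] collector.
  if (!/\d+ issues found:\s*$/.test(header)) return [];
  return parts
    .slice(1)
    // Keep only the first line of each item, dropping anything the
    // reporter appended after the last issue.
    .map((item) => (item.split("\n")[0] ?? "").trim())
    .filter((item) => item.length > 0);
}
```

&lt;p&gt;Each returned string can then become one bug record, so a single red test still yields one record per problem.&lt;/p&gt;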

&lt;h2&gt;
  
  
  What gets tested out of the box
&lt;/h2&gt;

&lt;p&gt;Every generated spec enforces: console error listeners (attached before &lt;code&gt;page.goto()&lt;/code&gt;, with a noise-filter for GTM/Stripe/Sentry/Next.js/Supabase/ResizeObserver), axe-core WCAG audit (&lt;code&gt;wcag2a&lt;/code&gt; through &lt;code&gt;wcag22aa&lt;/code&gt;), heading hierarchy (no &lt;code&gt;h1 → h3&lt;/code&gt; jumps), touch-target sizing (WCAG 2.5.8 AA = 24×24 CSS px), horizontal overflow detection, and &lt;code&gt;html lang&lt;/code&gt; presence. Visual regression uses Playwright's built-in &lt;code&gt;toHaveScreenshot()&lt;/code&gt; — zero external dependencies.&lt;/p&gt;
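
&lt;p&gt;Two of these checks reduce to small pure functions. The sketch below is illustrative only: &lt;code&gt;headingJumps&lt;/code&gt;, &lt;code&gt;isTooSmall&lt;/code&gt; and the message format are hypothetical names, not the skill's own code.&lt;/p&gt;

```typescript
// WCAG 2.5.8 (AA) minimum target size in CSS pixels, as cited above.
const MIN_TARGET_PX = 24;

// Reports every place where a heading skips a level going down the page,
// e.g. an h1 followed directly by an h3. Moving back up (h3 to h2) is fine.
function headingJumps(levels: number[]): string[] {
  const issues: string[] = [];
  levels.forEach((level, i) => {
    if (i === 0) return;
    const prev = levels[i - 1] ?? level;
    if (level - prev > 1) issues.push(`heading-jump: h${prev} -> h${level}`);
  });
  return issues;
}

// True when either side of the box is under the minimum target size.
function isTooSmall(box: { width: number; height: number }): boolean {
  return MIN_TARGET_PX > box.width || MIN_TARGET_PX > box.height;
}
```

&lt;p&gt;In a spec, the levels could come from &lt;code&gt;page.locator('h1, h2, h3, h4, h5, h6')&lt;/code&gt; and the boxes from &lt;code&gt;locator.boundingBox()&lt;/code&gt;, with each resulting message pushed into &lt;code&gt;issues[]&lt;/code&gt;.&lt;/p&gt;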

&lt;p&gt;Severity is auto-assigned from axe impact fields and error class, with three override mechanisms: a &lt;code&gt;[severity:S0]&lt;/code&gt; tag inline in a collector message, the same tag in the test name, or a &lt;code&gt;// @severity: S0&lt;/code&gt; comment before &lt;code&gt;test()&lt;/code&gt;. Tracker mappings for Linear, GitHub, and Jira ship out of the box.&lt;/p&gt;
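
&lt;p&gt;A rough sketch of how such overrides could be resolved. The impact-to-severity table, the precedence order (collector message, then test name, then comment), the &lt;code&gt;S2&lt;/code&gt; fallback and the &lt;code&gt;resolveSeverity&lt;/code&gt; name are all assumptions; only the two tag syntaxes come from the text above.&lt;/p&gt;

```typescript
type Severity = "S0" | "S1" | "S2" | "S3";

// Assumed mapping from axe-core impact values; the skill's real table is not shown here.
const fromImpact: { [impact: string]: Severity } = {
  critical: "S0",
  serious: "S1",
  moderate: "S2",
  minor: "S3",
};

// Explicit overrides win over the impact-derived default.
function resolveSeverity(
  issue: string,          // one entry from issues[]
  testName: string,       // first argument to test()
  leadingComment: string, // comment line directly above test(), or ""
  impact: string,         // axe impact, or "" for non-axe checks
): Severity {
  const tag = /\[severity:(S[0-3])\]/;
  const comment = /\/\/\s*@severity:\s*(S[0-3])/;
  const hit = tag.exec(issue) ?? tag.exec(testName) ?? comment.exec(leadingComment);
  if (hit) return hit[1] as Severity;
  return fromImpact[impact] ?? "S2";
}
```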

&lt;h2&gt;
  
  
  The honest competitive picture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://octomind.dev/blog/a-letter-to-our-users-customers-and-readers/" rel="noopener noreferrer"&gt;Octomind posted a farewell letter&lt;/a&gt; on April 30, 2026. The paid names still standing — &lt;a href="https://www.qawolf.com" rel="noopener noreferrer"&gt;QA Wolf&lt;/a&gt; ($60–250k/year per &lt;a href="https://bug0.com/knowledge-base/qa-wolf-pricing" rel="noopener noreferrer"&gt;Bug0's analysis&lt;/a&gt;), Mabl, BrowserStack AI — sell real value: cloud parallelism, human triage, SOC 2, SLAs.&lt;/p&gt;

&lt;p&gt;webtest-orch does not compete with any of those at scale. No managed cloud, no human review layer, no compliance certification. &lt;strong&gt;For solo devs and small teams already on Claude Code with no QA budget, it's a credible $0/mo option.&lt;/strong&gt; For a 50-person team running cross-browser nightly regression — it isn't, and pretending otherwise would be dishonest.&lt;/p&gt;

&lt;p&gt;The honest peer group is the free/OSS tier: Microsoft's native &lt;code&gt;playwright init-agents --loop=claude&lt;/code&gt; (Planner/Generator/Healer triplet) and &lt;a href="https://github.com/magnitudedev/magnitude" rel="noopener noreferrer"&gt;Magnitude&lt;/a&gt; (Apache/MIT, vision-AI framework). webtest-orch's deltas: out-of-the-box axe-core + console + network audit, bug fingerprinting with run-diff, and tracker mappings. None of those are in the free alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I deliberately did not build
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;No self-healing.&lt;/strong&gt; The &lt;a href="https://bugbug.io/blog/test-automation/self-healing-test-automation/" rel="noopener noreferrer"&gt;QA community has been pushing back on self-healing&lt;/a&gt; — the failure mode is well-documented: healer picks a visually-similar-but-wrong element, test goes green, bug ships. webtest-orch prefers red over false-green.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No vendor cloud.&lt;/strong&gt; Tests stay in your repo. Reports on your filesystem. If the npm package disappears tomorrow, your suite still runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No "AI writes all your tests" pitch.&lt;/strong&gt; This is a complement, not a replacement, for engineering judgment. It's especially good at the boring 80%: a11y, console, network, responsive, regression diffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two real validation runs
&lt;/h2&gt;

&lt;p&gt;Both apps are public on GitHub — not synthetic benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/CreatmanCEO/portfolio" rel="noopener noreferrer"&gt;CreatmanCEO/portfolio&lt;/a&gt;&lt;/strong&gt; (&lt;a href="https://creatman.site" rel="noopener noreferrer"&gt;creatman.site&lt;/a&gt;) — static Next.js portfolio, mobile viewport. 4 real bugs found, 0 false positives: axe-core color-contrast failure on 8 elements (S1), two touch-targets under 24×24 px (S2), heading-jump h1→h3 on /projects (S2). Image budget burned in parent: &lt;strong&gt;zero&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/CreatmanCEO/lingua-companion" rel="noopener noreferrer"&gt;CreatmanCEO/lingua-companion&lt;/a&gt;&lt;/strong&gt; — voice-first AI language-learning SaaS in private beta (Next.js 16 + FastAPI + Supabase + WebSocket). 11 specs across login, chat, translation, TTS, settings, phrase library, scenario mode, stats, logout. 10/10 green after 4 iterations, ~12 min wall-clock. The dogfood round produced 6 fixes for v0.2.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start (3 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx webtest-orch@beta &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# If MCPs are missing:&lt;/span&gt;
claude mcp add &lt;span class="nt"&gt;--scope&lt;/span&gt; user playwright npx @playwright/mcp@latest
claude mcp add &lt;span class="nt"&gt;--scope&lt;/span&gt; user chrome-devtools npx chrome-devtools-mcp@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop a &lt;code&gt;.env.test&lt;/code&gt; in your project with &lt;code&gt;TEST_BASE_URL&lt;/code&gt;, restart Claude Code, say "test the app." The skill auto-detects authed vs public, scaffolds Playwright + axe-core, runs the exploratory pass, writes &lt;code&gt;reports/&amp;lt;run-id&amp;gt;/index.html&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Status
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;0.3.1-beta&lt;/code&gt; — 113 tests, full CI on Linux/macOS/Windows. MIT license. Looking for early users on Linux/macOS — there's a &lt;a href="https://github.com/CreatmanCEO/webtest-orch/issues/new?template=os-compatibility-report.md" rel="noopener noreferrer"&gt;5-minute OS-compatibility report template&lt;/a&gt; in the issues.&lt;/p&gt;




&lt;p&gt;Repo: &lt;a href="https://github.com/CreatmanCEO/webtest-orch" rel="noopener noreferrer"&gt;github.com/CreatmanCEO/webtest-orch&lt;/a&gt;. License MIT.&lt;/p&gt;

&lt;p&gt;The companion piece on the &lt;em&gt;other&lt;/em&gt; invisible token drain in agent loops — hierarchical project context — is here: &lt;a href="https://dev.to/creatman/the-context-problem-nobody-talks-about-why-ai-coding-agents-waste-80-of-tokens-on-files-they-mp1"&gt;The context problem nobody talks about&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;— &lt;em&gt;Nick (Creatman). Full-stack dev, working with Claude Code daily on 15+ web apps. Open to remote opportunities — &lt;a href="mailto:creatmanick@gmail.com"&gt;creatmanick@gmail.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>playwright</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>The context problem nobody talks about: why AI coding agents waste 80% of tokens on files they already read yesterday</title>
      <dc:creator>Creatman</dc:creator>
      <pubDate>Fri, 17 Apr 2026 20:34:21 +0000</pubDate>
      <link>https://dev.to/creatman/the-context-problem-nobody-talks-about-why-ai-coding-agents-waste-80-of-tokens-on-files-they-mp1</link>
      <guid>https://dev.to/creatman/the-context-problem-nobody-talks-about-why-ai-coding-agents-waste-80-of-tokens-on-files-they-mp1</guid>
      <description>&lt;p&gt;Every AI coding agent — Claude Code, Cursor, Codex, Gemini CLI — starts every session completely blind. It doesn't know your projects. It doesn't know your servers. It doesn't remember that you spent three hours yesterday debugging the payment system.&lt;/p&gt;

&lt;p&gt;So it greps. It reads file after file. It SSHs into your server to check what's running. It asks you "which project?" for the hundredth time. By the time it's oriented, you've burned half your context window on reconnaissance.&lt;/p&gt;

&lt;p&gt;I manage 15 projects across 4 VPS servers. Re-orienting the agent was costing me hours, and a large share of my context window, every day. So I built a fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern: hierarchical context
&lt;/h2&gt;

&lt;p&gt;The idea is dead simple. Instead of the agent searching bottom-up (grep everything → read files → build understanding), give it a top-down map:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Level 0: Project Map     — knows ALL your projects       (~2KB, always loaded)
Level 1: Project Detail  — architecture of one project   (~5KB, on demand)  
Level 2: Source Files     — actual code                   (only when needed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Three levels instead of fifty blind file reads. The agent reads the map, knows where to look, and goes straight to the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Without hierarchy
&lt;/h3&gt;

&lt;p&gt;You: "What payment methods does Project A support?"&lt;/p&gt;

&lt;p&gt;Agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Greps &lt;code&gt;C:\Users\&lt;/code&gt; for anything payment-related (3 tool calls)&lt;/li&gt;
&lt;li&gt;Finds 6 candidate files, reads them all (6 tool calls)&lt;/li&gt;
&lt;li&gt;Realizes it's the wrong project, searches more (4 tool calls)&lt;/li&gt;
&lt;li&gt;SSHs into your server to read the production config (2 tool calls)&lt;/li&gt;
&lt;li&gt;Finally answers — 15+ tool calls, 80K+ tokens, 8 minutes&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  With hierarchy
&lt;/h3&gt;

&lt;p&gt;You: "What payment methods does Project A support?"&lt;/p&gt;

&lt;p&gt;Agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads Level 0 — sees Project A is at &lt;code&gt;~/projects/a/&lt;/code&gt; (already loaded, 0 calls)&lt;/li&gt;
&lt;li&gt;Reads &lt;code&gt;~/projects/a/CLAUDE.md&lt;/code&gt; — sees "Payments: Stars + CryptoCloud" (1 call)&lt;/li&gt;
&lt;li&gt;Answers immediately — 1 tool call, ~15K tokens, 10 seconds&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Same question. Same agent. Same model. The only difference is a 2KB file that says "here's where everything is."&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting it up (10 minutes)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Create your project map
&lt;/h3&gt;

&lt;p&gt;Add this to your agent's global instruction file (&lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; for Claude Code, &lt;code&gt;.cursorrules&lt;/code&gt; for Cursor, &lt;code&gt;AGENTS.md&lt;/code&gt; for Codex):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Project Map&lt;/span&gt;

| Project | Local path | Server |
|---------|-----------|--------|
| &lt;span class="gs"&gt;**Auth Service**&lt;/span&gt; | &lt;span class="sb"&gt;`~/projects/auth/`&lt;/span&gt; | prod-1:/root/auth/ |
| &lt;span class="gs"&gt;**Landing Page**&lt;/span&gt; | &lt;span class="sb"&gt;`~/projects/landing/`&lt;/span&gt; | Cloudflare Pages |
| &lt;span class="gs"&gt;**Mobile App**&lt;/span&gt; | &lt;span class="sb"&gt;`~/projects/mobile/`&lt;/span&gt; | — |
| &lt;span class="gs"&gt;**Admin Panel**&lt;/span&gt; | &lt;span class="sb"&gt;`~/projects/admin/`&lt;/span&gt; | prod-1 (Docker) |

&lt;span class="gu"&gt;### Servers&lt;/span&gt;
| Name | IP | Key |
|------|-----|-----|
| prod-1 | x.x.x.x | ~/.ssh/prod |
| staging | y.y.y.y | ~/.ssh/staging |

&lt;span class="gu"&gt;### Rule&lt;/span&gt;
Read project CLAUDE.md before reading source files.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is your Level 0. It's ~2KB. The agent loads it automatically at the start of every session.&lt;/p&gt;
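
&lt;p&gt;The agent does this lookup implicitly, but the two-step resolution (map row, then project context file) can be made explicit in a few lines. This is a sketch under assumptions: &lt;code&gt;parseProjectMap&lt;/code&gt; and &lt;code&gt;contextFileFor&lt;/code&gt; are hypothetical helpers, and the parser only understands table rows shaped like the ones above.&lt;/p&gt;

```typescript
type ProjectEntry = { name: string; localPath: string; server: string };

// Reads the Project Map table rows: | **Name** | `path` | server |
function parseProjectMap(markdown: string): ProjectEntry[] {
  return markdown
    .split("\n")
    .filter((line) => line.trim().startsWith("| **"))
    .map((line) => {
      const cells = line
        .split("|")
        .map((cell) => cell.trim())
        .filter((cell) => cell.length > 0);
      return {
        name: (cells[0] ?? "").replace(/\*\*/g, ""),
        localPath: (cells[1] ?? "").replace(/`/g, ""),
        server: cells[2] ?? "",
      };
    });
}

// Level 0 to Level 1: resolve a project name to its CLAUDE.md path.
function contextFileFor(entries: ProjectEntry[], project: string): string | undefined {
  const hit = entries.find((e) => e.name.toLowerCase() === project.toLowerCase());
  return hit ? hit.localPath.replace(/\/$/, "") + "/CLAUDE.md" : undefined;
}
```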

&lt;h3&gt;
  
  
  Step 2: Add CLAUDE.md to each project
&lt;/h3&gt;

&lt;p&gt;In each project root, create a context file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Auth Service — CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Status: LIVE&lt;/span&gt;
API for user authentication. Handles OAuth, JWT, rate limiting.

&lt;span class="gu"&gt;## Tech Stack&lt;/span&gt;
Python 3.12, FastAPI, PostgreSQL, Redis

&lt;span class="gu"&gt;## Key Files&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; main.py — entry point, route registration
&lt;span class="p"&gt;-&lt;/span&gt; auth/jwt.py — token generation and validation  
&lt;span class="p"&gt;-&lt;/span&gt; auth/oauth.py — Google/GitHub OAuth providers
&lt;span class="p"&gt;-&lt;/span&gt; models/user.py — SQLAlchemy user model

&lt;span class="gu"&gt;## Deployment&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Server: prod-1 (x.x.x.x)
&lt;span class="p"&gt;-&lt;/span&gt; Service: auth-service.service
&lt;span class="p"&gt;-&lt;/span&gt; Logs: journalctl -u auth-service -f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is Level 1. ~3-5KB per project. The agent reads it when you mention the project and immediately knows the architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 (optional): Add Graphify for code navigation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/safishamsi/graphify" rel="noopener noreferrer"&gt;Graphify&lt;/a&gt; turns your codebase into a knowledge graph. Run it once per project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;graphifyy
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your AI agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/graphify &lt;span class="nb"&gt;.&lt;/span&gt;
graphify claude &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent has Level 1.5 — a structural map of your code. Before grepping, it consults the graph and knows exactly which file to read.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 (optional): Connect Claude Desktop via MCP
&lt;/h3&gt;

&lt;p&gt;If you use Claude Desktop, add Graphify as an MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"graphify"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"graphify.serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/graphify-out/graph.json"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Desktop automatically calls &lt;code&gt;query_graph&lt;/code&gt; when you ask about your projects. No prompting needed — it just works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real test results
&lt;/h2&gt;

&lt;p&gt;I ran the same questions with and without the hierarchy. Same model (Haiku — the cheapest), same machine, same projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What is the architecture of Project A?"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Blind agent&lt;/th&gt;
&lt;th&gt;With hierarchy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool calls&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Behavior&lt;/td&gt;
&lt;td&gt;Grep → read 4 files → build answer&lt;/td&gt;
&lt;td&gt;Read CLAUDE.md → answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Correct&lt;/td&gt;
&lt;td&gt;Correct&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;"Which of my projects use library X?"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Blind agent&lt;/th&gt;
&lt;th&gt;With hierarchy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool calls&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Behavior&lt;/td&gt;
&lt;td&gt;Scan entire disk&lt;/td&gt;
&lt;td&gt;Targeted grep in known paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Missed 1 of 3 projects&lt;/td&gt;
&lt;td&gt;Found all 3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;"Where is Project B deployed? Service name? Logs?"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Blind agent&lt;/th&gt;
&lt;th&gt;With hierarchy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool calls&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Behavior&lt;/td&gt;
&lt;td&gt;Read configs + SSH into server&lt;/td&gt;
&lt;td&gt;Answered from context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSH needed&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the second test (the library X question), the blind agent &lt;strong&gt;missed a project&lt;/strong&gt; that the hierarchy-equipped agent found. More context didn't just save tokens; it produced better answers.&lt;/p&gt;
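&lt;p&gt;To put the three tables side by side, here is a small sketch that totals the tool calls. The numbers are copied from the tables above; the function name is only illustrative, and the tables report tool calls, not tokens, so this says nothing about exact token savings.&lt;/p&gt;

```python
# Tool-call counts copied from the three tables above: (blind agent, with hierarchy).
RESULTS = {
    "architecture of Project A": (12, 1),
    "projects using library X": (44, 2),
    "deployment of Project B": (9, 0),
}


def summarize(results: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Return total blind calls, total hierarchy calls, and percent of calls avoided."""
    blind_total = sum(blind for blind, _ in results.values())
    hierarchy_total = sum(hier for _, hier in results.values())
    avoided = 100 * (blind_total - hierarchy_total) / blind_total
    return blind_total, hierarchy_total, avoided


blind, hier, avoided = summarize(RESULTS)
print(f"blind: {blind} calls, with hierarchy: {hier} calls, avoided: {avoided:.1f}%")
```

&lt;p&gt;Across the three questions that is 65 tool calls against 3.&lt;/p&gt;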

&lt;h2&gt;
  
  
  Why this works
&lt;/h2&gt;

&lt;p&gt;AI coding agents are fundamentally search engines. When you ask a question, they search for the answer. The quality of the answer depends on the quality of the search.&lt;/p&gt;

&lt;p&gt;Without context, the agent searches blind: grep everything, read everything, hope to find the right files. With a hierarchy, the search is directed: check the map, go to the right project, read the right file.&lt;/p&gt;

&lt;p&gt;This isn't a new idea. It's how humans navigate codebases — you don't &lt;code&gt;grep -r&lt;/code&gt; your company's entire monorepo every time someone asks about a service. You know which repo, which module, which file. The hierarchy gives the agent the same knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is NOT
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Not a framework. It's a pattern — three markdown files.&lt;/li&gt;
&lt;li&gt;Not a token compression tool. The savings come from not reading files, not from compressing them.&lt;/li&gt;
&lt;li&gt;Not a replacement for Graphify. Graphify handles code-level navigation. This handles project-level navigation. They complement each other.&lt;/li&gt;
&lt;li&gt;Not magic. If your project doesn't have a CLAUDE.md, the agent still greps. You have to write the context files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The full setup
&lt;/h2&gt;

&lt;p&gt;Templates, scripts, and multi-platform guides:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/CreatmanCEO/ai-context-hierarchy" rel="noopener noreferrer"&gt;github.com/CreatmanCEO/ai-context-hierarchy&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Level 0 and Level 1 templates&lt;/li&gt;
&lt;li&gt;Conversation indexing scripts (Claude Code sessions + Desktop export → searchable markdown)&lt;/li&gt;
&lt;li&gt;VPS sync command template&lt;/li&gt;
&lt;li&gt;Platform-specific setup for Claude Code, Cursor, Codex, Gemini CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bonus: conversation indexing
&lt;/h2&gt;

&lt;p&gt;Your past conversations with the AI contain architectural decisions, debugging sessions, deployment notes. But the agent can't search them.&lt;/p&gt;

&lt;p&gt;The repo includes parsers that convert Claude Code session logs (&lt;code&gt;.jsonl&lt;/code&gt;) and Claude Desktop exported chats into markdown files with YAML frontmatter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fixed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;payment&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;webhook"&lt;/span&gt;
&lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-04-14&lt;/span&gt;
&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;auth-service&lt;/span&gt;
&lt;span class="na"&gt;topics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;webhook"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cryptocloud"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudflare"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;files_touched&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments.py"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;webhook.py"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Index these with Graphify and the agent can find "what did we decide about the payment flow last week" without you re-explaining it.&lt;/p&gt;
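&lt;p&gt;For illustration, a minimal sketch of how such a frontmatter block could be rendered. This is not the repo's parser: &lt;code&gt;to_frontmatter&lt;/code&gt; is a hypothetical helper, and it assumes the title, project, topics, and touched files have already been extracted from the session log, whose schema is not covered here.&lt;/p&gt;

```python
import json
from datetime import date


def to_frontmatter(title: str, day: date, project: str,
                   topics: list[str], files_touched: list[str]) -> str:
    """Render a YAML frontmatter block in the shape shown above."""
    # json.dumps produces double-quoted strings and ["a", "b"] lists,
    # both of which are valid YAML flow syntax.
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"date: {day.isoformat()}",
        f"project: {project}",  # assumed to be a plain slug that needs no quoting
        f"topics: {json.dumps(topics)}",
        f"files_touched: {json.dumps(files_touched)}",
        "---",
    ]
    return "\n".join(lines) + "\n"


print(to_frontmatter(
    "Fixed payment webhook",
    date(2026, 4, 14),
    "auth-service",
    ["webhook", "cryptocloud", "cloudflare"],
    ["payments.py", "webhook.py"],
))
```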

&lt;h2&gt;
  
  
  Start here
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Write a project map (5 minutes)&lt;/li&gt;
&lt;li&gt;Add CLAUDE.md to your main project (5 minutes)&lt;/li&gt;
&lt;li&gt;Ask the agent about your project in a new session&lt;/li&gt;
&lt;li&gt;Watch it answer without grepping&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole thing. No dependencies, no installation, no configuration. Just three markdown files that turn your blind agent into one that knows where to look.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with &lt;a href="https://github.com/safishamsi/graphify" rel="noopener noreferrer"&gt;Graphify&lt;/a&gt; for code-level navigation. Source and templates: &lt;a href="https://github.com/CreatmanCEO/ai-context-hierarchy" rel="noopener noreferrer"&gt;ai-context-hierarchy&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>graphify</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>A Government Messenger Detects VPNs and Reports Server IPs. Here's How I Protected My Infrastructure.</title>
      <dc:creator>Creatman</dc:creator>
      <pubDate>Wed, 18 Mar 2026 08:26:55 +0000</pubDate>
      <link>https://dev.to/creatman/a-government-messenger-detects-vpns-and-reports-server-ips-heres-how-i-protected-my-4198</link>
      <guid>https://dev.to/creatman/a-government-messenger-detects-vpns-and-reports-server-ips-heres-how-i-protected-my-4198</guid>
      <description>&lt;h2&gt;
  
  
  Credit First
&lt;/h2&gt;

&lt;p&gt;Everything in this article about how Max's HOST_REACHABILITY module works is based on the research of others. I want to acknowledge them clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"runetfreedom"&lt;/strong&gt; — published the original deep analysis on Habr (March 5, 2026, 485K+ views, 1055+ upvotes) documenting the HOST_REACHABILITY module through JADX decompilation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RKS Global&lt;/strong&gt; — independently verified the findings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RigOlit&lt;/strong&gt; — published a detailed security analysis on GitHub (August 2025), later deleted it under pressure. The original repo was preserved in Web Archive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Denis Yagodin&lt;/strong&gt; — conducted RAG analysis of decompiled code (Medium), discovered the &lt;code&gt;fsb&lt;/code&gt; class and intentionally weakened TLS cipher suites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scamshot Telegram channel&lt;/strong&gt; — confirmed clipboard access and installed app list exfiltration through dynamic analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Novaya Gazeta Europe&lt;/strong&gt; — verified RigOlit's analysis with technical experts before it was removed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These researchers took real personal risk to expose this. A security researcher in Russia who publishes critical analysis of a government-backed app faces genuine consequences — and one of them already had to delete their work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My contribution is narrow but practical&lt;/strong&gt;: I read their research, understood the threat to my VPN infrastructure, and built a working server-side protection system in one day.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Max and Why Should VPN Operators Care?
&lt;/h2&gt;

&lt;p&gt;Max (formerly VK Messenger, &lt;code&gt;ru.oneme.app&lt;/code&gt;) is developed by VK, Russia's largest social media company, which is controlled by Gazprom structures. Since September 2025, it has been mandatorily pre-installed on every smartphone sold in Russia. VK claims 100 million registered users, a figure that comes from VK's own data and cannot be independently verified.&lt;/p&gt;

&lt;p&gt;VK is run by Vladimir Kirienko — son of the First Deputy Chief of Staff of the Presidential Administration. Max has no end-to-end encryption and is legally required to integrate with SORM (FSB wiretapping infrastructure).&lt;/p&gt;

&lt;p&gt;I run VPN infrastructure (VLESS+Reality with DPI bypass) for approximately 165 clients in Russia. When the HOST_REACHABILITY research was published, I realized my servers were directly at risk of being identified and blocked.&lt;/p&gt;




&lt;h2&gt;
  
  
  How HOST_REACHABILITY Works (Based on Published Research)
&lt;/h2&gt;

&lt;p&gt;The following is a summary of findings published by "runetfreedom" on Habr and verified by RKS Global. I'm condensing it here for VPN operators who need to understand the threat model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: VPN Detection
&lt;/h3&gt;

&lt;p&gt;The module uses Android's &lt;code&gt;NetworkCapabilities.TRANSPORT_VPN&lt;/code&gt; API to detect active VPN connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: IP Identification via 6 Services
&lt;/h3&gt;

&lt;p&gt;If VPN is detected, the module queries six IP detection services simultaneously — three Russian, three foreign:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Location / operator&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ipv4-internet.yandex.net&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Russia&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ipv6-internet.yandex.net&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Russia&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ip.mail.ru&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Russia (VK-owned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ifconfig.me&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;International&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;api.ipify.org&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cloudflare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkip.amazonaws.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The mix is deliberate: if a user has split-tunneling, Russian and foreign services return different IPs, revealing the VPN server address. On a personal VPN server whose inbound and outbound IPs are the same, the reported address is the server itself, so it is directly exposed and tied to the user's identity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Censorship Bypass Test
&lt;/h3&gt;

&lt;p&gt;The module tests reachability of blocked resources (Telegram, WhatsApp) using ICMP ping and TCP:443. If they are reachable, the module concludes that the user is bypassing censorship.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Data Package
&lt;/h3&gt;

&lt;p&gt;The collected data (VPN flag, server IP, host reachability results, carrier, userId, sessionId) is sent to &lt;code&gt;api.oneme.ru&lt;/code&gt; via a proprietary binary protocol (10-byte header + MessagePack + LZ4), mixed in with regular messenger traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Blocking Pipeline
&lt;/h3&gt;

&lt;p&gt;Data flows: Max → api.oneme.ru → VK → Roskomnadzor → TSPU → IP blocked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Critical Detail
&lt;/h3&gt;

&lt;p&gt;The module is controlled by a server-side flag returned per-user during authentication — enabling targeted activation for specific accounts.&lt;/p&gt;

&lt;p&gt;For full technical details — decompiled class names, opcode structure, ASN lists — read the original research on Habr.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built: Two-Layer Server-Side Protection
&lt;/h2&gt;

&lt;p&gt;After reading the research, I needed to protect my infrastructure without requiring any changes on the client side. Here's what I implemented in one day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Domain Blocking via Xray Routing
&lt;/h3&gt;

&lt;p&gt;Traffic to Max domains is dropped at the Xray level before leaving the VPN server. The HOST_REACHABILITY module can collect data on the device — but it can't send it home.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domains blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;th&gt;False positive risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;domain:oneme.ru&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Core Max API — HOST_REACHABILITY payload channel&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;domain:max.ru&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Max web platform&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;full:ip.mail.ru&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;VK-owned IP detection service used by the module&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Domains NOT blocked&lt;/strong&gt; (to avoid breaking other apps):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;vk.com&lt;/code&gt;, &lt;code&gt;mail.ru&lt;/code&gt; — used by other VK ecosystem apps&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ifconfig.me&lt;/code&gt;, &lt;code&gt;api.ipify.org&lt;/code&gt;, &lt;code&gt;yandex.net&lt;/code&gt; — legitimate services used by thousands of apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Xray configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outbounds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"protocol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"freedom"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="nl"&gt;"protocol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"blackhole"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="nl"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"response"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="nl"&gt;"tag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"block-max"&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"domainStrategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AsIs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"domain:oneme.ru"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"full:ip.mail.ru"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"domain:max.ru"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"outboundTag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"block-max"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; Sniffing must be enabled on every inbound — without it, Xray can't see the domain in TLS ClientHello:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"sniffing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"destOverride"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tls"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
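&lt;p&gt;Because one inbound without sniffing silently defeats the domain rules, it can help to check the config before restarting Xray. The following is a minimal sketch, not part of my deployed tooling: &lt;code&gt;inbounds_missing_sniffing&lt;/code&gt; is a hypothetical helper, and it assumes the config is a single JSON file with a top-level &lt;code&gt;inbounds&lt;/code&gt; array.&lt;/p&gt;

```python
import json

# Protocols that must be sniffed so the routing rules can see the domain.
REQUIRED_OVERRIDES = {"http", "tls"}


def inbounds_missing_sniffing(config: dict) -> list[str]:
    """Return tags of inbounds whose sniffing block would not expose HTTP/TLS domains."""
    missing = []
    for index, inbound in enumerate(config.get("inbounds", [])):
        sniffing = inbound.get("sniffing", {})
        overrides = set(sniffing.get("destOverride", []))
        enabled = sniffing.get("enabled", False)
        if not enabled or not REQUIRED_OVERRIDES.issubset(overrides):
            # Fall back to a positional label when the inbound has no tag.
            missing.append(inbound.get("tag", f"inbound[{index}]"))
    return missing


def check_file(path: str) -> list[str]:
    """Load a JSON config from disk and report inbounds that need fixing."""
    with open(path, encoding="utf-8") as handle:
        return inbounds_missing_sniffing(json.load(handle))
```

&lt;p&gt;Calling &lt;code&gt;check_file&lt;/code&gt; with the path to your own config returns an empty list when every inbound is covered.&lt;/p&gt;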



&lt;h3&gt;
  
  
  Layer 2: Real-Time Monitoring + Telegram Alerts
&lt;/h3&gt;

&lt;p&gt;A Python daemon watches Xray access logs and sends instant Telegram alerts when a client's device tries to contact Max domains:&lt;/p&gt;

&lt;p&gt;⚠️ MAX DETECTED&lt;br&gt;
Client: [CLIENT_NAME]&lt;br&gt;
Domain: api.oneme.ru:443&lt;br&gt;
Server: xray-direct&lt;br&gt;
Time: 2026-03-17 12:00:00&lt;br&gt;
Traffic to Max blocked at Xray level.&lt;/p&gt;

&lt;p&gt;Deduplication: one alert per client per hour. Runs as a systemd service.&lt;/p&gt;
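&lt;p&gt;A minimal sketch of the matching and deduplication logic such a daemon needs. It is not the code I run in production. The log line layout in the regex is an assumption about Xray's access log format and may differ between versions, and all names here are illustrative. Tailing the log file and sending the Telegram message are left out.&lt;/p&gt;

```python
import re
from typing import Optional

BLOCKED_SUFFIXES = ("oneme.ru", "max.ru")  # mirrors the "domain:" routing rules
BLOCKED_EXACT = ("ip.mail.ru",)            # mirrors the "full:" routing rule
ALERT_INTERVAL = 3600.0                    # seconds: one alert per client per hour

# Assumed layout: "DATE TIME ... accepted tcp:HOST:PORT [in >> out] email: CLIENT"
LINE_RE = re.compile(
    r"^(\S+ \S+) .*?accepted (?:tcp|udp):(\S+?):(\d+) \[.*?\](?: email: (\S+))?"
)


def is_max_domain(host: str) -> bool:
    """True if host equals a blocked name or is a subdomain of a blocked suffix."""
    host = host.lower().rstrip(".")
    if host in BLOCKED_EXACT:
        return True
    return any(host == s or host.endswith("." + s) for s in BLOCKED_SUFFIXES)


class Deduplicator:
    """Remembers when each client last triggered an alert."""

    def __init__(self, interval: float = ALERT_INTERVAL) -> None:
        self.interval = interval
        self._last_alert: dict[str, float] = {}

    def should_alert(self, client: str, now: float) -> bool:
        last = self._last_alert.get(client)
        if last is None or now - last >= self.interval:
            self._last_alert[client] = now
            return True
        return False


def process_line(line: str, dedup: Deduplicator, now: float) -> Optional[str]:
    """Return alert text for a log line that hits a Max domain, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp, host, port, client = match.groups()
    if not is_max_domain(host):
        return None
    client = client or "unknown"
    if not dedup.should_alert(client, now):
        return None
    return (
        "MAX DETECTED\n"
        f"Client: {client}\n"
        f"Domain: {host}:{port}\n"
        f"Time: {timestamp}"
    )
```

&lt;p&gt;In a real daemon, &lt;code&gt;now&lt;/code&gt; would come from &lt;code&gt;time.monotonic()&lt;/code&gt; so that clock adjustments do not break the hourly window.&lt;/p&gt;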

&lt;h3&gt;
  
  
  Deployment
&lt;/h3&gt;

&lt;p&gt;Applied to two Xray services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VLESS+Reality (direct connection)&lt;/li&gt;
&lt;li&gt;VLESS+WebSocket via Cloudflare&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All ~165 clients are protected automatically, with zero client-side changes. Hiddify, v2rayNG, and Streisand continue working normally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DoH / Hardcoded IPs:&lt;/strong&gt; If Max switches to DNS-over-HTTPS or hardcoded IPs, domain-based blocking fails. Fallback plan: iptables DROP for VK IP ranges (AS47541, AS47764).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Split-tunneling:&lt;/strong&gt; If a user only routes foreign traffic through VPN, Max reaches api.oneme.ru directly. Server-side protection can't help. Recommendation: full tunnel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Module updates:&lt;/strong&gt; When Max changes domains or protocols, the blocklist needs updating. Monitoring helps detect new patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The only guarantee:&lt;/strong&gt; Don't install Max on a device that connects to VPN.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;After the Habr publication went viral, VK released an update removing the Telegram/WhatsApp checks — but the module code was not removed from the APK. It's dormant, waiting.&lt;/p&gt;

&lt;p&gt;The researchers who exposed this took real risk. One deleted their work under pressure. The least we can do as infrastructure operators is take their findings seriously and build actual protections.&lt;/p&gt;

&lt;p&gt;If you operate VPN infrastructure with users in Russia, the Xray configuration above takes about 10 minutes to deploy and protects every client that routes all of its traffic through your server.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original research by "runetfreedom" — Habr, March 5, 2026&lt;/li&gt;
&lt;li&gt;RKS Global — independent verification&lt;/li&gt;
&lt;li&gt;RigOlit — GitHub analysis (August 2025, deleted, preserved in Web Archive)&lt;/li&gt;
&lt;li&gt;Denis Yagodin — RAG analysis of decompiled code (Medium)&lt;/li&gt;
&lt;li&gt;Scamshot — dynamic analysis (Telegram channel)&lt;/li&gt;
&lt;li&gt;Novaya Gazeta Europe — investigative reporting&lt;/li&gt;
&lt;li&gt;Human Rights Watch, Reporters Without Borders, Access Now — policy analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I'm Creatman — I build solutions for problems I encounter. When researchers expose a threat, I build the defense.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GitHub: &lt;a href="https://github.com/CreatmanCEO" rel="noopener noreferrer"&gt;github.com/CreatmanCEO&lt;/a&gt; | Portfolio: &lt;a href="https://creatman.site" rel="noopener noreferrer"&gt;creatman.site&lt;/a&gt; | LinkedIn: &lt;a href="https://linkedin.com/in/creatman" rel="noopener noreferrer"&gt;linkedin.com/in/creatman&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>netsec</category>
      <category>vpn</category>
      <category>infrastructure</category>
      <category>automation</category>
    </item>
    <item>
      <title>I Stopped Claude Code From Breaking My Projects. Here's the Exact Setup</title>
      <dc:creator>Creatman</dc:creator>
      <pubDate>Thu, 05 Mar 2026 16:02:00 +0000</pubDate>
      <link>https://dev.to/creatman/i-stopped-claude-code-from-breaking-my-projects-heres-the-exact-setup-1agi</link>
      <guid>https://dev.to/creatman/i-stopped-claude-code-from-breaking-my-projects-heres-the-exact-setup-1agi</guid>
      <description>&lt;p&gt;&lt;em&gt;A practical guide to anti-regression AI coding with Claude Code, subagents, hooks, and Google Antigravity — from someone who was ready to quit.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Six months ago, my workflow with Claude Code looked like this: build a working prototype in an hour, spend the next three hours watching Claude systematically destroy it while "improving" things. Every developer using AI coding agents knows this pattern. You ask for a small change, and suddenly your auth module is rewritten, three tests are deleted, and there's a new dependency you never asked for.&lt;/p&gt;

&lt;p&gt;I'm a full-stack developer running my own company (CREATMAN), and I code solo most of the time. I can't afford to babysit an AI agent on every keystroke. I needed a setup where Claude Code would &lt;em&gt;help&lt;/em&gt; me ship faster without regressing what already works.&lt;/p&gt;

&lt;p&gt;After weeks of research, community deep-dives, and a lot of trial and error, I found a combination that actually works. This article is everything I learned — the tools, the config files, and the exact workflow I use daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Claude Code Breaks Things (It's Not What You Think)
&lt;/h2&gt;

&lt;p&gt;The root cause isn't that Claude is "dumb." It's &lt;strong&gt;context window exhaustion&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Claude Code operates in a ~200K token window. About 80% of that gets consumed by file reads and tool results — not your conversation. By the time you're at 90% utilization, Claude literally cannot hold your project's architecture in its working memory anymore. It forgets patterns from earlier in the session, contradicts its own decisions, and starts generating code that conflicts with what it wrote 20 minutes ago.&lt;/p&gt;

&lt;p&gt;The community calls this &lt;strong&gt;"context drift"&lt;/strong&gt; and it's the #1 source of regressions. The fix isn't "write better prompts" — it's architectural.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: Antigravity + Claude Code + Subagents
&lt;/h2&gt;

&lt;p&gt;Here's what I landed on after testing Cursor, VS Code, Warp, Zed, and several other setups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Antigravity as the IDE
&lt;/h3&gt;

&lt;p&gt;Antigravity is Google's free, agent-first IDE (a VS Code fork). I chose it for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Free.&lt;/strong&gt; I'm already paying $100/month for Claude Max — I'm not adding Cursor on top.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VS Code compatible.&lt;/strong&gt; My extensions, keybindings, and theme transferred over in one click.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in browser agent.&lt;/strong&gt; Antigravity's Gemini 3 Pro agent can autonomously open a browser, navigate your app, and test UI. No Playwright setup needed for basic visual checks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I run Claude Code in Antigravity's terminal with my Max subscription. Gemini 3 Pro runs in the Agent Manager panel. Two AI brains, one IDE, zero extra cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code with Anti-Regression Config
&lt;/h3&gt;

&lt;p&gt;The terminal Claude Code is where real development happens. But instead of running it raw, I set up three layers of protection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: CLAUDE.md — The File That Changes Everything
&lt;/h2&gt;

&lt;p&gt;If you take one thing from this article, make it this. &lt;code&gt;CLAUDE.md&lt;/code&gt; is a special file that Claude reads at the start of &lt;strong&gt;every&lt;/strong&gt; session. It survives compaction. It's your project's constitution.&lt;/p&gt;

&lt;p&gt;Here's my actual template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Name&lt;/span&gt;

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Frontend**&lt;/span&gt;: React + TypeScript
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Backend**&lt;/span&gt;: Python 3.11, FastAPI
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Database**&lt;/span&gt;: PostgreSQL
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Deploy**&lt;/span&gt;: Docker on VPS

&lt;span class="gu"&gt;## Key Commands&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`make dev`&lt;/span&gt; — Start development servers
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`make test`&lt;/span&gt; — Run full test suite
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`make lint`&lt;/span&gt; — Lint everything

&lt;span class="gu"&gt;## CRITICAL RULES&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; NEVER delete or rewrite working tests without explicit request
&lt;span class="p"&gt;-&lt;/span&gt; NEVER delete files without confirmation
&lt;span class="p"&gt;-&lt;/span&gt; ALWAYS run tests after any code change
&lt;span class="p"&gt;-&lt;/span&gt; ALWAYS do git checkpoint before large refactors
&lt;span class="p"&gt;-&lt;/span&gt; One task at a time. Do NOT make multiple changes simultaneously
&lt;span class="p"&gt;-&lt;/span&gt; If unsure — ASK, don't guess

&lt;span class="gu"&gt;## Working Style&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Plan FIRST, code SECOND. Never start coding without confirmed plan
&lt;span class="p"&gt;-&lt;/span&gt; Small diffs. One file → tests → next file
&lt;span class="p"&gt;-&lt;/span&gt; After every change: run tests and show results
&lt;span class="p"&gt;-&lt;/span&gt; Use subagents for codebase research
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CRITICAL RULES&lt;/code&gt; section is where the magic happens. Claude follows these instructions with remarkable consistency — because they're injected before every conversation, not buried 50 messages deep where they'd get compacted away.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insight from SFEIR Institute's research&lt;/strong&gt;: 60% of Claude Code support tickets come from the "ghost context" anti-pattern — working without a CLAUDE.md. A simple CLAUDE.md resolves the issue in 90% of cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: Subagents — Isolated AI Specialists
&lt;/h2&gt;

&lt;p&gt;Subagents are Claude Code's most underrated feature. Each runs in its &lt;strong&gt;own context window&lt;/strong&gt; and returns only a summary to your main session. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Research doesn't pollute your implementation context&lt;/li&gt;
&lt;li&gt;A reviewer can check code with fresh eyes, without being biased by what the planner decided&lt;/li&gt;
&lt;li&gt;You can run them in parallel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use three subagents. Create them as markdown files in &lt;code&gt;.claude/agents/&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/agents/planner.md&lt;/code&gt;&lt;/strong&gt; — Researches the codebase and writes implementation plans. Never writes code. Saves plans to &lt;code&gt;./plans/&lt;/code&gt; so other agents (and future sessions) can reference them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/agents/tester.md&lt;/code&gt;&lt;/strong&gt; — Writes tests following existing patterns, runs the &lt;em&gt;full&lt;/em&gt; test suite (not just new tests), and reports regressions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/agents/code-reviewer.md&lt;/code&gt;&lt;/strong&gt; — Reviews changes for regressions, security issues, and pattern violations. Outputs severity-rated findings with file:line references.&lt;/p&gt;
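
&lt;p&gt;For orientation, here is a minimal sketch of what &lt;code&gt;tester.md&lt;/code&gt; could contain. It assumes the usual subagent layout: YAML frontmatter with &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and an optional &lt;code&gt;tools&lt;/code&gt; list, followed by the system prompt. The prompt wording, the tool list, and the &lt;code&gt;make test&lt;/code&gt; command are illustrative assumptions, not the exact file from the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
name: tester
description: Writes tests that follow existing patterns, runs the full suite, and reports regressions. Use after code changes.
tools: Read, Grep, Glob, Bash, Write, Edit
---

&amp;lt;!-- Illustrative prompt: adapt the wording and commands to your project --&amp;gt;
You are a testing specialist for this project.

1. Read the existing tests and follow their structure and naming.
2. Write tests for the change you were given.
3. Run the FULL suite with `make test`, not only the new tests.
4. Report tests added, pass/fail counts, and every regression with file:line.

Never delete or rewrite a working test to make the suite pass.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the &lt;code&gt;description&lt;/code&gt; specific about when the agent applies makes it easier for Claude to pick the right one when delegating.&lt;/p&gt;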

&lt;p&gt;The key is adding a reminder to your CLAUDE.md:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Agents&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`planner`&lt;/span&gt; agent before any complex task
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`tester`&lt;/span&gt; agent after code changes
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`code-reviewer`&lt;/span&gt; agent before commits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this reminder, Claude tends to do everything in the main context — which is exactly what causes context bloat and regressions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: Hooks — Automated Safety Nets
&lt;/h2&gt;

&lt;p&gt;Hooks fire automatically on Claude Code lifecycle events. My most valuable hook blocks commits when tests fail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git commit*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python -m pytest tests/ -x --timeout=60 || (echo {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: true, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;message&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Tests failing.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;} 1&amp;gt;&amp;amp;2 &amp;amp;&amp;amp; exit 2)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



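&lt;p&gt;This creates a hard gate: when tests fail, the command exits with code 2, which blocks the tool call and feeds the stderr message back to Claude. No exceptions, no "I'll fix it later." The feedback loop forces it to fix regressions before moving forward. Two things to check if the gate misbehaves: &lt;code&gt;--timeout=60&lt;/code&gt; needs the &lt;code&gt;pytest-timeout&lt;/code&gt; plugin installed, and hook matchers are documented as patterns over tool names (such as &lt;code&gt;Bash&lt;/code&gt;), so confirm that the hook actually fires on commit commands in your Claude Code version.&lt;/p&gt;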

&lt;h2&gt;
  
  
  The Daily Workflow
&lt;/h2&gt;

&lt;p&gt;Here's how an actual development session looks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Start the session&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open Antigravity. Terminal. &lt;code&gt;claude&lt;/code&gt;. Claude reads CLAUDE.md automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Before any complex task — plan first&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Use the planner agent to research our codebase and create
&amp;gt; an implementation plan for [feature]. Do NOT write any code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Review the plan. Question assumptions. Adjust. Only then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Implement Step 1 from the plan. Run tests after.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Monitor context religiously&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check &lt;code&gt;/context&lt;/code&gt; (how full the window is) and &lt;code&gt;/cost&lt;/code&gt; (token spend) periodically&lt;/li&gt;
&lt;li&gt;At 60-70% context → &lt;code&gt;/compact "Preserve: modified files list, test results, current plan step"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Switching topics → &lt;code&gt;/clear&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
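
&lt;p&gt;The same preservation list can also be written into &lt;code&gt;CLAUDE.md&lt;/code&gt;, so it is in front of Claude when auto-compaction fires rather than only when you type the command. A minimal sketch, assuming Claude takes compaction guidance from that file; the heading name is an arbitrary choice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;## Compact Instructions
&amp;lt;!-- Hypothetical section: the heading name is arbitrary --&amp;gt;
When compacting, always preserve:
- The full list of modified files
- The most recent test results
- The current plan step and the plan's path under ./plans/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;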

&lt;p&gt;&lt;strong&gt;4. Use checkpoints as undo&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before risky changes: &lt;code&gt;git add -A &amp;amp;&amp;amp; git commit -m "checkpoint: before auth refactor"&lt;/code&gt;&lt;/p&gt;

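&lt;p&gt;If Claude breaks something: &lt;code&gt;Esc + Esc&lt;/code&gt; opens the rewind menu, where you can restore code only. Or run &lt;code&gt;git reset --hard HEAD&lt;/code&gt; to discard every uncommitted change to tracked files and return to the checkpoint commit (add &lt;code&gt;git clean -fd&lt;/code&gt; if new untracked files need to go too).&lt;/p&gt;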

&lt;p&gt;&lt;strong&gt;5. Review before commit&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Use code-reviewer agent to check all changes
&amp;gt; Use tester agent to run the full test suite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Parallel sessions for complex work&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Split terminal in Antigravity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terminal 1: Claude Code — main implementation&lt;/li&gt;
&lt;li&gt;Terminal 2: Claude Code — tests in parallel&lt;/li&gt;
&lt;li&gt;Terminal 3: dev server&lt;/li&gt;
&lt;/ul&gt;

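&lt;p&gt;Each Claude Code session has its own context window, so the two Claude terminals give you roughly 400K tokens of working context (assuming 200K per session) instead of 200K. The windows are not pooled, though: CLAUDE.md, which every session reads at startup, serves as the shared memory between them.&lt;/p&gt;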

&lt;h2&gt;
  
  
  The Antigravity Bonus: Two Brains
&lt;/h2&gt;

&lt;p&gt;While Claude handles the heavy coding in the terminal, I use Antigravity's Gemini 3 Pro for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UI testing via built-in browser&lt;/strong&gt; — "Test the login flow on localhost:3000 and screenshot any errors"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second opinion&lt;/strong&gt; — paste Claude's plan into Gemini's agent panel and ask for critique&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; — Gemini is solid at generating docs from code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This dual-model approach gives you the coding power of Claude and the planning/context strengths of Gemini without paying for two subscriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Servers Worth Installing
&lt;/h2&gt;

&lt;p&gt;For browser automation beyond what Antigravity's built-in browser offers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add playwright &lt;span class="nt"&gt;--&lt;/span&gt; npx &lt;span class="nt"&gt;-y&lt;/span&gt; @executeautomation/playwright-mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now Claude can open browsers, take screenshots, click elements, and verify UI — all through natural language.&lt;/p&gt;

&lt;p&gt;For GitHub integration (the server expects a &lt;code&gt;GITHUB_PERSONAL_ACCESS_TOKEN&lt;/code&gt; in its environment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add github &lt;span class="nt"&gt;--&lt;/span&gt; npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/server-github
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify everything connected with &lt;code&gt;/mcp&lt;/code&gt; in Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Changed
&lt;/h2&gt;

&lt;p&gt;Before this setup, I'd lose 2-3 hours per day to regression whack-a-mole. Claude would "fix" one thing and break two others. I'd context-switch between debugging Claude's mistakes and actually building features.&lt;/p&gt;

&lt;p&gt;After implementing CLAUDE.md + subagents + hooks + checkpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regressions dropped dramatically.&lt;/strong&gt; The hook that blocks commits on failing tests alone is worth the entire setup time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sessions are productive longer.&lt;/strong&gt; Subagents keep the main context clean. I can work for 2+ hours before needing to compact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recovery is instant.&lt;/strong&gt; Checkpoints + git mean I'm never more than 10 seconds away from a working state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I actually trust the output.&lt;/strong&gt; The planner → implement → review → test pipeline catches issues before they compound.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start (15 Minutes)
&lt;/h2&gt;

&lt;p&gt;If you want to try this today, grab the ready-to-use configs from &lt;a href="https://github.com/CreatmanCEO/claude-code-antiregression-setup" rel="noopener noreferrer"&gt;&lt;strong&gt;the GitHub repo&lt;/strong&gt;&lt;/a&gt; or set it up manually:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;CLAUDE.md&lt;/code&gt; in your project root with your architecture, commands, and CRITICAL RULES&lt;/li&gt;
&lt;li&gt;Create &lt;code&gt;.claude/agents/tester.md&lt;/code&gt; — even just one subagent for testing makes a huge difference&lt;/li&gt;
&lt;li&gt;Add the commit-blocking hook to &lt;code&gt;.claude/settings.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start every complex task with "Create a plan first. Do NOT write code."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. These four changes will transform your Claude Code experience more than any model upgrade or IDE switch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/CreatmanCEO/claude-code-antiregression-setup" rel="noopener noreferrer"&gt;claude-code-antiregression-setup&lt;/a&gt;&lt;/strong&gt; — All configs from this article: CLAUDE.md template, subagents, hooks, rules, workflow docs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/hesreallyhim/awesome-claude-code" rel="noopener noreferrer"&gt;awesome-claude-code&lt;/a&gt; — Curated list of skills, hooks, agents, commands&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/shinpr/claude-code-workflows" rel="noopener noreferrer"&gt;claude-code-workflows&lt;/a&gt; — Production-ready workflow plugins (17 agents, 10 commands)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/davila7/claude-code-templates" rel="noopener noreferrer"&gt;claude-code-templates&lt;/a&gt; — CLI to install 100+ ready-made agents and hooks&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://code.claude.com/docs/en/memory" rel="noopener noreferrer"&gt;Claude Code Official Docs: Memory&lt;/a&gt; — How CLAUDE.md and auto-memory work&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;Claude Code Official Docs: Best Practices&lt;/a&gt; — Anthropic's own recommendations&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I'm Nick, a full stack developer and digital architect at &lt;a href="https://creatman.site" rel="noopener noreferrer"&gt;CREATMAN&lt;/a&gt;. I build backend systems, AI tools, and VPN infrastructure. The complete anti-regression setup from this article — CLAUDE.md template, all three subagents, hooks, rules, and workflow docs — is available as a ready-to-use repo: &lt;a href="https://github.com/CreatmanCEO/claude-code-antiregression-setup" rel="noopener noreferrer"&gt;claude-code-antiregression-setup&lt;/a&gt;. Clone it, fill in the template, and start shipping without fear. I'm open to remote opportunities — find me on &lt;a href="https://github.com/CreatmanCEO" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>antigravity</category>
      <category>claudecode</category>
    </item>
  </channel>
</rss>
