<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: zxpmail</title>
    <description>The latest articles on DEV Community by zxpmail (@zxpmail).</description>
    <link>https://dev.to/zxpmail</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3971221%2F22674d04-1a24-4e45-8fd8-d6911cab0a37.png</url>
      <title>DEV Community: zxpmail</title>
      <link>https://dev.to/zxpmail</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zxpmail"/>
    <language>en</language>
    <item>
      <title>From Shackles to Anchors: How I Resurrected an Abandoned Open-Source Framework</title>
      <dc:creator>zxpmail</dc:creator>
      <pubDate>Sat, 06 Jun 2026 12:06:06 +0000</pubDate>
      <link>https://dev.to/zxpmail/from-shackles-to-anchors-how-i-resurrected-an-abandoned-open-source-framework-8pi</link>
      <guid>https://dev.to/zxpmail/from-shackles-to-anchors-how-i-resurrected-an-abandoned-open-source-framework-8pi</guid>
      <description>&lt;p&gt;From Shackles to Anchors: How I Resurrected an Abandoned Open-Source Framework by Learning to Work &lt;em&gt;With&lt;/em&gt; AI, Not Against It&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GitHub Finish-Up-A-Thon submission&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Abandoned Framework
&lt;/h2&gt;

&lt;p&gt;ReqForge is an open-source LLM agent harness — a structured workflow for turning product ideas into shippable code. I started it months ago. It worked, but something was off.&lt;/p&gt;

&lt;p&gt;The framework was built on a simple philosophy: &lt;strong&gt;constrain the model enough and it will produce correct code.&lt;/strong&gt; Rules. Validators. Checklists. Gates. Every conversation started with a list of "don'ts" — don't over-abstract, don't hallucinate APIs, don't write empty catch blocks, don't use &lt;code&gt;as any&lt;/code&gt;, don't copy-paste templates...&lt;/p&gt;

&lt;p&gt;I had built a framework that spent most of its energy &lt;strong&gt;fighting the model.&lt;/strong&gt; And the result was predictable: generated code was correct but stiff. Every new feature required more rules. The framework was becoming a burden.&lt;/p&gt;

&lt;p&gt;I shelved it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Spark
&lt;/h2&gt;

&lt;p&gt;Then I watched a YouTube video about a 2300-year-old Chinese philosophy text — Zhuangzi's story of Cook Ding, a butcher whose knife never dulls.&lt;/p&gt;

&lt;p&gt;Lord Wenhui watches Cook Ding cut up an ox. Other butchers smash through bones, replacing their knife every month. But Ding's blade glides through the ox's body like music. After thousands of oxen, his knife is still sharp.&lt;/p&gt;

&lt;p&gt;"How?" asks the lord.&lt;/p&gt;

&lt;p&gt;Ding replies: "What I care about is the Way, which goes beyond skill. A good butcher changes his knife every year. An ordinary butcher changes it every month. I've used this knife for 19 years. When I first started, I saw nothing but the whole ox. After three years, I no longer saw the ox — I saw the gaps between joints. Now I meet it with spirit, not with my eyes. My senses stop. My spirit guides the knife."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I realized: my framework was stuck at "good butcher" level.&lt;/strong&gt; I was adding better rules, sharper validators, more gates — better knives — instead of learning to see the gaps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gaps are the model's natural pattern-matching ability.&lt;/strong&gt; LLMs aren't logic engines. They're pattern matchers. Every "don't" rule forces the model to suppress its natural generation pattern. Instead of fighting this, I should work &lt;em&gt;with&lt;/em&gt; it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Resurrection: From Shackles to Anchors
&lt;/h2&gt;

&lt;p&gt;I reopened the repo and completely rewrote the design philosophy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before (Shackles)
&lt;/h3&gt;

&lt;p&gt;The framework's code generation guidance looked like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Checklist:
- [ ] No over-abstraction
- [ ] No hallucinated APIs
- [ ] No hardcoded values
- [ ] No empty catch blocks
- [ ] No copy-paste templates
- [ ] No fake tests
- [ ] No TODO debris
- [ ] No type escapes
- [ ] No style scattering
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nine "don't" rules. The model had to hold all of them in mind while generating, suppressing its natural tendencies at every step. Every suppression could fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  After (Anchors)
&lt;/h3&gt;

&lt;p&gt;I replaced the checklist with three short code examples — perfect patterns showing the model what TO do:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Anchor 1: Error handling pattern
async function createUser(email: string, password: string): Promise&amp;lt;User&amp;gt; {
  const existing = await db.user.findUnique({ where: { email } });
  if (existing) {
    throw new AppError(ErrorCode.CONFLICT, "Email already registered");
  }
  const hashed = await bcrypt.hash(password, 12);
  const user = await db.user.create({ data: { email, passwordHash: hashed } });
  logger.info("User created", { userId: user.id });
  return user;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;(Plus API endpoint and test pattern anchors.)&lt;/p&gt;

&lt;p&gt;The model reads three perfect examples, its pattern-matching activates, and it naturally continues in the correct style. The checklist stays as a safety net — demoted from generation guide to pre-delivery sanity check.&lt;/p&gt;
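
&lt;p&gt;To make that layout concrete, here is a minimal TypeScript sketch of assembling a prompt context from anchors plus a short checklist. The names &lt;code&gt;Anchor&lt;/code&gt; and &lt;code&gt;buildContext&lt;/code&gt;, and the section wording, are assumptions made for this illustration and are not ReqForge's actual API. The anchors come first, and the checklist appears only at the end as a pre-delivery check.&lt;/p&gt;

```typescript
// Hypothetical types and names: this is a sketch, not ReqForge's real API.
interface Anchor {
  title: string; // e.g. "Error handling pattern"
  code: string;  // a short, known-good snippet from the project
}

// Build a prompt context: positive examples first, then the task,
// with a short checklist demoted to a final pre-delivery check.
function buildContext(anchors: Anchor[], checklist: string[], task: string): string {
  const anchorText = anchors
    .map((a, i) => `// Anchor ${i + 1}: ${a.title}\n${a.code}`)
    .join("\n\n");
  const checklistText = checklist.map((item) => `- [ ] ${item}`).join("\n");
  return [
    "Follow the patterns in these examples:",
    anchorText,
    `Task: ${task}`,
    "Before delivering, check:",
    checklistText,
  ].join("\n\n");
}

const context = buildContext(
  [{ title: "Error handling pattern", code: 'throw new AppError(ErrorCode.CONFLICT, "Email already registered");' }],
  ["No hallucinated APIs", "No empty catch blocks"],
  "Add a search command",
);
console.log(context);
```

&lt;p&gt;The point of the sketch is the ordering: the examples lead, and the "don't" items shrink to a short list the model checks once before delivering.&lt;/p&gt;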

&lt;h3&gt;
  
  
  The Full Transformation
&lt;/h3&gt;

&lt;p&gt;I made eight interconnected changes in one continuous session with GitHub Copilot:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Difficulty markers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every task treated equally&lt;/td&gt;
&lt;td&gt;🔴/🟡/🟢 levels — model slows down for hard tasks, speeds through easy ones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anti-slop reform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9 "don't" rules per skill&lt;/td&gt;
&lt;td&gt;3 perfect code anchors + light checklist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Phase 1 catalyst&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;First phase starts coding immediately&lt;/td&gt;
&lt;td&gt;Lays down domain skeleton first — all subsequent code follows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-review&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code reviewed externally (late)&lt;/td&gt;
&lt;td&gt;Self-review in the same hot context (early)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Closing ritual&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phase ends, move to next&lt;/td&gt;
&lt;td&gt;Append discoveries to spec, log decisions, clear context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Attention layout&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Key info buried in the middle&lt;/td&gt;
&lt;td&gt;Critical instructions at the end (recency bias)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-rollback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual git checkout&lt;/td&gt;
&lt;td&gt;Automatic snapshot restore on verify failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security rules&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scattered across files&lt;/td&gt;
&lt;td&gt;One installable template&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
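
&lt;p&gt;One row of the table, auto-rollback, can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions: a plain object stands in for the files on disk, and &lt;code&gt;Workspace&lt;/code&gt; and &lt;code&gt;runPhase&lt;/code&gt; are hypothetical names. How ReqForge actually takes and restores snapshots is not shown in this post.&lt;/p&gt;

```typescript
// Hypothetical sketch: a plain object stands in for files on disk.
interface Workspace {
  [path: string]: string;
}

// Run one phase; if verification fails, restore the pre-phase snapshot.
function runPhase(
  workspace: Workspace,
  apply: (ws: Workspace) => void,
  verify: (ws: Workspace) => boolean,
): boolean {
  const snapshot: Workspace = { ...workspace }; // shallow copy is enough for string values
  apply(workspace);
  if (verify(workspace)) return true;
  // Roll back: drop everything the phase wrote, then restore the snapshot.
  for (const path of Object.keys(workspace)) delete workspace[path];
  Object.assign(workspace, snapshot);
  return false;
}

const ws: Workspace = { "src/index.ts": "export const ok = true;" };
const passed = runPhase(
  ws,
  (w) => {
    w["src/index.ts"] = "broken(";
    w["src/extra.ts"] = "tmp";
  },
  (w) => w["src/index.ts"] === "export const ok = true;",
);
console.log(passed, Object.keys(ws));
```

&lt;p&gt;Here the phase both edits a file and creates a new one, verification fails, and the workspace returns to its single original file without a manual checkout.&lt;/p&gt;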

&lt;p&gt;I also wrote a &lt;strong&gt;benchmark&lt;/strong&gt; to test the approach on one task, with two prompt contexts and measured results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Old (9 rules)&lt;/th&gt;
&lt;th&gt;New (3 anchors)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tests passed&lt;/td&gt;
&lt;td&gt;26/26&lt;/td&gt;
&lt;td&gt;26/26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code size&lt;/td&gt;
&lt;td&gt;53 lines&lt;/td&gt;
&lt;td&gt;45 lines (−15%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;2-pass filter + Map&lt;/td&gt;
&lt;td&gt;1-pass filter, simpler&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And a &lt;strong&gt;manifesto&lt;/strong&gt; explaining the philosophy — &lt;a href="https://github.com/zxpmail/ReqForge/issues/1" rel="noopener noreferrer"&gt;From Shackles to Anchors&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How GitHub Copilot Made This Possible
&lt;/h2&gt;

&lt;p&gt;This wasn't a "write 1000 lines of boilerplate" session. It was something more interesting.&lt;/p&gt;

&lt;p&gt;The most valuable Copilot interactions weren't code completions — they were &lt;strong&gt;discussions about design philosophy.&lt;/strong&gt; I pasted a Chinese subtitle file about Zhuangzi into the conversation. Copilot connected it to LLM harness design. We iterated on the "2.5 layer" concept together — not as master and tool, but as two collaborators refining an idea.&lt;/p&gt;

&lt;p&gt;Copilot didn't just generate code. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Challenged my assumptions&lt;/strong&gt; — when I proposed adding more rules, it pointed out I was building a "good butcher's knife"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connected disparate ideas&lt;/strong&gt; — Zhuangzi's butcher 🠒 Transformer pattern matching 🠒 anchor-based guidance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generated the code changes&lt;/strong&gt; — all 8 framework modifications implemented in one continuous session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrote the benchmark&lt;/strong&gt; — created the reproducible comparison test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drafted the philosophy document&lt;/strong&gt; — translated Chinese insights into English&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final framework has 236 files per adapter, kept in sync across 4 AI client adapters (944 files in total). All 98 unit tests pass, and in the benchmark the generated code came out shorter and structurally simpler. But the real transformation was in the design philosophy.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;The project went from an abandoned rule-collection to a coherent, philosophy-driven framework with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4 AI client adapters&lt;/strong&gt; (Claude Code, Cursor, OpenCode, Gemini CLI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;13 skills&lt;/strong&gt; with anchor-based guidance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 sub-agents&lt;/strong&gt; for specialized tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 project starter templates&lt;/strong&gt; for &lt;code&gt;forge-scaffold init&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;944 files&lt;/strong&gt; in perfect sync across all adapters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;98 unit tests&lt;/strong&gt;, all passing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A published design manifesto&lt;/strong&gt; with benchmark evidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A GitHub Issue&lt;/strong&gt; explaining the philosophy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More importantly, the framework no longer fights the model. It works &lt;em&gt;with&lt;/em&gt; it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repository&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge" rel="noopener noreferrer"&gt;github.com/zxpmail/ReqForge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design Manifesto&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge/issues/1" rel="noopener noreferrer"&gt;Issue #1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark Data&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge/tree/main/benchmark" rel="noopener noreferrer"&gt;benchmark/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark Technical Post&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge/blob/main/docs/benchmark-technical-post.md" rel="noopener noreferrer"&gt;docs/benchmark-technical-post.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Built with GitHub Copilot, from an abandoned repo to a published design philosophy — all in one continuous session.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devchallenge</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Less Is More: Why 3 Code Examples Beat 10 Rules for LLM Code Generation</title>
      <dc:creator>zxpmail</dc:creator>
      <pubDate>Sat, 06 Jun 2026 11:55:06 +0000</pubDate>
      <link>https://dev.to/zxpmail/less-is-more-why-3-code-examples-beat-10-rules-for-llm-code-generation-3n08</link>
      <guid>https://dev.to/zxpmail/less-is-more-why-3-code-examples-beat-10-rules-for-llm-code-generation-3n08</guid>
      <description>&lt;p&gt;&lt;em&gt;A controlled benchmark comparing two approaches to guiding LLM code generation.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;Most LLM harnesses guide code generation via rules: "Don't hardcode API keys." "Don't use empty catch blocks." "Don't over-abstract."&lt;/p&gt;

&lt;p&gt;But LLMs aren't logic engines. They're pattern matchers. Every "don't" rule adds cognitive load — the model must actively suppress its natural generation pattern while simultaneously constructing code.&lt;/p&gt;

&lt;p&gt;What if we flipped the approach? Instead of telling the model what NOT to do, give it 3 perfect examples of what TO do. Let its pattern-matching do the work.&lt;/p&gt;

&lt;p&gt;Does it matter? I ran a controlled test to find out.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge/tree/main/test-demo/todo-cli" rel="noopener noreferrer"&gt;todo-cli&lt;/a&gt; — a simple CLI todo list tool (Node.js + TypeScript, 6 source files, 5 test files).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task&lt;/strong&gt;: Add a &lt;code&gt;search&lt;/code&gt; command with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keyword search (case insensitive)&lt;/li&gt;
&lt;li&gt;Optional &lt;code&gt;--category&lt;/code&gt; filter&lt;/li&gt;
&lt;li&gt;Grouped output matching existing style&lt;/li&gt;
&lt;li&gt;5 test cases covering normal, empty, filtered, case-insensitive, and error scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Two approaches&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Context given to LLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OLD (rules)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9-item "don't" checklist (no over-abstraction, no hallucinated APIs, no empty catches, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NEW (anchors)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 short code snippets showing the project's error handling pattern, API endpoint pattern, and test pattern + 4-item safety checklist&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both received the exact same task definition. Both were implemented in the same environment. Both passed the same test suite.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;OLD (9 rules)&lt;/th&gt;
&lt;th&gt;NEW (3 anchors)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tests passed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;26/26&lt;/td&gt;
&lt;td&gt;26/26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;53 lines&lt;/td&gt;
&lt;td&gt;45 lines (−15%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Filter logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2-step filter + pre-built Map&lt;/td&gt;
&lt;td&gt;1-step filter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Keyword lowercasing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;trimmedKeyword.toLowerCase()&lt;/code&gt; called each iteration&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lowerKeyword&lt;/code&gt; extracted once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Type safety&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;plain &lt;code&gt;string&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;TodoCategory[]&lt;/code&gt; typed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extra validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;invalid category check with error message&lt;/td&gt;
&lt;td&gt;omitted (simpler)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

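&lt;p&gt;Both versions pass the same 26 tests. They are not behaviorally identical, though: OLD rejects an unknown category with an error message, while NEW omits that check. The NEW approach produced code that was 15% shorter and structurally simpler.&lt;/p&gt;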
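
&lt;p&gt;The 15% figure follows directly from the two line counts in the table. A small TypeScript check makes the arithmetic explicit; the helper name &lt;code&gt;reductionPercent&lt;/code&gt; is only for illustration.&lt;/p&gt;

```typescript
// Relative size reduction between two line counts, as a percentage.
function reductionPercent(oldLines: number, newLines: number): number {
  return ((oldLines - newLines) / oldLines) * 100;
}

// 53 lines (OLD) versus 45 lines (NEW), as reported in the results table.
const reduction = reductionPercent(53, 45);
console.log(`${reduction.toFixed(1)}% fewer lines`); // prints "15.1% fewer lines"
```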




&lt;h2&gt;
  
  
  The Code Difference
&lt;/h2&gt;

&lt;p&gt;Here's the core difference in the search logic:&lt;/p&gt;

&lt;h3&gt;
  
  
  OLD (rules-guided)
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let filtered = todos.filter(t =&amp;gt;
  t.description.toLowerCase().includes(trimmedKeyword.toLowerCase())
);

if (category) {
  const validCategory = CATEGORY_ORDER.includes(category);
  if (!validCategory) {
    console.log(`Invalid category: ${category}`);
    return;
  }
  filtered = filtered.filter(t =&amp;gt; t.category === category);
}

// ...then build a grouped Map for output
const grouped: Record&amp;lt;string, typeof todos&amp;gt; = {};
for (const cat of CATEGORY_ORDER) grouped[cat] = [];
for (const todo of filtered) grouped[todo.category]?.push(todo);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

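&lt;p&gt;The model took the rules as a mandate for caution: it validates the category before filtering and guards the grouping step. The result is safe but verbose: two filter passes plus a pre-built grouping object (the "Map" in the results table is a plain &lt;code&gt;Record&lt;/code&gt;).&lt;/p&gt;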

&lt;h3&gt;
  
  
  NEW (anchor-guided)
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const lowerKeyword = trimmed.toLowerCase();

const filtered = todos.filter(t =&amp;gt; {
  const matchesKeyword = t.description.toLowerCase().includes(lowerKeyword);
  if (!category) return matchesKeyword;
  return matchesKeyword &amp;amp;&amp;amp; t.category === category;
});

// ...group via runtime filter (matching list.ts style)
for (const cat of CATEGORY_ORDER) {
  const items = filtered.filter(t =&amp;gt; t.category === cat);
  // ...print items for this category (elided)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The model saw the existing &lt;code&gt;list.ts&lt;/code&gt; pattern (runtime filter) and naturally followed it. &lt;code&gt;lowerKeyword&lt;/code&gt; is extracted once. Category filter is rolled into the same pass. No pre-built Map — same approach the existing codebase uses.&lt;/p&gt;
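
&lt;p&gt;The two excerpts are fragments, so here is a self-contained TypeScript sketch that puts both filtering styles side by side. Everything around the filter logic is an assumption made for this example: the &lt;code&gt;Todo&lt;/code&gt; and &lt;code&gt;TodoCategory&lt;/code&gt; shapes, the two category values, the wrapper names &lt;code&gt;searchOld&lt;/code&gt; and &lt;code&gt;searchNew&lt;/code&gt;, and returning &lt;code&gt;null&lt;/code&gt; for an invalid category instead of logging. The real definitions are in the todo-cli source.&lt;/p&gt;

```typescript
// Hypothetical stand-ins for the todo-cli types; the real definitions live in the repo.
type TodoCategory = "work" | "home";
interface Todo {
  description: string;
  category: TodoCategory;
}
const CATEGORY_ORDER: TodoCategory[] = ["work", "home"];

// OLD style: keyword pass, then a second pass for the category, with validation.
function searchOld(todos: Todo[], keyword: string, category?: string): Todo[] | null {
  let filtered = todos.filter((t) =>
    t.description.toLowerCase().includes(keyword.trim().toLowerCase()),
  );
  if (category) {
    if (!CATEGORY_ORDER.includes(category as TodoCategory)) return null; // invalid category
    filtered = filtered.filter((t) => t.category === category);
  }
  return filtered;
}

// NEW style: lowercase the keyword once and test both conditions in one pass.
function searchNew(todos: Todo[], keyword: string, category?: TodoCategory): Todo[] {
  const lowerKeyword = keyword.trim().toLowerCase();
  return todos.filter((t) => {
    const matchesKeyword = t.description.toLowerCase().includes(lowerKeyword);
    if (!category) return matchesKeyword;
    return matchesKeyword ? t.category === category : false;
  });
}

const todos: Todo[] = [
  { description: "Write report", category: "work" },
  { description: "write shopping list", category: "home" },
];
console.log(searchOld(todos, "WRITE", "work"));
console.log(searchNew(todos, "WRITE", "work"));
```

&lt;p&gt;On valid input both functions return the same todos. They differ only for an unknown category, which &lt;code&gt;searchOld&lt;/code&gt; rejects explicitly and which &lt;code&gt;searchNew&lt;/code&gt;, as typed in this sketch, does not accept at compile time.&lt;/p&gt;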




&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;The 9-rule checklist created a &lt;strong&gt;constraint-satisfaction problem&lt;/strong&gt;: the model had to simultaneously satisfy 9 negative constraints while generating code. Each constraint competes for attention. The result? Conservative code that over-validates.&lt;/p&gt;

&lt;p&gt;The 3 anchor examples created a &lt;strong&gt;pattern-continuation problem&lt;/strong&gt;: the model saw three correct examples, recognized the pattern, and continued it. No constraints to satisfy — just a familiar path to follow.&lt;/p&gt;

&lt;p&gt;This aligns with how Transformers work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern matching&lt;/strong&gt; is what they do best (attention over repeated patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logical constraint satisfaction&lt;/strong&gt; is what they do worst (requires combining multiple independent conditions)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What This Doesn't Prove
&lt;/h2&gt;

&lt;p&gt;This is one test, one task, one project. It doesn't prove anchors are universally better.&lt;/p&gt;

&lt;p&gt;What it does suggest: &lt;strong&gt;the gap between the two approaches is visible but not dramatic.&lt;/strong&gt; At the scale of a single 50-line function, the difference is marginal. At the scale of a 100-file project, a 15% reduction in code volume with all tests still passing would be worth paying attention to, if it holds up across more tasks. Note that the shorter version also dropped the invalid-category check, so part of the saving is a behavior change rather than pure simplification.&lt;/p&gt;

&lt;p&gt;The full reproducible benchmark (contexts, task definition, generated code) is in the &lt;a href="https://github.com/zxpmail/ReqForge/tree/main/benchmark" rel="noopener noreferrer"&gt;ReqForge repo&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The two prompt contexts are checked into the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OLD&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge/blob/main/benchmark/context-OLD.md" rel="noopener noreferrer"&gt;benchmark/context-OLD.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NEW&lt;/strong&gt;: &lt;a href="https://github.com/zxpmail/ReqForge/blob/main/benchmark/context-NEW.md" rel="noopener noreferrer"&gt;benchmark/context-NEW.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick a small feature in your own project. Run it twice — once with each context. See if you get the same result.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This benchmark was run as part of the ReqForge project, which implements the "anchor" approach across all 6 of its skills. The full design philosophy is explained in &lt;a href="https://github.com/zxpmail/ReqForge/issues/1" rel="noopener noreferrer"&gt;From Shackles to Anchors&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Repository: &lt;a href="https://github.com/zxpmail/ReqForge" rel="noopener noreferrer"&gt;github.com/zxpmail/ReqForge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
