<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: YuhaoLin2005</title>
    <description>The latest articles on DEV Community by YuhaoLin2005 (@yuhaolin2005).</description>
    <link>https://dev.to/yuhaolin2005</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4004437%2F73f778a5-214a-4088-a459-1fe4e47bf755.png</url>
      <title>DEV Community: YuhaoLin2005</title>
      <link>https://dev.to/yuhaolin2005</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yuhaolin2005"/>
    <language>en</language>
    <item>
      <title>pip install self-audit: A Zero-Dependency CLI for AI Output Quality</title>
      <dc:creator>YuhaoLin2005</dc:creator>
      <pubDate>Sat, 27 Jun 2026 00:12:35 +0000</pubDate>
      <link>https://dev.to/yuhaolin2005/pip-install-self-audit-a-zero-dependency-cli-for-ai-output-quality-mdl</link>
      <guid>https://dev.to/yuhaolin2005/pip-install-self-audit-a-zero-dependency-cli-for-ai-output-quality-mdl</guid>
      <description>&lt;p&gt;AI agents pass tests while producing sloppy thinking. They say "should work" without evidence. They present partial work as complete. They embellish.&lt;/p&gt;

&lt;p&gt;I built a tiny tool that catches this. It checks any text across four dimensions: Completeness, Consistency, Groundedness, and Honesty.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;pip install self-audit&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "Should work fine. Ready to ship." | self-audit --verbose
# Completeness: FIXED
# Groundedness: FIXED  [should work fine]
# FAIL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Zero dependencies. Python 3.8+. Stdlib only. 60 lines of core logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Dimensions
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Completeness&lt;/td&gt;
&lt;td&gt;Did I answer everything?&lt;/td&gt;
&lt;td&gt;Missing requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consistency&lt;/td&gt;
&lt;td&gt;Did I contradict myself?&lt;/td&gt;
&lt;td&gt;A-and-not-A patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Groundedness&lt;/td&gt;
&lt;td&gt;Did I show evidence?&lt;/td&gt;
&lt;td&gt;"should work" claims&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Honesty&lt;/td&gt;
&lt;td&gt;Am I honest about limits?&lt;/td&gt;
&lt;td&gt;Embellishment, TODO stubs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The dimensions are grounded in Anthropic's Constitutional AI framework: Completeness (helpfulness), Groundedness (harmlessness), Honesty (truthfulness), and Consistency (rule alignment).&lt;/p&gt;
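
&lt;p&gt;As a rough sketch of how phrase-based checks for two of these dimensions could work: this is not self-audit's actual source, and the phrase lists and function names below are made up for illustration.&lt;/p&gt;

```python
import re

# Hypothetical phrase lists for illustration; not taken from self-audit's source.
UNGROUNDED = [r"\bshould work\b", r"\bprobably fine\b", r"\blooks good\b"]
STUBS = [r"\bTODO\b", r"\bFIXME\b", r"\bnot implemented\b"]


def find_matches(text, patterns):
    """Return every phrase in `text` matched by any of the regex `patterns`."""
    hits = []
    for pattern in patterns:
        hits.extend(m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE))
    return hits


def audit(text):
    """Map two of the four dimensions to the phrases that triggered them."""
    return {
        "Groundedness": find_matches(text, UNGROUNDED),
        "Honesty": find_matches(text, STUBS),
    }


report = audit("Should work fine. Ready to ship. TODO: add tests.")
for dimension, hits in report.items():
    status = "FLAG" if hits else "OK"
    print(f"{dimension}: {status} {hits}")
```

&lt;p&gt;Completeness and Consistency need the original request and cross-sentence comparison, so a keyword list alone would not cover them.&lt;/p&gt;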

&lt;h2&gt;
  
  
  Try it on your own output
&lt;/h2&gt;

&lt;p&gt;After any AI-assisted coding session, pipe the agent's output through self-audit before shipping. You may be surprised by what it catches.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/YuhaoLin2005/self-audit" rel="noopener noreferrer"&gt;https://github.com/YuhaoLin2005/self-audit&lt;/a&gt;&lt;br&gt;
Claude Code skill: &lt;a href="https://github.com/anthropics/skills/pull/1361" rel="noopener noreferrer"&gt;https://github.com/anthropics/skills/pull/1361&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>Single-Modal LLMs Have a Blind Spot. Here's How to Fix It.</title>
      <dc:creator>YuhaoLin2005</dc:creator>
      <pubDate>Fri, 26 Jun 2026 19:56:45 +0000</pubDate>
      <link>https://dev.to/yuhaolin2005/single-modal-llms-have-a-blind-spot-heres-how-to-fix-it-2ogd</link>
      <guid>https://dev.to/yuhaolin2005/single-modal-llms-have-a-blind-spot-heres-how-to-fix-it-2ogd</guid>
      <description>&lt;p&gt;If you use Claude Code, Cursor, or any AI coding agent, you know the problem: you ask the AI to review its own work, and it says "looks good." Every time.&lt;/p&gt;

&lt;p&gt;The AI isn't being lazy. It's sharing the same mental model that produced the code. It literally can't see what's wrong — the blind spots are baked in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Review this as a senior engineer" Fails
&lt;/h2&gt;

&lt;p&gt;Generic role-playing produces generic findings. The AI fills in what it &lt;em&gt;thinks&lt;/em&gt; a senior engineer would say, which is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Consider adding error handling"&lt;/li&gt;
&lt;li&gt;"Maybe add more comments"&lt;/li&gt;
&lt;li&gt;"The variable name could be clearer"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Useful? Sometimes. But it misses the real bugs — the ones a human reviewer with a genuinely different perspective would catch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works: Named-Persona Review
&lt;/h2&gt;

&lt;p&gt;Here's the method. It takes 5 minutes and costs nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Pick three named people, not generic roles.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instead of&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Review as a security engineer"&lt;/td&gt;
&lt;td&gt;"Review as &lt;strong&gt;Linus Torvalds&lt;/strong&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Check for maintainability"&lt;/td&gt;
&lt;td&gt;"Check as &lt;strong&gt;Ken Thompson&lt;/strong&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Think about the user"&lt;/td&gt;
&lt;td&gt;"Think as &lt;strong&gt;Steve Jobs&lt;/strong&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Why? Because Linus Torvalds has an actual documented philosophy: "Good taste is when the special case disappears. Eliminate the edge case by changing the data structure." That produces &lt;em&gt;different findings&lt;/em&gt; than "check for security."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Search for their real philosophy before role-playing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Search: &lt;code&gt;"Linus Torvalds engineering philosophy code review"&lt;/code&gt;. Take 60 seconds per person. Extract 3-5 actual criteria from their documented words. Now role-play.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Each person MUST find at least one issue.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No exceptions. If Ken Thompson finds nothing wrong with your code, you're not thinking like Ken Thompson. Go back and look harder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Run the Feynman check.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the review, ask: "Would this person actually say what I just said, or am I projecting?" Feynman's actual rule: "The first principle is that you must not fool yourself, and you are the easiest person to fool."&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Starter Pack
&lt;/h2&gt;

&lt;p&gt;For your first try, use these three:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Linus Torvalds&lt;/strong&gt; — Data structures, logic, correctness. Search for "good taste."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ken Thompson&lt;/strong&gt; — Architecture, API design, simplicity. Search for "do one thing well."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steve Jobs&lt;/strong&gt; — User experience, first impressions, clarity. Search for "design is how it works."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two engineers + one product person. Five minutes. You'll find bugs you'd never catch reviewing as yourself.&lt;/p&gt;
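
&lt;p&gt;One way to keep the starter pack reusable is to store it as data and render one instruction block per persona. This is only a sketch: the &lt;code&gt;Persona&lt;/code&gt; class, its field names, and the exact wording of the rendered lines are assumptions, not part of any published skill.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical record for one reviewer; field names are illustrative."""
    name: str
    focus: str
    search: str


# The three starter personas and search phrases listed above.
STARTER_PACK = [
    Persona("Linus Torvalds", "data structures, logic, correctness", "good taste"),
    Persona("Ken Thompson", "architecture, API design, simplicity", "do one thing well"),
    Persona("Steve Jobs", "user experience, first impressions, clarity", "design is how it works"),
]


def render(personas):
    """Render one instruction block per persona, following Steps 1-3."""
    blocks = []
    for number, p in enumerate(personas, start=1):
        blocks.append(
            f"PERSONA {number}: {p.name} ({p.focus})\n"
            f'- Search: "{p.name} {p.search}"\n'
            "- Find: 3-5 criteria from their documented words\n"
            "- Review the work. MUST find at least one issue."
        )
    return "\n\n".join(blocks)


prompt = render(STARTER_PACK)
print(prompt)
```

&lt;p&gt;Swapping reviewers then means editing one list entry rather than rewriting the prompt by hand.&lt;/p&gt;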

&lt;h2&gt;
  
  
  Why This Works (The Science)
&lt;/h2&gt;

&lt;p&gt;de Bono's &lt;em&gt;Six Thinking Hats&lt;/em&gt; (1985) argues that parallel multi-perspective thinking produces better decisions than single-perspective analysis. Kahneman's &lt;em&gt;Thinking, Fast and Slow&lt;/em&gt; (2011) describes how deliberately engaging System 2 (analytical) thinking can counter the biases of System 1 (intuitive).&lt;/p&gt;

&lt;p&gt;Named-persona review is a &lt;strong&gt;System 2 forcing function.&lt;/strong&gt; By grounding each perspective in real, searchable philosophy, it prevents the AI from defaulting to generic "looks good" mode.&lt;/p&gt;

&lt;p&gt;This is especially valuable for &lt;strong&gt;single-modal LLMs&lt;/strong&gt; — models that can only process text. They lack the diverse sensory input that helps humans notice different things. Multi-perspective review compensates for that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;p&gt;Take any PR, code review, or document you're working on. Paste this prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review the following using Named-Persona Adversarial Review:

PERSONA 1: Ken Thompson (Unix philosophy)
- Search: "Ken Thompson Unix philosophy do one thing well"
- Find: 3-5 criteria from his actual words
- Review the code. MUST find &amp;gt;= 1 issue.

PERSONA 2: Linus Torvalds (Linux/git)
- Search: "Linus Torvalds good taste code review"
- Find: 3-5 criteria from his actual words
- Review the code. MUST find &amp;gt;= 1 issue.

PERSONA 3: Steve Jobs (Apple)
- Search: "Steve Jobs simplicity design principles"
- Find: 3-5 criteria from his actual words
- Review the code from a user's perspective. MUST find &amp;gt;= 1 issue.

OUTPUT: Structured report with CRITICAL/WARNING/NOTE findings.
PROMOTION: Issues caught by 2+ personas get severity upgrade.
HONESTY CHECK: Would these people actually say this?

&amp;lt;your code here&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You just ran a multi-perspective review. No API keys, no agents, no cost. Just better quality.&lt;/p&gt;
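
&lt;p&gt;If you collect the findings as structured data, the PROMOTION rule from the prompt can be applied mechanically. The sketch below assumes each finding is a &lt;code&gt;(persona, issue, severity)&lt;/code&gt; tuple, that NOTE, WARNING, CRITICAL is the ordering from lowest to highest, and that an upgrade means one level up; none of that is specified by the prompt itself.&lt;/p&gt;

```python
from collections import defaultdict

# Assumed severity ladder, lowest to highest, using the labels from the prompt.
LEVELS = ["NOTE", "WARNING", "CRITICAL"]


def promote(findings):
    """Upgrade severity one level for issues reported by 2+ distinct personas.

    `findings` is a list of (persona, issue, severity) tuples; the tuple shape
    and the one-level bump are assumptions made for this sketch.
    """
    personas = defaultdict(set)
    severity = {}
    for persona, issue, level in findings:
        personas[issue].add(persona)
        # Keep the highest severity any persona assigned to this issue.
        if issue not in severity or LEVELS.index(level) > LEVELS.index(severity[issue]):
            severity[issue] = level
    result = {}
    for issue, level in severity.items():
        rank = LEVELS.index(level)
        if len(personas[issue]) >= 2:
            # Cap at the top of the ladder so CRITICAL stays CRITICAL.
            rank = min(rank + 1, len(LEVELS) - 1)
        result[issue] = LEVELS[rank]
    return result


findings = [
    ("Thompson", "function does two things", "WARNING"),
    ("Torvalds", "function does two things", "NOTE"),
    ("Jobs", "error message is unclear", "NOTE"),
]
promoted = promote(findings)
print(promoted)
```

&lt;p&gt;Matching issues by exact string is the weak point here; in practice two personas rarely phrase the same problem identically, so some manual grouping is still needed.&lt;/p&gt;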

&lt;h2&gt;
  
  
  Go Deeper
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/alirezarezvani/claude-skills/pull/866" rel="noopener noreferrer"&gt;Named-Persona Adversarial Review&lt;/a&gt; — Full Claude Code skill&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/YuhaoLin2005/open-source-flywheel" rel="noopener noreferrer"&gt;open-source-flywheel&lt;/a&gt; — Methodology for turning personal tools into contributions&lt;/li&gt;
&lt;li&gt;de Bono, &lt;em&gt;Six Thinking Hats&lt;/em&gt; (1985) — Theoretical foundation&lt;/li&gt;
&lt;li&gt;Kahneman, &lt;em&gt;Thinking, Fast and Slow&lt;/em&gt; (2011) — Cognitive science backing&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Let 24 Famous Engineers Review My Methodology. Here's What Happened.</title>
      <dc:creator>YuhaoLin2005</dc:creator>
      <pubDate>Fri, 26 Jun 2026 19:52:04 +0000</pubDate>
      <link>https://dev.to/yuhaolin2005/i-let-24-famous-engineers-review-my-methodology-heres-what-happened-31k4</link>
      <guid>https://dev.to/yuhaolin2005/i-let-24-famous-engineers-review-my-methodology-heres-what-happened-31k4</guid>
      <description>&lt;p&gt;I spent tonight building a methodology for turning personal tools into open source contributions. Before publishing it, I decided to let the methodology review itself.&lt;/p&gt;

&lt;p&gt;The results changed how I think about AI-assisted code review.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Method: Named-Persona Adversarial Review
&lt;/h2&gt;

&lt;p&gt;The core idea is simple: instead of asking an AI to "review this code as a security engineer" (which produces generic, shallow feedback), you &lt;strong&gt;web-search actual engineers' documented philosophies&lt;/strong&gt; and role-play as them.&lt;/p&gt;

&lt;p&gt;Not "be a security auditor." Be &lt;strong&gt;Linus Torvalds&lt;/strong&gt;, who said good code is when the special case disappears. Be &lt;strong&gt;Ken Thompson&lt;/strong&gt;, who said each program should do one thing well. Be &lt;strong&gt;Richard Feynman&lt;/strong&gt;, who said the easiest person to fool is yourself.&lt;/p&gt;

&lt;p&gt;Two engineers + one product person per round. Three rounds minimum. Nine genuinely different perspectives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;I applied this to my own &lt;a href="https://github.com/YuhaoLin2005/open-source-flywheel" rel="noopener noreferrer"&gt;open-source-flywheel&lt;/a&gt; methodology. Eight rounds. Twenty-four persona views.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 1: Torvalds immediately found a flaw.&lt;/strong&gt; "Your 'When NOT to Use' table repeats the same rules as your Pre-Flight checklist. Merge them. A good data structure eliminates the special case." He was right. Gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 2: Feynman asked the hardest question.&lt;/strong&gt; "You only show successes. Where are the failures?" Without documented failures, it was unfalsifiable — the worst kind of cargo cult science. Added three failure cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 3: Kahneman called out overconfidence.&lt;/strong&gt; "Self-assessment of originality is inherently biased. Ask someone else to answer your Pre-Flight questions." Added immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rounds 4-5: Carmack wanted time estimates. Musk wanted first-principles decomposition. Beck wanted respect for maintainers.&lt;/strong&gt; All actionable, all added.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Round 6: Bret Victor, Don Norman, and Eric Ries independently flagged the same thing.&lt;/strong&gt; The methodology had no feedback loop. Three legends of design and lean startup, across different decades, pointing at the exact same gap. When Norman, Ries, and Victor agree, you listen. Added LEARN step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rounds 7-8: Brooks confirmed we're not selling a silver bullet.&lt;/strong&gt; Knuth confirmed the structure. Ritchie and Berners-Lee confirmed it was simple enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;v1 (8.3KB) ➜ v8 (~4KB) while gaining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fifth step: LEARN (feedback loop)&lt;/li&gt;
&lt;li&gt;Pre-flight bias protection (ask someone else)&lt;/li&gt;
&lt;li&gt;Bot-check (verify automated review)&lt;/li&gt;
&lt;li&gt;Time estimates per step&lt;/li&gt;
&lt;li&gt;Three documented failure cases&lt;/li&gt;
&lt;li&gt;24-persona review badge&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Single-modal LLMs have a fundamental problem: reviewing their own output means sharing the same mental model, same blind spots, same tendency to say "looks good."&lt;/p&gt;

&lt;p&gt;Named-persona review compensates. By grounding each review in &lt;strong&gt;real, searchable, citable philosophy&lt;/strong&gt;, you force genuine context switches. Ken Thompson actually said "do one thing well." That constraint produces different findings than "review as a senior architect."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Meta-Lesson
&lt;/h2&gt;

&lt;p&gt;A methodology that can review itself is more trustworthy than one that can't. Mine found 8 bugs in itself before I published it.&lt;/p&gt;

&lt;p&gt;Try it: &lt;a href="https://github.com/YuhaoLin2005/open-source-flywheel" rel="noopener noreferrer"&gt;open-source-flywheel&lt;/a&gt;. Or just search your favorite engineer's philosophy and role-play. The method is free.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;24 personas, 8 rounds, 5 steps. Thanks to de Bono (Six Thinking Hats), alirezarezvani (adversarial-reviewer format), and Feynman (don't fool yourself).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>codequality</category>
    </item>
    <item>
      <title>Has Anyone Measured How LLM Output Quality Degrades Across Multiple Compactions?</title>
      <dc:creator>YuhaoLin2005</dc:creator>
      <pubDate>Fri, 26 Jun 2026 18:49:52 +0000</pubDate>
      <link>https://dev.to/yuhaolin2005/has-anyone-measured-how-llm-output-quality-degrades-across-multiple-compactions-1dad</link>
      <guid>https://dev.to/yuhaolin2005/has-anyone-measured-how-llm-output-quality-degrades-across-multiple-compactions-1dad</guid>
      <description>&lt;h2&gt;
  
  
  The Observation
&lt;/h2&gt;

&lt;p&gt;After ~70 sessions with DeepSeek V4 (1M context), I noticed something odd. When Claude Code compacts my session, output quality doesn't just go down linearly. There's a moment — usually after the second compaction — where the model briefly gets &lt;em&gt;better&lt;/em&gt;. Then it declines and never recovers.&lt;/p&gt;

&lt;p&gt;Maybe I'm imagining it. Maybe it's specific to my model, my prompts, my workflow. But I can't shake the thought: &lt;strong&gt;what if context compaction has a curve, and nobody has mapped it?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Found (Not Much)
&lt;/h2&gt;

&lt;p&gt;I searched for benchmarks that measure multi-round compaction degradation. Here's what exists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RULER&lt;/strong&gt;: Measures how performance drops as &lt;em&gt;static&lt;/em&gt; input grows longer. Nothing about what happens after you compress and re-compress.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Rot&lt;/strong&gt; (Chroma 2025): 18 models tested, all degrade with more tokens. Again, static.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn evaluation&lt;/strong&gt;: Tests whether models drift across conversation turns. Doesn't touch compaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parameter compression (pruning, quantization) has been studied far more systematically. The Lottery Ticket Hypothesis (ICLR 2019) and Compression Laws for LLMs (2025) relate how much you compress to how much performance you keep. Context summarization, the thing that happens every time your agent runs &lt;code&gt;/compact&lt;/code&gt;, has no such curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Might Matter
&lt;/h2&gt;

&lt;p&gt;If the curve is real, you could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know exactly when to start a fresh session (before the decline hits)&lt;/li&gt;
&lt;li&gt;Compare models on a new dimension: who maintains quality longest across compactions?&lt;/li&gt;
&lt;li&gt;Give LLM providers a concrete target: "your compaction quality drops 20% faster than competitor X"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, none of the major benchmark suites (MMLU, HELM, BIG-bench, RULER) include a "compaction persistence" metric. If context windows keep growing and sessions keep getting longer, this gap gets bigger every year.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Asking
&lt;/h2&gt;

&lt;p&gt;I built a tiny monitor (&lt;a href="https://github.com/YuhaoLin2005/compact-counter-concept" rel="noopener noreferrer"&gt;compact-counter&lt;/a&gt;) and a rough &lt;a href="https://github.com/YuhaoLin2005/compact-counter-concept/blob/master/EXPERIMENT.md" rel="noopener noreferrer"&gt;experiment framework&lt;/a&gt; — 50 lines of Python, 10 benchmark tasks, 0-5 rubric. It's not polished. It's a starting point.&lt;/p&gt;
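
&lt;p&gt;To make the idea of a "curve" concrete, here is a minimal sketch of the tally such an experiment would need: average the 0-5 rubric scores per compaction count and look for the peak. It is not the code from the linked repository; the function names are hypothetical and the scores are placeholders, not measurements.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import mean


def curve(records):
    """Average 0-5 rubric scores per compaction count.

    `records` is a list of (compactions, score) pairs; returns a list of
    (compactions, mean_score) sorted by compaction count.
    """
    buckets = defaultdict(list)
    for compactions, score in records:
        if score > 5 or 0 > score:
            raise ValueError(f"score {score} is outside the 0-5 rubric")
        buckets[compactions].append(score)
    return [(n, mean(scores)) for n, scores in sorted(buckets.items())]


def peak(points):
    """Return the compaction count with the highest mean score."""
    return max(points, key=lambda point: point[1])[0]


# Placeholder scores for illustration only; these are not measurements.
records = [(0, 4), (0, 4), (1, 3), (1, 4), (2, 5), (2, 4), (3, 2), (3, 3)]
points = curve(records)
print(points)
print("peak at compaction", peak(points))
```

&lt;p&gt;With only a handful of scores per bucket, a peak like this could easily be noise, which is why data points from many sessions and models matter.&lt;/p&gt;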

&lt;p&gt;What I'd love:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Someone with a Claude Opus / GPT-5 / Gemini account to try reproducing this&lt;/li&gt;
&lt;li&gt;Feedback on whether the methodology makes sense or is fundamentally flawed&lt;/li&gt;
&lt;li&gt;If this is a real thing, ideas for how to measure it properly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I don't have the compute or the stats background to do this alone. But if enough people contribute data points across different models, we might find out whether this curve exists — and if it does, maybe it's useful to more people than just me.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Frankle &amp;amp; Carbin, "The Lottery Ticket Hypothesis" (ICLR 2019)&lt;/li&gt;
&lt;li&gt;"Compression Laws for Large Language Models" (2025)&lt;/li&gt;
&lt;li&gt;"RULER: What's the Real Context Size of Your Long-Context Language Models?" (COLM 2024)&lt;/li&gt;
&lt;li&gt;Chroma Research, "Context Rot" (2025)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
