<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sergei Frangulov</title>
    <description>The latest articles on DEV Community by Sergei Frangulov (@sfrangulov).</description>
    <link>https://dev.to/sfrangulov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938011%2Ff552f098-5d6f-44e8-9729-86b896e73aaa.jpeg</url>
      <title>DEV Community: Sergei Frangulov</title>
      <link>https://dev.to/sfrangulov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sfrangulov"/>
    <language>en</language>
    <item>
      <title>Claude Code is not a recursive agent. I read the source and checked.</title>
      <dc:creator>Sergei Frangulov</dc:creator>
      <pubDate>Sun, 07 Jun 2026 11:19:36 +0000</pubDate>
      <link>https://dev.to/sfrangulov/claude-code-is-not-a-recursive-agent-i-read-the-source-and-checked-kll</link>
      <guid>https://dev.to/sfrangulov/claude-code-is-not-a-recursive-agent-i-read-the-source-and-checked-kll</guid>
      <description>&lt;p&gt;A source map shipped in the v2.1.88 npm release: about 1,884 files under &lt;code&gt;src/&lt;/code&gt;, original names and comments intact. So I walked the core modules and checked what everyone "knows" about how Claude Code works against what the code actually does.&lt;/p&gt;

&lt;p&gt;Half of it was wrong. Including things I'd repeated myself.&lt;/p&gt;

&lt;p&gt;I did not read it line by line. Nobody reads 1,884 files line by line. I walked the key modules and tied every claim to something concrete: a function, a constant. So you'll see names like &lt;code&gt;queryLoop&lt;/code&gt; and &lt;code&gt;AUTOCOMPACT_BUFFER_TOKENS&lt;/code&gt; below. Real identifiers, so every claim is checkable against the public teardowns, not vibes.&lt;/p&gt;

&lt;p&gt;This is a map of how it works, not a dump of its guts. I don't quote internal prompts, and anything that only runs in Anthropic's internal builds is flagged as such.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 1: "The agent recursively calls itself on every tool result"
&lt;/h2&gt;

&lt;p&gt;The picture everyone has: model replies, tool runs, the agent calls itself again, deeper down the stack.&lt;/p&gt;

&lt;p&gt;There's no recursion.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/query.ts&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;queryLoop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ...run model, run tools...&lt;/span&gt;
    &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   &lt;span class="c1"&gt;// overwrite in place&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;               &lt;span class="c1"&gt;// not a nested call&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One &lt;code&gt;while (true)&lt;/code&gt; inside an async generator. It mutates a single &lt;code&gt;State&lt;/code&gt; object and &lt;code&gt;continue&lt;/code&gt;s. The stack never grows deeper.&lt;/p&gt;

&lt;p&gt;Why it matters: every budget, timeout and turn limit you set is counted &lt;strong&gt;per loop pass&lt;/strong&gt;, not per stack frame. "One turn" is literally "one pass." Once I stopped picturing a recursive agent and started picturing a long-running stateful loop, the same tricks I'd use on any long loop applied: count the budget per step, watch what changed between steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 2: "When the context fills up, it just truncates"
&lt;/h2&gt;

&lt;p&gt;This is the interesting one. Context isn't one "drop the old stuff" function. It's five mechanisms, ordered cheapest to most expensive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;snip  -&amp;gt;  microcompact  -&amp;gt;  context-collapse  -&amp;gt;  autocompact  -&amp;gt;  reactive
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The order is deliberate. Each later stage sits after the earlier one precisely so that if a cheap stage already freed space, the expensive one does nothing. The comment says it outright: run collapse before autocompact so autocompact often never fires.&lt;/p&gt;

&lt;p&gt;Cheap stages drop old tool results, surgically. The expensive one, &lt;code&gt;autocompact&lt;/code&gt;, makes a separate model call to summarize the whole history. It kicks in at the effective context window minus &lt;code&gt;AUTOCOMPACT_BUFFER_TOKENS&lt;/code&gt;, a 13,000-token reserve for the summary itself.&lt;/p&gt;

&lt;p&gt;Here's what sent me digging. &lt;code&gt;autocompact&lt;/code&gt; has a silent fuse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="c1"&gt;// after 3 failed compactions in a row, autocompact&lt;/span&gt;
&lt;span class="c1"&gt;// shuts off for the rest of the session. silently.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The comment explains why it's there: sessions were hitting 50+ consecutive failures, up to 3,272 in one session, burning roughly 250,000 extra API calls a day across all users.&lt;/p&gt;

&lt;p&gt;Translation: a session that "felt fine for hours" could have spent part of that time running on surgical drops alone, with no real compaction, and the UI would never tell you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 3: "A full compaction always keeps the last few messages verbatim"
&lt;/h2&gt;

&lt;p&gt;I was sure compaction kept the most recent messages word for word and only touched older ones. For a full &lt;code&gt;autocompact&lt;/code&gt;, no.&lt;/p&gt;

&lt;p&gt;On a full compaction the message array is rebuilt from scratch: a boundary marker, the summary, and a few files pulled back in. &lt;code&gt;messagesToKeep&lt;/code&gt; is empty. The verbatim tail survives only in the other modes (partial, reactive, session-memory compaction), which carry a note that recent messages are kept as-is. Full mode doesn't.&lt;/p&gt;

&lt;p&gt;The uncomfortable part: after a full compaction the model does not remember your last couple of messages word for word. It remembers a retelling of your conversation that it wrote itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 4: "Tools run after the model finishes talking"
&lt;/h2&gt;

&lt;p&gt;No. The tool starts before the model is done with its sentence.&lt;/p&gt;

&lt;p&gt;The moment a &lt;code&gt;tool_use&lt;/code&gt; block shows up in the stream and you don't hit cancel in that split second, &lt;code&gt;StreamingToolExecutor&lt;/code&gt; has already started it. The model is still typing, and the edit on disk has already happened. Parallelism is capped by &lt;code&gt;CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY&lt;/code&gt; (default 10): what's safe to run together goes in a batch, the rest queues.&lt;/p&gt;

&lt;p&gt;One detail I lost half an hour to. The executor has its own child abort controller. If one parallel tool fails (say &lt;code&gt;Bash&lt;/code&gt;), it instantly kills its siblings but does not abort the turn. The killed sibling gets something like "cancelled because a neighbor call failed." So you sit there wondering why a command that depended on nothing didn't run. It depended on a neighbor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 5: "One model message is one reply, and stop_reason tells you if a tool was called"
&lt;/h2&gt;

&lt;p&gt;Two facts that save hours of debugging.&lt;/p&gt;

&lt;p&gt;First, the easy one: Claude Code sends a separate assistant message per block, not one for the whole reply. Text, thinking, tool call: each ships as its own message the moment it's finished.&lt;/p&gt;

&lt;p&gt;Second, the nasty one. When a block closes, &lt;code&gt;stop_reason&lt;/code&gt; is always &lt;code&gt;null&lt;/code&gt;. The real value arrives later, as a separate event, and gets written in after the fact by editing the message that was already sent. The code is honest about it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// stop_reason === 'tool_use' is unreliable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the loop doesn't trust it. To decide whether a tool was called, it checks the fact: did a &lt;code&gt;tool_use&lt;/code&gt; block arrive or not. If you've written a wrapper over a stream like this and hit races on &lt;code&gt;stop_reason&lt;/code&gt;, now you know where they came from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 6: "Permissions are just a chain: user -&amp;gt; project -&amp;gt; local -&amp;gt; policy"
&lt;/h2&gt;

&lt;p&gt;The layers exist, and it sounds logical: higher layer wins. I thought so too. The order isn't by source. It's by strictness.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;deny&lt;/code&gt; beats everything, including &lt;code&gt;bypassPermissions&lt;/code&gt;. Then, in decreasing strictness: targeted &lt;code&gt;ask&lt;/code&gt; rules and safety checks, then the bypass itself, then &lt;code&gt;allow&lt;/code&gt; rules, and only what nobody explicitly allowed finally reaches "ask the user." A denial always outranks a bypass.&lt;/p&gt;

&lt;p&gt;The bigger surprise is the zones the bypass doesn't pierce at all. Even under &lt;code&gt;bypassPermissions&lt;/code&gt; (which supposedly means allow everything, don't ask), edits to &lt;code&gt;.git/&lt;/code&gt;, &lt;code&gt;.claude/&lt;/code&gt;, &lt;code&gt;.vscode/&lt;/code&gt; and shell config files still hit a confirmation. The logic is simple: let the agent edit its own settings unprompted and it writes itself a pass out of the permission sandbox. Defaults lean the same way: until a tool declares it's read-only and safe to parallelize, the system assumes it writes and can't be parallelized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 7: "A subagent is just another Claude running next to you"
&lt;/h2&gt;

&lt;p&gt;A subagent is an isolated fork, and the isolation is harder than it looks. It gets its own &lt;code&gt;agentId&lt;/code&gt;, its own copy of the read-files list, and empty memory.&lt;/p&gt;

&lt;p&gt;And the part that bites. A normal subagent's &lt;code&gt;setAppState&lt;/code&gt; is empty by default, so it can't change application state. And because it can't, it's immediately handed the "don't ask" permission flag, and the rest follows on its own: a background subagent physically can't show a permission dialog, so any &lt;code&gt;ask&lt;/code&gt; it makes silently turns into &lt;code&gt;deny&lt;/code&gt;. Hand a subagent a task that hits a confirmation and it won't wait for you. It gets a no and drives on, as if you'd said no yourself.&lt;/p&gt;

&lt;p&gt;One more: in the public build a subagent can't spawn subagents. Multi-level agents live only in Anthropic's internal builds. For everyone else the hierarchy is flat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Myth 8: "Claude Code's extensions are a handful of standard hooks"
&lt;/h2&gt;

&lt;p&gt;Five extension mechanisms: MCP servers, plugins, skills, hooks, and slash commands. The plugin is the odd one out, an umbrella you can stuff the other four under. Then come the numbers I was actually digging for.&lt;/p&gt;

&lt;p&gt;There aren't five "canonical" hooks (&lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;Stop&lt;/code&gt;, &lt;code&gt;UserPromptSubmit&lt;/code&gt;). There are 28, including events for teammate collaboration, tasks, working-directory changes, and file changes. And the hook contract isn't "non-zero means error." There are three outcomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit 0   -&amp;gt;  fine, proceed
exit 2   -&amp;gt;  hard block: action cancelled, model told why
other    -&amp;gt;  soft error: stderr shown to you, session continues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That third case is why there's a slightly funny guard: before running, it separately checks the plugin folder even exists. Otherwise a hook would run &lt;code&gt;python3 &amp;lt;missing file&amp;gt;.py&lt;/code&gt;, that would die with code 2, and one missing file would wedge &lt;code&gt;Stop&lt;/code&gt; and &lt;code&gt;UserPromptSubmit&lt;/code&gt; permanently. The session would never be able to end.&lt;/p&gt;

&lt;p&gt;Skills are their own story, and one constant explains all of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SKILL_BUDGET_CONTEXT_PERCENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;  &lt;span class="c1"&gt;// 1% of context for the whole skill list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the header fits in that 1% (name, description, trigger). The &lt;code&gt;SKILL.md&lt;/code&gt; body loads only when the skill is actually called. That's why you can pile on dozens of skills and barely pay for them in context: until called, they're effectively not there.&lt;/p&gt;

&lt;p&gt;And the small thing that kills the isolation illusion. The MCP tool "namespace" is a fancy word for a plain string prefix, &lt;code&gt;mcp__server__tool&lt;/code&gt;. The server name and tool name are glued together, anything that isn't a letter, digit, &lt;code&gt;_&lt;/code&gt; or &lt;code&gt;-&lt;/code&gt; becomes an underscore, and permissions are handed out by that glued string. There's no real isolation behind the "namespace."&lt;/p&gt;

&lt;h2&gt;
  
  
  One last thing, and I don't like it
&lt;/h2&gt;

&lt;p&gt;I assumed the "do you trust this folder?" prompt was the very first thing Claude Code does on startup. It isn't. That prompt shows up noticeably later. By then startup has already run a good chunk of code (about a thousand lines, by the source), right next to an honest comment that security here is delicate.&lt;/p&gt;

&lt;p&gt;The delicate bit: &lt;code&gt;.claude/settings.json&lt;/code&gt; has already been read by that point, and it lives in the same folder you haven't trusted yet.&lt;/p&gt;

&lt;p&gt;So settings from an untrusted folder get to influence Claude Code before you've said yes. It's not quite a hole: the most sensitive modes double-check against the trust flag. But it sits wrong with me. I'd still rather know it than not.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually changed
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I count budgets and timeouts per loop pass now, not per abstract "agent call" (Myth 1).&lt;/li&gt;
&lt;li&gt;I stopped believing the model remembers my last messages verbatim. After a full compaction it's working from a retelling it wrote itself (Myth 3).&lt;/li&gt;
&lt;li&gt;I'm careful with background subagents. Since any confirmation they hit becomes a &lt;code&gt;deny&lt;/code&gt;, I don't hand them tasks where a prompt is even possible (Myth 7).&lt;/li&gt;
&lt;li&gt;On compaction, I just try not to reach it. It's cheaper to clear context one extra time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this makes Claude Code worse. The opposite: behind almost every oddity in the code is an incident that happened, or a guard against one. You can read it in the comments. The picture most of us carry (mine included, until last week) is just drawn at altitude.&lt;/p&gt;

&lt;p&gt;If you build on Claude Code or the Anthropic API: which of these would have saved you a debugging session?&lt;/p&gt;

&lt;p&gt;If you just use Claude Code day to day: which one rewrites how you'll drive it tomorrow?&lt;/p&gt;

&lt;p&gt;And if you dug into the leaked source yourself: what did I get wrong?&lt;/p&gt;

</description>
      <category>agents</category>
      <category>claude</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Vibecoding in unskilled hands: 11 ways it quietly breaks</title>
      <dc:creator>Sergei Frangulov</dc:creator>
      <pubDate>Mon, 01 Jun 2026 05:27:35 +0000</pubDate>
      <link>https://dev.to/sfrangulov/vibecoding-in-unskilled-hands-11-ways-it-quietly-breaks-2gii</link>
      <guid>https://dev.to/sfrangulov/vibecoding-in-unskilled-hands-11-ways-it-quietly-breaks-2gii</guid>
      <description>&lt;p&gt;You can get a working demo out of an AI coding agent in an hour. That first hour is the trap.&lt;/p&gt;

&lt;p&gt;The speed is real. A prototype or a small script comes together in front of you, and it is easy to believe the whole project will go like that. It will not. Most vibecoding failures get blamed on the model. In my experience few of them are the model's fault. The bottleneck is almost always the person driving it, and the bill arrives later, on the long distance, where it is expensive to undo.&lt;/p&gt;

&lt;p&gt;Here are eleven places I keep watching it break, and what actually causes each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The short distance lies
&lt;/h2&gt;

&lt;p&gt;The first hour is genuine productivity. The curve flips after that. What sped you up early starts to slow you down: duplicates pile up, earlier decisions quietly contradict each other, and there is no single architecture holding it together. "Almost done" turns into months of patching loosely connected code. The beginner reads the easy start as a property of the whole road, and plans nothing for the tenth iteration or for coherence over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The visible success of AI projects is mostly a bubble
&lt;/h2&gt;

&lt;p&gt;Trending repositories and viral wrappers create the impression that everything works by itself. When I scanned GitHub trending in mid-2026, several agent repos had pulled hundreds of thousands of stars in seven or eight months: one skills framework near 202k, one open coding agent near 164k. That is faster than almost any historical open-source growth, and a large share of it is inflated. Marketing, benchmark-maxxed READMEs, and trending-as-a-service badges, not working software or organic demand. A small, well-packaged project earns a few thousand stars the honest way while the giants farm hundreds of thousands. A star is a vanity metric. Beginners calibrate against this storefront and conclude they are the ones doing it wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The model has no standing picture of your project
&lt;/h2&gt;

&lt;p&gt;Each run sees a limited context window, and that window gets actively trimmed to fit a budget. Some tools prune idle context by design, without telling you. So the model is sharp inside a tight, well-scoped task and loses the thread on a large one: it forgets earlier decisions, contradicts code it just wrote, and over-builds. Holding the whole system in your head is still a human job. At minimum you have to keep architecture docs current, which is its own discipline, and most people skip it.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Garbage in, garbage out, and you pay per token for it
&lt;/h2&gt;

&lt;p&gt;Weak input is the main source of bad output: a vague spec, no codebase context, no examples, no acceptance criteria. The model is not telepathic. It fills the gaps with the most probable answer, not the one you needed. The beginner opens a chat and types a wish instead of a brief, then spends the afternoon arguing with the result. Skilled operators spend most of their time here, before the first prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. One run is one sample, not a verdict
&lt;/h2&gt;

&lt;p&gt;Because models are tuned on human preference, they lean toward the most typical, average answer. Research on alignment (Kirk et al., ICLR 2024) found that RLHF measurably reduces the diversity of outputs for a given prompt. So a single response is one draw from a distribution that has already collapsed toward the median. It is not the best answer, and not the only correct one. Without a precise process you get the internet average instead of an engineering call for your context. Asking for several options and picking one helps, but only when there is somebody qualified to pick.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Reasoning depth is a dial, not a fixed trait
&lt;/h2&gt;

&lt;p&gt;"The model is lazy" is usually a misread. On current models, effort is a setting and depth follows from how you frame the task. The old habit of prompting "don't be lazy, be thorough" is now an anti-pattern: vendors warn that capable models over-trigger on it. The real skill is knowing when to turn effort up. Push it to maximum in the wrong place and you buy overthinking and a worse answer. The beginner never touches the dial and takes the first shallow response as the ceiling of what the model can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. It is a capable executor without a global view, not an autonomous engineer
&lt;/h2&gt;

&lt;p&gt;A useful calibration is by seniority. As a junior it is overpriced: you pay top-model rates for intern-level autonomy and still check every line. As a mid it is excellent on a well-scoped task. As a senior or architect it does not hold system coherence or judgment, and it cannot tell you what not to build. The beginner delegates exactly the part it cannot do.&lt;/p&gt;

&lt;p&gt;Buying autonomy off the shelf is no shortcut either. We run a multi-agent orchestration tool called gastown. The author loves westerns, so the agents are named mayor, deacon, convoy, hounds, raccoons. It took two weeks of spare evenings to half-integrate it into our pipeline, and even then not for every task. Simple tools are not really autonomous. Capable ones cost you weeks of setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Memory is primitive and needs manual management
&lt;/h2&gt;

&lt;p&gt;The assistant loses context between sessions and on resume, and it drops things on purpose to fit the window. This is not a guess. One popular coding agent's own postmortem admitted that it had quietly trimmed reasoning in idle sessions to save cost, shipped without a changelog, and that a hidden instruction to answer in under twenty-five words had measurably lowered output quality. Bloated instruction layers degrade output on their own, and people keep piling them on. If you do not understand how the memory behaves, you re-explain the same context every session and wonder why yesterday it understood and today it does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. The setup around the model is a separate skill
&lt;/h2&gt;

&lt;p&gt;The value comes from the layer you build around the model: project instructions, ready-made skills, hooks, tools, rules, context. Beginners work straight out of the box, never notice that layer exists, and blame the model for the result. This is applied knowledge of how your own system fits together, and it has to be budgeted like any other engineering work. The license is the cheap part.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Your skills expire the moment you learn them
&lt;/h2&gt;

&lt;p&gt;The operator's competence depreciates faster than the models change. A technique that worked last week is stale this week, and silent changes in model behavior keep your mental model out of date. Staying current is a daily habit, not a one-time milestone, and most people are not up for it. The only thing that works for me is reading the field every day. You can automate the gathering, but someone still has to filter the noise by hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  11. The line keeps moving, but responsibility does not
&lt;/h2&gt;

&lt;p&gt;The boundary between what the model can do and what you must do slides toward the model over time. Responsibility does not slide with it. The mechanics will keep getting absorbed, but intent, taste, and the decision of what not to build stay with the human, and no one can date when that changes. "Let's wait for AGI" is not a strategy. It is the excuse that produces unskilled hands, treating the model as the accountable author instead of a tool in your own hands.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern
&lt;/h2&gt;

&lt;p&gt;None of these is about the model being weak. Each is about a person not seeing the line between what the model does and what they still owe, and that line is masked by hype and kept moving by how fast the field changes. Plenty of people will tell you none of this is critical and it is all solvable. They are probably right. It is solvable.. in skilled hands.&lt;/p&gt;

&lt;p&gt;Treat all of the above as a snapshot of 2026, not a verdict for all time.&lt;/p&gt;

&lt;p&gt;Which of these eleven hits your team hardest, and what did you actually do about it?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>I installed 116 Claude Code skills. After 30 days I'd used 35</title>
      <dc:creator>Sergei Frangulov</dc:creator>
      <pubDate>Mon, 18 May 2026 12:08:35 +0000</pubDate>
      <link>https://dev.to/sfrangulov/i-installed-116-claude-code-skills-after-30-days-id-used-35-1f50</link>
      <guid>https://dev.to/sfrangulov/i-installed-116-claude-code-skills-after-30-days-id-used-35-1f50</guid>
      <description>&lt;p&gt;I kept installing Claude Code skills because it's one command and it's free. Every time I saw a skill that looked useful, I added it. No cost, no friction, so why not. After 30 days I parsed my own session logs, just out of curiosity. I had 116 skills installed. I had actually used 35. I never noticed the other 81.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Skill marketplaces grow fast and installing is easy. One command, no review, nothing to pay. So people keep installing and nobody checks later which skills they actually use. Nothing forces you to check.&lt;/p&gt;

&lt;p&gt;This is not free. Every installed skill loads its metadata into the prompt at the start of every session. So dead skills cost tokens for nothing, every session. With a normal dependency you eventually hit a version conflict or a security warning and you have to look at it. An installed skill never makes you look. It just sits in the prompt until you go and count.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I measured it
&lt;/h2&gt;

&lt;p&gt;I parsed my local &lt;code&gt;~/.claude&lt;/code&gt; session logs. No network, just the JSONL files the client already writes. I put every skill into one of four buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;active&lt;/strong&gt; — installed and used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dead&lt;/strong&gt; — installed, used zero times in the window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;missing&lt;/strong&gt; — used, but no &lt;code&gt;SKILL.md&lt;/code&gt; found for it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hallucinated&lt;/strong&gt; — used, but the runtime errored, because Claude confused a tool or command name with a skill name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tool that does the parsing is &lt;code&gt;skill-graveyard&lt;/code&gt; (&lt;a href="https://github.com/sfrangulov/skill-graveyard" rel="noopener noreferrer"&gt;github.com/sfrangulov/skill-graveyard&lt;/a&gt;). Here is the summary it printed for my 30-day window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;────────────────────────────────────────────────────────────
 skill-graveyard — 30d audit
 116 installed   35 active · 81 dead · 9 missing
              ██████████                        30% used
 274 sessions, 847 calls, 499 errored
 +65 hallucinated names (telemetry)
────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;116 installed, 35 active. That is 30% used. 81 dead, 9 missing. Across 274 sessions there were 847 skill calls and 499 of them errored. That error rate is high, so I looked closer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The unexpected part
&lt;/h2&gt;

&lt;p&gt;The most interesting bucket was not the dead one. It was the hallucinated one. Claude tried to call 65 names that are not skills at all: &lt;code&gt;bash&lt;/code&gt;, &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;agent&lt;/code&gt;, and 62 more. That is 493 failed calls, because the model used a tool or command name as if it were a skill name.&lt;/p&gt;

&lt;p&gt;Here is the thing. &lt;code&gt;bash&lt;/code&gt;, &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;agent&lt;/code&gt; are real. They are real things the Claude Code runtime can do. But they are not skills. The runtime runs them through a different path: built-in tools and slash-style commands, not the skill loader. So the model wants a real capability, but it asks for it the wrong way. The skill loader fails, because that name is not a skill, even though &lt;code&gt;bash&lt;/code&gt; works fine somewhere else.&lt;/p&gt;

&lt;p&gt;So this bucket is not just junk. It is a signal. It shows how often the model mixes up the runtime's different paths. Dead skills tell you what to uninstall. This tells you where the model's idea of its own tools is wrong. I expected a cleanup report and instead found a debugging lead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it on your own logs
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code, you can get the same four-bucket breakdown for your own setup. Run &lt;code&gt;npx skill-graveyard&lt;/code&gt;. It runs locally, makes no network calls, and only reads your &lt;code&gt;~/.claude&lt;/code&gt; logs.&lt;/p&gt;

&lt;p&gt;Run it and reply with your installed→active ratio. I want to know if 30% is normal or if I just install too much.&lt;/p&gt;

&lt;p&gt;I did not decide to use only 35 of 116. It happened one easy install at a time, and I only saw it because I read the logs.&lt;/p&gt;

&lt;p&gt;(The same four-bucket idea also exists for MCP servers via &lt;code&gt;npx mcp-graveyard&lt;/code&gt; and project memory via &lt;code&gt;npx memory-graveyard&lt;/code&gt;.)&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>development</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
