DEV Community: Sergei Frangulov

AI took the friction out of my work. Then I found out the friction was holding up two things: my ideas and my brakes. Twenty-five years in a confession.

Sergei Frangulov — Tue, 16 Jun 2026 13:24:32 +0000

For a long time now the news and the roundups have all said the same thing. AI will take away the busywork. It will kill the friction. It will hand you wings, a cure-all for whatever ails you.

I installed Claude Code, and the friction is gone. An idea now costs an evening to test, not a month.

Then it turned out the friction mattered to me, and once it was gone, I felt that in full.

I am not a junior who just discovered autocomplete. I have been writing code for twenty-five years.

Ideas drop like flies

For twenty-five years I had one quiet comfort: "if I ever really went for it, oh boy."

Somewhere in my head sat a drawer of ideas I would get to someday. Not because I couldn't build them. I just never got around to them. And an untested idea is pure hope. Maybe this one is the one. Maybe I am actually worth something.

Testing an idea was expensive: evenings, energy I do not have after work. So the drawer stayed full. Expensive means later.

Claude Code dropped the cost of testing to zero. Want it? Working prototype by tonight.

So I started pulling ideas out of the drawer, one by one. And they started dying. Quietly, fast, like flies. The one that was going to be huge turns out mediocre by the third evening. The next one is worse.

A month in, an ugly thing dawned on me. The idea was never the valuable part. The hope before testing was. The friction worked as anesthesia: it kept me from feeling that the drawer is mostly junk. I had an endless supply of brilliant ideas precisely because I never tested them.

The endless self-improvement loop

The second half is funnier, because it looks like something useful.

You sit down to do the thing. Say, post to Telegram. The built-in tools can't do it. Fine, let's write our own sender. And let's have the model run through the subscription I already pay for, not the API. And let's make the sender add its own workflows. And let's have those workflows write workflows.

# what I sat down to do:     post one message to Telegram
# what I had by 2 a.m.:
~/projects/telegram-sender/
  └─ plugins/                # it can add its own workflows now
     └─ plugins/             # and the workflows write workflows
# users: 1 (me)

By 2 a.m. I do not have a Telegram post. I have a tiny AI newsroom full of agents, with exactly one user.

Research is the same story. The built-in one is mediocre, so let's write our own. Properly. With blackjack and programmatic quality gates. Memory is its own saga: let's try a hundred implementations, bolt on codegraph, then something else. By morning it is a monster nobody understands. I look at it with a clear head, see that half of it has to go, delete it, and start a fresh lap the next day.

The thing I sat down to do never got done. But it's all perfect.

What used to save me was the price. Writing my own sender was a few days of work, so I grabbed something off the shelf and went and did the job. Claude Code made "let's just write our own" free. So now I pick "our own" every time. Every single time. Forever.

The friction was a brake

I put the two problems together and saw it is one problem.

The friction I grumbled about for twenty-five years was a brake. Expensive to test an idea, and the weak ones quietly stayed in the drawer with the hope. Expensive to roll my own, and I grabbed something ready and went to do the work. The barrier filtered junk on the way in and kept me out of the rabbit holes on the way out. The whole time I called it the enemy.

Claude Code removed the friction. And the brakes with it. Now nothing slows me down. Ideas die under instant testing. Self-improvement runs for the sake of self-improvement. AI did not remove the busywork. It removed the excuse. And behind the excuse, it turns out, no great version of me was waiting.

So now what

The funny part: by every metric I am more productive. Agents spinning everywhere, prototypes by evening, tools built to fit my hand, tests, gates. A perfect homemade sender that, for some reason, I never use. I have never gotten so much done.

There is just not much left to get done. The ideas are tested and buried, the dream right behind them, and all my energy pours into late-night platforms with one user.

I used to have an excuse: "once I clear the decks, then." Good excuse. Kept me warm for years. Claude Code took it. And it seems the only thing left for me to figure out, with its help, is how to put some friction back. Life used to install the brakes for me.

Seniors with a couple of decades in: was your friction load-bearing too? Or did losing it actually set you free?

Juniors: you started without the friction. Does any of this land, or does it read like a guy mad at his own productivity?

Claude Code is not a recursive agent. I read the source and checked.

Sergei Frangulov — Sun, 07 Jun 2026 11:19:36 +0000

A source map shipped in the v2.1.88 npm release: about 1,884 files under src/, original names and comments intact. So I walked the core modules and checked what everyone "knows" about how Claude Code works against what the code actually does.

Half of it was wrong. Including things I'd repeated myself.

I did not read it line by line. Nobody reads 1,884 files line by line. I walked the key modules and tied every claim to something concrete: a function, a constant. So you'll see names like queryLoop and AUTOCOMPACT_BUFFER_TOKENS below. Real identifiers, so every claim is checkable against the public teardowns, not vibes.

This is a map of how it works, not a dump of its guts. I don't quote internal prompts, and anything that only runs in Anthropic's internal builds is flagged as such.

Myth 1: "The agent recursively calls itself on every tool result"

The picture everyone has: model replies, tool runs, the agent calls itself again, deeper down the stack.

There's no recursion.

// src/query.ts
async function* queryLoop(state) {
  while (true) {
    // ...run model, run tools...
    state = { ...state }   // overwrite in place
    continue               // not a nested call
  }
}

One while (true) inside an async generator. Each pass overwrites a single State value and continues. The stack never grows deeper.

Why it matters: every budget, timeout and turn limit you set is counted per loop pass, not per stack frame. "One turn" is literally "one pass." Once I stopped picturing a recursive agent and started picturing a long-running stateful loop, the same tricks I'd use on any long loop applied: count the budget per step, watch what changed between steps.
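To make "count the budget per step" concrete, here is a minimal sketch of a stateful loop with limits checked once per pass. It is not Claude Code's code: LoopState, Limits, agentLoop and runToEnd are hypothetical names, the generator is synchronous for brevity, and the model and tool calls are replaced by a fixed cost per pass.

```typescript
// Hypothetical shapes for illustration; not Claude Code's real types.
interface LoopState {
  turn: number;         // passes completed so far
  tokensSpent: number;  // running total across passes
  pendingTools: number; // tool calls the (fake) model still wants to make
}

interface Limits {
  maxTurns: number;
  maxTokens: number;
}

type StopReason = "done" | "turn_limit" | "token_limit";

// One while (true); each pass replaces `state` and loops again.
// Limits are checked once per pass, so "one turn" is "one pass".
function* agentLoop(
  initial: LoopState,
  limits: Limits,
  costPerPass: number,
): Generator<LoopState, StopReason> {
  let state = initial;
  while (true) {
    if (state.pendingTools === 0) return "done";
    if (state.turn >= limits.maxTurns) return "turn_limit";
    if (state.tokensSpent + costPerPass > limits.maxTokens) return "token_limit";

    // A real loop would call the model and run tools here.
    state = {
      turn: state.turn + 1,
      tokensSpent: state.tokensSpent + costPerPass,
      pendingTools: state.pendingTools - 1,
    };
    yield state; // expose the state after each pass so a caller can diff it
  }
}

// Drive the generator to the end and collect the state after every pass.
function runToEnd(
  gen: Generator<LoopState, StopReason>,
): { passes: LoopState[]; reason: StopReason } {
  const passes: LoopState[] = [];
  while (true) {
    const step = gen.next();
    if (step.done) return { passes, reason: step.value };
    passes.push(step.value);
  }
}
```

With maxTurns set to 3 and five pending tool calls, the loop stops with "turn_limit" after exactly three passes, and the call stack is the same depth on the third pass as on the first.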

Myth 2: "When the context fills up, it just truncates"

This is the interesting one. Context isn't one "drop the old stuff" function. It's five mechanisms, ordered cheapest to most expensive:

snip  ->  microcompact  ->  context-collapse  ->  autocompact  ->  reactive

The order is deliberate. Each expensive stage sits behind the cheaper ones so that if a cheap stage has already freed enough space, the expensive one does nothing. The comment says it outright: run collapse before autocompact so autocompact often never fires.

Cheap stages drop old tool results, surgically. The expensive one, autocompact, makes a separate model call to summarize the whole history. It kicks in at the effective context window minus AUTOCOMPACT_BUFFER_TOKENS, a 13,000-token reserve for the summary itself.
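A rough sketch of the cheapest-first idea, under loud assumptions: Stage, relieve and the per-stage token numbers are hypothetical, and using one shared threshold for every stage is a simplification. Only the constant name and its 13,000 value come from the source described above.

```typescript
const AUTOCOMPACT_BUFFER_TOKENS = 13000; // value quoted in the article

// Autocompact trigger point: effective window minus the reserved buffer.
function autocompactThreshold(effectiveWindowTokens: number): number {
  return effectiveWindowTokens - AUTOCOMPACT_BUFFER_TOKENS;
}

interface Stage {
  name: string;
  // Hypothetical: how many tokens this stage frees given current usage.
  free: (usedTokens: number) => number;
}

// Run stages in the given (cheapest-first) order and stop as soon as
// usage is back under the threshold, so later stages never fire.
function relieve(
  usedTokens: number,
  threshold: number,
  stages: Stage[],
): { used: number; ran: string[] } {
  const ran: string[] = [];
  let used = usedTokens;
  for (const stage of stages) {
    if (used < threshold) break;
    used -= stage.free(used);
    ran.push(stage.name);
  }
  return { used, ran };
}

// Made-up numbers, only to show the short-circuit.
const exampleStages: Stage[] = [
  { name: "snip", free: () => 2000 },
  { name: "microcompact", free: () => 5000 },
  { name: "context-collapse", free: () => 20000 },
  { name: "autocompact", free: () => 100000 },
];
```

With a hypothetical 200,000-token effective window the threshold is 187,000. Starting from 190,000 used tokens, snip and microcompact bring usage to 183,000 and the two expensive stages never run.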

Here's what sent me digging. autocompact has a silent fuse:

const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
// after 3 failed compactions in a row, autocompact
// shuts off for the rest of the session. silently.

The comment explains why it's there: sessions were hitting 50+ consecutive failures, up to 3,272 in one session, burning roughly 250,000 extra API calls a day across all users.

Translation: a session that "felt fine for hours" could have spent part of that time running on surgical drops alone, with no real compaction, and the UI would never tell you.
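The fuse itself is a plain consecutive-failure counter. A minimal sketch, assuming a compaction attempt can be modeled as a function returning true or false; the AutocompactFuse class and its members are hypothetical, and only the constant name and value come from the source. The disabled getter is an addition that a UI could use to surface the state.

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3; // constant named in the article

class AutocompactFuse {
  private consecutiveFailures = 0;
  private tripped = false;

  // True once the fuse has blown; it stays blown for the whole session.
  get disabled(): boolean {
    return this.tripped;
  }

  // Runs `compact` unless the fuse has tripped.
  // Returns true only when a compaction actually ran and succeeded.
  attempt(compact: () => boolean): boolean {
    if (this.tripped) return false; // skipped, and the caller is not told why
    if (compact()) {
      this.consecutiveFailures = 0; // any success resets the streak
      return true;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
      this.tripped = true;
    }
    return false;
  }
}
```

Two failures followed by a success leave the fuse intact. Three failures in a row trip it, and after that even a compaction that would have worked is never attempted.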

Myth 3: "A full compaction always keeps the last few messages verbatim"

I was sure compaction kept the most recent messages word for word and only touched older ones. For a full autocompact, no.

On a full compaction the message array is rebuilt from scratch: a boundary marker, the summary, and a few files pulled back in. messagesToKeep is empty. The verbatim tail survives only in the other modes (partial, reactive, session-memory compaction), which carry a note that recent messages are kept as-is. Full mode doesn't.

The uncomfortable part: after a full compaction the model does not remember your last couple of messages word for word. It remembers a retelling of your conversation that it wrote itself.
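The difference between the two shapes can be sketched like this. Everything here is illustrative: the Msg type, the roles, the boundary text and both function names are assumptions, and only the empty messagesToKeep in full mode reflects what is described above.

```typescript
// Hypothetical message shape; roles and marker text are placeholders.
type Msg = { role: "system" | "user" | "assistant"; text: string };

const BOUNDARY: Msg = { role: "system", text: "[compact boundary]" };

// Full mode: the array is rebuilt from scratch, nothing is kept verbatim.
function fullCompaction(summary: string, restoredFiles: string[]): Msg[] {
  const messagesToKeep: Msg[] = []; // empty in full mode, per the article
  return [
    BOUNDARY,
    { role: "user", text: summary },
    ...restoredFiles.map((f): Msg => ({ role: "user", text: `[file] ${f}` })),
    ...messagesToKeep,
  ];
}

// Partial-style mode: a summary plus a verbatim tail of recent messages.
function partialCompaction(history: Msg[], summary: string, keepLast: number): Msg[] {
  // slice(-0) would return the whole array, hence the guard.
  const messagesToKeep = keepLast > 0 ? history.slice(-keepLast) : [];
  return [BOUNDARY, { role: "user", text: summary }, ...messagesToKeep];
}
```

Feed both functions the same history and search the output for the exact text of your last message: it is present in the partial result and absent from the full one.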

Myth 4: "Tools run after the model finishes talking"

No. The tool starts before the model is done with its sentence.

The moment a tool_use block shows up in the stream, StreamingToolExecutor starts it, unless you hit cancel in that split second. The model is still typing, and the edit on disk has already happened. Parallelism is capped by CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY (default 10): what's safe to run together goes in a batch, the rest queues.

One detail I lost half an hour to. The executor has its own child abort controller. If one parallel tool fails (say Bash), it instantly kills its siblings but does not abort the turn. The killed sibling gets something like "cancelled because a neighbor call failed." So you sit there wondering why a command that depended on nothing didn't run. It depended on a neighbor.
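The sibling-kill behavior is easier to see in a small model built on the standard AbortController. This is a sketch of the pattern, not the executor's code: runBatch, fakeTool and the result shape are hypothetical, and the timings are invented.

```typescript
type ToolStatus = "ok" | "failed" | "cancelled";
interface ToolResult { name: string; status: ToolStatus; detail: string }
interface Tool { name: string; run: (signal: AbortSignal) => Promise<string> }

// Runs a batch under its own child controller. A failing tool aborts its
// siblings through the child; the turn's own signal is never touched.
async function runBatch(tools: Tool[], turnSignal: AbortSignal): Promise<ToolResult[]> {
  const child = new AbortController();
  let failedTool: string | null = null;

  // A turn abort flows down to the batch; a batch abort never flows up.
  if (turnSignal.aborted) child.abort();
  else turnSignal.addEventListener("abort", () => child.abort(), { once: true });

  const runOne = async (tool: Tool): Promise<ToolResult> => {
    try {
      const out = await tool.run(child.signal);
      return { name: tool.name, status: "ok", detail: out };
    } catch (err) {
      if (child.signal.aborted) {
        const why = failedTool !== null ? `sibling ${failedTool} failed` : "the turn was aborted";
        return { name: tool.name, status: "cancelled", detail: `cancelled because ${why}` };
      }
      failedTool = tool.name;
      child.abort(); // first real failure: cancel the siblings only
      return { name: tool.name, status: "failed", detail: String(err) };
    }
  };
  return Promise.all(tools.map(runOne));
}

// Hypothetical tool that finishes (or fails) after `ms` unless aborted first.
function fakeTool(name: string, ms: number, fail = false): Tool {
  return {
    name,
    run: (signal) =>
      new Promise<string>((resolve, reject) => {
        if (signal.aborted) { reject(new Error("aborted")); return; }
        const timer = setTimeout(() => {
          if (fail) reject(new Error(`${name} exited non-zero`));
          else resolve(`${name} done`);
        }, ms);
        signal.addEventListener("abort", () => {
          clearTimeout(timer);
          reject(new Error("aborted"));
        }, { once: true });
      }),
  };
}
```

Run a fast grep, a failing bash and a slow read together: grep finishes, bash fails, and read comes back cancelled even though it depended on nothing. The turn's signal is still not aborted afterwards.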

Myth 5: "One model message is one reply, and stop_reason tells you if a tool was called"

Two facts that save hours of debugging.

First, the easy one: Claude Code sends a separate assistant message per block, not one for the whole reply. Text, thinking, tool call: each ships as its own message the moment it's finished.

Second, the nasty one. When a block closes, stop_reason is always null. The real value arrives later, as a separate event, and gets written in after the fact by editing the message that was already sent. The code is honest about it:

// stop_reason === 'tool_use' is unreliable

So the loop doesn't trust it. To decide whether a tool was called, it checks the fact: did a tool_use block arrive or not. If you've written a wrapper over a stream like this and hit races on stop_reason, now you know where they came from.
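If you maintain a wrapper of your own, checking the fact looks roughly like this. The types are a simplified, hypothetical model of one message per block, not the real wire format, and toolCallsIn and needsToolPass are illustrative names.

```typescript
// Simplified, hypothetical model of "one assistant message per block".
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; text: string }
  | { type: "tool_use"; id: string; name: string };

type ToolUse = Extract<Block, { type: "tool_use" }>;

interface AssistantMessage {
  block: Block;
  stop_reason: string | null; // null when the block closes; patched in later
}

// Collect tool calls by looking at the content that actually arrived.
function toolCallsIn(messages: AssistantMessage[]): ToolUse[] {
  const calls: ToolUse[] = [];
  for (const m of messages) {
    if (m.block.type === "tool_use") calls.push(m.block);
  }
  return calls;
}

// The loop needs another pass if any tool_use block arrived,
// whatever stop_reason says at that moment.
function needsToolPass(messages: AssistantMessage[]): boolean {
  return toolCallsIn(messages).length > 0;
}
```

A check on stop_reason alone would report "no tool call" for any message inspected before the late event lands, which is exactly the race described above.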

Myth 6: "Permissions are just a chain: user -> project -> local -> policy"

The layers exist, and it sounds logical: higher layer wins. I thought so too. The order isn't by source. It's by strictness.

deny beats everything, including bypassPermissions. Then, in decreasing strictness: targeted ask rules and safety checks, then the bypass itself, then allow rules, and only what nobody explicitly allowed finally reaches "ask the user." A denial always outranks a bypass.

The bigger surprise is the zones the bypass doesn't pierce at all. Even under bypassPermissions (which supposedly means allow everything, don't ask), edits to .git/, .claude/, .vscode/ and shell config files still hit a confirmation. The logic is simple: let the agent edit its own settings unprompted and it writes itself a pass out of the permission sandbox. Defaults lean the same way: until a tool declares it's read-only and safe to parallelize, the system assumes it writes and can't be parallelized.
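The ordering reads cleanly as a decision function. A minimal sketch under assumptions: Request, Rules and decide are hypothetical names, the shell config file list is a guess at what "shell config files" covers, and the protected-path check is applied to any request that carries a path rather than to edits only.

```typescript
type Decision = "deny" | "ask" | "allow";

interface Request {
  tool: string;
  path?: string;
}

interface Rules {
  deny: (r: Request) => boolean;
  ask: (r: Request) => boolean;
  allow: (r: Request) => boolean;
  bypassPermissions: boolean;
}

// Directories named in the article.
const PROTECTED_PREFIXES = [".git/", ".claude/", ".vscode/"];
// Hypothetical examples of shell config files; the real list is not quoted.
const SHELL_CONFIG_FILES = [".bashrc", ".zshrc", ".profile"];

function isProtected(path: string | undefined): boolean {
  if (path === undefined) return false;
  return (
    PROTECTED_PREFIXES.some((prefix) => path.startsWith(prefix)) ||
    SHELL_CONFIG_FILES.some((file) => file === path)
  );
}

// Ordered by strictness, not by where the rule came from.
function decide(req: Request, rules: Rules): Decision {
  if (rules.deny(req)) return "deny";                        // 1. deny beats everything
  if (rules.ask(req) || isProtected(req.path)) return "ask"; // 2. targeted ask rules, safety checks
  if (rules.bypassPermissions) return "allow";               // 3. the bypass itself
  if (rules.allow(req)) return "allow";                      // 4. allow rules
  return "ask";                                              // 5. nobody allowed it: ask the user
}
```

Because the bypass sits third, a deny rule and a protected path both win over it, which matches the two surprises above.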

Myth 7: "A subagent is just another Claude running next to you"

A subagent is an isolated fork, and the isolation is harder than it looks. It gets its own agentId, its own copy of the read-files list, and empty memory.

And the part that bites. A normal subagent's setAppState is empty by default, so it can't change application state. And because it can't, it's immediately handed the "don't ask" permission flag, and the rest follows on its own: a background subagent physically can't show a permission dialog, so any ask it makes silently turns into deny. Hand a subagent a task that hits a confirmation and it won't wait for you. It gets a no and drives on, as if you'd said no yourself.

One more: in the public build a subagent can't spawn subagents. Multi-level agents live only in Anthropic's internal builds. For everyone else the hierarchy is flat.
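A toy model of that fork, to show why an ask turns into a deny. AgentContext, forkSubagent, resolveDecision and the id scheme are all hypothetical; the sketch only encodes the properties listed above: own id, copied read-files list, empty memory, no dialog, no nested spawning.

```typescript
type Decision = "deny" | "ask" | "allow";

interface AgentContext {
  agentId: string;
  readFiles: Set<string>;     // files this agent has already read
  memory: string[];           // starts empty in a fork
  canShowDialog: boolean;     // a background subagent cannot prompt
  canSpawnSubagents: boolean; // flat hierarchy in the public build
}

let nextId = 0; // hypothetical id scheme, for the sketch only

function forkSubagent(parent: AgentContext): AgentContext {
  nextId += 1;
  return {
    agentId: `${parent.agentId}/sub-${nextId}`,
    readFiles: new Set(parent.readFiles), // a copy, not a shared reference
    memory: [],
    canShowDialog: false,
    canSpawnSubagents: false,
  };
}

// An agent that cannot show a dialog cannot wait for an answer,
// so "ask" collapses to "deny" for it.
function resolveDecision(agent: AgentContext, decision: Decision): Decision {
  if (decision === "ask" && !agent.canShowDialog) return "deny";
  return decision;
}
```

The same "ask" that would pause the main agent and wait for you resolves to "deny" for the fork, and nothing the fork reads leaks back into the parent's read-files list.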

Myth 8: "Claude Code's extensions are a handful of standard hooks"

Five extension mechanisms: MCP servers, plugins, skills, hooks, and slash commands. The plugin is the odd one out, an umbrella you can stuff the other four under. Then come the numbers I was actually digging for.

There aren't five "canonical" hooks (SessionStart, PreToolUse, PostToolUse, Stop, UserPromptSubmit). There are 28, including events for teammate collaboration, tasks, working-directory changes, and file changes. And the hook contract isn't "non-zero means error." There are three outcomes:

exit 0   ->  fine, proceed
exit 2   ->  hard block: action cancelled, model told why
other    ->  soft error: stderr shown to you, session continues

The exit 2 case is why there's a slightly funny guard: before running a hook, it separately checks that the plugin folder even exists. Otherwise the hook would run python3 <missing file>.py, Python would die with code 2, and that hard block from one missing file would wedge Stop and UserPromptSubmit permanently. The session would never be able to end.
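The three-way contract and the guard fit in a few lines. A sketch with hypothetical names (HookOutcome, interpretHookExit, runHookGuarded); treating a missing plugin folder as "skip and proceed" is an assumption about what the guard does, not something quoted from the source.

```typescript
type HookOutcome =
  | { kind: "proceed" }
  | { kind: "block"; reason: string }        // action cancelled, model told why
  | { kind: "soft_error"; message: string }; // stderr shown, session continues

// Map a hook's exit code to one of the three outcomes.
function interpretHookExit(exitCode: number, stderr: string): HookOutcome {
  if (exitCode === 0) return { kind: "proceed" };
  if (exitCode === 2) return { kind: "block", reason: stderr };
  return { kind: "soft_error", message: stderr };
}

// Guard: do not even start the hook if its plugin folder is gone.
// Assumption for this sketch: a missing folder means "skip and proceed".
function runHookGuarded(
  pluginDirExists: boolean,
  exec: () => { exitCode: number; stderr: string },
): HookOutcome {
  if (!pluginDirExists) return { kind: "proceed" };
  const { exitCode, stderr } = exec();
  return interpretHookExit(exitCode, stderr);
}
```

Without the guard, an interpreter that exits with 2 on a missing script is indistinguishable from a hook that deliberately blocked the action.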

Skills are their own story, and one constant explains all of it:

const SKILL_BUDGET_CONTEXT_PERCENT = 0.01  // 1% of context for the whole skill list

Only the header fits in that 1% (name, description, trigger). The SKILL.md body loads only when the skill is actually called. That's why you can pile on dozens of skills and barely pay for them in context: until called, they're effectively not there.
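The arithmetic behind that 1% is worth doing once. In this sketch the constant comes from the source, but everything else is assumed: the 4-characters-per-token estimate, the header format, and the policy of dropping headers that no longer fit are all illustrative.

```typescript
const SKILL_BUDGET_CONTEXT_PERCENT = 0.01; // constant quoted in the article

interface SkillHeader {
  name: string;
  description: string;
  trigger: string;
}

// Token budget for the whole skill list at a given context size.
function skillListBudget(contextWindowTokens: number): number {
  return Math.floor(contextWindowTokens * SKILL_BUDGET_CONTEXT_PERCENT);
}

// Crude estimate (about 4 characters per token); an assumption for the sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep headers in order until the budget is used up. What the real client
// does on overflow is not covered here; stopping is this sketch's choice.
function headersThatFit(headers: SkillHeader[], budgetTokens: number): SkillHeader[] {
  const kept: SkillHeader[] = [];
  let used = 0;
  for (const h of headers) {
    const cost = estimateTokens(`${h.name}: ${h.description} (${h.trigger})`);
    if (used + cost > budgetTokens) break;
    used += cost;
    kept.push(h);
  }
  return kept;
}
```

For a hypothetical 200,000-token context the whole list gets 2,000 tokens, so a hundred installed skills would average 20 tokens of header each before anything is called.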

And the small thing that kills the isolation illusion. The MCP tool "namespace" is a fancy word for a plain string prefix, mcp__server__tool. The server name and tool name are glued together, anything that isn't a letter, digit, _ or - becomes an underscore, and permissions are handed out by that glued string. There's no real isolation behind the "namespace."
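The gluing rule is simple enough to reproduce. A sketch assuming "letter" and "digit" mean ASCII; normalizeNamePart and mcpToolName are hypothetical names, and only the mcp__server__tool shape and the replacement rule come from the description above.

```typescript
// Anything that is not an ASCII letter, digit, underscore or hyphen
// becomes an underscore. ASCII-only is an assumption of this sketch.
function normalizeNamePart(part: string): string {
  return part.replace(/[^a-zA-Z0-9_-]/g, "_");
}

// Glue server and tool into the flat string that permissions key on.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${normalizeNamePart(server)}__${normalizeNamePart(tool)}`;
}
```

Under this rule mcpToolName("my.server", "read") and mcpToolName("my server", "read") produce the same string, mcp__my_server__read. Two differently named servers can land on one permission key, which is what "no real isolation" means in practice.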

One last thing, and I don't like it

I assumed the "do you trust this folder?" prompt was the very first thing Claude Code does on startup. It isn't. That prompt shows up noticeably later. By then startup has already run a good chunk of code (about a thousand lines, by the source), right next to an honest comment that security here is delicate.

The delicate bit: .claude/settings.json has already been read by that point, and it lives in the same folder you haven't trusted yet.

So settings from an untrusted folder get to influence Claude Code before you've said yes. It's not quite a hole: the most sensitive modes double-check against the trust flag. But it sits wrong with me. I'd still rather know it than not.

What I actually changed

I count budgets and timeouts per loop pass now, not per abstract "agent call" (Myth 1).
I stopped believing the model remembers my last messages verbatim. After a full compaction it's working from a retelling it wrote itself (Myth 3).
I'm careful with background subagents. Since any confirmation they hit becomes a deny, I don't hand them tasks where a prompt is even possible (Myth 7).
On compaction, I just try not to reach it. It's cheaper to clear context one extra time.

None of this makes Claude Code worse. The opposite: behind almost every oddity in the code is an incident that happened, or a guard against one. You can read it in the comments. The picture most of us carry (mine included, until last week) is just drawn at altitude.

If you build on Claude Code or the Anthropic API: which of these would have saved you a debugging session?

If you just use Claude Code day to day: which one rewrites how you'll drive it tomorrow?

And if you dug into the leaked source yourself: what did I get wrong?

Vibecoding in unskilled hands: 11 ways it quietly breaks

Sergei Frangulov — Mon, 01 Jun 2026 05:27:35 +0000

You can get a working demo out of an AI coding agent in an hour. That first hour is the trap.

The speed is real. A prototype or a small script comes together in front of you, and it is easy to believe the whole project will go like that. It will not. Most vibecoding failures get blamed on the model. In my experience few of them are the model's fault. The bottleneck is almost always the person driving it, and the bill arrives later, on the long distance, where it is expensive to undo.

Here are eleven places I keep watching it break, and what actually causes each one.

1. The short distance lies

The first hour is genuine productivity. The curve flips after that. What sped you up early starts to slow you down: duplicates pile up, earlier decisions quietly contradict each other, and there is no single architecture holding it together. "Almost done" turns into months of patching loosely connected code. The beginner reads the easy start as a property of the whole road, and plans nothing for the tenth iteration or for coherence over time.

2. The visible success of AI projects is mostly a bubble

Trending repositories and viral wrappers create the impression that everything works by itself. When I scanned GitHub trending in mid-2026, several agent repos had pulled hundreds of thousands of stars in seven or eight months: one skills framework near 202k, one open coding agent near 164k. That is faster than almost any historical open-source growth, and a large share of it is inflated: it comes from marketing, benchmark-maxxed READMEs, and trending-as-a-service badges, not from working software or organic demand. A small, well-packaged project earns a few thousand stars the honest way while the giants farm hundreds of thousands. A star is a vanity metric. Beginners calibrate against this storefront and conclude they are the ones doing it wrong.

3. The model has no standing picture of your project

Each run sees a limited context window, and that window gets actively trimmed to fit a budget. Some tools prune idle context by design, without telling you. So the model is sharp inside a tight, well-scoped task and loses the thread on a large one: it forgets earlier decisions, contradicts code it just wrote, and over-builds. Holding the whole system in your head is still a human job. At minimum you have to keep architecture docs current, which is its own discipline, and most people skip it.

4. Garbage in, garbage out, and you pay per token for it

Weak input is the main source of bad output: a vague spec, no codebase context, no examples, no acceptance criteria. The model is not telepathic. It fills the gaps with the most probable answer, not the one you needed. The beginner opens a chat and types a wish instead of a brief, then spends the afternoon arguing with the result. Skilled operators spend most of their time here, before the first prompt.

5. One run is one sample, not a verdict

Because models are tuned on human preference, they lean toward the most typical, average answer. Research on alignment (Kirk et al., ICLR 2024) found that RLHF measurably reduces the diversity of outputs for a given prompt. So a single response is one draw from a distribution that has already collapsed toward the median. It is not the best answer, and not the only correct one. Without a precise process you get the internet average instead of an engineering call for your context. Asking for several options and picking one helps, but only when there is somebody qualified to pick.

6. Reasoning depth is a dial, not a fixed trait

"The model is lazy" is usually a misread. On current models, effort is a setting and depth follows from how you frame the task. The old habit of prompting "don't be lazy, be thorough" is now an anti-pattern: vendors warn that capable models over-trigger on it. The real skill is knowing when to turn effort up. Push it to maximum in the wrong place and you buy overthinking and a worse answer. The beginner never touches the dial and takes the first shallow response as the ceiling of what the model can do.

7. It is a capable executor without a global view, not an autonomous engineer

A useful calibration is by seniority. As a junior it is overpriced: you pay top-model rates for intern-level autonomy and still check every line. As a mid it is excellent on a well-scoped task. As a senior or architect it does not hold system coherence or judgment, and it cannot tell you what not to build. The beginner delegates exactly the part it cannot do.

Buying autonomy off the shelf is no shortcut either. We run a multi-agent orchestration tool called gastown. The author loves westerns, so the agents are named mayor, deacon, convoy, hounds, raccoons. It took two weeks of spare evenings to half-integrate it into our pipeline, and even then not for every task. Simple tools are not really autonomous. Capable ones cost you weeks of setup.

8. Memory is primitive and needs manual management

The assistant loses context between sessions and on resume, and it drops things on purpose to fit the window. This is not a guess. One popular coding agent's own postmortem admitted that it had quietly trimmed reasoning in idle sessions to save cost, shipped without a changelog, and that a hidden instruction to answer in under twenty-five words had measurably lowered output quality. Bloated instruction layers degrade output on their own, and people keep piling them on. If you do not understand how the memory behaves, you re-explain the same context every session and wonder why yesterday it understood and today it does not.

9. The setup around the model is a separate skill

The value comes from the layer you build around the model: project instructions, ready-made skills, hooks, tools, rules, context. Beginners work straight out of the box, never notice that layer exists, and blame the model for the result. This is applied knowledge of how your own system fits together, and it has to be budgeted like any other engineering work. The license is the cheap part.

10. Your skills expire the moment you learn them

The operator's competence depreciates faster than the models change. A technique that worked last week is stale this week, and silent changes in model behavior keep your mental model out of date. Staying current is a daily habit, not a one-time milestone, and most people are not up for it. The only thing that works for me is reading the field every day. You can automate the gathering, but someone still has to filter the noise by hand.

11. The line keeps moving, but responsibility does not

The boundary between what the model can do and what you must do slides toward the model over time. Responsibility does not slide with it. The mechanics will keep getting absorbed, but intent, taste, and the decision of what not to build stay with the human, and no one can date when that changes. "Let's wait for AGI" is not a strategy. It is the excuse that produces unskilled hands, treating the model as the accountable author instead of a tool in your own hands.

The pattern

None of these is about the model being weak. Each is about a person not seeing the line between what the model does and what they still owe, and that line is masked by hype and kept moving by how fast the field changes. Plenty of people will tell you none of this is critical and it is all solvable. They are probably right. It is solvable... in skilled hands.

Treat all of the above as a snapshot of 2026, not a verdict for all time.

Which of these eleven hits your team hardest, and what did you actually do about it?

I installed 116 Claude Code skills. After 30 days I'd used 35

Sergei Frangulov — Mon, 18 May 2026 12:08:35 +0000

I kept installing Claude Code skills because it's one command and it's free. Every time I saw a skill that looked useful, I added it. No cost, no friction, so why not. After 30 days I parsed my own session logs, just out of curiosity. I had 116 skills installed. I had actually used 35. I never noticed the other 81.

The problem

Skill marketplaces grow fast and installing is easy. One command, no review, nothing to pay. So people keep installing and nobody checks later which skills they actually use. Nothing forces you to check.

This is not free. Every installed skill loads its metadata into the prompt at the start of every session. So dead skills cost tokens for nothing, every session. With a normal dependency you eventually hit a version conflict or a security warning and you have to look at it. An installed skill never makes you look. It just sits in the prompt until you go and count.

How I measured it

I parsed my local ~/.claude session logs. No network, just the JSONL files the client already writes. I put every skill into one of four buckets:

active — installed and used
dead — installed, used zero times in the window
missing — used, but no SKILL.md found for it
hallucinated — used, but the runtime errored, because Claude confused a tool or command name with a skill name
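The four buckets above can be sketched as one classification pass. This is not skill-graveyard's implementation: SkillCall, classify and the inputs are hypothetical, "installed" stands in for "a SKILL.md was found", and "hallucinated" is approximated as "not installed, errored, and the name matches a known built-in tool or command".

```typescript
type Bucket = "active" | "dead" | "missing" | "hallucinated";

// One skill invocation as it might be recovered from a session log.
interface SkillCall {
  name: string;
  errored: boolean;
}

function classify(
  installed: Set<string>,    // names with a SKILL.md on disk
  calls: SkillCall[],        // every skill call in the window
  builtinNames: Set<string>, // tool and command names that are not skills
): Map<string, Bucket> {
  const used = new Set<string>();
  const errored = new Set<string>();
  for (const call of calls) {
    used.add(call.name);
    if (call.errored) errored.add(call.name);
  }

  const result = new Map<string, Bucket>();
  installed.forEach((name) => {
    result.set(name, used.has(name) ? "active" : "dead");
  });
  used.forEach((name) => {
    if (installed.has(name)) return;
    const isHallucinated = errored.has(name) && builtinNames.has(name);
    result.set(name, isHallucinated ? "hallucinated" : "missing");
  });
  return result;
}

// Count how many names landed in a given bucket.
function countBucket(result: Map<string, Bucket>, bucket: Bucket): number {
  let n = 0;
  result.forEach((b) => { if (b === bucket) n += 1; });
  return n;
}
```

The used ratio is then countBucket(result, "active") divided by the number of installed skills.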

The tool that does the parsing is skill-graveyard (github.com/sfrangulov/skill-graveyard). Here is the summary it printed for my 30-day window:

[Screenshot: skill-graveyard summary for the 30-day window. Author's note: updated the screenshot, couldn't find the original one :)]

116 installed, 35 active. That is 30% used. 81 dead, 9 missing. Across 274 sessions there were 847 skill calls and 499 of them errored. That error rate is high, so I looked closer.

The unexpected part

The most interesting bucket was not the dead one. It was the hallucinated one. Claude tried to call 65 names that are not skills at all: bash, read, agent, and 62 more. That is 493 failed calls, because the model used a tool or command name as if it were a skill name.

Here is the thing. bash, read, agent are real. They are real things the Claude Code runtime can do. But they are not skills. The runtime runs them through a different path: built-in tools and slash-style commands, not the skill loader. So the model wants a real capability, but it asks for it the wrong way. The skill loader fails, because that name is not a skill, even though bash works fine somewhere else.

So this bucket is not just junk. It is a signal. It shows how often the model mixes up the runtime's different paths. Dead skills tell you what to uninstall. This tells you where the model's idea of its own tools is wrong. I expected a cleanup report and instead found a debugging lead.

Try it on your own logs

If you use Claude Code, you can get the same four-bucket breakdown for your own setup. Run npx skill-graveyard. It runs locally, makes no network calls, and only reads your ~/.claude logs.

Run it and reply with your installed->active ratio. I want to know if 30% is normal or if I just install too much.

I did not decide to use only 35 of 116. It happened one easy install at a time, and I only saw it because I read the logs.

(The same four-bucket idea also exists for MCP servers via npx mcp-graveyard and project memory via npx memory-graveyard.)