<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrew Shu</title>
    <description>The latest articles on DEV Community by Andrew Shu (@0xandrewshu).</description>
    <link>https://dev.to/0xandrewshu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3795591%2Fc6636bb4-a665-4faa-affc-56f2e4c9adce.jpg</url>
      <title>DEV Community: Andrew Shu</title>
      <link>https://dev.to/0xandrewshu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/0xandrewshu"/>
    <language>en</language>
    <item>
      <title>Vibe Coding in Production: What's Holding Us Back?</title>
      <dc:creator>Andrew Shu</dc:creator>
      <pubDate>Tue, 07 Apr 2026 16:07:05 +0000</pubDate>
      <link>https://dev.to/0xandrewshu/vibe-coding-in-production-whats-holding-us-back-5kh</link>
      <guid>https://dev.to/0xandrewshu/vibe-coding-in-production-whats-holding-us-back-5kh</guid>
<description>&lt;p&gt;Vibe coding techniques need to be adapted when you work on production applications with AI. I walk through some challenges I hit and the solutions I've found helpful on real projects.&lt;/p&gt;

&lt;p&gt;I'm going to share some experiences from a few months ago, when I expanded the scope of my agent use from vibe-coded apps to real-world problems in production.&lt;/p&gt;

&lt;p&gt;I had been coding with AI agents for a while: greenfield scripts, prototypes, and features I could build and throw away. Early in this experimentation, I set my sights on building tools and practices for safely using AI in production. I knew I would have to maintain and operate the code I developed. So as I explored AI by building isolated, greenfield code, I made mental notes of which techniques wouldn't survive production and which ones I could bring to real infrastructure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There are numerous articles and posts describing techniques for vibe coding well. But there isn't enough documentation on customizing your repository and tooling so that AI agents work well on real infrastructure and workloads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Daghan Altas, a former Cisco Meraki colleague, &lt;a href="https://www.linkedin.com/posts/daghanaltas_we-were-promised-a-10x-ai-productivity-boost-share-7438596222082453504-dMf1" rel="noopener noreferrer"&gt;phrased it well&lt;/a&gt;: what's the point of a 10x productivity boost if you can't operate and maintain the thing you built any faster? That reframed the question for me. Not "is AI fast?"; obviously it's fast. But: &lt;strong&gt;what specifically is holding me back?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what I ran into when I applied vibe coding techniques to production infrastructure and workloads, and how I've updated my configurations and techniques to address these issues.&lt;/p&gt;

&lt;h2&gt;AI codes quickly, but what about troubleshooting and testing?&lt;/h2&gt;

&lt;p&gt;AI is great at implementation, but there's so much surrounding the act of writing code.&lt;/p&gt;

&lt;p&gt;Here's a concrete example: I was building a prototype that needed to normalize messy data from multiple API and database sources. The numbers kept being wrong. I pointed Claude Code at the problem, and it churned for an hour, trying different parsing strategies, refactoring the aggregation logic, adding fallback handlers.&lt;/p&gt;

&lt;p&gt;The fix turned out to be surprisingly simple: the logging wasn't capturing everything it needed to. The agent was trusting the logs at face value and never questioned whether the data was complete. An hour of sophisticated troubleshooting on a problem that needed five minutes of "wait, do we have enough logs to capture the symptom of the problem?"&lt;/p&gt;

&lt;p&gt;The same dynamic plays out with writing unit tests: the agent produces tests quickly, but volume isn't the same as meaningful coverage. So I needed to think more broadly than implementation speed.&lt;/p&gt;

&lt;p&gt;Implementation is often not the bottleneck. The bottleneck is everything around it: verifying the code does what you think it does, troubleshooting when it doesn't, understanding what already exists so you don't reinvent it, and making sure the architecture holds up next month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjufvt96eae9lkare93dh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjufvt96eae9lkare93dh.png" alt="I found that AI wrote low value unit tests, and had difficulty troubleshooting in production." width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I've started doing:&lt;/strong&gt; I built subagents for the two patterns that burned the most time when I was catching and fixing AI-written bugs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, a "Skeptical Testing Subagent" that scrutinizes test suites: checking for duplicate and meaningless tests, and flagging assertions that don't actually prove anything.&lt;/li&gt;
&lt;li&gt;Second, a "Skeptical Troubleshooting Subagent" that examines production logs and data integrity before jumping into code changes. Both are early, but they've already caught things I would have missed.&lt;/li&gt;
&lt;/ul&gt;
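&lt;p&gt;As a toy illustration of the kind of check the testing subagent makes, here's a hypothetical heuristic (not my actual subagent, which is a prompt rather than code) that flags assertion-free and tautological tests in a Jest-style test file:&lt;/p&gt;

```typescript
// Hypothetical heuristic sketch: flag low-value tests in a Jest/Vitest-style file.
// A real subagent reasons about semantics; this only catches the crudest patterns.

interface TestIssue {
  test: string;
  problem: string;
}

export function reviewTests(source: string): TestIssue[] {
  const issues: TestIssue[] = [];
  // Split the file into individual it("...") / test("...") blocks.
  const blocks = source.split(/(?=\b(?:it|test)\s*\()/).slice(1);
  for (const block of blocks) {
    const name = block.match(/["'`](.+?)["'`]/)?.[1] ?? "(unnamed)";
    if (!/\bexpect\s*\(/.test(block)) {
      issues.push({ test: name, problem: "no assertions" });
    }
    if (/expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/.test(block)) {
      issues.push({ test: name, problem: "tautological assertion" });
    }
  }
  return issues;
}
```

&lt;p&gt;A subagent can do this kind of cheap triage first, then spend its reasoning budget scrutinizing the tests that survive.&lt;/p&gt;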

&lt;p&gt;When I say "skeptical", you can read it as "adversarial", the term the AI community uses more frequently. People have talked about using &lt;a href="https://asdlc.io/patterns/adversarial-code-review/" rel="noopener noreferrer"&gt;"adversarial agents" to review code&lt;/a&gt;, and how these agents "&lt;a href="https://dev.to/marcosomma/adversarial-planning-for-spec-driven-development-4c3n"&gt;think differently&lt;/a&gt;" than an agent told to "write code". My testing and troubleshooting subagents apply that idea in a narrower context: the specific code review and production log review problems I've encountered.&lt;/p&gt;

&lt;h2&gt;Fear slows us down when we vibe code production apps&lt;/h2&gt;

&lt;p&gt;One of the things that accelerates vibe coding is accepting AI suggestions quickly (specifically, auto-approving the shell commands the agent wants to run). But many of those suggestions are commands run inside a shell with superadmin privileges and access to the internet. Even when the AI isn't doing anything malicious, I worry about a stray &lt;code&gt;rm -rf&lt;/code&gt; or a &lt;code&gt;drop database&lt;/code&gt; or a &lt;code&gt;terraform apply&lt;/code&gt; that destroys a folder, a Google Drive, an RDS instance, a DNS record. Nightmares abound.&lt;/p&gt;

&lt;p&gt;This isn't hypothetical: Alexey Grigorev &lt;a href="https://alexeyondata.substack.com/p/how-i-dropped-our-production-database" rel="noopener noreferrer"&gt;accidentally dropped his production RDS database&lt;/a&gt; while using AI tools and wrote up the full post-mortem. Amazon has called for &lt;a href="https://thenewstack.io/amazon-ai-assisted-errors/" rel="noopener noreferrer"&gt;new safeguards and review processes&lt;/a&gt; after AI-assisted errors in production. Research from Snyk has documented AI coding tools &lt;a href="https://snyk.io/articles/package-hallucinations/" rel="noopener noreferrer"&gt;hallucinating entire package names&lt;/a&gt; that don't exist, and attackers registering those packages to exploit the gap.&lt;/p&gt;

&lt;p&gt;Hallucinations in a local sandbox are an inconvenience. In production, they're a late night page and an embarrassing post-mortem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndtopid54lu6q01mjhdq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndtopid54lu6q01mjhdq.png" alt="Common engineering metaphor: adding safety rails helps increase confidence to move faster. Generated with Gemini." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I've started doing:&lt;/strong&gt; here are some examples of ways I've improved the safety of my work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instead of executing unsafe &lt;code&gt;bash&lt;/code&gt; commands, ask the AI to write a script you can review.&lt;/strong&gt; I find AI helpful for sysadmin/SRE work, but I have to monitor it closely — no background agents here. Watching commands scroll by is risky, so I ask the agent to write them into a script I can review first. As a bonus, I get a reusable script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract repeated database or log queries into scripts.&lt;/strong&gt; When I was troubleshooting customer issues, I often ran a few Postgres queries or fetched logs for particular Lambdas. This was tedious, but I also didn't want AI running PG or AWS Lambda commands by itself. So I wrote scripts like &lt;code&gt;fetch_customer_events_pg.ts &amp;lt;customer-alias&amp;gt; &amp;lt;event-type&amp;gt;&lt;/code&gt; or &lt;code&gt;fetch_customer_logs.ts &amp;lt;customer-alias&amp;gt; --start &amp;lt;start_time&amp;gt; --end &amp;lt;end_time&amp;gt;&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In addition to scripts, you can do similar things by codifying repetitive tasks as Skills and Subagents.&lt;/strong&gt; Skills are available in &lt;a href="https://developers.openai.com/codex/skills" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;, &lt;a href="https://cursor.com/docs/skills" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, and &lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;. Subagents are also available in &lt;a href="https://developers.openai.com/codex/subagents" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;, &lt;a href="https://cursor.com/docs/subagents" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, and &lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
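&lt;p&gt;To make the second bullet concrete, here's a minimal sketch of the query-building core such a script might have. The table, columns, and allow-list are invented for illustration; the point is that the query shape is fixed and reviewed once, so the agent never gets to improvise SQL:&lt;/p&gt;

```typescript
// Hypothetical core of a script like fetch_customer_events_pg.ts.
// Table and column names are invented. The query is parameterized, read-only,
// and bounded: no writes, no DDL, no unbounded scans.

const ALLOWED_EVENT_TYPES = new Set(["login", "checkout", "error"]);

export function buildEventQuery(customerAlias: string, eventType: string) {
  if (!ALLOWED_EVENT_TYPES.has(eventType)) {
    throw new Error("unknown event type: " + eventType);
  }
  return {
    text:
      "SELECT id, event_type, payload, created_at " +
      "FROM customer_events " +
      "WHERE customer_alias = $1 AND event_type = $2 " +
      "ORDER BY created_at DESC LIMIT 200",
    values: [customerAlias, eventType],
  };
}
```

&lt;p&gt;A thin CLI wrapper can pass this object straight to node-postgres's &lt;code&gt;client.query()&lt;/code&gt; and print the rows.&lt;/p&gt;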

&lt;p&gt;These are some of the techniques I've used. I might elaborate on this topic in a future post. If you're interested in chatting about how I do this, DM me &lt;a href="https://www.linkedin.com/in/0xandrewshu/" rel="noopener noreferrer"&gt;on LinkedIn&lt;/a&gt; or &lt;a href="https://x.com/0xAndrewShu" rel="noopener noreferrer"&gt;on X&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;I ❤️ docs, but AI might just love them more (as "context")&lt;/h2&gt;

&lt;p&gt;I've hopped onto screen shares with other engineers where one of us will spot a hallucination go by mid-session. There's a brief moment of annoyance or concern, and then we keep going. Most of the time, we just let the agent continue. I've done it myself; we have more pressing tasks to finish. You see the wrong thing, you wince, and you move on because you're in flow.&lt;/p&gt;

&lt;p&gt;This is a bigger source of inefficiency than it appears. Not everyone realizes that many hallucination patterns can be fixed with better context. (By context, I'm basically talking about code, docs and additional markdown files.) And those who do know often haven't had the time to pay attention to how context is actually structured across their tools. There's a growing stack of context layers: &lt;code&gt;AGENTS.md&lt;/code&gt; files, skills, subagents, team-shared vs. individual context, memory architecture, code indexers, connections to databases and wikis and issue trackers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But here's the pattern I see most often: someone sets up an &lt;code&gt;AGENTS.md&lt;/code&gt; or a &lt;code&gt;.cursorrules&lt;/code&gt; file when they first adopt a tool, and then never touches it again. Six weeks later the agent is hallucinating patterns you deprecated a month ago, suggesting libraries you've already replaced. Or maybe your automatic &lt;a href="https://code.claude.com/docs/en/memory" rel="noopener noreferrer"&gt;memory.md&lt;/a&gt; is outdated and no longer reflects your code's reality. The agent's context drifts from reality a little more every week, and the hallucinations compound.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the source of a lot of churn. When the agent doesn't know what already exists in the codebase, it reinvents. When it doesn't know your architectural patterns, it improvises. When it doesn't know what you deprecated last sprint, it resurrects it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5kcy22nd9o8ilvjma6o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5kcy22nd9o8ilvjma6o.png" alt="Visualization of the numerous sources of context you can feed your AI." width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I've started doing:&lt;/strong&gt; I treat documentation as infrastructure. When agents hallucinate or reinvent something that already exists, I update the docs so it doesn't happen again. I use MCP servers to push context to my knowledge base. I run Claude Code's &lt;code&gt;/context&lt;/code&gt; command mid-session to see how the 200K token window is being consumed, and it often exposes wasteful allocation I wouldn't have caught otherwise. It's a small amount of effort that compounds over time. If you're going to obsess over something, context hygiene has the best return on neuroticism I've found so far.&lt;/p&gt;

&lt;p&gt;Another technique I use is to keep a &lt;code&gt;plans/&lt;/code&gt; folder and a &lt;code&gt;docs/&lt;/code&gt; folder for architecture decisions and system patterns that agents should know before generating. &lt;a href="https://www.ashu.co/markdown-plan-files-vibe-coding/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Markdown plan files&lt;/a&gt; are still the single best thing I've done for my workflow, and the docs folder is a great supplement. Recently, Andrej Karpathy posted about the importance of "&lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;LLM Knowledge Bases&lt;/a&gt;". I also use Obsidian in a similar way, but I find in-repo docs more pragmatic for keeping context closer to the code.&lt;/p&gt;

&lt;p&gt;You can also layer custom instructions onto &lt;a href="https://developers.openai.com/codex/guides/agents-md" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;, &lt;a href="https://cursor.com/docs/rules#user-rules" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, and &lt;a href="https://code.claude.com/docs/en/memory#choose-where-to-put-claude-md-files" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; to tune your harness's behavior beyond what your team has configured.&lt;/p&gt;

&lt;h2&gt;AI token anxiety is real: what if I run out of budget?&lt;/h2&gt;

&lt;p&gt;To borrow a friend's metaphor: inference tokens are like Spice in &lt;em&gt;Dune&lt;/em&gt;. They're a substance that augments your abilities, that once taken you can't live without, and a scarce resource that requires extraordinary effort to accumulate.&lt;/p&gt;

&lt;p&gt;I heard this from a tech lead at a large enterprise. There are technically token budgets per engineer, but they're not being enforced; the current objective is to increase adoption, so for now it's a non-issue. But this engineer worried about what happens when the budgets do get enforced.&lt;/p&gt;

&lt;p&gt;The anxiety comes from multiple directions. There's the worry about rationing: how do you make sure you have enough tokens to hit your deadlines? And if an inference provider goes down mid-sprint, you're stuck without tokens or scrambling to switch to an unconfigured, unfamiliar tool.&lt;/p&gt;

&lt;p&gt;Then we get to the opacity of pricing. I &lt;a href="https://www.linkedin.com/posts/0xandrewshu_fascinating-saturday-i-measured-that-activity-7439405612087635968-AseS" rel="noopener noreferrer"&gt;measured my Claude Code sessions&lt;/a&gt; over a week and found that 2 out of 5 sessions burned tokens at 2x the normal rate, with no obvious change in my behavior. In Theo's &lt;a href="https://youtu.be/j_kJNYLI6Tw?si=WI1b4l7SbO7Ondlt" rel="noopener noreferrer"&gt;YouTube video&lt;/a&gt; on Claude Code's recent (March 2026) capacity reduction, his conclusion is the same as mine.&lt;/p&gt;

&lt;p&gt;Beyond capacity allowances, a feature change from the AI labs can silently double your token costs. There's a pattern emerging across providers in early 2026: models getting more verbose, spawning sub-agents for simple tasks. And nobody has a baseline to tell whether the number of &lt;a href="https://www.ashu.co/claude-code-vs-cursor-pricing/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;AI agent-hours&lt;/a&gt; they can use per month was reduced by 10% or by 50%.&lt;/p&gt;

&lt;p&gt;I've written about this a lot, but I don't think we need to over-rotate on token reduction. I've found it helpful just to learn how token limits are enforced and how my tokens are actually being spent; that alone keeps me mindful of costs. The first thing I'd recommend is to &lt;a href="https://www.ashu.co/claude-code-vs-cursor-pricing/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;understand the tools' pricing structure&lt;/a&gt; and &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;learn how they enforce token limits&lt;/a&gt; so you can make the most of the tokens they provide (and subsidize). There are also open source tools like &lt;a href="https://ccusage.com/" rel="noopener noreferrer"&gt;ccusage&lt;/a&gt; that track token usage. You can also try vibe coding your own!&lt;/p&gt;
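&lt;p&gt;If you do vibe code your own tracker, the core of it is just aggregation. A minimal sketch, assuming (hypothetically) that you can export usage as JSON lines with &lt;code&gt;session&lt;/code&gt;, &lt;code&gt;input_tokens&lt;/code&gt;, and &lt;code&gt;output_tokens&lt;/code&gt; fields; adapt the field names to whatever your tool actually logs:&lt;/p&gt;

```typescript
// Minimal per-session token tally. The record shape below is hypothetical,
// not any specific tool's real log format.

interface UsageRecord {
  session: string;
  input_tokens: number;
  output_tokens: number;
}

export function tallyBySession(jsonl: string) {
  const totals: { [session: string]: number } = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines in the export
    const rec = JSON.parse(line) as UsageRecord;
    totals[rec.session] =
      (totals[rec.session] ?? 0) + rec.input_tokens + rec.output_tokens;
  }
  return totals;
}
```

&lt;p&gt;Run it daily and you get exactly the baseline that makes silent capacity changes visible.&lt;/p&gt;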

&lt;h2&gt;Questions I ask myself to improve my use of AI agents in production&lt;/h2&gt;

&lt;p&gt;I've found a number of techniques that have improved my workflow, but I still have so many open questions! Thinking about these questions helps me find where real improvements can be made, and they don't require adopting any new tools. I'll share them with you, and hopefully they help you reflect on your own engineering work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn3jis3afxg6bbfc9e1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn3jis3afxg6bbfc9e1n.png" alt="Summary of how I think about using AI on real production workloads; problems, symptoms and solutions." width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are my productivity gains real or not? What did I do in the last week with AI?&lt;/strong&gt; This is a question I ask myself regularly, because productivity gains may feel great but turn out to be an illusion. METR ran a &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;randomized controlled trial&lt;/a&gt; where experienced developers were 19% slower with AI tools, while believing they were 20% faster. The study was published in July 2025, before the major model improvements in November 2025. Nonetheless, that perception gap is a reminder that intuition alone isn't enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where am I losing time? How do I increase AI's autonomy?&lt;/strong&gt; This goes to my work around adversarial agents for scrutinizing code, tests, and logs. I've found that AI often churns out meaningless work, or takes shortcuts. These are signs the agent isn't truly autonomous, so I need to troubleshoot how to increase the autonomy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much does fear cost me?&lt;/strong&gt; I monitor what my agents are doing more than I probably need to. Are my colleagues even familiar with which bash commands are risky? How much collective time gets lost to hovering, second-guessing, or just not knowing whether it's safe to let the agent run? Reducing risk here feels like unlocked velocity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is my process to reflect on and improve the effectiveness of my agents?&lt;/strong&gt; Right now, most of us are vibe coding &lt;em&gt;and&lt;/em&gt; vibe evaluating. We finish a session, we feel like it went well or it didn't, and we move on. I think there's value in building a habit of structured reflection: what worked, what didn't, what would I change? And in sharing those reflections across a team, not just keeping them in your own head. There's something from Toyota's production system and from Agile retrospectives that applies here: the discipline of continuous reflection and improvement.&lt;/p&gt;

&lt;h2&gt;Vibe coding deserves more than vibe evaluation&lt;/h2&gt;

&lt;p&gt;The FOMO around AI coding skills is real. There are new tools every week, new techniques, new claims about what's possible. Most of us are figuring it out on hunches: not fully able to keep up, not clicking into AI news articles to read them in full, not totally understanding the tradeoffs, but feeling the paradigm shift happening underneath us.&lt;/p&gt;

&lt;p&gt;I think that's fine because we're early in the technology adoption. But I also think we can do better than vibes. The engineers I see getting the most value aren't the ones with the most expensive tools or the most aggressive token spend. They're the ones building habits of honest reflection: what did I ship versus what did I generate? Where did I invest versus where did I waste time? What would I do differently next session?&lt;/p&gt;

&lt;p&gt;Everything I've talked about here is from the engineer's perspective: what I can see, what I can measure, what I can control. But I've been having conversations with engineering managers too, and they're wrestling with a different version of the same question: how do you know your &lt;em&gt;team's&lt;/em&gt; AI investment is paying off when you can't see inside any of these tools? That's a different problem with different constraints. More on that soon.&lt;/p&gt;

&lt;p&gt;What are you working through? What's the question you keep asking yourself about your AI workflow? I'd genuinely like to hear: if you're wrestling with the same things, &lt;a href="https://www.linkedin.com/in/0xandrewshu/" rel="noopener noreferrer"&gt;reach out on LinkedIn&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.ashu.co/taking-vibe-coded-into-production/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;ashu.co&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Is Claude Code 5x Cheaper Than Cursor? I Ran 12 Experiments to Find Out</title>
      <dc:creator>Andrew Shu</dc:creator>
      <pubDate>Tue, 31 Mar 2026 16:52:02 +0000</pubDate>
      <link>https://dev.to/0xandrewshu/is-claude-code-5x-cheaper-than-cursor-i-ran-12-experiments-to-find-out-315m</link>
      <guid>https://dev.to/0xandrewshu/is-claude-code-5x-cheaper-than-cursor-i-ran-12-experiments-to-find-out-315m</guid>
      <description>&lt;p&gt;In &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 1 of this series&lt;/a&gt;, I noticed something strange while using Claude Code's Max 20x plan: it was the same $200/month as Cursor Ultra, doing the same work, but my Claude Code utilization was stuck at 16% while I had been burning through Cursor's token budget. In &lt;a href="https://www.ashu.co/parallel-claude-code-agents/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt;, I figured out how to push past 50% utilization with parallel Claude Code agents.&lt;/p&gt;

&lt;p&gt;Given that I could use so many more Sonnet/Opus tokens on Claude Code, my first instinct was: "is Claude Code actually 5x cheaper than Cursor?"&lt;/p&gt;

&lt;p&gt;And then I realized you can't compare them apples to apples. I couldn't directly ask: &lt;em&gt;at the same price, how much token capacity does each tool actually give you?&lt;/em&gt; Their pricing models are enforced very differently (see &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;), and Cursor has two pools of tokens (API, and "Auto + Composer").&lt;/p&gt;

&lt;p&gt;So instead, I came up with a metric — "agent-hours" — to serve as a proxy: &lt;em&gt;given each plan's token capacity, how many hours of agents can I run per month?&lt;/em&gt;&lt;/p&gt;
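&lt;p&gt;The arithmetic behind the metric is deliberately simple: divide a plan's monthly token capacity by your measured token burn per agent-hour. A sketch with made-up numbers (both inputs vary by model, codebase, and workload, so these are illustrative, not my measured values):&lt;/p&gt;

```typescript
// Agent-hours = monthly token capacity / tokens burned per agent-hour.
// Both example inputs below are invented for illustration.

export function agentHours(
  monthlyTokenCapacity: number,
  tokensPerAgentHour: number
): number {
  return monthlyTokenCapacity / tokensPerAgentHour;
}

// e.g. a plan with 1.5B tokens/month at ~10M tokens per agent-hour:
// agentHours(1_500_000_000, 10_000_000) → 150 agent-hours/month
```

&lt;p&gt;The hard part isn't the division; it's measuring a stable tokens-per-agent-hour burn rate for your own workload.&lt;/p&gt;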

&lt;p&gt;I had some hunches, but I couldn't be sure they would hold up. So, I did what any engineer with too much curiosity would do: I designed an experiment to find out.&lt;/p&gt;

&lt;p&gt;A few key caveats before we dive in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is a loosely controlled experiment, not a rigorous benchmark. The findings are directional: order of magnitude, not precise. Readings fluctuated significantly day by day, and the product/capacity changed. But this reflects real life.&lt;/li&gt;
&lt;li&gt;I'm using Individual, not Team plans, focusing on $200/month tiers.&lt;/li&gt;
&lt;li&gt;Things change rapidly in the world of vibe coding token use, models, and costs. The 1M context window for Opus 4.6 dropped for Claude Code and then Cursor. Cursor dropped Composer 2.0, an upgrade from Composer 1.5. &lt;a href="https://x.com/trq212/status/2037254607001559305" rel="noopener noreferrer"&gt;Claude session limits were updated&lt;/a&gt; in between experiments. I normalized for differing "2x limits" promotions in &lt;a href="https://support.claude.com/en/articles/14063676-claude-march-2026-usage-promotion" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; and &lt;a href="https://github.com/openai/codex/discussions/11406" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To return to this article: my intuition suggested there was a notable difference in price, and I wanted to quantify it. I learned a considerable amount digging into pricing, and it helps me understand how to get the most out of the different models.&lt;/p&gt;

&lt;p&gt;I hope this token and tool pricing analysis helps (and interests) you as much as it did me. It's a long article, but given the volatility of the experiment, I figured it would help for me to show you all the messy details and how I think about it.&lt;/p&gt;

&lt;h2&gt;The headline: Claude Code delivers ~5x more capacity per dollar&lt;/h2&gt;

&lt;p&gt;Here's the summary. At $200/month on individual plans:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspe4iowdywoj3qew3sxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspe4iowdywoj3qew3sxh.png" alt="Graph comparing " width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool + Plan&lt;/th&gt;
&lt;th&gt;Agent-Hours / Month&lt;/th&gt;
&lt;th&gt;vs. Cursor Ultra&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Ultra ($200)&lt;/td&gt;
&lt;td&gt;~138 hours&lt;/td&gt;
&lt;td&gt;1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Pro ($200)&lt;/td&gt;
&lt;td&gt;~220 hours&lt;/td&gt;
&lt;td&gt;~1.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code Max 20x ($200)&lt;/td&gt;
&lt;td&gt;~678 hours&lt;/td&gt;
&lt;td&gt;~4.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
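&lt;p&gt;The "vs. Cursor Ultra" column is just each plan's agent-hours divided by the Cursor Ultra baseline, rounded to one decimal:&lt;/p&gt;

```typescript
// Reproduce the "vs. Cursor Ultra" column from the measured agent-hours.
export function vsBaseline(hours: number, baselineHours: number): number {
  return Math.round((hours / baselineHours) * 10) / 10;
}

// vsBaseline(220, 138) → 1.6   (Codex Pro)
// vsBaseline(678, 138) → 4.9   (Claude Code Max 20x)
```
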

&lt;p&gt;So at the same $200/month, Claude Code gives you ~5x more room to work than Cursor.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important context before we get further.&lt;/strong&gt; This measures &lt;em&gt;capacity per month&lt;/em&gt; (for my workload + codebase): how many agent-hours your subscription delivers if you use it fully. It does not measure work quality, code correctness, or features completed. You shouldn't read it as "5x cheaper" because that assumes you can actually use all that capacity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But this is too simplistic a view, because there are greater nuances to the pricing. We should next look at how Cursor's pricing works, because it makes the story considerably more interesting.&lt;/p&gt;

&lt;h2&gt;Cursor Ultra's pricing structure: two pools of different tokens&lt;/h2&gt;

&lt;p&gt;Before we go deeper into the comparison, we need to understand &lt;a href="https://cursor.com/docs/models-and-pricing" rel="noopener noreferrer"&gt;Cursor's pricing structure&lt;/a&gt;. Cursor Ultra doesn't give you one big pool of tokens. It gives you two, and they're dramatically different in size and model characteristics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The first pool is API credits&lt;/strong&gt;, which cover SOTA models: "state of the art" frontier models like Opus 4.6, Sonnet 4.6, and GPT-5.4 (at the time of publishing). These are usually the models scoring highest on benchmarks, and also the most expensive models available.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The second pool is "Auto+Composer" credits&lt;/strong&gt;, which cover Cursor's proprietary Composer models — faster, cheaper models that Cursor has built and optimized for code generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you upgrade to Ultra expecting unlimited access to the best models available, what you actually get is a small allocation of frontier model credits and a much larger allocation of Composer credits. Here's how the two pools break down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cursor Ultra Usage Pool&lt;/th&gt;
&lt;th&gt;Estimated Agent-Hours&lt;/th&gt;
&lt;th&gt;% of total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API credits (We use Opus 4.6, both 200k and 1M)&lt;/td&gt;
&lt;td&gt;~18 hours&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto + Composer credits&lt;/td&gt;
&lt;td&gt;~120 hours&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~138 hours&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: API agent-hours depend on &lt;a href="https://cursor.com/docs/models-and-pricing" rel="noopener noreferrer"&gt;the price of the model&lt;/a&gt; you choose. Opus 4.6 is one of the most expensive options; a cheaper SOTA model would stretch further.&lt;/p&gt;

&lt;p&gt;That ~18 agent-hours of frontier-model capacity is a key factor to consider when you use Cursor. When I ran experiments using only Opus 4.6 on Cursor, the API pool burned through fast. When I ran experiments using Composer models, the Composer pool lasted roughly 7–8x longer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;And this is a key finding: Cursor incentivizes you to spend most of your time using the faster Composer 2 model. This seems to be a deliberate design choice, and it's a reasonable one. The combined 5x headline reflects what happens when you use Composer for most of your work, which is how Cursor intends for you to use it. If you default to frontier models, the gap is far wider.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This explains a frustration I've seen across forums and from other engineers: you upgrade to Cursor Ultra and use SOTA models exclusively, only to find that you burn through your API credits much faster than expected.&lt;/p&gt;

&lt;p&gt;Let's see what this looks like in numbers. We strip out the generous "Auto + Composer" tier and exclusively use SOTA models. (Again, not the optimal use of Cursor.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibgi5beij5z6v1vywuj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibgi5beij5z6v1vywuj5.png" alt="Graph focusing on " width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool + Plan&lt;/th&gt;
&lt;th&gt;Agent-Hours / Month (SOTA only)&lt;/th&gt;
&lt;th&gt;vs. Cursor (SOTA)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Ultra — API only ($200)&lt;/td&gt;
&lt;td&gt;~18 hours&lt;/td&gt;
&lt;td&gt;1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Pro ($200)&lt;/td&gt;
&lt;td&gt;~220 hours&lt;/td&gt;
&lt;td&gt;~12x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code Max 20x ($200)&lt;/td&gt;
&lt;td&gt;~678 hours&lt;/td&gt;
&lt;td&gt;~38x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;That's a 38x difference in agent-hours (ignoring the vast amount of Composer 2 tokens that Cursor provides). For engineers exclusively focused on frontier model access for complex reasoning (Opus, GPT, Gemini) and comparing Claude Code to Cursor, this is the source of their surprise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even this is too simplistic; I think we need to dive deeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  But capacity isn't velocity: Composer 2 is genuinely fast
&lt;/h2&gt;

&lt;p&gt;Here's where the story gets more interesting than "Tool A gives you more." I tracked project completions across all 12 experiments, and the velocity data tells a different story than the capacity data.&lt;/p&gt;

&lt;p&gt;Here's how long the models took to complete Project 1, which involved a bulk rename across the project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9vj59t42i2wsvei9sp0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9vj59t42i2wsvei9sp0.png" alt="Average duration (minutes) to complete Project 1, a large refactor" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, let's look at Project 2, which involved cutting out a set of features:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzd8v0r9ihm41jfv52mo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzd8v0r9ihm41jfv52mo.png" alt="Average duration (minutes) to complete Project 2, another large refactor. Codex did not reach the end of Project 2 so is not present." width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I think the first 2 charts provide the strongest signal, because they compare 2 larger, more complex refactor projects with identical scope.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Caveat: I'm going to share the following chart even though it's flawed. After the first 2 large projects, I queued up many small, unevenly sized projects like "research X and then build a small full stack feature", so the counts aren't directly comparable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Nonetheless, I wanted to share the different feeling of speed as I worked with each model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybts3xsqnx4bwd2giumy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybts3xsqnx4bwd2giumy.png" alt="Graph showing average overall projects completed. Don't read numbers too literally — projects were unevenly sized. But it illustrates the feeling of velocity when using Composer 2." width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In all charts, the Composer models were at least 2x faster than the other models. Because Composer finished the first 2 larger projects sooner, it was able to race ahead through all the small projects at the end. If your workload mixes small and large projects, Composer's early lead may compound.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You might notice that the Opus 4.6 200k/1M models don't show a clear trend; the sample size was small, so the numbers are noisy.&lt;/p&gt;

&lt;p&gt;So, speed is another tradeoff when choosing tools. Claude Code may give you more capacity per dollar. But using Cursor Composer can dramatically increase throughput. If the work is clearly defined and implementation-focused, you may get more done in fewer agent-hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick aside about Codex + GPT 5.4
&lt;/h2&gt;

&lt;p&gt;If you're looking at Codex + GPT 5.4's velocity, you might notice that it didn't move as quickly. I wouldn't read too much into it. Each metric gives you a part of the picture; each tool has different strengths and weaknesses.&lt;/p&gt;

&lt;p&gt;First, I'm not as proficient with Codex's quirks as I am with Claude Code's, so I don't yet know how to squeeze the most juice out of it. During the experimental runs, I noticed GPT was much more cautious and spent more time slicing the work into different groups.&lt;/p&gt;

&lt;p&gt;And qualitatively, consider the &lt;a href="https://www.youtube.com/watch?v=HD5TWE8xD7o" rel="noopener noreferrer"&gt;multiple&lt;/a&gt; pieces of &lt;a href="https://x.com/mitchellh/status/2029348087538565612" rel="noopener noreferrer"&gt;anecdotal&lt;/a&gt; &lt;a href="https://developers.openai.com/community#:~:text=My%20new%20Sunday%20morning%20routine,%40youyuxi" rel="noopener noreferrer"&gt;evidence&lt;/a&gt; that Codex and GPT 5.4 can solve complex issues and that people are loving it. I've been hearing similar things in my conversations with colleagues. It's a potent tool and you should definitely give it a shot.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I tested and how
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The setup
&lt;/h3&gt;

&lt;p&gt;I ran all 12 experiments on the same codebase: a monorepo with Elixir/Phoenix, React, and Terraform infrastructure, roughly 80k lines of code. Every experiment started from the same git commit. I used 4 parallel agents per tool, each on a separate git worktree (the same setup I described in &lt;a href="https://www.ashu.co/parallel-claude-code-agents/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt;). Each agent worked through the same sequence of self-contained refactoring projects: rename all instances of X, extract a module, add an API integration.&lt;/p&gt;

&lt;p&gt;Each experiment ran roughly 60 minutes. I played a lightweight manager role — confirming "done" claims, assigning the next project. My controls tightened over the week as I learned what to watch for.&lt;/p&gt;

&lt;p&gt;If you're interested in the raw data, reach out &lt;a href="https://www.linkedin.com/in/0xandrewshu/" rel="noopener noreferrer"&gt;via LinkedIn&lt;/a&gt; or &lt;a href="https://x.com/0xAndrewShu" rel="noopener noreferrer"&gt;on X&lt;/a&gt;. If there's enough interest, I'd be happy to publish it on my Github.&lt;/p&gt;

&lt;h3&gt;
  
  
  The tool configurations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interface&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;CLI / Agent mode&lt;/td&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus 4.6 (200k, 1M context)&lt;/td&gt;
&lt;td&gt;Opus 4.6 / Composer 1.5 / Composer 2 (varied)&lt;/td&gt;
&lt;td&gt;GPT-5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan tested&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Max 5x ($100)&lt;/td&gt;
&lt;td&gt;Pro+ ($60) → Ultra ($200)&lt;/td&gt;
&lt;td&gt;Pro ($200)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Autonomy mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Accept edits on&lt;/td&gt;
&lt;td&gt;CLI with allow-listing (not YOLO)&lt;/td&gt;
&lt;td&gt;Runs commands without asking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few notes. I tested Claude Code on Max 5x ($100), not Max 20x ($200). The 20x projection uses Anthropic's published 4x multiplier — more on this in the calculations section. All three tools ran in semi-autonomous mode with different allow-listing behavior, which affects velocity asymmetrically and is unavoidable. Both Claude Code and Codex had active 2x capacity promotions during this period. Codex's promo applied 24/7. Claude Code's applied during specific off-peak hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I measured
&lt;/h3&gt;

&lt;p&gt;For Claude Code, I tracked the percentage of the 5-hour session consumed and the percentage of the weekly limit consumed. For Cursor, I tracked dollar amounts of API usage and Auto/Composer usage consumed, plus the combined total percentage. For Codex, I tracked the same session and weekly percentages as Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I calculated capacity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Defining "agent-hours"
&lt;/h3&gt;

&lt;p&gt;An agent-hour equals one agent running for one hour. If 4 agents run for 1 hour, that's 4 agent-hours. The key question: how many agent-hours does each plan sustain in a month?&lt;/p&gt;

&lt;h3&gt;
  
  
  Session-based tools (Claude Code, Codex)
&lt;/h3&gt;

&lt;p&gt;Technically, there are 2 limits: the 5-hour session limit and the weekly limit. In practice, the weekly limit is always more constraining than the sum of all the 5-hour session limits.&lt;/p&gt;

&lt;p&gt;For each experiment, I measured the usage percentage at the start and end of the session and took the difference. Since I knew how many minutes the experiment ran, I could calculate the "percentage consumed per minute" of both the 5-hour session capacity and the weekly limit. Monthly projection: weekly capacity × ~4 weeks × 4 agents.&lt;/p&gt;

&lt;p&gt;To normalize the "5-hour session capacity" to "weekly capacity": a week has 168 hours, so 168h / 5h = 33.6 sessions. If I can reach 100% capacity in 70 minutes, I multiply that by 33.6 sessions and get 2,352 minutes.&lt;/p&gt;
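&lt;p&gt;As a sketch, that session-to-weekly normalization looks like this (the 70-minute figure is the example from the text; substitute whatever a given experiment measured):&lt;/p&gt;

```python
# Normalize a 5-hour session limit into weekly capacity.
HOURS_PER_WEEK = 7 * 24                 # 168 hours in a week
SESSION_HOURS = 5
sessions_per_week = HOURS_PER_WEEK / SESSION_HOURS   # 33.6 sessions

minutes_to_exhaust = 70                 # observed: 100% of a session in 70 min
weekly_minutes = minutes_to_exhaust * sessions_per_week
print(round(weekly_minutes))            # 2352 minutes of usage per week
```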

&lt;h3&gt;
  
  
  Cursor's two-pool system
&lt;/h3&gt;

&lt;p&gt;This is where the SOTA vs Composer insight emerges naturally from the math.&lt;/p&gt;

&lt;p&gt;I measured the percentage consumed per minute of the monthly API pool (from the Opus-on-Cursor experiments) and separately the monthly Auto+Composer pool (from the Composer experiments). The API pool yielded roughly 1,065 agent-minutes per month, or about 18 agent-hours. The Auto+Composer pool yielded roughly 7,200 agent-minutes, or about 120 agent-hours. Combined: ~138 agent-hours.&lt;/p&gt;
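&lt;p&gt;A minimal sketch of that two-pool extrapolation, with burn rates chosen purely to reproduce the rounded figures above (they are illustrative, not my raw measurements):&lt;/p&gt;

```python
def monthly_agent_hours(pct_consumed, minutes, agents=4):
    """Extrapolate monthly agent-hours from one experiment's pool burn."""
    pct_per_minute = pct_consumed / minutes
    pool_minutes = 100 / pct_per_minute     # wall-clock minutes with 4 agents running
    return pool_minutes * agents / 60       # convert to agent-hours

# Illustrative 60-minute experiments against each monthly pool:
api_pool = monthly_agent_hours(pct_consumed=22.5, minutes=60)         # ~18 agent-hours
composer_pool = monthly_agent_hours(pct_consumed=10 / 3, minutes=60)  # ~120 agent-hours
print(round(api_pool), round(composer_pool), round(api_pool + composer_pool))  # 18 120 138
```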

&lt;h3&gt;
  
  
  The Max 5x → Max 20x projection
&lt;/h3&gt;

&lt;p&gt;All of my Claude Code experiments ran on the Max 5x plan ($100/month). To estimate Max 20x ($200/month), I used Anthropic's published multiplier.&lt;/p&gt;

&lt;p&gt;Anthropic's &lt;a href="https://support.claude.com/en/articles/11049741-what-is-the-max-plan" rel="noopener noreferrer"&gt;support documentation&lt;/a&gt; states that Max 5x provides 5x Pro usage and Max 20x provides 20x Pro usage — so Max 20x = 4x Max 5x capacity. This is a projection, not a measurement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Off-peak and promo normalization
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://support.claude.com/en/articles/14063676-claude-march-2026-usage-promotion" rel="noopener noreferrer"&gt;Anthropic's 2x off-peak discount&lt;/a&gt; applied to several experiments. I normalized by halving observed capacity during off-peak hours: conservative but approximate. I also ran experiments during peak, off-peak, and on the threshold of both.&lt;/p&gt;

&lt;p&gt;When a run straddled the threshold of both, I simply removed those values from the calculation; I had run them mostly out of curiosity about how the tools would behave at the boundary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/openai/codex/discussions/11406" rel="noopener noreferrer"&gt;Codex's 24/7 2x promo&lt;/a&gt; (through April 2) was similarly halved. Both the promo and normalized figures are shown throughout for transparency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Walking through the math: Experiment 7
&lt;/h3&gt;

&lt;p&gt;Let me show the math for Experiment 7 comparing Claude Code vs Cursor Ultra, both using Opus 4.6.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code (Opus 4.6 1M window):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The weekly limit went from 37% to 42% over 60 minutes with 4 agents — that's 5% of weekly capacity consumed.&lt;/li&gt;
&lt;li&gt;Weekly capacity = 100% / 5% × 60 min ≈ 1,200 minutes of 4-agent usage.&lt;/li&gt;
&lt;li&gt;That's with the 2x off-peak discount.&lt;/li&gt;
&lt;li&gt;Normalize to 1x: 1,200 minutes / 2 = 600 minutes.&lt;/li&gt;
&lt;li&gt;Monthly: 600 × 4 weeks ≈ 2,400 minutes.&lt;/li&gt;
&lt;li&gt;Convert to agent-hours: 2,400 / 60 × 4 concurrent agents ≈ 160 agent-hours on Max 5x.&lt;/li&gt;
&lt;li&gt;Apply the 4x multiplier for Max 20x: ~640 agent-hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cursor Ultra (API Pool, Opus 4.6 200K window):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API credits went from 0% to 26% over 60 minutes.&lt;/li&gt;
&lt;li&gt;Monthly API pool capacity: 100% / 26% × 60 minutes = ~231 minutes&lt;/li&gt;
&lt;li&gt;Normalize to agent-hours: ~231 mins × 4 agents / 60 mins/hour ≈ 15.4 agent-hours.&lt;/li&gt;
&lt;li&gt;Since this experiment used only Opus (a frontier model), only the API pool was consumed. We borrow the estimated ~138 total agent-hours for Cursor's 2 pools for the combined estimates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this single experiment, Claude Code Max 20x delivers roughly 41x more than Cursor's API pool (640 / 15.4), or roughly 4.6x more than Cursor's combined capacity (640 / 138). Other experiments produced different ratios depending on the model, discounting, and control tightness. The ~5x headline is the central estimate across all experiments.&lt;/p&gt;
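&lt;p&gt;The full Experiment 7 chain can be reproduced in a few lines (all numbers come directly from the bullets above):&lt;/p&gt;

```python
# Claude Code (Max 5x, 2x off-peak discount active during the run)
weekly_pct = 42 - 37                       # 5% of weekly limit consumed in 60 min
weekly_minutes = 100 / weekly_pct * 60     # 1200 min of 4-agent usage per week
weekly_minutes = weekly_minutes / 2        # strip the 2x discount: 600 min
monthly_minutes = weekly_minutes * 4       # ~4 weeks: 2400 min
max5x_hours = monthly_minutes / 60 * 4     # 4 concurrent agents: 160 agent-hours
max20x_hours = max5x_hours * 4             # Anthropic's 4x multiplier: 640

# Cursor Ultra API pool (Opus 4.6 only)
api_minutes = 100 / 26 * 60                # ~231 min of 4-agent usage per month
api_hours = api_minutes * 4 / 60           # ~15.4 agent-hours

print(round(max20x_hours / api_hours, 1))  # 41.6  (vs the API pool alone)
print(round(max20x_hours / 138, 1))        # 4.6   (vs combined ~138 agent-hours)
```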

&lt;p&gt;This is back-of-the-spreadsheet math, not a precise benchmark. But for an order-of-magnitude comparison, it's enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qualitative observations
&lt;/h2&gt;

&lt;p&gt;A few things that don't show up in the numbers but matter for choosing a tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composer 2 velocity was great.&lt;/strong&gt; Some of the projects were eye-opening: Composer 2 raced through an average of 7.1 projects to Opus 4.6's 2.3. Experiencing it in real time was striking. Whether that speed holds up on complex, ambiguous tasks is an open question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus 4.6 performed consistently across both platforms.&lt;/strong&gt; Same model, same velocity on Claude Code and Cursor. The capacity difference between these tools is pricing architecture, not model quality. If you're choosing based on model capability, both platforms give you access to the same thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token consumption is volatile day to day.&lt;/strong&gt; Model updates, features, regressions, and discounting all hit during the same period. This may have caused noise in the experimental data, but it's also representative of daily life at a particularly active time in the technology and business of AI coding tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The capacity gap is real: ~5x combined, ~38x on frontier models.&lt;/strong&gt; If you use Claude Code with Opus (its default), you get substantially more runway per dollar than Cursor. If you only compare frontier model access, it's not close.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. To make the most out of Cursor, you should be using Composer a lot.&lt;/strong&gt; Most of your Ultra budget buys Composer credits, not SOTA access. If Composer fits your workflow, you get ~138 agent-hours and strong velocity. If you want frontier models full-time, Cursor becomes extremely expensive per agent-hour. A common pattern is to use SOTA models for initial planning and research, then Composer models to implement the plan much more rapidly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Velocity matters — Composer 2 is much faster at completing projects.&lt;/strong&gt; More capacity doesn't automatically mean more output. An engineer running Composer 2 on tasks may complete more work in 138 hours than another running Opus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The pricing model shapes your workflow.&lt;/strong&gt; Claude Code's speed-limit model rewards consistent daily usage with parallel agents. Cursor's monthly budget is more forgiving for bursty schedules. The "best" plan depends on how you work, not just the capacity math. (I covered this difference in &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Codex is a real contender&lt;/strong&gt; at ~1.6x Cursor's capacity, and a number of engineers I know and follow online have been enjoying Codex for its knack at solving harder problems that Opus 4.6 may have challenges with. And you get the SOTA model for all the agent-hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. After Anthropic's "capacity reductions" affecting ~7% of users, I ran out of the 5-hour session more often, but not necessarily the weekly limit.&lt;/strong&gt; I'm not 100% sure yet, because the measurements keep fluctuating, but the weekly limit seems similar to what it was before. And since the weekly limit is the constraining factor, running out of 5-hour sessions more often doesn't necessarily mean fewer tokens per month overall.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats and open questions
&lt;/h2&gt;

&lt;p&gt;This section is long on purpose. The caveats are as important as the findings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experimental design limitations
&lt;/h3&gt;

&lt;p&gt;No two experiments were identical. Models changed, plans changed, promotions came and went. Each experiment is a snapshot of a specific configuration on a specific day.&lt;/p&gt;

&lt;p&gt;I was the human bottleneck. Confirming "done" claims, assigning projects, occasional breaks — all of this introduces noise. Semi-autonomous mode created asymmetry across tools: each tool pauses at different moments for permission, which affects velocity differently and is unavoidable.&lt;/p&gt;

&lt;p&gt;Also, velocity was not the primary objective, since I was interested in token capacity (agent-hours). Relatedly, code quality was probably decent but not audited; in my experience, the AI agents usually get most of the way to the finish line.&lt;/p&gt;

&lt;p&gt;Also, Codex and Claude Code both offer lighter, faster models (e.g. GPT-5.4 mini, Sonnet) with different speed and token-usage profiles.&lt;/p&gt;

&lt;p&gt;There are many more interesting variables and questions here, but for the sake of time I didn't test them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations in measurement and extrapolation
&lt;/h3&gt;

&lt;p&gt;The whole purpose of this experiment is to normalize across tools that report usage in fundamentally different units, and that's also the main source of imprecision. Claude Code reports percentages of session and week. Cursor reports dollars for API plus a separate pool for Composer, with a combined total. Converting between these systems requires assumptions.&lt;/p&gt;

&lt;p&gt;Resolution of the measurements is often low. If your measurement jumps from 0% to 3% in an hour, the true value could be anywhere from 3.0% to 3.99% — a roughly 33% range of uncertainty. For that reason, I ran multiple experiments to get a sense of averages and ranges. Using 4 agents helped me accelerate burn to see more numerical change in less time.&lt;/p&gt;
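&lt;p&gt;Concretely, the 33% figure above comes from treating the display as truncated to whole percentage points:&lt;/p&gt;

```python
# A displayed "3%" could be any true value from 3.00% up to just under 4%.
displayed = 3.0
low, high = displayed, displayed + 0.99
relative_uncertainty = (high - low) / low
print(round(relative_uncertainty * 100))   # 33 percent
```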

&lt;p&gt;I simplified my extrapolation of agent-hours by multiplying the weekly estimate by 4, i.e. a 28-day month. The average month is actually slightly over 30 days, so the monthly figures are a touch conservative.&lt;/p&gt;

&lt;h3&gt;
  
  
  The chaotic experimental window
&lt;/h3&gt;

&lt;p&gt;I get the sense that something changed around March 13 or March 14 that accelerated Claude Code's token burn.&lt;/p&gt;

&lt;p&gt;Moreover, the 2x off-peak discount launched March 14 and ended March 28; I normalized by halving, but that normalization is an approximation. Composer 2 shipped March 19, so Experiment 7 may not represent steady state (though Experiment 8, on March 20 with no discount, confirms the pattern). And Codex's 2x promo was active through April 2, so normal-rate Codex may land around 0.8x Cursor's combined capacity rather than ~1.6x; focusing on frontier models only, ~6.2x instead of ~12.4x.&lt;/p&gt;

&lt;p&gt;I could have waited for a quiet week. But there hasn't been a quiet week in AI coding tools in months. This chaos &lt;em&gt;is&lt;/em&gt; normal usage — the launches, the promotions, the regressions. A perfectly controlled experiment would be more precise but less representative of what you'd actually experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open questions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Does the capacity gap change for different work types: greenfield vs refactoring vs debugging?&lt;/li&gt;
&lt;li&gt;What about for tech stack? I was doing full-stack engineering in Elixir/React/Terraform. How does that change for Python/Svelte/Pulumi? Firmware? Mobile? SRE? Database internals?&lt;/li&gt;
&lt;li&gt;What's the quality gap? If any of the models' speed comes at a quality cost, the velocity advantage shrinks.&lt;/li&gt;
&lt;li&gt;How does this look on team and enterprise plans, particularly Claude Code Premium Seats in Teams?&lt;/li&gt;
&lt;li&gt;Will these numbers hold as all the companies adjust pricing and models adjust velocity?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tips for reducing token usage
&lt;/h3&gt;

&lt;p&gt;I wanted to share a few resources I found online or heard while discussing this with friends and colleagues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're using Opus, consider switching to Sonnet as your default model.&lt;/strong&gt; A few of my friends report that Sonnet is similarly effective, but faster and more token efficient. I've been mostly focused on Opus, so I can't speak to this directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reading &lt;a href="https://code.claude.com/docs/en/costs#reduce-token-usage" rel="noopener noreferrer"&gt;Claude Code best practices&lt;/a&gt;.&lt;/strong&gt; Regardless which tool you're using, some of the concepts in the guide may help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clearing context more frequently is an easy change.&lt;/strong&gt; My experiments ran on models with 1M context, and I just let them run and auto-compact over the course of the hour. I believe the whole conversation gets sent up (minus caching effects), so clearing might be impactful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run a cron job 2–3 hours before you start your work day that sends Claude/Codex a trivial message.&lt;/strong&gt; Since the 5-hour session limit is a constraining factor, you typically get 2 sessions in an 8-hour work day. Triggering a session before you start the bulk of your work buys you a 3rd window. Note that you'll still hit the weekly constraints in the end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use "token-reducing" libraries like &lt;a href="https://www.rtk-ai.app/" rel="noopener noreferrer"&gt;RTK&lt;/a&gt;.&lt;/strong&gt; The premise is that a lot of CLI binaries that the AI coding agents call generate noisy output that is bad for LLMs. It creates a proxy to optimize the tokens. Consider looking for more, since this is a class of tooling. In the CLI, there is &lt;a href="https://github.com/mpecan/tokf" rel="noopener noreferrer"&gt;tokf&lt;/a&gt;. There are also prompt compressors like &lt;a href="https://github.com/microsoft/LLMLingua" rel="noopener noreferrer"&gt;Microsoft's LLMLingua&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current events: recent news about token costs
&lt;/h2&gt;

&lt;p&gt;At the cost of extending this article further, I want to highlight a few recent news items as they pertain to this analysis.&lt;/p&gt;

&lt;p&gt;On March 5, 2026, &lt;a href="https://www.forbes.com/sites/annatong/2026/03/05/cursor-goes-to-war-for-ai-coding-dominance/" rel="noopener noreferrer"&gt;Forbes reported&lt;/a&gt; that Cursor's internal analysis found the $200/mo Claude Code subscription bought roughly $2,000 of tokens at the end of last year and about $5,000 of tokens by early March 2026. Compare that to Cursor's $200/mo plan, which offers &lt;a href="https://cursor.com/docs/models-and-pricing" rel="noopener noreferrer"&gt;$400/mo of API usage&lt;/a&gt; plus generous Auto+Composer credits. My interest in this experiment was to translate those dollar figures into something more concrete: how many hours of engineering work does the money buy?&lt;/p&gt;

&lt;p&gt;Also on March 5, 2026, investor-entrepreneur &lt;a href="https://x.com/chamath/status/2029634071966666964" rel="noopener noreferrer"&gt;Chamath Palihapitiya tweeted&lt;/a&gt; that his company 8090 chose to migrate off Cursor because its AI costs had tripled since November 2025; they are "now spending many millions per year", trending toward $10m per year. He notes it may also be a function of how the engineers use the tooling, e.g. running runaway loops ("Ralph loops") without regard to cost. Either way, token costs are clearly a topic of interest and an area worth thinking about.&lt;/p&gt;

&lt;p&gt;Around the weeks of March 14–26, users were reporting increased token burn rates. (See my LinkedIn posts noting my initial observation on &lt;a href="https://www.linkedin.com/posts/0xandrewshu_fascinating-saturday-i-measured-that-activity-7439405612087635968-AseS/" rel="noopener noreferrer"&gt;March 14&lt;/a&gt;, then &lt;a href="https://www.linkedin.com/posts/0xandrewshu_in-the-last-2-days-folks-on-x-and-reddit-activity-7442678814041632769-XBr5/" rel="noopener noreferrer"&gt;my LinkedIn post&lt;/a&gt; when it trended on X on March 25). It looks like Anthropic announced &lt;a href="https://x.com/trq212/status/2037254607001559305" rel="noopener noreferrer"&gt;a capacity change&lt;/a&gt; on March 26, estimated to affect a minor ~7% of users. But as of publishing this article (Mar 30), it seems like &lt;a href="https://x.com/lydiahallie/status/2038686571676008625" rel="noopener noreferrer"&gt;they're still working on it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I speculate that Anthropic tightened the 5-hour token limits, which helps them manage capacity, while the weekly token limits didn't change much. If that's true, overall monthly token capacity is roughly unchanged; you just run into the limits more often per day. (You might try the cron job I mentioned above.)&lt;/p&gt;

&lt;p&gt;Anyways, this article represents a moment in time as our use of the tool and the pricing models around it change. Last June/July (2025), Cursor &lt;a href="https://techcrunch.com/2025/07/07/cursor-apologizes-for-unclear-pricing-changes-that-upset-users/" rel="noopener noreferrer"&gt;changed its pricing models&lt;/a&gt; in a way that upset users. I wouldn't be surprised if this continues to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;This started with a pricing question and ended up capturing a lot more. While the technology, pricing and business will continue to evolve, I wanted to do this deep dive to understand a snapshot of the ecosystem today. As things evolve further, I can have an anchoring mental model to reason about future changes.&lt;/p&gt;

&lt;p&gt;The choice isn't just "cheaper vs more expensive." It's what kind of capacity you need. Frontier-model capacity for complex reasoning? Reach for Claude Code or Codex. Fast implementation throughput on well-scoped tasks, or a preference for an IDE? Cursor Composer has a real speed advantage, especially if you use frontier models for planning and troubleshooting and fast, lightweight models for implementation. Most engineers probably need some of both; the question is which default fits your workflow.&lt;/p&gt;

&lt;p&gt;I plan to keep running experiments as both tools evolve. If you're interested in discussing the findings, seeing the raw data, or talking about token math — I'd like to hear about it: &lt;a href="https://www.linkedin.com/in/0xandrewshu/" rel="noopener noreferrer"&gt;connect on LinkedIn&lt;/a&gt; or find me &lt;a href="https://x.com/0xAndrewShu" rel="noopener noreferrer"&gt;on X&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is Part 3 of my series on transitioning from Cursor to Claude Code. Catch up: &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 1: Stuck at 16%&lt;/a&gt;, &lt;a href="https://www.ashu.co/parallel-claude-code-agents/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 2: Parallel Agents&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>From 1 to 3 Parallel Claude Code Agents: How I Broke Past 16% Utilization</title>
      <dc:creator>Andrew Shu</dc:creator>
      <pubDate>Tue, 17 Mar 2026 16:41:24 +0000</pubDate>
      <link>https://dev.to/0xandrewshu/from-1-to-3-parallel-claude-code-agents-how-i-broke-past-16-utilization-mee</link>
      <guid>https://dev.to/0xandrewshu/from-1-to-3-parallel-claude-code-agents-how-i-broke-past-16-utilization-mee</guid>
      <description>&lt;p&gt;At the end of my last post, I was: &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;stuck at 16% Claude Code utilization&lt;/a&gt; on the Max 20x plan, and had just figured out that parallel agents could help me break past that limit, explore git worktrees, and make better use of the $200 plan I had paid for. &lt;/p&gt;

&lt;p&gt;But I knew that commits and merge conflicts would become a problem once two agents started working in the same repository. So how do I coordinate and isolate them?&lt;/p&gt;

&lt;p&gt;Spinning up a second (or third) agent in another terminal is easy. But keeping them productive and increasing velocity was the new challenge. I had read many posts about people orchestrating tens or hundreds of background agents, but few tutorials covering the evolution from 1 to 3 agents.&lt;/p&gt;

&lt;p&gt;Anthropic recently shipped &lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;Claude Code Agent Teams&lt;/a&gt;, which automates this: a lead agent coordinates teammates, assigns tasks, and synthesizes results across multiple sessions. But that is automated delegation within &lt;em&gt;a single existing&lt;/em&gt; project, rather than the ability to parallelize arbitrary new projects.&lt;/p&gt;

&lt;p&gt;This post covers the observations, the changes to my local development environment, and the reasoning that took me from 16% to 50%+ utilization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token FOMO: how were people using 3+ Claude Code accounts?
&lt;/h2&gt;

&lt;p&gt;I'll confess: I was feeling token FOMO, watching engineers post articles about their agent squads running 100 agents in parallel. Meanwhile, I was stuck at 16% of a $200/month plan. I didn't feel like I needed 100 agents, but I wanted to understand how to break past that ceiling and get on the path to higher output.&lt;/p&gt;

&lt;p&gt;After the initial experiment with a second agent to consume more tokens, I realized that extra agents would be chaos: merge conflicts, inconsistent databases, agents pulling the rug out from under each other. &lt;/p&gt;

&lt;p&gt;So, I decided to investigate how to coordinate them. This eventually took me on a journey that improved my workflow. But before I took the first step, I realized I needed to ask: would it actually increase my output?&lt;/p&gt;

&lt;h2&gt;
  
  
  Before adding a second agent, make sure the first is busy
&lt;/h2&gt;

&lt;p&gt;It's a bit counterintuitive: spinning up a second agent is so easy that it's easy to miss that you only get value if both agents stay mostly busy. Here are a few hints to figure out where you are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fko0dhfooe2svc6w7xli2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fko0dhfooe2svc6w7xli2.png" alt="Visualization showing 1 busy agent beats 3 low-utilization agents." width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If your agent is often waiting on you&lt;/strong&gt; for an answer to its question, or for clarification on an ambiguous task, it's not doing real work. You need a better way to keep it busy, and this is where &lt;a href="https://www.ashu.co/markdown-plan-files-vibe-coding/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;structured markdown plan files&lt;/a&gt; pay off. If the agent is instead waiting on you to run queries or deployments, the problem is tooling: automate those steps and put them in the agents' hands (if it's safe to).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you keep interrupting the agent to micromanage,&lt;/strong&gt; it's effectively the same problem. It may help to review the markdown plan files and get them into a more agreeable state before you let the agents run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If your agents keep churning out work that you end up disagreeing with&lt;/strong&gt; (despite having reasonable plans), then the workload may not be a good fit for parallelizing. I see this often when troubleshooting complex systems or projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you prefer to hand-code, or you don't like context-switching&lt;/strong&gt; between multiple agents and tasks, you have a totally valid reason not to parallelize. Not every workflow benefits from more agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But if you're sitting idle while the agent runs&lt;/strong&gt;—working on the next task, responding to Slack, browsing Reddit—this is the signal you can juggle another agent. Basically: if you're waiting on the agent regularly, committing and shipping regularly, then add more agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two agents crammed in a repo: isolation is a problem
&lt;/h2&gt;

&lt;p&gt;When you're ready for Agent 2, you become an engineering manager and face the problem of assigning useful work to your team. You need to source the work: come up with ideas, talk to people. You need to scope it so it's parallelizable and pragmatic. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzovyamgnsh40jrvd71g8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzovyamgnsh40jrvd71g8.png" alt="Visualization of file editing collisions from 2 agents working in the same repo." width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But there's a technical problem you'll face first: if two agents touch the same files, you'll get merge conflicts, overwritten work, and outdated understandings of the code. In my first few hours working with parallel coding agents, I tried to keep them productive by focusing them on separate concerns in the same repository.&lt;/p&gt;

&lt;p&gt;Here are a few methods I used to separate the agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend / backend split: these are often separate concerns in separate files&lt;/li&gt;
&lt;li&gt;Application and infrastructure code: e.g. one agent writes TypeScript, and another, Terraform&lt;/li&gt;
&lt;li&gt;Feature pipelining: first ship feature 1 behind a feature flag, and validate it / work on corner cases while another agent starts feature 2&lt;/li&gt;
&lt;li&gt;Async refactoring, hardening, polishing, documentation: when I have a bit of extra bandwidth, I'll spin up an extra agent to do maintenance that stays out of the way of my main work. It's useful to accumulate maintenance tasks in a backlog for the agent to pull from.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a few coding sessions, I realized "separation" wasn't enough: I was keeping agents apart through convention rather than configuration. Reducing collisions wasn't enough; I needed to properly isolate the agents and eliminate collisions so they could be more autonomous and faster.&lt;/p&gt;

&lt;p&gt;And note: these "separation" methods don't fully solve the "isolation" problem. But they remain useful scoping and delegation approaches even once agents are fully isolated.&lt;/p&gt;

&lt;p&gt;I also became aware of which kinds of workloads required more active attention and which ran longer. I could only handle one workload that required active attention, which meant the other agents needed longer projects. And longer projects meant a bit of planning ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Git worktrees: multiple coding agents in the same repo
&lt;/h2&gt;

&lt;p&gt;Even with isolated, right-sized projects, I was still running into the risk of git conflicts: parallel coding agents were still editing files in the same directory.&lt;/p&gt;

&lt;p&gt;Git worktrees solve this problem. Each worktree is a separate checkout of the same repository: a different folder, a different branch, but linked to the same git history and object store. They're lightweight to create, and you can have 3 agents working in their own sandboxes, all contributing back to the same repo. Learn more about &lt;a href="https://git-scm.com/docs/git-worktree" rel="noopener noreferrer"&gt;git worktrees&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Why git worktrees instead of separate clones? They're more lightweight because they share a single object store and branch history. A single &lt;code&gt;git fetch&lt;/code&gt;, for example, reaches all worktree directories, and commits made in one worktree are immediately visible from the others.&lt;/p&gt;

&lt;p&gt;There are a few ways to set this up:&lt;/p&gt;

&lt;h3&gt;
  
  
  Worktrees with git
&lt;/h3&gt;

&lt;p&gt;Each git worktree pairs a branch with a folder. Here are a few commands you can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List worktrees&lt;/span&gt;
git worktree list

&lt;span class="c"&gt;# Create a new worktree AND a new branch in 1 command&lt;/span&gt;
git worktree add &lt;span class="nt"&gt;-b&lt;/span&gt; &amp;lt;new-branch-name&amp;gt; &amp;lt;path/to/new/directory&amp;gt;

&lt;span class="c"&gt;# Create a new worktree with an existing branch&lt;/span&gt;
git worktree add &amp;lt;path&amp;gt; &lt;span class="o"&gt;[&lt;/span&gt;&amp;lt;branch&amp;gt;]

&lt;span class="c"&gt;# Remove the worktree (i.e. the folder) but the branch remains&lt;/span&gt;
git worktree remove &amp;lt;path&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
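&lt;p&gt;To make the shared-history point concrete, here's a self-contained demo (the repo and branch names are made up for illustration) showing that a commit created in one worktree is immediately visible from another, with no fetch or push in between:&lt;/p&gt;

```shell
# Self-contained demo: worktrees share one object store, so a commit
# made in one worktree is immediately visible from the other.
# Repo and branch names are made up for illustration.
set -e
git init -q demo-repo
cd demo-repo
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "init"

# Second worktree on its own branch, in a sibling folder
git worktree add -q -b agent-2 ../demo-agent-2
cd ../demo-agent-2
git commit -q --allow-empty -m "work from agent 2"

# Back in the main worktree: no fetch needed, the commit is already known
cd ../demo-repo
git log -1 --format=%s agent-2   # prints "work from agent 2"
```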



&lt;h3&gt;
  
  
  Built-in support in vibe coding tools
&lt;/h3&gt;

&lt;p&gt;I won't elaborate here, but I'll link to the documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cursor: &lt;a href="https://cursor.com/docs/configuration/worktrees" rel="noopener noreferrer"&gt;Parallel Agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code: &lt;a href="https://code.claude.com/docs/en/common-workflows#run-parallel-claude-code-sessions-with-git-worktrees" rel="noopener noreferrer"&gt;Run parallel Claude Code sessions with Git worktrees&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Codex: &lt;a href="https://developers.openai.com/codex/app/worktrees/" rel="noopener noreferrer"&gt;Worktrees&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: If you want automated multi-agent coordination, check out &lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;Claude Code Agent Teams&lt;/a&gt;, which is useful for parallelizing tasks within a single project. The worktree-based approach I describe is slightly different: you create and control your own system of parallel agents to launch multiple, arbitrary projects. Agent Teams lets you burn down a single project's task list faster; the worktree-based approach lets you branch out across multiple projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  3 parallel API servers, 3 frontend servers, 3 databases
&lt;/h2&gt;

&lt;p&gt;Git worktrees went a long way, but verifying my code was tedious. Stateless unit tests were easy: I could run them to my heart's content in each worktree. But integration tests that touched the database collided, since each worktree expected a different Postgres schema. And my local servers obviously collided on ports.&lt;/p&gt;

&lt;p&gt;Here's what a sample web server might look like, with an API server and UI server that each have environment variables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q4otvknx3gyoi983d8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q4otvknx3gyoi983d8r.png" alt="Baseline visualization of a sample web server. We want to replicate this, 1 per agent." width="800" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I found myself spinning up and tearing down servers and running migrations back and forth. I started thinking about how to isolate the worktrees a bit better: extract configs into environment files, then parameterize the ports and databases. &lt;a href="https://12factor.net/" rel="noopener noreferrer"&gt;Classic DevOps practices&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj5xuwhw6h9nq12z5ezqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj5xuwhw6h9nq12z5ezqi.png" alt="Visualization of worktree creation / teardown so we can run multiple, isolated services locally." width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, I wrote a few Claude Skills that wrap the underlying git worktree command: create a worktree, allocate ports, provision databases, generate &lt;code&gt;.env&lt;/code&gt; files, install packages, and register the ports/database in a central JSON file.&lt;/p&gt;

&lt;p&gt;I've open-sourced a generic version that you can customize for your application: you can find it &lt;a href="https://github.com/0xandrewshu/ai-utils/tree/main/skill-worktree" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;. You'll need to customize the setup/teardown to the specifics of your environment. While I can't make it turnkey for every stack, I wanted to share the structural elements: where it runs and how it runs.&lt;/p&gt;
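&lt;p&gt;As a rough sketch of the shape of such a create step (this is not the published Skill; the branch names, ports, paths, and registry format here are invented), the flow looks something like:&lt;/p&gt;

```shell
# Rough sketch of a worktree bootstrap step; not the published Skill.
# Branch names, ports, paths, and the registry format are invented.
set -e
git init -q app-repo
cd app-repo
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "init"

INDEX=1                       # next free agent slot
BRANCH="agent-$INDEX"
DIR="../wt-$BRANCH"
PORT=$((3000 + INDEX))        # deterministic port allocation per slot

git worktree add -q -b "$BRANCH" "$DIR"

# Per-worktree env file; a real setup would also provision the database
# (e.g. createdb) and install packages here.
printf 'API_PORT=%s\nDATABASE_URL=postgres://localhost/myapp_%s\n' "$PORT" "$BRANCH" > "$DIR/.env"

# Central registry so teardown knows which ports/databases to release
printf '{"branch": "%s", "port": %s}\n' "$BRANCH" "$PORT" >> ../worktrees.jsonl
```

The teardown step is the mirror image: remove the worktree, drop the database, and delete the registry entry.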

&lt;h3&gt;
  
  
  Results: from 16% to 50% with 3 parallel agents
&lt;/h3&gt;

&lt;p&gt;Parallelizing coding agents was fast and straightforward: for a small application, I reconfigured my tooling in about a day. With the agent Skills I shared on GitHub, I could create a worktree and provision its environment in a way that let me scale up to 3 parallel agents and beyond.&lt;/p&gt;

&lt;p&gt;Whereas I was stuck at 16% before, I was quickly and consistently hitting 45%+. I got to the point where I started bumping up against the weekly rate limits. And I could finally see the pathway to 10 or more agents, and the need for 2 or more Max 20x subscriptions.&lt;/p&gt;

&lt;p&gt;But in the end, it wasn't about topping a token leaderboard. Boosting my Claude Code utilization from 16% was an exercise in grounding my use of AI coding agents in useful work.&lt;/p&gt;

&lt;p&gt;It was helpful to work on a small project to exercise my software development lifecycle: planning, implementing, testing, and running on simple cloud infrastructure. What's the use of fast coding if you can't operate and troubleshoot it?&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't feel FOMO about orchestrating 10+ parallel agents
&lt;/h2&gt;

&lt;p&gt;It's important to keep our eyes on the goal: to build things people use, enjoy and get value from.&lt;/p&gt;

&lt;p&gt;There are folks who are pushing limits and aiming to build fleets of tens or hundreds of parallel agents. That's awesome, and I can't wait to see what abstractions and tools they create to make it useful for the rest of us.&lt;/p&gt;

&lt;p&gt;But I wanted to figure out the pathway to parallel agents in a grounded, lightweight way: &lt;em&gt;when&lt;/em&gt; to parallelize, &lt;em&gt;how&lt;/em&gt; to split up the work, and &lt;em&gt;what infrastructure to set up&lt;/em&gt;. AI coding agents are clearly accelerating our work, but I wanted to feel the rough edges so I'd know how specific tools solve specific problems.&lt;/p&gt;

&lt;p&gt;If you take one thing from this: don't start with the tooling. Start by getting your first agent fully occupied, then find an isolated task for a second. The changes you make should follow the problems you encounter.&lt;/p&gt;

&lt;p&gt;If you want to skip the manual setup, &lt;a href="https://github.com/0xandrewshu/ai-utils/tree/main/skill-worktree" rel="noopener noreferrer"&gt;grab the worktree bootstrap script on GitHub&lt;/a&gt; and customize it for your project. It handles port allocation, database creation, and env config for Rails, Phoenix, Django, and similar stacks. Check out the &lt;code&gt;readme.md&lt;/code&gt; for instructions.&lt;/p&gt;

&lt;p&gt;Now that I'm running 3 agents and burning tokens 3x faster, the cost comparison between Claude Code and Cursor becomes interesting again. Next up: I'm going to dig into the pricing math to answer my original question about why switching from Cursor to Claude Code seemed to drop my token usage by 64%.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is Part 2 of my 3-part series on my experience transitioning from Cursor to Claude Code. Catch up: &lt;a href="https://www.ashu.co/cursor-to-claude-code-stuck-at-16-percent-utilization/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;Part 1: Stuck at 16%&lt;/a&gt;. Part 3 next week.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Switched from Cursor to Claude Code and Got Stuck at 16% Utilization</title>
      <dc:creator>Andrew Shu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:16:47 +0000</pubDate>
      <link>https://dev.to/0xandrewshu/i-switched-from-cursor-to-claude-code-and-got-stuck-at-16-utilization-4ca6</link>
      <guid>https://dev.to/0xandrewshu/i-switched-from-cursor-to-claude-code-and-got-stuck-at-16-utilization-4ca6</guid>
      <description>&lt;p&gt;While tinkering over the holidays, I remember thinking: "This is so strange! I was easily reaching $350 of Claude tokens in Cursor usage for the month. After switching to Claude Code, I was barely making it past 16% in Claude Code's 5-hour sessions. Comparing Claude vs Cursor's $200 plans, they both cost $200 / month. It's the same work, same velocity, yet I'm experiencing totally different limits."&lt;/p&gt;

&lt;p&gt;Given my ops and scaling experience, I'm mindful of how much it costs to operate software. So I obviously couldn't leave this alone! This journey started with me worried I had overpaid for a $200 plan, but it ended up significantly accelerating my workflows as I tried to make full use of my Claude Code allotment.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Cursor to Claude Code: monthly token counter vs 5-hour speed limits
&lt;/h2&gt;

&lt;p&gt;Within 15 minutes of using Claude Code, I realized I was going to need much more than the Pro plan ($20/month); I had started with the smallest paid plan to feel where the limits were. The threshold was initially a shock, since my mental model of "token limits" was still based on Cursor's monthly window.&lt;/p&gt;

&lt;p&gt;At the time, it would take me a few days to use up Cursor's tokens. But with Claude's 5-hour cycle, you get fast feedback that the plan you've chosen is too small. So to reframe my observation: it wasn't that I had "used up all the tokens for the month", but that I was using tokens within a 5-hour session at a much faster rate than the plan supported.&lt;/p&gt;

&lt;p&gt;Given how fast I hit the $20 Pro plan's limit, I decided to skip the middle Max 5x plan and jump straight to the $200/mo Max 20x plan. (Anthropic only charged me the prorated difference, so upgrading was easy.)&lt;/p&gt;

&lt;p&gt;I assumed I was going to hit 80-100% utilization like I did in Cursor, but I was wrong. Imagine my surprise when, after a day or two of coding, I realized I never got anywhere close to the Max 20x plan's limits!&lt;/p&gt;

&lt;h2&gt;
  
  
  Did I overpay for Claude's Max 20x Plan? No, but I needed to learn how to use it.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqm2rd18hb0jtktvc5u2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqm2rd18hb0jtktvc5u2h.png" alt="Claude Code tool usage dashboard showing low utilization per 5-hour session" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After my first 2-3 sessions with Claude Code, I noticed I was using only 6-12% of the 5-hour usage window in each session, i.e. only a fraction of the $200 I'd spent. This was a surprise: I was running the same coding workloads on Cursor and Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdsg1rt27l36y8012am8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdsg1rt27l36y8012am8.png" alt="Cursor tool usage dashboard showing high monthly token consumption" width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Such low Claude Code utilization was great in one sense: I could code more and spend less money! But it bothered me on two levels. First, could I have gotten away with paying less? Second, how were people hitting 100%? Not just 100% -- I was reading online about people running 2-3 Claude Code accounts.&lt;/p&gt;

&lt;p&gt;My goal wasn't to maximize token spend or get to the top of the leaderboard. I was puzzled and bothered by this underutilization. So, the first thing I did was to &lt;a href="https://www.ashu.co/markdown-plan-files-vibe-coding/" rel="noopener noreferrer"&gt;set up structured markdown plans&lt;/a&gt; to launch longer-running agents that made full use of the 5-hour session. This let me confirm that tasks were reasonable, and I was unlikely to need to pause Claude's work to answer questions and troubleshoot.&lt;/p&gt;

&lt;p&gt;After a few focused high-usage sessions, I pushed my utilization to 14-16%. And that seemed to be the ceiling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code's 5-hour sessions are a "use it or lose it" rate limit that spreads out usage
&lt;/h2&gt;

&lt;p&gt;Let's dive into Claude Code's rate limiting system. The "5-hour usage" window functions as a speed limit: in practice, you figure out your speed of token usage and calibrate your plan accordingly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyhxbp5vlilk11jat8me.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyhxbp5vlilk11jat8me.png" alt="Cursor pricing model diagram showing monthly token accumulation" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So when we compare Cursor's and Claude Code's pricing models, Cursor bills by total tokens consumed per month. That means I could leave Cursor untouched for 29 days, then use up all my tokens on the 30th. (I assume Cursor has a rate limit for extremely bursty token usage, but I've never hit it.) It also means comparing Cursor and Claude Code licensing is an apples-to-oranges comparison.&lt;/p&gt;

&lt;p&gt;Claude Code also has weekly limits, a second layer on top of the 5-hour limit. Imagine maximizing the 5-hour limit 24/7; that could get extremely expensive for Anthropic. So Anthropic sets an upper limit on sustained usage over the week. They could have made it a monthly limit, but by making it weekly, you have to use each week's allotment, because it doesn't roll over to the next week.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flico1xpgo2n7tgnlg67r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flico1xpgo2n7tgnlg67r.png" alt="Claude Code pricing model diagram showing 5-hour burst limit layered with weekly ceiling" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the 5-hour limit is a "burst speed limit", and the weekly limit is a "sustained usage limit". The 5-hour window smooths out utilization across the day, and the weekly window smooths it out over the month. Since tokens don't roll over from one week to the next, you use them or you lose them. Technically, you can miss a few 5-hour windows and make them up later in the week; but if you don't use a week's allotment, you never get it back.&lt;/p&gt;
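&lt;p&gt;Some back-of-the-envelope arithmetic shows why the second layer exists. The numbers below are purely illustrative; Anthropic doesn't publish exact token caps:&lt;/p&gt;

```shell
# Illustrative numbers only; Anthropic does not publish exact token caps.
SESSION_CAP=100                    # arbitrary units per 5-hour window
WINDOWS_PER_WEEK=$((7 * 24 / 5))   # ~33 windows fit in a 168-hour week
BURST_MAX=$((SESSION_CAP * WINDOWS_PER_WEEK))
WEEKLY_CAP=1500                    # hypothetical sustained ceiling

echo "max if you hit every 5-hour window: $BURST_MAX units"
echo "weekly ceiling: $WEEKLY_CAP units"
# The weekly cap kicks in well before round-the-clock burst usage,
# which is exactly the "sustained usage limit" layered on the burst limit.
```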

&lt;p&gt;This isn't a bad thing. Most engineers spread their work over days and weeks, so Claude Code's system is a fair arrangement for typical engineering work, and it makes better use of compute resources and of the effort you put into writing code. If you want higher usage than either plan allows, you're an advanced user and should expect to pay (higher) API token rates.&lt;/p&gt;

&lt;p&gt;Another way to look at it: the 5-hour session roughly maps to a workday, and the weekly limit assumes something like a 40-hour work week. Some of the windowing and upper limits make sense through this lens, and stop making sense if you're trying to utilize your license on a 24/7 schedule. I haven't explored the math behind this framing, so take it as a thought experiment with a grain of salt.&lt;/p&gt;

&lt;h2&gt;
  
  
  So did I get past 16% utilization in Claude Code?
&lt;/h2&gt;

&lt;p&gt;After all this, I was still stuck at 16% utilization. I understood why: the speed-limit system means that a single coding session with a single agent has a natural upper limit. No matter how focused I was, one human directing one AI agent can only consume tokens so fast.&lt;/p&gt;

&lt;p&gt;And that raised the obvious question: if one agent tops out at ~16%, and people online are hitting 100%+ across multiple accounts... they must be running agents in parallel. This meant I needed to figure out how to coordinate multiple AI agents working on the same codebase without them stepping on each other's toes.&lt;/p&gt;

&lt;p&gt;To coordinate parallel agents, I had to rethink how I broke down projects. It also led me to git worktrees and additional changes to my local development environment. I'll cover that in my next post, and describe how it took my utilization from 16% to 50%+.&lt;/p&gt;




&lt;p&gt;If you're tracking your own Claude Code or Cursor utilization, or you've figured out the parallel agent workflow, I'd love to hear about it — DM me &lt;a href="https://www.linkedin.com/in/0xandrewshu/" rel="noopener noreferrer"&gt;on LinkedIn&lt;/a&gt; or &lt;a href="https://x.com/0xAndrewShu" rel="noopener noreferrer"&gt;on X&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I write weekly about vibe coding workflows, costs, and tools. Follow me here on Dev.to, or subscribe at &lt;a href="https://www.ashu.co/?utm_source=devto&amp;amp;utm_medium=syndication" rel="noopener noreferrer"&gt;ashu.co&lt;/a&gt; for email updates.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why I use markdown plan files instead of Cursor and Claude's built-in planning</title>
      <dc:creator>Andrew Shu</dc:creator>
      <pubDate>Mon, 02 Mar 2026 17:49:53 +0000</pubDate>
      <link>https://dev.to/0xandrewshu/why-i-use-markdown-plan-files-instead-of-cursor-and-claudes-built-in-planning-35co</link>
      <guid>https://dev.to/0xandrewshu/why-i-use-markdown-plan-files-instead-of-cursor-and-claudes-built-in-planning-35co</guid>
      <description>&lt;p&gt;The technique that helped me jump over the threshold from "coding with AI" to actually "vibe coding" was the use of a plain markdown file. Interestingly, it wasn't Cursor's built-in planning mode, nor Claude's in-memory task lists. It was a plain markdown file, with numbered subtasks, and a breakdown of the work that needed to be done. &lt;/p&gt;

&lt;p&gt;Early on, I stayed "hands-on" with the coding agent, because I was concerned about multi-part problems, about the agent "jumping in too soon", and about not knowing how the code worked if the AI hallucinated or ran out of context (tokens). But I found that vibe coding with markdown plans gave me an artifact that (quickly) helped me be more intentional with design.&lt;/p&gt;

&lt;p&gt;In this article, I'll share the prompt I put into &lt;code&gt;AGENTS.md&lt;/code&gt; (which I've shared &lt;a href="https://github.com/0xandrewshu/ai-utils/tree/main/rule-markdown-plan" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;), what worked, and what didn't. But first, I'll share some context about how I got there, because I think the path is relevant to a pattern I hear a lot of engineers get stuck in.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I went from 'coding with AI' to actually vibe coding
&lt;/h2&gt;

&lt;p&gt;I used Cursor in my work through most of 2025. I found it useful for inline prompt editing, one-off chat questions, and greenfield scripts for reporting and maintenance. But I was mostly doing single-file work. Multi-file, multi-step projects with real complexity? I'd start in Cursor, hit a hallucination or a design decision I disagreed with, and fall back to writing it by hand.&lt;/p&gt;

&lt;p&gt;The turning point for me was a conversation with Michael Stahnke (who leads engineering at Flox) at GitHub Universe last year. We were comparing notes on how we were using AI for coding, and he mentioned something that resonated instantly: he was structuring his work in markdown plan files. This gave me a lever of control -- it let me audit a multi-part project breakdown before implementation.&lt;/p&gt;

&lt;p&gt;From there, I went from prompting the agent for each change I wanted to stepping back and creating a plan that I could let AI coding agents run on for hours. This was when I truly went "hands off" and let the AI steer for itself.&lt;/p&gt;

&lt;p&gt;Here's what I've found makes this approach work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nltfj4ct9h4xqhhwhx9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nltfj4ct9h4xqhhwhx9.png" alt="Markdown plans are a great way to breakdown complex tasks and vibe code better with Claude and Cursor (Generated with GPT-4o)&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Benefits of a markdown plan file, for me and the AI coding agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It gives me an artifact I can control.&lt;/strong&gt; With a file artifact, I can find it easily in my codebase, edit it if I want to, commit it for future reference, append notes and learnings, or feed it into another system or automation. This is a subtle point, but it's the key reason why I prefer a "markdown plan" that I own, over the "planning modes" in Cursor / Claude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It makes me and the coding agents more intentional.&lt;/strong&gt; AI is great, but still not perfect. A markdown plan forces a quick up-front research phase where I can identify gaps in understanding and areas where I disagree. It also forces me to roughly understand what's being built. Even as things become more autonomous, it's still important to understand what you own -- even if it's at a higher level and across many more systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I can control the pace better.&lt;/strong&gt; I can pause the AI to ask questions, rewind to a previous state (in git and in the plan file), skip around tactically, adapt the plan as we discover new information, or go fully hands-free. The goal is to increase autonomy and parallelism, but having a file with clearly numbered and grouped tasks lets me communicate about and manage chunks of work. This can be extended by pointing multiple agents at different phases of the same plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It gives me more options for managing my conversation's context window.&lt;/strong&gt; Since agents &lt;a href="https://www.youtube.com/watch?v=rmvDxxNubIg" rel="noopener noreferrer"&gt;lose efficacy&lt;/a&gt; as their context windows fill up, different people have different preferences for how often they "reset" the agent. Some reset at 50% usage, some at 90%, and some are OK with "infinite-ish" conversations via compaction. In any case, having a markdown plan with task status, a work log, and git commits gives you the option to clear the conversation at any time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A plan file gives me a place to deposit learnings, error messages and TODOs.&lt;/strong&gt; This is more of a documentation step, but as I interact with the agent, there are times when it encounters an error message, or makes a design decision that I want to remember or revisit, and I log those in the plan. Since I always archive my plan files instead of deleting them, I plan to use this as a work journal where I can come back and ask questions like "what tech debt have I accumulated?" This fills a knowledge gap, because that information isn't captured in code or commit messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I can automate post-implementation steps as a Cursor / Claude Skill.&lt;/strong&gt; Cursor Skills and Claude Skills both support reusable "prompt actions" - you can think of them as natural language scripts. When everything in the plan is done, I need to delete the plan or archive it to a different folder. I've noticed that this is a natural point to run a Skill that reviews the code and looks for opportunities to improve: security, testing, documentation, and refactoring.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example Markdown plan to create a local Next.js frontend / backend for the coding agent&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Markdown plan file best practices for Claude Code, Cursor, etc.
&lt;/h2&gt;

&lt;p&gt;I've put my plan prompt on Github, and you can &lt;a href="https://github.com/0xandrewshu/ai-utils/blob/main/rule-markdown-plan/examples/2026-03-01-nextjs-hello-world.md" rel="noopener noreferrer"&gt;read an example&lt;/a&gt; of a markdown plan for creating a simple "hello world" Next.js app.&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/0xandrewshu" rel="noopener noreferrer"&gt;
        0xandrewshu
      &lt;/a&gt; / &lt;a href="https://github.com/0xandrewshu/ai-utils" rel="noopener noreferrer"&gt;
        ai-utils
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A collection of scripts, prompts and docs for use with AI and vibe coding
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;AI Utilities: collection of scripts, prompts and docs&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;As I use AI for vibe coding or other types of work, I find it helpful to collect reusable prompts, skills, subagent files, configs, etc. I'm creating this repository to deposit artifacts that I've found useful.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Compatibility&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;The intent is for these snippets to be reusable across AI coding tools (e.g. Claude Code, Cursor, Codex, Gemini / Antigravity, Copilot, etc.). There are occasional differences in capabilities, but the tools have typically "caught up" with one another pretty quickly.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Repository organization&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Initially, I plan to organize these as a flat directory until more organization is necessary.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rule-$NAME/&lt;/code&gt; - e.g. CLAUDE.md, AGENT.md, AGENTS.md&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;skill-$NAME/&lt;/code&gt; - e.g. Claude Skills, Cursor Skills&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;subagent-$NAME/&lt;/code&gt; - e.g. Claude Subagents, Cursor Subagents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prompt-$NAME/&lt;/code&gt; - e.g. reusable prompts to copy/paste into vibe coding tools (Claude Code, Cursor) or chat AI tools (Claude.ai, ChatGPT)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each directory, I'll aim to…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/0xandrewshu/ai-utils" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;h3&gt;
  
  
  Header: a quick summary of the plan
&lt;/h3&gt;

&lt;p&gt;I like to have a few lines at the top that summarize what this plan file contains: title, date, objective, and references to any child or related plans. I often spin off and split big plans into child plans, and I find it useful for each child plan to reference its parent plan, and vice versa.&lt;/p&gt;
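&lt;p&gt;As a sketch (the field names and values here are invented for illustration, not a required format), a plan header might look like:&lt;/p&gt;

```markdown
# Plan: Hello-world Next.js app

- **Date:** 2026-03-01
- **Objective:** Scaffold a minimal Next.js frontend/backend that runs locally.
- **Parent plan:** plans/2026-02-20-local-tooling.md
- **Child plans:** none yet
```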

&lt;h3&gt;
  
  
  Task List: the focal point for the implementation
&lt;/h3&gt;

&lt;p&gt;Near the top of the plan, I like to have a consistently structured markdown table of tasks. This is the focal point of the plan: it serves as a backlog that sequences and organizes work. Since AI is often inconsistent about formatting, and the structure of the task list is important to my workflow, I've made it a point to specify the structure concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I like this markdown table to have these 5 columns: #, task, status, priority, comments&lt;/li&gt;
&lt;li&gt;All tasks are numbered, so I can tell the AI things like "Do 1.1 - 1.3 but skip 1.4"&lt;/li&gt;
&lt;li&gt;Tasks are grouped into "phases", so I can tell the AI things like "do phase 3 first"&lt;/li&gt;
&lt;li&gt;I find that using emojis like "✅ Completed" helps me visualize status better for larger lists&lt;/li&gt;
&lt;/ul&gt;
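&lt;p&gt;As an illustration (tasks invented for a hypothetical Next.js plan), the table I ask for looks roughly like this:&lt;/p&gt;

```markdown
## Task List

### Phase 1: Scaffolding

| #   | Task                  | Status         | Priority | Comments       |
| --- | --------------------- | -------------- | -------- | -------------- |
| 1.1 | Scaffold Next.js app  | ✅ Completed   | High     |                |
| 1.2 | Add /api/hello route  | 🔄 In progress | High     | depends on 1.1 |

### Phase 2: Quality

| #   | Task                  | Status         | Priority | Comments       |
| --- | --------------------- | -------------- | -------- | -------------- |
| 2.1 | Add integration tests | ⬜ Not started | Medium   |                |
```

&lt;p&gt;With this structure, instructions like "do phase 1, then pause" or "skip 1.2" are unambiguous.&lt;/p&gt;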

&lt;h3&gt;
  
  
  Task Breakdown: a design doc to review
&lt;/h3&gt;

&lt;p&gt;This functions like a design doc - I like to audit this BEFORE implementation. It's usually accurate, so it's mostly to catch the occasional issue and to improve my understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Work log: a journal of errors, problems and learnings
&lt;/h3&gt;

&lt;p&gt;This is a drop-off location where I ask the AI to deposit error messages, design decisions and tradeoffs, so I can reference them later. When I run a Claude Skill to "close up my plan", I have hooks that reflect on the problems described in the work log. I want to be able to query for exact error messages so I can document them later.&lt;/p&gt;
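&lt;p&gt;For example (entries invented for illustration), work log entries might look like:&lt;/p&gt;

```markdown
## Work Log

- 2026-03-01: `npm run build` failed with "Module not found: Can't resolve 'fs'";
  fixed by moving the file read into a server-side route handler.
- 2026-03-01: Decision: used SQLite instead of Postgres for local dev,
  to avoid a Docker dependency. Revisit before deploying.
```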

&lt;h3&gt;
  
  
  TODO section: accumulating ideas that don't impact scope
&lt;/h3&gt;

&lt;p&gt;Regularly in my work, I have to make tradeoffs and tell the AI "this is outside scope, but log it for later". So I tell the AI "save a todo to do XYZ", and the TODOs are stored here. Since I like to archive my plans (in git, or Obsidian), I can later query old plans to extract TODOs relating to, say, GitHub Actions.&lt;/p&gt;
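&lt;p&gt;The TODO section can stay as simple as a checklist (items invented for illustration):&lt;/p&gt;

```markdown
## TODOs

- [ ] Move the deploy steps into a GitHub Actions workflow (out of scope here)
- [ ] Revisit error handling in the /api/hello route
```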

&lt;h2&gt;
  
  
  What didn't work: hand-written plans, long AGENTS.md files
&lt;/h2&gt;

&lt;p&gt;I often find it helpful to read about what people tried that didn't work. So here are a few things I tried:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write the plan files by hand.&lt;/strong&gt; When I started out, I would hand-write a plan. Very quickly, I realized that this time-consuming process could and should be done by the AI. I still see and talk to engineers doing this, and I think it stems from a common misconception: write a short seed of a plan, and let the AI coding agent flesh it out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Estimate and document "effort" and "risk", to influence the agent to behave differently.&lt;/strong&gt; I thought this would lead the agent to scale its rigor and safety up or down, but I've seen no evidence of that. And its estimates of effort and risk were incredibly inaccurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A long planning guideline for &lt;code&gt;AGENTS.md&lt;/code&gt;.&lt;/strong&gt; The naive version of this guideline grew long -- because I had the AI add instructions every time something annoyed me, it reached 130 lines, including a nearly complete example. I've since condensed it to around 35 lines, and it works just as well.&lt;/p&gt;
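&lt;p&gt;For a sense of scale, a condensed guideline along these lines -- this is a paraphrased sketch, not my published prompt -- fits comfortably in &lt;code&gt;AGENTS.md&lt;/code&gt;:&lt;/p&gt;

```markdown
## Markdown plans

When asked to "create a markdown plan":

1. Write the plan to plans/YYYY-MM-DD-short-slug.md.
2. Start with a header: title, date, objective, links to parent/child plans.
3. Then a task table with columns: #, task, status, priority, comments.
   Number tasks 1.1, 1.2, ... and group them into phases.
4. Follow with a task breakdown, a work log, and a TODO section.
5. As you work, update task statuses (✅ / 🔄 / ⬜) and append errors and
   decisions to the work log; never delete log entries.
```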

&lt;p&gt;&lt;strong&gt;Having no planning guideline, and just telling the agent to "create a md plan".&lt;/strong&gt; It was usually correct, but it was inconsistent. I depend on the task list being near the top, and having it structured in a certain way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Telling the agent to "create a plan", and assuming it would follow the &lt;code&gt;AGENTS.md&lt;/code&gt;.&lt;/strong&gt; I have to explicitly say "create a markdown plan", and sometimes "create a md plan according to guidelines".&lt;/p&gt;

&lt;h2&gt;
  
  
  Counterpoint: why markdown plans may not be for everyone
&lt;/h2&gt;

&lt;p&gt;Before I close, I should note that this technique may not be for everyone. &lt;/p&gt;

&lt;p&gt;For starters, Claude's and Cursor's planning modes are actually pretty solid. I think they work for most cases and are simpler to use. Waiting for a plan, reviewing it, and then waiting for the implementation takes time. You may also find that you agree with most generated plans as-is; if so, you may as well have the AI jump straight into implementation and review the output at the end.&lt;/p&gt;

&lt;p&gt;And as engineers move toward more parallel agents and longer-running autonomous agents, they may have to re-evaluate markdown plans. Maybe markdown plans aren't scalable enough, or are too freeform. For example, look at Steve Yegge's &lt;a href="https://github.com/steveyegge/beads" rel="noopener noreferrer"&gt;beads project&lt;/a&gt;, which moves task tracking into a structured issue tracker. (I haven't tried it, but I'd like to.) I believe the idea is that a more structured workflow will boost clarity, performance and understanding, especially for "totally autonomous agent teams" like his &lt;a href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04" rel="noopener noreferrer"&gt;Gas Town&lt;/a&gt; project.&lt;/p&gt;

&lt;p&gt;There are other philosophies worth exploring. The &lt;a href="https://ghuntley.com/ralph/" rel="noopener noreferrer"&gt;Ralph Wiggum loop&lt;/a&gt;, created by Geoffrey Huntley, takes a different approach entirely: instead of planning across a long session, it runs the agent in a bash loop with fresh context each iteration. Progress lives in files and git, not in the agent's memory, so it avoids "context rot". &lt;/p&gt;
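&lt;p&gt;A minimal sketch of the loop idea, with the agent call stubbed out so the snippet is self-contained (in practice you'd swap in your agent's non-interactive CLI, e.g. Claude Code's print mode). The loop is bounded at three passes here just so the sketch terminates; a real Ralph loop runs until the work is done:&lt;/p&gt;

```shell
#!/bin/sh
# Ralph-style loop sketch: the same prompt is fed to a *fresh* agent process
# each pass, so no conversational context accumulates between passes;
# progress lives in files and git instead.

run_agent() {
  # Placeholder for a real agent invocation, e.g.:
  #   claude -p "$(cat PROMPT.md)"
  echo "agent pass $1: read plan, do one chunk, commit"
}

i=0
while [ "$i" -lt 3 ] && [ ! -f .plan-complete ]; do
  i=$((i + 1))
  run_agent "$i"
done
```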

&lt;p&gt;Another philosophy: &lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;Spec-driven development&lt;/a&gt; (with tools like GitHub &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;Spec Kit&lt;/a&gt; and AWS's &lt;a href="https://kiro.dev/docs/specs/" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;) goes further on the planning axis — writing detailed specifications with acceptance criteria before any code generation, so the spec itself becomes the source of truth. &lt;/p&gt;

&lt;p&gt;My markdown plans sit somewhere in between: more structured than a Ralph loop prompt, lighter than a full SDD spec. &lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thoughts: markdown plans are medium weight, and that's the point
&lt;/h2&gt;

&lt;p&gt;This may seem like a heavyweight process, but in reality it's pretty quick. For starters, it's not meant for truly lightweight, one-shot prompts. I primarily use it for larger, multi-hour runs where I want to reduce the likelihood of poor quality work. &lt;/p&gt;

&lt;p&gt;To reference the now-common saying that "with vibe coding, all engineers become managers": I think a markdown plan is basically a manager or tech lead asking a team member to do a bit of research, project planning and design. The rigor and time spent should scale with the project's complexity and urgency.&lt;/p&gt;

&lt;p&gt;I should also add that I'm open to moving beyond markdown plans, and I don't think this is necessarily the end state of project planning. Specifically, what I care about is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Taking a small amount of time to do some planning that I can iterate on as things change&lt;/li&gt;
&lt;li&gt;Having good task management: task identification, descriptions, explanations, groupings&lt;/li&gt;
&lt;li&gt;Having an artifact that I can control, integrate with and automate around&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using markdown plans or have a different approach to keeping agents on track for longer projects, I'd like to hear about it — DM me &lt;a href="https://www.linkedin.com/in/0xandrewshu/" rel="noopener noreferrer"&gt;on LinkedIn&lt;/a&gt; or &lt;a href="https://x.com/0xAndrewShu" rel="noopener noreferrer"&gt;on X&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write weekly about vibe coding workflows, costs, and tools. Follow me here on Dev.to, or subscribe at &lt;a href="https://www.ashu.co" rel="noopener noreferrer"&gt;ashu.co&lt;/a&gt; for email updates.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
