<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Cloudstar</title>
    <description>The latest articles on DEV Community by Alex Cloudstar (@alexcloudstar).</description>
    <link>https://dev.to/alexcloudstar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1190670%2F18910089-3a37-4072-9b4c-289211f053eb.JPG</url>
      <title>DEV Community: Alex Cloudstar</title>
      <link>https://dev.to/alexcloudstar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alexcloudstar"/>
    <language>en</language>
    <item>
      <title>Claude Fable 5 Is Here: Mythos-Class Power for Everyone, and Whether It's Worth 2x the Price</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:49:36 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/claude-fable-5-is-here-mythos-class-power-for-everyone-and-whether-its-worth-2x-the-price-5498</link>
      <guid>https://dev.to/alexcloudstar/claude-fable-5-is-here-mythos-class-power-for-everyone-and-whether-its-worth-2x-the-price-5498</guid>
      <description>&lt;p&gt;When I reviewed &lt;a href="https://dev.to/blog/claude-opus-4-8-review-benchmarks-developer-guide-2026"&gt;Claude Opus 4.8&lt;/a&gt; two weeks ago, I flagged one sentence in the announcement as the most interesting thing in it. Anthropic said Mythos-class models were coming to all customers in the coming weeks, gated on safety work rather than capability. That was the tell. The gap between what these labs can build and what they choose to ship was loosening.&lt;/p&gt;

&lt;p&gt;Yesterday, June 9, the other shoe dropped. Claude Fable 5 shipped, and it is a Mythos-class model. Same release playbook as always. No waitlist, no staged rollout. It landed in the Claude API, on Bedrock, in GitHub Copilot, and on the consumer plans on the same day, with the model ID &lt;code&gt;claude-fable-5&lt;/code&gt; ready to drop into config.&lt;/p&gt;

&lt;p&gt;So I spent the day doing what I do with every release. I threw my hardest real tasks at it, dug through the announcement and the third-party benchmarks, and tried to separate what genuinely changed from the launch-day shine. This one is different from the last few. Not because the benchmarks moved a few points, but because Claude Fable 5 marks the first time Anthropic has handed the public a model from the tier they previously decided was too capable to release.&lt;/p&gt;

&lt;p&gt;Here is what I found.&lt;/p&gt;




&lt;h2&gt;What Fable 5 and Mythos 5 Actually Are&lt;/h2&gt;

&lt;p&gt;The naming is doing a lot of work here, so it is worth slowing down. There are two models in this release, and they are the same model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt; is the Mythos-class model with safety classifiers turned on. This is the one you and I get. It is available right now through the API, the cloud providers, and the subscription plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Mythos 5&lt;/strong&gt; is the identical underlying model with certain safeguards removed. It is not generally available. Right now it is restricted to cybersecurity professionals and infrastructure providers through something Anthropic is calling Project Glasswing, with a trusted-access program for biology researchers planned next.&lt;/p&gt;

&lt;p&gt;I wrote about &lt;a href="https://dev.to/blog/claude-mythos-anthropic-developer-analysis-2026"&gt;Claude Mythos&lt;/a&gt; back when it was a locked research preview, the model that scored absurdly high on coding and cyber benchmarks and that Anthropic explicitly chose not to ship. Fable 5 is the answer to the obvious question that post raised: what happens when they finally decide it is safe enough to release? The answer is that they release it with a set of classifiers bolted on, keep the unfiltered version behind a vetting process, and call the two halves by different names.&lt;/p&gt;

&lt;p&gt;That split matters more than it looks, and I will come back to it. But first, the part everyone actually wants to know.&lt;/p&gt;




&lt;h2&gt;The Benchmarks That Matter&lt;/h2&gt;

&lt;p&gt;Anthropic claims Fable 5 is state-of-the-art on nearly every capability benchmark they tested, and for once the third-party numbers back the marketing instead of softening it. The pattern from the &lt;a href="https://dev.to/blog/claude-opus-4-8-review-benchmarks-developer-guide-2026"&gt;Opus 4.8 release&lt;/a&gt;, where the gains were real but incremental, does not hold here. These are step changes.&lt;/p&gt;

&lt;p&gt;Here are the numbers worth knowing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;th&gt;Fable 5&lt;/th&gt;
&lt;th&gt;For comparison&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;Real-world software engineering&lt;/td&gt;
&lt;td&gt;80.3%&lt;/td&gt;
&lt;td&gt;Opus 4.8: 69.2%, GPT-5.5: 58.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FrontierCode&lt;/td&gt;
&lt;td&gt;Production-grade code quality&lt;/td&gt;
&lt;td&gt;29.3%&lt;/td&gt;
&lt;td&gt;Opus 4.8: 13.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GDP.pdf&lt;/td&gt;
&lt;td&gt;Vision reasoning over documents, no tools&lt;/td&gt;
&lt;td&gt;29.8%&lt;/td&gt;
&lt;td&gt;GPT-5.5: 24.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ExploitBench (Mythos 5)&lt;/td&gt;
&lt;td&gt;Cybersecurity, guardrails off&lt;/td&gt;
&lt;td&gt;78.0%&lt;/td&gt;
&lt;td&gt;Opus 4.8: 40.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core analytics&lt;/td&gt;
&lt;td&gt;Complex analytical tasks&lt;/td&gt;
&lt;td&gt;First model over 90%&lt;/td&gt;
&lt;td&gt;Previous frontier under the line&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The SWE-Bench Pro jump is the one that stopped me. Going from 69% to 80% does not sound like much until you remember what that benchmark is. It is not toy problems. It is real engineering tasks pulled from real repositories, the kind where the model has to understand a codebase, make a change that spans multiple files, and not break anything else. An eleven-point gain at that altitude is the difference between a model that gets most things right and one that gets the hard things right too.&lt;/p&gt;

&lt;p&gt;FrontierCode is the other eye-opener. More than doubling Opus 4.8's score on a benchmark designed to test whether code meets production standards, not just whether it runs, lines up with what I felt in actual use. The output reads less like generated code and more like code a careful engineer wrote.&lt;/p&gt;

&lt;p&gt;The ExploitBench number belongs to Mythos 5, the unfiltered sibling, which is why it nearly doubles Opus 4.8. That gap is the entire reason the unfiltered version is locked behind Project Glasswing. A model that scores 78% on offensive security tasks is exactly the dual-use capability that makes a lab nervous, and it is worth holding that number in your head when we get to the safety section.&lt;/p&gt;




&lt;h2&gt;What 80% on SWE-Bench Pro Feels Like in Practice&lt;/h2&gt;

&lt;p&gt;Benchmarks tell you the model is capable. They do not tell you what the capability feels like when you are the one driving. So I gave it the work I actually do.&lt;/p&gt;

&lt;p&gt;The first test was a refactor I had been avoiding. A tangled service layer in one of my projects, about a dozen files, with state management that had grown organically and badly over a year. The kind of thing where the &lt;a href="https://dev.to/blog/agentic-coding-2026"&gt;agentic coding&lt;/a&gt; loop usually drifts. One agent, one file at a time, me re-explaining the convention every few files as context slips.&lt;/p&gt;

&lt;p&gt;Fable 5 handled it in a way that felt qualitatively different. It read the whole service layer, identified the actual structural problem rather than just the surface symptoms, and proposed a refactor that I would have been happy to write myself. Not every choice was mine. But the reasoning was sound enough that the disagreements were about taste, not correctness.&lt;/p&gt;

&lt;p&gt;The claim Anthropic leans on hardest is sustained reasoning. The line in the announcement is that the longer and more complex the task, the larger Fable 5's lead. Early testers reported that apps which needed a hundred prompts a year ago now one-shot. I cannot fully verify the hundred-prompt claim, but the direction is right. The model holds focus across a long task better than anything I have used. It does not lose the thread halfway through a migration the way every previous model eventually does.&lt;/p&gt;

&lt;p&gt;The headline customer story is Stripe, which said Fable 5 compressed months of engineering into days: a 50-million-line Ruby codebase migration that would normally take a team two months was completed in a single day. I cannot test a 50-million-line migration. But having watched it chew through my own multi-file refactor without me babysitting context, I find the shape of that claim plausible, where six months ago I would have rolled my eyes at it.&lt;/p&gt;

&lt;p&gt;This is where the &lt;a href="https://dev.to/blog/ai-code-review-bottleneck-2026"&gt;self-correction work from Opus 4.8&lt;/a&gt; compounds. Fable 5 inherits the honesty improvements and pairs them with raw capability. It catches its own mistakes more reliably and the mistakes it makes are rarer to begin with.&lt;/p&gt;




&lt;h2&gt;The Price Doubled, and That Changes the Math&lt;/h2&gt;

&lt;p&gt;Here is the part that is going to reshape how you use it. Claude Fable 5 costs $10 per million input tokens and $50 per million output tokens. That is double Opus 4.8, which sits at $5 and $25.&lt;/p&gt;

&lt;p&gt;For the last several releases, the story was capability going up while price held flat. I made a whole point of it in the Opus 4.8 review, because flat pricing is the quiet engine behind the improving economics of &lt;a href="https://dev.to/blog/pricing-ai-features-2026"&gt;building AI features&lt;/a&gt;. Fable 5 breaks that pattern. The price went up because the model is genuinely more expensive to run, and Anthropic is not hiding it.&lt;/p&gt;

&lt;p&gt;To be fair, they frame it as a discount. Fable 5 is less than half the price of the old Mythos Preview, so relative to the Mythos tier this is a price cut. But relative to your actual bill, the one you pay today on Opus 4.8, it is a doubling.&lt;/p&gt;

&lt;p&gt;So the calculus is no longer "use the best model for everything." It is back to routing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (per 1M)&lt;/th&gt;
&lt;th&gt;Output (per 1M)&lt;/th&gt;
&lt;th&gt;Use it for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.8&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;td&gt;Daily coding, most agent work, anything high-volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fable 5&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;$50&lt;/td&gt;
&lt;td&gt;The hard tasks where the extra capability pays for itself&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The honest framing is that Fable 5 is not a replacement for your default model. It is a tool for the top of the difficulty curve. The gnarly migration, the architecture decision with real tradeoffs, the debugging session that spans three systems and has resisted every cheaper attempt. For those, paying double is trivial against the time saved. For your everyday loop of small edits and lookups, you are lighting money on fire if you route all of it through Fable 5.&lt;/p&gt;
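&lt;p&gt;To make the routing math concrete, here is a minimal Python sketch built on the list prices in the table above. The workload size (40k tokens in, 8k out) and the function name are made up for illustration, and the dictionary keys are just labels, not API model IDs. Plug in your own numbers.&lt;/p&gt;

```python
# Prices in USD per 1M tokens, taken from the table above.
# The keys are plain labels for this sketch, not API model IDs.
PRICES = {
    "opus-4.8": {"input": 5.0, "output": 25.0},
    "fable-5": {"input": 10.0, "output": 50.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 40k tokens in, 8k tokens out per request.
opus = request_cost("opus-4.8", 40_000, 8_000)
fable = request_cost("fable-5", 40_000, 8_000)
print(f"Opus 4.8: ${opus:.2f}  Fable 5: ${fable:.2f}  ratio: {fable / opus:.1f}x")
```

&lt;p&gt;Because both the input and output prices double, the ratio stays at 2x no matter how your tokens split between input and output. What changes is the absolute gap, which grows with volume.&lt;/p&gt;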

&lt;p&gt;If you are mapping out spend across plans and API usage, my &lt;a href="https://dev.to/blog/claude-june-2026-pricing-survival-guide"&gt;Claude pricing survival guide&lt;/a&gt; walks through how to think about the tradeoffs, and this release adds a new top tier to that decision. It also makes a strong case for getting serious about &lt;a href="https://dev.to/blog/ai-agent-token-costs-developer-guide-2026"&gt;token cost management&lt;/a&gt; if you have not already, because the cost of being lazy about model selection just doubled.&lt;/p&gt;

&lt;p&gt;On the plans side, Anthropic is doing the usual launch promotion. Fable 5 is included at no extra cost on Pro, Max, Team, and Enterprise through June 22, after which usage credits kick in pending capacity. So you have about two weeks to hammer on it for free before the meter starts.&lt;/p&gt;




&lt;h2&gt;Mythos 5: The Same Brain With the Guardrails Off&lt;/h2&gt;

&lt;p&gt;The most genuinely novel thing in this release is not Fable 5. It is the decision to ship its unfiltered twin at all, even to a restricted group.&lt;/p&gt;

&lt;p&gt;Mythos 5 is Fable 5 with the safety classifiers removed. Same weights, same intelligence, none of the blocking. Anthropic is only giving it to cybersecurity professionals and infrastructure providers through Project Glasswing right now, with a biology-researcher program coming that will lift the bio safeguards while keeping the cyber ones in place.&lt;/p&gt;

&lt;p&gt;The reasoning is straightforward once you look at the ExploitBench number. The unfiltered model scores 78% on offensive security work, nearly double Opus 4.8. That is a capability you want defenders to have and attackers not to. Gating it behind a vetted program is Anthropic trying to thread that needle, putting the sharp version in the hands of people who use it to harden systems while keeping it away from everyone else.&lt;/p&gt;

&lt;p&gt;For the security testing I am authorized to do, the existence of a model this capable on the defensive side is a real shift. The flip side is the one I keep thinking about. If the only thing standing between the public model and the offensive model is a set of classifiers, then the safety of the whole arrangement rests entirely on how good those classifiers are. Which brings us to the part of this release that should bother you a little.&lt;/p&gt;




&lt;h2&gt;The Safeguards, and the Part That Should Bother You&lt;/h2&gt;

&lt;p&gt;Fable 5 ships with three classifier systems. One blocks offensive cybersecurity and exploitation tasks. One blocks dual-use biology and chemistry research. One prevents distillation, the extraction of the model's capabilities into a smaller model.&lt;/p&gt;

&lt;p&gt;The implementation is interesting. When a safeguard triggers, Fable 5 does not refuse. It silently falls back to Opus 4.8 and answers from there. Anthropic says this happens in less than 5% of sessions on average, and that the system is tuned conservatively, so it sometimes blocks benign requests. External red-teaming reportedly found zero successful harmful single-turn requests against 30 public jailbreak techniques, which is a strong result if it holds up.&lt;/p&gt;

&lt;p&gt;So far, so reasonable. A model that downgrades instead of refusing is a better user experience than a hard wall, and a transparent classifier that tells you when it fired is fine.&lt;/p&gt;

&lt;p&gt;The problem, and Nathan Lambert at Interconnects laid this out sharply, is that not all of the downgrading is transparent. He distinguishes between the disclosed safeguards, cyber and bio and distillation, which notify you when they kick in, and undisclosed modifications around frontier AI research that change the model's behavior without telling you. His line is worth quoting directly: "An AI model that gets less intelligent automatically without notifying me is categorically misaligned AI."&lt;/p&gt;

&lt;p&gt;I think he is right to be annoyed, and the reason cuts straight to how I work. If I am using a model for serious engineering and it can quietly become a different, dumber model mid-session without telling me, my &lt;a href="https://dev.to/blog/ai-evals-solo-developers-2026"&gt;eval suite&lt;/a&gt; cannot account for it. The model I tested is not reliably the model I am running. Lambert goes further and says he cannot trust Fable 5 for frontier ML development work for exactly this reason, and reads the opacity as more about protecting Anthropic's competitive position than about safety.&lt;/p&gt;

&lt;p&gt;Whether or not you buy the competitive-entrenchment read, the practical takeaway for developers is concrete. If you are building on Fable 5, assume a small fraction of your requests may be answered by Opus 4.8 instead, and assume you may not always be told. Build your &lt;a href="https://dev.to/blog/ai-evals-solo-developers-2026"&gt;evals&lt;/a&gt; and your &lt;a href="https://dev.to/blog/ai-generated-code-security-risks-2026"&gt;output validation&lt;/a&gt; to be robust to that, because the model behind the API is not a fixed quantity. This is the first frontier release where I would call non-determinism in which model answers a first-class concern rather than a footnote.&lt;/p&gt;
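&lt;p&gt;One cheap way to start is to count which model each logged response claims to come from. The sketch below assumes your logs store a &lt;code&gt;model&lt;/code&gt; field per response. Whether a fallback actually shows up in that field is not something the announcement confirms, so treat missing or unexpected values as a signal to investigate, not as proof. The record shape and the &lt;code&gt;opus-fallback&lt;/code&gt; label are hypothetical.&lt;/p&gt;

```python
from collections import Counter

EXPECTED_MODEL = "claude-fable-5"


def served_model_report(records: list[dict], expected: str = EXPECTED_MODEL) -> dict:
    """Summarize which model each logged response claims to come from.

    Each record is assumed to be a dict with a "model" key; records
    without one are counted as "unknown" rather than trusted.
    """
    counts = Counter(r.get("model", "unknown") for r in records)
    total = sum(counts.values())
    off_target = total - counts.get(expected, 0)
    return {
        "counts": dict(counts),
        "off_target_share": off_target / total if total else 0.0,
    }


# Hypothetical eval log: 18 on-target responses, 1 fallback, 1 unlabeled.
log = [{"model": "claude-fable-5"}] * 18 + [{"model": "opus-fallback"}, {}]
print(served_model_report(log))
```

&lt;p&gt;If the off-target share moves between eval runs, you know to compare results per served model instead of averaging them together.&lt;/p&gt;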

&lt;p&gt;One more operational detail: Mythos-class traffic now carries a mandatory 30-day data retention policy. Anthropic says the data is not used for training or non-safety purposes and is deleted after 30 days in most cases, with human access logged. If you work under strict data-handling requirements, read that policy before you route production traffic through Fable 5.&lt;/p&gt;




&lt;h2&gt;The Science Results Are the Real Story&lt;/h2&gt;

&lt;p&gt;The coding numbers will get the headlines because that is what most of us buy these models for. But the results that actually made me sit up are in science, and they came from Mythos 5.&lt;/p&gt;

&lt;p&gt;On molecular biology, the model generated novel hypotheses that scientists preferred about 80% of the time over Opus-class models. In genomics, it ran a research task largely on its own for over a week, analyzing millions of cells across 138 animal species, and reportedly outperformed a recent Science journal publication despite being a fraction of the size. In drug design, internal protein experts said it accelerated their work by roughly ten times, with nine of fourteen protein targets yielding strong candidates.&lt;/p&gt;

&lt;p&gt;I am not a biologist and I cannot evaluate those claims on the merits. But the pattern is the one worth noticing. The thing that separates Fable 5 from the models before it is not that it writes slightly better code. It is that it can sustain genuinely autonomous work over long horizons. A week of unsupervised genomics research is a different category of capability than a clever answer to a single prompt.&lt;/p&gt;

&lt;p&gt;That same capability is what powers the coding story. The reason the migrations work is the same reason the genomics works. The model holds the thread. If you have wrestled with &lt;a href="https://dev.to/blog/durable-ai-workflows-orchestration-2026"&gt;agent reliability over long-running tasks&lt;/a&gt;, this is the first model where the long-horizon part feels solved enough to lean on rather than babysit.&lt;/p&gt;




&lt;h2&gt;Should You Switch From Opus 4.8?&lt;/h2&gt;

&lt;p&gt;Here is how I would think about it depending on where you sit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you do daily coding on a Pro or Max plan:&lt;/strong&gt; Try Fable 5 on your hardest current task in the next two weeks while it is free on the plans. The capability jump is real and you should feel it on genuinely difficult work. But do not make it your default. When the credits kick in after June 22, the doubled price means Opus 4.8 should stay your workhorse and Fable 5 should be the tool you reach for when the cheaper model is struggling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you run Opus 4.8 in production via the API:&lt;/strong&gt; Do not flip the model ID blindly. The price doubling alone means you need to be deliberate about which paths justify it, and the silent-fallback behavior means your outputs are now non-deterministic in a new way. Run your eval suite, then route only the high-value, hard tasks to &lt;code&gt;claude-fable-5&lt;/code&gt; while keeping volume traffic on Opus 4.8. This is a routing decision, not a swap.&lt;/p&gt;
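&lt;p&gt;A routing decision can be as small as one function. This is a rough sketch, not a recommendation: the &lt;code&gt;Task&lt;/code&gt; fields and the file-count threshold are heuristics invented for illustration, and the default model ID is a placeholder you would replace with whatever you run today.&lt;/p&gt;

```python
from dataclasses import dataclass

HARD_MODEL = "claude-fable-5"  # model ID quoted in this post
DEFAULT_MODEL = "your-opus-4.8-model-id"  # placeholder, not a real ID


@dataclass
class Task:
    description: str
    files_touched: int
    cheaper_model_failed: bool = False


def pick_model(task: Task, file_threshold: int = 10) -> str:
    """Route to the expensive model only when the task looks hard.

    The heuristics here are invented for illustration; replace them with
    whatever signal your own evals say predicts difficulty.
    """
    if task.cheaper_model_failed or task.files_touched >= file_threshold:
        return HARD_MODEL
    return DEFAULT_MODEL


print(pick_model(Task("rename a variable", files_touched=1)))
print(pick_model(Task("service layer refactor", files_touched=12)))
```

&lt;p&gt;The useful part is the escalation flag: start on the cheaper model and only pay double once it has actually struggled.&lt;/p&gt;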

&lt;p&gt;&lt;strong&gt;If you are on GPT-5.5 or Gemini for primary work:&lt;/strong&gt; The gaps in coding, vision, and agentic work just widened in Claude's favor, and by more than the Opus 4.8 release did. When I last did a full &lt;a href="https://dev.to/blog/claude-opus-vs-gpt5-vs-gemini-2026"&gt;Claude vs GPT vs Gemini breakdown&lt;/a&gt;, the models converged on baseline and diverged on strengths. Fable 5 stretches Claude's lead in its strong areas rather than reshuffling the board. If you have been on the fence about Claude for serious engineering, this is the strongest case yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you do authorized security or scientific research:&lt;/strong&gt; Look into whether you qualify for Project Glasswing or the upcoming trusted-access programs. The unfiltered Mythos 5 is a meaningfully different tool than the public Fable 5, and for defensive security and research work that is exactly the point.&lt;/p&gt;




&lt;h2&gt;The Bigger Picture&lt;/h2&gt;

&lt;p&gt;What strikes me about Claude Fable 5 is not the SWE-Bench number, impressive as it is. It is what the release tells you about where the constraint now sits.&lt;/p&gt;

&lt;p&gt;For the last year, the story was capability rising while price held flat, and the binding question was how much smarter the next model would be. Fable 5 flips both halves of that. The price went up, and the binding question is no longer capability. It is safety and trust. Anthropic built a model good enough that they split it in two, shipped the filtered half to everyone, locked the unfiltered half behind a vetting program, and bolted on classifiers that can quietly swap in a weaker model behind your back.&lt;/p&gt;

&lt;p&gt;That is a different kind of release. The capability is so far ahead that the interesting decisions are now about governance, access, and disclosure rather than raw benchmarks. The most important sentence in the Opus 4.8 announcement was the one teasing Mythos-class availability. The most important fact about Fable 5 is not how smart it is. It is that "how smart is it" stopped being the hard question.&lt;/p&gt;

&lt;p&gt;For the work I do every day, Fable 5 is the most capable tool I have ever pointed at a hard problem, and I will be reaching for it exactly when a problem is hard enough to earn the price. For everything else, Opus 4.8 is still my default. And the part I will be watching most carefully is not the next benchmark. It is whether the model answering my request is actually the model I think it is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Markr: Mark the Moment While You Record, Not After</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Tue, 09 Jun 2026 19:56:22 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/markr-mark-the-moment-while-you-record-not-after-3i95</link>
      <guid>https://dev.to/alexcloudstar/markr-mark-the-moment-while-you-record-not-after-3i95</guid>
      <description>&lt;p&gt;I always wanted to do YouTube. The thing that kept stopping me was never the recording. It was everything after.&lt;/p&gt;

&lt;p&gt;I am not talking about fancy editing. No motion graphics, no zoom transitions, no five-hour color grade. My problem was dumber than that. Every time I finished a 20 or 30 minute recording, I had to sit there and watch the whole thing again just to find where I messed up. Where I stumbled, where I repeated myself, where I trailed off and started over.&lt;/p&gt;

&lt;p&gt;Then, if I wanted shorts or TikToks out of it, I had to watch it a third time, hunting for the 30-second bits that were actually good on their own.&lt;/p&gt;

&lt;p&gt;So one recording turned into watching myself talk for an hour before I cut a single frame.&lt;/p&gt;

&lt;p&gt;It is hard to explain how much that drains you until you have done it a few times. The recording is the fun part. The rewatching is the part where you quietly decide you are done with YouTube for a while. I quit more than once, and it was always for this exact reason. Not lack of ideas. Just the dread of the rewatch.&lt;/p&gt;

&lt;h2&gt;The detour I have to admit to&lt;/h2&gt;

&lt;p&gt;Being a developer, my first instinct was not "fix the small problem." It was "build the big thing."&lt;/p&gt;

&lt;p&gt;So I tried to make an AI auto editor. I know there are already plenty of them. I built one anyway, because that is what we do, right? Feed it the video, let it find the mistakes, let it spit out clips, magic.&lt;/p&gt;

&lt;p&gt;It was too ambitious, and the output never hit the quality bar I had in my head. &lt;a href="https://dev.to/blog/ai-wrappers-dead-what-to-build-instead-2026/"&gt;AI is great until you need it to be exactly right&lt;/a&gt;, and "where exactly are the good moments in my video" turned out to be one of those places where good enough was not good enough. I kept tuning it and it kept being almost.&lt;/p&gt;

&lt;p&gt;Eventually I stopped and asked a simpler question. What if I just marked the moments while I was recording, when I already know they happened?&lt;/p&gt;

&lt;p&gt;That is Markr.&lt;/p&gt;

&lt;h2&gt;What Markr actually is&lt;/h2&gt;

&lt;p&gt;Markr is a tiny desktop app that lets you mark moments while you record.&lt;/p&gt;

&lt;p&gt;That is the whole idea. You are recording somewhere else, OBS, your camera, screen capture, whatever you use. Markr runs alongside it. The second you flub a line or say something good enough to clip, you drop a marker. When you are done, you have a clean list of timestamps instead of a flat 30-minute file with no idea where anything is.&lt;/p&gt;

&lt;p&gt;So editing stops being archaeology. You go straight to the spot where you stumbled and cut it. You go straight to the moment that would make a good short and pull it. No rewatching the whole thing once, let alone three times.&lt;/p&gt;

&lt;p&gt;It does not record your screen. It does not touch your video file. It does not try to be your editor. It just remembers the moments so you do not have to.&lt;/p&gt;

&lt;p&gt;It is built with Electron, and it is fully open source:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/alexcloudstar/markr" rel="noopener noreferrer"&gt;https://github.com/alexcloudstar/markr&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;You start a session. That is your clock. From that point, Markr is just counting time alongside your real recording.&lt;/p&gt;

&lt;p&gt;When something happens worth marking, you mark it. Two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the button in the app&lt;/li&gt;
&lt;li&gt;Hit the global hotkey, &lt;code&gt;Ctrl+M&lt;/code&gt; on Windows and Linux, &lt;code&gt;Cmd+M&lt;/code&gt; on Mac&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hotkey is the part that matters. It works even when Markr is not the focused window, which is the entire point. You are on camera, mid-sentence, you are not going to alt-tab to a little app and click a button. You tap the keys without looking and keep talking. The marker lands and your flow never breaks.&lt;/p&gt;

&lt;p&gt;Every marker shows up in a list with its timestamp. When you are done, that list is a map of your recording. Here is where I fumbled the intro. Here is the bit that explains the whole thing in one clean take. Here is the 40 seconds that is basically a short already.&lt;/p&gt;

&lt;p&gt;A button and a hotkey. That is it.&lt;/p&gt;

&lt;p&gt;YouTube is what I built it for, but it is not the only place this hurts. If you record meetings and later have to dig back through an hour of footage for the one decision everyone actually cared about, it is the same problem with a different file. Mark it when it happens, and finding it later takes a second.&lt;/p&gt;
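&lt;p&gt;The core idea in this section, a session clock plus a list of offsets, fits in a few lines. The sketch below is in Python and is not Markr's actual code (Markr is an Electron app). The class and function names are made up, and the clock is injectable only so the example runs deterministically.&lt;/p&gt;

```python
import time


class MarkerSession:
    """A session clock plus a list of marker offsets, in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.markers = []

    def mark(self) -> float:
        """Record the time elapsed since the session started."""
        offset = self._clock() - self._start
        self.markers.append(offset)
        return offset


def format_offset(seconds: float) -> str:
    """Format an offset as HH:MM:SS for lookup in an editor timeline."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Fake clock so the example is deterministic: session start, then two marks.
ticks = iter([0.0, 75.0, 1840.5])
session = MarkerSession(clock=lambda: next(ticks))
session.mark()
session.mark()
print([format_offset(m) for m in session.markers])
```

&lt;p&gt;A monotonic clock is the natural default here, because marker offsets should not jump if the system time changes mid-recording.&lt;/p&gt;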

&lt;h2&gt;Why a separate app and not the big AI thing&lt;/h2&gt;

&lt;p&gt;Because the small thing actually works, and the big thing did not.&lt;/p&gt;

&lt;p&gt;The AI editor was trying to guess after the fact what I already knew in the moment. I am the one recording. I know right when I mess up. I know right when I say something I will want to clip. Capturing that knowledge the instant it happens is trivial and reliable. Reconstructing it later from the raw video is hard and flaky.&lt;/p&gt;

&lt;p&gt;There is also something nice about a tool that does one thing. No account. No upload. No cloud, no waiting on a model. Your markers are yours, on your machine, instantly.&lt;/p&gt;

&lt;h2&gt;It is an MVP, and I am not pretending otherwise&lt;/h2&gt;

&lt;p&gt;This is early.&lt;/p&gt;

&lt;p&gt;Right now Markr does the core loop well: start a session, drop markers with a button or the hotkey, see them in a list. That is enough to be useful today, which is the bar I cared about. But there is plenty it does not do yet.&lt;/p&gt;

&lt;p&gt;No labels on markers as you go. No export to formats your editor can import directly. No tagging, no categories, no clicking a marker to jump to that moment in a player. Markers live with the session, and that is roughly where the feature set ends for now.&lt;/p&gt;

&lt;p&gt;I would rather ship the small version that solves the real problem than sit on a bigger version that solves it in my head. I already tried the bigger version. The roadmap is open, and the most useful direction will probably come from whoever else records and hates rewatching as much as I do. Labels, exports, dropping the markers straight into the tools people actually edit in. That is what I want to add next, in that order, unless someone makes a good case for something else.&lt;/p&gt;

&lt;p&gt;If you have ever quit a creative thing because of the boring part after, you know exactly the pain this is for.&lt;/p&gt;

&lt;h2&gt;If you want to try it&lt;/h2&gt;

&lt;p&gt;Markr is on GitHub, open source, take it apart however you like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/alexcloudstar/markr" rel="noopener noreferrer"&gt;https://github.com/alexcloudstar/markr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clone it, run it, drop a few markers during your next recording and see if editing feels less like a punishment. If it helps, great. If a piece annoys you, that is a signal, and the issues tab is right there.&lt;/p&gt;

&lt;p&gt;I built it for myself first, the way &lt;a href="https://dev.to/blog/no-mouse-30/"&gt;most of these small things start&lt;/a&gt;. No pressure to adopt it, no promises about where it goes.&lt;/p&gt;

&lt;p&gt;Just a small fix for the one annoying thing that kept talking me out of doing the thing I actually wanted to do.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>electron</category>
      <category>opensource</category>
    </item>
    <item>
      <title>2,159 Cold Messages, 4 Replies: What Starting on Your Own Actually Looks Like</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Fri, 05 Jun 2026 08:57:37 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/2159-cold-messages-4-replies-what-starting-on-your-own-actually-looks-like-3j4c</link>
      <guid>https://dev.to/alexcloudstar/2159-cold-messages-4-replies-what-starting-on-your-own-actually-looks-like-3j4c</guid>
      <description>&lt;p&gt;It is almost midnight on a Tuesday and I am pasting someone's job title into a message box for what feels like the fortieth time today. Not the same message. A different one every time, because the whole point is that it is not a template. I read their last few posts, I find the one specific thing about what they do that overlaps with the concept I was testing, I write two or three sentences that prove I actually looked. Then I hit send. Then I open the next profile and do it again.&lt;/p&gt;

&lt;p&gt;This is the part that never makes it into the founder story.&lt;/p&gt;

&lt;p&gt;The story everyone tells goes like this: you have an idea, you are brave enough to quit the safe thing, you grind for a bit, and then there is a montage and a graph that goes up and to the right. The version I have been living for the last five weeks has no montage. It is mostly me, a spreadsheet that turned into something worse, and a very quiet inbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 2,159 messages into the void actually feels like
&lt;/h2&gt;

&lt;p&gt;Let me give you the numbers, because the numbers are the honest part.&lt;/p&gt;

&lt;p&gt;In five weeks I sent 2,159 connection requests on LinkedIn. 766 people accepted, which is about a 35 percent acceptance rate, and honestly that part felt good for about a day. An accepted request feels like progress. It is a tiny green light. You start to think the funnel is working.&lt;/p&gt;

&lt;p&gt;Then you write the follow-up. I wrote and sent around 575 personalized messages to those people. Personalized, not blasted. Each one cost me real minutes because I refused to do the spray-and-pray thing that everyone can smell from a mile away.&lt;/p&gt;

&lt;p&gt;Out of those 575 messages, I got 4 genuine replies back. Four. Not four sales. Four humans who wrote a real sentence back to me instead of leaving me on read.&lt;/p&gt;

&lt;p&gt;Do the math and it is brutal. That is about a 0.7 percent reply rate. Roughly one actual conversation for every 540 connection requests I sent. And of those conversations, the number of people who have looked me in the eye, metaphorically, and said "yes, I would pay for this" is zero. Still zero. Not "maybe." Not "looks cool." Zero confirmed.&lt;/p&gt;
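
&lt;p&gt;A minimal sketch of that arithmetic in TypeScript, using only the numbers above. The &lt;code&gt;percent&lt;/code&gt; helper and the variable names are made up for illustration.&lt;/p&gt;

```typescript
// Funnel numbers taken from the post; all names here are hypothetical.
const requestsSent = 2159;
const accepted = 766;
const messagesSent = 575;
const replies = 4;

// Percentage of `part` in `whole`, rounded to one decimal place.
function percent(part: number, whole: number): number {
  return Math.round((part / whole) * 1000) / 10;
}

const acceptanceRate = percent(accepted, requestsSent); // 35.5
const replyRate = percent(replies, messagesSent); // 0.7
const requestsPerReply = Math.round(requestsSent / replies); // 540

console.log(acceptanceRate, replyRate, requestsPerReply);
```

&lt;p&gt;The reply rate is measured against the 575 follow-up messages, while the one-in-540 figure is measured against all 2,159 connection requests.&lt;/p&gt;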

&lt;p&gt;I knew validation was supposed to be hard. I have written before about how you should &lt;a href="https://dev.to/blog/stop-validating-ideas-start-validating-pain/"&gt;stop validating ideas and start validating pain&lt;/a&gt;, and how &lt;a href="https://dev.to/blog/the-waitlist-illusion/"&gt;a waitlist is mostly an illusion&lt;/a&gt;. I believed all of it. Knowing it and feeling it are different things. The silence has a texture to it after a few hundred messages. It stops feeling like rejection and starts feeling like weather. You just send into it.&lt;/p&gt;

&lt;p&gt;The thing nobody warns you about is that the silence is not even the hard part.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hard part is killing an idea you actually liked
&lt;/h2&gt;

&lt;p&gt;I went into this with three ideas I wanted to pressure-test. As of right now, one is already dead and two are still breathing.&lt;/p&gt;

&lt;p&gt;The one I killed is the one I was most excited about, which is exactly how it tends to go. I liked it because it was clever. I liked it because building it would have been fun. Those are two of the worst reasons to build anything, and I knew that, and I still had to watch the evidence pile up before I could let it go.&lt;/p&gt;

&lt;p&gt;Here is the system I used to keep myself honest. Every time I learned something real from a conversation or a piece of behavior, not a compliment, I logged it as a validation signal. Over five weeks I collected 27 of those signals. Sounds healthy until you read them. 8 of the 27 were red flags. Things like "people agree it is annoying but they have already half-solved it with a spreadsheet," or "the person who has this problem is not the person who pays." So close to a third of what I had learned was a reason to stop.&lt;/p&gt;

&lt;p&gt;That is the math that actually kills an idea. Not a single dramatic no. A slow accumulation of small flags until the honest move is to stop pretending the pile is not there. The red flags taught me more than the green ones did, because a green flag can be politeness and a red flag almost never is. Nobody goes out of their way to discourage you for free.&lt;/p&gt;

&lt;p&gt;Killing it still felt bad. It felt like quitting, even though it was the opposite. I keep having to remind myself that the goal is not to be right about one idea. The goal is to find one that is right, and you cannot do that if you are emotionally married to the first thing you sketched. I learned this the expensive way once already when I &lt;a href="https://dev.to/blog/i-killed-marketingnow-and-built-xpilot-instead/"&gt;killed a product I had poured months into&lt;/a&gt;. Apparently I needed to learn it again, faster and cheaper this time, which I suppose is a kind of progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  I built a product to help me look for a product
&lt;/h2&gt;

&lt;p&gt;Now the part that makes me laugh and wince at the same time.&lt;/p&gt;

&lt;p&gt;To keep track of all of this, the requests, the accepts, the follow-ups, the four precious replies, the 27 signals and their 8 red flags, I needed a system. Spreadsheets fell over almost immediately. So I did the most on-brand thing a developer could possibly do. I built my own CRM from scratch.&lt;/p&gt;

&lt;p&gt;Around 5,800 lines of code in about two days. A real little app to organize the hunt. Pipeline stages, message history, follow-up reminders, a place to tag a signal as a red flag so it would stare back at me later.&lt;/p&gt;

&lt;p&gt;Read that again. I built a product to help me look for a product. The tool to find the thing became the thing I shipped fastest and used most. There is something almost funny about how naturally I reached for code the second the work got uncomfortable. Building is the safe room. Building feels like progress even when it is, technically, avoidance with good syntax.&lt;/p&gt;

&lt;p&gt;In fairness, the CRM is genuinely useful and I do not regret it. But I see the pattern clearly now. The grind I am bad at is the talking-to-strangers grind, and the grind I am good at is the building grind, and the temptation is always to do more of the thing you are already good at and call it work.&lt;/p&gt;

&lt;p&gt;By the way, the total cash I have put into all of this so far is about $261. Five weeks, three ideas, one homemade CRM, two thousand messages, and the bill is the price of a decent pair of headphones. That is the strange economics of starting solo right now. It costs almost nothing in money and almost everything in something harder to budget for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What keeps me going, without the neat bow
&lt;/h2&gt;

&lt;p&gt;I want to end this clean, with a lesson that ties it all together. I do not have one. The outcome is still genuinely unknown. Two ideas are alive, zero people have paid me anything, and there is a real chance that in a month I am writing the post where I kill a second one.&lt;/p&gt;

&lt;p&gt;What keeps me going is smaller than a vision. It is the four replies. Four humans out of two thousand who wrote back like the problem was real to them, and the way those four conversations felt completely different from the 766 polite accepts. That contrast is the signal. That is the thing I am chasing. Not the volume, the texture.&lt;/p&gt;

&lt;p&gt;If you are out here doing the same thing, sending into the same weather, I do not have a pep talk for you. I just want you to know the romantic version is a lie and the unglamorous version is the actual job. The cold messages, the near-total silence, the red flags you have to be brave enough to count, the quiet discipline of killing your own good ideas. That is the work. The montage is not coming. You just send the next message.&lt;/p&gt;

&lt;p&gt;I will let you know if any of it works.&lt;/p&gt;

</description>
      <category>startup</category>
      <category>buildinpublic</category>
      <category>solofounder</category>
    </item>
    <item>
      <title>Svelte 5 vs React in 2026: An Honest Comparison After Shipping Both</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Thu, 04 Jun 2026 10:40:51 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/svelte-5-vs-react-in-2026-an-honest-comparison-after-shipping-both-34ca</link>
      <guid>https://dev.to/alexcloudstar/svelte-5-vs-react-in-2026-an-honest-comparison-after-shipping-both-34ca</guid>
      <description>&lt;p&gt;A few months ago I rebuilt a small internal dashboard twice. Once in React 19, once in Svelte 5. Not as a benchmark, not for a blog post, just because I was genuinely undecided and the only way I trust myself to have an opinion is to ship the same thing twice and see which one I hated less by the end.&lt;/p&gt;

&lt;p&gt;The React version was done first because that is the muscle memory. The Svelte 5 version took longer to start because runes were new to me, and then it caught up fast, and then it pulled ahead in a way I did not expect. By the time both were in production, the Svelte build was a third of the bundle size and I had written maybe 40 percent less code for the same behavior.&lt;/p&gt;

&lt;p&gt;That is the kind of result that makes you suspicious of yourself. So this is me trying to be fair. Svelte 5 vs React in 2026 is no longer the lopsided matchup it used to be, but it is also not the clean win the Svelte crowd will tell you it is. Both have sharp edges. This is where they actually are.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed: Svelte 5 Is a Different Framework
&lt;/h2&gt;

&lt;p&gt;If your mental model of Svelte is from version 4, throw it out. Svelte 5 shipped in late 2024 and the headline change is runes, a reactivity system built on signals. It replaced the implicit &lt;code&gt;$:&lt;/code&gt; reactive declarations that defined Svelte for years.&lt;/p&gt;

&lt;p&gt;The old Svelte was famous for its magic. You wrote &lt;code&gt;let count = 0&lt;/code&gt;, you used &lt;code&gt;count&lt;/code&gt; in markup, and reassigning it updated the DOM. No hooks, no dependency arrays, no ceremony. It felt like cheating. The problem was that the magic was compiler-driven and brittle at the edges. Reactivity worked inside components but got awkward the moment you wanted to share reactive state across files. Stores filled that gap, but now you had two mental models: implicit reactivity inside components, explicit stores outside them.&lt;/p&gt;

&lt;p&gt;Runes collapse that into one model. You write &lt;code&gt;$state&lt;/code&gt;, &lt;code&gt;$derived&lt;/code&gt;, and &lt;code&gt;$effect&lt;/code&gt;, and they work the same way whether you are inside a component or in a plain &lt;code&gt;.svelte.ts&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight svelte"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"ts"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;doubled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$derived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;$effect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;count is now&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt; &lt;span class="na"&gt;onclick=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; (doubled: &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;doubled&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;)
&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have written anything with signals, in SolidJS or in any of the signal libraries that bolted onto React, this will look familiar. That is the point. Svelte 5 stopped being the odd one out and joined the signals consensus that has been forming across the frontend world. The difference is that Svelte compiles your components into code that updates the DOM directly when a signal changes, instead of re-running components and diffing a virtual DOM.&lt;/p&gt;

&lt;p&gt;One honest correction to the old marketing: Svelte 5 is no longer "zero runtime." The runes system needs a small runtime to track dependencies. It is still tiny compared to React, but the "Svelte disappears completely" claim from the version 4 era does not hold anymore. Worth knowing if someone repeats it to you as a selling point.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Reactivity Models, Side by Side
&lt;/h2&gt;

&lt;p&gt;This is the core of the Svelte 5 vs React comparison, so it is worth slowing down. The frameworks disagree about a fundamental question: when something changes, what re-runs?&lt;/p&gt;

&lt;p&gt;React's answer is the component. When state changes, the component function runs again top to bottom, produces a new virtual DOM tree, and React diffs it against the previous one to figure out what actually changed in the real DOM. This is simple to reason about and it is also why React needs &lt;code&gt;useMemo&lt;/code&gt;, &lt;code&gt;useCallback&lt;/code&gt;, and &lt;code&gt;React.memo&lt;/code&gt;. The whole function re-running means you constantly fight unnecessary work.&lt;/p&gt;

&lt;p&gt;The good news for React is that this got dramatically better. The &lt;a href="https://dev.to/blog/the-react-compiler-is-here-say-goodbye-to-usememo-and-usecallback/"&gt;React Compiler reached v1.0&lt;/a&gt; and now handles most of that memoization automatically. You stop hand-wrapping things. The model is still "re-run the component and diff," but you no longer pay the ergonomic tax for it the way you used to.&lt;/p&gt;
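
&lt;p&gt;The "re-run the component and diff" model can be sketched without React at all. This is a toy, not React's actual reconciliation algorithm: the names &lt;code&gt;View&lt;/code&gt;, &lt;code&gt;render&lt;/code&gt;, and &lt;code&gt;diff&lt;/code&gt; are hypothetical, and the "virtual DOM" is flattened to a map of node ids to text.&lt;/p&gt;

```typescript
// Toy "re-run and diff" loop. Hypothetical names, not React internals.
type View = { [nodeId: string]: string };

// The "component": runs top to bottom on every state change.
function render(count: number): View {
  return {
    title: "Counter",
    value: String(count),
    doubled: String(count * 2),
  };
}

// Compare two views and list only the node ids whose text changed.
function diff(previous: View, next: View): string[] {
  return Object.keys(next).filter((id) => previous[id] !== next[id]);
}

const before = render(1);
const after = render(2); // the whole function re-runs, including `title`
const patches = diff(before, after);

console.log(patches); // only `value` and `doubled` need a DOM update
```

&lt;p&gt;The wasted work is visible even here: &lt;code&gt;title&lt;/code&gt; is recomputed on every render just so the diff can conclude it did not change.&lt;/p&gt;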

&lt;p&gt;Svelte's answer is the smallest possible unit. When a signal changes, only the exact DOM nodes and derived values that depend on it update. There is no component re-run, no diff, no virtual DOM. The compiler knows at build time which DOM node reads which piece of state, so it wires a direct update path.&lt;/p&gt;

&lt;p&gt;In practice this means two things. First, Svelte tends to do less work at runtime for the same UI, which shows up as faster updates on heavy, interactive pages. Second, you almost never think about memoization in Svelte because there is nothing to memoize. The framework is granular by default.&lt;/p&gt;

&lt;p&gt;Here is the same counter logic in React 19:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
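&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;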
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setCount&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;doubled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// compiler handles memoization now&lt;/span&gt;

  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;count is now&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setCount&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; (doubled: &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;doubled&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;)
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is not dramatically more code in this trivial case. The gap opens up as the component grows, as you add derived state, and as you start needing effects that other effects depend on. React's dependency arrays are a genuine source of bugs, the kind where you forget a dependency and your effect goes stale, or you add one too many and it loops. Svelte's &lt;code&gt;$derived&lt;/code&gt; and &lt;code&gt;$effect&lt;/code&gt; track dependencies automatically, so that entire class of bug largely disappears.&lt;/p&gt;

&lt;p&gt;That is the single biggest day-to-day difference I felt. Not the bundle size, not the speed. The fact that I stopped thinking about when things re-run and just described what depended on what.&lt;/p&gt;
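
&lt;p&gt;To make the "no dependency array" idea concrete, here is a minimal signal sketch in plain TypeScript. It is not Svelte's implementation, and &lt;code&gt;signal&lt;/code&gt;, &lt;code&gt;effect&lt;/code&gt;, and &lt;code&gt;activeSubscriber&lt;/code&gt; are hypothetical names. It only shows the core trick: reading a value inside an effect is what subscribes the effect. It skips cleanup and dependency re-tracking, which a real runtime needs.&lt;/p&gt;

```typescript
// Minimal signal sketch. Hypothetical names; not how Svelte is implemented.
type Subscriber = () => void;

// The effect currently running, if any. Reads register against it.
let activeSubscriber: Subscriber | null = null;

function signal(initial: number) {
  let value = initial;
  const subscribers: Subscriber[] = [];
  return {
    get(): number {
      // Reading inside an effect subscribes that effect automatically.
      const current = activeSubscriber;
      if (current !== null) {
        if (subscribers.indexOf(current) === -1) {
          subscribers.push(current);
        }
      }
      return value;
    },
    set(next: number): void {
      value = next;
      // Only effects that actually read this signal re-run.
      for (const run of [...subscribers]) run();
    },
  };
}

function effect(fn: () => void): void {
  const run: Subscriber = () => {
    const previous = activeSubscriber;
    activeSubscriber = run;
    try {
      fn();
    } finally {
      activeSubscriber = previous;
    }
  };
  run(); // first run discovers the dependencies
}

const count = signal(0);
const unrelated = signal(100);
const log: string[] = [];

effect(() => log.push("count is now " + count.get()));
effect(() => log.push("unrelated is " + unrelated.get()));

count.set(1); // re-runs the first effect only
console.log(log);
```

&lt;p&gt;Nothing in the effect body lists what it depends on. The second effect never re-runs when &lt;code&gt;count&lt;/code&gt; changes because it never read &lt;code&gt;count&lt;/code&gt;.&lt;/p&gt;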




&lt;h2&gt;
  
  
  Bundle Size: The Gap Is Real But Read the Fine Print
&lt;/h2&gt;

&lt;p&gt;The numbers people quote are real. A trivial counter app ships around 3 to 5 KB gzipped in Svelte versus roughly 42 to 45 KB for React plus React DOM. That is close to a tenfold difference at the floor.&lt;/p&gt;

&lt;p&gt;But the floor is the least interesting part of any bundle conversation. What matters is how the curve behaves as the app grows.&lt;/p&gt;

&lt;p&gt;React's baseline is high because you ship the runtime no matter what. Once it is loaded, additional components are mostly your own code. Svelte's baseline is only a few kilobytes, but because each component compiles to its own imperative update code, every component you add contributes its own bytes. Svelte stays smaller in almost every realistic comparison, but the ratio narrows as the app gets bigger. A large Svelte app is not 10 times smaller than the equivalent React app. It is meaningfully smaller, often by 30 to 50 percent, which is still a lot.&lt;/p&gt;
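
&lt;p&gt;A toy model shows why the ratio narrows. The baselines sit inside the ranges quoted above. The per-component cost is a made-up assumption, set equal for both frameworks purely to isolate the effect of the baseline. These are not measurements and real apps will differ.&lt;/p&gt;

```typescript
// Toy bundle-size model. Baselines come from the ranges quoted in the post;
// the per-component cost is a hypothetical number, not a measurement.
const reactBaselineKb = 43; // within the quoted 42 to 45 KB
const svelteBaselineKb = 4; // within the quoted 3 to 5 KB
const perComponentKb = 1; // assumed identical for both, for illustration only

function bundleKb(baselineKb: number, costPerComponentKb: number, components: number): number {
  return baselineKb + costPerComponentKb * components;
}

for (const components of [1, 50, 500]) {
  const react = bundleKb(reactBaselineKb, perComponentKb, components);
  const svelte = bundleKb(svelteBaselineKb, perComponentKb, components);
  // Ratio of React bundle size to Svelte bundle size at this app size.
  console.log(components, (react / svelte).toFixed(1) + "x");
}
```

&lt;p&gt;Under these assumptions the gap in kilobytes stays fixed while the ratio shrinks, because the fixed baseline becomes a smaller share of the total as your own code grows.&lt;/p&gt;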

&lt;p&gt;Whether that matters depends entirely on what you are building. For a marketing site, a content-heavy product, or anything where users are on slow connections or low-end phones, that bundle difference is real money in conversion and bounce rate. For an internal tool where everyone is on corporate fiber and the app loads once and stays open all day, the bundle difference is a rounding error and you should optimize for hiring and ecosystem instead.&lt;/p&gt;

&lt;p&gt;I will not pretend the bundle size decided anything for me. It is a genuine Svelte advantage and it is also the advantage people overweight because it is the easiest one to put in a chart.&lt;/p&gt;




&lt;h2&gt;
  
  
  SvelteKit vs Next.js: The Real Comparison
&lt;/h2&gt;

&lt;p&gt;Here is the thing nobody tells beginners. You are rarely choosing Svelte vs React. You are choosing SvelteKit vs Next.js, or SvelteKit vs Astro, or &lt;a href="https://dev.to/blog/tanstack-start-vs-nextjs-2026/"&gt;SvelteKit vs TanStack Start&lt;/a&gt;. The meta-framework drives far more of your daily experience than the base framework does.&lt;/p&gt;

&lt;p&gt;SvelteKit is the official full-stack framework for Svelte and it is genuinely good. File-based routing, server-side rendering, form actions, server hooks, and a deployment story that adapts to most hosts through adapters. It feels lighter than Next.js because it is lighter. There are fewer concepts, fewer footguns, and the docs are excellent.&lt;/p&gt;

&lt;p&gt;Next.js is the heavyweight. More features, a bigger ecosystem, React Server Components, deep integration with Vercel, the deployment platform most people use, and a community so large that almost any problem you hit has already been answered. It is also more complicated, and the App Router still trips people up in ways the SvelteKit equivalent does not.&lt;/p&gt;

&lt;p&gt;The honest split: SvelteKit is the more pleasant framework to work in, especially solo or in a small team. Next.js is the safer institutional bet, especially when you need to hire, integrate with a mature ecosystem, or lean on Server Components for a content-heavy app. This mirrors the &lt;a href="https://dev.to/blog/stop-obsessing-over-the-perfect-stack/"&gt;framework choice discussion&lt;/a&gt; I keep coming back to: the best framework is usually the one your situation can support, not the one that benchmarks best.&lt;/p&gt;

&lt;p&gt;One real limitation worth flagging: Svelte has no equivalent to React Server Components. If your architecture leans hard on RSC, streaming server components, and the server-first rendering model, SvelteKit does not have a one-to-one answer. It has its own server rendering story that is perfectly capable, but it is a different model, not a port of RSC.&lt;/p&gt;




&lt;h2&gt;
  
  
  The TypeScript Friction Nobody Mentions
&lt;/h2&gt;

&lt;p&gt;This is the part the Svelte tutorials skip, and it is the thing that frustrated me most during the rebuild.&lt;/p&gt;

&lt;p&gt;Svelte 5 with TypeScript is good but not seamless, and props are the rough spot. Declaring typed props means repeating yourself in a way that feels off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight svelte"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"ts"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Props&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;onIncrement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onIncrement&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;Props&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$props&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You write the type, then you destructure the same names again. On a component with a dozen props this gets verbose, and the developer experience around it is noticeably less polished than React's, where a function component's props are just the parameter type and you are done. There is a real sentiment in the community that runes were designed by people who were not feeling the TypeScript-plus-props pain in daily use, and after shipping with it I understand where that comes from.&lt;/p&gt;

&lt;p&gt;It is not a dealbreaker. The types are sound, the editor support through the Svelte language server is solid, and once you accept the boilerplate it fades into the background. But if you came expecting Svelte 5 to be as frictionless with TypeScript as it is with plain JavaScript, adjust your expectations. React is still ahead on TypeScript ergonomics, and that gap is bigger than the marketing suggests.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Migration Story Is Messier Than "It Is Backward Compatible"
&lt;/h2&gt;

&lt;p&gt;Svelte 5 is backward compatible. Svelte 4 syntax still runs, and you can migrate incrementally rather than rewriting everything at once. That is technically true and it undersells the disruption.&lt;/p&gt;

&lt;p&gt;When runes landed, every standard in the ecosystem shifted at once. Libraries written for the store-based model needed updating. Community patterns that everyone copied from blog posts and Stack Overflow were suddenly the old way. The mental model that made Svelte famous, the implicit &lt;code&gt;$:&lt;/code&gt; reactivity, became legacy. If you learned Svelte in the version 4 era, a lot of what you knew is now the thing you are migrating away from.&lt;/p&gt;

&lt;p&gt;There is also a specific sharp edge that bit people: the pattern of returning a writable store from a load function does not map cleanly to &lt;code&gt;$state&lt;/code&gt;. You cannot return a rune the way you returned a store, and the community has been asking for a clean answer. If your SvelteKit app leaned on that pattern, the migration is not a find-and-replace.&lt;/p&gt;

&lt;p&gt;The developers I have read who finished large migrations mostly landed in the same place I did on the rebuild: the first impression was rough, and the end state was better. Code became more explicit and easier to reason about once the runes clicked. But "more explicit" is doing work in that sentence. Some people loved Svelte precisely because it was implicit and magical, and runes took that away on purpose. If that magic was why you chose Svelte, version 5 might feel like a downgrade in the exact dimension you cared about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hiring and Ecosystem: The Boring Tiebreaker That Wins
&lt;/h2&gt;

&lt;p&gt;Svelte ranks at or near the top of developer satisfaction surveys year after year. People who use it love it. The GitHub stars are well past 85,000 and the momentum through 2025 and 2026 has been real.&lt;/p&gt;

&lt;p&gt;None of that changes the fact that React has an order of magnitude more developers, more libraries, more tutorials, more battle-tested answers, and more candidates in the hiring pool. If you are building a team, this is not a close call. You will fill a React role faster, you will find a component library or integration for almost anything off the shelf, and a new hire will be productive on day one because they probably already know it.&lt;/p&gt;

&lt;p&gt;Svelte's ecosystem is good and growing, but you will hit gaps. The niche library you need might not have a Svelte equivalent, or it will have one maintained by one person. The component libraries are fewer and less mature than React's enormous selection. None of this is fatal for a solo developer or a small team that controls its own hiring, but it is a real cost that does not show up in a performance benchmark.&lt;/p&gt;

&lt;p&gt;This is the boring factor that quietly decides most production choices, and it usually points at React. Not because React is better, but because the surrounding ecosystem reduces risk, and risk reduction is most of what you are buying when you pick a framework for a team.&lt;/p&gt;




&lt;h2&gt;
  
  
  So Which One Should You Actually Pick?
&lt;/h2&gt;

&lt;p&gt;After all of it, here is where I land.&lt;/p&gt;

&lt;p&gt;Pick &lt;strong&gt;Svelte 5 with SvelteKit&lt;/strong&gt; when you control the hiring funnel, bundle size or time-to-interactive is a binding constraint, and you want the most pleasant development experience for a small team. It is excellent for content-driven sites, marketing pages, dashboards, and any product where shipping less JavaScript directly helps your users. The reactivity model is genuinely nicer to work in once you accept the TypeScript-props friction, and the no-dependency-array world removes a whole category of bugs.&lt;/p&gt;

&lt;p&gt;Pick &lt;strong&gt;React 19 with Next.js&lt;/strong&gt; when you need to hire fast, integrate with a mature design system or large ecosystem, lean on Server Components, or ship into an existing React codebase. The &lt;a href="https://dev.to/blog/the-react-compiler-is-here-say-goodbye-to-usememo-and-usecallback/"&gt;React Compiler closed most of the ergonomic gap&lt;/a&gt; that used to make React feel clunky, and &lt;a href="https://dev.to/blog/a-deep-dive-into-react-19-new-features-improvements-and-best-practices/"&gt;React 19 itself brought real improvements&lt;/a&gt; that narrow the distance to Svelte. The ecosystem advantage is enormous and that advantage is mostly invisible until the moment it saves you a week.&lt;/p&gt;

&lt;p&gt;The thing I want to push back on is the framing that one of these is the clear future and the other is legacy. That is not what shipping both taught me. Svelte 5 is a genuinely better experience in several specific ways and React is a genuinely safer bet in several specific ways, and the overlap where either would be fine is large. The reactivity debate that signals reopened is healthy, and it is making both frameworks better. React adopted automatic memoization. Svelte adopted explicit signals. They are converging on the same insight from opposite directions.&lt;/p&gt;

&lt;p&gt;For my own work I now reach for SvelteKit when I am building solo and the project is mine to maintain, and I reach for React when there is a team, a client, or an existing codebase involved. That is not a cop-out, it is the actual answer. The framework is a constraint you choose to fit the situation, and for the first time in a long while, Svelte is a serious option in that decision rather than the interesting underdog you read about but never ship.&lt;/p&gt;

&lt;p&gt;If you have only ever shipped React, the most useful thing you can do is what I did. Build one real thing in Svelte 5. Not a tutorial counter, an actual small tool with routing and data and forms. You will not necessarily switch, but you will understand the reactivity argument from the inside, and you will write better React for having seen the alternative. That is worth more than any comparison table, including this one.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>frameworks</category>
      <category>react</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Modern CSS in 2026: The JavaScript You Can Finally Delete</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Thu, 04 Jun 2026 10:40:49 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/modern-css-in-2026-the-javascript-you-can-finally-delete-2l15</link>
      <guid>https://dev.to/alexcloudstar/modern-css-in-2026-the-javascript-you-can-finally-delete-2l15</guid>
      <description>&lt;p&gt;I opened an old project last week and found a 90-line JavaScript file whose only job was to add a class to a form's parent element when an input inside it was invalid. A &lt;code&gt;MutationObserver&lt;/code&gt;, an event listener, a bit of state, and a cleanup function. All to turn a border red when something upstream changed.&lt;/p&gt;

&lt;p&gt;In modern CSS that is one selector. &lt;code&gt;form:has(input:invalid)&lt;/code&gt;. No JavaScript, no observer, no cleanup, no bug where the listener leaks after the component unmounts.&lt;/p&gt;

&lt;p&gt;That moment is the whole story of CSS in 2026. The features that were "coming soon" for years are now shipped, supported, and in many cases available in every major browser. A surprising amount of the JavaScript we write for UI behavior is now legacy code that exists only because the platform could not do it when we wrote it. This is a tour of the JavaScript you can delete, the modern CSS that replaces it, and the honest support caveats so you do not ship something that breaks in Safari.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;code&gt;:has()&lt;/code&gt; Is the Parent Selector We Waited 20 Years For
&lt;/h2&gt;

&lt;p&gt;For the entire history of CSS, selectors only looked downward. You could style a child based on its parent, never the reverse. Every "style the parent based on its children" problem became a JavaScript problem. Toggle a class, observe the DOM, manage the state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;:has()&lt;/code&gt; ended that. Chrome, Edge, and Safari shipped it in 2022, Firefox followed at the end of 2023, and as of 2026 every major browser engine supports it. For most audiences it is safe to use in production without a fallback. This is not a "progressive enhancement, test carefully" feature anymore. It just works.&lt;/p&gt;

&lt;p&gt;The mental shift is that &lt;code&gt;:has()&lt;/code&gt; lets a selector be conditional on what it contains.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="c"&gt;/* Card that contains an image looks different from a text-only card */&lt;/span&gt;
&lt;span class="nc"&gt;.card&lt;/span&gt;&lt;span class="nd"&gt;:has&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="py"&gt;grid-template-columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120px&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;fr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;/* Label turns red when its input is invalid */&lt;/span&gt;
&lt;span class="nt"&gt;label&lt;/span&gt;&lt;span class="nd"&gt;:has&lt;/span&gt;&lt;span class="o"&gt;(+&lt;/span&gt; &lt;span class="nt"&gt;input&lt;/span&gt;&lt;span class="nd"&gt;:invalid&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--color-danger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;/* Form dims its submit button and blocks pointer clicks while any field is invalid */&lt;/span&gt;
&lt;span class="nt"&gt;form&lt;/span&gt;&lt;span class="nd"&gt;:has&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;:invalid&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;'submit'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;pointer-events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;none&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Think about how much JavaScript that last one replaces. The old version listens for input events, validates the form state, finds the submit button, and toggles its disabled styling. The CSS version is declarative and has no lifecycle to manage.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;:has()&lt;/code&gt; also composes with quantity. &lt;code&gt;:has(&amp;gt; :nth-child(5))&lt;/code&gt; lets you style a container differently once it has five or more children, which used to require counting elements in JavaScript. Combined with the newer &lt;code&gt;sibling-index()&lt;/code&gt; and &lt;code&gt;sibling-count()&lt;/code&gt; functions, which are still rolling out across browsers, layouts that adapt to the number of items in them are becoming a pure CSS concern.&lt;/p&gt;
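
&lt;p&gt;Here is a minimal sketch of that quantity pattern. The &lt;code&gt;.gallery&lt;/code&gt; class and the column counts are placeholders I made up for illustration, so adapt them to your own markup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/* Hypothetical gallery: two columns by default */
.gallery {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
}

/* Once a fifth direct child exists, switch to three columns */
.gallery:has(&amp;gt; :nth-child(5)) {
  grid-template-columns: repeat(3, 1fr);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;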

&lt;p&gt;This is the single highest-leverage modern CSS feature to learn first, because it deletes the most JavaScript per line and it works in every major browser.&lt;/p&gt;




&lt;h2&gt;
  
  
  Container Queries Killed the Breakpoint-Per-Page Model
&lt;/h2&gt;

&lt;p&gt;Media queries ask the wrong question. They ask how big the viewport is. What you usually want to know is how big the space this specific component lives in is. A card in a sidebar and the same card in a full-width hero should lay out differently, and the viewport cannot tell you which one you are looking at.&lt;/p&gt;

&lt;p&gt;Container queries fix this, and at roughly 92 percent global support they are a foundational feature in 2026, not an experiment. The shift people are calling "intrinsic design" is real: components style themselves based on their own container, which makes them genuinely portable. Drop the same component into a narrow column or a wide panel and it adapts on its own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nc"&gt;.card-list&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="py"&gt;container-type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;inline-size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="py"&gt;container-name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cards&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;@container&lt;/span&gt; &lt;span class="n"&gt;cards&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min-width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;400px&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nc"&gt;.card&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="py"&gt;grid-template-columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;fr&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="n"&gt;fr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The practical payoff is that you stop maintaining a giant pile of viewport breakpoints that all have to agree with each other. A component carries its own responsive logic. You move it, the logic moves with it. This is the kind of structural improvement that does not show up in a demo but quietly removes a whole category of "this looks broken in this one layout" bugs.&lt;/p&gt;
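
&lt;p&gt;To make the sidebar-versus-hero case concrete, here is a rough sketch where the wrappers themselves are the containers. The &lt;code&gt;.sidebar&lt;/code&gt; and &lt;code&gt;.hero&lt;/code&gt; class names and the 600px threshold are assumptions for illustration. An &lt;code&gt;@container&lt;/code&gt; rule without a name queries the nearest ancestor container, so the same card rule reacts to whichever wrapper it happens to sit in.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/* Hypothetical wrappers: each one becomes a query container */
.sidebar,
.hero {
  container-type: inline-size;
}

/* No name given, so this queries the nearest ancestor container */
@container (min-width: 600px) {
  .card {
    display: grid;
    grid-template-columns: 200px 1fr;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;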

&lt;p&gt;If you build with a component model already, whether that is React, Svelte, or anything else, container queries are the CSS feature that matches how you already think. I cover the component-model side of that in the &lt;a href="https://dev.to/blog/svelte-5-vs-react-2026/"&gt;Svelte 5 vs React comparison&lt;/a&gt;, and container queries are the styling layer that makes truly portable components possible regardless of which framework you picked.&lt;/p&gt;




&lt;h2&gt;
  
  
  View Transitions Make Page Changes Feel Native
&lt;/h2&gt;

&lt;p&gt;The reason native apps feel smoother than websites is rarely raw speed. It is that native apps animate between states, and websites historically snapped from one hard cut to the next. Bridging that gap meant heavy JavaScript animation libraries, manual coordination of enter and exit states, and a lot of code that broke the moment the DOM structure changed.&lt;/p&gt;

&lt;p&gt;The View Transitions API moves that into the platform. Same-document transitions reached Baseline in 2025, so animating between UI states within a single page is safe to ship now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
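&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Same-document transition: wrap the DOM update&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;startViewTransition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startViewTransition&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;updateTheList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// your normal DOM mutation&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;updateTheList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// API missing: update with an instant cut&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;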
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
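&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="c"&gt;/* Then describe the animation in CSS */&lt;/span&gt;
&lt;span class="c"&gt;/* Keyframes the rules below refer to (simple opacity fades, assumed here) */&lt;/span&gt;
&lt;span class="k"&gt;@keyframes&lt;/span&gt; &lt;span class="n"&gt;fade-out&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;to&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;@keyframes&lt;/span&gt; &lt;span class="n"&gt;fade-in&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;::view-transition-old&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nt"&gt;root&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fade-out&lt;/span&gt; &lt;span class="m"&gt;0.2s&lt;/span&gt; &lt;span class="n"&gt;ease&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nd"&gt;::view-transition-new&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nt"&gt;root&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fade-in&lt;/span&gt; &lt;span class="m"&gt;0.2s&lt;/span&gt; &lt;span class="n"&gt;ease&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;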
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The browser snapshots the before and after states and animates between them for you. You are not manually tracking which elements entered, left, or moved. You name the transition and write the keyframes, and the platform does the choreography.&lt;/p&gt;

&lt;p&gt;Here is the honest caveat. Same-document transitions are solid and Baseline. Cross-document transitions, the ones that animate between full page navigations in a multi-page site, are newer and browsers are still landing consistent implementations. They are spectacular when they work and the perfect fit for content sites and anything using an MPA architecture, but treat them as progressive enhancement for now. The page should be completely usable if the transition does not run, and with view transitions that graceful degradation is automatic: no support means an instant cut, which is exactly what you had before.&lt;/p&gt;
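
&lt;p&gt;For reference, the cross-document opt-in is a single at-rule. This is a minimal sketch, assuming both pages are same-origin and both load a stylesheet that contains it. A browser that does not recognize the rule ignores it and navigates with the usual hard cut.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/* Include on both the page you leave and the page you land on */
@view-transition {
  navigation: auto;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;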




&lt;h2&gt;
  
  
  Anchor Positioning: Tooltips and Popovers Without the Geometry Math
&lt;/h2&gt;

&lt;p&gt;Positioning a tooltip or dropdown next to its trigger is one of those problems that looks trivial and is not. You measure the trigger, measure the viewport, calculate whether the popover fits below or needs to flip above, recalculate on scroll and resize, and handle the edge cases where it collides with the screen edge. Entire libraries exist for nothing but this.&lt;/p&gt;

&lt;p&gt;CSS Anchor Positioning lets you describe the relationship declaratively. You say which element is the anchor and which sides connect, and the browser handles the geometry, including flipping the popover when it would overflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nc"&gt;.trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="py"&gt;anchor-name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;--menu-button&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;.dropdown&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;absolute&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="py"&gt;position-anchor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;--menu-button&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c"&gt;/* attach the dropdown's top to the anchor's bottom */&lt;/span&gt;
  &lt;span class="nl"&gt;top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;bottom&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nl"&gt;left&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;left&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c"&gt;/* automatically flip if it would overflow the viewport */&lt;/span&gt;
  &lt;span class="py"&gt;position-try-fallbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;flip-block&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pair this with the native &lt;code&gt;popover&lt;/code&gt; attribute and you can build accessible menus, tooltips, and dialogs with positioning logic that used to require a dedicated dependency, and now requires none.&lt;/p&gt;
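
&lt;p&gt;A rough sketch of that pairing looks like this. The markup in the comment, the &lt;code&gt;menu&lt;/code&gt; id, and the class names are assumptions for illustration. The one non-obvious part is that browsers center popovers by default, so the sketch resets &lt;code&gt;inset&lt;/code&gt; and &lt;code&gt;margin&lt;/code&gt; before attaching the popover to its anchor.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/*
  Assumed markup (the id and class names are placeholders):
  &amp;lt;button class="trigger" popovertarget="menu"&amp;gt;Menu&amp;lt;/button&amp;gt;
  &amp;lt;div id="menu" class="dropdown" popover&amp;gt;...&amp;lt;/div&amp;gt;
*/
.trigger {
  anchor-name: --menu-button;
}

.dropdown[popover] {
  position-anchor: --menu-button;
  /* popovers are centered by the browser's default styles, so reset that */
  inset: auto;
  margin: 0;
  top: anchor(bottom);
  left: anchor(left);
  position-try-fallbacks: flip-block;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;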

&lt;p&gt;The caveat here is the biggest one in this article, so read it carefully. Anchor positioning shipped in Chromium first and browser support is still uneven in 2026. It is not at the safe-everywhere level of &lt;code&gt;:has()&lt;/code&gt; or container queries. If you use it, you need a sensible fallback for browsers that do not support it yet, or you scope it to environments where you control the browser. It is the most exciting feature on this list and the one most likely to bite you if you assume universal support. Check the current numbers before you commit, because this one is moving fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scroll-Driven Animations Without a Single Scroll Listener
&lt;/h2&gt;

&lt;p&gt;Scroll listeners are a performance trap. They fire constantly, they run on the main thread, and the naive version causes jank on exactly the low-end devices you most need to support. Doing scroll-linked animation well in JavaScript means throttling, &lt;code&gt;requestAnimationFrame&lt;/code&gt;, and intersection observers, and it is still easy to get wrong.&lt;/p&gt;

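&lt;p&gt;CSS scroll-driven animations move this work into the browser. You tie an animation's progress to scroll position or to an element's visibility, declaratively, and when you animate compositor-friendly properties such as &lt;code&gt;transform&lt;/code&gt; and &lt;code&gt;opacity&lt;/code&gt;, the browser can run it on the compositor, off the main thread.&lt;br&gt;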
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="k"&gt;@keyframes&lt;/span&gt; &lt;span class="n"&gt;reveal&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nt"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;translateY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2rem&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nt"&gt;to&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;translateY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;.section&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reveal&lt;/span&gt; &lt;span class="n"&gt;linear&lt;/span&gt; &lt;span class="nb"&gt;both&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="py"&gt;animation-timeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="py"&gt;animation-range&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="m"&gt;0%&lt;/span&gt; &lt;span class="n"&gt;cover&lt;/span&gt; &lt;span class="m"&gt;40%&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That fades and lifts each section as it scrolls into view, with no JavaScript and no scroll listener. A reading-progress bar at the top of an article, parallax effects, elements that animate as they enter the viewport, all of it becomes a few lines of CSS that the browser runs efficiently.&lt;/p&gt;
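
&lt;p&gt;The reading-progress bar is a good second example. This is a sketch under my own assumptions: the &lt;code&gt;.progress&lt;/code&gt; element, its sizing, and the &lt;code&gt;--color-accent&lt;/code&gt; custom property are placeholders. The bar starts collapsed, and the scroll timeline is only applied where the browser understands it, so unsupported browsers just never show the bar.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;@keyframes grow {
  from { transform: scaleX(0); }
  to { transform: scaleX(1); }
}

/* Hypothetical fixed bar at the top of the page, collapsed by default */
.progress {
  position: fixed;
  inset: 0 0 auto 0;
  height: 4px;
  background: var(--color-accent);
  transform-origin: left;
  transform: scaleX(0);
}

@supports (animation-timeline: scroll()) {
  .progress {
    animation: grow linear both;
    /* progress follows the root scroller instead of the clock */
    animation-timeline: scroll(root);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;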

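&lt;p&gt;Support is good in Chromium and improving elsewhere. Written the way the section example is, with no duration on the animation, it degrades gracefully: a browser that does not understand &lt;code&gt;animation-timeline&lt;/code&gt; ignores it, the animation finishes immediately, and the fill mode leaves the end state in place, so your content is never hidden behind an animation that did not run. If you do set a duration, wrap the animation in &lt;code&gt;@supports (animation-timeline: view())&lt;/code&gt; so it does not play as an ordinary timed animation on page load. With that in place, scroll-driven animations are safe to add as an enhancement even where support is not universal.&lt;/p&gt;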




&lt;h2&gt;
  
  
  The Quality-of-Life Features That Add Up
&lt;/h2&gt;

&lt;p&gt;Beyond the headline features, 2026 CSS shipped a long list of smaller things that each delete a little friction. Individually they are minor. Together they change how much you reach for JavaScript or preprocessors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native nesting.&lt;/strong&gt; You can nest selectors directly in CSS now, no Sass required. For a lot of projects this removes the build-step preprocessor entirely, which is one less thing in your toolchain. I went down the build-simplification path when I &lt;a href="https://dev.to/blog/moved-portfolio-from-nextjs-to-astro/"&gt;moved my portfolio to Astro&lt;/a&gt;, and native CSS nesting is part of why so much tooling that used to feel mandatory is now optional.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nc"&gt;.card&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1rem&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="err"&gt;&amp;amp;&lt;/span&gt; &lt;span class="err"&gt;.title&lt;/span&gt; &lt;span class="err"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;font-weight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;600&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nd"&gt;:hover&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;background&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--surface-hover&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cascade layers (&lt;code&gt;@layer&lt;/code&gt;).&lt;/strong&gt; Specificity wars are the reason so many codebases drown in &lt;code&gt;!important&lt;/code&gt;. Cascade layers let you define explicit priority order between groups of styles, so your base, components, and utilities never fight unpredictably. This is the feature that makes large CSS codebases maintainable instead of a game of specificity whack-a-mole.&lt;/p&gt;
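
&lt;p&gt;A minimal sketch of how that plays out, with layer names and selectors I picked for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/* Layer order is declared once; later layers win over earlier ones */
@layer base, components, utilities;

@layer base {
  /* high specificity, but in the lowest-priority layer */
  nav a.link { color: blue; }
}

@layer components {
  /* lower specificity, still wins for an element matching both rules */
  .button { color: white; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;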

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@scope&lt;/code&gt;.&lt;/strong&gt; Scoped styles without a build tool or CSS-in-JS runtime. You can limit a block of styles to a subtree of the DOM and even set a lower boundary, which gets you component-style encapsulation natively.&lt;/p&gt;
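
&lt;p&gt;As a small sketch, assuming a card component whose body slot is marked with a hypothetical &lt;code&gt;.card-content&lt;/code&gt; class:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/* Applies to images inside .card, but not to anything inside .card-content */
@scope (.card) to (.card-content) {
  img {
    border-radius: 8px;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;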

&lt;p&gt;&lt;strong&gt;Modern color with &lt;code&gt;oklch()&lt;/code&gt;.&lt;/strong&gt; Perceptually uniform color, which means generating consistent tints and shades for a design system actually works instead of producing muddy mid-tones. Combined with &lt;code&gt;color-mix()&lt;/code&gt;, you can build an entire palette from a couple of base colors in pure CSS.&lt;/p&gt;
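
&lt;p&gt;Something like the following is enough to start a palette. The custom property names and the base color value are placeholders, not a recommendation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;:root {
  /* Hypothetical brand color: lightness, chroma, hue */
  --brand: oklch(62% 0.19 255);
  /* 20% brand mixed into white gives a light tint */
  --brand-tint: color-mix(in oklch, var(--brand) 20%, white);
  /* 80% brand mixed with black gives a darker shade */
  --brand-shade: color-mix(in oklch, var(--brand) 80%, black);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;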

&lt;p&gt;&lt;strong&gt;&lt;code&gt;text-wrap: balance&lt;/code&gt; and &lt;code&gt;text-wrap: pretty&lt;/code&gt;.&lt;/strong&gt; Headlines that wrap evenly instead of leaving one orphaned word on the last line, and body text that avoids ugly typographic orphans. This used to require a JavaScript library that measured and inserted line breaks. Now it is one declaration.&lt;/p&gt;
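
&lt;p&gt;In practice that is usually two rules. The selectors here are just a reasonable default, so adjust them to your own typography.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;/* Even out line lengths in short headings */
h1,
h2,
h3 {
  text-wrap: balance;
}

/* Avoid a lone word on the last line of body text where supported */
p {
  text-wrap: pretty;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;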

&lt;p&gt;&lt;strong&gt;&lt;code&gt;field-sizing: content&lt;/code&gt;.&lt;/strong&gt; Textareas and inputs that grow to fit their content automatically. The auto-resizing textarea was a rite of passage JavaScript snippet for a decade. It is now one line of CSS.&lt;/p&gt;
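
&lt;p&gt;A minimal sketch, with height limits that are my own assumption so the field neither collapses when empty nor grows without bound:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;textarea {
  field-sizing: content;
  /* lh is one line of text at the element's line height */
  min-height: 3lh;
  max-height: 12lh;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;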

&lt;p&gt;None of these is a headline on its own. The cumulative effect is that a modern stylesheet does things in 2026 that genuinely required JavaScript or a preprocessor two years ago, and the code that remains is smaller and easier to reason about.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Actually Adopt This Without Breaking Things
&lt;/h2&gt;

&lt;p&gt;The temptation after reading a list like this is to rewrite everything. Do not. The right approach is to know which tier each feature is in and treat them accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ship today, no fallback needed:&lt;/strong&gt; &lt;code&gt;:has()&lt;/code&gt;, container queries, native nesting, cascade layers, &lt;code&gt;oklch()&lt;/code&gt;, &lt;code&gt;text-wrap&lt;/code&gt;. These are at or near universal support. Use them like any other CSS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ship as progressive enhancement:&lt;/strong&gt; same-document view transitions, scroll-driven animations, &lt;code&gt;field-sizing&lt;/code&gt;. These degrade gracefully on their own. The page works without them and gets nicer with them, so you can add them now and let support fill in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check current numbers first, provide a fallback:&lt;/strong&gt; anchor positioning and cross-document view transitions. These are real and worth using in the right context, but they are not safe-everywhere yet. Scope them to controlled environments or pair them with a sensible default.&lt;/p&gt;

&lt;p&gt;The way to make this concrete is to use a feature query. &lt;code&gt;@supports&lt;/code&gt; lets you write the modern version and fall back cleanly when it is missing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nc"&gt;.dropdown&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;/* fallback positioning, relative to a positioned parent */&lt;/span&gt;
  &lt;span class="nl"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;absolute&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100%&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;left&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;@supports&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anchor-name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;--x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nc"&gt;.dropdown&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="py"&gt;position-anchor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;--menu-button&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;bottom&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nl"&gt;left&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;left&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two minutes of &lt;code&gt;@supports&lt;/code&gt; buys you the new feature where it exists and a working experience everywhere else. That is the entire risk-management strategy.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Shift Is Where Logic Lives
&lt;/h2&gt;

&lt;p&gt;Step back from the individual features and there is a pattern. For 15 years, the answer to "the platform cannot do this" was JavaScript. State that depended on the DOM, layout that depended on context, animation that depended on scroll, positioning that depended on geometry. We pushed all of it into scripts because CSS could not express it. That code accumulated, and it is a large fraction of the JavaScript a typical site ships.&lt;/p&gt;

&lt;p&gt;Modern CSS is pulling that logic back into the declarative layer where it belongs. Declarative code is smaller, it has no lifecycle to manage, it does not leak listeners, and it runs in the browser's optimized paths, sometimes off the main thread entirely. The same way &lt;a href="https://dev.to/blog/the-react-compiler-is-here-say-goodbye-to-usememo-and-usecallback/"&gt;the React Compiler removed the manual memoization&lt;/a&gt; we used to hand-write, modern CSS is removing the manual DOM coordination we used to script. The platform got good enough that the workaround became the liability.&lt;/p&gt;

&lt;p&gt;This pairs with the broader trend of the web platform absorbing what used to be library territory, the same way &lt;a href="https://dev.to/blog/es2026-javascript-features-guide/"&gt;the latest JavaScript language features&lt;/a&gt; keep replacing utilities we used to install. The lesson is the same in both: before you reach for a dependency or write a clever script, check whether the platform already does it. In 2026, for a surprising amount of UI behavior, the answer is yes.&lt;/p&gt;

&lt;p&gt;The next time you are about to add a scroll listener, a &lt;code&gt;MutationObserver&lt;/code&gt;, a positioning library, or a class-toggling effect, pause and ask whether CSS can do it now. More often than you would expect, it can, and the version that lives in your stylesheet will outlast the version that lived in your bundle. That old 90-line file I deleted is not coming back. Most of yours can go the same way.&lt;/p&gt;

</description>
      <category>css</category>
      <category>webdev</category>
      <category>frontend</category>
      <category>devtools</category>
    </item>
    <item>
      <title>The Best Background Coding Agents in 2026: Codex Cloud vs Cursor vs Copilot vs Claude Code</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:30:02 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/the-best-background-coding-agents-in-2026-codex-cloud-vs-cursor-vs-copilot-vs-claude-code-59gi</link>
      <guid>https://dev.to/alexcloudstar/the-best-background-coding-agents-in-2026-codex-cloud-vs-cursor-vs-copilot-vs-claude-code-59gi</guid>
      <description>&lt;p&gt;By May 2026 there were seven coding tools serious enough to argue about: Claude Code, OpenAI Codex, Cursor, GitHub Copilot, Google Antigravity, Kiro, and Windsurf. What is new is that almost all of them now ship a background mode. Not "AI in your editor" but "AI that runs in the cloud, in its own machine, and opens a pull request while you go do literally anything else."&lt;/p&gt;

&lt;p&gt;I have been running real work through the main contenders for a few months. Not toy tasks, actual production work in actual repos. This is the comparison I wish I had before I started, because the marketing pages make them all sound identical and they are not.&lt;/p&gt;

&lt;p&gt;If you want the conceptual version of when to use this kind of tool at all, I wrote that up separately in the piece on &lt;a href="https://dev.to/blog/background-ai-coding-agents-2026"&gt;background AI coding agents and when to delegate&lt;/a&gt;. This article is the other half: assuming you have decided to delegate, which tool do you actually pick.&lt;/p&gt;




&lt;h2&gt;
  
  
  What "Background Agent" Means Across These Tools
&lt;/h2&gt;

&lt;p&gt;First, the shared model, because it is genuinely the same shape everywhere.&lt;/p&gt;

&lt;p&gt;You describe a task. The tool provisions a fresh, isolated virtual machine. That VM clones your repository at the current HEAD of a branch, runs your setup commands so it has a working environment, then the agent executes the task, runs your checks, and opens a draft pull request with a summary. You review the PR whenever you get to it. You were not watching while it worked.&lt;/p&gt;

&lt;p&gt;The differences are in the details that matter: where you trigger it from, how good the isolated environment is, how it handles setup, how it reports back, how much it costs, and how strong the underlying model is. Those details are the whole comparison.&lt;/p&gt;

&lt;p&gt;One thing worth saying up front. None of these tools should be merging their own code. Every one of them opens a draft PR specifically so your existing quality gates stay in charge. The background agent is a contributor, not a committer. Keep branch protection and required review on, no matter which tool you pick.&lt;/p&gt;




&lt;h2&gt;
  
  
  OpenAI Codex Cloud Tasks
&lt;/h2&gt;

&lt;p&gt;Codex leans hardest into the cloud-native model. The whole product is built around the idea that you describe a task and it runs somewhere else, returns a reviewable diff, and you decide what to do with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines.&lt;/strong&gt; Independent, well-bounded work that returns a clean diff. Codex is genuinely good at the "here is a described change, go make it in a clean environment and show me the result" loop. The isolation is solid, the environment setup is straightforward, and the diffs come back in a form that is easy to review. For draining a backlog of tightly scoped tasks, it is one of the strongest options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it struggles.&lt;/strong&gt; Anything that needs your local state. Because it is so cloud-first, the things that depend on your actual machine (local services, browser state, uncommitted changes) are exactly the things it cannot help with. That is not a flaw, it is the design, but you feel it if you try to use it for the wrong tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing shape.&lt;/strong&gt; Codex runs on a usage-based model rather than a flat seat. That cuts both ways. If you delegate sporadically, you pay for what you use and it is cheap. If you run a fleet of agents around the clock, usage-based billing can climb fast, so watch the meter.&lt;/p&gt;

&lt;p&gt;The mental model that works for Codex: it is a queue you throw repo-scoped tasks into. Treat it like a build server that writes code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cursor Cloud Agents
&lt;/h2&gt;

&lt;p&gt;Cursor came from the IDE side, and its background story reflects that. You can start work interactively in the editor and then push it to a cloud agent that runs in an isolated VM, or kick off cloud agents directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines.&lt;/strong&gt; The handoff between foreground and background is the smoothest of the bunch. You can be working interactively, realize a chunk of it is delegatable, and send it to the cloud without leaving your flow. Each cloud task gets a fresh VM with its own filesystem, terminal, network, and package environment, clones your repo at HEAD, runs your setup, and works from there. For developers who already live in Cursor, the continuity is the selling point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it struggles.&lt;/strong&gt; It is the most expensive of the IDE-rooted options at the team tier, and the cloud agent experience, while good, is layered onto a product that was fundamentally designed around the editor. If you do not otherwise use Cursor, adopting it just for background agents is a lot of surface area for the feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing shape.&lt;/strong&gt; Business tier runs in the higher range for team seats, reportedly around the $4,800 per year mark for ten developers, in the same neighborhood as Windsurf. Pricing across all these tools moves constantly, so treat any number as a snapshot, but Cursor sits at the pricier end.&lt;/p&gt;

&lt;p&gt;If you already work in Cursor, the cloud agents are an easy yes. If you do not, the value proposition is narrower.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitHub Copilot Coding Agent
&lt;/h2&gt;

&lt;p&gt;Copilot's background story is the most tightly woven into where the work already lives: GitHub itself. You assign an issue to the coding agent, or mention it, and it works in the background, opens a draft PR, and requests your review. There is also a delegate flow from the CLI and from inside editors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines.&lt;/strong&gt; If your team runs on GitHub issues and PRs, the integration is hard to beat. The agent reads the issue description and comments as context, opens a draft PR against your branch protection rules, and the whole thing lives inside the workflow you already use. Turning acceptance criteria in an issue directly into a draft PR, without anyone bypassing branch protection, is a genuinely clean loop. Copilot has actually split its agents into distinct flavors (local, background, cloud, and sub-agents), which is confusing at first but useful once you know which one you are invoking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it struggles.&lt;/strong&gt; The model ceiling has historically trailed the frontier, and for hard, ambiguous reasoning work it can show. It is excellent at well-specified issue-to-PR work and less impressive when the task needs deep judgment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing shape.&lt;/strong&gt; Copilot is among the most affordable team options, reportedly around the $2,280 per year range for ten developers on the Business tier, which makes it an easy default for teams already paying for GitHub.&lt;/p&gt;

&lt;p&gt;The fit here is organizational. If your team's source of truth is GitHub issues, Copilot's coding agent slots in with almost no new process.&lt;/p&gt;




&lt;h2&gt;
  
  
  Claude Code Async
&lt;/h2&gt;

&lt;p&gt;Claude Code is a terminal-first agent, and its async story extends that. You can run it in the background and have it work through tasks while you do other things, and it keeps the deepest reasoning ceiling of the group thanks to the underlying Opus models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines.&lt;/strong&gt; Hard tasks. When the work needs real reasoning, multi-step planning, or wrangling a gnarly codebase, the model quality shows. The latest &lt;a href="https://dev.to/blog/claude-opus-4-8-review-benchmarks-developer-guide-2026"&gt;Claude Opus release&lt;/a&gt; pushed the reliability of long agentic sessions up noticeably, which is exactly what you want when an agent is running unattended for twenty minutes. It is also the most loved tool among developers in the 2026 surveys by a wide margin, which is not nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it struggles.&lt;/strong&gt; Cost, mainly. The team tier is dramatically more expensive than the others, reportedly an order of magnitude above Copilot for ten seats. For an individual or small team the per-use cost can be very reasonable, but at scale it is the priciest reasoning you can buy. The terminal-first nature also means the background experience is less GUI-polished than Cursor's or Copilot's GitHub-native flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing shape.&lt;/strong&gt; Individual usage is reasonable. The team tier is the most expensive in this group by a large margin, so it is a deliberate spend you make because you want the reasoning ceiling, not a default.&lt;/p&gt;

&lt;p&gt;The Claude Code ecosystem also goes deep beyond the core agent. If you are standardizing how your team works with it, the &lt;a href="https://dev.to/blog/claude-code-plugin-marketplace-skills-2026"&gt;plugin marketplace and skills&lt;/a&gt; turn your patterns into shareable installable pieces, which matters more once multiple people are delegating against the same conventions.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Actually Choose Between Them
&lt;/h2&gt;

&lt;p&gt;Forget the feature matrices. Here is the decision I actually make, by situation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your team lives on GitHub issues and PRs.&lt;/strong&gt; Copilot coding agent. The integration tax is near zero and the price is the lowest. You are not adopting a new tool, you are turning on a feature in one you already pay for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You already work in Cursor all day.&lt;/strong&gt; Cursor cloud agents. The foreground-to-background handoff is the smoothest you will get, and you are not adding new surface area. If you do not already use Cursor, this is a weaker reason to start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want a pure delegation queue for backlog work.&lt;/strong&gt; Codex cloud tasks. The usage-based pricing fits sporadic delegation, and the cloud-first design is built precisely for "describe it, run it elsewhere, review the diff." Just watch the meter if you scale up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The work is genuinely hard and you will pay for quality.&lt;/strong&gt; Claude Code async. When the task needs the strongest reasoning available and getting it right the first time is worth real money, this is the ceiling. For individuals the cost is fine. For large teams it is a deliberate, expensive choice.&lt;/p&gt;

&lt;p&gt;Most serious users do not pick one. They use the cheap, integrated option for the bulk of well-scoped work and reach for the expensive, high-reasoning option for the genuinely hard tasks. That is the same split I described in the &lt;a href="https://dev.to/blog/multi-agent-vs-single-agent-architecture-2026"&gt;single-agent versus multi-agent&lt;/a&gt; tradeoff: match the tool to the difficulty of the task, do not pay frontier prices for boilerplate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pricing Reality Nobody Likes
&lt;/h2&gt;

&lt;p&gt;A quick, honest aside on cost, because the numbers move every quarter and the framing matters more than any specific figure.&lt;/p&gt;

&lt;p&gt;Two pricing models dominate: flat per-seat (Copilot, Cursor, Windsurf, Kiro, Antigravity) and usage-based (Codex, and Claude Code at the API level). For ten developers per year, the reported flat-tier spread in 2026 ran roughly from about $2,280 for Copilot up to around $4,800, about double that, for Cursor and Windsurf, with Claude Code's team tier sitting far above the rest. Usage-based options are cheap when you delegate occasionally and can become the most expensive when you run agents constantly.&lt;/p&gt;

&lt;p&gt;The trap is optimizing for the seat price and ignoring the usage pattern. A flat seat you barely use is wasted money. A usage-based tool you hammer around the clock can blow past a flat tier you would have been better off buying. Match the billing model to how you actually work, not to which sticker looks lowest.&lt;/p&gt;
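
&lt;p&gt;To make that concrete, here is a rough break-even sketch in Python. The $4,800 flat figure is the reported ten-seat Cursor number from earlier in this article. The $2.00 per-task cost is purely a made-up assumption for illustration, not a quoted price from any vendor, and the function names are my own.&lt;/p&gt;

```python
def breakeven_tasks_per_year(flat_annual_cost: float, cost_per_task: float) -> float:
    """Tasks per year at which usage-based billing costs the same as the flat tier."""
    if not cost_per_task > 0:
        raise ValueError("cost_per_task must be positive")
    return flat_annual_cost / cost_per_task


def cheaper_option(flat_annual_cost: float, cost_per_task: float, tasks_per_year: int) -> str:
    """Return 'usage' or 'flat', whichever costs less at this delegation volume."""
    usage_annual_cost = cost_per_task * tasks_per_year
    return "usage" if flat_annual_cost > usage_annual_cost else "flat"


FLAT_TEAM_COST = 4800.0      # reported ten-seat annual figure (a snapshot)
ASSUMED_COST_PER_TASK = 2.0  # hypothetical average cost of one delegated task

print(breakeven_tasks_per_year(FLAT_TEAM_COST, ASSUMED_COST_PER_TASK))  # 2400.0
print(cheaper_option(FLAT_TEAM_COST, ASSUMED_COST_PER_TASK, 500))       # usage
print(cheaper_option(FLAT_TEAM_COST, ASSUMED_COST_PER_TASK, 5000))      # flat
```

&lt;p&gt;Swap in your own measured cost per task and your real task volume. The point is only that the answer flips with usage, not with the sticker price.&lt;/p&gt;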

&lt;p&gt;And treat every number you see, including the ones in this article, as a snapshot. This category reprices constantly. Verify current pricing before you commit a team to anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Mistakes That Make Any of Them Look Bad
&lt;/h2&gt;

&lt;p&gt;The tool matters less than how you use it, and the same mistakes sink every one of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vague tasks.&lt;/strong&gt; "Improve the codebase" produces garbage on all four. Tight, well-scoped tasks with a clear definition of done produce mergeable PRs. The merge rate gap between good and bad prompts dwarfs the quality gap between tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No instructions file.&lt;/strong&gt; A background agent starts in a clean VM with none of your context. An &lt;code&gt;AGENTS.md&lt;/code&gt; or equivalent at the repo root, covering setup, test commands, and conventions, is the highest-leverage thing you can add. Skip it and every agent looks dumber than it is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rubber-stamping the diff.&lt;/strong&gt; AI-generated code carries measurably higher bug density, and a convincing PR summary is not a substitute for reading the actual change. I run agent PRs through the same &lt;a href="https://dev.to/blog/testing-ai-generated-code-developer-guide-2026"&gt;testing process for AI-generated code&lt;/a&gt; regardless of which tool produced them. The review discipline is tool-independent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring the debt.&lt;/strong&gt; Ship a lot of agent code fast and you accumulate complexity and duplication that no single PR review catches. This &lt;a href="https://dev.to/blog/ai-generated-code-technical-debt-2026"&gt;new shape of technical debt&lt;/a&gt; is real across all of these tools, because it is a property of fast machine-authored code, not of any one vendor.&lt;/p&gt;

&lt;p&gt;Pick the wrong tool and you lose a bit of efficiency. Make these mistakes and the best tool in the world produces a pile of plausible-looking PRs you cannot trust. The leverage is in your process, not the logo.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;If I had to hand someone a single default, it is the Copilot coding agent, purely on integration and price, assuming their team already lives on GitHub. It is the lowest-friction way to start delegating real work, and most teams already pay for it.&lt;/p&gt;

&lt;p&gt;If you live in Cursor, use Cursor's cloud agents and do not overthink it. If you want a delegation queue and you delegate in bursts, Codex is the cleanest fit. And when the task is genuinely hard and correctness is worth real money, Claude Code's reasoning ceiling is the one I reach for, cost accepted.&lt;/p&gt;

&lt;p&gt;But the meta-point is the one I keep landing on. These tools have converged on the same model, and within a release cycle they tend to leapfrog each other on benchmarks anyway. The durable advantage is not which agent you picked. It is whether you got good at scoping tasks, writing instructions, and reviewing output. Do that, and any of these four serves you well. Skip it, and none of them will.&lt;/p&gt;

&lt;p&gt;The background agent category is a year old and already feels permanent. The right move is not to bet everything on one vendor. It is to build the delegation muscle that makes all of them useful, and stay light enough to switch when the next one ships something better. Because it will, probably before you finish reading the changelog.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Background AI Coding Agents in 2026: When to Delegate Work to Async Agents</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:30:00 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/background-ai-coding-agents-in-2026-when-to-delegate-work-to-async-agents-3g92</link>
      <guid>https://dev.to/alexcloudstar/background-ai-coding-agents-in-2026-when-to-delegate-work-to-async-agents-3g92</guid>
      <description>&lt;p&gt;The first time I sent a task to a background agent and closed my laptop, it felt wrong. Like leaving the stove on. I had spent two years learning to watch AI tools work, reading every diff as it appeared, ready to hit escape the moment it went sideways. Now I was supposed to describe a task, walk away, and come back to a finished pull request.&lt;/p&gt;

&lt;p&gt;It worked. The agent cloned the repo, ran the setup, made the change across four files, ran the tests, and opened a draft PR with a summary of what it did. I reviewed it over coffee twenty minutes later. That was the moment the shape of my work changed.&lt;/p&gt;

&lt;p&gt;This is the part of agentic development that people are still catching up to. Not "the AI writes code in my editor" but "the AI writes code somewhere else while I am not looking." Background agents are a different tool with a different mental model, and using them well is mostly about knowing what to hand off and what to keep close.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Background Coding Agent Actually Is
&lt;/h2&gt;

&lt;p&gt;Let me be precise, because the terminology is a mess right now.&lt;/p&gt;

&lt;p&gt;A foreground agent runs where you are. You give it a task, you watch it work, you correct it in real time. This is the &lt;a href="https://dev.to/blog/agentic-coding-2026"&gt;agentic coding loop&lt;/a&gt; most developers got comfortable with over the last eighteen months. It runs in your terminal or your IDE, against your actual filesystem, and you are in the loop the whole time.&lt;/p&gt;

&lt;p&gt;A background agent runs somewhere else, usually a fresh cloud VM, and reports back when it is done. You describe the task, it spins up an isolated environment, clones your repo at the current HEAD of a branch, runs your setup commands, makes the change, runs your checks, and opens a draft pull request. You were not watching. You find out it finished when the PR notification lands.&lt;/p&gt;

&lt;p&gt;The session length data tells the story. Average coding agent sessions went from about 4 minutes in early 2025 to 23 minutes in early 2026. Sessions got longer because agents stopped needing you to babysit every step. Multi-file edits went from 34% of sessions to 78% in the same window. The work got bigger and more autonomous at the same time.&lt;/p&gt;

&lt;p&gt;Every major tool now ships some version of this. GitHub Copilot has a coding agent that opens draft PRs from issues. Cursor has cloud agents that run in isolated VMs. OpenAI's Codex has cloud tasks. Claude Code has async background runs. The category went from "experimental" to "default expectation" in about a year.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Tiers I Actually Use
&lt;/h2&gt;

&lt;p&gt;I think about agent work in three tiers now, and the tier determines where the work runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier one is interactive.&lt;/strong&gt; This is foreground work where I am present and correcting in real time. I use it for anything I do not fully understand yet, anything touching code I am nervous about, and anything where the feedback loop matters more than the throughput. Debugging a weird production issue lives here. So does any change to auth, billing, or data migrations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier two is parallel sprints.&lt;/strong&gt; This is where I fire off two or three background agents on independent tasks and let them run while I work on something else in the foreground. The key word is independent. If the tasks touch the same files, I am setting myself up for a merge nightmare, so I only parallelize work with clean boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier three is the overnight backlog drain.&lt;/strong&gt; This is the one that still feels slightly magical. I queue up a batch of well-scoped, low-risk tasks before I stop for the day, and review the resulting draft PRs in the morning. Documentation patches, test coverage gaps, dependency bumps, small refactors that follow an existing pattern. Boring work that adds up.&lt;/p&gt;

&lt;p&gt;Most developers I know who use background agents seriously use all three tiers. The mistake is treating background agents as a replacement for foreground work rather than a different tool for a different kind of task.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes a Task Cloud-Ready
&lt;/h2&gt;

&lt;p&gt;This is the whole game. The difference between background agents saving you hours and background agents creating a pile of broken PRs is almost entirely about task selection.&lt;/p&gt;

&lt;p&gt;A task is cloud-ready when you can describe it completely in a prompt. That means it has five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A clear scope.&lt;/strong&gt; One feature, one fix, one refactor. Not "improve the codebase."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The files or area it touches.&lt;/strong&gt; Either you name them or the agent can find them from a clear description.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A setup path.&lt;/strong&gt; The setup commands that get a fresh machine to a working state, usually your install and build steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A test command.&lt;/strong&gt; Something the agent can run to know whether it succeeded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A stop condition.&lt;/strong&gt; A definition of done the agent can actually evaluate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If a task is missing any of these, it is not ready for a background agent. The honest test I use: if I cannot explain the task in a prompt with scope, files, setup, checks, and a stop condition, I either keep it local or I ask the agent to produce a plan first and I review that before letting it run.&lt;/p&gt;
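
&lt;p&gt;If it helps, the five-item checklist above can be sketched as a tiny Python pre-flight check. This is an illustration only: &lt;code&gt;TaskSpec&lt;/code&gt;, its field names, and the example commands are all hypothetical and do not correspond to any tool's actual task format.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical description of one task to hand to a background agent."""
    scope: str = ""
    files: list[str] = field(default_factory=list)
    setup_commands: list[str] = field(default_factory=list)
    test_command: str = ""
    stop_condition: str = ""


def missing_fields(task: TaskSpec) -> list[str]:
    """Names of the checklist items that are still empty."""
    checks = {
        "scope": bool(task.scope.strip()),
        "files": bool(task.files),
        "setup_commands": bool(task.setup_commands),
        "test_command": bool(task.test_command.strip()),
        "stop_condition": bool(task.stop_condition.strip()),
    }
    return [name for name, present in checks.items() if not present]


def is_cloud_ready(task: TaskSpec) -> bool:
    """A task is cloud-ready only when none of the five items is missing."""
    return not missing_fields(task)


vague = TaskSpec(scope="Improve the codebase")
tight = TaskSpec(
    scope="Add unit tests for the date parser",
    files=["src/dates/parser.py"],       # illustrative path
    setup_commands=["pip install -e ."],  # illustrative setup
    test_command="pytest tests/dates",
    stop_condition="New tests pass and existing tests still pass",
)

print(missing_fields(vague))  # ['files', 'setup_commands', 'test_command', 'stop_condition']
print(is_cloud_ready(tight))  # True
```

&lt;p&gt;A check like this cannot tell you whether the scope is any good. It only catches the tasks that are obviously not ready, which in my experience is most of the bad ones.&lt;/p&gt;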

&lt;p&gt;Tasks that fit this shape beautifully: writing tests for existing code, refactors that follow a known pattern, documentation updates, fixing a well-described bug, security review follow-ups, dependency upgrades, and turning a detailed issue into a PR-ready branch. These all have clear acceptance criteria and do not depend on the messy state of your laptop.&lt;/p&gt;

&lt;p&gt;Tasks that do not fit: anything exploratory, anything where the requirements emerge as you work, anything that depends on local services or browser state, and anything where you would not be able to tell from the diff alone whether it is correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Keep It Local Instead
&lt;/h2&gt;

&lt;p&gt;Background agents are not the answer to everything, and the failure mode of over-delegating is real. Some work belongs on your machine, in the foreground, where you can see it.&lt;/p&gt;

&lt;p&gt;Keep it local when the work depends on your current filesystem state, uncommitted changes, local services, desktop browser state, or private tools the cloud VM cannot reach. The classic example is the bug that only reproduces on your machine at 11:47 at night. A background agent in a clean VM will never see it. You need to be there, running the same commands against the same broken state.&lt;/p&gt;

&lt;p&gt;Keep it local when you need a tight inspect-run-edit loop. Some debugging is a conversation. You run, you look, you tweak, you run again, ten times in five minutes. Shipping that to a cloud VM that takes a minute to spin up each time is slower, not faster. The latency of the round trip kills you.&lt;/p&gt;

&lt;p&gt;Keep it local when you do not yet understand the problem. Background agents are for executing well-understood work, not for figuring out what the work is. If you cannot write the spec, you cannot delegate the task. This is the same reason &lt;a href="https://dev.to/blog/context-engineering-ai-coding-2026"&gt;context engineering&lt;/a&gt; matters so much: what the agent knows going in determines what comes out, and a background agent gets exactly one shot at the context you gave it.&lt;/p&gt;

&lt;p&gt;There is a decision-framework version of this question I keep coming back to, the &lt;a href="https://dev.to/blog/vibe-ceiling-ai-code-decision-framework-2026"&gt;vibe ceiling&lt;/a&gt;, which is basically the point where AI help stops making you faster and starts making you slower on your own mature codebase. Background agents raise the ceiling for well-scoped work and lower it for everything else. Know which side of the line your task is on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up So Agents Do Not Fight Each Other
&lt;/h2&gt;

&lt;p&gt;The most productive background agent setup is boring in the best way. Reproducible setup scripts, clear instructions, small tasks, and non-overlapping file ownership.&lt;/p&gt;

&lt;p&gt;The single highest-leverage thing you can do is write a good instructions file. Most tools read an &lt;code&gt;AGENTS.md&lt;/code&gt; or equivalent at the repo root. This is where you put the stuff a new contributor would need to know: how to install, how to run tests, the conventions you care about, the things that always break. A background agent starting in a fresh VM has none of your accumulated context. The instructions file is how you give it some.&lt;/p&gt;

&lt;p&gt;The second thing is non-overlapping file ownership when you run agents in parallel. If two agents are editing the same module, you are going to get conflicting diffs and waste the time you thought you were saving. I scope parallel tasks so each one owns a distinct slice of the codebase. One agent on the API layer, one on the docs, one on tests for an untouched module. They never collide.&lt;/p&gt;
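
&lt;p&gt;A quick way to sanity-check that before launching a parallel batch is to compare the paths each task claims. The sketch below is my own illustration, not a feature of any of these tools. It assumes you write down, per agent, the repo-relative files or directories it is allowed to touch, and the agent names and paths are made up.&lt;/p&gt;

```python
from itertools import combinations
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """True when the two repo-relative paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_collisions(assignments: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """Return (agent, path, other_agent, other_path) for every overlapping claim."""
    collisions = []
    for (agent_a, paths_a), (agent_b, paths_b) in combinations(assignments.items(), 2):
        for path_a in paths_a:
            for path_b in paths_b:
                if paths_overlap(path_a, path_b):
                    collisions.append((agent_a, path_a, agent_b, path_b))
    return collisions


# Hypothetical batch: the test agent reaches into the API agent's directory.
batch = {
    "api-agent": ["src/api"],
    "docs-agent": ["docs"],
    "test-agent": ["src/api/users_test.py"],
}

print(find_collisions(batch))
# [('api-agent', 'src/api', 'test-agent', 'src/api/users_test.py')]
```

&lt;p&gt;An empty result does not guarantee clean merges, since an agent can still wander outside the paths you intended. It just catches the overlaps you created yourself when scoping.&lt;/p&gt;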

&lt;p&gt;The third thing is small tasks. A background agent given a huge vague task will produce a huge vague PR that takes longer to review than the work would have taken to do. Small, well-scoped tasks produce small, reviewable PRs. The merge rate on tight tasks is dramatically higher than on "go improve this" prompts.&lt;/p&gt;

&lt;p&gt;If you are coordinating more than a couple of agents at once, you are basically doing orchestration, and the tradeoffs there are their own topic. I went deep on the &lt;a href="https://dev.to/blog/multi-agent-vs-single-agent-architecture-2026"&gt;multi-agent versus single-agent question&lt;/a&gt; separately, because more agents is not automatically better and the coordination cost is real.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Review Problem Nobody Warns You About
&lt;/h2&gt;

&lt;p&gt;Here is the thing that bit me, and bites most people who lean into background agents. The bottleneck does not disappear when you delegate the writing. It moves to review.&lt;/p&gt;

&lt;p&gt;Teams using AI heavily merge far more pull requests than they used to, but PR review time went up, not down, and PR size ballooned. When agents can produce ten PRs in a weekend, the question stops being "can I write this code" and becomes "can I review this code fast enough to trust it." I wrote about this shift in detail in the piece on &lt;a href="https://dev.to/blog/ai-code-review-bottleneck-2026"&gt;the AI code review bottleneck&lt;/a&gt;, and background agents make it sharper because the volume goes up while your review capacity stays exactly the same.&lt;/p&gt;

&lt;p&gt;A few things keep me sane here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treat every agent PR as a draft from a junior contributor.&lt;/strong&gt; Background agents open draft PRs for a reason. The draft is a proposal, not a decision. The developer who assigned the task owns reviewing every line, the same way you would own a junior's PR. Delegating the writing does not delegate the responsibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep your quality gates in charge.&lt;/strong&gt; Branch protection, required human review, required status checks, security scans that must pass before merge. These are non-negotiable with background agents. The agent works inside the same gates as everyone else. It does not get to merge its own work, ever. This is the structural protection that lets you delegate without losing control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Review the diff, not the explanation.&lt;/strong&gt; Agents write convincing PR summaries. The summary tells you what the agent thinks it did. The diff tells you what it actually did. These are not always the same thing, and the gap is exactly where bugs live. I read the diff first and the summary second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test the change, do not just read it.&lt;/strong&gt; AI-generated code has measurably higher bug density, and reading is not the same as verifying. I have a &lt;a href="https://dev.to/blog/testing-ai-generated-code-developer-guide-2026"&gt;whole process for testing AI-generated code&lt;/a&gt; that catches the failure modes a visual scan misses, and it applies double to background work where you were not watching it happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  The New Class of Technical Debt
&lt;/h2&gt;

&lt;p&gt;There is a slower problem that does not show up in any single PR review. When you ship a lot of agent-written code fast, you accumulate a specific kind of debt.&lt;/p&gt;

&lt;p&gt;The code works. The tests pass. It merges. But it is verbose, it duplicates patterns that already existed elsewhere, and nobody on the team has a complete mental model of it because nobody wrote it line by line. Studies through 2026 have been blunt about this: heavy agent adoption correlates with rising code complexity, more duplicated code, and climbing change failure rates, even as raw throughput goes up.&lt;/p&gt;

&lt;p&gt;Background agents amplify this because the work is even further removed from human authorship. With a foreground agent, at least you watched it happen and have some memory of the decisions. With a background agent, you get a finished diff and a summary, and if you rubber-stamp it, the code enters your codebase as a black box.&lt;/p&gt;

&lt;p&gt;I think about this as a &lt;a href="https://dev.to/blog/ai-generated-code-technical-debt-2026"&gt;new kind of technical debt&lt;/a&gt; that traditional review processes were not designed to catch. The defense is not to stop using background agents. It is to keep tasks small enough that each PR is genuinely reviewable, to enforce that agent code follows existing patterns rather than inventing new ones, and to accept that review is now the expensive part of your workflow and budget time for it accordingly.&lt;/p&gt;

&lt;p&gt;The other half of the defense is what happens after merge. When agent-written code ships and breaks, you need to find out before your users do. &lt;a href="https://dev.to/blog/production-observability-solo-developer-2026"&gt;Production observability for a solo developer&lt;/a&gt; is a different skill from review, and it is the one most people leaning into background agents are skipping. Fast shipping plus blind spots in production is how a quiet bug becomes a lost week.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security and Access, Because Cloud VMs Are Not Free of Risk
&lt;/h2&gt;

&lt;p&gt;A background agent runs in an environment with access to your code, your setup secrets, and a network. That is a real surface, and it is easy to be careless because the convenience is so high.&lt;/p&gt;

&lt;p&gt;Limit repository access to only what the agent needs. Do not grant org-wide access because it was easier to click the broad permission. The agent working on one service does not need read access to forty repos.&lt;/p&gt;

&lt;p&gt;Be deliberate about secrets in the agent environment. A fresh VM that needs to run your setup may need certain environment variables, but it does not need your production credentials to run unit tests. Scope what the agent can reach to what the task requires, and assume the environment is less trusted than your laptop.&lt;/p&gt;
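
&lt;p&gt;One simple pattern here is an explicit allowlist: start from nothing and pass through only the variables a task needs. The Python sketch below shows the idea under my own assumptions. The variable names are invented, and how you actually inject environment variables depends on the tool you use.&lt;/p&gt;

```python
def scoped_env(full_env: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Keep only the variables explicitly allowed for this task."""
    return {name: value for name, value in full_env.items() if name in allowed}


def withheld_names(full_env: dict[str, str], allowed: set[str]) -> list[str]:
    """Names that were deliberately not passed to the agent, for auditing."""
    return sorted(name for name in full_env if name not in allowed)


# Hypothetical environment; none of these names come from a real tool.
everything = {
    "NODE_ENV": "test",
    "TEST_DATABASE_URL": "postgres://localhost/test",
    "PROD_DATABASE_URL": "postgres://prod.example.internal/app",
    "STRIPE_LIVE_KEY": "sk_live_placeholder",
}
unit_test_allowlist = {"NODE_ENV", "TEST_DATABASE_URL"}

print(scoped_env(everything, unit_test_allowlist))
# {'NODE_ENV': 'test', 'TEST_DATABASE_URL': 'postgres://localhost/test'}
print(withheld_names(everything, unit_test_allowlist))
# ['PROD_DATABASE_URL', 'STRIPE_LIVE_KEY']
```

&lt;p&gt;Allowlisting fails closed. A new production secret added next month stays out of the agent environment until someone decides it belongs there.&lt;/p&gt;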

&lt;p&gt;This is a subset of the broader picture for running agents safely. If background agents are getting real access to your systems, the full set of practices in &lt;a href="https://dev.to/blog/securing-ai-agents-production-2026"&gt;securing AI agents in production&lt;/a&gt; applies, especially least privilege and credential management. The cloud VM being isolated does not mean the access you handed it is.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Actually Work Now
&lt;/h2&gt;

&lt;p&gt;Putting it together, here is what the rhythm of a normal day looks like.&lt;/p&gt;

&lt;p&gt;Morning, I review the draft PRs from any overnight tier-three tasks. Most merge with light edits. A couple get sent back with comments. One occasionally gets closed because the agent went the wrong direction, which is fine, because it cost me nothing to run.&lt;/p&gt;

&lt;p&gt;During the day, the hard, interesting, ambiguous work stays in the foreground where I can drive it. That is the work I actually want to be doing anyway, the system design and the judgment calls and the gnarly bugs. The agentic loop handles the execution and I handle the direction.&lt;/p&gt;

&lt;p&gt;When I hit a chunk of independent, well-scoped work, I scope it as a background task and fire it off, then keep working. Tests for a module I just finished. A refactor I have been putting off. Documentation that drifted out of date. The agent handles it in a clean VM while I stay in flow on the main thread.&lt;/p&gt;

&lt;p&gt;End of day, I queue whatever boring well-defined work is sitting in the backlog and let it run overnight.&lt;/p&gt;

&lt;p&gt;The throughput change is real, but it is not the headline. The headline is that the work split cleanly into "things only I should do" and "things I can describe well enough to delegate," and that split made both halves better. I spend more time on the half that needs judgment and less on the half that needs typing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Bottom Line
&lt;/h2&gt;

&lt;p&gt;Background coding agents are not a magic productivity multiplier you bolt on and forget. They are a delegation tool, and delegation has always been a skill. The developers getting the most out of them are not the ones with the best prompts. They are the ones who got good at deciding what to hand off.&lt;/p&gt;

&lt;p&gt;The mechanics are easy. Every major tool ships a competent background mode now, and they all work roughly the same way: describe the task, isolated VM, draft PR, you review. The skill that separates useful from chaotic is task selection and review discipline. Scope tightly. Keep the ambiguous work local. Never let an agent merge its own code. Read the diff, not the summary. Test what shipped.&lt;/p&gt;

&lt;p&gt;Do that, and async agents quietly take a real load off your day. Skip it, and you get a firehose of plausible-looking PRs that take longer to review than the work would have taken to write. The tool is the same either way. The difference is entirely in how you use it.&lt;/p&gt;

&lt;p&gt;Two years ago I was nervous about closing my laptop while an agent worked. Now it is just Tuesday. The stove was never on.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How Much Does It Cost to Build an App in 2026? An Honest Breakdown</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Tue, 02 Jun 2026 07:26:40 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/how-much-does-it-cost-to-build-an-app-in-2026-an-honest-breakdown-15f4</link>
      <guid>https://dev.to/alexcloudstar/how-much-does-it-cost-to-build-an-app-in-2026-an-honest-breakdown-15f4</guid>
      <description>&lt;p&gt;The first time someone asked me "how much does it cost to build an app," I gave them a number in about four minutes. I was wrong by a factor of three.&lt;/p&gt;

&lt;p&gt;It was an early freelance project. A founder described his idea on a call, I did some quick mental math, and I said "around twenty grand." He nodded, we shook hands over Zoom, and then reality showed up. The login flow needed social auth and email and a magic link. The "simple" dashboard needed real-time updates. Payments meant Stripe plus webhooks plus a subscription state machine that had to survive failed charges. By the end, the honest number was closer to sixty thousand, and I had already committed to twenty.&lt;/p&gt;

&lt;p&gt;I ate the difference because it was my mistake, not his. But that experience taught me something I now tell every person who asks me about app costs: the number you get in the first four minutes of a conversation is almost never the real number. And if someone gives you a confident, exact price before they understand what you actually need, that is the single biggest red flag in this entire industry.&lt;/p&gt;

&lt;p&gt;So let me do the thing most pricing guides refuse to do. Let me give you real ranges, explain what moves the number up and down, and show you how to get a quote you can actually trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Much Does It Cost to Build an App in 2026?
&lt;/h2&gt;

&lt;p&gt;Here is the honest answer first, then the nuance.&lt;/p&gt;

&lt;p&gt;In 2026, building an app with a professional developer or team costs somewhere between &lt;strong&gt;$8,000 and $200,000 or more&lt;/strong&gt;. Industry surveys that aggregate thousands of real projects put the average custom app somewhere around $120,000 to $170,000, but that average is misleading because it lumps tiny MVPs in with enterprise platforms.&lt;/p&gt;

&lt;p&gt;The number you care about depends almost entirely on three things: how complex the app is, how many platforms it runs on, and who you hire to build it. Everything else is noise.&lt;/p&gt;

&lt;p&gt;Let me break the ranges down by what you are actually building.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple app or MVP: $8,000 to $35,000
&lt;/h3&gt;

&lt;p&gt;This is the bucket most first-time founders belong in, even when they think they need more. A simple app has a handful of screens, user accounts, basic data storage, and one or two core features that do the actual job.&lt;/p&gt;

&lt;p&gt;Think a booking tool for a local business, a habit tracker, an internal dashboard, or a marketplace stripped down to just listings and messages. No fancy real-time anything, no machine learning, no complicated integrations.&lt;/p&gt;

&lt;p&gt;If you are validating an idea, this is where you should start. I have written before about &lt;a href="https://dev.to/blog/stop-validating-ideas-start-validating-pain"&gt;validating pain before you build&lt;/a&gt;, and an MVP in this range is the cheapest way to find out whether anyone actually wants the thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Medium-complexity app: $35,000 to $85,000
&lt;/h3&gt;

&lt;p&gt;This is where most real businesses land. You have user accounts with roles and permissions. You take payments, which means subscriptions, webhooks, and handling the ugly edge cases like failed cards and refunds. You integrate with third-party services. Maybe you have a web app and a mobile app sharing one backend.&lt;/p&gt;

&lt;p&gt;Payments alone are worth calling out. A proper payment integration with subscription logic and webhook handling typically adds $8,000 to $20,000 to a project, and people consistently underestimate it because "just add Stripe" sounds like an afternoon. It is not an afternoon.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex or multi-platform app: $85,000 to $200,000+
&lt;/h3&gt;

&lt;p&gt;Real-time data, native iOS and native Android, deep integrations, an admin panel, heavy security requirements, AI features that actually work rather than a thin wrapper. This is the tier where you are building something close to a real product company's flagship.&lt;/p&gt;

&lt;p&gt;If your quote lands here, you need to be very sure the complexity is necessary and not aspirational. A lot of founders describe a $150,000 app when their first version should cost $30,000.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Drives App Development Cost Up
&lt;/h2&gt;

&lt;p&gt;The ranges above are wide for a reason. The same idea, described two different ways, can produce quotes that differ by 5x. Here is what moves the needle, in rough order of impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature depth, not feature count
&lt;/h3&gt;

&lt;p&gt;People think cost scales with the number of features. It scales with the &lt;em&gt;depth&lt;/em&gt; of each feature. "User profiles" can mean a name and an avatar, or it can mean privacy controls, blocking, verification badges, and a moderation queue. Same two words on your spec sheet, wildly different builds.&lt;/p&gt;

&lt;p&gt;When you describe your app to someone for a quote, the magic question is not "how many features" but "how deep does each one need to go for version one." Most features can be built shallow first and deepened later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform count
&lt;/h3&gt;

&lt;p&gt;Every platform you add multiplies cost. Web only is cheapest. Web plus a cross-platform mobile app (using React Native or Flutter, sharing one codebase) is the sweet spot for most people, usually saving 30 to 50 percent versus building separate native apps. True native iOS plus native Android means two separate builds and roughly double the mobile cost.&lt;/p&gt;

&lt;p&gt;The honest advice: almost nobody building their first version needs native iOS and native Android. Cross-platform gets you to market faster and cheaper, and you can always go native later if a specific performance need demands it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who you hire
&lt;/h3&gt;

&lt;p&gt;This is the variable nobody wants to talk about plainly, so I will. Rates in 2026 look roughly like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offshore teams (Asia, parts of Africa):&lt;/strong&gt; $20 to $45 per hour&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eastern Europe:&lt;/strong&gt; $35 to $85 per hour, often the best quality-to-cost ratio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freelancers (US and Western Europe):&lt;/strong&gt; $60 to $150 per hour&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small agencies:&lt;/strong&gt; $90 to $160 per hour&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mid-market and enterprise firms:&lt;/strong&gt; $150 to $400+ per hour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A startup MVP that costs $20,000 to $40,000 with an offshore team can cost $70,000 to $140,000 with a US agency. Same app. The difference is overhead, brand, and process, not necessarily the code quality.&lt;/p&gt;

&lt;p&gt;But cheaper is not automatically better. Offshore management overhead, time-zone friction, and rework from miscommunication can quietly add 15 to 25 percent and eat into the savings. I dig into how to evaluate this tradeoff in the guide on &lt;a href="https://dev.to/blog/hire-freelance-app-developer-2026"&gt;hiring a freelance app developer without getting burned&lt;/a&gt;, because picking the right person matters more than picking the cheapest one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design
&lt;/h3&gt;

&lt;p&gt;A template-driven, clean-but-standard design is cheap. A custom design system with animations, illustrations, and a distinctive brand identity is not. For a first version, lean toward clean and standard. Polish is a great problem to have once people are using the thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ongoing maintenance
&lt;/h3&gt;

&lt;p&gt;This one ambushes people. An app is not a one-time purchase. Budget 15 to 25 percent of the build cost per year for maintenance: OS updates, dependency upgrades, bug fixes, server costs, and the small improvements that keep users happy. A $50,000 app costs roughly $7,500 to $12,500 a year to keep alive and healthy.&lt;/p&gt;
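
&lt;p&gt;If you want to run that arithmetic on your own number, here is a minimal Python sketch. The function name and signature are made up for illustration; the 15 and 25 percent defaults are just the rule of thumb above, not a guarantee.&lt;/p&gt;

```python
def annual_maintenance_range(build_cost, low_pct=15, high_pct=25):
    """Return the (low, high) yearly maintenance budget for a given build cost.

    The 15 and 25 percent defaults come from the rule of thumb in the text;
    the function itself is a hypothetical helper, not a standard formula.
    """
    low = build_cost * low_pct / 100
    high = build_cost * high_pct / 100
    return low, high


low, high = annual_maintenance_range(50_000)
print(f"${low:,.0f} to ${high:,.0f} per year")  # $7,500 to $12,500 per year
```

&lt;p&gt;Plug in your own quote before you sign, and treat the result as a floor for planning rather than a precise forecast.&lt;/p&gt;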




&lt;h2&gt;
  
  
  Why AI Has Not Made Apps Free (Even Though It Feels Like It Should)
&lt;/h2&gt;

&lt;p&gt;You have probably heard that AI coding tools mean apps are basically free now. A founder told me last month that he expected his quote to be "like 80 percent cheaper because of AI."&lt;/p&gt;

&lt;p&gt;He was disappointed, and I get why, but the math does not work that way.&lt;/p&gt;

&lt;p&gt;AI has genuinely changed development speed. A competent developer using &lt;a href="https://dev.to/blog/vibe-coding-revolution-2026"&gt;agentic coding tools&lt;/a&gt; can ship in one to two weeks what used to take two to three months. That is real, and it does lower costs, especially on the construction-heavy parts of a build.&lt;/p&gt;

&lt;p&gt;But there is a ceiling. AI accelerates writing code. It does not accelerate the parts of an app that consume most of the budget: figuring out exactly what to build, getting the edge cases right, integrating with payment and auth systems that punish mistakes, testing on real devices, handling security, and the dozens of small judgment calls that separate a demo from a product people pay for.&lt;/p&gt;

&lt;p&gt;I have written about &lt;a href="https://dev.to/blog/vibe-ceiling-ai-code-decision-framework-2026"&gt;the limits of letting AI drive&lt;/a&gt;, and app pricing is a perfect example. The first 70 percent of an app is faster than ever. The last 30 percent, the part that makes it shippable and safe, is exactly as hard as it always was. That last 30 percent is where the money goes.&lt;/p&gt;

&lt;p&gt;So yes, AI has pushed prices down somewhat, particularly for simple apps and MVPs. A clean MVP that cost $40,000 in 2023 might cost $25,000 today. But it has not pushed them to zero, and anyone telling you they will build your full app for a few hundred dollars because "the AI does it" is either building you a toy or about to disappear with your deposit.&lt;/p&gt;




&lt;h2&gt;
  
  
  MVP vs Full App: Why Starting Small Saves You the Most Money
&lt;/h2&gt;

&lt;p&gt;If there is one decision that saves more money than any other, it is starting with an MVP instead of a full app.&lt;/p&gt;

&lt;p&gt;A full app typically costs two to four times more than an MVP of the same idea. More importantly, a huge chunk of the features in a "full app" plan turn out to be wrong. You build them, ship them, and discover users do not care. That is the most expensive kind of money: spent perfectly, on the wrong thing.&lt;/p&gt;

&lt;p&gt;An MVP strips your idea down to the one or two things that prove whether it works. You ship it, you watch real people use it, and then you spend the next chunk of budget on what they actually asked for instead of what you guessed in a planning doc.&lt;/p&gt;

&lt;p&gt;This is the same logic behind the &lt;a href="https://dev.to/blog/micro-saas-playbook-developer-guide-2026"&gt;micro SaaS playbook&lt;/a&gt;: build small, ship fast, let reality tell you where to invest next. The founders who insist on building everything before launch are usually the ones who run out of money before they learn anything.&lt;/p&gt;

&lt;p&gt;My rule of thumb for first-time founders: take whatever you think your app needs, cut it in half, and that is your real MVP. You will fight me on this, and you will be wrong, and that is fine. Almost everyone is.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Get an Accurate App Development Quote
&lt;/h2&gt;

&lt;p&gt;Now the practical part. How do you get a number you can actually trust instead of a guess?&lt;/p&gt;

&lt;h3&gt;
  
  
  Expect a ballpark first, a real quote second
&lt;/h3&gt;

&lt;p&gt;Any developer who gives you an exact price and timeline in the first five minutes is either guessing or padding heavily to protect themselves. A real quote comes after a discovery conversation where they understand your features, your edge cases, your platforms, and your priorities.&lt;/p&gt;

&lt;p&gt;What you should expect is a rough ballpark on the first call ("this sounds like a $30,000 to $50,000 project"), followed by an itemized quote after a short discovery and planning phase. If someone refuses to give you any range until you pay them, that is worth questioning. If someone gives you an exact number before understanding the work, that is worth questioning too. The trustworthy answer lives in between.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write down what you actually need
&lt;/h3&gt;

&lt;p&gt;The single best thing you can do to get an accurate quote is describe your app clearly. Not "it is like Uber but for X." Actually list the screens, the user types, what each user can do, and which features are must-have for launch versus nice-to-have later.&lt;/p&gt;

&lt;p&gt;I cannot overstate how much this helps. Half the price spread in quotes comes from developers guessing at ambiguity and protecting themselves with padding. The clearer your spec, the tighter and more honest the number.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ask what is and is not included
&lt;/h3&gt;

&lt;p&gt;A quote of $40,000 means nothing until you know what it covers. Does it include design? Testing on real devices? App store submission? A few weeks of post-launch bug fixes? Source code ownership? Get these in writing. The gap between a "cheap" quote and an "expensive" one is often just the expensive one being honest about what the project actually requires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Beware the quote that is too good
&lt;/h3&gt;

&lt;p&gt;If three developers quote $40,000 and one quotes $9,000, the cheap one is not a deal. They have either misunderstood the scope, are about to cut corners you will pay for later, or plan to win you with a low number and pile on change requests. I have cleaned up more than one app that was "built cheap" and had to be substantially rebuilt. The second build always costs more than doing it right the first time.&lt;/p&gt;
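
&lt;p&gt;A rough way to sanity-check a stack of quotes is to compare each one against the median. The sketch below is only an illustration: the function name is hypothetical, and the cutoff of half the median is an assumption I picked for the example, not an industry rule.&lt;/p&gt;

```python
from statistics import median


def flag_suspicious_quotes(quotes, cutoff_ratio=0.5):
    """Return the quotes that fall below cutoff_ratio times the median quote.

    cutoff_ratio=0.5 is an arbitrary assumption for illustration,
    not an industry standard.
    """
    threshold = median(quotes) * cutoff_ratio
    return [q for q in quotes if threshold > q]


quotes = [40_000, 40_000, 40_000, 9_000]
print(flag_suspicious_quotes(quotes))  # [9000]
```

&lt;p&gt;A flagged quote is not automatically a scam. It is a prompt to ask that developer what they think is in scope, because the gap usually lives there.&lt;/p&gt;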




&lt;h2&gt;
  
  
  A Realistic Budget Framework
&lt;/h2&gt;

&lt;p&gt;Here is how I would think about budget if I were starting from scratch in 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Under $10,000:&lt;/strong&gt; You are in no-code, low-code, or very simple MVP territory. A skilled freelancer can build something real here if the scope is genuinely tiny. Be ruthless about cutting features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;$10,000 to $40,000:&lt;/strong&gt; The sweet spot for a real MVP built by a freelancer or a small team. This is where most first versions should live. You get a focused product that proves your idea without betting the house.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;$40,000 to $100,000:&lt;/strong&gt; A polished product with payments, multiple user types, web and mobile, and the reliability to charge money confidently. This is where you go after the MVP has shown signs of life.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;$100,000+:&lt;/strong&gt; Reserved for genuinely complex products, or for companies that have validated demand and need to scale. If you are spending this on version one, slow down and ask whether you have proven the idea yet.&lt;/p&gt;

&lt;p&gt;The biggest budgeting mistake is not spending too little or too much. It is spending the full amount up front, before you know whether the idea works. Spend a slice, learn, then spend the next slice on what you learned. That single discipline saves more money than any rate negotiation ever will.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Money Actually Goes
&lt;/h2&gt;

&lt;p&gt;People imagine they are paying for code. They are mostly paying for judgment.&lt;/p&gt;

&lt;p&gt;The code is the cheap part now. What you are really buying is someone who has built this before, who knows that the payment webhook will fail in production at 2am, who knows which "simple" features are quietly enormous, and who will tell you to cut the thing you are emotionally attached to because it is not worth $15,000 in version one.&lt;/p&gt;

&lt;p&gt;That judgment is the difference between an app that ships and works and one that drains your budget into a half-finished prototype. It is also why the cheapest quote is so rarely the cheapest outcome.&lt;/p&gt;

&lt;p&gt;If you are weighing the freelancer-versus-agency-versus-offshore decision, or trying to figure out who to actually trust with your money, the &lt;a href="https://dev.to/blog/hire-freelance-app-developer-2026"&gt;guide to hiring a freelance app developer&lt;/a&gt; walks through the exact questions to ask and the red flags to watch for. And if you are a developer on the other side of this conversation trying to price your own work, the &lt;a href="https://dev.to/blog/freelancing-as-developer-guide-2026"&gt;developer freelancing playbook&lt;/a&gt; covers how to quote without underselling yourself.&lt;/p&gt;

&lt;p&gt;If you have an app idea and you want a straight answer about what it would actually cost, &lt;a href="https://dev.to/contact/"&gt;tell me what you are building&lt;/a&gt;. I will give you an honest ballpark on a call, no padding and no sales pressure, and if your idea should cost less than you think, I will tell you that too. That conversation is free, and it is the fastest way to replace a guess with a real number.&lt;/p&gt;

&lt;p&gt;Build small, spend in slices, and never trust the price someone gives you before they understand the work. Do those three things and you will spend a fraction of what most first-time founders waste.&lt;/p&gt;

</description>
      <category>freelancing</category>
      <category>startup</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to Hire a Freelance App Developer Without Getting Burned (2026 Guide)</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Tue, 02 Jun 2026 07:26:39 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/how-to-hire-a-freelance-app-developer-without-getting-burned-2026-guide-a77</link>
      <guid>https://dev.to/alexcloudstar/how-to-hire-a-freelance-app-developer-without-getting-burned-2026-guide-a77</guid>
      <description>&lt;p&gt;A founder emailed me last year asking if I could "finish" his app. He had paid a freelancer $18,000 over four months. What he had to show for it was a login screen, a half-built dashboard that crashed on real data, and a codebase that took me two days just to understand because nothing was documented and the previous developer had vanished.&lt;/p&gt;

&lt;p&gt;The painful part was not the money. It was that every warning sign had been there from day one, and he had no way of knowing what to look for. The developer quoted an exact price on the first call. There was no contract about who owned the code. There were no references. There was no agreement about what happened when the scope changed, and it always changes.&lt;/p&gt;

&lt;p&gt;He did not get unlucky. He got burned because nobody had ever told him which questions actually matter when you hire someone to build your app.&lt;/p&gt;

&lt;p&gt;So this is that guide. If you are about to hand real money to a freelance app developer, read this first. It will take you fifteen minutes and it might save you fifteen thousand dollars.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before You Hire Anyone: Get Clear on What You Need
&lt;/h2&gt;

&lt;p&gt;The worst hires start with a vague brief. "It is like Airbnb but for X" is not a brief. It is a vibe. And a vibe is impossible to quote accurately, which means the developer either pads heavily or lowballs and resents you later. Either way you lose.&lt;/p&gt;

&lt;p&gt;Before you talk to a single developer, write down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The core problem your app solves, in one sentence&lt;/li&gt;
&lt;li&gt;The types of users (just customers, or customers plus admins plus vendors?)&lt;/li&gt;
&lt;li&gt;What each user type needs to do&lt;/li&gt;
&lt;li&gt;The handful of features that must exist for launch&lt;/li&gt;
&lt;li&gt;The features that can wait until version two&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not need a technical spec. You need clarity. The clearer you are, the more honest the quotes you get back, and the easier it is to tell a thoughtful developer from one who is guessing. If you are not sure how much your project should cost in the first place, start with the breakdown of &lt;a href="https://dev.to/blog/how-much-does-it-cost-to-build-an-app-2026"&gt;what it actually costs to build an app in 2026&lt;/a&gt;, then come back here to find the right person to build it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Questions to Ask Before Hiring an App Developer
&lt;/h2&gt;

&lt;p&gt;This is the heart of it. The right questions surface the right person, and they scare off the wrong one. Here are the ones that matter, and what a good answer sounds like.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Can I see apps you have actually shipped?"
&lt;/h3&gt;

&lt;p&gt;Not screenshots. Not a portfolio of mockups. Shipped, working apps you can open and use. If the apps are live, download them, click around, push on the edges. Do they feel solid? Do they crash? Does the login work? A developer who has shipped real things will be eager to show you. One who deflects with "most of my work is under NDA" for every single project deserves extra scrutiny.&lt;/p&gt;

&lt;h3&gt;
  
  
  "What is your development process?"
&lt;/h3&gt;

&lt;p&gt;If the answer is some version of "I just start building," walk away. You want someone who works like a professional: who uses a project tracker, who works in milestones, who can tell you what "done" means for each phase. The specific tools matter less than the fact that a system exists. "I break the project into two-week milestones, demo at the end of each, and you approve before we move on" is the answer you want to hear.&lt;/p&gt;

&lt;h3&gt;
  
  
  "How do you handle scope changes?"
&lt;/h3&gt;

&lt;p&gt;Scope always changes. You will think of new features, change your mind, and discover that something you assumed was simple is not. The question is whether your developer has a calm, fair process for it. A good answer sounds like "small changes I absorb, anything significant I scope and quote separately before doing it, so you are never surprised by an invoice." A bad answer is silence, or "we will figure it out," which is how $18,000 becomes a login screen.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Who owns the code and the intellectual property?"
&lt;/h3&gt;

&lt;p&gt;You. The answer must be you. Get it in writing, in the contract, before any work starts. I have watched founders discover after the fact that their "developer" considered the codebase their own property, or hosted everything on accounts the founder could not access. Your app, your code, your accounts, your domains. This is non-negotiable, and any developer who hesitates here is telling you something important.&lt;/p&gt;

&lt;h3&gt;
  
  
  "What happens after launch when we find bugs?"
&lt;/h3&gt;

&lt;p&gt;Every app has bugs that surface after launch. Ask whether fixing them is included for a window after delivery, or billed separately, and get the terms in writing. The good ones offer a defined warranty period and clear post-launch support, because they see you as a relationship, not a transaction. This question also quietly reveals whether they plan to still be around in three months.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Can you connect me with a past client?"
&lt;/h3&gt;

&lt;p&gt;A confident developer will say yes within seconds. A reference call tells you things a portfolio never will: did they communicate well, did they hit deadlines, did they disappear when things got hard, would the client hire them again? Two minutes on the phone with a past client is worth more than an hour reading testimonials someone wrote themselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  "How do you communicate, and how often?"
&lt;/h3&gt;

&lt;p&gt;Most failed projects do not fail on code. They fail on communication. Ask how often you will hear from them, through what channel, and what a typical week looks like. Weekly demos and a shared channel where you can ask questions beat a developer who goes dark for three weeks and resurfaces with something you did not ask for.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Red Flags That Mean Walk Away
&lt;/h2&gt;

&lt;p&gt;Some signals are bad enough on their own that I would not proceed no matter how good the price is. Here are the ones I have learned to take seriously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An exact quote in the first five minutes.&lt;/strong&gt; A real quote requires understanding your features, edge cases, and platforms. An instant precise number means they are guessing, and guesses turn into change requests, missed scope, or padded invoices. Expect a ballpark on the first call and a real quote after a short discovery conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I wing it" as a process answer.&lt;/strong&gt; No milestones, no tracker, no plan. This is the single most reliable predictor of a project that drifts for months and dies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No willingness to put IP, scope, or payment terms in writing.&lt;/strong&gt; If they resist a simple contract, they are either inexperienced or planning to leave themselves an escape hatch at your expense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A price that is dramatically lower than everyone else.&lt;/strong&gt; If three developers say $40,000 and one says $9,000, the cheap one has misunderstood the work or plans to make it back through change orders and corners cut. I wrote about &lt;a href="https://dev.to/blog/how-much-does-it-cost-to-build-an-app-2026"&gt;why the cheapest quote is rarely the cheapest outcome&lt;/a&gt;, and hiring is exactly where that plays out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Poor communication during the sales conversation.&lt;/strong&gt; This is the best free preview you will ever get. If they are slow, vague, or hard to reach while they are trying to win your business, it does not improve after you have paid them. It gets worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No references and no shipped work.&lt;/strong&gt; Everyone starts somewhere, and a junior developer at a junior price can be a fine bet. But you should know that is what you are signing up for, and price and risk accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Freelancer vs Agency vs Offshore: Which One Should You Hire?
&lt;/h2&gt;

&lt;p&gt;This is the decision most founders agonize over, so let me make it simple. Each option has a clear sweet spot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Freelancer
&lt;/h3&gt;

&lt;p&gt;A single experienced freelancer is the best fit for MVPs and small-to-medium projects, usually under $40,000. You get direct communication with the person actually building your app, lower rates than an agency, and speed because there are no layers between you and the work.&lt;/p&gt;

&lt;p&gt;The risk is the bus factor: it is one person. If they get sick, overcommitted, or disappear, your project stalls. You mitigate this by hiring someone with a track record, clear communication habits, and code that lives in your accounts, not theirs. US and Western European freelancers run $60 to $150 an hour. A strong freelancer is the highest value-per-dollar option for most first apps. The flip side of this coin is worth understanding too, which is why the &lt;a href="https://dev.to/blog/freelancing-as-developer-guide-2026"&gt;developer freelancing playbook&lt;/a&gt; is useful reading even as a client, because it shows you how the good ones actually operate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agency
&lt;/h3&gt;

&lt;p&gt;An agency makes sense above roughly $40,000, or when your project is complex enough to need multiple specialists: a designer, a backend developer, a mobile developer, a project manager. You pay more, often $90 to $160 an hour, and you get process, redundancy, and a team that does not vanish if one person leaves.&lt;/p&gt;

&lt;p&gt;The downside is overhead and distance. You rarely talk to the people writing the code, decisions move slower, and you are partly paying for the agency's brand and office. For a focused MVP, that overhead is often money you did not need to spend. The &lt;a href="https://dev.to/blog/ai-powered-agency-developer-playbook-2026"&gt;AI-powered agency playbook&lt;/a&gt; explains how modern lean agencies operate, which helps you tell a sharp small studio from a bloated one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offshore team
&lt;/h3&gt;

&lt;p&gt;Offshore teams, at $20 to $45 an hour, are the cheapest on paper, and for the right project with the right management they genuinely work. The trap is the hidden cost. Time-zone friction, communication overhead, and rework from misaligned expectations can quietly add 15 to 25 percent and eat into the savings, in bad cases pushing the effective rate back up toward what a local freelancer would have charged.&lt;/p&gt;

&lt;p&gt;Offshore works best when your spec is rock solid, you have someone managing the relationship daily, and the work is well-defined rather than exploratory. It works worst for fuzzy early-stage products that need a lot of back-and-forth, because every round trip costs you a day.&lt;/p&gt;

&lt;p&gt;The rule of thumb: under $20,000, a freelancer is usually your only realistic professional option. Above roughly $40,000, an agency becomes worth considering. In between, a strong freelancer is still usually the better fit. Offshore is a budget lever you pull when your scope is clear and your management is hands-on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Protect Yourself: The Contract and the Trial
&lt;/h2&gt;

&lt;p&gt;Two things protect you more than anything else, and most first-time founders skip both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with a small paid trial
&lt;/h3&gt;

&lt;p&gt;Before you commit to a $40,000 build, pay for a small first piece. A week of work, a single feature, a clickable prototype. Watch how they communicate, how they handle feedback, whether they hit the small deadline. A few hundred or a few thousand dollars spent finding out who someone really is, before you bet the whole budget, is the best money you will spend on the entire project.&lt;/p&gt;

&lt;p&gt;This is also fair to the developer. Good ones welcome a trial because it lets them prove themselves and gives them a clean way to part if you are not a fit either. The ones who resist any trial and demand a big deposit up front are the ones to be careful with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get the essentials in writing
&lt;/h3&gt;

&lt;p&gt;Your contract does not need to be twenty pages of legalese. It needs to cover the things that cause disputes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; what is being built, and what is explicitly not&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment schedule:&lt;/strong&gt; tied to milestones, never one big payment up front&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IP ownership:&lt;/strong&gt; the code, designs, and accounts are yours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope changes:&lt;/strong&gt; how new work gets quoted and approved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-launch:&lt;/strong&gt; what is covered for how long after delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tie payments to delivered milestones, not to time or to a calendar. You pay when something works, not when someone says they have been busy. And avoid paying more than a small deposit before any work is shown. The structure that protects you also keeps the project honest, and it is closely tied to how good developers invoice, which I covered in the &lt;a href="https://dev.to/blog/5-common-mistakes-freelancers-make-with-invoicing-and-how-to-fix-them"&gt;common invoicing mistakes guide&lt;/a&gt;.&lt;/p&gt;
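
&lt;p&gt;To make the milestone idea concrete, here is a minimal Python sketch of a payment schedule. Everything in it is an assumption for illustration: the function name, the 10 percent deposit, the equal split, and the milestone names. Real contracts often weight milestones differently.&lt;/p&gt;

```python
def milestone_schedule(total, milestones, deposit_pct=10):
    """Split a project total into a small deposit plus equal milestone payments.

    The 10 percent deposit and the equal split are assumptions for
    illustration; adjust both to whatever you and the developer agree on.
    """
    deposit = total * deposit_pct // 100
    remaining = total - deposit
    per_milestone = remaining // len(milestones)
    schedule = [("Deposit on signing", deposit)]
    for name in milestones:
        schedule.append((f"On approval of: {name}", per_milestone))
    # Put any rounding remainder on the final payment so the sum matches the total.
    leftover = remaining - per_milestone * len(milestones)
    last_label, last_amount = schedule[-1]
    schedule[-1] = (last_label, last_amount + leftover)
    return schedule


# Hypothetical milestone names for a $40,000 build.
for label, amount in milestone_schedule(40_000, ["Auth and accounts", "Payments", "Launch build"]):
    print(f"{label}: ${amount:,}")
```

&lt;p&gt;The point of the structure is that every payment after the deposit is triggered by something you approved, not by a date on the calendar.&lt;/p&gt;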




&lt;h2&gt;
  
  
  How to Tell a Great Developer From a Good One
&lt;/h2&gt;

&lt;p&gt;Once you have filtered out the red flags, you are usually left with a few competent options. Here is how I would pick between them.&lt;/p&gt;

&lt;p&gt;The great ones ask you questions back. When you describe your app, a great developer pushes on it: "Do you really need that for launch? What happens if a payment fails here? Who is the actual first user?" They are trying to build you the right thing, not just take your order. The merely good ones nod and write down whatever you say.&lt;/p&gt;

&lt;p&gt;The great ones tell you when to spend less. If someone talks you out of a feature, suggests an MVP instead of the full build, or tells you your idea should cost half what you expected, hold on to them. That honesty is rare and it is worth more than a slightly lower hourly rate. It is also the same instinct behind &lt;a href="https://dev.to/blog/stop-validating-ideas-start-validating-pain"&gt;validating pain before building&lt;/a&gt;: the goal is a product that works, not an invoice that is large.&lt;/p&gt;

&lt;p&gt;The great ones make you feel calm. Not because they oversell, but because they are clear, they have a plan, and they do what they say in the small interactions before you have even hired them. Trust that signal. The way someone handles the first email is usually the way they handle the whole project.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Short Version
&lt;/h2&gt;

&lt;p&gt;Hire on proof, not promises. Ask to see shipped apps, ask how they handle scope and bugs, and get IP ownership in writing before any money moves. Walk away from instant exact quotes, "I wing it" processes, and prices that are too good to be true. Match the hire to your budget and your project, and always start with a small paid trial before the full build.&lt;/p&gt;

&lt;p&gt;Do that and you will avoid almost every way this goes wrong. Skip it and you will eventually email someone like me asking if your app can be saved.&lt;/p&gt;

&lt;p&gt;If you would rather skip the hunt entirely and talk to someone who builds apps the way this guide describes, &lt;a href="https://dev.to/contact/"&gt;tell me what you are building&lt;/a&gt;. I will give you an honest read on your idea, a real ballpark on a call, and a straight answer about whether I am the right fit, even when the answer is no. The first conversation is free, and worst case you walk away with a much clearer sense of what to look for in whoever you hire.&lt;/p&gt;

</description>
      <category>freelancing</category>
      <category>startup</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude Opus 4.8 Is Here: Benchmarks, Dynamic Workflows, and Whether to Upgrade From 4.7</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Fri, 29 May 2026 11:36:47 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/claude-opus-48-is-here-benchmarks-dynamic-workflows-and-whether-to-upgrade-from-47-4eee</link>
      <guid>https://dev.to/alexcloudstar/claude-opus-48-is-here-benchmarks-dynamic-workflows-and-whether-to-upgrade-from-47-4eee</guid>
      <description>&lt;p&gt;Anthropic dropped Claude Opus 4.8 yesterday, May 28. Same playbook as the last few releases. No waitlist, no staged rollout. It showed up in Claude Code, the API, and the major cloud providers on the same day, with the model ID &lt;code&gt;claude-opus-4-8&lt;/code&gt; ready to drop into existing config.&lt;/p&gt;

&lt;p&gt;I have been running &lt;a href="https://dev.to/blog/claude-opus-4-7-review-benchmarks-developer-guide-2026"&gt;Opus 4.7 as my default coding model&lt;/a&gt; since it launched in April. It handled my agentic coding sessions, my content pipeline, and most of my production debugging. So the first thing I did with 4.8 was throw the exact same hard tasks at it that I used to stress-test 4.7, then dig into the official announcement to separate the real changes from the launch-day polish.&lt;/p&gt;

&lt;p&gt;Here is what I found after a day of hands-on use.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Headline: It Stopped Lying to Me About My Code
&lt;/h2&gt;

&lt;p&gt;The benchmark Anthropic led with is not a coding score or a reasoning score. It is honesty. Opus 4.8 is roughly 4x less likely than 4.7 to let a code flaw pass unremarked.&lt;/p&gt;

&lt;p&gt;That number sounds abstract until you have lived the failure mode it describes. You ask a model to review a function. It tells you the function looks good. You ship it. It breaks. The model did not miss the bug because it was incapable of seeing it. It missed it because the path of least resistance in a review is to agree with you and move on.&lt;/p&gt;

&lt;p&gt;Opus 4.8 does this far less. In my testing yesterday, I deliberately fed it three functions I knew had subtle problems. An off-by-one in a pagination helper, a race condition in a debounced save, and a silent error swallow in a fetch wrapper. 4.7 caught the off-by-one and missed the other two on the first pass. 4.8 flagged all three, and on the error swallow it specifically called out that the empty catch block would hide failures in production, which is exactly the kind of thing my &lt;a href="https://dev.to/blog/claude-skills-vs-cursor-rules-vs-copilot-instructions-2026"&gt;global rules&lt;/a&gt; tell it to watch for.&lt;/p&gt;

&lt;p&gt;This is the change that matters most for daily work, and it is the hardest to capture in a single number. A model that reliably tells you when something is wrong is worth more than a model that is marginally smarter but agreeable. The whole point of &lt;a href="https://dev.to/blog/ai-code-review-bottleneck-2026"&gt;AI code review&lt;/a&gt; is catching what you missed. A model that rubber-stamps your mistakes is just a more expensive way to feel confident about broken code.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Benchmarks Actually Say
&lt;/h2&gt;

&lt;p&gt;Anthropic published the usual comparison chart showing Opus 4.8 ahead of 4.7 across coding, agentic skills, reasoning, and practical knowledge work. The improvements are real but mostly incremental on the pure-coding side. The bigger jumps are in agentic and tool-use territory.&lt;/p&gt;

&lt;p&gt;Here are the numbers worth knowing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;th&gt;Opus 4.8 result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Online-Mind2Web&lt;/td&gt;
&lt;td&gt;Computer use, real web tasks&lt;/td&gt;
&lt;td&gt;84% (ahead of 4.7 and GPT-5.5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal Agent Benchmark&lt;/td&gt;
&lt;td&gt;All-pass legal reasoning&lt;/td&gt;
&lt;td&gt;First model over 10% on the all-pass standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code flaw detection&lt;/td&gt;
&lt;td&gt;Catching bugs in review&lt;/td&gt;
&lt;td&gt;~4x fewer missed flaws vs 4.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool calling&lt;/td&gt;
&lt;td&gt;Steps to complete a task&lt;/td&gt;
&lt;td&gt;Fewer steps for equivalent intelligence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Online-Mind2Web score is the one I would not have predicted. Computer use, the ability to drive a real browser and complete multi-step web tasks, has been the weakest part of every frontier model I have used. 84% is the first time the number has been high enough that I would actually trust it for low-stakes automation. It is still not something I would point at my bank, but for filling forms, navigating dashboards, and pulling data out of web apps that lack an API, it has crossed the line from demo to useful.&lt;/p&gt;

&lt;p&gt;The Legal Agent Benchmark result is a niche flex, but it signals something broader. Breaking 10% on an all-pass standard, where the model has to get every sub-task in a legal workflow correct or the whole thing fails, means the error rate on long multi-step chains dropped enough to matter. That same reliability shows up in coding agents that have to chain twenty tool calls without going off the rails halfway through.&lt;/p&gt;
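
&lt;p&gt;To see why an all-pass standard is so punishing, here is a minimal sketch of how per-step reliability compounds across a chain. It assumes every step succeeds independently with the same probability, which real agents do not strictly obey, and the per-step rates are made-up illustrations rather than numbers from the announcement.&lt;/p&gt;

```python
# Illustrative only: the per-step success rates below are assumptions,
# not figures from Anthropic's announcement.

def all_pass_rate(per_step_success, steps):
    """Chance that every one of `steps` independent steps succeeds."""
    return per_step_success ** steps


# Compare a few hypothetical per-step rates over a twenty-step chain.
for p in (0.90, 0.95, 0.99):
    print(f"per-step {p:.0%}, 20 steps: all-pass {all_pass_rate(p, 20):.1%}")
```

&lt;p&gt;Under those assumptions, a model that is right 90% of the time per step finishes a twenty-step chain cleanly only about 12% of the time, while 99% per step gets you to roughly 82%. Small per-step gains are what move an all-pass number.&lt;/p&gt;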




&lt;h2&gt;
  
  
  Dynamic Workflows: The Feature I Did Not Know I Needed
&lt;/h2&gt;

&lt;p&gt;The flashiest addition is Dynamic Workflows, shipping as a research preview in Claude Code. The pitch is that Claude can now spin up hundreds of parallel subagents and coordinate them on a single task. The headline use case is codebase-scale migrations, the kind that touch hundreds of thousands of lines.&lt;/p&gt;

&lt;p&gt;I was skeptical. Parallel subagents have been a thing for a while, and in practice they tend to step on each other, duplicate work, or produce inconsistent results that take longer to reconcile than doing the work serially would have. So I tried it on a real job: migrating a mid-sized project from one date library to another, across about 60 files with inconsistent usage patterns.&lt;/p&gt;

&lt;p&gt;The old way, even with &lt;a href="https://dev.to/blog/agentic-coding-2026"&gt;agentic coding&lt;/a&gt;, was a slog. One agent, one file at a time, me babysitting context and re-explaining the pattern every few files as the conversation drifted.&lt;/p&gt;

&lt;p&gt;Dynamic Workflows handled it differently. It scanned the codebase, grouped the files by usage pattern, fanned out a batch of subagents to transform each group in isolation, and then ran a verification pass to reconcile the edits. The whole thing finished in one sitting. Not every file was perfect. I caught two cases where it picked the wrong replacement function. But the wall-clock time was a fraction of the serial approach, and the consistency across files was better than I get when I do migrations by hand and forget my own convention by file 40.&lt;/p&gt;

&lt;p&gt;The honest read is that this is genuinely new leverage for a specific kind of work. Large mechanical migrations, sweeping refactors, repo-wide audits. It is not magic for creative architecture decisions, and you still have to review everything it touches. But for the work that used to eat a full day of tedious repetition, it is the first tool that made me feel like the agent was actually operating at the scale of the codebase rather than the scale of a single file.&lt;/p&gt;

&lt;p&gt;If you have wrestled with &lt;a href="https://dev.to/blog/ai-agent-reliability-engineering-2026"&gt;agent reliability at scale&lt;/a&gt;, the interesting part is how the verification pass cleans up after the fan-out. The subagents are allowed to be imperfect because a final reconciliation step catches the divergence. That is a better architecture than hoping every parallel agent gets it right independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Effort Control Comes to the Consumer Apps
&lt;/h2&gt;

&lt;p&gt;Opus 4.7 introduced the &lt;code&gt;xhigh&lt;/code&gt; effort level for developers. Opus 4.8 takes the idea and exposes it directly to users on claude.ai and Cowork through a setting called Effort Control. You pick how much compute Claude applies to a request. Higher effort means deeper thinking, more tokens spent, slower but more thorough answers.&lt;/p&gt;

&lt;p&gt;By default, 4.8 runs at high effort. Anthropic tuned the default so it spends roughly the same number of tokens as 4.7's default while delivering better results, which is the kind of efficiency win that does not show up in a headline benchmark but shows up on your bill.&lt;/p&gt;

&lt;p&gt;In practice, I leave it on high for most work and bump it up only for genuinely hard problems: a gnarly debugging session where the bug spans three systems, an architecture decision with real tradeoffs, a piece of analysis where I want the model to actually sit with the problem. For quick edits and lookups, high is more than I need, so I drop the effort a level and get a snappier response without a quality hit I can notice.&lt;/p&gt;

&lt;p&gt;The thing I appreciate is that this makes the cost and latency tradeoff explicit instead of hidden. You are no longer guessing whether the model is thinking hard. You are deciding.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing: Nothing Changed, and That Is the Story
&lt;/h2&gt;

&lt;p&gt;Opus 4.8 costs the same as 4.7. Five dollars per million input tokens, twenty-five per million output. The model got better and the price stayed flat.&lt;/p&gt;

&lt;p&gt;That is worth pausing on. We have gotten so used to capability going up while price holds or drops that it barely registers as news anymore. But it is the entire reason the economics of &lt;a href="https://dev.to/blog/pricing-ai-features-2026"&gt;building AI features&lt;/a&gt; keep improving. Every release that holds price while raising the capability floor means the same product gets cheaper to run in real terms, because you can do more with fewer tokens or accomplish a task that previously needed a more expensive workaround.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Input (per 1M)&lt;/th&gt;
&lt;th&gt;Output (per 1M)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast mode&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;$50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Fast mode pricing is the genuinely new line. At $10 input and $50 output it runs at about 2.5x the speed of standard, and Anthropic says it is roughly 3x cheaper than the previous fast mode. For latency-sensitive paths, where you previously had to drop down to a smaller model and accept the quality hit, you can now keep Opus-class quality and just pay a premium for speed. That changes the calculus for anything user-facing where response time affects conversion.&lt;/p&gt;
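
&lt;p&gt;If you want to sanity-check what that premium means per request, a rough calculator is enough. This is a sketch using only the list prices in the table above. The request size is hypothetical, and it ignores prompt caching and any other discounts, so treat the output as an upper bound rather than a bill.&lt;/p&gt;

```python
# Prices in USD per one million tokens, taken from the table above.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(tier, input_tokens, output_tokens):
    """Return the USD list-price cost of one request at the given tier."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
for tier in PRICES:
    print(tier, round(request_cost(tier, 20_000, 2_000), 4))
```

&lt;p&gt;For that example request, standard comes to $0.15 and Fast mode to $0.30. The useful question is whether the latency win is worth doubling the per-request cost on that particular path.&lt;/p&gt;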

&lt;p&gt;If you are still mapping out your spend across plans and API usage, my &lt;a href="https://dev.to/blog/claude-june-2026-pricing-survival-guide"&gt;Claude pricing survival guide&lt;/a&gt; walks through how to think about the tradeoffs, and the fast-mode change tilts a few of those decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Quiet API Change With Real Consequences
&lt;/h2&gt;

&lt;p&gt;Buried in the announcement is a Messages API change that most people will skim past. The API now accepts system entries mid-conversation without breaking prompt caching.&lt;/p&gt;

&lt;p&gt;If you have built anything serious on the Claude API, you know why this matters. Prompt caching is how you keep costs sane on long conversations and agent loops. Until now, injecting a new system instruction partway through a conversation invalidated the cache from that point forward, which meant you ate the full cost of reprocessing the prefix.&lt;/p&gt;

&lt;p&gt;Being able to slot in a system entry mid-conversation without busting the cache means you can steer the model dynamically, injecting fresh instructions or context as a task evolves, without paying the caching penalty every time. For agent architectures that adjust their own instructions based on what they discover, this removes a real cost cliff. It is the kind of plumbing change that does not get a benchmark but quietly makes a whole class of designs cheaper to run.&lt;/p&gt;

&lt;p&gt;This pairs well with the broader push toward &lt;a href="https://dev.to/blog/context-engineering-ai-coding-2026"&gt;context engineering&lt;/a&gt; as the discipline that actually separates good agent performance from bad. The cheaper it is to manage context dynamically, the more aggressively you can do it.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Stacks Up Against the Competition
&lt;/h2&gt;

&lt;p&gt;The frontier model race has not fundamentally reshuffled. When I last did a full &lt;a href="https://dev.to/blog/claude-opus-vs-gpt5-vs-gemini-2026"&gt;Claude vs GPT vs Gemini breakdown&lt;/a&gt;, the picture was that the models converge on baseline capability and diverge on specific strengths. Opus 4.8 widens Claude's lead in the places it was already strong rather than opening a new front.&lt;/p&gt;

&lt;p&gt;On coding, Claude was already my default, and 4.8 reinforces that rather than dramatically extending it. The pure code-generation improvement over 4.7 is modest. The real gap-widener is the reliability and self-correction, the 4x fewer missed flaws, which competitors have not matched in my testing.&lt;/p&gt;

&lt;p&gt;On computer use, the 84% on Online-Mind2Web puts Opus 4.8 ahead of GPT-5.5 on that specific benchmark, which is notable because browser automation has been an area where the gaps between frontier models were small and noisy. A clear lead there is new.&lt;/p&gt;

&lt;p&gt;On reasoning and multimodal breadth, the competitive story has not changed much. If raw reasoning scores or native audio and video are your priority, the calculus from a few months ago still holds. Opus 4.8 did not show up to win those categories.&lt;/p&gt;

&lt;p&gt;The summary I would give a teammate: if you do agentic work, coding, or any task where the model has to chain many steps and catch its own mistakes, Opus 4.8 extended an existing lead. If your work lives in the categories where Claude was already the second choice, this release is not the one that changes your mind.&lt;/p&gt;




&lt;h2&gt;
  
  
  Should You Upgrade From 4.7?
&lt;/h2&gt;

&lt;p&gt;Here is how I would think about it depending on where you sit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use Claude Code on a Pro or Max plan:&lt;/strong&gt; You already have access. Switch to 4.8 and run it on your current work. The self-correction and reliability improvements are real and the transition is seamless. Try Dynamic Workflows on the next migration or sweeping refactor you have been putting off, since that is where the new leverage actually shows up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you run Opus 4.7 in production via the API:&lt;/strong&gt; The swap to &lt;code&gt;claude-opus-4-8&lt;/code&gt; is low risk because the pricing and core behavior are stable, but test anyway. The improved instruction-following and the more aggressive flaw detection can change outputs in ways your downstream code might not expect, especially if you parse the model's review comments. If you have an &lt;a href="https://dev.to/blog/ai-evals-solo-developers-2026"&gt;eval suite&lt;/a&gt;, run it before you flip the model ID. This is exactly the kind of update your evals exist to catch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are on GPT-5.5 or Gemini for primary work:&lt;/strong&gt; The coding, tool-use, and computer-use gaps just widened in Claude's favor. If you have been on the fence about Claude for agentic development, this is the strongest case yet. If reasoning depth or multimodal breadth is your main concern, the competitive picture has not moved enough to force a switch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are new to AI coding tools:&lt;/strong&gt; Start with Claude Code and Opus 4.8. The combination of strong coding, better long-session reliability, explicit effort control, and a model that actually tells you when your code is wrong makes it the most forgiving entry point for getting serious about AI-assisted development.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Anthropic Is Hinting At Next
&lt;/h2&gt;

&lt;p&gt;The announcement closes with two forward-looking notes that are worth reading carefully.&lt;/p&gt;

&lt;p&gt;First, Anthropic says lower-cost Opus-equivalent models are in development. If that lands, it pulls Opus-class capability down into a price bracket where you could run it on high-volume, cost-sensitive paths that currently force a downgrade to a smaller model. That would be a bigger deal for production economics than anything in this release.&lt;/p&gt;

&lt;p&gt;Second, and more loaded, Mythos-class models are coming to all customers in the coming weeks, pending cybersecurity safeguards. I wrote about &lt;a href="https://dev.to/blog/claude-mythos-anthropic-developer-analysis-2026"&gt;Claude Mythos&lt;/a&gt; when it was a restricted research preview that scored 93.9% on SWE-bench and 100% on Cybench, the model Anthropic decided was too capable to release. The fact that a Mythos-class model is now being lined up for general availability, gated on safety work rather than capability, is the most interesting sentence in the entire announcement.&lt;/p&gt;

&lt;p&gt;It tells you the gap between what these labs can build and what they choose to ship is still the binding constraint, and that the constraint is loosening. Opus 4.8 is an excellent model that Anthropic is comfortable handing to everyone today. Something meaningfully more capable is being prepared for release the moment the safety story is solid enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;What strikes me about Opus 4.8 is not any single benchmark. It is the shape of the improvement.&lt;/p&gt;

&lt;p&gt;The last few releases chased raw capability. Higher SWE-bench, sharper vision, longer context. 4.8 spent its gains differently. It got more honest, more reliable across long chains, better at catching its own mistakes, and more efficient with tokens, all while holding the price flat and adding a way to coordinate work at the scale of an entire codebase.&lt;/p&gt;

&lt;p&gt;That is what a maturing tool looks like. Not a model that is dramatically smarter than the one before it, but one that is more trustworthy in the moments that actually cost you time. A model that flags the bug instead of waving it through. A model that finishes the migration instead of drifting halfway through. A model that lets you decide how hard it should think instead of guessing.&lt;/p&gt;

&lt;p&gt;For the work I do every day, that is more valuable than another few points on a coding benchmark. Opus 4.7 was the best model I had used for shipping software. Opus 4.8 is better in the specific places that matter when you are the one responsible for what ships.&lt;/p&gt;

&lt;p&gt;If you are already on 4.7, the upgrade is easy and the wins are real. Switch, throw Dynamic Workflows at the migration you have been dreading, and see how it feels to have a model that argues with you when you are wrong. That is the part that does not show up in the chart, and it is the part you will notice first.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Writing Your Own Claude Code Skill in 2026: The Practical Guide</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Fri, 22 May 2026 06:53:26 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/writing-your-own-claude-code-skill-in-2026-the-practical-guide-2f24</link>
      <guid>https://dev.to/alexcloudstar/writing-your-own-claude-code-skill-in-2026-the-practical-guide-2f24</guid>
      <description>&lt;p&gt;I noticed I was typing the same five paragraphs into the Claude Code prompt every morning. The paragraphs explained how I wanted PR reviews done on my own projects: which directories to weight, which patterns I cared about, what to ignore, what to flag, what to never suggest. Five paragraphs, every morning, sometimes twice.&lt;/p&gt;

&lt;p&gt;The third time I noticed it, I copied the paragraphs into a &lt;code&gt;SKILL.md&lt;/code&gt;, dropped it in my plugin directory, and added a one-sentence trigger description. The next morning I typed "review this branch." Claude pulled the skill in automatically and produced exactly the review I had been asking for by hand. I have not pasted those paragraphs again.&lt;/p&gt;

&lt;p&gt;That was the moment I understood that skills are not for sharing with the world. They are for sharing with future-you. The marketplace is great, but the highest-leverage skills you will ever use are the ones you write for your own projects, your own quirks, your own opinions about how the work should be done. This post is the missing manual for writing them.&lt;/p&gt;

&lt;p&gt;If you have not seen the &lt;a href="https://dev.to/blog/claude-code-plugin-marketplace-skills-2026"&gt;Claude Code plugin marketplace post&lt;/a&gt;, read that first for the broader context. This one is the hands-on follow-up: how to actually write a skill, where they live, what makes them fire, and the patterns that keep tripping up developers who try.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why You Should Write Your Own Skills
&lt;/h2&gt;

&lt;p&gt;There is a category of work that you do over and over again, that has a clear right answer, and that you have to re-explain to the model every time. That is the skill-shaped hole in your workflow. Pattern-match for it.&lt;/p&gt;

&lt;p&gt;Common examples from my own machine.&lt;/p&gt;

&lt;p&gt;A skill that knows how my Astro blog is structured, where posts live, what the frontmatter schema is, and how internal links should look. Saves me a paragraph of context every time I write a post.&lt;/p&gt;

&lt;p&gt;A skill that turns "review the work" into a deep verification pass with a specific checklist, instead of the generic "looks good to me" review the model produces by default.&lt;/p&gt;

&lt;p&gt;A skill that estimates how long a ticket should take, in story points, using the rubric my team has agreed on. This used to be a paragraph I copy-pasted from a Notion doc. Now it is a slash command.&lt;/p&gt;

&lt;p&gt;A skill that knows my testing conventions and refuses to suggest tests that mock the database, because we agreed not to mock the database two quarters ago and the model keeps forgetting.&lt;/p&gt;

&lt;p&gt;None of these are interesting to the world. All of them save me actual time, every week, on specific work I am going to do anyway. That is the skill bar. Useful to you, repeatedly, on real work. Everything else is a hobby.&lt;/p&gt;

&lt;p&gt;The other reason to write skills, separate from the time savings, is that the model's default behavior is a compromise designed to please everyone. Your work is not "everyone." Your work is yours. Every skill you write pushes the model's behavior closer to what you actually want, and away from the average of what everyone wanted in the training data. That distance, accumulated across a dozen skills, is what makes the difference between "the AI is a smart intern" and "the AI is a smart colleague who already knows how we do things here."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Anatomy Of A Skill
&lt;/h2&gt;

&lt;p&gt;A skill is a directory with a &lt;code&gt;SKILL.md&lt;/code&gt; at the root. That is it. Everything else is optional. The directory can also contain supporting files (scripts, references, examples, additional Markdown chunks the skill points to) but only the &lt;code&gt;SKILL.md&lt;/code&gt; is required.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;SKILL.md&lt;/code&gt; has frontmatter and a body. The frontmatter looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;review-blog-post&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Review a draft blog post for voice consistency, em-dash usage, and SEO hygiene before publishing. Use when the user mentions "review the post," "check the draft," or pastes a blog post for feedback.&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two fields. &lt;strong&gt;Name&lt;/strong&gt; is the slug. &lt;strong&gt;Description&lt;/strong&gt; is the trigger. The description is doing roughly ninety percent of the work and most people get it wrong on the first try. I will come back to that.&lt;/p&gt;
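
&lt;p&gt;Since both fields are load-bearing, it can be worth linting them before you rely on a skill. The sketch below is my own helper, not something Claude Code ships. It is a deliberately naive parser that assumes simple single-line &lt;code&gt;key: value&lt;/code&gt; frontmatter, and the function names and sample text are hypothetical.&lt;/p&gt;

```python
def parse_frontmatter(text):
    """Naive parser: return the key/value pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def missing_fields(text, required=("name", "description")):
    """List the required fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [name for name in required if not fields.get(name)]


# Hypothetical SKILL.md contents, shortened for the example.
sample = """---
name: review-blog-post
description: Review a draft blog post before publishing.
---
Body goes here.
"""
print(missing_fields(sample))
```

&lt;p&gt;An empty list means both fields are present. It will not handle multi-line YAML values, so swap in a real YAML parser if your descriptions get fancy.&lt;/p&gt;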

&lt;p&gt;The body of the &lt;code&gt;SKILL.md&lt;/code&gt; is the instructions the model reads when the skill fires. It is plain Markdown. You can use headings, bullets, code blocks, examples, anything that helps the model do the work. The body is loaded into context as system instructions for the turn, so treat every word as load-bearing.&lt;/p&gt;

&lt;p&gt;A skill can also live inside a &lt;strong&gt;plugin&lt;/strong&gt;. A plugin is a directory with a &lt;code&gt;plugin.json&lt;/code&gt; and one or more skills inside a &lt;code&gt;skills/&lt;/code&gt; subfolder. Plugins are the unit that the marketplace ships. For personal use you do not need a plugin wrapper at all. Just drop the skill into &lt;code&gt;~/.claude/skills/&lt;/code&gt; and it will fire. If you want to share the skill or use it across multiple projects with version control, wrap it in a plugin and ship it through your own Git-backed marketplace, the way &lt;a href="https://dev.to/blog/claude-code-plugin-marketplace-skills-2026"&gt;the marketplace post&lt;/a&gt; walks through.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Trigger Description Is The Whole Game
&lt;/h2&gt;

&lt;p&gt;If you take one thing from this post, take this. The skill's trigger description is the single most important line you will write, and it is the line developers most consistently underthink.&lt;/p&gt;

&lt;p&gt;The description is what the model reads at the top of every turn to decide whether to load the skill. The model sees a list: skill name, one line of description, skill name, one line of description. Maybe forty entries. It picks the ones that match the current task. Your description is competing with thirty-nine others for the model's attention. If your description is vague or generic, it loses, and your skill never fires.&lt;/p&gt;

&lt;p&gt;What a bad description looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;description: Helps with code reviews.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What a good description looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;description: Review code for the alexcloudstar.com Astro blog. Use when the user runs /review, asks to review a PR or branch, or pastes a diff and asks for feedback. Checks voice consistency, em-dash usage, internal-link health, and MDX frontmatter schema.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The good one names the &lt;strong&gt;triggers&lt;/strong&gt; (&lt;code&gt;/review&lt;/code&gt;, "review a PR or branch," pasting a diff), describes the &lt;strong&gt;scope&lt;/strong&gt; (this blog specifically), and lists the &lt;strong&gt;checks&lt;/strong&gt;. The model now has a clear handle on when this skill applies and when it does not. The bad one matches "code review" abstractly and ends up either firing on every code-review-shaped task and confusing the model, or never firing at all because something more specific outranks it.&lt;/p&gt;

&lt;p&gt;The pattern I use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{What it does in a sentence}. Use when {trigger phrase 1}, {trigger phrase 2}, or {trigger phrase 3}. {Optional scope note}.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If a skill could plausibly fire on anything code-related, you have not been specific enough. Tighten the description until it fires on the cases you want and stays silent on the rest. The wrong skill firing at the wrong time is worse than no skill firing at all, because it wastes the turn.&lt;/p&gt;
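
&lt;p&gt;If you write more than a handful of skills, the pattern above is mechanical enough to template. Here is a small sketch that assembles a description from its three parts. The helper name and its arguments are my own invention for illustration, not part of Claude Code.&lt;/p&gt;

```python
def build_description(what, triggers, scope_note=""):
    """Assemble a skill description: what it does, when to use it, optional scope."""
    if not triggers:
        raise ValueError("a description needs at least one trigger phrase")
    if len(triggers) in (1, 2):
        when = " or ".join(triggers)
    else:
        when = ", ".join(triggers[:-1]) + ", or " + triggers[-1]
    # Normalize trailing periods so each part ends with exactly one.
    parts = [what.rstrip(".") + ".", "Use when " + when + "."]
    if scope_note:
        parts.append(scope_note.rstrip(".") + ".")
    return " ".join(parts)


print(build_description(
    "Review code for the alexcloudstar.com Astro blog",
    [
        "the user runs /review",
        "asks to review a PR or branch",
        "pastes a diff and asks for feedback",
    ],
    "Checks voice consistency, em-dash usage, and internal-link health",
))
```

&lt;p&gt;The helper refuses to build a description with no triggers, which is the point: a description without concrete trigger phrases is the vague kind that never fires.&lt;/p&gt;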




&lt;h2&gt;
  
  
  Writing Your First Skill: A Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let me walk through a concrete example end to end. The goal: a skill that codifies how I want commit messages written on my projects.&lt;/p&gt;

&lt;p&gt;I open a terminal and run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills/conventional-commits
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/.claude/skills/conventional-commits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I create &lt;code&gt;SKILL.md&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;conventional-commits&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Write a Git commit message in the project's conventional-commits style. Use when the user runs /commit, asks to "write a commit message," or finishes a change and asks what to commit. Generates the message; never runs git commit unless explicitly asked.&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Conventional Commits&lt;/span&gt;

You are writing a commit message for code changes the user has made. Follow these rules.

&lt;span class="gu"&gt;## Format&lt;/span&gt;

&lt;span class="sb"&gt;`&amp;lt;type&amp;gt;(&amp;lt;scope&amp;gt;): &amp;lt;subject&amp;gt;`&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; type: one of feat, fix, refactor, chore, docs, test, perf, style, build, ci
&lt;span class="p"&gt;-&lt;/span&gt; scope: optional, lowercase, one or two words max, no spaces (use hyphens)
&lt;span class="p"&gt;-&lt;/span&gt; subject: imperative mood, lowercase, no trailing period, under 72 characters

&lt;span class="gu"&gt;## Body&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Wrap at 72 characters
&lt;span class="p"&gt;-&lt;/span&gt; Explain the &lt;span class="gs"&gt;**why**&lt;/span&gt;, not the &lt;span class="gs"&gt;**what**&lt;/span&gt;. The diff already shows the what
&lt;span class="p"&gt;-&lt;/span&gt; Reference related issues with &lt;span class="sb"&gt;`Refs: #123`&lt;/span&gt; or &lt;span class="sb"&gt;`Closes: #123`&lt;/span&gt; on their own line
&lt;span class="p"&gt;-&lt;/span&gt; Skip the body entirely for trivial commits

&lt;span class="gu"&gt;## Hard rules&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Never use em dashes. Use commas, periods, or restructure the sentence
&lt;span class="p"&gt;-&lt;/span&gt; Never add Co-Authored-By unless the user explicitly asks for it
&lt;span class="p"&gt;-&lt;/span&gt; Never run &lt;span class="sb"&gt;`git commit`&lt;/span&gt; yourself. Output the message and let the user decide
&lt;span class="p"&gt;-&lt;/span&gt; If the diff touches more than one logical change, suggest splitting into multiple commits rather than writing one big message

&lt;span class="gu"&gt;## Examples&lt;/span&gt;

&lt;span class="sb"&gt;`feat(blog): add MCP server tutorial post`&lt;/span&gt;

&lt;span class="sb"&gt;`fix(rss): drop trailing slash in canonical URL`&lt;/span&gt;

&lt;span class="sb"&gt;`refactor(content): split shared frontmatter into a helper`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the skill. It took five minutes. I save the file, restart Claude Code (or run &lt;code&gt;/plugin reload&lt;/code&gt;), and from the next session forward, when I finish a change and say "what should I commit," the model produces a message in exactly the format I want. No back and forth, no reminders, no "actually I prefer the subject in lowercase."&lt;/p&gt;

&lt;p&gt;The reason this works is the description. It names three trigger phrases I actually use. It scopes the skill to commit-message writing only. It mentions a guardrail (does not run git commit) that disambiguates this skill from any general "make a commit" skill that might exist. The model loads it when relevant and ignores it the rest of the time.&lt;/p&gt;
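
&lt;p&gt;The format rules in the skill body are strict enough to check mechanically. Below is a minimal sketch of such a check. &lt;code&gt;checkSubjectLine&lt;/code&gt; is a hypothetical helper, not something Claude Code ships, and it reads "lowercase" as "the subject starts with a lowercase letter", since the examples in the skill keep acronyms like MCP in capitals.&lt;/p&gt;

```typescript
// Hypothetical checker for the subject-line rules in the skill above.
const COMMIT_TYPES = [
  'feat', 'fix', 'refactor', 'chore', 'docs',
  'test', 'perf', 'style', 'build', 'ci',
];

// type, optional (scope) of one or two hyphenated lowercase words, then subject.
const SUBJECT_LINE = /^([a-z]+)(?:\(([a-z0-9]+(?:-[a-z0-9]+)?)\))?: (.+)$/;

// Returns a list of rule violations; an empty list means the line passes.
function checkSubjectLine(line: string): string[] {
  const match = SUBJECT_LINE.exec(line);
  if (match === null) {
    return ['does not match type(scope): subject'];
  }
  const type = match[1];
  const subject = match[3];
  const problems: string[] = [];
  if (COMMIT_TYPES.indexOf(type) === -1) {
    problems.push(`unknown type "${type}"`);
  }
  if (subject.length >= 72) {
    problems.push('subject is 72 characters or longer');
  }
  if (subject.charAt(subject.length - 1) === '.') {
    problems.push('trailing period');
  }
  // Assumption: "lowercase" means the first letter only.
  if (subject.charAt(0) !== subject.charAt(0).toLowerCase()) {
    problems.push('subject starts with an uppercase letter');
  }
  if (subject.indexOf('\u2014') !== -1) {
    problems.push('contains an em dash');
  }
  return problems;
}

console.log(checkSubjectLine('feat(blog): add MCP server tutorial post'));
console.log(checkSubjectLine('feat: Add thing.'));
```

&lt;p&gt;A check like this belongs in a hook or a CI step rather than in the skill itself. The skill tells the model what to write; the check catches the cases where it did not.&lt;/p&gt;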




&lt;h2&gt;
  
  
  The CLAUDE.md To Skill Promotion Path
&lt;/h2&gt;

&lt;p&gt;The fastest way to write good skills is to not start with skills. Start with &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When I am working on a new project, every instruction I find myself giving the model more than once goes into &lt;code&gt;CLAUDE.md&lt;/code&gt; first. The file lives in the project root, loads automatically on every turn, and is the lowest-friction way to capture a working pattern. Most projects need ten to thirty lines of &lt;code&gt;CLAUDE.md&lt;/code&gt; and that is the right amount.&lt;/p&gt;

&lt;p&gt;The problem with &lt;code&gt;CLAUDE.md&lt;/code&gt; is that it loads on every turn. As it grows past a couple hundred lines, the model spends real context on it whether the current task is relevant or not. That is when you promote.&lt;/p&gt;

&lt;p&gt;Pick a section of &lt;code&gt;CLAUDE.md&lt;/code&gt; that only matters for a specific class of task. Move it to a skill. Write a description that targets exactly that class of task. The &lt;code&gt;CLAUDE.md&lt;/code&gt; keeps the always-relevant rules. The skill carries the sometimes-relevant ones. The model loads the sometimes-relevant ones only when they apply.&lt;/p&gt;
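
&lt;p&gt;The move itself is simple text surgery. The sketch below shows the idea on in-memory strings. It assumes the section is a level-two Markdown heading and that the description contains nothing YAML would trip over; &lt;code&gt;promoteSection&lt;/code&gt; and the sample content are made up for illustration.&lt;/p&gt;

```typescript
// Hypothetical sketch: lift one "## Heading" section out of CLAUDE.md text
// and wrap it in SKILL.md frontmatter.
interface Promotion {
  remaining: string; // CLAUDE.md content without the promoted section
  skillMd: string;   // content for the new SKILL.md
}

function promoteSection(
  claudeMd: string,
  heading: string,
  skillName: string,
  skillDescription: string,
): Promotion {
  const lines = claudeMd.split('\n');
  const start = lines.indexOf(`## ${heading}`);
  if (start === -1) throw new Error(`No section named "${heading}"`);

  // The section runs until the next level-two heading or the end of the file.
  let end = lines.length;
  for (let i = start + 1; i !== lines.length; i++) {
    if (lines[i].indexOf('## ') === 0) {
      end = i;
      break;
    }
  }

  const section = lines.slice(start, end).join('\n').trim();
  const remaining = lines.slice(0, start).concat(lines.slice(end)).join('\n');
  const skillMd = [
    '---',
    `name: ${skillName}`,
    `description: ${skillDescription}`,
    '---',
    '',
    section,
    '',
  ].join('\n');
  return { remaining, skillMd };
}

const sampleClaudeMd = [
  '# Project',
  '',
  '## Style',
  'Use tabs.',
  '',
  '## Migrations',
  'Run drizzle-kit before editing the schema.',
  '',
  '## Testing',
  'Use vitest.',
].join('\n');

const promoted = promoteSection(
  sampleClaudeMd,
  'Migrations',
  'drizzle-migrations',
  'Write Drizzle migrations. Use when the user asks for a schema change.',
);
console.log(promoted.skillMd);
```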

&lt;p&gt;This promotion path is the most reliable way to write skills that actually fire well, because the body of the skill is content you have already validated by using it as &lt;code&gt;CLAUDE.md&lt;/code&gt; instructions. You know it works. You know the model follows it. You are just changing the trigger surface from "always" to "when relevant."&lt;/p&gt;

&lt;p&gt;Do not skip the validation step. Skills written from scratch, without a real-use loop, are almost always too abstract. The model ignores them in practice because the descriptions sound right but the body says nothing it cannot infer on its own. Skills written by promotion are concrete, opinionated, and load-bearing.&lt;/p&gt;

&lt;p&gt;The same pattern applies to skills you find in the marketplace. Install the one that almost fits. Use it for a week. Note where it does the wrong thing. Fork it, write your own copy in your local &lt;code&gt;~/.claude/skills/&lt;/code&gt;, and tune the body to match your project. The forked skill outperforms the original every time, because the original was the average of what worked for everyone and yours is exactly what works for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skill Versus Subagent Versus Hook Versus MCP
&lt;/h2&gt;

&lt;p&gt;This is the matrix that confused me for months. Let me draw it cleanly.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;skill&lt;/strong&gt; is a set of structured instructions the model reads when the trigger matches. Pure prompt. Runs in the same turn. No external process. Fires when relevant, sits silent otherwise. Use for: codifying how you want the model to approach a class of task.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;subagent&lt;/strong&gt; is a separately-spawned model invocation, with its own context window, that returns a summary to the main agent. Use for: research, broad code exploration, parallel work, or any case where you want to protect the main context from a lot of intermediate output.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;hook&lt;/strong&gt; is a shell command the harness runs in response to a lifecycle event. Runs outside the model. Use for: enforcing rules the model cannot enforce on its own (formatters, linters, tests, audit logs). The right tool when the requirement is "every time X happens, Y must run, regardless of what the model decides."&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; exposes tools to the model that it can call mid-turn to read or modify external systems. Use for: any case where the model needs to touch a system it does not natively know about (your database, your ticketing system, your build pipeline). See &lt;a href="https://dev.to/blog/build-mcp-server-developer-guide-2026"&gt;the MCP server building guide&lt;/a&gt; for how to write one.&lt;/p&gt;

&lt;p&gt;The honest matrix.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I want the model to do X consistently" → skill&lt;/li&gt;
&lt;li&gt;"I want to delegate X without polluting context" → subagent&lt;/li&gt;
&lt;li&gt;"I want X to happen every time, no matter what" → hook&lt;/li&gt;
&lt;li&gt;"I want the model to read or modify Y" → MCP server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A well-tuned project uses two or three of these together. A skill that tells the model how to use an MCP server is the canonical pairing. A hook that runs a test suite after every code edit, paired with a skill that explains how to interpret the test output, is the second-most-common pairing. The primitives compose.&lt;/p&gt;

&lt;p&gt;What you should not do is reach for an MCP server when a skill would have been enough, or reach for a skill when a hook is the right answer. The first wastes engineering time. The second produces unreliable enforcement, because the model can decide not to follow a skill but cannot decide not to run a hook.&lt;/p&gt;
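
&lt;p&gt;The matrix reads naturally as a short decision function. The version below is a sketch, not an official rule: the &lt;code&gt;Need&lt;/code&gt; fields and the order of the checks are assumptions about how to break ties when more than one answer applies.&lt;/p&gt;

```typescript
type Primitive = 'skill' | 'subagent' | 'hook' | 'MCP server';

// Hypothetical description of what you are trying to achieve.
interface Need {
  mustAlwaysRun: boolean;         // "every time, no matter what"
  touchesExternalSystem: boolean; // read or modify something outside the model
  floodsContext: boolean;         // produces lots of intermediate output
}

// Checks run from the hardest requirement to the softest one.
function choosePrimitive(need: Need): Primitive {
  if (need.mustAlwaysRun) return 'hook';
  if (need.touchesExternalSystem) return 'MCP server';
  if (need.floodsContext) return 'subagent';
  return 'skill';
}

console.log(
  choosePrimitive({
    mustAlwaysRun: false,
    touchesExternalSystem: false,
    floodsContext: false,
  }),
);
```

&lt;p&gt;Enforcement comes first because it is the one requirement a prompt cannot satisfy. Everything that falls through to the last line is a candidate for a skill.&lt;/p&gt;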




&lt;h2&gt;
  
  
  Patterns That Work, And Ones I Keep Failing With
&lt;/h2&gt;

&lt;p&gt;After roughly thirty skills written (most kept, some thrown out), here are the patterns I trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills that codify a checklist work brilliantly.&lt;/strong&gt; Anything where there are five to fifteen specific things you want the model to do or not do, in a specific order, is a perfect fit. The &lt;code&gt;review-work&lt;/code&gt; skill on my machine is twenty bullet points. It runs the same checks every time. The model reliably executes the list. The output is consistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills that name the bad outputs explicitly work better than skills that describe the good ones.&lt;/strong&gt; "Do not use em dashes. Do not write &lt;code&gt;Of course!&lt;/code&gt;. Do not start with &lt;code&gt;Let me dive into&lt;/code&gt;." beats "Write in a clean style." The model is much better at avoiding specific banned strings than at hitting a vague target.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills that include before/after examples work better than skills that just state rules.&lt;/strong&gt; Two paragraphs of "here is the rule" plus one paragraph of "here is a wrong version and the right version" produces dramatically better adherence than three paragraphs of pure rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills scoped to a tool or a file type work better than skills scoped to a "domain."&lt;/strong&gt; A skill for Astro MDX posts is better than a skill for "blog writing." A skill for Drizzle migrations is better than a skill for "database work." Narrow scope, sharp trigger, concrete body.&lt;/p&gt;

&lt;p&gt;The patterns I keep failing with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills that try to teach the model a whole domain in five hundred lines.&lt;/strong&gt; The model already knows the domain. The skill should add the specifics of how you want the domain handled, not re-teach it. Long, comprehensive skills load slowly, dilute attention, and often get ignored when something more specific exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills with overlapping triggers.&lt;/strong&gt; If two of my skills both claim to fire on "code review," the model picks one (often the wrong one) and the other never gets used. Audit your triggers. Make sure each skill owns a clear slice of the task space.&lt;/p&gt;
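
&lt;p&gt;A trigger audit can start as a crude lint. The sketch below only catches exact duplicates after lowercasing, which is far blunter than whatever matching the model does, and it assumes you have already pulled the trigger phrases out of each description by hand. &lt;code&gt;findOverlaps&lt;/code&gt; and the sample skills are hypothetical.&lt;/p&gt;

```typescript
interface SkillTriggers {
  name: string;
  triggers: string[];
}

// Reports every trigger phrase that more than one skill claims.
function findOverlaps(skills: SkillTriggers[]): string[] {
  // Object.create(null) avoids collisions with built-in keys like "constructor".
  const owners: { [trigger: string]: string[] } = Object.create(null);
  for (const skill of skills) {
    for (const trigger of skill.triggers) {
      const key = trigger.trim().toLowerCase();
      if (owners[key] === undefined) owners[key] = [];
      if (owners[key].indexOf(skill.name) === -1) owners[key].push(skill.name);
    }
  }
  const report: string[] = [];
  for (const key of Object.keys(owners)) {
    if (owners[key].length > 1) {
      report.push(`"${key}" is claimed by ${owners[key].join(', ')}`);
    }
  }
  return report;
}

const overlaps = findOverlaps([
  { name: 'review-work', triggers: ['code review', '/review'] },
  { name: 'review-blog-post', triggers: ['Code review', 'review a post'] },
]);
console.log(overlaps);
```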

&lt;p&gt;&lt;strong&gt;Skills that mix instructions for the model with documentation for humans.&lt;/strong&gt; Pick a lane. The &lt;code&gt;SKILL.md&lt;/code&gt; body is read by the model on every fire. If you want a human-readable README, put it in &lt;code&gt;README.md&lt;/code&gt; next to &lt;code&gt;SKILL.md&lt;/code&gt;. The body of the skill should be all model-facing, all the time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills that depend on context the model does not have.&lt;/strong&gt; "Use the patterns from the design system" is useless if the design system is not in context. Either inline the relevant patterns, or have the skill instruct the model to read a specific file before continuing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Versioning, Shipping, And The Plugin Wrapper
&lt;/h2&gt;

&lt;p&gt;For personal skills, sitting in &lt;code&gt;~/.claude/skills/&lt;/code&gt;, you do not need a plugin or a version. You are the only user. Just edit the file when it needs to change.&lt;/p&gt;

&lt;p&gt;The moment you want to share a skill (with a teammate, with a client, with the world), wrap it in a plugin.&lt;/p&gt;

&lt;p&gt;A plugin is a directory with a &lt;code&gt;plugin.json&lt;/code&gt; and a &lt;code&gt;skills/&lt;/code&gt; subfolder. The &lt;code&gt;plugin.json&lt;/code&gt; looks roughly like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alexcloudstar-blog-tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Skills for writing and reviewing posts on alexcloudstar.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alexcloudstar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"review-blog-post"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"estimate-post-effort"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"publish-checklist"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Push that to a public or private Git repo, point your &lt;code&gt;marketplace.json&lt;/code&gt; at it, and &lt;code&gt;/plugin install alexcloudstar-blog-tools&lt;/code&gt; works for anyone with the marketplace added.&lt;/p&gt;

&lt;p&gt;Versioning matters more than people expect. The moment a skill is in use by anyone other than you, breaking changes to the skill's behavior need a major version bump. Otherwise users who pull the latest version will see different outputs from the same prompts, and they will not understand why. Treat skills like API contracts. They are read by an agent, but the agent's behavior is observable, and the user has built a mental model around it.&lt;/p&gt;
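
&lt;p&gt;In practice this is plain MAJOR.MINOR.PATCH bumping. Here is a minimal sketch; the &lt;code&gt;Change&lt;/code&gt; categories are an assumption about how to classify skill edits, with "breaking" meaning the same prompt now produces visibly different output.&lt;/p&gt;

```typescript
// Hypothetical classification of an edit to a shared skill.
type Change = 'breaking' | 'feature' | 'fix';

// Returns the next version string for a MAJOR.MINOR.PATCH version.
function bumpVersion(version: string, change: Change): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  if (change === 'breaking') return `${major + 1}.0.0`;
  if (change === 'feature') return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

console.log(bumpVersion('0.2.0', 'breaking'));
console.log(bumpVersion('0.2.0', 'feature'));
console.log(bumpVersion('0.2.0', 'fix'));
```

&lt;p&gt;Semantic Versioning lets anything change during 0.x, so bumping a pre-1.0 plugin straight to 1.0.0 on the first breaking change is a choice, not a requirement.&lt;/p&gt;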

&lt;p&gt;A &lt;code&gt;CHANGELOG.md&lt;/code&gt; next to your &lt;code&gt;plugin.json&lt;/code&gt; is more important than you would guess. The plugins that get adopted have changelogs. The ones that do not have changelogs feel risky to install. The cost of writing two lines per release is roughly zero. Do it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Wish I Knew After Twenty Skills
&lt;/h2&gt;

&lt;p&gt;A few hard-won lessons in no particular order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most skills should be under a hundred lines.&lt;/strong&gt; If you find yourself writing a fifth heading, you are probably building two skills, not one. Split.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test triggers explicitly.&lt;/strong&gt; After writing a skill, start a fresh session and use natural-sounding prompts that should fire the skill. If it does not fire, the description is wrong. Iterate on the description until it fires reliably on three or four trigger phrases you actually use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Re-read your own skills monthly.&lt;/strong&gt; Skills rot. The conventions in the body drift away from the conventions you actually follow. If a skill no longer matches how you work, edit it or delete it. Stale skills are worse than no skills, because they produce confident wrong answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills compose with subagents better than you would expect.&lt;/strong&gt; If you have a skill that explains how to do a deep review, and you spawn a subagent to do the review, the subagent inherits the skill. You get focused, high-quality work in a protected context window. The combination is one of the strongest patterns in the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not skill-ify creative work.&lt;/strong&gt; A skill that "writes a blog post" produces formulaic blog posts. The point of writing is that the output is yours, not the model's. Skills are excellent for repetitive, rule-based work, and they are the wrong tool for work where the value is in the unique judgement you are bringing. For my own writing, I have skills for the mechanical parts (frontmatter schema, internal-link patterns, SEO checks) and no skills for the actual prose. The mechanical parts are happily automated. The prose is mine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treat skills as a backlog, not a one-shot.&lt;/strong&gt; I keep a notes file with "things I keep typing into the prompt." Every couple of weeks I look at the list, pick the top one, and turn it into a skill. The compounding effect over a year is significant. The single-week effect is not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where The Skills Ecosystem Is Going
&lt;/h2&gt;

&lt;p&gt;The skills surface is going to grow fast in late 2026. Anthropic has signalled that the official marketplace will start accepting third-party plugins under a review process by Q3. Cursor is shipping its own skill format (compatible enough that conversion will be a script). The MCP integration with skills is getting tighter: a skill that knows about a specific MCP server's tools and explains how to chain them is becoming the canonical "useful plugin."&lt;/p&gt;

&lt;p&gt;What this means in practice. The half-life of a useful skill is increasing. The skills I wrote six months ago still fire, still work, and still save time. The skills I write now have a longer runway than the skills I wrote in 2025, because the surrounding ecosystem (plugins, marketplaces, MCP, hooks, subagents) is no longer changing weekly. The investment compounds.&lt;/p&gt;

&lt;p&gt;The other thing it means. The teams that have ten well-tuned internal skills are dramatically more productive than the teams that have none. The gap is roughly the size of a junior developer per developer. That is a real number. It does not show up on dashboards. It shows up in the rate at which work that used to be repetitive becomes automatic.&lt;/p&gt;

&lt;p&gt;If you have read this far, the next step is concrete. Open &lt;code&gt;~/.claude/skills/&lt;/code&gt;. Create a directory. Drop a &lt;code&gt;SKILL.md&lt;/code&gt; in it. Pick the smallest, most boring, most repeated thing you do every week and codify it. Spend twenty minutes on the description. Test the trigger. Use the skill for a week.&lt;/p&gt;

&lt;p&gt;If you do that one time, the next nine skills will write themselves. The leverage is real. The investment is small. The hard part is just starting.&lt;/p&gt;

&lt;p&gt;The dotfiles era of developer tooling produced your &lt;code&gt;vimrc&lt;/code&gt;. The skills era produces your &lt;code&gt;~/.claude/skills/&lt;/code&gt; directory. Treat it with the same care, and it will repay you the same way.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to Build Your First MCP Server in 2026: A Practical Developer Guide</title>
      <dc:creator>Alex Cloudstar</dc:creator>
      <pubDate>Fri, 22 May 2026 06:53:25 +0000</pubDate>
      <link>https://dev.to/alexcloudstar/how-to-build-your-first-mcp-server-in-2026-a-practical-developer-guide-264l</link>
      <guid>https://dev.to/alexcloudstar/how-to-build-your-first-mcp-server-in-2026-a-practical-developer-guide-264l</guid>
      <description>&lt;p&gt;The first MCP server I wrote did one thing. It read my Postgres database and returned the schema as structured JSON. That was it. No fancy joins, no query builder, just a list of tables, columns, and types.&lt;/p&gt;

&lt;p&gt;It took me an afternoon. Two weeks later it had saved me hours. Every time I asked Claude Code to add a new feature that touched the database, it pulled the schema through the MCP server instead of hallucinating column names. The bug rate on AI-generated migrations dropped to roughly zero. The pattern was so obviously useful that I now write a small MCP server for almost every project I work on.&lt;/p&gt;

&lt;p&gt;If you have read my &lt;a href="https://dev.to/blog/mcp-model-context-protocol-developer-guide-2026"&gt;MCP developer guide&lt;/a&gt;, you know what the protocol is. This post is the part I did not cover there: actually building one. Not theoretical, not waving at the spec. The exact steps I take when I sit down and decide a project needs its own MCP server, and the mistakes I keep watching other developers make on the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why You Should Build One Even If You Are Not At Anthropic
&lt;/h2&gt;

&lt;p&gt;There is a weird assumption I keep running into. People think MCP servers are something big AI companies write for their integrations, and that the rest of us just install them. That is half right. The official servers (GitHub, Linear, Notion, Postgres) are excellent. They cover the obvious cases. But they cannot cover your case.&lt;/p&gt;

&lt;p&gt;Your case is your repo's weird internal CLI. Your case is the JSON schema you keep pasting into prompts by hand. Your case is the three internal APIs nobody outside your team will ever wrap. Your case is the build script that ships your product, the migration tool you wrote in 2023, the analytics dashboard you query through a homemade SQL view.&lt;/p&gt;

&lt;p&gt;Every one of those things is a candidate for an MCP server. Not because the protocol is glamorous, but because once it is wrapped, every AI agent you use can hit it. Claude Code, Cursor, Windsurf, Zed, the official Anthropic chat app, any future agent runtime. One server, many consumers. The leverage is hard to overstate.&lt;/p&gt;

&lt;p&gt;The other reason to build is simpler. You will understand MCP at a level that is not available from reading the spec. The first time you have to decide what your &lt;code&gt;searchTickets&lt;/code&gt; tool returns and what it does not, you learn more about agent design than a year of theory.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Are Actually Building
&lt;/h2&gt;

&lt;p&gt;An MCP server is a process that speaks JSON-RPC and exposes a small set of primitives. The protocol calls them &lt;strong&gt;tools&lt;/strong&gt;, &lt;strong&gt;resources&lt;/strong&gt;, and &lt;strong&gt;prompts&lt;/strong&gt;. Most servers ship only tools. Some ship resources. Almost nobody ships prompts. Start with tools.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;tool&lt;/strong&gt; is a function with a name, a description, an input schema (JSON Schema), and an implementation. When the model decides to call your tool, the runtime serialises the arguments, sends them to your server, your server runs the function, and the result comes back as text or structured content. That is the whole loop.&lt;/p&gt;
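
&lt;p&gt;Stripped of the SDK, that loop is one JSON-RPC request in and one response out. The sketch below models it with plain objects. The message shapes follow MCP's &lt;code&gt;tools/call&lt;/code&gt; method, trimmed to the fields used here; the &lt;code&gt;echo_upper&lt;/code&gt; tool and the function names are hypothetical.&lt;/p&gt;

```typescript
type ToolArgs = { [key: string]: unknown };

// Trimmed-down shapes of a tools/call request and its response.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: ToolArgs };
}

interface ToolCallResponse {
  jsonrpc: '2.0';
  id: number;
  result: { content: { type: 'text'; text: string }[] };
}

// Hypothetical tool implementation, standing in for real work.
function echoUpper(args: ToolArgs): string {
  return String(args['text']).toUpperCase();
}

// Runs the named tool and wraps its output as text content.
function handleToolCall(request: ToolCallRequest): ToolCallResponse {
  if (request.params.name !== 'echo_upper') {
    throw new Error(`Unknown tool: ${request.params.name}`);
  }
  const text = echoUpper(request.params.arguments);
  return {
    jsonrpc: '2.0',
    id: request.id, // the response carries the same id as the request
    result: { content: [{ type: 'text', text }] },
  };
}

const toolRequest: ToolCallRequest = {
  jsonrpc: '2.0',
  id: 7,
  method: 'tools/call',
  params: { name: 'echo_upper', arguments: { text: 'hello' } },
};
const toolResponse = handleToolCall(toolRequest);
console.log(JSON.stringify(toolResponse));
```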

&lt;p&gt;A &lt;strong&gt;resource&lt;/strong&gt; is something the model can read but not write. Think of it as a file the agent can fetch by URI. A common pattern is to expose your project's docs, your database schema, or a snapshot of system state as resources. The model pulls them when relevant.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;prompt&lt;/strong&gt; is a templated instruction the user can invoke by name. They are useful for codifying common workflows. In practice almost every server I have seen skips them and lets the user invoke slash commands or skills instead.&lt;/p&gt;

&lt;p&gt;The protocol does not care what language you write the server in. There are mature SDKs for TypeScript, Python, Go, Rust, Kotlin, C#, and Java. I write almost all of mine in TypeScript because the toolchain matches the rest of my stack and because the official &lt;code&gt;@modelcontextprotocol/sdk&lt;/code&gt; is the most actively maintained. Pick whatever language gets you to a working server fastest. The model does not know or care.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pick Your Transport: stdio Versus HTTP
&lt;/h2&gt;

&lt;p&gt;There are two transports that matter in 2026, and the choice between them shapes everything else.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;stdio transport&lt;/strong&gt; is the one you want for local-only servers. The agent runtime spawns your server as a child process and pipes JSON-RPC over stdin and stdout. There is no port, no auth, no network. The server lives and dies with the agent session. Most local development tools (Postgres helpers, git wrappers, file system tools, build runners) ship as stdio servers because the security model is dead simple: the server runs as a child of the agent, as the same user with the same privileges, so there is no new trust boundary to defend.&lt;/p&gt;
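
&lt;p&gt;On the wire, stdio framing is one JSON-RPC message per line. The SDK's transport handles this for you, so the sketch below is only there to show what travels over the pipe. &lt;code&gt;splitMessages&lt;/code&gt; is a hypothetical name, and the function assumes the input is already decoded text.&lt;/p&gt;

```typescript
// Splits buffered stdin text into complete JSON-RPC messages.
// Returns the parsed messages plus the unfinished tail to carry forward
// until the next chunk arrives.
function splitMessages(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split('\n');
  // Whatever follows the last newline is an incomplete message (or empty).
  const rest = lines.pop() || '';
  const messages = lines
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
  return { messages, rest };
}

const chunk =
  '{"jsonrpc":"2.0","id":1,"method":"tools/list"}\n' +
  '{"jsonrpc":"2.0","id":2,"meth';
const parsed = splitMessages(chunk);
console.log(parsed.messages.length, parsed.rest);
```

&lt;p&gt;This framing is also why a stdio server must never print debug output to stdout. Anything that is not a JSON-RPC message corrupts the stream, so logs go to stderr.&lt;/p&gt;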

&lt;p&gt;The &lt;strong&gt;streamable HTTP transport&lt;/strong&gt; is the one you want for hosted servers. It runs over HTTP with Server-Sent Events for the streaming half. You stand it up on a real server (Fluid Compute, Lambda, a VM, whatever), give it a URL, and any agent that supports remote MCP can connect. Use this when the server needs to be shared across machines, when it needs centralised auth, or when it wraps an API that should not have its credentials on every developer's laptop.&lt;/p&gt;

&lt;p&gt;There is a third option, the deprecated SSE-only transport, which you should ignore. The 2025 spec consolidated on streamable HTTP. New servers should not implement SSE-only.&lt;/p&gt;

&lt;p&gt;For your first MCP server I strongly recommend stdio. The feedback loop is fast, the auth story is non-existent, and the deployment story is "drop the binary on your laptop." You can graduate to HTTP later when you have something worth hosting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Minimum Viable MCP Server
&lt;/h2&gt;

&lt;p&gt;Here is the smallest useful TypeScript server I would actually ship. It exposes one tool that returns the current git status of the repo it was launched in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
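&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;execSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:child_process&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Server&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modelcontextprotocol/sdk/server/index.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modelcontextprotocol/sdk/server/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;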
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;CallToolRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;ListToolsRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modelcontextprotocol/sdk/types.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;git-status-mcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0.1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ListToolsRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;git_status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Return the porcelain git status of the current working directory. Use this before suggesting commits.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;CallToolRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;git_status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Unknown tool: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;git status --porcelain=v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf-8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;(clean)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioServerTransport&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a complete, working MCP server. Around forty lines. It does one job, but the job is real, and the model can call it the moment the server is wired in. The pattern scales: every tool you add follows the same shape. List the tool in &lt;code&gt;ListToolsRequestSchema&lt;/code&gt;. Handle it in &lt;code&gt;CallToolRequestSchema&lt;/code&gt;. Return text or structured content. Done.&lt;/p&gt;

&lt;p&gt;Two things I want you to notice. First, the description on the tool. It is not "returns git status." It is "Return the porcelain git status of the current working directory. &lt;strong&gt;Use this before suggesting commits.&lt;/strong&gt;" That last sentence is what the model reads when it decides whether to call the tool. Tool descriptions are not documentation for humans. They are activation prompts for the agent. Treat them accordingly.&lt;/p&gt;

&lt;p&gt;Second, the input schema is empty. The tool takes no arguments. If your tool takes arguments, define them as a proper JSON Schema with &lt;code&gt;required&lt;/code&gt;, types, and &lt;code&gt;description&lt;/code&gt; fields on every property. The model uses the schema to construct calls. Fuzzy schemas produce fuzzy calls.&lt;/p&gt;
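
&lt;p&gt;For contrast, here is a minimal sketch of a tool that does take arguments. The &lt;code&gt;git_log&lt;/code&gt; tool, its properties, and the &lt;code&gt;missingRequired&lt;/code&gt; helper are hypothetical names I made up for illustration. They are not part of the server above or of the SDK.&lt;/p&gt;

```javascript
// Hypothetical tool definition: the name, wording, and properties are
// illustrative assumptions, not part of the server above.
const gitLogTool = {
  name: 'git_log',
  description:
    'Return the most recent commits on the current branch. ' +
    'Use this before writing a changelog entry or a commit message.',
  inputSchema: {
    type: 'object',
    properties: {
      max_count: {
        type: 'integer',
        minimum: 1,
        maximum: 50,
        description: 'How many commits to return, newest first.',
      },
      path: {
        type: 'string',
        description: 'Optional file or directory to limit the history to.',
      },
    },
    required: ['max_count'],
    additionalProperties: false,
  },
};

// Tiny helper: list required properties missing from a call's arguments.
function missingRequired(schema, args) {
  return (schema.required ?? []).filter((key) => args[key] === undefined);
}
```

&lt;p&gt;A handler can run a check like &lt;code&gt;missingRequired(gitLogTool.inputSchema, req.params.arguments)&lt;/code&gt; first and return a readable message when something is absent, rather than failing deeper in the code.&lt;/p&gt;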




&lt;h2&gt;
  
  
  Designing Tools The Model Will Actually Use
&lt;/h2&gt;

&lt;p&gt;This is the part nobody tells you, and the part that makes the difference between a server you wrote and a server you actually use.&lt;/p&gt;

&lt;p&gt;Tools are not API endpoints. The temptation is to expose every method of your underlying system one to one. Resist it. A server with forty tools is worse than a server with eight, because the model has to read every tool description on every call and the noise drowns out the signal. I have hit this directly. A server I wrote with twenty-five tools was used less than a server I refactored to expose eight tools that grouped related operations.&lt;/p&gt;

&lt;p&gt;The rule I follow: &lt;strong&gt;tools should match user intentions, not implementation details.&lt;/strong&gt; A &lt;code&gt;search_tickets&lt;/code&gt; tool that takes a natural-language query and returns ranked results is better than five tools called &lt;code&gt;filter_by_status&lt;/code&gt;, &lt;code&gt;filter_by_assignee&lt;/code&gt;, &lt;code&gt;filter_by_label&lt;/code&gt;, &lt;code&gt;filter_by_date&lt;/code&gt;, and &lt;code&gt;combine_filters&lt;/code&gt;. The model can compose the natural-language query. It cannot reliably compose five micro-tools without getting confused.&lt;/p&gt;

&lt;p&gt;The second rule: &lt;strong&gt;return what the model needs to decide what to do next, not the raw payload.&lt;/strong&gt; If your tool returns a 10,000-line JSON blob, the model will spend half its context reading it. Trim aggressively. Summarise. Paginate. If the user really needs the full payload, expose a second tool that fetches by ID. Default to small, focused responses.&lt;/p&gt;

&lt;p&gt;The third rule: &lt;strong&gt;make errors educational, not stack-tracey.&lt;/strong&gt; When a tool fails, return a message that tells the model how to recover. "User not found. Try search_users first to get a valid user id." is fifty times more useful than &lt;code&gt;Error: 404 Not Found&lt;/code&gt;. The model is not your operator. It cannot read your terminal. The error message is its entire view of the failure.&lt;/p&gt;
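
&lt;p&gt;A rough sketch of what that looks like in a handler. The &lt;code&gt;isError&lt;/code&gt; flag and the &lt;code&gt;content&lt;/code&gt; array follow the MCP tool-result shape. The &lt;code&gt;toolError&lt;/code&gt; helper, the &lt;code&gt;getUser&lt;/code&gt; function, and the in-memory user table are assumptions for the example.&lt;/p&gt;

```javascript
// Build a tool result that flags failure and tells the model how to recover.
// The helper name and wording are illustrative.
function toolError(problem, recovery) {
  return {
    isError: true,
    content: [{ type: 'text', text: `${problem} ${recovery}` }],
  };
}

// Hypothetical lookup against an in-memory table.
const users = new Map([['u_1', { id: 'u_1', name: 'Ada' }]]);

function getUser(args) {
  const user = users.get(args.user_id);
  if (!user) {
    return toolError(
      `User "${args.user_id}" not found.`,
      'Try search_users first to get a valid user id.',
    );
  }
  return { content: [{ type: 'text', text: JSON.stringify(user) }] };
}
```

&lt;p&gt;Returning the failure as a result, instead of throwing, keeps the recovery hint in front of the model as ordinary tool output.&lt;/p&gt;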

&lt;p&gt;The fourth rule: &lt;strong&gt;idempotency where you can swing it.&lt;/strong&gt; Models retry. Sometimes for good reasons, sometimes because they got confused. A tool that creates duplicate records on retry will burn you. Either return existing records when called with the same arguments, or expose a &lt;code&gt;dry_run&lt;/code&gt; mode that the agent can call first.&lt;/p&gt;
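
&lt;p&gt;One way to sketch both options, assuming a hypothetical &lt;code&gt;create_ticket&lt;/code&gt; tool backed by an in-memory map. A real server would lean on a unique constraint or an idempotency key in the datastore, so treat this as the shape of the idea only.&lt;/p&gt;

```javascript
// In-memory stand-in for a real datastore.
const ticketsByKey = new Map();
let nextId = 1;

// Hypothetical create_ticket handler body. Identical arguments map to the
// same key, so a retried call returns the existing record.
function createTicket(args) {
  const key = JSON.stringify([args.title, args.assignee]);
  const existing = ticketsByKey.get(key);
  if (existing) {
    return { ticket: existing, created: false };
  }
  if (args.dry_run) {
    // Report what would be written without writing anything.
    return {
      ticket: { id: null, title: args.title, assignee: args.assignee },
      created: false,
    };
  }
  const ticket = { id: nextId++, title: args.title, assignee: args.assignee };
  ticketsByKey.set(key, ticket);
  return { ticket, created: true };
}
```

&lt;p&gt;Calling it twice with the same title and assignee yields the same ticket id, with &lt;code&gt;created&lt;/code&gt; set to &lt;code&gt;false&lt;/code&gt; on the retry.&lt;/p&gt;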

&lt;p&gt;If you internalise those four rules before writing any code, you will skip the year of pain I went through.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources, Prompts, And The Other Primitives
&lt;/h2&gt;

&lt;p&gt;I said start with tools. I meant it. But there is a small, specific case where resources earn their keep, and it is worth covering.&lt;/p&gt;

&lt;p&gt;Resources are for things the model needs to &lt;strong&gt;read but never modify&lt;/strong&gt;. The canonical example is project context: a &lt;code&gt;README.md&lt;/code&gt;, a &lt;code&gt;CLAUDE.md&lt;/code&gt;, an OpenAPI spec, a database schema dump, a CHANGELOG. The model fetches them by URI, reads them, uses them, and moves on. Resources can be cheaper than tools because the model does not have to spend a tool call to get them: the client decides when to attach them. How that happens varies by client. Some load resources when a session starts, others wait until the user references one.&lt;/p&gt;

&lt;p&gt;If your server wraps a system with structured context that does not change per-request (the schema, the docs, the config), expose it as resources. Otherwise stick to tools.&lt;/p&gt;
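
&lt;p&gt;A sketch of the two result shapes involved, written as plain functions so they run without the SDK. The URIs and contents are made up. With the TypeScript SDK you would normally register these through &lt;code&gt;setRequestHandler&lt;/code&gt; using &lt;code&gt;ListResourcesRequestSchema&lt;/code&gt; and &lt;code&gt;ReadResourceRequestSchema&lt;/code&gt;, passing &lt;code&gt;req.params.uri&lt;/code&gt; to the read function.&lt;/p&gt;

```javascript
// Static context this hypothetical server exposes.
const resources = [
  {
    uri: 'docs://schema',
    name: 'Database schema',
    mimeType: 'text/plain',
    text: 'CREATE TABLE tickets (id serial PRIMARY KEY, title text);',
  },
  {
    uri: 'docs://readme',
    name: 'Project README',
    mimeType: 'text/markdown',
    text: '# Tickets service',
  },
];

// Result shape for a resources/list request: metadata only, no bodies.
function listResources() {
  return {
    resources: resources.map(({ uri, name, mimeType }) => ({ uri, name, mimeType })),
  };
}

// Result shape for a resources/read request: the body of one resource.
function readResource(uri) {
  const found = resources.find((r) => r.uri === uri);
  if (!found) throw new Error(`Unknown resource: ${uri}`);
  return {
    contents: [{ uri: found.uri, mimeType: found.mimeType, text: found.text }],
  };
}
```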

&lt;p&gt;Prompts are the third primitive and I genuinely have not seen a useful one in production. The idea is that you ship reusable prompt templates that the user can invoke. In practice, slash commands in Claude Code and rules in Cursor cover the same ground with less friction. Skip prompts for your first server. Revisit them if you ever feel the lack.&lt;/p&gt;

&lt;p&gt;There is also a notifications/sampling/elicitation surface in the protocol that I am skipping entirely here. It is not necessary for any normal server. If you find yourself needing it, you have already outgrown this guide.&lt;/p&gt;




&lt;h2&gt;
  
  
  Authentication And Secrets
&lt;/h2&gt;

&lt;p&gt;The auth model is where local and hosted servers diverge sharply.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;stdio servers&lt;/strong&gt;, you mostly do not have an auth problem. The server inherits the user's environment. If your tool needs an API key, the standard pattern is to read it from an env var the user sets in their shell or in the agent runtime's config. There is no token exchange, no OAuth, no session. The trust boundary is the user's local machine.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;hosted HTTP servers&lt;/strong&gt;, you have a real auth problem and you should treat it as such. The MCP spec aligned in 2025 on OAuth 2.1 with Dynamic Client Registration. The agent presents a token. Your server validates it. There is also a simpler bearer-token pattern for first-party servers, where the user pastes a token into their agent config and your server checks it on every request. Both are fine for different use cases.&lt;/p&gt;
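
&lt;p&gt;For the simpler bearer-token pattern, the per-request check can be sketched like this. All three function names are my own. The comparison loop is hand-rolled only to keep the example free of imports. In a real Node server, prefer &lt;code&gt;timingSafeEqual&lt;/code&gt; from &lt;code&gt;node:crypto&lt;/code&gt;.&lt;/p&gt;

```javascript
// Pull the token out of an Authorization header value, or return null.
function bearerToken(headerValue) {
  const match = /^Bearer (\S+)$/.exec(headerValue ?? '');
  return match ? match[1] : null;
}

// Compare without an early exit on the first differing character.
function sameToken(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i !== a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Returns true only when the header carries the expected token.
function isAuthorized(headerValue, expectedToken) {
  const token = bearerToken(headerValue);
  return token !== null ? sameToken(token, expectedToken) : false;
}
```

&lt;p&gt;The expected token would come from an env var or a secrets manager, and a failed check should end the request with a 401 before any tool code runs.&lt;/p&gt;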

&lt;p&gt;The mistake I keep seeing: developers ship an HTTP MCP server with no auth at all because they assume only they will hit it. Then six weeks later they leave the URL in a tweet, somebody scrapes their database, and there is a bad afternoon. If your server is on the public internet and it does anything beyond returning constants, it needs auth. No exceptions.&lt;/p&gt;

&lt;p&gt;The other mistake: handling secrets carelessly. Your MCP server is going to handle the model's queries, which include user data. Treat the server like any other production service. Use env vars, not hardcoded values. Use a secrets manager for production. Rotate credentials. Log auth failures. The fact that the consumer is an AI agent does not change the security model. If anything, it raises the stakes, because the agent will retry queries automatically and amplify any leak.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing Without A Model In The Loop
&lt;/h2&gt;

&lt;p&gt;The single biggest mistake I made on my first three MCP servers was testing them only through Claude Code. The feedback loop is too slow and too coarse. The model decides whether to call your tool, what to call it with, and how to interpret the result, which means a single end-to-end test exercises ten degrees of freedom you cannot isolate.&lt;/p&gt;

&lt;p&gt;Test the server the way you would test a normal HTTP API. The MCP project publishes an inspector that lets you send requests to your server and see the raw responses. Use it.&lt;/p&gt;

&lt;p&gt;The workflow that saved me hours.&lt;/p&gt;

&lt;p&gt;Step one, run &lt;code&gt;npx @modelcontextprotocol/inspector&lt;/code&gt; against your server. It opens a UI where you can list tools, call them with arbitrary arguments, and inspect the responses. Every tool gets exercised here first.&lt;/p&gt;

&lt;p&gt;Step two, write integration tests against your tool handlers directly. They are normal async functions. Call them from a test runner. Assert on the output. This catches schema mismatches, edge cases on inputs, and regressions when you refactor.&lt;/p&gt;
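
&lt;p&gt;A sketch of what that can look like for the git status server, under one assumption: the handler is restructured to take its git runner as a parameter (&lt;code&gt;makeCallToolHandler&lt;/code&gt; and &lt;code&gt;runChecks&lt;/code&gt; are my names), so the test can stub the shell call instead of depending on a real repository.&lt;/p&gt;

```javascript
// The handler takes its git runner as a parameter so tests can stub it.
// In the real server, runGit would wrap the execSync call.
function makeCallToolHandler(runGit) {
  return async (req) => {
    if (req.params.name !== 'git_status') {
      throw new Error(`Unknown tool: ${req.params.name}`);
    }
    const output = runGit();
    return { content: [{ type: 'text', text: output || '(clean)' }] };
  };
}

// Minimal checks; in a real suite each would live in its own test case.
async function runChecks() {
  const call = (runGit, name) => makeCallToolHandler(runGit)({ params: { name } });
  const dirty = await call(() => ' M index.js\n', 'git_status');
  const clean = await call(() => '', 'git_status');
  const unknown = await call(() => '', 'nope').then(
    () => 'resolved',
    (err) => err.message,
  );
  return {
    dirty: dirty.content[0].text,
    clean: clean.content[0].text,
    unknown,
  };
}
```

&lt;p&gt;Drop the three cases into whatever test runner you already use and assert on the returned text. No model, no transport, no tokens.&lt;/p&gt;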

&lt;p&gt;Step three, only after the above two pass, wire the server into Claude Code and exercise the actual model loop. The model will do things you did not expect with your tools. That is fine. You are watching for "did the model find this tool" and "did it use the tool sensibly," not for "did this tool work." Those questions were already answered.&lt;/p&gt;

&lt;p&gt;If you follow that order, you will catch ninety percent of bugs without ever burning a token.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wiring It Into Claude Code
&lt;/h2&gt;

&lt;p&gt;The integration surface for Claude Code is a &lt;code&gt;.mcp.json&lt;/code&gt; file at the project root (user-scoped servers are registered with &lt;code&gt;claude mcp add&lt;/code&gt; instead). For a stdio server, the config looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"git-status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/absolute/path/to/your/server/index.js"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a hosted HTTP server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"my-internal-api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mcp.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"headers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Authorization"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bearer ${MY_INTERNAL_TOKEN}"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Claude Code. Run &lt;code&gt;/mcp&lt;/code&gt; to see the server's status. If it shows up green, your tools are loaded. If it shows up red, the bottom of the panel tells you why. The most common failure is a wrong path or a missing env var. The second most common is the server crashing on startup. Run the server manually from the same shell Claude Code launches from to confirm it boots.&lt;/p&gt;

&lt;p&gt;Cursor and Windsurf both ship MCP support with near-identical config files. Whatever language and transport you picked, the server itself runs unchanged against every consumer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Wish I Knew Before Shipping The First Version
&lt;/h2&gt;

&lt;p&gt;Six things, in order of how much pain they would have saved me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick a clear scope before writing the first tool.&lt;/strong&gt; "An MCP server for my project" is not a scope. "An MCP server that exposes our Postgres schema and lets the agent run safe read-only queries" is a scope. The narrow servers I have shipped have all been useful. The "general-purpose" ones have all been deleted within a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write the tool descriptions before the tool implementations.&lt;/strong&gt; This is the single highest-leverage practice I have found. If you cannot describe what a tool does in two sentences that include when the model should call it, the tool is wrong. Rewrite. The description is the API contract with the agent. The code is just implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log every tool call with arguments and results.&lt;/strong&gt; Even in development. You need to see what the model is actually doing with your server. The patterns are not what you expect. I have watched models call my tools with arguments I would never have predicted, and the logs are how I find out. Without logging you are guessing.&lt;/p&gt;
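
&lt;p&gt;A minimal sketch of a logging wrapper, assuming handlers are plain async functions. &lt;code&gt;withLogging&lt;/code&gt; is a hypothetical helper. It defaults to &lt;code&gt;console.error&lt;/code&gt; because a stdio server uses stdout for the protocol itself, so logs written there would corrupt the stream.&lt;/p&gt;

```javascript
// Wrap a tool handler so every call is logged with arguments, result, and timing.
function withLogging(toolName, handler, log = console.error) {
  return async (args) => {
    const started = Date.now();
    try {
      const result = await handler(args);
      log(JSON.stringify({ tool: toolName, args, ok: true, ms: Date.now() - started, result }));
      return result;
    } catch (err) {
      // Log the failure, then rethrow so the caller still sees it.
      log(JSON.stringify({ tool: toolName, args, ok: false, ms: Date.now() - started, error: err.message }));
      throw err;
    }
  };
}
```

&lt;p&gt;One JSON object per line keeps the log easy to grep and easy to load into whatever you use for analysis later. If results can be large, log a size or a prefix instead of the full payload.&lt;/p&gt;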

&lt;p&gt;&lt;strong&gt;Version your server from day one.&lt;/strong&gt; Use semver. Tag releases. When you make a breaking change to a tool's schema, bump the major version, and add a deprecation period for the old schema if the server is shared. Agents do not handle silent breaking changes well. They will keep calling the old shape until the descriptions tell them otherwise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cap response sizes.&lt;/strong&gt; Set a hard ceiling on how much text any tool can return (I default to 16 KB, less for noisy tools). When you hit the cap, return a truncation notice with a hint about how to fetch more. Letting a tool dump 200 KB into the model context once will teach you why this matters.&lt;/p&gt;
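
&lt;p&gt;A sketch of that ceiling as a helper you run on every tool's text before returning it. &lt;code&gt;capText&lt;/code&gt; and the hint wording are assumptions. Point the hint at whatever paging or fetch-by-ID tool your server actually exposes.&lt;/p&gt;

```javascript
const MAX_BYTES = 16 * 1024; // the 16 KB default from the paragraph above

// Truncate tool output to a byte budget and tell the model how to get the rest.
function capText(text, maxBytes = MAX_BYTES, hint = 'Narrow the query or request a specific page.') {
  const bytes = Buffer.from(text, 'utf8');
  if (maxBytes >= bytes.length) return text;
  // Cutting on a byte boundary can split a multi-byte character; the decoder
  // replaces the broken tail with U+FFFD, which is acceptable for a sketch.
  const head = bytes.subarray(0, maxBytes).toString('utf8');
  return `${head}\n[truncated: showing ${maxBytes} of ${bytes.length} bytes. ${hint}]`;
}
```

&lt;p&gt;The notice goes at the end so the model reads the useful part first and still learns that more exists.&lt;/p&gt;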

&lt;p&gt;&lt;strong&gt;Treat the server like a product.&lt;/strong&gt; Your future self is its first user. Your team is its second. The agent is its third. Write a README that explains what the server does, what tools it exposes, and what to do when something breaks. Six months from now you will be grateful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where To Take It Next
&lt;/h2&gt;

&lt;p&gt;Once you have shipped one MCP server you will see candidates everywhere. The instinct is to write more servers. Resist a little. The better move is usually to &lt;strong&gt;grow the first server&lt;/strong&gt; with carefully chosen tools until it covers most of your daily workflow, then split out a second server only when the first server starts to feel unfocused.&lt;/p&gt;

&lt;p&gt;A few directions worth exploring once you are comfortable.&lt;/p&gt;

&lt;p&gt;Combine your MCP server with &lt;a href="https://dev.to/blog/claude-code-plugin-marketplace-skills-2026"&gt;Claude Code skills&lt;/a&gt; by packaging both into a single plugin. Skills tell the model when to reach for the tools your server exposes. The combination is dramatically more reliable than either piece in isolation.&lt;/p&gt;

&lt;p&gt;If your server wraps a third-party API, look at the &lt;a href="https://dev.to/blog/ai-agent-tool-design-2026"&gt;agent tool design patterns&lt;/a&gt; for what production-grade tool signatures look like, especially around pagination, partial failure, and rate limiting.&lt;/p&gt;

&lt;p&gt;If you are running the server in production for multiple users, look at &lt;a href="https://dev.to/blog/ai-agent-observability-debugging-production-2026"&gt;agent observability&lt;/a&gt; for what you actually need to monitor. The interesting metrics are not the ones you would track for a normal HTTP API. They are things like "what percentage of tool calls resulted in the agent making a follow-up call to the same tool with corrected arguments," which is a strong signal that your descriptions are unclear.&lt;/p&gt;

&lt;p&gt;If you are wondering how MCP fits next to direct agent-to-agent communication, the &lt;a href="https://dev.to/blog/a2a-vs-mcp-agent-communication-2026"&gt;A2A vs MCP comparison&lt;/a&gt; is the right next read. Short version: MCP exposes capabilities to a single agent, A2A coordinates multiple agents. They solve adjacent problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Read On MCP Servers In 2026
&lt;/h2&gt;

&lt;p&gt;Most developers will never write one. That is fine. The official integrations are good enough for most use cases. But the developers who do write their own end up with a leverage advantage that is hard to describe until you have it. Every model gets your tool. Every agent runtime gets your tool. Every future client gets your tool, for free, the moment it ships MCP support, which they all will because the protocol won.&lt;/p&gt;

&lt;p&gt;The cost of entry is one afternoon. The payoff is permanent.&lt;/p&gt;

&lt;p&gt;If you are sitting on a project that has any repetitive interaction you wish the agent could automate, write the server. Start with one tool. Make sure the tool description is clear enough that the model uses it without prompting. Wire it in. See what happens.&lt;/p&gt;

&lt;p&gt;The first time the agent calls your tool unprompted, in the middle of a task, and uses the result correctly, the rest of the post will make sense. That is the moment MCP stops being a protocol you read about and starts being something you build with.&lt;/p&gt;

&lt;p&gt;That moment is one afternoon away. Worth the afternoon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>mcp</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
