DEV Community:

Programming rituals nobody warned you about

Sun, 26 Jul 2026 02:23:28 +0000

Bash silently lies to you. Django ships your secret key. Python ghosts your async task. Nobody put this in the onboarding doc.

You passed the interview. You shipped the feature. Maybe you even got the senior title. And then, somewhere between your third production incident and your fifth “wait, that’s just how it works?” moment, you realized something: a huge chunk of this job is memorizing a list of traps that nobody officially told you about.

Not edge cases. Not obscure bugs. Traps baked into the tools themselves, sitting there, patient, waiting for the one time you forget the ritual.

I’ve been collecting mine for years. Some came from incidents I caused. Some from watching teammates debug something for hours only to find the fix was four characters typed at the top of a file. A few came from reading post-mortems written by engineers at companies way bigger than mine, who made the exact same mistake for the exact same reason: the default behavior was broken, and nobody warned them either.

The good news: once you know the ritual, you never get burned by that particular fire again. The bad news: there’s always another one. The list never stops growing.

TL;DR: This is a tour through five rituals every working developer eventually collects, including Bash’s silent failure mode, the many flavors of dumb substitution, Django’s charming habit of hardcoding your secret key, and Python quietly garbage-collecting your async task mid-flight. None of these are your fault. All of them will cost you if you forget.

Bash’s “optimistic nihilism” mode

Here’s a fun game. Write a Bash script. Make one command in the middle fail. Don’t add any error handling. Run it.

Bash will finish the script. Every line after the failure. Cheerfully. Like nothing happened.

That’s not a bug report I’m filing; that’s the default behavior. Bash, by design, treats a failed command as a suggestion. You said run these things. It ran them. Whether they succeeded is kind of your problem.
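You can watch this happen without risking anything. Here's a minimal sketch that drives Bash from Python (assuming a `bash` binary on your PATH): a throwaway three-line script whose middle command, `false`, fails. The only difference between the two runs is a `set -e` line at the top.

```python
import subprocess

# Throwaway script: the middle command fails, the last one succeeds.
SCRIPT = """
echo "step 1"
false
echo "step 3 still runs"
"""

def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run a script with bash and capture its output as text."""
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)

default = run_bash(SCRIPT)              # Bash's default: keep going after the failure
strict = run_bash("set -e\n" + SCRIPT)  # errexit: stop at the first failing command

print(default.returncode, default.stdout.splitlines())  # 0 ['step 1', 'step 3 still runs']
print(strict.returncode, strict.stdout.splitlines())    # 1 ['step 1']
```

Note that the default run exits 0 even though a command failed: a script's exit status is just the status of the last command it ran.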

It gets better. By default, a pipeline’s exit code is the exit code of its last command, not the one that actually broke. So you can have garbage data flowing through three pipes, the whole thing producing nonsense, and Bash will look you in the eye and return exit code 0. Success. Great job everyone.
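A quick way to watch the exit code lie, as a sketch assuming `bash` and `grep` are available: the pipeline below greps a file that doesn't exist (the path is made up) and pipes the empty result into `sort`, which succeeds.

```python
import subprocess

def exit_code(script: str) -> int:
    """Return the exit status bash reports for a one-line script."""
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode

# grep fails because the file doesn't exist; sort happily sorts the empty input.
PIPELINE = "grep needle /nonexistent/file.txt | sort"

print(exit_code(PIPELINE))                        # 0: only sort's status counts
print(exit_code("set -o pipefail; " + PIPELINE))  # 2: grep's error status surfaces
```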

And if you reference a variable you forgot to define? Bash assumes it’s an empty string. No warning. No error. Just silently substitutes nothing and keeps going. Which is fine until that variable was part of a path, and now you’re running rm -rf / instead of rm -rf /some/actual/directory.
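The same experiment for unset variables, again driven from Python as a sketch. `TARGET_DIR` is a hypothetical variable the script forgot to define, and the script only echoes the `rm` command instead of running it.

```python
import os
import subprocess

# The script only *prints* the rm command, so nothing is deleted.
SCRIPT = 'echo "would run: rm -rf $TARGET_DIR/"'

# Make sure TARGET_DIR really is unset in the child, and force English messages.
env = {k: v for k, v in os.environ.items() if k != "TARGET_DIR"}
env["LC_ALL"] = "C"

def run_bash(script: str) -> subprocess.CompletedProcess:
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True, env=env)

default = run_bash(SCRIPT)              # unset variable silently becomes ""
strict = run_bash("set -u\n" + SCRIPT)  # nounset: expansion error, script exits

print(repr(default.stdout))  # 'would run: rm -rf /\n'
print(strict.returncode != 0, "unbound variable" in strict.stderr)  # True True
```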

That last one isn’t hypothetical. In December 2021, a backup script at Kyoto University deleted about 77TB of research data: an undefined variable expanded to empty, turning a cleanup of old log files into a deletion across the research storage itself. Some groups’ data could never be recovered. The script ran to completion without complaint.

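Cloudflare’s July 2019 global outage usually gets mentioned in the same breath, though its trigger was different: a single regular expression in a WAF rule backtracked catastrophically and pinned CPUs across the network. Different tool, same shape: nothing stopped one bad step from propagating everywhere. Incidents like these end up on the front page of Hacker News with hundreds of comments, and for the shell-script ones, half of them say “just use set -euo pipefail.”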

Which brings us to the ritual.

set -euo pipefail

Three options. One line. You put it at the top of every Bash script, no exceptions:

-e: exit immediately if a command returns a non-zero exit code (with some exceptions, such as commands tested in an if condition)
-u: treat unset variables as errors, not empty strings
-o pipefail: fail the pipeline if any command in it fails, not just the last one

That’s it. That’s the whole ritual. Three decades of Bash, and “stop if something goes wrong” is still opt-in.

The reason it’s not default is mostly historical: old scripts relied on Bash’s permissive behavior, and flipping the default would have broken them. Which is a completely reasonable engineering decision made in the 1980s that the rest of us have been paying for ever since.

The real tell is when you join a new codebase and grep the shell scripts. If half of them are missing set -euo pipefail, you know exactly how the team learned about it, or didn't. It's a pretty reliable proxy for "has this project had a bad day yet."

Add it. Every time. Before anything else. The one day you forget is the day a variable comes up undefined on a prod server and you spend the next four hours figuring out what it deleted.

Django said “just commit the secret key, it’s fine”

There’s a specific kind of betrayal that hits different when it comes from a framework you actually like.

Django is genuinely good. Batteries included, sensible defaults, mature ecosystem, excellent documentation. It’s the kind of framework that makes you feel productive fast, which is exactly why the first thing it does to a new project is so magnificently at odds with everything else it stands for.

django-admin startproject myproject
grep -i secret myproject/myproject/settings.py

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'django-insecure-m&!$#k(56_this_is_literally_in_your_repo_now'

There it is. Your secret key. In your source file. With a comment telling you to keep it secret, immediately above the line where it isn’t.

The django-insecure- prefix is Django's way of flagging that this key was auto-generated and shouldn't go to production. It's a reasonable idea. The problem is that it's just a string prefix in a settings file; on its own, it doesn't stop anything. It doesn't warn you at deploy time unless you remember to run manage.py check --deploy. It doesn't block you from pushing it to GitHub. It doesn't trigger any runtime error if you forget to replace it. It's a comment with ambition.

Meanwhile, the rest of the industry has been building infrastructure specifically to avoid this moment. GitHub has secret scanning that’ll flag exposed credentials in pushed commits. AWS has Secrets Manager. Every CI platform has encrypted environment variables. HashiCorp built an entire product around the idea that secrets should never touch source code. And then you run django-admin startproject and there's your key, sitting in settings.py, one git push away from being public forever.

The number of public GitHub repositories containing django-insecure- in a committed settings.py is not small. It's the kind of thing you can find with a five-second search, and people do.

The ritual is straightforward and should honestly be the default:

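import os

# Fail loudly (KeyError) at startup if the variable is missing,
# instead of quietly falling back to None
SECRET_KEY = os.environ['DJANGO_SECRET_KEY']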

Or if you want something slightly more ergonomic, python-decouple and django-environ both handle this cleanly: pull the key from a .env file locally, from an environment variable in prod, and never let it touch version control at all.

The .env file gets added to .gitignore immediately. The production secret lives in your deployment environment, not your codebase. This is not advanced security practice; it's the baseline, and it's been the baseline for years.
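A slightly stricter take on the same idea, as a sketch: a small helper (the name `load_secret_key` is hypothetical, not a Django API) that refuses to start if the variable is missing or still holds an auto-generated `django-insecure-` key. It uses only the standard library, so it can sit in settings.py or be tested on its own.

```python
import os
import secrets
from collections.abc import Mapping

INSECURE_PREFIX = "django-insecure-"

def load_secret_key(environ: Mapping[str, str] = os.environ,
                    var: str = "DJANGO_SECRET_KEY") -> str:
    """Read the secret key from the environment and fail loudly if it is unusable."""
    key = environ.get(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set; refusing to start without a secret key")
    if key.startswith(INSECURE_PREFIX):
        raise RuntimeError(f"{var} still holds an auto-generated {INSECURE_PREFIX} key")
    return key

# In settings.py this would be: SECRET_KEY = load_secret_key()

# Usage with a stand-in environment instead of the real one:
fresh_key = secrets.token_urlsafe(50)  # one way to generate a strong random key
print(load_secret_key({"DJANGO_SECRET_KEY": fresh_key}) == fresh_key)  # True
```

Raising at import time means a misconfigured deploy crashes on startup, which is the loud failure you want, rather than running with a key that was never meant to leave your laptop.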

What makes the Django situation frustrating isn’t that the framework is insecure (it isn’t, overall). It’s that this one specific default runs exactly counter to the secure habits the rest of the ecosystem has spent a decade trying to build. A junior developer running startproject for the first time has no particular reason to know that the generated file is a trap. The warning comment is easy to skim past. The key looks like configuration, not a credential.

Fix it in the first commit. Add it to your project template. Make it the first thing in your team’s Django onboarding doc. Because the alternative is finding out about it the way most people do: after the push, after the scan, after the “your repository may have exposed credentials” email from GitHub at an unreasonable hour.

Python’s garbage collector will silently fire your async task

At some point in your Python career you will write something like this:

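import asyncio

async def notify_user(user_id):
    await send_email(user_id)  # send_email: your own async email helper

async def handle_request(user_id):
    # Fire-and-forget: nothing keeps a reference to the returned Task
    asyncio.create_task(notify_user(user_id))
    return {"status": "ok"}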

Clean. Async. Non-blocking. The request returns immediately, the notification fires in the background. Exactly what you wanted.

Except sometimes it doesn’t complete. Not always, just occasionally and unpredictably, and without raising any error or warning in your code; at most, asyncio’s logger emits a terse “Task was destroyed but it is pending!” line that’s easy to miss. The email doesn’t send. The user never hears back. And your monitoring shows zero exceptions because nothing actually failed: the task was simply thrown away before it finished.

What happened is that the event loop only keeps weak references to tasks. Nothing in your program held a strong reference to this one, so Python’s garbage collector was free to collect it while it was suspended mid-flight. “No reference” means “not needed” in GC logic, and GC logic doesn’t know or care that your task still had work left to do. From the runtime’s perspective, you created an object, immediately dropped it, and the cleanup crew did its job.

This isn’t a bug in asyncio. The Python docs actually document it, in a note that’s easy to read past the first time:

“Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution.”

That’s the whole warning. One italicized paragraph. For behavior that will silently drop work in production.

The ritual is keeping a reference alive for as long as the task needs to run:

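import asyncio

# Strong references to in-flight tasks so they can't be collected mid-run
tasks = set()

async def handle_request(user_id):
    task = asyncio.create_task(notify_user(user_id))
    tasks.add(task)
    # Drop the reference automatically once the task is done
    task.add_done_callback(tasks.discard)
    return {"status": "ok"}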

The add_done_callback is the clean part: once the task finishes, it removes itself from the set automatically. You're not leaking memory; you're just giving the task somewhere to live until it's done. That's all it needed.
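Here's the same pattern as a self-contained script you can actually run, as a sketch: `send_email` is a hypothetical stand-in that just sleeps briefly and records the user ID. The final `gather` exists only so the demo waits for its background work before exiting; a long-running server wouldn't need it.

```python
import asyncio

sent = []     # records which users were "emailed" (stand-in for a real side effect)
tasks = set() # strong references to in-flight background tasks

async def send_email(user_id):
    # Hypothetical stand-in for a real async email client.
    await asyncio.sleep(0.01)
    sent.append(user_id)

async def notify_user(user_id):
    await send_email(user_id)

async def handle_request(user_id):
    task = asyncio.create_task(notify_user(user_id))
    tasks.add(task)
    task.add_done_callback(tasks.discard)
    return {"status": "ok"}

async def main():
    responses = [await handle_request(uid) for uid in range(3)]
    in_flight = len(tasks)        # all three tasks are still pending here
    await asyncio.gather(*tasks)  # demo only: wait for the background work
    await asyncio.sleep(0)        # give done-callbacks a chance to run
    return responses, in_flight

responses, in_flight = asyncio.run(main())
print(responses, in_flight, sorted(sent), len(tasks))
```

Every request returns immediately, all three notifications still land, and the set is empty afterwards because each task discarded itself when it finished.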

If you’re on Python 3.11 or newer, asyncio.TaskGroup is the more structured solution:

async def handle_request(user_id):
    async with asyncio.TaskGroup() as tg:
        tg.create_task(notify_user(user_id))
    # The async with block exits only after every task in the group finishes
    return {"status": "ok"}

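TaskGroup manages lifetimes explicitly, handles exceptions properly, and generally makes background task management feel like something the language actually supports rather than something you’re working around. The trade-off: because the block waits for its tasks, this version of handle_request no longer returns before the notification is sent. For genuinely fire-and-forget work, the TaskGroup has to live somewhere longer-lived than a single request.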

The reason this one is particularly brutal to debug is the failure mode. A missing SQL parameterization blows up loudly. A hardcoded secret key shows up in a GitHub scan. Bash without set -e at least leaves evidence in the logs if you know where to look. But a GC'd async task produces almost nothing: no stack trace, no error code, at most one easily missed asyncio log line that rarely surfaces in your observability platform. Work just quietly doesn't happen, and you only find out when a user complains or a metric drifts in a direction that takes a while to connect back to the right cause.

I’ve seen this burn a team running a background job processor for three days before anyone noticed the completion rate had dropped. Every individual request looked fine. The service was healthy by every dashboard metric. The tasks were just evaporating before they finished, silently, at a rate that depended on GC timing and load, which meant it was inconsistent enough to look like a data issue at first.

The fix took twenty minutes once they found it. The finding took three days.

Add the reference. Use TaskGroup if you can. And maybe add a completion counter to any background task that actually matters, so you know when the numbers stop adding up.
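A minimal sketch of that completion counter, built as a small hypothetical helper class rather than any library API: it keeps the strong references from the ritual above and tallies how many background tasks started, completed, failed, or were cancelled. A gap between "started" and everything else then shows up as a number instead of a hunch. `flaky_job` is a made-up job that fails for one input.

```python
import asyncio
from collections import Counter

class BackgroundTasks:
    """Hypothetical helper: holds strong references and counts task outcomes."""

    def __init__(self):
        self._tasks = set()
        self.stats = Counter()

    def spawn(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)          # strong reference until the task is done
        self.stats["started"] += 1
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task):
        self._tasks.discard(task)
        if task.cancelled():
            self.stats["cancelled"] += 1
        elif task.exception() is not None:  # also marks the exception as retrieved
            self.stats["failed"] += 1
        else:
            self.stats["completed"] += 1

    async def drain(self):
        # Wait for whatever is still in flight, e.g. at shutdown.
        await asyncio.gather(*self._tasks, return_exceptions=True)

async def flaky_job(n):
    await asyncio.sleep(0)
    if n == 2:
        raise RuntimeError("job failed")

async def main():
    bg = BackgroundTasks()
    for n in range(4):
        bg.spawn(flaky_job(n))
    await bg.drain()
    await asyncio.sleep(0)  # let any remaining done-callbacks run
    return bg.stats

stats = asyncio.run(main())
print(dict(stats))
```

In a real service you would export these counters to your metrics system; the point is that "started minus finished" becomes something a dashboard can alert on.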

The ritual tax compounds

Here’s the thing nobody tells you when you’re starting out: the job doesn’t get easier as you get more experienced. It gets differently hard. Early on, the hard part is understanding how things work. Later, the hard part is remembering everything that secretly doesn’t.

Every incident adds a line to the list. Every post-mortem ends with “and going forward, we’ll always remember to…” and then someone writes it in a Confluence doc that gets read once and slowly buried under newer Confluence docs. The ritual exists. The knowledge exists. Whether it survives the next team rotation is a different question.

That’s the compounding problem. It’s not that any individual ritual is complicated: set -euo pipefail is one line, parameterized queries are a one-line swap, moving a secret key to an environment variable takes about ninety seconds. The complexity is in the sheer volume of them. Five years in, your mental checklist for "things that will silently wreck you if you forget" is long enough that forgetting one isn't a question of skill. It's just probability.

And the list keeps growing. Every language adds entries. Every framework has its own charming defaults. Every new tool comes with its own set of rituals that aren’t in the README, only in the GitHub issues filed by people who hit the wall before you.

The better-designed tools are trying to fix this. Rust makes a whole category of memory mistakes structurally impossible. Go forces you to handle errors explicitly: you can ignore them, but you have to do it on purpose. Some newer frameworks ship with secure defaults and make the insecure path the harder one. That’s the right direction: push the ritual into the language, not into the developer’s memory.

But most of the stack the average engineer works with wasn’t built that way. It was built to be flexible, to be backwards-compatible, to not break the scripts people wrote in 1994. Which is fine. It just means the ritual tax is real, it’s ongoing, and it falls on you.

None of this is your fault. The defaults were set before you got here. The design decisions made sense in context. You inherited a codebase full of traps that were laid by reasonable people under reasonable constraints, and your job now includes knowing where they are.

What I’d push back on is the idea that collecting rituals is the same as getting better. It’s not. It’s just getting older in the industry. The devs who actually level up are the ones who, after getting burned, ask why the trap was there at all and either document it properly, automate the check, add it to the linter, or at minimum write the post-mortem that means the next person doesn’t spend three days finding a twenty-minute fix.

AI tooling is starting to catch some of these: Claude Code can flag a missing set -e, Copilot can suggest parameterized queries, and static analysis has been doing this for years in some languages. It helps. It's not complete coverage, and it's not a substitute for understanding why the ritual exists, but the gap between "things that will silently break your code" and "things your tooling catches before prod" is narrowing.

Until it closes: keep the list. Add to it every time something burns you. Share it with the next person who joins your team before they find out the hard way.

It’s not our fault the list exists. But it is our job to pass it on.

What ritual burned you hardest? Drop it in the comments; I’m genuinely collecting them.

Helpful resources

Bash set builtin: official manual
Cloudflare July 2019 outage post-mortem: the incident in detail

SQL injection / dumb substitution

OWASP SQL Injection: the canonical reference
OWASP Query Parameterization Cheat Sheet: language-by-language examples
xkcd #327, “Exploits of a Mom”: required reading, forever

Django secret key

Django SECRET_KEY docs
django-environ: cleaner .env handling for Django projects
GitHub secret scanning docs: worth enabling on every repo

Python async

asyncio.create_task (Python 3 docs): the warning is in there; blink and you’ll miss it
PEP 654, Exception Groups and except*: the error-handling model behind the 3.11+ TaskGroup
asyncio TaskGroup docs

I stopped using ChatGPT for 30 days. What happened to my brain was terrifying.

Sat, 04 Jul 2026 09:32:53 +0000

The experiment nobody warned me I needed to run.

I was writing a three-sentence status email.

Not a system design doc. Not a complex PR description. A status email.

And I caught myself opening ChatGPT before I’d finished the second sentence.

That’s when I realized something had gone wrong. Not with the tool; with me.

So I ran an experiment. 30 days. No ChatGPT. No AI writing assist. No Copilot. Just me, a keyboard, and whatever was left of the brain I’d spent years building and then quietly outsourced to a server farm.

What happened was uncomfortable. Then revealing. Then honestly kind of obvious in hindsight.

TL;DR: AI didn’t make me stupid. But I’d made thinking optional, and my brain had taken the offer. This is what 30 days of going cold turkey actually looked like and what I changed on the other side.

The dependency I didn’t know I’d built

There’s a concept called cognitive offloading: using external tools to handle mental tasks your brain would otherwise process.

Writing a grocery list is cognitive offloading. Setting a calendar reminder is too. These are fine. Great, even.

The problem is when offloading becomes total.

Researchers at UC Santa Barbara found that when people consistently outsource a cognitive task to an external system, their ability to perform that task independently degrades. The neural pathway responsible gets less activation. Less reinforcement. It doesn’t disappear; it just gets harder to access.

For developers, this happened in a very specific sequence.

It started with boilerplate. Fine. Then debugging shortcuts. Reasonable. Then architecture brainstorming. Then documentation. Then PR descriptions. Then email.

Then thinking.

The old Stack Overflow workflow had friction built in. You had to read five answers, reconcile conflicting opinions, understand your problem well enough to evaluate whether a solution applied. That friction was annoying.

It was also where the thinking happened.

ChatGPT removed the friction. Which felt like an upgrade until the friction turned out to be the feature.

The tell for me was a service boundary design session. Twenty minutes in, I hadn’t made a single decision. I’d been refining my prompt.

I wasn’t thinking about the problem. I was thinking about how to describe the problem to the AI.

Those are not the same skill. One builds domain knowledge. The other builds prompt fluency. Both useful. But I’d let one replace the other without noticing.

The 30-day log

I want to be clear: I didn’t do this to prove AI is bad. I did it because I’d noticed a specific, measurable decline in my ability to think independently.

Here’s what actually happened.

Week 1: the reach

Days one through three were fine. Then the reach started.

Every time I hit cognitive friction (mid-paragraph, mid-design, mid-debugging session), my hand moved toward a new tab before my brain had even registered the obstacle.

I started counting. Day five: eleven reaches. Eleven times I caught myself about to outsource a thought I hadn’t finished having yet.

The worst moment was a strategy doc for a service migration. I stared at a blank page for twenty minutes. Not because I didn’t know the answer. Because I’d lost the tolerance for not knowing it immediately.

Week 2: the grind

Everything took longer. PR descriptions that took three minutes took ten.

But around day ten something shifted. I started actually reading error messages. Not skimming for keywords to paste into a prompt. Reading them.

I caught a subtle race condition on day twelve that I’m fairly certain I would have missed with AI assist on. The AI would have pattern-matched to the surface bug. I would have shipped the fix without finding the underlying concurrency issue.

Small thing. But it stuck.

Week 3: the return

Around week three, something comes back.

It doesn’t feel like a superpower. It feels like a muscle waking up — that slightly uncomfortable awareness of something starting to work again.

Ideas started arriving differently. Not from prompts. From the background hum of just thinking.

I’d be mid-run, not trying to solve anything, and a clean solution to a two-day-old problem would just surface. That used to happen to me constantly. Somewhere in the past year it had mostly stopped.

Turns out I’d been interrupting the incubation phase every time I reached for the prompt box.

Week 4: clarity

By day twenty-five I wasn’t faster than I’d been with AI assistance. The tools are genuinely faster for certain tasks.

But I was thinking differently. Problems felt three-dimensional again. I was holding more context simultaneously, making more lateral connections, writing code that reflected actual decisions rather than synthesized suggestions.

Day thirty: a 1,200-word technical design doc, just under an hour, no prompts. The coffee went cold because I forgot it was there.

That hadn’t happened in a while.

What AI is actually doing to your brain

This isn’t a hot take. This is neuroscience.

When you perform a cognitively demanding task repeatedly, the neural pathways responsible for that task undergo myelination, the process by which neurons become faster and more efficient at signaling. This is how expertise is built. Repetition, resistance, reinforcement.

When you consistently offload that task, those pathways get less activation. The myelin doesn’t build. The skill stays effortful instead of becoming automatic.

There are two cycles at play here and they compound in opposite directions.

The distraction cycle: every time you feel cognitive friction and reach for AI, you get a dopamine hit from the instant answer. Your brain learns that friction = reach for tool. The tool removes the discomfort. The discomfort was the training stimulus. You never leave the beginner phase.

The focus cycle: every time you push through the friction, you force the pathway to activate. It gets stronger. Problems that required conscious effort start requiring less. You enter flow states more easily. Learning compounds.

Most developers are running the distraction cycle without realizing it.

Cal Newport’s research on deep work puts the average knowledge worker at roughly 90 minutes of genuine deep work per day. For developers leaning heavily on AI assist, that number is anecdotally probably lower, because the reflex to outsource friction fires before you even register that you were about to think.

The good news: this is reversible. Habits built the dependency. Habits can break it.

The bad news: the first two weeks are genuinely uncomfortable, which is exactly why most people don’t make it past them.

Why focused devs have never had a bigger edge

Here’s the thing nobody talks about.

While AI dependency is quietly degrading the independent thinking capacity of a large portion of the developer workforce, the bar for standing out has dropped dramatically.

You’re not competing with exceptional developers. You’re competing with their attention spans.

The average developer in 2025 starts more than they finish. Jumps between tools. Vibe codes something, hits a wall, asks AI for a fix, gets a fix that introduces two new problems, asks for fixes for those, ends up three layers deep in AI-generated code they don’t fully understand.

The developer who can sit with a problem, actually think through it, hold the full context, and ship something they genuinely understand: that person is increasingly rare.

And rare things are valuable.


This isn’t about being anti-AI. The best developers I know use AI aggressively. They’re just using it as a tool, not an oracle. There’s a difference between “I’ll use AI to generate the boilerplate for this auth middleware while I think through the actual security model” and “I’ll ask AI what the security model should be.”

One leverages the tool. The other outsources the thinking.

The developers who figure out that distinction, who keep the thinking in-house and let AI handle the execution, are going to have an enormous edge over the next few years.

Because most won’t.

What I actually do differently now

I came out of the 30 days with a different relationship to AI tools, not a rejection of them.

Here’s what changed practically:

The 30-minute rule. Any problem I’m working on, I spend at least 30 minutes on it before I open an AI tool. Not because AI isn’t helpful. Because forming my own hypothesis first means I can actually evaluate whether the AI’s answer is good. Without the hypothesis, I’m just accepting whatever sounds plausible.

AI for execution, not for thinking. I use AI heavily for boilerplate, for drafting documentation I’ve already outlined in my head, for writing tests once I’ve designed the logic. I don’t use it to figure out what the logic should be.

I read error messages again. Fully. Before I do anything else.

I write things out before I prompt. If I’m going to use AI to help with something complex, I write out my own rough answer first. Even if it’s wrong. Especially if it’s wrong. The act of articulating a position, even a bad one, makes the AI’s response dramatically more useful, because now I’m evaluating an answer rather than receiving one.

None of this is revolutionary. It’s just deliberately putting friction back in the right places.

The part I didn’t expect

Thirty days after the experiment ended, I still haven’t gone back to the old patterns.

Not because I’m disciplined. Because the new ones feel better. The work feels like mine again. The decisions feel considered. The code feels understood rather than assembled.

That’s the thing nobody tells you about AI dependency: you don’t notice what you’re losing while you’re losing it. The decline is smooth. Each individual shortcut feels reasonable. The compound effect is invisible until you stop and look back.

I’m not telling you to quit AI tools. I use them every day.

I’m telling you to occasionally check whether you’re using them as a lever or a crutch. Because there’s a version of this where AI makes you dramatically more capable. And there’s a version where it makes you comfortable and slow and slightly confused about why your work feels hollow.

The difference is whether you kept the thinking.

Resources worth reading

Deep Work by Cal Newport: the foundational framework for this whole conversation
Cognitive Offloading research, UC Santa Barbara: the actual science behind the dependency pattern
Andrej Karpathy on Software 3.0: the most honest framing of where AI fits in a dev workflow
The Shallows by Nicholas Carr: how digital tools reshape cognition, written before LLMs but more relevant than ever
Paul Graham, Disconnecting Distraction: short, worth reading

AI won’t replace you, but bad AI habits will

Mon, 15 Jun 2026 15:49:09 +0000

A blunt playbook for devs who don’t want to turn into autocomplete zombies.

The first time an AI wrote code for me, I felt like I had unlocked cheat codes for real life. I typed a half-baked function name, hit enter, and suddenly I had a block of code that looked legit. It was magical. The second time, though? It suggested something so catastrophic (basically the programming equivalent of pulling the fire alarm) that I realized: this thing is less “mentor” and more “overconfident intern who thinks they know pointers but actually just broke prod.”

That’s where most of us are right now. AI is everywhere: in our IDEs, our docs, even sneaking into PR reviews. Some days it feels like rocket fuel; other days it feels like an autocomplete with a drinking problem.

The tricky part isn’t whether AI is “good” or “bad.” The tricky part is how we, as developers, use it without becoming lazy, dependent, or, worse, complacent. Because here’s the uncomfortable truth: AI won’t replace you, but bad AI habits absolutely will.

TLDR: This article is a survival guide for developers in the AI era. We’ll break down why AI feels both magical and mid, the five switches that make AI actually useful, when to trust and when to verify, how to use AI as a research assistant (not a code monkey), the dangers of autocomplete brain, and a playbook for building a healthy workflow.

Why AI feels both magical and mid

Every dev I know has had that moment with AI. The first time it autocompleted a function and nailed it, you probably thought: “Wow… this thing just saved me half an hour.” It’s the same dopamine hit as discovering ctrl+r in bash or realizing you can pipe grep into less. Pure wizardry.

But the honeymoon ends quickly. The same tool that wrote a clean utility function also happily hallucinates imports that don’t exist, invents APIs, and will confidently explain things that are flat-out wrong. It’s like pair programming with someone who sounds senior but has never actually shipped code.

The magic-mid paradox comes from two truths living side by side:

AI is fast and confident. It fills the silence instantly, which feels great when you’re stuck.
AI is also wrong, a lot. Not always in spectacular ways; sometimes it just misses edge cases or forgets how a library actually works.

The result? You get addicted to the speed, but burned by the trust. One minute you’re flying, the next you’re undoing a migration because AI forgot about a foreign key constraint.

Developers on Stack Overflow noticed this quickly, so much so that AI-generated answers were banned because they were too often wrong but written with scary confidence. Hacker News threads echo the same:

“Feels powerful, but I can’t trust it.”

And that’s the real catch. AI isn’t here to replace you. It’s here to test whether you still think like an engineer, or whether you’re willing to trust an autocomplete with swagger.

The five switches framework

Using AI effectively isn’t about “prompt engineering wizardry.” It’s about flipping the right switches at the right time. After months of testing (and plenty of bad code reviews), I’ve boiled it down to five controls that separate “autocomplete brain” from “actually useful teammate.”

1. Reasoning mode

AI defaults to spitting out the most common answer. That’s fine for boilerplate, but when you’re debugging or designing, you need it to think step by step.

Before (default):

# Prompt
Write a regex that validates emails.

# Output
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

Looks okay… until you realize it fails on example@localhost.

After (reasoning):

# Prompt
Think step by step and list edge cases before writing a regex for emails.

# Output
Edge cases: localhost, subdomains, quoted strings...
Regex: (full RFC5322-compliant pattern)

Still gnarly, but now it’s at least considering reality instead of hallucinating confidence.
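The practical payoff of an edge-case list is that it doubles as a test suite. A minimal sketch: run the default-style pattern against a handful of made-up addresses and print the ones where it disagrees with what RFC 5322 syntax permits.

```python
import re

# The "default" answer's pattern, with the dot before the TLD escaped.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Edge cases from a step-by-step list, with the verdict RFC 5322 syntax allows.
CASES = {
    "user@example.com": True,
    "first.last+tag@mail.example.co.uk": True,  # subdomains, plus-addressing
    "example@localhost": True,                  # a dot-free domain is syntactically valid
    '"john doe"@example.com': True,             # quoted local parts may contain spaces
    "no-at-sign.example.com": False,
}

mismatches = [
    address for address, valid in CASES.items()
    if (EMAIL_RE.fullmatch(address) is not None) != valid
]
print(mismatches)  # ['example@localhost', '"john doe"@example.com']
```

Whether you actually want to accept those addresses is a product decision; the point is that the edge cases force the decision to be made on purpose.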

2. Verbosity control

Sometimes you want “explain like I’m five.” Sometimes you need “write the RFC.” Most devs forget you can actually control this.

Low verbosity: quick code snippet, zero fluff.
High verbosity: detailed breakdown with trade-offs.

It’s like switching between ls and ls -la. Same tool, different levels of detail.

3. Tooling

Don’t let AI just “guess.” Route it to tools when possible: docs, REPLs, diagrams. If your setup allows retrieval (docs fetching) or code execution, use it. AI without tools is like a dev without man pages: dangerous.

4. Self-reflection prompts

The easiest hack: ask AI to critique itself.

“What could be wrong with your answer?”
“List three failure cases.”

Nine times out of ten, it catches something you missed. It’s the rubber duck debugging effect, but automated.
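If you use this often, it's easy to bake into a small wrapper. A sketch, assuming `ask` is any function you supply that sends a prompt to your model and returns its text (hypothetical; plug in whatever client you use): draft, ask the model to attack its own draft, then revise.

```python
from typing import Callable

# Any function that takes a prompt and returns the model's text (hypothetical).
Ask = Callable[[str], str]

def answer_with_self_review(ask: Ask, task: str) -> dict[str, str]:
    """Draft an answer, ask for its failure cases, then ask for a revision."""
    draft = ask(task)
    critique = ask(
        f"Task: {task}\n\nYour answer:\n{draft}\n\n"
        "What could be wrong with your answer? List three failure cases."
    )
    revised = ask(
        f"Task: {task}\n\nYour answer:\n{draft}\n\n"
        f"Failure cases you found:\n{critique}\n\n"
        "Rewrite the answer so it handles these failure cases."
    )
    return {"draft": draft, "critique": critique, "revised": revised}

# Usage with a fake model that just records the prompts it receives:
prompts = []

def fake_ask(prompt: str) -> str:
    prompts.append(prompt)
    return f"response #{len(prompts)}"

result = answer_with_self_review(fake_ask, "Write a regex that validates emails.")
print(result["revised"])  # response #3
```

Keeping all three stages in the result matters: the critique is often the most useful part to read yourself.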

5. Rubrics / meta-prompts

Structure beats vibes. Instead of “write me a design doc,” try:

“Follow this rubric: Problem → Constraints → Options → Risks → Recommendation.”

Before: bland wall of text.
After: structured doc you could actually drop into a repo.
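A rubric is also easy to check mechanically. A minimal sketch: one helper wraps a request in the rubric, and another flags any headings the answer skipped. The function names and the sample task are made up for illustration.

```python
RUBRIC = ["Problem", "Constraints", "Options", "Risks", "Recommendation"]

def rubric_prompt(task: str, sections: list[str] = RUBRIC) -> str:
    """Wrap a request in an explicit structure the model must follow."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    return f"{task}\n\nFollow this rubric, using these headings in this order:\n{numbered}"

def missing_sections(response: str, sections: list[str] = RUBRIC) -> list[str]:
    """Cheap check that an answer actually used every heading (case-insensitive)."""
    lowered = response.lower()
    return [name for name in sections if name.lower() not in lowered]

print(rubric_prompt("Write a design doc for moving sessions to Redis."))
print(missing_sections("Problem: ...\nOptions: ...\nRecommendation: ..."))
# ['Constraints', 'Risks']
```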

The point

These five switches aren’t about tricking the model. They’re about managing it like you would a junior teammate: sometimes you need short answers, sometimes deep reasoning, sometimes structured artifacts. If you don’t flip the right switch, don’t be surprised when it gives you garbage-tier output.

When to trust, when to verify

AI is like that one coworker who’s great at banging out boilerplate but absolutely should not be let near production. The trick is knowing when you can trust it versus when you need to verify every single line.

Trustworthy zones:

Generating CRUD scaffolding.
Writing repetitive test stubs.
Summarizing docs you’ll double-check anyway.

High-risk zones:

Database migrations.
Authentication logic.
Anything touching production infra.

Think of it like code reviews: you don’t sweat a for-loop refactor, but you triple-check schema changes. AI should be held to the same standard.

Here’s a quick story: I once asked an AI to generate unit tests. Looked fine. All tests passed. Victory, right? Wrong. It had silently tested the wrong function. Everything was green because the assertions were nonsense. That’s when I realized: green doesn’t mean correct, it just means consistent.

Even Stack Overflow caught onto this: AI answers looked legit but were often wrong, so they banned them. If one of the largest dev Q&A platforms can’t trust it unsupervised, why should you?

Bottom line: use AI where it accelerates, but verify like your job depends on it, because it does.

AI as research assistant, not code monkey

The best way I’ve found to use AI isn’t as a code generator at all; it’s as a research buddy. Treating it like a junior dev who can draft architecture diagrams, outline RFCs, or brainstorm test cases is way more effective than asking it to brute-force production code.

Example: I once asked AI to design an OAuth flow. What I got back was boilerplate diagrams and generic “best practices.” Useless. Then I flipped the script: instead of asking it to design, I gave it my design and told it to critique. Suddenly I got a list of risks, edge cases, and even alternative libraries to consider. That’s value.

Another underrated trick: use it to draft headings or structures. For a design doc, AI can spit out:

Problem → Constraints → Options → Risks → Recommendation.

Then you fill in the engineering meat. It’s like having a personal tech writer who never complains.

There’s a reason tools like GitHub RFC templates exist: structure matters. AI is great at scaffolding, but you need to provide the judgment and the trade-offs.

So stop asking AI to be your code monkey. Start asking it to be your overcaffeinated research assistant. It’ll still hallucinate, but at least the stakes are lower.

The danger of autocomplete brain

Here’s the uncomfortable truth: the biggest risk with AI isn’t bad code, it’s bad habits. If you lean on it too much, you start losing the ability to think through problems yourself. Call it autocomplete brain.

It’s the same loop we all fell into with Stack Overflow back in the day. Copy, paste, ship. You get the dopamine hit of “solving” something without actually understanding it. Multiply that by 10 when AI serves you answers instantly and confidently.

I’ve caught myself in this trap. I once spent nearly an hour debugging with AI, chasing one nonsense suggestion after another. Only when I finally opened the logs myself did I see the obvious error. I wasn’t solving problems anymore; I was outsourcing my thinking to an overconfident autocomplete.

This is where burnout creeps in. You’re coding all day, but you’re not learning. You’re not building intuition. And when something truly breaks (the kind of bug that requires actual systems thinking), you’re suddenly lost.

If you’ve ever felt like you’re coding but not growing, check your habits. Autocomplete brain doesn’t show up overnight, but it’ll hollow out your skills if you let it.

Healthy AI/dev workflow (the playbook)

AI isn’t the enemy. Bad workflow is. The devs who thrive with AI aren’t the ones who let it write everything; they’re the ones who treat it like a turbocharged assistant, then layer human judgment on top.

Here’s the playbook I’ve landed on after many facepalms:

Draft with AI
Let it generate the boring stuff: scaffolding, test stubs, outlines. Don’t expect perfection, just raw clay.

Verify with docs, logs, tests
Before touching prod, check its output against the actual docs or run it in a sandbox. Logs don’t lie.

Refine with rubrics
Ask AI to restructure or critique. Example: “Follow Problem → Constraints → Options → Risks → Recommendation.” Now you get something useful instead of a wall of text.

Human final judgment
If you wouldn’t merge code from a junior without review, don’t merge AI’s output without review. Same rule.

Decision matrix

This matrix isn’t gospel; it’s a sanity check. The point is to stop treating AI like an oracle and start treating it like a tool you configure. If you add even this much structure to your workflow, you’ll avoid 90% of the garbage-tier outputs that lead to wasted hours.

What’s next: router models & smarter tools

AI isn’t standing still. The next wave of tools won’t just be “one big model does everything.” They’ll be router models: systems that quietly decide which sub-model or tool should handle your request. Think of it like a senior engineer who knows when to grab the database person, the security person, or the intern for boilerplate.

OpenAI already hinted at this in their system card: when you ask something complex, it can route parts of the query to specialized solvers. That’s why one moment it’s good at summarizing research, and the next it’s drafting halfway-decent code. Behind the curtain, it might not be the same model doing both.
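How OpenAI’s router actually works isn’t public, so treat this as a toy. It is a Python sketch of the general pattern: a cheap classification step picks a handler before any expensive model runs. The handler names and keyword rules are hypothetical.

```python
from typing import Callable

# Hypothetical specialist handlers; in a real system each would call a
# different model or tool.
def summarize(query: str) -> str:
    return f"[summarizer] {query}"

def write_code(query: str) -> str:
    return f"[code model] {query}"

def general(query: str) -> str:
    return f"[general model] {query}"

# Ordered routing rules: the first rule whose keywords appear wins.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("summarize", "tl;dr", "recap"), summarize),
    (("function", "bug", "refactor", "regex"), write_code),
]

def route(query: str) -> str:
    """Dispatch a query to the first matching specialist, else the general model."""
    lowered = query.lower()
    for keywords, handler in ROUTES:
        if any(k in lowered for k in keywords):
            return handler(query)
    return general(query)
```

A production router would presumably replace the keyword rules with something learned. The shape (classify, then dispatch) is the part that matters.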

This is exciting, but also risky. Some researchers have praised GPT-5’s problem-solving chops, noting that it produced surprisingly strong results on hard math problems. Others pointed out the obvious: the results were “impressive, but within reach for an expert.” In other words, cool demo, but don’t throw out your textbooks yet.

The real future probably isn’t one mega-model ruling everything. It’s orchestration: devs deciding when to lean on reasoning, when to force rubrics, when to route queries. Tools will get smarter, but the responsibility to use them wisely will stay on us.

Conclusion

AI isn’t here to steal your job. But it can absolutely steal your edge if you let it. The devs who survive the AI wave won’t be the ones who let autocomplete write their apps; they’ll be the ones who know when to draft, when to verify, and when to flat-out ignore the shiny suggestion.

Here’s the uncomfortable part: if you stop thinking critically, you’re basically just a human captcha. AI doesn’t need more prompt typers; it needs engineers who can orchestrate workflows, verify outputs, and push it beyond surface-level answers. That’s the skill set that will separate “AI user” from “AI abuser.”

I’ve seen both sides in my own projects: the moments where AI made me feel unstoppable, and the moments where I realized I’d trusted a hallucination and wasted hours. The difference was never the tool; it was how I used it.

So here’s my take: AI won’t replace you. But bad AI habits will. What’s your worst AI fail story? Drop it in the comments; I guarantee you’re not the only one who trusted autocomplete a little too much.

Helpful resources

OpenAI Cookbook: practical guides for prompting, evaluation, and workflows.
Stack Overflow AI ban: why AI answers got blocked.
GitHub RFC templates: structure your design docs like the pros.
Reddit’s r/programming & Hacker News: ongoing dev community debates on AI.

MCP servers just made your AI agent actually useful in prod

Mon, 15 Jun 2026 15:40:06 +0000

Your Claude can write Terraform. Can it tell you your cluster is on fire right now? In 2026, the answer is finally yes, if you’re plugged in right.

There’s a moment every DevOps engineer has experienced at least once. You’re mid-incident, something is broken in prod, and you decide, against your better judgment, to ask your AI assistant what’s going on. It confidently tells you to run a kubectl command. You run it. Things get worse. The AI had no idea what your cluster state was. It was just pattern-matching from training data, cosplaying as an SRE.

That’s not a dig at AI. That’s a fundamental architecture problem. The model has no eyes. It’s brilliant in a vacuum: it can write your Helm charts, explain your runbooks, and draft your postmortems. But it cannot see your live Prometheus metrics, your failing pods, or the PagerDuty alert that fired six minutes ago. It’s like hiring the smartest engineer you’ve ever met, then never giving them VPN access.

That’s the gap Model Context Protocol is closing. MCP (Anthropic’s open standard, shipped in late 2024) is basically USB-C for AI agents. It gives models a standardized way to connect to real, live tools. Not a snapshot. Not training data. The actual system, right now. And in 2026, the DevOps MCP ecosystem has quietly gone from “interesting experiment” to “wait, this actually works.”

TL;DR: MCP servers are what turn your AI assistant from a really smart text box into something that can meaningfully participate in your infrastructure. This article breaks down the 10 worth knowing about, how to think about picking them, and what it all means for the SRE role going forward.

What MCP actually is (and why it’s not just hype)

If you’ve ever set up an LSP in Neovim, you already understand MCP intuitively. You had a language server, a separate process that knew everything about your codebase, and your editor talked to it over a standard protocol. You didn’t hardcode autocomplete for every language into Neovim itself. You just taught it how to talk to something that already knew.

MCP is the same idea, but for AI agents and external tools. Instead of every company building bespoke integrations (custom API glue, one-off tool wrappers, fragile function-calling hacks), you have a standard protocol that any AI agent can speak and any tool can expose. The model doesn’t need to know the internals of your Kubernetes cluster. It just needs to know how to talk to the MCP server sitting in front of it.

Anthropic dropped the MCP spec in late 2024, and it landed with the energy of every good open standard: quiet at first, then suddenly everywhere. By early 2026, GitHub, AWS, Docker, HashiCorp, and Datadog all have official MCP servers. The community registry has hundreds more. It went from “Anthropic internal thing” to “the way you connect AI to tools” faster than most people expected: same arc as Docker, same arc as LSP itself.

The real unlock isn’t speed, though everyone sells it that way. The real unlock is accuracy. Before MCP, your agent was operating on stale training data and whatever you pasted into the context window. With MCP, it’s pulling live state: actual pod status, real alert history, current IAM policies. The difference between those two things is the difference between a weather app and a weather station. One is telling you what usually happens. The other is telling you what’s happening right now.

That distinction matters a lot when prod is down.

The DevOps gap AI couldn’t fill until now

Here’s an honest recap of AI in DevOps before MCP: great at writing things, useless at knowing things. You could ask it to scaffold a Terraform module and it’d do a decent job. You could ask it to explain a Kubernetes concept and it’d nail it. But the moment you needed it to participate in something live (an incident, a deploy, a capacity decision), it fell apart. Not because the model was dumb. Because it was blind.

The failure mode was always the same. You’d describe your situation in natural language, the model would generate a plausible-sounding response, and then you’d discover it was using a Terraform provider version from two years ago, or a kubectl flag that got deprecated, or an AWS API that no longer existed. It wasn’t lying to you. It genuinely didn’t know any better. It had no access to your actual environment, your actual versions, your actual state. It was doing its best with a blindfold on.

What’s the point of a copilot that can’t see the cockpit?

The incident war stories are universal at this point. Someone asks an AI for a rollback command during a bad deploy. The command looks right. It runs. It makes things worse because the model didn’t know the current replica count, the current image tag, or that the PVC had already been updated. The AI wasn’t wrong in theory. It was wrong about your specific cluster, at that specific moment, in that specific state. And in infrastructure, that gap between theory and reality is exactly where incidents live.

MCP closes that gap by giving the agent actual context: not described context, not pasted context, but live context pulled directly from the tools your team already uses. Your agent stops guessing what your cluster looks like and starts reading it. That’s not a small upgrade. That’s the difference between a brilliant intern who’s never been given VPN access and one who’s actually in the system, looking at the same dashboards you are.

The 10 MCP servers worth your attention in 2026

This is the part where most articles throw a numbered list at you and call it a day. We’re not doing that. Each of these is worth understanding: what it actually does, why it matters for real DevOps work, and where it fits in your stack.

GitHub MCP server

Best for: Platform engineers, backend developers, DevOps leads

Key capabilities: PR creation and review, issue triage and labeling, repo search, CI/CD status queries, code search, branch management, release tracking

This one’s official (maintained by GitHub) and the first one most teams reach for, because everyone’s already on GitHub. The practical use case that clicked for me: asking Claude to triage open issues by priority, label them, and draft responses without touching the GitHub UI once. That’s not a demo. That’s Tuesday morning. It also handles CI status queries natively, so your agent can check whether the pipeline passed before suggesting a merge. Sounds small. Saves a surprising amount of tab-switching. (github-mcp-server)

AWS MCP server

Best for: Cloud engineers, infrastructure teams, FinOps-curious SREs

Key capabilities: EC2, S3, IAM, CloudWatch access, cost and usage queries, resource inventory, misconfiguration detection, multi-service coverage across the AWS ecosystem

The official AWS Labs entry, built in collaboration with Anthropic. The “just describe my infra” use case is where this shines: ask it what’s running, what’s expensive, what’s misconfigured. It’s not magic, but it’s a lot better than grepping through the console at midnight. The IAM query support is where it quietly earns its keep: least-privilege audits that used to take an afternoon now take a prompt. (aws-mcp)
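The core check behind a least-privilege audit is easy to sketch. Assume you already have a policy document as JSON; how the MCP server fetches it is out of scope here. This Python function flags Allow statements whose actions are full or service-wide wildcards. It is our illustration, not the server’s implementation.

```python
import json

def wildcard_grants(policy_json: str) -> list[str]:
    """List Allow statements that grant "*" or service-wide actions like "s3:*".

    Only Action is inspected (not NotAction, Resource, or Condition), so
    treat this as a first-pass filter, not a complete audit.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"Statement[{stmt.get('Sid', i)}]: {action}")
    return findings
```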

Kubernetes MCP server

Best for: Platform engineers, SREs

Key capabilities: Full cluster visibility across multiple clusters, natural language pod/deployment/service diagnostics, read-only mode for safe inspection workflows, OpenShift support, direct Kubernetes API integration (not kubectl CLI wrapping)

Real-time pod status, deployment state, node descriptions, events, all queryable in plain language. The community-built mcp-k8s-go is the one most teams are running, and the direct Kubernetes API integration matters more than it sounds. It’s not wrapping kubectl and parsing stdout; it’s talking to the API server directly, which means cleaner data and no dependency on your local kubeconfig gymnastics. The read-only mode is what makes this safe to hand to junior engineers and on-call rotations without a lengthy approval process. Ask it why a pod is crash-looping and it’ll actually look at the events, not just guess.
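Direct API access means the agent reads structured status instead of guessing. This minimal Python sketch shows that diagnosis over a Pod object in the API’s JSON shape, the same shape `kubectl get pod -o json` prints. It fetches nothing itself, and it is our illustration rather than code from any particular MCP server.

```python
def crashlooping_containers(pod: dict) -> list[dict]:
    """Summarize containers stuck in CrashLoopBackOff from a Pod object."""
    results = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        # lastState.terminated explains how the previous run died.
        last = cs.get("lastState", {}).get("terminated") or {}
        results.append({
            "container": cs.get("name"),
            "restarts": cs.get("restartCount", 0),
            "last_reason": last.get("reason"),      # e.g. "OOMKilled" or "Error"
            "last_exit_code": last.get("exitCode"),
        })
    return results
```

An `OOMKilled` reason with exit code 137 points somewhere very different from an application `Error`. That is the kind of distinction a model answering from training data can only guess at.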

Datadog MCP server

Best for: SREs, on-call engineers, platform teams running cloud-native observability

Key capabilities: Live metrics and dashboard queries, monitor and alert status, incident history, log search, APM trace access, SLO tracking

The “is prod on fire?” server. Live metrics, dashboard data, monitor status, alert history, all accessible without leaving your agent workflow. Datadog’s official MCP integration is polished, and the use case is obvious: incident triage without tab-switching. When your agent can pull the exact metric spike that triggered the alert, correlate it with a recent deploy, and surface the relevant APM traces, the postmortem practically writes itself. The SLO tracking access is underrated too. Instead of manually checking error budget burn rate, you ask. That’s the kind of friction removal that compounds over a quarter.
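The burn rate you would be asking about is simple arithmetic under the common definition: observed error rate divided by the error rate the SLO allows. This is a minimal Python sketch. Your observability tool may compute it over specific time windows in ways this doesn’t model.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being spent relative to plan.

    1.0 means the budget lasts exactly the SLO window; 10.0 means it
    would be gone in a tenth of the window.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction like 0.999")
    if total == 0:
        return 0.0
    error_rate = errors / total
    error_budget = 1 - slo_target  # the error rate the SLO allows
    return error_rate / error_budget
```

For example, 50 failed requests out of 10,000 against a 99.9% SLO gives a burn rate of about 5. At that pace, a 30-day budget is gone in roughly six days.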

Terraform MCP server

Best for: Infrastructure engineers, DevOps teams managing multi-environment IaC

Key capabilities: Plan inspection, state queries, drift detection, resource validation, workspace management, module dependency resolution

HashiCorp-backed and pairs absurdly well with Claude Code. The workflow that’s become common: describe a change in natural language, have the agent generate the Terraform, validate it against the real state, and flag drift before you apply. It’s not replacing your terraform plan review. It’s making it faster and harder to skip. The drift detection use case is where this earns its place in production workflows: catching state divergence before it becomes an incident instead of after.
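Whatever the MCP server does internally, the CLI signal underneath is worth knowing. `terraform plan -detailed-exitcode` exits 0 when there is no diff, 1 on error, and 2 when changes are present. This minimal Python sketch wraps it. Note that exit code 2 doesn’t distinguish drift from configuration edits you haven’t applied yet.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {
    0: "no changes: real infrastructure matches the configuration",
    1: "error: the plan itself failed",
    2: "changes present: drift or unapplied edits to review",
}

def describe_plan_exit(code: int) -> str:
    return PLAN_EXIT_MEANING.get(code, f"unexpected exit code {code}")

def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in `workdir` and interpret its exit code.

    Requires the terraform CLI on PATH and an initialized working directory.
    """
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return describe_plan_exit(proc.returncode)
```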

Prometheus MCP server

Best for: SREs and platform teams running open-source observability stacks

Key capabilities: Live PromQL query execution, alert rule inspection, target health and scrape status, recording rule validation, metric label exploration

For teams running open-source observability stacks, this fills the Datadog-shaped gap without the Datadog-shaped bill. Live PromQL queries against your real data, alert rule inspection, target health checks. The agents that can write and validate PromQL against your actual metrics are genuinely useful during capacity planning, not just incidents. Metric label exploration alone saves the 10 minutes of curl | jq spelunking you do every time you forget what labels a service is exporting.
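Here is what those queries look like on the wire. This minimal Python sketch targets Prometheus’s HTTP API. It builds an instant-query URL, builds a label-values URL (the `curl | jq` chore above), and unpacks a vector result. It only builds URLs and parses JSON; the HTTP call itself is left to you.

```python
import json
from urllib.parse import urlencode

def instant_query_url(base_url: str, promql: str) -> str:
    """URL for the instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def label_values_url(base_url: str, label: str) -> str:
    """URL listing every value of one label, /api/v1/label/<name>/values."""
    return f"{base_url.rstrip('/')}/api/v1/label/{label}/values"

def vector_samples(response_body: str) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-query JSON response."""
    body = json.loads(response_body)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    # Each sample is {"metric": {labels}, "value": [timestamp, "value as string"]}.
    return [(s["metric"], float(s["value"][1])) for s in body["data"]["result"]]
```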

Docker MCP server

Best for: Developers, DevOps engineers, platform teams managing containerized workloads

Key capabilities: Container lifecycle management, image inspection and vulnerability queries, compose stack operations, volume and network status, registry integration

Docker’s official mcp-servers repo is actively maintained and covers the workflows that eat developer time during local and staging environment debugging: container lifecycle management, image inspection, compose operations, volume status. It’s less critical for pure cloud-native teams running everything in managed Kubernetes, and essential for everyone else. The image vulnerability query support is a quiet win: ask your agent whether a base image has known CVEs before you promote it to prod.

ArgoCD MCP server

Best for: Platform engineers and DevOps teams running GitOps workflows

Key capabilities: Application sync status and health checks, rollback triggers, Git diff views, multi-cluster app visibility, sync policy inspection

For the GitOps-pilled crowd. Sync status, application health, rollback triggers, diff views, multi-cluster visibility, all queryable without opening the ArgoCD UI. If your team is already living in ArgoCD, having your agent able to query app state and initiate syncs is the kind of quality-of-life improvement that’s hard to go back from. The multi-cluster visibility is what bumps this from useful to genuinely powerful: one prompt to check sync health across all your environments instead of clicking through cluster after cluster.
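The cross-cluster health check amounts to a filter over Argo CD Application status. This minimal Python sketch assumes Application objects in their Kubernetes JSON shape, for example the items from `kubectl get applications -n argocd -o json`. It does not reflect how the MCP server itself queries Argo CD.

```python
def unhealthy_apps(apps: list[dict]) -> list[str]:
    """Flag Argo CD Applications that are out of sync or not Healthy."""
    flagged = []
    for app in apps:
        name = app["metadata"]["name"]
        dest = app.get("spec", {}).get("destination", {})
        # Destination clusters are referenced by name or by API server URL.
        cluster = dest.get("name") or dest.get("server", "?")
        status = app.get("status", {})
        sync = status.get("sync", {}).get("status", "Unknown")
        health = status.get("health", {}).get("status", "Unknown")
        if sync != "Synced" or health != "Healthy":
            flagged.append(f"{cluster}/{name}: sync={sync}, health={health}")
    return flagged
```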

PagerDuty MCP server

Best for: SREs, on-call engineers, engineering managers tracking reliability trends

Key capabilities: Incident lookup and acknowledgment, escalation policy queries, on-call schedule inspection, MTTR and incident pattern analysis, service dependency mapping

The middle-of-the-night use case is obvious. Less obvious: using it proactively during business hours to understand incident patterns, MTTR trends, and which services are generating the most noise. The data has always been there. Now your agent can actually read it, surface the signal, and help you make the case for reliability investment before the next big outage makes that case for you instead.
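MTTR itself is a small calculation once the incident data is in hand. This minimal Python sketch uses the field names `created_at` and `resolved_at` as assumptions; map them from whatever your PagerDuty export or API client actually returns.

```python
from datetime import datetime, timedelta

def mttr(incidents: list[dict]) -> timedelta:
    """Mean time to resolve across resolved incidents.

    Each incident needs ISO 8601 "created_at" and "resolved_at" strings
    (hypothetical field names); unresolved incidents are skipped.
    """
    durations = [
        datetime.fromisoformat(i["resolved_at"]) - datetime.fromisoformat(i["created_at"])
        for i in incidents
        if i.get("resolved_at")
    ]
    if not durations:
        raise ValueError("no resolved incidents")
    return sum(durations, timedelta()) / len(durations)
```

Grouping the same calculation by service is how you find which services generate the most noise.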

HashiCorp Vault MCP server

Best for: Security engineers, platform teams, anyone running secrets management at scale

Key capabilities: Secrets engine status, policy inspection, lease and token management, audit log queries, PKI certificate status, all without secrets ever entering LLM context

The most security-sensitive entry on the list, and the one that requires the most care. The critical design detail: the Vault MCP server is built so your agent can reason about secrets infrastructure (policy coverage, lease expiry, audit anomalies) without the secrets themselves ever hitting the LLM context. That’s not a minor implementation detail. That’s the whole security model. If you’re running Vault, understand this boundary before you deploy. The capability is genuinely useful. The risk surface, if misconfigured, is genuinely serious.
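This is a Python illustration of the principle, not how the Vault MCP server is implemented. It assumes responses in the standard Vault HTTP API envelope (`lease_id`, `lease_duration`, `renewable`, `data`, and so on) and keeps only non-secret metadata before anything reaches the model.

```python
# Allowlist rather than denylist: any field not named here, including
# "data" (secret values), "auth" (client tokens), and "wrap_info"
# (wrapping tokens), never reaches the model.
SAFE_KEYS = {"request_id", "lease_id", "lease_duration", "renewable", "warnings"}

def llm_safe_view(vault_response: dict) -> dict:
    """Return only the non-secret metadata from a Vault API response."""
    return {k: v for k, v in vault_response.items() if k in SAFE_KEYS}
```

An allowlist fails closed: if a future response grows a new field, that field stays out of the context window until someone decides it is safe.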

How to pick the right ones without breaking your agent

Here’s the temptation: you read a list like that, get excited, and install all ten. I get it. It feels like power-ups. More servers, smarter agent, better DevOps. That’s not how this works.

Context window bloat is real. Every MCP server you connect adds tool definitions your agent has to reason about on every request. Past a certain point, you’re not giving your agent more capability; you’re giving it more noise to filter through before it can do anything useful. The analogy that fits: this is exactly what happens when you install every VS Code extension you’ve ever found interesting. Individually they all made sense. Collectively, your editor takes 40 seconds to open and the autocomplete is fighting itself.

Start with three. Pick based on where your team actually loses time, not based on what sounds impressive.

The framework that works: observability first, then infra layer, then your SCM. If you’re on Datadog, start there: incident triage is where the ROI is most immediate and most obvious. Pair it with either the AWS or Kubernetes server, depending on where your infra actually lives. Then add GitHub. That trio covers the majority of real DevOps workflows: something breaks, you find it, you trace it back to a change, you fix it. Three servers, full loop.

What does your team spend the most time doing manually during an incident? Answer that honestly and the right servers become obvious. If you’re spending 20 minutes per incident correlating metrics to deploys, Datadog plus GitHub is your first move. If you’re constantly SSHing into nodes to describe pods, Kubernetes MCP clears that up fast. If your Terraform state drift is a recurring problem, that one pays for itself in the first week.

Security and read-only modes matter more than people realize upfront. The Vault server especially: deploy it wrong and you’ve created a problem that’s worse than the one you solved. Start every new MCP server integration in read-only or inspection mode where the option exists. Let your team build trust with it before you enable write operations. The Kubernetes server’s read-only mode exists for exactly this reason, and it’s worth using it for longer than feels necessary.
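If your client lets you filter tools, you can add a second layer on top of a server’s own read-only mode. This minimal Python sketch assumes tool definitions come from an MCP `tools/list` response and use the spec’s optional tool annotations (`readOnlyHint`). Those hints are self-reported by the server, so this complements server-side read-only mode rather than replacing it. The tool names are hypothetical.

```python
def is_allowed(tool: dict, allow_writes: bool = False) -> bool:
    """Decide whether the agent may call a tool, given its tools/list entry.

    In read-only mode, only tools that declare annotations.readOnlyHint = true
    are allowed; anything unannotated is treated as a potential write.
    """
    if allow_writes:
        return True
    annotations = tool.get("annotations") or {}
    return annotations.get("readOnlyHint") is True

def callable_tools(tools: list[dict], allow_writes: bool = False) -> list[str]:
    """Names of the tools the agent may call under the current mode."""
    return [t["name"] for t in tools if is_allowed(t, allow_writes)]
```

Defaulting unannotated tools to “write” keeps the failure mode safe: a server that forgets its hints loses capability instead of gaining it.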

The MCP community server registry is worth bookmarking for what comes next: it’s growing fast, and the quality has gone up considerably since the early days of everyone publishing half-finished experiments. But don’t chase the registry. Chase your actual bottlenecks.

What this means for the SRE role (the honest version)

Let’s not do the thing where we pretend this is all upside with no complexity. Every time tooling gets significantly more powerful, the role around it shifts, and MCP-connected agents are a meaningful shift, not an incremental one.

The boring parts of SRE work are going first. Alert triage, metric correlation, runbook execution, postmortem drafting, on-call handoff summaries: these are the tasks that eat hours without requiring the judgment that actually makes a senior SRE valuable. An agent with the right MCP servers connected can handle a meaningful chunk of that loop already. Not perfectly. Not without oversight. But well enough that the humans in the rotation are spending less time on the mechanical parts and more time on the parts that actually require thinking.

That sounds like a win. It mostly is. But it also means the SRE who isn’t learning to work with these tools is slowly becoming the SRE who’s doing the parts the agent can’t do yet. Right now that’s still a lot, but the list is shrinking faster than most people are comfortable admitting.

The SRE who learns to orchestrate agents with MCP is worth significantly more than the one who doesn’t. That’s not a hot take; it’s just the same pattern we’ve seen every time a layer of abstraction gets good enough to trust. The engineers who understood Docker when it was new didn’t get replaced; they became the people who designed the container strategy everyone else ran. Same thing happened with Kubernetes. Same thing is happening here.

What’s actually changing underneath all of this is the model of operations itself. We’re moving toward what some teams are already calling intent-based ops: you describe the outcome you want, the agent figures out the sequence of tool calls to get there, and you review and approve rather than execute manually. The tooling is already capable of this in constrained, well-defined workflows. The cultural shift (trusting an agent to page you back instead of being the one holding the pager) is taking longer, which is probably the right pace.

The SRE subreddit and Hacker News threads on this are genuinely split. Half the comments are engineers who’ve connected a Kubernetes and Datadog MCP server and won’t stop talking about the time it saved during a recent incident. The other half are people pointing out, correctly, that agents with write access to production infrastructure are a new and interesting category of incident cause. Both camps are right. The answer isn’t to avoid the tools; it’s to deploy them with the same discipline you’d apply to any system that can affect prod.

The teams that figure that balance out early are going to have a real advantage. Not because the tools are magic, but because operating leverage compounds. Every hour an agent saves on mechanical SRE work is an hour a human engineer spends on reliability improvements, architecture, and the judgment calls that still require a person. That gap widens over time. And the teams on the wrong side of it will feel it before they understand why.

Where this is all going (and what you should do Monday morning)

I’ll be honest: a year ago I would have filed “AI agents in DevOps” under vaporware and moved on. The demos were always impressive and the production reality was always messier. The model would hallucinate a flag, or confidently suggest a command that hadn’t existed since Kubernetes 1.18, and you’d remember why you still had a human on-call.

MCP changed the calculus. Not because the models got dramatically smarter (though they did), but because they finally got eyes. The fundamental problem was never intelligence. It was context. And when you give an agent live access to your actual infrastructure through a standardized protocol, the gap between “impressive demo” and “this is genuinely running in our incident workflow” closes faster than expected.

We’re early, but not as early as it feels. GitHub, AWS, Docker, HashiCorp, Datadog: these aren’t startups experimenting with MCP. These are the companies whose tools your team already depends on, shipping official integrations because they’ve decided this is the direction. That’s a different signal than hype. That’s ecosystem commitment.

The uncomfortable truth is that the tooling is ready before most teams are. Letting an agent acknowledge a PagerDuty alert, correlate it to a Datadog spike, trace it to a GitHub commit, and propose a rollback is technically possible today. Whether your team has the cultural and operational trust to let it do that is a different question, and it’s a legitimate one. Rushing that trust is how you create a new category of production incident.

So here’s the honest version of what to do with all of this: pick one server, pick one workflow, and run it in read-only mode for two weeks. Not ten servers, not a full agentic pipeline, not a rewrite of your incident process. One server, one workflow, real data. Let your team build intuition for what the agent gets right and where it still needs a human in the loop. That intuition is what makes everything else safe to scale.

The teams who figure this out aren’t going to replace their SREs. They’re going to make their SREs unreasonably effective. And in an industry where reliability engineering talent is expensive, incidents are expensive, and toil is quietly demoralizing, unreasonably effective is a meaningful competitive advantage.

MCP servers aren’t the final form of AI in DevOps. They’re the connective tissue that makes the whole vision coherent. What comes after (fully autonomous remediation, self-healing infrastructure, agents that close the loop without human approval on routine fixes) is still being figured out. But it starts here. With a protocol, a server, and an agent that can finally see what’s actually happening in your cluster.

What’s the first MCP server your team would reach for? Drop it in the comments; I’m genuinely curious whether the split is Kubernetes vs Datadog or if GitHub is the obvious first move for most teams.


Anthropic’s engineer just told you to stop using markdown. Here’s what’s actually going on.

Wed, 10 Jun 2026 18:04:57 +0000

The “HTML vs Markdown” war broke the internet last week. Both sides got it wrong, and the real answer was buried in the footnotes the whole time.

Last week, the engineering lead for Claude Code dropped a post called “The Unreasonable Effectiveness of HTML” and the dev internet split in half before the coffee finished brewing.

Thariq Shihipar, who runs engineering on Anthropic’s Claude Code, published 20 working examples showing why AI agents should output HTML instead of Markdown. Interactive navigation. Collapsible sections. Color-coded code reviews. Embedded visualizations. Shareable links. The post hit 4.4 million views in 16 hours.

The response was exactly what you’d expect from a community that treats tooling preferences like religion.

Team HTML declared Markdown dead. Team Markdown called it a security risk wrapped in a token tax. Threads filled up. Quote-tweets went sideways. Someone definitely posted a “this is why we can’t have nice things” reply. The usual.

Here’s the problem: both sides were arguing about the wrong question entirely.

The HTML camp got the direction right but hand-waved the costs: the 3–5x token overhead, the AI-generated JavaScript risks, and the slightly awkward fact that Anthropic profits directly from you using more tokens. The Markdown camp identified real risks but is defending a set of constraints that haven’t been real since context windows hit a million tokens. They’re optimizing for a 2022 problem in a 2026 world.

The actual question, the one neither camp bothered asking, is simpler than any of that: who reads this output, and what do they do with it?

That’s it. That’s the whole framework. Everything else is noise.

So let’s talk about how Markdown became the default, why three of its core assumptions are quietly rotting, what the token math actually looks like when you run it, and why the format war was always a distraction from the decision tree that was sitting there the whole time.

How markdown became the default AI output (nobody chose it; it was inherited)

Markdown didn’t win a format war. It just kept showing up at the right time, three times in a row, until nobody questioned it anymore.

The first wave was developers. John Gruber built Markdown in 2004 as a way to write readable plain text that converted cleanly to HTML. Convenient tool for bloggers. Then GitHub adopted it for READMEs, issues, and pull request descriptions, and overnight, every open-source project on earth was writing Markdown. Not because it was evaluated and selected. Because it was already there.

The second wave was knowledge workers. Through the 2010s, Notion, Obsidian, and Jekyll built their entire editing experience around it. It became the default for wikis, note-taking, and static sites. The pitch was the same every time: human-readable and machine-parseable. Write it in any text editor, render it anywhere. Simple enough that anyone could pick it up in an afternoon, powerful enough that you never really needed anything else.

The third wave was AI. When ChatGPT launched in late 2022, it rendered responses in Markdown. Not because OpenAI ran a format evaluation. Because the training data was saturated with it: GitHub repos, technical docs, wikis, blog posts, READMEs as far as the eye could see. Markdown was what the model had seen most, so Markdown was what the model produced. Every chatbot and coding assistant since has followed the same default.

I still have READMEs in repos from 2016 that look structurally identical to my Claude outputs today. Same heading hierarchy. Same bullet pattern. Same code block style. That should’ve been the first clue that something was on autopilot.

Three waves. Each one reinforcing the last. Nobody evaluated Markdown for AI output and decided it was the best fit. It inherited the job because it was already wearing the right shirt from the previous two interviews.

That inheritance is the problem. Because the world Markdown was designed for and the world we’re actually building in now are not the same world. And three assumptions that were completely reasonable when Markdown took over are breaking at the same time.

Three assumptions baked into Markdown that are quietly rotting

Markdown became the default AI output under three premises. All three made sense in 2022. None of them really hold in 2026.

Premise 1: Humans edit the output.

Markdown was designed for people who write and revise their own text. That’s still how READMEs, docs, and blog posts work: someone opens the file, rewrites a paragraph, pushes a commit. But agent output is different. You send a prompt. The agent generates a 2,000-word implementation plan, a code review, a competitive analysis. You read it. Maybe you share it. You almost never open it in an editor and start rewriting paragraphs.

When was the last time you actually took a Claude output, opened it in VS Code, and edited the prose?

The format’s core value proposition, easy to write and revise by hand, no longer matches the actual use case. The agent wrote it. You’re just the reader now.

Premise 2: Content is small.

A 500-word doc renders fine in Markdown. A 3,000-word agent-generated architecture decision with trade-off tables, code samples, and implementation notes does not. Past roughly 100 lines, Markdown becomes a wall. No navigation, no collapsible sections, no way to jump to the part you actually care about without scrolling through everything you don’t.

Thariq’s observation on this is blunt: nobody really reads a Markdown file longer than 100 lines. They skim, miss things, and close it. The format that was perfect for a README is actively fighting you when the output is a full technical report.

Premise 3: Output is read-only.

The old workflow was linear.

Prompt → generate → read → close.

Done. But the agent era is pushing toward something different. Filter a table. Adjust a parameter. Compare two options side by side. Export a subset. Feed the result back into the next prompt as structured input. Markdown can’t carry any of that. It’s a one-way street with no exits.

Here’s the reframe that cuts through all three premises at once: Markdown is a report. HTML is an interface. You read a report and close it. You operate on an interface and feed the result forward.

That distinction matters more than any token cost calculation. But since token cost is the number everyone keeps citing, let’s actually run it.

The token math nobody actually ran

The main ammunition Team Markdown keeps loading is the token overhead. HTML costs 3–5x more tokens. They say it like it ends the conversation.

Almost nobody checks what that actually means in dollars.

Same 2,000-word report, three formats. Plain Markdown comes in around 3,000 output tokens. Lean semantic HTML (proper structure, no heavy styling) runs about 7,200. Full HTML with CSS, embedded charts, and interactive sections hits roughly 14,400. That puts lean HTML at about 2.4x and rich HTML at about 4.8x, so the “3–5x” range you’ve seen quoted holds for rich output: there, you’re burning close to 5x the tokens.

Here’s what that costs per report at current Anthropic API pricing:

Markdown report: ~$0.072 on Claude Sonnet
Lean HTML: ~$0.17
Full HTML with styling: ~$0.34

The overhead on a single HTML report is measured in cents: about ten cents extra for lean HTML and about 27 cents for the fully styled version.

At those rates, roughly ten lean HTML reports on Claude Sonnet add up to one extra dollar compared to Markdown. One dollar. That’s the number people are building their entire format philosophy around.

This is what I’d call the Token Trap. Optimizing for a cost that’s a rounding error in your actual engineering budget while ignoring the cost that actually matters.

But the math has a second act, and Team Markdown deserves credit for it.

Scale it up and the numbers shift. At 100 reports per day, the HTML overhead on Claude Sonnet runs roughly $500 a month extra. At enterprise volume (thousands of agent calls daily across a whole platform), you’re looking at real line items, not pocket change. Team Markdown isn’t wrong about this. They’re just applying it everywhere instead of where it actually applies.

Here’s what both camps keep skipping: human attention has a price too.

A senior engineer earns somewhere between $75 and $150 an hour. Fifteen minutes spent scrolling a Markdown wall hunting for the architecture decision buried in paragraph nine, re-reading a table that should have been filterable, copy-pasting a section into Slack because there’s no shareable link costs between $19 and $38 in engineer time.

The total token cost for that same report as lean HTML? Seventeen cents on Sonnet.

The Token Trap runs in both directions. Individual developers waste time debating $0.17. Enterprise teams burn thousands in engineer attention to save hundreds in token costs. In both cases, the format decision is being made on the wrong variable entirely.
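To make the arithmetic checkable, here is a small Go sketch of both sides of the ledger. The $24-per-million-output-token rate is an assumption backed out of the article’s own Markdown figure ($0.072 for about 3,000 tokens), not a quoted price; swap in whatever your invoice says. The token counts and the $75 to $150 hourly range come from the paragraphs above.

```go
package main

import "fmt"

// pricePerMTok is an assumed rate in dollars per million output tokens, backed out of
// the article's Markdown figure ($0.072 for ~3,000 tokens). Replace it with your real rate.
const pricePerMTok = 24.0

// costUSD converts an output-token count into dollars at pricePerMTok.
func costUSD(tokens int) float64 {
	return float64(tokens) * pricePerMTok / 1_000_000
}

// attentionUSD is the cost of `minutes` of engineer time at `hourlyRate` dollars per hour.
func attentionUSD(hourlyRate, minutes float64) float64 {
	return hourlyRate * minutes / 60
}

func main() {
	markdown := costUSD(3_000)
	formats := []struct {
		name   string
		tokens int
	}{
		{"lean HTML", 7_200},
		{"styled HTML", 14_400},
	}
	for _, f := range formats {
		cost := costUSD(f.tokens)
		extra := cost - markdown
		fmt.Printf("%s: $%.3f per report, +$%.3f vs Markdown, %.1f reports per extra dollar, +$%.0f/month at 100 reports/day\n",
			f.name, cost, extra, 1/extra, extra*100*30)
	}
	fmt.Printf("15 minutes of engineer time: $%.2f to $%.2f\n", attentionUSD(75, 15), attentionUSD(150, 15))
}
```

With these inputs the overhead is about $0.10 per lean report and $0.27 per styled one, which lands between roughly $300 and $820 a month at 100 reports a day. Fifteen minutes of senior attention comes out at $18.75 to $37.50 per report.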

The right variable is simpler. It’s not what the tokens cost. It’s who reads the output and what they do with it.

Which is exactly where we’re going next.

The decision tree that ends the format war

Every agent output has one of three audiences. The format choice follows directly from which one you’re dealing with. That’s the whole framework.

Reader 1: A human.

Your stakeholder opens a browser tab. They scan for the section they care about, screenshot a chart for Slack, share a link with the team, click through a collapsible architecture section without reading the parts that don’t apply to them. This is the use case Thariq built 20 examples around: code reviews with inline severity colors, implementation plans with jump navigation, design system comparisons with live swatches you can actually interact with.

HTML wins here because the output is a destination. The reader navigates it, operates on it, shares it forward. Markdown flattens all of that into a scroll and hopes for the best.

Reader 2: Another agent.

Your output feeds a downstream pipeline. An agent reads the analysis, extracts structured data, makes a decision, triggers the next step. No human ever sees it. This is where Markdown still wins cleanly: lightweight, parseable, diffable, and processable without any rendering overhead. Other models consume it without friction. Git tracks it. CI pipelines process it without choking.

Using HTML for agent-to-agent communication is like printing a spreadsheet, laminating it, and handing it to someone who’s going to retype all the numbers anyway.
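What “parseable without friction” looks like in practice: a downstream step can split a Markdown plan into sections with nothing but the standard library. This is a minimal Go sketch; `sections` is a hypothetical helper, and it only understands ATX headings (`#`, `##`, …) while skipping `#` lines inside code fences. It is not a full CommonMark parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sections maps each ATX heading's text to the body below it, up to the next heading.
// Text before the first heading is dropped. Lines inside ``` fences are never headings.
func sections(md string) map[string]string {
	out := map[string]string{}
	var current string
	var body strings.Builder
	inFence := false

	flush := func() {
		if current != "" {
			out[current] = strings.TrimSpace(body.String())
		}
		body.Reset()
	}

	sc := bufio.NewScanner(strings.NewReader(md))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "```") {
			inFence = !inFence
		}
		if !inFence && strings.HasPrefix(line, "#") {
			flush()
			current = strings.TrimSpace(strings.TrimLeft(line, "#"))
			continue
		}
		body.WriteString(line)
		body.WriteByte('\n')
	}
	flush()
	return out
}

func main() {
	plan := "# Plan\nShip the cache.\n\n## Risks\n- cold start\n- stale reads\n"
	fmt.Println(sections(plan)["Risks"])
}
```

Try the same thing against rendered HTML and you are suddenly writing a DOM walker. That asymmetry is the whole case for Markdown in agent-to-agent hops.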

Reader 3: Both.

This is the most common case in real engineering workflows, and it’s the one neither camp bothers addressing. A developer generates a PR review: they read it themselves, and they also want it tracked in the repo. A team lead generates a weekly status report: stakeholders view it in the browser, and the data feeds into next week’s planning prompt. Human and machine, same output, different needs.

The answer here isn’t picking a side. It’s: Markdown source, HTML artifact.

Keep Markdown as the editable, diffable, git-tracked source of truth. Generate an HTML companion for the humans who need to navigate and share it. This is actually what Thariq recommends in his own post. It got buried under the tribal response, but it’s in there: keep Markdown in repositories, generate HTML as the companion artifact for review.

Both camps were arguing against a recommendation that never said what they thought it said.

The decision tree is three questions.

Does only a human read this? Use HTML.
Does only an agent read this? Use Markdown.
Do both read it? Markdown source, HTML artifact.

Screenshot that and you never have to read another format war thread again.
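The three questions are small enough to write down as code, which is handy if you template prompts programmatically. A minimal Go sketch, assuming you already know both answers when you build the prompt; `Format` and `chooseFormat` are hypothetical names, and the fallback to Markdown when nobody at all reads the output is this sketch’s choice, not part of the tree.

```go
package main

import "fmt"

// Format is the kind of output to prompt the agent for.
type Format string

const (
	HTML     Format = "HTML"
	Markdown Format = "Markdown"
	Both     Format = "Markdown source + HTML artifact"
)

// chooseFormat encodes the three questions: who reads this output?
// The default branch also covers the (false, false) case, which the tree leaves open.
func chooseFormat(humanReads, agentReads bool) Format {
	switch {
	case humanReads && agentReads:
		return Both
	case humanReads:
		return HTML
	default:
		return Markdown
	}
}

func main() {
	fmt.Println(chooseFormat(true, false)) // HTML
	fmt.Println(chooseFormat(false, true)) // Markdown
	fmt.Println(chooseFormat(true, true))  // Markdown source + HTML artifact
}
```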

The framework is clean. The real world, predictably, is not.

Where it actually breaks and who profits from this shift

Before you rewrite your CLAUDE.md to default to HTML output, here are the risks Team Markdown got right, and one they didn't mention at all.

Security is the real concern, not the token bill.

AI-generated HTML can include JavaScript. JavaScript means potential XSS vulnerabilities, local data leaks, and code execution you never asked for and definitely didn’t audit. This isn’t a theoretical edge case. If you’re generating HTML for internal tools, dashboards, or anything that touches real user data, you need either a strict no-JS constraint baked into your prompt or a review step before anything hits production.

Thariq’s own guidelines for generating HTML are pretty direct about this: no external CDN links, no unpkg imports, system fonts only, zero network calls at runtime. The vision is clean. The default behavior of most AI-generated HTML is not. You have to prompt for the guardrails explicitly, and most people don’t.
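If you want a cheap tripwire before anything reaches the review step, a few patterns catch the obvious violations of those guardrails. This Go sketch is deliberately naive: `lintGeneratedHTML` and its pattern list are hypothetical, regexes are not an HTML sanitizer, and a determined payload can slip past them. Treat a clean result as “worth a human look,” never as “safe.”

```go
package main

import (
	"fmt"
	"regexp"
)

// checks are naive patterns for the no-JS, no-network guardrails.
// Expect false positives (e.g. prose containing " one=") and false negatives.
var checks = []struct {
	name string
	re   *regexp.Regexp
}{
	{"script tag", regexp.MustCompile(`(?i)<script\b`)},
	{"inline event handler", regexp.MustCompile(`(?i)\son[a-z]+\s*=`)},
	{"javascript: URL", regexp.MustCompile(`(?i)javascript:`)},
	{"external resource", regexp.MustCompile(`(?i)\b(?:src|href)\s*=\s*["']?(?:https?:)?//`)},
}

// lintGeneratedHTML returns the name of every guardrail the document trips.
func lintGeneratedHTML(doc string) []string {
	var hits []string
	for _, c := range checks {
		if c.re.MatchString(doc) {
			hits = append(hits, c.name)
		}
	}
	return hits
}

func main() {
	clean := `<html><body><h1>Report</h1><p style="color:#222">ok</p></body></html>`
	risky := `<html><body onload="x()"><script src="https://unpkg.com/lib.js"></script></body></html>`
	fmt.Println(lintGeneratedHTML(clean)) // []
	fmt.Println(lintGeneratedHTML(risky)) // [script tag inline event handler external resource]
}
```

For anything that touches real user data, pair a gate like this with a proper allowlist-based sanitizer and a Content-Security-Policy that blocks scripts outright.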

Accessibility doesn’t come for free.

AI-generated HTML routinely falls short of WCAG by default: no alt text on images, inconsistent focus order, contrast ratios that would fail a basic audit. If your outputs go anywhere near a public-facing interface or a team with accessibility requirements, you have to ask for it explicitly: WCAG 2.2 AA, descriptive alt text, logical focus order, minimum 4.5:1 contrast for body text. It’s solvable. It’s just not automatic, and the HTML enthusiasm tends to skip this part entirely.
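The contrast requirement, at least, is mechanical enough to check in CI. Here is a minimal Go sketch of the WCAG contrast-ratio formula: linearize each sRGB channel, weight them into relative luminance, then take (lighter + 0.05) / (darker + 0.05). WCAG’s own text uses a 0.03928 threshold where this uses the sRGB value 0.04045; for 8-bit colors both give identical results. The function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// channel converts one 8-bit sRGB channel to linear light.
func channel(v uint8) float64 {
	c := float64(v) / 255
	if c <= 0.04045 {
		return c / 12.92
	}
	return math.Pow((c+0.055)/1.055, 2.4)
}

// luminance is the WCAG relative luminance of an sRGB color, from 0 (black) to 1 (white).
func luminance(rgb [3]uint8) float64 {
	return 0.2126*channel(rgb[0]) + 0.7152*channel(rgb[1]) + 0.0722*channel(rgb[2])
}

// contrastRatio returns (L1 + 0.05) / (L2 + 0.05), L1 being the lighter color; range 1 to 21.
func contrastRatio(fg, bg [3]uint8) float64 {
	l1, l2 := luminance(fg), luminance(bg)
	if l2 > l1 {
		l1, l2 = l2, l1
	}
	return (l1 + 0.05) / (l2 + 0.05)
}

func main() {
	white := [3]uint8{0xff, 0xff, 0xff}
	for _, g := range []uint8{0x76, 0x77} {
		r := contrastRatio([3]uint8{g, g, g}, white)
		fmt.Printf("#%02x%02x%02x on white: %.2f:1, passes AA body text: %v\n", g, g, g, r, r >= 4.5)
	}
}
```

The two grays are a classic trap: #767676 on white comes out at about 4.54:1 and passes, while #777777, one step lighter, lands at about 4.48:1 and fails. That is exactly the kind of miss a model picking “a nice muted gray” makes.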

Reviewability needs a pattern, not a format change.

HTML diffs are noisy. A one-line content change can generate 50 lines of diff because surrounding markup shifts. For teams that live in pull requests, this is real friction. The fix is the template-plus-data pattern: keep the HTML structure static, store variable content in a JSON payload, and diff only the JSON. Clean version control, rich visual output. Slightly more setup. Worth it if your team reviews agent output in git.

Now the part most coverage skipped.

Anthropic profits directly from this shift. HTML output burns 3–5x more tokens than Markdown. More tokens means more API revenue. And beyond the immediate billing, HTML output creates ecosystem stickiness once your team builds workflows around Claude-generated interactive reports and dashboards, switching to another model means rebuilding all of those workflows from scratch.

This isn’t a conspiracy theory. It’s just an incentive structure worth understanding before you adopt a recommendation wholesale. The engineer making the case works at the company that gets paid per token. That doesn’t make him wrong. Thariq’s examples are genuinely compelling and the use cases are real.

It just means you should read the footnotes.

The argument for HTML is strong for the right contexts. The tooling, the guardrails, and the security patterns are still catching up to the vision. Both things are true at the same time.

Markdown isn’t dying. It’s being promoted.

Here’s the reframe that actually resolves this: Markdown was always better as a machine-readable format than a human-readable one. The agent era just made that obvious.

Think about what Markdown actually is at its core. Structured plain text with lightweight syntax. Easy to parse, easy to diff, easy to version control, easy to feed into the next system in the pipeline. That’s not a display format. That was always a protocol. We just kept using it as a display format because nothing better had shown up yet for the AI output layer and because the training data made it the path of least resistance.

HTML isn’t the future of everything. It’s the future of output that a human actually needs to read, navigate, and act on. For everything else agent pipelines, git-tracked docs, machine-consumed reports, anything that feeds forward into another model Markdown stays. It just stops pretending to be something it isn’t.

The skill that actually matters now isn’t picking the right format. It’s knowing your reader before you write the prompt. Everything else follows from that one question. Who opens this output? What do they do with it? Do they scroll it, share it, diff it, or pipe it into something else? Answer that and the format choice becomes obvious every time.

The format war was always a distraction. Two camps arguing about tooling aesthetics while the actual decision was sitting there quietly the whole time, waiting for someone to ask the right question.

Thariq’s post wasn’t a declaration of war on Markdown. It was a reminder that the default was never chosen; it was inherited. And inherited defaults are worth questioning, especially when the use cases have moved on.

So question it. Not because an Anthropic engineer said to. Because your outputs are being read by humans who deserve better than a 3,000-word scroll with no navigation, and by agents who deserve clean structured text without presentation overhead baked in.

Give each reader what they actually need. That’s the whole job.

Drop a comment if your team has already made the switch or if you’ve hit the security issues nobody’s talking about yet. Would genuinely like to know what patterns people are landing on in production.


Go didn’t ask for permission. It just took over.

Tue, 09 Jun 2026 04:11:05 +0000

The backend language nobody hyped is now running the infrastructure everyone depends on. Here’s why that happened and what it means for your stack.

There’s a specific kind of dread that hits when you’re onboarding onto a new team and you open the services directory for the first time. You expect Java. You expect some Node mess with 400 packages and a package-lock.json that hasn't been touched since 2021. Maybe Python somewhere doing something it shouldn't be doing.

Instead you find .go files. Everywhere.

No Spring Boot XML. No node_modules folder the size of a small country. Just clean, flat directories and a binary that builds in four seconds. You ask your teammate when they migrated. He shrugs. "We just started writing new services in Go. Nobody made a decision. It kind of happened."

That’s the Go story. No conference keynote. No influencer push. No “Go is the future of everything” Medium posts going viral in 2019. Just engineers, quietly choosing it when the old stuff started creaking and never looking back.

TL;DR: Go didn’t win on syntax. It didn’t win a design award or top a “most loved language” poll. It won because it’s cheap to run, fast to deploy, and terrifyingly good at concurrency. When your cloud bill is real and your Kubernetes cluster isn’t infinite, those things start to matter more than elegant abstractions.

This article is about why the traditional backend stack (Java, Node, Python) is losing ground, why Go is filling the gap, and whether any of this is actually a problem.

The stack that made sense until it didn’t

For a long time, the backend language decision was basically a personality quiz.

Are you an enterprise company with a procurement team and a strong opinion about XML? Java. Are you a startup that needs to ship in three weeks and your whole team came from frontend? Node.js. Are you doing data work, scripts, or anything that touches a model? Python. This wasn’t laziness; it was legitimate. These stacks had ecosystems, hiring pools, Stack Overflow answers, and years of production battle-testing behind them.

The problem wasn’t the languages. The problem was the world they were designed for quietly stopped existing.

The old assumptions were reasonable. Applications were mostly monoliths. Deployments happened weekly, maybe monthly. Memory was expensive so you thought about it, but you weren’t running 40 microservices on a shared Kubernetes cluster where every megabyte shows up on a bill. Concurrency was an advanced topic, not a default requirement. You wrote your servlet, deployed your WAR file, went home.

Modern systems blew all of that up.

Now your backend is expected to spin up in milliseconds because Kubernetes can kill and reschedule your pod on short notice. It needs to handle thousands of concurrent connections because that’s just Tuesday traffic. It has to run in a container small enough that you can actually afford the node pool. And it needs to deploy 15 times a day because the team runs CI/CD and nobody’s waiting for a release window.

Java started showing its age first. Not because Java is bad (it’s still an engineering powerhouse), but because the JVM was built for a world where you had one big long-running process, not a fleet of tiny short-lived containers. A simple microservice doing nothing interesting could eat 800MB to 1.4GB of RAM just warming up. Cold starts on a fresh container pod weren’t milliseconds; they were seconds. At scale, that’s not a performance inconvenience. That’s an infrastructure cost problem with a line item.

Node had a different flavor of the same issue. The event loop is genuinely clever for I/O-heavy work. But the moment you put CPU pressure on it (image processing, real-time scoring, anything computationally non-trivial), it starts struggling in ways that feel personal. Single-threaded by default. Async complexity that compounds over time into callback archaeology. A node_modules folder that is functionally its own biome. Node is great at what it's great at. The issue is teams kept using it for things it was never meant to do.

Python’s situation is more nuanced because Python earns its keep in data and ML. But for backend services under load, the GIL is a real ceiling, and horizontal scaling through brute-force container replication gets expensive fast. More containers, more workers, more infrastructure: all to compensate for what the runtime isn’t doing for you.

The stacks weren’t wrong. They just accumulated assumptions the way you accumulate technical debt: invisibly, until the bill came due.

What Go actually is (not what you think)

Most developers, when they first encounter Go, file it under “simple language for simple things.” The syntax is minimal. There’s no inheritance. The standard library does a lot without asking permission. It feels almost boring compared to the type gymnastics you can do in Kotlin or the decorator magic in Python.

That instinct is wrong. And it’s why Go keeps surprising teams who adopt it.

Go wasn’t built to be a better scripting language or a cleaner Java. It was built at Google specifically to deal with the kind of infrastructure problems that would make most engineers quietly update their LinkedIn. Massive distributed systems. Internal platform tooling. Networking infrastructure running at a scale where a 50ms GC pause isn’t a benchmark footnote; it’s an incident. The language design decisions that feel like limitations are mostly intentional constraints born from that context.

Simple isn’t a weakness. In production at scale, simple is the whole point.

The feature that actually matters, the one that gets undersold in every Go explainer, is goroutines. Not because they sound cool, but because of what they cost. Traditional threads are expensive. The OS manages them, they each need their own stack, and spinning up thousands of them is a legitimate systems engineering problem. Goroutines are managed by the Go runtime, start with tiny stacks, and can be created by the hundreds of thousands on a single machine without drama.

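package main

import "fmt"

// Price is a hypothetical quote returned by one supplier.
type Price struct {
    HotelID string
    Amount  float64
}

// supplierAPI is a hypothetical stand-in for a real network call.
func supplierAPI(hotelID string) Price {
    return Price{HotelID: hotelID, Amount: 100}
}

// fetchHotelPrice runs in its own goroutine and sends its result on ch.
func fetchHotelPrice(hotelID string, ch chan<- Price) {
    ch <- supplierAPI(hotelID)
}

func main() {
    hotels := []string{"H1", "H2", "H3", "H4", "H5"}
    ch := make(chan Price)
    for _, hotel := range hotels {
        go fetchHotelPrice(hotel, ch) // one goroutine per supplier call
    }
    for range hotels { // receive exactly one result per goroutine
        fmt.Println(<-ch)
    }
}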

This is roughly what a real-time pricing aggregator looks like. Except in production there are 800 suppliers and the deadline is 200ms.

That’s not clever code. That’s the point. The concurrency model is so lightweight that you stop thinking about threads as a resource to manage and start thinking about them as a default tool. For API gateways, payment retries, supplier aggregations, and event fan-out, the pattern just works, and it works cheaply.

The other thing that quietly changes everything is the binary. Go compiles to a single self-contained binary, and with cgo disabled it is typically statically linked with no runtime dependencies at all. No JVM. No Python interpreter. No node_modules folder that needs to follow the service around. You build it, you ship it, it runs. Containerizing a Go service produces an image that can be under 20MB if you're not doing anything unusual. That sounds like a nerdy trivia point until you're managing a 40-service cluster and your node costs drop noticeably.

Go binaries are basically the static sites of compiled programs. Annoying to debug sometimes, but they just go.

The minimal standard library deserves a mention too. A production-ready HTTP server in Go is genuinely small: a few imports, a handler function, ListenAndServe. No framework required. No magic middleware stack you have to understand before you can read the code. That simplicity compounds over years. Go code written in 2019 is usually still readable in 2026 without a guided tour through six layers of abstraction.

That’s not an accident. That’s what happens when the language is designed by people who spent years cleaning up the messes that cleverness creates.

The Kubernetes gravitational pull nobody talks about

Here’s the thing that gets left out of every “why Go is winning” conversation.

It wasn’t just engineers choosing Go because they read a benchmark. A huge part of Go’s adoption happened passively, through tooling gravity. The infrastructure ecosystem that most backend teams now run on (Kubernetes, Docker, Terraform, Prometheus, etcd, Grafana Loki) is almost entirely written in Go. Not partially. Not mostly. The core of modern cloud-native infrastructure is a Go codebase.

That’s not a coincidence. It’s a gravitational field.

When Kubernetes became the default deployment platform, teams started reading Kubernetes source code to understand why their pods were dying at 3am. Then they started writing operators and controllers to automate their own infra. Then someone needed to extend Prometheus. Then someone wrote a custom admission webhook. At every step, the path of least resistance was Go because the documentation assumed it, the examples used it, and the libraries were already there.

Go to the CNCF landscape and count how many of the graduated and incubating projects are written in Go. It’s not a subtle pattern. The cloud-native ecosystem essentially standardized on Go as its implementation language before most teams consciously decided to adopt it. By the time developers noticed, they were already writing it.

I’ve had this exact experience. Opened a Kubernetes controller to understand how a custom resource was being reconciled. Needed to patch something. Ended up writing a small operator. Three months later the team had four Go services in production and nobody had sat down to make a “Go strategy” decision. The tooling pulled us in.

That’s how gravity works. You don’t choose it. You notice it after you’ve already moved.

The Kubernetes effect also had a hiring dimension that companies underestimated. DevOps and platform engineers who lived in the Kubernetes ecosystem started becoming fluent in Go naturally. When those same engineers moved into backend roles or started influencing architecture decisions, Go came with them. It wasn’t a top-down mandate from a CTO who read a whitepaper. It was bottom-up adoption from the people who actually ran the systems.

HashiCorp built Terraform, Vault, and Consul in Go. Cloudflare uses Go extensively across their edge infrastructure and writes about it regularly. Uber migrated significant backend services to Go and published the style guide their engineers now follow. These aren’t startups chasing trends. These are companies that operate at a scale where the language choice shows up in the quarterly infrastructure bill.

The ecosystem compounded. More Go in production meant more Go libraries. More Go libraries meant less friction adopting it. Less friction meant more teams defaulting to it for new services. And once your platform team, your DevOps tooling, and your three newest backend services are all Go, the question stops being “why Go?” and starts being “why not Go?”

That shift is quiet. But it’s also basically irreversible.

The honest tradeoffs: Go is not perfect

Let’s not do the thing where we spend 1500 words hyping a language and then add one diplomatic paragraph at the end saying “but every tool has its place.” Go has real friction. Some of it is annoying. Some of it is a legitimate reason to pick something else.

The most famous one is error handling. In Go, errors are just values. You check them manually, every time, at every call site. There’s no try-catch block to lean on, no exception propagation to let you ignore the problem until it blows up two layers up the stack. What you get instead is this:

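// saveAll is a sketch; Output, doSomething, processResult and saveData are hypothetical.
func saveAll() (*Output, error) {
    result, err := doSomething()
    if err != nil {
        return nil, err
    }

    data, err := processResult(result)
    if err != nil {
        return nil, err
    }

    output, err := saveData(data)
    if err != nil {
        return nil, err
    }
    return output, nil
}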

Not a joke. This is a normal Tuesday in a Go codebase. You get used to it. You don’t have to like it.

It’s verbose. There’s no getting around that. A function that does four things will have four if err != nil blocks and it will feel repetitive in a way that makes developers coming from Python or Kotlin visibly uncomfortable. The Go community has made peace with it by arguing that explicit error handling forces you to think about failure paths. That's true. It's also a lot of typing.

Generics arrived in Go 1.18 and they’re still polarizing. Before generics, writing a reusable data structure meant either duplicating code for every type or using interface{} and losing type safety. Generics fixed the worst of that but the implementation has rough edges and the community is still figuring out the idioms. If you're coming from a language with a rich type system, Go's type story feels like it's still catching up.

The minimal philosophy that makes Go readable at scale also makes it genuinely limited for certain problem domains. There’s no magic. Which means no shortcuts. Which means if you want something that doesn’t exist in the standard library, you build it or find a library and hope it’s maintained. For teams that rely on rich framework ecosystems (the Spring Boots and Djangos of the world), Go’s “just use the standard library” energy can feel like being handed a hammer and told to build a house.

Go is also not Python. That sounds obvious but it matters. If your backend is mostly data science glue code, model serving wrappers, or anything that touches pandas and numpy, Go will make you miserable. Python’s ML ecosystem is 15 years deep and nothing is close. Switching to Go for an ML-adjacent service isn’t engineering discipline; it’s self-sabotage.

Same story for fullstack or TypeScript-heavy teams. The Node and TypeScript ecosystem for API development has matured significantly. End-to-end type safety from database to frontend, shared types between client and server, a massive library ecosystem: if that’s your world, Go doesn’t improve it. It just makes it different and harder to hire for.

The hiring question is real at smaller companies. Go engineers exist, but the talent pool is smaller than Java or JavaScript. If you’re a 12-person startup and your entire team knows Node, rewriting services in Go because you read a blog post is how you create a bus factor problem. The language needs to match the team, not just the benchmark.

Go is a tool with a specific shape. It fits the infrastructure-heavy, concurrency-intensive, cloud-native backend world almost perfectly. It fits a lot of other worlds poorly, and it doesn’t apologize for that.

Knowing the difference is the actual skill.

Where Go is quietly winning right now

The benchmark debates are fine. The goroutine explainers are useful. But the most convincing argument for Go isn’t theoretical; it’s the list of companies that bet their infrastructure on it and didn’t regret it.

Cloudflare runs a significant portion of their edge network in Go. Their engineering blog has post after post about replacing C++ and Lua components with Go services that are easier to maintain, cheaper to operate, and fast enough that the performance tradeoff never materialized. For a company whose entire business model is “be faster than the internet,” that’s not a casual endorsement.

Uber migrated large parts of their backend to Go and published the Uber Go Style Guide, a living document their engineers actually follow in production. Not a thought leadership piece. A real internal standard, made public. Stripe’s infrastructure layer, Dropbox’s backend systems, HashiCorp’s core product suite (Terraform, Vault, Consul, Nomad): all Go. These aren’t companies that adopted a trend. These are companies that operate at a scale where the wrong language choice shows up as a real number on a real bill.

The pattern across all of them is consistent. High concurrency requirements. Cost pressure on infrastructure. Small teams relative to system complexity. Operational reliability as a non-negotiable. Go fits that profile so well it almost feels designed for it, because it was.

Fintech is where Go’s adoption is probably most aggressive right now. Payment orchestration systems, fraud detection engines, real-time risk scoring: these are workloads that need low latency, high throughput, and the kind of predictable performance that doesn’t degrade under load. A GC pause at the wrong moment in a payment flow isn’t a benchmark blip. It’s a failed transaction and a support ticket. Go’s runtime behavior is predictable enough that fintech teams trust it with the code path that touches money.

DevOps tooling is the other obvious stronghold. If you’re writing a CLI tool, a Kubernetes operator, a custom controller, or anything that needs to ship as a single binary across multiple platforms, Go is the default answer for most platform teams. The cross-compilation story is genuinely good: from one machine, setting GOOS and GOARCH on go build produces binaries for Linux, macOS, and Windows. That matters when your tool needs to run everywhere without a runtime installation requirement.
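Cross-compiling is just `go build` with two environment variables set per target, so a release helper can be a few lines of Go. A minimal sketch: `buildCmd`, the `mytool` name, the `dist/` layout, and the target list are all hypothetical choices. It only prepares and prints the commands; calling `cmd.Run()` from the module root would actually build them.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// target is one GOOS/GOARCH pair.
type target struct{ goos, goarch string }

// buildCmd prepares (but does not run) `go build` for one target.
// CGO_ENABLED=0 means no C toolchain is needed for the target; on Linux
// the result is also fully statically linked.
func buildCmd(t target, name string) *exec.Cmd {
	out := fmt.Sprintf("dist/%s-%s-%s", name, t.goos, t.goarch)
	if t.goos == "windows" {
		out += ".exe"
	}
	cmd := exec.Command("go", "build", "-o", out, ".")
	cmd.Env = append(os.Environ(), "GOOS="+t.goos, "GOARCH="+t.goarch, "CGO_ENABLED=0")
	return cmd
}

func main() {
	targets := []target{{"linux", "amd64"}, {"darwin", "arm64"}, {"windows", "amd64"}}
	for _, t := range targets {
		cmd := buildCmd(t, "mytool")
		fmt.Println(cmd.Env[len(cmd.Env)-3:], cmd.Args) // dry run: show env overrides and args
	}
}
```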

The economic argument is the one that closes the conversation at the director level. Fewer machines for the same throughput. Smaller container images. Faster cold starts. Lower memory footprint per service. None of these are dramatic individually. Together, across a microservices architecture running 24/7 on cloud infrastructure with per-second billing, they compound into a meaningful cost difference. You don’t switch to Go because you love the syntax. You switch because the cloud bill arrives and someone opens a spreadsheet.

The quiet part is that this adoption is still accelerating. As AI workloads get bolted onto backend infrastructure (inference endpoints, embedding pipelines, real-time feature serving), the runtime efficiency story gets stronger, not weaker. The services wrapping those models need to be fast, cheap, and reliable. Go keeps showing up as the right answer for that layer.

It didn’t announce itself. It just kept being useful in the places that matter.

Go didn’t win a beauty contest. It won a war of attrition.

There was no moment. No keynote where a charismatic founder declared Go the future and the crowd went wild. No viral framework that made every JavaScript developer suddenly want to learn a compiled language. Just engineers, one team at a time, running into the same walls with the same stacks and finding that Go had quietly already solved the problem.

That’s a different kind of winning. It’s slower. It’s less exciting to write about. But it’s also much harder to reverse.

The traditional stacks aren’t dead. Java still runs a significant portion of the world’s financial infrastructure, and it’s not going anywhere. Python owns ML and data, and that grip is tightening, not loosening. Node still makes sense for teams where TypeScript is already the lingua franca and the workloads fit. None of these languages failed. They just have a new neighbor that’s better at a specific set of problems, and that neighbor keeps getting more relevant as those problems become more common.

The industry shift that’s actually happening, quietly and without a conference talk, is from “which language is most expressive” to “which language keeps production stable at scale.” That question has a different answer than it did ten years ago. Go benefits from that change more than almost any other language right now.

Here’s the slightly spicy take to end on: in five years, engineers who know Go well and understand distributed systems are going to have the same energy that senior Rails developers had in 2012. Right place, right time, right tool. The timing feels about right.

Or I’m completely wrong and Rust takes everything. Tell me in the comments.

Helpful resources

Go official documentation
Effective Go: the canonical style reference
Go by Example: best hands-on intro
Uber Go Style Guide: real production standards
Cloudflare engineering blog: Go posts
CNCF landscape: count the Go projects
r/golang: community takes, good and bad

They fired the devs. The AI broke production. Now they’re begging them back.

Sat, 06 Jun 2026 15:04:50 +0000

The AI replacement experiment ran at scale. The results are in. Nobody’s writing a press release about what they found.

Nobody announced the plan out loud. But somewhere in 2024, the same decision got made at a hundred companies simultaneously. Developers were expensive. AI coding tools were getting genuinely impressive. The math looked obvious on a slide deck.

Cut headcount. Ship with AI. Protect margins. Simple.

Around 124,000 software developers got laid off across the industry that year. Amazon, Microsoft, Meta, Salesforce: the list was long and the announcements came fast. The narrative was confident. AI would write the code. Humans were the bottleneck.

That narrative is now quietly, awkwardly falling apart.

Gartner recently estimated that 50% of companies that laid off workers because of AI will rehire for the exact same roles by 2027. Around 40% of new hires at some companies are already former employees who were shown the door after the AI pivot. Google rehired roughly 20% of its 2025 engineering intake from people it had previously cut.

Nobody’s throwing a party about this. There are no press releases celebrating the return of the developers. No LinkedIn post from a CTO saying “we were wrong.” But the hiring data tells the story that the PR team won’t.

TL;DR: AI replaced the code-writers. It couldn’t replace the context-holders: the people who know why the system works the way it does, where the bodies are buried, and which part of the codebase you absolutely do not touch on a Friday. Now those people are getting their calls returned.

The theory was clean. The codebase was not.

The logic behind the layoffs was reasonable on the surface. AI tools generate code fast. If you reduce the humans writing code and increase the AI writing it, output stays roughly the same and the salary line goes down. Clean arbitrage.

The theory had one flaw that took about eighteen months to become undeniable.

Writing code and developing software are not the same thing.

AI can produce code. It produces it fast and it produces a lot of it. What it produced in practice was code that worked in isolation and quietly detonated when it touched a real system. IBM research found that four out of ten development teams reported compatibility issues when integrating AI-generated code into existing infrastructure. Not “it threw a warning.” Integration failures. The kind that take down services.

The syntax was fine. Gartner found that more than 50% of errors in AI-generated code were related to missing business context, not algorithm bugs, not type errors, not off-by-one mistakes. The code was technically correct in a vacuum and wrong in the actual system it was dropped into.

The AI didn’t know why the payment service uses polling instead of webhooks. It didn’t know that decision came from a vendor bug years ago that cost the company a full weekend of downtime. It didn’t know that two services share a cache, that a certain table gets locked during batch jobs, or that the “legacy” module nobody touches is legacy for a reason.

It read the codebase. It did not understand the history of the codebase. Those are completely different things.

Think of it like hiring a contractor who builds exactly to spec, fast and cheap, but has no idea the building has a load-bearing wall that’s not on any blueprint. The wall exists because a structural engineer made a call in 2019 and never wrote it down. The contractor removes it. The building does not fall immediately. It falls six months later when someone adds weight to the second floor.

The developers who got laid off were the ones who knew about the wall. The AI read the blueprints and had no idea the wall existed.

The productivity math that nobody checked twice

The ROI case for replacing developers with AI was built on a specific assumption: that AI tools would make the remaining engineers dramatically faster, or eliminate the need for as many of them entirely. Either way, output goes up and costs go down.

Neither half held up.

A study found that seasoned engineers were actually 19% slower when using AI tools compared to working without them. Not junior devs getting overwhelmed. Senior engineers, the exact people you’d expect to extract the most value from AI tooling, moving slower because of it. The tools generated suggestions that looked plausible on the surface and required time-consuming verification underneath. Reading AI output carefully enough to trust it turns out to take longer than just writing the thing yourself.

The error rate compounded the problem.

AI-generated code contained up to 1.7x more errors than human-written code. Not slightly more. Nearly double. And those errors weren’t always obvious. They were the kind that pass a code review, survive testing, and surface in production three weeks later when a specific edge case finally triggers them.

Teams that adopted AI tools heavily ended up maintaining 38% more code than before. The AI shipped fast. The humans then spent months cleaning up what shipped.

The intern analogy is almost too accurate here. You hired someone who types extremely fast, never gets tired, and produces a lot of output. The problem is that a senior engineer is now spending the majority of their day reviewing that output instead of building anything. You didn’t reduce the senior engineering workload. You redirected it toward something less valuable and more exhausting.

Congrats. You now have 40% more code, zero new features, and one very tired staff engineer.

A GitHub study found that 49% of teams reported a decrease in real productivity after leaning heavily into AI code generation. Not a slowdown in growth. An actual decrease. The companies that moved fastest on the AI-first bet were the ones watching their best engineers burn out on review work while the backlog stayed exactly where it was.

The cost calculation didn’t just fail to improve. It inverted. More code to maintain, more errors to catch, more senior attention consumed by supervision work that hadn’t existed before. The salary savings from the layoffs got eaten by the hidden cost of running a permanent AI audit operation inside the engineering org.

The self-correction problem nobody talks about

Here’s the detail that changes the entire calculation. The one that didn’t make the headlines because it’s less dramatic than a layoff announcement but more important than anything else in this conversation.

Princeton researchers found that AI models failed to self-correct in more than 60% of cases even when explicitly asked to review their own code.

Let that sit for a second.

You can tell the AI “check your work.” It will check its work. It will tell you the work looks fine. And in 60% of cases where the work is not fine, it will still tell you the work looks fine. Confidently. With no visible uncertainty. The same energy it uses when it’s actually right.

This is not a minor limitation. It’s a structural characteristic of how these tools work.

Human engineers accumulate judgment from failure.

The feedback loop is: write code → run it → watch it break → understand why → update your mental model → write better code next time.

That loop is uncomfortable and slow and it’s also exactly how engineers develop the instincts that make them valuable. Every production incident a developer survives makes them slightly harder to fool the next time.

AI doesn’t have that loop. It generates output at the same confidence level regardless of whether the output is correct. The mistakes it makes on day one of a project are structurally identical to the mistakes it makes on day ninety. It doesn’t accumulate experience from failures because it doesn’t experience failure. It just generates the next token.

Think about the senior dev who’s been on your team for four years. Ask them why the auth service has that weird retry logic and they’ll tell you a story about a cascade failure, a 3am incident call, a decision made under pressure that turned out to be right, and three things they’d do differently now. That story is not in the codebase. It’s not in the docs. It exists entirely in their head and it shapes every decision they make.

The AI read the README. It did not survive the incidents.

This means AI-generated code requires permanent human supervision. Not as a transitional phase while the tools mature. Not something that gets better as the models improve. A fundamental characteristic of a system that generates confident output without closing the feedback loop that produces judgment.

Someone has to close that loop. That someone is a developer.

The companies that laid off their developers didn’t eliminate the need for loop-closing. They just left the loop open and hoped nobody would notice until the quarterly numbers came in.

What the rehiring actually reveals

Nobody is issuing a correction. That’s not how large organizations handle being wrong about a strategy that made headlines. What they’re doing instead is more revealing than any admission would be: they’re showing it in job postings.

The boomerang is already moving. Around 40% of new hires at some companies are former employees who were let go after the AI pivot. Google rehired roughly 20% of its 2025 engineering intake from people it had previously cut. No announcement. No acknowledgment. Just a recruiter email and a slightly awkward first week back.

But here’s what’s actually interesting about who is getting rehired.

Not junior engineers to ship new features. Senior engineers who understand existing systems, can supervise AI output intelligently, and can catch the class of errors that AI produces reliably but cannot catch itself.
Not warm bodies to fill seats. More than 54% of companies indicated they plan to specifically increase senior developer hiring while reducing junior positions.

The structure of engineering teams is changing but in the opposite direction from what the original thesis predicted. AI isn’t replacing experienced engineers. It’s replacing the entry-level work that used to build the pipeline of future experienced engineers. Which sounds like a solved problem until you realize you’ve stopped planting trees and are now surprised there’s no shade.

The boomerang hires succeed for one specific reason. They already know the context the AI cannot learn from reading the codebase. They know why certain architectural decisions were made, what the system was designed to handle at scale, where the dangerous assumptions live, and which services will silently misbehave if you change the wrong config value. That knowledge doesn’t exist in any file the AI can access. It exists in the heads of the people who were present for the history of the system.

Institutional memory turns out to be an actual competitive asset. Who knew.

The companies that moved fastest to replace developers with AI are now the ones moving fastest to hire them back. Not because AI coding tools don’t work. They work. They generate code quickly and handle repetitive tasks well and genuinely accelerate certain parts of the development workflow. But they work as a layer on top of human judgment, not as a replacement for it.

The experiment ran at scale with real consequences. The results are visible in the hiring data even if they’re nowhere in the press releases.

So what do you actually do with this?

The “90% replacement” thesis that was making the rounds a few years ago was wrong in a specific and instructive way. It assumed that writing code was the core value engineers provided. Turns out the core value was something harder to name and much harder to replicate.

Understanding systems. Maintaining context over time. Catching the errors that confident tools produce invisibly. Making judgment calls that require knowing things that aren’t written down anywhere. Closing the feedback loop that turns failure into instinct.

None of that is replaceable by current AI tools. It may not be replaceable by near-future AI tools either, because it depends on the kind of accumulated, context-specific knowledge that only comes from being present for the history of a system. You can’t fine-tune your way into knowing why the checkout flow has that bizarre edge case handling; you had to be in the room when the decision got made.

The real slow-burn crisis isn’t senior developers getting replaced. It’s the junior pipeline collapsing. AI is doing the entry-level work that used to teach people how systems break, how production behaves differently from staging, and how to develop the instincts that eventually make someone a senior engineer worth rehiring. That pipeline doesn’t disappear overnight. It disappears over five years, quietly, and then suddenly every company is competing for the same shrinking pool of experienced engineers who survived the transition.

That’s the uncomfortable part of the story nobody’s writing the press release about either.

For the developers reading this, the ones who kept learning during the uncertainty, kept building context, and kept sharpening the skills that don’t show up in a GitHub commit: you’re the ones getting the calls now. The recruiters reaching back out aren’t doing you a favor. They’re correcting a calculation error.

The question worth sitting with: what does the knowledge you have about your current system look like if you tried to write it down, and how much of it exists nowhere except in your own head?

That’s not a liability. That’s the job.

Drop a comment with your take: are companies actually learning from this, or are we six months away from the next round of “AI will replace developers” headlines?


I replaced GitHub Copilot with 3 tools. My team noticed within a week.

Thu, 21 May 2026 06:27:02 +0000

Cursor, Claude Code, and Windsurf didn’t just change how I code; they changed what my PRs look like.

The Copilot breakup nobody talks about

I didn’t plan to replace GitHub Copilot. It was fine. It’s still fine. But “fine” stopped being good enough somewhere around month three of watching teammates ship faster than me while I was still waiting for an autocomplete suggestion that missed the point.

So I started experimenting. Not with one tool. With everything. Forty-something IDEs, agents, plugins, and CLI tools over four months, run against real work: not demos, not tutorials, but actual PRs that had to pass review. Most of them lasted a week. A few lasted a day. One deleted a file I needed and I don’t want to talk about it.

Three survived.

Cursor for multi-file feature work. Claude Code for everything that lives in the terminal. Windsurf for the days when I need to stay in flow without managing the AI every five minutes. By the end of week one my team lead asked what I’d changed. By week three two teammates had switched.

This isn’t a sponsored comparison post. None of these companies know I exist. It’s just what happened when I stopped treating AI coding tools as a category and started treating them as team members with different strengths.

TL;DR: Copilot is fine for autocomplete. If that’s still your whole AI coding stack in 2026 you’re leaving serious velocity on the table. Cursor, Claude Code, and Windsurf each own a different slot in the workflow and together they cover everything Copilot never could.

Why Copilot stopped being enough

Let me be fair to Copilot first. It’s not bad. For a dev who mostly works in one file at a time, writes straightforward code, and doesn’t need much more than smart autocomplete, it does the job. I used it for over a year and I was mostly happy.

The problem isn’t what Copilot does. It’s what it doesn’t do.

Copilot watches what you’re typing and tries to finish the sentence. That’s the whole model. It doesn’t know why you’re writing what you’re writing. It doesn’t know what the rest of the codebase looks like. It doesn’t know that the function you’re building has to fit into an auth system three files away or that your team has a convention for error handling that isn’t in any docs. It makes educated guesses based on what’s visible in the current file and what it saw during training.

For a while that’s enough. Then your codebase grows. Your features get more interconnected. A change to one interface ripples through six files. A refactor touches the API layer, the service layer, and the tests simultaneously. And suddenly you’re doing all the thinking that the AI should be helping with, while it cheerfully suggests the wrong variable name.

The real cost isn’t the bad suggestions. It’s what you do with them. You stop, evaluate, reject, retype. You context switch out of the problem you were solving to manage the tool that was supposed to help you solve it. That friction is small per instance and significant over a day.

I started noticing it when I’d spend twenty minutes on something I expected to take five. Not because the problem was hard. Because I was fighting my tools to get there.

That’s when I started looking for something different. Not a better autocomplete. A different category of help entirely.

Cursor: the IDE that started arguing back

I thought AI-assisted coding meant better autocomplete. Then Cursor refactored a function I didn’t ask it to touch, the PR passed review without a single comment, and I had to sit with that for a minute.

Cursor isn’t a smarter Copilot. It’s a different thing entirely. Where Copilot watches what you’re typing and tries to finish the sentence, Cursor reads your whole codebase and forms opinions about it. You can ask it to build a feature and it’ll touch six files, write the tests, and explain what it changed and why.

What I actually use it for:

Multi-file edits: this is where it earns its keep. I don’t use Cursor for writing single functions. I use it when a change needs to ripple across the codebase, like updating an interface, migrating an API, or refactoring auth logic. It plans the changes, shows you the diff across every affected file, and lets you approve before anything gets written.
.cursorrules: drop this file in the root of your project. Cursor reads it at the start of every session (preferred patterns, things to avoid, naming conventions) and actually respects that context every time.

# .cursorrules
You are working on a Node.js REST API.
Always use async/await. Never use callbacks.
Prefer explicit error handling. Never swallow errors silently.
Add JSDoc comments to all exported functions.

This alone cuts the back-and-forth in half.

Cmd+K inline editing: highlight a block, hit Cmd+K, describe what you want. It rewrites in place. No sidebar, no context switching, no copy-paste. Fastest way to refactor something small without losing your train of thought.
Agent window (Cursor 3): in April 2026, Cursor shipped a new interface built from scratch around agents. Multiple agents run in parallel across different repos, all visible in one sidebar. You dispatch tasks, review diffs, approve changes. It looks less like a code editor and more like an engineering manager’s dashboard. In a good way.
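To make the .cursorrules file above concrete, here is a hypothetical TypeScript sketch of the kind of code those four rules steer toward: async/await instead of callbacks, errors thrown rather than swallowed, and JSDoc on every exported function. The User, UserStore, NotFoundError, and getUserName names are invented for illustration; they are not part of Cursor or any library.

```typescript
/** Minimal user record used by this sketch (hypothetical shape). */
export interface User {
  id: string;
  name: string;
}

/** Hypothetical async data source standing in for a real database client. */
export interface UserStore {
  findById(id: string): Promise<User | undefined>;
}

/** Thrown when a lookup finds nothing, so a route handler can map it to a 404. */
export class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
  }
}

/**
 * Looks up a user's display name.
 * Uses async/await (no callbacks) and lets a missing user surface as an
 * explicit error instead of returning a silent fallback value.
 * @param store - the data source to query
 * @param id - the user ID to look up
 * @returns the user's name
 * @throws NotFoundError if no user has the given ID
 */
export async function getUserName(store: UserStore, id: string): Promise<string> {
  const user = await store.findById(id);
  if (user === undefined) {
    throw new NotFoundError(`user ${id} not found`);
  }
  return user.name;
}
```

The design choice the rules push for is visible in the missing-user path: instead of returning an empty string that a caller might mistake for a real name, the function throws a named error the HTTP layer can translate deliberately.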

I still write code manually. Cursor doesn’t replace that. But for anything involving more than one file, more than one decision, or more than five minutes of thinking, I hand it off and review the output. That’s a different relationship with your tools than most devs are used to, and it takes about a week to actually trust it.

Worth it.

cursor.com

Claude Code: the terminal agent I didn’t know I needed

Most AI coding tools live inside your editor. Claude Code lives in your terminal. No IDE. No sidebar. No chat window. You talk to it like a senior engineer who already read the repo, and it writes files, runs commands, fixes test failures, and ships. It’s not a chatbot with code awareness. It’s a collaborator.

The first time it ran a full test suite, found a failing edge case I hadn’t noticed, fixed it, and committed the fix while I was reviewing a completely different PR, I had to close my laptop and think about my life choices for a moment.

What I actually use it for:

Natural language task execution: you describe what you want in plain English. It reads the repo, makes a plan, executes it, and tells you what it did. No hand-holding, no step-by-step prompting.

$ claude "refactor the auth middleware to use JWT RS256,
run the tests, and fix anything that breaks"

It reads the codebase, plans the changes, runs the tests, and iterates on failures without you touching anything.

Terminal-native workflow: Claude Code lives where backend and DevOps work actually happens. No tab switching, no copy-pasting output into a chat window, no losing context. You stay in the terminal, it stays in the terminal, and the whole session feels like pairing with someone who never gets tired or distracted.
MCP tool integrations: Claude Code connects to external tools such as GitHub, databases, and deployment pipelines via MCP (the Model Context Protocol). You can give it real reach beyond just the codebase, and it handles multi-step workflows that would normally take you 20 manual commands.
Computer use: on Mac, Claude Code can open apps, click through UI, take a screenshot, and verify the result. Build a feature, launch it, test it visually from one terminal session. That one still gets me every time.
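For a sense of what “connects via MCP” means on the wire: MCP messages are JSON-RPC 2.0, and invoking a tool is a tools/call request carrying the tool’s name and its arguments. The sketch below only builds and serializes such a request. The tool name create_issue and its arguments are hypothetical, and a real client would first complete MCP’s initialize handshake and then send messages over a transport such as stdio, where each JSON-RPC message sits on its own line.

```typescript
/** A JSON-RPC 2.0 request: the envelope MCP messages use. */
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

/** Params of an MCP tools/call request. */
interface ToolCallParams {
  name: string;
  arguments: Record<string, unknown>;
}

let nextId = 1;

/** Builds a tools/call request with a fresh, increasing request ID. */
export function buildToolCall(
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest<ToolCallParams> {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical tool exposed by a GitHub-style MCP server.
const request = buildToolCall("create_issue", {
  repo: "example/app",
  title: "Flaky auth middleware test",
});

// JSON.stringify without indentation yields a single line, which is what
// newline-delimited stdio framing needs.
console.log(JSON.stringify(request));
```

The server’s reply would come back as a JSON-RPC response carrying the same id, which is how a client matches results to the calls it sent.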

I still keep human oversight on anything production-critical. But for local dev, test environments, and deploy pipelines, it runs and I review. That’s the whole loop.
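The “it runs, I review” loop can be modeled as a bounded run-fix-retry cycle that always ends with a person. This is a hypothetical TypeScript sketch, not how Claude Code is implemented: runTests and attemptFix are placeholder callbacks you would supply, and maxAttempts is an assumed cap so an agent cannot retry forever.

```typescript
/** Result of one test run (hypothetical shape). */
export interface TestResult {
  passed: boolean;
  failures: string[];
}

/** What gets handed to the human reviewer when the loop stops. */
export interface AgentReport {
  status: "ready_for_review" | "needs_human";
  attempts: number; // number of fix attempts made
  remainingFailures: string[];
}

/**
 * Runs the tests; on failure, tries a fix and re-runs, up to maxAttempts fixes.
 * Both exits produce a report for a person: green runs go to review,
 * and anything still failing is escalated instead of retried indefinitely.
 */
export async function runFixLoop(
  runTests: () => Promise<TestResult>,
  attemptFix: (failures: string[]) => Promise<void>,
  maxAttempts = 3,
): Promise<AgentReport> {
  let result = await runTests();
  let attempts = 0;
  while (!result.passed && attempts < maxAttempts) {
    attempts++;
    await attemptFix(result.failures);
    result = await runTests();
  }
  return {
    status: result.passed ? "ready_for_review" : "needs_human",
    attempts,
    remainingFailures: result.failures,
  };
}
```

The important design choice is that neither exit merges or deploys anything on its own; the loop only decides whether to hand a person a passing change or a stuck one.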

docs.anthropic.com/claude-code

Windsurf: the dark horse nobody warned me about

Everyone I know is on Cursor. Windsurf kept coming up in my Discord as the tool people switched to and then got annoyingly quiet about, like they’d found something they didn’t want to share yet.

I tried it out of mild spite. They were right and I was annoyed about it.

Windsurf isn’t trying to beat Cursor on features. It’s trying to beat it on feel. The whole thing is built around Cascade, an agentic AI that doesn’t wait for you to ask it something. It watches what you’re doing, understands the context, and acts. The difference between using Cursor and using Windsurf is the difference between having a very capable assistant and having a very capable assistant who also pays attention.

What I actually use it for:

Cascade agent: multi-file edits, terminal commands, and test runs, all without you directing every step. You describe the goal, Cascade figures out the path. It feels less like “here’s what I’m going to do, approve?” and more like “here’s what I did, check it.” That distinction matters more than it sounds when you’re deep in a feature and don’t want to break flow.
Codemaps: Windsurf indexes your repo and builds a visual map of your architecture (how files relate, where the entry points are, what connects to what). Genuinely useful when you’re jumping into a codebase you didn’t write, and it gives the AI accurate context on large projects without you having to explain the structure manually.
Drag-and-drop screenshot → UI generation: drop a screenshot of a design into Cascade and it generates the frontend code. For anyone doing UI work this is the kind of feature that makes you feel like you skipped two steps.
$15/month: Cursor Pro is $20. Windsurf Pro is $15. Same capability tier, lower price. Not the reason to pick it, but not nothing either.
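The core idea behind a Codemaps-style view (how files relate, where the entry points are) can be sketched in a few lines. This is a hypothetical TypeScript illustration, not how Windsurf implements Codemaps: it scans an in-memory map of file paths to source text for simple relative imports of the form import … from "./x", builds an adjacency list, and treats files that nothing imports as entry points. It assumes extension-less imports of .ts files; a real indexer would use a proper parser and full module resolution.

```typescript
/** A rough dependency map: file -> files it imports, plus likely entry points. */
export interface CodeMap {
  edges: Record<string, string[]>;
  entryPoints: string[];
}

/**
 * Resolves a relative specifier like "./auth" or "../lib/db" against the
 * importing file's directory. Assumes extension-less imports of .ts files.
 */
function resolveImport(fromFile: string, spec: string): string {
  const parts = fromFile.split("/").slice(0, -1); // directory of the importer
  for (const segment of spec.split("/")) {
    if (segment === "..") {
      parts.pop();
    } else if (segment !== ".") {
      parts.push(segment);
    }
  }
  return parts.join("/") + ".ts";
}

/** Builds a CodeMap from an in-memory map of file path -> source text. */
export function buildCodeMap(files: Record<string, string>): CodeMap {
  const edges: Record<string, string[]> = {};
  const imported = new Set<string>();
  for (const file of Object.keys(files)) {
    const source = files[file] ?? "";
    // Matches simple static imports such as: import { x } from "./auth";
    const importRe = /import\s+[^'"]*?from\s+['"](\.{1,2}\/[^'"]+)['"]/g;
    const deps: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = importRe.exec(source)) !== null) {
      const target = resolveImport(file, match[1]);
      deps.push(target);
      imported.add(target);
    }
    edges[file] = deps;
  }
  // Files that no other file imports are treated as entry points.
  const entryPoints = Object.keys(files)
    .filter((file) => !imported.has(file))
    .sort();
  return { edges, entryPoints };
}
```

Fed the output of a directory walk, something like this gives a rough first picture of an unfamiliar repo, and the entry points it reports are usually the right place to start reading.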

As of February 2026, Windsurf sits at number one in the LogRocket AI Dev Tool Power Rankings, ahead of Cursor and GitHub Copilot. And with the Cognition AI acquisition bringing Devin integration into the roadmap, it’s about to get significantly more powerful.

I run Windsurf when I want to stay in flow on a feature without stopping to manage the AI. It gets out of the way in a way that Cursor, for all its power, sometimes doesn’t.

windsurf.com

How Cursor, Claude Code, and Windsurf work together

Each tool on its own is solid. Stacked right, they cover every layer of the workflow without overlap and without gaps. It’s not about having three tools open at once; it’s about each one owning a different job.

Here’s how they actually fit into my day:

Windsurf for active feature work: when I’m building something new and want to stay in flow, Windsurf is open. Cascade handles the context, suggests the next move, runs the changes. I steer, it executes. I don’t stop to manage it and it doesn’t ask me to.
Cursor for multi-file refactors and reviews: when a change is complex, touches multiple services, or needs careful diff review before anything gets committed, I reach for Cursor. The agent window and .cursorrules context make it the right tool for surgical, deliberate work where I want to see exactly what’s changing before it changes.
Claude Code for everything terminal: tests, deploys, migrations, environment setup, CI debugging. Anything that lives in the command line. Claude Code handles the full sequence while I’m doing something else, then tells me what it did.

The real-world flow looks like this:

Feature branch
→ Windsurf/Cascade writes the feature
→ Cursor agent reviews multi-file diffs
→ Claude Code runs tests + deploys to staging
→ Push PR

Three tools, zero browser tabs open to paste errors into.

The week my team lead asked what I’d changed, this was the answer. Not one tool. Not a new IDE. A stack where every part of the workflow had the right tool behind it. Windsurf for flow, Cursor for precision, Claude Code for the terminal. Nothing falling through the cracks.

That’s the whole thing. It took four months and forty tools to figure out. Now it just runs.

Final thoughts: you don’t need 40 tools

I didn’t set out to replace Copilot. I set out to stop feeling like my tools were slowing me down. Those are different problems and they lead to different solutions.

Copilot isn’t the villain here. It’s a solid tool that does exactly what it promises. The issue is that what it promises stopped being enough once the rest of the ecosystem caught up and then kept going. Standing still while everything around you accelerates is its own kind of falling behind.

After four months and forty experiments, the three tools that actually changed my output were:

Cursor for multi-file feature work and agentic refactors
Claude Code for terminal tasks, deploys, and anything CLI
Windsurf for staying in flow with Cascade running alongside you

None of them require you to change how you think about code. They just remove the parts that were slowing you down: the context switching, the tab juggling, the manual command sequences, the back-and-forth with a tool that doesn’t know what you’re building or why.

My PRs got cleaner. My review cycles got shorter. My team noticed before I even said anything. That’s the only metric that matters.

If you’re still on vanilla Copilot and shipping just fine, genuinely, keep going. But if you’ve been feeling the friction and just assumed that was normal, it isn’t. The gap between what Copilot does and what this stack does is real and it’s only getting wider.

Start with one. Cursor if you spend most of your time in the editor. Claude Code if you live in the terminal. Windsurf if you keep getting pulled out of flow. Run it for two weeks on real work. You’ll know by then.

Helpful resources and links

Cursor: multi-file AI editor with agent window and .cursorrules support
Cursor 3 announcement: the April 2026 agent-first rebuild
Claude Code: terminal-native AI coding agent
Windsurf: agentic IDE with Cascade and Codemaps
GitHub Copilot: still worth knowing what you’re moving away from
LogRocket AI Dev Tool Power Rankings: February 2026 edition
r/cursor: real-world Cursor workflows and community feedback

Cursor, Claude Code, Windsurf?! My AI coding stack after 40 dev experiments

Tue, 19 May 2026 03:22:53 +0000

For devs drowning in AI tool hype who just want to know what actually stuck

40 tools, 3 survivors

If you’ve ever installed a new AI coding tool on a Monday, spent the whole evening configuring it, been genuinely impressed for about three days, and then quietly gone back to your old setup by Thursday, you’re not alone. Some devs binge Netflix. I install AI tools.

I went through a phase (fine, a four-month spiral) where I tested nearly every tool promising to make me ship faster, code smarter, or finally stop copy-pasting stack traces into a browser tab. Most of them were fine. A few were actually good. One deleted a file I needed. But out of the 40+ IDEs, agents, plugins, extensions, and CLI tools I ran through real work projects (not toy repos, actual PRs), three made it through.

Cursor. Claude Code. Windsurf.

This isn’t a “best of 2026” roundup written by someone who ran each tool for 20 minutes. It’s what survived four months of daily use across real backend work, DevOps tasks, and the occasional frontend emergency. Tools I open every morning without thinking about it. The ones that made my workflow cleaner, faster, and way less frustrating.

If you’re still on vanilla Copilot wondering why things still feel slow, or you’re deep in comparison hell and just want someone to cut through it, this one’s for you.

TL;DR: Cursor handles multi-file feature work and agentic refactors. Claude Code owns the terminal: tests, deploys, migrations, anything CLI. Windsurf runs Cascade in the background while you stay in flow. Together they cover every slot in a serious dev workflow. Separately, each one is still better than most of the 37 tools I’m not writing about.

Let’s get into it.

Why your AI tool stack actually matters now
Cursor: the IDE that started arguing back
Claude Code: the terminal agent I didn’t know I needed
Windsurf: the dark horse nobody warned me about
How Cursor, Claude Code, and Windsurf work together
Final thoughts: you don’t need 40 tools
Helpful resources and links

Why your AI tool stack actually matters now

Here’s the thing nobody tells you when you first install an AI coding tool: the tool isn’t the hard part. The hard part is figuring out which problems you actually have, and whether what you just installed solves any of them.

Most devs treat AI tooling like a plugin they’ll figure out later. They install Copilot, use it for autocomplete, occasionally ask it to explain a regex, and call it done. That worked fine in 2024. In 2026 it’s the equivalent of using a GPS only to check if it’s raining.

The conversation has moved. Agents are the standard now, not the novelty. The tools worth your time aren’t the ones that finish your line of code; they’re the ones that read your whole codebase, plan a multi-step task, execute it, run your tests, fix the failures, and come back when it’s done. That’s not autocomplete. That’s a different category of tool entirely.

And the cost of getting this wrong isn’t just a wasted subscription. It’s the context-switching penalty. Every time you break flow to copy an error into a chat window, switch tabs to ask a question you should be able to ask inside your editor, or manually run a command sequence an agent could handle, that’s compounding friction. The DORA 2025 report found that high-performing engineering teams are pulling significantly ahead of the rest, and tooling decisions are a real part of that gap.

The developers figuring out the right AI stack right now aren’t the ones with the most tools installed. They’re the ones who stopped treating AI like a fancy tab completion and started treating it like a collaborator with a job description. You wouldn’t hire one person to do your backend, frontend, DevOps, and code review. Same logic applies here.

That’s the frame for everything that follows: not which AI tool is best, but which tool owns which job, and how you stack them so nothing falls through the cracks.

Cursor: the IDE that started arguing back

What I actually use it for:

Multi-file edits: This is where it earns its keep. I don’t use Cursor for writing single functions. I use it when a change needs to ripple across the codebase: updating an interface, migrating an API, refactoring auth logic. It plans the changes, shows you the diff across every affected file, and lets you approve before anything gets written.
.cursorrules: Drop this file in the root of your project. Cursor reads it at the start of every session (preferred patterns, things to avoid, naming conventions) and actually respects that context every time.

# .cursorrules
You are working on a Node.js REST API.
Always use async/await. Never use callbacks.
Prefer explicit error handling. Never swallow errors silently.
Add JSDoc comments to all exported functions.

This alone cuts the back-and-forth in half.

Cmd+K inline editing: Highlight a block, hit Cmd+K, describe what you want, and it rewrites in place. No sidebar, no context switching, no copy-paste. It’s the fastest way to refactor something small without losing your train of thought.
Agent window (Cursor 3): In April 2026, Cursor shipped a new interface built from scratch around agents. Multiple agents run in parallel across different repos, all visible in one sidebar. You dispatch tasks, review diffs, approve changes. It looks less like a code editor and more like an engineering manager’s dashboard. In a good way.

Worth it.

cursor.com

Claude Code: the terminal agent I didn’t know I needed

Why I stuck with it:

Terminal-native workflow: Claude Code lives where backend and DevOps work actually happens. No tab switching, no copy-pasting output into a chat window, no losing context. You stay in the terminal, it stays in the terminal, and the whole session feels like pairing with someone who never gets tired or distracted.
Natural language task execution: You describe what you want in plain English. It reads the repo, makes a plan, executes it, and tells you what it did.

$ claude "refactor the auth middleware to use JWT RS256,
run the tests, and fix anything that breaks"

It reads the codebase, plans the changes, runs the tests, iterates on failures without you touching anything.

MCP tool integrations: Claude Code connects to external tools such as GitHub, databases, and deployment pipelines through MCP (the Model Context Protocol). That gives it real reach beyond the codebase, and it handles multi-step workflows that would normally take you 20 manual commands.
Computer use: On Mac, Claude Code can open apps, click through UI, screenshot the result, and verify it worked. Build a feature, launch it, and test it visually from one terminal session. That one still gets me every time.

I still use it with human oversight on anything production-critical. But for local dev, test environments, and deploy pipelines, it runs and I review. That’s the whole loop.

docs.anthropic.com/claude-code

Windsurf: the dark horse nobody warned me about

Everyone I know is on Cursor. Windsurf kept coming up in my Discord as the tool people switched to and then got annoyingly quiet about, like they’d found something they didn’t want to share yet.

I tried it out of mild spite. They were right and I was annoyed about it.

Windsurf isn’t trying to beat Cursor on features. It’s trying to beat it on feel. The whole thing is built around Cascade, an agentic AI that doesn’t wait for you to ask it something. It watches what you’re doing, understands the context, and acts. The difference between using Cursor and using Windsurf is the difference between having a very capable assistant and having a very capable assistant who also pays attention.

What makes it essential:

Cascade agent: Multi-file edits, terminal commands, test runs, all without you directing every step. You describe the goal, Cascade figures out the path. It’s similar to Cursor’s agent mode, but the flow feels less interrupted: less “here’s what I’m going to do, approve?” and more “here’s what I did, check it.”
Codemaps: Windsurf indexes your repo and builds a visual map of your architecture (how files relate, where the entry points are, what connects to what). Useful when you’re jumping into a codebase you didn’t write, and genuinely helpful for giving the AI accurate context on large projects.
Drag-and-drop screenshot → UI generation: Drop a screenshot of a design into Cascade and it generates the frontend code. For anyone doing UI work, this is the kind of feature that makes you feel like you skipped two steps.
$15/month: Cursor Pro is $20. Windsurf Pro is $15. Same tier, comparable power, lower price. Not the reason to pick it, but not nothing either.

As of February 2026, Windsurf sits at number one in the LogRocket AI Dev Tool Power Rankings, ahead of Cursor and GitHub Copilot. And with the Cognition AI acquisition putting Devin integration on the roadmap, it’s poised to get significantly more powerful.

I run Windsurf when I want to stay in flow on a feature without stopping to manage the AI. It gets out of the way in a way that Cursor, for all its power, sometimes doesn’t.

windsurf.com

How Cursor, Claude Code, and Windsurf work together

Each tool on its own is solid. Stacked right, they cover every layer of the workflow without overlap and without gaps. It’s not about having three tools open at once; it’s about each one owning a different job.

Here’s how they fit into my actual day:

Windsurf for active feature work: When I’m building something new and want to stay in flow, Windsurf is open. Cascade handles the context, suggests the next move, runs the changes. I steer, it executes.
Cursor for multi-file refactors and reviews: When a change is complex, touches multiple services, or needs careful diff review before anything gets committed, I reach for Cursor. The agent window and .cursorrules context make it the right tool for surgical, deliberate work.
Claude Code for everything terminal: Tests, deploys, migrations, environment setup, CI debugging. Anything that lives in the command line. Claude Code handles the full sequence while I’m doing something else.

The real-world flow looks like this:

Feature branch
→ Windsurf/Cascade writes the feature
→ Cursor agent reviews multi-file diffs
→ Claude Code runs tests + deploys to staging
→ Push PR

Three tools, zero browser tabs open to paste errors into.

I stopped thinking of these as “AI tools” and started thinking of them as team members with specializations. Different strengths, different contexts, different jobs. Once you frame it that way the stack stops feeling like overkill and starts feeling obvious.

Final thoughts: you don’t need 40 tools

I didn’t set out to build some ultimate AI coding stack. I just wanted something that worked without me having to think about it every week.

After testing more than 40 tools, the three that actually made my workflow better were:

Cursor for multi-file feature work and agentic refactors
Claude Code for terminal tasks, deploys, and anything CLI
Windsurf for staying in flow with Cascade running alongside you

They’re not trying to do the same thing. They don’t step on each other. And none of them require you to change how you code; they just remove the parts that were slowing you down.

If you’re still running vanilla Copilot and calling it your AI stack, that’s fine. But you’re leaving a lot on the table. The tooling gap between devs who’ve figured this out and devs who haven’t is real, and it’s only getting wider.

Start with one. Get comfortable. Add the next. By the time you’ve run all three for a month you won’t remember what the friction felt like.

Helpful resources and links

Cursor: multi-file AI editor with agent window and .cursorrules support
Cursor 3 announcement: the April 2026 agent-first interface rebuild
Claude Code: terminal-native AI coding agent
Windsurf: agentic IDE with Cascade, Codemaps, and Devin integration incoming
DORA 2025 report: state of DevOps and AI-assisted development
LogRocket AI Dev Tool Power Rankings: February 2026 rankings
r/cursor: community discussion and real-world Cursor workflows

You’re the reason your React app is slow

Mon, 18 May 2026 05:09:41 +0000

You didn’t hit a framework limit. You wrote the bottleneck yourself and it’s been quietly billing you in FPS ever since.

There’s a specific kind of suffering that happens when you open the React DevTools Profiler for the first time on a project that’s been “running fine.” You hit record. You click a button. You stop recording. And then you just sit there, staring at a flame graph that looks like a city on fire, wondering how a todo app is re-rendering 47 components when you clicked “add item.”

That was me, about three years into thinking I was pretty decent at React.

I wasn’t bad. My components looked clean. My PRs got approved. The app shipped. But under the hood, I was doing roughly eight things wrong simultaneously, and the only reason nobody noticed was that our user base was small enough that the jank felt like a “network thing.” It was not a network thing.

The React ecosystem has an interesting culture around performance: everyone knows it matters, most articles cover the same four hooks, and almost nobody talks about the architectural decisions that create the problem in the first place. The React Compiler (a build-time tool that arrived alongside React 19) is going to paper over some of this: it automatically memoizes components and values, essentially applying useMemo and useCallback everywhere it's safe to do so. But here's the honest truth: it won't save you from architectural issues like overly broad context providers or massive component trees. You can't compile your way out of a bad design. (DEV Community)

This article is the one I wish someone had thrown at me back then. No cargo-cult hooks. No “just add React.memo” advice. Just the actual mistakes, why they happen, and what they cost you in the real world.

TL;DR: React is fast by default. You are often the problem. These 10 mistakes are the most common ways engineers, including experienced ones, quietly murder their own app’s performance, and most of them have nothing to do with the hooks you’ve been memorizing.

The re-render killers

Let’s start with the category that accounts for maybe 60% of React performance complaints I’ve seen in the wild. Not slow APIs. Not bad algorithms. Just components re-rendering when they absolutely did not need to, because of decisions made in the five seconds it took to write a JSX prop.

React’s re-render model is simple enough that it’s easy to underestimate. When React decides whether state changed, or whether a memoized component’s props changed, it compares by reference, not by value. That one sentence is responsible for more production slowdowns than any framework bug ever was. It sounds obvious until you realize how many ways you’re accidentally creating new references on every render without thinking about it. (DEV Community)

Mistake 1: Inline functions in JSX

This is the one that gets everyone eventually, usually when they’re moving fast and the code looks clean.

// you write this and it feels fine
<Button onClick={() => handleDelete(item.id)} label="Delete" />

Here’s what’s actually happening: every time the parent component renders, that arrow function is a brand new function object in memory. If Button is wrapped in React.memo, React shallow-compares its props before re-rendering it, sees a new reference, concludes “props changed,” and re-renders anyway, even if item.id hasn't moved an inch. Since the reference is always new, the child always re-renders, even when nothing meaningful has changed. (DEV Community)

The fix is boring and correct: move static handlers outside the component, and for dynamic ones that depend on state or props, reach for useCallback. But there's a catch, which brings us directly to mistake number two.
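To make the reference problem concrete, here is a plain-JavaScript sketch with no React involved. `shallowEqual` is a hypothetical helper that approximates the default check `React.memo` performs (an `Object.is` comparison of each prop), and `renderParentInline` / `renderParentStable` are made-up stand-ins for a parent producing the props it would hand to `Button`. It's a model of the comparison, not React's source.

```javascript
// Hypothetical helper approximating React.memo's default prop check:
// same keys, and every value identical under Object.is.
function shallowEqual(prevProps, nextProps) {
  const prevKeys = Object.keys(prevProps);
  if (prevKeys.length !== Object.keys(nextProps).length) return false;
  return prevKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(nextProps, key) &&
      Object.is(prevProps[key], nextProps[key])
  );
}

// Module-level handler: created once, so its reference never changes.
function handleDelete(id) {
  return `deleted ${id}`;
}

// Stand-in for a parent render with an inline arrow: a new function every call.
function renderParentInline(item) {
  return { onClick: () => handleDelete(item.id), label: "Delete" };
}

// Stand-in for a parent passing the stable handler plus the id as plain data;
// the child would call onClick(itemId) itself.
function renderParentStable(item) {
  return { onClick: handleDelete, itemId: item.id, label: "Delete" };
}

const item = { id: 42 };
console.log(shallowEqual(renderParentInline(item), renderParentInline(item))); // false
console.log(shallowEqual(renderParentStable(item), renderParentStable(item))); // true
```

Passing the id as its own prop is one way to keep the handler static; the trade-off is a slightly less direct child API in exchange for props that never change identity between renders.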

Mistake 2: Using `useCallback` as a good luck charm

So you read about inline functions, you start wrapping everything in useCallback, and you feel like you've leveled up. You haven't. You've just moved the problem around and added overhead.

useCallback only does anything useful when the component receiving that function is actually memoized (wrapped in React.memo) or uses the callback in its own dependency arrays. Otherwise you're paying the cost of memoization (React has to store the previous function, compare dependencies, and make a decision) while getting zero benefit, because the child re-renders anyway. (DEV Community)

I’ve seen codebases where someone went on a useCallback spree across the entire app, felt productive for a day, and then wondered why nothing got faster. Memoization has a cost: React must store the previous props, compare them, and make a decision, and that adds overhead. If your component is fast to render and its props change frequently, the comparison step may cost more than the render itself. (Growin)

The actual rule: useCallback is a tool for reference stability, not a performance incantation. Use it when you have a memoized child that receives the function as a prop, or when the function is in a useEffect dependency array and you want control over when the effect fires. That's basically it. Profile first, reach for it second.
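A rough simulation of why a stable reference only pays off behind a memoized child: below, a child is just a render counter, `memoize` is a hypothetical stand-in for `React.memo` (skip the render when props are shallow-equal), and `plain` models an ordinary child. None of these are React APIs; the stable arrow plays the role of what `useCallback` would return on every render.

```javascript
// Props compared key by key with Object.is, like React.memo's default check.
function propsEqual(a, b) {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((k) => Object.is(a[k], b[k]));
}

// Hypothetical stand-in for React.memo: skip the child's render when props are shallow-equal.
function memoize(renderChild) {
  let lastProps = null;
  return (props) => {
    if (lastProps !== null && propsEqual(lastProps, props)) return;
    lastProps = props;
    renderChild(props);
  };
}

// A plain (unmemoized) child renders every time its parent does.
function plain(renderChild) {
  return (props) => renderChild(props);
}

// Render the parent `parentRenders` times and count how often the child rendered.
function simulate(wrap, stableCallback, parentRenders) {
  let childRenders = 0;
  const child = wrap(() => {
    childRenders += 1;
  });
  const stableOnClick = () => {}; // what useCallback would hand back each render
  for (let i = 0; i < parentRenders; i += 1) {
    const onClick = stableCallback ? stableOnClick : () => {}; // otherwise a fresh inline arrow
    child({ onClick, label: "Delete" });
  }
  return childRenders;
}

console.log(simulate(plain, true, 5)); // 5: stable callback, no memo, nothing saved
console.log(simulate(memoize, false, 5)); // 5: memo defeated by a fresh function each time
console.log(simulate(memoize, true, 5)); // 1: memo plus stable callback
```

Only the third combination saves work, which is the rule above in miniature: the stable reference and the memoized child are one optimization, not two.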

Mistake 3: Using array index as `key` in lists

This one feels harmless until you have a list that changes (items get added, removed, or reordered) and suddenly your UI starts doing weird things. State ends up in the wrong component. Inputs keep the wrong value. Animations fire on the wrong element. You spend an hour blaming a library that did nothing wrong.

The key prop is React's identity system for list items. When you use the array index, you're telling React "the first item is always the first item, regardless of what it actually is." Reorder the list, and React thinks all the same items are still there, just with different content. It patches each existing component in place instead of moving or remounting it, which is fast but wrong: any local state stays attached to the position, not to the item.

// this looks fine and is not fine
{items.map((item, i) => <Card key={i} data={item} />)}

// this is fine
{items.map(item => <Card key={item.id} data={item} />)}

If your data genuinely has no stable IDs (which happens more than it should), generate them when the data is created, not at render time. A crypto.randomUUID() call in your fetch handler costs next to nothing. A Math.random() call inside map gives every item a new key on every render, which tells React to unmount and remount the entire list. That costs a lot.
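The key problem can be modeled without React. In this toy sketch, each rendered card is an object holding local state (a hypothetical `draft` field standing in for an input's value), and a simplified `reconcile` helper reuses whichever old instance has the same key. It is not React's reconciliation algorithm; it only shows which instance, and therefore which state, each row inherits after an item is prepended. IDs come from Node's `crypto.randomUUID()`, assigned once when each item is created.

```javascript
const { randomUUID } = require("crypto");

// Toy model: reuse the old instance with the same key, otherwise mount a new one.
function reconcile(oldInstances, items, keyOf) {
  const byKey = new Map(oldInstances.map((inst) => [inst.key, inst]));
  return items.map((item, index) => {
    const key = keyOf(item, index);
    const existing = byKey.get(key);
    if (existing) {
      // Same key: props are patched in place and local state survives.
      return { ...existing, data: item };
    }
    return { key, data: item, draft: "" }; // new key: fresh mount, empty state
  });
}

// IDs assigned once, when the data is created (e.g. in a fetch handler).
const makeItem = (label) => ({ id: randomUUID(), label });
const first = makeItem("first");
const second = makeItem("second");
const newcomer = makeItem("newcomer");

const byIndex = (item, index) => index;
const byId = (item) => item.id;

// Initial render, after the user typed a draft into the "first" card.
const initial = (keyOf) =>
  reconcile([], [first, second], keyOf).map((inst) =>
    inst.data === first ? { ...inst, draft: "note about first" } : inst
  );

// Prepend an item and re-render.
const afterIndex = reconcile(initial(byIndex), [newcomer, first, second], byIndex);
const afterId = reconcile(initial(byId), [newcomer, first, second], byId);

console.log(afterIndex[0].data.label, "|", afterIndex[0].draft); // newcomer | note about first
console.log(afterId[1].data.label, "|", afterId[1].draft); // first | note about first
```

With index keys, the draft typed into "first" ends up on "newcomer", which is exactly the "inputs keep the wrong value" symptom; with stable IDs the draft follows its item.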

All three of these mistakes share the same root: React’s rendering model is predictable once you understand it, but it punishes you quietly. No errors. No warnings. Just a Profiler graph that looks increasingly unwell.

The good news is that the React DevTools Profiler will catch all three almost immediately. Record a session, look at which components rendered (the slowest ones show up in yellow), and ask yourself: “did this actually need to re-render?” Usually the answer is no, and usually one of these three is why.

State architecture sins

Re-renders from inline functions are annoying. State architecture mistakes are a different category of problem entirely. They’re the ones that survive a code review, pass all your tests, and then slowly make your app feel like it’s running through wet concrete as the feature count grows. They’re structural. And they’re almost always invisible until you’re already in pain.

The pattern is consistent across every codebase I’ve seen it in: someone makes a reasonable decision early, the app grows around that decision, and by the time the jank is obvious there are forty components depending on the thing that’s wrong. Refactoring it feels like surgery on a patient who’s still running a marathon.

Understanding why these happen is more useful than just memorizing the fix.

Mistake 4: Putting state too high up the tree

This is the most common architectural mistake in React, and it’s almost always made with good intentions. You want state to be accessible from multiple places, so you lift it up to a common ancestor. Reasonable. Except that ancestor is now App, and every time a checkbox toggles in a deeply nested form, your entire component tree re-renders.

If your App component’s state changes, every child component re-renders even if their props didn’t change. This cascading effect kills performance at scale. The mental model that helps here: state should live as close as possible to the components that actually use it. Not one level above. Not in a global provider “just in case something else needs it later.” Right next to the thing that needs it. This is called state colocation, and it’s one of those ideas that sounds obvious until you see how rarely it’s actually practiced.

If only two components in a subtree share a piece of state, their nearest common ancestor is the right home for it, not the root. If state is only used by one component, it belongs inside that component, full stop. It also helps to split components into two types: container components that handle the logic, and presentational components that are pure display. Because presentational components have no local state, wrapping them in React.memo means they only re-render when their props actually change.

The performance difference between “state at the root” and “state colocated” can be dramatic in a large component tree. It’s also the kind of change that makes every subsequent optimization easier, because you’re no longer fighting against a re-render cascade every time you try to fix something specific.
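A back-of-the-envelope model of the cascade: the sketch below builds a hypothetical component tree out of plain objects and counts how many components re-render when state changes at a given node, assuming nothing is memoized (so a state change re-renders that component and every descendant). The tree shape and component names are invented for illustration.

```javascript
// Without memoization, a component re-renders itself and every descendant.
function countRerenders(node) {
  return 1 + node.children.reduce((sum, child) => sum + countRerenders(child), 0);
}

// Hypothetical helper to build a tree node.
const node = (name, children = []) => ({ name, children });

const checkbox = node("NewsletterCheckbox");
const settingsForm = node("SettingsForm", [node("NameField"), node("EmailField"), checkbox]);
const tree = node("App", [
  node("Header", [node("Nav"), node("UserMenu")]),
  node("Dashboard", [node("Chart"), node("Table"), node("Filters")]),
  settingsForm,
]);

console.log(countRerenders(tree)); // 12: checkbox state lifted to App
console.log(countRerenders(settingsForm)); // 4: state in the nearest common ancestor
console.log(countRerenders(checkbox)); // 1: state colocated in the checkbox
```

The numbers are tiny here, but the ratio is the point: in a real app the root's subtree is hundreds of components, while the colocated version stays at one.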

Mistake 5: Context without memoization

React Context is one of those features that feels like a complete solution right up until it isn’t. You set up a provider, everything can access your global state, PRs get merged, life is good. Then someone notices that updating the user’s theme preference is somehow causing the entire dashboard to re-render, including the charts, the sidebar, and the table that has nothing to do with theming.

Here’s why. When you pass an object as the context value which almost everyone does that object is recreated on every render of the provider component. Every consumer sees a new reference. Every consumer re-renders. Even the ones that only care about one field in that object that didn’t change.

import { createContext, useState } from 'react';

const AppContext = createContext(null);

// every render of AppProvider creates a new value object
// every consumer re-renders every time anything changes
function AppProvider({ children }) {
  const [user, setUser] = useState(null);
  const [theme, setTheme] = useState('dark');
  const value = { user, setUser, theme, setTheme };
  return <AppContext.Provider value={value}>{children}</AppContext.Provider>;
}

Every state update creates a new value object. Every context consumer re-renders, including components that only care about the theme, not the user.

Two fixes, and you’ll usually want both. First, wrap the context value in useMemo so the reference only changes when the actual data changes. Second, split large contexts into smaller ones by concern (a UserContext and a ThemeContext rather than one AppContext that holds everything). A component that only reads theme should never re-render because the user object updated.
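Here is a React-free sketch of both fixes. React compares a provider's new `value` with the previous one using `Object.is` and re-renders consumers when it differs; in the sketch, `createMemo` is a hypothetical stand-in for `useMemo` (recompute only when a dependency changes under `Object.is`), `createConsumer` counts how often a consumer would re-render, and the three provider renders (with an invented `user` object) are made-up data.

```javascript
// Hypothetical stand-in for useMemo: reuse the last value while every
// dependency is unchanged under Object.is.
function createMemo() {
  let lastDeps = null;
  let lastValue;
  return (factory, deps) => {
    const unchanged =
      lastDeps !== null &&
      deps.length === lastDeps.length &&
      deps.every((dep, i) => Object.is(dep, lastDeps[i]));
    if (!unchanged) {
      lastValue = factory();
      lastDeps = deps;
    }
    return lastValue;
  };
}

// Models a context consumer: re-renders when the value differs under Object.is.
function createConsumer() {
  let last;
  let hasValue = false;
  let count = 0;
  return {
    receive(value) {
      if (!hasValue || !Object.is(last, value)) count += 1;
      hasValue = true;
      last = value;
    },
    renders: () => count,
  };
}

const user = { name: "Ada" };
const providerRenders = [
  { user, theme: "dark" },
  { user, theme: "dark" }, // provider re-rendered, nothing changed
  { user, theme: "light" }, // only the theme changed
];

function run(strategy) {
  const userReader = createConsumer();
  const themeReader = createConsumer();
  const memoAll = createMemo();
  const memoUser = createMemo();
  const memoTheme = createMemo();
  for (const state of providerRenders) {
    if (strategy === "inline") {
      // One AppContext, fresh object literal on every render
      const value = { user: state.user, theme: state.theme };
      userReader.receive(value);
      themeReader.receive(value);
    } else if (strategy === "memo") {
      // One AppContext, value memoized on its inputs
      const value = memoAll(() => ({ user: state.user, theme: state.theme }), [state.user, state.theme]);
      userReader.receive(value);
      themeReader.receive(value);
    } else {
      // Split into UserContext and ThemeContext, each memoized separately
      userReader.receive(memoUser(() => ({ user: state.user }), [state.user]));
      themeReader.receive(memoTheme(() => ({ theme: state.theme }), [state.theme]));
    }
  }
  return { user: userReader.renders(), theme: themeReader.renders() };
}

console.log(run("inline")); // { user: 3, theme: 3 }
console.log(run("memo")); // { user: 2, theme: 2 }
console.log(run("split")); // { user: 1, theme: 2 }
```

The sketch leaves out setUser and setTheme; setter functions returned by useState keep a stable identity, so listing them in the real useMemo dependency array doesn't cause extra recomputation.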

This one is particularly brutal in larger apps because context is often set up early, before the component tree is complex enough to make the problem visible. By the time you feel it, the context is load-bearing and everything’s wired to it.

Mistake 6: `useEffect` doing too much

useEffect is the Swiss Army knife of React hooks, which is both its strength and the reason developers use it to solve problems it was never designed for. The classic version of this mistake: a useEffect with a dependency array that's either wrong, empty when it shouldn't be, or so long it fires on basically every render anyway.

The subtler version is using useEffect for things that are actually derived state or event-driven logic. If you're running an effect to compute a value from existing state, compute it directly during render instead (wrapped in useMemo if it's expensive). If you're running an effect in response to a user action, that logic belongs in the event handler, not in a side effect that triggers after the render. Using useEffect without proper dependencies, or with too many, can trigger unnecessary logic or cause infinite loops.

The infinite loop variant is a rite of passage. You set state inside a useEffect (with no dependency array, or with that state in it), that state triggers a re-render, the effect fires again, sets state again, renders again, and so on until your browser tab starts sweating. It happens to everyone. The less dramatic version (an effect that fires on every render because the dependency array includes an object that gets recreated each time) is more insidious because it's harder to notice and easy to shrug off as "just how React works."

It’s not just how React works. It’s a dependency array that needs fixing.

A useful heuristic: if you can’t explain in one sentence what your useEffect is synchronizing with the outside world, there's a decent chance it shouldn't be a useEffect at all.
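Dependency arrays are compared item by item with `Object.is`, which is why an object literal in the list defeats them. The sketch below uses a hypothetical `depsChanged` helper that mimics that comparison and models an effect across three renders in which nothing meaningful changed; `userId` and the options object are invented for illustration.

```javascript
// Mimics how React decides whether to re-run an effect:
// compare each dependency with the previous render's value using Object.is.
function depsChanged(prevDeps, nextDeps) {
  if (prevDeps === null) return true; // first render: the effect always runs
  return nextDeps.some((dep, i) => !Object.is(dep, prevDeps[i]));
}

// Count effect runs over `renderCount` renders, given a function
// that builds the dependency array for each render.
function countEffectRuns(renderCount, makeDeps) {
  let prev = null;
  let runs = 0;
  for (let i = 0; i < renderCount; i += 1) {
    const deps = makeDeps(i);
    if (depsChanged(prev, deps)) runs += 1;
    prev = deps;
  }
  return runs;
}

const userId = 7;

// An options object rebuilt on every render never matches, so the effect always runs.
const withObject = countEffectRuns(3, () => [{ userId, includeArchived: false }]);

// Primitive dependencies: the effect runs once, then stays quiet.
const withPrimitives = countEffectRuns(3, () => [userId, false]);

console.log(withObject); // 3
console.log(withPrimitives); // 1
```

The usual fixes follow from the sketch: depend on the primitive fields you actually use, or build the object inside the effect itself so it isn't a dependency at all.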

These three mistakes compound each other in particularly ugly ways. State too high up means more components consuming context. Context without memoization means all those components re-render together. Overloaded effects mean those re-renders trigger more side effects. By the time you feel it, the performance problem isn’t one thing; it’s a system of bad decisions that arrived gradually and now all need untangling at once.

The good news: fixing any one of them improves things measurably. Fixing all three feels like discovering your app had been running with the handbrake on.

Bundle crimes

Everything so far has been about what happens at runtime: components re-rendering when they shouldn’t, state living in the wrong place, effects misfiring. These next two mistakes happen before a single line of your React code executes. They happen at load time, and they’re the reason some apps feel slow before the user has even done anything.

The bundle is the thing you’re shipping. It’s the JavaScript your users have to download, parse, and execute before they can interact with your product. Most developers think about it roughly once during the initial setup and then stop thinking about it entirely as the codebase grows. Features get added, dependencies get installed, and the bundle gets quietly heavier with every sprint until your Lighthouse score starts looking embarrassing.

According to HTTP Archive data, the median JavaScript payload for desktop pages is over 500 KB, and mobile pages aren’t far behind. That’s the median. Plenty of production React apps are shipping multiples of that. On a mid-range phone on a decent connection, that’s a real wait before anything is interactive, and most users won’t stick around for it. (Growin)

Mistake 7: Not using `React.lazy` and `Suspense`

The default behavior in a React app without code splitting is simple: everything ships together. Your landing page bundle includes the code for your settings panel, your admin dashboard, your onboarding flow, and every modal you’ve ever built. The user visiting your homepage for the first time downloads all of it, even though they haven’t navigated anywhere yet and statistically might never open half of those routes.

React.lazy and Suspense exist specifically for this. They let you split your bundle at the route or component level, loading chunks only when they're actually needed.

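import React, { Suspense } from 'react';

// React.lazy expects './AdminDashboard' to default-export a component.
const AdminDashboard = React.lazy(() => import('./AdminDashboard'));

// Minimal placeholder fallback; use your own loading component.
function Spinner() {
  return <p>Loading...</p>;
}

function App() {
  return (
    <Suspense fallback={<Spinner />}>
      <AdminDashboard />
    </Suspense>
  );
}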

React.lazy() and Suspense, when combined with route-level code splitting or dynamic imports, offer a reliable and modern solution to optimize loading behavior. (Growin)

The gains here can be significant. A route-level split on a mid-sized app can cut the initial bundle by 30–50%, depending on how much code sits behind routes the user hasn’t visited, and that translates directly into faster time-to-interactive. The user landing on your homepage gets only what they need to render that page. Everything else loads on demand, in the background, when it becomes relevant.

The reason developers skip this is usually that the app worked fine without it during development. Local development has no network latency and no cold cache, so a 2MB bundle feels instantaneous. Your users on mobile networks in the real world are having a measurably worse time, and the Profiler won’t show you that: you have to look at your bundle analyzer and your Core Web Vitals to see it.

If you’re on Next.js or Remix, a lot of this is handled for you at the framework level. If you’re on a custom Vite or Webpack setup, route-level lazy loading is one of the highest-leverage changes you can make with the least amount of code.
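The core idea behind `React.lazy` (defer the `import()` until the component is first needed, then reuse the result) can be sketched in plain JavaScript. `lazyModule` below is a hypothetical helper, not React's implementation, and the loader returns an already-resolved promise in place of a real dynamic `import()` so the example runs on its own.

```javascript
// Hypothetical helper: call the loader at most once, on first use,
// and hand every later caller the same promise.
function lazyModule(loader) {
  let pending = null;
  return function load() {
    if (pending === null) {
      pending = loader();
    }
    return pending;
  };
}

let loaderCalls = 0;

// Stand-in for () => import('./AdminDashboard'): resolves to a module object
// with a default export, the shape React.lazy expects.
const loadAdminDashboard = lazyModule(() => {
  loaderCalls += 1;
  return Promise.resolve({ default: function AdminDashboard() { return "admin"; } });
});

console.log(loaderCalls); // 0: nothing fetched for users who never open the route
const p1 = loadAdminDashboard();
const p2 = loadAdminDashboard();
console.log(loaderCalls, p1 === p2); // 1 true

p1.then((mod) => console.log(mod.default())); // logs: admin
```

In a real bundle, the `import()` call is also what tells Vite or Webpack to emit a separate chunk, which is why the split happens at that boundary.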

Mistake 8: Importing entire libraries when you need one function

This one has been a known issue for years and it still shows up constantly. The pattern looks like this:

import _ from 'lodash';

const result = _.groupBy(data, 'category');

You needed groupBy. You imported all of lodash. The full lodash build is around 70KB minified (roughly 25KB gzipped), which is not catastrophic on its own, but it's also not free, and it compounds with every other library you're doing the same thing with.

Importing an entire library rather than just one or more components can dramatically increase your bundle size. A large bundle can slow download times, thereby negatively affecting the user experience. Use named imports rather than default imports, and use code-splitting so you can load only the code you need. (TFTUS Official Blog)

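The fix is to import only what you need, either from the function's own module path or as a named import from an ES-module build that bundlers can tree-shake: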

import groupBy from 'lodash/groupBy';
// or
import { groupBy } from 'lodash-es';

The lodash-es variant is the ESM version of lodash, which plays nicely with modern bundlers and tree-shaking. Only the functions you actually import end up in your bundle.

Lodash is just the most common example. The same mistake gets made with moment.js, a library that is famously large and should almost always be replaced with date-fns or the native Intl API at this point. It gets made with UI component libraries that export everything from a single entry point. It gets made with icon packs where someone imports the entire icon set to use three icons.
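For the moment.js case, here is a small sketch of the native route: formatting a date and printing "3 days ago" with the built-in `Intl.DateTimeFormat` and `Intl.RelativeTimeFormat`, which ship with modern browsers and Node.js. The dates, the locale, and the `daysBetween` helper are illustrative assumptions; exact strings come from the runtime's locale data.

```javascript
// Format a calendar date; pinning timeZone keeps the output deterministic.
const dateFormatter = new Intl.DateTimeFormat("en-US", {
  dateStyle: "medium",
  timeZone: "UTC",
});
const launch = new Date(Date.UTC(2026, 0, 5)); // months are 0-based: January 5

// Relative phrasing, the job moment's fromNow() is often imported for.
const relative = new Intl.RelativeTimeFormat("en", { numeric: "auto" });

// Hypothetical helper: whole days between two instants
// (simplified: ignores DST shifts and partial days).
function daysBetween(from, to) {
  return Math.round((to - from) / (24 * 60 * 60 * 1000));
}

console.log(dateFormatter.format(launch)); // Jan 5, 2026
console.log(relative.format(-1, "day")); // yesterday
console.log(relative.format(daysBetween(new Date(Date.UTC(2026, 0, 8)), launch), "day")); // 3 days ago
```

None of this adds a byte to your bundle; the trade-off is that date arithmetic (adding months, parsing arbitrary strings) still needs either your own code or a small library like date-fns.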

The way to catch this at scale is the Webpack Bundle Analyzer or the equivalent for Vite. Run it once on your production build and look at what’s actually inside your bundle. The results are often surprising in ways that are equal parts educational and mortifying. You will almost certainly find at least one dependency in there that you forgot you installed, and at least one that you installed for something you ended up not shipping.

This is worth doing before you spend time on any runtime optimization. It doesn’t matter how well-memoized your components are if you’re making users wait three seconds to download the JavaScript before any of that code runs.

These two mistakes live in a different category from the previous six because they’re invisible during development and only show up in production metrics. Your app feels fast locally. Your users experience something different. The fix for both requires adding a step to your workflow (looking at bundle output, running Lighthouse against a production build, checking Core Web Vitals in Search Console) rather than changing how you write components.

That habit of looking at what you’re actually shipping is one of the things that separates engineers who ship fast apps from engineers who ship apps that feel fast on their own machine.

The tools you’ve been ignoring

The previous eight mistakes were all things you did. These last two are things you didn’t do, which somehow makes them worse. There’s a special category of performance problem that exists not because you wrote something wrong, but because you never reached for the tool that would have either prevented the problem or shown you it existed.

These aren’t obscure. They’re not advanced. They ship with React or have been in the ecosystem for years. They just require you to stop and think about scale before scale becomes the problem, and most of us don’t do that until a user complains or a Lighthouse score goes red.

Mistake 9: Rendering a thousand items when you could render twelve

At some point in almost every data-heavy React app, someone builds a list. The list works great with twenty items in development. It goes to production. The list now has two thousand items. Scrolling feels like dragging furniture across carpet. The component that renders each row is perfectly optimized (memoized, no inline functions, stable keys), and it doesn’t matter at all, because you’re rendering all two thousand of them into the DOM at once, whether the user can see them or not.

This is the problem that virtualization solves, and it’s one of those solutions that feels almost too simple once you understand it. Instead of rendering every item in the list, you render only the ones currently visible in the viewport plus a small buffer above and below for smooth scrolling. As the user scrolls, items that leave the viewport get unmounted, new ones get mounted. The DOM stays small. Performance stays flat regardless of dataset size.

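import { FixedSizeList as List } from 'react-window';

// FixedSizeList is the react-window 1.x API. `UserCard` is your own row component.
// Each row receives a `style` that positions it; apply it to the row element.
function UserList({ items }) {
  return (
    <List
      height={600}
      itemCount={items.length}
      itemSize={72}
      width="100%"
    >
      {({ index, style }) => (
        <div style={style}>
          <UserCard user={items[index]} />
        </div>
      )}
    </List>
  );
}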

[react-window] only renders visible items, making scrolling smooth. Rendering thousands of DOM nodes at once kills performance. (DEV Community)

react-window is the standard library for this, maintained by Brian Vaughn who also built the React DevTools. It's small, fast, and well-documented. react-virtuoso is a newer alternative with more built-in features if you need variable item heights or grouped lists. Both work on the same principle.
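The shared principle is mostly arithmetic. Here is a simplified sketch for fixed-height rows, the case `FixedSizeList` handles: from the scroll offset, viewport height, and row height, work out which row indices to mount, plus a small overscan buffer on each side. `visibleRange` and its parameter names are hypothetical; this is not react-window's source.

```javascript
// Which rows should be in the DOM for a fixed-row-height list?
function visibleRange({ scrollTop, viewportHeight, itemHeight, itemCount, overscan = 2 }) {
  if (itemCount === 0) return { start: 0, end: -1, totalHeight: 0 };
  const firstVisible = Math.min(itemCount - 1, Math.floor(scrollTop / itemHeight));
  const lastVisible = Math.min(
    itemCount - 1,
    Math.floor((scrollTop + viewportHeight - 1) / itemHeight)
  );
  return {
    start: Math.max(0, firstVisible - overscan),
    end: Math.min(itemCount - 1, lastVisible + overscan),
    // The scroll container keeps the full height so the scrollbar stays honest;
    // each mounted row is positioned at index * itemHeight.
    totalHeight: itemCount * itemHeight,
  };
}

const list = { viewportHeight: 600, itemHeight: 72, itemCount: 2000 };

const atTop = visibleRange({ ...list, scrollTop: 0 });
console.log(atTop.start, atTop.end); // 0 10 -> 11 rows mounted out of 2000

const middle = visibleRange({ ...list, scrollTop: 7200 });
console.log(middle.start, middle.end); // 98 110
```

However long the list gets, the number of mounted rows stays around viewportHeight / itemHeight plus the overscan, which is why performance stays flat.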

The reason developers skip virtualization is usually that they don’t anticipate scale. The list has twenty items, the list works, ship it. Then the list grows. The performance cost of rendering a DOM node for every item in a list scales linearly: double the items, and you roughly double the render time, the memory usage, and the layout work the browser has to do on every scroll event. By the time you feel it, you often have a list that’s expensive to refactor because everything downstream depends on how it currently works.

The rule of thumb: if a list could plausibly ever exceed fifty items in production, plan for virtualization from the start. It’s much easier to set up on a new component than to retrofit it onto a complex one that’s been in production for six months.

Mistake 10: Never opening the React DevTools Profiler

This is the one that quietly enables every other mistake on this list to survive as long as it does. The Profiler has been part of React DevTools since React 16.5. It shows you exactly which components re-rendered, how long each render took, and why the render was triggered. It is, genuinely, one of the most useful debugging tools in the frontend ecosystem. Most developers have it installed and have never clicked the record button.

The workflow is straightforward. Open DevTools. Go to the Profiler tab. Hit record. Interact with the part of your app that feels slow. Stop recording. Look at the flame graph.

What you’re looking for: components that re-render more often than they should, and components whose render time is disproportionately long. The Profiler color-codes this for you. In the flame graph, blue bars rendered quickly, yellow bars took longer, and gray bars didn’t render at all in that commit. You can click any bar to see its timings. If you turn on the “Record why each component rendered while profiling” setting, you can also see exactly why that component rendered, including which prop or state value changed to trigger it.

React DevTools Profiler will show you whether a component re-render is expensive enough to justify memoization. In React development, especially as applications grow more interactive and component-driven, it’s easy to introduce performance issues without realizing it. (Growin)

This matters because performance optimization without measurement is just guessing. You add useCallback somewhere because it seems like the right idea. You split a component because it feels too big. You might be right. You might also be optimizing something that was already fast while the actual bottleneck sits three components over, untouched. No cargo-cult programming, just understanding the system and applying targeted fixes. (DEV Community)

The Profiler removes the guessing. It tells you where the fire actually is, not where you think it might be. Every fix on this list (inline functions, context memoization, colocated state, code splitting) lands differently depending on your specific app. The Profiler is how you know which ones to prioritize and whether your changes actually did anything.

A practical habit: run the Profiler on any feature before you ship it, not after someone complains. It takes three minutes, and it has saved me from shipping performance regressions more than once. It’s also genuinely interesting: seeing exactly how React thinks about your component tree teaches you things about the framework that no article can.
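The DevTools Profiler is interactive. React also ships a `<Profiler>` component whose `onRender` callback reports the same timings in code, which is handy for logging slow renders while you test a feature. Below is a minimal sketch of such a callback. It assumes you only need the first four arguments React passes (`id`, `phase`, `actualDuration`, `baseDuration`). The aggregation, the `recordRender`/`slowComponents` names, and the 16 ms budget (roughly one frame at 60 fps) are our own choices.

```typescript
// Sketch of an onRender callback for React's <Profiler>. Parameter order follows
// React's docs; everything else here is an illustrative assumption.
type Phase = "mount" | "update" | "nested-update";

interface RenderStats {
  renders: number;
  totalMs: number;
  worstMs: number;
}

const stats = new Map<string, RenderStats>();

function recordRender(
  id: string,             // the Profiler's `id` prop
  phase: Phase,           // first mount vs. re-render
  actualDuration: number, // ms spent rendering this subtree in this commit
  baseDuration: number,   // estimated ms to re-render the whole subtree with no memoization
): void {
  const s = stats.get(id) ?? { renders: 0, totalMs: 0, worstMs: 0 };
  s.renders += 1;
  s.totalMs += actualDuration;
  s.worstMs = Math.max(s.worstMs, actualDuration);
  stats.set(id, s);
}

// Ids whose worst single render exceeded the per-frame budget.
function slowComponents(budgetMs = 16): string[] {
  return Array.from(stats.entries())
    .filter(([, s]) => s.worstMs > budgetMs)
    .map(([id]) => id);
}
```

Wrap the subtree you care about, e.g. `<Profiler id="UserList" onRender={recordRender}>`, interact with it, then check `slowComponents()`. Profiling is disabled in production builds by default, so run this in development or with React’s profiling build.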

These two mistakes share the same underlying cause: not thinking about scale and measurement until after the problem exists. Virtualization is what you add when you think ahead about large datasets. The Profiler is what you use when you want to stop guessing and start knowing.

Together, they close the loop on the entire list. The first eight mistakes give you specific patterns to avoid. These two give you the habit of catching what slips through anyway.

You’re not off the hook just because React 19 exists

Here’s the take that’s going to age either very well or very poorly: most of what’s on this list is going to become less relevant over the next two years, and that should make you more careful, not less.

The biggest change arriving alongside React 19 is the React Compiler, which ships as a separate build plugin and automatically optimizes components without manual useMemo or useCallback wrappers. It analyzes your code at build time and applies memoization where it's safe. The inline function problem, the useCallback cargo-culting, a chunk of the re-render chaos: the compiler handles a meaningful portion of that automatically. That's genuinely good news, and the React team deserves credit for it. (DEV Community)
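To make “applies memoization where it’s safe” concrete: memoizing means remembering the last inputs and result, and skipping the work when the inputs are unchanged by reference. The sketch below illustrates that idea using `Object.is`, the same comparison React uses for hook dependencies. It is not what the React Compiler actually emits, and `memoizeLast` is a hypothetical helper.

```typescript
// Illustration only: a one-slot memo cache keyed on argument identity.
function memoizeLast<Args extends unknown[], R>(fn: (...args: Args) => R): (...args: Args) => R {
  let lastArgs: Args | undefined;
  let lastResult!: R;
  return (...args: Args): R => {
    const same =
      lastArgs !== undefined &&
      lastArgs.length === args.length &&
      args.every((a, i) => Object.is(a, lastArgs![i]));
    if (!same) {
      lastResult = fn(...args); // inputs changed: recompute
      lastArgs = args;
    }
    return lastResult; // inputs identical: reuse the previous result
  };
}

let calls = 0;
const activeUsers = memoizeLast((users: { active: boolean }[]) => {
  calls += 1;
  return users.filter((u) => u.active);
});

const list = [{ active: true }, { active: false }];
activeUsers(list);      // computes
activeUsers(list);      // same reference: cached, `calls` is still 1
activeUsers([...list]); // equal contents, new array: recomputes
```

The last call is the classic trap. Equal contents with a new reference still miss the cache, which is why data recreated on every render defeats memoization, compiler or not.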

But I keep coming back to something that doesn’t change regardless of what the compiler does: architecture isn’t a compiler problem. State living in the wrong place, context providers that re-render everything downstream, useEffect being used as a catch-all for logic that should live elsewhere: none of that gets fixed at build time. The compiler won't save you from architectural issues like overly broad context providers or massive component trees. You still have to understand the system well enough to design it correctly. (DEV Community)

And here’s the uncomfortable part: if you learned React by reaching for useMemo and useCallback without understanding why, the compiler bailing you out doesn't mean you understood what it fixed. It means you got lucky. The next framework, the next abstraction, the next performance problem that falls outside what the compiler covers: that's where the gap shows up.

Performance isn’t a checklist you run through once before a release. It’s a design skill. It lives in the decisions you make about where state goes, how components are composed, what you ship in the initial bundle, and whether you ever actually look at what your app is doing under load. The engineers I’ve seen consistently ship fast products aren’t the ones who know the most hooks. They’re the ones who profile before they optimize, think about scale before it’s a crisis, and treat the bundle as something they’re responsible for, not something that happens automatically.

The React compiler is a great tool. So is the Profiler. So is react-window. None of them replace the judgment call about whether your component tree makes sense.

If this list had a single takeaway, it’d be this: open the Profiler on your current project today, before you read another article, before you install another dependency. Record thirty seconds of normal usage. See what’s actually happening. You might be surprised. You might be horrified. Either way, you’ll know something real, and that’s where every fix on this list actually starts.

Drop your worst React performance horror story in the comments. Bonus points if the root cause was on this list and you didn’t know it until a user complained.

Helpful resources

React DevTools Profiler docs: official guide to actually using the tool
React.lazy and Suspense: React docs on code splitting
react-window on GitHub: list virtualization by Brian Vaughn
Webpack Bundle Analyzer: see what you’re actually shipping
React Compiler docs: what it does and doesn’t do
web.dev Core Web Vitals: the metrics that actually matter to users
Kent C. Dodds on state colocation: the best deep-dive on mistake #4

The only AI agents article you’ll ever need

Fri, 15 May 2026 00:47:34 +0000

From ReAct loops to production multi-agent systems: everything in one place, nothing left out.

Somewhere between the fifteenth “AI agent” LinkedIn post and the third vendor announcing their autonomous workflow platform, I stopped nodding along and started asking the obvious question nobody seemed to want to answer:

What is actually running here?

Because the word “agent” has been doing a lot of heavy lifting lately. It’s been stretched over chatbots with a search button, over multi-step pipelines glued together with vibes, and over genuinely sophisticated systems that can decompose a task, call external APIs, reflect on their own output, and course-correct, all without you touching a keyboard. Those are not the same thing. Not even close.

I’ve spent the last few months going deep on this. Built agents that failed in embarrassing ways. Read the research, the docs, the Reddit threads where someone’s agent looped itself into a $600 API bill at midnight. Took the courses. Talked to people shipping this stuff in production at scale. And I distilled all of it down into this article.

This is not a hype piece. There are no screenshots of a ChatGPT conversation doing something mildly impressive. This is the full picture. It runs from the first-principles question of what an agent actually is, through the design patterns that separate demos from real systems, all the way to the production concerns that nobody tweets about: evaluation, latency, cost, observability, and security.

Whether you’re just trying to understand what your team keeps talking about in standups, or you’re actively building agent systems and hitting walls, this is the article you keep open in a tab and come back to.

TL;DR: An AI agent is an LLM inside a loop, equipped with tools, memory, and decision-making logic. Building one that demos well takes an afternoon. Building one you’d actually trust with real work takes a different mindset, closer to distributed systems design than to prompt engineering. This article covers both ends of that spectrum and everything in between.

What an agent actually is

If you’ve used ChatGPT to write an email, you’ve used an LLM. You gave it a prompt, it gave you an output, done. One shot. Linear. The model doesn’t remember what it did, doesn’t check its own work, doesn’t go looking for missing information. It just generates the next most likely tokens until it hits the end and stops.

An agent is what happens when you take that same model and put it inside a loop.

Instead of prompt-in, response-out, you get something closer to how a human actually tackles a non-trivial task. You plan a little. You gather some information. You do a first pass. You read it back, notice what’s wrong, and fix it. You check one more thing. You finish. That back-and-forth, that iterative reasoning over multiple steps, is the core of what makes something an agent rather than a fancy autocomplete.

The technical name for this loop is the ReAct pattern: Reason, Act, Observe, repeat. The model reasons about what to do next. It acts, usually by calling a tool, running a search, querying a database, or executing some code. It observes the result. Then it either gives you a final answer or loops back to reason again based on what it just learned. That cycle is the engine underneath almost every agent system you’ll encounter, from the simplest LangChain pipeline to Claude Code rewriting your entire test suite.
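Here’s the Reason, Act, Observe cycle as a minimal sketch. It assumes a hypothetical model interface that returns either a tool request or a final answer. The scripted `fakeModel` and the `checkStock` tool stand in for a real LLM and a real inventory API, so the loop runs offline. `maxSteps` is the cap that keeps a confused agent from looping forever.

```typescript
// Hypothetical model interface: each turn yields a tool request or a final answer.
type ModelStep =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "final"; answer: string };

type Model = (transcript: string[]) => ModelStep;
type Tool = (input: string) => string;

function runAgent(task: string, model: Model, tools: Record<string, Tool>, maxSteps = 5): string {
  const transcript: string[] = [`Task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const decision = model(transcript); // Reason: decide what to do next
    if (decision.kind === "final") return decision.answer;
    const tool: Tool | undefined = tools[decision.tool];
    const observation = tool
      ? tool(decision.input) // Act: run the requested tool
      : `Error: unknown tool "${decision.tool}"`;
    transcript.push(`Action: ${decision.tool}(${decision.input})`);
    transcript.push(`Observation: ${observation}`); // Observe: feed the result back
  }
  return "Stopped: step limit reached"; // guard against runaway loops
}

// Scripted stand-in for an LLM: check stock once, then answer from the observation.
const fakeModel: Model = (t) => {
  const obs = t.find((line) => line.startsWith("Observation:"));
  return obs
    ? { kind: "final", answer: `In stock: ${obs.slice("Observation: ".length)}` }
    : { kind: "tool", tool: "checkStock", input: "SKU-42" };
};

const tools: Record<string, Tool> = {
  checkStock: (sku) => (sku === "SKU-42" ? "12 units" : "0 units"),
};

console.log(runAgent("Is SKU-42 in stock?", fakeModel, tools)); // In stock: 12 units
```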

Here’s what makes this more powerful than it sounds. Each pass through the loop adds depth. The model isn’t trying to solve everything in one shot under the pressure of a single context window. It’s allowed to work iteratively, to gather information it didn’t have at step one, and to catch mistakes it made in step two. The output at the end of three loops is almost always better than the output of one. Not because the model got smarter, but because the architecture gave it room to think.

The practical upside of this shows up immediately in tasks that need accuracy and sourcing. Legal research where you have to cite specific cases. Customer support that requires pulling account details before responding. Code generation that needs to run the code, read the error, and try again. Any domain where a single-pass answer is almost certainly incomplete or wrong is where agents earn their keep.

Here’s the mental model I use: a regular LLM call is a consultant who reads your brief once and writes a report on the plane. An agent is that same consultant who actually does the research, drafts something, reads it back, realizes they missed a key detail, goes and finds it, then rewrites the section. Same intelligence, different process. The process is what changes the output quality.

One thing worth clearing up early: the model itself isn’t what makes something an agent. You can build a mediocre agent on GPT-4 and a great one on a smaller, faster model with a well-designed loop and the right tools. The architecture and the task decomposition matter more than the leaderboard position of the underlying LLM. Remember that when someone tries to sell you on “the most agentic model”: the model is one part of the system, not the whole thing.

The core building blocks

Before you write a single line of agent code, you need to understand four things. Get these right and everything else becomes easier to reason about. Get them wrong and you’ll spend weeks debugging behavior that feels random but isn’t.

Context is the agent’s entire world. Whatever isn’t in the context window doesn’t exist as far as the model is concerned. Context engineering (deciding what goes in there) is one of the most underrated skills in agent development. The context includes the task description, the agent’s role, any memory from previous steps, available tools, and relevant background knowledge. A poorly engineered context produces an agent that hallucinates, repeats itself, or completely ignores the instructions you thought were obvious. Most agent bugs aren’t model bugs. They’re context bugs.
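Here’s a rough sketch of context engineering as code. It assumes the context is assembled from a role, the task, tool descriptions, and a list of memory notes. The 4-characters-per-token estimate and the “newest memory first” policy are assumptions for illustration. A real system would use the model’s own tokenizer and probably smarter selection.

```typescript
// Hypothetical context builder: always include the essentials, then fill the
// remaining token budget with the most recent memory notes.
interface ContextParts {
  role: string;     // who the agent is
  task: string;     // the current task description
  tools: string[];  // one-line descriptions of available tools
  memory: string[]; // notes from earlier steps, oldest first
}

// Rough assumption: ~4 characters per token. Swap in a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildContext(parts: ContextParts, budgetTokens: number): string {
  const fixed = [parts.role, `Task: ${parts.task}`, `Tools:\n${parts.tools.join("\n")}`];
  let used = fixed.reduce((sum, s) => sum + estimateTokens(s), 0);

  // Walk memory from newest to oldest and stop at the first note that won't fit.
  const kept: string[] = [];
  for (let i = parts.memory.length - 1; i >= 0; i--) {
    const cost = estimateTokens(parts.memory[i]);
    if (used + cost > budgetTokens) break;
    kept.unshift(parts.memory[i]); // keep chronological order in the output
    used += cost;
  }
  return [...fixed, `Memory:\n${kept.join("\n")}`].join("\n\n");
}
```

Whatever this function drops simply doesn’t exist for the model. That makes the selection policy as much a part of the agent’s behavior as the prompt.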

Memory comes in two flavors. Short-term memory is what the agent writes down as it works: intermediate results, tool outputs, notes to itself. Long-term memory is lessons from previous runs, stored and loaded at the start of each new task. The combination is what lets an agent improve over time rather than starting from zero on every execution. Knowledge is different from both. It’s static reference material you load upfront, such as documentation, PDFs, or database access. The agent reads from it but doesn’t update it.

Task decomposition is the part nobody talks about enough. The rule is simple: break each step down until a single LLM call or a single tool can handle it cleanly. If a step is too big, the output gets sloppy. The exercise is to think about how you’d do the task yourself (what are the actual discrete steps?), then figure out which of those steps map to an LLM call, which map to a tool call, and which map to a bit of regular code. When something isn’t working, nine times out of ten a step is too coarse.

Guardrails are the bouncer at the door. Because LLMs are non-deterministic, you can’t assume the output will always be in the right format, the right length, or factually consistent with the sources the agent just retrieved. Guardrails are the layer that catches these failures before they reach the user, or before they get passed to the next step in the pipeline and silently corrupt everything downstream. Some guardrails are just code: check the output format, validate the schema, enforce length limits. Others use a second LLM to judge quality. And sometimes the right guardrail is a human checkpoint, especially for anything irreversible.
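Here’s what a code-only guardrail might look like. It assumes the agent was asked to return JSON shaped like `{ "answer": ..., "sources": [...] }`. The field names, the 1,200-character limit, and the rule that every cited source must be one the agent retrieved this run are illustrative assumptions. The point is that each check is cheap, deterministic, and runs before the output moves downstream.

```typescript
// Hypothetical output contract for one agent step.
interface AgentAnswer {
  answer: string;
  sources: string[];
}

type GuardResult =
  | { ok: true; value: AgentAnswer }
  | { ok: false; errors: string[] };

function checkOutput(raw: string, retrievedUrls: string[], maxChars = 1200): GuardResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  const obj = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const { answer, sources } = obj;
  const errors: string[] = [];

  // Format and length checks.
  if (typeof answer !== "string") errors.push("`answer` must be a string");
  else if (answer.length > maxChars) errors.push(`answer is longer than ${maxChars} characters`);

  // Consistency check: only cite sources the agent actually retrieved this run.
  if (!Array.isArray(sources) || !sources.every((s) => typeof s === "string")) {
    errors.push("`sources` must be an array of strings");
  } else {
    const unretrieved = sources.filter((s) => retrievedUrls.indexOf(s) === -1);
    if (unretrieved.length > 0) errors.push(`cites sources it never retrieved: ${unretrieved.join(", ")}`);
  }

  return errors.length === 0
    ? { ok: true, value: { answer: answer as string, sources: sources as string[] } }
    : { ok: false, errors };
}
```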

Four concepts. Everything else in agent design is built on top of them.

Four design patterns that actually work

Once your building blocks are solid, the next question is how you structure the actual behavior of the agent. There are four patterns that show up in almost every serious agent system. You don’t always need all four, but you need to know all four.

Reflection

The simplest and most effective upgrade you can make to any agent. Instead of shipping the first output, the agent critiques it and rewrites it.

The model produces something, reads it back with a prompt like “what’s wrong with this and how would you fix it,” then revises. That second pass almost always improves the result. Not because the model is smarter on round two, but because reviewing is an easier cognitive task than generating from scratch. You’re offloading the hard part across two steps instead of cramming it into one.

Reflection is especially powerful when you can add external feedback to the loop. Write code, run it, feed the error back, try again. Generate JSON, validate it against a schema, send the validation errors back if it fails. That concrete feedback signal is what separates reflection from just asking the model to “try harder.”
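Here’s a minimal sketch of reflection with external feedback. It assumes `generate` wraps an LLM call that can see the previous attempt’s errors, and that `validate` is a concrete check such as a schema validator or a test run. Both are stubbed here, and everything is synchronous for brevity; real LLM calls would be async.

```typescript
// validate returns error messages; an empty array means the output passed.
type Validate = (output: string) => string[];
// generate receives the previous attempt's errors (empty on the first try).
type Generate = (feedback: string[]) => string;

function reflectUntilValid(generate: Generate, validate: Validate, maxAttempts = 3) {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = generate(feedback);
    const errors = validate(output);
    if (errors.length === 0) return { output, attempts: attempt };
    feedback = errors; // send back the concrete errors, not just "try harder"
  }
  throw new Error(`no valid output after ${maxAttempts} attempts: ${feedback.join("; ")}`);
}

// Stub generator: forgets a required field at first, fixes it once told.
const fakeGenerate: Generate = (fb) =>
  fb.length === 0 ? '{"name":"Ada"}' : '{"name":"Ada","email":"ada@example.com"}';

const requireEmail: Validate = (out) => {
  try {
    return "email" in JSON.parse(out) ? [] : ["missing required field: email"];
  } catch {
    return ["invalid JSON"];
  }
};

console.log(reflectUntilValid(fakeGenerate, requireEmail)); // succeeds on attempt 2
```

The retry cap matters as much as the feedback. Every extra attempt is another full LLM call, which is the latency and cost tradeoff in practice.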

The tradeoff is latency and cost: you’re doing multiple passes. Test with and without it before you commit.

Tool use

An LLM by itself is a text generator. It doesn’t know what time it is, can’t query your database, can’t run code, and has no idea what’s in your company’s internal docs. Tools fix that.

You define a set of functions (web search, database query, code execution, calendar access, whatever your use case needs), and the model decides when to call them and which ones. Under the hood, the model doesn’t actually execute anything. It outputs a structured request, your code runs the function, and the result gets fed back into the context. The model uses that result to continue.

Well-designed tools have clear names, plain-English descriptions of when to use them, typed input schemas, and clean error handling. Think of them as an API your agent uses. Document them like one.
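Here’s a sketch of a tool definition and the dispatcher that actually runs it. The shape (name, plain-English description, JSON-Schema-style input) mirrors what most function-calling APIs accept. This exact `ToolDef` interface and the `get_order_status` tool are hypothetical, not any vendor’s SDK. Errors come back as text, so the model can read them and recover instead of the run crashing.

```typescript
// Hypothetical tool-definition shape, loosely modeled on function-calling APIs.
interface ToolDef {
  name: string;
  description: string; // plain English: what it does and when to use it
  inputSchema: {
    type: "object";
    properties: Record<string, { type: "string" | "number" }>;
    required: string[];
  };
  run: (input: Record<string, unknown>) => string;
}

// What the model emits: a structured request. It never executes anything itself.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

const getOrderStatus: ToolDef = {
  name: "get_order_status",
  description: "Look up the shipping status of one order. Use when the user asks where their order is.",
  inputSchema: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
  run: (input) => `Order ${String(input["orderId"])}: shipped`, // stubbed backend
};

// Our code runs the function and feeds the result (or an error) back into the context.
function dispatch(call: ToolCall, registry: ToolDef[]): string {
  const tool = registry.find((t) => t.name === call.name);
  if (!tool) return `Error: unknown tool "${call.name}"`;
  for (const field of tool.inputSchema.required) {
    const expected = tool.inputSchema.properties[field]?.type;
    if (typeof call.input[field] !== expected) {
      return `Error: "${field}" must be a ${expected ?? "declared"} value`;
    }
  }
  try {
    return tool.run(call.input);
  } catch (err) {
    return `Error: ${String(err)}`; // plain text the model can read and react to
  }
}
```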

Planning

Instead of following a hardcoded sequence of steps, the agent decides what to do and in what order.

You give it a toolkit, prompt it to create a step-by-step plan, and execute that plan: run each tool, feed the results back, and repeat until the task is done. The model acts as its own project manager. This is powerful for tasks where you can’t anticipate every possible path upfront, like a customer service agent handling wildly different request types.

The catch: more autonomy means more unpredictability. Planning agents need tight guardrails, permission checks, and good logging. The strongest current use case is agentic coding systems where the task space is well-defined even if the exact steps aren’t.
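Here’s a minimal plan-then-execute sketch. It assumes the model has already produced its plan as a list of tool steps. The executor, not the model, enforces the guardrails: an allowlist of tools, a human-approval hook for anything irreversible, and a log line for every decision. All names here (`executePlan`, `approve`, the step shape) are hypothetical.

```typescript
interface PlanStep {
  tool: string;
  input: string;
}

interface ExecOptions {
  tools: Record<string, (input: string) => string>;
  allowed: Set<string>;                // tools this agent may use at all
  irreversible: Set<string>;           // tools that need a human checkpoint
  approve: (step: PlanStep) => boolean; // e.g. a human clicking "yes"
  log: (line: string) => void;
}

function executePlan(plan: PlanStep[], opts: ExecOptions): string[] {
  const results: string[] = [];
  for (const step of plan) {
    if (!opts.allowed.has(step.tool) || !(step.tool in opts.tools)) {
      opts.log(`BLOCKED ${step.tool}: not permitted`);
      break; // stop rather than improvise around a bad plan
    }
    if (opts.irreversible.has(step.tool) && !opts.approve(step)) {
      opts.log(`HALTED ${step.tool}: approval denied`);
      break;
    }
    const out = opts.tools[step.tool](step.input);
    opts.log(`RAN ${step.tool}(${step.input}) -> ${out}`);
    results.push(out);
  }
  return results;
}
```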

Multi-agent collaboration

Some tasks are too complex, too long, or too varied for one agent to handle well. The answer is the same one humans figured out a long time ago: build a team.

Each agent gets a specific role and only the tools that role needs. A researcher agent does web search and retrieval. A writer agent handles drafting. A reviewer agent checks quality. A manager agent coordinates the others. Specialization produces better output than one generalist trying to do everything inside a single sprawling context window.

The coordination patterns range from simple sequential handoffs (researcher finishes and passes to writer, writer passes to reviewer) to parallel execution, where independent agents run simultaneously and merge their results. Most production systems start sequential and add parallelism only where latency actually matters.
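Here are the two coordination shapes, sketched with each “agent” as an async function. In a real system, each one would wrap its own LLM calls and tools; here they’re stubbed with a 50 ms delay. The function names and stubs are illustrative assumptions.

```typescript
type Agent = (input: string) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sequential handoff: researcher -> writer -> reviewer. Latencies add up.
async function sequential(task: string, researcher: Agent, writer: Agent, reviewer: Agent): Promise<string> {
  const notes = await researcher(task);
  const draft = await writer(notes);
  return reviewer(draft);
}

// Parallel fan-out: independent researchers run at once, then one agent merges.
async function parallel(topics: string[], researcher: Agent, merge: Agent): Promise<string> {
  const findings = await Promise.all(topics.map((t) => researcher(t))); // order preserved
  return merge(findings.join("\n"));
}

// Stubs standing in for real agents.
const researcher: Agent = async (t) => {
  await sleep(50); // pretend this is a slow LLM + search round trip
  return `notes on ${t}`;
};
const writer: Agent = async (notes) => `draft from [${notes}]`;
const reviewer: Agent = async (draft) => `approved: ${draft}`;
```

With three topics, the parallel version waits roughly one researcher’s latency instead of three. That saving is the only reason to pay for the extra coordination.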

Multi-agent systems are not the default answer. They add real complexity: agents can conflict, communication overhead adds up, and debugging a failure that happened three agents deep is genuinely painful. Start with one agent. Add a second only when the first one has a clear ceiling it can’t break through.

Shipping to production

This is the section that doesn’t make it into the demo videos. Everything up to this point gets you a working agent. This is what gets you a trustworthy one.

Evaluate before you optimize

The most common mistake people make with agents is trying to improve something they haven’t measured. Before you touch a prompt, swap a model, or restructure a pipeline, you need to know what’s actually failing and how often.

Some evals are simple.

Does the customer service agent correctly identify whether an item is in stock?

That’s a pass/fail check you can automate. Others are harder: is this research report actually good? For those, use a second LLM as a judge. Give it a consistent rubric, have it score outputs on a 1–5 scale, and track that score across runs.
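Here’s a minimal eval harness covering both kinds of check. It assumes the agent under test can be called as a plain function, and that a judge model returns integer scores on the 1–5 rubric (the judge call itself is left out). `stockAgent` and its test cases are made up for illustration.

```typescript
interface PassFailCase {
  input: string;
  expected: string;
}

// Fraction of cases where the agent's output exactly matches the expectation.
function passRate(agent: (input: string) => string, cases: PassFailCase[]): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => agent(c.input) === c.expected).length;
  return passed / cases.length;
}

// Judge scores must stay on the rubric's 1-5 scale; anything else is a judge bug.
function meanJudgeScore(scores: number[]): number {
  if (scores.length === 0) throw new Error("no scores");
  for (const s of scores) {
    if (!Number.isInteger(s) || s < 1 || s > 5) throw new Error(`score out of range: ${s}`);
  }
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

// Toy agent and cases for the "is it in stock?" check.
const stockAgent = (sku: string) => (sku === "SKU-42" ? "in stock" : "out of stock");
const cases: PassFailCase[] = [
  { input: "SKU-42", expected: "in stock" },
  { input: "SKU-7", expected: "out of stock" },
  { input: "SKU-9", expected: "in stock" }, // the agent gets this one wrong
];
console.log(passRate(stockAgent, cases)); // 2 of 3 cases pass
```

Run the same harness on every prompt or model change and keep the numbers. The trend across runs is what tells you whether a change helped.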

Evaluate at two levels. Component-level evals tell you which specific step is underperforming. End-to-end evals tell you whether the final output is actually good. If end-to-end scores are low but every component scores fine, the problem is in the handoffs between steps, and that’s a different fix than a bad prompt.

Start evaluating on day one. An imperfect eval that exists beats a perfect eval you’re still designing.

Latency and cost are the same problem

Every extra LLM call costs time and money. In agent systems, those calls stack up fast.

The fix is the same for both: measure each step, then attack the biggest buckets. Parallelize anything that doesn’t depend on the step before it, such as multiple web searches, multiple document fetches, or multiple sub-tasks that can run simultaneously. Right-size your models: use a smaller, faster model for simple steps like keyword extraction or format validation, and reserve the expensive one for actual reasoning. Cache aggressively: search results, embeddings, intermediate summaries. If the input is identical, don’t recompute.
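Here’s a small sketch of “if the input is identical, don’t recompute.” It assumes your expensive steps are async functions keyed by a string input. Caching the promise rather than the resolved value also collapses concurrent duplicate requests into one call. `cached` is a hypothetical helper with no eviction, which is only reasonable within a single run; a long-lived cache needs a TTL or a size limit.

```typescript
// Wrap an expensive async call (search, embedding, summary) in an exact-input cache.
function cached<T>(fn: (input: string) => Promise<T>): (input: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (input) => {
    let hit = cache.get(input);
    if (!hit) {
      hit = fn(input);
      cache.set(input, hit);
      hit.catch(() => cache.delete(input)); // don't keep failures around
    }
    return hit;
  };
}

let searchCalls = 0;
const search = cached(async (q: string) => {
  searchCalls += 1; // stands in for a paid API call
  return [`result for ${q}`];
});

search("agents");
search("agents"); // served from the cache, even though the first call is still in flight
search("rust");
```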

One research agent run might cost a few cents. At a thousand runs a day that’s hundreds of dollars a month. Know your per-run cost before you scale.
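The per-run math is worth writing down. Here’s a back-of-the-envelope sketch. The token counts and the $3 / $15 per-million-token prices are placeholder assumptions, not any provider’s actual rates.

```typescript
// Back-of-the-envelope agent cost model. Token counts and prices are placeholders.
interface CallProfile {
  inputTokens: number;
  outputTokens: number;
}

function costPerRun(calls: CallProfile[], inputPricePerM: number, outputPricePerM: number): number {
  // Prices are in dollars per million tokens.
  return calls.reduce(
    (sum, c) => sum + (c.inputTokens * inputPricePerM + c.outputTokens * outputPricePerM) / 1e6,
    0,
  );
}

function monthlyCost(perRun: number, runsPerDay: number, days = 30): number {
  return perRun * runsPerDay * days;
}

// Assumed profile: 2 LLM calls per run, ~2,000 tokens in and ~300 out each,
// at a hypothetical $3 per million input tokens and $15 per million output tokens.
const call: CallProfile = { inputTokens: 2000, outputTokens: 300 };
const perRun = costPerRun([call, call], 3, 15);
console.log(perRun.toFixed(3));                    // 0.021 (dollars per run)
console.log(monthlyCost(perRun, 1000).toFixed(0)); // 630 (dollars per month at 1,000 runs/day)
```

At these made-up numbers, one run costs about 2 cents, and a thousand runs a day lands in the hundreds of dollars a month.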

Log everything, assume nothing

Traditional software fails with stack traces. Agent systems fail silently: the output looks plausible, the logs show no errors, and something is still wrong.

Observability for agents means tracing every decision: what did the agent plan to do, what tool did it call, what came back, what did it decide next. Tools like LangSmith and Weights & Biases are built for exactly this. When something breaks (and it will), you want to be able to replay the exact sequence of steps that produced the bad output and see precisely where it went sideways.
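Here’s a minimal sketch of the data a trace should capture per step. It writes structured spans to an in-memory array; in practice they’d go to a log pipeline or a tool like LangSmith. `traced`, `Span`, and `replay` are hypothetical names. The useful part is recording the input, the output or error, and the duration of every step under one run ID.

```typescript
interface Span {
  runId: string;
  step: string;
  input: unknown;
  output?: unknown;
  error?: string;
  durationMs: number;
}

const spans: Span[] = [];

// Wrap one agent step so its input, result, and timing are always recorded.
async function traced<T>(runId: string, step: string, input: unknown, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const output = await fn();
    spans.push({ runId, step, input, output, durationMs: Date.now() - start });
    return output;
  } catch (err) {
    spans.push({ runId, step, input, error: String(err), durationMs: Date.now() - start });
    throw err; // record, then let the caller decide how to recover
  }
}

// Replaying a run is then just filtering the spans by runId, in order.
const replay = (runId: string) => spans.filter((s) => s.runId === runId).map((s) => s.step);
```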

Beyond individual traces, track aggregate metrics over time. Hallucination rate. Task success rate. Average cost per run. These trend lines tell you whether your changes are actually helping or just moving the problem around.

Sandbox your code execution

If your agent can write and run code (and the useful ones usually can), you need to treat that capability like a loaded gun. Run all code in an isolated container that gets destroyed after each execution. Set hard timeouts and memory limits. Whitelist the libraries it’s allowed to use. Never let agent-generated code write anywhere that matters or reach the network unless you explicitly decided it should.
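Here’s a sketch of the container flags that cover most of that checklist. It assumes Docker is available and that agent code is Python run in the `python:3.12-slim` image. The flags themselves are standard `docker run` options; the specific limits are illustrative. You’d pass the array to something like Node’s `child_process.spawnSync("docker", args)`, ideally with a caller-side timeout as a second layer. Library whitelisting would need a custom image with only approved packages installed.

```typescript
interface SandboxLimits {
  memory: string;     // e.g. "256m"
  cpus: string;       // e.g. "1"
  timeoutSec: number; // wall-clock limit, enforced by coreutils `timeout` in the container
  pids: number;       // cap on processes, which limits fork bombs
}

// Build `docker run` arguments for one throwaway execution of agent-generated Python.
function sandboxArgs(code: string, limits: SandboxLimits): string[] {
  return [
    "run",
    "--rm",              // destroy the container after each execution
    "--network", "none", // no network access unless you explicitly decide otherwise
    "--read-only",       // read-only root filesystem: nowhere that matters to write
    "--memory", limits.memory,
    "--cpus", limits.cpus,
    "--pids-limit", String(limits.pids),
    "python:3.12-slim",
    "timeout", String(limits.timeoutSec),
    "python", "-c", code,
  ];
}
```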

The failure mode here isn’t theoretical. An agent with unrestricted code execution and a bad prompt is a very expensive, very fast way to ruin your afternoon.

Production isn’t a different version of your demo. It’s a different discipline entirely.

The job changed. Most people haven’t caught up yet.

Here’s the take I’ll leave you with, and you can disagree with me in the comments: the bottleneck in AI development right now isn’t the models. The models are good. The bottleneck is engineers who understand how to build reliable systems around them.

Prompting was never the skill. It was always the entry point. The actual work (designing context, decomposing tasks, wiring tools, evaluating outputs, controlling costs, tracing failures) is systems design. It always was. The wrapper just changed.

The developers who are going to do interesting things with agents in the next few years aren’t the ones who found the best jailbreak or the cleverest chain-of-thought trick. They’re the ones who treat agents the way they treat any other distributed system: with logging, with testing, with failure modes they planned for, with an understanding of what happens when one component does something unexpected.

That’s not a pessimistic take. If anything, it’s the opposite. It means the skills you already have (debugging, systems thinking, knowing when to add complexity and when not to) transfer directly. You’re not starting from zero. You’re applying what you know to a new kind of component.

Agents are going to keep getting more capable, more autonomous, and more embedded in real workflows. The tooling is improving fast. The patterns are stabilizing. This is a good time to actually understand the stack rather than just use the abstraction on top of it.

Build something small. Evaluate it honestly. Add one pattern at a time. Log everything from day one. That’s the whole playbook.

Helpful resources

Anthropic’s guide to building effective agents: the clearest first-principles breakdown of agent design patterns available
LangChain documentation: practical starting point for building agent pipelines in Python
LangSmith: tracing and evaluation tooling built specifically for LLM applications
Building an agentic system: deep technical breakdown of how tools like Claude Code are architected under the hood
Weights & Biases: production monitoring and experiment tracking for ML systems
OpenAI Cookbook: agent examples with real code for tool use, multi-agent patterns, and evals
r/LocalLLaMA: where practitioners actually talk about what’s working and what isn’t

Go vs Rust: the only backend language debate that actually matters in 2026

Thu, 14 May 2026 08:02:23 +0000

You don’t need to pick one. You need to know which fight each one was built for.

There’s a certain kind of developer who treats language choice like a religion.

Go devs will tell you Rust is overkill.

Rust devs will tell you Go is for people who don’t understand memory.

Both are partially right. Both are completely missing the point.

Here’s the thing: choosing between Go and Rust in 2026 isn’t really a language debate anymore. It’s a system design decision. And most of the hot takes you’ll find online are arguing about the wrong thing, comparing raw benchmarks and borrow-checker frustration instead of asking the actual question:

Where in your architecture does this choice matter, and when does it stop mattering?

I’ve watched teams rewrite entire Go services in Rust because a benchmarks blog post made someone nervous. I’ve also watched Rust codebases grind sprint velocity to a halt because the team picked the wrong tool for a job that really just needed a couple of goroutines and a coffee break.

Neither is winning. Both are expensive.

The honest answer in 2026 is boring:

Go and Rust aren’t competing. They’re slotting into different layers of the same system. One builds the thing. The other makes the thing survive.

Understanding which layer needs which tool is the only skill that actually pays out.

TL;DR: Go is your default. It’s fast enough, it ships fast, it scales horizontally, and your team won’t hate you for picking it. Rust enters the picture when specific parts of your system hit a wall that Go can’t get past without throwing more money at AWS. The debate isn’t Go or Rust; it’s Go first, Rust where it earns its keep.

Go: the default loadout every backend team starts with

There’s a reason Go became the lingua franca of cloud-native backend development.

It’s not because Google has a good marketing department.

It’s because Go made a specific, opinionated bet:

Developer velocity and system predictability matter more than raw performance.

For most production systems, that bet pays out every single time.

Think of it like picking your starter in an RPG. Go has balanced stats across the board. Not the highest damage output, not the tankiest build, but the one that gets you through 80% of the game without hitting a wall. You pick it, you learn the basics, and you’re shipping features by the end of the week.

The concurrency model is where Go genuinely earns its reputation.

Goroutines are cheap enough to spin up thousands without sweating memory. The channel model gives you a mental framework for concurrent work that doesn’t require a degree in threading theory to reason about.

Building an SQS consumer that fans out to 20 parallel workers?

That’s an afternoon in Go.

Writing a gRPC service sitting behind an ALB on ECS?

Go makes that boring in the best possible way. And boring is exactly what you want in a production system.

The AWS ecosystem leans into this hard:

Fast startup times mean tighter Lambda cold starts
Low memory footprint means cheaper ECS containers
The AWS SDK for Go v2 covers everything from S3 to EventBridge without fighting you

The broader ecosystem is settled too. Gin and Chi for HTTP routing, sqlc for type-safe queries, Wire for dependency injection if that’s your thing. The compiler errors are readable. Onboarding a new engineer onto a Go codebase takes days, not weeks.

That last point matters more than most architecture blogs admit.

The best language for your system is the one your team can debug at midnight without wanting to quit their job. Go clears that bar repeatedly.

Where Go starts showing cracks is predictable if you know where to look.

The garbage collector has improved dramatically, but GC pauses are still a reality in latency-sensitive workloads. Memory efficiency plateaus when you’re doing genuinely CPU-heavy work. And if you’re processing high-throughput data streams where every microsecond of predictability matters, Go’s runtime is making decisions for you that you might not agree with.

That’s not a criticism. That’s Go doing exactly what it was designed to do.

The tradeoff is explicit. Most teams never hit the ceiling.

The ones that do need a different tool for that specific corner of the system.

That tool has a crab mascot and a notoriously opinionated compiler.

Rust: the late-game unlock you didn’t know you needed

Nobody picks up Rust on day one.

You come to Rust after you’ve shipped something. After you’ve scaled something. After you’ve sat in an incident review trying to explain why your Go service started spiking p99 latency at a completely unpredictable interval and the only honest answer was “the GC decided it was time.”

That’s the origin story for most Rust adoption in production. Not ideology. Not a rewrite-everything agenda. Just a specific part of the system that stopped behaving and needed a tool with harder guarantees.

Think of it like unlocking a late-game weapon in an RPG. You don’t get it at the start. You earn it. And once you have it, you don’t use it on every enemy; you save it for the boss fights.

The core promise of Rust is control.

No garbage collector means no GC pauses. No runtime surprises. What you write is what runs, with predictable memory behavior from the first request to the millionth. The borrow checker (the thing everyone complains about until they don’t) is just the compiler enforcing that promise at build time instead of letting it blow up in production at 3am.

In AWS terms, this matters in very specific places:

High-throughput Kinesis or Kafka consumers where processing latency compounds
Lambda functions where cold start time and memory ceiling are both constrained
ECS services doing CPU-heavy transformation work on tight instance budgets
Real-time fraud detection or risk scoring pipelines where a GC pause is a business problem

The async ecosystem has matured to the point where this is actually enjoyable to build now. Tokio is the async runtime most production Rust services are built on, and Axum gives you an ergonomic HTTP layer that won’t make you miss Go’s simplicity quite as much as you’d expect.

Crates.io has filled in the gaps that used to make Rust feel incomplete for backend work. There’s a crate for almost everything now: serialization, database access, observability, AWS integrations. It’s not Go’s ecosystem in terms of breadth, but it’s not the frontier territory it was three years ago either.

The WASM angle is worth mentioning too.

Rust compiles to WebAssembly better than almost anything else. That opens up edge compute scenarios (Cloudflare Workers, Fastly Compute) where you want near-native performance in a serverless sandbox with a sub-millisecond cold start. Go can do this too, but Rust’s output is leaner and the toolchain support is stronger.

The honest cost of Rust is team velocity, at least upfront.

The borrow checker has opinions. Strong ones. Loudly expressed. Onboarding an engineer who hasn’t written Rust before is a multi-week investment, not a multi-day one. Code review takes longer. Simple things that would take an hour in Go can take a morning in Rust while you negotiate with the compiler about who owns what.

But here’s the flip side nobody puts in the benchmarks post:

Once the code compiles, it tends to just work.

Not “works in staging” work. Not “works until load testing” work. The bugs that safe Rust rules out at compile time (null pointer dereferences, data races, use-after-free) are exactly the kind that cause 2am incidents in languages that trust you more than the compiler does. In Go, that means nil dereferences and data races.

The tradeoff isn’t really speed vs safety. It’s upfront cost vs long-term stability.

For the right part of your system, that’s an easy trade.

Real architecture patterns: how teams actually use both

Here’s what nobody tells you in the language comparison posts:

Most production systems that use Rust don’t replace Go.

They add Rust to specific coordinates in the architecture where Go stopped being enough. The rest stays Go. The team keeps shipping. Nobody rewrites everything and nobody has an existential crisis about the stack.

These are the three patterns that show up repeatedly in real systems.

Pattern 1: Go orchestrates, Rust crunches

This is the most common one and the cleanest.

Go handles everything that talks to other things: the API layer, the service-to-service communication, the SQS consumers that fan out work, the schedulers that trigger jobs. It’s the connective tissue of the system. Fast enough, easy to reason about, easy to change.

Rust handles the part that actually does the heavy lifting.

A real example: a data pipeline that ingests raw events from Kinesis, runs enrichment and scoring logic, and writes results to DynamoDB. The Go service manages the consumer group, handles retries, and pushes work into a processing queue. The Rust binary does the actual scoring, which is CPU-bound, memory-intensive, and latency-sensitive. Two languages, one coherent system, each doing the job it was built for.

Pattern 2: Cost optimization at scale

This one sneaks up on you.

Your Go services are fast enough. Response times are fine. Users aren’t complaining. But the AWS bill is climbing faster than your traffic is growing, and when you dig in, you find a handful of services that are just eating CPU and memory disproportionately.

That’s the cost optimization signal.

Rewriting those specific services in Rust (not everything, just the ones burning resources) can shrink memory footprint significantly and increase throughput per instance. Fewer instances needed. Smaller instance sizes. Same traffic handled for less money.

At low scale this is a rounding error. At high scale this is a conversation your CFO notices.
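The arithmetic behind that CFO conversation is simple enough to sketch. Every number below (peak traffic, per-instance throughput, 30% headroom, hourly price) is a hypothetical placeholder rather than a measurement; plug in your own profiling and billing data.

```go
package main

import (
	"fmt"
	"math"
)

// instancesNeeded returns how many instances serve peakRPS when each one
// sustains perInstanceRPS while keeping `headroom` (0.3 = 30%) spare.
func instancesNeeded(peakRPS, perInstanceRPS, headroom float64) int {
	usable := perInstanceRPS * (1 - headroom)
	return int(math.Ceil(peakRPS / usable))
}

// monthlyCost multiplies instance count by an hourly price over ~730 hours.
func monthlyCost(instances int, hourlyPrice float64) float64 {
	return float64(instances) * hourlyPrice * 730
}

func main() {
	// Hypothetical inputs, not benchmark results.
	const peak = 20000.0 // requests/second at peak
	const price = 0.17   // dollars/hour per instance
	before := instancesNeeded(peak, 1000, 0.3) // current per-instance throughput
	after := instancesNeeded(peak, 2500, 0.3)  // assumed throughput after a rewrite
	fmt.Printf("before: %d instances, $%.0f/month\n", before, monthlyCost(before, price))
	fmt.Printf("after:  %d instances, $%.0f/month\n", after, monthlyCost(after, price))
}
```

Because the instance count is a ceiling, small fleets barely move; the savings only become visible once the fleet is large enough that each throughput gain removes whole instances.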

The Benchmarks Game numbers aren’t perfectly representative of production workloads, but the directional signal is real: Rust typically comes out ahead of Go on CPU-bound work, often by 2–5x depending on the benchmark, and the memory story is even more pronounced.

Pattern 3: Latency-sensitive systems where GC pauses are a product problem

Some systems can’t tolerate unpredictability at the tail.

Financial systems where a p99 spike causes a missed execution window. Voice and video processing where a pause means a glitch the user hears. Real-time analytics dashboards where a stall breaks the illusion of live data.

In these cases, Go’s GC, even with tuning, introduces a variance floor that you can’t engineer away. You can shrink it. You can schedule around it. You can’t eliminate it.

Rust eliminates it.

No garbage collector. No stop-the-world pauses. Memory is freed at predictable points, when values go out of scope, rather than whenever a collector decides it’s time, so a Rust service’s memory behavior under load stays deterministic, predictable, and boring in exactly the way your SLA needs it to be.

This isn’t about raw speed. It’s about the shape of the latency distribution. Go might average faster on a given workload and still lose this comparison because the tail is worse.
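A quick way to see how a faster average can still lose is to compute both numbers over the same kind of sample. The latencies below are invented for illustration, not Go or Rust measurements: service A is usually fast with rare long pauses, service B is a bit slower but flat. The percentile uses the nearest-rank method and assumes a non-empty slice.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean of the samples.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
func percentile(xs []float64, p float64) float64 {
	sorted := append([]float64(nil), xs...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// repeat builds a slice holding n copies of v.
func repeat(v float64, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = v
	}
	return out
}

func main() {
	// Hypothetical latencies in ms.
	a := append(repeat(2, 98), repeat(40, 2)...) // fast, with occasional pauses
	b := repeat(3, 100)                          // slower, but flat
	fmt.Printf("A: mean %.2f ms, p99 %.1f ms\n", mean(a), percentile(a, 99))
	fmt.Printf("B: mean %.2f ms, p99 %.1f ms\n", mean(b), percentile(b, 99))
}
```

Here A wins on the mean (2.76 ms vs 3.00 ms) and loses badly at p99 (40 ms vs 3 ms), which is the comparison that matters when a tail spike is a product problem.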

The common thread across all three patterns is the same:

You don’t choose Rust instead of Go. You choose Rust for the specific part of the system where Go’s tradeoffs stop working in your favor. Everything else stays exactly as it was.

Build with Go. Optimize with Rust. Ship both.

The wall: when Go stops being enough

Every backend system hits a wall.

At first, everything is fine.

You spin up services, deploy to ECS, wire up queues, add a scheduler, scale horizontally, and things just work. APIs respond fast enough. The team ships features. The infra bill is acceptable.

Then growth happens.

Traffic increases
Latency spikes in weird places
Costs start climbing faster than usage
Debugging gets harder
Small inefficiencies compound into real problems

And suddenly the question changes.

It’s no longer:

“How fast can we build this?”

It becomes:

“How do we keep this system predictable, efficient, and scalable without slowing the team down?”

That’s where the Go vs Rust decision actually starts to matter. Not at the beginning. Right when you hit that wall.

How to know if you’ve actually hit it

Before you start a Rust migration conversation, run this check first.

Most systems that feel slow aren’t CPU-bound. They’re waiting on databases, on network calls, on downstream services that are slower than they should be. Rewriting a Go service in Rust doesn’t fix a Postgres query that’s missing an index. It doesn’t fix an N+1 problem in your ORM. It doesn’t fix a Lambda that’s cold-starting because you gave it 128MB of memory and called it done.

Profile before you decide.

If your bottleneck is I/O (and it usually is), Go is not your problem. Fix the query. Add the cache. Right-size the instance. Go home.

If your bottleneck is genuinely CPU or memory (sustained high utilization, processing-bound work, memory pressure that doesn’t resolve with horizontal scaling), then you have a real signal. That’s the wall. That’s where Rust earns the conversation.

The team cost is real too

Here’s the part that gets skipped in every benchmark post.

Introducing Rust into a Go shop isn’t free. You’re adding a second language to your codebase, which means:

Two build pipelines to maintain
Two sets of idioms for new engineers to learn
Code review that requires Rust-literate reviewers
A slower ramp for anyone joining the team fresh

None of that is disqualifying. All of it is real. The question is whether the performance gain in the specific component you’re optimizing is worth the ongoing operational tax across the whole team.

For most teams, the answer is: yes, but only for a small slice of the system.

The actual decision

Use Go to build systems.

Use Rust to optimize systems.

Start in Go. Ship in Go. Run in Go. When a specific component starts costing you more than it should (in latency, in money, in stability), isolate it. Profile it. And if the data points at a genuine CPU or memory problem that Go can’t resolve without throwing more hardware at it, that’s your Rust entry point.
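“Isolate it” can be as small as pulling the suspect function out and measuring it on its own, which also gives you a baseline to hold any Rust rewrite against. This sketch uses `testing.Benchmark`, which the standard library documents for custom benchmarks outside `go test`; `normalize` is a hypothetical hot-path function, not something from the article.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// normalize is a hypothetical hot-path function extracted from a service
// so it can be measured in isolation.
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

func main() {
	input := "  Some Event Payload  "
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = normalize(input)
		}
	})
	// ns/op plus allocation counts: the numbers to compare before and after.
	fmt.Printf("%d ns/op, %d allocs/op, %d B/op\n",
		res.NsPerOp(), res.AllocsPerOp(), res.AllocedBytesPerOp())
}
```

If the per-op cost is already small and the service is still slow, the problem is probably elsewhere; if this one function dominates the CPU profile and its numbers stay stubborn after Go-level fixes, that is the data that justifies the rewrite.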

Don’t migrate the whole system. Rewrite the one service that earned it.

Then go back to shipping in Go.

Conclusion: Go builds companies, Rust survives them

Here’s the take nobody wants to say out loud:

Most teams that argue about Go vs Rust don’t have a language problem. They have a prioritization problem. The system isn’t slow because of the runtime. It’s slow because of the decisions made three sprints ago that nobody had time to revisit.

Language choice is downstream of system design. Always.

Go became the default backend language of the cloud-native era not because it’s the fastest or the most elegant or the most technically impressive thing you can put on a resume. It became the default because it consistently produces systems that teams can build quickly, reason about clearly, and operate without a dedicated platform engineering team just to keep the lights on.

That’s genuinely hard to beat.

Rust earns its place in the stack the same way any good tool does: by solving a specific problem better than everything else available. Not because it’s newer. Not because the crab mascot is charming. Because when you need deterministic memory behavior, zero GC overhead, and the kind of compile-time guarantees that let you sleep through the night without a PagerDuty notification, nothing else in the backend ecosystem comes close.

The developers who will win in the next few years aren’t the ones who picked a side in this debate. They’re the ones who got comfortable moving between both, who can ship a Go service on a Tuesday and drop into a Rust codebase on a Thursday without losing a step.

Polyglot isn’t a buzzword anymore. It’s a survival skill.

The question was never Go or Rust. It was always Go and Rust, applied with enough discipline to know which one the problem actually needs.

Pick the tool that fits the job. Ship the thing. Move on.

And if someone on your team is still writing LinkedIn posts about which language is objectively better in 2026, send them this article and go touch grass.

What do you think? Is your team running both in production, or did you go all-in on one? Drop your take in the comments; I read every one.

Helpful resources

A Tour of Go: the best starting point if you’re new to Go
The Rust Book: the official, actually-readable Rust guide
Tokio: the async runtime where most production Rust backend work lives
Axum: web framework with ergonomic HTTP for Rust services
AWS SDK for Go v2: the one you actually want to use
The Benchmarks Game: real language performance comparisons, not blog post numbers
Crates.io: the Rust package registry, way more mature than it used to be
Gin: HTTP framework, Go’s most popular web framework