As a software developer in 2026, you can't escape AI. It's everywhere, and almost every company is using some sort of AI coding tool. And as a long-time full-stack developer whose roots are in the front-end, I wasn't always convinced.
Over the past two years I've seen both sides of AI. The terrible designs, but also the surprisingly decent ones. Demo apps that used to take me hours, done in minutes. But I've also found myself knee-deep in AI slop, wondering if I actually saved any time at all.
So are these coding tools making me faster or not? I had to look into it more.
Last year, Model Evaluation and Threat Research, or METR, ran a randomized controlled trial and found that when experienced open source developers used AI tools, it took them 19% longer to complete tasks than without. AI was actually a detriment to their productivity.
On the other hand, GitHub conducted their own controlled trial and found that developers completed tasks up to 55% faster with AI assistance.
So which is it?
After trying these tools out myself for the last two years, I can confidently say I think both are true. And the difference comes down to the approach used. AI tools, just like any other tool, can be misused. If the only tool you have is a hammer, every problem starts to look like a nail. When I first started using AI, I used it for everything. New features, bug fixes, refactors, brainstorming, all of it. And at first it felt amazing. But I kept running into the same problems over and over. The output looked good on the surface, but I'd spend more time fixing what it gave me than if I'd just written it myself.
Here are 5 ways AI can hurt you instead of help you, especially on the frontend.
No Real Feature Definition
I like jumping into a coding agent right away. I start vibe coding as soon as my fingers reach the keyboard, but this actually isn't a great way to start. The AI will absolutely give me what I ask for, and it might even work, but the problem lies in the details.
The design generated is often not great (I'm looking at you, GPT), and the features don't always work. Validation, error handling, responsiveness, and accessibility are often partially implemented or skipped entirely. That's when things fall apart.
The real issue is that you need to define what success looks like. If you don't, the AI just guesses, and while it gets some things right, it gets a lot wrong. Ambiguity isn't your friend.
The real issue is that you need to define what success looks like.
Don't get me wrong, you don't need a 15-page requirements document with detailed designs for every use case. A few bullet points and some basic acceptance criteria go a long way. Defining what the feature should and shouldn't do is the bare minimum. Adding it all to a markdown file before you get started will dramatically improve what the AI generates for you.
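As a sketch of what I mean (the feature and criteria here are made up, adapt them to your own work), a minimal feature definition can be nothing more than:

```markdown
# Feature: Contact form

## What it should do
- Collect name, email, and message
- Validate email format before submit, with inline error messages
- Disable the submit button while the request is in flight

## What it should NOT do
- No file uploads
- No third-party form services

## Acceptance criteria
- Works on mobile (320px) and desktop widths
- All inputs are keyboard accessible and properly labeled
```

That's maybe two minutes of typing, and it removes most of the guessing the AI would otherwise do about validation, error handling, and accessibility.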
Too Much Bad Context
So does that mean I should put everything into the context window? Use as many AGENTS.md and markdown files as I can before I start, right? Well, yes, we need context, but there's more to it than that.
The second problem is putting too much into your context window. Every coding tool (I prefer Kiro, so I'll use it as my example) has a limited amount of context it can handle, and that limit is largely set by the model you select. Some models have larger context windows, others have smaller ones.
Either way, stuffing everything you can find into a pile of markdown files causes the opposite problem from #1: now the model has to comb through heaps of useless information to find what it needs. And we've recently seen evidence that more is not always better.
Recent research across over 60,000 repos found that context files are often too long, too vague, and are actively making agents worse. In one study, accuracy dropped from 87% to 54% just from context overload.
Context overload doesn't just hurt accuracy, it also increases token cost. When every request carries more information than the task needs, you burn tokens for nothing, and that hits your pocketbook.
At the end of the day, it's more about quality than quantity when you're dealing with context windows. Think about it like giving directions. If someone asks you how to get to the grocery store and you hand them a 200-page atlas, that's technically more information. But it's not more helpful.
When creating an AGENTS.md file (or steering file if you're using Kiro), only include information the project actually needs. Constrain it to your coding practices (tabs vs. spaces), design system, API contracts, and rendering approach. Remove the fluff, and keep it up to date. When I use Kiro, I have an agent that automatically updates my markdown files whenever I change my components, so my context files always reflect the latest state.
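For illustration (every specific here is invented, the paths and rules are placeholders for your own), a lean steering file might look like:

```markdown
# AGENTS.md

## Code style
- TypeScript strict mode; 2-space indentation
- Functional React components only, no class components

## Design system
- Use components from src/ui/ — never raw HTML buttons or inputs
- Colors and spacing come from tokens in src/ui/tokens.ts

## API
- All requests go through src/api/client.ts
- Server state lives in React Query; no manual fetch in components
```

A dozen lines like these give the model real constraints to follow. A 500-line dump of your entire architecture mostly gives it noise.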
There's probably a Goldilocks zone of context. Not too little, not too much. Just the stuff that actually matters for the task.
Too Much in One Shot
While the amount of context matters, you also need to know the limits of what you're asking your coding agent to do. One common problem I see is developers trying to one-shot a whole application: prompts to build the frontend, the backend, the tests, everything at once. It feels like fast progress, and it does output something in just a few minutes.
That can work for a quick prototype or demo. In production, though, it's not what you want. You'll end up with a lot of AI slop that takes serious rework to untangle.
I was reading the other day that a whole industry has popped up to help small teams fix their vibe-coded messes. This should really tell you something about the state of the industry.
The reality is that generating everything at once causes the AI to lose architectural consistency. It may solve the same problem different ways in different files, and it creates patterns that are contradictory or nonsensical.
To fix this, scope down the tasks. Unless you're running an autonomous agent loop that will churn through a large, detailed requirements document for hours, you need to break things down: build this component, refactor these tests, handle these edge cases for this part of the app. The key, as we'll discuss in the next section, is to check the output. It's a lot easier to catch a problem in 50 lines than in 2,000.
Spec-driven development mitigates this problem in a lot of ways. I use Kiro all the time to break complex features down into requirements, design, and tasks, so I can check the plan before the AI writes any code at all.
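In practice, the task list that comes out of a breakdown like this (the feature and steps below are hypothetical) gives you natural checkpoints to review between each generation:

```markdown
## Tasks: checkout page
- [ ] 1. Create CartSummary component (display only, no logic)
- [ ] 2. Add quantity update handlers with optimistic UI
- [ ] 3. Wire up the payment API call and its error states
- [ ] 4. Add tests for empty cart and failed payment edge cases
```

Each item is small enough to review in one sitting, which is exactly the point.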
Too Much Trust
As hinted above, we shouldn't take AI output at face value, even though it looks convincing at first. When I started using Claude Opus 4.5, I really thought coding was a solved problem. The code passed my linter, the types looked OK, and it ran well.
But as I started looking at the code in more detail, I saw some issues and edge cases I didn't like. Maybe one day AI will write 100% accurate code, but we still aren't there yet.
There is no question AI makes writing code faster, but you need to spend more time checking the output. In other words, AI compresses generation time, but it often expands verification time.
AI compresses generation time, but it often expands verification time.
I'll be the first to admit, I love writing code: debugging it, finding clever ways to abstract the logic, seeing it run for the first time. I'm not as big a fan of code reviews. Something about reading other people's code and giving feedback just isn't as fun.
However, when AI is doing the writing, reviewing becomes the most important part of your job. On the frontend, for example, I've seen weak accessibility, brittle logic, weird abstractions, duplicated behavior, and terrible design. Let me emphasize again: AI is not great at design. It's passable if you love purple and don't mind generic Tailwind-looking apps. You really need to iterate several times to get a good outcome.
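Duplicated behavior is the one I catch most often in review. As a sketch (the `formatPrice` helper and the inline snippets are hypothetical, but the pattern is typical), imagine the AI formatted prices inline in two different components, two different ways, and the fix is consolidating them:

```typescript
// Hypothetical example: AI generated two components that each formatted
// prices inline, with subtly different output:
//   `$${price}`            -> "$9.5"  (missing the trailing zero)
//   `$${price.toFixed(2)}` -> "$9.50" (but hard-codes the dollar sign)
// The fix is one shared helper, so the behavior exists in exactly one place:

function formatPrice(amount: number, currency: string = "USD"): string {
  // Intl.NumberFormat handles the symbol, decimal places, and rounding rules
  return new Intl.NumberFormat("en-US", {
    style: "currency",
    currency,
  }).format(amount);
}

console.log(formatPrice(9.5));       // "$9.50"
console.log(formatPrice(3, "EUR"));  // "€3.00"
```

Neither inline version is a bug a linter will flag, which is why a human reading the diff is the only reliable way to catch it.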
AI is not great at design!
Tests help mitigate these problems, but sometimes the AI games those too. Really, there's no substitute for a real live breathing human verifying everything along the way. Read the code, understand it, gauge the design, and test the assumptions.
Speed Over Maintainability
As a developer, I love writing code fast. I remember being a young software developer and learning as many hotkeys and keystrokes as I could. I learned Vim so I would never have to touch the mouse, because I knew that every second I touched it, it would slow me down.
With AI writing your code, speed stops being the hard part. I'll never type faster than a large language model can generate. But that's the trap, just because it's fast doesn't mean it's good.
When code gets cheap to generate, bad abstractions get cheaper too.
AI will over-generate wrappers, components, abstractions, and APIs. These tools are like eager interns: they want to show off what they can do, and if you don't monitor them closely, they'll go off the rails.
And my cognitive load can only handle so much. One of the first codebases I worked on had so many levels of abstraction that it took me 15 minutes of ctrl-clicking through classes and interfaces to figure out what the code was actually doing. (Yes, this was Java.) If you let AI run loose, it will do the same thing.
You end up owning code you didn't fully think through or really understand. Your job is to understand what the AI is creating, because six months from now, someone has to maintain it. And that someone is probably you.
Modern Tooling
So where are we at today? Well, we know that models are getting better all the time, and maybe in a few years most of these problems will be solved.
But we are not there yet.
Tools like Kiro have spec-driven development, and other tools have plan mode, checkpoints, and better workflows. But these tools are only as good as the person driving them, and there's still no silver bullet. You still need your judgment, your context, and your code review skills to be successful.
Conclusion
If we go back to our two studies at the start, both things can be true at once. Some developers will see productivity gains with AI tools, many others won't. It really comes down to how you use them in your day-to-day life.
You can't vibe code every app, and every app shouldn't be vibe coded. You need the right amount of context, not too little, not too much. You need to scope your asks. You need to become a better code reviewer. And you need to think about whether what the AI created is something you can actually maintain six months from now.
Let me know in the comments if you agree or disagree. Until next time.


Top comments (3)
The METR vs GitHub results gap makes sense — vibe coding without success criteria is basically letting the model define the requirements. Worth noting that the 19% slowdown probably includes the time debugging AI-generated code you never fully understood.
Yeah, that could be a part of it. I really think it's a lack of knowledge on how to use these tools. Like you said, not defining the success criteria.
Good post.
What stands out to me is that the real failure is not just bad output. It is teams slowly losing operational ownership of code that still looks reviewable on the surface.
That is the dangerous part of AI assisted development. The code can compile, pass checks, and even look cleaner than what a rushed human would have written, while still leaving nobody with a strong enough mental model to modify or defend it later under pressure.
So to me this is bigger than productivity or technical debt. It is a question of retained human comprehension. If the team cannot explain the behavior, reason about edge cases, and safely extend the code without leaning on the same tool again, then the debt is already live even if the dashboard stays green.
Strong breakdown.