An honest look after a year of using these tools every single day.
I want to be upfront about my bias before we start: I use AI coding tools constantly. GitHub Copilot lives in my editor. I have Claude Code in my terminal. I reach for Cursor when I'm exploring an unfamiliar codebase. These tools have genuinely changed how I work.
And I've also watched colleagues treat them as a replacement for thinking — and ship some of the most subtle, expensive bugs I've seen in years. So I have opinions.
The internet is split on this. One camp says AI coding assistants are going to replace developers entirely (any day now, they've been saying this for three years). The other camp says they produce garbage code that only a junior developer would accept and serious engineers don't touch them. Both camps are confidently wrong.
The truth is more nuanced and, I think, more interesting: AI coding tools are extraordinarily good at specific things and genuinely dangerous for other things, and the line between those two categories tracks very closely with how much thinking is required.
Let me be specific.
The Tools in Question
There are four tools that serious developers are actually using day-to-day right now. They're different in meaningful ways.
GitHub Copilot is the incumbent. It lives in your IDE as an autocomplete that's gotten very good. It suggests code as you type, completes functions, and will generate boilerplate on demand. In 2026 it's expanded significantly with chat, code review suggestions, and workspace-level context. It's the most "invisible" of the tools — it just works alongside your existing workflow. GitHub's data still shows roughly 30% productivity gains on specific task types, which is real, even if the headline obscures a lot of variance.
Cursor is a fork of VS Code built from the ground up around AI. The key difference from Copilot is codebase awareness. Cursor indexes your entire repository and uses that context when generating suggestions, which means it's dramatically better at understanding your specific conventions, your API signatures, your naming patterns. If you're working in a large, established codebase, Cursor is often significantly more useful than Copilot for exactly this reason.
Claude Code is different from both of these. It's a terminal-based agentic tool, not just an autocomplete. You give it a task ("add pagination to the /users endpoint," "write tests for the authentication module," "refactor this module to use dependency injection") and it works through the task autonomously — reading files, writing code, running tests, fixing errors, iterating. It's the tool that feels most like pair programming with a junior engineer who's very fast and sometimes overconfident.
Tabnine is the enterprise play. It's primarily known for its on-premise deployment option, which lets companies run the model on their own infrastructure — important for industries where sending code to a cloud API is a compliance problem. The code quality is generally a notch below Copilot and Cursor, but the privacy story is much stronger.
Where These Tools Are Genuinely Excellent
Boilerplate and scaffolding
No contest. This is where AI coding tools shine so brightly that you'd be actively hurting yourself not to use them.
Need a REST endpoint with validation, error handling, and a database layer? Claude Code will write a complete, working implementation in under a minute. Need to add a new model to your ORM with all the CRUD operations? Copilot will autocomplete the entire thing as you type the class name. Need to scaffold a new microservice with Docker, CI configuration, and a basic test setup? Any of these tools will handle it.
The code isn't always perfect — you'll want to review it — but it's a genuine first draft that saves you twenty minutes of typing boilerplate you've written a hundred times before. This is not a small thing. Boilerplate is cognitively cheap but time-consuming. Offloading it frees you up for the work that actually requires thinking.
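To make that concrete, here's the shape of the first draft you might get back. This is a minimal sketch assuming FastAPI, with an in-memory dict standing in for the database layer; every name in it is illustrative, not from any particular tool's output.

```python
# A minimal sketch (illustrative, not from any specific tool) of the
# endpoint boilerplate these tools produce on request: validation,
# error handling, and a storage layer.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()
_items: dict[str, dict] = {}  # stand-in for a real database layer

class ItemIn(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0)  # rejects zero and negative prices

@app.post("/items", status_code=201)
def create_item(item: ItemIn) -> dict:
    if item.name in _items:
        raise HTTPException(status_code=409, detail="Item already exists")
    _items[item.name] = item.model_dump()
    return _items[item.name]
```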
Writing tests
AI tools are surprisingly good at generating test cases, and this matters more than people give it credit for. The hardest part of testing isn't writing the test structure — it's thinking of the cases to test. AI tools help here in an interesting way: they'll often generate edge cases you wouldn't have thought of because they've been trained on millions of bug reports and code reviews where those edge cases mattered.
The typical workflow: write your function, ask the AI to generate tests, review the tests (deleting the ones that test implementation details rather than behavior), and add any cases the AI missed. You end up with better test coverage in less time.
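As an illustration, here's the flavor of edge cases a tool will typically propose. The slugify helper and the cases are invented for this sketch, not lifted from any specific assistant:

```python
# Sketch: the kind of edge cases an assistant tends to surface for a
# hypothetical slugify() helper. Function and cases are illustrative.
import pytest

def slugify(text: str) -> str:
    return "-".join(text.lower().split())

def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_empty_string():  # edge case: empty input
    assert slugify("") == ""

def test_repeated_whitespace():  # edge case: split() collapses runs of spaces
    assert slugify("a   b") == "a-b"

def test_leading_trailing_space():  # edge case: no stray hyphens at the ends
    assert slugify("  padded  ") == "padded"

@pytest.mark.parametrize("weird", ["\t", "\n", "   "])
def test_whitespace_only(weird):  # edge case: whitespace-only input
    assert slugify(weird) == ""
```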
Exploring unfamiliar codebases
This is underrated. When you join a new team or open-source project and you're trying to understand how a 200,000-line codebase fits together, asking Cursor "how does authentication flow work in this codebase?" or "what happens when a payment is processed?" is often faster and more useful than reading documentation (which may be outdated) or asking a colleague (who's busy).
The tools are good at summarizing patterns, tracing call graphs, and explaining what code does. This isn't generating code — it's using AI as a code comprehension layer on top of an existing system. Very useful.
Translating between languages or frameworks
Rewriting a Python script in Go? Migrating from REST to GraphQL? Converting a class component to a React hook? These are mechanical transformations with clear rules, and AI tools handle them well. They know the idioms of most major languages and frameworks and can produce idiomatic translations that a junior developer would struggle with.
Documentation
I'll be honest — nobody loves writing documentation. AI tools make it tolerable. Give them a function and ask for docstrings, and they'll generate decent ones. Give them a module and ask for a README, and you'll get a draft. It won't be exactly right, but editing a draft is much faster than writing from scratch.
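For a sense of what that draft looks like, here's a hypothetical function with the kind of docstring you'd get back. It's roughly right and worth a quick edit, which is the whole value proposition:

```python
# Sketch: a "first draft" docstring of the kind these tools generate.
# The function is a made-up example; the docstring is what you'd edit.
def retry_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return the backoff delay in seconds for a retry attempt.

    Uses exponential backoff (base * 2**attempt), capped at `cap` seconds.

    Args:
        attempt: Zero-based retry attempt number.
        base: Initial delay in seconds.
        cap: Maximum delay in seconds.

    Returns:
        The delay to sleep before the next attempt.
    """
    return min(base * (2 ** attempt), cap)
```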
Where These Tools Are Dangerous
This is the part that doesn't get talked about enough.
Architecture and system design
This is the clearest failure mode. Ask an AI tool "how should I structure this system?" and it will confidently give you an answer. The answer will sound reasonable. It will use the right buzzwords. It will probably even be internally consistent.
It might also be completely wrong for your specific situation.
AI tools don't know your team's skill level, your operational constraints, your company's existing infrastructure, your latency requirements, your budget, your regulatory environment, or the ten architectural decisions you made three years ago that everything else depends on. They pattern-match to solutions that worked in situations that look similar on the surface, which is not the same as understanding what will work in your situation.
A senior engineer brings genuine contextual knowledge to architecture questions. "We considered microservices but our team is eight people and we don't have the operational expertise to run a service mesh, so we're staying modular monolith for now" — that's judgment based on specific knowledge. AI tools can't do this. They'll recommend the solution that appears most in their training data, which is usually whatever was popular in engineering blog posts this year.
Use AI tools to explore options and think through trade-offs. Don't let them make the decision.
Security-sensitive code
This is where the "move fast and check later" approach breaks down badly. AI-generated code involving authentication, authorization, cryptography, input validation, or data handling requires extremely careful review — more careful than the same amount of human-written code, because the failure modes are less predictable.
AI tools can generate code that looks correct but has subtle vulnerabilities. SQL injection via string concatenation when you expected parameterized queries. JWT validation that checks the signature but not the expiration. Password hashing that uses MD5 because it appeared in an older tutorial the model was trained on. Overly broad CORS policies. Race conditions in concurrent access patterns.
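To make the first two of those concrete, here's a sketch of the vulnerable pattern next to the fix, assuming Python's stdlib sqlite3 and the PyJWT library. The queries, secret, and function names are all illustrative:

```python
# Two of the failure modes above, side by side. Assumes sqlite3 (stdlib)
# and the PyJWT library; everything named here is illustrative.
import sqlite3
import jwt  # PyJWT

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")

def find_user_vulnerable(email: str):
    # Looks fine in review, but string interpolation invites SQL injection.
    return conn.execute(f"SELECT * FROM users WHERE email = '{email}'").fetchall()

def find_user_safe(email: str):
    # Parameterized query: the driver handles escaping.
    return conn.execute("SELECT * FROM users WHERE email = ?", (email,)).fetchall()

SECRET = "illustrative-only"

def decode_token_vulnerable(token: str) -> dict:
    # Signature is checked, but expiry checking is explicitly disabled,
    # so a stolen token works forever.
    return jwt.decode(token, SECRET, algorithms=["HS256"],
                      options={"verify_exp": False})

def decode_token_safe(token: str) -> dict:
    # Require an exp claim and let PyJWT enforce it (raises
    # jwt.ExpiredSignatureError on stale tokens).
    return jwt.decode(token, SECRET, algorithms=["HS256"],
                      options={"require": ["exp"]})
```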
None of this shows up in testing if you're not specifically looking for it. It shows up in production, or in a security audit, or in a breach.
The rule: treat AI-generated security-sensitive code as if it were written by a capable but inexperienced developer. Review it yourself, have a second person review it, and run it through a static analysis tool. Don't skip this because the code "looks right."
Complex business logic
Complex business logic — the rules that encode how your specific domain actually works — is where AI tools tend to produce plausible-looking code with subtle errors in the rules themselves.
The model doesn't know that your refund policy has three different rules depending on whether the customer is in the EU, or that the shipping calculation is different for enterprise accounts, or that this particular edge case was handled a specific way because of a legal requirement from 2023. It will write code that handles the common case correctly and get the edge cases wrong.
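Here's a toy sketch of that trap. The policy rules are invented for illustration, but the shape is exactly what shows up in review:

```python
# A toy version of the trap above. The rules are invented for illustration.
from datetime import date, timedelta

EU_COUNTRIES = {"AT", "BE", "DE", "FR", "NL"}  # abbreviated for the sketch

def refund_eligible_plausible(purchase_date: date, country: str) -> bool:
    # What a model tends to write: one uniform 14-day window.
    # `country` is accepted but ignored, and that's the bug.
    return date.today() - purchase_date <= timedelta(days=14)

def refund_eligible_actual(purchase_date: date, country: str,
                           is_enterprise: bool = False) -> bool:
    # What the business actually requires (still invented rules):
    if is_enterprise:
        return True  # contractual carve-out for enterprise accounts
    window = 30 if country in EU_COUNTRIES else 14  # longer EU consumer window
    return date.today() - purchase_date <= timedelta(days=window)
```

Both versions return the same answer for the common case (a recent domestic purchase), so a quick manual test passes and the missing rules never surface until a customer hits them.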
You, the domain expert, have to catch this. The risk is that the code looks good syntactically and structurally, so you approve it quickly and miss the logic error. This is subtle and dangerous.
Code that requires understanding previous decisions
AI tools are stateless in an important way: they don't remember why you made the decisions you made. When you're extending a system that has existing architectural decisions, constraints, and patterns, the AI will often suggest changes that violate those decisions — not because the suggestion is wrong in isolation, but because it doesn't know about the constraints.
This is where Cursor's codebase indexing helps, but it's not a complete solution. The full history of why your code looks the way it does lives in your team's collective memory, your architecture decision records (if you write those), your pull request comments, and your tickets. No AI tool has that context.
The Real Skill: Knowing When to Trust and When to Verify
After a year of daily use, the meta-skill I've developed is a kind of calibrated trust. It goes something like this:
High trust, minimal review: Boilerplate, tests, documentation, data transformations, simple utility functions with no external dependencies.
Medium trust, standard review: Integration code connecting known APIs, new endpoints following established patterns in the codebase, refactoring with clear mechanical rules.
Low trust, deep review: Any security-related code, complex business logic with domain-specific rules, architectural suggestions, performance-critical paths, database schema changes.
Very low trust, always re-derive: Cryptography implementations (use a library instead; see the sketch below), authorization logic for sensitive resources, anything that will run with elevated privileges.
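For the cryptography case, "use a library" looks like this in practice. A minimal sketch assuming the bcrypt package; argon2 via argon2-cffi would be an equally reasonable choice:

```python
# The "use a library" rule in practice: password hashing with bcrypt
# (the library choice is an assumption on my part).
import bcrypt

def hash_password(password: str) -> bytes:
    # gensalt() picks a random salt and a sane work factor for you.
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())

def verify_password(password: str, hashed: bytes) -> bool:
    # Comparison is handled by the library, not hand-rolled.
    return bcrypt.checkpw(password.encode("utf-8"), hashed)
```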
The mistake junior developers make with these tools is applying the same level of trust uniformly. They've seen the tool produce good boilerplate, so they trust it equally for security logic. That's how subtle bugs get shipped.
The mistake some senior developers make is refusing to use these tools at all because of the failure modes — and then spending time on boilerplate that a tool would have handled in thirty seconds. That's leaving real productivity on the table.
An Honest Assessment of Each Tool
GitHub Copilot: Best for day-to-day coding in a flow state. The inline autocomplete is genuinely good and doesn't interrupt your thinking the way tab-completion-to-full-function can. Weaker on codebase-wide context. If you're already paying for GitHub, the value is there.
Cursor: Best for working in large, established codebases where local context matters. The investment in learning its features (particularly how to write good rules files) pays off significantly. More opinionated about workflow than Copilot — it wants to be your primary editor, not just an extension.
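If you haven't seen one, a rules file is just standing instructions in plain language that Cursor feeds to the model as context. A hypothetical example, with every project detail invented:

```text
# .cursorrules (hypothetical example)
- This is a Django 4 monolith; do not introduce async views.
- All database access goes through the repository classes in core/repos/.
- Use the ApiError hierarchy for error responses, never raw HttpResponse.
- New code needs type hints and a matching test under tests/unit/.
```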
Claude Code: Best for agentic tasks — things you can describe as a goal and let run while you work on something else. Writing a full test suite, adding a feature end-to-end, refactoring a module. Weaker at the inline autocomplete flow. The output tends to be more verbose and explicit than Copilot's — sometimes that's exactly what you want.
Tabnine: Best when you're in an enterprise environment with strict data residency requirements. If that's you, it's probably your only option and it's serviceable. Otherwise the other tools are meaningfully better.
The Thing Nobody Says Out Loud
AI coding tools are making the gap between good and mediocre developers wider, not narrower.
A good developer uses these tools to move faster on the things they'd do well anyway, while bringing genuine judgment to the things the tools can't handle. A mediocre developer accepts AI output uncritically, ships code they don't fully understand, and accumulates technical debt that's harder to reason about because it was generated rather than designed.
The skill floor hasn't dropped. If anything, the bar for what "a real developer" means has shifted toward higher-order thinking — system design, code review, security reasoning, understanding trade-offs — because the tools have commoditized the rest.
This is uncomfortable to say because it contradicts the "AI is democratizing coding" narrative that sells well. But it's what I'm actually observing in the teams I work with.
The developers who are thriving are the ones who know their domain deeply, have strong opinions about quality, and use AI to amplify that expertise. The ones who are struggling are the ones who hoped AI would let them skip the hard part of becoming good at the craft.
It won't. But it will make the craft faster, once you've done the hard part.
Next: Multi-agent systems — orchestrating agent factories for complex knowledge work, and the surprisingly tricky git workflow that makes it possible.