There's a question that keeps popping up in developer communities, on Twitter, in tech podcasts, and probably in your own mind: Can AI actually lead software development?
It's not just academic curiosity. With tools like Claude, ChatGPT, and GitHub Copilot getting more sophisticated, some developers are reporting that they barely write code anymore - they just prompt, review, and ship. The promise is tantalizing: describe what you want, let AI figure out how to build it, and move on to the next feature.
But does it actually work?
The Question Everyone's Asking
I've seen variations of this question everywhere:
- "Should I let ChatGPT write most of my code?"
- "Can Claude architect an entire application?"
- "Is AI-driven development the future?"
- "Will we even need to know how to code in 5 years?"
The responses are usually polarized. AI enthusiasts point to impressive demos and rapid prototyping success stories. Skeptics highlight limitations, hallucinations, and the irreplaceable value of human expertise.
But here's what bothered me: most of these discussions were based on toy examples or theoretical scenarios.
I wanted real data.
My Reality Check: Three Components, One Question
Instead of debating in the abstract, I decided to run a proper experiment. I built a real project with three distinct components, each with different levels of AI involvement:
Component 1: Chrome Extension - Let AI lead completely
Component 2: Web Application - Heavy AI assistance with human oversight
Component 3: Backend Services - Selective AI help for specific tasks
The project was substantial enough to reveal real patterns - not just the honeymoon phase where everything looks promising, but the maintenance phase where reality sets in.
What "AI-Led Development" Actually Looks Like
When I say "AI-led," I mean I approached development like this:
- Describe the feature in natural language
- Let Claude generate the implementation
- Test the result and ask for fixes if needed
- Move to the next feature without deep code review
This mirrors how many developers are actually using AI tools today. It's the "vibe coding" approach - fast, intuitive, and optimistic.
The Chrome Extension: Pure AI Leadership
For the Chrome extension, I went all-in. Claude generated everything (a simplified sketch of the moving parts follows the list):
- Content scripts for scraping LinkedIn activity
- Background service workers
- Popup UI and interactions
- Data processing and storage logic
- Manifest configuration
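To make the scope concrete, here's roughly how two of those pieces fit together. This is a minimal, illustrative sketch rather than the actual generated code - the selectors, file names, and message types are placeholders:

```typescript
// content-script.ts - runs inside the LinkedIn tab (selectors are placeholders)
type ScrapedPost = { author: string; text: string };

function scrapeVisiblePosts(): ScrapedPost[] {
  return Array.from(document.querySelectorAll('[data-id^="urn:li:activity"]')).map((el) => ({
    author: el.querySelector('.actor-name')?.textContent?.trim() ?? '',
    text: el.querySelector('.post-text')?.textContent?.trim() ?? '',
  }));
}

// Hand the scraped data off to the background service worker
chrome.runtime.sendMessage({ type: 'POSTS_SCRAPED', posts: scrapeVisiblePosts() });

// background.ts - Manifest V3 service worker that persists whatever arrives
chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  if (message.type === 'POSTS_SCRAPED') {
    chrome.storage.local.set({ posts: message.posts }, () => sendResponse({ ok: true }));
    return true; // keep the message channel open for the async response
  }
});
```

Multiply that by every feature and you get to 4,000 lines very quickly.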
Initial Result: 4,000 lines of working code in just a few days. The extension actually functioned - it could scrape posts, comments, and likes from LinkedIn. I was impressed.
The Reality Check: When I started adding features and fixing bugs, I discovered the hidden costs of AI leadership:
- 1,000 lines of dead code - duplicate functions, unused imports, commented-out experiments
- Overengineered solutions - complex try-catch blocks where simple validation would suffice (sketched below)
- Inconsistent patterns - the same functionality implemented three different ways
- Architecture drift - what started clean became a sprawling mess as the AI "helped" with each new feature
After cleaning up, only about 40% of the original code was actually necessary.
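The overengineering point deserves an illustration. The pattern looked roughly like this - an illustrative sketch, not the actual extension code; the function name and post shape are invented for the example:

```typescript
// What the AI tended to generate: defensive wrapping around a trivial check
function getPostIdVerbose(post: unknown): string | null {
  try {
    if (post && typeof post === 'object') {
      try {
        const id = (post as { id?: unknown }).id;
        if (id !== undefined && id !== null && typeof id === 'string' && id.length > 0) {
          return id;
        }
        return null;
      } catch (innerError) {
        console.error('Failed to read post id', innerError);
        return null;
      }
    }
    return null;
  } catch (outerError) {
    console.error('Unexpected error while validating post', outerError);
    return null;
  }
}

// What it actually needed: nothing here throws, so a guard clause is enough
function getPostId(post: { id?: string } | null | undefined): string | null {
  return post?.id ? post.id : null;
}
```

Nothing in the verbose version can actually throw, so the nested try-catch only added noise and hid the intent.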
The Fear Factor: But here's what really bothered me - I became afraid to touch certain parts of the code. When you don't fully understand logic you didn't write, making changes becomes risky. The extension had no tests (testing browser extensions is genuinely challenging), so every modification felt like walking through a minefield.
I started getting anxious whenever I opened a file with more than 150-200 lines. Those files had become black boxes where changing one thing might break three others in ways I couldn't predict.
The Web Application: Heavy Assistance with Guardrails
For the Vue.js web app, I maintained more control but still relied heavily on AI:
What Worked:
- Rapid component scaffolding
- Quick CSS styling with Vuetify
- Boilerplate reduction for forms and data handling
What Broke Down:
- AI preferred custom solutions over framework conventions (building title wrappers instead of using Vuetify's title props; see the sketch after this list)
- Resistance to creating reusable components - everything got inlined
- Inconsistent component patterns within the same app
- Context loss leading to repeated explanations of project structure
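To show what I mean by fighting framework conventions, here's the difference in miniature. It's a sketch written with Vue 3 render functions rather than the app's actual single-file components, and it assumes Vuetify 3, where VCard already exposes a title prop; the component names are made up:

```typescript
import { h } from 'vue'
import { VCard, VCardText } from 'vuetify/components'

// The framework convention: let VCard render the title via its own prop
const ProfileCard = () =>
  h(VCard, { title: 'Profile settings' }, {
    default: () => h(VCardText, () => 'Card body goes here'),
  })

// What the AI kept producing: a hand-rolled title wrapper in the default slot
const ProfileCardOverbuilt = () =>
  h(VCard, null, {
    default: () => [
      h('div', { class: 'custom-card-title text-h6 pa-4' }, 'Profile settings'),
      h(VCardText, () => 'Card body goes here'),
    ],
  })
```

The custom wrapper works, but it quietly opts out of the framework's theming and spacing, and every card ends up styled slightly differently.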
The Backend: Selective AI Partnership
For backend services, I used AI more strategically:
- Generate API endpoint boilerplate
- Create data validation logic
- Write test cases for specific scenarios
This approach worked much better, but it required me to:
- Maintain architectural vision
- Review every generated piece
- Ensure consistency with existing patterns
- Make all design decisions myself
Even here, when I experimented with letting AI handle more complex business logic, the results were often disappointing. I'd get 100 lines of "AI spaghetti" that I could refactor down to 20 lines of clear, simple code. The AI's tendency to over-engineer struck again, even in smaller doses.
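As an illustration (heavily simplified, not the literal 100 lines - the Activity type and function names are invented for the example), the generated logic tended to re-check the same conditions in manual loops when a couple of array operations did the job:

```typescript
type Activity = { type: 'post' | 'comment' | 'like'; createdAt: string; visible: boolean };

// The "AI spaghetti" flavor: manual loops, redundant checks, mutable intermediate state
function summarizeActivityVerbose(items: Activity[]): Record<string, number> {
  const result: Record<string, number> = {};
  if (items && Array.isArray(items) && items.length > 0) {
    for (let i = 0; i < items.length; i++) {
      const item = items[i];
      if (item !== null && item !== undefined) {
        if (item.visible === true) {
          const key = item.type;
          if (result[key] === undefined) {
            result[key] = 0;
          }
          result[key] = result[key] + 1;
        }
      }
    }
  }
  return result;
}

// The refactor: same behavior, a fraction of the code
function summarizeActivity(items: Activity[]): Record<string, number> {
  return items
    .filter((item) => item.visible)
    .reduce<Record<string, number>>((acc, { type }) => {
      acc[type] = (acc[type] ?? 0) + 1;
      return acc;
    }, {});
}
```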
The Hidden Costs of AI Leadership
The experiment revealed costs that aren't obvious when you're moving fast:
1. Technical Debt Accumulation
AI doesn't think about long-term maintainability. Each feature gets solved in isolation, leading to:
- Duplicated logic across components
- Inconsistent error handling patterns
- Mixed abstraction levels
- Circular dependencies
2. The Context Amnesia Problem
Every time I hit token limits and started a new conversation:
- Project conventions got forgotten
- Architectural decisions needed re-explanation
- Code quality gradually degraded
- Previously solved problems got re-solved differently
3. Over-Engineering Epidemic
AI tends to implement the most general solution rather than the simplest one:
- Generic error handlers for specific use cases
- Complex state management for simple data
- Defensive programming taken to extremes
- Multiple layers of abstraction where none were needed
4. The Debugging Paradox
When AI-generated code breaks:
- You need to understand code you didn't write
- The AI that created the bug might not be able to fix it
- Debugging requires the same skills AI was supposed to replace
- Context about why something was implemented a certain way is lost
5. The Maintenance Anxiety
Perhaps most concerning is the psychological impact: you become afraid of your own codebase. When files grow beyond 150-200 lines of AI-generated logic, they become black boxes. Without tests and without understanding the implementation details, every change becomes a gamble.
This is especially problematic with browser extensions, where testing is already challenging and the execution environment adds complexity.
The Verdict: AI as Assistant, Not Leader
After weeks of experimentation, my conclusion is nuanced:
AI excels at: Rapid prototyping, boilerplate generation, implementing well-defined specifications, exploring possibilities quickly
AI struggles with: Long-term architectural consistency, understanding business context, making trade-offs, maintaining simplicity
The real insight: The question isn't whether AI can lead development, but whether AI should lead development.
When AI Leadership Works (And When It Doesn't)
✅ Good Candidates for AI Leadership:
- Throwaway prototypes where maintenance doesn't matter
- Simple MVPs with well-defined, limited scope
- Learning projects where the goal is exploration
- Isolated components with clear interfaces
❌ Poor Candidates for AI Leadership:
- Production systems that need long-term maintenance
- Complex business logic requiring domain expertise
- Performance-critical applications where optimization matters
- Team projects where consistency and knowledge sharing are crucial
What This Means for Developers
The future isn't AI replacing developers or developers ignoring AI. It's about finding the right relationship:
Developers should lead:
- Architectural decisions
- Business logic design
- Performance optimization
- Code review and quality standards
- Long-term maintenance strategy
AI should assist with:
- Implementation of well-defined specs
- Boilerplate and repetitive coding
- Testing and validation scenarios
- Documentation generation
- Refactoring and code transformation
The Path Forward
This experiment convinced me that we need better frameworks for human-AI collaboration in development. Pure AI leadership creates unsustainable code. Pure human development ignores powerful tools.
The sweet spot is developer-led, AI-assisted development with strong quality guardrails.
In upcoming posts, I'll explore how Test-Driven Development can provide those guardrails, turning AI from a chaotic code generator into a disciplined implementation partner.
What's been your experience with AI-led development? Have you found the sweet spot between human oversight and AI assistance? I'd love to hear your stories - both the successes and the disasters.
Top comments (4)
Excellent experiment and great insights! I’ve made similar observations myself, and ended up building an AI agent control stack that helped mitigate some of the challenges you described.
If you’re curious, I’ll leave a link to my article at the end and if you have time, I’d love to hear your thoughts on whether any of the patterns I used could’ve helped in your case.
A few reflections on your key findings:
1. Technical Debt Accumulation
I saw the same issue. One thing that worked for me was forcing the agent to run scheduled code refactor rounds. In the first sprint it did produce duplicate logic, but in the refactor phase these duplicates were cleaned up pretty effectively.
“AI preferred custom solutions over framework conventions”
Yep, I had a similar issue, especially when the agent generated native iOS and Android code. I improved this by ~50% just by making the agent read the platform’s official specs (from Apple or Google) as a first step. I think better instruction and task context can really help here.
2. The Context Amnesia Problem
This was one area I felt I solved pretty well. My setup loaded a /rules/ folder, plus task.md and planning.md, at the start of every agent execution cycle. That kept the agent consistent across all tasks. Curious if something like this might've helped in your project?

I also agree with your conclusion => AI as Assistant, not Leader.
And under “Developers should lead” I’d add one more thing:
It’s the developer’s job to make sure the agent: A) understands exactly what to build, and B) doesn’t make assumptions.
Any unclear requirement should trigger a clarification prompt before coding even starts.
Here’s my write-up if you want to explore further: dev.to/teppana88/i-shipped-3x-more...
Thanks again for sharing your experience, really valuable read!
Moi Teemu ))) Thanks for the interest! You've hit on some key patterns I wish I'd implemented.
Yeah, you're right about prompts. I think prompts should also be standardised - otherwise it's not clear how to keep the AI from doing crazy stuff.
I'll definitely check out your article. I'm not much of an expert in mobile development - I've just built a couple of simple apps for Android - but I wanted to understand deeply how it works.
Kiitos )))
Wait… Moi? 😄
That definitely caught me off guard - not something I expected to see in dev.to comments!
About mobile development: even though I used Flutter, the structure I built is more or less tech-agnostic. I also tested the same setup with a simple Spring Boot backend app, and it worked just as well (I just updated all Flutter-related rules to match the Spring Boot context).
Yeah, I lived in Helsinki for 5 years and I have Finnish roots - check my face on LinkedIn: linkedin.com/in/maksim-matlakhov/ )))
Btw, for backend I'm gonna use Kotlin + Spring Boot for the experiment.