There's a question that keeps popping up in developer communities, on Twitter, in tech podcasts, and probably in your own mind: Can AI actually lead software development?
It's not just academic curiosity. With tools like Claude, ChatGPT, and GitHub Copilot getting more sophisticated, some developers are reporting that they barely write code anymore - they just prompt, review, and ship. The promise is tantalizing: describe what you want, let AI figure out how to build it, and move on to the next feature.
But does it actually work?
The Question Everyone's Asking
I've seen variations of this question everywhere:
- "Should I let ChatGPT write most of my code?"
- "Can Claude architect an entire application?"
- "Is AI-driven development the future?"
- "Will we even need to know how to code in 5 years?"
The responses are usually polarized. AI enthusiasts point to impressive demos and rapid prototyping success stories. Skeptics highlight limitations, hallucinations, and the irreplaceable value of human expertise.
But here's what bothered me: most of these discussions were based on toy examples or theoretical scenarios.
I wanted real data.
My Reality Check: Three Components, One Question
Instead of debating in the abstract, I decided to run a proper experiment. I built a real project with three distinct components, each with different levels of AI involvement:
Component 1: Chrome Extension - Let AI lead completely
Component 2: Web Application - Heavy AI assistance with human oversight
Component 3: Backend Services - Selective AI help for specific tasks
The project was substantial enough to reveal real patterns - not just the honeymoon phase where everything looks promising, but the maintenance phase where reality sets in.
What "AI-Led Development" Actually Looks Like
When I say "AI-led," I mean I approached development like this:
- Describe the feature in natural language
- Let Claude generate the implementation
- Test the result and ask for fixes if needed
- Move to the next feature without deep code review
This mirrors how many developers are actually using AI tools today. It's the "vibe coding" approach - fast, intuitive, and optimistic.
The Chrome Extension: Pure AI Leadership
For the Chrome extension, I went all-in. Claude generated everything (a simplified sketch of the moving parts follows the list):
- Content scripts for scraping LinkedIn activity
- Background service workers
- Popup UI and interactions
- Data processing and storage logic
- Manifest configuration
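To make the scope concrete, here's roughly how two of those pieces fit together. This is a minimal, illustrative sketch rather than the actual generated code - the selectors, file names, and message types are placeholders:

```typescript
// content-script.ts - runs inside the LinkedIn tab (selectors are placeholders)
type ScrapedPost = { author: string; text: string };

function scrapeVisiblePosts(): ScrapedPost[] {
  return Array.from(document.querySelectorAll('[data-id^="urn:li:activity"]')).map((el) => ({
    author: el.querySelector('.actor-name')?.textContent?.trim() ?? '',
    text: el.querySelector('.post-text')?.textContent?.trim() ?? '',
  }));
}

// Hand the scraped data off to the background service worker
chrome.runtime.sendMessage({ type: 'POSTS_SCRAPED', posts: scrapeVisiblePosts() });

// background.ts - Manifest V3 service worker that persists whatever arrives
chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  if (message.type === 'POSTS_SCRAPED') {
    chrome.storage.local.set({ posts: message.posts }, () => sendResponse({ ok: true }));
    return true; // keep the message channel open for the async response
  }
});
```

Multiply that by every feature and you get to 4,000 lines very quickly.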
Initial Result: 4,000 lines of working code in just a few days. The extension actually functioned - it could scrape posts, comments, and likes from LinkedIn. I was impressed.
The Reality Check: When I started adding features and fixing bugs, I discovered the hidden costs of AI leadership:
- 1,000 lines of dead code - duplicate functions, unused imports, commented-out experiments
- Overengineered solutions - complex try-catch blocks where simple validation would suffice (sketched below)
- Inconsistent patterns - the same functionality implemented three different ways
- Architecture drift - what started clean became a sprawling mess as the AI "helped" with each new feature
After cleaning up, only about 40% of the original code was actually necessary.
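The overengineering point deserves an illustration. The pattern looked roughly like this - an illustrative sketch, not the actual extension code; the function name and post shape are invented for the example:

```typescript
// What the AI tended to generate: defensive wrapping around a trivial check
function getPostIdVerbose(post: unknown): string | null {
  try {
    if (post && typeof post === 'object') {
      try {
        const id = (post as { id?: unknown }).id;
        if (id !== undefined && id !== null && typeof id === 'string' && id.length > 0) {
          return id;
        }
        return null;
      } catch (innerError) {
        console.error('Failed to read post id', innerError);
        return null;
      }
    }
    return null;
  } catch (outerError) {
    console.error('Unexpected error while validating post', outerError);
    return null;
  }
}

// What it actually needed: nothing here throws, so a guard clause is enough
function getPostId(post: { id?: string } | null | undefined): string | null {
  return post?.id ? post.id : null;
}
```

Nothing in the verbose version can actually throw, so the nested try-catch only added noise and hid the intent.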
The Fear Factor: But here's what really bothered me - I became afraid to touch certain parts of the code. When you don't fully understand logic you didn't write, making changes becomes risky. The extension had no tests (testing browser extensions is genuinely challenging), so every modification felt like walking through a minefield.
I started getting anxious whenever I opened a file with more than 150-200 lines. Those files had become black boxes where changing one thing might break three others in ways I couldn't predict.
The Web Application: Heavy Assistance with Guardrails
For the Vue.js web app, I maintained more control but still relied heavily on AI:
What Worked:
- Rapid component scaffolding
- Quick CSS styling with Vuetify
- Boilerplate reduction for forms and data handling
What Broke Down:
- AI preferred custom solutions over framework conventions (building title wrappers instead of using Vuetify's title props; see the sketch after this list)
- Resistance to creating reusable components - everything got inlined
- Inconsistent component patterns within the same app
- Context loss leading to repeated explanations of project structure
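To show what I mean by fighting framework conventions, here's the difference in miniature. It's a sketch written with Vue 3 render functions rather than the app's actual single-file components, and it assumes Vuetify 3, where VCard already exposes a title prop; the component names are made up:

```typescript
import { h } from 'vue'
import { VCard, VCardText } from 'vuetify/components'

// The framework convention: let VCard render the title via its own prop
const ProfileCard = () =>
  h(VCard, { title: 'Profile settings' }, {
    default: () => h(VCardText, () => 'Card body goes here'),
  })

// What the AI kept producing: a hand-rolled title wrapper in the default slot
const ProfileCardOverbuilt = () =>
  h(VCard, null, {
    default: () => [
      h('div', { class: 'custom-card-title text-h6 pa-4' }, 'Profile settings'),
      h(VCardText, () => 'Card body goes here'),
    ],
  })
```

The custom wrapper works, but it quietly opts out of the framework's theming and spacing, and every card ends up styled slightly differently.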
The Backend: Selective AI Partnership
For backend services, I used AI more strategically:
- Generate API endpoint boilerplate
- Create data validation logic
- Write test cases for specific scenarios
This approach worked much better, but it required me to:
- Maintain architectural vision
- Review every generated piece
- Ensure consistency with existing patterns
- Make all design decisions myself
Even here, when I experimented with letting AI handle more complex business logic, the results were often disappointing. I'd get 100 lines of "AI spaghetti" that I could refactor down to 20 lines of clear, simple code. The AI's tendency to over-engineer struck again, even in smaller doses.
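As an illustration (heavily simplified, not the literal 100 lines - the Activity type and function names are invented for the example), the generated logic tended to re-check the same conditions in manual loops when a couple of array operations did the job:

```typescript
type Activity = { type: 'post' | 'comment' | 'like'; createdAt: string; visible: boolean };

// The "AI spaghetti" flavor: manual loops, redundant checks, mutable intermediate state
function summarizeActivityVerbose(items: Activity[]): Record<string, number> {
  const result: Record<string, number> = {};
  if (items && Array.isArray(items) && items.length > 0) {
    for (let i = 0; i < items.length; i++) {
      const item = items[i];
      if (item !== null && item !== undefined) {
        if (item.visible === true) {
          const key = item.type;
          if (result[key] === undefined) {
            result[key] = 0;
          }
          result[key] = result[key] + 1;
        }
      }
    }
  }
  return result;
}

// The refactor: same behavior, a fraction of the code
function summarizeActivity(items: Activity[]): Record<string, number> {
  return items
    .filter((item) => item.visible)
    .reduce<Record<string, number>>((acc, { type }) => {
      acc[type] = (acc[type] ?? 0) + 1;
      return acc;
    }, {});
}
```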
The Hidden Costs of AI Leadership
The experiment revealed costs that aren't obvious when you're moving fast:
1. Technical Debt Accumulation
AI doesn't think about long-term maintainability. Each feature gets solved in isolation, leading to:
- Duplicated logic across components
- Inconsistent error handling patterns
- Mixed abstraction levels
- Circular dependencies
2. The Context Amnesia Problem
Every time I hit token limits and started a new conversation:
- Project conventions got forgotten
- Architectural decisions needed re-explanation
- Code quality gradually degraded
- Previously solved problems got re-solved differently
3. Over-Engineering Epidemic
AI tends to implement the most general solution rather than the simplest one:
- Generic error handlers for specific use cases
- Complex state management for simple data
- Defensive programming taken to extremes
- Multiple layers of abstraction where none were needed
4. The Debugging Paradox
When AI-generated code breaks:
- You need to understand code you didn't write
- The AI that created the bug might not be able to fix it
- Debugging requires the same skills AI was supposed to replace
- Context about why something was implemented a certain way is lost
5. The Maintenance Anxiety
Perhaps most concerning is the psychological impact: you become afraid of your own codebase. When files grow beyond 150-200 lines of AI-generated logic, they become black boxes. Without tests and without understanding the implementation details, every change becomes a gamble.
This is especially problematic with browser extensions, where testing is already challenging and the execution environment adds complexity.
The Verdict: AI as Assistant, Not Leader
After weeks of experimentation, my conclusion is nuanced:
AI excels at: Rapid prototyping, boilerplate generation, implementing well-defined specifications, exploring possibilities quickly
AI struggles with: Long-term architectural consistency, understanding business context, making trade-offs, maintaining simplicity
The real insight: The question isn't whether AI can lead development, but whether AI should lead development.
When AI Leadership Works (And When It Doesn't)
✅ Good Candidates for AI Leadership:
- Throwaway prototypes where maintenance doesn't matter
- Simple MVPs with well-defined, limited scope
- Learning projects where the goal is exploration
- Isolated components with clear interfaces
❌ Poor Candidates for AI Leadership:
- Production systems that need long-term maintenance
- Complex business logic requiring domain expertise
- Performance-critical applications where optimization matters
- Team projects where consistency and knowledge sharing are crucial
What This Means for Developers
The future isn't AI replacing developers or developers ignoring AI. It's about finding the right relationship:
Developers should lead:
- Architectural decisions
- Business logic design
- Performance optimization
- Code review and quality standards
- Long-term maintenance strategy
AI should assist with:
- Implementation of well-defined specs
- Boilerplate and repetitive coding
- Testing and validation scenarios
- Documentation generation
- Refactoring and code transformation
The Path Forward
This experiment convinced me that we need better frameworks for human-AI collaboration in development. Pure AI leadership creates unsustainable code. Pure human development ignores powerful tools.
The sweet spot is developer-led, AI-assisted development with strong quality guardrails.
In upcoming posts, I'll explore how Test-Driven Development can provide those guardrails, turning AI from a chaotic code generator into a disciplined implementation partner.
What's been your experience with AI-led development? Have you found the sweet spot between human oversight and AI assistance? I'd love to hear your stories - both the successes and the disasters.
Top comments (4)
Excellent experiment and great insights! I’ve made similar observations myself, and ended up building an AI agent control stack that helped mitigate some of the challenges you described.
If you’re curious, I’ll leave a link to my article at the end and if you have time, I’d love to hear your thoughts on whether any of the patterns I used could’ve helped in your case.
A few reflections on your key findings:
1. Technical Debt Accumulation
I saw the same issue. One thing that worked for me was forcing the agent to run scheduled code refactor rounds. In the first sprint it did produce duplicate logic, but in the refactor phase these duplicates were cleaned up pretty effectively.
“AI preferred custom solutions over framework conventions”
Yep, I had a similar issue, especially when the agent generated native iOS and Android code. I improved this by ~50% just by making the agent read the platform’s official specs (from Apple or Google) as a first step. I think better instruction and task context can really help here.
2. The Context Amnesia Problem
This was one area I felt I solved pretty well. My setup loaded a /rules/ folder, plus task.md and planning.md, at the start of every agent execution cycle. That kept the agent consistent across all tasks. Curious if something like this might've helped in your project?

I also agree with your conclusion => AI as Assistant, not Leader.
And under “Developers should lead” I’d add one more thing:
It’s the developer’s job to make sure the agent: A) understands exactly what to build, and B) doesn’t make assumptions.
Any unclear requirement should trigger a clarification prompt before coding even starts.
Here’s my write-up if you want to explore further: dev.to/teppana88/i-shipped-3x-more...
Thanks again for sharing your experience, really valuable read!
Moi Teemu ))) Thanks for the interest! You've hit on some key patterns I wish I'd implemented.
Yeah, you're right about prompts. I think prompts should also be standardised - otherwise it's not clear how to keep the AI from doing crazy stuff.
I'll definitely check out your article. I'm not much of an expert in mobile development - I've just built a couple of simple apps for Android - but I wanted to understand deeply how it works.
Kiitos )))
Wait… Moi? 😄
That definitely caught me off guard - not something I expected to see in dev.to comments!
About mobile development: even though I used Flutter, the structure I built is more or less tech-agnostic. I also tested the same setup with a simple Spring Boot backend app, and it worked just as well (I just updated all Flutter-related rules to match the Spring Boot context).
Yeah, I lived in Helsinki for 5 years and I have Finnish roots - check my face on LinkedIn: linkedin.com/in/maksim-matlakhov/ )))
Btw, for backend I'm gonna use Kotlin + Spring Boot for the experiment.