Leena Malhotra

What Broke When I Let One AI Handle an End-to-End Workflow

I gave Claude complete control of building a customer onboarding flow. Not just code generation—the entire workflow. Requirements gathering, architecture decisions, implementation, testing strategy, deployment planning. Everything.

The experiment lasted six hours before I had to intervene. Not because the AI was incompetent, but because I'd made a category error about what AI actually does well.

This isn't another "AI can't replace developers" think piece. I already knew that. This is about something more interesting: the specific points where single-model thinking breaks down, and what that reveals about how we should actually be using these tools.

The Setup

The task seemed perfect for AI automation. Build a simple onboarding flow for new users: email verification, profile setup, tutorial walkthrough, first action completion. Standard stuff. Nothing exotic.

I opened Claude and decided to treat it like a solo developer. Give it the full context, let it make decisions, see how far pure AI-driven development could go without human intervention beyond prompting.

First hour: incredible. Claude generated a clean architecture proposal, explained tradeoffs between different approaches, wrote initial implementations with proper error handling. The code looked production-ready.

Second hour: still solid. Integration tests, edge case handling, even thoughtful comments explaining non-obvious decisions.

Third hour: cracks appeared.

Fourth hour: I was debugging AI-generated code that referenced other AI-generated code that assumed things about yet more AI-generated code. The system had become a hall of mirrors.

By hour six, I abandoned the experiment and started over with a different approach entirely.

Where It Broke Down

The failure wasn't in any single piece of code. Every individual function, every isolated component, every unit test—all perfectly competent. The breakdown happened at the integration points between decisions made in isolation.

Problem 1: Context fragmentation across conversations. Claude made reasonable architectural choices in the planning phase. Then it made different, equally reasonable choices during implementation. Neither choice was wrong on its own, but they didn't compose well together. The authentication strategy assumed stateless sessions. The user profile system assumed stateful sessions. Both decisions were defensible. Together, they created friction.
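
To make that concrete, here is a minimal sketch of the shape of that mismatch. The names are hypothetical and the TypeScript is mine, written for illustration, not the code Claude actually produced:

```typescript
// The auth layer assumes stateless sessions: everything it needs travels in the token.
interface AuthContext {
  verifyToken(token: string): { userId: string; issuedAt: number };
}

// The profile layer assumes a stateful server-side session it can mutate mid-flow.
interface ProfileSession {
  get(sessionId: string): Promise<{ userId: string; onboardingStep: string }>;
  update(sessionId: string, step: string): Promise<void>;
}

// Each contract is defensible on its own. Composing them means either the token
// starts carrying mutable state it was never designed for, or a session store
// gets quietly bolted on behind the "stateless" auth layer.
```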

Problem 2: Optimizing for demonstration over integration. The AI consistently generated code that looked impressive in isolation but created integration debt. Each function was well-documented and self-contained. Each module had clear interfaces. But the system as a whole required careful orchestration that the AI never considered because it was always focused on the immediate task.

Problem 3: No memory of implicit assumptions. I'd mentioned in passing during the planning phase that we were targeting mobile-first users. Claude incorporated this into initial decisions. Three hours later, when implementing the tutorial system, it generated a desktop-optimized solution. Not because it disagreed with mobile-first, but because it had no persistent memory of that constraint across the conversation.

Problem 4: Inability to recognize systemic complexity. When I asked Claude to implement email verification, it generated a clean, simple solution. When I asked for profile setup, another clean solution. When I asked for the tutorial walkthrough, another clean solution. None of these solutions acknowledged that they were part of a larger state machine with timing dependencies and rollback requirements.
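
For illustration, here is roughly the global view that was missing: the whole flow as one explicit state machine with timeouts and rollback targets. The states and numbers are made up for this sketch, not taken from the real system:

```typescript
// Hypothetical sketch of the onboarding flow as a single state machine.
type OnboardingState =
  | "awaiting_email_verification"
  | "profile_setup"
  | "tutorial"
  | "first_action"
  | "complete";

interface Transition {
  from: OnboardingState;
  to: OnboardingState;
  timeoutMs?: number;           // e.g. verification links expire
  rollbackTo?: OnboardingState; // where to land if the step fails or times out
}

const transitions: Transition[] = [
  { from: "awaiting_email_verification", to: "profile_setup", timeoutMs: 86_400_000, rollbackTo: "awaiting_email_verification" },
  { from: "profile_setup", to: "tutorial", rollbackTo: "profile_setup" },
  { from: "tutorial", to: "first_action" },
  { from: "first_action", to: "complete", rollbackTo: "tutorial" },
];

// Each step Claude generated was clean in isolation; none of them knew these
// transitions, timeouts, or rollback targets existed.
```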

The Fundamental Limitation

Here's what I learned: single-model AI excels at local optimization but struggles with global coherence.

This isn't a capability problem—it's an architectural one. When you give one AI end-to-end control, you're asking it to operate like a solo developer with severe amnesia. It can reason brilliantly about the problem in front of it, but it has no persistent understanding of the larger system it's building into.

Every prompt becomes a new context window. Every response optimizes for that specific question. The AI doesn't maintain a running model of the system state the way a human developer does. It doesn't wake up thinking about yesterday's architectural decisions. It doesn't carry forward the implicit constraints and priorities that shape every subsequent choice.

This creates a specific pattern of failure: locally optimal decisions that create globally incoherent systems.

What Actually Works

After the failed experiment, I rebuilt the same onboarding flow using a different approach. Instead of one AI handling everything end-to-end, I used multiple specialized models for different aspects of the problem.

I used Claude 3.7 Sonnet for architectural decisions and system design thinking. Its strength is reasoning about tradeoffs and explaining complex technical choices. I'd describe the requirements, get its perspective on different approaches, then use that to shape my own decisions.

For implementation details and code generation, I switched to GPT-4o mini—faster, more focused on producing clean code within clear constraints. I'd give it specific, bounded tasks: "implement this specific function with these exact inputs and outputs." No room for architectural creativity. Just clean execution.
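
To show what I mean by a bounded task, here is the kind of contract I would pin down before the model saw it. The function and types are invented for this example, not the real onboarding code:

```typescript
// Hypothetical contract handed to the implementation model: exact inputs,
// exact outputs, no room for architectural decisions.
interface VerificationResult {
  ok: boolean;
  reason?: "expired" | "already_used" | "not_found";
}

// "Implement this function. A token is valid if it exists, is unused, and is
//  younger than maxAgeMs. Do not touch any other module."
declare function verifyEmailToken(
  token: string,
  now: Date,
  maxAgeMs: number
): Promise<VerificationResult>;
```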

When I needed to verify my understanding of complex logic flows, I'd use the Code Explainer to break down what I'd written or what the AI had generated. This created a feedback loop where AI generation, human review, and AI explanation all reinforced each other.

The difference was dramatic. Instead of six hours of increasingly tangled code, I had a working system in three hours. The code was cleaner, the architecture more coherent, the integration points smoother.

The Multi-Model Insight

Using multiple specialized AI models isn't just about picking the best tool for each job. It's about creating a system where the boundaries between models force you to maintain coherence.

When you switch from Claude to GPT between architecture and implementation, you're forced to articulate your decisions clearly. You can't rely on implicit context or assumed knowledge. You have to specify exactly what you want implemented and why.

This friction is valuable. It prevents the local optimization problem. Each transition point becomes a moment of clarity where you have to think about how pieces fit together.

It's similar to how good software architecture uses interfaces to enforce boundaries. The interface forces you to think clearly about contracts and dependencies. Using multiple AI models creates a similar forcing function for your thinking.
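
In code terms, the analogy is something like this tiny, hypothetical example: the interface is the contract the architecture conversation produces, and everything downstream only ever sees the contract:

```typescript
// The boundary both sides must honor, regardless of who (or what) implements it.
interface EmailVerifier {
  sendVerification(email: string): Promise<void>;
  confirm(token: string): Promise<boolean>;
}

// A throwaway implementation for local testing. Swapping in SMTP, a queue, or a
// third-party API later changes nothing that depends on the interface.
class ConsoleEmailVerifier implements EmailVerifier {
  async sendVerification(email: string): Promise<void> {
    console.log(`verification sent to ${email}`);
  }
  async confirm(token: string): Promise<boolean> {
    return token.length > 0; // placeholder check for the sketch
  }
}
```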

The Orchestration Layer

What I realized through this experiment: the developer isn't being replaced by AI. The developer is becoming the orchestration layer between specialized AI capabilities.

This is a fundamentally different role than traditional development, but it's not less important. If anything, it requires deeper understanding of system design and integration patterns.

You need to know which problems require deep reasoning versus fast execution. You need to understand when to let AI explore possibilities versus when to constrain it tightly. You need to maintain the global context that no single AI model can hold.

Tools like Crompt make this orchestration practical by giving you access to multiple models in one interface. Instead of juggling different platforms and conversation threads, you can switch between Claude, GPT, and specialized tools while maintaining your own mental model of the larger system.

The platform becomes your workspace for conducting different AI capabilities toward a coherent outcome. You're not asking one AI to handle everything. You're coordinating multiple AI strengths toward a goal that requires integration thinking no single model can provide.

The System Design Perspective

This experience changed how I think about AI-assisted development. The question isn't "Can AI build this?" The question is "Which parts of this problem map well to AI capabilities, and how do I maintain coherence across the boundaries?"

Some tasks genuinely benefit from AI handling them end-to-end. Code refactoring within a single file. Documentation generation. Test case creation. These are bounded problems where local optimization is sufficient.

Other tasks require human orchestration across AI capabilities. System architecture. Integration design. State management across multiple services. These are coherence problems where someone needs to maintain the global picture.

The skill isn't knowing how to code anymore—it's knowing how to break problems into pieces that AI can handle well, then integrate those pieces into systems that actually work.

What This Means for Development

We're moving toward a world where developers spend less time writing code and more time designing systems that coordinate AI-generated components. This isn't the death of programming. It's the evolution of programming toward higher-level abstraction.

You still need to understand code deeply. You still need to reason about performance, security, edge cases, and user experience. But increasingly, your job is specifying what should exist and verifying that it works, rather than manually typing out every implementation detail.

This requires different skills than traditional development. You need to be better at:

Prompt engineering as specification. Learning to describe exactly what you want in a way that leaves no room for misinterpretation. This is harder than it sounds.

Systems thinking across integration points. Understanding how independently generated components will interact and where friction will emerge.

Pattern recognition for AI failure modes. Knowing when AI output looks plausible but contains subtle errors that will compound into larger problems.

Verification and testing strategy. Building confidence in code you didn't write through systematic testing and code review practices.

These aren't easier skills than traditional programming. They're different skills. And the developers who adapt fastest will be those who recognize that AI isn't replacing development work—it's changing the level of abstraction where development work happens.

The Real Lesson

Letting one AI handle an end-to-end workflow taught me that the future of development isn't about AI autonomy. It's about AI specialization coordinated by developers who understand both system design and AI capabilities.

The tools are getting better. The models are getting smarter. But the need for human orchestration isn't going away—it's becoming more important as the complexity of what we can build increases.

Your value as a developer isn't in typing code faster than an AI. It's in maintaining coherence across complexity that no single AI model can fully grasp.

That's not a consolation prize. That's the actual work.


Ready to coordinate multiple AI capabilities toward coherent outcomes? Try Crompt AI free—where you can switch between specialized models while maintaining the system thinking no single AI can provide.
