I Built the Same App With and Without AI Coding Tools: The Difference Was Embarrassing

#ai #coding #frontend #api

A few months back I got into a conversation with a colleague who was skeptical about AI coding tools. Not dismissive, actually skeptical, which is a different thing. His argument was essentially that the productivity gains are real in demos but overstated in actual production development, where the complexity lives in the system design and the debugging, not in the typing.

I thought he had a point. So I did something slightly ridiculous: I built the same small application twice. First without any AI coding assistance, just me, my editor, documentation, and Stack Overflow like it's 2019. Then with AI tools integrated throughout: Cursor for code generation, Claude for architecture discussions and debugging, Copilot for the repetitive scaffolding.

The project was a task management API with a React frontend. Nothing novel. A user authentication system, a PostgreSQL database, CRUD operations for tasks with filtering and tagging, and a basic dashboard. The kind of thing a mid-level developer could build competently. I kept detailed time logs for both builds.

Here's what I found, and I want to be honest about the nuances rather than just giving you the headline number.

The Setup: What the App Actually Required to Build

To be useful as a comparison, the spec needed to be fixed between both builds. Same feature set, same tech stack, same quality bar. No cutting corners in the AI-assisted build to make the comparison look better.

Tech stack: Node.js with Express, PostgreSQL with Prisma ORM, React with TypeScript on the frontend, JWT authentication, deployed to a simple VPS. Testing: unit tests for core business logic with Jest, basic integration tests for the API endpoints. About 60 API routes total, with the frontend dashboard consuming them.

I've built things like this dozens of times. I know the patterns well. This wasn't a case of AI helping me figure out something unfamiliar; it was AI helping me do something I already knew how to do. I thought that would minimize the advantage. I was wrong about that.

Build 1: Without AI Tools - Time Breakdown and Where I Actually Spent Hours

Total time: 47 hours over 9 days. Here's roughly where it went.

Project scaffolding and configuration: 3.5 hours. Setting up the project structure, configuring TypeScript, setting up the ORM, writing the initial Docker Compose file. This stuff is tedious and I know it well, which means I do it quickly, but it's still time.

Authentication system: 6 hours. Writing the middleware, the JWT handling, the refresh token logic, the password hashing, the validation. This is one of those areas where I'm careful because the cost of getting it wrong is high, so I moved slowly.

Database schema and migrations: 2.5 hours. Schema design was quick because I've done these enough times. Migrations took longer than they should have because of one relationship I kept getting wrong.

API routes, the bulk of the work: 18 hours. Sixty routes with proper error handling and input validation, some of them with non-trivial business logic. The repetition is real but not completely mechanical: each endpoint has its own edge cases.

Frontend: 12 hours. React with TypeScript, state management, the dashboard components. This is where I enjoy myself more, but it's also where I spent the most time debugging.

Testing: 5 hours. Writing tests after the fact, which is not the way I prefer to work but was consistent between both builds.

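// Auth middleware: I wrote this from scratch in Build 1
// ~45 minutes to get right with edge cases
// Imports assumed from the stack described above (jsonwebtoken, Prisma)
import jwt from 'jsonwebtoken';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export const authenticate = async (req, res, next) => {
  // Expect an "Authorization: Bearer <token>" header
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) return res.status(401).json({ error: 'No token' });

  try {
    // jwt.verify throws on a bad signature or an expired token
    const decoded = jwt.verify(token, process.env.JWT_SECRET);
    req.user = await prisma.user.findUnique({ where: { id: decoded.userId } });
    if (!req.user) return res.status(401).json({ error: 'User not found' });
    next();
  } catch (err) {
    return res.status(401).json({ error: 'Invalid token' });
  }
};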

Build 2: With AI Coding Tools - What I Used and How I Actually Used It

Total time: 21 hours over 4 days. Before I break that down, let me be specific about which AI tools I used and how I used them.

I used Cursor as my primary editor with AI code completion enabled. For anything architectural or complex, I used Claude in a separate window for discussion and longer code generation. I used GitHub Copilot for autocomplete on repetitive patterns. I did not paste the generated code without reading it. Every function that touched authentication, database queries, or user data got a deliberate review before I moved on.

Scaffolding and configuration: 45 minutes. I described the project structure I wanted, got a reasonable starting point, and adjusted a few things. The Docker Compose file took about 8 minutes total.

Authentication system: 1.5 hours. I generated the initial implementation and reviewed it carefully. This is where I found two issues: a missing token expiry check, and a subtle timing attack vulnerability in the password comparison, which used == instead of crypto.timingSafeEqual. I fixed both, then moved on. Finding those issues would have been equally important without AI help, but I got to the review stage much faster.
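For readers who haven't run into the first of those issues, here is a minimal sketch of what an explicit expiry check looks like, written with only Node's built-in crypto module. It is an illustration, not the code from either build: the `signToken` and `verifyToken` names, the payload shape, and the secret are all hypothetical, and a maintained JWT library will normally handle this for you when the token carries an `exp` claim.

```javascript
const crypto = require('node:crypto');

// Base64url-encode a JSON-serializable value.
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');

// HMAC-SHA256 signature over the given string, base64url-encoded.
const sign = (data, secret) =>
  crypto.createHmac('sha256', secret).update(data).digest('base64url');

// Hypothetical helper: issue a token that expires ttlSeconds after nowSeconds.
function signToken(payload, secret, ttlSeconds, nowSeconds) {
  const header = encode({ alg: 'HS256', typ: 'JWT' });
  const body = encode({ ...payload, exp: nowSeconds + ttlSeconds });
  return `${header}.${body}.${sign(`${header}.${body}`, secret)}`;
}

// Hypothetical helper: returns the payload, or null if the token is
// malformed, has a bad signature, or is past its expiry.
function verifyToken(token, secret, nowSeconds) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;

  const expected = Buffer.from(sign(`${header}.${body}`, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length) return null;
  if (!crypto.timingSafeEqual(expected, actual)) return null;

  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }

  // The check that is easy to leave out: a valid signature alone
  // says nothing about whether the token is still current.
  if (typeof payload.exp !== 'number' || nowSeconds >= payload.exp) return null;
  return payload;
}
```

Passing the current time in as a parameter is a design choice for the sketch: it makes the expiry branch easy to exercise in a test without waiting or mocking the clock.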

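// AI generated this; I caught the timing attack issue
// Wrong version (AI's first output):
if (storedHash == inputHash) { ... }  // vulnerable

// Corrected version after review:
// (timingSafeEqual throws if the buffers differ in length, so check that first)
const storedBuf = Buffer.from(storedHash);
const inputBuf = Buffer.from(inputHash);
const isValid =
  storedBuf.length === inputBuf.length &&
  crypto.timingSafeEqual(storedBuf, inputBuf);
if (isValid) { ... }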
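To put that comparison in context, the sketch below shows one way the two hashes being compared might be produced, using Node's built-in `crypto.scryptSync`. Treat it as an illustration under assumptions rather than the code from the build: the `hashPassword` and `verifyPassword` names, the key length, and the `saltHex:hashHex` storage format are all hypothetical.

```javascript
const crypto = require('node:crypto');

const KEY_LENGTH = 64; // bytes of derived key; an assumed value

// Hypothetical helper: derive a salted hash, stored as "saltHex:hashHex".
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Hypothetical helper: re-derive the hash from the candidate password
// and compare it to the stored one in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = String(stored).split(':');
  if (!saltHex || !hashHex) return false;

  const expected = Buffer.from(hashHex, 'hex');
  // Reject malformed records up front; this also guarantees both
  // buffers have the same length before the comparison below.
  if (expected.length !== KEY_LENGTH) return false;

  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), KEY_LENGTH);
  return crypto.timingSafeEqual(expected, actual);
}
```

A typical call would be `verifyPassword(req.body.password, user.passwordHash)`, where `passwordHash` is again a hypothetical column name.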

API routes: 7 hours, down from 18. This is the biggest single gain. For standard CRUD endpoints with input validation and error handling, I generated the boilerplate, reviewed the logic, added the edge cases, and moved on. The repetitive structure of sixty similar-but-not-identical endpoints is exactly where AI assistance multiplies productivity.
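As an illustration of that split between generated boilerplate and hand-added edge cases, here is a sketch of a validator for a create-task request body. The field names (`title`, `tags`, `dueDate`), the limits, and the function name are assumptions made for the example, not the real schema.

```javascript
const MAX_TITLE_LENGTH = 200; // assumed limit
const MAX_TAGS = 10;          // assumed limit

// Hypothetical validator: returns { ok: true, value } with normalized
// fields, or { ok: false, errors } listing every problem found.
function validateTaskInput(body) {
  const errors = [];
  const input = body !== null && typeof body === 'object' ? body : {};

  // Boilerplate-style checks: presence, type, and length.
  const title = typeof input.title === 'string' ? input.title.trim() : '';
  if (!title) errors.push('title is required');
  if (title.length > MAX_TITLE_LENGTH) errors.push('title is too long');

  // Edge cases: tags are optional, must be strings, and duplicates
  // that differ only by case or whitespace collapse to one tag.
  let tags = [];
  if (input.tags !== undefined) {
    if (!Array.isArray(input.tags) || input.tags.some((t) => typeof t !== 'string')) {
      errors.push('tags must be an array of strings');
    } else {
      tags = [...new Set(input.tags.map((t) => t.trim().toLowerCase()).filter(Boolean))];
      if (tags.length > MAX_TAGS) errors.push('too many tags');
    }
  }

  // Edge case: new Date() does not throw on bad input, it returns an
  // Invalid Date, so the timestamp has to be checked for NaN.
  let dueDate = null;
  if (input.dueDate !== undefined && input.dueDate !== null) {
    const parsed = new Date(input.dueDate);
    if (Number.isNaN(parsed.getTime())) errors.push('dueDate is not a valid date');
    else dueDate = parsed.toISOString();
  }

  return errors.length ? { ok: false, errors } : { ok: true, value: { title, tags, dueDate } };
}
```

In an Express handler, a result with `ok: false` would map to a 400 response carrying `errors`, and the normalized `value` would be what gets passed on to the database layer.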

Frontend: 7.5 hours, down from 12. TypeScript component scaffolding, basic state management patterns, hook implementations: the AI did the structural work, and I made the design and logic decisions.

Testing: 3 hours. Generating test structure and basic test cases for common paths, then writing the edge case tests myself. The AI is good at test boilerplate and obvious happy-path tests. Less reliable on edge cases that require understanding your specific business logic.

The Real Numbers and What They Actually Mean

47 hours versus 21 hours. A 55% reduction in development time. My skeptical colleague's specific objection was that the gains disappear in production-quality work. My experience was the opposite: the gains were larger in production-quality work than in the kind of clean demos I'd seen, precisely because production work has so much structured boilerplate that AI handles well.
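For anyone who wants to check the arithmetic, the sketch below recomputes the savings from the figures quoted in the two breakdowns. One caveat: the Build 2 phases itemized above add up to 19.75 hours rather than 21 (there is no Build 2 line for database schema and migrations), so the headline percentage is computed from the logged totals, not from the itemized sum.

```javascript
// Hours per phase as quoted in the two breakdowns. Phases without a
// Build 2 figure in the text are left out rather than guessed.
const phases = [
  { name: 'Scaffolding and configuration', build1: 3.5, build2: 0.75 },
  { name: 'Authentication', build1: 6, build2: 1.5 },
  { name: 'API routes', build1: 18, build2: 7 },
  { name: 'Frontend', build1: 12, build2: 7.5 },
  { name: 'Testing', build1: 5, build2: 3 },
];

// Percentage reduction going from `before` to `after`, rounded to a whole percent.
const reduction = (before, after) => Math.round(((before - after) / before) * 100);

for (const p of phases) {
  console.log(`${p.name}: ${p.build1}h -> ${p.build2}h (${reduction(p.build1, p.build2)}% less)`);
}

// Sum of the Build 2 phases that have a figure in the text.
const itemizedBuild2 = phases.reduce((sum, p) => sum + p.build2, 0);
console.log(`Itemized Build 2 hours: ${itemizedBuild2}`);

// Headline figure from the logged totals.
console.log(`Logged totals: 47h -> 21h (${reduction(47, 21)}% less)`);
```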

But here's the nuance: the code quality was roughly equivalent between both builds, and that's because I reviewed everything. If I hadn't reviewed the AI output, the second build would have shipped with two security issues and probably a handful of logic errors I caught on review. The productivity gain is real only if it doesn't come at the cost of the review process.

AI assistance helped least in three areas. The first was debugging runtime errors, especially anything environment-specific. The second was designing the data model: I tried to delegate this and got back something technically functional but not well suited to how I needed to query the data. The third was anything that required me to hold significant system context in my head at once.

Where This Matters for Real Teams and Real Projects

A solo developer or small team working on a well-scoped application will see productivity gains in the range I saw, maybe 40-60% on implementation time if they're disciplined about how they use AI assistance and don't skip the review process.

The gains compress on more complex systems. When the challenge is not 'implement this pattern' but 'design the right architecture for this constraint set,' AI tools are useful discussion partners but not time multipliers in the same way. The 10x productivity claims you see on social media are for the kind of work that's already fast, not the kind of work that's genuinely hard.

There's also a class of production system (the ones with real scale requirements, complex integrations, security surface areas that need proper review, and maintenance teams who need to understand every part of the codebase) where AI-assisted development gets you to a prototype efficiently, and then you need experienced engineers to take it further. Mittal Technologies is the kind of team you bring in when the system has grown past what AI-assisted solo development can carry cleanly. The handoff from prototype to production-grade software is a real transition, and having people who've made that transition across many different systems is the difference between a codebase you can maintain and one that becomes technical debt.

AI coding tools have made me faster at the things I was already good at. They haven't changed what I need to know, what I need to review, or what requires genuine engineering judgment. That's both a limitation and, honestly, the right way for this technology to work.

FAQs

Which AI coding tool is actually the best for professional development?
Cursor has become my primary editor integration. The context-aware generation within the editor is more useful for day-to-day work than context-switching to a separate chat interface. Claude is my go-to for architecture discussions and longer code generation tasks where I want to think through the approach before implementation. The right combination depends on your workflow.

Does AI-generated code need to be reviewed differently than human-written code?
Yes, with an important qualifier. AI-generated code should be reviewed at least as carefully as code from a junior developer, not because it's worse on average, but because the failure modes are different. AI code tends to be syntactically clean and structurally sensible while sometimes missing subtle security, performance, or logical issues that require domain knowledge to catch. Treat it as a first draft that needs a thorough review, not as production-ready output.

Does using AI tools make you a worse developer over time?
This is the question I think about the most. My current view: if you use AI to avoid understanding what the code does, then yes, potentially. If you use it to move faster through things you already understand, and you still do the review, design, and debugging work yourself, then no. You probably get better, because you're exposed to more patterns and spend more time on the hard parts.

How should I log time spent with AI tools for client billing?
Bill for your expertise and judgment, not for keystrokes. If AI tools reduce your implementation time but the design, review, and quality assurance are still yours, the value you're delivering hasn't changed. Many developers are having this conversation with clients explicitly and finding that honesty about AI use is better received than they expected.

Are AI coding tools reliable for security-critical code?
Unreliable enough to require thorough review. In my experiment, the AI produced two security issues in authentication code that would have shipped if I hadn't caught them on review. Both were subtle. Neither was obvious from reading the code quickly. Security-critical code should always be reviewed by someone with security knowledge, regardless of how it was generated.