Leena Malhotra

Lessons from Using AI Tools in Actual Engineering Work

I spent six months integrating AI into my daily engineering workflow. Not as experiments or side projects—as the primary way I shipped production code, debugged systems, and made architectural decisions.

This wasn't about maximizing AI use or proving it could replace developers. It was about finding where AI actually made me faster versus where it created new problems I didn't have before.

The results were uncomfortable. AI transformed some parts of my work and made other parts demonstrably worse. The difference had nothing to do with prompting skill or model choice. It had everything to do with understanding which engineering tasks are actually about pattern matching and which require something AI fundamentally cannot provide.

The First Uncomfortable Truth

AI is exceptional at tasks I already know how to do. It's nearly useless for tasks I don't understand yet.

When I asked Claude Opus 4.6 to write a data validation function for a REST API, it generated clean, working code in seconds. But I could have written that function myself in ten minutes. The AI saved me time, but it didn't expand my capabilities.
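For concreteness, here's the shape of task I mean. This is a hand-rolled sketch with invented field names, not the actual function Claude produced, but it's representative of the "clean, working code in seconds" category:

```python
def validate_user_payload(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []

    # Email: must be a string containing '@' (deliberately simplistic).
    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append("email must be a string containing '@'")

    # Age: integer in a sane human range.
    age = payload.get("age")
    if not isinstance(age, int) or not (0 <= age <= 150):
        errors.append("age must be an integer between 0 and 150")

    # Name: non-empty after stripping whitespace.
    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name must be a non-empty string")

    return errors
```

Nothing here is hard. That's exactly the point: I could write this in ten minutes, so AI writing it in ten seconds is a time savings, not a capability gain.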

When I hit a gnarly bug in our authentication middleware—something I'd never debugged before—AI became a liability. It confidently suggested solutions that sounded plausible but were architecturally wrong for our system. Following its advice cost me three hours before I realized I needed to understand the problem myself first.

This revealed a pattern I saw repeatedly: AI accelerates execution of known patterns. It cannot replace the understanding required to navigate unknown territory.

The most productive developers I know use AI primarily for tasks they could do in their sleep—boilerplate, refactoring, test generation. They don't use it for the hard thinking that actually moves projects forward.

Where AI Actually Saved Time

Code generation for well-defined patterns: Writing CRUD endpoints, data transformations, API clients. Anything where the structure is predictable and the requirements are clear. AI generates these faster than I can type, and the code is usually correct on the first try.

Refactoring without changing logic: Renaming variables, restructuring files, converting between patterns. I used Gemini 3.1 Pro to refactor a 500-line function into smaller, testable units. It preserved all logic while improving readability. This would have taken me hours of careful manual work.
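The real refactor was 500 lines, but this toy version (invented functions, not our code) shows the pattern: one mixed-concern function split into small units that can each be tested in isolation, with the observable behavior unchanged:

```python
# Before: one function mixing filtering and formatting.
def report_before(rows: list[dict]) -> str:
    out = []
    for r in rows:
        if r.get("active"):
            out.append(f"{r['name']}: {r['score'] * 100:.0f}%")
    return "\n".join(out)


# After: each concern is its own small, testable unit.
def active_rows(rows: list[dict]) -> list[dict]:
    return [r for r in rows if r.get("active")]


def format_row(r: dict) -> str:
    return f"{r['name']}: {r['score'] * 100:.0f}%"


def report_after(rows: list[dict]) -> str:
    return "\n".join(format_row(r) for r in active_rows(rows))
```

The key property, and the thing I verified by hand, is that before and after produce identical output for the same input.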

Test case generation: Once I wrote the implementation, AI generated comprehensive test cases covering edge conditions I would have missed. It's relentless about boundary testing in ways humans aren't.
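Here's a toy illustration of the boundary-heavy style I mean, written against a made-up clamp function rather than our real code. Note that it hits both bounds exactly and the degenerate range, which is the kind of tedium humans skip:

```python
def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def test_clamp_boundaries():
    assert clamp(5, 0, 10) == 5      # comfortably in range
    assert clamp(-1, 0, 10) == 0     # below lower bound
    assert clamp(11, 0, 10) == 10    # above upper bound
    assert clamp(0, 0, 10) == 0      # exactly at lower bound
    assert clamp(10, 0, 10) == 10    # exactly at upper bound
    assert clamp(3, 3, 3) == 3       # degenerate range low == high
```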

Documentation: AI wrote better docstrings than I would have written myself. Not because it's smarter, but because it doesn't get bored explaining obvious things.

Translation between formats: Converting API responses, transforming data structures, adapting code between libraries. AI handles these mechanical transformations flawlessly.
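A representative sketch of this mechanical work, using an invented JSON:API-style response shape: flattening nested records into the flat dicts the rest of the code wants.

```python
def flatten_users(response: dict) -> list[dict]:
    """Convert {"data": [{"id": ..., "attributes": {...}}]} into flat records."""
    return [
        {"id": item["id"], **item["attributes"]}
        for item in response.get("data", [])
    ]
```

There is exactly one right answer here and it's trivially checkable, which is why AI handles it flawlessly.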

The pattern: AI excels at well-defined transformations where correctness can be verified immediately. These tasks don't require judgment—they require precision and patience, which AI has in abundance.

Where AI Actively Hurt Productivity

Debugging unfamiliar systems: AI suggested fixes that looked reasonable but showed fundamental misunderstanding of our architecture. Following these suggestions wasted more time than searching documentation would have.

Making architectural decisions: When I asked AI whether to use a monolith or microservices for a new feature, it gave me a textbook answer that ignored every constraint specific to our system. Generic advice is worse than no advice when context matters.

Understanding legacy code: AI could explain what code did, but it couldn't explain why it was written that way. The why is usually more important than the what when working with legacy systems.

Performance optimization: AI suggested optimizations that looked clever but didn't address the actual bottleneck. It optimized based on theoretical efficiency, not measured reality.
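The fix was to make measurement a precondition. A minimal sketch of that habit, timing two invented implementations with the standard library instead of reasoning about them in the abstract:

```python
import timeit


def join_with_plus(parts: list[str]) -> str:
    # Looks quadratic on paper; actual cost depends on the runtime.
    s = ""
    for p in parts:
        s += p
    return s


def join_builtin(parts: list[str]) -> str:
    return "".join(parts)


if __name__ == "__main__":
    parts = ["x"] * 10_000
    for fn in (join_with_plus, join_builtin):
        t = timeit.timeit(lambda: fn(parts), number=100)
        print(f"{fn.__name__}: {t:.4f}s")
```

Whatever the numbers say on your machine beats whatever an AI says about theoretical efficiency.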

Security reviews: AI confidently missed security issues that would be obvious to anyone who understood the attack surface. It validated code structure but couldn't reason about threat models.

The pattern: AI fails catastrophically when the task requires understanding system-specific context, historical decisions, or constraints that aren't explicit in the code itself.

The Model Comparison Reality

I used three different AI models for the same tasks to see if model choice mattered as much as everyone claims.

For straightforward code generation, all three models (Claude Opus 4.6, Gemini 3.1 Pro, GPT-5.4) produced nearly identical, working code. Model choice didn't matter.

For complex refactoring, the outputs diverged wildly:

  • Claude prioritized maintainability and extensibility
  • Gemini optimized for performance
  • GPT focused on simplicity and readability

None were objectively better. They reflected different philosophies about code quality. This is where using a platform that lets you compare multiple AI outputs becomes valuable—not to find the "right" answer, but to see different valid approaches.

For debugging and problem-solving, all three models were equally unreliable. They generated plausible-sounding explanations that were often wrong in subtle ways.

The lesson: Model choice matters for subjective tasks (refactoring, design) where you want multiple perspectives. It doesn't matter much for objective tasks (code generation, formatting) where there's a clear correct answer.

What Changed About My Workflow

I stopped writing boilerplate entirely. CRUD operations, API clients, data transformations—I let AI generate the first draft and spend my time reviewing rather than writing. This is genuinely faster.

I started writing more tests. When AI can generate comprehensive test cases in seconds, the friction of test writing disappears. I now have better test coverage because the AI doesn't get tired of writing edge case tests.

I became more skeptical of my own code. Using AI to review code I wrote revealed bugs I would have missed. Not because AI is smarter, but because it checks systematically while I check selectively.

I stopped asking AI for architectural advice. Early on, I'd ask AI questions like "How should I structure this feature?" The answers were generic and unhelpful. Now I use AI to execute decisions I've already made, not to make decisions for me.

I developed a multi-model review habit. For any important piece of code, I have multiple AI models review it. They catch different types of issues because they're trained on different data with different biases. Claude Sonnet 4.5 catches conceptual issues, Gemini catches performance issues, GPT catches readability issues.

I stopped trusting AI-generated explanations. When AI explains code or debugging approaches, I verify everything. AI explanations sound authoritative but are often subtly wrong in ways that compound if you build on them.

The Productivity Paradox I Didn't Expect

Using AI consistently made me ship features faster while simultaneously making me worse at certain kinds of engineering.

I became faster at implementation because I wasn't writing boilerplate or doing mechanical refactoring. But I became slower at understanding new codebases because I started relying on AI explanations instead of reading code carefully.

I became better at catching bugs because AI-generated tests were more comprehensive than mine. But I became worse at designing testable code because I wasn't thinking about tests while writing.

I became more productive at executing known patterns. But I didn't improve at the skills that actually advance my career—system design, architectural thinking, understanding complex domains.

The uncomfortable realization: AI can make you more productive while simultaneously making you a worse engineer if you're not intentional about which skills you're outsourcing.

What Actually Works

After six months, here's the workflow that survived:

Use AI for mechanical work you already know how to do. Code generation, refactoring, test writing, documentation. Let AI handle these so you can focus on harder problems.

Never use AI for work you don't understand yet. If you're learning something new or working in unfamiliar territory, AI will give you confident wrong answers that delay your learning.

Use multiple models for code review, not code generation. Generate with one model, review with others. They catch different issues. Platforms like Crompt AI make this practical by letting you compare outputs without switching tools.

Verify everything AI tells you about your system. AI doesn't know your architecture, your constraints, or your history. It gives generic advice. You need specific solutions.

Keep the skills AI is replacing sharp through practice. If you stop writing tests because AI does it better, you'll lose the ability to design testable code. Outsource execution, not understanding.

Use AI as a second pair of eyes, not a first brain. AI is great at catching things you missed. It's terrible at figuring out what you should be looking for in the first place.

The Questions That Actually Matter

The debate about AI replacing developers misses the point. The real questions are:

Which parts of engineering are actually about pattern matching? AI excels here. Code generation, refactoring, test writing—these are largely mechanical once you know what you want.

Which parts require genuine understanding of context? Architecture, debugging, performance optimization, security—these require knowing things about your specific system that AI cannot access.

What happens to your skills when AI handles the mechanical work? If you stop writing code because AI does it faster, do you lose the ability to understand code? If you stop debugging because AI suggests fixes, do you lose the ability to diagnose problems?

How do you stay sharp at skills you're outsourcing? This is the question nobody has answered yet. If AI writes your tests, how do you maintain the skill of designing testable code?

What I'd Tell Someone Starting Today

Don't try to maximize AI usage. Try to maximize the value of your time.

Use AI for anything mechanical where you know exactly what you want and can verify correctness quickly. Code generation, refactoring, test writing—let AI handle these.

Don't use AI for anything that requires understanding your specific system context. Architecture, debugging, performance—these require knowledge AI doesn't have.

Build verification habits. When AI generates code, review it like you'd review code from a junior developer who writes clean code but doesn't understand the system. It will look good but might be subtly wrong.

Use tools that let you compare multiple AI models because different models catch different issues. Single-model workflows miss too much.

Keep practicing the skills AI is replacing. Write code by hand sometimes. Debug without AI assistance occasionally. Design tests manually even though AI can generate them. The skills you stop using are the skills you'll lose.

The developers who thrive with AI won't be the ones who use it most. They'll be the ones who use it strategically for the right tasks while staying sharp at the skills that actually matter.

Because in the end, AI is a tool for execution. Engineering is about knowing what to execute, and that's still on you.

Using AI in your engineering workflow? Try Crompt AI to compare multiple model outputs and catch issues single-model workflows miss—because the best code review happens when different AI perspectives meet human judgment.

-Leena:)
