<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Inyang-Etoh</title>
    <description>The latest articles on DEV Community by David Inyang-Etoh (@dinyangetoh).</description>
    <link>https://dev.to/dinyangetoh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3925772%2F72b80d53-5363-4c4c-914a-cf71e22c57c1.jpg</url>
      <title>DEV Community: David Inyang-Etoh</title>
      <link>https://dev.to/dinyangetoh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dinyangetoh"/>
    <language>en</language>
    <item>
      <title>The AI Engineer Illusion: Why Calling LLM APIs Is Not Enough</title>
      <dc:creator>David Inyang-Etoh</dc:creator>
      <pubDate>Mon, 11 May 2026 20:03:22 +0000</pubDate>
      <link>https://dev.to/dinyangetoh/the-ai-engineer-illusion-why-calling-llm-apis-is-not-enough-4ip5</link>
      <guid>https://dev.to/dinyangetoh/the-ai-engineer-illusion-why-calling-llm-apis-is-not-enough-4ip5</guid>
      <description>&lt;h1&gt;The AI Engineer Illusion: Why Calling LLM APIs Is Not Enough&lt;/h1&gt;

&lt;p&gt;Three engineers interviewed for the same role last month.&lt;/p&gt;

&lt;p&gt;One had 5 years of Node.js and spent 6 months calling OpenAI APIs.&lt;br&gt;
One had ML fundamentals and shipped two RAG pipelines to production.&lt;br&gt;
One had built and evaluated a multi-agent system — with observability, evals, and drift monitoring in place.&lt;/p&gt;

&lt;p&gt;All three called themselves AI Engineers.&lt;br&gt;
Only one actually was.&lt;/p&gt;

&lt;p&gt;And the industry has no consensus on which one.&lt;/p&gt;

&lt;p&gt;Job boards are flooded with titles like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI Engineer&lt;/li&gt;
&lt;li&gt;Agentic AI Engineer&lt;/li&gt;
&lt;li&gt;Applied AI Engineer&lt;/li&gt;
&lt;li&gt;AI Product Engineer&lt;/li&gt;
&lt;li&gt;Forward Deployed Engineer&lt;/li&gt;
&lt;li&gt;LLM Engineer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes they describe completely different jobs.&lt;br&gt;
Sometimes they describe the exact same job with different salaries.&lt;/p&gt;

&lt;p&gt;Recruiters are confused.&lt;br&gt;
Developers are confused.&lt;br&gt;
Even the companies posting these roles are still working out what they actually mean.&lt;/p&gt;

&lt;p&gt;The issue isn't that more people are learning AI. That's a good thing.&lt;/p&gt;

&lt;p&gt;The issue is that many people still think AI Engineering is just traditional software engineering with LLM APIs attached to it.&lt;/p&gt;

&lt;p&gt;It's not.&lt;/p&gt;

&lt;p&gt;Calling the OpenAI SDK, adding a vector database, wrapping everything with LangChain, and shipping a chatbot does not automatically make someone an AI Engineer.&lt;/p&gt;

&lt;p&gt;That's just the entry point.&lt;/p&gt;

&lt;p&gt;The real work starts after the demo impresses everyone.&lt;/p&gt;




&lt;h2&gt;In This Article&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Why AI Engineering is becoming a separate discipline&lt;/li&gt;
&lt;li&gt;Why RAG and vector databases are not enough&lt;/li&gt;
&lt;li&gt;The role of experimentation, evaluation, and observability&lt;/li&gt;
&lt;li&gt;Why I built a separate "AI Playground" lab&lt;/li&gt;
&lt;li&gt;The hidden cost and latency problems in production AI systems&lt;/li&gt;
&lt;li&gt;What building real-world AI infrastructure actually looks like&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Mental Shift Most Engineers Underestimate&lt;/h2&gt;

&lt;p&gt;Traditional software engineering trained most of us to think in deterministic systems:&lt;/p&gt;

&lt;p&gt;inputs → business logic → outputs → tests → deployment.&lt;/p&gt;

&lt;p&gt;AI systems break that model completely.&lt;/p&gt;

&lt;p&gt;The job is no longer just: &lt;em&gt;"How do I build this?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should this even use AI?&lt;/li&gt;
&lt;li&gt;Which parts should stay deterministic?&lt;/li&gt;
&lt;li&gt;Where does a human need to stay in the loop?&lt;/li&gt;
&lt;li&gt;Is the reasoning worth the latency and the cost?&lt;/li&gt;
&lt;li&gt;What happens when the model drifts?&lt;/li&gt;
&lt;li&gt;Can this scale economically under real production traffic?&lt;/li&gt;
&lt;li&gt;Which model is &lt;em&gt;good enough&lt;/em&gt; — not just the most powerful?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a completely different engineering mindset.&lt;/p&gt;

&lt;p&gt;You stop thinking purely like a software engineer.&lt;/p&gt;

&lt;p&gt;You start thinking like a systems designer, a data scientist, an evaluator, a cost optimizer — and sometimes a behavioral analyst for systems that don't behave the same way twice.&lt;/p&gt;

&lt;p&gt;The biggest misconception I see is engineers treating AI as just another API integration problem.&lt;/p&gt;

&lt;p&gt;It isn't.&lt;/p&gt;

&lt;p&gt;When your system can return a different output for the exact same input, everything downstream changes — how you test, how you monitor, how you measure quality, how you define "done."&lt;/p&gt;

&lt;p&gt;That changes everything.&lt;/p&gt;
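&lt;p&gt;Here's what that shift looks like in practice. In a deterministic system you assert equality; with a generative system you sample the same input several times and score the outputs against a threshold. A minimal sketch, where &lt;code&gt;ask()&lt;/code&gt; and &lt;code&gt;similarity()&lt;/code&gt; are hypothetical stand-ins for your own model call and scoring helper:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Deterministic code gets asserted; generative output gets scored.
# ask() and similarity() are hypothetical stand-ins for your own
# model call and your embedding-similarity helper.

REFERENCE = "Refunds are processed within 5 business days."
THRESHOLD = 0.85
RUNS = 5

def eval_answer(question):
    scores = []
    for _ in range(RUNS):                  # same input, several samples
        answer = ask(question)             # may differ on every call
        scores.append(similarity(answer, REFERENCE))
    worst = min(scores)
    # Pass/fail becomes a statistical judgment, not an equality check.
    assert worst &amp;gt;= THRESHOLD, f"worst of {RUNS} runs scored {worst:.2f}"
&lt;/code&gt;&lt;/pre&gt;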




&lt;h2&gt;My "AI Playground" Changed How I Think About Engineering&lt;/h2&gt;

&lt;p&gt;One thing that completely changed my perspective was building a separate repository I call "AI Playground."&lt;/p&gt;

&lt;p&gt;It's not product code.&lt;/p&gt;

&lt;p&gt;It's a lab.&lt;/p&gt;

&lt;p&gt;A place where I experiment in Jupyter notebooks long before production ever sees an idea.&lt;/p&gt;

&lt;p&gt;That lab contains experiments around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scraping pipelines&lt;/li&gt;
&lt;li&gt;ingestion systems&lt;/li&gt;
&lt;li&gt;chunking strategies&lt;/li&gt;
&lt;li&gt;chunk enrichment before embeddings&lt;/li&gt;
&lt;li&gt;retrieval evaluation&lt;/li&gt;
&lt;li&gt;prompt engineering&lt;/li&gt;
&lt;li&gt;context engineering&lt;/li&gt;
&lt;li&gt;semantic search&lt;/li&gt;
&lt;li&gt;BM25&lt;/li&gt;
&lt;li&gt;reciprocal rank fusion (RRF; sketched just after this list)&lt;/li&gt;
&lt;li&gt;hybrid retrieval systems&lt;/li&gt;
&lt;li&gt;embedding evaluations&lt;/li&gt;
&lt;li&gt;latency vs quality tradeoffs&lt;/li&gt;
&lt;li&gt;model routing&lt;/li&gt;
&lt;li&gt;hallucination reduction&lt;/li&gt;
&lt;li&gt;agent orchestration&lt;/li&gt;
&lt;li&gt;evaluation pipelines&lt;/li&gt;
&lt;li&gt;open-source Hugging Face models vs frontier APIs&lt;/li&gt;
&lt;/ul&gt;
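
&lt;p&gt;To make one of those experiments concrete: reciprocal rank fusion merges several ranked lists (say, BM25 and vector search) without having to normalize their score scales. A minimal sketch with made-up document IDs, not the exact notebook code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Reciprocal rank fusion: merge several ranked lists (e.g. BM25 and
# vector search) without tuning score scales. k=60 is the usual constant.

def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(rrf([bm25_hits, vector_hits]))   # ['doc1', 'doc3', 'doc9', 'doc7']
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The appeal is that each retriever only has to get the &lt;em&gt;ordering&lt;/em&gt; roughly right; the fused score never compares raw BM25 numbers against cosine similarities.&lt;/p&gt;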

&lt;p&gt;Because in real AI systems, almost nothing should be assumed.&lt;/p&gt;

&lt;p&gt;You test everything.&lt;/p&gt;

&lt;p&gt;A retrieval strategy that works perfectly for legal documents may completely fail for conversational memory.&lt;/p&gt;

&lt;p&gt;A frontier model may outperform smaller models on reasoning tasks but become economically impossible at scale.&lt;/p&gt;

&lt;p&gt;An open-source model may outperform expensive APIs for classification, routing, or embedding generation.&lt;/p&gt;

&lt;p&gt;A tiny latency increase may look harmless in development but become catastrophic when multiplied across millions of agent calls in production.&lt;/p&gt;

&lt;p&gt;This is why AI Engineering feels much closer to running a continuous lab than building traditional CRUD systems.&lt;/p&gt;

&lt;p&gt;The real engineering challenge starts after the prototype impresses everyone.&lt;/p&gt;




&lt;h2&gt;RAG Is Not the Finish Line&lt;/h2&gt;

&lt;p&gt;One of the biggest misconceptions right now is treating RAG like the final form of AI Engineering.&lt;/p&gt;

&lt;p&gt;RAG is important.&lt;br&gt;
Vector databases are important.&lt;/p&gt;

&lt;p&gt;But they are not enough.&lt;/p&gt;

&lt;p&gt;Many engineers today are sprinkling AI buzzwords onto existing software engineering workflows and assuming that's the transformation.&lt;/p&gt;

&lt;p&gt;That's like wearing a tuxedo with the wrong shoes.&lt;/p&gt;

&lt;p&gt;You look the part. Until you don't.&lt;/p&gt;

&lt;p&gt;The deeper you go into production AI systems, the more problems you start fighting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval inconsistency&lt;/li&gt;
&lt;li&gt;context pollution&lt;/li&gt;
&lt;li&gt;hallucinations&lt;/li&gt;
&lt;li&gt;stale embeddings&lt;/li&gt;
&lt;li&gt;ranking quality&lt;/li&gt;
&lt;li&gt;orchestration complexity&lt;/li&gt;
&lt;li&gt;token cost explosions&lt;/li&gt;
&lt;li&gt;latency bottlenecks&lt;/li&gt;
&lt;li&gt;evaluation drift&lt;/li&gt;
&lt;li&gt;unreliable tool usage&lt;/li&gt;
&lt;li&gt;memory corruption&lt;/li&gt;
&lt;li&gt;unpredictable agent behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "easy chatbot demo" phase ends quickly.&lt;/p&gt;

&lt;p&gt;After that, you realize building reliable AI systems is less about generating responses and more about controlling behavior.&lt;/p&gt;

&lt;p&gt;That's a very different engineering problem.&lt;/p&gt;




&lt;h2&gt;Evaluation Never Ends&lt;/h2&gt;

&lt;p&gt;Traditional software engineering gave most of us a clear testing contract:&lt;/p&gt;

&lt;p&gt;unit tests → integration tests → end-to-end tests → ship.&lt;/p&gt;

&lt;p&gt;AI systems break that contract.&lt;/p&gt;

&lt;p&gt;I ran 200 test cases against Vera's retrieval pipeline before beta.&lt;/p&gt;

&lt;p&gt;Completeness score: 2.1 out of 5.&lt;/p&gt;

&lt;p&gt;After switching chunking strategy, adjusting overlap, and adding cross-encoder reranking with MMR (maximal marginal relevance) retrieval, completeness hit 4.0. MRR (mean reciprocal rank) went from below 0.7 to 0.95.&lt;/p&gt;
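
&lt;p&gt;Quick aside for readers who haven't computed MRR before: it only needs the rank at which the first relevant chunk appeared for each test query. A minimal sketch with illustrative ranks, not Vera's actual data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Mean reciprocal rank: average of 1/rank of the first relevant hit.
# first_relevant_ranks holds None when retrieval missed entirely.

def mean_reciprocal_rank(first_relevant_ranks):
    total = 0.0
    for rank in first_relevant_ranks:
        if rank:                        # None contributes nothing
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

# A first-rank hit on most queries pushes MRR toward 1.0.
print(mean_reciprocal_rank([1, 1, 2, 1, None, 1]))   # 0.75
&lt;/code&gt;&lt;/pre&gt;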

&lt;p&gt;The unit tests were green the entire time.&lt;/p&gt;

&lt;p&gt;That's the terrifying part.&lt;/p&gt;

&lt;p&gt;Your dashboards can be green while your users are receiving degraded outputs. No error thrown. No alert fired. Just silent quality erosion.&lt;/p&gt;

&lt;p&gt;So you evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts&lt;/li&gt;
&lt;li&gt;retrieval quality&lt;/li&gt;
&lt;li&gt;reasoning consistency&lt;/li&gt;
&lt;li&gt;hallucination rates&lt;/li&gt;
&lt;li&gt;ranking strategies&lt;/li&gt;
&lt;li&gt;context windows&lt;/li&gt;
&lt;li&gt;tool selection&lt;/li&gt;
&lt;li&gt;model performance&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;token efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you deploy.&lt;/p&gt;

&lt;p&gt;And then you evaluate again — because production behavior changes over time.&lt;/p&gt;

&lt;p&gt;Models drift. Contexts drift. User behavior changes. Prompts degrade.&lt;/p&gt;

&lt;p&gt;A system that performed well two weeks ago can silently regress without throwing a single technical error.&lt;/p&gt;

&lt;p&gt;Evaluation isn't a phase. It's a permanent operating mode.&lt;/p&gt;
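
&lt;p&gt;In practice, "permanent operating mode" can be as unglamorous as a scheduled job that re-runs a fixed eval set and compares the scores to a stored baseline. A hedged sketch; &lt;code&gt;run_eval_suite()&lt;/code&gt; and &lt;code&gt;alert()&lt;/code&gt; are placeholders, and the numbers are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Nightly regression check: re-run a fixed eval set, compare to baseline.
# run_eval_suite() and alert() are stand-ins for your own harness.

BASELINE = {"completeness": 4.0, "mrr": 0.95}
TOLERANCE = 0.05

def check_for_regression():
    current = run_eval_suite()          # e.g. {"completeness": 3.6, ...}
    for metric, floor in BASELINE.items():
        drop = floor - current[metric]
        if drop &amp;gt; TOLERANCE:
            # No exception is ever thrown in production; this alert is
            # the only way the regression becomes visible.
            alert(f"{metric} regressed by {drop:.2f} vs baseline")
&lt;/code&gt;&lt;/pre&gt;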




&lt;h2&gt;AI Observability Is a Completely Different Beast&lt;/h2&gt;

&lt;p&gt;Traditional observability: logs, traces, infrastructure metrics, uptime, exceptions.&lt;/p&gt;

&lt;p&gt;AI observability is harder.&lt;/p&gt;

&lt;p&gt;Now you're asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did the agent choose this tool?&lt;/li&gt;
&lt;li&gt;Why did reasoning fail at this step?&lt;/li&gt;
&lt;li&gt;Which prompt caused the regression?&lt;/li&gt;
&lt;li&gt;Which workflow is burning the most tokens?&lt;/li&gt;
&lt;li&gt;Where does hallucination frequency spike?&lt;/li&gt;
&lt;li&gt;Which retrieval strategy is silently degrading quality?&lt;/li&gt;
&lt;li&gt;Which agents are becoming unreliable without anyone noticing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're no longer just monitoring systems.&lt;/p&gt;

&lt;p&gt;You're monitoring behavior.&lt;/p&gt;

&lt;p&gt;Sometimes it feels like managing a team of extremely intelligent interns who occasionally hallucinate with full confidence.&lt;/p&gt;

&lt;p&gt;Your agents are employees on permanent probation.&lt;/p&gt;

&lt;p&gt;You don't fire-and-forget. You watch. You trace every decision. You hold every node accountable.&lt;/p&gt;

&lt;p&gt;And one bad system prompt can quietly turn your green metrics red overnight.&lt;/p&gt;
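
&lt;p&gt;Tracing every decision usually starts with something simple: emit a structured record for every model and tool call, not just for exceptions. A minimal sketch of the idea; the field names are illustrative, and a real system would ship these spans to a tracing backend instead of printing them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import time
import uuid

# Wrap every tool/model call in a span so each decision is traceable:
# what was called, with what input, what came back, and how long it took.

def traced_call(agent, tool, fn, payload):
    span = {
        "trace_id": str(uuid.uuid4()),
        "agent": agent,
        "tool": tool,
        "input": payload,
        "started_at": time.time(),
    }
    result = fn(payload)
    span["latency_ms"] = round((time.time() - span["started_at"]) * 1000)
    span["output"] = result
    print(json.dumps(span))   # ship to your tracing backend in real life
    return result
&lt;/code&gt;&lt;/pre&gt;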




&lt;h2&gt;The Hidden Cost Problem Nobody Talks About&lt;/h2&gt;

&lt;p&gt;Many teams underestimate compounding AI cost at scale.&lt;/p&gt;

&lt;p&gt;A tiny latency increase multiplied across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-agent systems&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;retrieval layers&lt;/li&gt;
&lt;li&gt;orchestration chains&lt;/li&gt;
&lt;li&gt;evaluation pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…can quietly destroy both performance and unit economics.&lt;/p&gt;
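
&lt;p&gt;The compounding is easy to underestimate because each factor looks harmless on its own. A back-of-the-envelope sketch with made-up numbers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# 150 ms of extra latency per model call looks harmless in a demo.
extra_latency_s = 0.150

calls_per_step = 4      # retrieval + reasoning + tool call + validation
steps_per_task = 6      # multi-agent handoffs
retry_rate     = 1.2    # 20% of calls retried
tasks_per_day  = 50_000

calls_per_day = calls_per_step * steps_per_task * retry_rate * tasks_per_day
added_hours   = calls_per_day * extra_latency_s / 3600

print(f"{calls_per_day:,.0f} calls/day, {added_hours:,.0f} extra compute-hours/day")
# 1,440,000 calls/day and 60 extra compute-hours/day, from 150 ms.
&lt;/code&gt;&lt;/pre&gt;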

&lt;p&gt;This is why experienced AI Engineers obsess over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing (sketched just after this list)&lt;/li&gt;
&lt;li&gt;caching&lt;/li&gt;
&lt;li&gt;hybrid architectures&lt;/li&gt;
&lt;li&gt;inference optimization&lt;/li&gt;
&lt;li&gt;selective reasoning&lt;/li&gt;
&lt;li&gt;retrieval precision&lt;/li&gt;
&lt;li&gt;token efficiency&lt;/li&gt;
&lt;li&gt;model specialization&lt;/li&gt;
&lt;li&gt;latency-aware workflows&lt;/li&gt;
&lt;/ul&gt;
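
&lt;p&gt;Routing, for instance, often starts as nothing more exotic than a cheap classifier in front of two models. A hedged sketch; the model names and the &lt;code&gt;classify_complexity()&lt;/code&gt; and &lt;code&gt;call_model()&lt;/code&gt; helpers are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Send cheap, well-structured requests to a small model; reserve the
# expensive frontier model for tasks that actually need its reasoning.

CHEAP_MODEL    = "small-local-model"      # placeholder names
FRONTIER_MODEL = "frontier-api-model"

def route(request):
    label = classify_complexity(request)   # placeholder: heuristic or tiny model
    if label == "simple":
        return call_model(CHEAP_MODEL, request)
    return call_model(FRONTIER_MODEL, request)
&lt;/code&gt;&lt;/pre&gt;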

&lt;p&gt;Sometimes the smartest engineering decision is not using a larger model.&lt;/p&gt;

&lt;p&gt;Sometimes the smartest decision is not using AI at all.&lt;/p&gt;

&lt;p&gt;Calling an LLM to multiply two numbers or transform simple structured data isn't innovation.&lt;/p&gt;

&lt;p&gt;It's misuse.&lt;/p&gt;

&lt;p&gt;A lot of production AI engineering is really about knowing where &lt;em&gt;not&lt;/em&gt; to use AI.&lt;/p&gt;

&lt;p&gt;That's the part most people skip entirely.&lt;/p&gt;




&lt;h2&gt;AI Engineering Is Becoming Its Own Discipline&lt;/h2&gt;

&lt;p&gt;The industry is going through what software engineering itself went through years ago:&lt;/p&gt;

&lt;p&gt;title inflation mixed with genuine transformation.&lt;/p&gt;

&lt;p&gt;And yes — anyone can become an AI Engineer.&lt;/p&gt;

&lt;p&gt;But eventually, the gap becomes visible: between people who can integrate APIs and people who can design, evaluate, optimize, monitor, and evolve intelligent systems reliably in production.&lt;/p&gt;

&lt;p&gt;The AI Engineer of the next few years won't look like a traditional application developer.&lt;/p&gt;

&lt;p&gt;They'll look like an orchestrator, evaluator, systems thinker, experimentation lead, cost optimizer, and behavioral architect for autonomous systems.&lt;/p&gt;

&lt;p&gt;For years, my job as a software engineer was mostly about finding bugs and fixing them.&lt;/p&gt;

&lt;p&gt;Now I spend my time supervising semi-autonomous agents, evaluating reasoning behavior, optimizing workflows, controlling cost, designing cognitive systems, monitoring drift, and running lab experiments to make AI systems more reliable before they ever touch a user.&lt;/p&gt;

&lt;p&gt;The job description changed completely.&lt;/p&gt;

&lt;p&gt;Most people interviewing for the role haven't read it yet.&lt;/p&gt;

&lt;p&gt;That's not a criticism. It's an opening.&lt;/p&gt;

&lt;p&gt;The engineers who close that gap — who do the lab work, build the eval pipelines, instrument the observability, and develop the instinct for when AI is the wrong answer — those are the ones who will define what this role actually means.&lt;/p&gt;

&lt;p&gt;Part engineer. Part scientist. Part strategist. Part guardian.&lt;/p&gt;

&lt;p&gt;That's the AI Engineer.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>agents</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
