<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Inyang-Etoh</title>
    <description>The latest articles on DEV Community by David Inyang-Etoh (@dinyangetoh).</description>
    <link>https://dev.to/dinyangetoh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3925772%2F72b80d53-5363-4c4c-914a-cf71e22c57c1.jpg</url>
      <title>DEV Community: David Inyang-Etoh</title>
      <link>https://dev.to/dinyangetoh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dinyangetoh"/>
    <language>en</language>
    <item>
      <title>The AI Engineer Illusion: Why Calling LLM APIs Is Not Enough</title>
      <dc:creator>David Inyang-Etoh</dc:creator>
      <pubDate>Mon, 11 May 2026 20:03:22 +0000</pubDate>
      <link>https://dev.to/dinyangetoh/the-ai-engineer-illusion-why-calling-llm-apis-is-not-enough-4ip5</link>
      <guid>https://dev.to/dinyangetoh/the-ai-engineer-illusion-why-calling-llm-apis-is-not-enough-4ip5</guid>
      <description>&lt;h1&gt;The AI Engineer Illusion: Why Calling LLM APIs Is Not Enough&lt;/h1&gt;

&lt;p&gt;Three engineers interviewed for the same role last month.&lt;/p&gt;

&lt;p&gt;One had 5 years of Node.js and spent 6 months calling OpenAI APIs.&lt;br&gt;
One had ML fundamentals and shipped two RAG pipelines to production.&lt;br&gt;
One had built and evaluated a multi-agent system — with observability, evals, and drift monitoring in place.&lt;/p&gt;

&lt;p&gt;All three called themselves AI Engineers.&lt;br&gt;
Only one actually was.&lt;/p&gt;

&lt;p&gt;And the industry has no consensus on which one.&lt;/p&gt;

&lt;p&gt;Job boards are flooded with titles like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI Engineer&lt;/li&gt;
&lt;li&gt;Agentic AI Engineer&lt;/li&gt;
&lt;li&gt;Applied AI Engineer&lt;/li&gt;
&lt;li&gt;AI Product Engineer&lt;/li&gt;
&lt;li&gt;Forward Deployed Engineer&lt;/li&gt;
&lt;li&gt;LLM Engineer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes they describe completely different jobs.&lt;br&gt;
Sometimes they describe the exact same job with different salaries.&lt;/p&gt;

&lt;p&gt;Recruiters are confused.&lt;br&gt;
Developers are confused.&lt;br&gt;
Even the companies posting these roles are still working out what they actually mean.&lt;/p&gt;

&lt;p&gt;The issue isn't that more people are learning AI. That's a good thing.&lt;/p&gt;

&lt;p&gt;The issue is that many people still think AI Engineering is just traditional software engineering with LLM APIs attached to it.&lt;/p&gt;

&lt;p&gt;It's not.&lt;/p&gt;

&lt;p&gt;Calling the OpenAI SDK, adding a vector database, wrapping everything with LangChain, and shipping a chatbot does not automatically make someone an AI Engineer.&lt;/p&gt;

&lt;p&gt;That's just the entry point.&lt;/p&gt;

&lt;p&gt;The real work starts after the demo impresses everyone.&lt;/p&gt;




&lt;h2&gt;In This Article&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Why AI Engineering is becoming a separate discipline&lt;/li&gt;
&lt;li&gt;Why RAG and vector databases are not enough&lt;/li&gt;
&lt;li&gt;The role of experimentation, evaluation, and observability&lt;/li&gt;
&lt;li&gt;Why I built a separate "AI Playground" lab&lt;/li&gt;
&lt;li&gt;The hidden cost and latency problems in production AI systems&lt;/li&gt;
&lt;li&gt;What building real-world AI infrastructure actually looks like&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Mental Shift Most Engineers Underestimate&lt;/h2&gt;

&lt;p&gt;Traditional software engineering trained most of us to think in deterministic systems:&lt;/p&gt;

&lt;p&gt;inputs → business logic → outputs → tests → deployment.&lt;/p&gt;

&lt;p&gt;AI systems break that model completely.&lt;/p&gt;

&lt;p&gt;The job is no longer just: &lt;em&gt;"How do I build this?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should this even use AI?&lt;/li&gt;
&lt;li&gt;Which parts should stay deterministic?&lt;/li&gt;
&lt;li&gt;Where does a human need to stay in the loop?&lt;/li&gt;
&lt;li&gt;Is the reasoning worth the latency and the cost?&lt;/li&gt;
&lt;li&gt;What happens when the model drifts?&lt;/li&gt;
&lt;li&gt;Can this scale economically under real production traffic?&lt;/li&gt;
&lt;li&gt;Which model is &lt;em&gt;good enough&lt;/em&gt; — not just the most powerful?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a completely different engineering mindset.&lt;/p&gt;

&lt;p&gt;You stop thinking purely like a software engineer.&lt;/p&gt;

&lt;p&gt;You start thinking like a systems designer, a data scientist, an evaluator, a cost optimizer — and sometimes a behavioral analyst for systems that don't behave the same way twice.&lt;/p&gt;

&lt;p&gt;The biggest misconception I see is engineers treating AI as just another API integration problem.&lt;/p&gt;

&lt;p&gt;It isn't.&lt;/p&gt;

&lt;p&gt;When your system can return a different output for the exact same input, everything downstream changes — how you test, how you monitor, how you measure quality, how you define "done."&lt;/p&gt;

&lt;p&gt;That changes everything.&lt;/p&gt;
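&lt;p&gt;Here's what that shift looks like in practice. In a deterministic system you assert equality; with a generative system you sample the same input several times and score the outputs against a threshold. A minimal sketch, where &lt;code&gt;ask()&lt;/code&gt; and &lt;code&gt;similarity()&lt;/code&gt; are hypothetical stand-ins for your own model call and scoring helper:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Deterministic code gets asserted; generative output gets scored.
# ask() and similarity() are hypothetical stand-ins for your own
# model call and your embedding-similarity helper.

REFERENCE = "Refunds are processed within 5 business days."
THRESHOLD = 0.85
RUNS = 5

def eval_answer(question):
    scores = []
    for _ in range(RUNS):                  # same input, several samples
        answer = ask(question)             # may differ on every call
        scores.append(similarity(answer, REFERENCE))
    worst = min(scores)
    # Pass/fail becomes a statistical judgment, not an equality check.
    assert worst &amp;gt;= THRESHOLD, f"worst of {RUNS} runs scored {worst:.2f}"
&lt;/code&gt;&lt;/pre&gt;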




&lt;h2&gt;My "AI Playground" Changed How I Think About Engineering&lt;/h2&gt;

&lt;p&gt;One thing that completely changed my perspective was building a separate repository I call "AI Playground."&lt;/p&gt;

&lt;p&gt;It's not product code.&lt;/p&gt;

&lt;p&gt;It's a lab.&lt;/p&gt;

&lt;p&gt;A place where I experiment in Jupyter notebooks long before production ever sees an idea.&lt;/p&gt;

&lt;p&gt;That lab contains experiments around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scraping pipelines&lt;/li&gt;
&lt;li&gt;ingestion systems&lt;/li&gt;
&lt;li&gt;chunking strategies&lt;/li&gt;
&lt;li&gt;chunk enrichment before embeddings&lt;/li&gt;
&lt;li&gt;retrieval evaluation&lt;/li&gt;
&lt;li&gt;prompt engineering&lt;/li&gt;
&lt;li&gt;context engineering&lt;/li&gt;
&lt;li&gt;semantic search&lt;/li&gt;
&lt;li&gt;BM25&lt;/li&gt;
&lt;li&gt;reciprocal rank fusion (RRF; sketched just after this list)&lt;/li&gt;
&lt;li&gt;hybrid retrieval systems&lt;/li&gt;
&lt;li&gt;embedding evaluations&lt;/li&gt;
&lt;li&gt;latency vs quality tradeoffs&lt;/li&gt;
&lt;li&gt;model routing&lt;/li&gt;
&lt;li&gt;hallucination reduction&lt;/li&gt;
&lt;li&gt;agent orchestration&lt;/li&gt;
&lt;li&gt;evaluation pipelines&lt;/li&gt;
&lt;li&gt;open-source Hugging Face models vs frontier APIs&lt;/li&gt;
&lt;/ul&gt;
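
&lt;p&gt;To make one of those experiments concrete: reciprocal rank fusion merges several ranked lists (say, BM25 and vector search) without having to normalize their score scales. A minimal sketch with made-up document IDs, not the exact notebook code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Reciprocal rank fusion: merge several ranked lists (e.g. BM25 and
# vector search) without tuning score scales. k=60 is the usual constant.

def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(rrf([bm25_hits, vector_hits]))   # ['doc1', 'doc3', 'doc9', 'doc7']
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The appeal is that each retriever only has to get the &lt;em&gt;ordering&lt;/em&gt; roughly right; the fused score never compares raw BM25 numbers against cosine similarities.&lt;/p&gt;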

&lt;p&gt;Because in real AI systems, almost nothing should be assumed.&lt;/p&gt;

&lt;p&gt;You test everything.&lt;/p&gt;

&lt;p&gt;A retrieval strategy that works perfectly for legal documents may completely fail for conversational memory.&lt;/p&gt;

&lt;p&gt;A frontier model may outperform smaller models on reasoning tasks but become economically impossible at scale.&lt;/p&gt;

&lt;p&gt;An open-source model may outperform expensive APIs for classification, routing, or embedding generation.&lt;/p&gt;

&lt;p&gt;A tiny latency increase may look harmless in development but become catastrophic when multiplied across millions of agent calls in production.&lt;/p&gt;

&lt;p&gt;This is why AI Engineering feels much closer to running a continuous lab than building traditional CRUD systems.&lt;/p&gt;

&lt;p&gt;The real engineering challenge starts after the prototype impresses everyone.&lt;/p&gt;




&lt;h2&gt;RAG Is Not the Finish Line&lt;/h2&gt;

&lt;p&gt;One of the biggest misconceptions right now is treating RAG like the final form of AI Engineering.&lt;/p&gt;

&lt;p&gt;RAG is important.&lt;br&gt;
Vector databases are important.&lt;/p&gt;

&lt;p&gt;But they are not enough.&lt;/p&gt;

&lt;p&gt;Many engineers today are sprinkling AI buzzwords onto existing software engineering workflows and assuming that's the transformation.&lt;/p&gt;

&lt;p&gt;That's like wearing a tuxedo with the wrong shoes.&lt;/p&gt;

&lt;p&gt;You look the part. Until you don't.&lt;/p&gt;

&lt;p&gt;The deeper you go into production AI systems, the more problems you start fighting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval inconsistency&lt;/li&gt;
&lt;li&gt;context pollution&lt;/li&gt;
&lt;li&gt;hallucinations&lt;/li&gt;
&lt;li&gt;stale embeddings&lt;/li&gt;
&lt;li&gt;ranking quality&lt;/li&gt;
&lt;li&gt;orchestration complexity&lt;/li&gt;
&lt;li&gt;token cost explosions&lt;/li&gt;
&lt;li&gt;latency bottlenecks&lt;/li&gt;
&lt;li&gt;evaluation drift&lt;/li&gt;
&lt;li&gt;unreliable tool usage&lt;/li&gt;
&lt;li&gt;memory corruption&lt;/li&gt;
&lt;li&gt;unpredictable agent behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "easy chatbot demo" phase ends quickly.&lt;/p&gt;

&lt;p&gt;After that, you realize building reliable AI systems is less about generating responses and more about controlling behavior.&lt;/p&gt;

&lt;p&gt;That's a very different engineering problem.&lt;/p&gt;




&lt;h2&gt;Evaluation Never Ends&lt;/h2&gt;

&lt;p&gt;Traditional software engineering gave most of us a clear testing contract:&lt;/p&gt;

&lt;p&gt;unit tests → integration tests → end-to-end tests → ship.&lt;/p&gt;

&lt;p&gt;AI systems break that contract.&lt;/p&gt;

&lt;p&gt;I ran 200 test cases against Vera's retrieval pipeline before beta.&lt;/p&gt;

&lt;p&gt;Completeness score: 2.1 out of 5.&lt;/p&gt;

&lt;p&gt;After switching chunking strategy, adjusting overlap, and adding cross-encoder reranking with MMR (maximal marginal relevance) retrieval, completeness hit 4.0. MRR (mean reciprocal rank) went from below 0.7 to 0.95.&lt;/p&gt;
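
&lt;p&gt;Quick aside for readers who haven't computed MRR before: it only needs the rank at which the first relevant chunk appeared for each test query. A minimal sketch with illustrative ranks, not Vera's actual data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Mean reciprocal rank: average of 1/rank of the first relevant hit.
# first_relevant_ranks holds None when retrieval missed entirely.

def mean_reciprocal_rank(first_relevant_ranks):
    total = 0.0
    for rank in first_relevant_ranks:
        if rank:                        # None contributes nothing
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

# A first-rank hit on most queries pushes MRR toward 1.0.
print(mean_reciprocal_rank([1, 1, 2, 1, None, 1]))   # 0.75
&lt;/code&gt;&lt;/pre&gt;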

&lt;p&gt;The unit tests were green the entire time.&lt;/p&gt;

&lt;p&gt;That's the terrifying part.&lt;/p&gt;

&lt;p&gt;Your dashboards can be green while your users are receiving degraded outputs. No error thrown. No alert fired. Just silent quality erosion.&lt;/p&gt;

&lt;p&gt;So you evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts&lt;/li&gt;
&lt;li&gt;retrieval quality&lt;/li&gt;
&lt;li&gt;reasoning consistency&lt;/li&gt;
&lt;li&gt;hallucination rates&lt;/li&gt;
&lt;li&gt;ranking strategies&lt;/li&gt;
&lt;li&gt;context windows&lt;/li&gt;
&lt;li&gt;tool selection&lt;/li&gt;
&lt;li&gt;model performance&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;token efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you deploy.&lt;/p&gt;

&lt;p&gt;And then you evaluate again — because production behavior changes over time.&lt;/p&gt;

&lt;p&gt;Models drift. Contexts drift. User behavior changes. Prompts degrade.&lt;/p&gt;

&lt;p&gt;A system that performed well two weeks ago can silently regress without throwing a single technical error.&lt;/p&gt;

&lt;p&gt;Evaluation isn't a phase. It's a permanent operating mode.&lt;/p&gt;
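
&lt;p&gt;In practice, "permanent operating mode" can be as unglamorous as a scheduled job that re-runs a fixed eval set and compares the scores to a stored baseline. A hedged sketch; &lt;code&gt;run_eval_suite()&lt;/code&gt; and &lt;code&gt;alert()&lt;/code&gt; are placeholders, and the numbers are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Nightly regression check: re-run a fixed eval set, compare to baseline.
# run_eval_suite() and alert() are stand-ins for your own harness.

BASELINE = {"completeness": 4.0, "mrr": 0.95}
TOLERANCE = 0.05

def check_for_regression():
    current = run_eval_suite()          # e.g. {"completeness": 3.6, ...}
    for metric, floor in BASELINE.items():
        drop = floor - current[metric]
        if drop &amp;gt; TOLERANCE:
            # No exception is ever thrown in production; this alert is
            # the only way the regression becomes visible.
            alert(f"{metric} regressed by {drop:.2f} vs baseline")
&lt;/code&gt;&lt;/pre&gt;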




&lt;h2&gt;AI Observability Is a Completely Different Beast&lt;/h2&gt;

&lt;p&gt;Traditional observability: logs, traces, infrastructure metrics, uptime, exceptions.&lt;/p&gt;

&lt;p&gt;AI observability is harder.&lt;/p&gt;

&lt;p&gt;Now you're asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did the agent choose this tool?&lt;/li&gt;
&lt;li&gt;Why did reasoning fail at this step?&lt;/li&gt;
&lt;li&gt;Which prompt caused the regression?&lt;/li&gt;
&lt;li&gt;Which workflow is burning the most tokens?&lt;/li&gt;
&lt;li&gt;Where does hallucination frequency spike?&lt;/li&gt;
&lt;li&gt;Which retrieval strategy is silently degrading quality?&lt;/li&gt;
&lt;li&gt;Which agents are becoming unreliable without anyone noticing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're no longer just monitoring systems.&lt;/p&gt;

&lt;p&gt;You're monitoring behavior.&lt;/p&gt;

&lt;p&gt;Sometimes it feels like managing a team of extremely intelligent interns who occasionally hallucinate with full confidence.&lt;/p&gt;

&lt;p&gt;Your agents are employees on permanent probation.&lt;/p&gt;

&lt;p&gt;You don't fire-and-forget. You watch. You trace every decision. You hold every node accountable.&lt;/p&gt;

&lt;p&gt;And one bad system prompt can quietly turn your green metrics red overnight.&lt;/p&gt;
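
&lt;p&gt;Tracing every decision usually starts with something simple: emit a structured record for every model and tool call, not just for exceptions. A minimal sketch of the idea; the field names are illustrative, and a real system would ship these spans to a tracing backend instead of printing them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import time
import uuid

# Wrap every tool/model call in a span so each decision is traceable:
# what was called, with what input, what came back, and how long it took.

def traced_call(agent, tool, fn, payload):
    span = {
        "trace_id": str(uuid.uuid4()),
        "agent": agent,
        "tool": tool,
        "input": payload,
        "started_at": time.time(),
    }
    result = fn(payload)
    span["latency_ms"] = round((time.time() - span["started_at"]) * 1000)
    span["output"] = result
    print(json.dumps(span))   # ship to your tracing backend in real life
    return result
&lt;/code&gt;&lt;/pre&gt;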




&lt;h2&gt;The Hidden Cost Problem Nobody Talks About&lt;/h2&gt;

&lt;p&gt;Many teams underestimate compounding AI cost at scale.&lt;/p&gt;

&lt;p&gt;A tiny latency increase multiplied across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-agent systems&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;retrieval layers&lt;/li&gt;
&lt;li&gt;orchestration chains&lt;/li&gt;
&lt;li&gt;evaluation pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…can quietly destroy both performance and unit economics.&lt;/p&gt;
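
&lt;p&gt;The compounding is easy to underestimate because each factor looks harmless on its own. A back-of-the-envelope sketch with made-up numbers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# 150 ms of extra latency per model call looks harmless in a demo.
extra_latency_s = 0.150

calls_per_step = 4      # retrieval + reasoning + tool call + validation
steps_per_task = 6      # multi-agent handoffs
retry_rate     = 1.2    # 20% of calls retried
tasks_per_day  = 50_000

calls_per_day = calls_per_step * steps_per_task * retry_rate * tasks_per_day
added_hours   = calls_per_day * extra_latency_s / 3600

print(f"{calls_per_day:,.0f} calls/day, {added_hours:,.0f} extra compute-hours/day")
# 1,440,000 calls/day and 60 extra compute-hours/day, from 150 ms.
&lt;/code&gt;&lt;/pre&gt;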

&lt;p&gt;This is why experienced AI Engineers obsess over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing (sketched just after this list)&lt;/li&gt;
&lt;li&gt;caching&lt;/li&gt;
&lt;li&gt;hybrid architectures&lt;/li&gt;
&lt;li&gt;inference optimization&lt;/li&gt;
&lt;li&gt;selective reasoning&lt;/li&gt;
&lt;li&gt;retrieval precision&lt;/li&gt;
&lt;li&gt;token efficiency&lt;/li&gt;
&lt;li&gt;model specialization&lt;/li&gt;
&lt;li&gt;latency-aware workflows&lt;/li&gt;
&lt;/ul&gt;
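
&lt;p&gt;Routing, for instance, often starts as nothing more exotic than a cheap classifier in front of two models. A hedged sketch; the model names and the &lt;code&gt;classify_complexity()&lt;/code&gt; and &lt;code&gt;call_model()&lt;/code&gt; helpers are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Send cheap, well-structured requests to a small model; reserve the
# expensive frontier model for tasks that actually need its reasoning.

CHEAP_MODEL    = "small-local-model"      # placeholder names
FRONTIER_MODEL = "frontier-api-model"

def route(request):
    label = classify_complexity(request)   # placeholder: heuristic or tiny model
    if label == "simple":
        return call_model(CHEAP_MODEL, request)
    return call_model(FRONTIER_MODEL, request)
&lt;/code&gt;&lt;/pre&gt;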

&lt;p&gt;Sometimes the smartest engineering decision is not using a larger model.&lt;/p&gt;

&lt;p&gt;Sometimes the smartest decision is not using AI at all.&lt;/p&gt;

&lt;p&gt;Calling an LLM to multiply two numbers or transform simple structured data isn't innovation.&lt;/p&gt;

&lt;p&gt;It's misuse.&lt;/p&gt;

&lt;p&gt;A lot of production AI engineering is really about knowing where &lt;em&gt;not&lt;/em&gt; to use AI.&lt;/p&gt;

&lt;p&gt;That's the part most people skip entirely.&lt;/p&gt;




&lt;h2&gt;AI Engineering Is Becoming Its Own Discipline&lt;/h2&gt;

&lt;p&gt;The industry is going through what software engineering itself went through years ago:&lt;/p&gt;

&lt;p&gt;title inflation mixed with genuine transformation.&lt;/p&gt;

&lt;p&gt;And yes — anyone can become an AI Engineer.&lt;/p&gt;

&lt;p&gt;But eventually, the gap becomes visible: between people who can integrate APIs and people who can design, evaluate, optimize, monitor, and evolve intelligent systems reliably in production.&lt;/p&gt;

&lt;p&gt;The AI Engineer of the next few years won't look like a traditional application developer.&lt;/p&gt;

&lt;p&gt;They'll look like an orchestrator, evaluator, systems thinker, experimentation lead, cost optimizer, and behavioral architect for autonomous systems.&lt;/p&gt;

&lt;p&gt;For years, my job as a software engineer was mostly about finding bugs and fixing them.&lt;/p&gt;

&lt;p&gt;Now I spend my time supervising semi-autonomous agents, evaluating reasoning behavior, optimizing workflows, controlling cost, designing cognitive systems, monitoring drift, and running lab experiments to make AI systems more reliable before they ever touch a user.&lt;/p&gt;

&lt;p&gt;The job description changed completely.&lt;/p&gt;

&lt;p&gt;Most people interviewing for the role haven't read it yet.&lt;/p&gt;

&lt;p&gt;That's not a criticism. It's an opening.&lt;/p&gt;

&lt;p&gt;The engineers who close that gap — who do the lab work, build the eval pipelines, instrument the observability, and develop the instinct for when AI is the wrong answer — those are the ones who will define what this role actually means.&lt;/p&gt;

&lt;p&gt;Part engineer. Part scientist. Part strategist. Part guardian.&lt;/p&gt;

&lt;p&gt;That's the AI Engineer.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>agents</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
