<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adaka Ankita</title>
    <description>The latest articles on DEV Community by Adaka Ankita (@adaka_ankita_feab18f8583a).</description>
    <link>https://dev.to/adaka_ankita_feab18f8583a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3780375%2F6e8355fb-53bb-4e36-996e-34c157e4bb46.jpg</url>
      <title>DEV Community: Adaka Ankita</title>
      <link>https://dev.to/adaka_ankita_feab18f8583a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adaka_ankita_feab18f8583a"/>
    <language>en</language>
    <item>
      <title>Why a 200 OK Isn’t Success in LLM Inference</title>
      <dc:creator>Adaka Ankita</dc:creator>
      <pubDate>Mon, 23 Feb 2026 11:14:50 +0000</pubDate>
      <link>https://dev.to/adaka_ankita_feab18f8583a/why-a-200-ok-isnt-success-in-llm-inference-9pi</link>
      <guid>https://dev.to/adaka_ankita_feab18f8583a/why-a-200-ok-isnt-success-in-llm-inference-9pi</guid>
      <description>&lt;h2&gt;
  
  
  Lessons from My First AI API Call
&lt;/h2&gt;

&lt;p&gt;The first time I received a clean response from an LLM API, I felt productive.&lt;/p&gt;

&lt;p&gt;The model returned something intelligent.&lt;br&gt;&lt;br&gt;
No errors, HTTP 200.&lt;/p&gt;

&lt;p&gt;I thought I had built something meaningful.&lt;/p&gt;

&lt;p&gt;Looking back, I hadn’t.&lt;/p&gt;

&lt;p&gt;I had only confirmed two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;My environment variables were configured correctly
&lt;/li&gt;
&lt;li&gt;The API endpoint was reachable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Backend Assumption I Carried With Me
&lt;/h3&gt;

&lt;p&gt;Coming from backend development, I’m used to APIs behaving predictably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same input → same output
&lt;/li&gt;
&lt;li&gt;HTTP 200 → success
&lt;/li&gt;
&lt;li&gt;Failures → loud and obvious
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM inference doesn’t follow those rules.&lt;/p&gt;

&lt;p&gt;A 200 OK from an AI API only means the request was processed.&lt;/p&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; guarantee:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model completed its response
&lt;/li&gt;
&lt;li&gt;The output wasn’t truncated
&lt;/li&gt;
&lt;li&gt;The structure is valid
&lt;/li&gt;
&lt;li&gt;The cost was reasonable
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That difference matters more than I expected.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Mental Shift
&lt;/h3&gt;

&lt;p&gt;At some point, I stopped asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What did the model say?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And started asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did it finish, and what did that cost?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That small shift changed how I read every response.&lt;/p&gt;

&lt;p&gt;An LLM call isn’t a deterministic function.&lt;/p&gt;

&lt;p&gt;It’s a probabilistic system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bills per token
&lt;/li&gt;
&lt;li&gt;Can stop mid-sentence
&lt;/li&gt;
&lt;li&gt;May return structurally invalid data
&lt;/li&gt;
&lt;li&gt;Doesn’t throw exceptions when logic breaks
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I accepted that, I stopped treating responses as answers and started treating them as signals that need validation.&lt;/p&gt;




&lt;h3&gt;
  
  
  Traditional API vs LLM Call
&lt;/h3&gt;

&lt;p&gt;Here’s how I now see the difference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional API&lt;/th&gt;
&lt;th&gt;LLM Inference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HTTP 200 = success&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;finish_reason&lt;/code&gt; matters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fixed / predictable cost&lt;/td&gt;
&lt;td&gt;Variable token cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strict JSON contract&lt;/td&gt;
&lt;td&gt;Probability-based text output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clear failure modes&lt;/td&gt;
&lt;td&gt;Silent truncation or hallucination&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The cost model alone changes how you architect features.&lt;/p&gt;

&lt;p&gt;With traditional APIs, cost is predictable.&lt;/p&gt;

&lt;p&gt;With LLMs, cost grows with tokens, and tokens grow fast.&lt;/p&gt;
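&lt;p&gt;A rough sketch of that arithmetic (the per-token rates below are made-up placeholders, not any provider’s real pricing):&lt;/p&gt;

```python
# Back-of-envelope token cost estimate. The per-token rates are
# hypothetical placeholders; check your provider's current pricing.
INPUT_RATE = 0.000003   # dollars per input token (assumed)
OUTPUT_RATE = 0.000015  # dollars per output token (assumed)

def estimate_cost(input_tokens, output_tokens):
    """Return the estimated dollar cost of a single call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A modest prompt with a long completion already adds up,
# and retries multiply the bill.
per_call = estimate_cost(1500, 800)
```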




&lt;h3&gt;
  
  
  What I Now Check First
&lt;/h3&gt;

&lt;p&gt;Before reading response content, I now think in three checks:&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Usage
&lt;/h3&gt;

&lt;p&gt;How many tokens did this call consume?&lt;/p&gt;

&lt;h3&gt;
  
  
  2️⃣ Finish State
&lt;/h3&gt;

&lt;p&gt;Did the model complete its response (&lt;code&gt;finish_reason == "stop"&lt;/code&gt;)?&lt;/p&gt;

&lt;h3&gt;
  
  
  3️⃣ Contract
&lt;/h3&gt;

&lt;p&gt;Does the output match what my system expects?&lt;/p&gt;
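&lt;p&gt;The three checks can be sketched as one guard function. The field names below (&lt;code&gt;usage&lt;/code&gt;, &lt;code&gt;finish_reason&lt;/code&gt;, &lt;code&gt;message&lt;/code&gt;) assume a chat-completions-style payload; adjust them for your provider:&lt;/p&gt;

```python
import json

def check_response(resp, max_total_tokens=2000):
    """Run the usage, finish-state, and contract checks before trusting content."""
    # 1. Usage: total differing from min(total, budget) means the budget was exceeded.
    total = resp["usage"]["total_tokens"]
    if total != min(total, max_total_tokens):
        raise ValueError(f"token budget exceeded: {total}")
    # 2. Finish state: anything other than "stop" means truncation or filtering.
    choice = resp["choices"][0]
    if choice["finish_reason"] != "stop":
        raise ValueError(f"incomplete response: {choice['finish_reason']}")
    # 3. Contract: here the contract is simply "valid JSON"; yours may be stricter.
    try:
        return json.loads(choice["message"]["content"])
    except json.JSONDecodeError as exc:
        raise ValueError("output violates the expected structure") from exc
```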

&lt;p&gt;Without these checks, I was essentially trusting output blindly.&lt;/p&gt;

&lt;p&gt;And that’s not engineering; that’s optimism.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Lesson
&lt;/h3&gt;

&lt;p&gt;A 200 OK tells you the HTTP request succeeded.&lt;/p&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; tell you the inference succeeded.&lt;/p&gt;

&lt;p&gt;That was the mindset shift I needed before building real AI features.&lt;/p&gt;

&lt;p&gt;In the next post, I’ll walk through what this looks like in actual implementation and the subtle bug that made this lesson very real for me.&lt;/p&gt;




&lt;h3&gt;
  
  
  If You're Transitioning from Backend to AI
&lt;/h3&gt;

&lt;p&gt;If you're coming from traditional backend systems, you might run into the same assumption I did.&lt;/p&gt;

&lt;p&gt;LLM integration looks simple at first.&lt;/p&gt;

&lt;p&gt;But production behavior requires a slightly different mental model.&lt;/p&gt;

&lt;p&gt;I’m documenting my learning journey as I explore this shift from backend systems to AI-powered systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llmapi</category>
      <category>programming</category>
      <category>learning</category>
    </item>
    <item>
      <title>Prompting Is Not Engineering: Building Reliable LLM Production Systems with Control Layers</title>
      <dc:creator>Adaka Ankita</dc:creator>
      <pubDate>Thu, 19 Feb 2026 03:51:45 +0000</pubDate>
      <link>https://dev.to/adaka_ankita_feab18f8583a/prompting-is-not-engineering-building-reliable-llm-production-systems-with-control-layers-37lb</link>
      <guid>https://dev.to/adaka_ankita_feab18f8583a/prompting-is-not-engineering-building-reliable-llm-production-systems-with-control-layers-37lb</guid>
      <description>&lt;p&gt;When AI outputs become unstable, most teams try to fix the prompt.&lt;/p&gt;

&lt;p&gt;They add more instructions.&lt;br&gt;&lt;br&gt;
More examples.&lt;br&gt;&lt;br&gt;
More rules.&lt;/p&gt;

&lt;p&gt;Sometimes it works.&lt;/p&gt;

&lt;p&gt;But after some time, the model becomes inconsistent again.&lt;/p&gt;

&lt;p&gt;While learning about production AI systems, I started realizing something:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts guide the model.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Systems control the outcome.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI reliability is not just about writing better prompts.&lt;br&gt;&lt;br&gt;
It depends on how the entire system is designed around the model.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Four Layers That Make LLM Systems Reliable
&lt;/h2&gt;

&lt;p&gt;In production, stable AI systems usually rely on four control layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Behavioral Constraints&lt;/li&gt;
&lt;li&gt;Structural Contracts&lt;/li&gt;
&lt;li&gt;Controlled Randomness&lt;/li&gt;
&lt;li&gt;Validation Loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not prompt tricks.&lt;/p&gt;

&lt;p&gt;They are system-level safeguards around a probabilistic model.&lt;/p&gt;

&lt;p&gt;Let's break them down.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Behavioral Constraints
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Limit What the Model Is Allowed to Do
&lt;/h3&gt;

&lt;p&gt;The more open your instruction, the more unpredictable the output.&lt;/p&gt;

&lt;p&gt;Instead of saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Generate a customer response.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A production system might define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not invent facts&lt;/li&gt;
&lt;li&gt;Do not offer discounts&lt;/li&gt;
&lt;li&gt;Do not speculate&lt;/li&gt;
&lt;li&gt;Keep the response under 120 words&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clear boundaries reduce hallucinations.&lt;/p&gt;

&lt;p&gt;Without constraints, you're relying purely on probability.&lt;/p&gt;
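&lt;p&gt;One way to make those boundaries concrete is to pin them in the system message on every request. The wording below is illustrative, not a tested prompt:&lt;/p&gt;

```python
# Illustrative system prompt encoding explicit behavioral constraints.
# The exact wording is an assumption; tune it for your model and domain.
SUPPORT_AGENT_RULES = """You are a customer support assistant.
Rules:
- Do not invent facts; say "I don't know" when unsure.
- Do not offer discounts or refunds.
- Do not speculate about unreleased features.
- Keep the response under 120 words."""

def build_messages(user_text):
    """Attach the constraint block to every request as the system message."""
    return [
        {"role": "system", "content": SUPPORT_AGENT_RULES},
        {"role": "user", "content": user_text},
    ]
```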


&lt;h2&gt;
  
  
  2. Structural Contracts
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Make Output Safe for Your Backend
&lt;/h3&gt;

&lt;p&gt;LLMs generate text.&lt;br&gt;&lt;br&gt;
Your systems expect structure.&lt;/p&gt;

&lt;p&gt;If your application depends on model output, enforce a schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approve | reject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"float"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the response doesn't match this format, reject it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No valid structure → no state change.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you allow unvalidated output to update your database, you're letting randomness modify your system.&lt;/p&gt;
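&lt;p&gt;A minimal hand-rolled check for the contract above (a schema library would work too; a plain function keeps the idea visible):&lt;/p&gt;

```python
def validate_decision(payload):
    """Return True only if the payload matches the decision contract."""
    if not isinstance(payload, dict):
        return False
    if payload.get("decision") not in ("approve", "reject"):
        return False
    score = payload.get("confidence_score")
    if not isinstance(score, float):
        return False
    # The score must equal its value clamped to [0.0, 1.0],
    # i.e. it must already lie inside that range.
    if score != max(0.0, min(1.0, score)):
        return False
    if not isinstance(payload.get("reason"), str):
        return False
    return True

# No valid structure, no state change: only run your commit step
# when validate_decision(payload) is True.
```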




&lt;h2&gt;
  
  
  3. Controlled Randomness
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Adjust Randomness Based on Task Risk
&lt;/h3&gt;

&lt;p&gt;LLMs don't always generate the same output.&lt;br&gt;&lt;br&gt;
That's how they work.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;temperature setting&lt;/strong&gt; controls how random the response is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low temperature:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More predictable&lt;/li&gt;
&lt;li&gt;Less variation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;High temperature:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More creative&lt;/li&gt;
&lt;li&gt;More variation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not every task should use the same level of randomness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brainstorming ideas → higher randomness&lt;/li&gt;
&lt;li&gt;Fraud detection → low randomness&lt;/li&gt;
&lt;li&gt;Invoice parsing → low randomness&lt;/li&gt;
&lt;li&gt;Code generation → low randomness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using high randomness for high-risk tasks increases errors, retries, and cost.&lt;/p&gt;

&lt;p&gt;Randomness is not just about creativity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It affects reliability.&lt;/strong&gt;&lt;/p&gt;
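&lt;p&gt;One way to enforce this is a per-task temperature policy. The values below are illustrative starting points, not tuned numbers; &lt;code&gt;temperature&lt;/code&gt; here is the standard sampling parameter exposed by most chat APIs:&lt;/p&gt;

```python
# Per-task temperature policy. The numbers are illustrative
# starting points, not tuned values.
TEMPERATURE_BY_TASK = {
    "brainstorming": 0.9,    # variation is the point
    "fraud_detection": 0.0,  # determinism over creativity
    "invoice_parsing": 0.0,
    "code_generation": 0.2,
}

def temperature_for(task):
    """Unknown tasks default to the conservative end."""
    return TEMPERATURE_BY_TASK.get(task, 0.0)
```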




&lt;h2&gt;
  
  
  4. Validation Loops
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Never Trust a Single Response
&lt;/h3&gt;

&lt;p&gt;In demos, we generate once and accept the result.&lt;/p&gt;

&lt;p&gt;In production, systems usually work in stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate&lt;/li&gt;
&lt;li&gt;Validate&lt;/li&gt;
&lt;li&gt;Fix if needed&lt;/li&gt;
&lt;li&gt;Then commit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Validation may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Required field checks&lt;/li&gt;
&lt;li&gt;Schema validation&lt;/li&gt;
&lt;li&gt;Number consistency checks&lt;/li&gt;
&lt;li&gt;Regeneration if rules fail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One-shot prompting works for demos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production systems need feedback loops.&lt;/strong&gt;&lt;/p&gt;
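&lt;p&gt;The four stages can be sketched as a small loop. &lt;code&gt;call_model&lt;/code&gt; and &lt;code&gt;is_valid&lt;/code&gt; are placeholders for your client and your contract check:&lt;/p&gt;

```python
# Staged loop: generate, validate, regenerate if rules fail, then commit.
def generate_with_validation(call_model, is_valid, prompt, max_attempts=3):
    """Return the first output that passes validation, else raise."""
    for attempt in range(1, max_attempts + 1):
        output = call_model(prompt)
        if is_valid(output):
            return output  # only validated output reaches downstream state
        # Feed the failure back so the retry is not a blind repeat.
        prompt = f"{prompt}\n\nPrevious attempt failed validation; fix it."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```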




&lt;h2&gt;
  
  
  The Systems Perspective
&lt;/h2&gt;

&lt;p&gt;When AI fails in production, the root cause is rarely the model.&lt;/p&gt;

&lt;p&gt;Instead, most failures trace back to missing system controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No schema validation&lt;/li&gt;
&lt;li&gt;No retry monitoring&lt;/li&gt;
&lt;li&gt;No randomness control&lt;/li&gt;
&lt;li&gt;No boundary checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prompts shape language.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Systems create reliability.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Is Heading
&lt;/h2&gt;

&lt;p&gt;Models are improving quickly.&lt;/p&gt;

&lt;p&gt;But randomness doesn't disappear.&lt;/p&gt;

&lt;p&gt;The real advantage may shift toward teams that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track retry rates&lt;/li&gt;
&lt;li&gt;Monitor cost per request&lt;/li&gt;
&lt;li&gt;Enforce structured outputs&lt;/li&gt;
&lt;li&gt;Measure first-pass success&lt;/li&gt;
&lt;/ul&gt;
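&lt;p&gt;Even a tiny in-process counter makes these measurable; a real system would export them to a metrics backend instead:&lt;/p&gt;

```python
# Minimal in-process counters for retry rate, cost, and first-pass success.
class InferenceStats:
    def __init__(self):
        self.calls = 0
        self.first_pass_ok = 0
        self.retries = 0
        self.cost = 0.0

    def record(self, passed_first_try, retries, cost):
        """Record one completed request (after all retries)."""
        self.calls += 1
        self.retries += retries
        self.cost += cost
        if passed_first_try:
            self.first_pass_ok += 1

    def first_pass_rate(self):
        return self.first_pass_ok / self.calls if self.calls else 0.0
```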

&lt;p&gt;Access to powerful models is becoming easier.&lt;/p&gt;

&lt;p&gt;Designing safe and reliable systems around them is harder.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Question
&lt;/h2&gt;

&lt;p&gt;Are we only improving prompts?&lt;/p&gt;

&lt;p&gt;Or are we designing systems that can safely handle probability?&lt;/p&gt;

&lt;p&gt;That difference may define the next stage of AI engineering.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What control layers are you using in your AI systems? Share your thoughts in the comments below!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
