<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raj</title>
    <description>The latest articles on DEV Community by Raj (@raj_07).</description>
    <link>https://dev.to/raj_07</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3930667%2Fd4e20de3-dccf-4228-a54c-494bb8d10a0c.jpg</url>
      <title>DEV Community: Raj</title>
      <link>https://dev.to/raj_07</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raj_07"/>
    <language>en</language>
    <item>
      <title>Prototype to Production: What Nobody Tells You About Shipping AI in the Real World</title>
      <dc:creator>Raj</dc:creator>
      <pubDate>Thu, 14 May 2026 08:41:47 +0000</pubDate>
      <link>https://dev.to/raj_07/prototype-to-production-what-nobody-tells-you-about-shipping-ai-in-the-real-world-3ji5</link>
      <guid>https://dev.to/raj_07/prototype-to-production-what-nobody-tells-you-about-shipping-ai-in-the-real-world-3ji5</guid>
      <description>&lt;p&gt;You've built the demo. It runs clean, the stakeholders are impressed, and someone in the room says "let's ship this."&lt;/p&gt;

&lt;p&gt;Then reality hits.&lt;/p&gt;

&lt;p&gt;The model starts hallucinating on edge cases. Token costs spiral. Your clean prototype data doesn't look anything like what real users throw at it. The agentic workflow that looked elegant in the notebook turns into an infinite loop in staging.&lt;/p&gt;

&lt;p&gt;This is the gap almost no one talks about: the massive, messy distance between a working prototype and a production-grade AI application.&lt;/p&gt;

&lt;p&gt;I had a deep conversation with Manav Goyal, Principal Technical Consultant at Geekians, about exactly this: what breaks, what to build differently, and how to think about AI systems that actually hold up under real-world pressure. (You can watch the full discussion at &lt;a href="https://www.youtube.com/watch?v=PrIK6Z6TA_I" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=PrIK6Z6TA_I&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Here's what I took away.&lt;/p&gt;

&lt;h2&gt;The Prototype vs. Production Mindset Shift&lt;/h2&gt;

&lt;p&gt;The fundamentals are genuinely different between the two phases, and confusing them is where most teams go wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prototype fundamentals:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Speed of development&lt;/li&gt;
&lt;li&gt;Proof of concept&lt;/li&gt;
&lt;li&gt;Impressing stakeholders or investors&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Production fundamentals:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Security and compliance (GDPR, OWASP LLM Top 10, HIPAA if you're in healthcare)&lt;/li&gt;
&lt;li&gt;Reliability at scale: thousands of concurrent users, not just ten&lt;/li&gt;
&lt;li&gt;Data quality and diversity, not just clean sample data&lt;/li&gt;
&lt;li&gt;LLM Ops: monitoring token consumption, costs, and latency&lt;/li&gt;
&lt;li&gt;User trust: will people come back tomorrow?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The failure mode Manav describes is teams treating a prototype win as a production green light. It isn't. A prototype proves the idea works once, under ideal conditions. Production means it works under pressure, at scale, across edge cases you didn't anticipate.&lt;/p&gt;

&lt;h2&gt;Why "Just Plug in an LLM" Doesn't Work&lt;/h2&gt;

&lt;p&gt;There's a persistent myth that AI is plug-and-play: drop in a model, hand it a long system prompt, and watch it build your application.&lt;/p&gt;

&lt;p&gt;That's not how production systems work.&lt;/p&gt;

&lt;p&gt;Real agentic workflows involve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multiple agents with dedicated responsibilities, not one monolith&lt;/li&gt;
&lt;li&gt;Evaluation checkpoints between stages so failures don't cascade&lt;/li&gt;
&lt;li&gt;Token budget management (a single drift analysis can hit 1 million tokens, fast)&lt;/li&gt;
&lt;li&gt;Proper traces and logs for every internal and external agent call&lt;/li&gt;
&lt;li&gt;Handling ambiguity gracefully rather than silently failing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A useful mental model here: instead of one giant prompt, think &lt;strong&gt;decomposed tasks with dedicated agents.&lt;/strong&gt; Each agent owns a clearly defined scope. Each handoff gets validated. You don't just hope the context flows through cleanly; you verify it.&lt;/p&gt;
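&lt;p&gt;As a rough sketch of that idea (the stage names, lambdas, and validators here are invented for illustration, not from the conversation): each agent does its scoped work, and a checkpoint validates the handoff before the next stage runs, so one bad output fails fast instead of cascading:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of "decomposed tasks with validated handoffs".
# Each stage pairs the agent's work with a checkpoint that must pass
# before the next agent receives the payload.

@dataclass
class Stage:
    name: str
    run: Callable[[str], str]        # the agent's work (stubbed with lambdas below)
    validate: Callable[[str], bool]  # checkpoint guarding the handoff

def run_pipeline(stages: List[Stage], payload: str) -> str:
    for stage in stages:
        payload = stage.run(payload)
        if not stage.validate(payload):
            # Fail fast instead of letting a bad handoff cascade downstream.
            raise ValueError(f"checkpoint failed after stage {stage.name!r}")
    return payload

# Toy stand-ins for real agents: extract, then summarize.
stages = [
    Stage("extract", lambda text: text.strip(), lambda out: len(out) > 0),
    Stage("summarize", lambda text: text[:40], lambda out: len(out) > 0),
]

print(run_pipeline(stages, "  raw user input that needs processing  "))
```

The point of the shape, not the stubs: a validator between every pair of agents is what turns "hope the context flows through" into "verify it".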

&lt;h2&gt;AI Across the Entire SDLC&lt;/h2&gt;

&lt;p&gt;One of the more interesting shifts happening right now is AI entering every phase of the software development lifecycle, not just the coding phase.&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ideation &amp;amp; Research:&lt;/strong&gt; AI-driven market analysis and competitor research, replacing days of manual work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planning &amp;amp; Estimation:&lt;/strong&gt; LLM-assisted feature decomposition with effort and cost estimates, grounded in prior project data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design &amp;amp; Architecture:&lt;/strong&gt; Spec-driven development, with architecture diagrams and TRDs as AI inputs, not afterthoughts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development&lt;/strong&gt;: Agents writing code against well-defined specs, with human review checkpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA&lt;/strong&gt;: Automated evaluation against expected outputs, hallucination checks baked into the definition of done&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment&lt;/strong&gt;: Infrastructure-as-code managed by dedicated deployment agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance&lt;/strong&gt;: Continuous monitoring and drift analysis&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key word across all of these is decomposition. The more precisely you define the task, the better the output. Vague prompts produce vague results. Spec-driven, context-rich inputs produce outputs you can actually ship.&lt;/p&gt;

&lt;h2&gt;The Challenges Nobody Budgets For&lt;/h2&gt;

&lt;p&gt;When you're moving from prototype to production, expect to spend real time on things that weren't in the original estimate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prototypes run on clean, curated data. Production doesn't. Real users submit messy inputs, edge cases, and data that breaks your pipeline assumptions. You need to think hard about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data ingestion frequency and rate limits from third-party APIs&lt;/li&gt;
&lt;li&gt;Cleansing and normalization before any processing&lt;/li&gt;
&lt;li&gt;How diversity in input data affects your model's behavior&lt;/li&gt;
&lt;/ol&gt;
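&lt;p&gt;A minimal illustration of the cleansing-and-normalization step (the field names and placeholder values are made up for this example): strip whitespace, drop empty or placeholder fields, and normalize keys before anything reaches the model or the pipeline downstream:&lt;/p&gt;

```python
# Hypothetical cleansing pass run on every record before processing.
# Real pipelines would add schema validation and type coercion on top.
def normalize_record(raw: dict) -> dict:
    clean = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None, "N/A"):
            continue  # drop empty/placeholder fields rather than pass them on
        clean[key.lower()] = value  # consistent keys, whatever the source sent
    return clean

# Messy input of the kind real users (and real APIs) actually send.
messy = {"Age": " 34 ", "Note": "N/A", "NAME": "Dana"}
print(normalize_record(messy))
```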

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OWASP now publishes a Top 10 for LLM applications. Prompt injection, data leakage, insecure output handling: these aren't theoretical. If you're in fintech or healthcare, compliance isn't optional; it's table stakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Trust&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one's easy to underestimate. A real example: a dental application that transcribes doctor-patient conversations to generate treatment plans. Impressive prototype. But in production, the audio inputs are unpredictable. If the transcription misses a specific crown type or an anesthesia dosage, the application becomes a liability, not an asset.&lt;br&gt;
The fix was an agent that detects ambiguity in the transcript and asks clarifying questions before finalizing the plan. That gap, from "it usually works" to "it handles what it doesn't know," is what production-grade means.&lt;/p&gt;
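&lt;p&gt;The "detect ambiguity, then ask" pattern can be sketched roughly like this. To be clear, the required-detail list and keyword matching below are invented stand-ins; a production system like the one described would use a model-based check, not string matching:&lt;/p&gt;

```python
# Illustrative only: markers a transcript must contain before the plan
# is finalized. A real checker would be an evaluation agent, not keywords.
REQUIRED_DETAILS = {
    "crown type": ("porcelain", "zirconia", "metal"),
    "anesthesia dosage": ("mg", "ml", "carpule"),
}

def clarifying_questions(transcript: str) -> list:
    text = transcript.lower()
    questions = []
    for detail, markers in REQUIRED_DETAILS.items():
        if not any(marker in text for marker in markers):
            # Ambiguity detected: ask instead of silently guessing.
            questions.append(f"The transcript does not specify the {detail}. Can you confirm it?")
    return questions

t = "Patient needs a crown on tooth 14; numbed before prep."
for q in clarifying_questions(t):
    print(q)
```

The design choice worth copying is the control flow: an unanswerable question blocks finalization and goes back to the human, rather than letting the system emit a confident-looking but incomplete treatment plan.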

&lt;h2&gt;Observability: The Bridge Between Developers and the CXO&lt;/h2&gt;

&lt;p&gt;One of the most practical pieces of advice from this conversation: &lt;strong&gt;shared observability is how you align technical and business stakeholders.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Developers care about feasibility and performance. Executives care about ROI and business impact. These aren't incompatible, but you need a shared language to connect them.&lt;/p&gt;

&lt;p&gt;That shared language is metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Token consumption per task: it translates directly to cost&lt;/li&gt;
&lt;li&gt;Agent trace logs: what reasoning path did the agent take, and why?&lt;/li&gt;
&lt;li&gt;Evaluation scores: how close is the output to the intended design?&lt;/li&gt;
&lt;li&gt;Traditional infra metrics: CPU, DB storage, latency spikes&lt;/li&gt;
&lt;/ol&gt;
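&lt;p&gt;The first metric is the easiest to start with. A minimal per-task token ledger might look like this (the per-1K prices are assumed for illustration, not any provider's real rates):&lt;/p&gt;

```python
# Minimal sketch of per-task token accounting.
PRICE_PER_1K_INPUT = 0.003   # USD per 1K input tokens, assumed
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1K output tokens, assumed

def task_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

ledger: dict = {}

def record(task: str, input_tokens: int, output_tokens: int) -> None:
    # Accumulate spend per task so the dashboard answers "what costs what?"
    ledger[task] = ledger.get(task, 0.0) + task_cost(input_tokens, output_tokens)

record("drift-analysis", 800_000, 50_000)  # the million-token class of task
record("summarize", 2_000, 500)

# Most expensive tasks first: the view both developers and the CXO can read.
for task, usd in sorted(ledger.items(), key=lambda kv: -kv[1]):
    print(f"{task}: ${usd:.2f}")
```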

&lt;p&gt;When both sides can look at the same dashboard and answer "is this worth what we're spending?", the conversation changes from "trust us" to "here's the data."&lt;/p&gt;

&lt;h2&gt;Developer Roles Aren't Disappearing; They're Evolving&lt;/h2&gt;

&lt;p&gt;The layoff news is real, and the fear is understandable. But the more accurate framing is that the shape of the developer role is changing, not disappearing.&lt;/p&gt;

&lt;p&gt;Developers who thrive in this environment will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Think like &lt;strong&gt;strategic orchestrators&lt;/strong&gt;, not just coders&lt;/li&gt;
&lt;li&gt;Practice &lt;em&gt;evaluation-driven development&lt;/em&gt;: hallucination checks, ethical inference, and continuous evals inside the definition of done&lt;/li&gt;
&lt;li&gt;Use AI tools (Cursor, Claude Code, etc.) with precision: spec-driven inputs, not blind generation&lt;/li&gt;
&lt;li&gt;Keep &lt;strong&gt;humans in the loop&lt;/strong&gt; at the right checkpoints rather than treating AI as fully autonomous&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Blind autopilot coding is a liability. Thoughtful, spec-driven, human-reviewed AI-assisted development is a force multiplier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;If you're working on an AI product right now, here's the practical short list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prototype proves the idea. Production proves the system. Don't confuse the two.&lt;/li&gt;
&lt;li&gt;Decompose your agentic workflows. One prompt to rule them all is a recipe for infinite loops and runaway costs.&lt;/li&gt;
&lt;li&gt;Define your evaluation criteria before you build, not after.&lt;/li&gt;
&lt;li&gt;Data quality is a production problem, not a data team problem. Budget for it.&lt;/li&gt;
&lt;li&gt;Token costs are a business metric. Track them like you track infrastructure spend.&lt;/li&gt;
&lt;li&gt;User trust is built through reliability and transparency, not just impressive demos.&lt;/li&gt;
&lt;li&gt;Document your specs before you code. Architecture diagrams and TRDs aren't overhead; they're your most reliable AI input.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The teams shipping AI that lasts aren't moving the fastest. They're moving with the most precision.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
