<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DANISH ZULFIQAR </title>
    <description>The latest articles on DEV Community by DANISH ZULFIQAR  (@danish08654).</description>
    <link>https://dev.to/danish08654</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982268%2Fe467060e-30ef-4dac-862e-2170b6eb8dfd.jpeg</url>
      <title>DEV Community: DANISH ZULFIQAR </title>
      <link>https://dev.to/danish08654</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danish08654"/>
    <language>en</language>
    <item>
      <title>I Built 48 Production AI Systems in 60 Days — Here Is What Nobody Tells You About Real AI Engineering</title>
      <dc:creator>DANISH ZULFIQAR </dc:creator>
      <pubDate>Sat, 13 Jun 2026 06:59:41 +0000</pubDate>
      <link>https://dev.to/danish08654/i-built-48-production-ai-systems-in-60-days-here-is-what-nobody-tells-you-about-real-ai-1461</link>
      <guid>https://dev.to/danish08654/i-built-48-production-ai-systems-in-60-days-here-is-what-nobody-tells-you-about-real-ai-1461</guid>
      <description>&lt;h2&gt;
  
  
  I Built 48 Production AI Systems
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Here Is What Nobody Tells You About Real AI Engineering
&lt;/h2&gt;




&lt;p&gt;I did not study AI engineering. I built it.&lt;/p&gt;

&lt;p&gt;For 60 days I woke up at 6 AM, opened VS Code, and shipped production AI systems almost every day. Not notebooks. Not tutorials. Not demos. Systems: each with a live REST API, an interactive dashboard, a trained model, and a GitHub repo with a README that explains the business problem it solves.&lt;/p&gt;

&lt;p&gt;48 systems later, I want to tell you what courses do not cover.&lt;/p&gt;

&lt;p&gt;Not the architecture patterns. Not the frameworks. The real stuff. The 3 AM stuff. The "why is this working on Colab but crashing on my laptop" stuff.&lt;/p&gt;

&lt;p&gt;This is that article.&lt;/p&gt;




&lt;h2&gt;
  
  
  First — What I Actually Built
&lt;/h2&gt;

&lt;p&gt;Before I get to the lessons, here is the scope so you understand why these lessons matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — Production ML (Days 1–7)&lt;/strong&gt;&lt;br&gt;
Credit scoring for gig workers. B2B intent detection. Dynamic pricing. Carbon estimation. Clinical trial matching. Supplier risk intelligence. Economic forecasting. Every one deployed as a FastAPI endpoint with a Streamlit dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — Deep Learning and Computer Vision (Days 8–14)&lt;/strong&gt;&lt;br&gt;
Deepfake detector. Satellite change detector. Document OCR. Plant disease detection. Fitness pose coach. Real models, real inference, real errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 — LLMs and Agents (Days 15–21)&lt;/strong&gt;&lt;br&gt;
LangGraph multi-agent research pipeline. MCP business agent. Text-to-image generator. Vertical RAG for construction. Voice agent. Each one runs on free tools: Groq, Tavily, gTTS, Whisper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4 — MLOps (Days 22–30)&lt;/strong&gt;&lt;br&gt;
End-to-end MLOps pipeline with MLflow, Evidently AI, auto-retraining, Grafana monitoring, and Docker deployment.&lt;/p&gt;

&lt;p&gt;That is what I shipped. Now here is what it cost me.&lt;/p&gt;


&lt;h2&gt;
  
  
  The 5 Bugs That Taught Me More Than Any Course
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Bug 1 — OpenCV and Non-Contiguous Arrays
&lt;/h3&gt;

&lt;p&gt;On Day 8 I was building a deepfake detector. XceptionNet was working. The preprocessing pipeline was clean. Then I hit this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: OpenCV(4.13.0) :-1: error: (-5:Bad argument)
in function 'ellipse'
&amp;gt; Layout of the output array img is incompatible with cv::Mat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at this for four hours.&lt;/p&gt;

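&lt;p&gt;The problem was not my code. It was memory layout. Some NumPy operations return a view with unusual strides instead of a new, tightly packed array: slicing with a negative step (the common &lt;code&gt;img[:, :, ::-1]&lt;/code&gt; RGB-to-BGR flip) and transposing a model's CHW output back to HWC both do this, and element-wise operations such as &lt;code&gt;np.clip&lt;/code&gt; can carry that layout into their result. OpenCV's C++ backend wraps the array as a &lt;code&gt;cv::Mat&lt;/code&gt;; for an output array it has to draw into, it cannot handle that layout and throws this exact error.&lt;/p&gt;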

&lt;p&gt;&lt;strong&gt;The fix is one line:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascontiguousarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



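&lt;p&gt;Call this before your OpenCV operations, especially the ones that draw into the array in place like &lt;code&gt;cv2.ellipse&lt;/code&gt;, not just the ones that have failed so far. Whether an array is contiguous depends on which NumPy operation produced it, so the failure looks random even though it is fully deterministic. The call is cheap: if the array is already C-contiguous, &lt;code&gt;np.ascontiguousarray&lt;/code&gt; returns it without copying.&lt;/p&gt;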
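&lt;p&gt;A minimal sketch of what is going on, using NumPy only so it runs without OpenCV installed. The helper name &lt;code&gt;to_opencv_layout&lt;/code&gt; is hypothetical, and the two views are assumed examples of typical sources of non-contiguous images (a channel flip and a CHW-to-HWC transpose). It prints the contiguity flag and strides before and after the fix:&lt;/p&gt;

```python
import numpy as np


def to_opencv_layout(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return an array OpenCV can wrap as a cv::Mat."""
    # np.ascontiguousarray copies only when the layout is not already C-contiguous.
    return np.ascontiguousarray(img)


rgb = np.zeros((4, 6, 3), dtype=np.uint8)  # H x W x C, tightly packed

# RGB -> BGR by slicing returns a view with a negative channel stride.
bgr_view = rgb[:, :, ::-1]
print(bgr_view.flags["C_CONTIGUOUS"], bgr_view.strides)  # False (18, 3, -1)

# A CHW model output transposed back to HWC is also a strided view.
chw = np.zeros((3, 4, 6), dtype=np.uint8)
hwc_view = chw.transpose(1, 2, 0)
print(hwc_view.flags["C_CONTIGUOUS"], hwc_view.strides)  # False (6, 1, 24)

fixed = to_opencv_layout(bgr_view)
print(fixed.flags["C_CONTIGUOUS"], fixed.strides)  # True (18, 3, 1)

# Already-contiguous input comes back as the same object, with no copy.
print(to_opencv_layout(rgb) is rgb)  # True
```

&lt;p&gt;In the real pipeline, the array returned by the helper is what you would pass to &lt;code&gt;cv2.ellipse&lt;/code&gt; or any other call that writes into the image.&lt;/p&gt;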

&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; The gap between a working notebook and a working system is often not logic. It is memory, types, and environment — things that tutorials never mention because they never hit production.&lt;/p&gt;




&lt;h3&gt;
  
  
  Bug 2 — XGBoost 3.x Broke SHAP
&lt;/h3&gt;

&lt;p&gt;On Day 1 I was building a credit scoring system. I had trained an XGBoost model with SHAP explainability (regulatory compliance: every decision explained). It worked perfectly on Google Colab.&lt;/p&gt;

&lt;p&gt;I moved to VS Code. Everything crashed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: &amp;lt;class 'numpy.random._mt19937.MT19937'&amp;gt;
is not a known BitGenerator module.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The root cause was a numpy version mismatch — Colab was using a newer numpy than my local environment. But the deeper problem was that XGBoost 3.x and the SHAP library had an internal incompatibility nobody documented clearly.&lt;/p&gt;

&lt;p&gt;The solution I found was to stop using SHAP entirely and use XGBoost's native contributions instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Instead of this (breaks on XGBoost 3.x)
&lt;/span&gt;&lt;span class="n"&gt;explainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TreeExplainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;shap_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;explainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shap_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Use this (works on all XGBoost versions)
&lt;/span&gt;&lt;span class="n"&gt;contributions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pred_contribs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



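&lt;p&gt;The math is the same: XGBoost computes path-dependent TreeSHAP values internally, which is what &lt;code&gt;TreeExplainer&lt;/code&gt; does by default for tree models when you give it no background data. The only difference is shape: the contributions array has one extra column for the bias term, and each row sums to the model's raw (log-odds) prediction. The dependency conflict disappears.&lt;/p&gt;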

&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; Version pinning is not optional in production ML. The first thing every new project needs is a locked requirements file. Ship the environment, not just the code.&lt;/p&gt;




&lt;h3&gt;
  
  
  Bug 3 — LangGraph on Windows Kills Async
&lt;/h3&gt;

&lt;p&gt;On Day 21 I was building an MCP business agent — LangGraph orchestrating 8 MCP tools for invoice processing, AP approval, and Slack notifications. The API was running. The workflow triggered. Then silence.&lt;/p&gt;

&lt;p&gt;No error. No output. Just a FastAPI background thread that started and disappeared.&lt;/p&gt;

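&lt;p&gt;The problem was Windows. A sync function passed to FastAPI's background tasks runs in a worker thread that has no event loop of its own, so &lt;code&gt;asyncio.run()&lt;/code&gt; creates a fresh loop there, and the current event loop policy decides which kind. On Windows that matters: only the Proactor loop supports asyncio subprocesses, which MCP tools launched over stdio depend on, while a Selector loop (which some Uvicorn versions and configurations install on Windows) raises &lt;code&gt;NotImplementedError&lt;/code&gt; as soon as a subprocess is spawned. The failure happened after the API had already responded, so nothing surfaced. On Linux every asyncio loop supports subprocesses, so this never happens.&lt;/p&gt;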

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# At the top of main.py — Windows only
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;platform&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;win32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_event_loop_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WindowsProactorEventLoopPolicy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# In background thread functions
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_workflow_background&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;platform&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;win32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ProactorEventLoop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_event_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_until_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; Cross-platform is a real constraint, not a theoretical one. If you build on Windows and deploy on Linux — or the reverse — test the async behavior explicitly. It will not tell you it is broken. It will just silently do nothing.&lt;/p&gt;




&lt;h3&gt;
  
  
  Bug 4 — timm Renamed Xception Without Warning
&lt;/h3&gt;

&lt;p&gt;On Day 8 my deepfake detector used XceptionNet from the &lt;code&gt;timm&lt;/code&gt; library. I had trained the model on Colab, saved the weights, and moved everything to VS Code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UserWarning: Mapping deprecated model name xception
to current legacy_xception.
RuntimeError: Error(s) in loading state_dict for XceptionDetector:
Missing key(s) in state_dict: "head.0.weight", "head.0.bias"...
Unexpected key(s) in state_dict: "classifier.0.weight"...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two separate bugs, same crash.&lt;/p&gt;

&lt;p&gt;First: &lt;code&gt;timm&lt;/code&gt; renamed &lt;code&gt;xception&lt;/code&gt; to &lt;code&gt;legacy_xception&lt;/code&gt;. Use the new name to remove the warning and avoid future breakage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old — throws deprecation warning
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;xception&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pretrained&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# New — explicit, no warning
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;legacy_xception&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pretrained&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second: I had named the classification head &lt;code&gt;self.classifier&lt;/code&gt; in Colab but &lt;code&gt;self.head&lt;/code&gt; in VS Code. PyTorch saves weights by key name — &lt;code&gt;classifier.0.weight&lt;/code&gt; and &lt;code&gt;head.0.weight&lt;/code&gt; are completely different keys even if the architecture is identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Name your model layers once. Never rename them. The name is part of the contract between your training environment and your serving environment.&lt;/p&gt;
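&lt;p&gt;If a checkpoint was already saved under the old name, a small migration shim can rename the keys once instead of retraining. A minimal sketch, assuming the only difference is the &lt;code&gt;classifier.&lt;/code&gt; versus &lt;code&gt;head.&lt;/code&gt; prefix; &lt;code&gt;rename_prefix&lt;/code&gt; and the backbone key below are hypothetical names, and plain strings stand in for tensors so it runs without PyTorch:&lt;/p&gt;

```python
from collections import OrderedDict
from collections.abc import Mapping


def rename_prefix(state_dict: Mapping, old: str, new: str) -> OrderedDict:
    """Hypothetical helper: copy state_dict, renaming keys that start with `old`."""
    renamed = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(old):
            key = new + key[len(old):]
        renamed[key] = value
    return renamed


# Strings stand in for tensors; a real state_dict maps names to torch.Tensor.
# "backbone.conv1.weight" is a hypothetical key for the unchanged backbone.
saved = OrderedDict([
    ("backbone.conv1.weight", "w0"),
    ("classifier.0.weight", "w1"),
    ("classifier.0.bias", "b1"),
])
migrated = rename_prefix(saved, "classifier.", "head.")
print(list(migrated))  # ['backbone.conv1.weight', 'head.0.weight', 'head.0.bias']
```

&lt;p&gt;With PyTorch the shim slots in right before loading, for example &lt;code&gt;model.load_state_dict(rename_prefix(torch.load(path, map_location="cpu"), "classifier.", "head."))&lt;/code&gt;. Because &lt;code&gt;load_state_dict&lt;/code&gt; is strict by default, any key the shim misses still raises instead of being silently skipped.&lt;/p&gt;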

&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; Model serialization is more fragile than it looks. The weight file is not just numbers — it is numbers plus the exact architecture key names. Document both.&lt;/p&gt;




&lt;h3&gt;
  
  
  Bug 5 — joblib Cannot Cross NumPy Versions
&lt;/h3&gt;

&lt;p&gt;On Day 29 I saved a &lt;code&gt;GradientBoostingClassifier&lt;/code&gt; with joblib on Google Colab (Python 3.10, numpy 1.24) and loaded it on VS Code (Python 3.10, numpy 1.26).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: &amp;lt;class 'numpy.random._mt19937.MT19937'&amp;gt;
is not a known BitGenerator module.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same Python version. Different numpy. Dead model.&lt;/p&gt;

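&lt;p&gt;The &lt;code&gt;GradientBoostingClassifier&lt;/code&gt; internally stores NumPy &lt;code&gt;RandomState&lt;/code&gt; objects, and joblib pickles them along with the model. When NumPy changes how random state is pickled between versions, the file becomes unreadable in the other environment even when everything else matches. If you see this exact message, compare &lt;code&gt;numpy.__version__&lt;/code&gt; on both sides first: it is best known as the symptom of a pickle written under NumPy 2.x being loaded under NumPy 1.x, because NumPy 2.0 changed how bit generators are pickled.&lt;/p&gt;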

&lt;p&gt;&lt;strong&gt;Three ways out:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Solution 1 — Save with protocol 2 (maximum compatibility)
&lt;/span&gt;&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protocol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Solution 2 — Use XGBoost native format instead of joblib
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# XGBoost only
&lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Solution 3 — Retrain locally (fastest for synthetic data)
# Never transfer joblib files across environments
# Always retrain in the environment you serve from
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; joblib is not a portable format. It is a snapshot of a specific Python environment. If your training and serving environments differ — even slightly — retrain in the serving environment. Always.&lt;/p&gt;
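&lt;p&gt;A minimal sketch of "ship the environment" for joblib artifacts, assuming NumPy and scikit-learn are the packages whose pickled objects matter: write a small JSON manifest of installed versions next to the model at save time, then compare it before loading, so drift shows up as a readable message instead of a BitGenerator traceback. The helper names and the &lt;code&gt;.versions.json&lt;/code&gt; naming are hypothetical:&lt;/p&gt;

```python
import json
import tempfile
from importlib import metadata
from pathlib import Path

# Assumption: these are the packages whose pickled objects break across versions.
TRACKED = ("numpy", "scikit-learn")


def installed_versions(packages=TRACKED) -> dict:
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def write_manifest(model_path: Path) -> Path:
    """Hypothetical helper: record versions next to the artifact at save time."""
    manifest = model_path.with_suffix(".versions.json")
    manifest.write_text(json.dumps(installed_versions(), indent=2))
    return manifest


def check_manifest(model_path: Path) -> list:
    """Hypothetical helper: list version mismatches; empty means safe to load."""
    manifest = model_path.with_suffix(".versions.json")
    saved = json.loads(manifest.read_text())
    current = installed_versions(tuple(saved))
    return [
        f"{name}: trained with {saved[name]}, serving with {current[name]}"
        for name in saved
        if saved[name] != current[name]
    ]


# Demo in a temporary directory; no real model file is needed for the check.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.joblib"
    manifest = write_manifest(artifact)
    manifest_name = manifest.name
    clean = check_manifest(artifact)
    data = json.loads(manifest.read_text())
    data["numpy"] = "0.0.0"  # pretend training used a different NumPy
    manifest.write_text(json.dumps(data))
    drift = check_manifest(artifact)

print(clean)  # []
print(drift)
```

&lt;p&gt;In a serving app, &lt;code&gt;check_manifest&lt;/code&gt; would run before &lt;code&gt;joblib.load&lt;/code&gt;; a non-empty result is the signal to take the retrain path instead of loading.&lt;/p&gt;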




&lt;h2&gt;
  
  
  The Pattern Behind All 5 Bugs
&lt;/h2&gt;

&lt;p&gt;Look at what they have in common:&lt;/p&gt;

&lt;p&gt;Every single one of them was invisible in a tutorial context.&lt;/p&gt;

&lt;p&gt;You will not hit the non-contiguous array bug in a tutorial notebook that loads an image and draws on it directly; it appears once arrays pass through other libraries and model tensors on their way back to OpenCV. You cannot hit the joblib cross-version bug in a course because courses do not move models between environments. You cannot hit the LangGraph Windows async bug if you only run &lt;code&gt;python script.py&lt;/code&gt; from the command line.&lt;/p&gt;

&lt;p&gt;These bugs only exist in the gap between "it works on my machine" and "it works in production."&lt;/p&gt;

&lt;p&gt;That gap is where real AI engineering lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Startup Hidden Inside Every ML Project
&lt;/h2&gt;

&lt;p&gt;Here is something else courses never tell you.&lt;/p&gt;

&lt;p&gt;Every production ML project you build is also a startup idea. You just have to look at it correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1 — Gig Worker Credit Scorer&lt;/strong&gt;&lt;br&gt;
60 million gig workers in the US are rejected by traditional credit systems not because they are risky borrowers but because their income does not fit a W-2 pattern. ROC-AUC 0.84. Sub-200ms API. This is a $300 billion lending gap. Startups like Petal and Chime raised hundreds of millions solving exactly this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 6 — Supplier Risk Intelligence&lt;/strong&gt;&lt;br&gt;
Supply chain disruptions cost companies $228 million on average per incident. My model predicts supplier risk 3-6 months ahead using 31 signals — news sentiment, financial stress, geopolitical exposure. SAP charges enterprise customers $500K/year for similar capability. I built the core in 2 days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 15 — LangGraph Research Agent&lt;/strong&gt;&lt;br&gt;
A research analyst costs $80-150K/year and produces one report per day. My 5-agent pipeline produces an 800-word verified research report on any topic in 90 seconds using entirely free APIs. The unit economics are violent.&lt;/p&gt;

&lt;p&gt;The pattern: find a process that is currently done by expensive humans or legacy enterprise software. Build the AI version. Price it at 10-20% of the incumbent. That is the playbook.&lt;/p&gt;


&lt;h2&gt;
  
  
  3 Things I Would Tell Myself on Day 1
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Pin your versions before you write the first line of code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;requirements.txt&lt;/code&gt; on day one with exact versions of every dependency. The most painful bugs I hit were not architectural mistakes — they were &lt;code&gt;torch==2.3.0&lt;/code&gt; vs &lt;code&gt;torch==2.4.0&lt;/code&gt; differences. Version drift is silent and expensive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# requirements.txt — always pin, never assume
&lt;/span&gt;&lt;span class="py"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.3.0&lt;/span&gt;
&lt;span class="py"&gt;torchvision&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=0.18.0&lt;/span&gt;
&lt;span class="py"&gt;timm&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=0.9.16&lt;/span&gt;
&lt;span class="py"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.26.4&lt;/span&gt;
&lt;span class="py"&gt;xgboost&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.1.1&lt;/span&gt;
&lt;span class="py"&gt;langchain&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.3.0&lt;/span&gt;
&lt;span class="py"&gt;langgraph&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.0.5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
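&lt;p&gt;Pinning only helps if something notices when the environment drifts away from the pins. A minimal startup check, sketched with the standard library's &lt;code&gt;importlib.metadata&lt;/code&gt;; it assumes plain &lt;code&gt;name==version&lt;/code&gt; lines (no extras, markers, or hashes), and &lt;code&gt;parse_pins&lt;/code&gt; / &lt;code&gt;find_drift&lt;/code&gt; are hypothetical helper names:&lt;/p&gt;

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Hypothetical helper: parse 'name==version' lines, ignoring comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_drift(pins: dict, lookup=metadata.version) -> dict:
    """Hypothetical helper: {package: (pinned, installed)} for every violated pin."""
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift


requirements = """
# requirements.txt: always pin, never assume
torch==2.3.0
numpy==1.26.4
"""
# A fake lookup so the example runs anywhere; in an app, keep the default.
fake_installed = {"torch": "2.4.0", "numpy": "1.26.4"}
print(find_drift(parse_pins(requirements), lookup=fake_installed.__getitem__))
# {'torch': ('2.3.0', '2.4.0')}
```

&lt;p&gt;This compares version strings exactly, so a local build label such as a CUDA-specific torch suffix also counts as drift; a stricter check could parse versions with the &lt;code&gt;packaging&lt;/code&gt; library instead.&lt;/p&gt;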



&lt;p&gt;&lt;strong&gt;2. Build the API before you tune the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I lost days fine-tuning models before I knew if the API would work. The right order is: build the minimal API first, confirm the pipeline end-to-end, then improve the model. A working 0.75 AUC model in production beats a 0.85 AUC model still in a notebook.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Every bug is a blog post.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time something breaks and I fix it, I write it down. Those 5 bugs above? Each one is a Stack Overflow answer, a dev.to article, a tweet thread. The person who googles "OpenCV non-contiguous array error 2026" and finds my explanation might end up following me on GitHub. That compounds over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Am Building Next — June 2026
&lt;/h2&gt;

&lt;p&gt;The 30-day series covered breadth. June is depth.&lt;/p&gt;

&lt;p&gt;Eight advanced systems targeting real unsolved gaps in production AI, including:&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Persistent Memory Architecture&lt;/strong&gt; — LangGraph agents that remember across sessions using pgvector + FAISS (solving the biggest gap in enterprise agentic AI)&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;LLM Evaluation Framework&lt;/strong&gt; — automated hallucination detection as a CI/CD pipeline step (because 87% of companies shipping AI have no systematic evaluation)&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;LoRA Fine-Tuning Pipeline&lt;/strong&gt; — LLaMA 3.1 8B on private domain data with GGUF quantization for CPU deployment (the technique every regulated industry needs)&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Knowledge Graph + LLM&lt;/strong&gt; — GraphRAG outperforms vector RAG on multi-hop questions by 40% per Microsoft Research. I am building the production implementation.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Federated Learning System&lt;/strong&gt; — ML across hospitals that cannot share patient data (GDPR compliance by design, not retrofit)&lt;/p&gt;

&lt;p&gt;Each one solves a problem that companies are paying $500K+ in consulting fees to figure out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The most important thing I learned in 60 days is not a framework or a model architecture.&lt;/p&gt;

&lt;p&gt;It is that production AI engineering is a craft that only gets built through shipping.&lt;/p&gt;

&lt;p&gt;You can read every paper, watch every tutorial, and follow every course. None of it prepares you for the moment when your model loads perfectly in training and silently returns wrong predictions in production because the preprocessing pipeline has a different random seed.&lt;/p&gt;

&lt;p&gt;The only way to learn production is to build for production.&lt;/p&gt;

&lt;p&gt;Start shipping.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;All systems are open source:&lt;/strong&gt;&lt;br&gt;
🔗 &lt;a href="https://github.com/Danish08654" rel="noopener noreferrer"&gt;github.com/Danish08654&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Follow for daily updates on the June advanced projects:&lt;/strong&gt;&lt;br&gt;
🔗 &lt;a href="https://www.linkedin.com/in/danish-zulfiqar-53884b24a/" rel="noopener noreferrer"&gt;LinkedIn — Danish Zulfiqar&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you hit any of these bugs? Drop them in the comments — I want to hear what production broke for you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>langchain</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
