<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DANISH ZULFIQAR </title>
    <description>The latest articles on DEV Community by DANISH ZULFIQAR  (@danish08654).</description>
    <link>https://dev.to/danish08654</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982268%2Fe467060e-30ef-4dac-862e-2170b6eb8dfd.jpeg</url>
      <title>DEV Community: DANISH ZULFIQAR </title>
      <link>https://dev.to/danish08654</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danish08654"/>
    <language>en</language>
    <item>
      <title>A clean, complete guide to version control, collaboration, and containerization: commands, workflows, and concepts, all in one place</title>
      <dc:creator>DANISH ZULFIQAR </dc:creator>
      <pubDate>Sun, 05 Jul 2026 03:41:24 +0000</pubDate>
      <link>https://dev.to/danish08654/a-clean-complete-guide-to-version-control-collaboration-and-containerization-commands-47m0</link>
      <guid>https://dev.to/danish08654/a-clean-complete-guide-to-version-control-collaboration-and-containerization-commands-47m0</guid>
      <description>&lt;p&gt;&lt;strong&gt;Cheat sheet - all essential commands&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git init&lt;/code&gt;: Initialize a new repo&lt;br&gt;
&lt;code&gt;git clone &amp;lt;url&amp;gt;&lt;/code&gt;: Clone a remote repo locally&lt;br&gt;
&lt;code&gt;git status&lt;/code&gt;: Show working tree status&lt;br&gt;
&lt;code&gt;git add .&lt;/code&gt;: Stage all changes&lt;br&gt;
&lt;code&gt;git commit -m "message"&lt;/code&gt;: Commit staged changes with a message&lt;br&gt;
&lt;code&gt;git log --oneline&lt;/code&gt;: Compact commit history&lt;br&gt;
&lt;code&gt;git switch -c &amp;lt;branch&amp;gt;&lt;/code&gt;: Create and switch to a new branch&lt;br&gt;
&lt;code&gt;git merge &amp;lt;branch&amp;gt;&lt;/code&gt;: Merge a branch into the current one&lt;br&gt;
&lt;code&gt;git stash&lt;/code&gt;: Stash uncommitted changes&lt;br&gt;
&lt;code&gt;git stash pop&lt;/code&gt;: Re-apply stashed changes&lt;br&gt;
&lt;code&gt;git revert &amp;lt;commit&amp;gt;&lt;/code&gt;: Safely undo a commit&lt;br&gt;
&lt;code&gt;git tag v1.0.0&lt;/code&gt;: Tag the current commit&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub / Remote&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git remote add origin &amp;lt;url&amp;gt;&lt;/code&gt;: Link local repo to remote&lt;br&gt;
&lt;code&gt;git push -u origin main&lt;/code&gt;: Push and set tracking branch&lt;br&gt;
&lt;code&gt;git pull&lt;/code&gt;: Fetch and merge remote changes&lt;br&gt;
&lt;code&gt;git fetch origin&lt;/code&gt;: Fetch without merging&lt;br&gt;
&lt;code&gt;gh pr create&lt;/code&gt;: Open a pull request via CLI&lt;br&gt;
&lt;code&gt;gh pr list&lt;/code&gt;: List open pull requests&lt;br&gt;
&lt;code&gt;gh run list&lt;/code&gt;: View GitHub Actions runs&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker build -t name:tag .&lt;/code&gt;: Build image from Dockerfile&lt;br&gt;
&lt;code&gt;docker run -p 8000:8000 name&lt;/code&gt;: Run container with port mapping&lt;br&gt;
&lt;code&gt;docker run -d --name x name&lt;/code&gt;: Run container in background&lt;br&gt;
&lt;code&gt;docker ps&lt;/code&gt;: List running containers&lt;br&gt;
&lt;code&gt;docker logs -f &amp;lt;container&amp;gt;&lt;/code&gt;: Follow container logs&lt;br&gt;
&lt;code&gt;docker exec -it &amp;lt;container&amp;gt; bash&lt;/code&gt;: Shell into a running container&lt;br&gt;
&lt;code&gt;docker stop &amp;lt;container&amp;gt;&lt;/code&gt;: Stop a container&lt;br&gt;
&lt;code&gt;docker rm &amp;lt;container&amp;gt;&lt;/code&gt;: Remove a container&lt;br&gt;
&lt;code&gt;docker images&lt;/code&gt;: List local images&lt;br&gt;
&lt;code&gt;docker rmi &amp;lt;image&amp;gt;&lt;/code&gt;: Delete an image&lt;br&gt;
&lt;code&gt;docker compose up -d&lt;/code&gt;: Start all services in background&lt;br&gt;
&lt;code&gt;docker compose down&lt;/code&gt;: Stop and remove all services&lt;br&gt;
&lt;code&gt;docker compose logs -f&lt;/code&gt;: Follow logs for all services&lt;br&gt;
&lt;code&gt;docker system prune&lt;/code&gt;: Remove stopped containers, unused networks, and dangling images&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the full toolkit.&lt;br&gt;
Git tracks it. GitHub shares it. Docker ships it. Together they form the backbone of every modern production AI system.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>development</category>
    </item>
    <item>
      <title>Maybe you've faced this too?</title>
      <dc:creator>DANISH ZULFIQAR </dc:creator>
      <pubDate>Sun, 28 Jun 2026 03:18:26 +0000</pubDate>
      <link>https://dev.to/danish08654/maybe-you-face-it-214n</link>
      <guid>https://dev.to/danish08654/maybe-you-face-it-214n</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo" class="crayons-story__hidden-navigation-link"&gt;I Deployed 6 AI Systems Live — Here's What Actually Broke&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;Environment drift and dependency version traps&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/danish08654" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982268%2Fe467060e-30ef-4dac-862e-2170b6eb8dfd.jpeg" alt="danish08654 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/danish08654" class="crayons-story__secondary fw-medium m:hidden"&gt;
              DANISH ZULFIQAR 
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                DANISH ZULFIQAR 
                
              
              &lt;div id="story-author-preview-content-4009471" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/danish08654" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982268%2Fe467060e-30ef-4dac-862e-2170b6eb8dfd.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;DANISH ZULFIQAR &lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 28&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo" id="article-link-4009471"&gt;
          I Deployed 6 AI Systems Live — Here's What Actually Broke
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/productivity"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;productivity&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/product"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;product&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt;&amp;nbsp;reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              5&lt;span class="hidden s:inline"&gt;&amp;nbsp;comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>I Deployed 6 AI Systems Live — Here's What Actually Broke</title>
      <dc:creator>DANISH ZULFIQAR </dc:creator>
      <pubDate>Sun, 28 Jun 2026 03:16:53 +0000</pubDate>
      <link>https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo</link>
      <guid>https://dev.to/danish08654/i-deployed-6-ai-systems-live-heres-what-actually-broke-4neo</guid>
      <description>&lt;h2&gt;
  
  
  I Deployed 6 AI Systems Live — Here's What Actually Broke
&lt;/h2&gt;

&lt;p&gt;A few weeks ago I wrote about the 5 bugs that cost me 60+ hours building 49 AI systems. Every one of those bugs lived inside the code itself: a wrong array layout, a renamed model class, a serialization mismatch.&lt;/p&gt;

&lt;p&gt;This article is the second half of that story, and it taught me something more uncomfortable: &lt;strong&gt;code that runs perfectly on your machine can fail completely the moment it leaves your machine, for reasons that have nothing to do with your code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I took 6 of my pinned GitHub projects and deployed every one of them live on Streamlit Cloud. Locally, all 6 worked without a single error. Deploying them surfaced 5 failures I had never seen before, none of which were bugs in my logic.&lt;/p&gt;

&lt;p&gt;Here they are, in the order I hit them.&lt;/p&gt;




&lt;h3&gt;
  
  
  Failure 1 — A Module That Existed Yesterday, Gone Today
&lt;/h3&gt;

&lt;p&gt;My RAG chatbot used this import, unchanged for weeks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationalRetrievalChain&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Locally: works. Deployed: instant crash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ModuleNotFoundError: No module named 'langchain.chains'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cause had nothing to do with my code. My local environment had an old, cached version of LangChain installed months ago. The deploy environment did a clean install and pulled whatever the latest version was at that moment, and recent LangChain releases had moved legacy chain classes like this one out of the core package entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix that actually worked:&lt;/strong&gt; pin the exact version that still contains the class, rather than chasing the newest API pattern under deployment pressure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;langchain&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=0.3.7&lt;/span&gt;
&lt;span class="py"&gt;langchain-community&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=0.3.7&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; "it works on my machine" is frequently true specifically &lt;em&gt;because&lt;/em&gt; your machine never reinstalled anything recently. A clean deploy environment has no such luxury: it gets whatever is newest the moment it builds. Pin your versions before you ever need to debug this at 1 AM.&lt;/p&gt;
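
&lt;p&gt;One way to catch this drift before deploying is to compare the pins in your requirements file with what is actually installed locally. The sketch below is a minimal illustration, not a replacement for a real lock tool: the helper names &lt;code&gt;parse_pins&lt;/code&gt; and &lt;code&gt;find_drift&lt;/code&gt; are hypothetical, and it only understands plain &lt;code&gt;name==version&lt;/code&gt; lines.&lt;/p&gt;

```python
from __future__ import annotations

from importlib import metadata


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for every exact 'name==version' line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_drift(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Compare pins with the installed packages; return only the mismatches.

    Each value is (pinned_version, installed_version_or_None).
    """
    drift = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            drift[name] = (wanted, installed)
    return drift
```

&lt;p&gt;Calling &lt;code&gt;find_drift(parse_pins(text))&lt;/code&gt; on the contents of your requirements file returns an empty dict when the local environment matches the pins, and lists every package that would behave differently after a clean install otherwise.&lt;/p&gt;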




&lt;h3&gt;
  
  
  Failure 2 — A File That Exists, Until It Doesn't
&lt;/h3&gt;

&lt;p&gt;My construction RAG project loads a prebuilt FAISS vector index from disk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faiss_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allow_dangerous_deserialization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Locally, instant load. Deployed, a raw crash deep inside FAISS's C++ binding with no clean Python traceback: the kind of failure that gives you nothing to Google.&lt;/p&gt;

&lt;p&gt;The actual cause: Git LFS. My index file had been quietly stored via Git LFS, which keeps a tiny text pointer in your git history instead of the real binary. Locally, my LFS client silently resolved that pointer into the real file, so I never noticed. The cloud platform's git clone fetched only the pointer file, a few hundred bytes of text, and handed that to FAISS, which was expecting a binary index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git lfs untrack &lt;span class="s2"&gt;"faiss_index/*"&lt;/span&gt;
git &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;--cached&lt;/span&gt; faiss_index/index.faiss
git add faiss_index/index.faiss
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Stop using Git LFS — commit as plain binary"&lt;/span&gt;
git push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Git LFS is invisible exactly when it's working correctly on your machine. The only time you discover you were depending on it is the first time you deploy somewhere that doesn't support it.&lt;/p&gt;
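
&lt;p&gt;A cheap guard is to check for the pointer before a native library ever touches the file. Git LFS pointer files are plain text that begins with a fixed &lt;code&gt;version https://git-lfs.github.com/spec/v1&lt;/code&gt; line, so a sketch like the following can turn an opaque C++ crash into a readable Python error. The function names here are my own, chosen for illustration.&lt;/p&gt;

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True when the file is a Git LFS pointer rather than the real binary."""
    with path.open("rb") as handle:
        head = handle.read(len(LFS_SIGNATURE))
    return head == LFS_SIGNATURE


def require_real_file(path: Path) -> Path:
    """Raise a readable error before a native library sees a pointer file."""
    if is_lfs_pointer(path):
        raise RuntimeError(f"{path} is a Git LFS pointer, not the real file")
    return path
```

&lt;p&gt;In the project above, this would mean calling &lt;code&gt;require_real_file(Path("faiss_index/index.faiss"))&lt;/code&gt; just before &lt;code&gt;FAISS.load_local&lt;/code&gt;.&lt;/p&gt;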




&lt;h3&gt;
  
  
  Failure 3 — Two Different Size Limits Wearing the Same Error Message
&lt;/h3&gt;

&lt;p&gt;Uploading an 83MB PyTorch model checkpoint through GitHub's website gave me this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Yowza, that's a big file. Try again with a file smaller than 25MB.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I assumed GitHub simply couldn't take files that size. It can: the website's drag-and-drop has a 25MB ceiling, but &lt;code&gt;git push&lt;/code&gt; from the command line has a completely separate 100MB ceiling. Same platform, two different limits depending on which door you walk through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add model.pth
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Add trained model checkpoint"&lt;/span&gt;
git config &lt;span class="nt"&gt;--global&lt;/span&gt; http.postBuffer 157286400
git push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;postBuffer&lt;/code&gt; line (150 MiB, and only relevant when pushing over HTTPS) mattered in my case for files in the 50-100MB range: without it, larger pushes kept timing out mid-transfer on a slower connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; a platform's documented limit and a UI's enforced limit are not always the same number. When something fails at a suspiciously round threshold, check whether you're hitting the actual platform limit or an arbitrary limit of the specific interface you happened to use.&lt;/p&gt;
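
&lt;p&gt;To see in advance which door a file will fit through, you can sort files into size bands. This is a rough sketch that hard-codes the two limits described above (25MB for the web upload, 100MB for &lt;code&gt;git push&lt;/code&gt;) and treats a file at exactly a limit as over it; the names &lt;code&gt;classify_size&lt;/code&gt; and &lt;code&gt;scan&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from bisect import bisect_right
from pathlib import Path

MB = 1024 * 1024
# Limits as described in this article; adjust if the platform changes them.
THRESHOLDS = [25 * MB, 100 * MB]
LABELS = [
    "fits everywhere",
    "too big for web upload, fine for git push",
    "too big for git push",
]


def classify_size(size_bytes: int) -> str:
    """Map a file size to the band it falls in."""
    return LABELS[bisect_right(THRESHOLDS, size_bytes)]


def scan(root: Path) -> list[tuple[str, str]]:
    """List every file under root that will not fit through every door."""
    report = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and ".git" not in path.parts:
            label = classify_size(path.stat().st_size)
            if label != LABELS[0]:
                report.append((str(path), label))
    return report
```

&lt;p&gt;An 83MB checkpoint lands in the middle band, which is exactly the case above: rejected by the browser, accepted from the command line.&lt;/p&gt;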




&lt;h3&gt;
  
  
  Failure 4 — The Platform Changed Under Me, Without Asking
&lt;/h3&gt;

&lt;p&gt;Midway through this deployment sprint, an app that had been working for days suddenly broke with a wall of import errors: &lt;code&gt;torchvision&lt;/code&gt; missing, dozens of warnings cascading from deep inside &lt;code&gt;transformers&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Nothing in my code had changed. What had changed was the Python version my deploy platform silently selected. It was newer than what I'd pinned, and several of my dependencies didn't yet have compatible builds for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix that actually held:&lt;/strong&gt; I stopped trying to pin a specific Python version against a platform that wasn't reliably honoring the pin, and instead removed every heavy compiled dependency I didn't strictly need: no &lt;code&gt;torch&lt;/code&gt;, no &lt;code&gt;transformers&lt;/code&gt;, no &lt;code&gt;faiss&lt;/code&gt;, for a project whose knowledge base was small enough to live directly in a prompt instead of a vector store. A requirements file with three lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.40.0&lt;/span&gt;
&lt;span class="py"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.32.3&lt;/span&gt;
&lt;span class="py"&gt;python-dotenv&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.0.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;cannot break this way, because there is nothing in it with compiled platform-specific wheels to break.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; when a managed platform controls the runtime, the most resilient strategy is not fighting to pin every variable. It's minimizing how many variables you depend on in the first place.&lt;/p&gt;
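
&lt;p&gt;Even with fewer dependencies, it helps to make the runtime the platform actually gave you visible at startup. The sketch below compares the running interpreter with the version you expected; &lt;code&gt;EXPECTED&lt;/code&gt; is a placeholder value and &lt;code&gt;runtime_mismatch&lt;/code&gt; is a hypothetical helper, not something the platform provides.&lt;/p&gt;

```python
import sys

EXPECTED = (3, 11)  # placeholder: the (major, minor) version you asked for


def runtime_mismatch(expected=EXPECTED, actual=None):
    """Return a message when the interpreter differs from the one requested.

    Returns None when the (major, minor) versions match.
    """
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    expected = tuple(expected)
    if actual == expected:
        return None
    return (
        f"expected Python {expected[0]}.{expected[1]}, "
        f"running {actual[0]}.{actual[1]}"
    )
```

&lt;p&gt;Logging the result as the first line of the app turns a silent runtime swap into a message you can read in the deploy logs.&lt;/p&gt;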




&lt;h3&gt;
  
  
  Failure 5 — A Push That Succeeds and Goes Nowhere You're Looking
&lt;/h3&gt;

&lt;p&gt;The most disorienting failure of the five: &lt;code&gt;git push&lt;/code&gt; reported complete success: files written, no errors, a clean exit. The file was simply not visible anywhere on GitHub afterward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git branch &lt;span class="nt"&gt;-a&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; master
  remotes/origin/main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My repository had been created with a default &lt;code&gt;main&lt;/code&gt; branch already on GitHub. I had been committing and pushing to &lt;code&gt;master&lt;/code&gt; the entire time, a branch that existed locally and now also existed remotely, sitting parallel to &lt;code&gt;main&lt;/code&gt; and never appearing on the page I was checking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git push origin master:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or, going forward, simply commit directly to whichever branch GitHub actually shows by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; a successful push confirms your laptop and the remote agree with each other. It confirms nothing about whether that destination is the one a human is looking at in a browser tab.&lt;/p&gt;
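
&lt;p&gt;This mismatch can be spotted mechanically from the same &lt;code&gt;git branch -a&lt;/code&gt; output shown above. The following sketch parses that text and warns when the checked-out branch is not the one the repository page displays. It assumes the displayed branch is &lt;code&gt;main&lt;/code&gt; unless told otherwise, and both function names are hypothetical.&lt;/p&gt;

```python
from __future__ import annotations


def parse_branches(output: str) -> tuple[str | None, set[str]]:
    """Extract (current local branch, remote branch names) from `git branch -a`."""
    current = None
    remotes = set()
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("* "):
            current = line[2:].strip()
        elif line.startswith("remotes/"):
            if " -> " in line:
                continue  # symbolic ref such as remotes/origin/HEAD -> origin/main
            # "remotes/origin/feature/x" -> "feature/x"
            remotes.add(line.split("/", 2)[2])
    return current, remotes


def push_target_warning(output: str, shown_branch: str = "main") -> str | None:
    """Return a warning when the current branch is not the one being displayed."""
    current, _ = parse_branches(output)
    if current == shown_branch:
        return None
    return f"you are on '{current}', but the page shows '{shown_branch}'"
```

&lt;p&gt;Fed the output from this failure, it reports that the work is going to &lt;code&gt;master&lt;/code&gt; while the page shows &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;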




&lt;h2&gt;
  
  
  The Pattern Across All Five
&lt;/h2&gt;

&lt;p&gt;None of these were logic bugs. My code was correct in every case. Every failure came from a gap between two environments that I had assumed were equivalent and were not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;my cached dependencies vs. a fresh install&lt;/li&gt;
&lt;li&gt;my local LFS resolution vs. a clone that skips LFS&lt;/li&gt;
&lt;li&gt;a UI's limit vs. a protocol's limit&lt;/li&gt;
&lt;li&gt;the runtime I requested vs. the runtime I was actually given&lt;/li&gt;
&lt;li&gt;the branch I was typing into vs. the branch being displayed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"Works locally" is a claim about one specific environment. Deployment is the process of discovering every assumption that claim was quietly resting on.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Short Checklist For Next Time
&lt;/h2&gt;

&lt;p&gt;Before deploying anything again, I now check:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Are my dependency versions pinned to exact numbers, not ranges?&lt;/li&gt;
&lt;li&gt;Does anything in this repo rely on Git LFS — and does my deploy target support it?&lt;/li&gt;
&lt;li&gt;Are any committed files close to a platform's size limits, and which limit — UI or protocol?&lt;/li&gt;
&lt;li&gt;Can I remove a heavy compiled dependency instead of fighting to pin its version?&lt;/li&gt;
&lt;li&gt;Does &lt;code&gt;git branch -a&lt;/code&gt; show exactly one branch I'm pushing to, with no silent second branch sitting beside it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Five questions, thirty seconds, asked before the deploy instead of discovered after.&lt;/p&gt;




&lt;p&gt;All 6 systems are live and open source:&lt;br&gt;
🔗 &lt;a href="https://github.com/Danish08654" rel="noopener noreferrer"&gt;github.com/Danish08654&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've hit a deployment failure that had nothing to do with your actual code, I'd like to hear it. Drop it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>product</category>
    </item>
    <item>
      <title>I Built 48 Production AI Systems in 60 Days — Here Is What Nobody Tells You About Real AI Engineering</title>
      <dc:creator>DANISH ZULFIQAR </dc:creator>
      <pubDate>Sat, 13 Jun 2026 06:59:41 +0000</pubDate>
      <link>https://dev.to/danish08654/i-built-48-production-ai-systems-in-60-days-here-is-what-nobody-tells-you-about-real-ai-1461</link>
      <guid>https://dev.to/danish08654/i-built-48-production-ai-systems-in-60-days-here-is-what-nobody-tells-you-about-real-ai-1461</guid>
      <description>&lt;h2&gt;
  
  
  I Built 48 Production AI Systems
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Here Is What Nobody Tells You About Real AI Engineering
&lt;/h2&gt;




&lt;p&gt;I did not study AI engineering. I built it.&lt;/p&gt;

&lt;p&gt;For 60 days I woke up at 6 AM, opened VS Code, and shipped one production AI system every day. Not notebooks. Not tutorials. Not demos. Systems — with a live REST API, an interactive dashboard, a trained model, and a GitHub repo with a README that explains the business problem it solves.&lt;/p&gt;

&lt;p&gt;48 systems later, I want to tell you what courses do not cover.&lt;/p&gt;

&lt;p&gt;Not the architecture patterns. Not the frameworks. The real stuff. The 3 AM stuff. The "why is this working on Colab but crashing on my laptop" stuff.&lt;/p&gt;

&lt;p&gt;This is that article.&lt;/p&gt;




&lt;h2&gt;
  
  
  First — What I Actually Built
&lt;/h2&gt;

&lt;p&gt;Before I get to the lessons, here is the scope so you understand why these lessons matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — Production ML (Days 1–7)&lt;/strong&gt;&lt;br&gt;
Credit scoring for gig workers. B2B intent detection. Dynamic pricing. Carbon estimation. Clinical trial matching. Supplier risk intelligence. Economic forecasting. Every one deployed as a FastAPI endpoint with a Streamlit dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — Deep Learning and Computer Vision (Days 8–14)&lt;/strong&gt;&lt;br&gt;
Deepfake detector. Satellite change detector. Document OCR. Plant disease detection. Fitness pose coach. Real models, real inference, real errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 — LLMs and Agents (Days 15–21)&lt;/strong&gt;&lt;br&gt;
LangGraph multi-agent research pipeline. MCP business agent. Text-to-image generator. Vertical RAG for construction. Voice agent. Every one using free APIs — Groq, Tavily, gTTS, Whisper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4 — MLOps (Days 22–30)&lt;/strong&gt;&lt;br&gt;
End-to-end MLOps pipeline with MLflow, Evidently AI, auto-retraining, Grafana monitoring, and Docker deployment.&lt;/p&gt;

&lt;p&gt;That is what I shipped. Now here is what it cost me.&lt;/p&gt;


&lt;h2&gt;
  
  
  The 5 Bugs That Taught Me More Than Any Course
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Bug 1 — OpenCV and Non-Contiguous Arrays
&lt;/h3&gt;

&lt;p&gt;On Day 8 I was building a deepfake detector. XceptionNet was working. The preprocessing pipeline was clean. Then I hit this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: OpenCV(4.13.0) :-1: error: (-5:Bad argument)
in function 'ellipse'
&amp;gt; Layout of the output array img is incompatible with cv::Mat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at this for four hours.&lt;/p&gt;

&lt;p&gt;The problem was not my logic. It was memory layout. Slicing, transposing, reversing channels with &lt;code&gt;img[:, :, ::-1]&lt;/code&gt;, or passing an array through PIL and back to numpy can leave you with an array that is not stored as one packed, C-contiguous block: it is a view with unusual strides over some other buffer. OpenCV's C++ backend cannot wrap that layout as a &lt;code&gt;cv::Mat&lt;/code&gt; for in-place drawing and throws this exact error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix is one line:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascontiguousarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call this before every single OpenCV operation. Not just the ones that fail. Every one. The call is cheap when the array is already contiguous (no copy is made), and the failure is hard to predict because it depends on which numpy operation preceded the cv2 call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; The gap between a working notebook and a working system is often not logic. It is memory, types, and environment — things that tutorials never mention because they never hit production.&lt;/p&gt;
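
&lt;p&gt;The layout problem is easy to reproduce without OpenCV at all. The sketch below uses a small dummy array (not the real pipeline) to show one common way a non-contiguous view appears, a channel reversal, and how &lt;code&gt;np.ascontiguousarray&lt;/code&gt; repairs it.&lt;/p&gt;

```python
import numpy as np

# Dummy 4x6 image with 3 channels; values are irrelevant for the layout check.
img = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

# Reversing the channel axis returns a view with a negative stride,
# so the data is no longer one packed C-ordered block.
flipped = img[:, :, ::-1]
print(flipped.flags["C_CONTIGUOUS"])

# ascontiguousarray copies the view into a fresh packed buffer.
fixed = np.ascontiguousarray(flipped)
print(fixed.flags["C_CONTIGUOUS"])

# On an array that is already contiguous, no new buffer is allocated.
same = np.ascontiguousarray(img)
print(np.shares_memory(same, img))
```

&lt;p&gt;Checking &lt;code&gt;arr.flags["C_CONTIGUOUS"]&lt;/code&gt; just before a failing cv2 call is a quick way to confirm whether layout is the culprit.&lt;/p&gt;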




&lt;h3&gt;
  
  
  Bug 2 — XGBoost 3.x Broke SHAP
&lt;/h3&gt;

&lt;p&gt;On Day 1 I was building a credit scoring system. I had trained a LightGBM model with SHAP explainability — regulatory compliance, every decision explained. It worked perfectly on Google Colab.&lt;/p&gt;

&lt;p&gt;I moved to VS Code. Everything crashed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: &amp;lt;class 'numpy.random._mt19937.MT19937'&amp;gt;
is not a known BitGenerator module.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The root cause was a numpy version mismatch — Colab was using a newer numpy than my local environment. But the deeper problem was that XGBoost 3.x and the SHAP library had an internal incompatibility nobody documented clearly.&lt;/p&gt;

&lt;p&gt;The solution I found was to stop using SHAP entirely and use XGBoost's native contributions instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Instead of this (breaks on XGBoost 3.x)
&lt;/span&gt;&lt;span class="n"&gt;explainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TreeExplainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;shap_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;explainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shap_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Use this (works on all XGBoost versions)
&lt;/span&gt;&lt;span class="n"&gt;contributions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pred_contribs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



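&lt;p&gt;The math is the same: &lt;code&gt;pred_contribs=True&lt;/code&gt; returns per-feature SHAP contributions, plus one extra final column holding the bias term. The dependency conflict disappears.&lt;/p&gt;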

&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; Version pinning is not optional in production ML. The first thing every new project needs is a locked requirements file. Ship the environment, not just the code.&lt;/p&gt;




&lt;h3&gt;
  
  
  Bug 3 — LangGraph on Windows Kills Async
&lt;/h3&gt;

&lt;p&gt;On Day 21 I was building an MCP business agent — LangGraph orchestrating 8 MCP tools for invoice processing, AP approval, and Slack notifications. The API was running. The workflow triggered. Then silence.&lt;/p&gt;

&lt;p&gt;No error. No output. Just a FastAPI background thread that started and disappeared.&lt;/p&gt;

&lt;p&gt;The problem was Windows. Python's &lt;code&gt;asyncio.run()&lt;/code&gt; creates a new event loop each time it is called. On Windows, FastAPI background threads already have an event loop running — and &lt;code&gt;asyncio.run()&lt;/code&gt; conflicts with it. On Linux this never happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# At the top of main.py — Windows only
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;platform&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;win32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_event_loop_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WindowsProactorEventLoopPolicy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# In background thread functions
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_workflow_background&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;platform&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;win32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ProactorEventLoop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_event_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_until_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; Cross-platform is a real constraint, not a theoretical one. If you build on Windows and deploy on Linux — or the reverse — test the async behavior explicitly. It will not tell you it is broken. It will just silently do nothing.&lt;/p&gt;
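&lt;p&gt;One way to make a silent background failure visible is to run the coroutine on a loop owned by the worker thread and attach a callback that logs any exception. The sketch below uses only the standard library. &lt;code&gt;run_agent&lt;/code&gt; here is a hypothetical stand-in for the real agent, and &lt;code&gt;submit_workflow&lt;/code&gt;, &lt;code&gt;run_in_fresh_loop&lt;/code&gt;, and &lt;code&gt;log_outcome&lt;/code&gt; are names invented for this example, not LangGraph or FastAPI APIs.&lt;/p&gt;

```python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("workflow")
executor = ThreadPoolExecutor(max_workers=2)


async def run_agent(command):
    """Hypothetical stand-in for the real agent coroutine."""
    await asyncio.sleep(0)
    if not command:
        raise ValueError("empty command")
    return f"done: {command}"


def run_in_fresh_loop(command):
    """Run the coroutine on a new loop owned by the calling worker thread."""
    loop = asyncio.new_event_loop()  # platform default loop type
    try:
        return loop.run_until_complete(run_agent(command))
    finally:
        loop.close()


def log_outcome(future):
    """Done-callback: log failures instead of letting them vanish."""
    if future.cancelled():
        return
    error = future.exception()
    if error is not None:
        logger.error("background workflow failed: %r", error)


def submit_workflow(command):
    """Schedule the workflow on the pool and attach the logging callback."""
    future = executor.submit(run_in_fresh_loop, command)
    future.add_done_callback(log_outcome)
    return future


print(submit_workflow("process invoice 17").result(timeout=5))
```

&lt;p&gt;Because the caller gets a future back, a test can assert on the result or the exception on both Windows and Linux instead of waiting for output that never arrives.&lt;/p&gt;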




&lt;h3&gt;
  
  
  Bug 4 — timm Renamed Xception Without Warning
&lt;/h3&gt;

&lt;p&gt;On Day 8 my deepfake detector used XceptionNet from the &lt;code&gt;timm&lt;/code&gt; library. I had trained the model on Colab, saved the weights, and moved everything to VS Code. Loading the weights there failed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UserWarning: Mapping deprecated model name xception
to current legacy_xception.
RuntimeError: Error(s) in loading state_dict for XceptionDetector:
Missing key(s) in state_dict: "head.0.weight", "head.0.bias"...
Unexpected key(s) in state_dict: "classifier.0.weight"...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two separate bugs, same crash.&lt;/p&gt;

&lt;p&gt;First: &lt;code&gt;timm&lt;/code&gt; renamed &lt;code&gt;xception&lt;/code&gt; to &lt;code&gt;legacy_xception&lt;/code&gt;. Use the new name to remove the warning and avoid future breakage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old — throws deprecation warning
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;xception&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pretrained&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# New — explicit, no warning
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;legacy_xception&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pretrained&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second: I had named the classification head &lt;code&gt;self.classifier&lt;/code&gt; in Colab but &lt;code&gt;self.head&lt;/code&gt; in VS Code. PyTorch saves weights by key name — &lt;code&gt;classifier.0.weight&lt;/code&gt; and &lt;code&gt;head.0.weight&lt;/code&gt; are completely different keys even if the architecture is identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Name your model layers once. Never rename them. The name is part of the contract between your training environment and your serving environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; Model serialization is more fragile than it looks. The weight file is not just numbers — it is numbers plus the exact architecture key names. Document both.&lt;/p&gt;
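&lt;p&gt;If a checkpoint has already been saved under the old layer name, the keys can be remapped before loading. This is a minimal sketch of that recovery step, not a replacement for consistent naming. It works on any mapping of key to value; the &lt;code&gt;rename_prefix&lt;/code&gt; helper and the &lt;code&gt;backbone.conv1.weight&lt;/code&gt; key are hypothetical, and strings stand in for tensors.&lt;/p&gt;

```python
from collections import OrderedDict


def rename_prefix(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with old_prefix swapped for new_prefix.

    Keys that do not start with old_prefix are copied unchanged,
    and the original ordering is preserved.
    """
    renamed = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed


# Placeholder values stand in for tensors; only the keys matter here.
saved = OrderedDict([
    ("backbone.conv1.weight", "w0"),
    ("classifier.0.weight", "w1"),
    ("classifier.0.bias", "b1"),
])
fixed = rename_prefix(saved, "classifier.", "head.")
print(list(fixed))
# prints ['backbone.conv1.weight', 'head.0.weight', 'head.0.bias']
```

&lt;p&gt;With PyTorch, the remapped dictionary would then be passed to &lt;code&gt;model.load_state_dict(fixed)&lt;/code&gt;. Including the trailing dot in the prefix avoids touching unrelated keys that merely start with the same letters.&lt;/p&gt;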




&lt;h3&gt;
  
  
  Bug 5 — joblib Cannot Cross NumPy Versions
&lt;/h3&gt;

&lt;p&gt;On Day 29 I saved a &lt;code&gt;GradientBoostingClassifier&lt;/code&gt; with joblib on Google Colab (Python 3.10, numpy 1.24) and loaded it on VS Code (Python 3.10, numpy 1.26).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: &amp;lt;class 'numpy.random._mt19937.MT19937'&amp;gt;
is not a known BitGenerator module.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same Python version. Different numpy. Dead model.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;GradientBoostingClassifier&lt;/code&gt; internally stores a numpy &lt;code&gt;RandomState&lt;/code&gt; object. When numpy changes how it serializes random state between minor versions, joblib files become unreadable across those versions even when everything else matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three solutions in order of preference:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Solution 1 — Save with protocol 2 (maximum compatibility)
&lt;/span&gt;&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protocol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Solution 2 — Use XGBoost native format instead of joblib
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# XGBoost only
&lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;loaded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Solution 3 — Retrain locally (fastest for synthetic data)
# Never transfer joblib files across environments
# Always retrain in the environment you serve from
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this taught me:&lt;/strong&gt; joblib is not a portable format. It is a snapshot of a specific Python environment. If your training and serving environments differ — even slightly — retrain in the serving environment. Always.&lt;/p&gt;
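&lt;p&gt;Since a joblib file is tied to the environment that wrote it, it helps to record that environment next to the file and compare before loading. The following is a rough sketch using only the standard library. The function names and the idea of a JSON sidecar file are assumptions for illustration, not a joblib feature.&lt;/p&gt;

```python
import json
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Record the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": sys.version.split()[0], "packages": versions}


def fingerprint_mismatches(saved, current):
    """List human-readable differences between two fingerprints."""
    problems = []
    if saved["python"] != current["python"]:
        problems.append(f"python: saved {saved['python']}, now {current['python']}")
    for name, saved_version in saved["packages"].items():
        current_version = current["packages"].get(name)
        if saved_version != current_version:
            problems.append(f"{name}: saved {saved_version}, now {current_version}")
    return problems


# At save time, this JSON could be written beside model.joblib.
print(json.dumps(environment_fingerprint(["numpy", "scikit-learn", "joblib"]), indent=2))
```

&lt;p&gt;At load time, a non-empty result from &lt;code&gt;fingerprint_mismatches&lt;/code&gt; is the signal to retrain rather than to trust the file.&lt;/p&gt;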




&lt;h2&gt;
  
  
  The Pattern Behind All 5 Bugs
&lt;/h2&gt;

&lt;p&gt;Look at what they have in common:&lt;/p&gt;

&lt;p&gt;Every single one of them was invisible in a tutorial context.&lt;/p&gt;

&lt;p&gt;You cannot hit the numpy contiguous array bug in a Jupyter notebook because notebooks do not use OpenCV in a production pipeline. You cannot hit the joblib cross-version bug in a course because courses do not move models between environments. You cannot hit the LangGraph Windows async bug if you only run &lt;code&gt;python script.py&lt;/code&gt; from the command line.&lt;/p&gt;

&lt;p&gt;These bugs only exist in the gap between "it works on my machine" and "it works in production."&lt;/p&gt;

&lt;p&gt;That gap is where real AI engineering lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Startup Hidden Inside Every ML Project
&lt;/h2&gt;

&lt;p&gt;Here is something else courses never tell you.&lt;/p&gt;

&lt;p&gt;Every production ML project you build is also a startup idea. You just have to look at it correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1 — Gig Worker Credit Scorer&lt;/strong&gt;&lt;br&gt;
60 million gig workers in the US are rejected by traditional credit systems not because they are risky borrowers but because their income does not fit a W-2 pattern. ROC-AUC 0.84. Sub-200ms API. This is a $300 billion lending gap. Startups like Petal and Chime raised hundreds of millions solving exactly this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 6 — Supplier Risk Intelligence&lt;/strong&gt;&lt;br&gt;
Supply chain disruptions cost companies $228 million on average per incident. My model predicts supplier risk 3-6 months ahead using 31 signals — news sentiment, financial stress, geopolitical exposure. SAP charges enterprise customers $500K/year for similar capability. I built the core in 2 days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 15 — LangGraph Research Agent&lt;/strong&gt;&lt;br&gt;
A research analyst costs $80-150K/year and produces one report per day. My 5-agent pipeline produces an 800-word verified research report on any topic in 90 seconds using entirely free APIs. The unit economics are violent.&lt;/p&gt;

&lt;p&gt;The pattern: find a process that is currently done by expensive humans or legacy enterprise software. Build the AI version. Price it at 10-20% of the incumbent. That is the playbook.&lt;/p&gt;


&lt;h2&gt;
  
  
  3 Things I Would Tell Myself on Day 1
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Pin your versions before you write the first line of code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;requirements.txt&lt;/code&gt; on day one with exact versions of every dependency. The most painful bugs I hit were not architectural mistakes — they were &lt;code&gt;torch==2.3.0&lt;/code&gt; vs &lt;code&gt;torch==2.4.0&lt;/code&gt; differences. Version drift is silent and expensive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# requirements.txt — always pin, never assume
&lt;/span&gt;&lt;span class="py"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.3.0&lt;/span&gt;
&lt;span class="py"&gt;torchvision&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=0.18.0&lt;/span&gt;
&lt;span class="py"&gt;timm&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=0.9.16&lt;/span&gt;
&lt;span class="py"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.26.4&lt;/span&gt;
&lt;span class="py"&gt;xgboost&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.1.1&lt;/span&gt;
&lt;span class="py"&gt;langchain&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.3.0&lt;/span&gt;
&lt;span class="py"&gt;langgraph&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.0.5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
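&lt;p&gt;A pinned file only helps if the running environment actually matches it. Here is a small sketch of a drift check that could run at startup or in CI. It assumes every line has the simple &lt;code&gt;name==version&lt;/code&gt; form shown above (no extras or environment markers), and the helper names are invented for this example.&lt;/p&gt;

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines, skipping blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"unpinned requirement: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package that differs."""
    drift = {}
    for name, pinned in pins.items():
        installed = lookup(name)
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift


pins = parse_pins("torch==2.3.0\nnumpy==1.26.4  # pinned\n")
print(find_drift(pins))
```

&lt;p&gt;The &lt;code&gt;lookup&lt;/code&gt; parameter exists so the check can be tested with a plain dictionary instead of the real environment.&lt;/p&gt;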



&lt;p&gt;&lt;strong&gt;2. Build the API before you tune the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I lost days fine-tuning models before I knew if the API would work. The right order is: build the minimal API first, confirm the pipeline end-to-end, then improve the model. A working 0.75 AUC model in production beats a 0.85 AUC model still in a notebook.&lt;/p&gt;
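&lt;p&gt;As a sketch of what "minimal API first" can look like, the example below serves a placeholder model over HTTP so the request and response path can be tested before any real model exists. It uses the standard library rather than FastAPI only to stay self-contained. The &lt;code&gt;features&lt;/code&gt; payload shape, the fixed score, and all function names are assumptions for illustration.&lt;/p&gt;

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: a fixed score until the real model is ready."""
    return {"score": 0.5, "n_features": len(features)}


def handle_payload(body):
    """Turn a raw JSON request body into (status code, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        if not isinstance(features, list):
            raise TypeError("features must be a list")
    except (ValueError, KeyError, TypeError) as error:
        return 400, json.dumps({"error": str(error)}).encode()
    return 200, json.dumps(predict(features)).encode()


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all logic lives in handle_payload."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server; call this explicitly, it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

&lt;p&gt;Swapping in the trained model later means replacing the body of &lt;code&gt;predict&lt;/code&gt; only; the validation and transport code stay as they were when the pipeline was first confirmed end-to-end.&lt;/p&gt;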

&lt;p&gt;&lt;strong&gt;3. Every bug is a blog post.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time something breaks and I fix it, I write it down. Those 5 bugs above? Each one is a Stack Overflow answer, a dev.to article, a tweet thread. The person who googles "OpenCV non-contiguous array error 2026" and finds my explanation follows me on GitHub. That compounds over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Am Building Next — June 2026
&lt;/h2&gt;

&lt;p&gt;The 30-day series covered breadth. June is depth.&lt;/p&gt;

&lt;p&gt;Eight advanced systems targeting real unsolved gaps in production AI:&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Persistent Memory Architecture&lt;/strong&gt; — LangGraph agents that remember across sessions using pgvector + FAISS (solving the biggest gap in enterprise agentic AI)&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;LLM Evaluation Framework&lt;/strong&gt; — automated hallucination detection as a CI/CD pipeline step (because 87% of companies shipping AI have no systematic evaluation)&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;LoRA Fine-Tuning Pipeline&lt;/strong&gt; — LLaMA 3.1 8B on private domain data with GGUF quantization for CPU deployment (the technique every regulated industry needs)&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Knowledge Graph + LLM&lt;/strong&gt; — GraphRAG outperforms vector RAG on multi-hop questions by 40% per Microsoft Research. I am building the production implementation.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Federated Learning System&lt;/strong&gt; — ML across hospitals that cannot share patient data (GDPR compliance by design, not retrofit)&lt;/p&gt;

&lt;p&gt;Each one solves a problem that companies are paying $500K+ in consulting fees to figure out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The most important thing I learned in 60 days is not a framework or a model architecture.&lt;/p&gt;

&lt;p&gt;It is that production AI engineering is a craft that only gets built through shipping.&lt;/p&gt;

&lt;p&gt;You can read every paper, watch every tutorial, and follow every course. None of it prepares you for the moment when your model loads perfectly in training and silently returns wrong predictions in production because the preprocessing pipeline has a different random seed.&lt;/p&gt;

&lt;p&gt;The only way to learn production is to build for production.&lt;/p&gt;

&lt;p&gt;Start shipping.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;All systems are open source:&lt;/strong&gt;&lt;br&gt;
🔗 &lt;a href="https://github.com/Danish08654" rel="noopener noreferrer"&gt;github.com/Danish08654&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Follow for daily updates on the June advanced projects:&lt;/strong&gt;&lt;br&gt;
🔗 &lt;a href="https://www.linkedin.com/in/danish-zulfiqar-53884b24a/" rel="noopener noreferrer"&gt;LinkedIn — Danish Zulfiqar&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you hit any of these bugs? Drop them in the comments — I want to hear what production broke for you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
