praveena 0506

The Zombie AI: How I Built a Model That Refuses to Die (and Why I Fired BERT)

A story of failure, Cloudflare blocks, and why I chose a "dumb" model over a Transformer to save money.

🥔 The Problem: AI Models Age Like Milk
Let’s be honest. Most of our ML projects are liars. We train them on a CSV from 2023, show off the 98% accuracy on LinkedIn, and then... abandon them. The moment you deploy that model, it starts rotting. The world changes. The Supreme Court passes new judgements. But your model is still living in the past, blissfully ignorant.

I refused to build another "Potato Model" (one that sits there and rots). I wanted to build Legal Eagle AI—a system that wakes up, drinks coffee, reads the news, and gets smarter every day without me nagging it.

🤦‍♂️ Phase 1: The "I am a Genius" Phase (And the inevitable crash)
My Grand Plan:

Write a script to scrape Indian Kanoon (the massive free archive of Indian court judgments).

Train a massive Transformer.

Change the world.

The Reality: I wrote the scraper. I hit "Run". And Cloudflare immediately punched me in the face. 🥊 403 Forbidden. Access Denied. Are you a robot?

I tried header spoofing. I tried rotating user agents. Cloudflare looked at my cute little Python script and laughed. I had a fancy architecture but literally zero data.

💡 Phase 2: The "Lazy Engineer" Pivot
They say "Laziness is the mother of invention." Instead of fighting the firewall, I went around it.

I realized Google News RSS Feeds are:

XML (Deliciously easy to parse).

Real-time.

Unblockable. (Google wants you to read the news).
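Pulling one of these feeds is genuinely a few lines. Here's roughly what that looks like with feedparser (the search query is illustrative, not my exact pipeline):

```python
import feedparser  # pip install feedparser

# Google News exposes search results as plain RSS/XML.
# The query term below is an example; swap in whatever beat you track.
FEED_URL = "https://news.google.com/rss/search?q=supreme+court+india"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries[:5]:
    print(entry.title, "->", entry.link)
```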

But wait! RSS feeds don't come with labels like "Win" or "Loss." I didn't have a budget to hire interns to label data. So, I built the "Vacuum Cleaner." 🧹

I wrote a "dumb" Heuristic Engine that scans headlines for words like "Acquitted" or "Allowed" and stamps them as WIN. If it sees "Dismissed", it stamps LOSS. Boom. Infinite, free, labeled training data. Take that, Cloudflare.
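The whole "engine" is basically a keyword lookup. Something like this (the word lists beyond "acquitted", "allowed", and "dismissed" are illustrative, not the production set):

```python
import re

# Keywords beyond the three mentioned above are my illustrative additions.
WIN_WORDS = {"acquitted", "allowed", "granted", "upheld"}
LOSS_WORDS = {"dismissed", "rejected", "denied", "convicted"}

def auto_label(headline: str) -> str | None:
    """Stamp a headline WIN or LOSS based on outcome keywords."""
    words = set(re.findall(r"[a-z]+", headline.lower()))
    if words & WIN_WORDS:
        return "WIN"
    if words & LOSS_WORDS:
        return "LOSS"
    return None  # ambiguous headlines get skipped, not guessed

print(auto_label("Appeal allowed, conviction set aside"))  # -> WIN
```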

🧠 Phase 3: Why I Fired BERT (The Controversy)
Warning: This section might offend NLP purists.

Everyone asked me: "Why didn't you use a Transformer? BERT is SOTA! Do you even Attention Mechanism bro?"

Look, I know how Transformers work. I actually built one from scratch (seriously, I hand-coded the Multi-Head Attention math—you can check the pain and suffering in my Deep Dive Repo here).

But here is the thing about Transformers:

They are Divas. 💅

They demand GPUs.

They eat RAM like Chrome tabs.

They are s-l-o-w.

I am a student running on free cloud tiers. I cannot afford a Diva. I need a Toyota Corolla. So, I used a PyTorch EmbeddingBag network.

The difference?

BERT: Reads the text, contemplates the existential meaning of the word "the," checks the context of the previous 500 tokens... [Latency: 400ms].

My Model: Averages the vectors. Smashes them into a Linear Layer. Done. [Latency: 12ms].

That's ~33x faster (400 ms vs. 12 ms), it runs on a standard CPU, and it costs $0. And guess what? It's 95% accurate. Efficiency > Hype.
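For the curious, the whole model is roughly this (a minimal sketch; vocab size and embedding dim are placeholders, not my actual hyperparameters):

```python
import torch
from torch import nn

class HeadlineClassifier(nn.Module):
    """EmbeddingBag averages token vectors in one fused op,
    then a single Linear layer maps the average to WIN/LOSS."""

    def __init__(self, vocab_size: int = 20_000, embed_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.fc(self.embedding(tokens, offsets))

model = HeadlineClassifier()
tokens = torch.tensor([3, 17, 42, 8, 99])  # two headlines, flattened into one tensor
offsets = torch.tensor([0, 3])             # headline 1 = tokens[0:3], headline 2 = tokens[3:]
logits = model(tokens, offsets)            # shape: (2, 2)
```

No attention, no positional encodings, no GPU. Just averaged vectors and a linear layer.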

☁️ Phase 4: The Immortal Architecture (MongoDB + Airflow)
A Zombie AI needs a brain that doesn't get wiped when the server restarts. If I stored my data in a .csv on Heroku, it would vanish every 24 hours. (RIP).

So I brought in the heavy hitters:

MongoDB Atlas (The Brain Bucket):

Why? Because legal text is messy. SQL tables scream if you miss a column. MongoDB just takes the JSON and says "Thank you."

Now, even if my code crashes, the Knowledge Base survives in the cloud.
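Persisting a labeled headline is one upsert (connection string, database, and collection names below are placeholders):

```python
from pymongo import MongoClient  # pip install pymongo

# Placeholder URI; use your own Atlas connection string.
client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net/")
collection = client["legal_eagle"]["headlines"]

doc = {"title": "Appeal allowed, conviction set aside", "label": "WIN"}
# Upsert on title so re-running the scraper doesn't duplicate documents.
collection.update_one({"title": doc["title"]}, {"$set": doc}, upsert=True)
```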

The "Groundhog Day" Loop:

I designed the system to mimic an Apache Airflow DAG.

6:00 AM: Wake up.

6:05 AM: Scrape Google.

6:10 AM: Auto-Label.

6:15 AM: Retrain the Model.

If the Supreme Court changes a law today, my model learns it by dinner time.
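You don't even need a full Airflow deployment to mimic the DAG; the rhythm fits in plain Python (a sketch, with stubs standing in for the real pipeline functions):

```python
import time
from datetime import datetime, timedelta

def scrape() -> list[str]:
    return ["Appeal allowed, conviction set aside"]       # stub: the Phase 2 RSS fetcher

def label(headlines: list[str]) -> list[tuple[str, str]]:
    return [(h, "WIN") for h in headlines]                # stub: the Vacuum Cleaner

def retrain(data: list[tuple[str, str]]) -> None:
    print(f"retraining on {len(data)} fresh examples")    # stub: the Phase 3 trainer

while True:
    now = datetime.now()
    next_run = now.replace(hour=6, minute=0, second=0, microsecond=0)
    if next_run <= now:
        next_run += timedelta(days=1)
    time.sleep((next_run - now).total_seconds())  # snooze until 6:00 AM
    retrain(label(scrape()))
```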

🐳 The "Works on My Machine" Vaccine
To ensure this delicate house of cards doesn't collapse when I move it from my laptop to the cloud, I wrapped the whole thing in Docker. And I used Poetry because requirements.txt is stuck in 2015 and I like my dependencies deterministic, thank you very much.

📈 The Verdict
Initial Loss: 1.08 (My model was basically flipping a coin).

Final Loss: 0.26 (After 5 epochs of auto-scraped data).

I built a system that feeds itself, teaches itself, and runs for free. Sometimes, the "dumb" solution is actually the smartest one.

🔗 Code & Proof
The Immortal Project: Legal Eagle AI Repo

Discussion: Have you ever ditched a "State of the Art" model because it was just too expensive/slow for production? Let me know in the comments! 👇
