Brenda ..

Rebuilding the national narrative with AI and Docker

Whether you’re a Founder in the 22@ district trying to track market shifts, or a Tech Enthusiast looking for your first break in the industry, the problem is identical: the "Information Sludge." Every day, Spain generates headlines across Finance, Tech, and Real Estate.
Most of it is noise... I want the signal.

So I built a refinery.

The refinery isn't just there to "scrape news." It's an automated engine that reads the national sentiment and distills it into actionable insights.


🍽️ The "Data Kitchen" Architecture

To explain this to my non-tech friends, I tell them to imagine a high-end restaurant in Poblenou:

  • The Ingestion (Mage AI): My Head Chef... Every 6 hours, he hits the digital markets (Google News) to find the freshest raw ingredients.
  • The Brain (Robertuito NLP): My Specialist Saucier... This is a specialized AI model trained on native Spanish text. It doesn't just "read"; it understands the cultural nuance to detect if the "vibe" is Positive, Neutral, or Negative.
  • The Transformation (dbt): My Bouncer. He runs integrity tests. If a headline is a duplicate or a "ghost" with no link, it doesn't get past the velvet rope.
  • The Showroom (Streamlit): My Waiter. He serves the final, high-purity insights on interactive charts.
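
For the curious, the whole kitchen can be wired together with a single compose file. This is a minimal sketch, not the project's actual configuration — the service names, volumes, and ports are illustrative assumptions (here dbt and the NLP model run inside the Mage pipeline):

```yaml
# Hypothetical sketch of the "Data Kitchen" — names and paths are illustrative.
services:
  mage:                     # The Ingestion: pulls Google News every 6 hours
    image: mageai/mageai:latest
    volumes:
      - ./src:/home/src     # dbt models + NLP code live in the shared workspace
  dashboard:                # The Showroom: Streamlit serving the final charts
    build: .
    command: streamlit run app.py
    ports:
      - "8501:8501"
```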

🏗️ For the Docker Wizards: Forging the "Steel"

If you live in the terminal, you know that "Integrity" isn't a buzzword—it’s a configuration. Building this refinery required overcoming some hurdles that nearly melted my CPU.

  1. The Cartesian Explosion (The "Freeze" Bug) 🧊
    Early on, I joined my raw data to my "date spine" too early in the pipeline. It created a Cartesian product that multiplied rows exponentially. It was like trying to fit the entire crowd of a Barça match into a single tiny bar in the Raval.
    The Fix: Pre-Aggregation. I moved a lot of the "cooking" into CTEs, shrinking the data to its daily grain before the joins. The machine finally stopped freezing.
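
The pre-aggregation fix looks roughly like this in dbt-style SQL — a sketch with hypothetical table and column names, not the project's actual model:

```sql
-- Hypothetical model: collapse to the daily grain FIRST, then join the spine.
with daily_sentiment as (
    select
        cast(published_at as date) as news_date,
        avg(sentiment_score)       as avg_sentiment,
        count(*)                   as headline_count
    from raw_headlines
    group by 1
)
select
    d.date_day,
    coalesce(s.avg_sentiment, 0)  as avg_sentiment,
    coalesce(s.headline_count, 0) as headline_count
from date_spine d
left join daily_sentiment s
    on d.date_day = s.news_date
-- One row per day on each side of the join — no Cartesian multiplication.
```

Because each side of the join is already unique per day, the join can only ever produce one row per date, no matter how many headlines landed that morning.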

  2. The Docker Path Labyrinth (The GPS Mismatch) 🧭
    Containers are isolated worlds. My code was looking for database files in a local Barcelona folder, while the container was sitting in its own "Virtual Madrid."
    The Fix: Absolute Path Sovereignty. I enforced deterministic paths starting from the root (/home/src/). The system is now self-healing; if the environment resets, the refinery knows exactly how to rebuild its own infrastructure.
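
In code, "Absolute Path Sovereignty" is a small discipline. A minimal Python sketch — the `/home/src` root is from the setup above, but the file and folder names are hypothetical:

```python
from pathlib import Path

# Deterministic root inside the container — never a relative path that
# depends on wherever the process happened to be launched from.
PROJECT_ROOT = Path("/home/src")
DB_PATH = PROJECT_ROOT / "data" / "pulse.db"  # hypothetical database file


def ensure_workspace(root: Path = PROJECT_ROOT) -> Path:
    """Self-healing setup: if the environment resets, rebuild the folder tree."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

Every component resolves the same absolute path, so the code in "Barcelona" and the container in "Virtual Madrid" finally agree on where the database lives.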

  3. The Runtime vs. Build-time War ⚔️
    Installing heavy AI libraries like PyTorch every time the container started made the "cold start" slower than a siesta in August.
    The Fix: I shifted from runtime installation to Build-time Provisioning. By "baking" the heavy dependencies into the Docker image, the refinery is now Ready to Serve the second it turns on.
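
Build-time provisioning is a Dockerfile ordering decision. A hedged sketch — the base image, requirements file, and entrypoint are assumptions, not the project's actual Dockerfile:

```dockerfile
FROM python:3.11-slim

# Build time: heavy dependencies (PyTorch, transformers, etc.) are baked
# into the image layer once, and cached across rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . /home/src
WORKDIR /home/src

# Runtime: nothing left to install — the container is Ready to Serve on start.
CMD ["streamlit", "run", "app.py"]
```

The cost of installing PyTorch is paid once per `docker build` instead of once per `docker run`.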


🧼 Purity as a Protocol

In Data Engineering, your Lineage Graph is your bond. Inspired by the principle of being an "ensample in purity," I codify my ethics into the SQL.

By using TRY_CAST and NULLIF patterns, I neutralized data "sludge" (like literal 'null' strings in CSVs) before it could poison the metrics. I also standardized codes and fixed typos.
Why? Because Clean Data is the highest form of professional honesty.
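
The cleaning pattern, sketched with hypothetical column names (note that `TRY_CAST` is dialect-specific — DuckDB, Snowflake, and SQL Server support it, while BigQuery spells it `SAFE_CAST`):

```sql
-- Hypothetical staging query: neutralize sludge before it reaches the metrics.
select
    -- Literal 'null' / 'N/A' strings from the CSV become real NULLs...
    nullif(nullif(trim(raw_score), 'null'), 'N/A')       as score_text,
    -- ...and anything that still isn't numeric fails softly to NULL
    -- instead of crashing the pipeline:
    try_cast(nullif(trim(raw_score), 'null') as double)  as sentiment_score,
    upper(trim(country_code))                            as country_code
from raw_headlines
```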


🚀 The Roadmap: What’s Next?

I have set a deadline to land my next role here in Barcelona, and the refinery is my primary proof of work. But a true architect never stops building:

  • Market Deep-Listening: Expanding the "Loophole Ingestor" to catch specific industrial shifts, predicting economic trends before they hit the mainstream.

🏁 The Bottom Line

I didn’t just build a dashboard; I built an engine. In a world of fake news and fragmented data, the Spanish Pulse is a reminder that we can use technology to find the truth—and have a little Joie de Vivre while doing it.

Check out the repo
