DEV Community

Guilherme Cavalcante

A data platform tracking news and social media across 167 cities in Rio Grande do Norte for under R$5/month.

I put a data pipeline into production for roughly one-thirtieth of what a traditional setup would cost.

Over the last few months, I built a platform that monitors all 167 municipalities in Rio Grande do Norte using data from news outlets, Facebook, Instagram, X, and TikTok. The data goes through Portuguese text processing, is served by a FastAPI service, and reaches a React dashboard in near real time.

Here is what cut the cost the most:

→ Cloud Run Jobs instead of keeping Airflow running 24/7. The container starts, runs the pipeline, and shuts down. Since processing takes only a few minutes and happens a few times a day, the bill practically disappears.
→ Local DuckDB during development, BigQuery only in production. I can test everything without burning cloud quota.
→ NLP without relying on LLMs for everything. I used spaCy with deterministic rules in Portuguese: fast, cheap, and auditable. LLMs only come in when someone asks for an explanation of specific content.
→ Data Lake in Parquet before BigQuery. Reprocessing became trivial and raw data stays preserved.
→ GitHub Actions authenticating to GCP through Workload Identity Federation with OIDC. Zero private keys stored in secrets.
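
To make the Cloud Run Jobs point concrete, here is a minimal sketch of a run-to-completion entrypoint. It is not the project's actual code: the step names (`collect`, `process_text`, `load`) and the `run_pipeline` helper are hypothetical placeholders. The idea it illustrates is that the process does its work once and exits, so nothing stays running (or billed) between executions.

```python
import logging
from typing import Callable, Sequence, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

Step = Callable[[], None]


def run_pipeline(steps: Sequence[Tuple[str, Step]]) -> int:
    """Run each step once, in order, and return a process exit code."""
    for name, step in steps:
        log.info("starting step %s", name)
        try:
            step()
        except Exception:
            log.exception("step %s failed", name)
            # A non-zero exit code marks the job task as failed.
            return 1
    return 0


# Hypothetical placeholder steps; the real pipeline stages are not shown in this post.
def collect() -> None:
    pass


def process_text() -> None:
    pass


def load() -> None:
    pass


STEPS = [("collect", collect), ("process_text", process_text), ("load", load)]

# In the container entrypoint you would finish with:
#     sys.exit(run_pipeline(STEPS))
```

There is no scheduler loop and no long-lived worker here: whatever triggers the job starts the container, and the container ends as soon as the last step returns.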

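The DuckDB-locally, BigQuery-in-production split can be sketched as a small backend switch. This is an illustration under assumptions, not the repository's implementation: the `PIPELINE_ENV` variable, the `dev.duckdb` file name, and the `pick_backend` / `run_query` helpers are all hypothetical names.

```python
import os
from typing import List, Mapping, Optional


def pick_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'bigquery' only when PIPELINE_ENV=prod; otherwise use local DuckDB."""
    env = os.environ if env is None else env
    return "bigquery" if env.get("PIPELINE_ENV", "dev") == "prod" else "duckdb"


def run_query(sql: str, env: Optional[Mapping[str, str]] = None) -> List[tuple]:
    """Run a query on whichever engine the environment selects."""
    if pick_backend(env) == "duckdb":
        import duckdb  # imported lazily so production images do not need it

        con = duckdb.connect("dev.duckdb")  # hypothetical local database file
        try:
            return con.execute(sql).fetchall()
        finally:
            con.close()

    from google.cloud import bigquery  # imported lazily so local runs do not need it

    client = bigquery.Client()  # relies on ambient GCP credentials
    return [tuple(row.values()) for row in client.query(sql).result()]
```

Defaulting to DuckDB means a missing or mistyped variable never sends a query to the cloud by accident. The trade-off is that queries have to stay within SQL that both engines accept.
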
A lot of expensive architecture exists because copying tutorial stacks became a habit. Every project decision (and every alternative I discarded) is documented in the README.
Stack: Python, dbt, BigQuery, Terraform, GCP, FastAPI, React, TypeScript, spaCy, and uv workspace.

Part of the development was done through pair programming with Claude Code. I make that explicit because the tool accelerates writing, but does not replace technical decision-making.

More details, architecture, and dashboard: guilhermecavalcante.works
Open source: https://lnkd.in/da3ZZiZ3

I’m open to opportunities in Data Engineering, Analytics Engineering, and AI solutions.

Question for people working with infrastructure and data: do you also feel that many stacks today already start oversized?

#DataEngineering #AnalyticsEngineering #MLOps #Python #BigQuery #GCP
