James

What is the Modern Data Stack?

Introduction

When I started working on personal data projects, I kept running into the same obstacles. Governance was missing, so I often questioned which version of the data was correct. Scaling pipelines meant starting over instead of building on what I had. Reproducibility was frustrating: running the same process twice sometimes gave different outcomes. Even small updates could break the flow and leave me backtracking.

These challenges are why hybrid architectures matter. Modern data work rarely lives in one place. Combining local systems with cloud platforms creates balance between control and scalability. Governance can be built into transformations, security can be enforced at multiple layers, and scaling does not depend on the limits of a single machine. A hybrid design makes the workflow more flexible while keeping it structured.

I have pieced together some of these requirements in a project to demonstrate why they matter. The Modern Data Stack applies DataOps practices and analytics engineering principles to show what it looks like when governance, reproducibility, and scalability are considered from the start.

Why These Practices Matter

DataOps applies the discipline of DevOps to data. It focuses on automating ingestion, testing transformations, and deploying changes with confidence. Analytics engineering builds on this foundation, shaping raw data into well-modeled tables that are easier to query and analyze.

Together they provide answers to the problems I faced in my projects:
• Silent pipeline failures are replaced with automated checks and alerts (see the sketch after this list)
• Business logic lives in code rather than scattered spreadsheets
• Environments can be recreated consistently with infrastructure as code
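
As a minimal illustration of the first point, here is the kind of automated check a pipeline step might run; the column names ("ticker", "loaded_at") are assumptions for the example, not the project's actual schema:

import pandas as pd

def validate_tickers(df: pd.DataFrame) -> pd.DataFrame:
    # Fail loudly instead of letting bad rows slip downstream silently.
    missing = df["ticker"].isna().sum()
    if missing:
        raise ValueError(f"{missing} rows are missing a ticker symbol")
    if df.duplicated(subset=["ticker", "loaded_at"]).any():
        raise ValueError("duplicate ticker/timestamp rows detected")
    return df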

The aim is not to collect more tools. It is to make the workflow reliable, transparent, and scalable.

The Architecture

The project follows a layered approach to data.

  1. Bronze (Raw): Google Sheets data lands in PostgreSQL via Python scripts.
  2. Silver (Curated): New records are loaded incrementally into BigQuery.
  3. Gold (Analytics-ready): dbt Cloud transforms and tests the data, making it usable for analysis.
  4. Automation layer: Terraform provisions infrastructure, and GitHub Actions handle orchestration.

How It Works

Ingestion with Python and PostgreSQL

I started with Google Sheets as a source. Python scripts pull ticker data and load it into PostgreSQL. This creates a single entry point for raw data, instead of juggling multiple queries across tools.
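
As a minimal sketch of that step, assuming the sheet tab is available as a CSV export (the sheet ID, connection string, and table names below are placeholders, not the project's real configuration):

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical sheet ID; Google Sheets can serve a tab as CSV via the export URL.
SHEET_ID = "your-sheet-id"
CSV_URL = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

# Pull the ticker data straight into a DataFrame and stamp it with a load time,
# which later becomes the watermark for incremental loads.
raw = pd.read_csv(CSV_URL)
raw["loaded_at"] = pd.Timestamp.now(tz="UTC")

# Land this pull in the bronze (raw) table in PostgreSQL.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/market_data")
raw.to_sql("raw_tickers", engine, schema="bronze", if_exists="append", index=False)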

# Incremental load: find the newest timestamp already in BigQuery,
# fetch only the PostgreSQL rows that arrived after it, and append them.
last_timestamp = get_last_timestamp_from_bigquery()
new_data = query_postgres_for_new_data(last_timestamp)
append_to_bigquery(new_data)

This incremental pattern ensures that only new records are processed, making ingestion both efficient and scalable.
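
To make the pattern concrete, here is a rough sketch of how those three helpers might look with google-cloud-bigquery and SQLAlchemy; the project ID, dataset, table, and loaded_at column are illustrative assumptions, not the actual schema:

import pandas as pd
from google.cloud import bigquery
from sqlalchemy import create_engine, text

BQ_TABLE = "my-project.silver.tickers"  # placeholder project.dataset.table
pg_engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/market_data")
bq = bigquery.Client()

def get_last_timestamp_from_bigquery():
    # Highest watermark already present in the curated table (None on the first run).
    rows = bq.query(f"SELECT MAX(loaded_at) AS ts FROM `{BQ_TABLE}`").result()
    return next(iter(rows)).ts

def query_postgres_for_new_data(last_timestamp):
    # Fetch only the rows PostgreSQL received after that watermark,
    # or everything on the very first run when no watermark exists yet.
    if last_timestamp is None:
        return pd.read_sql(text("SELECT * FROM bronze.raw_tickers"), pg_engine)
    sql = text("SELECT * FROM bronze.raw_tickers WHERE loaded_at > :ts")
    return pd.read_sql(sql, pg_engine, params={"ts": last_timestamp})

def append_to_bigquery(new_data):
    # WRITE_APPEND adds the delta without touching existing rows.
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
    bq.load_table_from_dataframe(new_data, BQ_TABLE, job_config=job_config).result()

Because the watermark lives in BigQuery itself, the job is safe to re-run: a failed or repeated run simply picks up from the last timestamp that actually landed.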

Transformations with dbt Cloud

dbt Cloud handles the transformation logic. Models capture how raw data should be reshaped, while tests validate the assumptions. By codifying transformations, the workflow becomes both transparent and reproducible. Instead of second-guessing results, I can trust the outputs because the checks are built into the process.

Infrastructure with Terraform

Provisioning is defined in Terraform. From databases to permissions, the setup can be recreated without manual steps. Version control captures every change, so the infrastructure evolves in the same structured way as the code.

CI/CD with GitHub Actions

GitHub Actions orchestrate the workflow. Each commit can trigger ingestion, transformations, and tests. Deployments run automatically, so the pipeline is not dependent on manual execution. This brings consistency and speed to the process.

Closing Thoughts

• Small design choices have a big impact. A simple incremental load pattern can save hours when working with larger datasets.
• dbt is more than a transformation tool. It acts as a shared framework where logic, documentation, and testing converge.
• Infrastructure as code removes uncertainty. Rebuilding an environment becomes predictable instead of experimental.

Data workflows are only as strong as the discipline behind them. Without governance, scaling, and reproducibility, even small projects become fragile. By weaving DataOps and analytics engineering principles into the design, the workflow stops being a collection of scripts and turns into a system that can grow, adapt, and be trusted.

This is not a finished product, but a working demonstration of what modern data practices can look like in action.

You can explore the full implementation here: Modern Data Stack on GitHub.
