DEV Community

# dataengineering

Posts

Architecting for the Crash: Why 'Clean Data' is the Only Safety Net in Trading Wind-Down (TWD)

1
Comments
3 min read
How One Can Start Their Journey in Data Engineering

Comments 2
4 min read
The Time Our Pipeline Processed the Same Day’s Data 47 Times

Comments
5 min read
Firehose and Iceberg Tables

Comments
4 min read
I Built an ETL Pipeline That Actually Thinks & And Cut Token Costs by 52% (And Here's What I Learned)

1
Comments
17 min read
Day 16: Delta Lake Explained - How Spark Finally Became Reliable for Production ETL

Comments
2 min read
Google's LEGO tribute 🧩

27
Comments 8
1 min read
Migrate the legacy Greenplum to Apache Cloudberry with cbcopy

Comments
7 min read
Unpacking the Google File System Paper: A Simple Breakdown

Comments
3 min read
Day 15: Running Spark in the Cloud - Dataproc vs Databricks

Comments
2 min read
Rethinking Stream-Batch Unification: Real-Time Processing with Incremental Materialized Views in Apache Cloudberry

Comments
5 min read
Interesting links - December 2025

Comments
13 min read
Data Engineering Processes: From Raw Data to Cleaned, Processed, Analytics-Ready Data.

Comments
5 min read
Navigating the Future: Top Data Engineering Trends Shaping 2024 and Beyond

Comments
4 min read
Apache Airflow: Complete Guide for Basic to Advanced Developers

1
Comments
22 min read
Day 14: Building a Real Retail Analytics Pipeline Using Spark Window Functions

Comments
1 min read
Day 13: Window Functions in PySpark

Comments
2 min read
Is CsvPath an easy or hard language?

Comments
16 min read
Day 17: Building a Real ETL Pipeline in Spark Using Bronze-Silver-Gold Architecture

Comments
1 min read
Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile

Comments
3 min read
S3-Native Kafka Alternatives: What's Actually Different

Comments
3 min read
Day 12: UDF vs Pandas UDF

Comments
2 min read
The Data Engineers Descent Into Datetime Hell

1
Comments
5 min read
Day 11: Choosing the Right File Format in Spark

Comments
2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond

Comments
6 min read