DEV Community

# dataengineering

Posts

Beyond SQL: Solving Data Warehouse Performance Bottlenecks with Smart Algorithms, Not Just Bigger Clusters

5 comments
13 min read
Day 16: Delta Lake Explained - How Spark Finally Became Reliable for Production ETL

2 min read
Migrate the legacy Greenplum to Apache Cloudberry with cbcopy

7 min read
Unpacking the Google File System Paper: A Simple Breakdown

3 min read
Day 15: Running Spark in the Cloud - Dataproc vs Databricks

2 min read
The Myth of Distributed Computing as a Silver Bullet for Big Data

5 comments
10 min read
Rethinking Stream-Batch Unification: Real-Time Processing with Incremental Materialized Views in Apache Cloudberry

5 min read
Interesting links - December 2025

13 min read
Data Engineering Processes: From Raw Data to Cleaned, Processed, Analytics-Ready Data.

5 min read
Navigating the Future: Top Data Engineering Trends Shaping 2024 and Beyond

4 min read
Apache Airflow: Complete Guide for Basic to Advanced Developers

1 comment
22 min read
Day 14: Building a Real Retail Analytics Pipeline Using Spark Window Functions

1 min read
Learning SQL in Practice: LeetCode Challenges and Setting Up PostgreSQL

2 min read
Day 13: Window Functions in PySpark

2 min read
REST API Calls for Data Engineers: A Practical Guide with Examples

3 min read
Is CsvPath an easy or hard language?

16 min read
Day 17: Building a Real ETL Pipeline in Spark Using Bronze-Silver-Gold Architecture

1 min read
Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile

3 min read
S3-Native Kafka Alternatives: What's Actually Different

3 min read
Day 12: UDF vs Pandas UDF

2 min read
The Data Engineer's Descent Into Datetime Hell

1 comment
5 min read
Streamlit from Scratch: How to Build an App to Explore and Visualize Data

4 min read
Day 11: Choosing the Right File Format in Spark

2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond

6 min read
Day 7: Mastering Joins, Unions, and GroupBy in PySpark - The Core ETL Operations

2 min read