DEV Community

# dataengineering

Posts

Day 13: Window Functions in PySpark
2 min read
Shine in Your Next Data Engineering Interview with Pandas
10 min read
Why Idempotency Is So Important in Data Engineering
6 min read
REST API Calls for Data Engineers: A Practical Guide with Examples
3 min read
Analyzing and Optimizing a Parquet ClickHouse Ingestion Pipeline
3 min read
Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile
3 min read
Day 12: UDF vs Pandas UDF
2 min read
Day 2: Data Engineering vs Data Science vs Data Analytics
2 min read
Day 11: Choosing the Right File Format in Spark
2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond
6 min read
Day 7: Mastering Joins, Unions, and GroupBy in PySpark - The Core ETL Operations
2 min read
Top Open-Source Data Engineering Tools- Unravelling the Best in 2026
10 min read
map
1 min read
Data Engineering in 30 Days - Day 2
2 min read
Why Frontend Teams Should Care About Data Modeling for Real-Time Dashboards
2 min read
Refactoring a Mature Airflow Project: A Practical Guide to Scaling from Solo Development to Team Collaboration
4 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)
6 min read
How to Guarantee True Ordering in Complex Kafka Replays: Solving the Determinism Nightmare
4 min read
Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers
2 min read
AWSChallenge - Week 2
4 min read
Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs
2 min read
Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth
2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres
3 min read
DP-600 Fabric Analytics Engineer – Structured Study Notes
11 min read
The Boring Debug Checklist That Fixes Most “RAG Failures”
2 min read