DEV Community

# dataengineering

Posts

Star vs. Snowflake Schema

Comments
4 min read
The Bear Awakens: From Pure Speed to Massive Endurance (640 Million Rows Tested)

Comments
16 min read
Data Engineer: The Architect of the “Data Flow” in the Digital Age

Comments
2 min read
Sustainability in retail is a Software Problem Now

Comments
2 min read
Join Data from Anywhere: The Streaming SQL Engine That Bridges Databases, APIs, and Files

8
Comments 1
17 min read
Building a Modern Data Platform to Track Kenya’s Food Prices — A Data Engineering Case Study

Comments
5 min read
Part 1: Database Concepts & Architecture

Comments
14 min read
AWS Glue ETL Jobs: Transform Your Data at Scale

1
Comments
4 min read
Final Project Report 1: Schema Evolution Support on Apache SeaTunnel Flink Engine

Comments
4 min read
From Pandas to Upstream Control: The Evolution PyData Needs Next

Comments
6 min read
Building Reliable Legal AI: Never Missing a Supreme Court Case

2
Comments
26 min read
Statistics Day 2: Correlation Isn’t Causation — Here’s Why It Matters!

5
Comments
4 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Dec 9th - Dec 15th, 2025)

1
Comments
7 min read
Kafka consumer lag—Measure and reduce

Comments
5 min read
Understanding Kafka Consumer Lag: Causes, Risks, and How to Fix It

Comments
3 min read
Building a dbt-UI I Wish Existed

1
Comments
3 min read
Building a Real-Time Crypto Data Pipeline with Debezium CDC

Comments
5 min read
Understanding Kafka Lag, Why It Happens and How To Fix It.

2
Comments
4 min read
The State of Apache Iceberg, Polaris, and Arrow: November 5-11

Comments
5 min read
Understanding Kafka Lag: Why It Happens and How to Fix It

Comments
4 min read
Right Approach to JSON Log Analysis: A Hands-on Guide to Efficient Practices with Alibaba Cloud SLS

Comments
7 min read
Understanding reasons behind Kafka lag and how to minimize it.

Comments
3 min read
Why Idempotency Is So Important in Data Engineering

Comments
6 min read
Reducing Consumer Lag in Apache Kafka

5
Comments
3 min read
🚀 Day 1: Introduction to Apache Spark

1
Comments
2 min read