DEV Community

# dataengineering

Posts

Google's LEGO tribute 🧩

1 min read
The Ultimate Linux Command Cheat Sheet for Data Engineers and Analysts

4 min read
When Small Parquet Files Become a Big Problem (and How I Ended Up Writing a Compactor in PyArrow)

5 min read
Join Data from Anywhere: The Streaming SQL Engine That Bridges Databases, APIs, and Files

17 min read
Why 71,000 Data Engineers Read My Article: What I Learned About Technical Writing

6 min read
Real-Time is an SLA, Not an Architecture: When You Actually Need Kafka (And When You Don't)

10 min read
🌍 Automating Africa’s Energy Data Collection Using Python, Playwright(+Why Playwright ?), and MongoDB (2000–2024)

5 min read
S3-Native Kafka Alternatives: What's Actually Different

3 min read
Why Parquet Is Everywhere - And What Makes It Actually Fast?

3 min read
RIP Amazon Data Firehose Change Data Capture

4 min read
Building a 75,000-Product Image Feature Dataset for the Amazon ML Challenge 2025

4 min read
Writes, 3 ways: Postgres, Apache Kafka® and Apache Iceberg™

10 min read
From smog to streams: how data engineering helps us breathe easier.

4 min read
Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks

8 min read
The Waterfall Pattern: A Tiered Strategy for Reliable Data Extraction

5 min read