DEV Community

# dataengineering

Posts

The Data Analytics Lifecycle

3 min read
Set up an open-source AI analyst for PostgreSQL in 2 minutes

5 min read
The Semantic Gap in Data Quality: Why Your Monitoring is Lying to You

7 min read
Building an Automated Data Pipeline: Injuries vs Performance in the Premier League

6 min read
2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI

9 min read
Evolution of Processing: SPL One-Click Acceleration for Log-to-Metric Conversion

6 min read
My First Data Engineering Project: Building a Real-Time IoT Pipeline on Azure

6 min read
Containerization for Data Engineering: A Practical Guide with Docker and Docker Compose

2 min read
Join OSA CON 2025: Two Days of Open‑Source Analytics and AI (Nov. 4–5)

3 min read
AWS Glue for ETL

5 min read
Real-Time Earthquake CDC Pipeline

5 min read
🐝 Why Hive Exists - And Why Its Complexity Is Actually Necessary

3 min read
The "Shift-Left" Imperative: Implementing Data Contracts in CI/CD Pipeline

4 min read
Building a 75,000-Product Image Feature Dataset for the Amazon ML Challenge 2025

4 min read
An Exploration of the Commercial Iceberg Catalog Ecosystem

14 min read
🧠 ClickHouse LEFT JOINs: Why join_use_nulls Matters

2 min read
Getting Started Building a Data Platform

3 min read
Building a Universal Lakehouse Catalog: Beyond Iceberg Tables

10 min read
Real-time Data Analytics at Scale: Integrating Apache Flink and Apache Doris with Flink Doris Connector and Flink CDC

10 min read
Chinese DBA's Story: Hu Zhonghao - The Journey of Becoming a DBA for Domestic Distributed Databases

7 min read
Optimizing Kafka Performance: Best Practices for High Throughput and Low Latency

7 min read
Tutorial: Intro to Apache Iceberg with Apache Polaris and Apache Spark

20 min read
Fixing Type Hints for Callable Objects with Custom Signatures in Dagster

3 min read
Learn Apache Spark the Easy Way

1 min read
Building a Test Data Platform After Watching Teams Secretly Use Production for Years

3 min read