DEV Community

# dataengineering

Posts

Real-Time Earthquake CDC Pipeline

Comments
5 min read
🐝 Why Hive Exists - And Why Its Complexity Is Actually Necessary

2
Comments
3 min read
The "Shift-Left" Imperative: Implementing Data Contracts in CI/CD Pipeline

Comments
4 min read
Building a 75,000-Product Image Feature Dataset for the Amazon ML Challenge 2025

1
Comments
4 min read
An Exploration of the Commercial Iceberg Catalog Ecosystem

Comments
14 min read
🧠 ClickHouse LEFT JOINs: Why join_use_nulls Matters

6
Comments
2 min read
Getting Started Building a Data Platform

Comments
3 min read
Building a Universal Lakehouse Catalog: Beyond Iceberg Tables

Comments
10 min read
Real-time Data Analytics at Scale: Integrating Apache Flink and Apache Doris with Flink Doris Connector and Flink CDC

Comments
10 min read
Elusion v8.0.0 is the best END-TO-END Data Engineering library written in RUST

2
Comments
2 min read
Chinese DBA's Story: Hu Zhonghao - The Journey of Becoming a DBA for Domestic Distributed Databases

Comments
7 min read
Optimizing Kafka Performance: Best Practices for High Throughput and Low Latency

Comments
7 min read
Tutorial: Intro to Apache Iceberg with Apache Polaris and Apache Spark

Comments
20 min read
Fixing Type Hints for Callable Objects with Custom Signatures in Dagster

4
Comments
3 min read
Learning Apache Spark the Easy Way

1
Comments
1 min read
Building a Test Data Platform After Watching Teams Secretly Use Production for Years

1
Comments
3 min read
Chinese DBA's Story: Sui Haifeng - Grasp the two most important five-year periods of your career

Comments
5 min read
The New Era for Data Engineers Brought About by Snowflake's Autonomous Services, Part 2

Comments
1 min read
Realtime Data Streaming Platform: Building a Unified Monitoring Stack

5
Comments
8 min read
The State of Apache Iceberg, Polaris, and Arrow: October–November 2025

2
Comments
7 min read
Real-Time Data Streaming Platform: From 140K to 1 Million Messages/Sec - A Flink Performance Tuning Journey

1
Comments
10 min read
Real-Time Streaming Platform with Pulsar, Flink & ClickHouse

5
Comments
6 min read
🎓 Building a Smart LMS Assistant: RAG System with Pinecone for Multi-Source Learning Data

Comments
3 min read
Big Data Processing (Hadoop, Spark)

2
Comments
5 min read
Building a Clean Energy Data Pipeline for Africa (from raw CSVs to MongoDB)

Comments
1 min read