DEV Community

# dataengineering

Posts

🌍 Automating Africa’s Energy Data Collection Using Python, Playwright (+ Why Playwright?), and MongoDB (2000–2024)

5 min read
From 8 Minutes to 40 Seconds: Solving Data Pipeline Deployment Bottlenecks with Git Sparse Checkout

5 min read
Create a Microsoft Fabric Lakehouse

6 min read
Core Concepts of Kafka

8 min read
From Kafka to Clean Tables: Building a Confluent Snowflake Pipeline with Streams & Tasks.

9 min read
Apache Kafka: ZooKeeper vs. KRaft — A Complete Comparison of Approaches

6 min read
Introduction to Apache Airflow

4 min read
Building a Production-Ready Data Lake: PostgreSQL to S3 with AWS DMS, Glue, and Athena using CDK

8 min read
From Postgres to Iceberg

11 min read
Real-Time Cryptocurrency Data Pipeline

12 min read
Synthetic Data for RAG: Safe Generation, Deduplication, and Drift-Aware Curation in 2025

10 min read
Personal Picks: Data Product News (October 1, 2025)

7 min read
1 billion JSON records, 1-second query response: Apache Doris vs. ClickHouse, Elasticsearch, and PostgreSQL

7 min read
SQL: is there a better way to code this?

1 min read
Building Real-Time Data Pipelines from PostgreSQL Using Flink CDC

5 min read