DEV Community

# dataengineering

Posts

Set up an open-source AI analyst for PostgreSQL in 2 minutes

1
Comments
5 min read
The Semantic Gap in Data Quality: Why Your Monitoring is Lying to You

1
Comments 1
7 min read
Building an Enterprise Patching Dashboard with AWS - A Complete Guide

5
Comments
5 min read
Building an Automated Data Pipeline: Injuries vs Performance in the Premier League

Comments
6 min read
2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI

Comments
9 min read
Evolution of Processing: SPL One-Click Acceleration for Log-to-Metric Conversion

Comments
6 min read
My First Data Engineering Project: Building a Real-Time IoT Pipeline on Azure

Comments
6 min read
The Data Engineer’s Codex: From First Principles to the Modern Lakehouse

6
Comments
10 min read
Building a Real-Time Data Lake on AWS: S3, Glue, and Athena in Production

1
Comments
5 min read
Embeddings and Vector Similarity: How Machines Understand Meaning

1
Comments
19 min read
Containerization for Data Engineering: A Practical Guide with Docker and Docker Compose

Comments
2 min read
Join OSA CON 2025: Two Days of Open‑Source Analytics and AI (Nov. 4–5)

Comments
3 min read
AWS Glue for ETL

Comments
5 min read
What to use for data preparation in report, query or analysis business?

5
Comments
10 min read
Optimizing Data Processing on AWS with Data Compaction

2
Comments
7 min read
Real-Time Earthquake CDC Pipeline

Comments
5 min read
The Offline Data Engineer: Building Resilient API Pipelines that Work on an Airplane

4
Comments
5 min read
Understanding Kafka Architecture, Schema Registry, ksqlDB, PostgreSQL, Couchbase, and Microservices

2
Comments
3 min read
The "Shift-Left" Imperative: Implementing Data Contracts in CI/CD Pipeline

Comments
4 min read
Building a 75,000-Product Image Feature Dataset for the Amazon ML Challenge 2025

1
Comments
4 min read
An Exploration of the Commercial Iceberg Catalog Ecosystem

Comments
14 min read
🧠 ClickHouse LEFT JOINs: Why join_use_nulls Matters

5
Comments
2 min read
Getting Started Building a Data Platform

Comments
3 min read
Building a Universal Lakehouse Catalog: Beyond Iceberg Tables

Comments
10 min read
Real-time Data Analytics at Scale: Integrating Apache Flink and Apache Doris with Flink Doris Connector and Flink CDC

Comments
10 min read