DEV Community

# dataengineering

Posts

Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling
2 min read
A Stranger In a New Town: CsvPath metadata fields
6 min read
Interesting links - November 2025
19 min read
💀 RIP Copy-Paste: Google NotebookLM Just Killed Manual Data Entry
3 min read
dupl
1 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 18–24, 2025)
5 min read
How to Sync Data from an Oracle Table to Elasticsearch using Kafka Connect
5 min read
Agent Cost Optimization: A Data Engineer's Guide
13 min read
From Raw to Refined: Data Pipeline Architecture at Scale
12 min read
INTRODUCTION TO DBT(Data Build Tool)
2 min read
What's New in dbt 1.10
12 min read
The Day Our Pipeline Went From 10 Minutes to 6 Seconds (Part 2 of 3)
7 min read
How Strategic Image Cropping Transforms Data Ingestion Pipelines
4 min read
Taming the Data Beast: Build Pipelines That Bend, Not Break by Arvind Sundararajan
2 min read
Snowflake + Postgres: A Small Feature That Signals a Big Shift
6 min read
Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks
8 min read
Prompt Engineering Patterns: From Zero-Shot to Chain-of-Thought Reasoning
14 min read
Why We Need Schema Registry in Kafka
17 min read
Introduction to the Confluent REST Proxy
4 min read
While We're Measuring Developer Productivity, Won't Someone Think of the Data Engineers?
9 min read
Analyzing and Optimizing a Parquet ClickHouse Ingestion Pipeline
3 min read
Behind the Scenes of Data Ingestion: How Small Issues Cause Big Headaches
3 min read
Modern Data Pipelines: Why Five Layers Changed Everything (Part 1 of 3)
6 min read
Azure Synapse Analytics
5 min read
Debugging Windows Race Conditions in Dagster
3 min read