DEV Community

# dataengineering

Posts

🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally
2 min read
Marmot: Data catalog without the complex infrastructure
3 min read
Schema, COPY, MERGE, and Immutability — A First-Principles Guide for Data Engineers
5 min read
HackerRank 'The Pads' MySQL
3 min read
🔥 Day 5: Introduction to DataFrames - The Most Important Spark API
2 min read
Generating Table Schema from AWS Glue Table
1 min read
Comparing Great Expectations and CsvPath Framework
8 min read
Financial Transaction Data Reconciler PayPal
5 min read
Introducing dremioframe - A Pythonic DataFrame Interface for Dremio
9 min read
Stifel Modern Data Platform
4 min read
Building Bulletproof Data Pipelines: Orchestration, Testing, and Monitoring (Part 3 of 3)
9 min read
Core Microsoft Fabric Concepts
3 min read
Implementing a CDC pipeline with Debezium
8 min read
LogInSight: A Lightweight CloudWatch Log Analytics Tool for Faster Debugging and Real-Time Insights
3 min read
RAG Evaluation Metrics: Measuring What Actually Matters
10 min read
Building Streaming Iceberg Tables for Real-Time Logistics Analytics
4 min read
Building a Scalable Community Health Worker Analytics Platform: My Journey with dbt and Snowflake
4 min read
The Great Table Format Debate: A Deep Dive into Apache Iceberg, Delta Lake, and Apache Hudi
18 min read
Amazon Kinesis vs Amazon MSK: The Complete Guide for Stream Processing on AWS
29 min read
Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling
2 min read
A Stranger In a New Town: CsvPath metadata fields
6 min read
Interesting links - November 2025
19 min read
💀 RIP Copy-Paste: Google NotebookLM Just Killed Manual Data Entry
3 min read
dupl
1 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 18–24, 2025)
5 min read