DEV Community

# bigdata

Posts

From Raw Claims and Clinical Data to PCORnet CDM: End-to-End ETL on Snowflake

7 min read
GSoC Student Crushes It! The Inside Story Behind the OIDC Upgrade for Apache DolphinScheduler

10 min read
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

2 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

2 min read
🔥 Day 5: Introduction to DataFrames - The Most Important Spark API

2 min read
Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling

2 min read
From Raw to Refined: Data Pipeline Architecture at Scale

12 min read
Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks

8 min read
Building Real-Time Lakehouse with S3 Tables, AWS Glue, and Apache Doris

3 min read
The Big Data Showdown: Apache Spark vs. Hadoop in 2026

4 min read
Starting My Dev.to Journey: Learning, Building & Sharing

1 min read
10x Query Performance Improvement: The Design and Implementation of the New Unique Key

30 min read
How Does Apache SeaTunnel Convert CDC Streams to Append-Only Mode?

4 min read
6 Essential Data Formats in Cloud Analytics: A Complete Guide with Examples

5 min read
Final Project Report 2 | Apache SeaTunnel Adds Metalake Support

4 min read