DEV Community

# bigdata

Posts

👋 Sign in for the ability to sort posts by relevant, latest, or top.
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

Comments
2 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

Comments
2 min read
🔥 Day 5: Introduction to DataFrames - The Most Importantce of Spark API

🔥 Day 5: Introduction to DataFrames - The Most Importantce of Spark API

Comments
2 min read
Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling

Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling

Comments
2 min read
2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow

2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow

Comments
6 min read
From Raw to Refined: Data Pipeline Architecture at Scale

From Raw to Refined: Data Pipeline Architecture at Scale

Comments
12 min read
Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks

Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks

Comments
8 min read
Building Real-Time Lakehouse with S3 Tables, AWS Glue, and Apache Doris

Building Real-Time Lakehouse with S3 Tables, AWS Glue, and Apache Doris

Comments
3 min read
Starting My Dev.to Journey: Learning, Building & Sharing

Starting My Dev.to Journey: Learning, Building & Sharing

Comments
1 min read
10x Query Performance Improvement: The Design and Implementation of the New Unique Key

10x Query Performance Improvement: The Design and Implementation of the New Unique Key

Comments
30 min read
How Does Apache SeaTunnel Convert CDC Streams to Append-Only Mode?

How Does Apache SeaTunnel Convert CDC Streams to Append-Only Mode?

Comments
4 min read
6 Essential Data Formats in Cloud Analytics: A Complete Guide with Examples

6 Essential Data Formats in Cloud Analytics: A Complete Guide with Examples

Comments
5 min read
Final Project Report 2| Apache SeaTunnel Adds Metalake Support

Final Project Report 2| Apache SeaTunnel Adds Metalake Support

Comments
4 min read
Overview of Real-Time Data Synchronization from PostgreSQL to VeloDB

Overview of Real-Time Data Synchronization from PostgreSQL to VeloDB

1
Comments
6 min read
Final Project Report 1: Schema Evolution Support on Apache SeaTunnel Flink Engine

Final Project Report 1: Schema Evolution Support on Apache SeaTunnel Flink Engine

Comments
4 min read
👋 Sign in for the ability to sort posts by relevant, latest, or top.