DEV Community

# bigdata

Posts

Why Parquet Is Everywhere - And What Makes It Actually Fast?

2
Comments
3 min read
Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks

Comments
8 min read
Code for a Better Planet: Hacking UN SDGs 7-12 with Big Data

4
Comments
7 min read
Drips to Data Streams: Hacking Water Scarcity with IoT & Big Data

Comments
6 min read
🔥 Day 5: Introduction to DataFrames - The Most Important Spark API

Comments
2 min read
Beyond Tagging: A Blueprint for Real-Time Cost Attribution in Data Platforms

Comments
9 min read
From Raw to Refined: Data Pipeline Architecture at Scale

Comments
12 min read
Fueling Climate Action with Code: A Dev's Guide to First, Second, and Third-Party Data

Comments
7 min read
Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers

Comments
2 min read
Day 12: UDF vs Pandas UDF

Comments
2 min read
10x Query Performance Improvement: The Design and Implementation of the New Unique Key

Comments
30 min read
Blockchain Analytics: Exploring Ethereum Data with BigQuery, RAG, and AI

1
Comments 1
1 min read
Overview of Real-Time Data Synchronization from PostgreSQL to VeloDB

Comments
6 min read
Spark & Scala Cache Lessons from ETL Project

2
Comments 1
3 min read
How to build real-time user-facing analytics with Kafka + Flink + Doris

4
Comments
9 min read