DEV Community

# bigdata

Posts

How to Choose Between Serverless and Dedicated Compute in Databricks

3 min read
build-my-own-datalake: Improve metadata with caching

19 min read
Part 1 | A Scheduler Is More Than Just a “Timer”

4 min read
Orchestrating Our Way Out of Chaos: How I Compared Airflow, Prefect, and Dagster (and Picked What to Ship)

6 min read
Part 3 | How Does Scheduling Actually “Start Running”?

5 min read
How to Implement Data Modelling in Power BI

2 min read
The future of Data Engineering in Databricks - From Pipelines to Intent

2 min read
Designing a Cross-Cloud Data Plane with Apache Iceberg

5 min read
How to Size a Spark Cluster. And How Not To.

6 min read
Arisyn: Rebuilding Data Relationship Discovery as Infrastructure

1 comment
3 min read
How I Built a Big Data Survival Guide - Because My Semester Was Not Surviving Me

1 comment
3 min read
From TB-Scale MongoDB to Doris: 5 Critical Challenges and Fixes with Apache SeaTunnel

9 min read
(I) An Overview of Data Warehouses and Data Lakes

4 min read
Fuzzy-match millions of rows in Databricks (2026)

5 min read
Overview of Real-Time Data Synchronization from PostgreSQL to VeloDB

5 min read