DEV Community

# bigdata

Posts

How to Scrape Flipkart Products

Comments
30 min read
A major culprit in the slow running and collapse of a database

5
Comments
10 min read
AWS Lake Formation Summarization

3
Comments
3 min read
Here comes big data technology that rivals clusters on a single machine

6
Comments
6 min read
Test Driving Redshift AI-Driven Scaling

1
Comments
3 min read
Building Robust Data Pipelines: A Comprehensive Guide

4
Comments
3 min read
How to store and calculate historical big data with lower usage frequency

6
Comments
4 min read
Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog

4
Comments
4 min read
Use Selenium with Python to Target the XPath of a Particular Object

Comments
9 min read
Simplifying ETL Pipelines with SQL: Three Tips for Data Processing

19
Comments
3 min read
🏆How to master 📊 Big Data pipelines with Taipy and PySpark 🐍

219
Comments 8
9 min read
Working with Parquet files in Java using Protocol Buffers

Comments
7 min read
IoT and Data Analytics: Unleashing the Power of Big Data

Comments 1
3 min read
Understanding Concurrency Through Amdahl's Law

3
Comments
3 min read
From Hadoop to Cloud: Why and How to Decouple Storage and Compute in Big Data Platforms

Comments
13 min read
Data Engineering Terminology: Understanding Upstream and Downstream in Data Pipelines

1
Comments
1 min read
Big data models 📊 vs. Computer memory 💾

187
Comments 3
11 min read
Working with Parquet files in Java using Avro

1
Comments
10 min read
Business Intelligence Data Analyst vs. BI Developer

3
Comments
3 min read
Cloud Data Analytics: A Journey to Actionable Insights & Data-driven Success

Comments
2 min read
BigData Journey from Hadoop and MapReduce to AWS EMR

Comments
9 min read
S3 Multi-Part Upload: Part 2 Conclusion

3
Comments
11 min read
Most common errors when setting up Amazon EMR

6
Comments
2 min read
15 top AI tools for marketing, infrastructure, and LLMOps

Comments
3 min read
Bridging Data and Marketing in the AI Arena: My Journey

Comments
2 min read
HyperLogLog | An algorithm to count them (approximately) all

2
Comments
6 min read
Install Hadoop on Ubuntu

3
Comments
6 min read
Which Scenarios Does ClickHouse Applies to?

5
Comments 1
9 min read
Data-Powered Accessibility: How to Build Inclusive Product for Any User Need

48
Comments
7 min read
SPL computing performance test series: in-group accumulation

5
Comments
12 min read
Log Analysis: Elasticsearch VS Apache Doris

2
Comments
11 min read
SPL computing performance test series: funnel analysis

5
Comments
16 min read
SPL computing performance test series: position association

1
Comments
12 min read
SPL computing performance test series: multi-index aggregating

1
Comments
6 min read
Connecting Multiple Kafka Clusters in ClickHouse Using Named Collections

7
Comments
3 min read
SPL computing performance test series: associate tables and wide table

Comments
6 min read
Leveraging AI in Education: Exploring Big Data and Related Applications

Comments
11 min read
GlusterFS vs. JuiceFS

Comments
7 min read
50%+ Cut in Both Storage & Compute Costs: Designing NetEase Games' Cloud Big Data Platform

Comments
9 min read
What is '_spark_metadata' Directory in Spark Structured Streaming ?

2
Comments
3 min read
SQL is consuming the lives of data scientists

6
Comments 3
20 min read
⛏ Get Mining into Data with These Top 5 Resources

5
Comments 2
6 min read
Data warehouse with “no house” performs better than the one with “the house”

1
Comments
11 min read
Is Your Latest Data Really the Latest? Check the Data Update Mechanism of Your Database

2
Comments 1
6 min read
Introduction to Big-data

2
Comments 2
3 min read
The performance problems of data warehouse and solutions

Comments
14 min read
Snowflake: Revolutionizing data warehousing

3
Comments 1
6 min read
Next Big Data System

Comments
1 min read
Listen to That Poor BI Engineer: We Need Fast Joins

Comments
5 min read
Data warehouse running on file system

Comments
9 min read
Apache Doris 2.0 Beta Now Available: Faster, Stabler, and More Versatile

Comments
15 min read
3 Data Observability Tools

Comments
3 min read
Spark AI - Bringing Chat GPT to Data Engineering

11
Comments 1
5 min read
Why Are There So Many Snapshot Tables in BI Systems?

5
Comments
9 min read
Why does wide table prevail?

5
Comments
13 min read
Open-source SPL: The Breaker of Closed Database Computing System

Comments 1
8 min read
Bulk load to Elastic Search with PySpark

6
Comments
2 min read
Routable computing engine implements front-end database

Comments
5 min read
How does the in-memory database bring memory’s advantage into play?

Comments
12 min read
How to clone tables in BigQuery

2
Comments
1 min read