DEV Community

# bigdata

Posts

PySpark: missing value

Comments
2 min read
AI enthusiasm #3 - AlphaFold2, a game-changer🧬

Comments
2 min read
GenAI Model Optimization: Guide to Fine-Tuning and Quantization

Comments
4 min read
Are There “Queries over Trillion-Row Tables in Seconds”? Is “N-Times Faster Than ORACLE” an Exaggeration?

Comments
4 min read
Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis

Comments
12 min read
The Role of Big Data Analytics in BFSI: Leveraging Data for Competitive Advantage

Comments
4 min read
Amazon EMR deployment on EKS

Comments
7 min read
SQL Pro Tips : industrial GCP BigQuery SQL using WITH

3
Comments
5 min read
SQL Pro Tips : industrial AWS Athena SQL using WITH

3
Comments
4 min read
Tools Every Data Scientist Should Know

Comments
2 min read
The Role of AI in Enhancing Data Governance Strategies

Comments
5 min read
What is Surrogate Key in SQL?

Comments
2 min read
Redis License Change: A Look at the Competitive Game between OSS and Cloud Computing Giants

Comments
5 min read
MWAA Plugins and Dependency Survival Guide

2
Comments
3 min read
SQL Pro Tips : industrial Oracle SQL using WITH

3
Comments
4 min read
How come there are tens of thousands of tables in a database

2
Comments 1
5 min read
Data Streaming Architecture

4
Comments
4 min read
Understanding the Battle of Database Storage: Row-Oriented vs. Columnar

1
Comments 1
6 min read
Leveraging API Management for Building Scalable Applications

Comments
4 min read
Data Science Landscape

Comments
1 min read
Why Python and SQL are Must-Have Skills for Marketing Analysts in the Age of Big Data

10
Comments
6 min read
BigQuery Machine Learning

2
Comments
5 min read
Big data with Software Systems

1
Comments
1 min read
Understanding Elasticsearch. A Guide for Beginners

1
Comments
4 min read
BigQuery best practices

1
Comments
2 min read
Serverless Apache Zeppelin on AWS

Comments
6 min read
How to use BigQuery Query Caching with Dynamic Wildcard Tables

Comments
2 min read
Supercharge Your S3 Data with AWS S3 Transfer Acceleration

1
Comments
3 min read
Building Robust Data Pipelines: A Comprehensive Guide

Comments
3 min read
Choosing the right AWS Database

5
Comments
4 min read
How to Scrape Flipkart Products

Comments
30 min read
AWS Lake Formation Summarization

3
Comments
3 min read
A major culprit in the slow running and collapse of a database

5
Comments
10 min read
Business Intelligence Data Analyst vs. BI Developer

2
Comments
3 min read
Here comes big data technology that rivals clusters on a single machine

6
Comments
6 min read
Test Driving Redshift AI-Driven Scaling

1
Comments
3 min read
How to store and calculate historical big data with lower usage frequency

6
Comments
4 min read
Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog

1
Comments
4 min read
Use Selenium with Python to Target the XPath of a Particular Object

Comments
9 min read
Simplifying ETL Pipelines with SQL: Three Tips for Data Processing

18
Comments
3 min read
🏆How to master 📊 Big Data pipelines with Taipy and PySpark 🐍

218
Comments 8
9 min read
Working with Parquet files in Java using Protocol Buffers

Comments
7 min read
IoT and Data Analytics: Unleashing the Power of Big Data

Comments 1
3 min read
Understanding Concurrency Through Amdahl's Law

1
Comments
3 min read
From Hadoop to Cloud: Why and How to Decouple Storage and Compute in Big Data Platforms

Comments
13 min read
Data Engineering Terminology: Understanding Upstream and Downstream in Data Pipelines

Comments
1 min read
Big data models 📊 vs. Computer memory 💾

186
Comments 3
11 min read
Working with Parquet files in Java using Avro

1
Comments
10 min read
BigData Journey from Hadoop and MapReduce to AWS EMR

Comments
9 min read
S3 Multi-Part Upload: Part 2 Conclusion

6
Comments
11 min read
Most common errors when setting up Amazon EMR

8
Comments
2 min read
HyperLogLog | An algorithm to count them all (approximately)

2
Comments
6 min read
Data-Powered Accessibility: How to Build Inclusive Product for Any User Need

48
Comments
7 min read
Install Hadoop on Ubuntu

1
Comments
6 min read
Which Scenarios Does ClickHouse Applies to?

5
Comments 1
9 min read
Connecting Multiple Kafka Clusters in ClickHouse Using Named Collections

4
Comments
3 min read
SPL computing performance test series: in-group accumulation

5
Comments
12 min read
Log Analysis: Elasticsearch VS Apache Doris

Comments
11 min read
SPL computing performance test series: funnel analysis

5
Comments
16 min read
SPL computing performance test series: associate tables and wide table

Comments
6 min read