DEV Community

# bigdata

Posts

Data ingestion – definition, types and best practices
8 min read
How to Handle Databases with Billions of Records
1 min read
Effective Strategies for Scaling Databases: Enhancing Performance for Growing Data Needs
5 min read
Databricks - Variant Type Analysis
7 min read
Working with Parquet files in Java using Carpet
6 min read
Optimizing ETL Processes for Efficient Data Loading in EDWs
4 min read
Patient-Centered Care and Data Integration in Population Health Management
4 min read
The Basics of Big Data: What You Need to Know
3 min read
Why Apache Doris is the Best Open Source Alternative to Rockset
3 min read
Introduction to Apache Hadoop & MapReduce
3 min read
Blazingly-Fast Serialization: Apache Fury 0.5.1 released
3 min read
Metadata for win — Apache Parquet
5 min read
Comprehensive Guide to Schema Inference with MongoDB Spark Connector in PySpark
3 min read
Advanced Insights into Automated Data Processing Tools
4 min read
Real-Time Sentiment Analysis using PySpark and FastAPI
1 min read
Documenting Rate Limits and Throttling in REST APIs
5 min read
How to Build an API with Strong Security Measures
4 min read
GraphQL API Design Best Practices for Efficient Data Management
5 min read
The current Lakehouse is like a false proposition
10 min read
Is distributed technology the panacea for big data processing?
10 min read
What Should Be Followed While Scraping Data From Local Citations?
1 min read
Big Data: the tool we need.
2 min read
PySpark: missing value
2 min read
Cross-cluster replication for read-write separation
4 min read
Stream Data at scale from millions of sources with Amazon Kinesis (Serverless)
7 min read
Trino & Iceberg Made Easy: A Ready-to-Use Playground
3 min read
The Role of Data Integration in Healthcare Research and Precision Medicine
4 min read
Automating Data Processes for Efficiency and Accuracy
5 min read
Auto-increment columns in Apache Doris
11 min read
What to use parquet or CSV?
3 min read
Accelerating ETL Processes for Timely Business Intelligence
4 min read
Are There “Queries over Trillion-Row Tables in Seconds”? Is “N-Times Faster Than ORACLE” an Exaggeration?
4 min read
A glimpse into the future of data processing infrastructure.
9 min read
Safeguarding Data Quality By Addressing Data Privacy and Security Concerns
4 min read
Best Practices for Designing an Efficient ETL Pipeline
4 min read
The Role of Big Data Analytics in BFSI: Leveraging Data for Competitive Advantage
4 min read
LLMs, DevOps, and Big Data Musings
3 min read
Understanding and Mitigating Message Loss in Apache Kafka
9 min read
Snowflake 101: A Comprehensive Guide to the Data Cloud
4 min read
Blockchain Technology and Data Governance: Enhancing Security and Trust
4 min read
SQL Pro Tips : industrial AWS Athena SQL using WITH
4 min read
SQL Pro Tips : industrial GCP BigQuery SQL using WITH
5 min read
Tools Every Data Scientist Should Know
2 min read
AI enthusiasm #3 - AlphaFold2, a game-changer🧬
2 min read
Redis License Change: A Look at the Competitive Game between OSS and Cloud Computing Giants
5 min read
MWAA Plugins and Dependency Survival Guide
3 min read
GenAI Model Optimization: Guide to Fine-Tuning and Quantization
4 min read
What is Surrogate Key in SQL?
2 min read
SQL Pro Tips : industrial Oracle SQL using WITH
4 min read
How come there are tens of thousands of tables in a database
5 min read
Data Streaming Architecture
4 min read
Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis
12 min read
Amazon EMR deployment on EKS
7 min read
Understanding the Battle of Database Storage: Row-Oriented vs. Columnar
6 min read
The Role of AI in Enhancing Data Governance Strategies
5 min read
Why Python and SQL are Must-Have Skills for Marketing Analysts in the Age of Big Data
6 min read
Big data with Software Systems
1 min read
BigQuery Machine Learning
5 min read
Understanding Elasticsearch. A Guide for Beginners
4 min read
BigQuery best practices
2 min read