DEV Community

# bigdata

Posts

Data Lake vs Data Warehouse

9
Comments
2 min read
Life Beyond Kafka with Apache Pulsar

19
Comments
4 min read
Explain MapReduce Like I'm Five

8
Comments
5 min read
Toward GCP Data Engineer certification

9
Comments
1 min read
Azure Blob Storage with Pyspark

12
Comments 1
2 min read
Building simple data pipelines in Azure using Cosmos DB, Databricks and Blob Storage

10
Comments
15 min read
How to handle BigData?

4
Comments 4
2 min read
Big Data file formats explained

10
Comments
7 min read
Spark. Anatomy of Spark application

15
Comments
6 min read
Categorical Variables and Cardinality

5
Comments
1 min read
Event Tracking and Analytics via Ruby on Rails, DynamoDB (with Streams), Kinesis Firehose and Athena and CloudWatch Dashboard!

88
Comments
13 min read
Book on Advanced Data Structures and Algorithms for Big Data Applications

9
Comments
3 min read
Data Engineering — Complete Reference Guide From A-Z [2019]

30
Comments
16 min read
MongoDB Atlas Data Lake

10
Comments
5 min read
How we built a highly scalable distributed state machine

9
Comments
16 min read
PySpark and Parquet - Analysis

14
Comments
3 min read
Creating a proof of concept for Spatial Joins

4
Comments
4 min read
Understanding Partitioning in Azure Cosmos DB

5
Comments 4
5 min read
Extending Business Intelligence Features of Kibana

22
Comments 1
4 min read
Top 5 Online Courses to Learn Big Data and Hadoop for Beginners

68
Comments
10 min read
Multiple databases in Big Data projects

7
Comments
4 min read
Basic introduction to Big data

14
Comments
3 min read
5 Best Practices for Setting Up Your Data Warehouse in the Cloud

6
Comments
6 min read
Building Hadoop native libraries on Mac in 2019

13
Comments 18
5 min read
Kafka Monitoring in Production - eBook

10
Comments
1 min read
Data lakes are hard

17
Comments
4 min read
Become a Pro at Pandas, Python’s data manipulation Library

10
Comments
6 min read
Kafka Getting Started - Kafka Series - Part 2

15
Comments
4 min read
How Apache Kafka works? Kafka Series - Part 1

17
Comments 4
3 min read
[Cheat sheet] Apache Spark: structure of a Spark application

6
Comments
2 min read
Learn BigData from Google Cloud Platform.

11
Comments
2 min read
Installing, Configuring and Using the Azure Databricks CLI

8
Comments
3 min read
Different ways to word count in apache spark

10
Comments
2 min read
How to Deal with Big Data Analytics Easily?

8
Comments
9 min read
What is the Future of Big Data Analytics and Hadoop?

8
Comments
2 min read
Google BigQuery's Python SDK: Creating Tables Programmatically

14
Comments
7 min read
How to Process Epic Amounts of Data in NodeJS

108
Comments 1
6 min read
From CSVs to Tables: Infer Schema Data Types From Raw Spreadsheets

7
Comments
8 min read
Wielding the power of web transparency

15
Comments 1
9 min read
[Video] Visualizing data at scale with Google Data Studio

7
Comments
1 min read
Apache Hadoop - TLS and SSL Notes

9
Comments
4 min read
Big Data Analysis with Hadoop, Spark, and R Shiny

31
Comments 1
12 min read
Processing Streaming Twitter Data using Kafka and Spark - Part 2: Creating Kafka Twitter producer

21
Comments 5
7 min read
Processing Streaming Twitter Data using Kafka and Spark — Part 1: Setting Up Kafka Cluster

18
Comments
4 min read
Processing Streaming Twitter Data using Kafka and Spark — The Plan

11
Comments
2 min read
Streams For the Win: A Performance Comparison of Node.js Methods for Reading Large Datasets (Pt 2)

5
Comments
9 min read
Window Functions in Stream Analytics

25
Comments 5
9 min read
What makes code slow to execute

14
Comments
1 min read
Amazon Athena vs AWS Lambda: Comparing two solutions for Big Data Analysis

22
Comments 5
8 min read
Super simple and fast delimited CSV data normalization with AWK

10
Comments
2 min read
Streaming Data in Databricks Delta Tables

14
Comments 3
3 min read
Managing and Configuring Clusters within Azure Databricks

11
Comments
9 min read
Databases and Tables in Azure Databricks

13
Comments
5 min read
What Is MapReduce?

47
Comments 3
7 min read
Biomedical Big Data: From Centralized Governance to Public Participation

18
Comments
2 min read
Expertise and context-based answer rating system for Q&A websites.

7
Comments
1 min read
Local hadoop on laptop for practice

20
Comments
4 min read
Apache Livy - Apache Spark, HDFS, and Kerberos

14
Comments
2 min read
Using Hadoop in Azure HDInsight to process Big Data

13
Comments
6 min read
Apache HBase - REST API - Atomic Operations

9
Comments
6 min read