DEV Community

# bigdata

Posts

3 Ways To Improve Your Data Science Teams Efficiency

17
Comments
7 min read
Apache Spark Java Tutorial: Simplest Guide to Get Started

7
Comments
3 min read
Simulate IoT sensor, use Kafka to process data in real-time, save to Elasticsearch

15
Comments
4 min read
Change Data Capture from PostgreSQL to Azure Data Explorer using Kafka Connect

8
Comments
17 min read
S3 vs HDFS

3
Comments 3
1 min read
Top Hadoop Interview Questions

5
Comments
2 min read
Predicting machine failures with distributed computing (Spark, AWS EMR, and DL)

9
Comments
10 min read
Introduction to Data Pipelines

2
Comments 1
4 min read
Enterprise Digital Transformation Guide in the Post Covid World

2
Comments 1
4 min read
Dark Data and why it matters in Big Data

2
Comments
3 min read
Please ELI5 big data and privacy concerns, and possible black hacks

2
Comments 3
1 min read
Demystify Apache Spark with Azure Synapse Analytics

6
Comments
1 min read
MLOps

5
Comments
2 min read
Spark Journey begins...

8
Comments
3 min read
Data Ingestion into Azure Data Explorer using Kafka Connect on Kubernetes

8
Comments 1
12 min read
Data Scraping and Data Crawling, what are they for?

6
Comments 1
5 min read
Transform AWS CloudTrail data using AWS Data Wrangler

3
Comments
8 min read
Working with nested structures in Spark

6
Comments 1
3 min read
Guide - AWS Glue and PySpark

26
Comments
14 min read
Introduction to Apache Spark

10
Comments
6 min read
Kafka Connect in 60 seconds

4
Comments
2 min read
Data Governance 101

6
Comments
4 min read
Big Data - Testing Strategy

2
Comments
1 min read
Supply Chain Risk Management with Data Analytics

2
Comments
2 min read
Tutorial: How to Ingest data from Kafka into Azure Data Explorer

12
Comments
10 min read
Streaming data into Kafka S01/E02 - Loading XML file

2
Comments 2
10 min read
Unit Testing Apache Spark Structured Streaming using MemoryStream

7
Comments
4 min read
Exploiting Schema Inference in Apache Spark

2
Comments
3 min read
Apache Kafka WebSocket data ingestion using Spring Cloud Stream

2
Comments
6 min read
Data & Information

5
Comments
4 min read
How to use Azure Go SDK to manage Azure Data Explorer clusters

6
Comments
9 min read
Tutorial: Getting started with Azure Data Explorer using the Go SDK

13
Comments
9 min read
How to create a low-cost Apache Spark cluster on Microsoft Azure

7
Comments
4 min read
Hadoop vs Spark: Which is a better framework to select for processing Big Data?

6
Comments
5 min read
Why are we building DevOps platform for Big Data?

3
Comments
3 min read
The Big Data Bravura: Introducing Apache Spark

21
Comments 2
3 min read
5 Reasons Why You Should Consider Presenting at Flink Forward Global Virtual 2020

10
Comments 1
3 min read
Introduction to Hive for dummies [Module1.3]

12
Comments
10 min read
Get Started with BigData for dummies [Module 1.1]

8
Comments 6
10 min read
Building a Spark cluster with two PCs and a Raspberry Pi.

7
Comments
5 min read
On.NET Episode: Scaling .NET for Apache Spark processing jobs

7
Comments
1 min read
On.NET Episode: Data processing with .NET for Apache Spark

7
Comments
1 min read
Migrate From Hadoop To Apache Spark

3
Comments
1 min read
How to compare your data in/with Spark

6
Comments
6 min read
Deep Data Dive with Kusto for Azure Data Explorer and Log Analytics

17
Comments 1
7 min read
How Can Organizations Ensure the Success of Their Customer Master Data Management Initiatives?

4
Comments
5 min read
Immersive Big Data Visualization

6
Comments
1 min read
Install Hadoop in linux (Debian) for Big Data Analysis

8
Comments 1
3 min read
An Upgrade: Part 2 — Diving Deeper into DynamoDB

6
Comments
6 min read
On the Newcomb-Benford Law and Its Relationship with Mathematics

3
Comments
3 min read
The 5-minute guide to using bucketing in Pyspark

9
Comments 5
4 min read
spark-submit command builder with live preview

8
Comments
1 min read
Database normalization may be harmful to efficiency on large scale analytics projects.

12
Comments 2
2 min read
AWS Certified Big Data: Specialty study blueprint

16
Comments
18 min read
My Databricks article compilation of 2019

6
Comments
2 min read
Converting CSV to ORC/Parquet fast without a cluster!

7
Comments
6 min read
Cloud Data Fusion, a game-changer for GCP

12
Comments 7
4 min read
6 big data trends and forecasts worthy of attention in 2020

5
Comments
3 min read
Multi-Class Image Classification With Transfer Learning In PySpark

10
Comments
9 min read
AWS: Redshift – quick start and SQL-workbench connection configuration

13
Comments
4 min read