DEV Community

# bigdata

Posts

Optimizing massive MongoDB inserts, load 50 million records faster by 33%!

15
Comments 1
12 min read
Docker Alternatives That Can Boost Your Productivity

1
Comments
4 min read
Building Apache Pinot and Presto

2
Comments
4 min read
What is dark data?

10
Comments
1 min read
Apache-Spark introduction for SQL developers

2
Comments
7 min read
Learning Big Data - Step by Step

2
Comments
1 min read
SeaTunnel Connector Access Plan

4
Comments
12 min read
Entrepreneurs must learn from Lord Ganesha!!!

6
Comments
2 min read
What is Big Data? Characteristics, types, and technologies

1
Comments
11 min read
Why we don’t use Spark

7
Comments
7 min read
Top Skills You Need in Testing Big Data projects

Comments
3 min read
Design Pattern of Streaming Enrichment

2
Comments
6 min read
Data Lake vs Data Warehouse

8
Comments
3 min read
Spark tip: Disable Coalescing Post Shuffle Partitions for compute intensive tasks

2
Comments 3
3 min read
Stream Processing Introduction

2
Comments 1
6 min read
How to run Amazon EMR Serverless with --packages flag

8
Comments 2
6 min read
The Relational DBs (RDB)

12
Comments 2
4 min read
The story behind Apache SeaTunnel’s evolving from a data integration component to an enterprise-level service

5
Comments
12 min read
Big Data Vs Small Data

7
Comments 1
2 min read
Learning Workflow Schedulers (Oozie)

2
Comments
5 min read
There will be 175 Zettabytes of data in the world by 2025. Where will we store it?

18
Comments 2
1 min read
How discord manage 300M socket connection

13
Comments
2 min read
Here is why you need a message broker

57
Comments 4
7 min read
How to filter columns in HBase Shell

5
Comments
3 min read
Visual task orchestration & Drag & Drop, Scaleph Data integration practice based on SeaTunnel

10
Comments
12 min read
The best Open-source lakehouse project, LakeSoul 2.0, supports snapshot, rollback, Flink, and Hive interconnection

9
Comments
5 min read
Creating a Subtitle Search Engine using the Stanford Parts of Speech Tagger

3
Comments
4 min read
Data Mesh: Scaling Delivery of Data as Product

4
Comments 1
9 min read
Introduction to Reinforcement Learning

5
Comments
7 min read
Data engineers must-see: The future trend of big data cloud services

8
Comments 1
8 min read
New release! Support for Kubernetes, multiple connectors added, SeaTunnel 2.1.2 is here!

5
Comments
4 min read
Best Practices for Successful Data Quality

5
Comments
3 min read
What's new in Apache Spark 3.3.0

8
Comments 1
4 min read
A New One-stop AI development and production platform, AlphaIDE

10
Comments
4 min read
Usage Guide:Quickly deploy an intelligent data platform with the One-stop AI development and production platform, AlphaIDE

8
Comments
3 min read
Data Pipelines with Apache Airflow - Book Review

8
Comments
2 min read
Why Big Data Analytics Is In The Big Picture in Banking Market?

9
Comments 2
4 min read
Solved a practical business problem when using Hudi: LakeSoul supports null field non-override semantics

7
Comments
3 min read
What is the Lakehouse, the latest Direction of Big Data Architecture?

9
Comments
10 min read
BigQuery transactions over multiple queries, with sessions

16
Comments 2
3 min read
Dynamic way doing ETL through Pyspark

16
Comments 2
4 min read
Auto discovering and auto actions in data monitoring or How to drink coffee instead of routine tasks

13
Comments
9 min read
May 9th in Streaming

6
Comments
1 min read
Build a real-time machine learning sample library using the best open-source project about big data and data lakehouse, LakeSoul

11
Comments
7 min read
Leveraging Change Data Capture for Fraud Detection using Arcion Cloud

7
Comments
9 min read
Apache Spark, Hive, and Spring Boot — Testing Guide

16
Comments 4
18 min read
Design concept of a best opensource project about big data and data lakehouse

9
Comments
9 min read
How to prepare for the GCP Professional Data Engineer certification

31
Comments 4
8 min read
Details of 4 best opensource projects about big data you should try out(Ⅰ)

8
Comments
5 min read
HIVE installation on WSL

10
Comments
3 min read
How to create a DIY Inexpensive Cloud Data Lake

8
Comments
3 min read
Create a Hadoop playground with Docker Desktop on Windows in minutes

10
Comments
4 min read
Quick use of CDC: A new demo from lakesoul makes it easier to set up the environment

8
Comments
5 min read
Big Data in Cloud Computing - AWS

14
Comments
2 min read
4 best opensource projects about big data you should try out

16
Comments 3
3 min read
A new unified streaming and batch table storage solution similar to iceberg/hudi/delta lake but with several new functions

8
Comments
2 min read
[OPINION] Building a Career as a Data Engineer

2
Comments
2 min read
Characteristics of Big Data

4
Comments
8 min read
Apache Spark Unit Testing Strategies

9
Comments
3 min read
NodeJS - Get data from Redash v6 API

6
Comments
2 min read