DEV Community
#bigdata
Real Time Data Infra Stack
ChunTing Wu · Dec 5 '22
#eventdriven #architecture #tutorial #bigdata
4 reactions · 6 min read
Example of applying CDC to JSON files with PySpark
romerito · Nov 30 '22
#cdc #spark #bigdata #deltalake
5 reactions · 1 comment · 7 min read
To study Apache Kafka Architecture in details, and how to install, deploy configure Apache kafka.
Ashwin Telmore · Nov 17 '22
#bigdata #apache #kafka #manual
4 reactions · 3 min read
How to create Stored Procedure in MySQL
The Dream Coding · Nov 13 '22
#mysql #sql #bigdata #database
2 reactions · 1 min read
How to use delimiter in MySQL
The Dream Coding · Nov 12 '22
#mysql #sql #database #bigdata
2 reactions · 1 min read
Apache Spark with java
J S SUNIL · Oct 29 '22
#apachespark #java #bigdata #spark
5 reactions · 5 min read
Playing PyFlink in a Nutshell
ChunTing Wu · Oct 24 '22
#bigdata #eventdriven #python #tutorial
8 reactions · 5 min read
Podcast with Josh Long on Apache Pulsar and Spring
Timothy Spann. 🇺🇦 · Sep 16 '22
#apachepulsar #spring #java #bigdata
3 reactions · 1 min read
Playing PyFlink from Scratch
ChunTing Wu · Oct 17 '22
#bigdata #tutorial #eventdriven #programming
2 reactions · 4 min read
Optimizing massive MongoDB inserts, load 50 million records faster by 33%!
Dmtro Harazdovskiy · Oct 16 '22
#mongodb #bigdata #node #performance
15 reactions · 1 comment · 12 min read
Docker Alternatives That Can Boost Your Productivity
James Wilson · Sep 22 '22
#cloud #docker #devops #bigdata
1 reaction · 4 min read
Building Apache Pinot and Presto
ChunTing Wu · Oct 10 '22
#bigdata #eventdriven #tutorial #programming
2 reactions · 4 min read
What is dark data?
Rita Carolina for Feministech · Oct 6 '22
#bigdata #braziliandevs #darkdata
10 reactions · 1 min read
Apache-Spark introduction for SQL developers
Cesar Mostacero · Sep 29 '22
#apachespark #dataengineering #beginners #bigdata
2 reactions · 7 min read
Learning Big Data - Step by Step
Areeba Farooq · Sep 27 '22
#bigdata #aws #hive #programming
2 reactions · 1 min read
SeaTunnel Connector Access Plan
Apache SeaTunnel · Sep 20 '22
#connectordevelopment #bigdata #datascience #programming
4 reactions · 12 min read
Entrepreneurs must learn from Lord Ganesha!!!
Arpit Shrivastava · Sep 1 '22
#bigdata #webdev #beginners #startup
6 reactions · 2 min read
What is Big Data? Characteristics, types, and technologies
Hunter Johnson for Educative · Sep 7 '22
#datascience #database #bigdata #tutorial
1 reaction · 11 min read
Why we don’t use Spark
Karel Vanden Bussche for Lighthouse · Sep 7 '22
#python #spark #googlecloud #bigdata
7 reactions · 7 min read
Top Skills You Need in Testing Big Data projects
Renee Betina Esperas · Aug 31 '22
#testing #bigdata
3 min read
Design Pattern of Streaming Enrichment
ChunTing Wu · Aug 29 '22
#eventdriven #bigdata #architecture #programming
2 reactions · 6 min read
Data Lake vs Data Warehouse
Muhammad Rameez · Aug 28 '22
#datascience #lake #difference #bigdata
8 reactions · 3 min read
Spark tip: Disable Coalescing Post Shuffle Partitions for compute intensive tasks
Artem Plotnikov · Aug 26 '22
#spark #performance #bigdata #machinelearning
2 reactions · 3 comments · 3 min read
Stream Processing Introduction
ChunTing Wu · Aug 22 '22
#eventdriven #bigdata #tutorial #architecture
2 reactions · 1 comment · 6 min read
How to run Amazon EMR Serverless with --packages flag
Neylson Crepalde for AWS Community Builders · Aug 18 '22
#aws #bigdata #spark #emrserverless
8 reactions · 2 comments · 6 min read
The Relational DBs (RDB)
Augusto Valdivia for AWS Community Builders · Aug 14 '22
#database #aws #terraform #bigdata
12 reactions · 2 comments · 4 min read
The story behind Apache SeaTunnel’s evolving from a data integration component to an enterprise-level service
Apache SeaTunnel · Aug 10 '22
#bigdata #service
5 reactions · 12 min read
Big Data Vs Small Data
Muhammad Rameez · Aug 5 '22
#bigdata #smalldata #hadoop #datamining
7 reactions · 1 comment · 2 min read
Learning Workflow Schedulers (Oozie)
Ruikai Li · Jul 29 '22
#bigdata #datascience #dataengineering
2 reactions · 5 min read
There will be 175 Zettabytes of data in the world by 2025. Where will we store it?
Augusto Valdivia for AWS Community Builders · Jul 18 '22
#awsdatabases #terraform #bigdata #aws
18 reactions · 2 comments · 1 min read
How discord manage 300M socket connection
Abdulrahman S. · Jul 15 '22
#discord #algorithms #programming #bigdata
13 reactions · 2 min read
Here is why you need a message broker
Memphis.dev team for Memphis.dev · Jul 7 '22
#beginners #architecture #opensource #bigdata
57 reactions · 4 comments · 7 min read
How to filter columns in HBase Shell
DataPotion · Jul 8 '22
#database #nosql #bigdata
5 reactions · 3 min read
Visual task orchestration & Drag & Drop, Scaleph Data integration practice based on SeaTunnel
Apache SeaTunnel · Jul 8 '22
#datascience #bigdata
10 reactions · 12 min read
The best Open-source lakehouse project, LakeSoul 2.0, supports snapshot, rollback, Flink, and Hive interconnection
DMetaSoul · Jul 8 '22
#opensource #bigdata #database #datascience
9 reactions · 5 min read
Creating a Subtitle Search Engine using the Stanford Parts of Speech Tagger
Paul Preibisch · Jun 2 '22
#bigdata #elasticsearch #programming
3 reactions · 4 min read
Data Mesh: Scaling Delivery of Data as Product
Gabriel Luz · Jun 30 '22
#datamesh #bigdata #datascience
4 reactions · 1 comment · 9 min read
Introduction to Reinforcement Learning
Bittsanalytics · Jun 29 '22
#machinelearning #datascience #ai #bigdata
5 reactions · 7 min read
Data engineers must-see: The future trend of big data cloud services
DMetaSoul · Jun 26 '22
#database #dataengineering #bigdata #opensource
8 reactions · 1 comment · 8 min read
New release! Support for Kubernetes, multiple connectors added, SeaTunnel 2.1.2 is here!
Apache SeaTunnel · Jun 21 '22
#bigdata #opensource #apache
5 reactions · 4 min read
Best Practices for Successful Data Quality
BPB Online · Jun 19 '22
#datascience #beginners #bigdata
5 reactions · 3 min read
What's new in Apache Spark 3.3.0
DataPotion · Jun 19 '22
#news #bigdata #scala #python
8 reactions · 1 comment · 4 min read
A New One-stop AI development and production platform, AlphaIDE
DMetaSoul · Jun 15 '22
#ai #machinelearning #bigdata #opensource
10 reactions · 4 min read
Usage Guide: Quickly deploy an intelligent data platform with the One-stop AI development and production platform, AlphaIDE
DMetaSoul · Jun 15 '22
#machinelearning #ai #bigdata #productivity
8 reactions · 3 min read
Data Pipelines with Apache Airflow - Book Review
Albert Ulysses · Jun 13 '22
#python #dataengineering #books #bigdata
8 reactions · 2 min read
Why Big Data Analytics Is In The Big Picture in Banking Market?
Henny Jones · Jun 10 '22
#bigdata #banking #programming
9 reactions · 2 comments · 4 min read
Solved a practical business problem when using Hudi: LakeSoul supports null field non-override semantics
DMetaSoul · May 29 '22
#opensource #database #dataengineering #bigdata
7 reactions · 3 min read
What is the Lakehouse, the latest Direction of Big Data Architecture?
DMetaSoul · May 14 '22
#opensource #dataengineering #bigdata #database
9 reactions · 10 min read
BigQuery transactions over multiple queries, with sessions
matthieucham for Stack Labs · May 9 '22
#googlecloud #python #database #bigdata
18 reactions · 2 comments · 3 min read
Dynamic way doing ETL through Pyspark
mustafasajid · May 9 '22
#pyspark #etl #bigdata #python
16 reactions · 2 comments · 4 min read
Auto discovering and auto actions in data monitoring or How to drink coffee instead of routine tasks
ricklatham · May 9 '22
#monitoring #machinelearning #bigdata #devops
13 reactions · 9 min read
May 9th in Streaming
Timothy Spann. 🇺🇦 · May 9 '22
#apache #apachepulsar #realtimestreaming #bigdata
6 reactions · 1 min read
Build a real-time machine learning sample library using the best open-source project about big data and data lakehouse, LakeSoul
DMetaSoul · May 6 '22
#opensource #datascience #bigdata #database
11 reactions · 7 min read
Leveraging Change Data Capture for Fraud Detection using Arcion Cloud
John Vester · May 3 '22
#tutorial #bigdata #cloud #datascience
7 reactions · 9 min read
Apache Spark, Hive, and Spring Boot — Testing Guide
Semyon Kirekov · Apr 22 '22
#bigdata #testing #java #docker
17 reactions · 4 comments · 18 min read
Design concept of a best opensource project about big data and data lakehouse
DMetaSoul · Apr 16 '22
#opensource #dataengineering #bigdata #datascience
9 reactions · 9 min read
How to prepare for the GCP Professional Data Engineer certification
Gabriel Luz · May 2 '22
#googlecloud #dataengineering #gcp #bigdata
33 reactions · 4 comments · 8 min read
Details of 4 best opensource projects about big data you should try out (I)
DMetaSoul · Apr 7 '22
#opensource #dataengineering #bigdata #spark
8 reactions · 5 min read
HIVE installation on WSL
Anuj Vaghani · Apr 1 '22
#hadoop #hive #bigdata
11 reactions · 3 min read
How to create a DIY Inexpensive Cloud Data Lake
Eric See · Mar 26 '22
#python #datascience #design #bigdata
8 reactions · 3 min read