DEV Community
#bigdata
Optimizing massive MongoDB inserts, load 50 million records faster by 33%!
Dmtro Harazdovskiy · Oct 16 '22
#mongodb #bigdata #node #performance
15 reactions · 1 comment · 12 min read
Docker Alternatives That Can Boost Your Productivity
James Wilson · Sep 22 '22
#cloud #docker #devops #bigdata
1 reaction · 4 min read
Building Apache Pinot and Presto
ChunTing Wu · Oct 10 '22
#bigdata #eventdriven #tutorial #programming
2 reactions · 4 min read
What is dark data? (original title: "O que é dark data?")
Rita Carolina for Feministech · Oct 6 '22
#bigdata #braziliandevs #darkdata
10 reactions · 1 min read
Apache-Spark introduction for SQL developers
Cesar Mostacero · Sep 29 '22
#apachespark #dataengineering #beginners #bigdata
2 reactions · 7 min read
Learning Big Data - Step by Step
Areeba Farooq · Sep 27 '22
#bigdata #aws #hive #programming
2 reactions · 1 min read
SeaTunnel Connector Access Plan
Apache SeaTunnel · Sep 20 '22
#connectordevelopment #bigdata #datascience #programming
4 reactions · 12 min read
Entrepreneurs must learn from Lord Ganesha!!!
Arpit Shrivastava · Sep 1 '22
#bigdata #webdev #beginners #startup
6 reactions · 2 min read
What is Big Data? Characteristics, types, and technologies
Hunter Johnson for Educative · Sep 7 '22
#datascience #database #bigdata #tutorial
1 reaction · 11 min read
Why we don’t use Spark
Karel Vanden Bussche for Lighthouse · Sep 7 '22
#python #spark #googlecloud #bigdata
7 reactions · 7 min read
Top Skills You Need in Testing Big Data projects
Renee Betina Esperas · Aug 31 '22
#testing #bigdata
3 min read
Design Pattern of Streaming Enrichment
ChunTing Wu · Aug 29 '22
#eventdriven #bigdata #architecture #programming
2 reactions · 6 min read
Data Lake vs Data Warehouse
Muhammad Rameez · Aug 28 '22
#datascience #lake #difference #bigdata
8 reactions · 3 min read
Spark tip: Disable Coalescing Post Shuffle Partitions for compute intensive tasks
Artem Plotnikov · Aug 26 '22
#spark #performance #bigdata #machinelearning
2 reactions · 3 comments · 3 min read
Stream Processing Introduction
ChunTing Wu · Aug 22 '22
#eventdriven #bigdata #tutorial #architecture
2 reactions · 1 comment · 6 min read
How to run Amazon EMR Serverless with --packages flag
Neylson Crepalde for AWS Community Builders · Aug 18 '22
#aws #bigdata #spark #emrserverless
8 reactions · 2 comments · 6 min read
The Relational DBs (RDB)
Augusto Valdivia for AWS Community Builders · Aug 14 '22
#database #aws #terraform #bigdata
12 reactions · 2 comments · 4 min read
The story behind Apache SeaTunnel’s evolving from a data integration component to an enterprise-level service
Apache SeaTunnel · Aug 10 '22
#bigdata #service
5 reactions · 12 min read
Big Data Vs Small Data
Muhammad Rameez · Aug 5 '22
#bigdata #smalldata #hadoop #datamining
7 reactions · 1 comment · 2 min read
Learning Workflow Schedulers (Oozie)
Ruikai Li · Jul 29 '22
#bigdata #datascience #dataengineering
2 reactions · 5 min read
There will be 175 Zettabytes of data in the world by 2025. Where will we store it?
Augusto Valdivia for AWS Community Builders · Jul 18 '22
#awsdatabases #terraform #bigdata #aws
18 reactions · 2 comments · 1 min read
How discord manage 300M socket connection
Abdulrahman S. · Jul 15 '22
#discord #algorithms #programming #bigdata
13 reactions · 2 min read
Here is why you need a message broker
Memphis.dev team for Memphis.dev · Jul 7 '22
#beginners #architecture #opensource #bigdata
57 reactions · 4 comments · 7 min read
How to filter columns in HBase Shell
DataPotion · Jul 8 '22
#database #nosql #bigdata
5 reactions · 3 min read
Visual task orchestration & Drag & Drop, Scaleph Data integration practice based on SeaTunnel
Apache SeaTunnel · Jul 8 '22
#datascience #bigdata
10 reactions · 12 min read
The best Open-source lakehouse project, LakeSoul 2.0, supports snapshot, rollback, Flink, and Hive interconnection
DMetaSoul · Jul 8 '22
#opensource #bigdata #database #datascience
9 reactions · 5 min read
Creating a Subtitle Search Engine using the Stanford Parts of Speech Tagger
Paul Preibisch · Jun 2 '22
#bigdata #elasticsearch #programming
3 reactions · 4 min read
Data Mesh: Scaling Delivery of Data as Product
Gabriel Luz · Jun 30 '22
#datamesh #bigdata #datascience
4 reactions · 1 comment · 9 min read
Introduction to Reinforcement Learning
Bittsanalytics · Jun 29 '22
#machinelearning #datascience #ai #bigdata
5 reactions · 7 min read
Data engineers must-see: The future trend of big data cloud services
DMetaSoul · Jun 26 '22
#database #dataengineering #bigdata #opensource
8 reactions · 1 comment · 8 min read
New release! Support for Kubernetes, multiple connectors added, SeaTunnel 2.1.2 is here!
Apache SeaTunnel · Jun 21 '22
#bigdata #opensource #apache
5 reactions · 4 min read
Best Practices for Successful Data Quality
BPB Online · Jun 19 '22
#datascience #beginners #bigdata
5 reactions · 3 min read
What's new in Apache Spark 3.3.0
DataPotion · Jun 19 '22
#news #bigdata #scala #python
8 reactions · 1 comment · 4 min read
A New One-stop AI development and production platform, AlphaIDE
DMetaSoul · Jun 15 '22
#ai #machinelearning #bigdata #opensource
10 reactions · 4 min read
Usage Guide: Quickly deploy an intelligent data platform with the One-stop AI development and production platform, AlphaIDE
DMetaSoul · Jun 15 '22
#machinelearning #ai #bigdata #productivity
8 reactions · 3 min read
Data Pipelines with Apache Airflow - Book Review
Albert Ulysses · Jun 13 '22
#python #dataengineering #books #bigdata
8 reactions · 2 min read
Why Big Data Analytics Is In The Big Picture in Banking Market?
Henny Jones · Jun 10 '22
#bigdata #banking #programming
9 reactions · 2 comments · 4 min read
Solved a practical business problem when using Hudi: LakeSoul supports null field non-override semantics
DMetaSoul · May 29 '22
#opensource #database #dataengineering #bigdata
7 reactions · 3 min read
What is the Lakehouse, the latest Direction of Big Data Architecture?
DMetaSoul · May 14 '22
#opensource #dataengineering #bigdata #database
9 reactions · 10 min read
BigQuery transactions over multiple queries, with sessions
matthieucham for Stack Labs · May 9 '22
#googlecloud #python #database #bigdata
16 reactions · 2 comments · 3 min read
Dynamic way doing ETL through Pyspark
mustafasajid · May 9 '22
#pyspark #etl #bigdata #python
16 reactions · 2 comments · 4 min read
Auto discovering and auto actions in data monitoring or How to drink coffee instead of routine tasks
ricklatham · May 9 '22
#monitoring #machinelearning #bigdata #devops
13 reactions · 9 min read
May 9th in Streaming
Timothy Spann. 🇺🇦 · May 9 '22
#apache #apachepulsar #realtimestreaming #bigdata
6 reactions · 1 min read
Build a real-time machine learning sample library using the best open-source project about big data and data lakehouse, LakeSoul
DMetaSoul · May 6 '22
#opensource #datascience #bigdata #database
11 reactions · 7 min read
Leveraging Change Data Capture for Fraud Detection using Arcion Cloud
John Vester · May 3 '22
#tutorial #bigdata #cloud #datascience
7 reactions · 9 min read
Apache Spark, Hive, and Spring Boot — Testing Guide
Semyon Kirekov · Apr 22 '22
#bigdata #testing #java #docker
16 reactions · 4 comments · 18 min read
Design concept of a best opensource project about big data and data lakehouse
DMetaSoul · Apr 16 '22
#opensource #dataengineering #bigdata #datascience
9 reactions · 9 min read
How to prepare for the GCP Professional Data Engineer certification
Gabriel Luz · May 2 '22
#googlecloud #dataengineering #gcp #bigdata
31 reactions · 4 comments · 8 min read
Details of 4 best opensource projects about big data you should try out(Ⅰ)
DMetaSoul · Apr 7 '22
#opensource #dataengineering #bigdata #spark
8 reactions · 5 min read
HIVE installation on WSL
Anuj Vaghani · Apr 1 '22
#hadoop #hive #bigdata
10 reactions · 3 min read
How to create a DIY Inexpensive Cloud Data Lake
Eric See · Mar 26 '22
#python #datascience #design #bigdata
8 reactions · 3 min read
Create a Hadoop playground with Docker Desktop on Windows in minutes
Zishuo Ding · Mar 25 '22
#docker #java #bigdata #codenewbie
10 reactions · 4 min read
Quick use of CDC: A new demo from lakesoul makes it easier to set up the environment
DMetaSoul · Mar 25 '22
#opensource #dataengineering #bigdata #spark
8 reactions · 5 min read
Big Data in Cloud Computing - AWS
Warda Liaqat · Mar 24 '22
#aws #bigdata #bigdataincloud #bigdataandaws
14 reactions · 2 min read
4 best opensource projects about big data you should try out
DMetaSoul · Mar 24 '22
#opensource #dataengineering #bigdata #spark
16 reactions · 3 comments · 3 min read
A new unified streaming and batch table storage solution similar to iceberg/hudi/delta lake but with several new functions
DMetaSoul · Mar 17 '22
#dataengineering #opensource #bigdata #programming
8 reactions · 2 min read
[OPINION] Building a Career as a Data Engineer (original title: "[OPINIÃO] Construindo uma Carreira como Data Engineer")
Lis R. Barreto · Mar 9 '22
#bigdata #dataengineering #tips
2 reactions · 2 min read
Characteristics of Big Data
Aarti Yadav · Mar 3 '22
#bigdata
4 reactions · 8 min read
Apache Spark Unit Testing Strategies
Sukumaar Mane · Feb 28 '22
#scala #programming #apachespark #bigdata
9 reactions · 3 min read
NodeJS - Get data from Redash v6 API
IRPAN KUSUMA W · Feb 26 '22
#redash #node #bigdata #analytics
6 reactions · 2 min read