DEV Community
#
bigdata
Posts
PySpark: missing value
ChelseaLiu0822
Apr 18
#
pyspark
#
python
#
dataengineering
#
bigdata
2 min read
AI enthusiasm #3 - AlphaFold2, a game-changer🧬
Astra Bertelli
Apr 12
#
opensource
#
learning
#
bigdata
#
ai
2 min read
GenAI Model Optimization: Guide to Fine-Tuning and Quantization
Farrruh
Apr 3
#
ai
#
aiops
#
cloud
#
bigdata
4 min read
Are There “Queries over Trillion-Row Tables in Seconds”? Is “N-Times Faster Than ORACLE” an Exaggeration?
jbx1279
Apr 13
#
sql
#
performance
#
bigdata
#
database
4 min read
Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis
Apache Doris
Mar 27
#
database
#
dataengineering
#
bigdata
#
logging
12 min read
The Role of Big Data Analytics in BFSI: Leveraging Data for Competitive Advantage
Ajay
Mar 27
#
bigdata
#
bfsi
#
data
#
analytics
4 min read
Amazon EMR deployment on EKS
vivekpophale
Mar 23
#
emr
#
eks
#
bigdata
#
aws
7 min read
SQL Pro Tips : industrial GCP BigQuery SQL using WITH
hexfloor
Mar 28
#
gcp
#
sql
#
database
#
bigdata
3
reactions
5 min read
SQL Pro Tips : industrial AWS Athena SQL using WITH
hexfloor
Mar 28
#
aws
#
database
#
bigdata
#
sql
3
reactions
4 min read
Tools Every Data Scientist Should Know
Shaheryar
Mar 14
#
datascience
#
python
#
machinelearning
#
bigdata
2 min read
The Role of AI in Enhancing Data Governance Strategies
Ovais
Mar 12
#
datascience
#
bigdata
#
ai
#
webdev
5 min read
What is Surrogate Key in SQL?
Sandeep
Apr 2
#
sql
#
database
#
bigdata
2 min read
Redis License Change: A Look at the Competitive Game between OSS and Cloud Computing Giants
AutoMQ
Apr 10
#
bsl
#
opensource
#
automq
#
bigdata
5 min read
MWAA Plugins and Dependency Survival Guide
elliott cordo
for
AWS Heroes
Apr 5
#
airflow
#
bigdata
#
dataengineering
#
aws
2
reactions
3 min read
SQL Pro Tips : industrial Oracle SQL using WITH
hexfloor
Mar 28
#
sql
#
oracle
#
bigdata
#
database
3
reactions
4 min read
How come there are tens of thousands of tables in a database
jbx1279
Mar 23
#
database
#
bigdata
#
sql
2
reactions
Comments
1
comment
5 min read
Data Streaming Architecture
Jose Luis Sastoque Rey
for
AWS Community Builders
Mar 27
#
aws
#
bigdata
#
architecture
4
reactions
4 min read
Understanding the Battle of Database Storage: Row-Oriented vs. Columnar
Sunny Srinidhi
Mar 8
#
database
#
bigdata
#
storage
#
datascience
1
reaction
Comments
1
comment
6 min read
Leveraging API Management for Building Scalable Applications
Ovais
Feb 7
#
api
#
bigdata
#
datascience
#
webdev
4 min read
Data Science Landscape
Eddie Adams
Jan 22
#
datascience
#
data
#
bigdata
#
machinelearning
1 min read
Why Python and SQL are Must-Have Skills for Marketing Analysts in the Age of Big Data
Scofield Idehen
Feb 23
#
bigdata
#
python
#
datascience
#
sql
10
reactions
6 min read
BigQuery Machine Learning
Cris Crawford
Feb 10
#
bigdata
#
machinelearning
#
googlecloud
#
sql
2
reactions
5 min read
Big data with Software Systems
Ravikanth Kowdeed
Feb 14
#
softwareengineering
#
bigdata
1
reaction
1 min read
Understanding Elasticsearch. A Guide for Beginners
nivelepsilon
Feb 10
#
elasticsearch
#
devops
#
bigdata
#
beginners
1
reaction
4 min read
BigQuery best practices
Cris Crawford
Feb 10
#
dataengineering
#
bigdata
1
reaction
2 min read
Serverless Apache Zeppelin on AWS
Gianluigi Mucciolo
for
AWS Community Builders
Feb 4
#
serverless
#
tutorial
#
aws
#
bigdata
6 min read
How to use BigQuery Query Caching with Dynamic Wildcard Tables
Marcelo Costa
Dec 29 '23
#
bigdata
#
googlecloud
#
bigquery
#
python
2 min read
Supercharge Your S3 Data with AWS S3 Transfer Acceleration
Nils Whitmont
Jan 24
#
s3
#
aws
#
performance
#
bigdata
1
reaction
3 min read
Building Robust Data Pipelines: A Comprehensive Guide
Hiren Dhaduk
Dec 21 '23
#
datapipeline
#
data
#
pipelines
#
bigdata
3 min read
Choosing the right AWS Database
Gaurav Raje
for
AWS Community Builders
Jan 17
#
bigdata
#
beginners
#
architecture
#
database
5
reactions
4 min read
How to Scrape Flipkart Products
Crawlbase
Jan 15
#
webscraping
#
bigdata
#
flipcart
#
javascript
30 min read
AWS Lake Formation Summarization
عبدالله عياد | Abdullah Ayad
for
AWS Community Builders
Dec 24 '23
#
aws
#
beginners
#
cloud
#
bigdata
3
reactions
3 min read
A major culprit in the slow running and collapse of a database
jbx1279
Jan 13
#
bigdata
#
database
#
datawarehouse
#
performance
5
reactions
10 min read
Business Intelligence Data Analyst vs. BI Developer
ai-jobs.net
Nov 22 '23
#
bigdata
#
analyst
#
career
#
programming
2
reactions
3 min read
Here comes big data technology that rivals clusters on a single machine
jbx1279
Dec 23 '23
#
bigdata
#
database
#
performance
#
sql
6
reactions
6 min read
Test Driving Redshift AI-Driven Scaling
elliott cordo
for
AWS Heroes
Dec 21 '23
#
aws
#
bigdata
#
dataengineering
#
analytics
1
reaction
3 min read
How to store and calculate historical big data with lower usage frequency
jbx1279
Dec 9 '23
#
database
#
bigdata
#
programming
#
sql
6
reactions
4 min read
Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
ChunTing Wu
Dec 18 '23
#
datascience
#
bigdata
#
architecture
#
tutorial
1
reaction
4 min read
Use Selenium with Python to Target the XPath of a Particular Object
Paige Niedringhaus
Dec 19 '23
#
python
#
selenium
#
bigdata
#
webdriver
9 min read
Simplifying ETL Pipelines with SQL: Three Tips for Data Processing
gupta
Dec 10 '23
#
database
#
sql
#
bigdata
#
programming
18
reactions
3 min read
🏆How to master 📊 Big Data pipelines with Taipy and PySpark 🐍
Marine
for
Taipy
Nov 29 '23
#
python
#
opensource
#
bigdata
#
tutorial
218
reactions
Comments
8
comments
9 min read
Working with Parquet files in Java using Protocol Buffers
Jerónimo López
Dec 7 '23
#
parquet
#
java
#
protocolbuffers
#
bigdata
7 min read
IoT and Data Analytics: Unleashing the Power of Big Data
Ajay
Dec 6 '23
#
iot
#
bigdata
#
dataanalytics
Comments
1
comment
3 min read
Understanding Concurrency Through Amdahl's Law
luminousmen
Dec 4 '23
#
bigdata
#
data
1
reaction
3 min read
From Hadoop to Cloud: Why and How to Decouple Storage and Compute in Big Data Platforms
DASWU
Nov 3 '23
#
opensource
#
bigdata
13 min read
Data Engineering Terminology: Understanding Upstream and Downstream in Data Pipelines
luminousmen
Dec 2 '23
#
bigdata
#
data
1 min read
Big data models 📊 vs. Computer memory 💾
Marine
for
Taipy
Nov 23 '23
#
bigdata
#
pipeline
#
dataengineering
#
dask
186
reactions
Comments
3
comments
11 min read
Working with Parquet files in Java using Avro
Jerónimo López
Nov 26 '23
#
parquet
#
java
#
avro
#
bigdata
1
reaction
10 min read
BigData Journey from Hadoop and MapReduce to AWS EMR
Olga Woschitz
Nov 21 '23
#
bigdata
#
emr
#
spark
#
hadoop
9 min read
S3 Multi-Part Upload: Part 2 Conclusion
Mitansh Gor
for
Distinction Dev
Nov 18 '23
#
aws
#
bigdata
#
s3
#
multipart
6
reactions
11 min read
Most common errors when setting up Amazon EMR
Nowsath
for
AWS Community Builders
Nov 14 '23
#
emr
#
dynamodb
#
hive
#
bigdata
8
reactions
2 min read
HyperLogLog | Un algoritmo para contarlos (aproximadamente) a todos
Javi AS
Oct 4 '23
#
algorithms
#
computerscience
#
bigdata
#
spanish
2
reactions
6 min read
Data-Powered Accessibility: How to Build Inclusive Product for Any User Need
Natalia
Oct 24 '23
#
datapowered
#
bigdata
#
inclusiveproduct
#
a11y
48
reactions
7 min read
Install Hadoop on Ubuntu
Atul Vishwakarma
Nov 4 '23
#
bigdata
#
hadoop
#
ubuntu
#
learning
1
reaction
6 min read
Which Scenarios Does ClickHouse Applies to?
jbx1279
Oct 28 '23
#
bigdata
#
performance
#
database
#
sql
5
reactions
Comments
1
comment
9 min read
Connecting Multiple Kafka Clusters in ClickHouse Using Named Collections
Shahab Ranjbary
Sep 25 '23
#
clickhouse
#
kafka
#
dataintegration
#
bigdata
4
reactions
3 min read
SPL computing performance test series: in-group accumulation
jbx1279
Oct 22 '23
#
performance
#
bigdata
#
database
5
reactions
12 min read
Log Analysis: Elasticsearch VS Apache Doris
Apache Doris
Oct 16 '23
#
beginners
#
database
#
dataengineering
#
bigdata
11 min read
SPL computing performance test series: funnel analysis
jbx1279
Oct 14 '23
#
performance
#
bigdata
#
database
5
reactions
16 min read
SPL computing performance test series: associate tables and wide table
jbx1279
Sep 23 '23
#
database
#
performance
#
bigdata
#
sql
6 min read