DEV Community

# bigdata

Posts

Big Data Vs Small Data

7 reactions
2 min read
Introduction to Reinforcement Learning

5 reactions
6 min read
Data Mesh: Scaling Delivery of Data as Product

3 reactions, 1 comment
9 min read
Learning Workflow Schedulers (Oozie)

1 reaction
5 min read
Visual task orchestration & Drag & Drop, Scaleph Data integration practice based on SeaTunnel

9 reactions
12 min read
The best Open-source lakehouse project, LakeSoul 2.0, supports snapshot, rollback, Flink, and Hive interconnection

9 reactions
5 min read
A New One-stop AI development and production platform, AlphaIDE

10 reactions
4 min read
There will be 175 Zettabytes of data in the world by 2025. Where will we store it?

12 reactions, 2 comments
1 min read
Usage Guide: Quickly deploy an intelligent data platform with the One-stop AI development and production platform, AlphaIDE

8 reactions
3 min read
How Discord manages 300M socket connections

12 reactions
2 min read
How to filter columns in HBase Shell

4 reactions
3 min read
Here is why you need a message broker

53 reactions, 4 comments
6 min read
Creating a Subtitle Search Engine using the Stanford Parts of Speech Tagger

3 reactions
4 min read
Data engineers must-see: The future trend of big data cloud services

8 reactions
8 min read
Apache DolphinScheduler version 3.0.0-beta-1 was released with new FlinkSQL, Zeppelin task types!

7 reactions
4 min read
New release! Support for Kubernetes, multiple connectors added, SeaTunnel 2.1.2 is here!

5 reactions
4 min read
Best Practices for Successful Data Quality

5 reactions
3 min read
What's new in Apache Spark 3.3.0

8 reactions, 1 comment
4 min read
Solved a practical business problem when using Hudi: LakeSoul supports null field non-override semantics

7 reactions
3 min read
Data Pipelines with Apache Airflow - Book Review

6 reactions
2 min read
May 9th in Streaming

6 reactions
1 min read
What is big data analytics?

7 reactions
7 min read
Why Big Data Analytics Is In The Big Picture in the Banking Market?

8 reactions, 2 comments
4 min read
What is the Lakehouse, the latest Direction of Big Data Architecture?

9 reactions
10 min read
Leveraging Change Data Capture for Fraud Detection using Arcion Cloud

10 reactions
9 min read
A dynamic way of doing ETL through PySpark

13 reactions, 2 comments
4 min read
BigQuery transactions over multiple queries, with sessions

10 reactions
3 min read
Auto discovering and auto actions in data monitoring or How to drink coffee instead of routine tasks

13 reactions
9 min read
Build a real-time machine learning sample library using the best open-source project about big data and data lakehouse, LakeSoul

11 reactions
7 min read
Fully Embracing K8s, Cisco Hangzhou Seeks to Support K8s Tasks Based on Apache DolphinScheduler

4 reactions
5 min read
How to prepare for the GCP Professional Data Engineer certification

17 reactions
8 min read
Apache Spark, Hive, and Spring Boot - Testing Guide

34 reactions, 2 comments
18 min read
A Brief Comparison of Apache DolphinScheduler With Other Alternatives

4 reactions
10 min read
Design concept of a best opensource project about big data and data lakehouse

9 reactions
9 min read
Details of 4 best opensource projects about big data you should try out (I)

8 reactions
5 min read
How to Build A System Popular Among Data Analysts?

7 reactions
5 min read
Create a Hadoop playground with Docker Desktop on Windows in minutes

6 reactions
4 min read
Characteristics of Big Data

4 reactions
8 min read
HIVE installation on WSL

6 reactions
3 min read
How to create a DIY Inexpensive Cloud Data Lake

8 reactions
3 min read
Quick use of CDC: A new demo from LakeSoul makes it easier to set up the environment

8 reactions
5 min read
4 best opensource projects about big data you should try out

16 reactions, 3 comments
3 min read
Big Data in Cloud Computing - AWS

14 reactions
2 min read
A new unified streaming and batch table storage solution similar to Iceberg/Hudi/Delta Lake but with several new functions

8 reactions
2 min read
Fast Multivalue Look-ups For Huge Data Sets

5 reactions
6 min read
Apache Spark Unit Testing Strategies

7 reactions
3 min read
[OPINION] Building a Career as a Data Engineer

2 reactions
2 min read
How to handle nested JSON with Apache Spark

3 reactions
3 min read
Presenting ML-based COVID-19 Risk Assessment App Pandemonium

3 reactions
3 min read
NodeJS - Get data from Redash v6 API

6 reactions
2 min read
Building an Apache ECharts dashboard with React and Cube

13 reactions
11 min read
Quill - Most efficient Scala driver for Apache Cassandra and Spark

2 reactions
4 min read
What are the best practices while using BigQuery?

11 reactions
2 min read
Building a Bubble Dashboard with Cube

9 reactions
14 min read
[ARTICLE] Data Warehouse, Data Lake, and Data Lakehouse: Concepts and Differences

4 reactions
3 min read
Dagster: The Best Free and Open-Source Alternative to Airflow With Python!

3 reactions
1 min read
What is SingleStore and why should we use it?

9 reactions, 2 comments
3 min read
Machine Learning Lifecycle Process

41 reactions
4 min read
Introduction to Hive (A SQL layer above Hadoop)

6 reactions
9 min read
Cleaning And Normalizing Data Using AWS Glue DataBrew

13 reactions, 1 comment
9 min read