DEV Community

# dataengineering

Posts

RoadMap to Data-Analytics 2024!
2 min read
DBT and Software Engineering
7 min read
Effective Techniques for Handling Imbalanced Datasets: My Proven Approach
3 min read
The Developer’s Guide to Real-Time Data Platforms!
6 min read
Understanding Apache Iceberg's metadata.json file
7 min read
🌐 Get started: What is MongoDB Operational Data Layer? (Part 1)
1 min read
🌐 Get started: What is MongoDB operational data layer? (Part 2) 🌐
2 min read
Mastering SQL Joins and Unions: Integrate Data for Incredible Insights
6 min read
Feature Engineering: The Ultimate Guide
2 min read
🦆 💏 🐘 Let PostgreSQL & duckdb "sql" together
3 min read
What Apache Iceberg REST Catalog is and isn't
3 min read
Transforming Data Engineering: A Business Domain Approach with Data Mesh
5 min read
Speeding Up Data on AWS: From Ingestion to Insights
11 min read
Importing data from CSV files into Postgres: a basic skill for Data Engineers
1 min read
The Ultimate Guide to Data Analytics: Techniques and Tools.
3 min read
Building an Agnostic Data Pipeline: Pros and Cons
4 min read
🐚 My Pacific Dataviz Challenge 2024 submission : violence & graphdatascience
2 min read
Useful Python Libraries for AI/ML
1 min read
Understanding RAID Levels: A Comprehensive Guide to RAID 0, 1, 5, 6, 10, and Beyond
9 min read
Understanding Your Data: The Essentials of Exploratory Data Analysis (EDA)
3 min read
Data Engineering with Scala: mastering real-time data processing with Apache Flink and Google Pub/Sub
16 min read
Data Lakehouse 101: The Who, What and Why of Data Lakehouses
7 min read
Aggregation in GROUP BY vs. Window Functions Using OVER()
3 min read
Elasticsearch: Finding Missing Documents between 2 indices
3 min read
Breaking Into Data Science: A Comprehensive Guide for Aspiring Data Scientists
5 min read
Data Engineering 101: A Beginner's Guide
3 min read
Understanding the Polaris Iceberg Catalog and Its Architecture
8 min read
Automatically Update BigQuery View Schema Changes
5 min read
How I contributed my first data pipeline to the open source.
3 min read
On Orchestrators: You Are All Right, But You Are All Wrong Too
10 min read
From Messy Data to Super Mario Pipeline: My First Adventure in Data Engineering
12 min read
Data Engineer and Databricks
3 min read
What is the REST API Source toolkit?
7 min read
Working with Parquet files in Java using Carpet
6 min read
HNG STAGE ZERO: ANALYZING RETAIL SALES DATA AT FIRST GLANCE
3 min read
🪄 Debezium: the magic behind data capture & async replication (for free)
2 min read
Ways to load data in DW from External Data Source
6 min read
Apache Doris Job Scheduler for Task Automation
6 min read
Tracking Health with Data Engineering - Chapter 1: Meal Optimization
6 min read
Software or Hardware RAID: What's Better in 2024?
7 min read
Azure Synapse Analytics Security: Access Control
7 min read
Databases Deconstructed: The Value of Data Lakehouses and Table Formats
8 min read
BigQuery Schema Generation Made Easier with PyPI’s bigquery-schema-generator
2 min read
Embrace simple tech stacks and code generation in DevOps and data engineering
6 min read
Apache Doris for log and time series data analysis in NetEase, why not Elasticsearch and InfluxDB?
9 min read
MapReduce Vs Tez
2 min read
Azure Synapse Analytics Security: Data Protection
6 min read
Leveraging PySpark.Pandas for Efficient Data Pipelines
3 min read
Why Apache Doris is the Best Open Source Alternative to Rockset
3 min read
Apache Spark-Structured Streaming :: Cab Aggregator Use-case
4 min read
Introduction to Apache Hadoop & MapReduce
3 min read
MySQL: Using and Enhancing `DATETIME` and `TIMESTAMP`
3 min read
Analytics don't want duplicated data, so get it exactly-once with Flink/Kafka
3 min read
Metadata for win — Apache Parquet
5 min read
Remove unwanted partition data in Azure Synapse (SQL DW)
6 min read
Replacing SaaS ETL with Python dlt: A painless experience for Yummy.eu
3 min read
Simplifying SDMX Data Integration with Python
3 min read
Clustering vs Partitioning your Apache Iceberg Tables
10 min read
The Data Professions
3 min read
Database generated events: LiveSync’s database connector vs CDC
5 min read