DEV Community

# dataengineering

Posts

Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

Comments
19 min read
Mastering Workflow Automation with Apache Airflow for Data Engineering

Comments
6 min read
Data Modeling - Entities and Events

Comments
6 min read
My journey learning Apache Spark

Comments
2 min read
SQL "SELECT INTO" vs "INSERT INTO SELECT" statements.

Comments
1 min read
My Journey into Data AI and Machine Learning

Comments
1 min read
Why Data Security is Broken and How to Fix it?

1
Comments
5 min read
From ETL and ELT to Reverse ETL

Comments
4 min read
*Mastering Informatica Intelligent Cloud Services (IICS) for Cloud Data Integration*

1
Comments
3 min read
Building a Big Data Playground Sandbox for Learning

4
Comments
5 min read
What is Data Engineering?

Comments
1 min read
Explaining the History of Data Lakehouse

Comments
2 min read
End-to-End ETL and Sales Dashboard on WWI dataset in Microsoft Fabric

Comments
7 min read
All About Parquet Part 01 - An Introduction

Comments
4 min read
All About Parquet Part 09 - Parquet in Data Lake Architectures

Comments
5 min read
All About Parquet Part 02 - Parquet's Columnar Storage Model

Comments
4 min read
All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

Comments
6 min read
All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

Comments
6 min read
All About Parquet Part 08 - Reading and Writing Parquet Files in Python

Comments
5 min read
Data Analysis: The Unsung Hero of Modern Business

Comments
2 min read
From a Unified Bronze Layer to Multiple Silver Layers: Streamlining Data Transformation in Databricks Unity Catalog

1
Comments
5 min read
Analyzing Airbnb Listings in Chicago: A Power BI Dashboard Project

1
Comments
4 min read
5 Best ETL Tools: A Comprehensive Comparison Guide

1
Comments
3 min read
Data Engineering with Scala: Mastering Real-Time Data Processing with Apache Flink and Google Pub/Sub

1
Comments
15 min read
AWS DATA ENGINEER - 101

2
Comments
2 min read
The Journey From a CSV File to Apache Hive Table

6
Comments
6 min read
Why Apache Spark RDD is immutable?

Comments
3 min read
All About Parquet Part 04 - Schema Evolution in Parquet

1
Comments
5 min read
Data Engineering in Observability: The Backbone of Modern Monitoring

1
Comments
5 min read
Análise de dados de tráfego aéreo em tempo real com Spark Structured Streaming e Apache Kafka

1
Comments
8 min read
Oracle to Snowflake Migration: Steps, Challenges & Best Practices

1
Comments
3 min read
Data Engineering in 2024: Innovations and Trends Shaping the Future

4
Comments 1
13 min read
Achieving Clean and Scalable PySpark Code: A Guide to Avoiding Redundancy

Comments
5 min read
Explaining CDC (Change Data Capture)

Comments
1 min read
Capítulo 2 - Modelos de Datos y Lenguajes de Consulta

2
Comments
8 min read
All About Parquet Part 05 - Compression Techniques in Parquet

1
Comments
5 min read
All About Parquet Part 07 - Metadata in Parquet | Improving Data Efficiency

1
Comments
5 min read
All About Parquet Part 03 - Parquet File Structure | Pages, Row Groups, and Columns

1
Comments
5 min read
Clear Link Between DevSecOps and Data Engineering

Comments
1 min read
Still Using SQL, Python, & Excel for Data Deduplication? Here's Why You Need Better Tools.

5
Comments
4 min read
Capture Browser XHR/Fetch API Response Automatically into JSON Files

Comments
1 min read
The True Cost of Poor Data Quality: Why It Matters and How to Improve It

3
Comments
6 min read
Building a User-Friendly, Budget-Friendly Alternative to dbt Cloud

Comments
1 min read
O que é Engenharia de Dados?

3
Comments
1 min read
How SQL Spatial Data Solves Real-World Problems

Comments
6 min read
Working with Gigantic Google BigQuery Partitioned Tables in DBT

1
Comments
3 min read
Handling Outliers 101: Why the IQR Method is Your Go-To Tool

2
Comments
3 min read
Go vs Python for File Processing: A Performance and Architecture Perspective

2
Comments 2
5 min read
Exploring Data Operations with PySpark, Pandas, DuckDB, Polars, and DataFusion in a Python Notebook

2
Comments
13 min read
Secure Data Stack: Navigating Adoption Challenges of Data Encryption

1
Comments
5 min read
Python 101: Introduction to Python as a Data Analytics Tool

Comments
3 min read
Ultimate Directory of Apache Iceberg Resources

Comments
14 min read
Understanding OLTP and Choosing the Right Database

1
Comments
6 min read
Change Data Capture (CDC) when there is no CDC

Comments
11 min read
The Ultimate Guide to Data Engineering

Comments
2 min read
Evolution of Data Sharding Towards Automation and Flexibility

Comments
15 min read
Data Showdown: OLAP vs. OLTP – The Battle of Real-Time and Analytics Titans

Comments
5 min read
The Power of Data Analytics – Transforming Businesses with Insights

Comments
5 min read
Serverless PDF Processing with AWS Lambda and Textract

9
Comments 1
9 min read
The Simplest Data Architecture

1
Comments
21 min read