DEV Community

# dataengineering

Posts

Analyzing Airbnb Listings in Chicago: A Power BI Dashboard Project

1
Comments
4 min read
Intro to SQL using Apache Iceberg and Dremio

4
Comments
22 min read
5 Best ETL Tools: A Comprehensive Comparison Guide

1
Comments
3 min read
Data Engineering with Scala: Mastering Real-Time Data Processing with Apache Flink and Google Pub/Sub

1
Comments
15 min read
SAP S/4HANA Cloud

Comments 1
2 min read
Why Apache Spark RDD is immutable?

Comments
3 min read
Data Modeling - Entities and Events

1
Comments
6 min read
Data Engineering in Observability: The Backbone of Modern Monitoring

1
Comments
5 min read
Real-Time Air Traffic Data Analysis with Spark Structured Streaming and Apache Kafka

1
Comments
8 min read
Oracle to Snowflake Migration: Steps, Challenges & Best Practices

1
Comments
3 min read
Data Engineering in 2024: Innovations and Trends Shaping the Future

6
Comments 2
13 min read
My journey learning Apache Spark

1
Comments
2 min read
AWS DATA ENGINEER - 101

3
Comments
2 min read
The Journey From a CSV File to Apache Hive Table

5
Comments
6 min read
Achieving Clean and Scalable PySpark Code: A Guide to Avoiding Redundancy

Comments
5 min read
Chapter 2 - Data Models and Query Languages

2
Comments
7 min read
All About Parquet Part 01 - An Introduction

1
Comments
4 min read
All About Parquet Part 05 - Compression Techniques in Parquet

3
Comments
5 min read
All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

7
Comments
6 min read
All About Parquet Part 03 - Parquet File Structure | Pages, Row Groups, and Columns

2
Comments
5 min read
All About Parquet Part 08 - Reading and Writing Parquet Files in Python

9
Comments
5 min read
All About Parquet Part 07 - Metadata in Parquet | Improving Data Efficiency

1
Comments
5 min read
All About Parquet Part 04 - Schema Evolution in Parquet

3
Comments
5 min read
All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

1
Comments
6 min read
From a Unified Bronze Layer to Multiple Silver Layers: Streamlining Data Transformation in Databricks Unity Catalog

2
Comments
5 min read
Clear Link Between DevSecOps and Data Engineering

Comments
1 min read
Still Using SQL, Python, & Excel for Data Deduplication? Here's Why You Need Better Tools.

5
Comments
4 min read
Building a Big Data Playground Sandbox for Learning

5
Comments
5 min read
Capture Browser XHR/Fetch API Response Automatically into JSON Files

Comments
1 min read
The True Cost of Poor Data Quality: Why It Matters and How to Improve It

3
Comments
6 min read
Explaining the History of Data Lakehouse

Comments
2 min read
Building a User-Friendly, Budget-Friendly Alternative to dbt Cloud

Comments
1 min read
What Is Data Engineering?

3
Comments
1 min read
How SQL Spatial Data Solves Real-World Problems

Comments
6 min read
Explaining CDC (Change Data Capture)

Comments
1 min read
Handling Outliers 101: Why the IQR Method is Your Go-To Tool

2
Comments
3 min read
Go vs Python for File Processing: A Performance and Architecture Perspective

2
Comments 2
5 min read
Exploring Data Operations with PySpark, Pandas, DuckDB, Polars, and DataFusion in a Python Notebook

4
Comments
13 min read
Secure Data Stack: Navigating Adoption Challenges of Data Encryption

1
Comments
5 min read
Python 101: Introduction to Python as a Data Analytics Tool

Comments
3 min read
Ultimate Directory of Apache Iceberg Resources

1
Comments
14 min read
Understanding OLTP and Choosing the Right Database

1
Comments
6 min read
Change Data Capture (CDC) when there is no CDC

1
Comments
11 min read
The Ultimate Guide to Data Engineering

Comments
2 min read
Evolution of Data Sharding Towards Automation and Flexibility

Comments
15 min read
Data Showdown: OLAP vs. OLTP – The Battle of Real-Time and Analytics Titans

Comments
5 min read
Serverless PDF Processing with AWS Lambda and Textract

10
Comments 1
9 min read
The Simplest Data Architecture

1
Comments
21 min read
🌐 Get started: What is MongoDB Operational Data Layer? (Part 1)

Comments
2 min read
ETL Real Estate Data Engineering with Redfin: From Extraction to Visualization

Comments
3 min read
End-to-End AWS KMS Encryption and Decryption Tutorial

2
Comments
3 min read
Working with Gigantic Google BigQuery Partitioned Tables in DBT

2
Comments
3 min read
Magic Mushrooms: Exploring and Handling Null Data with Mage

Comments
6 min read
Apache Airflow

2
Comments
4 min read
Building and Managing Production-Ready Apache Airflow: From Setup to Troubleshooting

Comments
2 min read
The Must-Have Features of Modern Data Transformation Tools

Comments
6 min read
An End-to-End Guide to dbt (Data Build Tool) with a Use Case Example

4
Comments
4 min read
Data Pipeline Techniques in Action

1
Comments
1 min read
From Data Lakes to Data Mesh: The Emerging Trends of Data Management and Analytics

1
Comments
8 min read
Building Powerful Social Media APIs for Twitter and Telegram: A Developer's Journey

1
Comments 1
3 min read