DEV Community

# dataengineering

Posts

PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows

3
Comments
1 min read
Class to Airflow Custom Operator

Comments
3 min read
How to pivot data using Dynamic SQL in SQL Server

4
Comments 4
3 min read
Apply CDC From MySQL To Clickhouse on local environment

1
Comments
6 min read
How to clone tables in BigQuery

2
Comments
1 min read
kafka: event driven microservices

2
Comments
6 min read
Getting started with Apache Flink: A guide to stream processing

1
Comments
8 min read
How to rotate data using Pivot & Unpivot operators

3
Comments 2
3 min read
Mage Battlegrounds: Craft insights from real-time customer behavior analysis

2
Comments
2 min read
Apache Flink vs Apache Spark: A detailed comparison for data processing

2
Comments
5 min read
Abstract Configurations

1
Comments
3 min read
Apache Flink episode 1: A comprehensive introduction

1
Comments
6 min read
Data sources episode 2: AWS S3 to Postgres Data Sync using Singer

2
Comments
4 min read
Data sources episode 1: Common data sources in modern pipelines

1
Comments
6 min read
Handling NULL in the DBs

5
Comments 1
2 min read
Unleashing the Magic of Job Schedulers: How to Tame Your Code and Save Your Sanity

4
Comments
3 min read
Scraper Function to Airflow DAG

1
Comments 1
3 min read
Code optimization

Comments
2 min read
From Class to Abstract Classes

1
Comments
3 min read
Deep Drive SQL ( part 01 )

Comments
10 min read
SQL 102:Intermediate SQL

Comments
10 min read
From Functional to Class: a look at SOLID coding

1
Comments
3 min read
Hadoop Migration: How we pulled this off together

Comments
8 min read
Quick Detour on Unit Testing with PyTest

1
Comments
3 min read
Trigger Azure Data Factory Pipeline from Event Grid (Using Webhook Endpoint)

Comments 2
4 min read
Bootstrapped to Functional

1
Comments
3 min read
AWS Cloud9 for Data Engineers

1
Comments
5 min read
The Pyramid of Alerting

2
Comments
6 min read
Batch Processing vs Stream Processing: Why Batch is dying and Streaming takes over

Comments
13 min read
Introduction to Data Version Control

Comments
6 min read
Structure Query Language

6
Comments
2 min read
Using python dictionary in data engineering.

2
Comments 2
2 min read
Integrating a Web API with Datastore Emulator

Comments
4 min read
How we mastered dbt: A true story

5
Comments
14 min read
Important Questions related to Data Engineering

2
Comments
1 min read
Python functions and lambda functions in data engineering.

2
Comments
3 min read
Data Platform Architecture Types

1
Comments
9 min read
Data Wrangling in Python: Tips and Tricks

Comments
3 min read
Website Monitoring using AWS Lambda and Aurora

2
Comments
4 min read
Apache Airflow - Deep Dive | All you need to know about Airflow

5
Comments
20 min read
How I Decreased ETL Cost by Leveraging the Apache Arrow Ecosystem

Comments
6 min read
Creating Data Pipelines as DAGs in Apache Airflow (Part 1)

Comments
6 min read
SQL101: Introduction to SQL

Comments 2
14 min read
Data Pipelines with Great Expectations | Introduction

2
Comments
2 min read
22 Best DataOps Tools To Optimize Your Data Management and Observability In 2023

16
Comments 1
30 min read
Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset

13
Comments 2
8 min read
Nesting Columns like a Pro: A Guide to Mastering Nested Structs in PySpark

Comments
4 min read
AWS Data Engineering Services: Everything you need to know

5
Comments
9 min read
PySpark: A brief analysis to the most common words in Dracula, by Bram Stoker

13
Comments
5 min read
Working with Map() function in Python, Pyspark and Apache Beam

1
Comments
3 min read
Working with large CSV files in Python from Scratch

6
Comments
1 min read
Job Search API

6
Comments
1 min read
Redshift Deep Dive

1
Comments
5 min read
Azure Data Factory - Incrementally load data from Azure SQL to Azure Data Lake using Watermark

4
Comments
1 min read
What is data integration?

10
Comments 2
4 min read
Data Engineering Trends for 2023

3
Comments
4 min read
The Changing Face Of ETL

3
Comments 1
12 min read
Ultimate guide to becoming a Data Analyst/Data Scientist

3
Comments
4 min read
SkyX: developing a real-time air traffic analysis with Spark Structured Streaming and Apache Kafka

1
Comments
8 min read
2022 Beginner Friendly Modern Data Engineering Career path With Learning Resources.

20
Comments 2
2 min read