DEV Community

# dataengineering

Posts

Workflow of Data Engineering Project on AWS

1
Comments
4 min read
Feature Engineering Has a Language Problem

1
Comments
15 min read
Debugging Python Data Pipelines

Comments
3 min read
What is data engineering and a B.I architecture

5
Comments
6 min read
How To Create Dataflow Job with Scio

2
Comments
8 min read
Using pyspark to stream data from coingecko API and visualise using dash

2
Comments
6 min read
AWS Redshift: Robust and Scalable Data Warehousing

3
Comments
6 min read
Stream data processing with Mage

6
Comments
8 min read
How to pivot data using Dynamic SQL in SQL Server

5
Comments 4
3 min read
How to clone tables in BigQuery

2
Comments
1 min read
kafka: event driven microservices

3
Comments
6 min read
Getting started with Apache Flink: A guide to stream processing

23
Comments
8 min read
How to rotate data using Pivot & Unpivot operators

3
Comments 2
3 min read
Apply CDC From MySQL To Clickhouse on local environment

6
Comments
6 min read
Mage Battlegrounds: Craft insights from real-time customer behavior analysis

2
Comments
2 min read
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows

4
Comments
1 min read
Apache Flink vs Apache Spark: A detailed comparison for data processing

14
Comments 1
5 min read
Abstract Configurations

1
Comments
3 min read
Apache Flink episode 1: A comprehensive introduction

2
Comments
6 min read
Data sources episode 2: AWS S3 to Postgres Data Sync using Singer

3
Comments
4 min read
Data sources episode 1: Common data sources in modern pipelines

1
Comments
6 min read
Handling NULL in the DBs

5
Comments 1
2 min read
Unleashing the Magic of Job Schedulers: How to Tame Your Code and Save Your Sanity

4
Comments
3 min read
Class to Airflow Custom Operator

Comments
3 min read
Scraper Function to Airflow DAG

5
Comments 1
3 min read
From Class to Abstract Classes

1
Comments
3 min read
SQL 102:Intermediate SQL

Comments
10 min read
From Functional to Class: a look at SOLID coding

1
Comments
3 min read
Hadoop Migration: How we pulled this off together

Comments
8 min read
Quick Detour on Unit Testing with PyTest

1
Comments
3 min read
Trigger Azure Data Factory Pipeline from Event Grid (Using Webhook Endpoint)

7
Comments 4
4 min read
Bootstrapped to Functional

1
Comments
3 min read
All about Structure Query Language (SQL)

Comments
10 min read
AWS Cloud9 for Data Engineers

1
Comments
5 min read
The Pyramid of Alerting

7
Comments 1
6 min read
Code optimization

Comments
2 min read
Batch Processing vs Stream Processing: Why Batch is dying and Streaming takes over

Comments
14 min read
Introduction to Data Version Control

1
Comments
6 min read
Data Council: The Highlights of Day 2

1
Comments
4 min read
Structure Query Language

6
Comments
2 min read
How we mastered dbt: A true story

7
Comments
14 min read
Important Questions related to Data Engineering

2
Comments
1 min read
Data Wrangling in Python: Tips and Tricks

Comments
3 min read
Website Monitoring using AWS Lambda and Aurora

3
Comments
4 min read
Apache Airflow - Deep Dive | All you need to know about Airflow

6
Comments
20 min read
How I Decreased ETL Cost by Leveraging the Apache Arrow Ecosystem

Comments
6 min read
Data Platform Architecture Types

3
Comments
9 min read
Integrating a Web API with Datastore Emulator (Integrando uma Web API com Datastore Emulator)

1
Comments
4 min read
Python functions and lambda functions in data engineering.

7
Comments
3 min read
Creating Data Pipelines as DAGs in Apache Airflow (Part 1)

1
Comments
6 min read
Using python dictionary in data engineering.

6
Comments 2
2 min read
prefect vs apache airflow

4
Comments
4 min read
SQL101: Introduction to SQL

Comments 2
14 min read
22 Best DataOps Tools To Optimize Your Data Management and Observability In 2023

16
Comments 2
30 min read
Data Pipelines with Great Expectations | Introduction

4
Comments
2 min read
Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset

16
Comments 2
8 min read
Nesting Columns like a Pro: A Guide to Mastering Nested Structs in PySpark

3
Comments
4 min read
PySpark: A brief analysis to the most common words in Dracula, by Bram Stoker

13
Comments
5 min read
AWS Data Engineering Services: Everything you need to know

5
Comments 1
9 min read
Working with Map() function in Python, Pyspark and Apache Beam

1
Comments
3 min read