DEV Community

# dataengineering

Posts

Data Platform Architecture Types
9 min read
Integrating a Web API with Datastore Emulator
4 min read
Python functions and lambda functions in data engineering.
3 min read
Creating Data Pipelines as DAGs in Apache Airflow (Part 1)
6 min read
Using Python dictionaries in data engineering.
2 min read
Prefect vs Apache Airflow
4 min read
SQL101: Introduction to SQL
14 min read
22 Best DataOps Tools To Optimize Your Data Management and Observability In 2023
30 min read
Data Pipelines with Great Expectations | Introduction
2 min read
Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset
8 min read
Nesting Columns like a Pro: A Guide to Mastering Nested Structs in PySpark
4 min read
AWS Data Engineering Services: Everything you need to know
9 min read
PySpark: A brief analysis of the most common words in Dracula, by Bram Stoker
5 min read
Working with the Map() function in Python, PySpark and Apache Beam
3 min read
Working with large CSV files in Python from Scratch
1 min read
Time Series Database and Analytics using Azure Data Explorer
4 min read
How I built a real-time Machine Learning system with Kafka, Elasticsearch, Kibana, and Docker
4 min read
Handling schema changes in Snowflake
5 min read
Redshift Deep Dive
5 min read
Azure Data Factory - Incrementally load data from Azure SQL to Azure Data Lake using Watermark
1 min read
What is data integration?
4 min read
Data Engineering Trends for 2023
4 min read
The Changing Face Of ETL
12 min read
Ultimate guide to becoming a Data Analyst/Data Scientist
4 min read
Amazon SQS and serverless data engineering workloads
3 min read
2022 Beginner-Friendly Modern Data Engineering Career Path with Learning Resources.
2 min read
SkyX: developing a real-time air traffic analysis with Spark Structured Streaming and Apache Kafka
8 min read
Learn Ansible and How to Install It on Ubuntu 22.04.
3 min read
A brief introduction to real-time data processing with Spark Structured Streaming and Apache Kafka
8 min read
Apache Spark introduction for SQL developers
7 min read
PySpark: a brief analysis of the most common words in Dracula, by Bram Stoker
6 min read
Create Jira Ticket on Prefect Task Failure
2 min read
Introduction to data analysis with PySpark using League of Legends champion data
8 min read
Pokemons Flow: developing a data pipeline with Apache Airflow to extract Pokémon data via an API
6 min read
Apache PySpark for Data Engineering
9 min read
Data Engineering 101: Introduction to Data Engineering
3 min read
Introduction to Python for Data Engineering
5 min read
Kubernetes Was Never Designed for Batch Jobs
17 min read
Data Engineering 102: Introduction to Python for Data Engineering.
10 min read
Introduction to Python for Data Engineering
7 min read
INTRODUCTION TO PYTHON FOR DATA ENGINEERING
4 min read
DATA ENGINEERING 101: INTRODUCTION TO DATA ENGINEERING.
2 min read
Fundamentals of Data Engineering
9 min read
Data Engineering 101: Introduction to Data Engineering
2 min read
Online SQL Client for low code data management
5 min read
Data Engineering 101: Introduction to Data Engineering.
6 min read
Introduction to data engineering
4 min read
Hash Personally Identifiable Information (PII) in your ELT pipelines
3 min read
Difference Between Data Engineer and Data Scientist?
3 min read
Learning Workflow Schedulers (Oozie)
5 min read
Solving AttributeError: 'float' object has no attribute 'rint'
2 min read
[Spark-k8s] — Getting started # Part 1
4 min read
Websites to find datasets for your data engineering projects.
1 min read
Data engineers must-see: The future trend of big data cloud services
8 min read
Data Engineering Projects for Beginners
2 min read
Data Pipelines with Apache Airflow - Book Review
2 min read
ETL vs Interactive Queries: The Case for Both
8 min read
Data Engineering - Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow
4 min read
Parsing logs from multiple data sources with Ahana and Cube
24 min read
Solved a practical business problem when using Hudi: LakeSoul supports null field non-override semantics
3 min read