DEV Community

# dataengineering

Posts

Quick profiling of data in Apache Kafka using kafkacat and visidata

2
Comments 1
2 min read
📼 ksqlDB HOWTO - A mini video series 📼

10
Comments
4 min read
Tech Exceptions Show: Accelerating Data Engineering with Azure

12
Comments
2 min read
Apache Spark Ecosystem, Jan 2021 Highlights

11
Comments
4 min read
Running a self-managed Kafka Connect worker for Confluent Cloud

8
Comments
11 min read
ETL with Apache Airflow, Web Scraping, AWS S3, Apache Spark and Redshift | Part 1

19
Comments 1
7 min read
Kafka Connect - Deep Dive into Single Message Transforms

4
Comments
3 min read
First Look: AWS Glue DataBrew

10
Comments
7 min read
My favourite re:Invent data announcements

8
Comments
5 min read
🎄 Twelve Days of SMT 🎄 - Day 6: InsertField II

6
Comments
3 min read
New Features in Amazon DynamoDB - PartiQL, Export to S3, Integration with Kinesis Data Streams

11
Comments
12 min read
🎄 Twelve Days of SMT 🎄 - Day 1: InsertField (timestamp)

5
Comments
3 min read
Datetimes Are Hard: Part 1 - Incoming data and formats

4
Comments 1
4 min read
Tidying up Pipelines with DataClasses

5
Comments
5 min read
Uniform Data Distribution Among Kinesis Data Stream Shards

2
Comments 2
3 min read
Cut data warehouse costs with run caching

5
Comments
3 min read
Introduction to Data Pipelines

2
Comments 1
4 min read
Dagster with User Code Deployments (gRPC)

14
Comments 2
6 min read
12 Ways of Applying a Function to Python Pandas DataFrame

4
Comments
1 min read
Data engineering essentials

4
Comments
1 min read
Some of my favourite public data sets

8
Comments 3
2 min read
Becoming a Data Engineer

64
Comments 2
1 min read
Transform AWS CloudTrail data using AWS Data Wrangler

3
Comments
8 min read
5 Essential skills for becoming a Data Engineer

8
Comments
6 min read
The Most Popular Data Science Newsletters

11
Comments
9 min read
Build a monitored code-based pipeline to move data from Postgres to Snowflake

7
Comments
9 min read
Handling upstream data changes via Change Data Capture

8
Comments
8 min read
Introduction to Apache Spark

10
Comments
6 min read
Kafka Connect in 60 seconds

4
Comments
2 min read
Deploying data pipelines to AWS Fargate - with monitoring and alerts built-in

6
Comments
3 min read
Windowing in Streaming Data: Theory and a Scikit-Multiflow Example

2
Comments
4 min read
Data Warehouse - The Minimal Architectural Approach

3
Comments 1
2 min read
Data Lake - 5 Major Principles

2
Comments
2 min read
Scrape Structured Data with Python and Extruct

10
Comments
16 min read
How To Run Airflow on Windows (with Docker)

15
Comments 3
8 min read
Implementing a graph network pipeline with Dagster

22
Comments 1
12 min read
What differentiates schema on read from schema on write?

3
Comments 2
3 min read
Loading CSV data into Kafka - video walkthrough

5
Comments
10 min read
Scraping Data on the Web with BeautifulSoup

33
Comments
12 min read
CI/CD for ETL/ELT pipelines

18
Comments
3 min read
A proven approach to land a Data Engineering job

6
Comments
5 min read
Data Engineering Project for Beginners - Batch edition

25
Comments
19 min read
10 Key skills, to help you become a data engineer

9
Comments
3 min read
Airflow UI with Role-Based Access Control

5
Comments
1 min read
Apache Airflow Installation - mysql+celery

7
Comments
1 min read
Extract Nested Data From Complex JSON

9
Comments
6 min read
DataOps - A Made-Up Term or Actual Practice

13
Comments
7 min read
🛢Create New Kedro Pipeline (kedro new)

5
Comments
4 min read
🤷‍♀️ What is Kedro (The Parts)

18
Comments 4
3 min read
Data engineering portfolio projects?

29
Comments 1
1 min read
Apache Airflow Core Concepts

26
Comments
4 min read
Coding MapReduce in C from Scratch using Threads: Map

7
Comments
9 min read
I am a junior data engineer without a senior engineer. What should I do?

7
Comments 1
1 min read
Toward GCP Data Engineer certification

9
Comments
1 min read
Data Engineering Skills

14
Comments
1 min read
Intro to Data Ingestion and Data Lakes

8
Comments 1
3 min read
Data Engineering — Complete Reference Guide From A-Z [2019]

30
Comments
16 min read
Overview of the different approaches to putting Machine Learning (ML) models in production

9
Comments
14 min read
On the evolution of Data Engineering

15
Comments
4 min read
10 Days to Become a Google Cloud Certified Professional Data Engineer

26
Comments 2
11 min read