Kelly Kiiru


# Data Engineering 101: Introduction to Data Engineering

## Personal Intro

Hi there.
I recently graduated from quite an intensive, brutal boot camp. Needless to say, I am now a certified full-stack developer. I use AngularJS for frontend applications and Python (Django or Flask) for the backend, with PostgreSQL as my preferred database (for the moment).

So what led me to data engineering? To put it plainly, I came to find that the market is far more saturated with software developers than it is with data engineers. It was therefore a no-brainer that the switch had to happen.

I just finished my first week in a data engineering boot camp and will share my journey throughout its whole duration. Today we are going to look at an introduction to data engineering.

## Introduction to Data Engineering

Data engineering can simply be described as the process of building pipelines that enable companies, organizations, institutions and so on to make data-driven decisions. The importance of relying on credible data to make financial, health or even climate-related decisions cannot be stressed enough, and this is where a data engineer comes in.

A data engineer is tasked with collecting high-quality data through credible means. They are required to put that data into usable formats from which information that would otherwise go unnoticed can be derived. They may also go a step further and add value to the data, surfacing insights that the raw data alone couldn't provide.

## General Data Engineering Practices

The data engineering process starts with the collection of data from various sources, which may include the cloud, a company's databases, files or even various apps, among others. This data, referred to as raw data, is dumped into what is called a data lake. At this stage, the data is in its native format, exactly as it came from the source.
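To make that concrete, here is a minimal ingestion sketch in Python. The folder path, record fields and partitioning scheme are all invented for illustration; in practice the lake would usually live in object storage such as S3 or Azure Blob Storage.

```python
# Minimal sketch: dump records into a "data lake" folder exactly as received.
# The path and fields are hypothetical; a real lake would likely be S3/Blob Storage.
import json
import pathlib
from datetime import datetime, timezone

DATA_LAKE = pathlib.Path("data_lake/raw/orders")  # stand-in for an object store

def ingest_raw(records):
    """Write records untouched, partitioned by ingestion date."""
    partition = DATA_LAKE / datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition.mkdir(parents=True, exist_ok=True)
    out_file = partition / "orders.json"
    out_file.write_text(json.dumps(records))  # no cleaning or reshaping yet
    return out_file

# Raw records straight from a source system, stored in their native form.
ingest_raw([{"order_id": 1, "amount": "42.50", "notes": None}])
```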

The next step is to put the data into a usable format. This may include deleting unnecessary fields, joining relevant data or even creating new fields, as in the sketch below.
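Here is a rough illustration of that transformation step using pandas (my assumption; the tooling varies from team to team, and the tables and columns below are made up).

```python
# Transformation sketch with pandas: drop a field, join a table, derive a new field.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2],
    "customer_id": [10, 11],
    "amount": ["42.50", "19.99"],   # raw data often arrives as text
    "internal_note": ["x", "y"],    # a field we don't need downstream
})
customers = pd.DataFrame({"customer_id": [10, 11], "country": ["KE", "US"]})

clean = (
    orders.drop(columns=["internal_note"])                   # delete an unnecessary field
          .merge(customers, on="customer_id")                # join relevant data
          .assign(amount=lambda df: df["amount"].astype(float))   # fix the type
          .assign(is_large_order=lambda df: df["amount"] > 30)    # create a new field
)
print(clean)
```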

After this, the data is loaded into what we call a data warehouse. This is the point from which various pipelines direct the data according to an organization's needs. An example is a business intelligence pipeline, where reporting happens, or a feature pipeline, which feeds machine learning models.
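A tiny loading sketch follows, with SQLite standing in for a real warehouse (Snowflake, BigQuery, Redshift and the like); the table and column names are invented.

```python
# Loading sketch: write the transformed table into a warehouse table.
# SQLite stands in for a real cloud warehouse here.
import sqlite3
import pandas as pd

clean = pd.DataFrame({"order_id": [1, 2], "amount": [42.5, 19.99], "country": ["KE", "US"]})

with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("fact_orders", conn, if_exists="replace", index=False)
    # BI reports or ML feature jobs would now query this table, for example:
    print(conn.execute(
        "SELECT country, SUM(amount) FROM fact_orders GROUP BY country"
    ).fetchall())
```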

This whole process is abbreviated as ETL, where you Extract, Transform and Load data, in that order. It's also worth mentioning that there is a more modern approach dubbed ELT: simply put, the data is loaded before it is transformed.
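To contrast the two, here is a rough ELT version of the same idea: the raw data lands in the warehouse first, and the cleanup happens there with SQL. Again, SQLite stands in for a real warehouse and the names are invented.

```python
# ELT sketch: Load raw data first, then Transform it inside the warehouse with SQL.
import sqlite3
import pandas as pd

raw = pd.DataFrame({"order_id": [1, 2], "amount": ["42.50", "19.99"]})

with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("raw_orders", conn, if_exists="replace", index=False)  # Load first
    conn.execute("DROP TABLE IF EXISTS stg_orders")
    conn.execute(
        "CREATE TABLE stg_orders AS "
        "SELECT order_id, CAST(amount AS REAL) AS amount "
        "FROM raw_orders"
    )  # ...Transform afterwards, letting the warehouse do the heavy lifting
```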

It is very important for the engineering team to restrict who can access which data and at what stage of the process. The team therefore has to put security controls in place to prevent a data breach.

## Prerequisites to Becoming a Data Engineer

Getting into the field will require you to have knowledge of the following technologies:
1. Python
2. SQL
3. An understanding of data lakes and data warehouses
4. Data structures
5. Cloud services (e.g. Azure and AWS)
6. Visualization tools (e.g. Tableau and Power BI)
7. Big data tools (e.g. Apache Spark and Hadoop)

If you are new to any of these technologies, I'd suggest you start with Python and SQL; the quick example after the links below combines both.

Python: Link
SQL: Link
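Since Python ships with the sqlite3 module, you can practise both together without installing anything. A tiny starter example (the table and data are invented):

```python
# A first Python + SQL exercise using the built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (name TEXT, signup_year INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("Ada", 2023), ("Linus", 2024)])

for row in conn.execute("SELECT name FROM users WHERE signup_year >= 2024"):
    print(row)  # -> ('Linus',)
conn.close()
```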
