DEV Community

Haji Rufai


101 DATA ENGINEERING

Data pipeline chart
Hello there!! You may or may not have heard of Data Engineering and Data Engineers. What is it? Who are these data engineers? What do they do? Are they well paid? And if so, what technologies and knowledge do they need? Maybe I want to become one if it's feasible. Then read on...




What is Data Engineering?

Data engineering is the profession of designing and building systems that collect, store, and analyze data at massive scale.

That's it. Read it again if you like.

Relationship with and difference from Data Science

A chart comparing and contrasting Data Science and Data Engineering
Data science and data engineering are both fields within big data and data analytics, and both have gained momentum in recent years.

Data science is a multi-disciplinary field that involves extracting knowledge from data to solve problems.

Data engineers are responsible for building the data pipelines that move data from where it is stored to where it can be used, e.g. by data scientists.
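To make the idea of a pipeline concrete, here is a minimal extract-transform-load (ETL) sketch in Python. It is an illustration under assumptions, not a production design: the CSV data, the field names (`order_id`, `customer`, `amount`), and the functions `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical, and in-memory strings stand in for both the source and the destination.

```python
import csv
import io
import json

# Hypothetical raw source data; in practice it might come from an API, log files, or a database.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,alice,5.01
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw CSV rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict]:
    """Transform: drop rows with a missing amount and convert fields to proper types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
        })
    return clean


def load(rows: list[dict], sink: io.TextIOBase) -> int:
    """Load: write rows as JSON Lines to the destination; return how many were written."""
    for row in rows:
        sink.write(json.dumps(row) + "\n")
    return len(rows)


def run_pipeline(text: str, sink: io.TextIOBase) -> int:
    """Chain the three steps: source -> clean data -> destination."""
    return load(transform(extract(text)), sink)


if __name__ == "__main__":
    out = io.StringIO()
    written = run_pipeline(RAW_CSV, out)
    print(written)
    print(out.getvalue())
```

In a real pipeline, each step would typically read from and write to external systems (databases, object storage, message queues) and run on a schedule, but the extract, transform, load shape stays the same.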


What does a Data engineer do?

A data engineer is the person responsible for building and maintaining an organization's data systems: the databases, pipelines, and infrastructure that collect, store, and move data.

Drawing on a background that can include machine learning, analytics, and data science, the data engineer combines and manages data sets and makes them available to their users, such as data scientists, analysts, and the data visualization tools they rely on.


Pay??

Hehe, data engineers are among the best-paid tech professionals, often earning more than software engineers and data scientists.

Why else is it attractive? A data engineering job posting gets around 8x fewer applicants than a software engineering one.


Requirements?

There are no strict requirements, but a degree in Mathematics, Computer Science, Engineering, or a related field is a plus.

Though most people entering the field are software engineers, data analysts, and data scientists, the field is not closed to complete beginners.


What is the career path?

Some data engineers are self-taught, but the most effective ones follow a guide on what to cover. There are online courses, e.g. Udacity Data Engineering and edX IBM Professional Data Engineering, most of which are paid (Ouch!).

That's not the end, though. You can pick a guide, stick with it strictly, and find each topic independently on the internet.

Here are the foundations you need:

1. Python

No compromise here. Get started and dig deep into Python. And keep in mind: Python, not any other programming language.
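As a taste of the Python that comes up in data work, here is a small sketch of processing a stream of records in fixed-size batches using a generator, so the whole data set never has to sit in memory at once. The function name `batched` and the sample records are illustrative assumptions; Python 3.12 also ships a similar `itertools.batched` that yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items, pulling from the input lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))  # take at most `size` items
        if not batch:
            return  # input exhausted
        yield batch


if __name__ == "__main__":
    # A generator expression simulates a large stream; nothing is built up front.
    records = (f"record-{i}" for i in range(7))
    for batch in batched(records, 3):
        print(batch)
```

Batching like this is a common way to write records to a database or an API in chunks instead of one at a time.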

Where to learn

LearnPython.org
It is a nice, interactive, beginner-friendly website for learning Python. The topics are arranged in order, and each one ends with a coding exercise to test whether you have mastered it.

The good part (not the lazy part): there are solutions to all the exercises!!

2. Structured Query Language (SQL)

A data engineer interacts with databases far more than a back-end software engineer or a data scientist does. Mastering SQL is mandatory for data engineers.
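For a feel of everyday SQL, here is a short sketch that runs a typical aggregation query. To stay self-contained it uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its rows are hypothetical practice data.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table, just for practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 7.5), ("alice", 5.0), ("carol", 12.0)],
)

# Number of orders and total spend per customer, biggest spender first.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for customer, n_orders, total in rows:
    print(customer, n_orders, total)

conn.close()
```

`SELECT`, `GROUP BY`, aggregate functions like `COUNT` and `SUM`, and `ORDER BY` are standard SQL, so the same query should work with little or no change on most relational databases.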

Where to learn

W3Schools/SQL
W3Schools has such a large SQL resource that you may be overwhelmed by it, especially if you are just starting. But hey! It is very interactive and user-friendly and should cover about 50% of your data engineering database conquest. The good part is that you can practice your SQL skills on their platform, which is very flexible. Take your time and take key notes as you learn. Learn how you learn best and improve on it. Cheers!

3. Big Data (Spark and Hadoop)

Let's talk about big data and why it is trending. The term big data is often used to describe data sets so large that businesses need special systems, such as data warehouses, to store and analyze them.

The most widely used frameworks are Apache Spark and Apache Hadoop.
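Hadoop popularized the MapReduce programming model, and Spark builds on similar ideas. Below is a plain-Python sketch of that model for a word count, under the clear assumption that everything runs in one process; it is not the Hadoop or Spark API. In those frameworks each phase runs in parallel across many machines (in PySpark, the same job is often written with `flatMap`, `map`, and `reduceByKey`). The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def map_phase(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}


lines = ["big data is big", "data engineering moves data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

Because map and reduce only look at one line or one key at a time, the work can be split across machines, which is what lets these frameworks handle data sets too large for a single computer.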

Where to learn

SkillUp by Simplilearn
The course above gives you a great dive in, even if you are a beginner. Yes, it's free!

4. Cloud Computing

What is it?

Cloud computing is the use of computing resources (computers, storage, networking equipment, and applications) that are accessed over the Internet, for example through a web browser.

Because these resources are provided remotely, users do not have to be at the workplace, or own and maintain the hardware, to get their computing needs met.

There are several cloud computing providers. Some of them are:

  • Google Cloud Platform (GCP)
  • Amazon Web Services (AWS)
  • Microsoft Azure
  • Oracle Cloud

What you need to have

You need to master at least one cloud provider and have a basic understanding of the others, as different companies use different providers.

In my opinion, master AWS first, as it is the largest cloud service provider.

Where to learn

Again, Simplilearn SkillUp's Getting Started with AWS Fundamentals will give you a head start, especially if you are a beginner.

It is about 4 hours of video in total, divided into well-explained lessons, and it is free. From there, you are solid enough to move on to 999 Data Engineering.



How long will it take?

With the right plan and guidelines, and about 30 hrs/week for 6 months, you will be ready for it!
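As a rough check on that estimate, assuming 6 months is about 26 weeks, the total study time works out to:

```latex
30\ \frac{\text{h}}{\text{week}} \times 26\ \text{weeks} \approx 780\ \text{hours}
```

At a lighter 10 hrs/week, the same ~780 hours would take about 78 weeks, or roughly a year and a half.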



Conclusion

Starting the data engineering journey can be easy, but keeping it going takes grit.

Have some patience as you begin your journey. You will surely face difficulties grasping some concepts along the way (as you learn and build projects).

One more thing I want to add:

Difficulty is relative. You may find some concepts easier and others harder than your peers do. Push yourself and keep learning.

Congrats!



Have you started your data engineering journey yet?
