A proven approach to land a Data Engineering job

#database #career #beginners #dataengineering

I have seen and been asked the following questions by students, backend engineers and analysts who want to get into the data engineering industry.

What approach should i take to land a Data Engineering job?

I really want to get into DE. What can I do to learn more about it?

In this article, I will try to provide a general approach that you as a beginner, student, backend engineer or analyst can use to land your first data engineering job. This is the approach I followed to get my first data engineering job.

What is a Data Engineer

Before we try to solve a problem, we should define it clearly. The job definition of a data engineer varies widely depending on the company and team. Many backend engineers are basically data engineers in many cases. For our case, we define a data engineer purely based on the skills and not on the job title.

Upon researching the popular job boards, understanding the current direction of the data engineering industry, interviewing and hiring data engineers, we can define a beginner data engineer as someone with the skills shown below

Knowledge of a scripting language such as python.
In depth knowledge of OLTP data modeling and when to use it such as star schema, indexes, etc
In depth knowledge of OLAP data modeling and when to use it such as distribution key, partitioning, etc
Understanding of Unix based system and commands.
Knowledge of a distributed data store such as HDFS.
Knowledge of a distributed batch data processing framework such as Apache spark
Knowledge of a data pipeline orchestration tool such as Apache Airflow
Basic knowledge of queuing system such as kafka

There is a more detailed list here.

As you can see, it is an extensive list. This is in addition to knowing CS engineering basics such as basics of web development, FE constructs, BE constructs, APIs and databases.

Starting point

Now that we have defined what a data engineer is, we can form a plan to get there. Your starting point may be different given your individual circumstances, but generally people who want to move into data engineering fall into one of the following categories

Beginner (no knowledge of computers)
Student (CS degree either undergrad or grad school)
Backend/ Fullstack/ Frontend Engineer (either FE or BE or other disciplines that involves creating software)
Data Analyst

1. Beginner

Everyone starts here. The best and surest way for someone here to get into data engineering is to first get a job as a Backend Engineer or Fullstack Engineer. These jobs will provide you the basic skills you will need to land a DE role. So how do you get into Backend/Fullstack Engineering? There are 3 main approaches

College, CS Degree - very long, tried and true approach, expensive, good chance of getting an engineering job(depending on the college and individual pre)
Coding Bootcamp - short, a bit of wild west but mostly good approach, expensive, ok-ish chance of getting an engineering job (this may be changing)
Self learning - can be very very long, without a mentor or someone to guide you this can get tricky, not a great chance of getting an engineering job

Which route you choose will depend on your individual circumstances. Irrespective of the route you take, Leetcode is crucial. make sure you know the commonly asked interview questions for the company you are interviewing at.

2. Student (CS)

This is a good place to be in. You have good knowledge of computers, a few programming languages, what an API is, algorithms, data structures, machine learning, distributed systems and operating systems. If you are here, there is a possibility that you might be able to land a junior data engineer but these roles are very rare. In order to land a good Backend/Fullstack engineer role, you will need to

Build a few projects(ideally >=3 or a big one), probably a CRUD based web app with complex logic and a database. Make sure it actually works and that potential employers can try it out easily online. Adding in a design diagram, description, why and how you built it as a README.md on github would show understanding of product requirements and clear communication.
Leetcode is crucial. For better or worse, companies heavily rely on algorithms and data structure type questions to recruit engineers. Make sure you know the commonly asked interview questions for the company you are interviewing at.

3. Backend / Fullstack / Frontend Engineer

You might already be doing some data engineering work. Luckily, you have most of the skills needed to learn the rest. Side projects are great, but work experience weighs a lot more in hiring decisions. Here are some actions you can take to increase your chances of getting an interview

Take initiative and build a data pipeline at your current job. This can become a point on your resume which leads to interviews. For example
1. If you are working on a webapp at your job, in your free time, build a simple data processing pipeline using python and cron to analyze the logs and find the places where the most errors occur in your code base. This might have the potential to reduce bugs.
2. If this provides valuable information, present it to your boss. Even if there are no new projects building trust, showing interest and recognition will make sure that when a new project comes in, you are the go-to person.
3. Most companies these days are aware of the benefits of data and analytics, so keep trying to come up with and implement new ideas for work. This will lead to more valuable experience than just side projects.
Another interesting project maybe understanding the changes in your database using debezium and kafka, like shown here.
Sometimes it is difficult to implement a new project at work for various reasons. In such cases, try building a side project (example) and make sure to write out a detailed README.md on your github repo and note the skills you learnt on your linkedin profile for keyword based search discovery by recruiters.

As I mention in the previous sections, Leetcode is crucial for interviews.

4. Data Analyst

You are already in a good position to transition, but there will probably need to be work done in the engineering part. You might already be using SQL to pull data from a data warehouse. You can

Automate one data pull using python.
Schedule that data pull to run at certain time every day using cron.
Automate more data pulls. For complex data pulls setup Airflow and use it sample project.
Understand your data warehouse infrastructure. (e.g. size of the warehouse cluster, partitions, how data is loaded etc).
If you can, do some NLP or big data processing in Apache Spark using AWS EMR or in GCP dataflow. This would be great.

As I mention in the previous sections, Leetcode is crucial for interviews.

Conclusion

`TL; DR`

Research the data engineering job postings.
Use the technical requirements as a base to build projects at work/self that actually help your company/you in some way.
Put those points as experience on your resume.
Practice interview questions on Leetcode regularly.

Repeat the above 4 steps and you will be in a much better place in 3 months.

Hope this article gives you some direction and helps you land your first data engineering job. Good luck.
Let me know the approach you plan to take in the comments below or send us an email here.

NOTE: This is a cross post from https://www.startdataengineering.com/