Introduction
This post was inspired by the Full Stack Developer's Roadmap post written by @ender_minyard.
With ever-growing data volumes and demands, data engineering has been one of the fastest-growing careers over the past few years.
According to the 2021 Stack Overflow Developer Survey, data engineers are among the top 5 highest-paid professionals, right after SREs and DevOps engineers.
If you are looking to become a data engineer, here are some resources for data engineering that you can save for later.
Table of Contents
- Fundamentals
- Programming basics
- Testing
- Database Fundamentals
- Data warehouses
- Object storage
- Data processing
- Messaging
- Cluster computing
- Workflow Scheduling
- Monitoring data pipelines
- Infrastructure as Code
- CI/CD
Fundamentals
A solid understanding of the Linux operating system is often a must for many IT-related roles. You will benefit a lot from knowing the basics of the following:
- Basic Terminal Usage: https://devdojo.com/course/linux-command-line-basics
- Shell Scripting
- Git and GitHub
- Networking
Programming basics
As with any IT-related role, it is essential to have a fundamental knowledge of programming in general. The specific programming language does not matter that much, but you need a good understanding of programming paradigms and best practices.
Testing
- Unit Testing
- Functional Testing
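As a minimal sketch of what unit testing can look like in a data context, the snippet below tests a small transformation function with Python's built-in `unittest` module. The `normalize_emails` function and its row format are hypothetical, made up purely for illustration.

```python
import unittest


def normalize_emails(rows):
    """Hypothetical transformation: lowercase and trim emails, drop rows without one."""
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if email:  # skip rows whose email is missing, None, or blank
            cleaned.append({**row, "email": email})
    return cleaned


class NormalizeEmailsTest(unittest.TestCase):
    def test_lowercases_and_strips(self):
        rows = [{"id": 1, "email": "  Alice@Example.COM "}]
        self.assertEqual(normalize_emails(rows), [{"id": 1, "email": "alice@example.com"}])

    def test_drops_missing_emails(self):
        rows = [{"id": 1, "email": ""}, {"id": 2, "email": None}, {"id": 3}]
        self.assertEqual(normalize_emails(rows), [])
```

Saved in a file whose name starts with `test_`, these tests can be discovered and run with `python -m unittest`. Each test checks one behavior of the transformation, which keeps failures easy to diagnose.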
Database Fundamentals
Having a solid understanding of SQL, data normalization and ACID transactions is a must for all data engineers.
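To make the atomicity part of ACID a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table and the `transfer` helper are hypothetical: a transfer that fails halfway through is rolled back as a whole, so no partial update survives.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical accounts table; balances may not go negative.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # If src lacks funds, the CHECK constraint fails here with IntegrityError,
        # and the credit to dst above is rolled back too.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_db()
transfer(conn, "alice", "bob", 30)  # succeeds
try:
    transfer(conn, "bob", "alice", 500)  # bob would go negative
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 80}
```

The failed transfer leaves both balances exactly as they were after the first one, which is what "all or nothing" means in practice.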
Relational Databases
Non-relational databases
- Document: MongoDB, Elasticsearch
- Wide column: Apache Cassandra, Apache HBase
- Graph: Neo4j
- Key-value: Redis, Memcached
Data warehouses
Object storage
Data processing
Batch
Hybrid
Streaming
- Materialize - The Streaming Database for Real-time Analytics
- Apache Kafka
- Apache Storm
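To illustrate the difference between the batch and streaming models listed above, here is a minimal, framework-free Python sketch. The batch version computes per-key totals once over a complete, bounded dataset, while the streaming version keeps running state and emits an updated result after every event. The `(key, value)` event format is hypothetical, and production engines typically add concerns such as fault tolerance, state management, and windowing on top of this basic idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # hypothetical (key, value) event, e.g. ("page_a", 1)


def batch_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Batch: process a bounded dataset in one pass and emit a single result."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in events:
        totals[key] += value
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Streaming: update running state per event and emit a snapshot each time."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in events:
        totals[key] += value
        yield dict(totals)  # copy, so later updates do not change earlier snapshots


events = [("page_a", 1), ("page_b", 1), ("page_a", 1)]
print(batch_totals(events))  # {'page_a': 2, 'page_b': 1}
for snapshot in streaming_totals(events):
    print(snapshot)
```

Both produce the same final totals; the streaming version simply makes intermediate results available while the data is still arriving.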
Messaging
Cluster computing
Workflow Scheduling
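Workflow schedulers generally model a pipeline as a directed acyclic graph (DAG) of tasks and start each task only after its upstream dependencies have finished. As a minimal sketch of that idea, Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) can compute a valid execution order. The task names below are hypothetical, and a real scheduler would typically add timing, retries, and distributed execution on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_pipeline(graph):
    """Run tasks in dependency order and return the order that was used."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    order = []
    while sorter.is_active():
        # Every task returned here has all dependencies done, so they could run in parallel.
        for task in sorted(sorter.get_ready()):
            print(f"running {task}")
            order.append(task)
            sorter.done(task)
    return order


run_pipeline(pipeline)
```

Here both extract tasks are ready first, and `transform` only becomes ready once both of them are marked as done.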
Monitoring data pipelines
Infrastructure as Code
- Containers: Docker
- Orchestration: Kubernetes, Docker Swarm
- Provisioning: Terraform
- Automation: Ansible
CI/CD
Conclusion
This was inspired by the Data Engineer Roadmap open source repository here:
https://github.com/datastacktv/data-engineer-roadmap
I wanted to build upon the roadmap and provide a list of resources for each topic.
Let me know if I've missed anything! I hope you find this useful, and make sure to keep learning!
You can follow me on Twitter at: @bobbyiliev_