As an engineer with several years of experience on backend and frontend projects, I feel that big data challenges are the natural next step.
In the big data world I expect to find computing, I/O and scaling challenges that rarely come up in ordinary textbook architectures.
I decided that Spark is the best way to get started, specifically the Databricks certification, which focuses on Spark programming and architecture.
My game plan for passing the Databricks Spark certification is to:
- Read the book "Learning Spark: Lightning-Fast Big Data Analysis", work through all the examples, and summarize the important insights and lessons so I can review them later.
- Go over the skeletons of the Databricks Developer course that I found on GitHub. They are from 15 months ago, so they should still be fairly up to date: https://github.com/vivek-bombatkar/spark-training and https://github.com/vivek-bombatkar/Spark-with-Python---My-learning-notes-
- Go through example questions.
Please, if you can recommend any preparation resources, write them in the comments. It will help me.
I will update this post as I go, for others (and myself).
Learning Schedule
Theory
Thoroughly reading the book "Learning Spark: Lightning-Fast Big Data Analysis".
I think it's reasonable to go through 2 chapters per week.
This means reading, summarizing and running the important code snippets on my own.
Week 1
Chapter 3
Chapter 4
Week 2
Chapter 5
Chapter 6
Week 3
Chapter 7
Chapter 8
Week 4
Chapter 9
Chapter 10
Week 5
Chapter 11 - a quick read, since it's not that important
Hands on coding
Basics (4 notebooks)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-pyspark
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-weather-df
Advanced topics (10 notebooks)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-advanced
Windows (4 notebooks)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-windows
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-advanced-windows
UDF (3 notebooks)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-advanced-udf
Spark execution (1 notebook)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-advanced-execution
Caching (3 notebooks)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-advanced-caching
Pivoting (1 notebook)
https://github.com/vivek-bombatkar/spark-training/tree/master/spark-python/jupyter-advanced-pivoting
Total: 26 notebooks
I hope to do 3-4 notebooks per week (some will be easy and some harder, so I'm taking an average). That works out to about 8 weeks of going through the notebooks, learning what I'm missing along the way.
Altogether, it should take about 3 months until I'm ready for the exam.
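For anyone who wants to check the arithmetic behind this estimate, here is a rough sketch. The notebook counts are copied from the list above; the pace of 3.5 notebooks per week (the midpoint of "3-4") and the idea that the reading weeks and the notebook weeks run one after the other are my own assumptions, not something fixed in the plan.

```python
import math

# Notebook counts per topic, copied from the hands-on list above.
notebooks = {
    "Basics": 4,
    "Advanced topics": 10,
    "Windows": 4,
    "UDF": 3,
    "Spark execution": 1,
    "Caching": 3,
    "Pivoting": 1,
}

total = sum(notebooks.values())

# Assumption: 3.5 per week, the midpoint of the "3-4 notebooks per week" estimate.
pace_per_week = 3.5
hands_on_weeks = math.ceil(total / pace_per_week)

# The reading schedule above spans weeks 1 to 5.
theory_weeks = 5

# Assumption: reading and notebooks are done back to back, not in parallel.
total_weeks = theory_weeks + hands_on_weeks

print(f"{total} notebooks, {hands_on_weeks} hands-on weeks, {total_weeks} weeks overall")
```

Under those assumptions the total comes to roughly a quarter of a year, which lines up with the 3-month target. Doing the reading and the notebooks in parallel would shorten it.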
Book PDFs
Learning Spark: Lightning-Fast Big Data Analysis
First Edition
https://b-ok.asia/book/2493162/9b8d4f?dsource=recommend
Second Edition
https://laptrinhx.com/learning-spark-lightning-fast-data-analytics-2nd-edition-436517903/
Spark: The Definitive Guide: Big Data Processing Made Simple
https://b-ok.asia/book/3505368/f04c83?regionChanged
Spark in Action
https://b-ok.asia/book/3502170/d3383b