Spark Journey begins...

As an engineer with several years of experience in backend and frontend projects, I feel that big data is the natural next step.
In the big data world I expect to find computing, I/O and scaling challenges that rarely show up in ordinary textbook architectures.

I decided that Spark is the best way to get started, specifically through the Databricks certification, which focuses on Spark programming and architecture.

My game plan to pass the Databricks Spark certification is to:

  1. Read the book "Learning Spark: Lightning-Fast Big Data Analysis", work through all the examples, and summarise the important insights and lessons so I can review them later.
  2. Go over the skeletons of the Databricks Developer course that I found on GitHub from 15 months ago. They should still be fairly up to date.
  3. Go through example questions.

Please, if you can recommend any other source of preparation, write it in the comments. It will help me.

I will post updates as I go, for others (and myself).

Learning Schedule


Reading the book "Learning Spark: Lightning-Fast Big Data Analysis" thoroughly.
I think it's reasonable to go through 2 chapters per week.
This means reading, summarizing and running the important code snippets on my own.

Week 1
Chapter 3
Chapter 4

Week 2
Chapter 5
Chapter 6

Week 3
Chapter 7
Chapter 8

Week 4
Chapter 9
Chapter 10

Week 5
Chapter 11 - a quick read, it's not that important

Hands on coding

Basics (4 notebooks)

Advanced topics (10 notebooks)

Windows (4 notebooks)

UDF (3 notebooks)

Spark execution (1 notebook)

Caching (3 notebooks)

Pivoting (1 notebook)

Total: 26 notebooks
I hope to do 3-4 notebooks per week (some will be easy, some harder, so I'm taking the average). At that pace, going through the notebooks and filling in whatever I'm missing will take about 8 weeks.

Everything should take about 3 months until I'm ready for the exam.
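
As a quick sanity check of the numbers above, here is a rough Python sketch of the schedule arithmetic. It assumes the book reading (5 weeks) and the notebooks are done one after the other, and it takes 3.5 notebooks per week as the average of the 3-4 range. Both are my reading of the plan, not something fixed, and the variable names are just for illustration.

```python
import math

# Notebook counts per topic, as listed in the "Hands on coding" section.
notebooks = {
    "Basics": 4,
    "Advanced topics": 10,
    "Windows": 4,
    "UDF": 3,
    "Spark execution": 1,
    "Caching": 3,
    "Pivoting": 1,
}

total_notebooks = sum(notebooks.values())

# Assumption: average pace is the midpoint of the 3-4 notebooks/week range.
avg_pace = (3 + 4) / 2

# Round up, since a partially used week still counts as a week.
notebook_weeks = math.ceil(total_notebooks / avg_pace)

# Assumption: the book (weeks 1-5) comes first, then the notebooks.
book_weeks = 5
total_weeks = book_weeks + notebook_weeks

print(f"{total_notebooks} notebooks, {notebook_weeks} weeks of notebooks, "
      f"{total_weeks} weeks overall")
```

If the book and the notebooks overlap instead, the total would be shorter than this estimate.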

Book PDFs

Learning Spark: Lightning-Fast Big Data Analysis
First Edition
Second Edition

Spark: The Definitive Guide: Big Data Processing Made Simple

Spark in Action
