This post covers what you need to pass the Databricks Certified Associate Developer for
Apache Spark exam.
Books
Spark: The Definitive Guide
Learning Spark: 2nd Edition
The Data Engineer's Guide to Apache Spark
Lectures
YouTube
Advanced Apache Spark Training - Sameer Farooqui (Databricks)
Apache Spark Core - Deep Dive
Udemy
Apache Spark 3 - Beyond Basics
Apache Spark 3 - Databricks Certified
Exams
Databricks Apache Spark 3.0 Dev Certification - Tests(Scala)
Databricks Certified Apache Spark 3.0 TESTS (Scala & Python)
Databricks Certified Developer for Spark 3.0 Practice Exams
PDF Exams
More Demo Dumps
Topics covered on the exam
- When does a Spark application fail? (when executor fails, when driver fails, when data is not fully cached, etc.)
- What is the most granular unit in the Spark hierarchy? (jobs, stages, tasks, etc.)
- What does NOT help in optimizing a Spark application? (related to partitions, column merging, etc.)
- What happens if there are more slots than tasks to process on a worker node? (resources are not fully utilized, etc.)
- What is a task? (a unit of work that can fit into an executor, a unit of work that can fit into a machine, etc.)
- What is a job?
- What is the difference between actions and transformations?
- Which of the Dataset API methods is most likely to invoke a shuffle? (union, groupBy, filter, etc.)
- What percentage of the DataFrame will the following code cache? (a .show() is called on a Scala range)
- How many jobs will the following code create? (reading a DataFrame and inferring its schema)
- A wide transformation exchanges data between which units? (partitions, executors, clusters, etc.)
- We want to generate 25 partitions after a join, what is the right configuration to use?
- What are valid Spark deployment modes? (YARN, Local, Standalone, etc.)
- Which of the options helps with garbage collection? (increasing Java heap space, serialization or deserialization, etc.)
- Dataset API questions
  - Split function
  - Explode function
  - Joins (inner, left, crossJoin and anti)
  - Renaming a column
  - Overwriting a column
  - Filtering with multiple conditions
  - Difference between where and filter
  - Date and time manipulation (to and from Unix time, formatting, etc.)
  - Sorting ascending and descending, with and without nulls
  - Literals
  - Repartition and coalesce (more than 2 questions)
  - UDFs
  - Aggregate functions (dense rank and rank)
  - Printing the schema
  - Finding transformations and actions
  - Collecting a dataset, extracting values and casting
  - Casting columns of a dataset
- Dataset reading and writing
  - Reading a raw CSV file
  - Reading a CSV file with a schema and with separators
  - Read and write modes
  - Writing and overwriting a Parquet file
  - Partitioning by a column and writing
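One of the conceptual questions above asks what happens when there are more slots than tasks. The arithmetic behind it can be sketched in plain Python, with no Spark required. This is only a rough model, not how the Spark scheduler is implemented: it assumes one slot per executor core (that is, `spark.task.cpus` left at its default of 1) and one task per partition in a stage. The function name `slot_usage` and the cluster sizes are made up for illustration.

```python
import math


def slot_usage(num_tasks: int, executors: int, cores_per_executor: int) -> dict:
    """Estimate how the tasks of one stage map onto the available slots.

    Hypothetical helper: assumes one slot per executor core.
    """
    slots = executors * cores_per_executor
    if slots <= 0:
        raise ValueError("the cluster needs at least one slot")
    # Tasks run in "waves": each wave fills at most `slots` tasks at once.
    waves = math.ceil(num_tasks / slots)
    # Tasks left for the final wave; the remaining slots sit idle during it.
    last_wave_tasks = num_tasks - (waves - 1) * slots if waves else 0
    return {
        "slots": slots,
        "waves": waves,
        "idle_slots_in_last_wave": slots - last_wave_tasks,
    }


# More slots than tasks: everything runs in one wave, but two slots do nothing.
more_slots = slot_usage(num_tasks=6, executors=2, cores_per_executor=4)
# More tasks than slots: three waves are needed, the last one only half full.
more_tasks = slot_usage(num_tasks=20, executors=2, cores_per_executor=4)

print(more_slots)
print(more_tasks)
```

Under these assumptions, the first case shows why "resources are not fully utilized" is the expected answer. The second shows why the repartition and coalesce questions matter: the partition count decides how many tasks a stage has.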
Do not rely on documentation online!