<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Labinot Vila</title>
    <description>The latest articles on DEV Community by Labinot Vila (@labinotvila).</description>
    <link>https://dev.to/labinotvila</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1350222%2F6d93f849-21ff-488c-a956-a3a0c2a0b985.png</url>
      <title>DEV Community: Labinot Vila</title>
      <link>https://dev.to/labinotvila</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/labinotvila"/>
    <language>en</language>
    <item>
      <title>Spark Associate Developer Certification Guide</title>
      <dc:creator>Labinot Vila</dc:creator>
      <pubDate>Tue, 19 Mar 2024 01:13:42 +0000</pubDate>
      <link>https://dev.to/labinotvila/spark-associate-developer-certification-guide-5fgj</link>
      <guid>https://dev.to/labinotvila/spark-associate-developer-certification-guide-5fgj</guid>
      <description>&lt;p&gt;This content is all about what is needed to pass the &lt;code&gt;Databricks: Spark Associate Developer&lt;/code&gt; exam.&lt;/p&gt;

&lt;h4&gt;Books&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.oreilly.com/library/view/spark-the-definitive/9781491912201/"&gt;Spark: The Definitive Guide&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/"&gt;Learning Spark: 2nd Edition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.databricks.com/resources/ebook/the-data-engineers-guide-to-apache-spark-and-delta-lake"&gt;The Data Engineering's Guide to Apache Spark&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;Lectures&lt;/h4&gt;

&lt;h5&gt;YouTube&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=7ooZ4S7Ay6Y&amp;amp;ab_channel=SparkSummit"&gt;Advanced Apache Spark Training - Sameer Farooqui (Databricks)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=daXEp4HmS-E&amp;amp;t=7s&amp;amp;ab_channel=Databricks"&gt;Apache Spark Core—Deep Dive&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;Udemy&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://www.udemy.com/course/apache-spark-3-beyond-basics/?couponCode=KEEPLEARNING"&gt;Apache Spark 3 - Beyond Basics&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.udemy.com/course/apache-spark-3-databricks-certified-associate-developer/?couponCode=KEEPLEARNING"&gt;Apache Spark 3 - Databricks Certified&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;Exams&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.udemy.com/course/databricks-apache-spark-dev-certification-tests-scala/?couponCode=KEEPLEARNING"&gt;Databricks Apache Spark 3.0 Dev Certification - Tests(Scala)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.udemy.com/course/databricks-certified-apache-spark-3-tests-scala-python/?couponCode=KEEPLEARNING"&gt;Databricks Certified Apache Spark 3.0 TESTS (Scala &amp;amp; Python)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.udemy.com/course/databricks-certified-developer-for-apache-spark-30-practice-exams/?couponCode=KEEPLEARNING"&gt;Databricks Certified Developer for Spark 3.0 Practice Exams&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;PDF Exams&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DCADAS3-Python.pdf"&gt;Databricks Certified Developer for Spark 3.0 Practice Exams&lt;br&gt;
PDF Exams&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.dumpsbase.com/freedumps/2022-real-databricks-certified-associate-developer-for-apache-spark-3-0-exam-dumps-dumpsbase.html"&gt;More Demo Dumps&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;Topics covered on the exam&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;When does a Spark application fail? (when executor fails, when driver fails, when data is not fully cached, etc.)&lt;/li&gt;
&lt;li&gt;What is the most granular unit in the Spark hierarchy? (jobs, stages, tasks, etc.)&lt;/li&gt;
&lt;li&gt;What does NOT help in optimizing a Spark application? (related to partitions, column merging, etc.)&lt;/li&gt;
&lt;li&gt;What happens if there are more slots than tasks to process in a worker node? (resources are not fully utilized, etc.)&lt;/li&gt;
&lt;li&gt;What is a task? (a unit of work that can fit into an executor, a unit of work that can fit into a machine, etc.)&lt;/li&gt;
&lt;li&gt;What is a job?&lt;/li&gt;
&lt;li&gt;What is the difference between actions and transformations?&lt;/li&gt;
&lt;li&gt;Which of the Dataset API methods is most likely to invoke a shuffle? (union, groupBy, filter, etc.)&lt;/li&gt;
&lt;li&gt;What fraction of the dataframe will the following code cache? (a .show() is called on a Scala range)&lt;/li&gt;
&lt;li&gt;How many jobs will the following code create? (a dataframe read with schema inference)&lt;/li&gt;
&lt;li&gt;A wide transformation exchanges data between which units? (partitions, executors, clusters, etc.)&lt;/li&gt;
&lt;li&gt;We want to generate 25 partitions after a join: which configuration should be used?&lt;/li&gt;
&lt;li&gt;What are valid Spark deployment modes? (YARN, Local, Standalone, etc.)&lt;/li&gt;
&lt;li&gt;Which of the options helps with garbage collection? (increasing Java heap space, serialization or deserialization, etc.)&lt;/li&gt;
&lt;li&gt;Dataset API Questions&lt;/li&gt;
&lt;li&gt;Split function&lt;/li&gt;
&lt;li&gt;Explode function&lt;/li&gt;
&lt;li&gt;Joins (inner, left, crossJoin and anti)&lt;/li&gt;
&lt;li&gt;Renaming column&lt;/li&gt;
&lt;li&gt;Overwriting column&lt;/li&gt;
&lt;li&gt;Filtering with multiple conditions&lt;/li&gt;
&lt;li&gt;The difference between using where and using filter&lt;/li&gt;
&lt;li&gt;Date and time manipulation (to and from unix, formatting, etc.)&lt;/li&gt;
&lt;li&gt;Sorting asc and desc with and without nulls&lt;/li&gt;
&lt;li&gt;Literals&lt;/li&gt;
&lt;li&gt;Repartition and coalesce (more than 2 questions)&lt;/li&gt;
&lt;li&gt;UDFs&lt;/li&gt;
&lt;li&gt;Window/ranking functions (rank and dense_rank)&lt;/li&gt;
&lt;li&gt;Printing schema&lt;/li&gt;
&lt;li&gt;Finding transformations and actions&lt;/li&gt;
&lt;li&gt;Collecting a dataset, extracting values and casting&lt;/li&gt;
&lt;li&gt;Casting columns of a dataset&lt;/li&gt;
&lt;li&gt;Dataset Reading and Writing&lt;/li&gt;
&lt;li&gt;Reading a raw CSV file&lt;/li&gt;
&lt;li&gt;Reading a CSV file with schema and with separators&lt;/li&gt;
&lt;li&gt;Read and write modes&lt;/li&gt;
&lt;li&gt;Writing and overwriting a parquet&lt;/li&gt;
&lt;li&gt;Partitioning by a column and writing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Do not rely on online documentation!&lt;/p&gt;

</description>
      <category>spark</category>
      <category>certification</category>
      <category>databricks</category>
      <category>developer</category>
    </item>
  </channel>
</rss>
