DEV Community

Subhasis Das


DAY 10 - Query Optimization & Explain Plans

Day 10 of Phase 2 focused on query optimization and execution analysis in Spark.


The objective was to run a heavy analytical query on the event dataset, inspect its execution plan, and analyze how query design affects performance. A purchase aggregation query was executed to identify the most active buyers in the dataset.
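The post does not show the query or the dataset schema, so the following is only a minimal in-memory sketch of what a "most active buyers" aggregation computes. It assumes hypothetical fields `user_id` and `event_type` (with the value `"purchase"`) and uses plain Python so it runs without a Spark cluster; the docstring shows a roughly equivalent PySpark query under the same assumed column names.

```python
from collections import Counter

# Hypothetical event records; the real dataset schema is not shown in the post.
events = [
    {"user_id": "u1", "event_type": "purchase"},
    {"user_id": "u2", "event_type": "view"},
    {"user_id": "u2", "event_type": "purchase"},
    {"user_id": "u1", "event_type": "purchase"},
    {"user_id": "u3", "event_type": "purchase"},
    {"user_id": "u1", "event_type": "cart"},
]


def top_buyers(rows, n=10):
    """Count purchase events per user and return the n most active buyers.

    Roughly the same computation as this PySpark query (assumed column names,
    with `from pyspark.sql import functions as F`):
        df.filter(F.col("event_type") == "purchase")
          .groupBy("user_id").count()
          .orderBy(F.desc("count")).limit(n)
    """
    # Filter: keep only purchase events.
    purchases = (r["user_id"] for r in rows if r["event_type"] == "purchase")
    # Aggregate: count purchases per user.
    counts = Counter(purchases)
    # Sort: highest count first; user_id breaks ties so the output is deterministic
    # (Spark does not guarantee tie order unless user_id is also a sort key).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


print(top_buyers(events, n=2))  # [('u1', 2), ('u2', 1)]
```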


Using Spark's EXPLAIN functionality, the parsed, analyzed, optimized, and physical execution plans were examined. The physical plan revealed operators such as Photon scans, hash aggregation, shuffle exchanges, and sorts.
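In PySpark, `df.explain(mode="extended")` (Spark 3.0+) prints the parsed, analyzed, and optimized logical plans plus the physical plan, and SQL users can run `EXPLAIN EXTENDED <query>`. To make the physical operators less abstract, here is a toy, single-process model of how a grouped count typically flows through them: a partial hash aggregate per input partition, a shuffle exchange that hash-partitions by key, a final hash aggregate, then a sort. The partition data and partition count are hypothetical, and Photon-specific operators are not modeled.

```python
from collections import Counter, defaultdict

# Toy input: two "input partitions" of buyer IDs (already filtered to purchases).
partitions = [
    ["u1", "u2", "u1", "u3"],
    ["u2", "u1", "u4", "u2"],
]
NUM_SHUFFLE_PARTITIONS = 3  # toy value; Spark's spark.sql.shuffle.partitions defaults to 200


def partial_aggregate(rows):
    """Map side: HashAggregate computing a partial count within one input partition."""
    return Counter(rows)


def exchange(partials, num_partitions):
    """Shuffle exchange: route each (key, partial_count) to a bucket by key hash.

    Spark uses a Murmur3-based hash; Python's salted str hash stands in here, so
    bucket placement varies between runs but the final result does not.
    """
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for key, cnt in partial.items():
            buckets[hash(key) % num_partitions][key].append(cnt)
    return buckets


def final_aggregate(bucket):
    """Reduce side: HashAggregate merging the partial counts for each key."""
    return {key: sum(cnts) for key, cnts in bucket.items()}


def run_plan(parts, num_partitions=NUM_SHUFFLE_PARTITIONS):
    partials = [partial_aggregate(p) for p in parts]
    shuffled = exchange(partials, num_partitions)
    merged = {}
    for bucket in shuffled:
        merged.update(final_aggregate(bucket))  # each key lives in exactly one bucket
    # Sort: count descending, user_id as a deterministic tie-breaker.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


print(run_plan(partitions))  # [('u1', 3), ('u2', 3), ('u3', 1), ('u4', 1)]
```

The partial aggregate shrinks the data before the shuffle, which is why Spark plans the aggregation in two phases around the exchange.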


Execution timing showed the effect of query complexity. The aggregation query ran in approximately 2.20 seconds, while a simplified projection query without the aggregation and sorting ran in approximately 1.41 seconds, roughly 36% less time.
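Timing comparisons like this are easy to get wrong. In Spark, transformations are lazy, so a timer must wrap an action (for example `collect()` or `count()`), and single runs are noisy. The sketch below shows the measurement pattern on synthetic, hypothetical data in plain Python: it times an aggregate-and-sort function against a projection that keeps the filter but skips grouping and sorting (an assumption, since the post does not show either query), taking the best of several runs. The absolute numbers it prints are unrelated to the post's Databricks timings.

```python
import random
import time
from collections import Counter

# Hypothetical synthetic data; sizes are arbitrary and unrelated to the post's dataset.
random.seed(0)
EVENT_TYPES = ["view", "cart", "purchase"]
events = [
    (f"u{random.randrange(1000)}", random.choice(EVENT_TYPES))
    for _ in range(100_000)
]


def aggregation_query(rows):
    """Filter purchases, count per user, then sort by count descending."""
    counts = Counter(user for user, etype in rows if etype == "purchase")
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def projection_query(rows):
    """Filter purchases and select only user_id: no grouping, no sorting."""
    return [user for user, etype in rows if etype == "purchase"]


def best_of(fn, rows, repeats=5):
    """Return the fastest wall-clock time over several runs to reduce noise."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(rows)
        timings.append(time.perf_counter() - start)
    return min(timings)


agg_s = best_of(aggregation_query, events)
proj_s = best_of(projection_query, events)
print(f"aggregation: {agg_s:.4f}s  projection: {proj_s:.4f}s")
```

Note that the two queries return different results, so the gap measures the cost of the extra work rather than a faster way to answer the same question.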


Caching was attempted as part of the optimization workflow, but restrictions on serverless compute blocked the persistence operation. As a result, optimization was demonstrated through query simplification and explain-plan interpretation instead.


During the process, ChatGPT was used to help interpret the explain plans and reason about query optimizations in Databricks.

