Taking user code and converting it into a logical plan is the first phase of execution. The logical plan only converts the user’s set of expressions into the most optimized version.
It does this by converting the user code into an unresolved logical plan. This logical plan is unresolved because although your code might be valid, the tables or columns that it refers to may or may not exist. Spark uses a repository of all table, catalog and DataFrame information to resolve columns and tables in the analyzer. The analyzer may reject the unresolved logical plan if the required table or column name does not exist in the catalog. If the analyzer is able to resolve it, the result is passed through the Catalyst Optimizer.
The Packages can extend Catalyst to include their own rules for domain specific optimizations.
Top comments (0)