<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vladimir Ozerov</title>
    <description>The latest articles on DEV Community by Vladimir Ozerov (@devozerov).</description>
    <link>https://dev.to/devozerov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F553093%2F73f1c6ff-f6f3-4a33-a89a-7a728ac8cee1.jpg</url>
      <title>DEV Community: Vladimir Ozerov</title>
      <link>https://dev.to/devozerov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devozerov"/>
    <language>en</language>
    <item>
      <title>Memoization in Cost-based Optimizers</title>
      <dc:creator>Vladimir Ozerov</dc:creator>
      <pubDate>Wed, 09 Jun 2021 19:55:15 +0000</pubDate>
      <link>https://dev.to/devozerov/memoization-in-cost-based-optimizers-5b8a</link>
      <guid>https://dev.to/devozerov/memoization-in-cost-based-optimizers-5b8a</guid>
      <description>&lt;p&gt;Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is &lt;a href="https://core.ac.uk/download/pdf/21271492.pdf"&gt;NP-hard&lt;/a&gt;, and the number of possible plans grows exponentially with the query's complexity. For example, a typical &lt;a href="http://www.tpc.org/tpch/"&gt;TPC-H&lt;/a&gt; query may have up to several thousand possible join orders, 2-3 algorithms per join, a couple of access methods per table, some filter/aggregate pushdown alternatives, etc. Combined, this could quickly explode the search space to millions of alternative plans.&lt;/p&gt;

&lt;p&gt;This blog post will discuss &lt;strong&gt;memoization&lt;/strong&gt; - an important technique that allows cost-based optimizers to consider billions of alternative plans in a reasonable time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Naïve Approach
&lt;/h2&gt;

&lt;p&gt;Consider that we are designing a rule-based optimizer. We want to apply a rule to a relational operator tree and produce another tree. If we insert a new operator in the middle of the tree, we need to update the parent to point to the new operator. Once we've changed the parent, we may need to change the parent of the parent, etc. If your operators are immutable by design or used by other parts of the program, you may need to copy large parts of the tree to create a new plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M61dwHs9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065b4d9ce4c7f60b6b80638_memo-01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M61dwHs9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065b4d9ce4c7f60b6b80638_memo-01.png" alt="alternatives"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach is wasteful because you need to propagate changes to parents over and over again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Indirection
&lt;/h2&gt;

&lt;p&gt;We may solve the problem with change propagation by applying an additional layer of indirection. Let us introduce a new surrogate operator that will store a &lt;strong&gt;reference&lt;/strong&gt; to a child operator. Before starting the optimization, we may traverse the initial relational tree and create copy of operators, where all concrete inputs are replaced with references.&lt;/p&gt;

&lt;p&gt;When applying a transformation, we may only change a reference without updating other parts of the tree. When the optimization is over, we remove the references and reconstruct the final tree.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c3SaC8V2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065b1551fda64a7d2555d8b_memo-02.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c3SaC8V2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065b1551fda64a7d2555d8b_memo-02.png" alt="references"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You may find a similar design in many production-grade heuristic optimizers. In our previous &lt;a href="https://www.querifylabs.com/blog/the-architecture-of-presto-optimizer-part-1"&gt;blog post&lt;/a&gt; about &lt;strong&gt;Presto&lt;/strong&gt;, we discussed the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Memo.java"&gt;Memo&lt;/a&gt; class that manages such references. In &lt;strong&gt;Apache Calcite&lt;/strong&gt;, the heuristic optimizer &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java"&gt;HepPlanner&lt;/a&gt; models node references through the class &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/hep/HepRelVertex.java"&gt;HepRelVertex&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We realized how references might help us minimize change propagation overhead. But in a cost-based optimization, we need to consider multiple alternative plans at the same time. We need to go deeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  MEMO
&lt;/h2&gt;

&lt;p&gt;In cost-based optimization, we need to generate multiple equivalent operators, link them together, and find the cheapest path to the root.&lt;/p&gt;

&lt;p&gt;Two relational operators are &lt;strong&gt;equivalent&lt;/strong&gt; if they generate the same result set on every legal database instance. How can we encode equivalent operators efficiently? Let's extend our references to point to multiple operators! We will refer to such a surrogate node as a &lt;strong&gt;group&lt;/strong&gt;, which is a collection of equivalent operators.&lt;/p&gt;

&lt;p&gt;We start the optimization by creating equivalence groups for existing operators and replacing concrete inputs with relevant groups. At this point, the process is similar to our previous approach with references.&lt;/p&gt;

&lt;p&gt;When a rule is applied to operator &lt;strong&gt;A&lt;/strong&gt;, and a new equivalent operator &lt;strong&gt;B&lt;/strong&gt; is produced, we add &lt;strong&gt;B&lt;/strong&gt; to &lt;strong&gt;A&lt;/strong&gt;'s equivalence group. The collection of groups that we consider during optimization is called &lt;strong&gt;MEMO&lt;/strong&gt;. The process of maintaining a MEMO is called &lt;strong&gt;memoization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CHXan7Kc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065be2f874bac04397aca27_memo-03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CHXan7Kc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065be2f874bac04397aca27_memo-03.png" alt="memo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MEMO is a variation of the &lt;a href="https://anilshanbhag.in/static/papers/rsgraph_vldb14.pdf"&gt;AND/OR graph&lt;/a&gt;. Operators are AND-nodes representing the subgoals of the query (e.g., applying a filter). Groups are OR-nodes, representing the alternative subgoals that could be used to achieve the parent goal (e.g., do a table scan or an index scan).&lt;/p&gt;

&lt;p&gt;When all interesting operators are generated, the MEMO is said to be explored. We now need to extract the &lt;strong&gt;cheapest plan&lt;/strong&gt; from it, which is the ultimate goal of cost-based optimization. To do this, we first assign costs to individual operators via the cost function. Then we traverse the graph bottom-up and select the cheapest operator from each group (often referred to as "winner"), combining costs of individual operators with costs of their inputs.&lt;/p&gt;

&lt;p&gt;Practical optimizers often maintain groups' winners up-to-date during the optimization to allow for search space &lt;strong&gt;pruning&lt;/strong&gt;, which we will discuss in future blog posts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HZxB78xR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065ce8aa449b40540d1253a_memo-04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HZxB78xR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065ce8aa449b40540d1253a_memo-04.png" alt="best plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the root group's cheapest operator is resolved, we construct the final plan through a top-down traverse across every group's cheapest operators.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4T0_gjXQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065ce9ae0bc9c1de43ee87c_memo-05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4T0_gjXQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/6065ce9ae0bc9c1de43ee87c_memo-05.png" alt="best plan 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Memoization is very efficient because it allows for the &lt;strong&gt;deduplication&lt;/strong&gt; of nodes, eliminating unnecessary work. Consider a query that has five joins. The total number of unique join orders for such a query is 30240. If we decide to create a new plan for every join order, we would need to instantiate 30240 * 5 = 151200 join operators. With memoization, you only need 602 join operators to encode the same search space - a dramatic improvement!&lt;/p&gt;

&lt;p&gt;The memoization idea is simple. Practical implementations of MEMO are much more involved. You need to design operator equivalence carefully, decide how to do the deduplication, manage the operator's physical properties (such as sort order), track already executed optimization rules, etc. We will cover some of these topics in future blog posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Memoization is an efficient technique that allows you to encode the large search space in a very compact form and eliminate duplicate work. MEMO data structure routinely backs modern cost-based rule-based optimizers.&lt;/p&gt;

&lt;p&gt;In future posts, we will discuss the design of MEMO in practical cost-based optimizers. Stay tuned!&lt;/p&gt;

&lt;p&gt;We are always ready to help you with your query optimizer design. Just &lt;a href="https://www.querifylabs.com/#contact-form"&gt;let us know&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>calcite</category>
    </item>
    <item>
      <title>Rule-based Query Optimization</title>
      <dc:creator>Vladimir Ozerov</dc:creator>
      <pubDate>Sat, 08 May 2021 07:36:46 +0000</pubDate>
      <link>https://dev.to/devozerov/rule-based-query-optimization-4jnn</link>
      <guid>https://dev.to/devozerov/rule-based-query-optimization-4jnn</guid>
      <description>&lt;p&gt;The goal of the query optimizer is to find the query execution plan that computes the requested result efficiently. In this blog post, we discuss rule-based optimization - a common pattern to explore equivalent plans used by modern optimizers. Then we explore the implementation of several state-of-the-art rule-based optimizers. Then we analyze the rule-based optimization in Apache Calcite, Presto, and CockroachDB.&lt;/p&gt;

&lt;h1&gt;
  
  
  Transformations
&lt;/h1&gt;

&lt;p&gt;A query optimizer must explore the space &lt;strong&gt;equivalent&lt;/strong&gt; execution plans and pick the optimal one. Intuitively, plan B is equivalent to plan A if it produces the same result for all possible inputs.&lt;/p&gt;

&lt;p&gt;To generate the equivalent execution plans, we may apply one or more &lt;strong&gt;transformations&lt;/strong&gt; to the original plan. A transformation accepts one plan and produces zero, one, or more equivalent plans. As a query engine developer, you may implement hundreds of different transformations to generate a sufficient number of equivalent plans.&lt;/p&gt;

&lt;p&gt;Some transformations operate on bigger parts of the plan or even the whole plan. For example, an implementation of the join order selection with dynamic programming may enumerate all joins in the plan, generate alternative join sequences, and pick the best one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963ba5eb259ef7ebab78e4_blog04-dp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963ba5eb259ef7ebab78e4_blog04-dp.png" alt="Dynamic Programming"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other transformations could be relatively isolated. Consider the transformation that pushes the filter operator past the aggregate operator. It works on an isolated part of the tree and doesn't require a global context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963bb7dc99fc0d65d4ebbf_blog04-filter-push.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963bb7dc99fc0d65d4ebbf_blog04-filter-push.png" alt="Filter Pushdown"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Rules
&lt;/h1&gt;

&lt;p&gt;Every optimizer follows some algorithm that decides when to apply particular transformations and how to process the newly created equivalent plans. As the number of transformations grows, it becomes not very convenient to hold them in a monolithic routine. Imagine a large if-else block of code that decides how to apply a hundred transformations to several dozens of relational operators.&lt;/p&gt;

&lt;p&gt;To facilitate your engine's evolution, you may want to abstract out some of your transformations behind a common interface. For every transformation, you may define a pattern that defines whether we can apply the transformation to the given part of the plan. A pair of a pattern and a transformation is called a &lt;strong&gt;rule&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The rule abstraction allows you to split the optimization logic into pluggable parts that evolve independently of each other, significantly simplifying the development of the optimizer. The optimizer that uses rules to generate the equivalent plans is called a &lt;strong&gt;rule-based optimizer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963bc8dc99fc4d98d4ec35_blog04-rules.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963bc8dc99fc4d98d4ec35_blog04-rules.png" alt="Rule-based Optimization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that the rules are, first of all, a pattern that helps you decompose the optimizer's codebase. The usage of rules doesn't force you to follow a specific optimization procedure, such as Volcano/Cascades. It doesn't prevent you from using particular optimization techniques, like dynamic programming for join enumeration. It doesn't require you to choose between heuristic or cost-based approaches. However, the isolated nature of rules may complicate some parts of your engine, such as join planning.&lt;/p&gt;

&lt;h1&gt;
  
  
  Examples
&lt;/h1&gt;

&lt;p&gt;Now, as we understand the idea behind the rule-based optimization, let's look at several real-world examples: Apache Calcite, Presto, and CockroachDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache Calcite
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://calcite.apache.org/" rel="noopener noreferrer"&gt;Apache Calcite&lt;/a&gt; is a dynamic data management framework. At its core, Apache Calcite has two rule-based optimizers and a library of transformation rules.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java" rel="noopener noreferrer"&gt;HepPlanner&lt;/a&gt; is a heuristic optimizer that applies rules one by one until no more transformations are possible.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java" rel="noopener noreferrer"&gt;VolcanoPlanner&lt;/a&gt; is a cost-based optimizer that generates multiple equivalent plans, put them into the MEMO data structure, and uses costs to choose the best one. The &lt;code&gt;VolcanoPlanner&lt;/code&gt; may fire rules in an arbitrary order or work in a recently introduced Cascades-like &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256" rel="noopener noreferrer"&gt;top-down&lt;/a&gt; style.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L41" rel="noopener noreferrer"&gt;rule interface&lt;/a&gt; accepts the pattern and requires you to implement the &lt;code&gt;onMatch(context)&lt;/code&gt; method. This method doesn't return the new relational tree as one might expect. Instead, it returns &lt;code&gt;void&lt;/code&gt; but provides the ability to register new transformations in the context, which allows you to emit multiple equivalent trees from a single rule call. Apache Calcite comes with an extensive &lt;a href="https://github.com/apache/calcite/tree/branch-1.24/core/src/main/java/org/apache/calcite/rel/rules" rel="noopener noreferrer"&gt;library&lt;/a&gt; of built-in rules and allows you to add your own rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomRule&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;RelOptRule&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;CustomRule&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;super&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern_for_the_rule&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onMatch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RelOptRuleCall&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;equivalentNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...;&lt;/span&gt;

        &lt;span class="c1"&gt;// Register the new equivalent node in MEMO&lt;/span&gt;
        &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;transformTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;equivalentNode&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Apache Calcite, you may define one or more optimization stages. Every stage may use its own set of rules and optimizer. Many products based on Apache Calcite use multiple stages to minimize the optimization time at the cost of the possibility of producing a not optimal plan. See our previous &lt;a href="https://www.querifylabs.com/blog/assembling-a-query-optimizer-with-apache-calcite" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; for more details on how to create a query optimizer with Apache Calcite.&lt;/p&gt;

&lt;p&gt;Let's take a look at a couple of rules for join planning. To explore all bushy join trees, you may use &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/rel/rules/JoinCommuteRule.java" rel="noopener noreferrer"&gt;JoinCommuteRule&lt;/a&gt; and &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/rel/rules/JoinAssociateRule.java" rel="noopener noreferrer"&gt;JoinAssociateRule&lt;/a&gt;. These rules are relatively simple and work on individual joins. The problem is that they may trigger duplicate derivations, as explained in this &lt;a href="https://dl.acm.org/doi/10.14778/2732977.2732997" rel="noopener noreferrer"&gt;paper&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963bdab571206fb3c6b90a_blog04-commute-associate.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963bdab571206fb3c6b90a_blog04-commute-associate.png" alt="Join Commute and Associate Rules"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, Apache Calcite may use a set of rules that convert multiple joins into a single &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/rel/rules/MultiJoin.java" rel="noopener noreferrer"&gt;n-way join&lt;/a&gt; and then apply a heuristic algorithm to produce a single optimized join order from the n-way join. This is an example of the rule, that works on a large part of the tree, rather than individual operators. You may use a similar approach to implement the rule to do the join planning with dynamic programming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963be3ec20c2754457d5d0_blog04-calcite-multijoin.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fe5c475cb3c75040200bfe6%2F60963be3ec20c2754457d5d0_blog04-calcite-multijoin.png" alt="Multi-join Rule"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Apache Calcite example demonstrates that the rule-based optimization could be used with both heuristic and cost-based exploration strategies, as well as for complex join planning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Presto
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://prestodb.io/" rel="noopener noreferrer"&gt;Presto&lt;/a&gt; is a distributed query engine for big data. Like Apache Calcite, it uses rules to perform transformations. However, Presto doesn't have a cost-based search algorithm and relies only on heuristics when transitioning between optimization steps. See our &lt;a href="https://www.querifylabs.com/blog/the-architecture-of-presto-optimizer-part-1" rel="noopener noreferrer"&gt;previous blog&lt;/a&gt; for more details on Presto query optimizer.&lt;/p&gt;

&lt;p&gt;As Presto cannot explore multiple equivalent plans at once, it has a simpler &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Rule.java" rel="noopener noreferrer"&gt;rule interface&lt;/a&gt; that produces no more than one new equivalent tree.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface Rule {
    Pattern getPattern();
    Result apply(T node, ...);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Presto also has several rules that use costs internally to explore multiple alternatives in a rule call scope. An example is a (relatively) recently introduced &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java#L93" rel="noopener noreferrer"&gt;ReorderJoins&lt;/a&gt; rule. Similar to the above-mentioned Apache Calcite's n-way join rules, the &lt;code&gt;ReorderJoins&lt;/code&gt; rule first converts a sequence of joins into a single n-way join. Then the rule enumerates equivalent joins orders and picks the one with the least cost (unlike Apache Calcite's &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java" rel="noopener noreferrer"&gt;LoptOptimizerJoinRule&lt;/a&gt;, which uses heuristics).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ReorderJoins&lt;/code&gt; rule is of particular interest because it demonstrates how we may use rule-based optimization to combine heuristic and cost-based search strategies in the same optimizer.&lt;/p&gt;

&lt;h2&gt;
  
  
  CockroachDB
&lt;/h2&gt;

&lt;p&gt;CockroachDB is a cloud-native SQL database for modern cloud applications. It has a rule-based Cascades-style query &lt;a href="https://www.cockroachlabs.com/blog/building-cost-based-sql-optimizer/" rel="noopener noreferrer"&gt;optimizer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unlike Apache Calcite and Presto, Cockroach doesn't have a common rule interface. Instead, it uses a custom DSL to define the rule's pattern and transformation logic. The code &lt;a href="https://github.com/cockroachdb/cockroach/tree/release-20.2/pkg/sql/opt/optgen" rel="noopener noreferrer"&gt;generator&lt;/a&gt; analyzes the DSL files and produces a monolithic optimization routine. The code generation may allow for a faster optimizer's code because it avoids virtual calls when calling rules.&lt;/p&gt;

&lt;p&gt;Below is an &lt;a href="https://github.com/cockroachdb/cockroach/blob/release-20.2/pkg/sql/opt/xform/rules/groupby.opt#L110" rel="noopener noreferrer"&gt;example&lt;/a&gt; of a rule definition that attempts to generate a streaming aggregate. Notice that you do not need to write the whole rule logic using DSL only. Instead, you may reference utility methods written in Go (which is CockroachDB primary language) from within the rule to minimize the amount of DSL-specific code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[GenerateStreamingGroupBy, Explore]
(GroupBy | DistinctOn | EnsureDistinctOn | UpsertDistinctOn
        | EnsureUpsertDistinctOn
    $input:*
    $aggs:*
    $private:* &amp;amp; (IsCanonicalGroupBy $private)
)
=&amp;gt;
(GenerateStreamingGroupBy (OpName) $input $aggs $private)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two rule types in CockroachDB. The &lt;a href="https://github.com/cockroachdb/cockroach/tree/release-20.2/pkg/sql/opt/norm/rules" rel="noopener noreferrer"&gt;normalization rules&lt;/a&gt; convert relational operators into canonical forms before being inserted into a MEMO, simplifying the subsequent optimization. An example is a &lt;a href="https://github.com/cockroachdb/cockroach/blob/release-20.2/pkg/sql/opt/norm/rules/bool.opt#L6" rel="noopener noreferrer"&gt;NormalizeNestedAnds&lt;/a&gt; rule that normalizes &lt;code&gt;AND&lt;/code&gt; expressions into a left-deep tree. The normalization is performed via a sequential invocation of normalization rules. The second category is &lt;a href="https://github.com/cockroachdb/cockroach/tree/release-20.2/pkg/sql/opt/xform/rules" rel="noopener noreferrer"&gt;exploration&lt;/a&gt; rules, which generate multiple equivalent plans. The exploration rules are invoked using the cost-based Cascades-like top-down optimization strategy with memorization.&lt;/p&gt;

&lt;p&gt;CockroachDB has a &lt;a href="https://github.com/cockroachdb/cockroach/blob/release-20.2/pkg/sql/opt/xform/rules/join.opt#L11" rel="noopener noreferrer"&gt;ReorderJoins&lt;/a&gt; rule to do the join planning. The rule uses a variation of the dynamic programming algorithm described in this &lt;a href="https://dl.acm.org/doi/10.1145/2463676.2465314" rel="noopener noreferrer"&gt;paper&lt;/a&gt; to enumerate the valid join orders and add them to MEMO.&lt;/p&gt;

&lt;p&gt;Thus, CockroachDB uses rule-based optimization for heuristic normalization, cost-based exploration, and join planning with dynamic programming.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Rule-based query optimization is a very flexible pattern that you may use when designing a query optimizer. It allows you to split the complicated transformation logic into self-contained parts, reducing the optimizer's complexity.&lt;/p&gt;

&lt;p&gt;The rule-based optimization doesn't limit you in how exactly to optimize your plans, be it bottom-up dynamic programming or top-down Cascades-style exploration, cost-based or heuristic optimization, or anything else.&lt;/p&gt;

&lt;p&gt;In future posts, we will discuss the difference between logical and physical optimization. Stay tuned!&lt;/p&gt;

</description>
      <category>sql</category>
      <category>calcite</category>
      <category>presto</category>
      <category>cockroachdb</category>
    </item>
    <item>
      <title>Inside Presto Optimizer</title>
      <dc:creator>Vladimir Ozerov</dc:creator>
      <pubDate>Tue, 20 Apr 2021 06:49:23 +0000</pubDate>
      <link>https://dev.to/devozerov/inside-presto-optimizer-2img</link>
      <guid>https://dev.to/devozerov/inside-presto-optimizer-2img</guid>
      <description>&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;p&gt;Presto is an open-source distributed SQL query engine for big data. Presto provides a connector API to interact with different data sources, including RDBMSs, NoSQL products, Hadoop, and stream processing systems. Created by Facebook, Presto received wide adoption by the open-source world (&lt;a href="https://github.com/prestodb"&gt;Presto&lt;/a&gt;, &lt;a href="https://github.com/trinodb"&gt;Trino&lt;/a&gt;) commercial companies (e.g., &lt;a href="https://ahana.io/"&gt;Ahana&lt;/a&gt;, &lt;a href="https://www.qubole.com/"&gt;Qubole&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Presto comes with a sophisticated query optimizer that applies various rewrites to the query plan. In this blog post series, we investigate the internals of Presto optimizer. In the first part, we discuss the optimizer interface and the design of the rule-based optimizer.&lt;/p&gt;

&lt;p&gt;Please refer to the &lt;a href="https://research.fb.com/publications/presto-sql-on-everything/"&gt;original paper&lt;/a&gt; by Facebook to get a better understanding of Presto's capabilities and design.&lt;/p&gt;

&lt;p&gt;We will use the Presto Foundation fork &lt;a href="https://github.com/prestodb/presto/tree/release-0.245"&gt;version 0.245&lt;/a&gt; for this blog post.&lt;/p&gt;

&lt;h1&gt;
  
  
  Relational Tree
&lt;/h1&gt;

&lt;p&gt;Presto optimizer works with relational operators. Similarly to other SQL optimizers, such as &lt;a href="https://www.querifylabs.com/blog/assembling-a-query-optimizer-with-apache-calcite"&gt;Apache Calcite&lt;/a&gt;, Presto performs syntax and semantic analysis of the original SQL string and then produces the logical relational tree:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The ANTLR-based &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4"&gt;parser&lt;/a&gt; converts the original query string into an abstract syntax tree (AST)&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/analyzer/Analyzer.java#L77"&gt;analyzer&lt;/a&gt; performs the semantic validation of the AST.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java"&gt;converter&lt;/a&gt; creates the logical relational tree from the AST.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every node in the tree represents a relational operation and implements a common &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-spi/src/main/java/com/facebook/presto/spi/plan/PlanNode.java"&gt;PlanNode&lt;/a&gt; interface, which exposes a unique node's ID, node's inputs, and node's output. The interface also allows traversing the tree with a visitor pattern, used extensively during the optimization. Examples of relational operations: &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-spi/src/main/java/com/facebook/presto/spi/plan/TableScanNode.java"&gt;TableScanNode&lt;/a&gt;, &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-spi/src/main/java/com/facebook/presto/spi/plan/ProjectNode.java"&gt;ProjectNode&lt;/a&gt;, &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-spi/src/main/java/com/facebook/presto/spi/plan/FilterNode.java"&gt;FilterNode&lt;/a&gt;, &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java"&gt;AggregationNode&lt;/a&gt;, &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/plan/JoinNode.java"&gt;JoinNode&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Consider the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;orderstatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalprice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;orderstatus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The associated query plan might look like this:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dfyujJ3R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/607d33d2b04e03f8aad679dd_plan-example.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dfyujJ3R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/607d33d2b04e03f8aad679dd_plan-example.png" alt="Presto plan"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Optimizer Interface
&lt;/h1&gt;

&lt;p&gt;When the logical plan is ready, we can start applying optimizations to it. In Presto, there is the general &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/PlanOptimizer.java"&gt;PlanOptimizer&lt;/a&gt; interface that every optimization phase implements. The interface accepts one relational tree and produces another.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;PlanOptimizer&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;PlanNode&lt;/span&gt; &lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;PlanNode&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Session&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;TypeProvider&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;PlanVariableAllocator&lt;/span&gt; &lt;span class="n"&gt;variableAllocator&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;PlanNodeIdAllocator&lt;/span&gt; &lt;span class="n"&gt;idAllocator&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;WarningCollector&lt;/span&gt; &lt;span class="n"&gt;warningCollector&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The optimization program builder &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java#L212"&gt;PlanOptimizers&lt;/a&gt; creates a list of optimizers that are invoked sequentially on the relational tree. Optimization problems often split into several phases to keep logical and computational complexity under control. In Presto, there are more than 70 optimization phases that every relational tree will pass through.&lt;/p&gt;

&lt;p&gt;The majority of optimization phases use the rule-based optimizer that we will discuss further. Other phases rely on custom optimizers that make no use rules but apply a custom transformation logic. For example, the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/PredicatePushDown.java"&gt;PredicatePushDown&lt;/a&gt; optimizer moves filters down in the relational tree, and &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/PruneUnreferencedOutputs.java"&gt;PruneUnreferencedOutputs&lt;/a&gt; removes unused fields that could be generated during the AST conversion or the previous optimization phases. We will discuss the most important custom optimizers in the second part of this blog post series.&lt;/p&gt;

&lt;p&gt;Presto may also &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java#L620"&gt;reoptimize&lt;/a&gt; the query plan in runtime. The details of this process are out of the scope of this blog post.&lt;/p&gt;

&lt;h1&gt;
  
  
  Rule-Based Optimizer
&lt;/h1&gt;

&lt;p&gt;Presto uses the rule-based &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java"&gt;IterativeOptimizer&lt;/a&gt; for the majority of optimization phases. In rule-based optimization, you provide the relational tree and a set of pluggable optimization rules. A &lt;strong&gt;rule&lt;/strong&gt; is a self-contained code that defines the relational tree &lt;strong&gt;pattern&lt;/strong&gt; it should be applied to and the &lt;strong&gt;transformation&lt;/strong&gt; logic. The optimizer then applies the rules to the relational tree using some algorithm. The main advantage of rule-based optimizers is extensibility. Instead of having a monolithic optimization algorithm, you split the optimizer into smaller self-contained rules. To extend the optimizer, you create a new rule that doesn't affect the rest of the optimizer code. Please refer to our &lt;a href="https://www.querifylabs.com/blog/rule-based-query-optimization"&gt;blog post&lt;/a&gt; to get more details about rule-based optimization.&lt;/p&gt;

&lt;p&gt;Rule-based optimizers could be either cost-based or heuristic. In &lt;a href="https://www.querifylabs.com/blog/what-is-cost-based-optimization"&gt;cost-based optimizers&lt;/a&gt;, a particular transformation is chosen based on the estimated cost assigned to a plan. Heuristic optimizers don't use costs and could produce arbitrary bad plans in the worst case. Presto relies on a rule-based heuristic optimization, although some specific rules use costs internally to pick a single transformation from multiple alternatives. An example is the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java#L102"&gt;ReorderJoins&lt;/a&gt; rule that selects a single join order with the least cost from multiple alternatives.&lt;/p&gt;

&lt;p&gt;We now describe the most important parts of the Presto rule-based optimizer: the &lt;code&gt;Memo&lt;/code&gt; class, rule matching, and the search algorithm.&lt;/p&gt;

&lt;h2&gt;
  
  
  MEMO
&lt;/h2&gt;

&lt;p&gt;MEMO is a data structure used primarily in cost-based optimizers to encode multiple alternative plans efficiently. The main advantage of MEMO is that multiple alternative plans could be encoded in a very compact form. We discuss the design of MEMO in one of our &lt;a href="https://www.querifylabs.com/blog/memoization-in-cost-based-optimizers"&gt;blog posts&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Presto also uses a MEMO-like data structure. There is the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Memo.java"&gt;Memo&lt;/a&gt; class that stores groups. The optimizer &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L89"&gt;initializes&lt;/a&gt; the &lt;code&gt;Memo&lt;/code&gt;, which populates groups via a recursive traversal of the relational tree. However, every group in &lt;code&gt;Memo&lt;/code&gt; may have only &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Memo.java#L254"&gt;one operator&lt;/a&gt;. That is, Presto doesn't store multiple equivalent operators in a group. Instead, as we will see below, Presto unconditionally replaces the current operator with the transformed operator. Therefore, the &lt;code&gt;Memo&lt;/code&gt; class in Presto is not a MEMO data structure in a classical sense because it doesn't track equivalent operators. In Presto, you may think of the group as a convenient wrapper over an operator, used mostly to track operators' reachability during the optimization process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CcxjsuJz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/607d35a463c67eb5fef29ffd_memo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CcxjsuJz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://uploads-ssl.webflow.com/5fe5c475cb3c75040200bfe6/607d35a463c67eb5fef29ffd_memo.png" alt="Presto MEMO"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule Matching
&lt;/h2&gt;

&lt;p&gt;To optimize the relational tree, you should provide the optimizer with one or more rules. Every rule in Presto implements the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Rule.java"&gt;Rule&lt;/a&gt; interface.&lt;/p&gt;

&lt;p&gt;First, the interface defines the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Rule.java#L35"&gt;pattern&lt;/a&gt;, which may target an arbitrary part of the tree. It could be a single operator (filter in the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PruneFilterColumns.java#L37"&gt;PruneFilterColumns&lt;/a&gt; rule), multiple operators (filter on top of the filter in the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MergeFilters.java#L34"&gt;MergeFilters&lt;/a&gt; rule), an operator with a predicate (join pattern in the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java#L114"&gt;ReorderJoins&lt;/a&gt; rule), or anything else.&lt;/p&gt;

&lt;p&gt;Second, the interface defines the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Rule.java#L42"&gt;transformation logic&lt;/a&gt;. The result of the transformation could be either a new operator that replaces the previous one or no-op if the rule failed to apply the transformation for whatever reason.&lt;/p&gt;

&lt;h2&gt;
  
  
  Search Algorithm
&lt;/h2&gt;

&lt;p&gt;Now, as we understand the Presto rule-based optimizer's core concepts, let's take a look at the search algorithm.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;Memo&lt;/code&gt; class is &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L89"&gt;initialized&lt;/a&gt; with the original relational tree, as we discussed above.&lt;/li&gt;
&lt;li&gt;For every &lt;code&gt;Memo&lt;/code&gt; group, starting with the root, the method &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L102"&gt;exploreGroup&lt;/a&gt; is invoked. We look for rules that &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L133"&gt;match&lt;/a&gt; the current operator and fire them. If a rule produces an alternative operator, it &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L144"&gt;replaces&lt;/a&gt; the original operator unconditionally. The process continues until there are no more available transformations for the current operator. Then we optimize operators' inputs. If an alternative input is found, it may open up more optimizations for the parent operator, so we &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L108"&gt;reoptimize&lt;/a&gt; the parent. Presto relies on timeouts to terminate the optimization process if some rules continuously replace each other's results. Think of &lt;code&gt;b JOIN a&lt;/code&gt;, that replaces &lt;code&gt;a JOIN b&lt;/code&gt;, that replaces &lt;code&gt;b JOIN a&lt;/code&gt;, etc. You may run the &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/test/java/com/facebook/presto/sql/planner/iterative/TestIterativeOptimizer.java"&gt;TestIterativeOptimizer&lt;/a&gt; test to see this behavior in action.&lt;/li&gt;
&lt;li&gt;In the end, we &lt;a href="https://github.com/prestodb/presto/blob/release-0.245/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java#L99"&gt;extract&lt;/a&gt; the final plan from &lt;code&gt;Memo&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is it. The search algorithm is very simple and straightforward.&lt;/p&gt;

&lt;p&gt;The main drawback is that the optimizer is heuristic and cannot consider multiple alternative plans concurrently. That is, at every point in time, Presto has only one plan that it may transform further. In the &lt;a href="https://research.fb.com/publications/presto-sql-on-everything/"&gt;original paper&lt;/a&gt; from 2019, Facebook engineers mentioned that they explore an option to add a cost-based optimizer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We are in the process of enhancing the optimizer to perform a more comprehensive exploration of the search space using a cost-based evaluation of plans based on the techniques introduced by the Cascades framework.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is also a &lt;a href="https://github.com/prestodb/presto/wiki/New-Optimizer"&gt;document&lt;/a&gt; dated back to 2017 with some design ideas around cost-based optimization.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;In this blog post, we explored the design of the Presto optimizer. The optimization process is split into multiple sequential phases. Every phase accepts a relational tree and produces another relational tree. Most phases use a rule-based heuristic optimizer, while some rules rely on custom logic without rules. There were some thoughts to add the cost-based optimizer to Presto, but it hasn't happened yet.&lt;/p&gt;

&lt;p&gt;In the second part of this series, we will explore the concrete optimization rules and custom phases of Presto's query optimization. Stay tuned!&lt;/p&gt;

&lt;p&gt;We are always ready to help you with your SQL query optimizer design. Just &lt;a href="https://www.querifylabs.com/#contact-form"&gt;let us know&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>presto</category>
      <category>sql</category>
      <category>database</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Custom traits in Apache Calcite</title>
      <dc:creator>Vladimir Ozerov</dc:creator>
      <pubDate>Fri, 16 Apr 2021 15:10:15 +0000</pubDate>
      <link>https://dev.to/devozerov/custom-traits-in-apache-calcite-32bm</link>
      <guid>https://dev.to/devozerov/custom-traits-in-apache-calcite-32bm</guid>
      <description>&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;p&gt;Physical properties are an essential part of the optimization process that allows you to explore more alternative plans.&lt;/p&gt;

&lt;p&gt;Apache Calcite comes with convention and collation (sort order) properties. Many query engines require custom properties. For example, distributed and heterogeneous engines that we often see in our daily practice need to carefully plan the movement of data between machines and devices, which requires a custom property to describe data location.&lt;/p&gt;

&lt;p&gt;In this blog post, we will explore how to define, register and enforce a custom property, also known as a &lt;em&gt;trait&lt;/em&gt;, with Apache Calcite cost-based optimizer.&lt;/p&gt;

&lt;h1&gt;
  
  
  Physical Properties
&lt;/h1&gt;

&lt;p&gt;We start our journey by looking at the example of common physical property - sort order.&lt;/p&gt;

&lt;p&gt;Query optimizers work with relational operators, such as &lt;code&gt;Scan&lt;/code&gt;, &lt;code&gt;Project&lt;/code&gt;, &lt;code&gt;Filter&lt;/code&gt;, and &lt;code&gt;Join&lt;/code&gt;. During the optimization, an operator may require its input to satisfy a specific condition. To check whether the condition is satisfied, operators may expose &lt;strong&gt;physical properties&lt;/strong&gt; - plain values associated with an operator. Operators may compare the desired and actual properties of their inputs and enforce the desired property by injecting a special &lt;strong&gt;enforcer&lt;/strong&gt; operator on top of the input.&lt;/p&gt;

&lt;p&gt;Consider the join operator &lt;code&gt;t1 JOIN t2 ON t1.a = t2.b&lt;/code&gt;. We could use a merge join if both inputs are sorted on their join attributes, &lt;code&gt;t1.a&lt;/code&gt; and &lt;code&gt;t2.b&lt;/code&gt;, respectively. We may define the collation property for every operator, describing the sort order of produced rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Join[t1.a=t2.b]
  Input[t1]      [SORTED by a]
  Input[t2]      [NOT SORTED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The merge join operator may enforce the sorting on &lt;code&gt;t1.a&lt;/code&gt; and &lt;code&gt;t2.b&lt;/code&gt; on its inputs. Since the first input is already sorted on &lt;code&gt;t1.a&lt;/code&gt;, it remains unchanged. The second input is not sorted, so the enforcer &lt;code&gt;Sort&lt;/code&gt; operator is injected, making a merge join possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MergeJoin[t1.a=t2.b]  
  Input[t1]           [SORTED by t1.a]
  Sort[t2.a]          [SORTED by t2.b]
    Input[t2]         [NOT SORTED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Apache Calcite API
&lt;/h1&gt;

&lt;p&gt;In Apache Calcite, properties are defined by the &lt;code&gt;RelTrait&lt;/code&gt; and &lt;code&gt;RelTraitDef&lt;/code&gt; classes. &lt;code&gt;RelTrait&lt;/code&gt; is a concrete value of the property. RelTraitDef is a property definition, which describes the property name, expected Java class of the property, the default value of the property, and how to enforce the property. Property definitions are registered in the planner via the &lt;code&gt;RelOptPlanner.addRelTraitDef&lt;/code&gt; method. The planner will ensure that every operator has a specific value for every registered property definition, whether the default or not.&lt;br&gt;
All properties of a node are organized in an immutable data structure &lt;code&gt;RelTraitSet&lt;/code&gt;. This class has convenient methods to add and update properties with copying semantics. You may access the properties of a concrete operator using the &lt;code&gt;RelOptNode.getTraitSet&lt;/code&gt; method.&lt;br&gt;
To enforce a specific property on the operator during planning, you should do the following from within the rule:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get the current properties of a node using &lt;code&gt;RelOptNode.getTraitSet&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;Create a new instance of &lt;code&gt;RelTraitSet&lt;/code&gt; with updated properties.&lt;/li&gt;
&lt;li&gt;Enforce the properties by calling the &lt;code&gt;RelOptRule.convert&lt;/code&gt; method.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Finally, before invoking the planner program, you may define the desired properties of the root operator of the optimized relational tree. After the optimization, the planner will either return the operator that satisfies these properties or throw an exception.&lt;/p&gt;

&lt;p&gt;Internally, the Apache Calcite enforces properties by adding a special &lt;code&gt;AbstractConverter&lt;/code&gt; operator with the desired traits on top of the target operator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AbstractConverter [SORTED by a]
  Input[t2]       [NOT SORTED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To transform the &lt;code&gt;AbstractConverter&lt;/code&gt; into a real enforcer node, such as &lt;code&gt;Sort&lt;/code&gt;, you should add the built-in &lt;code&gt;ExpandConversionRule&lt;/code&gt; rule to your optimization program. This rule will attempt to expand the &lt;code&gt;AbstractConverter&lt;/code&gt; into a sequence of enforcers to satisfy the desired traits. We have only one unsatisfied property in our example, so the converter expands into a single &lt;code&gt;Sort&lt;/code&gt; operator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sort[t2.a]        [SORTED by a]
  Input[t2]       [NOT SORTED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may use your custom expansion rule if needed. See Apache Flink &lt;a href="https://github.com/apache/flink/blob/release-1.12/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/rules/physical/FlinkExpandConversionRule.scala"&gt;custom rule&lt;/a&gt; as an example.&lt;/p&gt;

&lt;h1&gt;
  
  
  Custom Property
&lt;/h1&gt;

&lt;p&gt;As we understand the purpose of properties and which Apache Calcite API to use, we will define, register, and enforce our custom property.&lt;/p&gt;

&lt;p&gt;Consider that we have a distributed database, where every relational operator might be distributed between nodes in one of two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;PARTITIONED&lt;/code&gt; - relation is partitioned between nodes. Every tuple (row) resides on one of the nodes. An example is a typical distributed data structure.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SINGLETON&lt;/code&gt; - relation is located on a single node. An example is a cursor that delivers the final result to the user application.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In our example, we would like to ensure that the top operator always has a &lt;code&gt;SINGLETON&lt;/code&gt; distribution, simulating the results' delivery to a single node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enforcer
&lt;/h2&gt;

&lt;p&gt;First, we define the enforcer operator. To ensure the &lt;code&gt;SINGLETON&lt;/code&gt; distribution, we need to move from all nodes to a single node. In distributed databases, data movement operators are often called &lt;code&gt;Exchange&lt;/code&gt;. The minimal requirement for a custom operator in Apache Calcite is to define the constructor and the &lt;code&gt;copy&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExchangeRel&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;SingleRel&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;RedistributeRel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;RelOptCluster&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;RelTraitSet&lt;/span&gt; &lt;span class="n"&gt;traits&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;super&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;traits&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RelTraitSet&lt;/span&gt; &lt;span class="n"&gt;traitSet&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;RelNode&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ExchangeRel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getCluster&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;traitSet&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Trait
&lt;/h2&gt;

&lt;p&gt;Next, we define our custom trait and trait definition. Our implementation must adhere to the following rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The trait must refer to a common trait definition instance in the method &lt;code&gt;getTraitDef&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The trait must override the &lt;code&gt;satisfies&lt;/code&gt; method to define whether the current trait satisfies the target trait. If not, the enforcer will be used.&lt;/li&gt;
&lt;li&gt;The trait definition must declare the expected Java class of the trait in the &lt;code&gt;getTraitClass&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;The trait definition must declare the default value of the trait in the &lt;code&gt;getDefault&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;The trait definition must implement the method &lt;code&gt;convert&lt;/code&gt;, which Apache Calcite will invoke to create the enforcer if the current trait doesn't satisfy the desired trait. If there is no valid conversion between traits, &lt;code&gt;null&lt;/code&gt; should be returned.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below is the source code of our trait. We define two concrete values, &lt;code&gt;PARTITIONED&lt;/code&gt; and &lt;code&gt;SINGLETON&lt;/code&gt;. We also define the special value &lt;code&gt;ANY&lt;/code&gt;, which we use as the default. We say that both &lt;code&gt;PARTITIONED&lt;/code&gt; and &lt;code&gt;SINGLETON&lt;/code&gt; satisfy &lt;code&gt;ANY&lt;/code&gt; but &lt;code&gt;PARTITIONED&lt;/code&gt; and &lt;code&gt;SINGLETON&lt;/code&gt; do not satisfy each other.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;RelTrait&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="no"&gt;ANY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ANY&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="no"&gt;PARTITIONED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;PARTITIONED&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="no"&gt;SINGLETON&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SINGLETON&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Type&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RelTraitDef&lt;/span&gt; &lt;span class="nf"&gt;getTraitDef&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;DistributionTraitDef&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;satisfies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RelTrait&lt;/span&gt; &lt;span class="n"&gt;toTrait&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="n"&gt;toTrait0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;toTrait&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toTrait0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ANY&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;equals&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toTrait0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;enum&lt;/span&gt; &lt;span class="nc"&gt;Type&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="no"&gt;ANY&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="no"&gt;PARTITIONED&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="no"&gt;SINGLETON&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our trait definition defines the &lt;code&gt;convert&lt;/code&gt; function, which injects the &lt;code&gt;ExchangeRel&lt;/code&gt; enforcer if the current property doesn't satisfy the target one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DistributionTraitDef&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;RelTraitDef&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;DistributionTraitDef&lt;/span&gt; &lt;span class="no"&gt;INSTANCE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DistributionTraitDef&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;DistributionTraitDef&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// No-op.&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Class&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;getTraitClass&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getSimpleName&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"DISTRIBUTION"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;RelOptPlanner&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="n"&gt;toTrait&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;allowInfiniteCostConverters&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="n"&gt;fromTrait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTraitSet&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getTrait&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DistributionTraitDef&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fromTrait&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;satisfies&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toTrait&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ExchangeRel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCluster&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTraitSet&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;plus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toTrait&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;rel&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;canConvert&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;RelOptPlanner&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="n"&gt;fromTrait&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="n"&gt;toTrait&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt; &lt;span class="nf"&gt;getDefault&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ANY&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You would likely have more distribution types, dedicated distribution columns, and different exchange types in production implementations. You may refer to Apache Flink as an example of a real &lt;a href="https://github.com/apache/flink/blob/release-1.12/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/trait/FlinkRelDistribution.scala#L63"&gt;distribution trait&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Let's see the new trait in action. The complete source code is available &lt;a href="https://github.com/querifylabs/querifylabs-blog/tree/main/02-custom-calcite-trait"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, we create a schema with a couple of tables - one with &lt;code&gt;PARTITIONED&lt;/code&gt; distribution and another with &lt;code&gt;SINGLETON&lt;/code&gt; distribution. We use custom table and schema implementation, similar to the ones we used in the previous &lt;a href="https://www.querifylabs.com/blog/assembling-a-query-optimizer-with-apache-calcite"&gt;blog post&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Table with PARTITIONED distribution.&lt;/span&gt;
&lt;span class="nc"&gt;Table&lt;/span&gt; &lt;span class="n"&gt;table1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"table1"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;PARTITIONED&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"field"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DECIMAL&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Table with SINGLETON distribution.&lt;/span&gt;
&lt;span class="nc"&gt;Table&lt;/span&gt; &lt;span class="n"&gt;table2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"table2"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SINGLETON&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"field"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DECIMAL&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Schema&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"schema"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;addTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table1&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;addTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table2&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we create a planner instance and register our custom trait definition in it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;VolcanoPlanner&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;VolcanoPlanner&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addRelTraitDef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConventionTraitDef&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addRelTraitDef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DistributionTraitDef&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we create a table scan operator for each of our tables and enforce the &lt;code&gt;SINGLETON&lt;/code&gt; distribution. Notice that we use the aforementioned &lt;code&gt;ExpandConversionRule&lt;/code&gt; in our optimization program. Otherwise, the enforcement will not work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Use the built-in rule that will expand abstract converters.&lt;/span&gt;
&lt;span class="nc"&gt;RuleSet&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RuleSets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AbstractConverter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ExpandConversionRule&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Prepare the desired traits with the SINGLETON distribution.&lt;/span&gt;
&lt;span class="nc"&gt;RelTraitSet&lt;/span&gt; &lt;span class="n"&gt;desiredTraits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTraitSet&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;plus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SINGLETON&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Use the planner to enforce the desired traits&lt;/span&gt;
&lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;optimizedNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Programs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;desiredTraits&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;emptyList&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
    &lt;span class="nc"&gt;Collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;emptyList&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we run the &lt;a href="https://github.com/querifylabs/querifylabs-blog/commit/1bb7d1c8942e7df00916d20de0e51f2a1965ac23#diff-5c30c410d3e7db69b7b8375de2802b3c04f94799dc4b30239fba532001d8eea4"&gt;TraitTest&lt;/a&gt; from the sample project to see this in action. For the &lt;code&gt;PARTITIONED&lt;/code&gt; table, the planner has added the &lt;code&gt;ExchangeRel&lt;/code&gt; to enforce the &lt;code&gt;SINGLETON&lt;/code&gt; distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEFORE:
2:LogicalTableScan(table=[[schema, partitioned]])

AFTER:
7:ExchangeRel
  2:LogicalTableScan(table=[[schema, partitioned]])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the table with the &lt;code&gt;SINGLETON&lt;/code&gt; distribution remains unchanged because it already has the desired distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEFORE:
0:LogicalTableScan(table=[[schema, singleton]])

AFTER:
0:LogicalTableScan(table=[[schema, singleton]])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations! Our custom property is ready.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Physical properties are an important concept in query optimization that allows you to explore more alternative plans.&lt;/p&gt;

&lt;p&gt;In this blog post, we demonstrated how to define the custom physical property in Apache Calcite. We created custom &lt;code&gt;RelTraitDef&lt;/code&gt; and &lt;code&gt;RelTrait&lt;/code&gt; classes, registered them in the planner, and used the custom operator to enforce the desired value of the property. &lt;/p&gt;

&lt;p&gt;However, we omitted one crucial question - how to &lt;strong&gt;propagate&lt;/strong&gt; properties between operators? It turns out, Apache Calcite cannot do this well, and you will have to make a tough decision choosing between several non-ideal solutions. We will discuss property propagation in detail in future posts. Stay tuned!&lt;/p&gt;

&lt;p&gt;We are always ready to help you with your SQL query optimizer design. Just &lt;a href="https://www.querifylabs.com/#contact-form"&gt;let us know&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>calcite</category>
      <category>sql</category>
      <category>database</category>
    </item>
    <item>
      <title>Assembling a query optimizer with Apache Calcite</title>
      <dc:creator>Vladimir Ozerov</dc:creator>
      <pubDate>Mon, 04 Jan 2021 22:35:37 +0000</pubDate>
      <link>https://dev.to/devozerov/assembling-a-query-optimizer-with-apache-calcite-1b4e</link>
      <guid>https://dev.to/devozerov/assembling-a-query-optimizer-with-apache-calcite-1b4e</guid>
      <description>&lt;h1&gt;
  
  
  Abstract
&lt;/h1&gt;

&lt;p&gt;Apache Calcite is a dynamic data management framework with SQL parser, optimizer, executor, and JDBC driver.&lt;/p&gt;

&lt;p&gt;Many examples of Apache Calcite usage demonstrate the end-to-end execution of queries using JDBC driver, some built-in optimization rules, and the &lt;code&gt;Enumerable&lt;/code&gt; executor. Our customers often have their own execution engines and JDBC drivers. So how to use Apache Calcite for query optimization only, without it's JDBC driver and &lt;code&gt;Enumerable&lt;/code&gt; executor?&lt;/p&gt;

&lt;p&gt;In this tutorial, we create a simple query optimizer using internal Apache Calcite classes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Schema
&lt;/h1&gt;

&lt;p&gt;First, we need to define the schema. We start with a custom table implementation. To create a table, you should extend Apache Calcite's &lt;code&gt;AbstractTable&lt;/code&gt;. We pass two pieces of information to our table:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Field names and types that we will use to construct the row type of the table (required for expression type derivation).&lt;/li&gt;
&lt;li&gt;Optional &lt;code&gt;Statistic&lt;/code&gt; object that provides helpful information for query planner: row count, collations, unique table keys, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our statistic class exposes only row count information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleTableStatistic&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Statistic&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;rowCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;SimpleTableStatistic&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;rowCount&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;rowCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rowCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Double&lt;/span&gt; &lt;span class="nf"&gt;getRowCount&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;rowCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Other methods no-op&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We pass column names and types to our table class to construct the row type, which Apache Calcite uses to derive data types of expressions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleTable&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractTable&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fieldNames&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fieldTypes&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;SimpleTableStatistic&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;RelDataType&lt;/span&gt; &lt;span class="n"&gt;rowType&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;SimpleTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fieldNames&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fieldTypes&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
        &lt;span class="nc"&gt;SimpleTableStatistic&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tableName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tableName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fieldNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fieldNames&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fieldTypes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fieldTypes&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RelDataType&lt;/span&gt; &lt;span class="nf"&gt;getRowType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RelDataTypeFactory&lt;/span&gt; &lt;span class="n"&gt;typeFactory&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rowType&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;RelDataTypeField&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;fieldNames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;fieldNames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;RelDataType&lt;/span&gt; &lt;span class="n"&gt;fieldType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;typeFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createSqlType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fieldTypes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
                &lt;span class="nc"&gt;RelDataTypeField&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RelDataTypeFieldImpl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fieldNames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fieldType&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
                &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;

            &lt;span class="n"&gt;rowType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RelRecordType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StructKind&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;PEEK_FIELDS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rowType&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Statistic&lt;/span&gt; &lt;span class="nf"&gt;getStatistic&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;statistic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our table also implements Apache Calcite's &lt;code&gt;ScannableTable&lt;/code&gt; interface. We do this only for demonstration purposes because we will use a certain &lt;code&gt;Enumerable&lt;/code&gt; optimization rule in our example that will fail without this interface. You do not need to implement this interface if you are not going to use Apache Calcite &lt;code&gt;Enumerable&lt;/code&gt; execution backend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleTable&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractTable&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ScannableTable&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Enumerable&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;[]&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DataContext&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;UnsupportedOperationException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Not implemented"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we extend Apache Calcite's &lt;code&gt;AbstractSchema&lt;/code&gt; class to define our own schema. We pass a map from a table name to a table. Apache Calcite uses this map to resolve tables during semantic validation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleSchema&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractSchema&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;schemaName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tableMap&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;SimpleSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;schemaName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tableMap&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;schemaName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;schemaName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tableMap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tableMap&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;getTableMap&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tableMap&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are ready to start the optimization.&lt;/p&gt;

&lt;h1&gt;
  
  
  Optimizer
&lt;/h1&gt;

&lt;p&gt;The optimization process consists of the following phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Syntax analysis that produces an abstract syntax tree (AST) from a query string.&lt;/li&gt;
&lt;li&gt;Semantic analysis of an AST.&lt;/li&gt;
&lt;li&gt;Conversion of an AST to a relational tree.&lt;/li&gt;
&lt;li&gt;Optimization of a relational tree.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;Many Apache Calcite classes that we use for query optimization require configuration. However, there is no common configuration class in Apache Calcite that could be used by all objects. For this reason, we store the common configuration in a single object and then copy configuration values into other objects when needed. &lt;/p&gt;

&lt;p&gt;In this specific example, we instruct Apache Calcite on how to process object identifiers: do not change identifier casing, use case-sensitive name resolution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;configProperties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;configProperties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CalciteConnectionProperty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CASE_SENSITIVE&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;camelName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;TRUE&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;configProperties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CalciteConnectionProperty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UNQUOTED_CASING&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;camelName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;Casing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UNCHANGED&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;configProperties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CalciteConnectionProperty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;QUOTED_CASING&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;camelName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;Casing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UNCHANGED&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;span class="nc"&gt;CalciteConnectionConfig&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CalciteConnectionConfigImpl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;configProperties&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Syntax Analysis
&lt;/h2&gt;

&lt;p&gt;First of all, we should parse the query string. The result of parsing is an abstract syntax tree, with every node being a subclass of &lt;code&gt;SqlNode&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;We pass a part of our common configuration to the parser configuration, then instantiate &lt;code&gt;SqlParser&lt;/code&gt;, and finally perform the parsing. If you have a custom SQL syntax, you may pass a custom parser factory class to the configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;SqlNode&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;SqlParser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ConfigBuilder&lt;/span&gt; &lt;span class="n"&gt;parserConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SqlParser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;configBuilder&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;parserConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCaseSensitive&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseSensitive&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;parserConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setUnquotedCasing&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unquotedCasing&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;parserConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setQuotedCasing&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;quotedCasing&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;parserConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setConformance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;conformance&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="nc"&gt;SqlParser&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SqlParser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parserConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;parseStmt&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Semantic Analysis
&lt;/h2&gt;

&lt;p&gt;The goal of semantic analysis is to ensure that the produced abstract syntax tree is valid. Semantic analysis includes the resolution of object and function identifiers, data types inference, checking the correctness of certain SQL constructs (e.g., a group key in the &lt;code&gt;GROUP BY&lt;/code&gt; statement).&lt;/p&gt;

&lt;p&gt;The validation is performed by the &lt;code&gt;SqlValidatorImpl&lt;/code&gt; class, one of the most &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java"&gt;complex&lt;/a&gt; classes in Apache Calcite. This class requires several supporting objects. First, we create an instance of &lt;code&gt;RelDataTypeFactory&lt;/code&gt;, which provides SQL type definitions. We use the built-in type factory, but you may also provide your custom implementation if need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;RelDataTypeFactory&lt;/span&gt; &lt;span class="n"&gt;typeFactory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JavaTypeFactoryImpl&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we create a &lt;code&gt;Prepare.CatalogReader&lt;/code&gt; object that provides access to database objects. This is where our previously defined schema comes into play. Catalog reader consumes our common configuration object to have an object name resolution mechanics consistent with the one we used during parsing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SimpleSchema&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="c1"&gt;// Create our custom schema&lt;/span&gt;

&lt;span class="nc"&gt;CalciteSchema&lt;/span&gt; &lt;span class="n"&gt;rootSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CalciteSchema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createRootSchema&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;rootSchema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSchemaName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;Prepare&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CatalogReader&lt;/span&gt; &lt;span class="n"&gt;catalogReader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CalciteCatalogReader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;rootSchema&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;singletonList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSchemaName&lt;/span&gt;&lt;span class="o"&gt;()),&lt;/span&gt;
    &lt;span class="n"&gt;typeFactory&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we define a &lt;code&gt;SqlOperatorTable&lt;/code&gt;, the library of SQL functions and operators. We use the built-in library. You may also provide your implementation with custom functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SqlOperatorTable&lt;/span&gt; &lt;span class="n"&gt;operatorTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChainedSqlOperatorTable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;SqlStdOperatorTable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We created all the required supporting objects. Now we instantiate the built-in &lt;code&gt;SqlValidatorImpl&lt;/code&gt;. As usual, you may extend it if you need a custom validation behavior (such as custom error messages).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SqlValidator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Config&lt;/span&gt; &lt;span class="n"&gt;validatorConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SqlValidator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DEFAULT&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withLenientOperatorLookup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lenientOperatorLookup&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withSqlConformance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;conformance&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withDefaultNullCollation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultNullCollation&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withIdentifierExpansion&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;SqlValidator&lt;/span&gt; &lt;span class="n"&gt;validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SqlValidatorUtil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newValidator&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operatorTable&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;catalogReader&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;typeFactory&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;validatorConfig&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we perform validation. Keep the validator instance because we will need it for AST conversion to a relational tree.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SqlNode&lt;/span&gt; &lt;span class="n"&gt;sqlNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sqlString&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;SqlNode&lt;/span&gt; &lt;span class="n"&gt;validatedSqlNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conversion to a Relational Tree
&lt;/h2&gt;

&lt;p&gt;AST is not convenient for query optimization because the relational semantics of it's nodes is too complicated. It is much more convenient to perform query optimization on a tree of relational operators, defined by the &lt;code&gt;RelNode&lt;/code&gt; subclasses, such as &lt;code&gt;Scan&lt;/code&gt;, &lt;code&gt;Project&lt;/code&gt;, &lt;code&gt;Filter&lt;/code&gt;, &lt;code&gt;Join&lt;/code&gt;, etc. We use &lt;code&gt;SqlToRelConverter&lt;/code&gt;, another &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java"&gt;monstrous class&lt;/a&gt; of Apache Calcite, to convert the original AST into a relational tree. &lt;/p&gt;

&lt;p&gt;Interestingly, to create a converter, we must create an instance of a cost-based planner &lt;code&gt;VolcanoPlanner&lt;/code&gt; first. This is one of Apache Calcite's abstraction leaks. &lt;/p&gt;

&lt;p&gt;To create the &lt;code&gt;VolcanoPlanner&lt;/code&gt;, we again pass the common configuration and the &lt;code&gt;RelOptCostFactory&lt;/code&gt; that the planner will use to calculate costs. In a production-grade optimizer, you are likely to define a custom cost factory, because the built-in factories &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L100"&gt;take in count&lt;/a&gt; only cardinality of relations, which is often insufficient for proper cost estimation. &lt;/p&gt;

&lt;p&gt;You should also specify which physical operator properties the &lt;code&gt;VolcanoPlanner&lt;/code&gt; should track. Every property has a descriptor that extends Apache Calcite's &lt;code&gt;RelTraitDef&lt;/code&gt; class. In our example, we register only the &lt;code&gt;ConventionTraitDef&lt;/code&gt;, which defines the execution backend for a relational node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;VolcanoPlanner&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;VolcanoPlanner&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;RelOptCostImpl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FACTORY&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="nc"&gt;Contexts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addRelTraitDef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConventionTraitDef&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then create a &lt;code&gt;RelOptCluster&lt;/code&gt;, a common context object used during conversion and optimization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;RelOptCluster&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RelOptCluster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RexBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;typeFactory&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can create the converter now. Here we set a couple of configuration properties for a subquery unnesting, which is out of this post's scope.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SqlToRelConverter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Config&lt;/span&gt; &lt;span class="n"&gt;converterConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SqlToRelConverter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;configBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withTrimUnusedFields&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withExpand&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; 
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;SqlToRelConverter&lt;/span&gt; &lt;span class="n"&gt;converter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SqlToRelConverter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;catalogReader&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;StandardConvertletTable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;converterConfig&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have the converter, we can create the relational tree.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SqlNode&lt;/span&gt; &lt;span class="n"&gt;validatedSqlNode&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;RelRoot&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;converter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;convertQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validatedSqlNode&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During the conversion, Apache Calcite produces a tree of &lt;a href="https://github.com/apache/calcite/tree/branch-1.24/core/src/main/java/org/apache/calcite/rel/logical"&gt;logical&lt;/a&gt; relational operators, are abstract and do not target any specific execution backend. For this reason, logical operators always have the convention trait set to &lt;code&gt;Convention.NONE&lt;/code&gt;. It is expected that you will convert them into physical operators during the optimization. Physical operators have a specific convention different from &lt;code&gt;Convention.NONE&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization
&lt;/h2&gt;

&lt;p&gt;Optimization is a process of conversion of a relation tree to another relational tree. You may do rule-based optimization with heuristic or cost-based planners, &lt;code&gt;HepPlanner&lt;/code&gt; and &lt;code&gt;VolcanoPlanner&lt;/code&gt; respectively. You may also do any manual rewrite of the tree without rule. Apache Calcite comes with several powerful rewriting tools, such as &lt;code&gt;RelDecorrelator&lt;/code&gt; and &lt;code&gt;RelFieldTrimmer&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Typically, to optimize a relational tree, you will perform multiple optimization passes using rule-based optimizers and manual rewrites. Take a look at the &lt;a href="https://github.com/apache/calcite/blob/branch-1.24/core/src/main/java/org/apache/calcite/tools/Programs.java#L250-L283"&gt;default optimization program&lt;/a&gt; used by Apache Calcite JDBC driver or &lt;a href="https://github.com/apache/flink/blob/release-1.12/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/optimize/program/FlinkBatchProgram.scala#L45"&gt;multi-phase query optimization&lt;/a&gt; in Apache Flink.&lt;/p&gt;

&lt;p&gt;In our example, we will use &lt;code&gt;VolcanoPlanner&lt;/code&gt; to perform cost-based optimization. We already instantiated the &lt;code&gt;VolcanoPlanner&lt;/code&gt; before. Our inputs are a relational tree to optimize, a set of optimization rules, and traits that the optimized tree's parent node must satisfy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;RelOptPlanner&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="nc"&gt;RelTraitSet&lt;/span&gt; &lt;span class="n"&gt;requiredTraitSet&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="nc"&gt;RuleSet&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Program&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Programs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RuleSets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;requiredTraitSet&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;emptyList&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
        &lt;span class="nc"&gt;Collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;emptyList&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Example
&lt;/h1&gt;

&lt;p&gt;In this example, we will optimize the TPC-H query №6. The full source code is available &lt;a href="https://github.com/querifylabs/querifylabs-blog/tree/main/01-simple-query-optimizer"&gt;here&lt;/a&gt;. Run the &lt;code&gt;OptimizerTest&lt;/code&gt; to see it in action.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l_extendedprice&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l_discount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;lineitem&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l_shipdate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l_shipdate&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l_discount&lt;/span&gt; &lt;span class="k"&gt;between&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l_quantity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We define the &lt;code&gt;Optimizer&lt;/code&gt; class that encapsulates the created configuration, &lt;code&gt;SqlValidator&lt;/code&gt;, &lt;code&gt;SqlToRelConverter&lt;/code&gt; and &lt;code&gt;VolcanoPlanner&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Optimizer&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;CalciteConnectionConfig&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;SqlValidator&lt;/span&gt; &lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;SqlToRelConverter&lt;/span&gt; &lt;span class="n"&gt;converter&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;VolcanoPlanner&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Optimizer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SimpleSchema&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Create supporting objects as explained above&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt; 
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we create the schema with the &lt;code&gt;lineitem&lt;/code&gt; table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SimpleTable&lt;/span&gt; &lt;span class="n"&gt;lineitem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleTable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lineitem"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"l_quantity"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DECIMAL&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"l_extendedprice"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DECIMAL&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"l_discount"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DECIMAL&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"l_shipdate"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SqlTypeName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;DATE&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withRowCount&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60_000L&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;SimpleSchema&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleSchema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tpch"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;addTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lineitem&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Optimizer&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we use our optimizer to parse, validate, and convert the query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SqlNode&lt;/span&gt; &lt;span class="n"&gt;sqlTree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;parse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;SqlNode&lt;/span&gt; &lt;span class="n"&gt;validatedSqlTree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sqlTree&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;relTree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;convert&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validatedSqlTree&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The produced logical tree looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;LogicalAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="o"&gt;=[{}],&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="no"&gt;SUM&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;63751.137500047684&lt;/span&gt;
  &lt;span class="nc"&gt;LogicalProject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;$f0&lt;/span&gt;&lt;span class="o"&gt;=[*(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;)]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1875.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;63750.0&lt;/span&gt;
    &lt;span class="nc"&gt;LogicalFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="no"&gt;AND&lt;/span&gt;&lt;span class="o"&gt;(&amp;gt;=(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-(?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="o"&gt;)),&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;+(?&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="o"&gt;)),&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;))]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1875.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;61875.0&lt;/span&gt;
      &lt;span class="nc"&gt;LogicalTableScan&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=[[&lt;/span&gt;&lt;span class="n"&gt;tpch&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lineitem&lt;/span&gt;&lt;span class="o"&gt;]]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;60000.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;60000.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we optimize the relational tree and convert it into the &lt;code&gt;Enumerable&lt;/code&gt; convention. We use logical rules that convert and merge &lt;code&gt;LogicalProject&lt;/code&gt; and &lt;code&gt;LogicalFilter&lt;/code&gt; into compound &lt;code&gt;LogicalCalc&lt;/code&gt;, and physical rules that convert logical nodes into &lt;code&gt;Enumerable&lt;/code&gt; nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;RuleSet&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RuleSets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;CoreRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FILTER_TO_CALC&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;CoreRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;PROJECT_TO_CALC&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;CoreRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;FILTER_CALC_MERGE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;CoreRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;PROJECT_CALC_MERGE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;EnumerableRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENUMERABLE_TABLE_SCAN_RULE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;EnumerableRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENUMERABLE_PROJECT_RULE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;EnumerableRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENUMERABLE_FILTER_RULE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;EnumerableRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENUMERABLE_CALC_RULE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;EnumerableRules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENUMERABLE_AGGREGATE_RULE&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;RelNode&lt;/span&gt; &lt;span class="n"&gt;optimizerRelTree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optimize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;relTree&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;relTree&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTraitSet&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;plus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;EnumerableConvention&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;INSTANCE&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;rules&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The produced physical tree looks like this. Notice that all nodes are &lt;code&gt;Enumerable&lt;/code&gt;, and that &lt;code&gt;Project&lt;/code&gt; and &lt;code&gt;Filter&lt;/code&gt; nodes have been replaced with &lt;code&gt;Calc&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EnumerableAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="o"&gt;=[{}],&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="no"&gt;SUM&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;187.5&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;62088.2812589407&lt;/span&gt;
  &lt;span class="nc"&gt;EnumerableCalc&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;=[{&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;}],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;=[*(&lt;/span&gt;&lt;span class="n"&gt;$t1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t2&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;=[?&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;=[&amp;gt;=(&lt;/span&gt;&lt;span class="n"&gt;$t3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t5&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;=[?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;=[&amp;lt;(&lt;/span&gt;&lt;span class="n"&gt;$t3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t7&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;=[?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="no"&gt;DECIMAL&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;=[-(&lt;/span&gt;&lt;span class="n"&gt;$t9&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t10&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;=[&amp;gt;=(&lt;/span&gt;&lt;span class="n"&gt;$t2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t11&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="o"&gt;=[?&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="o"&gt;=[+(&lt;/span&gt;&lt;span class="n"&gt;$t13&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t10&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="o"&gt;=[&amp;lt;=(&lt;/span&gt;&lt;span class="n"&gt;$t2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t14&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="o"&gt;=[?&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="o"&gt;=[&amp;lt;(&lt;/span&gt;&lt;span class="n"&gt;$t0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t16&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="no"&gt;AND&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;$t6&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t8&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t12&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t15&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;$t17&lt;/span&gt;&lt;span class="o"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;$f0&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;$t4&lt;/span&gt;&lt;span class="o"&gt;],&lt;/span&gt; &lt;span class="n"&gt;$condition&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;$t18&lt;/span&gt;&lt;span class="o"&gt;]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1875.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;61875.0&lt;/span&gt;
    &lt;span class="nc"&gt;EnumerableTableScan&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=[[&lt;/span&gt;&lt;span class="n"&gt;tpch&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lineitem&lt;/span&gt;&lt;span class="o"&gt;]]):&lt;/span&gt; &lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;60000.0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cumulative&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;60000.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Apache Calcite is a flexible framework for query optimization. In this blog post, we demonstrated how to optimize SQL queries with Apache Calcite parser, validator, converter, and rule-based optimizer. In future posts, we will dig into individual components of Apache Calcite. Stay tuned!&lt;/p&gt;

&lt;p&gt;We are always ready to help you with your SQL query optimizer design. Just &lt;a href="https://www.querifylabs.com/#contact-form"&gt;let us know&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>calcite</category>
      <category>sql</category>
      <category>database</category>
    </item>
  </channel>
</rss>
