DEV Community: Vladimir Ozerov

Memoization in Cost-based Optimizers

Vladimir Ozerov — Wed, 09 Jun 2021 19:55:15 +0000

Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is NP-hard, and the number of possible plans grows exponentially with the query's complexity. For example, a typical TPC-H query may have up to several thousand possible join orders, 2-3 algorithms per join, a couple of access methods per table, some filter/aggregate pushdown alternatives, etc. Combined, this could quickly explode the search space to millions of alternative plans.

This blog post will discuss memoization - an important technique that allows cost-based optimizers to consider billions of alternative plans in a reasonable time.

The Naïve Approach

Consider that we are designing a rule-based optimizer. We want to apply a rule to a relational operator tree and produce another tree. If we insert a new operator in the middle of the tree, we need to update the parent to point to the new operator. Once we've changed the parent, we may need to change the parent of the parent, etc. If your operators are immutable by design or used by other parts of the program, you may need to copy large parts of the tree to create a new plan.

This approach is wasteful because you need to propagate changes to parents over and over again.

Indirection

We may solve the problem with change propagation by applying an additional layer of indirection. Let us introduce a new surrogate operator that will store a reference to a child operator. Before starting the optimization, we may traverse the initial relational tree and create copies of the operators, where all concrete inputs are replaced with references.

When applying a transformation, we may only change a reference without updating other parts of the tree. When the optimization is over, we remove the references and reconstruct the final tree.
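A minimal sketch of the reference idea, assuming hypothetical Ref and Op classes that are not taken from any real optimizer. Parents hold a Ref rather than a concrete operator, so a transformation swaps a single slot, and the final tree is rebuilt by stripping the references.

```java
import java.util.ArrayList;
import java.util.List;

/** Surrogate node: parents hold a Ref, so swapping `current` never touches them. */
final class Ref {
    Op current;

    Ref(Op current) {
        this.current = current;
    }
}

/** Immutable operator whose inputs are references instead of concrete operators. */
final class Op {
    final String name;
    final List<Ref> inputs;

    Op(String name, Ref... inputs) {
        this.name = name;
        this.inputs = List.of(inputs);
    }
}

final class ReferenceDemo {
    /** Strips the references and renders the resulting tree as text. */
    static String resolve(Ref ref) {
        Op op = ref.current;
        if (op.inputs.isEmpty()) {
            return op.name;
        }
        List<String> children = new ArrayList<>();
        for (Ref input : op.inputs) {
            children.add(resolve(input));
        }
        return op.name + "(" + String.join(", ", children) + ")";
    }

    public static void main(String[] args) {
        Ref scan = new Ref(new Op("Scan"));
        Ref filter = new Ref(new Op("Filter", scan));
        Ref root = new Ref(new Op("Project", filter));

        // A transformation swaps one reference; Filter and Project are not copied.
        scan.current = new Op("IndexScan");

        System.out.println(resolve(root));
    }
}
```

The Filter and Project operators are created once and never rewritten, which is the point of the indirection.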

You may find a similar design in many production-grade heuristic optimizers. In our previous blog post about Presto, we discussed the Memo class that manages such references. In Apache Calcite, the heuristic optimizer HepPlanner models node references through the class HepRelVertex.

We realized how references might help us minimize change propagation overhead. But in a cost-based optimization, we need to consider multiple alternative plans at the same time. We need to go deeper.

MEMO

In cost-based optimization, we need to generate multiple equivalent operators, link them together, and find the cheapest path to the root.

Two relational operators are equivalent if they generate the same result set on every legal database instance. How can we encode equivalent operators efficiently? Let's extend our references to point to multiple operators! We will refer to such a surrogate node as a group, which is a collection of equivalent operators.

We start the optimization by creating equivalence groups for existing operators and replacing concrete inputs with relevant groups. At this point, the process is similar to our previous approach with references.

When a rule is applied to operator A, and a new equivalent operator B is produced, we add B to A's equivalence group. The collection of groups that we consider during optimization is called MEMO. The process of maintaining a MEMO is called memoization.

MEMO is a variation of the AND/OR graph. Operators are AND-nodes representing the subgoals of the query (e.g., applying a filter). Groups are OR-nodes, representing the alternative subgoals that could be used to achieve the parent goal (e.g., do a table scan or an index scan).

When all interesting operators are generated, the MEMO is said to be explored. We now need to extract the cheapest plan from it, which is the ultimate goal of cost-based optimization. To do this, we first assign costs to individual operators via the cost function. Then we traverse the graph bottom-up and select the cheapest operator from each group (often referred to as "winner"), combining costs of individual operators with costs of their inputs.

Practical optimizers often keep the groups' winners up to date during the optimization to allow for search space pruning, which we will discuss in future blog posts.

When the root group's cheapest operator is resolved, we construct the final plan through a top-down traversal across every group's cheapest operators.
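The two passes can be sketched as follows. This is an illustration under stated assumptions, not the design of any particular optimizer: the Group and Operator classes and the cost numbers are hypothetical, costs are simply added up, and the sketch assumes every group is non-empty and the graph has no cycles.

```java
import java.util.ArrayList;
import java.util.List;

/** AND-node: an operator with its own cost and its input groups. */
final class Operator {
    final String name;
    final double selfCost;
    final List<Group> inputs;

    Operator(String name, double selfCost, Group... inputs) {
        this.name = name;
        this.selfCost = selfCost;
        this.inputs = List.of(inputs);
    }
}

/** OR-node: a set of equivalent operators plus the cheapest one found. */
final class Group {
    final List<Operator> operators = new ArrayList<>();
    Operator winner;
    double winnerCost = Double.POSITIVE_INFINITY;

    Group add(Operator operator) {
        operators.add(operator);
        return this;
    }

    /** Bottom-up pass: cost the inputs first, then pick this group's winner. */
    double optimize() {
        if (winner != null) {
            return winnerCost; // a shared group is costed only once
        }
        for (Operator operator : operators) {
            double cost = operator.selfCost;
            for (Group input : operator.inputs) {
                cost += input.optimize();
            }
            if (cost < winnerCost) {
                winnerCost = cost;
                winner = operator;
            }
        }
        return winnerCost;
    }

    /** Top-down pass: follow the winners to assemble the final plan. */
    String extractPlan() {
        if (winner.inputs.isEmpty()) {
            return winner.name;
        }
        List<String> children = new ArrayList<>();
        for (Group input : winner.inputs) {
            children.add(input.extractPlan());
        }
        return winner.name + "(" + String.join(", ", children) + ")";
    }
}

final class MemoDemo {
    public static void main(String[] args) {
        // Hypothetical costs: the index scan is cheaper than the table scan.
        Group scans = new Group()
            .add(new Operator("TableScan", 100))
            .add(new Operator("IndexScan", 20));
        Group root = new Group().add(new Operator("Filter", 10, scans));

        System.out.println(root.optimize());    // 30.0
        System.out.println(root.extractPlan()); // Filter(IndexScan)
    }
}
```

Caching the winner inside the group means that a group shared by several parents is costed once, no matter how many plans pass through it.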

Memoization is very efficient because it allows for the deduplication of nodes, eliminating unnecessary work. Consider a query that has five joins. The total number of unique join orders for such a query is 30240. If we decide to create a new plan for every join order, we would need to instantiate 30240 * 5 = 151200 join operators. With memoization, you only need 602 join operators to encode the same search space - a dramatic improvement!
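The numbers above can be reproduced with two closed-form counts. The formulas are our reconstruction rather than something stated in the text, and they assume that five joins connect six relations, that cross products are allowed, and that commuted joins count as different operators. The number of ordered binary join trees over n relations is (2n-2)!/(n-1)!. In MEMO, a join operator is an ordered pair of disjoint non-empty sets of relations (its left and right input groups), which gives 3^n - 2^(n+1) + 1 operators.

```java
final class JoinSearchSpace {
    /** Ordered binary join trees over n relations: (2n-2)! / (n-1)! = n * (n+1) * ... * (2n-2). */
    static long joinOrders(int relations) {
        long result = 1;
        for (long k = relations; k <= 2L * relations - 2; k++) {
            result *= k;
        }
        return result;
    }

    /** Join operators in MEMO: ordered pairs of disjoint non-empty relation sets. */
    static long memoJoinOperators(int relations) {
        long pow3 = 1;
        for (int i = 0; i < relations; i++) {
            pow3 *= 3;
        }
        // Each relation goes left, right, or nowhere (3^n); drop the assignments
        // with an empty left or right side (2 * 2^n) and add back the one counted twice.
        return pow3 - (1L << (relations + 1)) + 1;
    }

    public static void main(String[] args) {
        int relations = 6; // five joins
        long orders = joinOrders(relations);
        System.out.println(orders);                       // 30240
        System.out.println(orders * (relations - 1));     // 151200
        System.out.println(memoJoinOperators(relations)); // 602
    }
}
```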

The memoization idea is simple. Practical implementations of MEMO are much more involved. You need to design operator equivalence carefully, decide how to do the deduplication, manage the operator's physical properties (such as sort order), track already executed optimization rules, etc. We will cover some of these topics in future blog posts.

Summary

Memoization is an efficient technique that allows you to encode a large search space in a very compact form and eliminate duplicate work. The MEMO data structure routinely backs modern cost-based rule-based optimizers.

In future posts, we will discuss the design of MEMO in practical cost-based optimizers. Stay tuned!

We are always ready to help you with your query optimizer design. Just let us know.

Rule-based Query Optimization

Vladimir Ozerov — Sat, 08 May 2021 07:36:46 +0000

The goal of the query optimizer is to find the query execution plan that computes the requested result efficiently. In this blog post, we discuss rule-based optimization - a common pattern to explore equivalent plans used by modern optimizers. Then we analyze the implementation of rule-based optimization in several state-of-the-art optimizers: Apache Calcite, Presto, and CockroachDB.

Transformations

A query optimizer must explore the space of equivalent execution plans and pick the optimal one. Intuitively, plan B is equivalent to plan A if it produces the same result for all possible inputs.

To generate the equivalent execution plans, we may apply one or more transformations to the original plan. A transformation accepts one plan and produces zero, one, or more equivalent plans. As a query engine developer, you may implement hundreds of different transformations to generate a sufficient number of equivalent plans.

Some transformations operate on bigger parts of the plan or even the whole plan. For example, an implementation of the join order selection with dynamic programming may enumerate all joins in the plan, generate alternative join sequences, and pick the best one.

Other transformations could be relatively isolated. Consider the transformation that pushes the filter operator past the aggregate operator. It works on an isolated part of the tree and doesn't require a global context.

Rules

Every optimizer follows some algorithm that decides when to apply particular transformations and how to process the newly created equivalent plans. As the number of transformations grows, it becomes inconvenient to hold them in a monolithic routine. Imagine a large if-else block of code that decides how to apply a hundred transformations to several dozen relational operators.

To facilitate your engine's evolution, you may want to abstract out some of your transformations behind a common interface. For every transformation, you may define a pattern that determines whether we can apply the transformation to the given part of the plan. A pair of a pattern and a transformation is called a rule.

The rule abstraction allows you to split the optimization logic into pluggable parts that evolve independently of each other, significantly simplifying the development of the optimizer. The optimizer that uses rules to generate the equivalent plans is called a rule-based optimizer.
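A minimal sketch of the abstraction, assuming a hypothetical single-input Node class, a Rule interface, and a filter-merging rule invented for this illustration. None of these names come from a real optimizer. The driver is a simple heuristic loop: it fires matching rules on a node until none applies and then descends to the input. It does not revisit a parent after its input changes, which a real driver would have to handle.

```java
import java.util.List;

/** Minimal single-input plan node used only for this illustration. */
final class Node {
    final String op;
    final String arg;
    final Node input; // null for leaves

    Node(String op, String arg, Node input) {
        this.op = op;
        this.arg = arg;
        this.input = input;
    }
}

/** A rule is a pattern (matches) paired with a transformation (apply). */
interface Rule {
    boolean matches(Node node);

    Node apply(Node node);
}

/** Filter(p1) over Filter(p2) becomes Filter(p1 AND p2). */
final class MergeFiltersRule implements Rule {
    @Override
    public boolean matches(Node node) {
        return node.op.equals("Filter")
            && node.input != null
            && node.input.op.equals("Filter");
    }

    @Override
    public Node apply(Node node) {
        return new Node("Filter", node.arg + " AND " + node.input.arg, node.input.input);
    }
}

final class RuleDriver {
    /** Fires matching rules on a node until none applies, then descends to its input. */
    static Node optimize(Node node, List<Rule> rules) {
        if (node == null) {
            return null;
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                if (rule.matches(node)) {
                    node = rule.apply(node);
                    changed = true;
                }
            }
        }
        return new Node(node.op, node.arg, optimize(node.input, rules));
    }

    public static void main(String[] args) {
        Node plan = new Node("Filter", "a > 1",
            new Node("Filter", "b < 5", new Node("Scan", "t", null)));
        Node result = optimize(plan, List.of(new MergeFiltersRule()));
        System.out.println(result.op + "[" + result.arg + "] over " + result.input.op);
    }
}
```

Adding a new transformation means adding one more Rule implementation to the list; the driver itself stays unchanged.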

Notice that the rules are, first of all, a pattern that helps you decompose the optimizer's codebase. The usage of rules doesn't force you to follow a specific optimization procedure, such as Volcano/Cascades. It doesn't prevent you from using particular optimization techniques, like dynamic programming for join enumeration. It doesn't require you to choose between heuristic or cost-based approaches. However, the isolated nature of rules may complicate some parts of your engine, such as join planning.

Examples

Now that we understand the idea behind rule-based optimization, let's look at several real-world examples: Apache Calcite, Presto, and CockroachDB.

Apache Calcite

Apache Calcite is a dynamic data management framework. At its core, Apache Calcite has two rule-based optimizers and a library of transformation rules.

The HepPlanner is a heuristic optimizer that applies rules one by one until no more transformations are possible.

The VolcanoPlanner is a cost-based optimizer that generates multiple equivalent plans, puts them into the MEMO data structure, and uses costs to choose the best one. The VolcanoPlanner may fire rules in an arbitrary order or work in a recently introduced Cascades-like top-down style.

The rule interface accepts the pattern and requires you to implement the onMatch(context) method. This method doesn't return the new relational tree as one might expect. Instead, it returns void but provides the ability to register new transformations in the context, which allows you to emit multiple equivalent trees from a single rule call. Apache Calcite comes with an extensive library of built-in rules and allows you to add your own rules.

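import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.RelNode;

class CustomRule extends RelOptRule {
    CustomRule() {
        // pattern_for_the_rule is a placeholder for the rule's operand
        super(pattern_for_the_rule);
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        RelNode equivalentNode = ...;

        // Register the new equivalent node in MEMO
        call.transformTo(equivalentNode);
    }
}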

In Apache Calcite, you may define one or more optimization stages. Every stage may use its own set of rules and optimizer. Many products based on Apache Calcite use multiple stages to minimize the optimization time at the risk of producing a suboptimal plan. See our previous blog post for more details on how to create a query optimizer with Apache Calcite.

Let's take a look at a couple of rules for join planning. To explore all bushy join trees, you may use JoinCommuteRule and JoinAssociateRule. These rules are relatively simple and work on individual joins. The problem is that they may trigger duplicate derivations, as explained in this paper.

Alternatively, Apache Calcite may use a set of rules that convert multiple joins into a single n-way join and then apply a heuristic algorithm to produce a single optimized join order from the n-way join. This is an example of a rule that works on a large part of the tree rather than on individual operators. You may use a similar approach to implement a rule that does the join planning with dynamic programming.

The Apache Calcite example demonstrates that the rule-based optimization could be used with both heuristic and cost-based exploration strategies, as well as for complex join planning.

Presto

Presto is a distributed query engine for big data. Like Apache Calcite, it uses rules to perform transformations. However, Presto doesn't have a cost-based search algorithm and relies only on heuristics when transitioning between optimization steps. See our previous blog post for more details on the Presto query optimizer.

As Presto cannot explore multiple equivalent plans at once, it has a simpler rule interface that produces no more than one new equivalent tree.

interface Rule<T> {
    Pattern<T> getPattern();
    Result apply(T node, ...);
}

Presto also has several rules that use costs internally to explore multiple alternatives in a rule call scope. An example is the (relatively) recently introduced ReorderJoins rule. Similar to the above-mentioned n-way join rules in Apache Calcite, the ReorderJoins rule first converts a sequence of joins into a single n-way join. Then the rule enumerates equivalent join orders and picks the one with the least cost (unlike Apache Calcite's LoptOptimizerJoinRule, which uses heuristics).

The ReorderJoins rule is of particular interest because it demonstrates how we may use rule-based optimization to combine heuristic and cost-based search strategies in the same optimizer.

CockroachDB

CockroachDB is a cloud-native SQL database for modern cloud applications. It has a rule-based Cascades-style query optimizer.

Unlike Apache Calcite and Presto, CockroachDB doesn't have a common rule interface. Instead, it uses a custom DSL to define the rule's pattern and transformation logic. The code generator analyzes the DSL files and produces a monolithic optimization routine. The code generation may allow for faster optimizer code because it avoids virtual calls when calling rules.

Below is an example of a rule definition that attempts to generate a streaming aggregate. Notice that you do not need to write the whole rule logic using DSL only. Instead, you may reference utility methods written in Go (which is CockroachDB's primary language) from within the rule to minimize the amount of DSL-specific code.

[GenerateStreamingGroupBy, Explore]
(GroupBy | DistinctOn | EnsureDistinctOn | UpsertDistinctOn
        | EnsureUpsertDistinctOn
    $input:*
    $aggs:*
    $private:* & (IsCanonicalGroupBy $private)
)
=>
(GenerateStreamingGroupBy (OpName) $input $aggs $private)

There are two rule types in CockroachDB. The normalization rules convert relational operators into canonical forms before they are inserted into a MEMO, simplifying the subsequent optimization. An example is the NormalizeNestedAnds rule that normalizes AND expressions into a left-deep tree. The normalization is performed via a sequential invocation of normalization rules. The second category is exploration rules, which generate multiple equivalent plans. The exploration rules are invoked using the cost-based Cascades-like top-down optimization strategy with memoization.

CockroachDB has a ReorderJoins rule to do the join planning. The rule uses a variation of the dynamic programming algorithm described in this paper to enumerate the valid join orders and add them to MEMO.

Thus, CockroachDB uses rule-based optimization for heuristic normalization, cost-based exploration, and join planning with dynamic programming.

Summary

Rule-based query optimization is a very flexible pattern that you may use when designing a query optimizer. It allows you to split the complicated transformation logic into self-contained parts, reducing the optimizer's complexity.

The rule-based optimization doesn't limit you in how exactly to optimize your plans, be it bottom-up dynamic programming or top-down Cascades-style exploration, cost-based or heuristic optimization, or anything else.

In future posts, we will discuss the difference between logical and physical optimization. Stay tuned!

Inside Presto Optimizer

Vladimir Ozerov — Tue, 20 Apr 2021 06:49:23 +0000

Abstract

Presto is an open-source distributed SQL query engine for big data. Presto provides a connector API to interact with different data sources, including RDBMSs, NoSQL products, Hadoop, and stream processing systems. Created by Facebook, Presto received wide adoption by the open-source world (Presto, Trino) and commercial companies (e.g., Ahana, Qubole).

Presto comes with a sophisticated query optimizer that applies various rewrites to the query plan. In this blog post series, we investigate the internals of the Presto optimizer. In the first part, we discuss the optimizer interface and the design of the rule-based optimizer.

Please refer to the original paper by Facebook to get a better understanding of Presto's capabilities and design.

We will use the Presto Foundation fork version 0.245 for this blog post.

Relational Tree

The Presto optimizer works with relational operators. Similarly to other SQL optimizers, such as Apache Calcite, Presto performs syntax and semantic analysis of the original SQL string and then produces the logical relational tree:

The ANTLR-based parser converts the original query string into an abstract syntax tree (AST).
The analyzer performs the semantic validation of the AST.
The converter creates the logical relational tree from the AST.

Every node in the tree represents a relational operation and implements a common PlanNode interface, which exposes a unique node's ID, node's inputs, and node's output. The interface also allows traversing the tree with a visitor pattern, used extensively during the optimization. Examples of relational operations: TableScanNode, ProjectNode, FilterNode, AggregationNode, JoinNode.

Consider the following query:

SELECT 
    orderstatus, 
    SUM(totalprice) 
FROM orders 
GROUP BY orderstatus

The associated query plan might look like this:

Optimizer Interface

When the logical plan is ready, we can start applying optimizations to it. In Presto, there is the general PlanOptimizer interface that every optimization phase implements. The interface accepts one relational tree and produces another.

public interface PlanOptimizer
{
    PlanNode optimize(
        PlanNode plan,
        Session session,
        TypeProvider types,
        PlanVariableAllocator variableAllocator,
        PlanNodeIdAllocator idAllocator,
        WarningCollector warningCollector
    );
}

The optimization program builder PlanOptimizers creates a list of optimizers that are invoked sequentially on the relational tree. Optimization problems are often split into several phases to keep logical and computational complexity under control. In Presto, there are more than 70 optimization phases that every relational tree will pass through.

The majority of optimization phases use the rule-based optimizer that we will discuss further. Other phases rely on custom optimizers that do not use rules but apply custom transformation logic. For example, the PredicatePushDown optimizer moves filters down in the relational tree, and PruneUnreferencedOutputs removes unused fields that could be generated during the AST conversion or the previous optimization phases. We will discuss the most important custom optimizers in the second part of this blog post series.

Presto may also reoptimize the query plan at runtime. The details of this process are out of the scope of this blog post.

Rule-Based Optimizer

Presto uses the rule-based IterativeOptimizer for the majority of optimization phases. In rule-based optimization, you provide the relational tree and a set of pluggable optimization rules. A rule is a self-contained piece of code that defines the relational tree pattern it should be applied to and the transformation logic. The optimizer then applies the rules to the relational tree using some algorithm. The main advantage of rule-based optimizers is extensibility. Instead of having a monolithic optimization algorithm, you split the optimizer into smaller self-contained rules. To extend the optimizer, you create a new rule that doesn't affect the rest of the optimizer code. Please refer to our blog post to get more details about rule-based optimization.

Rule-based optimizers could be either cost-based or heuristic. In cost-based optimizers, a particular transformation is chosen based on the estimated cost assigned to a plan. Heuristic optimizers don't use costs and could produce arbitrarily bad plans in the worst case. Presto relies on rule-based heuristic optimization, although some specific rules use costs internally to pick a single transformation from multiple alternatives. An example is the ReorderJoins rule that selects a single join order with the least cost from multiple alternatives.

We now describe the most important parts of the Presto rule-based optimizer: the Memo class, rule matching, and the search algorithm.

MEMO

MEMO is a data structure used primarily in cost-based optimizers to encode multiple alternative plans efficiently. The main advantage of MEMO is that multiple alternative plans could be encoded in a very compact form. We discuss the design of MEMO in one of our blog posts.

Presto also uses a MEMO-like data structure. There is the Memo class that stores groups. The optimizer initializes the Memo, which populates groups via a recursive traversal of the relational tree. However, every group in Memo may have only one operator. That is, Presto doesn't store multiple equivalent operators in a group. Instead, as we will see below, Presto unconditionally replaces the current operator with the transformed operator. Therefore, the Memo class in Presto is not a MEMO data structure in a classical sense because it doesn't track equivalent operators. In Presto, you may think of the group as a convenient wrapper over an operator, used mostly to track operators' reachability during the optimization process.

Rule Matching

To optimize the relational tree, you should provide the optimizer with one or more rules. Every rule in Presto implements the Rule interface.

First, the interface defines the pattern, which may target an arbitrary part of the tree. It could be a single operator (filter in the PruneFilterColumns rule), multiple operators (filter on top of the filter in the MergeFilters rule), an operator with a predicate (join pattern in the ReorderJoins rule), or anything else.

Second, the interface defines the transformation logic. The result of the transformation could be either a new operator that replaces the previous one or no-op if the rule failed to apply the transformation for whatever reason.

Search Algorithm

Now that we understand the Presto rule-based optimizer's core concepts, let's take a look at the search algorithm.

The Memo class is initialized with the original relational tree, as we discussed above.
For every Memo group, starting with the root, the method exploreGroup is invoked. We look for rules that match the current operator and fire them. If a rule produces an alternative operator, it replaces the original operator unconditionally. The process continues until there are no more available transformations for the current operator. Then we optimize operators' inputs. If an alternative input is found, it may open up more optimizations for the parent operator, so we reoptimize the parent. Presto relies on timeouts to terminate the optimization process if some rules continuously replace each other's results. Think of b JOIN a, that replaces a JOIN b, that replaces b JOIN a, etc. You may run the TestIterativeOptimizer test to see this behavior in action.
In the end, we extract the final plan from Memo.

This is it. The search algorithm is very simple and straightforward.

The main drawback is that the optimizer is heuristic and cannot consider multiple alternative plans concurrently. That is, at every point in time, Presto has only one plan that it may transform further. In the original paper from 2019, Facebook engineers mentioned that they were exploring an option to add a cost-based optimizer:

We are in the process of enhancing the optimizer to perform a more comprehensive exploration of the search space using a cost-based evaluation of plans based on the techniques introduced by the Cascades framework.

There is also a document dating back to 2017 with some design ideas around cost-based optimization.

Summary

In this blog post, we explored the design of the Presto optimizer. The optimization process is split into multiple sequential phases. Every phase accepts a relational tree and produces another relational tree. Most phases use a rule-based heuristic optimizer, while some phases rely on custom logic without rules. There were some thoughts to add a cost-based optimizer to Presto, but it hasn't happened yet.

In the second part of this series, we will explore the concrete optimization rules and custom phases of Presto's query optimization. Stay tuned!


Custom traits in Apache Calcite

Vladimir Ozerov — Fri, 16 Apr 2021 15:10:15 +0000

Abstract

Physical properties are an essential part of the optimization process that allows you to explore more alternative plans.

Apache Calcite comes with convention and collation (sort order) properties. Many query engines require custom properties. For example, distributed and heterogeneous engines that we often see in our daily practice need to carefully plan the movement of data between machines and devices, which requires a custom property to describe data location.

In this blog post, we will explore how to define, register, and enforce a custom property, also known as a trait, with the Apache Calcite cost-based optimizer.

Physical Properties

We start our journey by looking at an example of a common physical property - sort order.

Query optimizers work with relational operators, such as Scan, Project, Filter, and Join. During the optimization, an operator may require its input to satisfy a specific condition. To check whether the condition is satisfied, operators may expose physical properties - plain values associated with an operator. Operators may compare the desired and actual properties of their inputs and enforce the desired property by injecting a special enforcer operator on top of the input.

Consider the join operator t1 JOIN t2 ON t1.a = t2.b. We could use a merge join if both inputs are sorted on their join attributes, t1.a and t2.b, respectively. We may define the collation property for every operator, describing the sort order of produced rows:

Join[t1.a=t2.b]
  Input[t1]      [SORTED by t1.a]
  Input[t2]      [NOT SORTED]

The merge join operator may enforce the sorting on t1.a and t2.b on its inputs. Since the first input is already sorted on t1.a, it remains unchanged. The second input is not sorted, so the enforcer Sort operator is injected, making a merge join possible:

MergeJoin[t1.a=t2.b]
  Input[t1]           [SORTED by t1.a]
  Sort[t2.b]          [SORTED by t2.b]
    Input[t2]         [NOT SORTED]
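The enforcement step can be sketched as follows, assuming a hypothetical PhysicalNode class that carries a single property, the column it is sorted by. The enforce method compares the actual and the desired property and injects a Sort operator only when they differ. The names are illustrative and unrelated to the Apache Calcite API.

```java
import java.util.Objects;

/** Operator with a single physical property: the column it is sorted by (null when unsorted). */
final class PhysicalNode {
    final String name;
    final String sortedBy;
    final PhysicalNode input; // null for leaves

    PhysicalNode(String name, String sortedBy, PhysicalNode input) {
        this.name = name;
        this.sortedBy = sortedBy;
        this.input = input;
    }
}

final class SortEnforcer {
    /** Returns the input as-is if it already satisfies the sort order, else wraps it in a Sort. */
    static PhysicalNode enforce(PhysicalNode input, String requiredSort) {
        if (Objects.equals(input.sortedBy, requiredSort)) {
            return input;
        }
        return new PhysicalNode("Sort[" + requiredSort + "]", requiredSort, input);
    }

    public static void main(String[] args) {
        PhysicalNode t1 = new PhysicalNode("Input[t1]", "t1.a", null);
        PhysicalNode t2 = new PhysicalNode("Input[t2]", null, null);

        System.out.println(enforce(t1, "t1.a").name); // Input[t1]
        System.out.println(enforce(t2, "t2.b").name); // Sort[t2.b]
    }
}
```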

Apache Calcite API

In Apache Calcite, properties are defined by the RelTrait and RelTraitDef classes. RelTrait is a concrete value of the property. RelTraitDef is a property definition, which describes the property name, expected Java class of the property, the default value of the property, and how to enforce the property. Property definitions are registered in the planner via the RelOptPlanner.addRelTraitDef method. The planner will ensure that every operator has a specific value for every registered property definition, whether the default or not.

All properties of a node are organized in an immutable data structure RelTraitSet. This class has convenient methods to add and update properties with copying semantics. You may access the properties of a concrete operator using the RelOptNode.getTraitSet method.

To enforce a specific property on the operator during planning, you should do the following from within the rule:

Get the current properties of a node using the RelOptNode.getTraitSet method.
Create a new instance of RelTraitSet with updated properties.
Enforce the properties by calling the RelOptRule.convert method.

Finally, before invoking the planner program, you may define the desired properties of the root operator of the optimized relational tree. After the optimization, the planner will either return the operator that satisfies these properties or throw an exception.

Internally, Apache Calcite enforces properties by adding a special AbstractConverter operator with the desired traits on top of the target operator.

AbstractConverter [SORTED by t2.b]
  Input[t2]       [NOT SORTED]

To transform the AbstractConverter into a real enforcer node, such as Sort, you should add the built-in ExpandConversionRule rule to your optimization program. This rule will attempt to expand the AbstractConverter into a sequence of enforcers to satisfy the desired traits. We have only one unsatisfied property in our example, so the converter expands into a single Sort operator.

Sort[t2.b]        [SORTED by t2.b]
  Input[t2]       [NOT SORTED]

You may use your custom expansion rule if needed. See the Apache Flink custom rule as an example.

Custom Property

Now that we understand the purpose of properties and which Apache Calcite API to use, we will define, register, and enforce our custom property.

Consider that we have a distributed database, where every relational operator might be distributed between nodes in one of two ways:

PARTITIONED - relation is partitioned between nodes. Every tuple (row) resides on one of the nodes. An example is a typical distributed data structure.
SINGLETON - relation is located on a single node. An example is a cursor that delivers the final result to the user application.

In our example, we would like to ensure that the top operator always has a SINGLETON distribution, simulating the results' delivery to a single node.

Enforcer

First, we define the enforcer operator. To ensure the SINGLETON distribution, we need to move data from all nodes to a single node. In distributed databases, data movement operators are often called Exchange. The minimal requirement for a custom operator in Apache Calcite is to define the constructor and the copy method.

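import java.util.List;

import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.SingleRel;

public class ExchangeRel extends SingleRel {
    public ExchangeRel(
        RelOptCluster cluster,
        RelTraitSet traits,
        RelNode input
    ) {
        super(cluster, traits, input);
    }

    // The planner calls copy to re-create the operator with new traits or inputs.
    @Override
    public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {
        return new ExchangeRel(getCluster(), traitSet, inputs.get(0));
    }
}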

Trait

Next, we define our custom trait and trait definition. Our implementation must adhere to the following rules:

The trait must refer to a common trait definition instance in the method getTraitDef.
The trait must override the satisfies method to define whether the current trait satisfies the target trait. If not, the enforcer will be used.
The trait definition must declare the expected Java class of the trait in the getTraitClass method.
The trait definition must declare the default value of the trait in the getDefault method.
The trait definition must implement the method convert, which Apache Calcite will invoke to create the enforcer if the current trait doesn't satisfy the desired trait. If there is no valid conversion between traits, null should be returned.

Below is the source code of our trait. We define two concrete values, PARTITIONED and SINGLETON. We also define the special value ANY, which we use as the default. We say that both PARTITIONED and SINGLETON satisfy ANY but PARTITIONED and SINGLETON do not satisfy each other.

public class Distribution implements RelTrait {

    public static final Distribution ANY = new Distribution(Type.ANY);
    public static final Distribution PARTITIONED = new Distribution(Type.PARTITIONED);
    public static final Distribution SINGLETON = new Distribution(Type.SINGLETON);

    private final Type type;

    private Distribution(Type type) {
        this.type = type;
    }

    @Override
    public RelTraitDef getTraitDef() {
        return DistributionTraitDef.INSTANCE;
    }

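    @Override
    public boolean satisfies(RelTrait toTrait) {
        Distribution toTrait0 = (Distribution) toTrait;

        // Every distribution satisfies ANY.
        if (toTrait0.type == Type.ANY) {
            return true;
        }

        return this.type.equals(toTrait0.type);
    }

    @Override
    public void register(RelOptPlanner planner) {
        // No-op.
    }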

    enum Type {
        ANY,
        PARTITIONED,
        SINGLETON
    }
}

Our trait definition defines the convert function, which injects the ExchangeRel enforcer if the current property doesn't satisfy the target one.

public class DistributionTraitDef extends RelTraitDef<Distribution> {

    public static final DistributionTraitDef INSTANCE = new DistributionTraitDef();

    private DistributionTraitDef() {
        // No-op.
    }

    @Override
    public Class<Distribution> getTraitClass() {
        return Distribution.class;
    }

    @Override
    public String getSimpleName() {
        return "DISTRIBUTION";
    }

    @Override
    public RelNode convert(
        RelOptPlanner planner,
        RelNode rel,
        Distribution toTrait,
        boolean allowInfiniteCostConverters
    ) {
        Distribution fromTrait = rel.getTraitSet().getTrait(DistributionTraitDef.INSTANCE);

        if (fromTrait.satisfies(toTrait)) {
            return rel;
        }

        return new ExchangeRel(
            rel.getCluster(),
            rel.getTraitSet().plus(toTrait),
            rel
        );
    }

    @Override
    public boolean canConvert(
        RelOptPlanner planner,
        Distribution fromTrait,
        Distribution toTrait
    ) {
        return true;
    }

    @Override
    public Distribution getDefault() {
        return Distribution.ANY;
    }
}
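
The decision made inside convert can be illustrated without the planner. The following plain-Java sketch models an operator as a name plus a delivered distribution and wraps it into an exchange only when the delivered distribution does not satisfy the target. The Node class and the enforce method are hypothetical stand-ins for RelNode and convert; they are not part of Apache Calcite.

```java
// Plain-Java model of enforcer insertion; Node and enforce are hypothetical names.
class EnforcerSketch {

    enum Distribution { ANY, PARTITIONED, SINGLETON }

    // A minimal operator: a name, the distribution it delivers, and an optional input.
    static final class Node {
        final String name;
        final Distribution distribution;
        final Node input;

        Node(String name, Distribution distribution, Node input) {
            this.name = name;
            this.distribution = distribution;
            this.input = input;
        }

        // Renders the tree as a nested expression, e.g. Exchange(Scan[partitioned]).
        String explain() {
            return input == null ? name : name + "(" + input.explain() + ")";
        }
    }

    // Same rule as in the trait: everything satisfies ANY, otherwise types must match.
    static boolean satisfies(Distribution from, Distribution to) {
        return to == Distribution.ANY || from == to;
    }

    // Analogue of convert: keep the node if it already satisfies the target,
    // otherwise put an Exchange with the target distribution on top of it.
    static Node enforce(Node node, Distribution target) {
        if (satisfies(node.distribution, target)) {
            return node;
        }
        return new Node("Exchange", target, node);
    }

    public static void main(String[] args) {
        Node partitioned = new Node("Scan[partitioned]", Distribution.PARTITIONED, null);
        Node singleton = new Node("Scan[singleton]", Distribution.SINGLETON, null);

        System.out.println(enforce(partitioned, Distribution.SINGLETON).explain());
        System.out.println(enforce(singleton, Distribution.SINGLETON).explain());
    }
}
```

In the real planner, the wrapping is not applied eagerly like this: the trait definition only describes how to build the enforcer, and the planner decides when to call it.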

You would likely have more distribution types, dedicated distribution columns, and different exchange types in production implementations. You may refer to Apache Flink as an example of a real distribution trait.

Putting It All Together

Let's see the new trait in action. The complete source code is available here.

First, we create a schema with a couple of tables - one with PARTITIONED distribution and another with SINGLETON distribution. We use custom table and schema implementations, similar to the ones we used in the previous blog post.

// Table with PARTITIONED distribution.
Table table1 = Table.newBuilder("table1", Distribution.PARTITIONED)
  .addField("field", SqlTypeName.DECIMAL).build();

// Table with SINGLETON distribution.
Table table2 = Table.newBuilder("table2", Distribution.SINGLETON)
  .addField("field", SqlTypeName.DECIMAL).build();

Schema schema = Schema.newBuilder("schema").addTable(table1).addTable(table2).build();

Then we create a planner instance and register our custom trait definition in it.

VolcanoPlanner planner = new VolcanoPlanner();

planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
planner.addRelTraitDef(DistributionTraitDef.INSTANCE);

Finally, we create a table scan operator for each of our tables and enforce the SINGLETON distribution. Notice that we use the aforementioned ExpandConversionRule in our optimization program. Otherwise, the enforcement will not work.

// Use the built-in rule that will expand abstract converters.
RuleSet rules = RuleSets.ofList(AbstractConverter.ExpandConversionRule.INSTANCE);

// Prepare the desired traits with the SINGLETON distribution.
RelTraitSet desiredTraits = node.getTraitSet().plus(Distribution.SINGLETON);

// Use the planner to enforce the desired traits
RelNode optimizedNode = Programs.of(rules).run(
    planner,
    node,
    desiredTraits,
    Collections.emptyList(),
    Collections.emptyList()
);

Now we run the TraitTest from the sample project to see this in action. For the PARTITIONED table, the planner has added the ExchangeRel to enforce the SINGLETON distribution.

BEFORE:
2:LogicalTableScan(table=[[schema, partitioned]])

AFTER:
7:ExchangeRel
  2:LogicalTableScan(table=[[schema, partitioned]])

But the table with the SINGLETON distribution remains unchanged because it already has the desired distribution.

BEFORE:
0:LogicalTableScan(table=[[schema, singleton]])

AFTER:
0:LogicalTableScan(table=[[schema, singleton]])

Congratulations! Our custom property is ready.

Summary

Physical properties are an important concept in query optimization that allows you to explore more alternative plans.

In this blog post, we demonstrated how to define a custom physical property in Apache Calcite. We created custom RelTraitDef and RelTrait classes, registered them in the planner, and used a custom operator to enforce the desired value of the property.

However, we omitted one crucial question - how to propagate properties between operators? It turns out, Apache Calcite cannot do this well, and you will have to make a tough decision choosing between several non-ideal solutions. We will discuss property propagation in detail in future posts. Stay tuned!

We are always ready to help you with your SQL query optimizer design. Just let us know.

Assembling a query optimizer with Apache Calcite

Vladimir Ozerov — Mon, 04 Jan 2021 22:35:37 +0000

Abstract

Apache Calcite is a dynamic data management framework with SQL parser, optimizer, executor, and JDBC driver.

Many examples of Apache Calcite usage demonstrate the end-to-end execution of queries using the JDBC driver, some built-in optimization rules, and the Enumerable executor. Our customers often have their own execution engines and JDBC drivers. So how can you use Apache Calcite for query optimization only, without its JDBC driver and Enumerable executor?

In this tutorial, we create a simple query optimizer using internal Apache Calcite classes.

Schema

First, we need to define the schema. We start with a custom table implementation. To create a table, you should extend Apache Calcite's AbstractTable. We pass two pieces of information to our table:

Field names and types that we will use to construct the row type of the table (required for expression type derivation).
Optional Statistic object that provides helpful information for the query planner: row count, collations, unique table keys, etc.

Our statistic class exposes only row count information.

public class SimpleTableStatistic implements Statistic {

    private final long rowCount;

    public SimpleTableStatistic(long rowCount) {
        this.rowCount = rowCount;
    }

    @Override
    public Double getRowCount() {
        return (double) rowCount;
    }

    // Other methods no-op
}

We pass column names and types to our table class to construct the row type, which Apache Calcite uses to derive data types of expressions.

public class SimpleTable extends AbstractTable {

    private final String tableName;
    private final List<String> fieldNames;
    private final List<SqlTypeName> fieldTypes;
    private final SimpleTableStatistic statistic;

    private RelDataType rowType;

    private SimpleTable(
        String tableName, 
        List<String> fieldNames, 
        List<SqlTypeName> fieldTypes, 
        SimpleTableStatistic statistic
    ) {
        this.tableName = tableName;
        this.fieldNames = fieldNames;
        this.fieldTypes = fieldTypes;
        this.statistic = statistic;
    }

    @Override
    public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        if (rowType == null) {
            List<RelDataTypeField> fields = new ArrayList<>(fieldNames.size());

            for (int i = 0; i < fieldNames.size(); i++) {
                RelDataType fieldType = typeFactory.createSqlType(fieldTypes.get(i));
                RelDataTypeField field = new RelDataTypeFieldImpl(fieldNames.get(i), i, fieldType);
                fields.add(field);
            }

            rowType = new RelRecordType(StructKind.PEEK_FIELDS, fields, false);
        }

        return rowType;
    }

    @Override
    public Statistic getStatistic() {
        return statistic;
    }
}

Our table also implements Apache Calcite's ScannableTable interface. We do this only for demonstration purposes because we will use a certain Enumerable optimization rule in our example that will fail without this interface. You do not need to implement this interface if you are not going to use the Apache Calcite Enumerable execution backend.

public class SimpleTable extends AbstractTable implements ScannableTable {
    ...
    @Override
    public Enumerable<Object[]> scan(DataContext root) {
        throw new UnsupportedOperationException("Not implemented");
    }
    ...
}

Finally, we extend Apache Calcite's AbstractSchema class to define our own schema. We pass a map from a table name to a table. Apache Calcite uses this map to resolve tables during semantic validation.

public class SimpleSchema extends AbstractSchema {

    private final String schemaName;
    private final Map<String, Table> tableMap;

    private SimpleSchema(String schemaName, Map<String, Table> tableMap) {
        this.schemaName = schemaName;
        this.tableMap = tableMap;
    }

    // Used later to register the schema in the root schema and the catalog reader.
    public String getSchemaName() {
        return schemaName;
    }

    @Override
    public Map<String, Table> getTableMap() {
        return tableMap;
    }
}

We are ready to start the optimization.

Optimizer

The optimization process consists of the following phases:

Syntax analysis that produces an abstract syntax tree (AST) from a query string.
Semantic analysis of an AST.
Conversion of an AST to a relational tree.
Optimization of a relational tree.

Configuration

Many Apache Calcite classes that we use for query optimization require configuration. However, there is no common configuration class in Apache Calcite that could be used by all objects. For this reason, we store the common configuration in a single object and then copy configuration values into other objects when needed.

In this specific example, we instruct Apache Calcite on how to process object identifiers: do not change identifier casing, use case-sensitive name resolution.

Properties configProperties = new Properties();

configProperties.put(CalciteConnectionProperty.CASE_SENSITIVE.camelName(), Boolean.TRUE.toString());
configProperties.put(CalciteConnectionProperty.UNQUOTED_CASING.camelName(), Casing.UNCHANGED.toString());
configProperties.put(CalciteConnectionProperty.QUOTED_CASING.camelName(), Casing.UNCHANGED.toString());

CalciteConnectionConfig config = new CalciteConnectionConfigImpl(configProperties);
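
To see why the casing settings matter, consider the following plain-Java sketch. It assumes a parser that normalizes unquoted identifiers according to a casing policy and then looks them up in a case-sensitive table map. The Casing enum here is a local stand-in that mirrors the three policy names, and normalize and resolves are hypothetical helpers, not Apache Calcite methods.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of identifier normalization followed by case-sensitive lookup.
class CasingSketch {

    // Local stand-in for the casing policy; not the Calcite class itself.
    enum Casing { UNCHANGED, TO_UPPER, TO_LOWER }

    // Applies the casing policy to an unquoted identifier.
    static String normalize(String identifier, Casing casing) {
        switch (casing) {
            case TO_UPPER:
                return identifier.toUpperCase(Locale.ROOT);
            case TO_LOWER:
                return identifier.toLowerCase(Locale.ROOT);
            default:
                return identifier;
        }
    }

    // Case-sensitive resolution of the normalized identifier against the table map.
    static boolean resolves(Map<String, ?> tables, String identifier, Casing casing) {
        return tables.containsKey(normalize(identifier, casing));
    }

    public static void main(String[] args) {
        Map<String, String> tables = Collections.singletonMap("lineitem", "table");

        // With UNCHANGED, the identifier matches the lower-case key in the map.
        System.out.println(resolves(tables, "lineitem", Casing.UNCHANGED));

        // With TO_UPPER, the identifier becomes LINEITEM and the lookup fails.
        System.out.println(resolves(tables, "lineitem", Casing.TO_UPPER));
    }
}
```

Under these assumptions, a schema that registers tables with lower-case names is only reachable from unquoted identifiers when the casing is left unchanged or when name resolution is made case-insensitive.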

Syntax Analysis

First of all, we should parse the query string. The result of parsing is an abstract syntax tree, with every node being a subclass of SqlNode.

We pass a part of our common configuration to the parser configuration, then instantiate SqlParser, and finally perform the parsing. If you have a custom SQL syntax, you may pass a custom parser factory class to the configuration.

public SqlNode parse(String sql) throws Exception {
    SqlParser.ConfigBuilder parserConfig = SqlParser.configBuilder();
    parserConfig.setCaseSensitive(config.caseSensitive());
    parserConfig.setUnquotedCasing(config.unquotedCasing());
    parserConfig.setQuotedCasing(config.quotedCasing());
    parserConfig.setConformance(config.conformance());

    SqlParser parser = SqlParser.create(sql, parserConfig.build());

    return parser.parseStmt();
}

Semantic Analysis

The goal of semantic analysis is to ensure that the produced abstract syntax tree is valid. Semantic analysis includes the resolution of object and function identifiers, data type inference, and checking the correctness of certain SQL constructs (e.g., a group key in the GROUP BY clause).

The validation is performed by the SqlValidatorImpl class, one of the most complex classes in Apache Calcite. This class requires several supporting objects. First, we create an instance of RelDataTypeFactory, which provides SQL type definitions. We use the built-in type factory, but you may also provide your custom implementation if needed.

RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();

Then, we create a Prepare.CatalogReader object that provides access to database objects. This is where our previously defined schema comes into play. The catalog reader consumes our common configuration object so that object name resolution is consistent with the one we used during parsing.

SimpleSchema schema = ... // Create our custom schema

CalciteSchema rootSchema = CalciteSchema.createRootSchema(false, false);
rootSchema.add(schema.getSchemaName(), schema);

Prepare.CatalogReader catalogReader = new CalciteCatalogReader(
    rootSchema,
    Collections.singletonList(schema.getSchemaName()),
    typeFactory,
    config
);

Then, we define a SqlOperatorTable, the library of SQL functions and operators. We use the built-in library. You may also provide your implementation with custom functions.

SqlOperatorTable operatorTable = ChainedSqlOperatorTable.of(
    SqlStdOperatorTable.instance()
);

We created all the required supporting objects. Now we instantiate the built-in SqlValidatorImpl. As usual, you may extend it if you need a custom validation behavior (such as custom error messages).

SqlValidator.Config validatorConfig = SqlValidator.Config.DEFAULT
    .withLenientOperatorLookup(config.lenientOperatorLookup())
    .withSqlConformance(config.conformance())
    .withDefaultNullCollation(config.defaultNullCollation())
    .withIdentifierExpansion(true);

SqlValidator validator = SqlValidatorUtil.newValidator(
    operatorTable, 
    catalogReader, 
    typeFactory,
    validatorConfig
);

Finally, we perform validation. Keep the validator instance because we will need it for AST conversion to a relational tree.

SqlNode sqlNode = parse(sqlString);
SqlNode validatedSqlNode = validator.validate(sqlNode);

Conversion to a Relational Tree

An AST is not convenient for query optimization because the relational semantics of its nodes is too complicated. It is much more convenient to perform query optimization on a tree of relational operators, defined by the RelNode subclasses, such as Scan, Project, Filter, Join, etc. We use SqlToRelConverter, another monstrous class of Apache Calcite, to convert the original AST into a relational tree.

Interestingly, to create a converter, we must create an instance of a cost-based planner VolcanoPlanner first. This is one of Apache Calcite's abstraction leaks.

To create the VolcanoPlanner, we again pass the common configuration and the RelOptCostFactory that the planner will use to calculate costs. In a production-grade optimizer, you are likely to define a custom cost factory, because the built-in factories take into account only the cardinality of relations, which is often insufficient for proper cost estimation.

You should also specify which physical operator properties the VolcanoPlanner should track. Every property has a descriptor that extends Apache Calcite's RelTraitDef class. In our example, we register only the ConventionTraitDef, which defines the execution backend for a relational node.

VolcanoPlanner planner = new VolcanoPlanner(
    RelOptCostImpl.FACTORY, 
    Contexts.of(config)
);

planner.addRelTraitDef(ConventionTraitDef.INSTANCE);

We then create a RelOptCluster, a common context object used during conversion and optimization.

RelOptCluster cluster = RelOptCluster.create(
    planner, 
    new RexBuilder(typeFactory)
);

We can create the converter now. Here we set a couple of configuration properties for subquery unnesting, which is out of this post's scope.

SqlToRelConverter.Config converterConfig = SqlToRelConverter.configBuilder()
    .withTrimUnusedFields(true)
    .withExpand(false) 
    .build();

SqlToRelConverter converter = new SqlToRelConverter(
    null,
    validator,
    catalogReader,
    cluster,
    StandardConvertletTable.INSTANCE,
    converterConfig
);

Once we have the converter, we can create the relational tree.

public RelNode convert(SqlNode validatedSqlNode) {
    RelRoot root = converter.convertQuery(validatedSqlNode, false, true);

    return root.rel;
}

During the conversion, Apache Calcite produces a tree of logical relational operators, which are abstract and do not target any specific execution backend. For this reason, logical operators always have the convention trait set to Convention.NONE. It is expected that you will convert them into physical operators during the optimization. Physical operators have a specific convention different from Convention.NONE.

Optimization

Optimization is the process of converting one relational tree into another. You may do rule-based optimization with heuristic or cost-based planners, HepPlanner and VolcanoPlanner, respectively. You may also rewrite the tree manually without rules. Apache Calcite comes with several powerful rewriting tools, such as RelDecorrelator and RelFieldTrimmer.

Typically, to optimize a relational tree, you will perform multiple optimization passes using rule-based optimizers and manual rewrites. Take a look at the default optimization program used by the Apache Calcite JDBC driver or the multi-phase query optimization in Apache Flink.

In our example, we will use VolcanoPlanner to perform cost-based optimization. We already instantiated the VolcanoPlanner before. Our inputs are a relational tree to optimize, a set of optimization rules, and the traits that the root node of the optimized tree must satisfy.

public RelNode optimize(
    RelOptPlanner planner,
    RelNode node, 
    RelTraitSet requiredTraitSet, 
    RuleSet rules
) {
    Program program = Programs.of(RuleSets.ofList(rules));

    return program.run(
        planner,
        node,
        requiredTraitSet,
        Collections.emptyList(),
        Collections.emptyList()
    );
}

Example

In this example, we will optimize the TPC-H query №6. The full source code is available here. Run the OptimizerTest to see it in action.

SELECT
    SUM(l.l_extendedprice * l.l_discount) AS revenue
FROM
    lineitem l
WHERE
    l.l_shipdate >= ?
    AND l.l_shipdate < ?
    AND l.l_discount BETWEEN (? - 0.01) AND (? + 0.01)
    AND l.l_quantity < ?

We define the Optimizer class that encapsulates the created configuration, SqlValidator, SqlToRelConverter and VolcanoPlanner.

public class Optimizer {
    private final CalciteConnectionConfig config;
    private final SqlValidator validator;
    private final SqlToRelConverter converter;
    private final VolcanoPlanner planner;

    public Optimizer(SimpleSchema schema) {
        // Create supporting objects as explained above
        ... 
    }
}

Next, we create the schema with the lineitem table.

SimpleTable lineitem = SimpleTable.newBuilder("lineitem")
    .addField("l_quantity", SqlTypeName.DECIMAL)
    .addField("l_extendedprice", SqlTypeName.DECIMAL)
    .addField("l_discount", SqlTypeName.DECIMAL)
    .addField("l_shipdate", SqlTypeName.DATE)
    .withRowCount(60_000L)
    .build();

SimpleSchema schema = SimpleSchema.newBuilder("tpch").addTable(lineitem).build();

Optimizer optimizer = Optimizer.create(schema);

Now we use our optimizer to parse, validate, and convert the query.

SqlNode sqlTree = optimizer.parse(sql);
SqlNode validatedSqlTree = optimizer.validate(sqlTree);
RelNode relTree = optimizer.convert(validatedSqlTree);

The produced logical tree looks like this.

LogicalAggregate(group=[{}], revenue=[SUM($0)]): rowcount = 1.0, cumulative cost = 63751.137500047684
  LogicalProject($f0=[*($1, $2)]): rowcount = 1875.0, cumulative cost = 63750.0
    LogicalFilter(condition=[AND(>=($3, ?0), <($3, ?1), >=($2, -(?2, 0.01)), <=($2, +(?3, 0.01)), <($0, ?4))]): rowcount = 1875.0, cumulative cost = 61875.0
      LogicalTableScan(table=[[tpch, lineitem]]): rowcount = 60000.0, cumulative cost = 60000.0
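
The row counts and cumulative costs of the scan, filter, and project in this plan can be reproduced with simple arithmetic. The sketch below rests on two assumptions that are inferred from the printed numbers rather than taken from the planner's source: each of the five comparison predicates keeps half of its input rows, and each of these operators contributes its own output row count to the cumulative cost. The aggregate adds a small extra term that the sketch does not model. The class and method names are hypothetical.

```java
// Back-of-the-envelope check of the scan, filter, and project numbers in the plan.
class PlanCostSketch {

    // ASSUMPTION: every comparison predicate keeps half of the input rows.
    static final double COMPARISON_SELECTIVITY = 0.5;

    // Estimated output rows of a filter with the given number of AND-ed comparisons.
    static double filterRowCount(double inputRows, int conjuncts) {
        return inputRows * Math.pow(COMPARISON_SELECTIVITY, conjuncts);
    }

    // ASSUMPTION: an operator's own cost equals its output row count, and the
    // cumulative cost adds that to the cumulative cost of its input.
    static double cumulativeCost(double inputCumulativeCost, double ownRowCount) {
        return inputCumulativeCost + ownRowCount;
    }

    public static void main(String[] args) {
        double scanRows = 60_000.0;
        double filterRows = filterRowCount(scanRows, 5);

        double scanCost = cumulativeCost(0.0, scanRows);
        double filterCost = cumulativeCost(scanCost, filterRows);
        double projectCost = cumulativeCost(filterCost, filterRows);

        System.out.println("filter rows  = " + filterRows);
        System.out.println("scan cost    = " + scanCost);
        System.out.println("filter cost  = " + filterCost);
        System.out.println("project cost = " + projectCost);
    }
}
```

Under these assumptions the sketch yields 1875 rows after the filter and cumulative costs of 60000, 61875, and 63750, which match the scan, filter, and project lines of the plan.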

Finally, we optimize the relational tree and convert it into the Enumerable convention. We use logical rules that convert and merge LogicalProject and LogicalFilter into a compound LogicalCalc, and physical rules that convert logical nodes into Enumerable nodes.

RuleSet rules = RuleSets.ofList(
    CoreRules.FILTER_TO_CALC,
    CoreRules.PROJECT_TO_CALC,
    CoreRules.FILTER_CALC_MERGE,
    CoreRules.PROJECT_CALC_MERGE,
    EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
    EnumerableRules.ENUMERABLE_PROJECT_RULE,
    EnumerableRules.ENUMERABLE_FILTER_RULE,
    EnumerableRules.ENUMERABLE_CALC_RULE,
    EnumerableRules.ENUMERABLE_AGGREGATE_RULE
);

RelNode optimizerRelTree = optimizer.optimize(
    relTree,
    relTree.getTraitSet().plus(EnumerableConvention.INSTANCE),
    rules
);

The produced physical tree looks like this. Notice that all nodes are Enumerable, and that Project and Filter nodes have been replaced with Calc.

EnumerableAggregate(group=[{}], revenue=[SUM($0)]): rowcount = 187.5, cumulative cost = 62088.2812589407
  EnumerableCalc(expr#0..3=[{inputs}], expr#4=[*($t1, $t2)], expr#5=[?0], expr#6=[>=($t3, $t5)], expr#7=[?1], expr#8=[<($t3, $t7)], expr#9=[?2], expr#10=[0.01:DECIMAL(3, 2)], expr#11=[-($t9, $t10)], expr#12=[>=($t2, $t11)], expr#13=[?3], expr#14=[+($t13, $t10)], expr#15=[<=($t2, $t14)], expr#16=[?4], expr#17=[<($t0, $t16)], expr#18=[AND($t6, $t8, $t12, $t15, $t17)], $f0=[$t4], $condition=[$t18]): rowcount = 1875.0, cumulative cost = 61875.0
    EnumerableTableScan(table=[[tpch, lineitem]]): rowcount = 60000.0, cumulative cost = 60000.0

Summary

Apache Calcite is a flexible framework for query optimization. In this blog post, we demonstrated how to optimize SQL queries with the Apache Calcite parser, validator, converter, and rule-based optimizer. In future posts, we will dig into individual components of Apache Calcite. Stay tuned!

We are always ready to help you with your SQL query optimizer design. Just let us know.