DEV Community

Cover image for Query Planning: How citus plans query
Muhammad Zeeshan
Muhammad Zeeshan

Posted on

Query Planning: How citus plans query

Distributed query planner

When a SQL query references a Citus table during execution, the distributed query planner produces a PostgreSQL query plan that contains a Custom Scan node, which contains the distributed query plan.

A distributed query plan constists of the following tasks.

  • Queries on shards to run on the worker nodes.
  • A set of subplans whose result need to be broadcast or re-partitioned.
  • It has four main query planning strategies to scale and to distribute the queries.

4 strategies

A: Fast Path Planner

It handles and work for simple queries like create read, update, and delete i.e, CRUD queries on a single table with a single distributed column value. It works by extracting the distributed column value directly from a filter in the query and determines the value of shard that matched with that value.

Planner then rewrites the name of the table to the shard name that matches the value in the column and construct the query to run on the worker node, this task is done with the minimal usage of the CPU overhead.

B:Router Planner

It handles arbitrary complex queries, that can be scoped to one set of co-located shards. It works by checking and infering whether all distributed tables have the same distributed column filters or not. If so, it then rewrites the table name in the query to the names of the co-located shards that matches the distributed column value. It supports all SQL features that PostgreSQL supports since it will simply delegate the full query to another PostgreSQL server.

Worker nodes

C: Logical Push Planner

It works by detecting whether the join tree( queries that have some joins statements) can be fully pushed down. This requires that all the distributed tables have co-located joins between them and subqueries do not require a global merger step. In this a GROUP BY must include the distribution column. If so, the distribution planner can be largely agnostic to the SQL constructs being used woth the join tree, because they can be fully delegated to the worker nodes, and so our distribution query planner can work in planer.

D: Logical Join Order Planner

It determines the optimal execution order for join tree which involves joins that are not co-located. Firsly, it evaluates all possible join order between distributed tables and subqueries using co-located joins, broadcast joins and re-partition joins. And after evaluation it chooses the order that minimizes the network traffic. From the mentioned joins broadcst and re-partition joins results in subplans that includes the filter and projections pushed into the subplan.

These above are the main four plans that Citus cinsiders while query execution. For each query, Citues iterates over these four planners, from lowest to highest overhead. Then it uses the planner that can plan and executes that query with minimum overhead.

Quey Planner

Conclusion

So, to sum up the query planner, Citus Distributed query planner takes a SQL query and then plans it for distributed execution. Then after performing some optimizations planner breaks the query into two parts. The coordinatoe query which runs on the coordinator and worker query fragments which runs on individual shards on the worker. Planner then assign these fragments to the workers keeping in view the best utilization of its resources. After this query has been passed to the Distributed query executor for execution.

Top comments (0)