DEV Community

Tomas@PawSQL
Tomas@PawSQL

Posted on

Slash Slow Queries by Rewriting OR to UNION

Problem Definition

If two query conditions are connected by OR, the database optimizer may not be able to use indexes to execute the query. For example, for the following SQL statement:

select * from customer where c_phone like '139%' or c_name = 'Ray' 
Enter fullscreen mode Exit fullscreen mode

Even if both fields have indexes, they can be rewritten as a UNION query to leverage the indexes and improve query performance:

select * from customer where c_phone like '139%'
union  
select * from customer where c_name = 'Ray'
Enter fullscreen mode Exit fullscreen mode

However, this transformation does not always improve performance. It requires certain conditions and cost-based estimation.

If the database supports INDEX MERGING, you can also enable the INDEX MERGING optimization strategy by adjusting database parameters to improve database performance.

Applicable Conditions

  1. The OR-connected conditions must be indexable.
  2. The estimated cost of the rewritten UNION statement must be lower than the original SQL.
  3. If the OR branch conditions are mutually exclusive, rewrite them as UNION ALL instead of UNION.

Case Analysis

Case 1. Conditions cannot utilize indexes, no rewrite optimization

select * from customer 
where c_phone = '1' or c_phone like '%139%' 
Enter fullscreen mode Exit fullscreen mode

Analysis: One condition c_phone like '%139%' cannot use indexes. Even after rewriting, a full table scan is still required. PawSQL does not rewrite.

Case 2. Filter condition selectivity is low enough, no rewrite optimization

select * from customer 
where custkey = 1 and (c_phone = '1' or c_phone like '%139%')
Enter fullscreen mode Exit fullscreen mode

Analysis: custkey is the primary key. custkey = 1 uniquely locates one record, so no need to rewrite.

Case 3. Conditions met, use rewrite optimization

select distinct * from customer 
where c_phone like '139%' or c_name = 'Ray' 
Enter fullscreen mode Exit fullscreen mode

Analysis: Both conditions can use indexes and the selectivity is less than 10%. Can be rewritten. distinct can be removed after rewriting by union.

select * from customer where c_phone = '1'  
union
select * from customer where c_phone like '139%' 
Enter fullscreen mode Exit fullscreen mode

Case 4. OR condition branches are mutually exclusive, rewrite to UNION ALL

select * from customer where custkey = 1 or (custkey = 2 and c_phone like '139%')
Enter fullscreen mode Exit fullscreen mode

Analysis: The two condition branches custkey = 2 and c_phone like '139%' and custkey = 1 are mutually exclusive. Rewrite to UNION ALL:

select * from customer where c_custkey = 2 and c_phone like '139%' 
union all
select * from customer where c_custkey = 1
Enter fullscreen mode Exit fullscreen mode

Case 5. Contains order by and limit, use rewrite optimization

select * from orders o  
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
order by O_ORDERDATE desc limit 10
Enter fullscreen mode Exit fullscreen mode

Analysis: Although the selectivity of O_ORDERPRIORITY = 1 and O_SHIPPRIORITY = 1 is high, the total cost is lower by leveraging indexes to avoid sorting. PawSQL rewrites the optimization.

select *
from (
   (select /*QB_2*/ *
    from orders as o
    where o.O_ORDERDATE >= '2021-01-01'
    and o.O_SHIPPRIORITY = 1
    order by o.O_ORDERDATE desc limit 10)  
       union  
  (select /*QB_1*/ *
  from orders as o
  where o.O_ORDERDATE >= '2021-01-01'
    and o.O_ORDERPRIORITY = '1'
  order by o.O_ORDERDATE desc limit 10
  )
) as PawDT_1702555889039  
order by PawDT_1702555889039.O_ORDERDATE desc limit 10
Enter fullscreen mode Exit fullscreen mode

Case 6. Contains grouping and aggregates, use rewrite optimization

select O_ORDERDATE, count(1) from orders o 
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
group by O_ORDERDATE 
Enter fullscreen mode Exit fullscreen mode

Analysis: Although the selectivity of O_ORDERPRIORITY = 1 and O_SHIPPRIORITY = 1 is high, the total cost is lower by leveraging indexes to avoid sorting. PawSQL rewrites the optimization.

select PawDT_1702884016144.O_ORDERDATE, count(1) 
from (
    select /*QB_2*/ o.O_ORDERDATE, o.O_ORDERKEY  
    from tpch_pkfk.orders as o
    where o.O_ORDERDATE >= '2021-01-01'
    and o.O_SHIPPRIORITY = 1
       union
select /*QB_1*/ o.O_ORDERDATE, o.O_ORDERKEY  
 from tpch_pkfk.orders as o  
 where o.O_ORDERDATE >= '2021-01-01'
    and o.O_ORDERPRIORITY = '1'
) as PawDT_1702884016144 
group by PawDT_1702884016144.O_ORDERDATE
Enter fullscreen mode Exit fullscreen mode

Performance Validation

Case 5. 900x performance improvement

  • Before optimization (execution time 432.322ms)

Image description

  • After optimization (execution time 0.189ms)

Image description

Case 6. 15x performance improvement

  • Before optimization (2.816ms)

Image description

  • After optimization (0.204 ms)

Image description

About PawSQL

PawSQL focuses on automatic and intelligent SQL optimization for databases. It supports MySQL, PostgreSQL, openGauss, Oracle and more. PawSQL products include:

  • PawSQL Cloud, an online automated SQL optimization tool that provides SQL review, intelligent query rewriting, cost-based index recommendations for DBAs and developers.
  • PawSQL Advisor, an IntelliJ plugin for developers, can be installed from IDEA/DataGrip marketplace by searching "PawSQL Advisor".
  • PawSQL Engine, the backend optimization engine of PawSQL products, can be deployed independently and provides SQL optimization services via HTTP/JSON interfaces. PawSQL Engine is provided as a Docker image.

source

Top comments (1)

Collapse
 
linuxguist profile image
Nathan S.R.

Thanks for this very useful post. I just wanted to add that, there is a very easy way now, to test all the SQLs described here, using the free & portable tools, mentioned in my latest post here : dev.to/linuxguist/learn-sql-quickl...