Recently, I ran into a performance issue while writing an Oracle SQL
query.
The execution time was shockingly slow --- about 110 seconds for the
whole query, with each sub-query taking around 18 seconds.
Here's a simplified version of one of the SQL blocks (several nearly
identical blocks were combined using UNION ALL):
```sql
select
    'A' as type,
    count(adl.is_attend) as count,
    round(
        sum(case when adl.is_attend = 1 then 1 else 0 end)
        / count(adl.is_attend), 2
    ) * 100 as rate
from attendance adl
inner join (
    select deptId
    from department
    start with deptId = '...'
    connect by up_deptId = prior deptId
) nbd on adl.deptId = nbd.deptId
where
    to_date(adl.date, 'yyyy-MM-dd') >= to_date(#{begin_date}, 'yyyy-MM-dd')
    and to_date(adl.date, 'yyyy-MM-dd') <= to_date(#{end_date}, 'yyyy-MM-dd');
```
`#{begin_date}` and `#{end_date}` are parameters passed in dynamically by
the backend.
Since this block is repeated multiple times with only minor differences,
the total runtime quickly stacked up to over 100 seconds.
Why Was It So Slow?
1. Huge amount of data
The attendance table contains 13+ million rows.
Even a simple full table scan takes close to 20 seconds.
2. Recursive join
The query joins the department table using Oracle's hierarchical
query:
`start with ... connect by ...`
This expands to all sub-departments under a given department, which is
computationally expensive.
⚡ Optimization Step 1 --- Indexing the Join Key
First, the data volume itself can't be reduced: we can't simply discard rows, and the WHERE clause already bounds the date range, so the amount of data scanned is inherently limited. The first optimization target is therefore the join.
A recursive join is a very time-consuming operation, so the obvious move is to build an index on deptId. That turned out to be the right idea: adding the index immediately cut the execution time:
18 seconds → 2 seconds
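The index itself is a one-liner. The column comes from the join above; the index name is my own:

```sql
-- Supports the join from attendance to the expanded department set.
create index idx_attendance_deptid on attendance (deptId);
```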
⚡ Optimization Step 2 --- Removing to_date in the WHERE Clause (With Caution)
Calling to_date() on a column disables index usage:
```sql
to_date(adl.date, 'yyyy-MM-dd') >= ...
```
The `to_date` function is used to bound the date range. When comparing only year, month, and day, we can skip the function entirely: Oracle's data type precedence rules will implicitly convert the lower-priority string to a DATE. With this implicit conversion, the runtime drops further, from about 2 seconds to 1 second.
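Concretely, the comparison can drop the function on the column and let Oracle convert the parameters instead (column and placeholder names as in the query above; this assumes the session's NLS settings accept the 'yyyy-MM-dd' strings):

```sql
-- The bare column avoids the per-row to_date() call; the string
-- parameters are implicitly converted to DATE using the session's
-- NLS_DATE_FORMAT.
where adl.date >= #{begin_date}
  and adl.date <= #{end_date}
```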
But note that this is risky. First, implicit data type conversion can hurt performance (here it happens to help). Second, implicit conversion can produce surprising results, depending on the value of the NLS_DATE_FORMAT parameter.
When converting a datetime value to text via an implicit conversion or an explicit conversion that does not specify a format model, the format model is defined by one of the globalization session parameters, namely NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT, or NLS_TIMESTAMP_TZ_FORMAT.
For example, the default NLS_DATE_FORMAT is 'DD-MON-RR', which uses a two-digit year. The century can be lost when a date is converted to a string, then filled back in as 1900 or 2000 (based on the two-digit value) when converted to a date again; and if the original value carried hours, minutes, and seconds, that precision is lost as well.
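A quick way to see this dependency is to change the session parameter and watch the same implicit conversion succeed or fail (a session-level sketch; the literal is illustrative):

```sql
-- With the default 'DD-MON-RR' format, a 'yyyy-MM-dd' string does not
-- match the format model, so the implicit conversion raises ORA-01861.
alter session set NLS_DATE_FORMAT = 'DD-MON-RR';
-- select ... where adl.date >= '2024-01-15';  -- fails

-- After aligning the format, the same comparison works.
alter session set NLS_DATE_FORMAT = 'YYYY-MM-DD';
-- select ... where adl.date >= '2024-01-15';  -- succeeds
```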
In short, different session parameters can produce different results. Since the optimization in Step 2 is not strictly necessary, I kept the explicit date function.
✅ Final Result
After applying only the safe optimization (indexing):
- Whole query time: 110s → 10s
- Each SQL block: 18s → 2s
With date optimization (not used in production):
- Each block: ~1s