Recently I was trying to optimize query, this was the query,
SELECT ...
FROM A
INNER JOIN B
ON B.ID = A.ID
WHERE B.Status = 'Done'
AND B.DateCreated BETWEEN @Start and @End
AND NOT EXISTS (...)
AND A. ... other conditions
Table B
is huge table, and query took couple of seconds. Even though there was an index of (ID, Status, DateCreated).
When I changed query to,
SELECT ...
FROM A
INNER JOIN B
ON B.ID = A.ID AND B.Status = 'Done'
AND B.DateCreated BETWEEN @Start and @End
AND NOT EXISTS (...)
WHERE A. ... other conditions
Surprisingly query only took few milliseconds. Upon further investigation I found that moving NOT EXISTS
inside JOIN improved the speed.
Top comments (11)
You were just lucky. Query optimizer has chosen a non cached version of plan to execute this query.
I don't think that was the case, Execution plans for both queries are different, Also these queries were executing frequently and I saw the difference even executing them in parallel (half queries in old way and half in new way simultaneously).
Plans have obviously have to be different.
I see this is SQL Server and it will cache plan for an exact query you run. By moving date time clause to join condition you force a new plan generation.
You should be able to get same results by forcing plan recompilation
I looked at my query again and I found out that I had an extra clause
NOT EXISTS
which was causing difference in speed and plans, omittingnot EXISTS
didn't make difference in speed in both cases.That's not luck, it would absolutely produce a new plan because the text changed. Even adding a space or making a letter upper case will generate a new plan.
Yes, this is correct.
I should've been more clear that moving clause from where to join does not add any performance on its own, new forcefully generated plan did. Thanks for bringing this up!
You also have to look at the possible 'reason for early termination' in the plan. If the plan is not able to be fully optimized and has a reason for early termination, it may have executed serially and in these cases you may get a better plan with the filter in the join. However, your problem lies in the fact that the optimizer isn't able to complete. Potential causes there are complex queries or auto updating statistics, to name a couple.
Thanks for this very useful post. I just wanted to add that, there is a very easy way now, to test all the SQLs described here, using the free & portable tools, mentioned in my latest post here : dev.to/linuxguist/learn-sql-quickl...
In any decent optimizer, INNER JOIN and WHERE conditions are basically interchangeable.
Also, I'm suspicious about any index that starts on ID because they are highly selective from the start.
Would like to see the execution plans of the two different queries.
That's just a very useless observation and doesn't prove anything.
Have you figured out why exactly did this happen? What's in NOT EXISTS condition? What are the execution plans?
Can you share the execution plans?