SQL Server Indonesia User Groups Community Index Union Analysis - kasim.wirama
Welcome to SQL Server User Group Indonesia
Sign in JoinHelp | Sign In Live ID

kasim.wirama

Index Union Analysis

Index Union Analysis

By : Kasim Wirama, MCDBA

 

When there are 2 or more where criteria which are joined with OR, each columns is different among others, they are indexed and search selectivity is high, optimizer could consider multiple indexes and join the end result with concatenation or merge join concatenation operator. Such a pattern of execution plan is called index union. Let’s explore what possibilities of execution plan generated from index union of a query statement.

 

Here is the query to get any ordered which is shipped or created between 1 January 2007 and 7 January 2007.

 

SELECT orderid
FROM orders
WHERE shippeddate BETWEEN '19970101' AND '19970107'
OR orderdate BETWEEN '19970101' AND '19970107';

 

Because there is index for shippeddate and orderdate column respectively, optimizer decides to use each indexes to get result from shippeddate BETWEEN '19970101' AND '19970107' and orderdate BETWEEN '19970101' AND '19970107'. Execution plan for the query is :

 

  |--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[OrderID] ASC))

       |--Concatenation

            |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShippedDate]), SEEK:([Northwind].[dbo].[Orders].[ShippedDate] >= 1997-01-01 00:00:00.000' AND [Northwind].[dbo].[Orders].[ShippedDate] <= 1997-01-07 00:00:00.000') ORDERED FORWARD)

            |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]), SEEK:([Northwind].[dbo].[Orders].[OrderDate] >= 1997-01-01 00:00:00.000' AND [Northwind].[dbo].[Orders].[OrderDate] <= 1997-01-07 00:00:00.000') ORDERED FORWARD)

 

Notice that result from 2 index seeks is joined with concatenation operator. Because the result might be duplicate, optimizer implement SORT DISTINCT operator to remove duplication. The query above could be rewritten as :

 

SELECT ordered
FROM orders
WHERE shippeddate BETWEEN '19970101' AND '19970107'
UNION
SELECT ordered
FROM orders
WHERE orderdate BETWEEN '19970101' AND '19970107';

 

Query rewrite above is valid because orderid is primary key which is also considered as unique key.

 

Now consider another query below :

 

SELECT orderid
FROM orders
WHERE shippeddate = '19970101' OR orderdate = '19970101';

 

Query plan for the query is :

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Northwind].[dbo].[Orders].[OrderID]))

       |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[OrderID]))

       |    |--Merge Join(Concatenation)

       |         |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShippedDate]), SEEK:([Northwind].[dbo].[Orders].[ShippedDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

       |         |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]), SEEK:([Northwind].[dbo].[Orders].[OrderDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

       |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]), SEEK:([Northwind].[dbo].[Orders].[OrderID]=[Northwind].[dbo].[Orders].[OrderID]) LOOKUP ORDERED FORWARD)

 

Notice that stream aggregate and merge join concatenation appears instead of concatenation and SORT DISTINCT operator. Why does optimizer choose merge join and stream aggregate? It’s because that the query uses equality instead of inequality. Recall that merge operator will probably appear when a predicate contains at least one equality operator. From efficiency of resource utilization perspective, merge join and stream aggregate is better than sort operator because they do not require memory rather than SORT operator that might introduce data spill out to tempdb database when memory is under pressure.

 

The query above could be written as :

 

SELECT orderid
FROM orders
WHERE shippeddate = '19970101'
UNION
SELECT orderid
FROM orders
WHERE orderdate = '19970101';

 

Query plan for this query is :

   |--Merge Join(Union)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShippedDate]), SEEK:([Northwind].[dbo].[Orders].[ShippedDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]), SEEK:([Northwind].[dbo].[Orders].[OrderDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

 

Merge join (Union) operator replaces Stream Aggregate and Merge join (concatenation).

Query rewriting is always generated from query plan analysis, and the process is fun and rewarding.

Posted: Saturday, January 17, 2009 5:02 AM by kasim.wirama

Filed under:

Comments

No Comments