Calculating Number of Hierarchy Level in Clustered Index
By : Kasim Wirama, MCDBA, MCITP
This posting, I would like to describe the way to calculate number of hierarchy level of an index in SQL Server. Before I delve into calculation discussion how to calculate index level, I describe index structure in SQL Server.
In SQL Server Books Online, index structure (applies on both non-clustered and clustered index) forms b-tree. B-tree is balanced tree. Balanced tree is a tree where no leaf is much farther away from the root than any other leaf in the tree. Algorithmic background of the tree structure can be found at http://www.nist.gov/dads/HTML/balancedtree.html or you can read book titled “The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition) by Donald E. Knuth (Addison-Wesley Professional, 1998). Go back to tree structure discussion, the difference between non-clustered and clustered index lies to what data contains in leaf level (lowest level) of an index hierarchy. In clustered index, lowest level of index contains data itself, whereas it contains index key column and clustering key (in clustered index table) or row locater (in heap table). Heap table is a table that doesn’t have clustered index. What is clustering key? Clustering key is a clustered index that contains unique index key or if the index key is not unique, internally SQL Server will add uniquifier to make uniqueness of the index column, at the end clustered index is a unique index internally though its key column doesn’t appear unique.
Structure of balanced tree contains root (only 1 page) at top level and subsequent lower levels (1 or more pages). For example there are 3 levels, i.e. : level1, level2, level3. Each records in level1 points to each page(s) in level2 and each rows in level2 will point to each page(s). In same level where there might be more than one pages (except level1), there is double linked list connecting one page to next page so that it is possible SQL Server parses from one page to other page at same level.
To get information how many index level on existing index, you can query function INDEXPROPERTY with parameter IndexDepth. Type in books online regarding to INDEXPROPERTY function and you will get list of input parameter which one of them has description for IndexDepth. How about non existing index? You can still have a way to estimate how many level that a index would have by following the following calculation :
1. Get information number of rows in a table. For example you have 1 million rows in orders table.
2. Determine row size of a table. If there is variable length data type such as varchar/nvarchar; you probably need to estimate how many characters will occupy for length specified to corresponding variable those length data types. For example you assume 50% of column length is counted. And see whether the column is null, if yes, then add 1 bit column overhead. At the end of all columns size is determined, add 2 byte pointer overhead. That’s because there is 2 byte pointer located at the end of every page. For example, row size for order table is 200 byte.
3. Determine page density. You will have your index has empty space so that it doesn’t fully fill up a page. You can refer page density to fill factor of an index, usually fill factor is 90%.
4. Calculate the number of rows that could fit into one page, the formula is (page size – header size) * page density / row size. Page size in SQL Server is 8192 byte, header size is 96 byte, page density is 0.9 and row size is 200 byte. The floor result is 36.
5. Calculate number of pages based on result on step 4, the formula is number of total rows divided by number of rows of each pages (1 million/36). The result is 27778 pages.
6. That’s the lowest level, now go up the upper level. Calculate average row size of a page providing size of index key column. If index key column is datetime type so the size is 8 byte. If it is not unique then there is additional 4 byte size as uniquifier, 2 bytes pointer located at the end of a page, pointer for linked list is 6 bytes plus internal overhead is 5 bytes, so total is 25 bytes if the index column is not unique or 21 bytes is index column is unique. For example the orderdate column is not unique.
7. Counter number of rows that fit in one page with same formula in point 4. The result is 291 rows per one page.
8. To get number of levels above leaf level, you get it from formula ceiling (log 27778 / log 291) = 2.
Total pages including leaf level is the previous result plus 1 (it is 3 for final result).
By getting know number of levels of a non-existing index, you will know the cost of index seeks. The cost equals to number of page it travels from root to required leaf level, whilst index scan will scan all leaf pages. The fewer pages it travels, the more efficient access method of a corresponding query is.