Microsoft – SQL Server – Windowing Function – Indexing Requirements

Preface

I was bestowed a MS SQL Server code-base that has sufficient amount of MS SQL Server Windowing Functions (WF) in it. And, so I have to finally catch up to them. And, take WF a bit more seriously.

Background

In Old SQL to utilize line level and summarized data we break them into nice sections.  To do so inside the defining SQL, we use Common Table Expressions, Correlated Sub-Queries, etc.

Obviously, we can also externalized them into fully defined objects such as SQL Functions (Table Value and Scaler Functions) and Views.

Sample Code – Create Table/Indexes and Add Data

Store Line

So nothing original, just list each customer and his / her max and min balances.

Sample Code – Common Table Expression (CTE)

Here is the CTE implementation:


;with cteBalance
(
	  customerID
	, valueMin
	, valueMax
)
as
(
	select
		  tblcustomerBalance.customerID
		, min(tblcustomerBalance.value) as valueMin
		, max(tblcustomerBalance.value) as valueMax

	from 	dbo.customerBalance tblcustomerBalance

	group by
		tblcustomerBalance.customerID

)
select *
from   cteBalance tblBalance
order  by
	  tblBalance.customerID asc

Codebase

Create Tables and Indexes

  • Create Database DBLab if it does not exist
  • Create Table – customerList – with primary key customerID
  • Create Table – customerBalance – with 2 indexes – the columns are customerID, value — one with value ascending and the other value descending

 


set noexec off
set nocount on
set statistics io off
go

use [master]
go

if DB_ID('DBLab') is null
begin
 exec('create database [DBLab]')
end
go

if DB_ID('DBLab') is null
begin
 
  raiserror('Database - DBLab - has not been created', 16, 1)
  return

end
go

use [DBLab]
go

if (DB_NAME() != 'DBLab')
begin

 raiserror('current database is not DBLab', 16, 1)
 return

end
go

if object_id('dbo.customerList') is null
begin

 create table dbo.customerList
 (
      [customerID] int not null
    , [customerName] varchar(100) not null
 )
end
go

if object_id('dbo.customerBalance') is null
begin

 create table dbo.customerBalance
 (
      [customerID] int not null
    , [value] decimal(20,2) not null
 )

 create index idx_customerID__value_desc
 on dbo.customerBalance
 ([customerID], [value] desc)

end
go

--create index on customerID, value /ascending
if not exists
 (
     select 1
     from sys.indexes tblIndex
     where tblIndex.object_id = object_id('dbo.customerBalance')
     and tblIndex.name = 'idx_customerID__value_asc'

 )
begin

 create index idx_customerID__value_asc
 on dbo.customerBalance
 ([customerID], [value] asc)

end
go

--create index on customerID, value /descending
if not exists
 (
 select 1
 from sys.indexes tblIndex
 where tblIndex.object_id = object_id('dbo.customerBalance')
 and tblIndex.name = 'idx_customerID__value_desc'

 )
begin

 create index idx_customerID__value_desc
 on dbo.customerBalance
 ([customerID], [value] desc)

end
go

Add Data

Here we go adding random data:

For the customerList table, we add data based on ASCII/Char
For the customerBalance table, we add random data – MS SQL Server Engineers nice one on go N — It allows us to run the preceding query N times
Please change N to whatever number of customerBalance records suits you


use [DBLab]
go

declare @iID int
declare @iIDMax int

set @iID =1
set @iIDMax = 26

if not exists
(
select top 1 1
from dbo.customerList
)
begin

print 'Adding data to dbo.customerList ...'

while (@iID <= @iIDMax)
begin

	insert into dbo.customerList
	(
	customerID
	, [customerName]
	)
	values
	(
	@iID
	, char(@iID+64)
	)

	set @iID = @iID +1

end

print 'Adding data to dbo.customerList'

end

if exists
(
	select top 1 *
	from dbo.customerBalance
)
begin

	print 'skipping insert into dbo.customerBalance'
	set noexec on

end
go

print 'not skipping -- Inserting into dbo.customerBalance ....'
go

insert into dbo.customerBalance
(
	  customerID
	, [value]
)
SELECT
	ceiling(rand() * 10)
	, cast(rand() * 1000000 as decimal(16,2))

go 100000

print ' Added data into dbo.customerBalance'
go

set noexec off
go

 

Query Plan

WF-Correlated-CustomerBalance-Min

Windowing Function /Min

 


select 

	  tblcustomerBalance.customerID
	, tblcustomerBalance.value

from   dbo.customerList tblCustomer

inner join

        (
           select
              tblcustomerBalance.*
            , DENSE_RANK()
              over (PARTITION by tblcustomerBalance.customerID order
                      by tblcustomerBalance.value asc)
                        as rankID
           from dbo.[customerBalance] tblcustomerBalance

        ) tblcustomerBalance

         on tblCustomer.customerID = tblcustomerBalance.customerID 

		 and tblcustomerBalance.rankID = 1

 

 

Query Plan

WF-PartitionBy-CustomerBalance-Min

Correlated Sub Query /Max


/*
	Correlated customerBalance/Max
*/
select
	  tblcustomerBalance.customerID
	, tblcustomerBalance.value

from   dbo.customerList tblCustomer

  	  inner join dbo.customerBalance tblcustomerBalance
		on tblCustomer.customerID = tblcustomerBalance.customerID

where  tblcustomerBalance.value =
 	(
		select max(tblcustomerBalance_Inner.value)
		from   dbo.customerBalance tblcustomerBalance_Inner
		where  tblcustomerBalance.customerID
                        = tblcustomerBalance_Inner.customerID
	)

Query Plan

 

WF-Correlated-CustomerBalance-Max

Windowing Function /Max



select 
	  tblcustomerBalance.customerID
	, tblcustomerBalance.value
		
from   dbo.customerList tblCustomer

	inner join

		(
	
		   select
			  tblcustomerBalance.*

			, DENSE_RANK() 
  			    over (PARTITION by tblcustomerBalance.customerID order 
						  by tblcustomerBalance.value desc)
							as rankID
		   from dbo.[customerBalance] tblcustomerBalance

		) tblcustomerBalance
		
  	       on tblCustomer.customerID = tblcustomerBalance.customerID 
	       and tblcustomerBalance.rankID = 1		
			
			

Query Plan

WF-PartitionBy-CustomerBalance-Max

 

Analysis

 

Compare Query Plans

Here is a screen shot that compares the Query Plans: CompareQueryPlans

The query costs appear similar for all 4 queries; with a tiny bit of twist:

    • The 2nd and 4th query which uses Windowing Functions weigh slightly more at 26% compared to the ones that use Correlated Joins @ 24%

 

Tabulated Results

CPU and Cost

 

Correlated Sub Query/MinWindow Function/MinCorrelated Sub Query/MaxWindow Function/Max
31 ms / 83 ms0 ms / 0 ms15 ms / 153 ms0 ms / 4 ms
78 ms / 252 ms187 ms / 206 ms78 ms / 77 ms187 ms / 193 ms
0.5004612 0.5960573 0.5004612 0.4923536
Metric
Parse and Compile Time Execution Time Total Sub Tree Cost

 

 

Quick Analysis:

    • Parse and Compile Time :- We have values for Correlated queries, but 0 for Window Functions
    • Execution Time :- Window Functions doubles that for Correlated queries
    • Total Sub Tree Cost :- Close to statistically equivalent

 

 

IO Stats

 

TableStats

customerListScan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
customerBalanceScan count 11, logical reads 420, physical reads 3, read-ahead reads 392, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

customerListScan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
customerBalanceScan count 1, logical reads 530, physical reads 3, read-ahead reads 553, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
WorktableScan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

customerListScan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
customerBalanceScan count 11, logical reads 420, physical reads 3, read-ahead reads 392, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

customerListScan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
customerBalanceScan count 1, logical reads 389, physical reads 3, read-ahead reads 392, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
WorktableScan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Query
Correlated Sub Query/Min Table Value Function /Min Correlated Sub Query/Max Table Value Function /Min

 

 

Quick Analysis:

    • As expected with the correlated queries, we repeat the inner query once per each outer record.  As we have 11 records in the customerList table, we have ‘Scan count’ of 11 or so
    • For WF Queries, our scan count for customerBalance remains at 1
    • Thankfully, Microsoft reports Worktable interactions when it occurs.  For WF it occurs and because of our index etc, it is @ 0 to indicate that actual disk access does not occur

 

 

Compare Indexes used

Based on a must read Blog post by SQL Server MVP, Paul White:

Problem with Windows Functions
http://www.sqlperformance.com/2013/03/t-sql-queries/the-problem-with-window-functions-and-views

I confirmed that I needed a descending index on the customerBalance column.  Without it, the Window Function query struggles when it partitions by our key column and tries to order using the Index’s un-ordered column:


DENSE_RANK()
     over (PARTITION by tblcustomerBalance.customerID order
            by tblcustomerBalance.value desc
          )
          as rankID

Here are the queries once they are hard-coded to use specific Indexes:


select
           tblcustomerBalance.customerID
	 , tblcustomerBalance.value

from   dbo.customerList tblCustomer

        inner join

          (
           select
                     tblcustomerBalance.*
                   , DENSE_RANK()
              over (PARTITION by tblcustomerBalance.customerID order
                      by tblcustomerBalance.value desc)
                        as rankID
           from dbo.[customerBalance] tblcustomerBalance
                     with (index=idx_customerID__value_asc)

          ) tblcustomerBalance

           on tblCustomer.customerID = tblcustomerBalance.customerID
		 and tblcustomerBalance.rankID = 1

select
		  tblcustomerBalance.customerID
		, tblcustomerBalance.value
from   dbo.customerList tblCustomer
inner join

        (
           select
              tblcustomerBalance.*
            , DENSE_RANK()
              over (PARTITION by tblcustomerBalance.customerID order
                      by tblcustomerBalance.value desc)
                        as rankID
           from dbo.[customerBalance] tblcustomerBalance
                   with (index=idx_customerID__value_desc)

        ) tblcustomerBalance
         on tblCustomer.customerID = tblcustomerBalance.customerID
		 and tblcustomerBalance.rankID = 1

go

 

Tabulated Results

CPU and Cost

Metric Window Function (using asc index) Window Function (using desc indx)
Parse and Compile Time 15 ms / 38 ms 0 ms / 105 ms
Execution Time 311 ms / 442 ms 187 ms / 353 ms
Total Sub Tree Cost 4.604602 0.4923536

Quick Analysis:

  • Parse and Compile Time :- When we have the right index, in this case the desc index, the Parse and Compile (PC) time is @ 0.  Without the right index, the PC time is 15 ms
  • Execution Time :- With the right index, execution tim is @ 187 ms.  Without it, it is @ 311 ms
  • Total Sub Tree Cost :- Without the right index, Total Sub Tree Cost is @ 4.6.  With it, we are @ 0.49

In summary, without the right index our query is exponentially more expensive

IO Stats

Query Table Stats
Table Value Function /Max (not using the right index)
customerList Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
customerBalance Scan count 3, logical reads 588, physical reads 3, read-ahead reads 536, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Value Function /Max (using the right index)
customerList Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
customerBalance Scan count 1, logical reads 389, physical reads 3, read-ahead reads 392, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Quick Analysis:

  • Without the right index, we cycle through the WF table three times; as compared to once when we have the right index

And, here is MS SQL Server report from comparing the Query Plans:

compareWF-QueryHintSpecificIndexes

Quick Points

  • From the Screen shot, the un-mirrored index columns Query takes up 69% compared to the perfectly matched query which takes up 31%
  • When we do not have a corresponding orderings a big part of our query Plan is eaten up to ensure Orderings;  this is captured and reflected by the Sort Operator
  •  We have the need for the Sort Operator when an identically ordered index key is not present; when present the Index is used and there is no need for the Sort Operator
  • In the screen shot above, the Sort takes up 54%
  • When Windowing Functions are in-use, there are a couple of new operators including “Sequence Project”

Take Away

What is my take-away?  As I thought of writing this up, I struggled with how to communicate it effectively.  Much of what I am saying adds to previously observed and published Hard Labor.

Window Functions have been in MS SQL Server since v2005.  But, having learnt SQL way before then, never used nor even learnt about them.  Using sub-queries is good enough for me; I am basically a DBA (Operational and Engineering).

And, so intuitively I know about how to scale SQL and find \ fix troublesome queries.  But, as one feel the nudge of writing, another need that is both over-lapping and compelling starts to take root.

And, that is the need to dig deeper into your tools and see what it has to help you to translucent your thoughts.

And, for me that is using “set statistics profile on”.  I have never used it before.   I use “set statistics io” and “set statistics time on”.  And, make sure that through the GUI/mouse, I have enabled “Include Actual Execution Plan”.

For harder stuff, I will use a tool from Huntervilles, NC.  We all know SQLSentry is the best armor.  And, they have a free one @ http://www.sqlsentry.com/plan-explorer/sql-server-query-view.asp.

But, checked SQL Server Books online by goggling for “set statistics io on”.

Found and thus enabled “set statistics profile on”:

set statistics io on
set statistics time on
set statistics profile on
go

And, that gave me a nice layout of each operation.

setstatisticsprofileon

With “set statistics profile on”:

  • We see that our most expensive operation is the Sort Operation.  In its EstimateCPU column it is @ 3.8
  • Window Functions can be a bit expensive as they do not act upon a subset of the table, but against the entire data-set.  In our case, across all 100,000 records in customerBalance table

Dedication

This post is dedicated to Paul White & his publicly shared labor; in this case specifically The Problem with Window Functions and View (http://www.sqlperformance.com/2013/03/t-sql-queries/the-problem-with-window-functions-and-views).

In all my years of working with SQL never knew where the Index Desc clause fitted-in. Now I know thanks to Paul White; and also thankful for Microsoft having it in; though probably under-utilized all these years.

A special thanks and recognition to Itzik Ben-Gan’s book (Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions)  and articles on same topic in SQL Server Pro, as well.

Like the old folks say the days go by so quickly these days.  And, unlike Glen Campbell, I am not going to make Phoenix, but once I make it to that computer, I will read, learn something new, and affirm previous thoughts from these guys.

References

References – Paul White

References – Robert Shelton

References – Itzik Ben-Gan

One thought on “Microsoft – SQL Server – Windowing Function – Indexing Requirements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s