Technical: Microsoft – SQL Server – Transact SQL – Get max correlated join VS Windowing Function (ROW_NUMBER() with PARTITION)

Technical: Microsoft – SQL Server – Transact SQL – Get max correlated join VS Windowing Function (ROW_NUMBER() with PARTITION)

Introduction

This is a short write-up of a recent query that I was looking to optimize.

One good thing about reviewing other developer’s code is that one gets introduced to new SQL Functions ( that are being added to the SQL ANSI Standards).

 

Use-Case

In this case, we wanted to get the highest sales number for all Sales Agents.

Traditionally, we will use correlated queries and rely on the Max function to get us Max Sales figures.

But, having recently reviewed a query where the Architect (JR) had used the new Windowing Functions effectively I wanted to see if it will help here, as well.

DDL – Create Tables

DDL – Create Table – dbo.client



if OBJECT_ID('dbo.client') is null
begin

	create table dbo.[client]
	(
	          [clientID] bigint not null identity(1,1) 
		, [clientName] nvarchar(100)

		, constraint PK_Client primary key ([clientID])
	)

	create unique index idx_ClientName
	on dbo.[client]
	(
		[clientName]
	)	

end
go

DDL – Create Table – SalesPerson



if OBJECT_ID('dbo.salesPerson') is null
begin

	create table dbo.[salesPerson]
	(
	          [personID]   bigint not null identity(1,1) 
		, [firstName]  nvarchar(100) not null
		, [lastName]   nvarchar(100) not null
		, [middleName] nvarchar(100) null

		, [fullName] as [firstName] + ' ' + [lastName] persisted

		, constraint PK_SalesPerson primary key ([personID])
	)

end
go

Create Table – dbo.Order



if OBJECT_ID('dbo.order') is null
begin

	create table dbo.[order]
	(
	      [orderID]    bigint not null identity(1,1)
	    , [clientID]   bigint
			constraint FK_ClientID foreign key ([clientID])
				 references dbo.[client]
				  ([clientID])		  

	    , [salesPersonID]   bigint not null
			constraint FK_PersonID foreign key ([salesPersonID])
				 references dbo.[salesperson] ([personID])

	    , [orderDate]  datetime not null 
			constraint DF_Order_OrderDate default getutcdate()

		, [orderAmt]   money not null

		, constraint PK_Order primary key ([orderID])
	)

	create index idx_OrderAmt
	on dbo.[order]
	([orderAmt])

	create index idx_SalesPersonID
	on dbo.[order]
	([salesPersonID])

	CREATE NONCLUSTERED INDEX [idx_SalesPersonID__OrderID]	
	ON [dbo].[order] ([salesPersonID])
	INCLUDE ([orderID],[clientID],[orderDate],[orderAmt])

end
go

Data

Data – Insert Data – Client



set nocount on
go

declare @NumberofClients bigint

delete from dbo.client

if not exists 
	(
		select 1
		from   dbo.[client]
	)
begin	

   insert into dbo.client([clientName]) values ('Macy');
   insert into dbo.client([clientName]) values ('Exxon Mobil');
   insert into dbo.client([clientName]) values ('Wal-Mart Stores');
   insert into dbo.client([clientName]) values ('Chevron');
   insert into dbo.client([clientName]) values ('General Motors');

   insert into dbo.client([clientName]) values ('General Electric');
   insert into dbo.client([clientName]) values ('Ford Motor');
   insert into dbo.client([clientName]) values ('AT&T');
   insert into dbo.client([clientName]) values ('Fannie Mae');	
   insert into dbo.client([clientName]) values ('CVS Caremark');

   insert into dbo.client([clientName]) values ('McKesson');
   insert into dbo.client([clientName]) values ('Hewlett-Packard');
   insert into dbo.client([clientName]) values ('Verizon Communications');
   insert into dbo.client([clientName]) values ('UnitedHealth Group');
   insert into dbo.client([clientName]) values ('J.P. Morgan Chase & Co');

   insert into dbo.client([clientName]) values ('International Business Machines');
   insert into dbo.client([clientName]) values ('Bank of America Corp');
   insert into dbo.client([clientName]) values ('Costco Wholesale');
   insert into dbo.client([clientName]) values ('Kroger');
   insert into dbo.client([clientName]) values ('Express Scripts Holding');	

   insert into dbo.client([clientName]) values ('WellsFargo');
   insert into dbo.client([clientName]) values ('Citigroup');
   insert into dbo.client([clientName]) values ('Archer Daniels Midland');
   insert into dbo.client([clientName]) values ('Proctor & Gamble');
   insert into dbo.client([clientName]) values ('Prudential Financial');

   insert into dbo.client([clientName]) values ('Boeing');
   insert into dbo.client([clientName]) values ('Home Depot');
   insert into dbo.client([clientName]) values ('Microsoft');
   insert into dbo.client([clientName]) values ('Target');
   insert into dbo.client([clientName]) values ('Walgreen');	

   insert into dbo.client([clientName]) values ('American International Group');
   insert into dbo.client([clientName]) values ('MetLife');
   insert into dbo.client([clientName]) values ('Johnson & Johnson');
   insert into dbo.client([clientName]) values ('Caterpillar');
   insert into dbo.client([clientName]) values ('PepsiCo');	

end

set @NumberofClients = (select count(*) from dbo.client)
print 'Number of Clients ' + cast(@NumberofClients as varchar(30))

Data – Insert Data – SalesPerson



set nocount on
go

declare @NumberofPersons bigint

delete from  dbo.[salesPerson]

if not exists 
	(
		select 1
		from   dbo.[salesPerson]
	)
begin	

   /*
	Inc. 10 Greatest salesperson of all time
	http://www.inc.com/ss/10-greatest-salespeople-of-all-time#3

	http://www.resume.se/nyheter/media/2013/05/07/hon-kan-bli-arets-mediesaljare/

	http://www.woopidoo.com/profession/sales/

  */	

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Rahel', 'Bishops');

         insert into dbo.salesPerson([firstName], [middleName], [lastName]) 
         values ('John', 'H.', 'Patterson');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('David', 'Ogilvy');
	insert into dbo.salesPerson([firstName], [middleName], [lastName]) 
        values ('Mary', 'Kay','Ash');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Rahel', 'Bishops');					

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Dale', 'Carnegie');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Joe', 'Girald');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Erica', 'Feidner');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Ron', 'Popeil');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Larry', 'Ellison');	

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Zig', 'Ziglar');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Napoleon', 'Barragan');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Joe', 'Girald');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Erica', 'Feidner');
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Nyla', 'Dove');

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Marie', 'Strand');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Monica', 'Sved');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Niklas', 'Linberg');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Tony', 'Kaski');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Lotta', 'Bimer');	

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Josephine', 'Baker');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Greg', 'White');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Kari', 'Bloom');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Jay', 'Abrahmam');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Edward', 'Bernays');

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Richard', 'Branson');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Calvin', 'Klein');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Og', 'Mandino');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Charles', 'Saatchi');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Edward', 'Bernays');							

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Bernard', 'Arnault');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Alan', 'Bond');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Eli', 'Broad');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Walter', 'Chrysler');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('David', 'Geffen');							

	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Sakip', 'Sabanci');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Kerry', 'Stokes');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Steve', 'Wynn');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('Rene', 'Rivkin');	
	insert into dbo.salesPerson([firstName], [lastName]) 
        values ('John', 'Templeton');	

end

set @NumberofPersons = (select count(*) from dbo.[salesPerson])
print 'Number of Clients ' + cast(@NumberofPersons as varchar(30))

Data – Insert Data – Order



set nocount on
go

declare @NumberofClients bigint
declare @NumberofSalesPersons bigint
declare @NumberofOrders bigint

declare @lOrder bigint
declare @lOrderMax bigint

declare @clientID   bigint
declare @personID   bigint
declare @orderDate  datetime
declare @orderAmt   bigint
declare @iDayToAdd  int		

truncate table dbo.[order]

if not exists 
	(
		select 1
		from   dbo.[order]
	)
begin	

	set @NumberofClients = (select MAX([clientID]) from dbo.[client])

	set @NumberofSalesPersons = (select max([personID]) from dbo.[salesPerson])

	set @lOrder = 1
	set @lOrderMax = 100000

	while (@lOrder <= @lOrderMax) 	
        begin

	set @clientID = cast((rand(checksum(newid()))) * @NumberofClients as bigint)

	set @personID = cast((rand(checksum(newid()))) 
                             * @NumberofSalesPersons as bigint) 	
	set @iDayToAdd = cast((rand(checksum(newid()))) * @lOrderMax as bigint) 	set @orderDate = dateadd(Day, RAND() * @iDayToAdd,  '1/1/1970') 		set @orderAmt = cast(rand(checksum(newid())) * 10181.819 as bigint) 				   + cast(rand(checksum(newid())) * 5161.819 as bigint) 			   + cast(rand(checksum(newid())) * 7181720.819 as bigint) 			   + cast(rand(checksum(newid())) * 51615490.78 as bigint) 

	if (@clientID =0) or (@clientID > @NumberofClients)
	begin
	     set @clientID = cast(@NumberofClients / 2 as int)
	end

	if (@personID =0) or (@personID > @NumberofSalesPersons)
	begin
	    set @personID = cast(@NumberofSalesPersons / 2 as int)
	end

	insert dbo.[order]
	(
	  [clientID], [salespersonID], [orderDate], [orderAmt]
	)
	values
	(
	    @clientID
	  , @personID
	  , @orderDate
	  , @orderAmt
        )

	set @lOrder = @lOrder + 1

end

end

set @NumberofClients = (select count(*) from dbo.[client])
print 'Number of Clients ' + cast(@NumberofClients as varchar(30))

set @NumberofSalesPersons = (select count(*) from dbo.[salesPerson])
print 'Number of Salespersons ' + cast(@NumberofSalesPersons as varchar(30))

set @NumberofOrders = (select count(*) from dbo.[order])
print 'Number of Orders ' + cast(@NumberofOrders as varchar(30))
go

update statistics dbo.[order] with fullscan
go

Query

Query – Correlated Max Query

This query uses the correlated Max Query



select
           tblSalesPerson.personID
         , tblSalesPerson.fullName
         , tblOrder.orderDate
         , tblOrder.orderAmt

from dbo.[salesPerson] tblSalesPerson

        inner join dbo.[order] tblOrder

              on tblSalesPerson.personID = tblOrder.salesPersonID

where tblOrder.[orderAmt] =
                               (
                                   select MAX(orderAmt)
                                   from dbo.[order] tblOrder_Inner
                                   where tblOrder.salesPersonID 
                                            = tblOrder_Inner.[salesPersonID]
                               )

Query - Order - Max (Correlated)

Query – Uses Partitioned Ranking

This query uses the new Windowing Function.  And, that query snippet is embedded in the actual query.



select
             tblSalesPerson.personID
           , tblSalesPerson.fullName
           , tblOrder.orderDate
           , tblOrder.orderAmt

from dbo.[salesPerson] tblSalesPerson

   inner join

        (
           select
              tblOrder.*
            , DENSE_RANK() 
              over (PARTITION by tblOrder.salespersonID order 
                      by tblOrder.orderAmt desc)
                        as rankID
           from dbo.[Order] tblOrder

        ) tblOrder
         on tblSalesPerson.personID = tblOrder.salesPersonID

where tblOrder.rankID = 1

order by tblOrder.orderAmt desc

Query - Order - Partition Ranking (Embedded)

Query – Uses Partitioned Ranking

This query uses the new Windowing Function.  And, that query snippet is externalized as a Common Table Expression (CTE).


;with cteOrder 
(
	salesPersonID, orderAmt, rankID
)
as
(
	select *
	from 
		(
	 	     select 
			     tblOrder.salespersonID
                           , tblOrder.orderAmt
			   , DENSE_RANK() over 
                                (PARTITION by tblOrder.salespersonID 
                                    order by tblOrder.orderAmt desc)
				       as rankID
			from   dbo.[order] tblOrder 

		) tblOrder
	where rankID = 1

)

select *
from  cteOrder	tblOrder	
		inner join dbo.salesPerson tblSalesPerson	
			on tblOrder.salesPersonID = tblSalesPerson.personID

Partitioned Ranking – Common Table Expression (CTE)

Query - Order - Partition Ranking (CTE)

Results:

ID Correlated/Max Embedded CTE
CPU Execution Time 31 ms 124 ms 94 ms
Cached plan size 48 B 64 B 64 B
CompileCPU 12 4 4
CompileMemory 720 360 352
CompileTime 12 4 4
Degree of Parallelism 1 2 2
Subtree Cost 0.712381 5.83487 5.60542
Memory Granted 1024 14624 11104
Estimated # of Rows 29.2305  4214.69 4214.69

 

Interpret Performance Data

CPU Execution Time

  • Will vary widely depending on what is going on in the system (at the time) metrics were taking
  • Will suggest that load tool be used and that profile data should be averaged

Cached Plan Size

  • The max query’s plan size is @ 48 Bytes and the partition range is @ 64 Bytes

Compile CPU / Compile Memory / Compile Time

  • Compile CPU and Compile Time for Max Query is triple the size of the Partition Range
  • And, Compile Memory is double

Degree of Parallelism

  • The Max query’s degree of Parallelism holds stable @ 1; whereas that Partition Range expands as the Instance’s degree of Parallelism is adjusted

Subtree Cost

  • The Sub Tree Cost for the Max query is 0.71 and the cost for the Partition Range is 5.61
  • Note that since our threshold for Parallelism is @ 5, the latter queries are run in Parallel

Memory Granted

  • Memory granted for the Max query is 1024 and the one for the Partition Range are 14624 and 11104

Estimated Number of Rows

  • The Estimated # of Rows for the Max query is 39 and the one for the Partition Range is at 14624 and 11104
  • Once we review the Queries plan we can see when the Estimated # of Rows are actualized.  And, that occurs when the Order and SalesPerson tables are actually joined
  • The joins occur during the correlated join for the Max Query
  • But for the Partition Range it occurs after the Sort Operation

More on Query Plans…

Sort

  • The Sort Operation is quite expensive and it seems that the Partition Range will use it to get max date for each range
  • On the Partition Range queries the Sort Operation is @ 65%

References:

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s