Common Table Expression ( CTE ) – Multiple References can cause drag

Background

A co-worker is working on a new project and wanted my thoughts on the best way to design the underlying database queries.

 

Design

Table Design

Here is the the table design.

UserActivity

 

Contents

Review Data

Here is the data …

reviewData

 

Explanation

  1. The current state is listed in the eventCD column
  2. The timestamp is listed in the recent_activity column
  3. And, the entries are assigned sequence Number

 

Query Design

Self Join

As the data is logged vertically, we will use a self join to get start and end time.



declare @userID	   varchar(100)
declare @refID	   varchar(200)
declare @schoolID  varchar(40)
declare @courseID  varchar(40)

set @userid = 'mary'
set @refID = 'DED001'
set @schoolID = 'DRVEDCA'
set @courseID = 'C0000013'


;with cte
(
	  [id]
	, eventCD
	, recent_activity

)
as
(
	select
		   [id] = ROW_NUMBER() OVER(ORDER BY recent_activity asc)
		 , eventCD
		 , recent_activity
	from   UserAcct.UserActivity 
	where  userID    = @userID 
	and    refID     = @refID 
	and    schoolID  = @schoolID 
	and    courseID  = @courseID 
	and    eventCD in 
		 (
			  'COURSESTARTED'
			, 'COURSERESUMED'
			, 'COURSEEXIT'
			, 'REVIEWSTARTED'
			, 'REVIEWEXIT'
		)

)

, cteDetail
(
	  [id]
	, recent_activity
	, eventCD
	, recent_activityNext
	, eventCDNext
	, [duration]
)

as
(
				
	select 
		  cteCurrent.[id]

		, cteCurrent.[recent_activity]

		, cteCurrent.eventCD

		, recent_activityNext
			= cteNext.[recent_activity]

		, eventCDNext
			= cteNext.eventCD

		, [duration] = datediff
			(
			     minute
			   , cteCurrent.[recent_activity]
			   , cteNext.[recent_activity]
			)


	from   cte cteCurrent

	inner join cte cteNext

			on cteNext.[id]
				=
				  (
					select min(cteNext_Inner.[id])

					from   cte cteNext_Inner

					where  cteCurrent.eventCD 
						in ( 'COURSESTARTED', 'COURSERESUMED', 'REVIEWSTARTED')

					and  cteNext_Inner.eventCD	
						in  ( 'COURSEEXIT', 'REVIEWEXIT')

					and   cteNext_Inner.[id]
						 > cteCurrent.[id]

					)


	where  cteCurrent.eventCD 
				in ( 'COURSESTARTED', 'COURSERESUMED', 'REVIEWSTARTED')

	and    cteNext.eventCD	
				in    ( 'COURSEEXIT', 'REVIEWEXIT')

)

select 
	  [id]
	, recent_activity
	, eventCD
	, recent_activityNext
	, eventCDNext
	, [duration]

from   cteDetail


 

Explanation

  1. Common Table Expression
    • Get a filtered list for specific criteria; Our criteria is Userid, Reference ID, School ID, Course ID
    • Use Windowing Function to get Row Number

 

Get Schedule

 

Scaler Function

[OnlineCourse].[GetCourseTimingLog]

Can not share the original Scaler function as I did not write it

 

Inline User-Defined Function ( IUDF )

Let us create an Inline User-Defined function to get a generate a tabulated view of our schedule.

IUDF – [OnlineCourse].[IUDF_CourseTimingTabulated]



SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

if object_id('[OnlineCourse].[IUDF_CourseTimingTabulated]') is null
begin

	exec( '
				create function [OnlineCourse].[IUDF_CourseTimingTabulated]() 
				returns table 
				as return( select 1/0 as [shell]) 
		  '
		)

end
go

ALTER function  [OnlineCourse].[IUDF_CourseTimingTabulated]
(
	  @userID		varchar(100)
	, @refID		varchar(200)
	, @schoolID		varchar(40)
	, @courseID		varchar(40)
)
RETURNS TABLE
AS
	return
	(

		with cte
		(
			  [id]
			, eventCD
			, recent_activity

		)
		as
		(

			select
					[id] = ROW_NUMBER() 
						OVER(ORDER BY recent_activity asc)
				   , eventCD
				   , recent_activity
			from   UserAcct.UserActivity 
			where  userID    = @userID 
			and    refID     = @refID 
			and    schoolID  = @schoolID 
			and    courseID  = @courseID 
			and    eventCD in 
						(
							  'COURSESTARTED'
							, 'COURSERESUMED'
							, 'COURSEEXIT'
							, 'REVIEWSTARTED'
							, 'REVIEWEXIT'
						)

		)
		, CteDetail
		(
			  [id]
			, recent_activity
			, eventCD
			, recent_activityNext
			, eventCDNext
			, [duration]
		)

		as
		(
						
			select 
					  cteCurrent.[id]

					, cteCurrent.[recent_activity]

					, cteCurrent.eventCD

					, recent_activityNext
							= cteNext.[recent_activity]

					, eventCDNext
						= cteNext.eventCD

					, [duration] = datediff
						(
							  minute
							, cteCurrent.[recent_activity]
							, cteNext.[recent_activity]
						)


			from   cte cteCurrent

			inner join cte cteNext

					on cteNext.[id]
						=
							(
								select min(cteNext_Inner.[id])

								from   cte cteNext_Inner

								where  cteCurrent.eventCD 
										in ( 
											    'COURSESTARTED'
											  , 'COURSERESUMED'
											  , 'REVIEWSTARTED'
											)

								and  cteNext_Inner.eventCD	
										in  (
											    'COURSEEXIT'
											  , 'REVIEWEXIT'
											)

								and   cteNext_Inner.[id]
										 > cteCurrent.[id]

							)


			where  cteCurrent.eventCD 
						in ( 
								  'COURSESTARTED'
								, 'COURSERESUMED'
								, 'REVIEWSTARTED'
							)

			and    cteNext.eventCD	
						in  ( 
								  'COURSEEXIT'
								, 'REVIEWEXIT'
							)

		)

		, cteDetailUnique
		as
		(

			select 
					  [id]
					, recent_activity
					, eventCD
					, recent_activityNext
					, eventCDNext
					, [duration]

			from   cteDetail cteD

			where  id =
						(
							select min(id)

							from   cteDetail cteD_Inner

							where  cteD.[recent_activityNext] 
									= cteD_Inner.[recent_activityNext]
						)

		)

		select *
		from   cteDetailUnique


	)

go



 

Metrics

Statistics Time

StatisticsTime

Statistics I/O

StatisticsIO

 

Explanation
  1. Logical reads of 22004392 seems to be excessive

 

Query Plan

QueryPlan

 

Explanation
  1. From the query plan, we can see that the same operations are being repeated multiple times
    • Index Seek – Find targeted data
    • Key Lookup – Use Clustering key to get Key
    • Nested Loops – Join results of Index Seek and Key Lookup
    • Sequence Project – Windowing Function ( Row Number )

 

 

Table Value User-Defined Function ( TVUDF )

Let us attempt to use a Table Value User-Defined function to tackle same problem.

TVUDF – [OnlineCourse].[TVUDF_CourseTimingTabulated]

 




if object_id('[OnlineCourse].[TVUDF_CourseTimingTabulated]') is null
begin

	exec('create function [OnlineCourse].[TVUDF_CourseTimingTabulated]()

			RETURNS @table TABLE 
			(
				  [id]					int
				, recent_activity		datetime
				, eventCD				varchar(255)
				, recent_activityNext	datetime
				, eventCDNext			varchar(255)
				, [duration]			int

			)
			as
			begin

				return

			end

		')

end

go

ALTER FUNCTION [OnlineCourse].[TVUDF_CourseTimingTabulated]
(
	  @userID		varchar(100)
	, @refID		varchar(200)
	, @schoolID		varchar(40)
	, @courseID		varchar(40)
)

RETURNS @table TABLE 
(
	  [id]					int
	, recent_activity		datetime
	, eventCD				varchar(255)
	, recent_activityNext	datetime
	, eventCDNext			varchar(255)
	, [duration]			int

)

AS
BEGIN

	declare @source TABLE
	(
		  [id]				int not null 
								primary key

		, eventCD			varchar(255) not null

		, recent_activity	datetime not null
	)

	;with cte
	(
		  [id]
		, eventCD
		, recent_activity

	)
	as
	(
		select
				[id] = ROW_NUMBER() 
					OVER(ORDER BY recent_activity asc)

				, eventCD

				, recent_activity

		from   UserAcct.UserActivity 
		where  userID    = @userID 
		and    refID     = @refID 
		and    schoolID  = @schoolID 
		and    courseID  = @courseID 
		and    eventCD in 
					(
						  'COURSESTARTED'
						, 'COURSERESUMED'
						, 'COURSEEXIT'
						, 'REVIEWSTARTED'
						, 'REVIEWEXIT'
					)

	)

	insert into @source
	(
		  [id]		
		, eventCD	
		, recent_activity
	)

	select 
		  [id]		
		, eventCD	
		, recent_activity

	from   cte

	;with CteDetail
	(
		  [id]
		, recent_activity
		, eventCD
		, recent_activityNext
		, eventCDNext
		, [duration]
	)

	as
	(
					
		select 
				  cteCurrent.[id]

				, cteCurrent.[recent_activity]

				, cteCurrent.eventCD

				, recent_activityNext
						= cteNext.[recent_activity]

				, eventCDNext
					= cteNext.eventCD

				, [duration] = datediff
					(
						  minute
						, cteCurrent.[recent_activity]
						, cteNext.[recent_activity]
					)


		from   @source cteCurrent

		inner join @source cteNext

				on cteNext.[id]
					=
						(
							select min(cteNext_Inner.[id])

							from   @source cteNext_Inner

							where  cteCurrent.eventCD 
									in ( 
										    'COURSESTARTED'
										  , 'COURSERESUMED'
										  , 'REVIEWSTARTED'
									   )

							and  cteNext_Inner.eventCD	
									in  ( 
											  'COURSEEXIT'
											, 'REVIEWEXIT'
										)

							and   cteNext_Inner.[id]
									 > cteCurrent.[id]

						)


		where  cteCurrent.eventCD 
					in ( 
							  'COURSESTARTED'
							, 'COURSERESUMED'
							, 'REVIEWSTARTED'
						)

		and    cteNext.eventCD	
					in ( 
							  'COURSEEXIT'
							, 'REVIEWEXIT'
					   )

	)

	, cteDetailUnique
	as
	(

		select 
				  [id]
				, recent_activity
				, eventCD
				, recent_activityNext
				, eventCDNext
				, [duration]

		from   cteDetail cteD

		where  id =
					(
						select min(id)

						from   cteDetail cteD_Inner

						where  cteD.[recent_activityNext] 
								= cteD_Inner.[recent_activityNext]
					)

	)

	insert into @table
	(
		  [id]
		, recent_activity
		, eventCD		
		, recent_activityNext
		, eventCDNext		
		, [duration]		
	)
	select 
		  [id]
		, recent_activity
		, eventCD		
		, recent_activityNext
		, eventCDNext		
		, [duration]		
	from   cteDetailUnique

	return

END

GO


grant select on [OnlineCourse].[TVUDF_CourseTimingTabulated] to [public]
go




Explanation
  1. Declared a table variable ( @source )
  2. Populated the table variable using Common Table Expression ( cte)
  3. Referred to the cached table variable  rather than the cte later in the code

 

Metrics

Statistics Time

StatisticsTime

 

Statistics I/O

StatisticsIO

Explanation
  1. No data on Statistics I/O

 

Query Plan

QueryPlan

 

Explanation
  1. Nothing useful via Query Plan

 

Compare “Inline User-Defined Function” and “Table Value User-Defined Function”

Query Plan

compareQueryPlan

 

Statistics I/O

compareStatisticsIO

Tabulated

Here is a tabulated view of how the two types of functions compare with each other:

 

 

Metric Metric-Sub Inline User Defined Function Table Value User Defined Function
 Query Plan
Weighted  97%  3%
 Time
 CPU Time  29, 484 ms  702 ms
 Elapsed Time  29, 615 ms  710 ms
Statistics IO
UserActivity Scan count 18632, logical reads 22004392
Worktable  Scan count 15, logical reads 169 Table ‘#149C0161’. Scan count 1, logical reads 1, physical reads 0

 

Explanation

Unfortunately, Table Value User Defined Functions hides I/O and query plan data.

But, the time taken for Inline User Defined Function is 30 seconds, while for Table Value User Defined Function, we came in at less than 1 second.

Summary

Aesthetically, I like the simplicity of Inline User Defined Functions ( IUDF ).

But, depending on the complexity of the query you might have to resort to a Table Value User Defined Function ( TVUDF )  to squeeze out performance gains.

In our case, the most expensive part of the query is getting to our filtered list in the base table.

When we used TVUDF, we are able to separate gathering that data from actual usage.

 

Listening

Listening to Nas & J.Cole on Stay

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s