Transact SQL – Prune Data based on Date

 

Background

We have a table that is getting quite long. As it is only a log table, it is OK to remove dated records.

 

Problem

We went in without a good, solid script to remove records and we rightfully paid for our laziness and stupidity.

We started experiencing timeouts and so we quickly reached for a few good diagnostic tools.

Adam Machanic – WhoisActive

Available here.

sp_whoIsActive-20160322-0533PM

 

Explanation

  1. Session ID = 142 is being blocked by Session ID=151

 

Basit Farooq – Identify the cause of SQL Server blocking

Basit Farooq has a very nice script for tracking blocking sessions.

Available Here

Blocked Sessions

BlockTransactions

 

Explanation:

  1. Inserts into the same table via SaveSTSLogRequest are blocked
  2. Updates via SaveSTSLogResponse are blocked, as well

 

Waiting Command Query Plan

Query Plan

Here is the Query Plan

IdentifyCauseOfBlocking

 

Explanation

  1. We are attempting an insert into a table
  2. Currently the table has three indexes, all are updated

Diagnosis

We have a couple of indexes on the table

  1. The index, recordCreated, that we using to filter the deletes is not the Clustered Index
  2. If we are able to change our Clustered Index to recordCreated, we will be to get away with using one index and not get into trouble with two or more indexes deadlocking each other
    • The two indexes being the recordCreated one, and the Identity Value one

 

Code


if schema_id('dbmgmt') is null
begin

	exec('create schema [dbmgmt] authorization [dbo] ')

end
go

--drop procedure [dbmgmt].[usp_Prune_STSDHSMVSrvLog] 
if object_id('[dbmgmt].[usp_Prune_STSDHSMVSrvLog]') is null
begin

	exec('create procedure [dbmgmt].[usp_Prune_STSDHSMVSrvLog] as select 1/0 as [shell] ')

end
go

alter procedure [dbmgmt].[usp_Prune_STSDHSMVSrvLog] 
(
	  @NumberofMonthsToKeep		   int = 12
	, @NumberofRecordsInEachBatch  int = 1000
	, @MaxNumberofBatches		   int = 1E6
	, @AddPrimaryKeyToDeleteFilter bit = 0
	, @useHardCodedIndexes		   bit = 0
	, @waitDelay				   varchar(30) = '00:00:01'
	, @NumberofRecordsDeleted	   bigint = 0 output
	, @debug					   bit = 0
)
as
begin

	/*

		Arguments

			waitDelay 
				a) Format is hh:mi:ss.mmm.


	*/

	set nocount on;
	set XACT_ABORT on;
	set DEADLOCK_PRIORITY LOW;

	declare @iNumberofRecordsAffected bigint

	declare @dateCurrent datetime
	declare @dateCutoff  datetime
	declare @log		 varchar(120)

	declare @CHAR_TAB    varchar(30)
	declare @CHAR_TAB_2    varchar(30)
	declare @CHAR_TAB_3    varchar(30)
	declare @CHAR_TAB_4    varchar(30)

	declare @CHAR_CRLF    varchar(30)

	declare @CONVERT_STYLE int

	declare @idPKCutoff	   int

	declare @idBatchID         int 
	declare @idBatchIDAsString varchar(30)
	declare @BATCH_ID_STRING_LENGTH int

	declare @time		  datetime
	declare @timeAsString varchar(30)

	declare @iNumberofRecords bigint

	declare @bWait			  bit

	set @CHAR_TAB = char(9)
	set @CHAR_TAB_2 = @CHAR_TAB + @CHAR_TAB
	set @CHAR_TAB_3 = @CHAR_TAB + @CHAR_TAB + @CHAR_TAB
	set @CHAR_TAB_4 = @CHAR_TAB + @CHAR_TAB + @CHAR_TAB + @CHAR_TAB

	set @CHAR_CRLF = char(13) + char(10)

	--set @CONVERT_STYLE = 109
	set @CONVERT_STYLE = 100

	set @bWait = 0

	if (@AddPrimaryKeyToDeleteFilter is null)
	begin

		set @AddPrimaryKeyToDeleteFilter = 0

	end


	if (@useHardCodedIndexes is null)
	begin

		set @useHardCodedIndexes = 1
	
	end

	/*
		If wait delay is valid, then set flag
	*/
	if (
			    ( @waitDelay is not null )
			and ( @waitDelay != '' )
	   )
	begin
		set @bWait = 1
	end
	else
	begin
		set @bWait = 0
	end

	--get current date
	set @dateCurrent = getdate()

	--get dateCutoff as differential in months
	set @dateCutoff = dateadd
						(
							  month
							, @NumberofMonthsToKeep * -1
							, @dateCurrent
						)

	set @idBatchID = 0
	set @BATCH_ID_STRING_LENGTH = 6

	if (@debug = 1)
	begin

		set @log = @CHAR_TAB_2 
					+ 'Current Date is '
					+ convert(varchar(30), @dateCurrent, @CONVERT_STYLE)
					+ @CHAR_CRLF
		print @log

	end


	if (@debug = 1)
	begin

		set @log = @CHAR_TAB_2 
					+ 'Cutoff Date is ' 
					+ convert(varchar(30),  @dateCutoff, @CONVERT_STYLE)
					+ @CHAR_CRLF
		print @log

	end

	if (@debug = 1)
	begin

		select @iNumberofRecords = count(*)
		from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL 
		where  tblL.record_created < @dateCutoff

		if (@debug = 1)
		begin

			set @log = @CHAR_TAB_2 
						+ 'Number of Records (Before) is ' 
						+ convert(varchar(30),  @iNumberofRecords)
						+ @CHAR_CRLF
			print @log

		end

	end

	/*
		If we will be using the Primary/Clustering index to find affected records,
			let us determine that ID
	*/
	if (@AddPrimaryKeyToDeleteFilter = 1)
	begin

		if (@useHardCodedIndexes = 1)
		begin

			select @idPKCutoff
					 = min(
							tblL.[ID]
						  )
			from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL  with ( INDEX=INDX_DBA_RecordCreated)
			where  tblL.record_created > @dateCutoff

		end
		else if (@useHardCodedIndexes = 0)
		begin

			select @idPKCutoff
					 = min(
							tblL.[ID]
						  )
			from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL 
			where  tblL.record_created > @dateCutoff

		end

		if (@debug = 1)
		begin

			set @log = @CHAR_TAB_2 
							+ 'Cutoff Date / Max corresponding ID is ' 
							+ convert(varchar(30),  @idPKCutoff) 
							+ @CHAR_CRLF
			print @log

		end

	end

	/*
		Reset Number of Records Deleted
	*/
	set @NumberofRecordsDeleted = 0

	/*
		Reset Number of Records Affected
	*/
	set @iNumberofRecordsAffected = -1

	/*
		Iterate through Batch
			- Break out if numnber of records is 0
	*/
	while (
			    ( @idBatchID < @MaxNumberofBatches )
			and ( @iNumberofRecordsAffected != 0 )
		  )	
	begin

		/*
			Increment Batch ID
		*/
		set @idBatchID = isNull(@idBatchID, 0) + 1

		/*
			Set Batch ID
		*/
		set @idBatchIDAsString = replicate
									(
										  ' '
										, @BATCH_ID_STRING_LENGTH - len(@idBatchID)
									)
									+ cast(@idBatchID as varchar(30))

		/*
			Display Processed Batch - Pre Processing
		*/
		if (@debug = 1)
		begin

			set @time = getdate()
			set @timeAsString = convert(varchar(30), @time, @CONVERT_STYLE)

			set @log = @CHAR_TAB_2 
							+ @idBatchIDAsString
							+ ' - Processing Batch ' 
							+ ' @ '
							+ @timeAsString
							+ @CHAR_CRLF
			print @log

		end


		if (isNull(@AddPrimaryKeyToDeleteFilter, 0) = 0)
		begin

			/*
				Remove records prior to Cutoff Date
			*/
			delete top (@NumberofRecordsInEachBatch) tblL
			from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL
			where  tblL.record_created < @dateCutoff

			/*
				get @@ROWCOUNT
			*/
			set @iNumberofRecordsAffected = @@ROWCOUNT

		end
		else if (@AddPrimaryKeyToDeleteFilter = 1)
		begin

			if (@useHardCodedIndexes = 1)
			begin

				/*
					Remove records prior to Earliest Sequential Value to Preserve
				*/
				delete top (@NumberofRecordsInEachBatch) tblL
				from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL  with (INDEX=PK_FLDHSMVSrvLog)
				where  tblL.ID < @idPKCutoff 

				/*
					get @@ROWCOUNT
				*/
				set @iNumberofRecordsAffected = @@ROWCOUNT

			end

			else if (@useHardCodedIndexes = 0)
			begin

				/*
					Remove records prior to Earliest Sequential Value to Preserve
				*/
				delete top (@NumberofRecordsInEachBatch) tblL
				from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL
				where  tblL.ID < @idPKCutoff 

				/*
					get @@ROWCOUNT
				*/
				set @iNumberofRecordsAffected = @@ROWCOUNT

			end



		end

		/*
			Display Processed Batch - Post Processing
		*/
		if (@debug = 1)
		begin

			set @time = getdate()
			set @timeAsString = convert(varchar(30), @time, @CONVERT_STYLE)

			set @log = @CHAR_TAB_2 
							+ @idBatchIDAsString
							+ ' - Processed Batch ' 
							+ ' @ '
							+ @timeAsString
							+ ' - '
							+ cast(@iNumberofRecordsAffected as varchar(6))
							+ ' record(s) deleted'
							+ @CHAR_CRLF

			print @log


		end

		set @NumberofRecordsDeleted
			 = isNull(@NumberofRecordsDeleted, 0)
				+ isNull(@iNumberofRecordsAffected, 0)

		print ''


		/*
			If we found no record to delete, then break out
		*/
		if (@iNumberofRecordsAffected =0)
		begin

			if (@debug = 1)
			begin

				set @time = getdate()
				set @timeAsString = convert(varchar(30), @time, @CONVERT_STYLE)

				set @log = @CHAR_TAB_2 
								+ @idBatchIDAsString
								+ ' - Breaking out of loop ' 
								+ ' @ '
								+ @timeAsString
								+ @CHAR_CRLF

				print @log


			end

			break;

		end

		/*
			If wait enabled, then wait specified time
		*/
		if (@bWait = 1)
		begin

			waitfor delay @waitDelay

		end

	end -- while 


	/*
		Display Number of Records pruned
	*/
	if (@debug = 1)
	begin
		
		set @log = @CHAR_TAB_2 
					+ 'Number of Records pruned is ' 
					+ convert(varchar(30),  @NumberofRecordsDeleted)
					+ @CHAR_CRLF
		print @log


	end

	/*
		Display Number of Records after
	*/
	if (@debug = 1)
	begin

		select @iNumberofRecords = count(*)
		from   [FLDHSMVTest].[STSDHSMVSrvLog] tblL
		where  tblL.record_created < @dateCutoff

		if (@debug = 1)
		begin

			set @log = @CHAR_TAB_2 
						+ 'Number of Records (After) is ' 
						+ convert(varchar(30),  @iNumberofRecords)
						+ @CHAR_CRLF
			print @log

		end

	end

end
go



Explanation

Parameters

  1. @NumberofRecordsInEachBatch
    • 1 record at a time
      • Some bloggers suggest to remove record one at a time
        • Good
          • The rationale is that you will have less deadlocks
        • Not Good
          • Things will be slower
          • Writing to your log file will be in very tiny increments and wasteful
    • We were good at about 1000 records per batch
  2.  @AddPrimaryKeyToDeleteFilter
    • When set to 1
      • We query for the lowest Sequential value greater than our cutoff date
  3. @waitDelay
    • We introduced a silent wait between each batch
    • The wait duration is default at 1 sec ( 00:00:01 )

 

Set Options

There are a couple of Set Options that we established at the top of our Script

  1. Set “No Count on” ( Explanation )
    • Prevent Number of rows affected count from being returned to caller after “each” statement
  2. Set “XACT_ABORT on”  ( Explanation )
    • If error occurs, the entire script is stopped and rolled back
  3. Set “Deadlock Priority Low” ( Explanation )
    • If deadlock is detected, our session should be chosen as the victim

Indexes

Hard-Coded Indexes

We resorted to hard-coded indexes in a few places

Statistics I/O

HardCoded

Explanation
  1. When we allow the SQL Engine to choose indexes, here is our Statistics I/O
    • Get minimal ID to keep = 276406
    • Delete records based on ID = 16147
  2. On the other hand, when we hard-coded indexes
    • Get Minimal ID to keep = 4
    • Delete records based on ID = 10695

Compare

Query Plan

 

QueryPlan

 

Explanation

  1. When we went against the date filter
    • 45% of Total
  2. When we went against the Primary Key
    • 5% of Total to get ID
    • 50% of Total to remove data based on Clustered Index

 

Statistics IO

StatisticsIO

Explanation

  1. When we went against the date filter
    • Logical Reads was at 16145
  2. When we went against the Primary Key
    1. Logical Read was at 10142
  3. On a grade of 100, Primary key is at 40%, Date is 60%

 

Credits

Remus Rasanu

Credits to Remus Rasanu.

Inserting data to a table which is being cleaned up by SQL job at the same time ( here )

CreditRemusRasanu

 

References

  1. Minimizing Deadlocks
  2. How to resolve blocking problems that are caused by lock escalation in SQL Server

 

2 thoughts on “Transact SQL – Prune Data based on Date

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s