Microsoft – SQL Server – Table Inserts – Table Spool [Eager Spool] – Is misaligned partitioning really the issue?

This is a follow-up to a couple of other posts:

Table Design

Create Database – demo


  if db_id('Demo') is not null
  begin

     create database [Demo];

  end
  go

Define Partition Scheme and Function

use [Demo]
go

if not exists   
                (

                    select *
                    from   sys.partition_functions
                    where  name = 'pfnYear'

                )
begin

    print 'Creating partition function pfnYear(smallint) ...'

    create partition function pfnYear(smallint)
    as range right for values
    (
          2010   
        , 2011
        , 2012
        , 2013
        , 2014
        , 2015

    )

        print 'Created partition function pfnYear(smallint) '

end

go

if not exists   
                (

                    select *
                    from   sys.partition_schemes
                    where  name = 'partitionSchemeYear'

                )
begin

    print 'Creating partition scheme partitionSchemeYear  ...'

        create partition scheme partitionSchemeYear 
        as partition pfnYear
        ALL TO ([default])

    print 'Created partition scheme partitionSchemeYear'

end

Define Table with misaligned index partition

To satisfy half-way correction, we will do the following:

  • Create a new table dbo.customer
  • The Primary Key will be a Non-clustered Index and it will be named PK_customer and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will not be partitioned
  • An Index on regionName (idx_Region) will not be partitioned


/*

    exec sp_helpindex 'dbo.customer'
*/

if OBJECT_ID('dbo.customer') is not null
begin
    drop table [dbo].[customer] 
end    
go	

CREATE TABLE [dbo].[customer]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] CONSTRAINT defaultCustomerInsertTime DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null constraint defaultCustomerSalesDate DEFAULT (getutcdate()),
	[SalesYear] as cast(isNull(datepart(year,[salesDate]), 1990) as smallint) persisted not null,

		CONSTRAINT [PK_customer] PRIMARY KEY NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_pfnYear_SalesYear__customerID] 
ON [dbo].[customer] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customer] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customer] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]

GO

Insert into Misaligned partition



set statistics time off
set statistics io off

truncate table [dbo].[customer];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

Results of misalignment:

Misaligned Index - Full

Misaligned Index - Full - Statistics IO

Explanation:

  • Expensive Table Spool (Eager Spool) – Taking up 31%
  • Expensive Work Table using up 7 logical reads

Define Table with corrected index partition

To satisfy full correction, we will do the following:

  • Create a new table dbo.customerRevised
  • The Primary Key will be a Non-clustered Index and it will be named PK_customerRevised and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will be partitioned
  • An Index on regionName (idx_Region) will be partitioned


/*
    exec sp_helpindex 'dbo.customerRevised'
*/

if OBJECT_ID('dbo.customerRevised') is not null
begin
    drop table [dbo].[customerRevised] 
end    
go	

CREATE TABLE [dbo].[customerRevised]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] 
             CONSTRAINT defaultcustomerRevisedInsertTime 
                DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null 
             constraint defaultcustomerRevisedSalesDate 
                DEFAULT (getutcdate()),
	[SalesYear] as 
              cast(isNull(datepart(year,[salesDate]), 1990) as smallint) 
                  persisted not null,

		CONSTRAINT [PK_customerRevised] PRIMARY KEY NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_DBA_pf_customerIDChecksumModulo20__customerID] 
ON [dbo].[customerRevised] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customerRevised] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])

GO

CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customerRevised] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])
GO


Misalignment Correction (Insert Statement)


set statistics time off
set statistics io off

truncate table [dbo].[customer];
truncate table [dbo].[customerRevised];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

insert into [dbo].[customerRevised]
(
    [customerID], [salesDate]
)
values
(
    'B1', getutcdate()
)

Misalignment Correction:

Misaligned Index - Correction - Full

Misalignmed Index - Correction - Statistics IO

Explanation:

  • With proper partition correction we now have a much cleaner Plan (At a base of 100% -the original plan costed 57% and the revised plan costed 43%)
  • We have shed the 7 logical reads attributed to the “worktable”

Define Table with half-way corrected index partition

To satisfy half-way correction, we will do the following:

  • Create a new table dbo.customerRevisedHalfway
  • The Primary Key will be a Non-clustered Index and it will be named PK_customerRevisedHalfway and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will be partitioned
  • An Index on regionName (idx_Region) will not be partitioned


/*

    exec sp_helpindex 'dbo.customerRevisedHalfway'
*/

if OBJECT_ID('dbo.customerRevisedHalfway') is not null
begin
    drop table [dbo].[customerRevisedHalfway] 
end    
go	

CREATE TABLE [dbo].[customerRevisedHalfway]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] 
            CONSTRAINT defaultcustomerRevisedHalfwayInsertTime 
                DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null 
            constraint defaultcustomerRevisedHalfwaySalesDate
                DEFAULT (getutcdate()),
	[SalesYear] as 
            cast(isNull(datepart(year,[salesDate]), 1990) as smallint) 
               persisted not null,

		CONSTRAINT [PK_customerRevisedHalfway] PRIMARY KEY
                 NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_DBA_pf_customerIDChecksumModulo20__customerID] 
ON [dbo].[customerRevisedHalfway] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customerRevisedHalfway] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])

GO

--index is pointing @ default file group
CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customerRevisedHalfway] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]
          --ON [partitionSchemeYear]([SalesYear])

GO

Insert into Half way aligned partition


set statistics time off
set statistics io off

truncate table [dbo].[customer];
truncate table [dbo].[customerRevised];
truncate table [dbo].[customerRevisedHalfway];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

insert into [dbo].[customerRevised]
(
    [customerID], [salesDate]
)
values
(
    'B1', getutcdate()
)

insert into [dbo].[customerRevisedHalfway]
(
  [customerID], [salesDate]
)
values
(
  'C1', getutcdate()
)

Misalignment Correction (Halfway):

Misaligned Index - Correction - Halfway

Statistics I/O

Misaligned Index - Correction - Halfway (Statistics IO)

Explanation:

  • The Query cost of misalignment is at 40%, fully corrected alignment is at 30%, and halfway correction is at 30%
  • When partitioning is fully corrected and when only halfway corrected we still shed the same amount of logical reads (7); directly attributable to the “worktable”
  • For the half way corrected partitioned index, you will see two distinct index inserts (the first one for the Clustered index and the secondary one for the non-partitioned aligned index); note that the half way corrected case is pointing at two different file groups (different partitions)
  • Whereas for the fully corrected index, we have a single index operator (all indexes are tied into the same partition)

CodeBase

To share the code, I have posted it to my GitHub Account.

Conclusion

Eager Spool

ScaryDBA in his web post,  Spools in Execution Plan ( http://www.scarydba.com/2009/09/09/spools-in-execution-plans/), did a good job describing this Table Spool thing.

Spool operations are temporary storage of the data for later reuse in a query plan. There are two types of spool operations, eager spool and lazy spool. A spool is basically a temporary table created within the execution of the query that is used when it’s likely that data will be needed again, and again during the execution of the query. This is not an explicit #temp temporary table, but a work table for operations within the processing necessary for a given query’s behavior. A spool is created when the optimizer thinks that it can work better with a semi-permanent sub-set of data rather than have to perform multiple seeks or scans against a table or index or in other places where data re-use is important (more in a bit).

Seemingly, in our original table definition, with more than one misaligned index, we ended up with a situation where post the “Clustered Index” operation, the Query Engine opted to go the route of creating and populating “Table Spools”.

It is not immediately clear at this point why the Query Engine chose this fairly in-efficient path.

Addendum

Addendum (2013-04-27)

The original incident was captured on MS SQL Server – v2008R2.

I have revised per v2012.

Execution Plan

TableSpool__MSSQLServer_v2012

IO Cost

TableSpool__MSSQLServer_v2012_IOStats

So Microsoft’s documentation did a fairly good job on describing what a Table Spool is:

http://msdn.microsoft.com/en-us/library/ms181032(v=sql.105).aspx
The Table Spool operator scans the input and places a copy of each row in a hidden spool table that is stored in the tempdb database and existing only for the lifetime of the query. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input.

But, I will allow you the Performance Engineer to look at your queries and understand which ones can potentially follow this path and guard against it when other paths can be a bit less pernicious.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s