Technical: Microsoft – SQL Server – Data type GUID – Function – Newsequentialid

 

Background:

Here is a post that was started well over a year ago.  I was dealing with a terabyte database, and the most expensive insert was inserting records into a table that has a clustered primary key on a uniqueidentifier column.

What is newsequentialid():

Quite a few web searches suggested that we consider using NewSequentialID.

NEWSEQUENTIALID (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms189786.aspx

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.
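The documented behavior is easiest to see with a quick experiment; keep in mind that NEWSEQUENTIALID() can only be invoked through a default constraint, so we observe its output through inserts.  Here is a minimal sketch (the table and column names are hypothetical):

    CREATE TABLE dbo.sequentialDemo
    (
       [uid] uniqueidentifier NOT NULL
           DEFAULT NEWSEQUENTIALID()
     , [seq] int identity(1,1) NOT NULL
    );

    INSERT INTO dbo.sequentialDemo DEFAULT VALUES;
    INSERT INTO dbo.sequentialDemo DEFAULT VALUES;
    INSERT INTO dbo.sequentialDemo DEFAULT VALUES;

    -- Ordering by [uid] should return the rows in insert order
    -- ([seq] = 1, 2, 3), since each generated GUID sorts after
    -- the previous one until the machine is restarted.
    SELECT [seq], [uid]
    FROM   dbo.sequentialDemo
    ORDER BY [uid];

    DROP TABLE dbo.sequentialDemo;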

 

Implementation:

NewID

There are a couple of ways to use the NewID function.

We can use it as a default constraint and bind it directly to the column.  Thus we are employing it declaratively.

We can also use it in our code-line by explicitly invoking it, capturing the result into a variable, and using that variable in our DML statement.

Using NewID() as default constraint

Here we are using NewID declaratively:

    CREATE TABLE myTable 
    (
       ColumnRandom uniqueidentifier
           DEFAULT NEWID()
    );

 

Using result of NewID() in Insert Statements

Using it programmatically:

    CREATE TABLE myTableB 
    (
       ColumnRandomB uniqueidentifier
    );

    declare @UIRandom uniqueIdentifier;

    set @UIRandom = newid();

    insert into myTableB (ColumnRandomB) values (@UIRandom);

 

NewSequentialID

NEWSEQUENTIALID() can only be used with DEFAULT constraints on table columns of type uniqueidentifier.

For example:

    CREATE TABLE myTableC 
    (
       ColumnSequential uniqueidentifier
           DEFAULT NEWSEQUENTIALID()
    );
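The restriction is enforced by the engine; calling the function outside of a default constraint fails at compile time.  A quick sketch (the exact error text may vary by version):

    -- Both of these raise an error along the lines of:
    -- "The newsequentialid() built-in function can only be used
    --  in a DEFAULT expression for a column of type
    --  'uniqueidentifier' ..."
    DECLARE @ui uniqueidentifier;
    SET @ui = NEWSEQUENTIALID();

    SELECT NEWSEQUENTIALID();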

 

Lab Work – Create Tables

Let us create two very simple tables that have uniqueidentifier columns.  The uniqueidentifier column serves as the clustered primary key in each.

 

UniqueIdentifier has a randomly generated default value


set noexec off
go

use [DBLab]
go

/*
   drop table dbo.datatypeUniqueIdentifierRandomTraditional
*/

if object_id(
             'dbo.datatypeUniqueIdentifierRandomTraditional'
            ) is not null
begin
	set noexec on
end
go

create table 
    [dbo].[datatypeUniqueIdentifierRandomTraditional]
(
   [uid] uniqueIdentifier not null

    constraint 
        constraintDatatypeUniqueIdentifierRandomTraditional
      default NewID()

  , [counter] bigint not null

	constraint constraintDatatypeUniqueIdentifierRandomTraditional__Counter
	default (1)

  , constraint PK_datatypeUniqueIdentifierRandomTraditional 
	primary key
	([uid])

)
go

set noexec off
go

 



UniqueIdentifier has an increasing default value

 

set noexec off
go

use [DBLab]
go

/*
   drop table dbo.datatypeUniqueIdentifierSequentialTraditional
*/

if object_id(
             'dbo.datatypeUniqueIdentifierSequentialTraditional'           ) is not null
begin
	set noexec on
end
go

create table 
    [dbo].[datatypeUniqueIdentifierSequentialTraditional]
(
   [uid] uniqueIdentifier not null

    constraint 
        constraintDatatypeUniqueIdentifierSequentialTraditional
  	  default NewSequentialID()

  , [counter] bigint not null

	constraint constraintDatatypeUniqueIdentifierSequentialTraditional__Counter
	default (1)


, constraint PK_datatypeUniqueIdentifierSequentialTraditional 
	primary key
	([uid])

)
go

set noexec off
go

 

Lab Work – Populate Tables

Here we are using ostress to populate our tables.

The basic code block is:

  • Remove existing data by truncating the table
  • Create the ostress temporary folder, if it does not exist
  • Empty the ostress temporary folder
  • Launch two hundred threads and repeat the payload twenty thousand times on each thread
  • The payload is a simple insert statement into our table

 

Payload – Random UniqueIdentifiers

sqlcmd -S (local) -d DBLab -E -Q " truncate table dbo.datatypeUniqueIdentifierRandomTraditional;" 
if not exist e:\tmp\ostress\ md e:\tmp\ostress\
del e:\tmp\ostress\* /Q
"E:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe" -q -oe:\tmp\ostress -S(local) -dDBLab -E -Q"insert into dbo.datatypeUniqueIdentifierRandomTraditional default values;" -n200 -r20000

 

Output: randomInserts

Completed inserts of 4 million records in 15 minutes and 40 seconds.

 

Payload – Sequential UniqueIdentifiers

sqlcmd -S (local) -d DBLab -E -Q " truncate table dbo.datatypeUniqueIdentifierSequentialTraditional;" 
if not exist e:\tmp\ostress\ md e:\tmp\ostress\
del e:\tmp\ostress\* /Q
"E:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe" -q -oe:\tmp\ostress -S(local) -dDBLab -E -Q"insert into dbo.datatypeUniqueIdentifierSequentialTraditional default values;" -n200 -r20000

 

Output:

UniqueIdentifierSequentialInsertsBenchmark

Completed inserts of 4 million records in 10 minutes, 20 seconds.

 

sequentialInserts

Completed inserts of 4 million records in 11 minutes, 30 seconds.

 

 

Number of Data pages

Here is a sample query to determine the number of data pages occupied by objects:

SELECT	
          Schema_name(tblObject.schema_id) as [schema]
	, OBJECT_NAME(tblIndex.OBJECT_ID) AS TableName
	, tblIndex.name AS IndexName
	, tblIndexStat.index_type_desc AS IndexType 
	, tblIndexColumn.column_id as IndexColumnID
	, tblColumnType.name as columnDataType
	, tblIndexStat.page_count as [pageCount]
	, tblPartition.rows as NumberofRecords

FROM sys.dm_db_index_physical_stats
     (DB_ID(), NULL, NULL, NULL, NULL) tblIndexStat

	INNER JOIN sys.objects tblObject

		ON tblIndexStat.object_id = tblObject.object_id

	INNER JOIN sys.indexes tblIndex
		ON tblIndex.object_id = tblIndexStat.object_id 
		AND tblIndex.index_id = tblIndexStat.index_id 

	inner join sys.index_columns tblIndexColumn
	     on  tblIndex.object_id = tblIndexColumn.object_id
	     and tblIndex.index_id  = tblIndexColumn.index_id

	inner join sys.columns tblColumn
	   on  tblIndexColumn.object_id = tblColumn.object_id
	   and tblIndexColumn.column_id = tblColumn.column_id

	inner join  sys.types tblColumnType

	   on tblColumn.system_type_id = 
                 tblColumnType.system_type_id

		
        INNER JOIN sys.partitions tblPartition  
	  ON  tblPartition.object_id = tblIndex.object_id 
	  AND tblPartition.index_id = tblIndex.index_id   

--column Data type
where tblColumnType.name in ('uniqueidentifier')

--index first column
and   tblIndexColumn.column_id in (1)

ORDER BY tblIndexStat.page_count DESC

 

Output:

pageCount

 

Explanation:

  • Because of page splits, the table that uses NewID occupies more data pages than the table that uses NewSequentialID; about 30% more pages for a table sized at 4 million records

 

Index Fragmentation

Here is a sample query to detect fragmentation:

SELECT	
          Schema_name(tblObject.schema_id) as [schema]
	, OBJECT_NAME(tblIndex.OBJECT_ID) AS TableName
	, tblIndex.name AS IndexName
	, tblIndexStat.index_type_desc AS IndexType 
	, tblIndexStat.avg_fragmentation_in_percent 
	, tblPartition.rows as NumberofRecords

FROM sys.dm_db_index_physical_stats
       (DB_ID(), NULL, NULL, NULL, NULL) tblIndexStat

	INNER JOIN sys.objects tblObject
		ON tblIndexStat.object_id = tblObject.object_id

	INNER JOIN sys.indexes tblIndex
		ON tblIndex.object_id = tblIndexStat.object_id 
		AND tblIndex.index_id = tblIndexStat.index_id 

	INNER JOIN sys.partitions tblPartition  
		ON  tblPartition.object_id = tblIndex.object_id 
		AND tblPartition.index_id = tblIndex.index_id   

WHERE  OBJECT_NAME(tblIndex.OBJECT_ID) like 'datatypeUni%'

ORDER BY tblIndexStat.avg_fragmentation_in_percent DESC

Output:

fragmentation

 

Explanation:

  • The table that relies on NewID to populate its uniqueidentifier column is 99% fragmented, while the corresponding table that uses NewSequentialID is only 7% fragmented
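Note that changing the default constraint on an existing table does not undo fragmentation already in place; the index has to be rebuilt.  A minimal sketch against the lab table above:

    -- Rebuild the clustered primary key to remove the
    -- fragmentation accumulated from the random inserts.
    ALTER INDEX PK_datatypeUniqueIdentifierRandomTraditional
        ON dbo.datatypeUniqueIdentifierRandomTraditional
        REBUILD;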

 

Change UniqueIdentifier column default constraint from newid to Newsequentialid

 

Here is a script that will change the default constraints on existing objects from using NEWID to NewSequentialID.

 

use [master]
go

if object_id(
             'dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier'
            ) is null
begin

	exec('create procedure dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier as select 1/0 as [shell] ')

end
go

alter procedure 
    dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier
(
    @scriptOnly bit = 1
  , @debug bit = 0
)
as
begin

	set nocount on;

	declare @id		int
	declare @idAsString     sysname
	declare @schemaName	sysname
	declare @objectName	sysname
	declare @columnName	sysname

	declare @constraintName		sysname
	declare @constraintNameAssigned sysname
	declare @constraintDefinition   sysname

	declare @CRLF sysname
	declare @TAB  sysname

	declare @CONSTRAINT_DEFINITION_NEWID	  sysname
	declare @CONSTRAINT_DEFINITION_NEWID_QUOTED    sysname
	declare @CONSTRAINT_DEFINITION_NEWSEQUENTIALID sysname


	declare @SQLLine  nvarchar(4000)
	declare @SQL	  nvarchar(4000)
	declare @log      nvarchar(4000)

	declare @NEWID	sysname
	declare @TRANSACTION_NAME   sysname

	declare @iNumberofEntries   int


	declare @tblConstraint TABLE
	(
	      [id] int not null identity(1,1)
	    , [schemaName] sysname not null
	    , [objectName] sysname not null
	    , [columnName] sysname not null
 	    , [constraintName] sysname null
	    , [constraintDefinition] sysname null

	)


	set @constraintName = null
	set @constraintDefinition = null

	set @TAB = char(9)
	set @CRLF = char(13) + char(10)

	set @CONSTRAINT_DEFINITION_NEWID = 'newid()'
	set @CONSTRAINT_DEFINITION_NEWID_QUOTED = '(newid())'
	set @CONSTRAINT_DEFINITION_NEWSEQUENTIALID = 
               'NewSequentialID()'
	set @TRANSACTION_NAME =
               'trnDefaultConstraintModification'

	insert into @tblConstraint
	(
		  [schemaName]
		, [objectName]
		, [columnName]
		, [constraintName] 
		, [constraintDefinition]

	)
	select 
	       schema_name(tblObject.schema_id) as [schemaName]
	     , tblObject.name as objectName 
	     , tblColumn.name as columnName
	     , tblDefaultConstraint.name as constraintName
	     , tblDefaultConstraint.[definition]

	from   sys.objects tblObject

  	        INNER JOIN sys.indexes tblIndex

		  ON tblObject.object_id = tblIndex.object_id 

		inner join sys.index_columns tblIndexColumn

		 on  tblIndex.object_id = 
                        tblIndexColumn.object_id
		 and tblIndex.index_id = 
                        tblIndexColumn.index_id

		inner join sys.columns tblColumn

		  on  tblIndexColumn.object_id 
                        = tblColumn.object_id
		  and tblIndexColumn.column_id 
                        = tblColumn.column_id


		inner join  sys.types tblColumnType

		  on tblColumn.system_type_id
                      = tblColumnType.system_type_id


		left outer join sys.default_constraints 
                     tblDefaultConstraint

		  on  tblColumn.object_id = 
                      tblDefaultConstraint.parent_object_id 
		  and tblColumn.column_id = 
                      tblDefaultConstraint.parent_column_id  

	-- user table
	where  tblObject.[type] = 'U'

	--first index column
	and   tblIndexColumn.column_id in (1)

	--column Data type
	and    tblColumnType.name in ('uniqueidentifier')

	set @iNumberofEntries = ( select count(*) 
                                  from @tblConstraint
                                )


	if (@debug = 1)
	begin

		select * 
		from @tblConstraint

	end
	
	set @id = 1

	while (@id <= @iNumberofEntries)
	begin

		set @idAsString = cast(@id as sysname)

		select 
		     @schemaName = [schemaName]
		   , @objectName = [objectName]
		   , @columnName = [columnName]
		   , @constraintName = [constraintName]
		   , @constraintDefinition = 
                        constraintDefinition
		from  @tblConstraint
		where [id] = @id

		if (@debug = 1)
		begin


			set @log = 'Processing '
					+ @idAsString


			print @log

		end

		if
		  (
		       (@constraintName is null)
		    or (@constraintDefinition is null)
		    or (ltrim(rtrim(@constraintDefinition)) = 
                         @CONSTRAINT_DEFINITION_NEWID)
		    or (ltrim(rtrim(@constraintDefinition)) = 
                         @CONSTRAINT_DEFINITION_NEWID_QUOTED)
			)
		begin

		   set @SQL = ''
		   set @SQLLine = ''

		   set @SQL = @SQL + 'begin tran ' 
                       + @TRANSACTION_NAME + @CRLF

		   if (@constraintName is null)
		   begin
				
		     set @constraintNameAssigned = 
                           'constraintDefault'
			  + @objectName 
			  + '__' 
			  + @columnName

		   end
		   else
		   begin

			set @SQLLine = 
			      @TAB
	                    + 'alter table '
	                    + quoteName(@schemaName)
			    + '.'
			    + quoteName(@objectName)
			    + ' drop constraint '
			    + quoteName(@constraintName)
                            + @CRLF

			set @sql = @sql + @SQLLine 


		end

		set @SQLLine =
			@TAB
		     + 'alter table '
		     + ' '
		     + isNull(quoteName(@schemaName), '')
		     + '.'
		     + isNull(quoteName(@objectName), '')
		     + ' '
		     + ' add constraint '
		     + ' '
		     + '[' + isNull(@constraintName
                            , @constraintNameAssigned)
                     + ']'
		     + ' default '
		     + ' '
		     + @CONSTRAINT_DEFINITION_NEWSEQUENTIALID
		     + ' '
		     + ' for '
		     + ' '
		     + quoteName(@columnName)
		     + @CRLF

		set @SQL = @SQL + @SQLLine

		set @SQL = @SQL + 'commit tran ' 
                              + isNull(@TRANSACTION_NAME, '')
                              + @CRLF

	       set @log = @TAB + @TAB 
			+ 'SQL: '
		        + @idAsString + ' ' + @CRLF 
			+ ' ' 
			+ isNull(@SQL, ' --blank -- ')

			print @log

		if (@scriptOnly = 0)
		begin

		   print @TAB + 'Executing '
		   exec( @SQL)

		end

		set @SQL = ''

	end -- if need to modify


	set @id = @id + 1


   end -- while


end
go


EXEC sys.sp_MS_marksystemobject 
     'dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier'
go


/*

	use [DBLab]
	go

	exec dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier
		  @debug = 1
		, @scriptOnly = 0
	go


*/

Benefits

  • Newsequentialid (History/Benefits and Implementation)
    http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/23/559061.aspx

    In general, we made significant improvements in SQL Server scalability during Yukon. One of the areas of improvement is replication scalability.  While doing merge replication testing we found out that scaling was severely affected by high number of I/O operations. The cause of the problem was that new rows were inserted in random disk pages.  Guid generating function (newid) was returning non-sequential guids which resulted in random B-tree lookups.  After some investigation we figured out that we could use the new OS function UuidCreateSequential with some byte scrambling to convince the rest of SQL engine that guids are produced in sequential order.
  • Hopefully, sequential reads will be more optimal due to the overall smaller number of pages needed by NewSequentialID compared to NewID

 

Downsides

  • To ensure data integrity, keep in mind that the system still uses a system mutex to serialize GUID generation, so one will experience a slight bottleneck while “NewSequentialID” values are generated and dispensed
  • Because NewSequentialID values are ever increasing, inserts all land on the data pages at the logical end of our tables \ file groups, creating hot pages in that location.  Consider careful partitioning to spread the data across a few physical LUNs.  Keep in mind that this relates to the actual data writes; log writes are always sequential and are localized to the same physical file regardless
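As a rough sketch of the partitioning idea (the filegroup names FG1 through FG4 are assumptions for illustration, and the bucket column is one hypothetical way to fan sequential inserts out across partitions):

    -- Partition on a small persisted bucket derived from the GUID,
    -- so that ever-increasing inserts land on four different
    -- filegroups instead of a single hot page chain.
    CREATE PARTITION FUNCTION pfBucket (tinyint)
        AS RANGE LEFT FOR VALUES (0, 1, 2);

    CREATE PARTITION SCHEME psBucket
        AS PARTITION pfBucket TO (FG1, FG2, FG3, FG4);

    CREATE TABLE dbo.datatypeUniqueIdentifierSequentialPartitioned
    (
        [uid]    uniqueidentifier NOT NULL
                     DEFAULT NEWSEQUENTIALID()
        -- bitwise AND keeps the bucket in the range 0..3
      , [bucket] AS CAST(CHECKSUM([uid]) & 3 AS tinyint) PERSISTED
      , CONSTRAINT PK_datatypeUniqueIdentifierSequentialPartitioned
            PRIMARY KEY ([bucket], [uid])
    ) ON psBucket ([bucket]);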

 

Best Explanation:

One good thing about writing is that one is forced to read quite a bit.  And, that is true for many reasons.  Reasons include:

  • Is the writing new and refreshing?
  • Is your hypothesis true?
  • Is it relevant?
  • Have others covered the same topic, and which ones are worth sharing?
  • One might discover new keywords.  And, these days with Google, searching on those keywords might very well uncover new, novel and fascinating areas

 

Along those lines, I hope you will take the time to read Dan Guzman’s post:

Why Random Keys are Bad
http://www.dbdelta.com/improving-uniqueidentifier-performance/

I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true random inserts do cause fragmentation and splits, the primary cause of bad performance with random keys is poor temporal reference locality ( http://en.wikipedia.org/wiki/Locality_of_reference ), which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses a LRU-2 algorithm to manage the buffer cache; data most recently touched will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.

 

Summary

In summary, insert timelines are a bit shorter with NewSequentialID compared to NewID().  The system will also have less need for de-fragmentation.

And, as Dan Guzman pointed out, in time one might very well experience less I/O.  This might be less obvious, as one will have to watch system I/O metrics for an extended period of time and compare the competing designs.

