SQL Server – XML – Indexes – Utilization Pattern & Cost

Background

In this quick post, we will dig a bit more on into XML Indexes; compare them and attempt to get size information.

BTW, this is a follow-up to Transact SQL – XML Column Hidden In SUBSTRING.

Code

We have a code that looks like


select top 10
 
      [id]
    , [username]
    , UserLog1.[message]
    , userLog1.details
 
from    Admin.CourseManagerUserLog UserLog1 with (nolock)
 
where
            (
                   ( UserLog1.[message] like '%set as cancelled%' )
                OR ( UserLog1.[message] like '%has been added%' )
            ) 

And, we re-wrote as:

select
    UserLog1.*
 
from    Admin.CourseManagerUserLog UserLog1 with (nolock)
 
cross apply UserLog1.details.nodes('/in-car-lsn-status') UserLogInCarLSN(node)
 
join [dbo].BTWTAppointments App 
 
    on App.appointmentID 
        = cast(UserLogInCarLSN.node.value('@apptID', 'varchar(60)') AS UniqueIdentifier)
 
where
    (
           ( UserLog1.[message] like '%set as cancelled%' )
        OR ( UserLog1.[message] like '%has been added%' )
    ) 

Indexes

No XML Index

Query Plan

QueryPlan

Explanation

  1. XML Reader with XPath filter
    1. 99% of the query cost

 

Statistics Profile

Screen Shot

StatisticsProfile

Tabulated

 

Step
–Filter(WHERE:(STARTUP EXPR([HRDB].[Admin].[CourseManagerUserLog].[details] as [UserLog1].[details] IS NOT NULL)))
|–Stream Aggregate(DEFINE:([Expr1014]=MIN(CASE WHEN [Admin].[CourseManagerUserLog].[details] as [UserLog1].[details] IS NULL THEN NULL ELSE CASE WHEN datalength(XML Reader with XPath filter.[value])>=(128) THEN CONVERT_IMPLICIT(varchar(60),XML Reader with XPath filter.[lvalue],0) ELSE CONVERT_IMPLICIT(varchar(60),XML Reader with XPath filter.[value],0) END END)))
|–Filter(WHERE:(XML Reader with XPath filter.[id]=getancestor(XML Reader with XPath filter.[id],(1))))

 

Explanation

  1. XML Reader with XPath filter

 

Statistics I/O

StatisticsIO

Explanation

  1. Workfile reference

 

XML Index – Primary

Let us create the primary XML Index.

Create Index


SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF

GO

IF NOT EXISTS 
(
	SELECT * 
	FROM   sys.indexes 
	WHERE  object_id = OBJECT_ID(N'[Admin].[CourseManagerUserLog]') 
	AND    name = N'INDX_DBA_XML_MAIN'
)
begin

	CREATE PRIMARY XML INDEX [INDX_DBA_XML_MAIN] 
	ON [Admin].[CourseManagerUserLog]
	(
		[details]
	)
	WITH 
	(
		  PAD_INDEX = OFF
		, STATISTICS_NORECOMPUTE = OFF
		, SORT_IN_TEMPDB = OFF
		, DROP_EXISTING = OFF
		, ONLINE = OFF
		, ALLOW_ROW_LOCKS = ON
		, ALLOW_PAGE_LOCKS = ON
	)
	;

end
GO


 

Query Plan

QueryPlan-0129PM

Statistics Profile

Screen Shot

StatisticsProfile-0131PM

Tabulated

 

Statement Text Node Parent
                           |    |    |    |–Clustered Index Seek(OBJECT:([HRDB].[sys].[xml_index_nodes_885578193_256000].[INDX_DBA_XML_MAIN] AS [in-car-lsn-status:1]), SEEK:([in-car-lsn-status:1].[pk1]=[HRDB].[Admin].[CourseManagerUserLog].[id] as [UserLog1].[id]),  WHERE:([HRDB].[sys].[xml_index_nodes_885578193_256000].[hid] as [in-car-lsn-status:1].[hid]=’Ë‚’) ORDERED FORWARD) 46 12
                            |    |    |–Stream Aggregate(DEFINE:([Expr1013]=MIN(CASE WHEN [HRDB].[Admin].[CourseManagerUserLog].[details] as [UserLog1].[details] IS NULL THEN NULL ELSE CASE WHEN datalength([HRDB].[sys].[xml_index_nodes_885578193_256000].[value] as [apptID:1].[value])>=(128) THEN CONVERT_IMPLICIT(varchar(60),[HRDB].[sys].[xml_index_nodes_885578193_256000].[lvalue] as [apptID:1].[lvalue],0) ELSE CONVERT_IMPLICIT(varchar(60),[HRDB].[sys].[xml_index_nodes_885578193_256000].[value] as [apptID:1].[value],0) END END))) 47 9
                            |    |         |–Clustered Index Seek(OBJECT:([HRDB].[sys].[xml_index_nodes_885578193_256000].[INDX_DBA_XML_MAIN] AS [apptID:1]), SEEK:([apptID:1].[pk1]=[HRDB].[Admin].[CourseManagerUserLog].[id] as [UserLog1].[id] AND [apptID:1].[id] >= [HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id] AND [apptID:1].[id] < getdescendantlimit([HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id])),  WHERE:([HRDB].[sys].[xml_index_nodes_885578193_256000].[hid] as [apptID:1].[hid]=’ì‚Ë‚’ AND [HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id]=getancestor([HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [apptID:1].[id],(1))) ORDERED FORWARD) 49 47
                            |    |–Clustered Index Seek(OBJECT:([HRDB].[dbo].[BTWTAppointments].[PK_BTWTAppointments] AS [App]), SEEK:([App].[appointmentID]=CONVERT(uniqueidentifier,[Expr1013],0)) ORDERED FORWARD)  54  7

 

Statistics I/O

StatisticsIO-0130PM

 

XML Index – Secondary – Path

Let us create a secondary path index.

Create Index


SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF

GO

IF NOT EXISTS 
(
	SELECT * 
	FROM   sys.indexes 
	WHERE  object_id = OBJECT_ID(N'[Admin].[CourseManagerUserLog]') 
	AND    name = N'INDX_DBA_XML_Secondary_Path'
)
begin


--create Secondary xml index - Path
CREATE XML INDEX [INDX_DBA_XML_Secondary_Path] 
ON  Admin.CourseManagerUserLog
	(
		[details]
	)
USING XML INDEX [INDX_DBA_XML_MAIN]
FOR Path    
GO

end
GO


Query Plan

There are two XML Index Nodes operations

Satisfy XPATH

QueryPlan-0148-XPath

 

Satisfy XML Predicate Match

QueryPlan-0148-SeekOnPrimaryXML

Statistics Profile

Screen Shot

StatisticsProfile.0155PM

Tabulated

 

Statement Text Node Parent
                                     |    |    |    |–Clustered Index Seek(OBJECT:([HRDB].[sys].[xml_index_nodes_885578193_256000].[INDX_DBA_XML_MAIN] AS [in-car-lsn-status:1]), SEEK:([in-car-lsn-status:1].[pk1]=[HRDB].[Admin].[CourseManagerUserLog].[id] as [UserLog1].[id]),  WHERE:([HRDB].[sys].[xml_index_nodes_885578193_256000].[hid] as [in-car-lsn-status:1].[hid]=’Ë‚’) ORDERED FORWARD) 46 12
                                 |    |    |–Stream Aggregate(DEFINE:([Expr1013]=MIN(CASE WHEN [HRDB].[Admin].[CourseManagerUserLog].[details] as [UserLog1].[details] IS NULL THEN NULL ELSE CASE WHEN datalength([HRDB].[sys].[xml_index_nodes_885578193_256000].[value] as [apptID:1].[value])>=(128) THEN CONVERT_IMPLICIT(varchar(60),[HRDB].[sys].[xml_index_nodes_885578193_256000].[lvalue] as [apptID:1].[lvalue],0) ELSE CONVERT_IMPLICIT(varchar(60),[HRDB].[sys].[xml_index_nodes_885578193_256000].[value] as [apptID:1].[value],0) END END))) 47 9
                                  |    |         |–Clustered Index Seek(OBJECT:([HRDB].[sys].[xml_index_nodes_885578193_256000].[INDX_DBA_XML_MAIN] AS [apptID:1]), SEEK:([apptID:1].[pk1]=[HRDB].[Admin].[CourseManagerUserLog].[id] as [UserLog1].[id] AND [apptID:1].[id] >= [HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id] AND [apptID:1].[id] < getdescendantlimit([HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id])),  WHERE:([HRDB].[sys].[xml_index_nodes_885578193_256000].[hid] as [apptID:1].[hid]=’ì‚Ë‚’ AND [HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id]=getancestor([HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [apptID:1].[id],(1))) ORDERED FORWARD) 49 47
                                   |    |–Clustered Index Seek(OBJECT:([HRDB].[dbo].[BTWTAppointments].[PK_BTWTAppointments] AS [App]), SEEK:([App].[appointmentID]=CONVERT(uniqueidentifier,[Expr1013],0)) ORDERED FORWARD)  54  7

 

Statistics I/O

StatisticsProfile.0200PM

XML Index – Secondary – Property

Let us create a secondary property index.

Create Index


CREATE XML INDEX [INDX_DBA_XML_Property] 
ON  Admin.CourseManagerUserLog
	(
		[details]
	)
USING XML INDEX [INDX_DBA_XML_MAIN]
FOR PROPERTY    
GO

 

Query Plan

There are two XML Index Nodes operations

Index Seek – Property Index – Satisfy XPATH

QueryPlan_XMLProperty-0214PM

Clustered Index Seek – Primary Index – Predicate Match

QueryPlan_XMLPrimary-0217PM

 

Statistics Profile

Screen Shot

StatisticsProfileOn-201603202-0222PM

 

Tabulated

 

Statement Text Node Parent
                              |    |    |    |–Index Seek(OBJECT:([HRDB].[sys].[xml_index_nodes_885578193_256000].[INDX_DBA_XML_Property] AS [in-car-lsn-status:1]), SEEK:([in-car-lsn-status:1].[pk1]=[HRDB].[Admin].[CourseManagerUserLog].[id] as [UserLog1].[id] AND [in-car-lsn-status:1].[hid]=’Ë‚’) ORDERED FORWARD) 46 12
                                                         |    |    |–Stream Aggregate(DEFINE:([Expr1013]=MIN(CASE WHEN [HRDB].[Admin].[CourseManagerUserLog].[details] as [UserLog1].[details] IS NULL THEN NULL ELSE CASE WHEN datalength([HRDB].[sys].[xml_index_nodes_885578193_256000].[value] as [apptID:1].[value])>=(128) THEN CONVERT_IMPLICIT(varchar(60),[HRDB].[sys].[xml_index_nodes_885578193_256000].[lvalue] as [apptID:1].[lvalue],0) ELSE CONVERT_IMPLICIT(varchar(60),[HRDB].[sys].[xml_index_nodes_885578193_256000].[value] as [apptID:1].[value],0) END END))) 47 9
                                                           |    |         |–Clustered Index Seek(OBJECT:([HRDB].[sys].[xml_index_nodes_885578193_256000].[INDX_DBA_XML_MAIN] AS [apptID:1]), SEEK:([apptID:1].[pk1]=[HRDB].[Admin].[CourseManagerUserLog].[id] as [UserLog1].[id] AND [apptID:1].[id] >= [HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id] AND [apptID:1].[id] < getdescendantlimit([HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id])),  WHERE:([HRDB].[sys].[xml_index_nodes_885578193_256000].[hid] as [apptID:1].[hid]=’ì‚Ë‚’ AND [HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [in-car-lsn-status:1].[id]=getancestor([HRDB].[sys].[xml_index_nodes_885578193_256000].[id] as [apptID:1].[id],(1))) ORDERED FORWARD) 49 47
                                                         |    |–Clustered Index Seek(OBJECT:([HRDB].[dbo].[BTWTAppointments].[PK_BTWTAppointments] AS [App]), SEEK:([App].[appointmentID]=CONVERT(uniqueidentifier,[Expr1013],0)) ORDERED FORWARD)  54  7

 

Statistics I/O

 

 StatisticsProfileOn-201603202-0226PM

Metadata

Size

XML Index size is quite opaque as they lumped in with the primary key/clustered index.

Code – Using sys.partition

Here is a sample query



use [HRDB]
go

declare @tableName sysname
declare @objectID int

set @tableName = '[Admin].[CourseManagerUserLog]'
set @objectID = object_id(@tableName)


select 

		  [tableName] = schema_name(tblT.schema_id)
							 + '.'
							 + tblT.[name]

		, [indexName] = tblI.name

		, [indexID]
			= tblI.index_id

		, [type]
			=  max(tblI.[type])

		, [typeLit]
			=  max(tblI.type_desc)

		, [objectIDInternalTable]	
		   = tblIT.object_id

		, [parentObjectIDInternalTable]	
			= tblIT.parent_object_id

		, [internalTableName]	
			= isNull(tblIT.name, '')

		, [objectIDPartition]
			= tblP.object_id 

		, [filegroup]
			= max(tblFG.[name])


		, [SizeInMB]
			= SUM(tblAU.total_pages) * 8 / 1024.00



from sys.tables tblT

left outer join sys.internal_tables tblIT

	on tblT.object_id = tblIT.parent_object_id

JOIN sys.indexes tblI

	ON tblT.object_id = tblI.object_id

INNER JOIN sys.partitions tblP 

	ON  
		(

			(

					tblI.object_id = tblP.object_id 
				AND tblI.index_id = tblP.index_id
				AND tblI.type_desc not in ( 'XML' )

			)

			OR
			(
				    tblIT.parent_object_id = tblP.object_id 
				AND tblP.index_id = 1
				AND tblI.type_desc = 'XML'
			)

		)



JOIN sys.allocation_units tblAU

	ON tblP.partition_id = tblAU.container_id


JOIN sys.filegroups tblFG

	on tblFG.data_space_id = tblAU.data_space_id


where tblT.object_id = @objectID

GROUP BY 
		  tblT.schema_id
		, tblT.name

		, tblI.name 
		, tblI.index_id

		, tblIT.object_id
		, tblIT.parent_object_id
		, isNull(tblIT.name, '')

		, tblP.object_id 

order by 
		  tblT.schema_id
		, tblT.name
		, tblI.index_id




 

Explanation

  1. We are doing a left outer join against sys.internal_tables as that table is only populated when internal tables exists; in this case the Primary XML Index

 

Output

NO XML Indexes

SizeInfoNoXMLIndexes

 

XML Indexes

SizeInfoXMLIndexes

 

Explanation

  1. We can see that the size for the XML Indexes are lumped in with the size of the Clustered Primary Key
  2. The internal table name is what is shown when we review “Statistics IO

 

Code – Using sys.dm_db_index_physical_stats

here is a better query shared by Basit Farooq @ Finding Size of Index


declare @tableName sysname
declare @objectID int

set @tableName = '[Admin].[CourseManagerUserLog]'
set @objectID = object_id(@tableName)



SELECT 

		 [ObjectId]

		,[ObjectName]

		,[IndexId]

		,[indexName]

		,[IndexType]

		,indexLevel

		,[IndexSize(MB)]
			= CONVERT
				(
					  DECIMAL(16, 1)
					, (SUM([avg_record_size_in_bytes] * [record_count]) / (1024.0 * 1024))
				)

		,[StatisticLastUpdated]
			= max([lastupdated])

		, [AvgFragmentationInPercent]
			= avg([AvgFragmentationInPercent])

FROM (

		SELECT 
		  	   ObjectID = tblInd.OBJECT_ID

			,  ObjectName
				= QuoteName
					(
						object_schema_name
							(tblInd.object_id)
					)
					+ '.'
					+ 
					QuoteName
					(
						object_name
						(
							tblInd.object_id
						)
					)

			, [indexName]
				= tblSysInd.name

			, IndexID = tblInd.Index_ID
			, IndexType = Index_Type_Desc
			, IndexLevel = index_level
			, avg_record_size_in_bytes
			, record_count
			, LastUpdated
				= STATS_DATE
					(
						  tblInd.object_id
						, tblInd.index_id
					)
			, AvgFragmentationInPercent
				= round(Avg_Fragmentation_In_Percent, 3)

		FROM sys.dm_db_index_physical_stats
			(
				  db_id()
				, @objectID
				, NULL
				, NULL
				, 'detailed'
			) tblInd

		inner join sys.indexes tblSYSIND

			on tblInd.object_id = tblSysInd.object_id
			and tblInd.index_id = tblSysInd.index_id

		where index_level = 0

    ) T

GROUP BY 

      ObjectId
    , ObjectName
    , IndexId
    , [indexName]
    , IndexType
   , indexLevel



 

Output

sys

XML Indexes

Let us quickly speak to the various type of XML Indexes and how they compare to each other.

Primary XML Index

 

Item
Mandatory to be created if any XML Index will be created on table
Shreds entire XML Column
Each node in the XML Column is shredded into an Index Row
The base table must have a primary key and the primary key must be clustered

 

 

Secondary XML Index

Path

 

Item
What is stored?  The path and node value
Sample Query
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.query('
  /PD:ProductDescription/PD:Summary
') AS Result
FROM Production.ProductModel
WHERE CatalogDescription.exist ('/PD:ProductDescription/@ProductModelID[.="19"]') = 1
In the Sample above:

  1. We have supplied a specific path – /PD:ProductDescription
  2. Along with an attribute (ProductModelID) to search on
  3. And, we are matching on ProductModelID equal to 19
Also, keep in mind that we are not joining against against another table, just interested in an existence check

 

Takeaway

I would have thought that a secondary XML Index based on Path would have suffice.  But, based on this test, we found out that a Property Index will be taken.  And, in actuality the Path index was completely overlooked.

Also, gained familiarity with the sys.internal table.

 

Additional Reading

sys.internal_tables

  1. sys.internal_tables
    https://msdn.microsoft.com/en-us/library/ms187799.aspx

Storage Allocation

  1. XML Indexes – SQL Server
    https://msdn.microsoft.com/en-us/library/ms191497.aspx

XML Indexes

  1. Toad World – XML Index Rules
    https://www.toadworld.com/platforms/sql-server/w/wiki/9648.xml-index-rules

 

 

 

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s