SQL Server – XML – Typed – Implementation

Background

As I played more around with XML and Xquery,  I wanted to see if there are obvious and quick performance gains that we were missing out on, simply for being a bit lazy about understanding and implementing Typed XML columns.

Let us see what it will take to convert an untyped XML column to a typed one.

Advantages

The listed advantages for using typed XML Columns includes:

  • Validate XML data through XML schemas.
  • Access storage and query optimization through type information
    • Storage
      • Data type information. Schemas provide information about the types of attributes and elements in the xml data type instance. The type information provides more precise operational semantics to the values contained in the instance than is possible with untyped xml. For example, decimal arithmetic operations can be performed on a decimal value, but not on a string value. Because of this, typed XML storage can be made significantly more compact than untyped XML. ( Link )
  • Access Type information during query compilation

 

Scenario

If a couple of earlier posts, we spoke about storing Query Plans in a SQL Server table.   We will use the same table structure.

Btw, here are the aforementioned posts:

  1. SQL Server – dm_exec_query_plan – shredding – Part 1
    https://danieladeniji.wordpress.com/2016/03/18/sql-server-dm_exec_query_plan-shredding/

 

 

Showplan Schema

Checked online for Microsoft’s publication of it’s Showplan Schema. And, found it available a http://schemas.microsoft.com/sqlserver/2004/07/showplan/.

Available Schemas

Here are the available schemas.

ShowplanSchema

Version

At the top each xsd, there is a Version tag that represents the Version Number.

v2014

Current-Version

v2012

Version-v2012

v2008

v2008

v2005

Version-1-0

 

Import  XSD

We will go with the most recent XSD.

 

Create XML Schema Collection

Syntax


CREATE XML SCHEMA COLLECTION [schemaCollection]
AS
    [Expression]

Sample

Here is a very shortened, incomplete version of the XSD for SQL Server v2014.



CREATE XML SCHEMA COLLECTION SchemaCollectionShowPlanv2014 
AS
N'<?xml version="1.0" encoding="UTF-16"?>
<xsd:schema xmlns:shp="http://schemas.microsoft.com/sqlserver/2004/07/showplan" xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://schemas.microsoft.com/sqlserver/2004/07/showplan" elementFormDefault="qualified" attributeFormDefault="unqualified" version="1.2" blockDefault="#all">
<xsd:annotation>
<xsd:documentation>
The following schema for Microsoft SQL Server 2014 describes output from the showplan functionality in XML format. Microsoft does not make any representation or warranty regarding the schema or any product or item developed based on the schema. The schema is provided to you on an AS IS basis. Microsoft disclaims all express, implied and statutory warranties, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, and freedom from infringement. Without limiting the generality of the foregoing, Microsoft does not make any warranty of any kind that any item developed based on the schema, or any portion of the schema, will not infringe any copyright, patent, trade secret, or other intellectual property right of any person or entity in any country. It is your responsibility to seek licenses for such intellectual property rights where appropriate. MICROSOFT SHALL NOT BE LIABLE FOR ANY DAMAGES OF ANY KIND ARISING OUT OF OR IN CONNECTION WITH THE USE OF THE SCHEMA, INCLUDING WITHOUT LIMITATION, ANY DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL (INCLUDING ANY LOST PROFITS), PUNITIVE OR SPECIAL DAMAGES, WHETHER OR NOT MICROSOFT HAS BEEN ADVISED OF SUCH DAMAGES. (c) Microsoft Corporation. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:annotation>
<xsd:documentation>Last updated: 07/31/11</xsd:documentation>
</xsd:annotation>
<xsd:element name="ShowPlanXML">
<xsd:complexType>
<xsd:annotation>
<xsd:documentation>This is the root element</xsd:documentation>
</xsd:annotation>
<xsd:sequence>
<xsd:element name="BatchSequence">

...
...
...
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
'
go

 

 

Use Schema Collection

Let us use the Schema Collection in our table creation table…

Create Table


if schema_id('dbaDMV') is null
begin

	exec('create schema [dbaDMV] authorizzation [dbo] ')

end
go

IF NOT EXISTS 
(
	SELECT * 
	FROM sys.objects 
	WHERE object_id = OBJECT_ID(N'[dbaDMV].[dm_exec_query_plan_typed]') 
	AND type in (N'U')
)
BEGIN

	CREATE TABLE [dbaDMV].[dm_exec_query_plan_typed]
	(

		  [identity] [bigint] IDENTITY(1,1) NOT NULL
		, [dbid] [smallint] NULL
		, [objectid] [int] NULL
		, [number] [smallint] NULL
		, [encrypted] [bit] NULL
		, [query_plan] [xml] 
                      (SchemaCollectionShowPlanv2014) NULL
		, [plan_handle] [varbinary](64) NULL

		, CONSTRAINT [PK_DM_EXEC_QUERYPLAN_Typed] PRIMARY KEY CLUSTERED 
		(
			[identity] ASC
		)

		ON [PRIMARY]
	) 
	ON [PRIMARY] 

END
GO



Create Nonclustered Indexes

plan_handle


IF NOT EXISTS 
(
  SELECT * 
  FROM   sys.indexes 
  WHERE  object_id 
          = OBJECT_ID(N'[dbaDMV].[dm_exec_query_plan_typed]') 
  AND    name = N'INDX_PlanHandle'
)
begin

	CREATE NONCLUSTERED INDEX [INDX_PlanHandle] 
	ON [dbaDMV].[dm_exec_query_plan_typed]
	(
		[plan_handle] ASC
	)

end

GO

 

Create XML Indexes



IF NOT EXISTS 
(
   SELECT * 
   FROM   sys.indexes 
   WHERE  object_id
            = OBJECT_ID(N'[dbaDMV].[dm_exec_query_plan_typed]') 
   AND    name = N'INDX_DBA_XML_PRIMARY_MAIN'
)
begin

   CREATE PRIMARY XML INDEX [INDX_DBA_XML_PRIMARY_MAIN] 
       ON [dbaDMV].[dm_exec_query_plan_typed]
   (
      [query_plan]
   )

end
GO


IF NOT EXISTS 
(
   SELECT * 
   FROM sys.indexes 
   WHERE object_id = OBJECT_ID(N'[dbaDMV].[dm_exec_query_plan_typed]') 
   AND name = N'INDX_DBA_XML_SECONDARY_PATH'
)
begin

   CREATE XML INDEX [INDX_DBA_XML_SECONDARY_PATH]
          ON [dbaDMV].[dm_exec_query_plan_typed]
   (
	[query_plan]
   )
   USING XML INDEX [INDX_DBA_XML_PRIMARY_MAIN] 
      FOR PATH;

end

GO


IF NOT EXISTS 
(
   SELECT * 
   FROM   sys.indexes 
   WHERE object_id = OBJECT_ID(N'[dbaDMV].[dm_exec_query_plan_typed]') 
   AND name = N'INDX_DBA_XML_SECONDARY_Property'
)
begin

   CREATE XML INDEX [INDX_DBA_XML_SECONDARY_Property] 
   ON [dbaDMV].[dm_exec_query_plan_typed]
   (
	[query_plan]
   )
   USING XML INDEX [INDX_DBA_XML_PRIMARY_MAIN] FOR PROPERTY 

end

GO


IF NOT EXISTS 
(
   SELECT * 
   FROM sys.indexes 
   WHERE object_id
          = OBJECT_ID(N'[dbaDMV].[dm_exec_query_plan_typed]') 
   AND name = N'INDX_DBA_XML_SECONDARY_Value'
)
begin

   CREATE XML INDEX [INDX_DBA_XML_SECONDARY_Value] 
   ON [dbaDMV].[dm_exec_query_plan_typed]
   (
     [query_plan]
   )
   USING XML INDEX [INDX_DBA_XML_PRIMARY_MAIN] FOR VALUE 

end

GO

Lab

Cache Data

Here are sample queries that we used for caching our Query Plans.

[dbaDMV].[dm_exec_query_plan]

set nocount on;


truncate table [dbaDMV].[dm_exec_query_plan]
go

insert into [dbaDMV].[dm_exec_query_plan]
(
     [dbid]
   , [objectid]
   , [number]
   , [encrypted]
   , [query_plan]
   , [plan_handle]
)
select
      tblQP.[dbid]
    , tblQP.[objectid]
    , tblQP.[number]
    , tblQP.[encrypted]
    , tblQP.[query_plan]
    , tblCP.[plan_handle]

from   sys.dm_exec_cached_plans tblCP

cross apply sys.[dm_exec_query_plan](tblCP.plan_handle) as tblQP

go

update statistics [dbaDMV].[dm_exec_query_plan]
      with FULLSCAN
go

[dbaDMV].[dm_exec_query_plan_typed]


set nocount on;

go

truncate table [dbaDMV].[dm_exec_query_plan_typed]
go

insert into [dbaDMV].[dm_exec_query_plan_typed]
(
     [dbid]
   , [objectid]  
   , [number]
   , [encrypted]
   , [query_plan]
   , [plan_handle]
)
select
     [dbid]
  , [objectid]
  , [number]
  , [encrypted]
  , [query_plan]
  , [plan_handle]
from  [dbaDMV].[dm_exec_query_plan]
go

update statistics [dbaDMV].[dm_exec_query_plan_typed] 
     with FULLSCAN;
go

 

Compare

Here is a very simple comparison between Untyped and Typed XML columns.

Table Information

Untyped

TableProperties - Untyped

Typed

TableProperties - Typed

Size

Using an index size script defined here, we queried for Index Size.

Code



declare
	  @SchemaName		sysname
	, @TableName		sysname
	, @IndexName		sysname
	, @dataspace		sysname
	, @skipXMLIndexes	bit


--[dbaDMV].[dm_exec_query_plan_typed]
set @schemaName = 'dbaDMV'
set @TableName = 'dm_exec_query_plan'
set @dataspace = null
set @skipXMLIndexes = 0

 exec [dbo].[sp_helpindexInfo] 
 		  @SchemaName = @SchemaName
		, @TableName = @TableName
		, @IndexName = @IndexName
		, @dataspace = @dataspace
		, @skipXMLIndexes = @skipXMLIndexes


set @schemaName = 'dbaDMV'
set @TableName = 'dm_exec_query_plan_typed'
set @dataspace = null
set @skipXMLIndexes = 0

 exec [dbo].[sp_helpindexInfo] 
 		  @SchemaName = @SchemaName
		, @TableName = @TableName
		, @IndexName = @IndexName
		, @dataspace = @dataspace
		, @skipXMLIndexes = @skipXMLIndexes


Output

sizeCompare

 

Explanation

  1. Number of records
    • We have about 26101 records
  2. Table Size
    1. Untyped = 165 MB
    2. Typed = 185 MB
    3. Compared to Untyped, Typed is 30 MB more
  3. XML Index
    • Compared to Untyped, Typed is about 5 MB less
  4. Clustered Index
    • Compared to Untyped, Typed is about 20 MB more

Query

Query – Find Missing Indexes

Code


print replicate('~', 80)
go

select *

from   [dbaDMV].[dm_exec_query_plan] tblQP

WHERE  tblQP.query_plan.exist('//MissingIndex')=1

print replicate('-', 80)

select *

from   [dbaDMV].[dm_exec_query_plan_typed] tblQP

WHERE  tblQP.query_plan.exist('//MissingIndex')=1

print replicate('=', 80)

print replicate('#', 80)


Statistics IO

statisticsIO

Statistics Time

statisticsTime

 

Explanation

  1. Statistics IO
    • Untyped = 103926
    • Typed = 103034
    • Typed is 49.78% of overall
  2. Statistics Time
    1. Untyped = 6333
    2. Typed = 6318
    3. Typed is 49.94% of overall

Query – Cached Plans – Metadata – Summary

Guide

In our first exercise, we will list the top N queries ordered by Estimated Number of records.
We will consider a couple of XPATH queries for parsing the cached plans.

As the information we are interested in is located in the first RELOP element, we will use a couple of XPath expressions for attempting to achieve our end goal.

//RelOp

CROSS APPLY tblQP.[query_plan].nodes(‘//RelOp’)  as x(i)

        where x.i.value(‘@NodeId’, ‘int’) = 0

//RelOp[1]

CROSS APPLY tblQP.[query_plan].nodes(‘//RelOp[1]’)  as x(i)

 

Query – Cached Plans – Metadata – Summary – //RELOP

  1. CROSS APPLY tblQP.[query_plan].nodes(‘//RelOp’)
    as x(i)where x.i.value(‘@NodeId’, ‘int’) = 0
Code



declare @iNumberofResultsToDisplay int
declare @dbName sysname
declare @dbid   int

set @dbname = 'HRDB'
set @dbid = db_id(@dbName)

set @iNumberofResultsToDisplay = 50

print replicate('~', 80)

;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 

	 top ( @iNumberofResultsToDisplay  )

	       queryPlan 
                     = tblQP.query_plan

	  ,	nodeName
			=
				(
					cast(x.i.query('.') as xml)
				)


	 , query
			=  tblST.[text]


	  ,	[EstimatedTotalSubtreeCost]
			= 	x.i.value('@EstimatedTotalSubtreeCost[1]', 'float')

	  ,	nodeID
			= x.i.value('@NodeId[1]', 'int')


	  ,	[EstimateRows]
			= x.i.value('@EstimateRows[1]', 'float')


	  , tblQP.[dbid]

	  , tblQP.[objectID]

	  , tblQP.[number]

	  , tblQP.[encrypted]

from   sys.dm_exec_query_stats tblQS

--cross apply sys.[dm_exec_query_plan](tblQS.plan_handle) 
inner join [dbaDMV].[dm_exec_query_plan] tblQP
		on tblQS.[plan_handle] = tblQP.[plan_handle]

cross apply sys.[dm_exec_sql_text](tblQS.[sql_handle]) 
			as tblST

CROSS APPLY tblQP.[query_plan].nodes('//RelOp') 
			as x(i)

where x.i.value('@NodeId', 'int') = 0

and	 tblQP.[dbid] = @dbid

order by
	 [EstimateRows] desc

option ( maxdop 1)

print replicate('-', 80)


;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 

	 top ( @iNumberofResultsToDisplay  )

	       queryPlan 
                     = tblQP.query_plan

	  ,	nodeName
			=
				(
					cast(x.i.query('.') as xml)
				)


	 , query
			=  tblST.[text]


	  ,	[EstimatedTotalSubtreeCost]
			= 	x.i.value('@EstimatedTotalSubtreeCost[1]', 'float')

	  ,	nodeID
			= x.i.value('@NodeId[1]', 'int')


	  ,	[EstimateRows]
			= x.i.value('@EstimateRows[1]', 'float')


	  , tblQP.[dbid]

	  , tblQP.[objectID]

	  , tblQP.[number]

	  , tblQP.[encrypted]

from   sys.dm_exec_query_stats tblQS

--cross apply sys.[dm_exec_query_plan](tblQS.plan_handle) 
inner join [dbaDMV].[dm_exec_query_plan_typed] tblQP
		on tblQS.[plan_handle] = tblQP.[plan_handle]

cross apply sys.[dm_exec_sql_text](tblQS.[sql_handle]) 
			as tblST

CROSS APPLY tblQP.[query_plan].nodes('//RelOp') 
			as x(i)

where x.i.value('@NodeId', 'int') = 0

and	 tblQP.[dbid] = @dbid

order by
	 [EstimateRows] desc

option ( maxdop 1)

print replicate('=', 80)

print replicate('#', 80)


Statistic I/O

StatisticsIO

 

Statistic Time

StatisticsTime

 

Query – Cached Plans – Metadata – Summary – RELOP[1]

  1. CROSS APPLY tblQP.[query_plan].nodes(‘//RelOp[1]’)
Code

declare @iNumberofResultsToDisplay int
declare @dbName sysname
declare @dbid int


set @dbname = 'HRDB'
set @dbid = db_id(@dbName)

set @iNumberofResultsToDisplay = 50

print replicate('~', 80)

;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 

	 top ( @iNumberofResultsToDisplay  )

	       queryPlan 
                     = tblQP.query_plan

	  ,	nodeName
			=
				(
					cast(x.i.query('.') as xml)
				)


	 , query
			=  tblST.[text]


	  ,	[EstimatedTotalSubtreeCost]
			= 	x.i.value('@EstimatedTotalSubtreeCost[1]', 'float')

	  ,	nodeID
			= x.i.value('@NodeId[1]', 'int')


	  ,	[EstimateRows]
			= x.i.value('@EstimateRows[1]', 'float')


	  , tblQP.[dbid]

	  , tblQP.[objectID]

	  , tblQP.[number]

	  , tblQP.[encrypted]

from   sys.dm_exec_query_stats tblQS

--cross apply sys.[dm_exec_query_plan](tblQS.plan_handle) 
inner join [dbaDMV].[dm_exec_query_plan] tblQP
		on tblQS.[plan_handle] = tblQP.[plan_handle]

cross apply sys.[dm_exec_sql_text](tblQS.[sql_handle]) 
			as tblST

CROSS APPLY tblQP.[query_plan].nodes('//RelOp[1]') 
			as x(i)

where  	 tblQP.[dbid] = @dbid

order by
	 [EstimateRows] desc

option ( maxdop 1)

print replicate('-', 80)


;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 

	 top ( @iNumberofResultsToDisplay  )

	       queryPlan 
                     = tblQP.query_plan

	  ,	nodeName
			=
				(
					cast(x.i.query('.') as xml)
				)


	 , query
			=  tblST.[text]


	  ,	[EstimatedTotalSubtreeCost]
			= 	x.i.value('@EstimatedTotalSubtreeCost[1]', 'float')

	  ,	nodeID
			= x.i.value('@NodeId[1]', 'int')


	  ,	[EstimateRows]
			= x.i.value('@EstimateRows[1]', 'float')


	  , tblQP.[dbid]

	  , tblQP.[objectID]

	  , tblQP.[number]

	  , tblQP.[encrypted]

from   sys.dm_exec_query_stats tblQS

--cross apply sys.[dm_exec_query_plan](tblQS.plan_handle) 
inner join [dbaDMV].[dm_exec_query_plan_typed] tblQP
		on tblQS.[plan_handle] = tblQP.[plan_handle]

cross apply sys.[dm_exec_sql_text](tblQS.[sql_handle]) 
			as tblST

CROSS APPLY tblQP.[query_plan].nodes('//RelOp[1]') 
			as x(i)

where  	 tblQP.[dbid] = @dbid

order by
	 [EstimateRows] desc

option ( maxdop 1)

print replicate('=', 80)

print replicate('#', 80)



Statistic I/O

StatisticsIO

 

Statistic Time

StatisticsTime

 

Query – Cached Plans – Metadata – Summary – RELOP* – Tabulated I/O Stats

 

 

 

Metric Non Typed Table Typed Table
//RELOP
Worktable =  Scan count 625, logical reads 97913  Scan count 625, logical reads 97271
 xml_index = Scan count 88898, logical reads 829079  Scan count 88898, logical reads 826035
dm_exec_query_plan = Scan count 1, logical reads 39315  Scan count 1, logical reads 39315
//RELOP[1]
Worktable = Scan count 0, logical reads 32504  Scan count 625, logical reads 97271
 xml_index = Scan count 342518, logical reads 2131552  Scan count 342518, logical reads 2130245
dm_exec_query_plan_typed = Scan count 1, logical reads 39315  Scan count 1, logical reads 40535

 

Query – Cached Plans – Metadata – Summary – RELOP* – Differences

  1. RELOP
    • Worktable
      • Scans of 625, logical reads of 97913
    • xml index node
      • Scans of 88898, logical reads of 829079
  2. RELOP [1]
    • Worktable
      • Scans of 0, logical reads of 32504
    • xml index node
      • Scans of 342518, logical reads of 2130245

Quick Analysis

  1. //RELOP and where x.i.value(‘@NodeId’, ‘int’) = 0
    • Less scans against XML Index
    • Does worktable scan
  2. //RELOP[1]
    • More scan counts against XML Index
    • zero worktable scan

Observation

  1. Filter out your instrumentation qureies that target resource DB or sys schema.  Here are some ways …
    • Database = resource
      • database id is null
      • database id is 32767
    • Schema
      • schema != [sys]
      • Sample Query – Finding Implicit Column Conversions in the Plan Cache – Jonathan Kehayias ( Link )
        WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1
        
        

Summary

XPATH

Queries against XPATH are costly for various reasons

  1. They are packaged away in XPATH expressions
    • It can be a bit difficult to tune them as one would regular query filters.  yes, sql:variable helps, but that can be problematic, due to:
      • case-sensitive
      • data type matching
      • Workarounds
        • The work around such as using upper and lower for case-insensitive matching can be costly as they will not use Indexes
        • Introduce subtle bugs

 

Conclusion

We did not notice performance gains by switching from an untyped XML Column to a typed one.

It is likely because of the dullness of our queries.

If we find best use cases for typed XML Column, will publish follow-up posts.

 

References

XML Schema Collection

  1. CREATE XML SCHEMA COLLECTION (Transact-SQL)
    https://msdn.microsoft.com/en-us/library/ms176009.aspx

 

XML Column

  1. XML Data Type and Columns (SQL Server)
    https://msdn.microsoft.com/en-us/library/hh403385.aspx
  2. Create XML Data Type Variables and Columns
    https://msdn.microsoft.com/en-us/library/bb522510.aspx
  3. Compare Typed XML to Untyped XML
    https://msdn.microsoft.com/en-us/library/ms184277.aspx

 

Manipulating XML Data

  1. Examples of using XQuery to update XML Data in SQL Server
    https://www.mssqltips.com/sqlservertip/2738/examples-of-using-xquery-to-update-xml-data-in-sql-server/
  2. Working with the XML Data Type of SQL Server
    http://www.developer.com/db/article.php/3755106/Working-with-the-XML-Data-Type-of-SQL-Server.htm

 

XML – XQuery

  1. SQL Server 2005 XQuery and XML-DML – Part 1 ( By Quin Street )
    http://www.codeguru.com/csharp/.net/net_data/article.php/c19491/SQL-Server-2005-XQuery-and-XMLDML–Part-1.htm

 

XML – Performance Tips

  1. Performance tips of using XML data in SQL Server – By Kun Cheng (SQLCAT)
    https://blogs.msdn.microsoft.com/sqlcat/2010/03/01/performance-tips-of-using-xml-data-in-sql-server/

 

Book

  1. Microsoft SQL Server for dummies
    https://books.google.com/books?id=kq321qpvP_EC&pg=PA104&lpg=PA104&dq=create+table+sql+server+xml+fragment&source=bl&ots=2I0Oq46ofm&sig=v7sLwMPcXHaLRNQpcSSZ6g5StLY&hl=en&sa=X&ved=0ahUKEwi6s43F6_zLAhVU4mMKHThQDpYQ6AEIXTAJ#v=onepage&q=create%20table%20sql%20server%20xml%20fragment&f=false

 

Dynamic Management View

  1. Table MetaData for Query Tuning – Quick and Dirty
    https://njandwani.wordpress.com/2012/08/07/table-metadata-for-query-tuning-quick-and-dirty/

 

TechEd

  1. Michael Rys
    • Deep Dive into XQuery and XML in Microsoft SQL Server: Common Problems and Best Practice Solutions

 

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s