Technical: Microsoft – SQL Server – Data I/O – Computed Columns

Technical: Microsoft – SQL Server – Data I/O – Computed Columns

Introduction

For the last couple of weeks or so, I have struggled with a little SQL Script that transfers data between two SQL Server Instances.

Background

The Transact SQL is simple enough.  It uses Microsoft’s SQL Server highly regarded Linked Server (Heterogeneous Database) to copy data from a legacy system to a new one.

Original Query – Linked Server

Here is the original query that uses Linked Server



        declare @logdateDestMax datetime
        declare @logdateDestMaxDefault datetime

        set @logdateDestMaxDefault = '1/1/1900'

        set @logdateDestMax = ( 
                                 select max(createDate) 
                                 from dbo.employee 
                              )

        if ( @logdateDestMax is null)
        begin
              set @logdateDestMax = @logdateDestMaxDefault
        end
	insert into dbo.employee	
        (
	   [username]
	  ,[createDate]
	  ,[createdBy]
	)
	select 
	   [username]
	  ,[createDate]
	  ,[createdBy]
        from [DBLABJ_NC_10_ODBC].[dbHR].dbo.employee tblSrc
	where [logdate] > @logdateDestMax

Rewrite  Query – OpenQuery

As I know that 4 part name queries can sometimes be problematic and slow, I thought may be I should rewrite to use openquery.

Here is the re-written query that uses Linked Server \ Open Query


        declare @logdateDestMax datetime
        declare @logdateDestMaxDefault datetime

        set @logdateDestMaxDefault = '1/1/1900'

        set @logdateDestMax = ( 
                                 select max(createDate) 
                                 from dbo.employee 
                              )

        if ( @logdateDestMax is null)
        begin
              set @logdateDestMax = @logdateDestMaxDefault
        end
	insert into dbo.employee	
        (
	   [username]
	  ,[createDate]
	  ,[createdBy]
	)
	select 
	   [username]
	  ,[createDate]
	  ,[createdBy]
        from openquery(
                           [DBLABJ_NC_10_ODBC]
                         , 'select * from [dbHR].dbo.employee'
                      ) tblSrc
	where [logdate] > @logdateDestMax

OpenQuery – Poor Auditioning

The example above is a very poor staging of the beenfits of openquery.

Usually, we want a query that can be completely serviceable at the Other Server. The query above still brings all the data over to the querying server.

Once all the data is brought over, the resident SQL instance then issues a comparison.

There are things we can do to better ‘position’ the linked Server.  Our alternate choices includes:

  • Creating a Stored Procedure on the Linked Server.  The SP will  accept parameters that will help ‘our case’

Compare 4 part name and OpenQuery

Compare Execution Plans

Here is a quick comparison of both plans.

ExecutionPlan

One can quickly see that the plan that uses the 4 part name is 53%, while the one that uses openquery is at 36%.

Compare IO Stats

With “set statistics io on”, we can see that IO stats for the first query:

Table 'employee'. Scan count 1, logical reads 1369, physical reads 0, read-ahead 
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We do not have corresponding IO stats for the query that uses openquery; as that query is completely ‘materialized’ on the Linked Server and seemingly Profiling Stats are not returned for externally ran queries.

Remote Scan Stats

If we focus on the remote Scan Stats, we are able to dig a bit deeper into our estimated and actual costs:

Linked Server Cost

RemoteQuery-Slow-4PartName

Open Query Cost

RemoteQuery-Slow-OpenQuery

Here is the break-down

Item 4 part name Open Query
Actual Number of Rows 5 49821
Actual Rebinds  1 0
Actual Rewinds  0 0
Estimated CPU Cost  4.9921 3.3633
Estimated I/O Cost  0 0
Estimated Number of Executions 1 1
Estimated Number of Rows  14936.3 10000
Estimated Operator Cost  4.9921 (100%) 3.36333 (100%)
Estimated Rebinds  0  0
Estimated Rewinds  0  0
Estimated Row Size 448 Bytes  448 Bytes
Estimated SubTree Cost 4.9921  3.36333
Logical Operation Remote Query Remote Scan
Node ID 1 1
Number of Executions 1 1
Output List  DBLAB].[dbo].[UpdateLog].col1, [DBLAB].[dbo].[UpdateLog].col2, [DBLAB].[dbo].[UpdateLog].logdate, [DBLAB].[dbo].[UpdateLog].col3 SQLNCLI11
Parallel False False
Physical Operation  Remote Query Remote Scan
Remote Object SELECT “Tbl1001”.”col1″ “Col1003″,”Tbl1001”.”col2″ “Col1004″,”Tbl1001″.”logdate” “Col1005″,”Tbl1001”.”col3″ “Col1006” FROM “DBLAB”.”dbo”.”UpdateLog” “Tbl1001” WHERE “Tbl1001″.”logdate”>? select * from [DBLAB].dbo.UpdateLog tblSrc
Remote Source  Named Instance name Actual Name Linked Server

Explanation

  • “Actual Number of Rows”.  OpenQuery is bringing back all the records and comparing them against our data filter.  It seems 4 part name is only indicating the number of records that will be returned by the select statement
  • For 4 part name, we have rebinds.  For best explanation of what rebinds are, please “Check out” Scary DBA’s Blog posting – Rebind and Rewind in Execution Plans (http://www.scarydba.com/2011/06/15/rebind-and-rewind-in-execution-plans/“)
  • Estimated CPU Cost – 4 the four part name it is a bit higher
  • Estimated Number of Rows – In case of  “4 Part Name”, the Engine can consult with the Linked Server and ask it for the number of records in the table; whereas, for the “Open Query”, the  query will have to be ran.  And, so we get a place holder 10,000 count
  • Estimated Operator Cost – Same as Estimated CPU Cost; it seems most of our cost is attributed to CPU; very little I/O cost
  • Estimated Sub Tree Cost – All query costs eaten up by this operation
  • Logical Operation – “Remote Query” (Linked Server) vs “Remote Scan” ( openquery)
  • Output List :- Openquery lists our SQL Server Provider (SQLNCLI11)
  • Parallel :- Our record count is relatively low and no parallelism
  • Physical Operation – Same as Logical Operation **** “Remote Query” (Linked Server) vs “Remote Scan” ( openquery) ***
  • In this case our Linked Server is on the same box as our calling Server.  For 4 part name, our entry is the Instance Name, not the full Instance Name, just the instance itself (SQLExpress).
    And, the openquery as the full instance name for our Linked Server

Summary

  • Linked Server offers a bit more reliable instrumentation
  • I/O Costs for the remote scan that is employed when we use OpenQuery are hidden

Bcp

Intro

We decided to try bcp as Linked Server “selects” was taken about 1.5 hours for a measly 15K (15000) records.

Now that we are out of MS SQL Server, we will use Gammadyne’s Timer.exe ( http://www.gammadyne.com/cmdline.htm#timer) to time the performance of our bcp sessions.

As we are only profiling for performance, we will tweak our code by making a couple of adjustments.

  • Add -L <N (-L 1000) :- Only return 1000 records

Bcp Performance – Table

Get data from the entire table.

bcp  "dbo.employee" out employee.txt -c -S hrDB -d hr -T -L 1000

 Image

bcpTable

  • So we are only getting 11.63 rows per sec.
  • And, it is taken us 90 seconds to get 1000 records

Bcp Performance – QueryOut

Get data for all columns.

bcp  "select * from dbo.employee" queryout employee.txt -c -S hrDB -d hr -T -L 1000

 Image

bcpQueryOut

  • So we are only getting 11.56 rows per sec.
  • And, it is taken us 86.8 seconds to get 1000 records

So our numbers are still not nothing to jump around about.

Looked at the following areas on the Linked Server:

  • Wait Stats

Bcp Performance – QueryOut – Filtered ColumnList

Reviewed our column list  and noticed that one of the columns we are bringing back is a computed column.

We removed that column by explicitly listing the columns that we want:

bcp  "select FIRSTNAME, LASTNAME, USERNAME, PASSWORD from dbo.employee" queryout 
employee.txt -c -S hrDB -d hr -T -L 1000

 Image

bcpQueryOutFilteredColumnList

  • So we are only getting 62,500 rows per sec.
  • And, it is taken us 62 milliseconds to get 1000 records

Computed Columns

So our biggest drag was the “computed column”.  We do not even need it, as it is will be regenerated on the other side.

Find Computed Columns

Find Computed Columns – MS SQL Server 2000


select 
		  tblUser.name as schemaName
		, tblObject.name as objectName
		, tblColumn.name as columnName

from   syscolumns tblColumn

	  inner join sysobjects tblObject

	 	on tblColumn.id = tblObject.id

	 inner join sysusers tblUser

		on tblObject.uid = tblUser.uid

where  tblObject.[type] = 'U' 
and    tblColumn.iscomputed = 1

Find Computed Columns – MS SQL Server 2005 and above



select 
		  tblSchema.name as schemaName
		, tblObject.name as objectName
		, tblColumn.name as columnName
		, tblColumnComputed.definition
		, tblColumnComputed.is_computed as [isComputed]		
		, tblColumnComputed.is_persisted as isPersisted		

from   sys.columns tblColumn

	  inner join sys.objects tblObject

	     on tblColumn.object_id = tblObject.object_id

	  inner join sys.schemas tblSchema

	     on tblObject.schema_id = tblSchema.schema_id

	  inner join sys.computed_columns tblColumnComputed

	     on  tblColumn.object_id = tblColumnComputed.object_id
	     and tblColumn.column_id = tblColumnComputed.column_id

where  tblObject.type = 'U'
and    tblColumn.is_computed = 1


Image

ListComputedColumns_v2005

Conclusion

So again keep an eye on computed columns.

They were introduced in MS SQL Server v2000, but at that time they can not be persisted.

From v2005 on out, they can be persisted.

Persisted means their values are computed upon initial creation or subsequent updates.  And, thereafter the saved value is read and thus “readers” are not forced to pay the price of re-calculations.

But, even then make sure that you only include them when you need them.  Be especially careful when you use * (select * from <schema-name>.<object-name>); or when you reference the entire table.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s