SQL Server – Transact SQL – Warning – “Convert Implicit”

Background

Checking for expensive queries and found an interesting one that had some identifying marks of a query that needed some tending to:

  1. CONVERT_IMPLICIT(uniqueidentifier, ….)
  2. Sort (Top N Sort)
  3. Index Spool

Query Plan

Here is what the Query Plan looks like:

ConvertImplicit

Data Model

Here is our Data Model.

DataModel

DDL

Created a new table, dbo.[StudentParentGPSRequestLog.DBA].

In the new table, we changed the definition of AppointmentID from varchar(100) to uniqueIdentifier.

This matches its definition in the table we are joining against.


CREATE TABLE [dbo].[StudentParentGPSRequestLog.DBA]
(
	[StudentID] [varchar](100) NOT NULL,
	[ParentID] [varchar](100) NOT NULL,
	[recordCreated] [datetime] NOT NULL,
	[Status] [varchar](100) NULL,
	--[AppointmentID] [varchar](100) NULL,
	[AppointmentID] [uniqueidentifier] NULL,
	[TrackDate] [datetime] NULL,
	[GPSLogID] [bigint] NULL,
	[ID] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
	[PageID] [varchar](100) NULL,
 CONSTRAINT [PK_StudentParentGPSRequestLog.DBA] PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


Code

Reduced the original query gigantically and somehow got rid of the “implicit conversion”.

Simplified Query

Target Original Tables

Here is the re-written query:


declare @startDate DateTime
declare @finishDate DateTime

set @startDate = '2008-08-01'
set @finishDate = '2008-08-02'

select
	  app.startDateTime
	, app.appointmentID
	, app.*

FROM 	Appointments app

	  left outer join dbo.StudentParentGPSRequestLog  tblSPGPSRL

		on tblSPGPSRL.AppointmentID=app.appointmentID


WHERE app.startDateTime >= @startDate 

and   app.startDateTime <= @finishDate


And, here is the Query Plan:

QueryTargetTableOriginal-QueryPlan

Target Original Tables

Here is the re-written query:


declare @startDate DateTime
declare @finishDate DateTime

set @startDate = '2008-08-01'
set @finishDate = '2008-08-02'

select
	  app.startDateTime
	, app.appointmentID
	, app.*

FROM 	dbo.Appointments app

	  left outer join dbo.[StudentParentGPSRequestLog.DBA]  tblSPGPSRL

		on tblSPGPSRL.AppointmentID=app.appointmentID


WHERE app.startDateTime >= @startDate 

and   app.startDateTime <= @finishDate


And, here is the Query Plan:

QueryTargetTableRevised-QueryPlan

Compared Query Plans & Statistics I/O

Query Plans

ComparedQueryPlan

Statistics I/O

StatisticsIO

Quick Analysis

Metric Original Revised
Query Cost 54% 46%
 Join Operator Hash Match  Merge Join
 Actual Number of Rows  13  13
 Description Use each row from the top input to build a hash table, and each row from the bottom input to probe into the hash table, outputting all matching rows. Match rows from two suitably sorted input tables exploiting their sort order.
Estimated CPU Cost  2.46854  0.278817
Estimated Number of Executions  1  1
Estimated Number of Rows  123463  123459
Estimated Operator Cost  2.746571 (15%)  0.556829 (4%)
Estimated Subtree Cost  18.2881  15.891
 Number of Executions  1  1
 Parallel False  False
 Physical Operation  Hash Match  Merge Join

Quick Explanation:

The mismatched join columns lead to a more expensive query:

  1. Query Cost of 56% VS 46%
    • Hash Match VS Merge Join
    • CPU Cost 2.46854 VS 0.278817
    • Estimated Operator Cost of 2.746571 VS 0.556829
  2. The Compute Scaler in the original query is a bit of a hint, as well

Even though we updated statistics with Full Scan a couple of times, the Expected Number of Rows @ 123459 is still way high compared to the actual number of Rows of 13.

Review Join Operators

Advanced Query Tuning Concepts
https://msdn.microsoft.com/en-us/library/ms191426(v=SQL.100).aspx

  1. Nested loops joins
    • If one join input is small (fewer than 10 rows) and the other join input is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparison
  2. Merge Join
    • If the two join inputs are not small but are sorted on their join column (for example, if they were obtained by scanning sorted indexes), a merge join is the fastest join operation.
  3. Hash Joins
    • Hash joins can efficiently process large, unsorted, nonindexed inputs. They are useful for intermediate results in complex queries

And, so we see that the best we can do is a Merge Join.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s