Transact SQL – Handling Incomplete Strings with the “Like” Operator

Background

Noticed that we were not getting complete result set due to incomplete strings.

 

Lab

Inner Join on Equal Operator

Guide

Here we employ an inner join between two tables.

The joining columns are Transaction ID and Order ID.

SQL – Inner Join on Equal Operator


select 

		   [src] = 'Inner Join on Equal Operator (Original)'

		,  [payerIDExists]
			=  case
					when (tblPC.payer_id is null) then 'No'
					else 'Yes'
			   end

		,  tblHCMA.[TransactionID]

		 , [OrderID]
			= tblPC.[Order_ID]

		, [TransactionIDLength]
			= len(tblHCMA.[TransactionID])

	    , tblHCMA.[ProductName]

		 , tblPC.payer_id	

		 , [typeofPayment]
			= tblPC.[type_of_payment]

from  [DBAnalysis].[dbo].[HitpathCourseManagerAnalysis] tblHCMA

inner join [dbo].[PaymentCharges] tblPC
		on (
				(
					(  tblHCMA.[TransactionID] = tblPC.[Order_ID]  )
				)
			)

order by
		case
			when (tblPC.payer_id is null) then 'No'
			else 'Yes'
		end

		 , len(tblHCMA.[TransactionID])
		 , tblHCMA.[TransactionID]



 

Output

Query1

 

 

Left Outer Join on Equal Operator

Guide

The requestor came back and said some data was missing.

And, so we changed from an inner join to a left outer join.

SQL – Left Outer Join on Equal Operator


select 
		distinct

		   [src] = 'Left Outer Join on Equal Operator'

		,  [payerIDExists]
			=  case
					when (tblPC.payer_id is null) then 'No'
					else 'Yes'
			   end

		,  tblHCMA.[TransactionID]

		 , [OrderID]
			= tblPC.[Order_ID]

		, [TransactionIDLength]
			= len(tblHCMA.[TransactionID])

	    , tblHCMA.[ProductName]

		 , tblPC.payer_id	


		 , [typeofPayment]
			= tblPC.[type_of_payment]


from  [DBAnalysis].[dbo].[HitpathCourseManagerAnalysis] tblHCMA

left outer join [dbo].[PaymentCharges] tblPC

		on (

				(

					(  tblHCMA.[TransactionID] = tblPC.[Order_ID] )

				)

			)

order by
		case
			when (tblPC.payer_id is null) then 'No'
			else 'Yes'
		end

		 , len(tblHCMA.[TransactionID])
		 , tblHCMA.[TransactionID]



 

Output

LeftOuterJoinOnEquality

 

 

Left Outer Join on Like Operator

Guide

My Director who is way smarter than I joined against other tables and found out that the missing data was due to String truncation on one side of our equijoin.

One way to handle is to use the like Operator.

 

SQL – Left Outer Join on Like Operator


print 'Left Outer Join on Like Operator'
print '================================='

select 

		   [src] = 'Left Outer Join on Like Operator'

		,  [payerIDExists]
			=  case
					when (tblPC.payer_id is null) then 'No'
					else 'Yes'
			   end

		,  tblHCMA.[TransactionID]

		 , [OrderID]
			= tblPC.[Order_ID]

		, [TransactionIDLength]
			= len(tblHCMA.[TransactionID])

	    , tblHCMA.[ProductName]

		 , tblPC.payer_id	


		 , [typeofPayment]
			= tblPC.[type_of_payment]


from  [DBAnalysis].[dbo].[HitpathCourseManagerAnalysis] tblHCMA

left outer join [dbo].[PaymentCharges] tblPC

		on (

				(

						(
							    (  tblPC.[Order_ID] like tblHCMA.[TransactionID] + '%'  )
						)
				)

			)

order by
		case
			when (tblPC.payer_id is null) then 'No'
			else 'Yes'
		end

		 , len(tblHCMA.[TransactionID])
		 , tblHCMA.[TransactionID]




Output

LeftOuterJoinUsingLikeOperator

 

Query Comparison

Query Plan Comparison

 

Explanation

In terms of Query Plan Costing the join that does not rely on equijoin, but uses the like, is at 100% of the total

 

Statistics IO

StatisticsIO

Explanation

The Statistics I/O metrics are very similar.

 

In depth Query Plan Analysis

Query 3

Visual

Query3

Tabulated

 

Operator Property Value
Clustered Index Scan – HitPathCM
Estimated CPU Cost  0.0001933
Estimated IO Cost  0.0038657
Estimated Operator Cost 0.004059 (3%)
Estimated Number of Rows  33
Number of Rows  33
Constant Scan (0%)
 Actual Number of Rows  33
Compute Scaler (0%)
 Estimated Number of Rows  1
Nested Loops ( Inner Join ) (61 %)
Estimated CPU Cost  0.0502971
Estimated IO Cost  0.331273
Estimated Number of Rows  45581.9
Estimated Subtree Cost  12.2493
Nested Loops ( Left Outer Join ) (38 %)
Estimated CPU Cost 6.28757
Estimated IO Cost 0
Estimated Number of Rows 1504230
Estimated Operator Cost 7.6113377 (38%)
Estimated Subtree Cost 19.8647

 

 

Explanation

The Equijoin queries have a single Nested Join Operation.

On the other hand, the query with the Like Operator have two Nested Loops operators.

 

Summary

Equijoins enjoy the benefits of sure statistics and a lot easier on the database Engine.

On the other, like matches are more probabilistic in terms of match statistics.

While equijoins matches can be handled via indexes, some of the work for like matches have to be done in the CPU.

One thought on “Transact SQL – Handling Incomplete Strings with the “Like” Operator

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s