Transact SQL – The Time of Scaler Functions Problems

Background

It is easy to create little performance potholes in SQL.  Let us review one such path.

Environment

Create table

Let us create a table.

 


set noexec off
go

use [DBLab]
go

if schema_id('functionScaler') is null
begin

	exec('create schema [functionScaler] authorization [dbo]')

end
go

if object_id('[functionScaler].[String]') is not null
begin

	drop table [functionScaler].[String];

end
go

if object_id('[functionScaler].[String]') is not null
begin

	set noexec on

end
go

create table [functionScaler].[String]
(

	  [int]    int          not null identity(1,1)

	, [value]  varchar(600) not null

	, constraint PK_FUNCTION_SCALER__STRING
		primary key
			(
				[value]
			)

)

set noexec off
go

 

 

Create DDL – Table Value Function

Let us look for isNumeric Inline table value function.

Jeff Moden has a good name within the SQL Community and we will use his.

 


set noexec off
go

use [DBLab]
go

if object_id('dbo.IsAllDigits') is not null
begin

	set noexec on
end
go

CREATE FUNCTION dbo.IsAllDigits
/********************************************************************
 Purpose:
 This function will return a 1 if the string parameter contains only
 numeric digits and will return a 0 in all other cases.  Use it in
 a FROM clause along with CROSS APPLY when used against a table.

 --Jeff Moden
********************************************************************/
--===== Declare the I/O parameters
        (@MyString VARCHAR(8000))
RETURNS TABLE AS
 RETURN (
         SELECT CASE
                WHEN @MyString NOT LIKE '%[^0-9]%'
                THEN 1
                ELSE 0
                END AS IsAllDigits
        )
go

set noexec off
go

 

Add Data

 


set nocount on
go

use [DBLab]
go

truncate table [functionScaler].[String]
go

insert into [functionScaler].[String]
([value])
select char(rand()*26+65)+char(rand()*26+65)+char(rand()*26+65)
      +char(rand()*26+65)+char(rand()*26+65)+char(rand()*26+65)
      +char(rand()*26+65)+char(rand()*26+65)+char(rand()*26+65)
      +char(rand()*26+65)+char(rand()*26+65)+char(rand()*26+65)
      +char(rand()*26+65)+char(rand()*26+65)+char(rand()*26+65)
go 100000
;

insert into [functionScaler].[String]
([value])
select 112
go
;

 

 

Performance Review

 

Query – Base

Let us query our table.

We will initially use a straight forward isNumeric system function.

And, then Mr. Moden’s Inline table value function.

 


set statistics time off
set statistics io on
set nocount on
go

use [DBLab]
go

declare @value_as_int int
declare @value_as_bit bit

set @value_as_int = 1
set @value_as_bit = 1

select [value]
from  [functionScaler].[String] tblS
where isNumeric(tblS.[value]) = 1

print '-------------------------------------------------------------'

select [value]

from   [functionScaler].[String] tblS

		cross apply dbo.IsAllDigits(tblS.[value]) tblID

where tblID.IsAllDigits = 1

  Query Execution Plan:

isNumericCheck-System-and-inlineTableValueFunction
Statistics I/O

statisticsIO

 

Explanation:

  • Both queries pay the price of an Index Scan
  • Also, both queries have identical logical reads of 566

 

Can we attain better performance?

Can we get better performance?

Ideas that come to mind includes:

  • Try to replace the isNumeric scaler function with an Inline Table Value Function.  We have already tried that, but it does not seem to help
  • Shall we try CLR.  Presumably, Microsoft wrote isNumeric in C++ and it is not clear that, C# CLR will offer better performance

 

Index

We all know that indexes usually help.

But, unlike Oracle, Microsoft does not allow us to directly index functions.  And, so we will create a persisted computed column and index that.

 


use [DBLab]
go

/*
	Add computed column and corresponding index
*/
declare @bCreateComputedColumn bit

set @bCreateComputedColumn = 1

if (@bCreateComputedColumn = 1)
begin

	print 'Create Computed Column valueIsNumeric'

	if not exists
		(

			select *
			from   sys.columns tblSC
			where  tblSC.object_id = object_id('[functionScaler].[String]')
			and    tblSC.[name] = 'valueIsNumeric'
		)
		begin

			print 'Create Computed Column (valueIsNumeric) ...'

			-- alter table [functionScaler].[String] drop column valueIsNumeric
			alter table [functionScaler].[String]
				add [valueIsNumeric]
					as isNumeric([value]) persisted

			print 'Create Index idx_valueIsNumeric ...'

			-- drop index [functionScaler].[String].idx_valueIsNumeric
			create index idx_valueIsNumeric
			on [functionScaler].[String]
			( [valueIsNumeric] )
			include
			(
				[value]
			)

		end
	else
	begin

		print 'Computed Column valueIsNumeric Exists'

	end

	print 'Create Computed Column valueIsNumeric_AsBit_UsingCast'

	if not exists
		(

			select *
			from   sys.columns tblSC
			where  tblSC.object_id = object_id('[functionScaler].[String]')
			and    tblSC.[name] = 'valueIsNumeric_AsBit_UsingCast'
		)
		begin

			print 'Create Computed Column (valueIsNumeric_AsBit) ...'

			-- alter table [functionScaler].[String] drop column valueIsNumeric
			alter table [functionScaler].[String]
				add [valueIsNumeric_AsBit_UsingCast]
					as cast(isNumeric([value]) as bit) persisted

			print 'Create Index idx_valueIsNumericAsBit_UsingCast ...'

			-- drop index [functionScaler].[String].idx_valueIsNumeric_AsBit
			create index idx_valueIsNumeric_AsBit_UsingCast
			on [functionScaler].[String]
			( [valueIsNumeric_AsBit_UsingCast] )
			include
			(
				[value]
			)

		end
	else
	begin

		print 'Computed Column Exist valueIsNumeric_AsBit_UsingCast'

	end

end

go

Review Performance

 


use [DBLab]
go

set statistics time off
set statistics io on
set nocount on
go

print ''
print '-------------------------------------------------------------'
print 'Query IsNumeric = 1 (index = PK_FUNCTION_SCALER__STRING)'

select [value]
from  [functionScaler].[String] tblS with (index = PK_FUNCTION_SCALER__STRING)
where isNumeric(tblS.[value]) = 1

print ''
print '-------------------------------------------------------------'
print 'Query IsNumeric = 1'

select [value]
from  [functionScaler].[String] tblS
where isNumeric(tblS.[value]) = 1

/*

select [value]

from   [functionScaler].[String] tblS

		cross apply dbo.IsAllDigits(tblS.[value]) tblID

where tblID.IsAllDigits = 1

--------------------------------------------------------------------

*/

print ''
print '-------------------------------------------------------------'
print 'Query [valueIsNumeric] - value - int'

select [value]

from   [functionScaler].[String] tblS

where tblS.[valueIsNumeric] = 1

print ''
print '-------------------------------------------------------------'
print 'Query [valueIsNumeric_AsBit] - value - int'

select [value]

from   [functionScaler].[String] tblS

where tblS.[valueIsNumeric_AsBit_UsingCast] = 1

print ''
print '-------------------------------------------------------------'
print 'Query [valueIsNumeric_AsBit] - value - bit'

select [value]

from   [functionScaler].[String] tblS

where tblS.[valueIsNumeric_AsBit_UsingCast] = cast(1 as bit)

--------------------------------------------------------------------

We get surprising performance implications once we added the function indexes.

Let us quickly review our findings.

Query Index % Scan Count / Logical Reads
 1 Clustered Index Scan  98% Scan count 1, logical reads 651
 2 Index Seek – idx_valueIsNumeric_AsBit  1% Scan count 1, logical reads 3
 3 Index Seek – idx_valueIsNumeric_AsBit  1% Scan count 1, logical reads 3
 4 Index Scan – idx_valueIsNumeric_AsBit  1% Scan count 1, logical reads 3
 5 Index Scan – idx_valueIsNumeric_AsBit  1% Scan count 1, logical reads 3

 

 

Explanation:

We can clearly see that queries 2 and 5 are quite a bit better than our original scenario where we did not have a ‘backing” index for isNumeric clause

 

Source Control

GitHub

As it is a Git World, Tracking @ https://github.com/DanielAdeniji/TransactSQLScalerFunctionPerf

 

Summary

In summary, be careful about adding little helper\utility functions to your SQL code; as they can cause table scans.

There are certain design decisions that can result in needing determination functions such as isDate, isNumeric, etc.

Those design decisions include using the same column to hold different datatypes.

The resultant SQL DML Statements also often need OR Statements.  And, that often lead to much greater Scan Counts.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s