Technical: Microsoft – SQL Server – Analysis Services – Adding new dimension to an existing Cube, but “No go”

Background

Trying my best to get comfortable with Microsoft SQL Server Analysis Services; specifically Cube Browsing.

Doubt

But, I am running into doubt about my understanding and progress; as when I try to add a new dimension to an existing cube, I am arriving at repeating rows.

Here is a screen shot:

RepeatingDataInCubeBrowser - v2

 

Google

Goggled, but honestly did not know what to Google for.  To properly search the web, one needs a bit of contextual jargon and as I said I am so new at this, I do not even know what to search on.

Leads

After much stumbling, I finally settled on a few leads:

  • In the Data Source View, though I have drawn a relationship between the fact table and our new introduced Dimension, the “Data Relationship” is still a missing the “ForeignKeyName”.
  • In the Cube design, Dimension Usage Tab, we do not have a defined relationship between our new Dimension and our Fact’s Measure Groups

 

Data Relation ForeignKeyName Missing

DataRelationShip-ForeignKeyName-Missing

Cube – Dimensions – Measure Groups – Missing Relationship

Cube-DimensionUsage-MeasureGroups

Problem Source

So how did we get into this problem.  Well our Dimension is based on a table that has a few String columns.

Let us substitute our actual table with one that has a similar structure:

Table Structure

In our fact table (Income), we have Sales Person 1, Sales Person Name, Address 1, Address 2, City, Country, State, and City, Postal Code, Address 1, and Address 2.

Attribute ColumnName Data Type
Sales Person ID SalesPersonID int
Sales Person Name SalesPersonName varchar(100)
Address1 Address1 varchar(100)
Address2 Address2 varchar(100)
City City varchar(300)
State State varchar(300)
Postal Code PostalCode varchar(100)
Country Country varchar(300)
Income Income currency

Imaging for a moment that our business-user will like to be able to query our system based on Postal Codes, Cities, States, and Countries.

Query Fact Attributes

There are a couple of pathways that we can use to query our fact table using the aforementioned geography attributes.

Our choices include:

Dimension Table Structure

Attribute ColumnName Data Type
Location ID LocationID  int identity(1,1)
Postal Code PostalCode  varchar(100)
City City  varchar(300)
State State  varchar(300)
Country Country  varchar(300)

DDL SQL – Dimension (dbo.DimLocation)

Here is the DDL for creating our dbo.dimLocation table.

  • Pasted below is our DDL for creating the dimLocation table
  • Our primary key is PK_DIMLocation and it is based on Location_ID; our Identity column
  • We have a unique index named idx_Unique_Country__State__City__PostalCode.   That index is on Country, State, City, and PostalCode
  • We have another index idx_originatingRecordCreationDate that tracks the dateOriginalRecordCreated column.  This index will afford us quick determination of our last batch import into our dbo.DimLocation


set noexec off
go

/*
	drop table [dbo].[DimLocation];
*/
if object_id('[dbo].[DimLocation]') is not null
begin

	set noexec on

end
go

create table [dbo].[DimLocation]
(
	[Location_ID] int not null identity(1,1)

    , 	[PostalCode] varchar(100) null

    , 	[City]	varchar(300) null

    , 	[State]	varchar(300) null

    , 	[Country] varchar(300) null

    ,   [dateOriginalRecordCreated] datetime not null

    ,   [dateInserted] datetime not null
		constraint LocationDateInserted default getutcdate()

    ,   [InsertedBy] sysname not null
 		constraint LocationInsertedBy default SYSTEM_USER

)
go

set noexec off
go

if not exists
	(
		SELECT 
			OBJECT_NAME(OBJECT_ID) AS NameofConstraint,
			SCHEMA_NAME(schema_id) AS SchemaName,
			OBJECT_NAME(parent_object_id) AS TableName,
			type_desc AS ConstraintType
		FROM  sys.objects tblObject
		WHERE tblObject.type_desc LIKE 'PRIMARY_KEY_CONSTRAINT'
		AND   tblObject.parent_object_id = object_id('dbo.DimLocation')
	)
begin

	alter table dbo.DimLocation
		add constraint PK_DIMLocation primary key 
			(
				[Location_ID]
			)

end
go

if not exists
	(
		select *
		from   sys.indexes tblIndex
		where  tblIndex.object_id = object_id('dbo.dimLocation')
		and    tblIndex.name = 'idx_Unique_Country__State__City__PostalCode'
	)
begin

   CREATE UNIQUE NONCLUSTERED INDEX [idx_Unique_Country__State__City__PostalCode] 
	ON dbo.dimLocation
	(
		  [Country] ASC
		, [State] ASC
		, [City] ASC
		, [PostalCode] ASC
	)
	with
	(
	   data_compression = page
	)
	;

end
GO

if not exists
	(
		select *
		from   sys.indexes tblIndex
		where  tblIndex.object_id = object_id('dbo.dimLocation')
		and    tblIndex.name = 'idx_originatingRecordCreationDate'
	)
begin

	create index idx_originatingRecordCreationDate
	on dbo.dimLocation
	(
		[dateOriginalRecordCreated]
	)
	with
	(
	   data_compression = page
	)
	;

end

go

set noexec off
go

It is always best to be observant of any error messages and warnings.  And, we get one here!

Image Text

Warning! The maximum key length is 900 bytes. The index 
'idx_Unique_Country__State__City__PostalCode' has maximum length of 1000 bytes. For
 some combination of large values, the insert/update operation will fail.

Image Picture

TheMaximumLengthIs900Bytes

DML SQL – Dimension (dbo.DimLocation) – Insert

Pasted below is a set of DML SQL Statements that triggers an error!


set nocount on
go

truncate table [dbo].[DimLocation]
go

while (@@trancount > 0)
begin

	rollback tran

end
go

begin tran

	insert into [dbo].[DimLocation]
	([PostalCode], [City], [State], Country, [dateOriginalRecordCreated])
	values (28201, 'Charlotte', 'NC', 'USA', getutcdate())

	insert into [dbo].[DimLocation]
	([PostalCode], [City], [State], Country, [dateOriginalRecordCreated])
	values (94203, 'Sacramento', 'NC', 'USA', getutcdate())

	insert into [dbo].[DimLocation]
	([PostalCode], [City], [State], Country, [dateOriginalRecordCreated])
	values 
	(
		  replicate('0', 100) -- Postal Code
		, replicate('CITY', 75) -- City
		, replicate('NY', 150) -- State
		, replicate('US', 150) -- Country
		, getutcdate()
	)

rollback tran

go


Error Message Image

TheIndexEntryofLength


Error Message Textual

Server: Msg 1946, Level 16, State 3, Line 13
Operation failed. The index entry of length 1000 bytes for the index 
'idx_Unique_Country__State__City__PostalCode' exceeds the 
maximum length of 900 bytes.

Long Textual Columns

Issues that arise when Indexing & querying long textual columns are voluminous.

Due to the fact that textual searches are a bit slower than numeric searches, sometimes functions are targeted at Strings and smaller datasets are derived.

The numeric and binary data can then be initially compared to shrink the potential data pool.

Microsoft SQL Server has a few functions for the aforementioned purpose:

Hashbytes

Using Hash Columns as Index Column

To help avoid the problem with index keys larger than 900 bytes, we googled and found an excellent article by Microsoft’s Bart Duncan.

Living with SQL’s 900 Byte Index Key Length Limit (by Bart Duncan)
http://blogs.msdn.com/b/bartd/archive/2011/01/06/optionsforindexedlookupsoflongvalues.aspx

From Bart’s Duncan’s article we chose option E, which reads hash the columns and index the hash.

Add Hash Columns



--Adding CountryHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.dimLocation')
		and    tblColumn.name = 'CountryHash'
	)
begin
    alter table dbo.dimLocation
	add [CountryHash] as cast(HASHBYTES('MD5', [Country]) AS VARBINARY(16)) 
           persisted
end
go

--Adding StateHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.dimLocation')
		and    tblColumn.name = 'StateHash'
	)
begin
	alter table dbo.dimLocation
		add [StateHash] as cast(HASHBYTES('MD5', [State]) AS VARBINARY(16)) 
                  persisted
end
go

--Adding CityHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.dimLocation')
		and    tblColumn.name = 'CityHash'
	)
begin
	alter table dbo.dimLocation
		add [CityHash] as cast(HASHBYTES('MD5', [City]) AS VARBINARY(16)) 
                   persisted
end
go

--Adding PostalCodeHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.dimLocation')
		and    tblColumn.name = 'PostalCodeHash'
	)
begin
     alter table dbo.dimLocation
	add [PostalCodeHash] as cast(HASHBYTES('MD5', [PostalCode]) AS VARBINARY(16))           persisted
end
go

Create Candidate Key (Unique Index) on Hash Columns

  • So the plan is to create a unique index against our newly added Hash columns
  • It needs to be a unique index for it to be considered a candidate key
  • Why Candidate key? Because we will be creating a foreign key relationship from the fact table that references it


if not exists
	(
		select *
		from   sys.indexes tblIndex
		where  tblIndex.object_id = object_id('dbo.dimLocation')
		and    tblIndex.name = 
                          'idx_Unique_CountryHash_StateHash_CityHash_PostalCodeHash'
	)
begin

	CREATE UNIQUE NONCLUSTERED INDEX
                 [idx_Unique_CountryHash_StateHash_CityHash_PostalCodeHash] 
	ON dbo.dimLocation
	(
		  [CountryHash] ASC
		, [StateHash] ASC
		, [CityHash] ASC
		, [PostalCodeHash] ASC
	)
	with
	(
		   data_compression = page
	)
	  ;

end
GO

Create Table – dbo.FactSales



set noexec off
go

/*
  drop table [dbo].[factSales];
*/
if object_id('[dbo].[factSales]') is not null
begin

   set noexec on

end
go

create table [dbo].[factSales]
(
	  [Sales_ID] bigint not null identity(1,1)

	, [PostalCode]	varchar(100) null

	, [City]  varchar(300) null

	, [State] varchar(300) null

	, [Country] varchar(300) null

	, [dateInserted] datetime not null
			constraint factSalesDateInserted default getutcdate()

	,   [InsertedBy] sysname not null
			constraint factSalesInsertedBy default SYSTEM_USER

)
go

set noexec off
go

if not exists
	(
		SELECT 
			OBJECT_NAME(OBJECT_ID) AS NameofConstraint,
			SCHEMA_NAME(schema_id) AS SchemaName,
			OBJECT_NAME(parent_object_id) AS TableName,
			type_desc AS ConstraintType
		FROM  sys.objects tblObject
		WHERE tblObject.type_desc LIKE 'PRIMARY_KEY_CONSTRAINT'
		AND   tblObject.parent_object_id = object_id('dbo.factSales')
	)
begin

	alter table dbo.factSales
		add constraint PK_Sales primary key 
			(
				[Sales_ID]
			)

end
go

set noexec off
go

--Adding CountryHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.factSales')
		and    tblColumn.name = 'CountryHash'
	)
begin
     alter table dbo.factSales
	add [CountryHash] as cast(HASHBYTES('MD5', [Country]) AS VARBINARY(16)) 
            persisted
end
go

--Adding StateHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.factSales')
		and    tblColumn.name = 'StateHash'
	)
begin
	alter table dbo.factSales
	  add [StateHash] as cast(HASHBYTES('MD5', [State]) AS VARBINARY(16)) 
            persisted
end
go

--Adding CityHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.factSales')
		and    tblColumn.name = 'CityHash'
	)
begin
	alter table dbo.factSales
	  add [CityHash] as cast(HASHBYTES('MD5', [City]) AS VARBINARY(16)) persisted
end
go

--Adding PostalCodeHash
if not exists
	(
		select *
		from   sys.columns tblColumn
		where  tblColumn.object_id = object_id('dbo.factSales')
		and    tblColumn.name = 'PostalCodeHash'
	)
begin
     alter table dbo.factSales
	add [PostalCodeHash] as cast(HASHBYTES('MD5', [PostalCode]) AS VARBINARY(16))           persisted
end
go

Create Foreign key on dbo.FactSales



if not exists
	(
		SELECT 
			[NAME],
			OBJECT_NAME(OBJECT_ID) AS NameofConstraint,
			SCHEMA_NAME(schema_id) AS SchemaName,
			OBJECT_NAME(parent_object_id) AS TableName,
			type_desc AS ConstraintType
		FROM sys.objects
		WHERE type_desc = 'FOREIGN_KEY_CONSTRAINT'
		AND   name = 'FK_CountryHash_StateHash_CityHash_PostalCodeHash'
	)
begin

	ALTER TABLE [dbo].[factSales]  
  	   WITH CHECK 
		ADD  CONSTRAINT [FK_CountryHash_StateHash_CityHash_PostalCodeHash]
		    FOREIGN KEY(
				  [CountryHash]
				, [StateHash]
			        , [CityHash]
				, [PostalCodeHash]
				)
			REFERENCES [dbo].[DimLocation]
				 (
				  [CountryHash]
				, [StateHash]
				, [CityHash]
				, [PostalCodeHash]
				 )
end

GO

SQL Server Analysis Services – Setting up Relationship

Refresh Data Source Views

  • Launched BIDS
  • Accessed Data Source Views
  • In the Editor panel, right click in the panel and choose the “Refresh..” drop-down menu

Review Relationship

Select the arrow that joins factSales and DimLocation.  Once selected, right click on your selection

EditRelationship-FactSales-DimLocation (Arrow)

Edit Relationship

Edit the relationship and you will see that the system picked up our Foreign Key relationship.

EditRelationship-FactSales-DimLocation

Rebuild Project

Let us rebuild our project.

And, we immediately run into problems.

When our foreign key contains binary data types, the system says No!

Textual Error:

Dimension Attribute [Dim Doc].[Doc Hash] : The ‘Binary’ data type is not allowed for the ‘KeyColumns’ property #0

Textual Image:

BinaryDataTypeIsNotAllowedForTheKeyColumn

Solution

We have only a few viable options:

  • Review our original Fact table and see if there is any combination of attributes or function of attributes that can be guaranteed less than 900 and still remain unique
  • Add a new column to the Fact table that we will map to the Dimension Table Primary key

Because our Fact table is so big, we have to be careful and consider the cost of implementing changes.   Areas to consider includes:

  • Do we have enough existing data to reliable generate foreign key data
  • Can the change be done in management and trackable chunks

Other Possible Problems and Error Messages

If you choose to join the Dimension and Fact Table based on non-key columns

  • When we chose our candidate key, we received a warning stating “If you select a non-key granularity attribute, the server will not be able to aggregate data properly unless you make sure that all other attributes are directly or indirectly related to it by specifying them as related attributes“.

For all Dimension attributes, check the KeyColumns attribute

Keep in mind that it is not enough to make sure that the Dimension Key’s do not contain binary columns.

You also want to make to review the keyColumns property for all attributes and make sure that they have not listed the binary column as their KeyColumns.

 

KeyColumnsForBinaryData

Summary

We were lucky as we summarily chose an existing column that appears to be unique.  You might not be.

But, sometimes all there is to telling a story is to say enough for better minds to leapfrog you!

One does not have to win or pre-suppose winning every time one writes.

 

Listening

Listening to …

Avicii ft. Aloe Blacc – Wake Me Up
http://www.youtube.com/watch?v=qXO8hJJjXRk

References

Q/A

Modeling Analysis Services Solutions

Dimension Types

Dimension Relationships

Indexing Wide Chars

2 thoughts on “Technical: Microsoft – SQL Server – Analysis Services – Adding new dimension to an existing Cube, but “No go”

    • Eddie:

      Thanks for sharing.

      Just the way I felt, as well.

      The one time I came up with what I thought was a clever solution, I end up in the deep rough..

      Couldn’t help but to share my thoughtlessness.

      Happy Halloween,

      Daniel

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s