SQL Server – Business Intelligence Development (BIDS) – Data Flow Task – Lookup Transformation – Introduction

What is Lookup Transformation?

Lookup Transformation is exactly what it sounds like.

Using columns that you designate from your Source Data Source, it draws a line to the Secondary Data Source and project columns.

Here is Microsoft’s own definition:

Link
“The Lookup transformation performs lookups by joining data in input columns with columns in a reference dataset. You use the lookup to access additional information in a related table that is based on values in common columns”.

Back End Database Platforms Supported

There are some restrictions, such as:

  1. OLE DB Provider
    • The Only 3 providers that are supported are
      • SQL Server
      • Oracle
      • DB2

 

Join Type

  • It is an Equijoin, an inner join so to speak
    • Only Source Records that have corresponding entries in the Secondary Table are returned
    • On each Source Record that have more than one Secondary Record, only one matching record is returned
  • Both singleton ( Single Column ) and Composite ( Multiple Columns ) joins  are supported
  • Comparison is usually case-sensitive, except if a Cache Manager is used to cache the dataset
    • To guide against mismatches due to case and space differences, please use
      • Conversion Functions
        • upper, lower
      • Trimming Functions
        • ltrim, rtrim, trim
    • To facilitate the adjustments to data mentioned above, please use
      • Character Map Transform
      • Derived Column Transform
  • When matching records are not found in the Lookup Table, an error is triggered.
    • This error is considered to be  a Row Level Error
    • Error can be handled using a few choices
      • Failure ( Fail Component )
      • Ignored ( Ignore failure )
      • Captured ( Redirect Row )
  • When records are found, we choose the columns to merge into the Source Table
    • This error is Column Level Error
    • Errors are handleable based on the same choices available to Row Level Error Handling

 

Business Case

OLTP tables are usually built to be very slim.

OLAP tables on the other hand are built to be fat.

 

Data Model

Drawn below are three tables.

DatabaseDiagram

 

The OLTP table is LTEDRKV.callLog, the lookup table is dbo.internationalCallingCode, and the Data Warehouse Table is LTEDRKV.callLogDW.

Our need is very preliminary and it is also track the country being called in the LTEDRKV.callLogDW.

 

Data

Data Definition Language

Function

Function – [dbo].[ufn_getPhoneNumberCountryCode]

 
if object_id('[dbo].[ufn_getPhoneNumberCountryCode]') is null
begin
 
    exec('create function [dbo].[ufn_getPhoneNumberCountryCode]() 
          returns varchar(60)
          as
          begin
            return 1
          end
          ')
 
end
go
 
alter function [dbo].[ufn_getPhoneNumberCountryCode]
(
    @phoneNumber varchar(100)
) 
returns varchar(60)
with schemabinding
as
begin
 
    return
    (
 
        case
                when (@phoneNumber is null) then null
                when charindex('-', @phoneNumber) = 0 then null
                else left(
                              @phoneNumber
                            , charindex('-', @phoneNumber) -1
                         )
        end
 
    )
 
end    
 
go
 
grant execute [dbo].[ufn_getPhoneNumberCountryCode] on [public]
go
 

 

Table

Table – [dbo].[internationalCallingCode]

 

 
set noexec off
go
 
if object_id('[dbo].[internationalCallingCode]') is not null
begin
    set noexec on
end
go
 
create table [dbo].[internationalCallingCode]
(
 
      [country]                varchar(60) not null
    , [countryCallingCode]  varchar(6)  not null
 
    , [dateAdded]   datetime not null
                        default getdate()
 
    , constraint [PK_InternationalCallingCode]
            primary key
            (
                  [country]    
                , [countryCallingCode]
            )
 
)
go
 
set noexec off
go
 

 

 

 

Table – [LTEDRKV].[callLog]

 

 

 
set noexec off
go
 
if schema_id('LTEDRKV') is null
begin 
 
    exec('create schema [LTEDRKV] authorization [dbo]') 
 
end 
go
 
if object_id('[LTEDRKV].[callLog]') is not null
begin
    set noexec on
end
go
 
create table [LTEDRKV].[callLog]
(
 
      [callID] bigint not null
            identity(1,1)
 
    , [phoneNumber] varchar(60) not null
 
    , [dateAdded]   datetime not null
                        default getdate()
 
    , constraint [PK_CallLog]
            primary key
            (
                  [callID]    
            )
 
)
go
 
set noexec off
go
 
if not exists
    (
        select 1
        from   sys.columns tblSC
        where  tblSC.[object_id] = object_id('[LTEDRKV].[callLog]')
        and    tblSC.[name] = 'phoneNumberCountryCode'
    )
begin
 
    alter table [LTEDRKV].[callLog]
        add [phoneNumberCountryCode] 
            as [dbo].[ufn_getPhoneNumberCountryCode]
            (
                [phoneNumber]
            )
            persisted
end
go
 

 

Table – [LTEDRKV].[callLogDW]

 

 

 
use [DBLab]
go
 
/*
 
    drop table [LTEDRKV].[callLogDW]
 
*/
 
set noexec off
go
 
if schema_id('LTEDRKV') is null
begin 
 
    exec('create schema [LTEDRKV] authorization [dbo]') 
 
end 
go
 
if object_id('[LTEDRKV].[callLogDW]') is not null
begin
    set noexec on
end
go
 

 
create table [LTEDRKV].[callLogDW]
(
      [callID]                   bigint not null
            
    , [phoneNumber]               varchar(60) not null
 
    , [phoneNumberCountryCode] varchar(6) not null
    , [country]                   varchar(60) not null
 
    , [dateAdded]               datetime not null
 
    , constraint [PK_CallLogDW]
            primary key
            (
                  [callID]    
            )
 
                        
)
go
 
set noexec off
go

 

 

Business Intelligence Development (BIDS)

Designer

Lookup

OLE DB Source Editor

The OLE DB Source Editor has 3 tabs. And, those are the Connection Manager, Columns, and Error Output.

Here is a brief outline of the purpose of each tab.

  1. Connection Manager
    • Reference or shall we say a pointer to one of the Connection Managers defined for the Project
    • Data Access mode
      • Our choices are
        • Table or view
          • Name of the table or view
        • SQL
      • In our case it is LTEDRKV.callLog
  2. Columns
    • Available External Columns
      • All the Columns defined in the External Columns are listed
      • Optimize performance by unchecking uneeded columns
      • In our case we choose all the columns ( callID,  phoneNumber, dateAdded, phoneNumberCountryCode )
  3. Error Output
    • Excception Handling Block
      • Data read from Source / External into Internal Storage
        • If truncation error occurs, how should it be handled
          • Fail?
OLE DB Source Editor – Connection Manager

OLEDBSourceEditor-ConnectionManager

OLE DB Source Editor – Columns

OLEDBSourceEditor-Columns

 

OLE DB Source Editor – Error Output

OLEDBSourceEditor-ErrorOuput

 

 

Lookup Transformation Editor

In the Lookup Transformation Editor we perform an inner join against our Lookup Table ( dbo.InternationalCallingCode )

This Editor has 3 Tabs, as well.

The Tabs are named – Reference Table,  Columns, and Advanced.

Here is a brief outline of the purpose of each tab.

  1. Reference Table
    • Details on the Referenced Table
      • The Connection Manager that houses it
      • And, the Table\View or SQL Statement
      • In our case it is dbo.InternationalCallingCode
  2. Columns
    • Define the Joining Columns as well as the Columns to Project from the Referenced Table
      • Review Data Model and see if Foreign Key Relationships are defined
      • Again, optimize by unchecking columns from the Referenced Table
      • In our case we are joining on the columns below:
        • LTEDRKV.callLog.phoneNumberCountryCode <> dbo.InternationalCallingCode countryCallingCode
    • Projected Columns
      • dbo.InternationalCallingCode.country
  3. Advanced
    • Enable Memory Restriction

 

LookupTransformationEditor-ReferenceTable

 

Columns

LookupColumns

 

Advanced

LookupTransformationEditor-Advanced

 

 

Lookup Transformation Editor – Is Data Captured in Data WareHouse Table

In the Lookup Transformation Editor we perform an inner join against our Destination Table ( LTEDRKV.callLogDW )

In cases where corresponding records are not found in the Lookup Table, we redirect them.

This Editor has 3 Tabs, as well.

The Tabs are named – Reference Table,  Columns, and Advanced.

Here is a brief outline of the purpose of each tab.

  1. Reference Table
    • Details on the Referenced Table
      • The Connection Manager that houses it
      • And, the Table\View or SQL Statement
      • In our case it is LTEDRKV.callLogDW
  2. Columns
    • Define the Joining Columns as well as the Columns to Project from the Referenced Table
      • Review Data Model and see if Foreign Key Relationships are defined
      • Again, optimize by unchecking columns from the Referenced Table
      • In our case it is LTEDRKV.CallLog.CallID <->LTEDRKV.callLogDW.CallID
    • Projected Columns
      • Nothing Defined
  3. Advanced
    • Enable Memory Restriction
  4. Configure Error Output
    • Input Output :- Lookup Output
    • Column :- N/A
    • Error :-
      • Ignore Failure
      • Redirect row
      • Failure Component
      • In our case, we chose “Redirect Row
    • Truncation :- N/A

 

Reference Table

ReferenceTable

 

 

Columns

ReferenceTable-Columns

 

Advanced

ReferenceTable-Advanced

 

Configure Error Output

ConfigureErrorOutput-20161029-0805AM

 

 

Lookup Error Output

In the preceding we join LTEDRKV.callLog against LTEDRKV.callLogDW.

When matching data is not found, they are placed in an ERROR bucket.

 

 

 

Data Flow Path Editor

This Editor has 3 Tabs, as well.

The Tabs are named – Reference Table,  Columns, and Advanced.

Here is a brief outline of the purpose of each tab.

  1. General
    • SourceName :- In our case it is Lookup Error Output
    • DestinationName :- In our case it is OLE DB Destination Input
  2. Metadata
    • Summary for the columns
      • Column Name
      • Datatype
      • Length
      • Source Component

 

General

DataFlowPathEditor-CallLogDW-General

Metadata

DataFlowPathEditor-CallLogDW-Metadata

 

 

 

OLE DB Destination Editor

The OLE DB Destination Editor has 3 tabs. And, those are the Connection Manager, Columns, and Error Output.

Here is a brief outline of the purpose of each tab.

  1. Connection Manger
    • Reference or shall we say a pointer to one of the Connection Managers defined for the Project
    • Data Access mode
      • Our choices are
        • Table or view ( Fast Load )
        • Name of the table or view
        • SQL
      • In our case it is LTEDRKV.callLogDW
  2. Columns
    • Available External Columns
      • All the Columns defined in the External Columns are listed
      • Optimize performance by unchecking uneeded columns
      • In our case we choose all the columns ( callID,  phoneNumber, dateAdded, phoneNumberCountryCode )
  3. Error Output
    • Exception Handling Block
      • Data read from Source / External into Internal Storage
        • If truncation error occurs, how should it be handled
          • Fail?

 

 

Connection Manager

ConnectionManager

 

Mappings

Map each Source Column unto the Destination Column

Mappings

 

 

Data Flow Path Editor

dataflowtask-calllog

Summary

Our sample SSIS Project is pretty simple.

We retrieve transactional data from callLog table and join that table against our Lookup table.

We attempt a join against our Lookup table.

When data is not find, we insert the missing rows into the callLogDW.

 

Reference

  1. Lookup Transformation
    • Integration Services > Data Flow > Integration Services Transformations
      Link
  2. SSIS Team Blog
    • Matt Masson
      • Lookup Pattern: Case Insensitive
        Link
      • Lookup – Using the cache connection manager
        Link

One thought on “SQL Server – Business Intelligence Development (BIDS) – Data Flow Task – Lookup Transformation – Introduction

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s