Microsoft – SQL Server – Integration Services – Handling Unicode\NonUnicode Data mismatch

Microsoft – SQL Server – Integration Services – Handling Unicode\NonUnicode Data mismatch

As we attempted to deploy code from Dev to Production, we ended up with SSIS packages that could not be moved from QA to Production.

The problem ended up being related to mismatches between QA and Production unicode settings.

First we need to determine whether we have a “core” problem of mismatch in the Unicode Support Settings between the various environments.

To determine an Instance NLS_CHARACTERSET Setting, issue:

select value
from   nls_database_parameters
where  parameter = 'NLS_CHARACTERSET'

In our case, we received ‘conflicting’ settings:

  1. In QA, WE8MSWIN1252 was returned
  2. In Production, AL32UTF8 was returned

There are quite a few different approaches that we can try to insulate against differences in character set settings:

  1. One approach will be to isolate all differences to the back end database.  In essence, define somewhat rigid datatypes for all columns on the back end server and use Vendor’s native database conversion functions to convert individual interpretations to our ‘template’ versions.
  2. In Microsoft SQL Server Business Intelligence Studio (BIDS), attempt to address ‘incompatible’ datasets
  3. ‘Firewall’ back end database character differences by setting rigid Client datasets

Vendor’s Native Database Conversion Function

Because of the differences in unicode\non-unicode, we tried to use a couple of Unicode functions:

  1. Compose – Convert a string in any data type to a Unicode string
  2. Decompose – Converts a unicode string to a string

References:

  1. Felipe Cruz – Oracle String Function (Oracle v 10.1)
    http://www.felipecruz.com/oracle-string-functions-version-10g.php

But, the function that seems to actually help the most is the Oracle Convert function:

PSOUG.org (http://psoug.org/snippet/CONVERT_628.htm)

Syntax:

select convert(<column>, <destination char set>, <source char set>)

Example:

select convert('Ä Ê Í Ó Ø A B C D E ', 'US7ASCII', 'WE8ISO8859P1')

Further readings suggests that we can also consider using (UTL_I18N):

UTL_I18N provides a set of services that provides additional globalization.

To convert BUYER_NAME to character set WE8MSWIN1252 (non-unicode):

utl_i18n.raw_to_char(utl_raw.cast_to_raw(BUYER_NAME),'WE8MSWIN1252' )

To convert BUYER_NAME to character set WE8MSWIN1252 (unicode):

utl_i18n.raw_to_nchar(utl_raw.cast_to_raw(BUYER_NAME),'WE8MSWIN1252' )

References:

  1. Thread: Converting between CP1252 and Unicode in a 8859-1 database
    http://forums.oracle.com/forums/thread.jspa?messageID=1764784
  2. Oracle Database PL/SQL Packages & Type References – DBMS_CRYPTO 
    http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_crypto.htm
  3. Oracle Database PL/SQL Packages & Type References – DBMS_CRYPTO
    http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/u_i18n.htm

Additional References ( SQL Functions):

  1. SQL Functions (O’Reilly)
    http://oreilly.com/catalog/sqlnut/chapter/ch04.html

Microsoft SQL Server – Business Intelligence Studio

Setting up properties (Designer -> SSIS Package):

  1. AlwaysUseDefaultCodePage is true
  2. ValidateExternalMetadata is False

AlwaysUseDefaultCodePage:

Set “AlwaysUseDefaultCodePage” to true if you get an error message stating “Cannot retrieve the column code page info from the OLE DB provider (http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/Q_24776643.html)

If speed were important these additional links may be useful:

  1. Moving Large Amounts of Data Between Oracle and SQL Server: Findings and Observation
    http://sqlcat.com/technicalnotes/archive/2008/08/09/moving-large-amounts-of-data-between-oracle-and-sql-server-findings-and-observations.aspx

Setting OS System Parameters (Oracle Specific)

Setting OS System Parameters (Oracle Specific) – NLS_LANG

  1. Setting NLS parameters on multiple db instance
    http://www.mattreinfeldt.com/ubbtattach/2-unicode.txt

References:

  1. What is Unicode – Part 1
    http://www.toadworld.com/BLOGS/tabid/67/EntryId/385/What-Is-Unicode-Part-1.aspx
  2. Remedy\Oracle – Unicode
    http://www.mattreinfeldt.com/ubbtattach/2-unicode.txt
  3. Oracle – String Functions
    http://www.felipecruz.com/oracle-string-functions-version-10g.php
  4. O’Reilly – SQL in a nutshell
    http://oreilly.com/catalog/sqlnut/chapter/ch04.html
  5. Exporting Unicode String to Oracle Database – SSIS Conversion Error
    http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/Q_24776643.html
  6. Cannot convert between Unicode and Non-Unicode String Data Types
    http://mohammedfauzi.blogspot.com/2009/08/cannot-convert-between-unicode-and-non.html
  7. Features of OraOLEDB
    http://download.oracle.com/docs/cd/B10501_01/win.920/a95498/using.htm#1010255
  8. Moving large amounts of Data between Oracle & SQL Server – Findings and Observations
    http://sqlcat.com/technicalnotes/archive/2008/08/09/moving-large-amounts-of-data-between-oracle-and-sql-server-findings-and-observations.aspx
  9. PLSQL – Convert Function
    http://psoug.org/snippet/CONVERT_628.htm
  10. Thread: Converting between CP1252 and Unicode in a 8859-1 database
    http://forums.oracle.com/forums/thread.jspa?messageID=1764784
  11. Network Steve
    http://www.networksteve.com/enterprise/topic.php/cannot_convert_between_unicode_and_non-unicode/?TopicId=19837&Posts=8

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s