Hadoop – Sqoop – Command – Export Data (from HDFS to Microsoft SQL Server)

Introduction

Having fun with Hadoop; specifically exporting data with Sqoop.

Pre-requisites

Hopefully, you have installed\validated that Hadoop\Sqoop is installed and running properly.

If not, please read Technical: Hadoop – Sqoop on Cloudera (CDH) – Is Sqoop Set up and Configured for MS SQL Server (https://danieladeniji.wordpress.com/2013/05/03/technical-hadoop-sqoop-on-cloudera-cdh/ )

Generate Sample Data

Data, data everywhere but none to share without running afoul of regulatory overlords.

So took to the web and googled for Sample data.

Got a few hits and chose to go with generateData.com (http://www.generatedata.com/#generator)

Sample Data – generateData.com

Chose to generate a pretty generic dataset consisting of :

  • CustomerID 
  • CustomerName
  • Address
  • City
  • State
  • ZipCode
  • EmailAddress
  • PhoneNumber

GenerateDataSample

Other Choices:

  • Result Type –> CSV
  • Country Specific Data –> US
  • Number of results –> 5000
  • CSV options –> Delimiter Characters (|)

Generated data and save unto working machine as randomData.csv

Copy Sample Data to Hadoop \ HDFS Node

Copied Data from desktop to Hadoop Node


Syntax:

   scp <file-name> username@<hadoop-hdfs-node>:/folder

Sample:

   scp /downloads/randomdata.csv dadeniji@hadoop-node:/tmp

Copied Data to HDFS File System

Syntax:


 sudo -u hdfs hadoop fs -mkdir hdfs://hadoopHDFSNode:portNumber/folder

 suo -u hdfs hadoop fs -put \
    /tmp/randomdata.csv \
    hdfs://hadoopHDFSNode:portNumber/folder 

Sample:

 --make folder /sampleData/GenerateData.com/customerList
 sudo -u hdfs hadoop fs -mkdir "/sampleData/GenerateData.com/customerList"

 --copy data from local /tmp/randomdata.csv 
 --to hdfs system hadoopHDFS
 -- hdfs/sampleData/GenerateData.com/customerList

 sudo -u hdfs hadoop fs -put \
      /tmp/randomdata.csv \
     "hdfs://hadoopHDFS:8020/sampleData/GenerateData.com/customerList" 

Export Hadoop \ HDFS file to Database Table

Export Hadoop \ HDFS file to database table


Syntax:

   sudo -u hdfs sqoop export \
  --connect  "jdbc:sqlserver://&lt;DBServer&gt;;database=&lt;database&gt;"  \
  --username dbUsername \
  --password dbPassword \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --export-dir  \  
      hdfs://URL:PortNumber/&lt;file-name&gt; \
  --table "&lt;table-name&gt;" \
  --fields-terminated-by "&lt;field-terminator&gt;"  

Sample:

   sudo -u hdfs sqoop export \
  --connect  "jdbc:sqlserver://DBLAB;database=DEMO"  \
  --username "sammie" \
  --password "p1smith" \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --export-dir  \  
'hdfs://hdfsNode:8020/sampleData/GenerateData.com/customerList/randomdata.csv' \
  --table "dbo.customerList" \
  --fields-terminated-by "|"  

Error  - Remediation

Errors  - Invalid object name

Text Capture:

INFO manager.SqlManager: Executing SQL statement: 
SELECT t.* FROM dbo.customerList AS t WHERE 1=0

ERROR manager.SqlManager: Error executing statement: 
com.microsoft.sqlserver.jdbc.SQLServerException: 
Invalid object name 'dbo.customerList'.
com.microsoft.sqlserver.jdbc.SQLServerException: 
Invalid object name 'dbo.customerList'.

Screen Dump:

Hadoop - Sqoop - Invalid object Name

Fix:
Please create corresponding database table beforehand.

Errors  - Extra\Blank lines at the bottom of the HFDS Data feed

Text Capture:


java.io.IOException: Can't export data, please check task tracker logs
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
	at java.util.ArrayList$Itr.next(ArrayList.java:794)
	at dbo_customerList.__loadFromFields(dbo_customerList.java:355)
	at dbo_customerList.parse(dbo_customerList.j

Screen Dump:

Hadoop - Sqoop - Import - Exception - NoSuchElementException

Fix:
"Screen" datafeed and make sure there are no header nor footer lines.

2 thoughts on “Hadoop – Sqoop – Command – Export Data (from HDFS to Microsoft SQL Server)

  1. Hi Daniel ,

    I am facing the “Errors – Extra\Blank lines at the bottom of the HFDS Data feed” issue as you have describe and I have checked my data input and it does not contain any header or footer field and still I am facing the same issue.

    Can you please help me out ?

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s