Technical: Hadoop/Cloudera (v4.2.1) – Installation on CentOS (32 bit [x32]) – HBase

Introduction

For the Cloudera Distribution of Hadoop (CDH), HBase is bundled as a set of RPMs, and so the install itself is straightforward.

Once installed, there are some post-install configuration steps. Let us see how things work out.

Installation – HBASE (Base)

Introduction

Let us review the install binary.  The package is eponymously named hbase.

Review Package Binary

Let us issue “yum info” to review the package.

Syntax:
   
   yum info <package-name>

Actual:
   yum info hbase

Output:

yum info hbase (screen dump)

Explanation:

  • A couple of things stand out. First, the architecture is i686.
  • The version is 0.94
  • And it is part of CDH4

So just about everything is good and matches our current base install.  The one thing I am not quite sure about is i686.
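
One way to settle the i686 question is to compare the package architecture with what the running kernel reports. Here is a minimal sketch, assuming a POSIX-style shell; the helper name is_32bit_x86 is hypothetical and is not part of yum or rpm.

```shell
# Hypothetical helper: succeeds when the given architecture string
# names a 32-bit x86 variant (i386 through i686).
is_32bit_x86() {
    case "$1" in
        i[3-6]86) return 0 ;;
        *)        return 1 ;;
    esac
}

# Ask the kernel which architecture we are running on.
machine_arch="$(uname -m)"
if is_32bit_x86 "$machine_arch"; then
    echo "Machine reports $machine_arch (32-bit x86), the same family as the i686 package."
else
    echo "Machine reports $machine_arch; compare it with the Arch field of 'yum info hbase'."
fi
```

On a 32-bit CentOS install, uname -m usually reports i686, which lines up with the Arch field shown by yum info.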

Dependency Info

Let us issue “yum deplist” to review the package’s dependencies.

Syntax:

   yum deplist --nogpgcheck  <package-name>

Sample:

   yum deplist --nogpgcheck  hbase

Output:

yum -- hbase -- dependency check

Installing HBASE (Base)

Introduction

Here are the actual RPM install steps.



sudo yum install hbase

Output:

yum -- hbase -- install rpm

I feel like Doug Flutie in ’84: I threw up a Hail Mary, thinking it would not install on a 32-bit system, but it did.

Review RPM Installed Files

Introduction

Use “rpm -ql <package>” to review installed files.

Syntax:

   rpm -ql <package-name>

Sample:
   rpm -ql hbase

Review – shell (.sh)


Syntax:

   rpm -ql <package-name> | grep -i "sh"

Sample:
  rpm -ql hbase | grep -i "sh"

Here are the Shell (Unix Bash shell and ruby) files.

Output:

yum -- hbase -- ql -- shell

Review – Configuration files (.xml)

Configuration data are usually tucked away in XML files.


Syntax:

   rpm -ql <package-name> | grep -i "xml"

Sample:
  rpm -ql hbase | grep -i "xml"

Here are the XML files.

Output:

yum -- hbase -- ql -- xml

 

Review – Java Jar files


Syntax:

   rpm -ql <package-name> | grep -i "jar"

Sample:
  rpm -ql hbase | grep -i "jar"

Here are the Java Jar files.

Output:

 Jar File                                         Purpose
 /usr/lib/hbase/hbase.jar                         HBase
 /usr/lib/hbase/lib/avro-1.7.3.jar                Avro: data serialization system
 /usr/lib/hbase/lib/httpclient-4.1.3.jar          HTTP client
 /usr/lib/hbase/lib/jetty-6.1.26.cloudera.2.jar   Jetty: pure Java HTTP server and Java Servlet container
 /usr/lib/hbase/lib/libthrift-0.9.0.jar           Apache Thrift: framework for scalable cross-language services development
 /usr/lib/hbase/lib/protobuf-java-2.4.0a.jar      Protocol Buffers: Google's data interchange (serialization) format
 /usr/lib/hbase/lib/slf4j-api-1.6.1.jar           SLF4J: Simple Logging Facade for Java, an abstraction over various logging frameworks
 /usr/lib/hbase/lib/snappy-java-1.0.4.1.jar       Snappy: compression
 /usr/lib/hbase/lib/zookeeper.jar                 ZooKeeper

Installation – HBASE (Master)

Introduction

This package installs the HBase Master service.

Let us review the install binary.  The package’s name is hbase-master.

Review Package Binary

Let us issue “yum info” to review the package.

Syntax:

   yum info <package-name>

Actual:

   yum info hbase-master

Output:

Hadoop - HBase - Master - yuminfo

Explanation:

  • The architecture is also i686.
  • The version is 0.94
  • And, it is part of the CDH4

Installing HBASE (Master)

Introduction

Here are the actual RPM install steps.


Syntax:
   sudo yum install <package-name>

Sample:
   sudo yum install hbase-master

Output:

 Hadoop - HBase - Master - rpm install

Review RPM Installed Files (HBase-Master)

Introduction

Use “rpm -ql <package>” to review installed files.

Syntax:

   rpm -ql <package-name>

Sample:
   rpm -ql hbase-master

Output:

Hadoop - Hbase - Master - File List

Explanation:

  • The lone file installed is /etc/rc.d/init.d/hbase-master
  • As this file is in the /etc/rc.d/init.d folder it is a service initialization script
  • The file is a Bash shell script and it is easy enough to read and follow

Installing – Zookeeper

Introduction

This package installs the ZooKeeper service.

Let us review the install binary.  The package’s name is zookeeper-server.

Review Package Binary

Let us issue “yum info” to review the package.

Syntax:

   yum info <package-name>

Actual:

   yum info zookeeper-server

Output:

Hadoop - Zookeeper - Master -- yuminfo

Explanation:

  • The architecture is noarch.
  • The version is 3.4.5+16
  • And, it is part of the CDH4

Install Zookeeper

Introduction

Here are the actual RPM install steps.


sudo yum install zookeeper-server

Output:

Hadoop - Zookeeper - Install - Log

Review Zookeeper

Introduction

Use “rpm -ql <package>” to review installed files.

Syntax:

   rpm -ql <package-name>

Sample:
   rpm -ql zookeeper-server

Output:

Hadoop - Zookeeper - rpm - review

Explanation:

  • The lone file installed is /etc/rc.d/init.d/zookeeper-server
  • As this file is in the /etc/rc.d/init.d folder it is a service initialization script

Installing – HBase – Region Server

Introduction

This package installs the HBase Region Server service.

Let us review the install binary.  The package’s name is hbase-regionserver.

Review Package Binary

Let us issue “yum info” to review the package.

Syntax:

   yum info <package-name>

Actual:

   yum info hbase-regionserver

Output:

Hbase - RegionServer -- yumInfo

Explanation:

  • The architecture is i686.
  • The version is 0.94
  • And, it is part of CDH4

Installing – HBase – Region Server

Introduction

Here are the actual RPM install steps.


sudo yum install hbase-regionserver

Output:

Hadoop - Hbase - RegionServer - Install - Log (v2)

Review Hbase Region-Server

Introduction

Use “rpm -ql <package>” to review installed files.

Syntax:

   rpm -ql <package-name>

Sample:
   rpm -ql hbase-regionserver

Output:

Hadoop - Hbase - RegionServer - rpm - review

Explanation:

  • The lone file installed is /etc/rc.d/init.d/hbase-regionserver
  • As this file is in the /etc/rc.d/init.d folder it is a service initialization script

CDH Services (HBase)

Prepare Inventory of CDH Services

Service List

Here is our expected Service List.

Component             Service Name
HBase Master          hbase-master
HBase ZooKeeper       zookeeper-server
HBase Region Server   hbase-regionserver

Using chkconfig to list the Hadoop HBase & ZooKeeper services


Syntax:

   # list all services
   sudo chkconfig --list 

   # list specific services, based on name
   sudo chkconfig --list | grep -i <service-name>

Sample:

   sudo chkconfig --list | egrep -i "hbase|zoo"

Output:

Hadoop - HBase - Services - List
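
The chkconfig listing shows one column per runlevel, so it is easy to misread which services will actually come up at boot. Here is a small sketch of a filter that keeps only the services switched on at a chosen runlevel. The function name services_on_at_runlevel is hypothetical, and I am assuming the usual "name 0:off 1:off 2:on ..." layout of chkconfig --list output.

```shell
# Hypothetical helper: read `chkconfig --list` output on stdin and print
# the names of services that are "on" at the given runlevel (default 3).
services_on_at_runlevel() {
    local level="${1:-3}"
    awk -v want="${level}:on" '{
        # Field 1 is the service name; the rest are "<runlevel>:<state>".
        for (i = 2; i <= NF; i++)
            if ($i == want) { print $1; break }
    }'
}
```

A possible use is: sudo chkconfig --list | egrep -i "hbase|zoo" | services_on_at_runlevel 3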

CDH Services And Network Ports (Prior HBase)

Here is the list of ports used by the Hadoop core services, prior to starting HBase.

Using netstat:



Syntax:

   # list listening TCP sockets, along with the owning program
   sudo netstat -ltp

Sample:

   sudo netstat -ltp

Output:

netstat -ltp

Using the Hadoop Default Ports quick reference ( http://blog.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/ ), we are able to map well-known network ports to applications:

Port    Application
50010   Hadoop – HDFS – Data Node
50020   Hadoop – HDFS – Data Node
50030   Hadoop – Map Reduce – Job Tracker
50060   Hadoop – Map Reduce – Task Tracker
50070   Hadoop – HDFS – Name Node
50075   Hadoop – HDFS – Data Node
50090   Hadoop – HDFS – Secondary Name Node
58160   Hadoop – MapReduce

 

Post Installation – Configuration

Introduction

Let us review the configuration files and determine whether there is anything we need to change.

Review HBASE Configuration files

Here are the HBASE Configuration files:

  • /etc/hbase/conf.dist/hbase-policy.xml
  • /etc/hbase/conf.dist/hbase-site.xml

hbase-policy.xml

HBASE Configuration files – hbase-policy.xml

As the name indicates, the file contains Policy data.  By policy, we mean security policy.

Policy                               Use
security.client.protocol.acl         ACL for clients talking to HBase Region Server
security.admin.protocol.acl          ACL for HMaster Interface protocol implementation – clients talking to HMaster for admin operations
security.masterregion.protocol.acl   Region Servers communicating with HMaster Server

hbase-site.xml

Introduction

So the full file name for hbase-site.xml is /etc/hbase/conf.dist/hbase-site.xml

Let us review the current contents

Delivered Configuration

Item                        Value
hbase.cluster.distributed   true
hbase.rootdir               hdfs://myhost:port/hbase

Entry – hbase.cluster.distributed

  • Set hbase.cluster.distributed to true

Entry – hbase.rootdir

  • Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file); you may also need to change the port number from the default (8020).
  • In CDH4, there are two core-site.xml files — /etc/hadoop/conf.empty/core-site.xml and /etc/hadoop/conf.pseudo.mr1/core-site.xml
  • The /etc/hadoop/conf.empty/core-site.xml is empty
  • On the other hand, /etc/hadoop/conf.pseudo.mr1/core-site.xml has data in it:

    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
    </property>

Let us use hdfs://localhost:8020/hbase as hbase.rootdir.

Configuration – Post Changes

Here are the post changes.

Item                        Value
hbase.cluster.distributed   true
hbase.rootdir               hdfs://localhost:8020/hbase
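
For reference, here is roughly what those two entries look like once written out as Hadoop-style property elements. This is only a sketch: the function render_hbase_site is hypothetical, and it prints to standard output rather than touching /etc/hbase/conf.dist/hbase-site.xml.

```shell
# Hypothetical helper: print an hbase-site.xml that holds the two
# settings from the table above. Arguments: NameNode host and port.
render_hbase_site() {
    local host="${1:-localhost}" port="${2:-8020}"
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://${host}:${port}/hbase</value>
  </property>
</configuration>
EOF
}

# Print the configuration for the pseudo-distributed setup used in this post.
render_hbase_site localhost 8020
```

It is probably safest to redirect the output to a scratch file and compare it with the delivered hbase-site.xml before changing anything under /etc/hbase.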

Configuration – Post Changes

Hadoop - Hbase - hbase-site (20130521 0246PM)

Post Installation – Create HDFS File System

Introduction

In Hadoop, all persisted data is ultimately backed by HDFS.

Let us create the HBase folder in HDFS and set its file system permissions.

 

HDFS – HBase Folder – Create Folder

Create HDFS folder /hbase


Syntax:

    sudo -u hdfs hadoop fs -mkdir <folder>

Sample:

   sudo -u hdfs hadoop fs -mkdir /hbase

HDFS – HBase Folder – Permission – Review (Post Folder Creation)

Review /hbase permissions:


Syntax:

    sudo -u hdfs hadoop fs -ls -d <folder>

Sample:

    sudo -u hdfs hadoop fs -ls -d /hbase

Output:

Hadoop - Hbase - Folder Permissions (initial)

Explanation:

  • Issuing “hadoop fs -ls -d” against /hbase returns what we expected.  The folder’s owner is hdfs, and its group is supergroup.
  • The folder’s owner has full permissions (rwx)
  • The owner’s group has read and execute permissions (r-x)
  • Others also have read and execute permissions (r-x)

HDFS – HBase Folder – Permission – Change Ownership

As we created our new folder as the hdfs user, we need to go in and change its owner to hbase.

This ownership change gives the HBase services full control over this folder (/hbase), which is no problem as the folder is fully dedicated to HBase.


Syntax:

    sudo -u hdfs hadoop fs -chown <owner> <folder>

Sample:

    sudo -u hdfs hadoop fs -chown hbase /hbase

HDFS – HBase Folder – Permission – Review

Review /hbase permissions:


Syntax:

    sudo -u hdfs hadoop fs -ls -d <folder>

Sample:

    sudo -u hdfs hadoop fs -ls -d /hbase

Output:

Hadoop - Hbase - Folder Permissions (change folder owner)
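
If you would rather not eyeball the listing, the owner, group and permission bits can be pulled out of the line. Here is a sketch, assuming the usual eight-column layout of hadoop fs -ls output (permissions, replication, owner, group, size, date, time, path). The helper name hdfs_owner_summary is hypothetical.

```shell
# Hypothetical helper: read `hadoop fs -ls -d` output on stdin and print
# "<owner>:<group> <permissions>" for each entry line.
# Lines with fewer than 8 fields (e.g. "Found 1 items") are skipped.
hdfs_owner_summary() {
    awk 'NF >= 8 { print $3 ":" $4, $1 }'
}
```

A possible use is: sudo -u hdfs hadoop fs -ls -d /hbase | hdfs_owner_summary. After the chown above, I would expect the owner part to read hbase.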

Post Installation – Zookeeper – Init Data Directory

Init ZooKeeper Data Directory


sudo service zookeeper-server init

If Zookeeper Data directory is already initialized, and you try to re-init it, you will get an error message.



Zookeeper data directory already exists at /var/lib/zookeeper ( or use 
--force re-initialization)

Check Services and Applications

Check

Let us quickly check which Hadoop (Java) processes we have running.


Syntax:
    sudo jps

Sample:

   sudo /usr/java/jdk1.7.0_21/bin/jps

Output:

jps - before hadoop - hbase

 

Initiate Hadoop/HBase Services – ZooKeeper-Server

Start ZooKeeper

Let us start ZooKeeper.


Syntax:
   sudo service <service-name> start

Sample:

   sudo service zookeeper-server start

Output (Text):



JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
ZooKeeper data directory is missing at /var/lib/zookeeper fix the path or run initialize

Output (Screen Dump):

Hadoop - ZooKeeper Data DIrectory is missing

Review ZooKeeper Configuration File

Nice.

Zookeeper configuration file is /etc/zookeeper/conf/zoo.cfg

Here is our current configuration:

Item         Value
dataDir      /var/lib/zookeeper
clientPort   2181

Review ZooKeeper Data Directory


ls -la /var/lib/zookeeper

Output:

Hadoop - Zookeeper - dataDir (initial)

Init ZooKeeper Data Directory


sudo service zookeeper-server init

Output:


No myid provided, be sure to specify it in /var/lib/zookeeper/myid
if using non-standalone

If Zookeeper Data directory is already initialized, and you try to re-init it, you will get an error message.



Zookeeper data directory already exists at /var/lib/zookeeper ( or use 
--force re-initialization)
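
Since a second init is refused, a script that sets up ZooKeeper may want to check first. The sketch below treats a missing or empty data directory as "not yet initialized". That is my own rule of thumb, not necessarily the test the init script itself applies. The helper name zk_datadir_needs_init is hypothetical, and the snippet only prints what it would do.

```shell
# Hypothetical helper: succeed when the directory is missing or empty,
# i.e. when an init still looks necessary (heuristic, see lead-in).
zk_datadir_needs_init() {
    local dir="$1"
    [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# dataDir as given in /etc/zookeeper/conf/zoo.cfg; override via ZK_DATADIR.
ZK_DATADIR="${ZK_DATADIR:-/var/lib/zookeeper}"
if zk_datadir_needs_init "$ZK_DATADIR"; then
    echo "would run: sudo service zookeeper-server init"
else
    echo "$ZK_DATADIR already has content; skipping init"
fi
```

Replacing the first echo with the actual service call turns the dry run into the real thing.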

Review ZooKeeper Data Directory


ls -la /var/lib/zookeeper

Output:

Hadoop - Zookeeper - dataDir (initial)

(Re) Start ZooKeeper

Re-start ZooKeeper.


Syntax:
   sudo service <service-name> start

Sample:

   sudo service zookeeper-server start

Output:



[dadeniji@rachel ~]$ sudo service zookeeper-server start
[sudo] password for dadeniji:
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... STARTED
[dadeniji@rachel ~]$



ZooKeeper – Log – Review

Review ZooKeeper Log files:

  • /var/log/zookeeper/zookeeper.log
  • /var/log/zookeeper/zookeeper.out

ZooKeeper – Log – Review – zookeeper.log

  • Either no config or no quorum defined in config, running  in standalone mode
  • Reading configuration from: /etc/zookeeper/conf/zoo.cfg
  • Server environment:zookeeper.version=3.4.5-cdh4.2.1–1, built on 04/22/2013 16:45 GMT
  • Server environment:java.version=1.7.0_21
  • Server environment:java.vendor=Oracle Corporation
  • Server environment:java.home=/usr/java/jdk1.7.0_21/jre
  • Server environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
  • Server environment:user.name=zookeeper
  • Server environment:user.home=/var/run/zookeeper
  • binding to port 0.0.0.0/0.0.0.0:2181

ZooKeeper – Log – Review – zookeeper.out

  • file is empty

ZooKeeper – jps

Using jps, we are able to validate that the ZooKeeper app is running.  The app name is QuorumPeerMain.

Initiate Hadoop/HBase Services – HBase Master

Start HBase Master

Let us start hbase-master.


Syntax:
   sudo service <service-name> start

Sample:

   sudo service hbase-master start

Output (Text):



[dadeniji@rachel noip-2.1.9-1]$ sudo service hbase-master start
[sudo] password for dadeniji:
starting master, logging to /var/log/hbase/hbase-hbase-master-rachel.out
[dadeniji@rachel noip-2.1.9-1]$

Output (Screen Dump):

hadoop -- service -- hbase-master -- start

HBase – Master – Log – Review

Review HBase Master log files:

  • /var/log/hbase/hbase-hbase-master-<hostname>.log
  • /var/log/hbase/hbase-hbase-master-<hostname>.out
  • /var/log/hbase/securityAuth.audit

HBase – Log – Review – hbase-hbase-master-<hostname>.log

  • DEBUG org.apache.hadoop.hbase.util.FSUtils: hdfs://localhost:8020/hbase/.archive doesn’t exist

HBase – Log – Review – hbase-hbase-master-<hostname>.out

  • file is empty

Hbase – Master – jps

Using jps, we are able to validate that the HBase Master app is running.  The app name is HMaster.

Initiate Hadoop/HBase Services – HBase Region Server

Start HBase Region Server

Let us start hbase-regionserver.


Syntax:
   sudo service <service-name> start

Sample:

   sudo service hbase-regionserver start

Output (Text):



[dadeniji@rachel noip-2.1.9-1]$ sudo service hbase-regionserver start
[sudo] password for dadeniji:
starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-rachel.out
[dadeniji@rachel noip-2.1.9-1]$

Output (Screen Dump):

hadoop -- service -- hbase-regionserver - start

HBase – Region Server – Log – Review

Review HBase Region Server log files:

  • /var/log/hbase/hbase-hbase-regionserver-<hostname>.log
  • /var/log/hbase/hbase-hbase-regionserver-<hostname>.out

HBase – Log – Review – hbase-hbase-regionserver-<hostname>.log

  • Extensive logging

HBase – Log – Review – hbase-hbase-regionserver-<hostname>.out

  • file is empty
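
With all three services started, the jps check can be wrapped up in one go. Here is a sketch: the function check_hbase_daemons is hypothetical. QuorumPeerMain and HMaster are the names seen earlier in this post. HRegionServer is my assumption for the name the Region Server shows under jps.

```shell
# Hypothetical helper: read `jps` output ("<pid> <main-class>") on stdin
# and report whether each expected daemon appears.
# Returns non-zero if any of them is missing.
check_hbase_daemons() {
    local listing name status=0
    listing="$(cat)"
    for name in QuorumPeerMain HMaster HRegionServer; do
        if printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name: running"
        else
            echo "$name: NOT running"
            status=1
        fi
    done
    return "$status"
}
```

A possible use is: sudo /usr/java/jdk1.7.0_21/bin/jps | check_hbase_daemons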

Hadoop/HBase – Shell

Let us play around with HBase and make sure that it is running well.

Start HBase Shell

Start the HBase shell.


Syntax:
         hbase shell

Sample:

         hbase shell

Output (Text):


[dadeniji@rachel ~]$ hbase shell
13/05/22 13:17:58 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.2-cdh4.2.1, rUnknown, Mon Apr 22 10:56:52 PDT 2013

HBase  – Issue Basic Commands – Status

Status …



Syntax:
         status

Sample:
         status

Output (Text):

    1 servers, 0 dead, 2.000 average load

HBase  – Issue Basic Commands – Version

Version …



Syntax:
         version

Sample:
         version

Output (Text):

    0.94.2-cdh4.2.1, rUnknown, Mon Apr 22 10:56:52 PDT 2013

HBase  – Issue Basic Commands – Whoami

Who is connecting…



Syntax:
         whoami

Sample:
         whoami

Output (Text):

   dadeniji (auth:SIMPLE)

Explanation:

  • whoami returned our username, so we are comfortable that our actual username is being passed along to HBase.
  • The authentication mode is SIMPLE

 

HBase  – Issue Basic Commands – Metadata – List Tables

List Tables..



Syntax:
         list

Sample:
         list

Output (Text):

TABLE
0 row(s) in 0.1080 seconds

=> []

HBase  – Issue Basic Commands – DDL – Create Table (Sample : Customer)

Let us create a sample table:

  • Table Name :- customer
  • Compression :- SNAPPY


Syntax:

  create <table-name>, {NAME => 'cf1', COMPRESSION => <compression>}

Sample:

  create 'customer', {NAME => 'cf1', COMPRESSION => 'SNAPPY'}

Output:

hadoop - hbase - customer -- compression - SNAPPY

HBase  – Issue Basic Commands – DDL – Describe Table (Sample : Customer)

Let us review the table’s definition:

  • Table Name :- customer


Syntax:

  describe <table-name>

Sample:

  describe 'customer'

Output:

hadoop - hbase - describe -- Sample -- customer

HBase  – Issue Basic Commands – DDL – Disable Table (Sample : Customer)

Let us disable the table:

  • Table Name :- customer


Syntax:

  disable <table-name>

Sample:

  disable 'customer'

Output:

Hadoop - hbase - Table - Disable (Sample - Customer)

HBase  – Issue Basic Commands – DDL – Is Table Disabled (Sample : Customer)

Review table and make sure that it is indeed disabled.

  • Table Name :- customer


Syntax:

  is_disabled(<table-name>)

Sample:

  is_disabled('customer')

Output:

Hadoop - hbase - Table - is_disabled (Sample - Customer)

HBase  – Issue Basic Commands – DDL – Drop Table (Sample : Customer)

As we have confirmed that the table is indeed disabled, let us go ahead and drop it.

  • Table Name :- customer


Syntax:

  drop <table-name>

Sample:

  drop 'customer'

Output:

Hadoop - hbase - Table - Drop (Sample - Customer)
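
The create / describe / disable / drop sequence can also be generated as a small script instead of being typed interactively. Here is a sketch, assuming the HBase shell reads commands from standard input. The function hbase_table_lifecycle is hypothetical, and the defaults (cf1, SNAPPY) are the ones used in this post.

```shell
# Hypothetical helper: print the HBase shell commands for the
# create / describe / disable / is_disabled / drop walk-through.
# Arguments: table name, column family (default cf1),
# compression (default SNAPPY).
hbase_table_lifecycle() {
    local table="$1" family="${2:-cf1}" compression="${3:-SNAPPY}"
    cat <<EOF
create '${table}', {NAME => '${family}', COMPRESSION => '${compression}'}
describe '${table}'
disable '${table}'
is_disabled('${table}')
drop '${table}'
EOF
}

# Show the commands for the sample table.
hbase_table_lifecycle customer
```

To run them for real, pipe the output along: hbase_table_lifecycle customer | hbase shell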

Stop HBase Shell

Exit the HBase shell.


Syntax:
         exit

Sample:

         exit

 

CDH Services And Network Ports (HBase)

Here is how things break down in terms of listening ports, now that HBase is running.

Service List / Ports

Using netstat to list listening ports


Syntax:

   # list listening TCP sockets, along with the owning program
   sudo netstat -ltp

Sample:

   sudo netstat -ltp

Output:

netstat -- listening port (hadoop - hbase - running)

Port    Application
50010   Hadoop – HDFS – Data Node
50020   Hadoop – HDFS – Data Node
50030   Hadoop – Map Reduce – Job Tracker
50060   Hadoop – Map Reduce – Task Tracker
50070   Hadoop – HDFS – Name Node
50075   Hadoop – HDFS – Data Node
50090   Hadoop – HDFS – Secondary Name Node
58160   Hadoop – MapReduce
42067   Zookeeper
60000   Hadoop – HBase – Master
60010   Hadoop – HBase – Master
60020   Hadoop – HBase – RegionServer
60030   Hadoop – HBase – RegionServer
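
The mapping above can be turned into a small lookup so that netstat output gets labelled automatically. This is a sketch only. The function names port_to_app and annotate_listening_ports are hypothetical. I added 2181 because it is the clientPort in our zoo.cfg, and I left out 58160 and 42067 on the assumption that they are ephemeral ports that will differ from run to run.

```shell
# Hypothetical helper: map a well-known port to the application
# it belongs to, per the table above.
port_to_app() {
    case "$1" in
        50010|50020|50075) echo "HDFS DataNode" ;;
        50070)             echo "HDFS NameNode" ;;
        50090)             echo "HDFS Secondary NameNode" ;;
        50030)             echo "MapReduce JobTracker" ;;
        50060)             echo "MapReduce TaskTracker" ;;
        2181)              echo "ZooKeeper (clientPort from zoo.cfg)" ;;
        60000|60010)       echo "HBase Master" ;;
        60020|60030)       echo "HBase RegionServer" ;;
        *)                 echo "unknown" ;;
    esac
}

# Hypothetical helper: read numeric netstat output on stdin and print
# "<port> <application>" for every TCP line.
annotate_listening_ports() {
    # The local address is field 4; the port follows the last colon.
    awk '$1 ~ /^tcp/ { n = split($4, a, ":"); print a[n] }' |
    while read -r port; do
        echo "$port $(port_to_app "$port")"
    done
}
```

A possible use is: sudo netstat -ltnp | annotate_listening_ports. The extra -n keeps the ports numeric so that the lookup can match them.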

Diagnostic

 

Diagnostic – Log File

Log information is journaled in /var/log/hbase.

 

 

 
