Technical: Hadoop/Cloudera (v4.2.1) – Installation on CentOS (32 bit [x32])
Introduction
Here are quick preparation, processing, and validation steps for installing Cloudera – Hadoop (v4.2.1) on 32-bit CentOS.
Blueprint
I am using Cloudera’s fine documentation “http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html” as a basis.
It is very good documentation, but I stumbled a lot, both for lack of background and from glossing over important details. And so I chose to write things down.
Environment Constraints
Here are the constraints that I have to work with:
- My lab PC is an old Dell
- It has a 32-bit processor
- And so I can only install a 32-bit Linux and a 32-bit Cloudera distribution
Concepts
Here are a couple of concepts that we will utilize:
- File System – Linux – Stickiness
Concepts : File System – Linux – Stickiness
Background
Sticky Bit
http://en.wikipedia.org/wiki/Sticky_bit
The most common use of the sticky bit today is on directories. When the sticky bit is set, only the item’s owner, the directory’s owner, or the superuser can rename or delete files. Without the sticky bit set, any user with write and execute permissions for the directory can rename or delete contained files, regardless of owner. Typically this is set on the /tmp directory to prevent ordinary users from deleting or moving other users’ files.
In Unix symbolic file system permission notation, the sticky bit is represented by the letter t in the final character-place.
Set Stickiness
http://en.wikipedia.org/wiki/Sticky_bit
The sticky bit can be set using the chmod command and can be set using its octal mode 1000 or by its symbol t (s is already used by the setuid bit). For example, to add the bit on the directory /tmp, one would type chmod +t /tmp. Or, to make sure that directory has standard tmp permissions, one could also type chmod 1777 /tmp.
To clear it, use chmod -t /tmp or chmod 0777 /tmp (using numeric mode will also change directory tmp to standard permissions).
Is Stickiness set?
http://en.wikipedia.org/wiki/Sticky_bit
In Unix symbolic file system permissions notation, the sticky bit is represented by the letter t in the final character-place. For instance, in our Linux Environment , the /tmp directory, which by default has the sticky-bit set, shows up as:
$ ls -ld /tmp
drwxrwxrwt 4 root sys 485 Nov 10 06:01 /tmp
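The behaviour quoted above can be tried safely on a scratch directory rather than on /tmp itself. Here is a minimal sketch, assuming GNU coreutils (mktemp, stat -c) as found on CentOS; the variable names are only for illustration.

```shell
# Work on a throwaway directory so that the real /tmp is left alone
demo_dir=$(mktemp -d)

# Set world-writable permissions plus the sticky bit (octal 1777)
chmod 1777 "$demo_dir"
mode_with_sticky=$(stat -c '%a' "$demo_dir")
ls -ld "$demo_dir"      # the permissions column ends in "t"

# Clear only the sticky bit; the rwx bits stay as they were
chmod -t "$demo_dir"
mode_without_sticky=$(stat -c '%a' "$demo_dir")
ls -ld "$demo_dir"      # the permissions column now ends in "x"

echo "with sticky bit: $mode_with_sticky, without: $mode_without_sticky"
rmdir "$demo_dir"
```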
Prerequisites – Operating System
Introduction
Listed below are Cloudera’s stated minimal requirements (in the areas of Operating System, Database, and JDK).
For the bare minimum install we are targeting, we do not need a database; it is only kept in the list for completeness. And, even when needed, the database itself can be on another server outside of the Cloudera node or Cluster.
Operating System
http://www.cloudera.com/content/support/en/documentation/cdh4-documentation/cdh4-documentation-v4-latest.html
- Redhat – Redhat Enterprise Linux (v5.7 –> 64-bit, v6.2 –> 32 and 64 bit)
- Redhat – CentOS (v5.7 –> 64-bit, v6.2 –> 32 and 64 bit)
- Oracle Linux (v5.6 –> 64-bit)
- SUSE Linux Enterprise Server (SLES) (v11 with SP1 –> 64 bit)
- Ubuntu / Debian (Ubuntu – Lucid 10.04 [LTS] –> 64 bit)
- Ubuntu / Debian (Ubuntu - Precise 12.04 [LTS] –> 64 bit)
- Ubuntu / Debian (Debian – Squeeze 6.03 –> 64 bit)
What does all this mean:
- The only 32-bit operating systems supported are Red Hat Enterprise Linux and CentOS, and only from v6.2; v5.7 is supported on 64-bit only
Databases
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_2.html
- Oozie (MySQL v5.5, PostgreSQL v8.4, Oracle 11gR2)
- Hue (MySQL v5.5, PostgreSQL v8.4)
- Hive (MySQL v5.5, PostgreSQL v8.3)
Java /JDK
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_3.html
- Jdk 1.6 –> 1.6.0_31
- jdk 1.7 –> 1.7.0_15
Prerequisites – Networking – Name ID
Introduction
Ensure that your Network Names are unique and they are what you want them to be.
Validate Hostname
Use hostname.
Syntax:
hostname
Sample:
hostname
Output:

Set Hostname
If the hostname is not what you expected, please set it using resources available on the Net.
Prerequisites – Networking – Domain Name & FQDN
Introduction
Get Domain Name and FQDN (Fully qualified domain name)
Get Domain Name (using hostname)
Syntax:
hostname --domain
Sample:
hostname --domain
Get Domain Name (using resolv.conf)
Syntax:
cat /etc/resolv.conf
Sample:
cat /etc/resolv.conf
Output:

Explanation:
- In the file /etc/resolv.conf, your domain name is the entry prefixed by domain
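The same lookup can be scripted instead of read off by eye. Here is a small sketch that runs against a stand-in file, so that nothing on the real system is touched; the domain and nameserver values are made up for illustration.

```shell
# Stand-in for /etc/resolv.conf; the values are made up for illustration
resolv_file=$(mktemp)
cat > "$resolv_file" <<'EOF'
domain example.lab
search example.lab
nameserver 192.168.1.10
EOF

# Print the second field of the first line whose first field is "domain"
domain_name=$(awk '$1 == "domain" { print $2; exit }' "$resolv_file")
echo "Domain name: $domain_name"
rm -f "$resolv_file"
```

On a real host, point the awk command at /etc/resolv.conf instead of the stand-in file.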
Get Hostname (FQDN)
Get FQDN (Fully qualified hostname).
Syntax:
hostname --fqdn
Sample:
hostname --fqdn
Output:

Interpretation:
- Pinged the DNS server and discovered it was offline. It is a Windows machine that had just been patched and, unfortunately, this particular machine needs a key to be pressed to fully come back online. Never figured out what is up with the BIOS
Set Domain Name
Good Resources on the Net:
Prerequisites – Networking – Name Resolution
Introduction
Hadoop is fundamentally a networked, clustered system, so your hosts must have verifiably working TCP/IP and name resolution.
Validate Hostname
Syntax:
ping <hostname>
Sample:
ping rachel

Since we got an error message stating “unknown host <hostname>”, we need to go to our DNS Server and make sure that it has an “A” record for the host.
Our DNS is a Windows DNS Server, and it was relatively easy to create an “A” record for it:

Went back and checked to ensure that our DNS Resolution is good:
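Besides ping, name resolution can be checked directly through the system resolver. Here is a hedged sketch using getent (part of glibc on CentOS); the function name check_name_resolution is my own and not part of any tool.

```shell
# Report whether a name resolves through the system resolver
# (/etc/hosts, DNS, ...), without sending any ping traffic
check_name_resolution() {
  if getent hosts "$1" > /dev/null; then
    echo "$1 resolves"
  else
    echo "$1 is an unknown host"
  fi
}

check_name_resolution localhost
check_name_resolution "$(hostname)"
```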

Prerequisites – wget
Introduction – wget
To download files over HTTP from the command shell, without a browser, we chose to use wget.
Install – wget
sudo yum -y install wget
Prerequisites – lsof
Introduction – lsof
lsof (list open files) plays a role on Linux similar to the Sysinternals tools on Windows. It lets us track the files and network ports being used by a process.
Install – lsof
sudo yum -y install lsof
Prerequisites – Java
Here are the steps for validating that we have the right Java JDK installed.
Java – Minimal Requirements
We need Java and we need one of the latest versions (JDK 1.6 or JDK 1.7).
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_3.html
- For JDK 1.6, CDH4 is certified with 1.6.0_31
- For JDK 1.7, CDH4.2 and later is certified with 1.7.0_15
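Once java -version reports something, the comparison against the list above can be scripted. Here is a minimal sketch, working on a stand-in version line so that it runs even without Java installed. Treating the certified update level as a minimum is an assumption on my part, and the variable names are illustrative.

```shell
# Stand-in for the first line of `java -version`; on a real host capture it
# with: version_line=$(java -version 2>&1 | head -n 1)
version_line='java version "1.7.0_21"'

# The version string is the text between the double quotes
java_version=$(printf '%s\n' "$version_line" | awk -F'"' '{ print $2 }')
java_release=${java_version%%_*}   # text before the underscore, e.g. 1.7.0
java_update=${java_version##*_}    # text after the underscore, e.g. 21

# Update levels taken from the list above (assumed to be minimums)
case "$java_release" in
  1.6.*) minimum_update=31 ;;
  1.7.*) minimum_update=15 ;;
  *)     minimum_update="" ;;
esac

if [ -n "$minimum_update" ] && [ "$java_update" -ge "$minimum_update" ]; then
  java_check="ok"
else
  java_check="not certified"
fi
echo "Java $java_version: $java_check"
```

Note that java -version writes to standard error, which is why the capture shown in the comment redirects it with 2>&1.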
Is Java installed?
Is Java installed on our box, and if so what version?
java -version
Output:

Get URL for Java (+JDK +JRE)
To get to the Java download, please visit:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
Please note that you do not want just the JRE, but JDK (which has the JRE bundled with it).
Thus click on JDK.
As of today (2013-05-12), the latest available JDK is 7U21.

Further down on that same download page, we will notice that there is a separate download file for each OS and bitness.

As we have a 32-bit Linux that is able to use rpm, we want to capture the URL for jdk-7u21-linux-i586.rpm.
That URL ends up being
http://download.oracle.com/otn-pub/java/jdk/7u21-b11/jdk-7u21-linux-i586.rpm
Download Oracle/Sun Java JDK
Googled for help and found:
How to automate download and installation of Java JDK on Linux:
http://stackoverflow.com/questions/10268583/how-to-automate-download-and-instalation-of-java-jdk-on-linux
wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk6-downloads-1637591.html;" "http://download.oracle.com/otn-pub/java/jdk/7u21-b11/jdk-7u21-linux-i586.rpm"
Error:
Resolving download.oracle.com... 96.17.108.106, 96.17.108.163
Connecting to download.oracle.com|96.17.108.106|:443... connected.
ERROR: certificate common name 'a248.e.akamai.net' doesn't match requested host name 'download.oracle.com'.
To connect to download.oracle.com insecurely, use --no-check-certificate.
I hit a series of other problems as well, until I took care to make the following changes:
- Changed the URL from http: to https:
- Added the option “--no-check-certificate”
And then ended up with a working syntax:
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "https://download.oracle.com/otn-pub/java/jdk/7u21-b11/jdk-7u21-linux-i586.rpm" -O "jdk-7u21-linux-i586.rpm"
Installed Oracle/Sun Java JDK
Syntax:
yum install <jdk-rpm>
Sample:
yum install jdk-7u21-linux-i586.rpm
Output:

Install of Java JDK was successful!
Validated Install of Oracle/Sun Java JDK
Syntax:
java -version
Sample:
java -version
Output:

Install Cloudera Bin Installer (./cloudera-manager-installer.bin)
Disclaimer
Please do not go down this road on a 32-bit system.
It will not work, as cloudera-manager-installer.bin is 64-bit software and will not run on a 32-bit OS.
This section is merely preserved for completeness; and as a place-holder.
Resource
As we are targeting v4.x, we should look at http://archive.cloudera.com/cm4/installer/
As of 2013-04-11, here is the folder view of what Cloudera has available:

We want the latest folder:

Download
Download using wget
The URL Link is http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin.
Here is the download specification:
- Download URL: http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
- Output File: /tmp/cloudera-manager-installer.bin
wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin -O /tmp/cloudera-manager-installer.bin
Validate FileInfo
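Validate FileInfo (file <file>)
file cloudera-manager-installer.bin

cloudera-manager-installer.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
Explanation:
- The installer is a 64-bit (x86-64) ELF executable, so it cannot run on a 32-bit OS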
Prepare downloaded file
Use chmod to make the file executable.
Syntax:
chmod u+x <file>
Sample:
chmod u+x cloudera-manager-installer.bin
Run Installer
Run installer
sudo ./cloudera-manager-installer.bin
Unfortunately, got an error message:
./cloudera-manager-installer.bin: ./cloudera-manager-installer.bin: cannot execute binary file
Verified that we cannot install cloudera-manager-installer.bin on a 32-bit OS; the OS has to be 64-bit.
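The failed run could have been avoided by checking the machine architecture first. Here is a minimal sketch; the helper name is_64bit_x86 is made up, and uname -m reports the architecture of the running kernel.

```shell
# Succeeds when the given machine name denotes a 64-bit x86 system
is_64bit_x86() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *)            return 1 ;;
  esac
}

machine=$(uname -m)   # e.g. i686 on 32-bit CentOS, x86_64 on 64-bit
if is_64bit_x86 "$machine"; then
  echo "$machine: a 64-bit x86-64 binary can run here"
else
  echo "$machine: skip cloudera-manager-installer.bin on this host"
fi
```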
Manage Yum Repository – Cloudera
Background
Once you find yourself using packages from a specific Vendor, and correspondingly its repository, quite a bit, I advise you to add that Vendor to your repository configurations.
Basically, you want to be able to do the following:
- Make your machine aware that it can safely access said repository for packages you request
- Confirm that you trust the vendor’s GPG key
Is Cloudera GPG key installed?
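Repository definition files (*.repo) are saved in the /etc/yum.repos.d/ folder. Imported GPG keys are kept in the RPM database and can be listed with rpm -q gpg-pubkey.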
Check folder
ls /etc/yum.repos.d/
Output:

Trust Vendor – Cloudera
Trust Vendor (Cloudera) by trusting its GPG Key.
Syntax:
sudo rpm --import <key>
Sample:
sudo rpm --import \
http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
rpm --import prints nothing on success, so the lack of feedback suggests we are good.
Review Vendor Repo File
Check folder
ls /etc/yum.repos.d/
Output:
![Folder List -- etc:yum.repos.d: [v2]](http://danieladeniji.files.wordpress.com/2013/05/folder-list-etcyum-repos-d-v2.png?w=640&h=96)
Obviously, we now have the cloudera-cdh4.repo file in the /etc/yum.repos.d folder
Review Contents of Vendor Repo File
Review Repo File Contents
cat /etc/yum.repos.d/cloudera-cdh4.repo
Output:

Explanation:
Decision Time
There are a few critical decisions you have to make:
- What is your topology – A single system or a distributed system?
- MapReduce or Yarn
Topology – Pseudo Distributed / Cluster
If you will be using a single node, Cloudera terms this Pseudo Distributed. On the other hand, if you will be using multiple nodes, Cloudera terms this a Cluster.
MapReduce (MRv1) or Yarn (MRv2)
What is the difference between MapReduce and Yarn?
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_11_4.html
MapReduce has undergone a complete overhaul and CDH4 now includes MapReduce 2.0 (MRv2). The fundamental idea of MRv2′s YARN architecture is to split up the two primary responsibilities of the JobTracker — resource management and job scheduling/monitoring — into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AM). With MRv2, the ResourceManager (RM) and per-node NodeManagers (NM), form the data-computation framework. The ResourceManager service effectively replaces the functions of the JobTracker, and NodeManagers run on slave nodes instead of TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks. For details of the new architecture, see Apache Hadoop NextGen MapReduce (YARN).
Can we install both MapReduce (MRv1) and Yarn (MapReduce v2 [MRv2])?
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_1.html
For installations in pseudo-distributed mode, there are separate conf-pseudo packages for an installation that includes MRv1 (hadoop-0.20-conf-pseudo) or an installation that includes YARN (hadoop-conf-pseudo). Only one conf-pseudo package can be installed at a time: if you want to change from one to the other, you must uninstall the one currently installed.
Which of them shall I use?
Cloudera does not consider the current upstream MRv2 release stable yet, and it could potentially change in non-backwards-compatible ways. Cloudera recommends that you use MRv1 unless you have particular reasons for using MRv2, which should not be considered production-ready.
What is our decision?
- We will go with Map Reduce [MRv1]
Installation File Matrix
To keep ourselves honest, let us prepare a quick checklist of RPMs.
If we go with a Pseudo Install, then please look for the RPMs that have pseudo in their name.
| Mode | Component | RPM |
| --- | --- | --- |
| Pseudo Distributed | Map Reduce v1 | hadoop-0.20-conf-pseudo |
| Pseudo Distributed | Map Reduce v2 (Yarn) | hadoop-conf-pseudo |
On the other hand, if you would like a Cluster Install, then please follow the instructions documented in http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_4_4.html
Cluster installs have to be performed one component at a time and on each host, and are beyond the scope of our current posting.
Install Pseudo Install
Background
We have chosen the following install path:
Let us go get and install hadoop-0.20-conf-pseudo
Review RPM Package (hadoop-0.20-conf-pseudo)
Before we install this package, let us quickly review and make sure that it is what we want:
- Get general info
- Get a quick dependency list
General Info
Before we even have the file, let us check the package’s info while the package is at rest (on the Vendor’s web site):
Syntax:
yum info --nogpgcheck <package-name>
Sample:
yum info --nogpgcheck hadoop-0.20-conf-pseudo
Output:

Explanation:
- Name :- hadoop-0.20-conf-pseudo
- Architecture :- i386
- Repo :- cloudera-cdh4
- Summary :- Hadoop installation in pseudo-distribution mode with MRv1
Repoquery Info
You can use “repoquery --list” to check on your package, prior to downloading it.
Beforehand, make sure that you have installed the yum-utils package (“sudo yum install yum-utils”).
Run repoquery:
Syntax:
repoquery --list <package-name>
Sample:
repoquery --list hadoop-0.20-conf-pseudo
Dependency Info
Dependency info
Syntax:
yum deplist --nogpgcheck <package-name>
Sample:
yum deplist --nogpgcheck hadoop-0.20-conf-pseudo
Output:

Explanation:
Here are the dependencies:
- hadoop-0.20-mapreduce-tasktracker
- hadoop-hdfs-datanode
- hadoop-hdfs-namenode
- hadoop-0.20-mapreduce-jobtracker
- hadoop-hdfs-secondarynamenode
- /bin/sh (bash)
- hadoop (hadoop base)
Install Rpm (hadoop-0.20-conf-pseudo)
Install rpm
Syntax:
sudo yum install <package-name>
Sample:
sudo yum install hadoop-0.20-conf-pseudo
Output:

We respond in the affirmative….
And, the installation completed:

Post Installation Review – File System
Background
In Linux, it is commonly said “Everything is a file”.
And, so let us begin by reviewing the File System (FS).
Review our package files (rpm -ql)
Show files installed by our RPM:
Syntax:
rpm -ql <package-name>
Sample:
rpm -ql hadoop-0.20-conf-pseudo
Output:

Explanation:
- We have the pseudo MapReduce v1 configuration files (*.xml)
- We have the base components folders (/var/lib/hadoop, /var/lib/hdfs)
Review our configuration files
Where are those configuration files ?
Glad you asked.
Syntax:
ls -la <configuration>
Sample:
ls -la /etc/hadoop/conf.pseudo.mr1
Output:

Support for various versions
Background
To maximize flexibility, CDH supports different installed versions. But, keep in mind only one version can be running at the same time.
The Alternatives framework underpins this support.
Alternatives
Introduction
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html
The Cloudera packages use the alternatives framework for managing which Hadoop configuration is active. All Hadoop components search for the Hadoop configuration in /etc/hadoop/conf.
Review
Is alternatives actually in effect?
There are a couple of things we can check:
- alternatives --display <name>
- update-alternatives --display <name>
alternatives --display
Syntax:
sudo alternatives --display <file-name>
Sample:
sudo alternatives --display hadoop
update-alternatives --display
Syntax:
sudo update-alternatives --display <file-name>
Sample:
sudo update-alternatives --display hadoop
Conclusion
Alternatives does not appear to be in play…
Component Level – User & Group – Review
Background
Let us do a quick review of our users and groups.
Hadoop – Users
Search the user file (/etc/passwd) for well known Hadoop user accounts:
Quick, Quick, what are the well known user accounts:
Syntax:
cat /etc/passwd | cut -d: -f1 | egrep "xxx|yyy|zzz"
Sample:
cat /etc/passwd | cut -d: -f1 | egrep "hdfs|mapred|zookeeper"
Output:

What does our little code do:
- The file is /etc/passwd
- Use cut, passing in the delimiter (:), and get the first field of each line in /etc/passwd
- Match on any of the supplied users
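The pipeline above can be tried against a stand-in file first. In the sketch below the sample entries (UIDs, home directories) are made up for illustration, and grep -x is used so that only exact account names match, where a plain egrep would also match names that merely contain hdfs or mapred.

```shell
# Stand-in for /etc/passwd; the entries are made up for illustration
sample_passwd=$(mktemp)
cat > "$sample_passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
hdfs:x:496:493:Hadoop HDFS:/var/lib/hadoop-hdfs:/bin/bash
mapred:x:495:492:Hadoop MapReduce:/usr/lib/hadoop-0.20-mapreduce:/bin/bash
EOF

# Field 1 is the account name; -x keeps only whole-line (exact) matches
hadoop_users=$(cut -d: -f1 "$sample_passwd" | grep -E -x 'hdfs|mapred|zookeeper')
echo "$hadoop_users"
rm -f "$sample_passwd"
```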
Hadoop – Groups
Browse Group file (/etc/group) for well known Hadoop groups and user accounts:
What are the well known Groups and what is their membership:
- hadoop
- hdfs
- mapred
- zookeeper
Syntax:
cat /etc/group | egrep -i "xxx|yyy|zzz"
Sample:
cat /etc/group | egrep -i "hadoop|hdfs|mapred|zookeeper"
What does our little code do:
- The file is /etc/group
- Match on any of the supplied groups

Explanation:
- Obviously, we have a group named hadoop and its members are hdfs and mapred
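Group membership can also be pulled out by field rather than by eye. Each /etc/group line has the form name:password:GID:members, so the member list is field 4. Here is a small sketch against a stand-in file; the GIDs are made up for illustration.

```shell
# Stand-in for /etc/group; the GIDs are made up for illustration
sample_group=$(mktemp)
cat > "$sample_group" <<'EOF'
hadoop:x:491:hdfs,mapred
hdfs:x:493:
mapred:x:492:
EOF

# Print the comma-separated member list (field 4) of the hadoop group
hadoop_members=$(awk -F: '$1 == "hadoop" { print $4 }' "$sample_group")
echo "Members of hadoop: $hadoop_members"
rm -f "$sample_group"
```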
Component Level – Review & Configuration – HDFS – NameNode – File System Format
Background
Here are a few things you should do to initialize HDFS Name Node.
HDFS – NameNode – Format
On the Hadoop HDFS Name Node, let us go ahead and format the NameNode:
sudo -u hdfs hadoop namenode -format
Explanation
- The HDFS NameNode service runs under the hdfs account, and so to gain access to it, we use sudo to run the command as that user
Output (Screen Shot):

Explanation:
- We are able to format our namenode
- Our default replication is 1
- The File System is owned by the hdfs user
- And, the File System ownership group is supergroup
- Permission is enabled
- High Availability (HA) is not enabled
- We are in Append Mode
- Our storage directory is /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
Reformat?
If the NameNode File System is already formatted, and you issue an HDFS format request, you will be asked to confirm that you want to re-format.
Screen shot:

Text Output:
Re-format filesystem in Storage Directory /var/lib/hadoop-hdfs/cache/hdfs/dfs/name ? (Y or N)
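To avoid running into the re-format prompt by accident, the storage directory can be inspected before formatting. Here is a hedged sketch, assuming that formatting writes a current/VERSION file into the storage directory; the helper name namenode_is_formatted is my own. The directory belongs to the hdfs user, so run the check as a user who can read it.

```shell
# Succeeds when the storage directory already holds a formatted file system
# (assumption: formatting writes a current/VERSION file into it)
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

name_dir=/var/lib/hadoop-hdfs/cache/hdfs/dfs/name
if namenode_is_formatted "$name_dir"; then
  echo "$name_dir is already formatted; formatting again would ask to re-format"
else
  echo "$name_dir does not look formatted yet"
fi
```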
Component Level – Review & Configuration – HDFS – Name Node – Temp Folder
Background
Like any other File System, HDFS needs a temp folder
HDFS – Create and Grant Permissions to the Temp Folder (/tmp)
Let us create the HDFS:/tmp folder and grant permissions on it
Here are the particulars:
- The HDFS folder name :- /tmp
- The HDFS Permission :- 1777
Syntax:
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Sample:
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Explanation:
- sudo as hdfs, issue fileSystem (fs) make-directory (mkdir) /tmp
- Change permission to 1777: allow all to read, write, and execute, and set the sticky bit
HDFS – Validate Folder (/tmp) Creation and Permission Set
Let us review HDFS:/tmp existence and permission set:
Introduction:
To gain access to HDFS, we do the following:
- sudo as hdfs
- We invoke hadoop fs
- Our payload is -ls (List)
- Arguments: -d (list the target directory itself and not its files)
- And, we are targeting the /tmp folder
Syntax:
sudo -u hdfs hadoop fs -ls -d /tmp
Sample:
sudo -u hdfs hadoop fs -ls -d /tmp
Output:

Explanation:
- HDFS :/tmp folder exists
- Owner (hdfs) can read/write/execute
- Group (supergroup) can read/write/execute
- Everyone can read/write/execute and the sticky bit is set (t last character in the file permissions column)
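The sticky-bit check can be automated by looking at the last character of the permissions column. Here is a minimal sketch; the listing line is an illustrative stand-in (its date and size are made up), and the helper name has_sticky_bit is my own.

```shell
# Succeeds when a permission string such as "drwxrwxrwt" ends in t or T
has_sticky_bit() {
  case "$1" in
    *[tT]) return 0 ;;
    *)     return 1 ;;
  esac
}

# Illustrative listing line; on a real host use:
#   listing=$(sudo -u hdfs hadoop fs -ls -d /tmp)
listing='drwxrwxrwt   - hdfs supergroup          0 2013-05-13 15:30 /tmp'
permissions=${listing%% *}   # everything before the first space

if has_sticky_bit "$permissions"; then
  echo "sticky bit is set ($permissions)"
else
  echo "sticky bit is NOT set ($permissions)"
fi
```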
Component Level – Review & Configuration – MapReduce System Directories
Background
There are quite a few HDFS folders that MapReduce needs.
HDFS – MapReduce Folders
Let us create and grant the HDFS:{MapReduce} folders:
- Create new HDFS Folder {/var/lib/hadoop-hdfs/cache/mapred/mapred/staging}
- Set permissions of /var/lib/hadoop-hdfs/cache/mapred/mapred/staging to 1777 – World writable and sticky-bit
- Change the owner of /var/lib/hadoop-hdfs/cache/mapred/mapred and sub-directories to user mapred
Syntax:
sudo -u hdfs hadoop fs -mkdir -p \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred \
/var/lib/hadoop-hdfs/cache/mapred
Sample:
sudo -u hdfs hadoop fs -mkdir -p \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred \
/var/lib/hadoop-hdfs/cache/mapred
HDFS – Validate Folder {MapReduce} Creation and Permission Set
Let us review HDFS:/var/lib/hadoop-hdfs/cache/mapred existence and permission set:
Syntax:
sudo -u hdfs hadoop fs -ls -R /var/lib/hadoop-hdfs/cache/mapred
Sample:
sudo -u hdfs hadoop fs -ls -R /var/lib/hadoop-hdfs/cache/mapred
Output:

Explanation:
HDFS :/var/lib/hadoop-hdfs/cache/mapred/mapred folder
- Owner (mapred) can read/write/execute
- Group (supergroup) can read/execute (but not write)
- Everyone can read/execute (but not write)
HDFS :/var/lib/hadoop-hdfs/cache/mapred/mapred/staging folder
- Owner (mapred) can read/write/execute
- Group (supergroup) can read/write/execute
- Everyone can read/write/execute and the sticky bit is set (t last character in the file permissions column)
CDH Services
Prepare Inventory of CDH Services
Service List
Here is our expected Service List.
| Component | Service Name |
| --- | --- |
| HDFS – Name Node (Primary) | hadoop-hdfs-namenode |
| HDFS – Name Node (Secondary) | hadoop-hdfs-secondarynamenode |
| HDFS – Data Node | hadoop-hdfs-datanode |
| Hadoop-MapReduce – Job Tracker | hadoop-0.20-mapreduce-jobtracker |
| Hadoop-MapReduce – Task Tracker | hadoop-0.20-mapreduce-tasktracker |
Using chkconfig to list Hadoop Services
Syntax:
# list all services
sudo chkconfig --list
# list specific services, based on name
sudo chkconfig --list | grep -i <service-name>
Sample:
sudo chkconfig --list | grep -i "^hadoop"
Screen Shot:

Explanation:
The services are auto-started starting from run-level 3.
Using /etc/init.d
Syntax:
for service in /etc/init.d/<service-name>; do echo $service; done
Sample:
for service in /etc/init.d/hadoop*; do echo $service; done
Screen Shot:

Starting CDH Services
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_27_1.html
| Component | Command | Log |
| --- | --- | --- |
| hadoop-hdfs-namenode | sudo /sbin/service hadoop-hdfs-namenode start | /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log, /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.out |
| hadoop-hdfs-secondarynamenode | sudo /sbin/service hadoop-hdfs-secondarynamenode start | /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-<hostname>.out |
| hadoop-hdfs-datanode | sudo /sbin/service hadoop-hdfs-datanode start | /var/log/hadoop-hdfs/hadoop-hdfs-datanode-<hostname>.out |
| hadoop-0.20-mapreduce-jobtracker | sudo /sbin/service hadoop-0.20-mapreduce-jobtracker start | /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-<hostname>.out |
| hadoop-0.20-mapreduce-tasktracker | sudo /sbin/service hadoop-0.20-mapreduce-tasktracker start | /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-<hostname>.out |
Start Services – Using /etc/init.d
Look for items in the /etc/init.d/ folders that have hadoop in their names and start them.
Syntax:
for service in /etc/init.d/<service-name>; do sudo $service start; done
Sample:
for service in /etc/init.d/hadoop-*; do sudo $service start; done
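The one-line loop keeps going silently when a service fails to start. Here is a slightly more defensive sketch that reports each result and returns non-zero if anything failed. The function name start_hadoop_services and its arguments are my own invention: the first argument is the init directory, and any further arguments form a command prefix such as sudo.

```shell
# Start every executable hadoop-* init script found in a directory.
# Usage: start_hadoop_services <init-dir> [command-prefix...]
start_hadoop_services() {
  init_dir=$1
  shift
  failed=0
  for service in "$init_dir"/hadoop-*; do
    # Skip non-executable entries, including an unmatched glob pattern
    [ -x "$service" ] || continue
    if "$@" "$service" start; then
      echo "started: ${service##*/}"
    else
      echo "FAILED: ${service##*/}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# On the Hadoop host this would be invoked as:
#   start_hadoop_services /etc/init.d sudo
```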
Errors:
Here are some errors we received, because I chose not to follow instructions or jumped over some steps.
One thing I have to learn about Linux, or Enterprise Systems in general, is that “brevity in Instructions is sacrosanct”: you should make sure that you follow every step, or Google for help and hope that someone else made the same mistake and posted the specific error and its resolution.
Errors – HDFS-NameNode
Here are HDFS Name Node errors.
The log file is
- Syntax –> /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log
- Sample –> /var/log/hadoop-hdfs/hadoop-hdfs-namenode-rachel.log
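These logs are long, so it helps to pull out only the ERROR and FATAL entries. Here is a small sketch run against a stand-in file, using log lines of the kind shown in this post; the function name show_log_problems is my own.

```shell
# Print only the ERROR and FATAL entries of a log file
show_log_problems() {
  grep -E '(^| )(ERROR|FATAL) ' "$1"
}

# Stand-in log file built from lines of the kind shown in this post
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2013-05-13 15:24:32,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
2013-05-13 15:24:33,962 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
2013-05-13 15:24:33,987 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
EOF

show_log_problems "$sample_log"
problem_count=$(show_log_problems "$sample_log" | wc -l)
rm -f "$sample_log"
```

On the Hadoop host, pass the real log file instead, for example /var/log/hadoop-hdfs/hadoop-hdfs-namenode-rachel.log.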
Error due to name resolution error
Specific Errors:
- ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name.
- java.net.UnknownHostException: <hostname>: <hostname>
- at java.net.InetAddress.getLocalHost(InetAddress.java:1466)
Screen Dump:
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: rachel: rachel
at java.net.InetAddress.getLocalHost(InetAddress.java:1466)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.getHostname(MetricsSystemImpl.java:496)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSystem(MetricsSystemImpl.java:435)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:431)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:180)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:156)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
Caused by: java.net.UnknownHostException: rachel
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
at java.net.InetAddress.getLocalHost(InetAddress.java:1462)
... 9 more
Explanation
- Add hostname to your DNS Server
Error due to HDFS being in an inconsistent state
FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /var/lib/hadoop-hdfs/cache/hdfs/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:296)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
Specific Errors:
- org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join org.apache.hadoop.hdfs.server.common.InconsistentFSStateException
- Directory /var/lib/hadoop-hdfs/cache/hdfs/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Screen Dump:
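Explanation
- The NameNode storage directory does not exist yet or is not accessible; formatting the NameNode, as covered in the HDFS – NameNode – Format section above, creates it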
Errors – HDFS-DataNode
Here are HDFS Data Node errors.
The log file is
- Syntax –> /var/log/hadoop-hdfs/hadoop-hdfs-datanode-<hostname>.log
- Sample –> /var/log/hadoop-hdfs/hadoop-hdfs-datanode-rachel.log
Error due to host name resolution error
Screen Shot:
[dadeniji@rachel conf]$ cat /var/log/hadoop-hdfs/hadoop-hdfs-datanode-rachel.log
2013-05-13 15:24:32,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = java.net.UnknownHostException : rachel: rachel
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.0.0-cdh4.2.1
STARTUP_MSG: build = file:///data/1/jenkins/workspace/generic-package-centos32-6/topdir/BUILD/hadoop-2.0.0-cdh4.2.1/src/hadoop-common-project/hadoop-common -r 144bd548d481c2774fab2bec2ac2645d190f705b; compiled by 'jenkins' on Mon Apr 22 10:26:05 PDT 2013
STARTUP_MSG: java = 1.7.0_21
************************************************************/
2013-05-13 15:24:32,895 WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2013-05-13 15:24:33,962 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.UnknownHostException: rachel: rachel
at java.net.InetAddress.getLocalHost(InetAddress.java:1466)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:223)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:243)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1694)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1872)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1893)
Caused by: java.net.UnknownHostException: rachel
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
at java.net.InetAddress.getLocalHost(InetAddress.java:1462)
... 6 more
2013-05-13 15:24:33,987 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-05-13 15:24:34,006 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException: rachel: rachel
************************************************************/
Explanation
- The DataNode could not resolve the local host name; on this specific host, the host name is rachel
- We need to make sure the host name resolves, for example through an entry in /etc/hosts or through DNS
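One way to check for the missing name is sketched below. It is only a sketch: hosts_has_name is a helper name I made up, and mapping the host name to 127.0.0.1 in the usage comment is an assumption that suits a single-node lab box, not a real cluster.

```shell
# hosts_has_name NAME [FILE]
# Succeeds if a hosts-format file (default /etc/hosts) maps NAME.
# Hypothetical helper; not part of Hadoop or CentOS.
hosts_has_name() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            # field 1 is the address; names and aliases start at field 2
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == n) found = 1
            }
        }
        END { exit !found }
    ' "$file"
}

# Usage (assumption: loopback is acceptable on a single-node lab box):
#   hosts_has_name "$(hostname)" || echo "127.0.0.1 $(hostname)" | sudo tee -a /etc/hosts
#   getent hosts "$(hostname)"    # should now print an address
```

After the name resolves, restarting the DataNode should get past the UnknownHostException.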
Error due to required service not running
WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
WARN org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is rachel
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:50075
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
INFO org.mortbay.log: jetty-6.1.26.cloudera.2
INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to localhost/127.0.0.1:8020 starting to offer service
INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Explanation
- The DataNode starts, but keeps retrying its connection to host localhost, port 8020, and never gets through
- So what is supposed to be listening on port 8020?
- A quick Google for “Hadoop” and port 8020 landed us at http://blog.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/, which lists the Hadoop NameNode as the service listening on that port
- So let us go make sure that the Hadoop NameNode is running and listening on port 8020
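A rough way to script that check is sketched below. The function has_listener is a name I made up; it reads lsof output on standard input. The service name hadoop-hdfs-namenode in the usage comment is what I expect the CDH4 packages to install, so treat it as an assumption and confirm it with service --status-all.

```shell
# has_listener PORT
# Reads `lsof -Pnl -i4 -i6` style output on stdin and succeeds if some
# process is in LISTEN state on PORT. Hypothetical helper.
has_listener() {
    port="$1"
    awk -v p="$port" '
        # last field is the state, the one before it is address:port
        $NF == "(LISTEN)" && $(NF-1) ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}

# Usage (assumes lsof is installed and the CDH4 service name is correct):
#   sudo service hadoop-hdfs-namenode status
#   sudo lsof -Pnl -i4 -i6 | has_listener 8020 || sudo service hadoop-hdfs-namenode start
```

If nothing is listening on 8020 even after a start, the NameNode log is the next place to look.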
Errors – MapReduce – Job Tracker
The log file is
- Syntax –> /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-<hostname>.log
- Sample –> /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-rachel.log
Error due to MapReduce / File System Permission Error
Specific Errors:
- INFO org.apache.hadoop.mapred.JobTracker: Creating the system directory
- WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/system) because of permissions.
- WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
- WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
- org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
- Caused by: org.apache.hadoop.ipc.RemoteException (org.apache.hadoop.security.AccessControlException): Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
- FATAL org.apache.hadoop.mapred.JobTracker: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Screen Dump:
INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 8021
INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
INFO org.apache.hadoop.mapred.JobTracker: Creating the system directory
WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/system) because of permissions
WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4684)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2996)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2960)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2938)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:648)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44096)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
Explanation
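- The JobTracker runs as user mapred and is denied WRITE access on the HDFS root (/), which is owned by hdfs:supergroup with mode drwxr-xr-x
- Are the folders missing, or do they exist and we only have a problem with their ownership and permissions? The denial is on inode "/", during the ancestor check of mkdirs, which suggests that the folders under mapred.system.dir have not been created yet
- I remembered that the Cloudera docs cover the HDFS folder permissions for MapReduce at length. Let us go review and apply those permissions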
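For reference, here is how I would script that step. It is a sketch under assumptions: the three hadoop fs commands are the ones I recall from the Cloudera Quick Start step "Create mapred system directories", so verify them against the docs before running; run and create_mapred_dirs are helper names I made up, and DRY_RUN is my own switch for printing the commands instead of executing them.

```shell
# run CMD...
# Execute a command, or only print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Parent of mapred.system.dir, taken from the JobTracker log above.
MAPRED_ROOT=/var/lib/hadoop-hdfs/cache/mapred

create_mapred_dirs() {
    # Create the staging folder as the HDFS superuser (hdfs).
    run sudo -u hdfs hadoop fs -mkdir -p "$MAPRED_ROOT/mapred/staging"
    # World-writable with the sticky bit, like /tmp.
    run sudo -u hdfs hadoop fs -chmod 1777 "$MAPRED_ROOT/mapred/staging"
    # Hand the whole tree to mapred so the JobTracker can create its system dir.
    run sudo -u hdfs hadoop fs -chown -R mapred "$MAPRED_ROOT"
}

# Usage:
#   DRY_RUN=1 create_mapred_dirs    # print the commands first
#   create_mapred_dirs              # then run them for real
```

The 1777 mode is the same sticky-bit arrangement described in the Concepts section: anyone may create entries under staging, but only the owner may remove them.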
Configuring init to start core Hadoop Services
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_27_2.html
Stopping Hadoop Services
Post Installation Review
Services – Review
Commands – service --status-all
sudo service --status-all | egrep -i "jobtracker|tasktracker|Hadoop"
Output:

Commands – tcp/ip service (listening)
sudo lsof -Pnl +M -i4 -i6 | grep LISTEN
The first attempt to run lsof failed with an error message:
ps -aux 2> /dev/null | grep "java"
Screen Dump (lsof: command not found):

Once lsof was installed (via the instructions previously given), the command ran:
Output:

Explanation:
Explanation – Java
- We have quite a few listening Java processes
- The Java processes are listening on TCP/IP ports between 50010 and 50090; specifically 50010, 50020, 50030, 50060, 50070, and 50075
- They are also listening on ports 8010 and 8020
Explanation - Auxiliary Services
- sshd (port 22)
- cupsd (port 631)
Commands – ps (running java applications)
ps -eo pri,pid,user,args | grep -i "java" | grep -v "grep" | awk '{printf "%-10s %-10s %-10s %-120s\n", $1, $2, $3, $4}'
Output:

Interpretation:
- Each Hadoop java process carries a -Dproc_<service> argument, such as -Dproc_secondarynamenode, -Dproc_namenode, or -Dproc_jobtracker. That argument tells us which Hadoop service the process is running
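To pull just the service names out of the process list, something like the following might do. It is a sketch: hadoop_daemons is a name I made up, and it relies on grep -o, which GNU grep on CentOS supports.

```shell
# hadoop_daemons
# Reads process command lines on stdin and prints the distinct
# -Dproc_<service> names, one per line. Hypothetical helper.
hadoop_daemons() {
    # Require at least one name character so a bare "-Dproc_" is ignored.
    grep -o -- '-Dproc_[A-Za-z0-9_][A-Za-z0-9_]*' \
        | sed 's/^-Dproc_//' \
        | sort -u
}

# Usage:
#   ps -eo args | hadoop_daemons
```

A service that is missing from the list is not running, which is a quick cross-check against the service --status-all output.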
Operational Errors
Operational Errors – HDFS – Name Node
Operational Errors – HDFS – Name Node – Security – Permission Denied
mkdir: Permission denied: user=dadeniji, access=WRITE, inode="/user/dadeniji":hdfs:supergroup:drwxr-xr-x
Validate:
Check the permissions for HDFS under /user folder:
sudo -u hdfs hadoop fs -ls /user
We received:

Explanation:
- My folder, /user/dadeniji, is still owned by hdfs, so my own account (dadeniji) has no write access to it.
Let us go change it:
sudo -u hdfs hadoop fs -chown $USER /user/$USER
Validate Fix:
hadoop fs -ls /user/$USER
Output:
![hdfs -- Hadoop -- fs -ls (fixed) [v2]](http://danieladeniji.files.wordpress.com/2013/05/hdfs-hadoop-fs-ls-fixed-v21.png?w=640&h=117)
Operational Errors – HDFS – DataNode
13/05/16 15:59:08 ERROR security.UserGroupInformation: PriviledgedActionException as:dadeniji (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=<username>, access=EXECUTE, inode="/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/<username>":mapred:supergroup:drwx------
13/05/16 15:59:08 ERROR security.UserGroupInformation: PriviledgedActionException as:dadeniji (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=dadeniji, access=EXECUTE, inode="/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/dadeniji":mapred:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:161)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:128)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4684)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:4660)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2911)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:673)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:643)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44128)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
When we issued:
sudo -u hdfs hadoop fs -ls \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging
We received:

Explanation:
- For my personal HDFS staging folder (/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/dadeniji), the permission set is drwx------.
- That is, the owner (mapred) is the only account that has any permissions. My own account (dadeniji) cannot even traverse the folder, which is why the request fails with access=EXECUTE.
- The Cloudera docs anticipate this type of error:
Installing CDH4 in Pseudo-Distributed Mode
Starting Hadoop and Verifying it is Working Properly:
Create mapred system directories
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html
If you do not create /tmp properly, with the right permissions as shown below, you may have problems with CDH components later. Specifically, if you don’t create /tmp yourself, another process may create it automatically with restrictive permissions that will prevent your other applications from using it.
Let us go correct it:
As it is merely a staging folder, let us remove it and let the system re-create it:
sudo -u hdfs hadoop fs -rm -r \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/dadeniji
Once corrected, we can run MapReduce jobs.