Introduction

I have a couple of internal disks and some network SAN storage, and I wanted to know how they compare.

So I looked on the Net for free disk I/O benchmarking and profiling tools.

Tools

• Iometer
• Atto
• CrystalDiskMark

I tried a couple of these tools, but the one I settled on is DiskMark from NetworkDLS.

Install the tool

Usage

Once the tool is installed, launch it.

There are four main settings you want to configure.

Disk Drive

All the disks on the system are listed in the “Disk Drive” drop-down box.

Set Size

This is the payload unit size, expressed in bytes.

You can use simple arithmetic when entering the value.

Size                      Value
1 KB                      1024
64 KB                     64 * 1024
1 MB (1000 KB)            1000 * 1024
1 GB (1000 * 1000 KB)     1000 * 1000 * 1024
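If you want to check what a Set Size expression works out to before typing it into DiskMark, the shell can evaluate the same arithmetic. This is only a convenience check and is not part of DiskMark; payload_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Evaluate DiskMark-style "Set Size" expressions to see the payload in bytes.
payload_bytes() {
  # $1 is an arithmetic expression such as "64 * 1024"
  echo $(( $1 ))
}

for expr in "1024" "64 * 1024" "1000 * 1024" "1000 * 1000 * 1024"; do
  printf '%-20s = %s bytes\n' "$expr" "$(payload_bytes "$expr")"
done
```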

Rounds

This is the number of times to process each payload.

Runs

This is the number of repetition cycles. Measurements are gathered for each cycle, and once all the runs complete, averages are calculated.

Sample Result

Trial Run:

Here is the configuration of our Trial Run:

Analyze Result

Analyze result.

Disk    Disk Time    Write Performance    Read Performance
C       274.79       9.34 MB/s            128.32 MB/s
D       272.47       9.42 MB/s            125.60 MB/s
X       22.74        115.99 MB/s          287.97 MB/s
Y       25.56        101.97 MB/s          254.61 MB/s
Z       25.24        106.47 MB/s          277.02 MB/s

Conclusion

Using the open source tool DiskMark, we were quickly able to determine that our internal drives (C: and D:) deliver only about 9.4 MB/s when writing data.

The SAN storage drives (X:, Y:, and Z:), by contrast, deliver an average of about 108 MB/s.
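The averages quoted here come straight from the Write Performance column of the results table; a short awk sketch reproduces them. avg is a hypothetical helper name, and the input values are simply copied from the table.

```shell
#!/bin/sh
# Average the DiskMark write results per storage class.
# The figures are copied from the results table in this post.
avg() {
  # Print the mean of all arguments, rounded to two decimal places
  printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { printf "%.2f\n", sum / n }'
}

internal_write=$(avg 9.34 9.42)         # drives C: and D:
san_write=$(avg 115.99 101.97 106.47)   # drives X:, Y:, and Z:

echo "Internal drives, average write: $internal_write MB/s"
echo "SAN drives, average write:      $san_write MB/s"
```

The same helper works for the read column, for example avg 128.32 125.60.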

These are not break-point (peak) numbers, as the tool is probably not writing data in parallel, using read-ahead caching, and so on.

The numbers are simply guidance for comparing apples to apples, so to speak.

References

References – DiskMark

http://lifehacker.com/5824265/diskmark-is-a-free-and-easy-hard-drive-benchmark-tool

References – winsat

http://blog.dv411.com/2011/01/disk-test-quickie-windows.html

Technical: Hadoop/Cloudera (v4.2.1) – Installation on CentOS (32 bit [x32])

Introduction

Here are quick preparation, processing, and validation steps for installing Cloudera Hadoop (v4.2.1) on 32-bit CentOS.

Blueprint

I am using Cloudera’s fine documentation “http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html” as a basis.

It is very good documentation, but I stumbled a lot, out of inexperience and from glossing over important details. So I chose to write things down.

Environment Constraints

Here are the constraints I have to work with:

• My lab PC is an old Dell
• It has a 32-bit processor
• So I can only install a 32-bit Linux and Cloudera distribution
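Whether a lab machine really is limited to 32-bit can be checked from Linux itself. A minimal sketch, assuming Linux on an x86 CPU, where the lm (long mode) flag in /proc/cpuinfo indicates 64-bit support; cpu_supports_64bit is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (Linux/x86 only): the "lm" (long mode) CPU flag means the processor
# can run a 64-bit OS; without it, only 32-bit distributions will work.
cpu_supports_64bit() {
  cpuinfo=${1:-/proc/cpuinfo}   # file to inspect (defaults to the live one)
  # -w matters: flags such as "lahf_lm" also contain the letters "lm"
  grep '^flags' "$cpuinfo" 2> /dev/null | grep -qw 'lm'
}

if cpu_supports_64bit; then
  echo "CPU is 64-bit capable (running kernel: $(uname -m))"
else
  echo "No 'lm' flag found: use 32-bit (i386/i686) packages"
fi
```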

Concepts

Here are a couple of concepts that we will utilize:

• File System – Linux – Stickiness

Concepts : File System – Linux – Stickiness

Background

Sticky Bit

http://en.wikipedia.org/wiki/Sticky_bit

The most common use of the sticky bit today is on directories. When the sticky bit is set, only the item’s owner, the directory’s owner, or the superuser can rename or delete files. Without the sticky bit set, any user with write and execute permissions for the directory can rename or delete contained files, regardless of owner. Typically this is set on the /tmp directory to prevent ordinary users from deleting or moving other users’ files.

In Unix symbolic file system permission notation, the sticky bit is represented by the letter t in the final character-place.

Set Stickiness

http://en.wikipedia.org/wiki/Sticky_bit

The sticky bit can be set using the chmod command and can be set using its octal mode 1000 or by its symbol t (s is already used by the setuid bit). For example, to add the bit on the directory /tmp, one would type chmod +t /tmp. Or, to make sure that directory has standard tmp permissions, one could also type chmod 1777 /tmp.

To clear it, use chmod -t /tmp or chmod 0777 /tmp (using numeric mode will also change directory tmp to standard permissions).
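The set, inspect, and clear steps can be tried safely on a scratch directory instead of /tmp itself. A minimal sketch, assuming a Linux or other POSIX system with mktemp; the permission string printed by ls -ld ends in t while the sticky bit is set.

```shell
#!/bin/sh
# Sketch: set, inspect, and clear the sticky bit on a scratch directory
# (used instead of /tmp, so no root access is needed).
dir=$(mktemp -d)

chmod 1777 "$dir"                       # standard /tmp permissions, sticky bit on
mode_on=$(ls -ld "$dir" | cut -c1-10)   # permission string, e.g. drwxrwxrwt
echo "after chmod 1777: $mode_on"

chmod -t "$dir"                         # clear just the sticky bit
mode_off=$(ls -ld "$dir" | cut -c1-10)  # final character is now x
echo "after chmod -t:   $mode_off"

rmdir "$dir"
```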

Is Stickiness set?

http://en.wikipedia.org/wiki/Sticky_bit

In Unix symbolic file system permissions notation, the sticky bit is represented by the letter t in the final character-place. For instance, in our Linux environment, the /tmp directory, which by default has the sticky bit set, shows up as:

Sample:

for service in /etc/init.d/hadoop*; do echo $service; done

Screen Shot:

Starting CDH Services

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_27_1.html

• hadoop-hdfs-namenode
  Command: sudo /sbin/service hadoop-hdfs-namenode start
  Log: /var/log/hadoop-hdfs/hadoop-hdfs-namenode-.log, /var/log/hadoop-hdfs/hadoop-hdfs-namenode-.out
• hadoop-hdfs-secondarynamenode
  Command: sudo /sbin/service hadoop-hdfs-secondarynamenode start
  Log: /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-.out
• hadoop-hdfs-datanode
  Command: sudo /sbin/service hadoop-hdfs-datanode start
  Log: /var/log/hadoop-hdfs/hadoop-hdfs-datanode-.out
• hadoop-0.20-mapreduce-jobtracker
  Command: sudo /sbin/service hadoop-0.20-mapreduce-jobtracker start
  Log: /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-.out
• hadoop-0.20-mapreduce-tasktracker
  Command: sudo /sbin/service hadoop-0.20-mapreduce-tasktracker start
  Log: /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-.out

Start Services – Using /etc/init.d

Look for items in the /etc/init.d/ folder that have hadoop in their names and start them.

Syntax:

for service in /etc/init.d/<service-name>; do sudo $service start; done

Sample:

for service in /etc/init.d/hadoop-*; do sudo $service start; done

Errors:

Here are some errors we received because I chose not to follow instructions or skipped some steps. One thing I have to learn about Linux, and enterprise systems in general, is that “brevity in instructions is sacrosanct”: every step matters, so make sure you follow all of them; otherwise, Google for help and hope someone else made the same mistake and posted the specific errors and the resolution.

Errors – HDFS-NameNode

Here are HDFS Name Node errors. The log file is:

• Syntax –> /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log
• Sample –> /var/log/hadoop-hdfs/hadoop-hdfs-namenode-rachel.log

Error due to host name resolution failure

Specific Errors:

• ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name.
• java.net.UnknownHostException: <hostname>: <hostname>
• at java.net.InetAddress.getLocalHost(InetAddress.java:1466)

Screen Dump:

ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: rachel: rachel
at java.net.InetAddress.getLocalHost(InetAddress.java:1466)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.getHostname(MetricsSystemImpl.java:496)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSystem(MetricsSystemImpl.java:435)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:431)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:180)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:156)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
Caused by: java.net.UnknownHostException: rachel
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
... 9 more



Explanation

Error due to HDFS being in an inconsistent state


namenode join

Directory /var/lib/hadoop-hdfs/cache/hdfs/dfs/name is in an inconsistent state:

storage directory does not exist or is not accessible.

INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1



Specific Errors:

• Directory /var/lib/hadoop-hdfs/cache/hdfs/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

Screen Dump:

Explanation

• Format the HDFS NameNode; this should be run on the primary NameNode. Formatting initializes a new, empty file system, so only do this on a fresh install:
sudo -u hdfs hadoop namenode -format


Errors – HDFS-DataNode

Here are HDFS Data Node errors.

The log file is

Error due to host name resolution error

Screen Shot:



[dadeniji@rachel conf]$ cat /var/log/hadoop-hdfs/hadoop-hdfs-datanode-rachel.log
2013-05-13 15:24:32,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = java.net.UnknownHostException: rachel: rachel
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.0.0-cdh4.2.1
STARTUP_MSG: build = file:///data/1/jenkins/workspace/generic-package-centos32-6/topdir/BUILD/hadoop-2.0.0-cdh4.2.1/src/hadoop-common-project/hadoop-common -r 144bd548d481c2774fab2bec2ac2645d190f705b; compiled by 'jenkins' on Mon Apr 22 10:26:05 PDT 2013
STARTUP_MSG: java = 1.7.0_21
************************************************************/
2013-05-13 15:24:32,895 WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2013-05-13 15:24:33,962 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.UnknownHostException: rachel: rachel
at java.net.InetAddress.getLocalHost(InetAddress.java:1466)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:223)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:243)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1694)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1872)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1893)
Caused by: java.net.UnknownHostException: rachel
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
... 6 more

2013-05-13 15:24:33,987 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException: rachel: rachel
************************************************************/



Explanation

• We need to make sure that this machine can resolve its own host name; for this specific host, the host name is rachel
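A quick way to test this from the shell is to ask the system resolver for the machine's own name, which is roughly the lookup that InetAddress.getLocalHost() depends on. A minimal sketch, assuming a Linux system with getent; check_resolves is a hypothetical helper name, and the /etc/hosts line it suggests is only a template.

```shell
#!/bin/sh
# Sketch: check whether a host name resolves through the system resolver.
# getent consults /etc/hosts and DNS in the order set by /etc/nsswitch.conf.
check_resolves() {
  addr=$(getent hosts "$1" | awk '{ print $1; exit }')
  if [ -n "$addr" ]; then
    echo "OK: $1 resolves to $addr"
  else
    echo "FAIL: $1 does not resolve; consider adding a line like this to /etc/hosts:"
    echo "  <ip-address>   $1"
    return 1
  fi
}

# uname -n prints this machine's host name (rachel, in this post)
check_resolves "$(uname -n)" || echo "Fix name resolution before starting the Hadoop daemons."
```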
Error due to required service not running


WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/data should be specified as a URI in configuration files.

INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

system started

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is rachel

server at /0.0.0.0:50010

bandwith is 1048576 bytes/s

via org.mortbay.log.Slf4jLog

(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode

INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs

at 0.0.0.0:50075

INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075

INFO org.mortbay.log: jetty-6.1.26.cloudera.2

INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075

50020

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020

BPOfferServices for nameservices: <default>

WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/data should be specified as a URI in configuration files.

<registering> (storage id unknown) service to localhost/127.0.0.1:8020
starting to offer service

INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)



Explanation

• It looks like the DataNode breaks when trying to communicate with host localhost, port 8020
• So what is supposed to be listening on port 8020?
• Let us go make sure that the Hadoop NameNode is running and listening on port 8020
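The /etc/init.d/hadoop-* glob used earlier starts the scripts in alphabetical order, which puts the DataNode (and the MapReduce daemons) ahead of the NameNode. A minimal sketch that starts them in dependency order instead; the service names are the CDH4 ones listed in this post, INIT_DIR and DRY_RUN are knobs of this sketch rather than Cloudera settings, and the order alone does not guarantee that the NameNode is already listening on 8020 when the DataNode starts.

```shell
#!/bin/sh
# Sketch: start CDH4 init scripts in dependency order (NameNode first, then
# the other HDFS daemons, then MapReduce) instead of alphabetical glob order.
# INIT_DIR and DRY_RUN are knobs of this sketch, not Cloudera settings.
INIT_DIR=${INIT_DIR:-/etc/init.d}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

ordered_services() {
  for name in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode \
              hadoop-hdfs-datanode hadoop-0.20-mapreduce-jobtracker \
              hadoop-0.20-mapreduce-tasktracker; do
    # emit only the services that are actually installed
    if [ -x "$INIT_DIR/$name" ]; then
      echo "$INIT_DIR/$name"
    fi
  done
}

for service in $(ordered_services); do
  if [ "$DRY_RUN" = 1 ]; then
    echo "sudo $service start"
  else
    sudo "$service" start
  fi
done
```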
Errors – MapReduce – Job Tracker

The log file is

Error due to MapReduce / File System Permission Error

Specific Errors:

• INFO org.apache.hadoop.mapred.JobTracker: Creating the system directory
• WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/system) because of permissions.
• WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
• WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
• org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x

Screen Dump:



with processName=JobTracker, sessionId=
INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 8021

INFO org.apache.hadoop.mapred.JobTracker: Creating the system directory

permissions

WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned
by the user 'mapred (auth:SIMPLE)'

WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44096)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)

Explanation

• We are having HDFS file system permission problems
• Were the folders never created, or were they created and the problem is only with how they are permissioned?
• I remembered that the Cloudera docs cover HDFS MapReduce folder permissions in detail. Let us go review and apply those permissions

Configuring init to start core Hadoop Services

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_27_2.html

Stopping Hadoop Services

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_27_3.html

Post Installation Review

Services – Review

Commands – service --status-all

sudo service --status-all | egrep -i "jobtracker|tasktracker|Hadoop"

Output:

Commands – tcp/ip service (listening)

sudo lsof -Pnl +M -i4 -i6 | grep LISTEN

I tried running lsof, but got an error message:

ps -aux 2> /dev/null | grep "java"

Screen Dump (lsof: command not found):

Once lsof was installed (via the instructions given previously):

Output:

Explanation:

Explanation – Java

• We have quite a few listening Java processes
• The Java processes are listening on TCP/IP ports between 50010 and 50090; specifically 50010, 50020, 50030, 50060, 50070, and 50075
• And also on ports 8010 and 8020

Explanation – Auxiliary Services

• sshd (port 22)
• cupsd (port 631)

Commands – ps (running java applications)

ps -eo pri,pid,user,args | grep -i "java" | grep -v "grep" | awk '{ printf "%-10s %-10s %-10s", $1, $2, $3; for (i = 4; i <= NF; i++) printf " %s", $i; print "" }'

Output:

Interpretation:

• On the Java command lines, one will see markers such as -Dproc_secondarynamenode, -Dproc_namenode, and -Dproc_jobtracker; each marker maps to a specific Hadoop service

Operational Errors

Operational Errors – HDFS – Name Node

Operational Errors – HDFS – Name Node – Security – Permission Denied

mkdir: Permission denied: user=dadeniji, access=WRITE, inode="/user/dadeniji":hdfs:supergroup:drwxr-xr-x

Validate: Check the permissions for HDFS under the /user folder:

sudo -u hdfs hadoop fs -ls /user

We received:

Explanation:

• My folder, /user/dadeniji, is still owned by hdfs. Let us go change it:

sudo -u hdfs hadoop fs -chown $USER /user/$USER

Validate Fix:

hadoop fs -ls /user/$USER


Output:

Operational Errors – HDFS – DataNode

13/05/16 15:59:08 ERROR security.UserGroupInformation: PriviledgedActionException as:dadeniji (auth:SIMPLE) cause

:mapred:supergroup:drwx------



13/05/16 15:59:08 ERROR security.UserGroupInformation: PriviledgedActionException as:dadeniji (auth:SIMPLE) cause
mapred:supergroup:drwx------
8)
odeProtocolServerSideTranslatorPB.java:643)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44128)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)



When we issued:



sudo -u hdfs hadoop fs -ls  \



Explanation:

• For my personalized HDFS staging folder (/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/dadeniji), the permission set is drwx------.
• In other words, the owner (mapred) is the only account that has any permissions.
• The Cloudera docs are very prophetic about this type of error:

Installing CDH4 in Pseudo-Distributed Mode
Starting Hadoop and Verifying it is Working Properly:
Create mapred system directories
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html
If you do not create /tmp properly, with the right permissions as shown below, you may have problems with CDH components later. Specifically, if you don’t create /tmp yourself, another process may create it automatically with restrictive permissions that will prevent your other applications from using it.

Let us go correct it:

As it is merely a staging folder, let us remove it and hope the system re-creates it:



sudo -u hdfs hadoop fs -rm -r \



Once corrected we can run the MapReduce jobs.
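Before running MapReduce jobs, it helps to confirm which Hadoop daemons are actually up. As noted in the post-installation review, each daemon's java command line carries a -Dproc_<name> marker; this sketch pulls those markers out of ps output. hadoop_daemons is a hypothetical helper name, and it reads ps-style lines on stdin so it can be fed from a saved listing too.

```shell
#!/bin/sh
# Sketch: list running Hadoop daemons by the -Dproc_<name> marker on the
# java command line. Input lines are "pid user args...", as from ps.
hadoop_daemons() {
  awk '{
    for (i = 3; i <= NF; i++)
      if ($i ~ /^-Dproc_/) {
        sub(/^-Dproc_/, "", $i)            # keep just the daemon name
        printf "%-8s %-10s %s\n", $1, $2, $i
        break
      }
  }'
}

ps -eo pid,user,args | hadoop_daemons
```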

Introduction

Access to certain applications is restricted to the root user or to users who are able to acquire administrative privileges.

Thus, to manage systems successfully, you need to be able to log in as the root account or as one of the accounts that can act in its place.

Which programs run with “root” privileges, whoever starts them?

These setuid programs have an s in the owner execute position when viewed using ls -la. Whoever launches them, they run with the privileges of their owner, root.


# check the /bin folder and list files whose permissions start with "-rws"
ls -la /bin/* | grep '^-rws'



There are a couple of things you want to note:

• A pattern that begins with a hyphen would be read by grep as an option. Escaping the hyphen with a back-slash (\-rws) is one workaround, but recent GNU grep versions warn about the stray back-slash; anchoring the pattern with ^ avoids the problem and also restricts the match to the start of the line, where the permissions are printed
• The leading - marks a regular file; the next three letters are the permission set for the owner
• r: the owner is able to read the file
• w: the owner is able to write (overwrite) the file
• s: this position usually holds x, which means the owner can execute the file. When it holds s instead, the setuid bit is set: whoever runs the program, it runs with the privileges of the file’s owner
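find can list setuid programs directly, without relying on the exact -rws text: -perm -4000 matches any regular file with the setuid bit set, including ones shown as -r-sr-xr-x. A minimal sketch; list_setuid is a hypothetical helper, and /bin and /usr/bin are just default search locations.

```shell
#!/bin/sh
# Sketch: list setuid programs with find instead of ls | grep.
# -perm -4000 matches files whose setuid bit is set, whatever the other bits.
list_setuid() {
  # $@: directories to search (defaults to /bin and /usr/bin)
  [ "$#" -eq 0 ] && set -- /bin /usr/bin
  find "$@" -xdev -type f -perm -4000 2> /dev/null
}

list_setuid
```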

Taking on the root role via membership in the wheel group

By convention, Linux uses a group named wheel as a surrogate group whose members can take on the role of the administrator.

Where did the name wheel come from?

http://en.wikipedia.org/wiki/Wheel_%28Unix_term%29

In computing, the term wheel refers to a user account with a wheel bit, a system setting that provides additional special system privileges that empower a user to execute restricted commands that ordinary user accounts cannot access.  The term is derived from the slang phrase big wheel, referring to a person with great power or influence.

What is the “Wheel Group”

http://en.wikipedia.org/wiki/Wheel_%28Unix_term%29

Modern Unix systems use user groups to control access privileges. The wheel group is a special user group used on some Unix systems to control access to the su command, which allows a user to masquerade as another user (usually the super user).

Adding user to the wheel group

Command Shell – Utility – usermod

To modify user accounts, Linux relies on the usermod utility.  Here are a few quick points:

• The utility’s full path is /usr/sbin/usermod
• One can change the user’s home directory via the -d (--home) option
• One can change the user’s primary group via the -g (--gid) option
• One can wholly replace the user’s supplementary group membership via the -G (--groups) option
• One can add to the user’s existing groups by combining the -a (--append) option with -G
• One can change the user’s shell by using the -s (--shell) option
• One can unlock an account by using the -U (--unlock) option
Usermod – Add user to the wheel group

To add a user (in this case, myself) to the wheel group, do the following:


Syntax:

Sample:



Thank goodness, you get clear, helpful messages when the group or user name does not exist on the system:

• user does not exist
• group does not exist

When things are good, we get no feedback.
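Because usermod is silent on success, a small wrapper can check membership first and show the exact command before running it. A minimal sketch, assuming the usermod -a/-G options described above and the id utility; in_group and add_to_wheel are hypothetical helper names, and DRY_RUN is a knob of this sketch that only prints the command.

```shell
#!/bin/sh
# Sketch: add a user to wheel only if they are not already a member.
# Run as root with DRY_RUN=0 to actually make the change.
DRY_RUN=${DRY_RUN:-1}

in_group() {
  # id -nG lists every group name the user belongs to, space-separated
  id -nG "$1" 2> /dev/null | tr ' ' '\n' | grep -qxF "$2"
}

add_to_wheel() {
  if in_group "$1" wheel; then
    echo "$1 is already in wheel"
  elif [ "$DRY_RUN" = 1 ]; then
    echo "would run: /usr/sbin/usermod -a -G wheel $1"
  else
    # -a (append) keeps the existing supplementary groups;
    # without it, -G would replace them with wheel alone
    /usr/sbin/usermod -a -G wheel "$1"
  fi
}

# Example: add_to_wheel dadeniji
```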

Groups – Review User Group Membership

Get user groups


Syntax:

Sample:



Output:

Groups – List all users in a group

List all users in a group


Sample:

grep :`grep ^wheel /etc/group | cut -d: -f3`: /etc/passwd



Output:

Explanation of Script:

• The surrounding back-ticks (`) mean that the inner script is run first and its output is substituted into the outer command, rather than displayed on the console
• What does the inner script do? grep ^wheel /etc/group gets the line in /etc/group that starts with the word wheel. In its entirety, that line reads “wheel:x:10:”
• The output of “grep ^wheel /etc/group” is piped (|) to the cut utility. The syntax “cut -d: -f3” says to get the third field, using colon (:) as the delimiter. So when we ask for the third field of “wheel:x:10:”, we get back 10, which is the group ID for wheel
• Please note that you need the colons (:) around the inner script; without them, I got an extraneous row, as in the code and output pasted below:

Code (code and console output):


Command:
grep -e "`grep ^wheel /etc/group | cut -d: -f3`" /etc/passwd

Output:

-------------------------------------------------------------------------------
Command:
grep -e :"`grep ^wheel /etc/group | cut -d: -f3`": /etc/passwd

Output:



Output (Screen shot):

Explanation:

• Without the colons (:), one will also see the record for games: its group ID is not 10 but 100, and the unanchored pattern 10 matches inside 100
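One caveat about the /etc/passwd approach: it only finds users whose primary group is wheel (the GID in the fourth passwd field), and the pattern :10: could also match a user ID of 10. Users added with usermod -a -G are listed instead in the fourth field of the wheel line in /etc/group. A sketch that combines both sources, assuming the standard colon-separated formats; group_members is a hypothetical helper, and its optional file arguments exist only so it can be tried on copies.

```shell
#!/bin/sh
# Sketch: list all members of a group by combining
#  - supplementary members (field 4 of the group's line in /etc/group), and
#  - users whose primary GID (field 4 of /etc/passwd) equals the group's GID.
group_members() {
  group=$1
  group_file=${2:-/etc/group}
  passwd_file=${3:-/etc/passwd}

  awk -F: -v g="$group" '
    # First file (/etc/group): remember the GID, print supplementary members
    FNR == NR {
      if ($1 == g) {
        gid = $3
        n = split($4, m, ",")
        for (i = 1; i <= n; i++) print m[i]
      }
      next
    }
    # Second file (/etc/passwd): compare the GID field exactly
    gid != "" && $4 == gid { print $1 }
  ' "$group_file" "$passwd_file" | sort -u
}

group_members wheel
```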

Ensure that wheel has sudo access via customization of sudoers

Why bother with sudoers?

If an account tries to use sudo without being a member of the wheel group, or the wheel group has not been granted sudo access in the sudoers file, the error message pasted below comes up:

Output (Text):

<account> is not in the sudoers file.  This incident will be reported.


Output (Screenshot):

Email

The next time you, the root user, use your system, you will get a little notification telling you that an email is waiting for you:

You have mail in /var/spool/mail/root


To view the email, issue something like:

tail /var/spool/mail/root


Screen shot:

The email sent by the gossip delegator (sudo) is quite straightforward. The areas covered include:

• To — root
• From — dadeniji (in our case)
• Auto-Submitted: auto-generated
• Subject: **SECURITY information for <hostname>
• Message-Id: ******
• Date: *****


Email Contents:



rachel : May 12 17:41:27 : dadeniji : user NOT in sudoers ; TTY=pts/2 ; PWD=/home/dadeniji ; USER=root ; COMMAND=/bin/ls



Using visudo

Launch visudo:

visudo


Look for the lines that reference the wheel group:

Shipped:



## Allows people in group wheel to run all commands
# %wheel        ALL=(ALL)       ALL

## Same thing without a password
# %wheel        ALL=(ALL)       NOPASSWD: ALL


• The statements are refreshingly well documented
• I suggest that you uncomment the line that references wheel but does not mention NOPASSWD

Revised:



## Allows people in group wheel to run all commands
%wheel        ALL=(ALL)       ALL

## Same thing without a password
# %wheel        ALL=(ALL)       NOPASSWD: ALL
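If you would rather script the same change, the edit can be made on a copy of the file and checked before it is installed. A minimal sketch, assuming GNU or BSD sed; enable_wheel_rule is a hypothetical helper, and the result should still be validated with visudo -c -f before it replaces /etc/sudoers.

```shell
#!/bin/sh
# Sketch: print a sudoers file with the password-protected wheel rule
# uncommented, leaving the NOPASSWD variant commented out.
# Never edit /etc/sudoers in place like this; a syntax error can lock you
# out of sudo. Work on a copy and validate it with visudo first.
enable_wheel_rule() {
  # On lines without NOPASSWD, strip a leading "#" (and spaces) before "%wheel"
  sed -e '/NOPASSWD/!s/^#[[:space:]]*\(%wheel[[:space:]]\)/\1/' "$1"
}

# Example (as root, on a copy):
#   cp /etc/sudoers /tmp/sudoers.copy
#   enable_wheel_rule /tmp/sudoers.copy > /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new
```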



Corrected:

Validation


sudo ls -la *

Output:

Now when we issue sudo <command> and supply our account’s (dadeniji) password, we are good.


Technical – Hadoop – Hive – What is the Version # of Hive Service and Client that you are running?

Introduction

Hadoop is a speeding bullet.  You look online, Google for things, try it out, and sometimes you hit, but often you miss.

What do I mean by that?

Well, this evening I was trying to play with Hive; specifically, using Sqoop to import a table from MS SQL Server into Hive.

A bit of background: my MS SQL Server table has a couple of columns declared as datetime.

Upon running the Sqoop statement pasted below:



sqoop import \
--connect "jdbc:sqlserver://sqlServerLab;database=DEMO" \
--driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
-m 1 \
--hive-import \
--hive-table "customer" \
--table "dbo.customer" \
--split-by "customerID"



The above command basically gives the following instruction set:

• Via the JDBC driver (jdbc:sqlserver), connect to SQL Server instance sqlServerLab and database DEMO
• JDBC driver class name: com.microsoft.sqlserver.jdbc.SQLServerDriver
• Number of map tasks: 1 (-m 1)
• Sqoop operation: --hive-import
• Hive table: customer
• SQL Server table: dbo.customer
• Split-by column: customerID

I noticed a couple of warnings in the Sqoop console log output:



INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM dbo.customer AS t WHERE 1=0

WARN hive.TableDefWriter: Column InsertTime had to be cast to a less
precise type in Hive

WARN hive.TableDefWriter: Column salesDate had to be cast to a less
precise type in Hive



Processing

Explore MS SQL Server

So I quickly went back and looked at my SQL Server table:

  use [Demo];
exec sp_help 'dbo.customer';


Output:

The output is congruent with my expectations:

• The InsertTime is a datetime column
• The salesDate is a datetime column

Explore Hive

Launch Hive:

In the shell, issue “hive” to start the Hive shell:


hive

List all tables:

To confirm that a corresponding table has been created in Hive, use show tables:


show tables;

Output:

Display Table Structure (customer):

Display table structure using describe:


Syntax:
describe <table-name>;

Sample:

describe customer;

Output:

Explanation:

• So our two original MS SQL Server datetime columns (InsertTime and salesDate) were brought in not as timestamps, but as strings

So I wondered: why?

Hive Datatype Support

I know that TIMESTAMP was not one of the original data types supported by Hive; it was added in Hive 0.8.0.

This is noted in:

HortonWorks – Hive – Language Manual – Datatypes
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/language_manual/datatypes.html

Determine Hive Version

There are a couple of ways to get the Hive’s Server and Client Version Number

Determine Hive Version – Command Shell – Using ps

Issue ps -aux:



ps -aux | grep -i "Hive"



Output (Screen shot):

Output (Text):



410      13853  0.0  0.2 2207844 22824 ?       Ss   Apr15   0:00 postgres: hive hive 10.0.4.1(56963) idle

410      13854  0.0  0.1 2206552 8388 ?        Ss   Apr15   0:00 postgres: hive hive 10.0.4.1(56964) idle


• We have 4 processes bearing the “hive” name

Service Process

• It is identifiable as a Hive Service via its name hive-service*.jar
• It is running under the “hive” account name.  Its Process ID is 13767.  One of the Jar files referenced is /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/hive-service-0.10.0-cdh4.2.0.jar
• The Cloudera Version# is 4.2 and Hive Version# is 0.10

Client Process

• It is identifiable as a Hive Client via its name hive-cli*.jar
• It is running under my username (dadeniji), as I kicked it off.  Its Process ID is 18749.  One of the Jar files referenced is /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/bin/../lib/hive/lib/hive-cli-0.10.0-cdh4.2.0.jar
• The Cloudera Version# is 4.2 and Hive Version# is 0.10
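The Hive and CDH version numbers can also be read straight out of the jar names on the java command lines. A minimal sketch, assuming jar names of the hive-<component>-<hive-version>-cdh<cdh-version>.jar form seen in this post; hive_jar_versions is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract Hive and CDH versions from a Hive jar file name such as
# hive-service-0.10.0-cdh4.2.0.jar (the naming pattern seen in this post).
hive_jar_versions() {
  basename "$1" |
    sed -n 's/^hive-[a-z-]*-\([0-9.]*\)-cdh\([0-9.]*\)\.jar$/hive=\1 cdh=\2/p'
}

# Scan running java command lines for Hive service and CLI jars.
# Class paths are ":"-separated and arguments are " "-separated.
ps -eo args | tr ' :' '\n\n' | grep -E '/hive-(service|cli)-[^/]*\.jar$' | sort -u |
  while read -r jar; do
    echo "$(hive_jar_versions "$jar")  $jar"
  done
```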

“Postgres” Process

• Hive’s metastore uses an embedded PostgreSQL database
• The processes are running under account 410 (shown as a numeric user ID)

Determine Hive Version – Cloudera Manager Admin Console & Command Shell

• Launch Web Browser
• Connect to Admin console ( http://<clouderaManagerServices>:<port>).  In our case http://hadoopCMS:7180; as Cloudera Manager Service is running on a machine named hadoopCMS and we kept the default port# of 7180
• The initial screen displayed is the Service Status page (/cmf/services/status)
• Click on the service we are interested in (hive1)
• The service’s specific “Status and Health Summary” screen is displayed; in this case, the “Hive1 – Status and Health Summary” page
• In the row labelled “Hive MetaStore Server”, click on the link underneath the “Status” column
• This brings you to the “hivemetastore” summary page
• For each Hive host, Hive process information and links to the Hive logs are displayed
• On the “Show Recent Logs” row, click on the “Full Stdout” log
• The stdout.log appears; here is a breakdown of what it provides:

stdout.log

 

Mon Apr 15 21:06:24 UTC 2013
using /usr/java/default as JAVA_HOME

using 4 as CDH_VERSION

using /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive
as HIVE_HOME

using /var/run/cloudera-scm-agent/process/22-hive-HIVEMETASTORE
as HIVE_CONF_DIR

Starting Hive Metastore Server


Java version

We quickly see that JAVA_HOME is defined as /usr/java/default.

To see what /usr/java/default contains:

  ls /usr/java/default

Output:

Explanation:

• /usr/java/default is symbolically linked to /usr/java/latest
• /usr/java/latest is symbolically linked to /usr/java/jdk1.7.0_17
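The chain /usr/java/default -> /usr/java/latest -> /usr/java/jdk1.7.0_17 can also be walked from the shell, one link at a time. A minimal sketch using readlink (available on Linux and the BSDs); show_link_chain is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: follow a chain of symbolic links one hop at a time,
# printing each intermediate path.
show_link_chain() {
  p=$1
  echo "$p"
  while [ -L "$p" ]; do
    target=$(readlink "$p")
    # relative targets are relative to the directory holding the link
    case $target in
      /*) p=$target ;;
      *)  p=$(dirname "$p")/$target ;;
    esac
    echo "  -> $p"
  done
}

show_link_chain /usr/java/default
```

On systems with GNU coreutils, readlink -f /usr/java/default jumps straight to the final target.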
Cloudera Distribution version

Based on the screen shot below, the CDH Version is 4

using 4 as CDH_VERSION
Hive Home

Based on the screen shot below, the Hive Home is /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive

using /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive as HIVE_HOME

Again, let us return to the command shell and see what files are in /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive

Add the /lib suffix to get to the jar files, and list only the jar files that have hive in their names:



ls /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/hive*.jar



Output:

Cloudera Manager Admin Console – Service Status

Cloudera Manager Admin Console – Hive1 – Status and Health Summary

Cloudera Manager Admin Console – Hive1 – Status Summary

Cloudera Manager Admin Console – Hive1 – Status Summary – Log – Stdout.log

Conclusion

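At first glance, it appeared that our version of Hive did not support the TIMESTAMP datatype. However, Hive 0.10 is newer than 0.8.0, the release that added TIMESTAMP, so Hive itself should not be the limitation here.

The problem more likely lies with the version of Sqoop we have running, or with Sqoop’s handling of SQL Server’s datetime datatype: its warnings say that InsertTime and salesDate “had to be cast to a less precise type in Hive”, which suggests that Sqoop mapped them to STRING rather than TIMESTAMP.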
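Comparing the Hive version found above (0.10) with the release that introduced TIMESTAMP (0.8.0) is easy to get backwards if the version strings are compared as text, because “0.10” sorts before “0.8”. A minimal sketch of a field-by-field numeric comparison, assuming plain dotted version numbers such as 0.10 and 0.8.0; version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare dotted version numbers numerically, field by field,
# so that 0.10 is correctly treated as newer than 0.8.0.
version_ge() {
  # Succeeds (exit status 0) if version $1 >= version $2
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                            # all fields equal
  }'
}

if version_ge 0.10 0.8.0; then
  echo "Hive 0.10 is at least 0.8.0, so TIMESTAMP should be available"
fi
```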

Technical: Hadoop – ZooKeeper – Client (Cloudera)

Introduction

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_21.html

ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services — such as naming, configuration management, synchronization, and group services – in a simple interface so you don’t have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.

What are we trying to do

Review the ZooKeeper client bundled with the Cloudera Hadoop distribution. The tool is launched as zookeeper-client.

Configuration Validation

Folder – /etc/init.d/*cloudera*


ls -la /etc/init.d/*cloudera*


Screen Shot:

Service – Status all



sudo service --status-all 2> /dev/null | grep -i "cloudera"



Screen Shot:

ZooKeeper Client – Launch Shell


zookeeper-client


Output:

Press Enter to get a command prompt.

ZooKeeper Client – Help

To get a listing of commands:


help


Output:

ZooKeeper Client – Quit

To close the shell, issue the quit command.


quit


Output:

ZooKeeper Client – Connect

To connect to another ZooKeeper server, issue connect:



Syntax:
connect <hostname>:<portNumber>

Sample:



Output:
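The address given to connect is a ZooKeeper connection string. As far as I know, it may also list several comma-separated host:port pairs and end with an optional chroot path, and a missing port falls back to 2181, the conventional client port. A minimal Python sketch of splitting such a string; `parse_connect_string` and the host names are hypothetical.

```python
DEFAULT_PORT = 2181  # conventional ZooKeeper client port


def parse_connect_string(conn: str):
    """Hypothetical helper: 'h1:p1,h2:p2/chroot' -> ([(host, port), ...], chroot or None).
    IPv6 literals are not handled in this sketch."""
    chroot = None
    slash = conn.find("/")
    if slash != -1:
        conn, chroot = conn[:slash], conn[slash:]
    servers = []
    for entry in conn.split(","):
        host, sep, port = entry.strip().partition(":")
        servers.append((host, int(port) if sep else DEFAULT_PORT))
    return servers, chroot


if __name__ == "__main__":
    print(parse_connect_string("zk1.corp.com:2181,zk2.corp.com:2182/corporate"))
```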

ZooKeeper Client – ls

ZooKeeper’s primary objects, known as znodes, behave like folders and files. To get a list of folders and files, issue:

Syntax:

ls <folder-name>
Sample:

ls /
ls /hbase
ls /zookeeper

On this install, the top-level folders are /hbase and /zookeeper; /hbase is created by HBase, while /zookeeper belongs to ZooKeeper itself.

Output:

ls /hbase

ls /zookeeper

ZooKeeper Client – Create Folder

Syntax:

create <folder-name> <data>

The second argument is the data stored in the new folder (znode).

Sample:

create /corporate corp
create /corporate/HR corpHR

ZooKeeper Client – Remove Folder

rmr removes the folder together with everything beneath it.

Syntax:

rmr <folder-name>
Sample:

rmr  /corpSec8
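To make the create, ls and rmr behaviour concrete, here is a toy in-memory model in Python. It is not the ZooKeeper client API; `ZnodeTree` is a hypothetical class that mimics only a few rules: a folder can be created only under an existing parent, ls shows direct children, and rmr removes a folder with all of its descendants.

```python
class ZnodeTree:
    """Toy, in-memory model of the ZooKeeper namespace (not the real client API)."""

    def __init__(self):
        self.data = {"/": b""}  # znode path -> stored data

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: str) -> None:
        """Like 'create <folder-name> <data>': the parent must already exist."""
        if path in self.data:
            raise KeyError(f"already exists: {path}")
        if self._parent(path) not in self.data:
            raise KeyError(f"parent missing for: {path}")
        self.data[path] = data.encode()

    def ls(self, path: str) -> list:
        """Like 'ls <folder-name>': names of the direct children only."""
        if path not in self.data:
            raise KeyError(f"no such node: {path}")
        prefix = path.rstrip("/") + "/"
        children = []
        for p in self.data:
            name = p[len(prefix):]
            if p.startswith(prefix) and name and "/" not in name:
                children.append(name)
        return sorted(children)

    def rmr(self, path: str) -> None:
        """Like 'rmr <folder-name>': delete the folder and all of its descendants."""
        if path == "/":
            raise ValueError("refusing to remove the root")
        if path not in self.data:
            raise KeyError(f"no such node: {path}")
        prefix = path + "/"
        for p in [p for p in self.data if p == path or p.startswith(prefix)]:
            del self.data[p]


if __name__ == "__main__":
    tree = ZnodeTree()
    tree.create("/corporate", "corp")
    tree.create("/corporate/HR", "corpHR")
    print(tree.ls("/"), tree.ls("/corporate"))
    tree.rmr("/corporate")
    print(tree.ls("/"))
```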

ZooKeeper Client – getAcl

To get permissions, issue the getAcl command.

To get the permission set for folder /corporate/HR:

Syntax:

getAcl <folder-name>
Sample:

getAcl  /corporate/HR

Output:

Explanation:

• Scheme -> world
• User -> anyone (the default and only allowable user)
• Permission -> crdwa (create, read, delete, write, admin)

To get permission set for folder /advert:


Syntax:

getAcl <folder-name>
Sample:

getAcl  /advert

Output:

Explanation:

• Scheme -> digest
• Permission -> crdw (create, read, delete, write; no admin)
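The permission strings returned by getAcl are letter codes for permission bits. ZooKeeper defines them as READ=1, WRITE=2, CREATE=4, DELETE=8 and ADMIN=16, so crdwa is 31 ("all"). A small Python sketch that decodes them; `perm_mask` and `perm_names` are hypothetical helpers.

```python
# ZooKeeper permission bits (ZooDefs.Perms): READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16
PERM_BITS = {"r": 1, "w": 2, "c": 4, "d": 8, "a": 16}
PERM_NAMES = {"r": "READ", "w": "WRITE", "c": "CREATE", "d": "DELETE", "a": "ADMIN"}


def perm_mask(letters: str) -> int:
    """Hypothetical helper: 'crdwa' -> 31."""
    mask = 0
    for ch in letters.lower():
        if ch not in PERM_BITS:
            raise ValueError(f"unknown permission letter: {ch!r}")
        mask |= PERM_BITS[ch]
    return mask


def perm_names(letters: str) -> list:
    """Hypothetical helper: 'crdw' -> ['READ', 'WRITE', 'CREATE', 'DELETE']."""
    letters = letters.lower()
    return [PERM_NAMES[ch] for ch in "rwcda" if ch in letters]


if __name__ == "__main__":
    for acl in ("crdwa", "crdw"):
        print(acl, perm_mask(acl), perm_names(acl))
```

Decoding crdw this way makes it plain that the /advert folder grants no ADMIN bit.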

ZooKeeper Client – setAcl

To set permissions, issue the setAcl command.

We have included the setAcl commands simply as an engineering exercise. I would discourage employing them for the following reasons:

• The folder can become totally inaccessible
• They are indomitable

Indomitable:

• They cannot be removed; there is no resetAcl command
• They are not cumulative; each setAcl replaces the folder’s entire ACL rather than adding to it
ZooKeeper Client – setAcl – Scheme (Host)

To set permissions for a specific host, or for all hosts in the same domain, use the host scheme.

For every host whose name is in the corp.com domain:

Syntax:

setAcl <folder-name> host:<domain-name>:<permission-set>

Sample:

setAcl /advert host:corp.com:crwda

For the host whose FQDN is appServer1.corp.com:

Syntax:

setAcl <folder-name> host:<hostname>:<permission-set>

Sample:

setAcl /advert host:appServer1.corp.com:cdrwa

ZooKeeper Client – setAcl – Scheme (IP Address)

To assign all permissions to a specific IP address (10.0.4.70):

Syntax:

setAcl <folder-name> ip:<ip-address>:<permission-set>

Sample:

setAcl  /corpSec7 ip:10.0.4.70:cdrwa

To validate the change, review the permission set with getAcl:

Syntax:

getAcl   <folder-name>
Sample:

getAcl  /corpSec7

Output:

ZooKeeper Client – setAcl – Scheme (World)

Anyone

For the following use case scenario:

• Folder -> /corporate
• Authentication Provider -> world
• User -> anyone (the only valid user is the “anyone” user)
• Permission -> crwda

Syntax:

setAcl <folder-name> world:anyone:<permission-set>

Sample:

setAcl /advert world:anyone:crdwa

Output:

ZooKeeper Client – setAcl – Digest Authentication

To restrict a folder to a user who authenticates with a username and password, use the digest scheme.

For the following use case scenario:

• Folder -> /corporate
• Authentication Provider -> digest
• Permission -> crwda
Syntax:

setAcl <folder-name> digest:<username>:<password-digest>:<permission-set>

Sample:

setAcl /corporate digest:dadeniji:waTER:crdwa

Output:

For the following use case scenario:

• Authentication Provider -> digest
• Permission -> crwd
Syntax:

setAcl <folder-name> digest:<username>:<password-digest>:<permission-set>

Sample:

setAcl /advert digest:dadeniji:safetec:crdw

Output:
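As far as I know, the digest scheme does not compare plain passwords: the id stored in the ACL is `username:base64(SHA-1("username:password"))`, and a session later proves its identity with `addauth digest username:password` in plain text. The samples above pass the plain password to setAcl; if so, the stored id would not match the hash computed at addauth time, which is one plausible way to lock yourself out. A hedged Python sketch of computing the id; `digest_acl_id` and `set_acl_digest` are hypothetical helpers.

```python
import base64
import hashlib


def digest_acl_id(username: str, password: str) -> str:
    """Return 'username:base64(sha1("username:password"))', the assumed digest-scheme id."""
    raw = f"{username}:{password}".encode("utf-8")
    hashed = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return f"{username}:{hashed}"


def set_acl_digest(path: str, username: str, password: str, perms: str) -> str:
    """Hypothetical helper: build the setAcl line to paste into zookeeper-client."""
    return f"setAcl {path} digest:{digest_acl_id(username, password)}:{perms}"


if __name__ == "__main__":
    print(set_acl_digest("/corporate", "dadeniji", "waTER", "crdwa"))
    # The session then authenticates with the plain password:
    print("addauth digest dadeniji:waTER")
```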

ZooKeeper Client – Stat

To get folder stats:

Syntax:

stat <folder-name>

Sample:

stat  /hbase
stat  /zookeeper

Output:
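The stat command prints one `name = value` pair per line, with fields such as cZxid, ctime, dataVersion, aclVersion and numChildren. The aclVersion field is handy here, since it counts how many times a folder's ACL has been changed. A hedged Python sketch that turns copied stat output into a dictionary; `parse_stat` is hypothetical.

```python
def parse_stat(text: str) -> dict:
    """Hypothetical helper: turn 'stat' output ('name = value' per line) into a dict.
    Hex zxids (0x1a) and plain counters become ints; timestamps stay as text."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip blank or non-stat lines
        value = value.strip()
        try:
            stats[name.strip()] = int(value, 0)  # base 0 accepts both 0x1a and 42
        except ValueError:
            stats[name.strip()] = value
    return stats
```

For example, `parse_stat(output)["aclVersion"]` gives the number of ACL changes on that folder.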

Error Messages:

Error Message – NoAuthException


Exception in thread "main" org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /corpSec7
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1375)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:733)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)


Explanation:

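• One always has to be careful when setting permissions
• setAcl itself requires the admin (a) permission. /corpSec7 had been restricted to 10.0.4.70, so a session from any other address gets NoAuth and can no longer change the ACL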
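The NoAuth error follows from how ACL checks work: setAcl needs the admin (a) permission, and a client holds only the permissions of the ACL entries that match it. The Python sketch below models that check for the world, ip (single address or address/prefix) and digest schemes only; `perms_for` and `can_set_acl` are hypothetical, and details such as the super user are deliberately left out.

```python
import ipaddress

ADMIN = 16  # ZooKeeper's ADMIN permission bit
BITS = {"r": 1, "w": 2, "c": 4, "d": 8, "a": 16}


def perms_for(acl, client_ip, digest_ids=()):
    """OR together the bits of every ACL entry matching the client.
    acl: list of (scheme, id, letters) tuples, e.g. ("ip", "10.0.4.70", "cdrwa")."""
    granted = 0
    for scheme, ident, letters in acl:
        if scheme == "world":
            hit = ident == "anyone"
        elif scheme == "ip":
            network = ipaddress.ip_network(ident, strict=False)
            hit = ipaddress.ip_address(client_ip) in network
        elif scheme == "digest":
            hit = ident in digest_ids  # ids added to the session via addauth
        else:
            hit = False  # other schemes are not modelled in this sketch
        if hit:
            for ch in letters:
                granted |= BITS[ch]
    return granted


def can_set_acl(acl, client_ip, digest_ids=()):
    """setAcl is allowed only when the matched entries include ADMIN."""
    return bool(perms_for(acl, client_ip, digest_ids) & ADMIN)


if __name__ == "__main__":
    corp_sec7 = [("ip", "10.0.4.70", "cdrwa")]
    for ip in ("10.0.4.70", "10.0.4.71"):
        print(ip, can_set_acl(corp_sec7, ip))
```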

Logging:

Logging – Log File – Location

ZooKeeper log files are kept in /var/log/zookeeper

Logging – Log File – Name

The naming convention for the log file is:

zookeeper-cmf-zookeeper1-SERVER-<FQDN>.log
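Given that naming convention, the log path for the local server can be built from the host's FQDN. A Python sketch, assuming the service instance is named zookeeper1 as in this install and that `socket.getfqdn()` returns the same FQDN Cloudera Manager used; `zookeeper_log_path` and `tail` are hypothetical helpers.

```python
import os
import socket
from collections import deque

LOG_DIR = "/var/log/zookeeper"


def zookeeper_log_path(fqdn=None, log_dir: str = LOG_DIR) -> str:
    """Hypothetical helper: build the log path from the naming convention above.
    Assumes the service instance is called zookeeper1, as in this install."""
    fqdn = fqdn or socket.getfqdn()
    return os.path.join(log_dir, f"zookeeper-cmf-zookeeper1-SERVER-{fqdn}.log")


def tail(path: str, n: int = 20) -> list:
    """Return the last n lines of a file, keeping at most n lines in memory."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(deque(fh, maxlen=n))


if __name__ == "__main__":
    log_path = zookeeper_log_path()
    if os.path.exists(log_path):
        print("".join(tail(log_path)), end="")
    else:
        print(f"not found: {log_path}")
```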

