Technical: Hadoop – HBase – Compression – lzo

Introduction

http://hbase.apache.org/book.html#compression
Unfortunately, HBase cannot ship with LZO because of licensing issues; HBase is Apache-licensed, LZO is GPL. Therefore, the LZO install has to be done after the HBase install. See the Using LZO Compression wiki page for how to make LZO work with HBase.

A common problem users run into when using LZO is that while the initial setup of the cluster runs smoothly, a month goes by, some sysadmin goes to add a machine to the cluster, and they'll have forgotten to do the LZO fixup on the new machine. In versions since HBase 0.90.0, we should fail in a way that makes it plain what the problem is, but maybe not.

See Section C.2, “ hbase.regionserver.codecs ” for a feature to help protect against failed LZO install.
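As a hedged sketch, that guard is a property in hbase-site.xml; with it set, a RegionServer refuses to start if the named codec fails its load test (the exact conf file location depends on your install):

  <!-- Sketch, assuming this goes inside <configuration> in hbase-site.xml.
       A RegionServer then fails fast at startup if lzo is unusable. -->
  <property>
    <name>hbase.regionserver.codecs</name>
    <value>lzo</value>
  </property>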

Validation

Let us use the Hadoop/HBase compression validation test. That test is packaged as org.apache.hadoop.hbase.util.CompressionTest.

Validation – File System

Syntax:

  • package name – org.apache.hadoop.hbase.util.CompressionTest
  • The file system we are targeting is file
  • Ensure that you have a valid file name; we are using /tmp/testfiles
  • Specify the compression as lzo

hbase org.apache.hadoop.hbase.util.CompressionTest file:/tmp/testfiles lzo


[Screenshot: Hadoop / HBase compression test – file system]

Here is the text output:



13/04/30 23:01:02 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/04/30 23:01:03 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 61.9m
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
	at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:110)
	at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:234)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:591)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishInit(HFileWriterV2.java:178)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:150)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:140)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2$WriterFactoryV2.createWriter(HFileWriterV2.java:104)
	at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:108)
	at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:137)
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:105)
	... 8 more

The most important part of the error message is:

Exception in thread "main" java.lang.RuntimeException: 
  java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec

which says that the class com.hadoop.compression.lzo.LzoCodec is missing from the classpath.
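Before hunting for packages, a quick (and hedged) way to confirm the class really is absent is to scan every jar on the HBase classpath for it:

  # Sketch: look for LzoCodec in each jar that `hbase classpath` reports.
  # Assumes the unzip utility is installed.
  for j in $(hbase classpath | tr ':' ' '); do
    case "$j" in
      *.jar)
        unzip -l "$j" 2>/dev/null \
          | grep -q 'com/hadoop/compression/lzo/LzoCodec.class' \
          && echo "found in $j"
        ;;
    esac
  done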

Validation – HDFS

Get Location and Files

We need to get the location of the HBase folders and files:

  • Run the commands as the hdfs user, via sudo (the hdfs user has access to the folders)
  • Get a folder listing for the HBase folder in HDFS (accessible via hadoop fs -ls /hbase)

Syntax:

     sudo -u hdfs hadoop fs -ls <hdfs-folder>

Sample:

      sudo -u hdfs hadoop fs -ls /hbase
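To pick a concrete file for the test, drill into a table's directory; the .tableinfo file id used below comes from a listing like this (t2 is the table used later in this post):

  # Sketch: list the table directory to find its .tableinfo file.
  sudo -u hdfs hadoop fs -ls /hbase/t2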

Use hbase CompressionTest

Issue the HBase compression test (org.apache.hadoop.hbase.util.CompressionTest) against an HDFS file, again testing for lzo.


Syntax:

  sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest \
    hdfs://<host>/hbase/<table-name>/.tableinfo.<fileid> lzo

Sample:

  sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest \
      hdfs:/localhost/hbase/t2/.tableinfo.0000000001 lzo

[Screenshot: CompressionTest – lzo – HDFS]

Screen text dump:



$ sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest hdfs:/localhost/hbase/t2/.tableinfo.0000000001 lzo
13/05/02 19:07:17 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/05/02 19:07:19 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 61.9m
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
	at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:110)
	at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:234)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:591)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishInit(HFileWriterV2.java:178)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:150)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:140)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2$WriterFactoryV2.createWriter(HFileWriterV2.java:104)
	at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:108)
	at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:137)
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:105)
	... 8 more
13/05/02 19:07:19 ERROR hdfs.DFSClient: Failed to close file /localhost/hbase/t2/.tableinfo.0000000001
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /localhost/hbase/t2/.tableinfo.0000000001 File does not exist. Holder DFSClient_NONMAPREDUCE_-767973534_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2454)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2431)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:536)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:335)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44084)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

	at org.apache.hadoop.ipc.Client.call(Client.java:1224)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at com.sun.proxy.$Proxy9.complete(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at com.sun.proxy.$Proxy9.complete(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:329)
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1769)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1756)
	at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:654)
	at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:671)
	at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:539)
	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2308)
	at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2324)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

Note that the trailing DFSClient lease error above is just fallout from the aborted codec test; the root cause is still the same ClassNotFoundException.

Package Sources & Deployment

A quick Google search suggests that there are a couple of packages (hopefully RPMs) that contain the com.hadoop.compression.lzo.LzoCodec class file.

One source that carries both the RPMs and the JAR files is hadoop-gpl-packaging:

Download – RPM { hadoop-gpl-packaging }

Copy downloaded files to Hadoop/HBase Server

Copy the downloaded file to the Hadoop/HBase server.



Syntax:

  scp <local-file> <username>@<hostname-dest>:<destination-path>

Sample:
  scp hadoop-gpl-packaging-0.6.1-1.x86_64.rpm dadeniji@hbase:/tmp/.

Install RPM (hadoop-gpl-packaging-0.6.1-1.x86_64.rpm)

Issue rpm -i to install the Hadoop-gpl package.



Syntax:

  rpm -i <rpm-file>

Sample:
  rpm -i /tmp/hadoop-gpl-packaging-0.6.1-1.x86_64.rpm

Output:

error: Failed dependencies:
        lzo is needed by hadoop-gpl-packaging-0.6.1-1.x86_64

The error message above is a bit revealing: it implies that the hadoop-gpl-packaging-0.6.1-1.x86_64 package actually relies on the lzo library, which is presumably a native (C/C++) library.
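As a hedged aside, rpm can list a package's dependencies up front, which would have surfaced this before the failed install:

  # Sketch: query the requirements of the (not yet installed) rpm file.
  rpm -qpR /tmp/hadoop-gpl-packaging-0.6.1-1.x86_64.rpm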

So let us go look for the native lzo package.

Install lzo native library (repoforge)

Determine which lzo version you need

Unlike Java JAR files, which are OS independent, the native lzo binaries presumably are not.

So let us review our OS Version:



Syntax:

  uname -r

Output:

   2.6.32-279.el6.x86_64

So our kernel is 2.6 on el6 (RHEL/CentOS 6), 64-bit.
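To double-check the distribution and architecture (a hedged sketch; the release file path assumes a RHEL/CentOS-family system):

  # Sketch: confirm distro release and machine architecture.
  cat /etc/redhat-release   # e.g. "CentOS release 6.x (Final)"
  uname -m                  # x86_64 means 64-bit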

Identify where to get lzo native library

A quick Google search suggests that the native libraries are available at http://pkgs.repoforge.org/lzo/

lzo libraries

There are two major sets of el6 files – devel and minilzo.  

Based on the source quoted below, minilzo is a minimal version with quite a bit less functionality:

http://www.oberhumer.com/opensource/lzo/
miniLZO is a lightweight subset of the LZO library.

There is one more decision to make – is your target environment 32-bit or 64-bit?

Download lzo native library { lzo-devel-2.06-1.el6.rfx.x86_64.rpm }

Using wget or your favorite browser, download lzo-devel-2.06-1.el6.rfx.x86_64.rpm from http://pkgs.repoforge.org/lzo/

Copy lzo native library


Syntax:

  scp lzo-devel-2.06-1.el6.rfx.x86_64.rpm <hadoop-hbase-host>:/destination

Sample:

   scp lzo-devel-2.06-1.el6.rfx.x86_64.rpm  dadeniji@hadoop-host:/tmp

Install lzo native library



Sample:

   sudo rpm -i lzo-devel-2.06-1.el6.rfx.x86_64.rpm

Output:

error: Failed dependencies:
        lzo = 2.06-1.el6.rfx is needed by lzo-devel-2.06-1.el6.rfx.x86_64

Download lzo native library { lzo-2.06-1.el6.rfx.x86_64.rpm }

The plain lzo rpm is available from the same repoforge location, http://pkgs.repoforge.org/lzo/. Download lzo-2.06-1.el6.rfx.x86_64.rpm.

Copy lzo native library


Syntax:

  scp lzo-2.06-1.el6.rfx.x86_64.rpm <hadoop-hbase-host>:/tmp

Sample:

   scp lzo-2.06-1.el6.rfx.x86_64.rpm dadeniji@hadoop-hbase-test:/tmp/.

Install lzo native library



Sample:

   sudo rpm -i  lzo-2.06-1.el6.rfx.x86_64.rpm

But we received the error message stated below:



warning: lzo-2.06-1.el6.rfx.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 6b8d79e6: NOKEY

file /usr/lib64/liblzo2.so.2.0.0 from install of lzo-2.06-1.el6.rfx.x86_64 conflicts with file from package liblzo2_2-2.03-6.el6.x86_64

Since all of this rpm install work is new to me, I am at a standstill. I really need to better understand the error message; here is what I know (a hedged ownership check follows the list):

  • I am trying to install lzo-2.06-1.el6.rfx.x86_64.rpm
  • As part of that install, /usr/lib64/liblzo2.so.2.0.0 is being installed
  • But the file liblzo2.so.2.0.0 was already installed as part of liblzo2_2-2.03-6.el6.x86_64
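A minimal sketch of that check:

  # Sketch: ask rpm which installed package owns the conflicting file.
  rpm -qf /usr/lib64/liblzo2.so.2.0.0
  # Expected, per the error above: liblzo2_2-2.03-6.el6.x86_64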

Slept on this problem, as the Psalmist promised that the morning will bring sunlight.

Though spring pollen raged this morning, I had a good night's rest, and I have a bit more vigor that I cannot let go to waste:

  • Backed up the file (cp /usr/lib64/liblzo2.so.2.0.0 /tmp)
  • Took to Google and issued a plea for "rpm 'conflicts with file from package'"

Found reputable help via http://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-rpm-using.html. It says to use the --replacefiles option.

Syntax:

   sudo rpm -ivh --replacefiles <rpm>

Sample:
   sudo rpm -ivh --replacefiles lzo-2.06-1.el6.rfx.x86_64.rpm 

[Screenshot: rpm --install lzo-2.06-1.el6.rfx.x86_64.rpm]
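With the package in, a hedged sanity check is to ask rpm for it and re-verify its files:

  # Sketch: confirm the lzo package is now installed and its files verify.
  rpm -q lzo
  rpm -V lzo   # no output means the installed files match the package metadata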

Install RPM (hadoop-gpl-packaging-0.6.1-1.x86_64.rpm) {2nd Attempt}



Syntax:

  rpm -i <rpm-file>

Sample:
  sudo rpm -i /tmp/hadoop-gpl-packaging-0.6.1-1.x86_64.rpm

Output:

[Screenshot: rpm – gpl-packaging – "user hadoop does not exist"]

Interpretation

  • There is one lone error message – user hadoop does not exist (a hedged way to trace it follows this list)
  • I know that our Hadoop/HBase distribution (Cloudera) does not use the hadoop user
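To see where that message comes from, one hedged option is to dump the package's install scriptlets, which is where a packaged hadoop user would typically be created or referenced:

  # Sketch: show the pre/post install scriptlets embedded in the rpm file.
  rpm -qp --scripts /tmp/hadoop-gpl-packaging-0.6.1-1.x86_64.rpm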

Review Install

Returning to hadoop-gpl-packing (https://code.google.com/p/hadoop-gpl-packing/), review the install notes:

https://code.google.com/p/hadoop-gpl-packing/
Download the rpm and install it on the machine to be used. The binaries will be installed in:

  • /opt/hadoopgpl/lib/*jar
  • /opt/hadoopgpl/native

Issue the ls command against the /opt/hadoopgpl/ folder:

ls -la /opt/hadoopgpl/*

[Screenshot: folder listing – /opt/hadoopgpl]

Create Symbolic Links

We have to create two sets of symbolic links:

  • /opt/hadoopgpl/lib/*jar
  • /opt/hadoopgpl/native

/opt/hadoopgpl/lib/*jar

  • These are Java JAR files
  • Based on which Hadoop app we would like to use lzo compression with, we link the JAR files into that app's folder
  • In our case, the app is HBase

Let us determine HBase's home directory.


From running ps aux | grep hbase:

     -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase

we can tell that the HBase home directory is

     /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase

and thus deduce that the lib folder is

     /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib

Next in line is to see which of the files from /opt/hadoopgpl/lib/*.jar already have counterparts in /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib (a hedged helper script for this check follows the list):

  • /opt/hadoopgpl/lib/guava-12.0.jar (Older file guava-11.0.2.jar is present)
  • /opt/hadoopgpl/lib/hadoop-lzo-0.4.17.jar (hadoop-lzo.jar)
  • /opt/hadoopgpl/lib/hadoop-lzo.jar -> /opt/hadoopgpl/lib/hadoop-lzo-0.4.17.jar (Symbolic link)
  • /opt/hadoopgpl/lib/protobuf-java-2.4.1.jar (Older file protobuf-java-2.4.0a.jar is present)
  • /opt/hadoopgpl/lib/slf4j-api-1.5.8.jar (Newer file slf4j-api-1.6.1.jar is present)
  • /opt/hadoopgpl/lib/slf4j-log4j12-1.5.10.jar (slf4j-log4j.jar)
  • /opt/hadoopgpl/lib/yamlbeans-0.9.3.jar (yamlbeans.jar)
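A minimal sketch of that comparison, assuming the CDH parcel path above:

  # Sketch: for each hadoopgpl jar, look for a similarly named jar in HBase's lib.
  HBASE_LIB=/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib
  for j in /opt/hadoopgpl/lib/*.jar; do
    base=$(basename "$j")
    stem=${base%%-[0-9]*}            # strip the trailing "-<version>.jar"
    ls "$HBASE_LIB/$stem"* 2>/dev/null || echo "no counterpart for $base"
  done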

In cases where corresponding files are missing in the destination folder, we will create symbolic links in the destination folder (/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib):

Please note the following matrix:

  Source File                                    Installed As       Keep or Ignore
  /opt/hadoopgpl/lib/hadoop-lzo-0.4.17.jar       hadoop-lzo.jar     Keep
  /opt/hadoopgpl/lib/slf4j-log4j12-1.5.10.jar    slf4j-log4j.jar    Ignore – will cause a conflict
  /opt/hadoopgpl/lib/yamlbeans-0.9.3.jar         yamlbeans.jar      Keep

Bash shell script that creates the symbolic links (note: the slf4j link below is the one the matrix flags; we end up removing it later):



sudo ln -s /opt/hadoopgpl/lib/hadoop-lzo-0.4.17.jar  /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/hadoop-lzo.jar

sudo ln -s /opt/hadoopgpl/lib/slf4j-log4j12-1.5.10.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/slf4j-log4j.jar

sudo ln -s /opt/hadoopgpl/lib/yamlbeans-0.9.3.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/yamlbeans.jar

/opt/hadoopgpl/native

Run ps and get the folders indicated by java.library.path


From running ps aux | grep hbase:

-Djava.library.path=/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64


Here are the folders that are referenced in java.library.path:

  • /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native
  • /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64 

Actual bash shell script:



cd /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/libgplcompression.a          libgplcompression.a 

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/libgplcompression.la         libgplcompression.la

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/libgplcompression.so         libgplcompression.so

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/libgplcompression.so.0       libgplcompression.so.0

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/libgplcompression.so.0.0.0   libgplcompression.so.0.0.0

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/LzoCompressor.lo             LzoCompressor.lo

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/LzoCompressor.o              LzoCompressor.o

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/LzoDecompressor.lo           LzoDecompressor.lo

sudo ln -s /opt/hadoopgpl/native/Linux-amd64-64/LzoDecompressor.o            LzoDecompressor.o

Restarted Cloudera HBase Service

For the changes to take effect, the Cloudera HBase services need to be restarted.

Using a web browser, connected to:

http://<cloudera-manager>:7180/cmf/services/status

Unfortunately, we got the error stated below:

Screen Shot:

[Screenshot: slf4j – incompatible version error]

Screen Text Capture:



MASTER master start
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoopgpl/lib/slf4j-log4j12-1.5.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: The requested version 1.5.10 by your slf4j binding is not compatible with [1.6]
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.

It seems the slf4j jar we added (via symbolic link) is not compatible: slf4j-log4j12-1.5.10.jar does not match the version of the existing slf4j-api-1.6.1.jar.

[Screenshot: slf4j – before unlink]

Let us go unlink it:

sudo unlink  /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/slf4j-log4j.jar

Validation

Introduction

Pasted below is a simple program that tries to invoke the LzoCodec class.

Simple Code

Here is the code; we saved the file as testLZO.java



import com.hadoop.compression.lzo.LzoCodec;
import java.util.Scanner;

public class testLZO
{
    com.hadoop.compression.lzo.LzoCodec objLZOCodec = null;
    Scanner scanner = null;

    testLZO()
    {
        this.scanner = new Scanner(System.in);

        // Pause so the process can be inspected before the codec loads, if desired
        System.out.println("Press enter ... ");
        String str = this.scanner.nextLine();

        // Instantiating the codec forces the LzoCodec class to load;
        // a NoClassDefFoundError here means the jar is not on the classpath
        this.objLZOCodec = new com.hadoop.compression.lzo.LzoCodec();

        System.out.println("get default ext "
                             + this.objLZOCodec.getDefaultExtension());

        this.objLZOCodec = null;
    }

    public static void main(String[] paramArrayOfString)
    {
        testLZO localTestLZO = new testLZO();
        localTestLZO = null;
    }
}

Compile Code

Compile the code in testLZO.java:

  • The classpath is hadoop-lzo.jar
  • The source code is testLZO.java
  • The class file produced is testLZO.class

javac -cp hadoop-lzo.jar  testLZO.java

Run Test Code

To test things out, invoke java and pass in the jar files:

  • hadoop-lzo.jar
  • protobuf-java-2.4.1.jar
  • hadoop-common-2.0.0-cdh4.2.0.jar
  • commons-configuration-1.6.jar
  • commons-lang-2.5.jar
  • commons-logging-1.1.1.jar

(native libraries not referenced)



java  -classpath hadoop-lzo.jar:protobuf-java-2.4.1.jar:hadoop-common-2.0.0-cdh4.2.0.jar:commons-configuration-1.6.jar:commons-lang-2.5.jar:commons-logging-1.1.1.jar:. testLZO

Output: Could not load native gpl library

(native libraries referenced)

  • -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64


java  -classpath lzo:hadoop-lzo.jar:protobuf-java-2.4.1.jar:hadoop-common-2.0.0-cdh4.2.0.jar:commons-configuration-1.6.jar:commons-lang-2.5.jar:commons-logging-1.1.1.jar:. -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64 testLZO

[Screenshot: native gpl library loaded]

Run Test Code (Fuller Test)

Anyone who knows me knows that I like to steal others' code. But there is an art to stealing, which is that you have to know what to steal.

So I do a little bit of work, stumble about, and Google like HELL.

Here is a better and more accurate test:

Lzo compression “Compression algorithm ‘lzo’ previously failed test”
Sebastien Nahalou
http://grokbase.com/t/cloudera/cdh-user/125s9gd3rn/lzo-compression-compression-algorithm-lzo-previously-failed-test



export JAVA_HOME="/usr/java/jdk-1.7.0.1"

export HADOOP_CLASSPATH=/usr/lib/hbase/lib:/usr/lib/hadoop/lib:/usr/lib/hbase:/usr/lib/hadoop:/usr/lib/hadoop-hdfs:/usr/lib/hadoop/lib

export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/:/usr/lib/hbase/lib/native/lib/

export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib:/usr/lib/hbase/lib/native/lib/

export LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib:/usr/lib/hbase/lib/native/lib/

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hbase/lib/native/lib/

export HADOOP_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib:/usr/lib/hbase/lib/native/lib/

export HADOOP_HOME=/usr/lib/hadoop-0.20-mapreduce

export HBASE_HOME=/usr/lib/hbase

hbase org.apache.hadoop.hbase.util.CompressionTest /user/root/1G.bin lzo

We had to make a couple of changes for our specific environment:



# http://grokbase.com/t/cloudera/cdh-user/125s9gd3rn/lzo-compression-compression-algorithm-lzo-previously-failed-test
# 

# dadeniji - Changed per our JDK
# we will just default to whatever JDK is the system's default
#export JAVA_HOME="/usr/java/jdk-1.7.0.1"

export HADOOP_CLASSPATH=/usr/lib/hbase/lib:/usr/lib/hadoop/lib:/usr/lib/hbase:/usr/lib/hadoop:/usr/lib/hadoop-hdfs:/usr/lib/hadoop/lib

#dadeniji added - Cloudera Distribution (CDH) jar files
#dadeniji added - /tmp/hadoop-auth.jar
#dadeniji added - /tmp/hadoop-common.jar
#dadeniji added - /tmp/hadoop-core.jar
#dadeniji added - /tmp/hadoop-lzo.jar

export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/hadoop-auth.jar
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/hadoop-common.jar
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/hadoop-core.jar
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/hadoop-lzo.jar

export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/:/usr/lib/hbase/lib/native/lib/
export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib:/usr/lib/hbase/lib/native/lib/
export LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib:/usr/lib/hbase/lib/native/lib/

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hbase/lib/native/lib/

# dadeniji added a couple of CDH folders

export JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64 

export HADOOP_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib:/usr/lib/hbase/lib/native/lib/

export HADOOP_HOME=/usr/lib/hadoop-0.20-mapreduce

export HBASE_HOME=/usr/lib/hbase

#actual test
hbase org.apache.hadoop.hbase.util.CompressionTest file:/tmp/sarai.txt lzo

And, here is the result:



$ sh initCompTest.sh

13/05/02 22:03:52 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

13/05/02 22:03:52 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 61.9m

13/05/02 22:03:53 ERROR nativeio.NativeIO: Unable to initialize NativeIO libraries
java.lang.NoSuchMethodError: 
	at org.apache.hadoop.io.nativeio.NativeIO.initNative(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO.<clinit>(NativeIO.java:106)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:580)
	at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:420)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:452)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:420)
	at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.createOutputStream(AbstractHFileWriter.java:255)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:148)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:140)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2$WriterFactoryV2.createWriter(HFileWriterV2.java:104)
	at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:108)
	at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:137)

13/05/02 22:03:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library

13/05/02 22:03:53 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cc0cdbdae8422d5f11096d033c4e6e0324f231f6]

13/05/02 22:03:53 INFO compress.CodecPool: Got brand-new compressor [.lzo_deflate]

13/05/02 22:03:53 INFO compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
SUCCESS


What does all this mean?

  • WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available — the Hadoop configuration key hadoop.native.lib has been deprecated in favor of the newer Hadoop key io.native.lib.available (both are Hadoop properties, not JDK ones)
  • The NativeIO NoSuchMethodError is likely a version mismatch between the extra hadoop-core.jar we put on the classpath and the native Hadoop library, but it did not stop the codec test
  • Otherwise, everything else is good: the native-lzo library loaded and the test printed SUCCESS

HBASE SHELL – VALIDATION

Introduction

The most important validation is to ensure that things work inside HBase.

Launch HBASE



 hbase shell

Create Table



create 'inventory', {NAME => 'cf1', COMPRESSION => 'lzo'}

[Screenshot: lzo – table create]

Describe Table



describe 'inventory'

[Screenshot: lzo – table describe]
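As one more hedged smoke test, write and read a cell in the lzo-backed column family from the shell (the row key and column below are illustrative):

put 'inventory', 'row1', 'cf1:qty', '42'
scan 'inventory'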

