Technical: Hadoop – HBase – Compression – SNAPPY

Introduction

Support for Snappy compression is well integrated into the Cloudera distribution of Hadoop/HBase, especially as of CDH4.

But if you are using another distribution, or want to get familiar with debugging compression or third-party library support in Hadoop/HBase in general, you might follow a similar track.

The first two sections show how to create a table that relies on SNAPPY compression and how to validate the compression setting on an existing table.

Create Table with SNAPPY COMPRESSION

Syntax:
    create '<table-name>', { NAME => '<column-family>', COMPRESSION => 'SNAPPY' }

Example:

    create 't1', { NAME => 'cf1', COMPRESSION => 'SNAPPY'}

Output:

[Screenshot: Hadoop - HBase - Create Table with SNAPPY]

Review Table definition

Syntax:
    describe <table-name>

Example:

    describe 't1'

[Screenshot: Hadoop - HBase - Describe (with SNAPPY)]
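
If you want to check this from a script rather than by eye, the describe step can be piped through the HBase shell. This is only a sketch: it assumes the hbase launcher is on your PATH and that the describe output prints the attribute as COMPRESSION => 'SNAPPY'. The helper name has_snappy is made up for this example.

```shell
# has_snappy: read "describe" output on stdin and succeed if any
# column family reports SNAPPY compression.
# Assumption: the attribute is printed as COMPRESSION => 'SNAPPY'.
has_snappy() {
  grep -q "COMPRESSION => 'SNAPPY'"
}

# Usage (needs a running HBase):
#   echo "describe 't1'" | hbase shell 2>/dev/null | has_snappy && echo "t1 uses SNAPPY"
```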

The next few sections show how to review your configuration if you are unable to get SNAPPY compression to work, or if you just want to better understand a working install/configuration.

Find Snappy Libraries

Check your OS and ensure that the snappy libraries are present.

Syntax:
    find / -name 'libsnappy*' 2> /dev/null

[Screenshot: Hadoop - HBase - Compression - Libraries - Snappy]
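
Searching from / can be slow on a large box. A narrower variant, sketched below, looks only in directories you pass in. The pattern is quoted so the shell does not expand it against the current directory before find sees it. The function name find_snappy and the example directories are our own picks, not anything the distribution mandates.

```shell
# find_snappy DIR...: print every libsnappy* file or symlink found
# under the given directories; directories that do not exist are skipped.
find_snappy() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    find "$dir" -name 'libsnappy*' \( -type f -o -type l \) 2>/dev/null
  done
}

# Example (directories are guesses; adjust for your system):
#   find_snappy /usr/lib /usr/lib64 /usr/lib/hadoop/lib/native
```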

Snappy Deployment Location

The location of the Snappy files depends on your OS.

First, get the OS version:


lsb_release -a


Matrix:

    OS Image                          Bitness   Folder
    Oracle Linux Server release 6.3   x86_64    /usr/lib/hadoop/lib/native/
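
lsb_release -a gives the distribution and release, but it does not directly report the bitness column of the matrix. One common way to fill that in is uname -m. The helper below is a small sketch that maps the machine name to 32 or 64; the name arch_bits and the list of architectures are our own and are not exhaustive.

```shell
# arch_bits [MACHINE]: print 64, 32 or "unknown" for a machine name.
# Defaults to the output of `uname -m` when no argument is given.
arch_bits() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64|aarch64|ppc64|ppc64le|s390x) echo 64 ;;
    i[3-6]86) echo 32 ;;
    *) echo unknown ;;
  esac
}

# Usage:
#   echo "machine: $(uname -m), bits: $(arch_bits)"
```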

Compression – Library – Configuration

Depending on your Hadoop/HBase distribution, and if the test discussed later on fails, you might want to ensure that HBase knows where your native libraries are stored.

The environment variable that HBase reads for the native library path is aptly named HBASE_LIBRARY_PATH.

The right value depends on where your OS keeps the Snappy files; in some distributions the variable is defined in hbase-env.sh.

In the Cloudera distribution, the hbase-env.sh file is located in /etc/hbase/conf.cloudera.hbase1/


export HBASE_LIBRARY_PATH=<folder>
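
Note that the shell does not allow spaces around the = sign in an assignment. If you would rather not hard-code the folder, something like the following could go into hbase-env.sh. It is a sketch under our own assumptions: the function name add_library_dir is invented, and we treat HBASE_LIBRARY_PATH as a colon-separated list, like java.library.path.

```shell
# add_library_dir DIR: append DIR to HBASE_LIBRARY_PATH if the directory
# exists and is not already listed. Assumes a colon-separated list.
add_library_dir() {
  [ -d "$1" ] || { echo "skipping $1: not a directory" >&2; return 1; }
  case ":${HBASE_LIBRARY_PATH:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) HBASE_LIBRARY_PATH="${HBASE_LIBRARY_PATH:+$HBASE_LIBRARY_PATH:}$1" ;;
  esac
  export HBASE_LIBRARY_PATH
}

# Example, using the folder from the matrix above:
#   add_library_dir /usr/lib/hadoop/lib/native
```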

 

Compression – Library – Configuration (Backtrack)

In our case, it was difficult to determine where our distro was looking for the library files.

So what to do but cheat? Here is how we cheated:

  • Run ps -ef
  • Get the folders indicated in -Djava.library.path
  • Check those folders

ps -ef | grep hbase

[Screenshot: ps aux -- grep hbase]

Get folders indicated in -Djava.library.path

-Djava.library.path=/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64

Those folders are thus:

  • /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native
  • /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64
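
Picking the folders out of a long ps line by eye gets old quickly. The sketch below does the same split with standard tools: break the command line into words, keep the -Djava.library.path word, then split it on colons. The function name java_library_dirs is ours, and the approach assumes none of the paths contain spaces.

```shell
# java_library_dirs: read process command lines on stdin and print each
# directory named in -Djava.library.path=..., one per line, de-duplicated.
# Assumption: the paths themselves contain no spaces.
java_library_dirs() {
  tr ' ' '\n' \
    | sed -n 's/^-Djava\.library\.path=//p' \
    | tr ':' '\n' \
    | sed '/^$/d' \
    | sort -u
}

# Usage (the [h] keeps grep from matching its own process):
#   ps -ef | grep '[h]base' | java_library_dirs
```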

Check Folders:

Knowing where to look, let us go look:

/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native



   ls -la /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native

[Screenshot: ls lib -- native]

We were able to confirm that the files we need are in that folder. To support various development versions and iterations, the files are exposed there via symbolic links.

  • libhadoop.so -> libhadoop.so.1.0.0
  • libhadoop.so.1.0.0
  • libsnappy.so -> libsnappy.so.1.1.3
  • libsnappy.so.1.1.3

/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64 



ls -la /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hbase/lib/native/Linux-amd64-64 

[Screenshot: ls lib -- native (x64)]

The 64-bit folder is empty.
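
The two folder checks above can be folded into one helper that reports, per directory, whether libsnappy.so is there and what it points to. This is a rough sketch: check_native_dirs is a name we made up, and readlink -f assumes GNU coreutils, which is what most Linux distributions ship.

```shell
# check_native_dirs DIR...: for each directory, report whether libsnappy.so
# is present and, if it is a symlink, which file it resolves to.
check_native_dirs() {
  for dir in "$@"; do
    lib="$dir/libsnappy.so"
    if [ -e "$lib" ]; then
      echo "OK      $lib -> $(readlink -f "$lib")"
    elif [ -L "$lib" ]; then
      echo "BROKEN  $lib (dangling symlink)"
    else
      echo "MISSING $lib"
    fi
  done
}

# Usage, with the folders taken from -Djava.library.path:
#   check_native_dirs /path/to/lib/hadoop/lib/native /path/to/lib/hbase/lib/native/Linux-amd64-64
```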

Validate Compression



Syntax:
  hbase org.apache.hadoop.hbase.util.CompressionTest <path> snappy

Sample (file system):
  hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/sarah.txt snappy

Sample (hdfs):

  hbase org.apache.hadoop.hbase.util.CompressionTest hdfs:///tmp/sarai.txt snappy
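
For repeated runs it can help to wrap the test in a small function. The sketch below is our own convenience wrapper, not part of HBase: the name run_compression_test and the default scratch path are invented, and it assumes CompressionTest exits with a non-zero status when the codec cannot be loaded.

```shell
# run_compression_test [CODEC] [PATH]: run HBase's CompressionTest and
# print a one-line verdict. Returns 127 if the hbase launcher is missing.
# Assumption: CompressionTest exits non-zero on failure.
run_compression_test() {
  codec="${1:-snappy}"
  target="${2:-file:///tmp/compression-test-$$.txt}"
  if ! command -v hbase >/dev/null 2>&1; then
    echo "hbase launcher not found on PATH" >&2
    return 127
  fi
  if hbase org.apache.hadoop.hbase.util.CompressionTest "$target" "$codec"; then
    echo "$codec: OK"
  else
    echo "$codec: FAILED" >&2
    return 1
  fi
}

# Usage:
#   run_compression_test snappy file:///tmp/sarah.txt
```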

file:

[Screenshot: Hadoop - HBase - CompressionTest - Snappy]

hdfs:

[Screenshot: Hadoop - HBase - CompressionTest - Snappy (hdfs)]
