Technical: Hadoop/Cloudera [CDH]/MetaStore – MySQL Database

Technical: Hadoop/Cloudera [CDH]/MetaStore – MySQL Database

Background

Laying the groundwork for Hadoop/Cloudera [CDH]/Hive installation and trying to do my homework.  One of the pre-requisites is a Hive MetaStore.

The Hive MetaStore stores metadata information for Hive Tables.  It provides the base plumbing for the Hive Tables.

The Hive database is a bit better positioned by externalizing some of its ‘metadata’; as a so called “Embedded Database“.

Thus, the choice in terms of reliability is very important.

Choices

The choice of MetaStore currently comes down to a few Database products.  For Cloudera 4.2 the choices are tabulated in an easy to read format @

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_2.html#topic_2_unique_4

Database Version
MySQL v5.5
PostgreSQL v8.3
Oracle v11gR2
Derby Default

Our Choice

As we are gunning for a small, self contained, wildly understood LAB environment we settled on MySQL and PostgreSQL.

It appears that post MySQL purchase by a big Database Vendor, some people have started shopping around for other databases still principally owned by the Opened Source community.

And, so PostgreSQL is growing a bit.  A couple of years ago, I set up my own WordPress Instance and found out it bundles a PostgreSQL database.

And, so I really will like to go with PostgreSQL.  But, for no known technical reason, I will stick with MySQL because it really appears that Hive is foundationally closer in terminology and usage to MySQL.

Install MySQL Server

Install MySQL Server



Syntax:
    sudo yum install mysql-server

Sample:
    sudo yum install mysql-server

Output:

mySQL - Installation - Install Log

MySQL – Start Service/daemon

Start mySQL daemon



Syntax:
      sudo service mysqld start

Sample:
      sudo service mysqld start 

Output:

mySQL - Start - Log

Explanation:

  • Mostly everything is good
  • The lone major problem is a DNS issue that has being bugging me for a few days now.
  • It all started with my Microsoft Windows Active Directory going down — Everything after that has being down hill
  • MySQL uses hostnames and domain names for user privileges and so I have to be on the look-out for hard-coded host and FQDN names

MySQL Connector

 

Introduction

To connect to MySQL from Hive one needs client libraries.

As Hive is a Java based application we need libraries that are consumable via Java.

Here is a quick overview of MySQL Connector:

Overview of MySQL Connector:

http://dev.mysql.com/doc/refman/5.6/en/connector-j-overview.html

MySQL provides connectivity for client applications developed in the Java programming language through a JDBC driver, which is called MySQL Connector/J.

MySQL Connector/J is a JDBC Type 4 driver. Different versions are available that are compatible with the JDBC 3.0 and JDBC 4.0 specifications. The Type 4 designation means that the driver is a pure Java implementation of the MySQL protocol and does not rely on the MySQL client libraries.

For large-scale programs that use common design patterns of data access, consider using one of the popular persistence frameworks such as Hibernate, Spring’s JDBC templates or Ibatis SQL Maps to reduce the amount of JDBC code for you to debug, tune, secure, and maintain.

RPM Package Name

The name of the the RPM is mysql-connector-java

 

Get Package Info

Use “yum info” to get a bit of information about the RPM.


Syntax:
      yum info <package-name>

Sample:
      yum info mysql-connector-java

Output:

Mysql - mysql-connector-java -- yum -- info

List Package Files

As we do not yet have the rpm and we do not really want to download it, let us use “repoquery” to get a listing of files that are bundled in the RPM.


Syntax:

      repoquery -lq <package-name>

Sample:
      repoquery -lq mysql-connector-java

Output:

Mysql - mysql-connector-java -- repoquery

Explanation:

  • The files will mostly be installed in /usr/share/java
  • Jar file -> /usr/share/java/mysql-connector-java-5.1.17.jar
  • Jar file -> /usr/share/java/mysql-connector-java.jar
  • Build Tool (maven)-> /etc/maven/fragments/mysql-connector-java and /usr/share/maven2/poms/JPP-mysql-connector-java.pom
  • And, a few doc files

The files that we actually need are the JAR files.

Install MySQL-Connector-Java

We have enough “datasets” to know which files we will need to copy to Hives nodes.

As they are Jar files availing them on the Hive Nodes and adjusting the ClassPath should be enough to avail them  to the Hive Clients.

Please keep in mind that the previous statement is only true for Type-4 JDBC Files which this one, MySQL, clearly is.

Let us go ahead with the actual install.



Syntax:
    sudo yum install mysql-server

Sample:
    sudo yum install mysql-connector-java

Output:

Mysql - mysql-connector-java - install - log

 

Avail MySQL Connector for Hive Usage

Avail MySQL Connector for Hive Usage

Introduction

Installing MySQL avails client modules that are needed by clients such as Hive to connect to Mysql.

These files needs to be placed in the Hive Library folder

Review Client Files

The MySQL-Connector-java Installer places the files in the /usr/share/java folder :-



Syntax:
    ls -la <folder> | grep "mysql-connector-java"

Sample:

    ls -la /usr/share/java | grep "mysql-connector-java"

 

Output:

Mysql - mysql-connector-java - client

Output Interpretation

File Use
mysql-connector-java-5.1.17.jar Actual File (Version specific)
mysql-connector-java.jar Symbolic file (that currently links to connector-java-5.1.17.jar)

Destination

On each Hadoop Hive node, the identified files will need to copied to /usr/lib/hive/lib folder

MySQL – Secure Database

Introduction

During Installation, MySQL’s root password is left un-set.

Please!, please! set it to something secured once you have finished installing MySQL Server and Client.

Hardening MySQL – Scope

The process of better protecting an Application is sometimes referred to as “Hardening” that Application.

MySQL is bundled with a fairly simple script that offer’s basic hardening. The name of the script is mysql_secure_installation.

The areas covered are listed under:

mysql_secure_installation — Improve MySQL Installation Security

http://dev.mysql.com/doc/refman/5.0/en/mysql-secure-installation.html

a) You can set a password for root accounts.

b) You can remove root accounts that are accessible from outside the local host.

c) You can remove anonymous-user accounts.

d) You can remove the test database (which by default can be accessed by all users, even anonymous users), and privileges that permit anyone to access databases with names that start with test_.

Hardening MySQL – Actual



Syntax:
    sudo /usr/bin/mysql_secure_installation

Sample:
    sudo /usr/bin/mysql_secure_installation

Output:

Mysql - secure installation

MySQL – Client Install – What is the RPM Name?

Introduction

If we will be querying MySQL from a different host, I know I need to install the RPM.

Find MySQL Client RPMs

Let us use yum list to query for packages whose name start with mysql…


Syntax:
    yum list mysql\*

Sample:
    yum list mysql\*

Output:

Mysql - yum list mysql

The RPMs that comes up are tabulated below – For each RPM, we ran “yum info <package-name>” to get package’s summary information.

 

Package Use Version#
mysql.i686 MySQL client programs and shared libraries 5.1.69-1.el6_4
mysql-connector-java.noarch Official JDBC driver for MySQL 1:5.1.17-6.el6
mysql-libs.i686 The shared libraries required for MySQL clients 5.1.69-1.el6_4
mysql-server.i686 The MySQL server and related files 5.1.69-1.el6_4
MySQL-python.i686 Python interface to MySQL 1.2.3-0.3.c1.1.el6
mysql-connector-odbc.i686 ODBC driver for MySQL 5.1.69-1.el6_4
mysql-devel.i686 Files for development of MySQL applications 5.1.69-1.el6_4
mysql-embedded.i686 MySQL as an embeddable library 5.1.69-1.el6_4
mysql-embedded-devel.i686 Development files for MySQL as an embeddable library 5.1.69-1.el6_4
mysql-test.i686 The test suite distributed with MySQL 5.1.69-1.el6_4

Choice

The most likely RPMs appear to be the plain mysql.i686

MySQL – Secure Database (Review)

Introduction

Let us quickly review our MySQL Instance and make sure that the changes look OK, and that MySQL is actually installed and available via the root password that we used.

Connect to mysql

Connect to mysql



Syntax:
    /usr/bin/mysql --host <hostname> --user root --password

Sample:
    /usr/bin/mysql --host localhost --user root --password

Output:

Mysql - Client - Initiate

Issue Test

Issue Tests

Issue Test – Does root account have a password

Check the mysql.user table and review the user, host, and password entries:



Syntax:
         select user, host, password from mysql.user 

Sample:
         select user, host, password from mysql.user

Output:

Mysql -- Query -- mysql.user

Explanation:

  • As the password actually has contents, it is invariably not blank/empty
  • There are two entries — one for localhost and the other one 127.0.0.1; 127.0.0.1 is very useful for cases where your DNS Name resolution is not working
  •  More information about 127.0.0.1 is available @

    http://dev.mysql.com/doc/refman/5.0/en/default-privileges.html

    On Unix, each root account permits connections from the local host. Connections can be made by specifying the host name localhost, the IP address 127.0.0.1, or the actual host name or IP address.

Issue Test – Disallow root account from outside the Database Server

Check the mysql.user table and review the user, host, and password entries:



Syntax:
         select user, host, password from mysql.user 

Sample:
         select user, host, password from mysql.user

Output:

Mysql -- Query -- mysql.user

Explanation:

  • Ensure that root records that have hosts bearing localhost and 127.0.0.1 are the only ones present

Issue Test – Anonymous user

Check the mysql.user table and review the user, host, and password entries:



Syntax:
         select user, host, password from mysql.user 

Sample:
         select user, host, password from mysql.user

Output:

Mysql -- Query -- mysql.user

Explanation:

  • Ensure that the anonymous user is not present
  • An empty user column in the mysql.user table signifies that anonymous users have access
  • Here is a quick sample for granting permissions to the anonymous user

    How to mySQL create Anonymous Account
    http://www.cyberciti.biz/tips/howto-mysql-create-anonymous-account.html

    mysql> use <dbname>;

    mysql> GRANT SELECT ON xyz TO ”@localhost

Issue Test – Is the Test database present

Issue “show databases”



Syntax:
         show databases;

Sample:
         show databases;

Output:

Mysql -- Metadata -- show databases

Explanation:

  • The test database is still present
  • Will ignore for now

MySQL – Configuration – Availability

Introduction

We are making MySQL a big part of our Hadoop\Hive install.

Does it AutoStart?

We want it on, all the time!


Syntax:
         sudo /sbin/chkconfig --list <service-name>

Sample:
         sudo /sbin/chkconfig --list mysqld

Output:

chkconfig -- list -- mysqld

Explanation:

  • mysqld is not configured to auto-start

let us go correct that!

Set for AutoStart

Set for AutoStart..


Syntax:

         --to auto-start a service to auto-start at runlevels 2,3,4,5
         sudo /sbin/chkconfig <service-name> on

         --to auto-start a service to auto-start at specific run-levels
         sudo /sbin/chkconfig <service-name> on --levels <runlevels>

Sample:
         sudo /sbin/chkconfig mysqld on

         sudo /sbin/chkconfig mysql on --levels 345

Output:

chkconfig -- set -- levels (v2)

Explanation:

  • mysqld is being configured to auto-start at the following levels (2,3,4,5)

MySQL – Configuration – Resource Utilization

Introduction

Depending on what MySQL is being used for and how heavily it is being leaned on, it is a good idea to set maximum resource usage limits.

Memory

It does not appear that MySQL in its current iteration has a single setting for overall memory utilization.

Other database systems, i.e. Microsoft SQL Server has settings such as ‘max server memory’ that is useful in getting close to setting the max amount of memory a SQL Server Instance will ever request.

Unfortunately, setting max memory involves more component level setting in MySQL.

Please Google to broaden your familiarity with this issue.  Here are some of what I came up with:

MySQL – Configuration – ErrorLog ( Looking for Answers)

Introduction

It goes without saying that unlike a client based desktop application where errors blow up in your face via message boxes or through specially highlighted / color coded error messages, errors in server (daemon) applications are quietly logged in error logs.

It is up to you the sudden MySQL proprietor to determine where the logs are placed and monitor the log metadata (sizes) and contents.

Where are the Log Files?

Where are the log files? I do not know either.

So let us google for “Mysql Error Log Location”.

Google nailed it as I quickly found “Ronald Bradford”. Ronald calls himself the MySQL Expert.  The specific blog posting I found is:

Monitoring MySQL The Error Log
http://ronaldbradford.com/blog/monitoring-mysql-the-error-log-2009-09-16/

Based on his presentation alone, you can tell this Kid is serious about his MySQL.

As the good word says:

And how will anyone go and tell them without being sent? That is why the Scriptures say, “How beautiful are the feet of messengers who bring good news!”

From Ronald’s blog, I know a lot more about MySQL than I would have ever known:

a) If not specified the default is [datadir]/[hostname].err
b) He advises that the log location should be changed via editing the my.cnf file

MySQL – Configuration – ErrorLog ( Researching Answers )

Introduction

Again, Ronald says to look under the DataDir

Research

Where is the DataDir?  First, where is MySQL running from?

As I know I have a running mySQL process, I will try using ps.  On the other hand, if mySQL is not running would probably have issued “find / -name datadir 2> /dev/null”.


Syntax:
         ps -aux | grep -i mysql

Sample:
         ps -aux | grep -i mysql

Output:

Mysql -- ps

Explanation:

  •  The datadir is /var/lib/mysql

Check /var/lib/mysql for *.err file



Syntax:
         ls -la /var/lib/mysql

Sample:
         ls -la /var/lib/mysql

Output:

Mysql -- Folder Listing -- var-lib-mysql

Explanation:

    • The error file is not there
    • But, like all good directions, I know Ronald Bradford’s instructions are good and complete
    • If we go back and look at result of “ps  -aux | grep “mysql”
      
      Command: 
      ps -aux | grep -i mysql | grep -v "grep" 2>/dev/null
      
      Output: 
      
      root 1676 0.0 0.0 5120 1408 ? S 08:23 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
      
      mysql 1778 0.0 0.5 135476 15232 ? Sl 08:23 0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
      
      [dadeniji@rachel ~]$
      
      
    • Thus, it appears from the command line that “log error” is pointing at /var/log/mysqld.log
    • When I issued “cat /var/log/mysqld.log”, received error message stating that “permission is denied”; which makes sense as mysql is running under the mysql user; this also follows what Ronald wrote.  He suggested that we change the permission set for this file; and allow it be view-able by other users who are responsible for monitoring application and Administrative users:
    • Mysql - Log File - var-log-mysql (permission denied) Validated instruction set of /var/log/mysqld.log
    • Mysql - Log File - var-log-mysql

Application Specific Tooling

So once the base MySQL Engine is installed, the next thing to do is “tailor” it for various applications.

That tailoring will include creating Application specific principals (users), objects, and granting users access to the declared objects.

For instance, Hive has its own user and objects that we will have to create as part of its installation.

Since we do not have Hive installed yet, we will wait upon installing it and thus have easy and ready access to the SQL Scripts that contains the object creation statements.

Application Specific Tooling – Hive

As a place holder, here are the steps for HIVE.

For Hive, the steps above are concertize as:

  • Create database known as metastore
  • The schema script’s name is formatted as  /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-x.y.z.mysql.sql; ie /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql
  • Create Hive specific database user
  • Grant permissions to the Hive user
  • Configure Application specific configuration file.  For Hive the file’s name is hive-site.xml 

References

References – RPM

References – MySQL – mysql_secure_installation

References – Services – Configuration (chkconfig)

References – MySQL – Error Log

References – MySQL – Permissions

One thought on “Technical: Hadoop/Cloudera [CDH]/MetaStore – MySQL Database

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s