Tuesday, April 1, 2014

Apache Ambari to install Hadoop

To begin with, I used VMware Workstation to create two virtual machines (the second a clone of the first), each running Red Hat Enterprise Linux.

Below are some more details of the system that I collected:

OS Version: Linux 2.6.18-194.17.4.el5
RedHat Release Version: Red Hat Enterprise Linux Server release 5.5 (Tikanga)

Number of CPU/Sockets: 0

Number of Hardware threads: 2

Processor Type:         Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz

Memory: 3.85717391967773 GB

IP address: and

After creating the first node of the cluster, I made the changes below to it before cloning it.
Edit the following files to prepare the node for cloning:

/etc/resolv.conf file:

search example.com



Disable SELinux:

# vi /etc/selinux/config
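
The post does not show the edit itself. Since the later "ambari-server setup" output reports SELinux status as 'disabled', the change was presumably setting SELINUX=disabled. A minimal sketch of doing that without an editor, assuming the stock "SELINUX=<mode>" line is present (the function name and the path argument are illustrative, not from the original):

```sh
# Sketch: switch SELinux off in a config file.
# The path argument lets the function be tried on a copy before it is
# pointed at the real /etc/selinux/config.
disable_selinux() {
    conf="${1:-/etc/selinux/config}"
    tmp="$conf.tmp.$$"
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are untouched.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run as root, "disable_selinux /etc/selinux/config" makes the change. The file is read at boot, so the new mode applies after a reboot.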


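Disable iptables at boot (this does not stop a running instance until the next reboot or a "service iptables stop"):

# chkconfig iptables off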

Set the IP address in ifcfg-<NIC>:

# cd /etc/sysconfig/network-scripts
# vi ifcfg-eth0


Install perl and openssh clients:

# yum -y install perl openssh-clients

Entries in /etc/hosts file, one line per node, each preceded by that node's IP address:
bivm.example.com bivm
bivn.example.com bivn
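
A minimal sketch of adding such entries from a script, assuming the usual "IP FQDN shortname" layout. The addresses in the usage note are placeholders from the 192.0.2.0/24 documentation range, since the real ones are not given in this post, and the function name is made up for illustration:

```sh
# Sketch: append "IP FQDN SHORTNAME" to a hosts file unless the FQDN is
# already listed. The last argument defaults to /etc/hosts; passing another
# path allows a dry run on a copy.
add_host_entry() {
    ip="$1"
    fqdn="$2"
    short="$3"
    hosts="${4:-/etc/hosts}"
    # -F: match the name literally (dots are not wildcards); -w: whole word.
    if ! grep -qwF -- "$fqdn" "$hosts"; then
        printf '%s\t%s\t%s\n' "$ip" "$fqdn" "$short" >> "$hosts"
    fi
}
```

For example, "add_host_entry 192.0.2.10 bivm.example.com bivm" and "add_host_entry 192.0.2.11 bivn.example.com bivn", run as root with the real addresses substituted. Running it twice does not duplicate a line.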

Set up password-less ssh:
This step is not mandatory; it is only needed if you want Ambari Server to install the Ambari Agents on all your cluster hosts automatically. The other option is to install the Ambari Agents manually on the other nodes.

# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
# cd .ssh
# ls
id_rsa  id_rsa.pub
# cp id_rsa.pub authorized_keys
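
sshd only reads a file named authorized_keys (American spelling), and with its default StrictModes setting it can refuse a key file whose permissions are too open. A minimal sketch that installs the key with tight permissions; the function name and the directory argument are additions for illustration, so that it can be tried on a scratch directory first:

```sh
# Sketch: add a public key to authorized_keys with restrictive permissions.
# $1 = public key file, $2 = ssh directory (defaults to ~/.ssh).
install_pubkey() {
    pub="$1"
    sshdir="${2:-$HOME/.ssh}"
    auth="$sshdir/authorized_keys"

    mkdir -p "$sshdir" || return 1
    chmod 700 "$sshdir" || return 1
    touch "$auth" || return 1
    # Append the key only if that exact line is not already there.
    if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
}
```

Here that would be "install_pubkey /root/.ssh/id_rsa.pub". Because the node is cloned after this step, both nodes end up with the same key pair and the same authorized_keys file.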

Edit the /etc/ssh/ssh_config file
# vi /etc/ssh/ssh_config

StrictHostKeyChecking no

Restart the network:
Not necessary at this point, but I just like to check if everything can still start again, and that I've not messed up anything.
# /etc/init.d/network restart

Next, create the clone using VMware Workstation: from the menu, go to VM->Manage->Clone.
This brings up the "Clone Virtual Machine Wizard". You cannot make a clone of a virtual machine
that is powered on or suspended, so power off the virtual machine before cloning it.

Once the cloning has completed, set the hostname of the new clone in /etc/sysconfig/network. Also edit the ifcfg file under /etc/sysconfig/network-scripts to hold the IP address of the new clone, i.e. IPADDR= 
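
Both files are plain KEY=value lists, so the two changes can be scripted. A minimal sketch, assuming values that contain no "|" or "&" characters (the function name is made up here, and the address in the usage note is a placeholder, not the clone's real one):

```sh
# Sketch: set KEY=VALUE in a sysconfig-style file, replacing an existing
# KEY= line or appending one if the key is absent.
set_kv() {
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$file"; then
        tmp="$file.tmp.$$"
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

On the clone this might be "set_kv /etc/sysconfig/network HOSTNAME bivn.example.com" followed by "set_kv /etc/sysconfig/network-scripts/ifcfg-eth0 IPADDR 192.0.2.11" with the real address substituted, and then a network restart.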

Next, I installed Ambari by following the docs: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-

One thing to note here is the list of existing installs: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-
I had PostgreSQL 9.x, and this gave me some grief. Removing PostgreSQL 9.x and installing PostgreSQL 8.4 fixed the issue.

Ambari does not accept IP addresses; it needs the Fully Qualified Domain Name (FQDN) of each host in the cluster. You can get this as follows:
# hostname -f 
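
A rough sanity check on that output can catch a short name or a bare IP address before it is typed into the wizard. This is only a sketch of one plausible rule set (at least one dot, not localhost, not a dotted-digit string); it is not the validation Ambari itself performs:

```sh
# Sketch: succeed if the argument looks like a fully qualified host name.
is_fqdn() {
    name="$1"
    case "$name" in
        ''|.*|*.|localhost|localhost.*) return 1 ;;   # empty, stray dot, localhost
    esac
    case "$name" in
        *[!0-9.]*) ;;          # contains something other than digits and dots
        *) return 1 ;;         # digits and dots only: looks like an IP address
    esac
    case "$name" in
        *.*) return 0 ;;       # at least one dot: treat as fully qualified
        *) return 1 ;;         # short name such as "bivm"
    esac
}
```

For example: is_fqdn "$(hostname -f)" || echo "hostname is not fully qualified"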

Now that the two hosts of the cluster are fully set up, we can begin running the Ambari installer. Here is how I downloaded the Ambari repo file and copied it into the /etc/yum.repos.d directory.

[biadmin@bivm ~]$ sudo wget
--2014-03-27 17:34:45--
Resolving public-repo-1.hortonworks.com...,,, ...
Connecting to public-repo-1.hortonworks.com||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 751 [application/octet-stream]
Saving to: `ambari.repo'

100%[======================================>] 751         --.-K/s   in 0s

2014-03-27 17:34:45 (51.2 MB/s) - `ambari.repo' saved [751/751]

[biadmin@bivm ~]$ sudo cp ambari.repo /etc/yum.repos.d

Do a "yum repolist" to check if the repository is configured correctly.

[biadmin@bivm ~]$ yum repolist
Loaded plugins: rhnplugin, security
*Note* Red Hat Network repositories are not listed below. You must run
this command as root to access RHN repositories.
HDP-UTILS-                                       |  951 B     00:00
HDP-UTILS-                               |  20 kB     00:00
HDP-UTILS-                                                        65/65
Updates-ambari-1.x                                       |  951 B     00:00
Updates-ambari-1.x/primary                               | 7.5 kB     00:00
Updates-ambari-1.x                                                        65/65
ambari-1.x                                               |  951 B     00:00
ambari-1.x/primary                                       | 1.9 kB     00:00
ambari-1.x                                                                  5/5
repo id            repo name                                         status
HDP-UTILS- Hortonworks Data Platform Utils Version - HDP-UTI enabled: 65
Updates-ambari-1.x ambari-1.x - Updates                              enabled: 65
ambari-1.x         Ambari 1.x                                        enabled:  5
repolist: 135
[biadmin@bivm ~]$
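
The repo ids in that listing come from the [section] headers of the .repo file, so the file can also be checked before it is copied into place. A small sketch (the function name is illustrative):

```sh
# Sketch: print the repo ids ([section] names) defined in a yum .repo file.
repo_ids() {
    sed -n 's/^\[\(.*\)\][[:space:]]*$/\1/p' "$1"
}
```

Running "repo_ids ambari.repo" should list the same ids that "yum repolist" reports for that file once it is in /etc/yum.repos.d.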

Next, we will install the Ambari bits using yum on one of the hosts. In this case I selected bivm to be my Ambari server.

[biadmin@bivm ~]$ sudo yum install ambari-server
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
HDP-UTILS-                                       |  951 B     00:00
HDP-UTILS-                               |  20 kB     00:00
HDP-UTILS-                                                        65/65
Updates-ambari-1.x                                       |  951 B     00:00
Updates-ambari-1.x/primary                               | 7.5 kB     00:00
Updates-ambari-1.x                                                        65/65
ambari-1.x                                               |  951 B     00:00
ambari-1.x/primary                                       | 1.9 kB     00:00
ambari-1.x                                                                  5/5
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package ambari-server.noarch 0: set to be updated
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server
--> Processing Dependency: python26 for package: ambari-server
--> Running transaction check
---> Package ambari-server.noarch 0: set to be updated
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server
---> Package python26.x86_64 0:2.6.8-2.el5 set to be updated
--> Processing Dependency: libffi.so.5()(64bit) for package: python26
--> Processing Dependency: libpython2.6.so.1.0()(64bit) for package: python26
--> Running transaction check
---> Package ambari-server.noarch 0: set to be updated
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server
---> Package libffi.x86_64 0:3.0.5-1.el5 set to be updated
---> Package python26-libs.x86_64 0:2.6.8-2.el5 set to be updated
--> Finished Dependency Resolution
ambari-server- from Updates-ambari-1.x has depsolving problems
  --> Missing Dependency: postgresql-server >= 8.1 is needed by
package ambari-server- (Updates-ambari-1.x)
Error: Missing Dependency: postgresql-server >= 8.1 is needed by
package ambari-server- (Updates-ambari-1.x)
 You could try using --skip-broken to work around the problem
 You could try running: package-cleanup --problems
                        package-cleanup --dupes
                        rpm -Va --nofiles --nodigest
The program package-cleanup is found in the yum-utils package.
[biadmin@bivm ~]$

The first attempt failed on the missing postgresql-server dependency. Once yum could see the PostgreSQL 9.3 repository (pgdg93), the same command resolved it:

[biadmin@bivm ~]$ sudo yum install ambari-server
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package ambari-server.noarch 0: set to be updated
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server
--> Processing Dependency: python26 for package: ambari-server
--> Running transaction check
---> Package postgresql93-server.x86_64 0:9.3.4-1PGDG.rhel5 set to be updated
--> Processing Dependency: postgresql93 = 9.3.4-1PGDG.rhel5 for
package: postgresql93-server
--> Processing Dependency: libpq.so.5()(64bit) for package: postgresql93-server
---> Package python26.x86_64 0:2.6.8-2.el5 set to be updated
--> Processing Dependency: libffi.so.5()(64bit) for package: python26
--> Processing Dependency: libpython2.6.so.1.0()(64bit) for package: python26
--> Running transaction check
---> Package libffi.x86_64 0:3.0.5-1.el5 set to be updated
---> Package postgresql93.x86_64 0:9.3.4-1PGDG.rhel5 set to be updated
---> Package postgresql93-libs.x86_64 0:9.3.4-1PGDG.rhel5 set to be updated
---> Package python26-libs.x86_64 0:2.6.8-2.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

 Package               Arch     Version              Repository            Size
 ambari-server         noarch           Updates-ambari-1.x    35 M
Installing for dependencies:
 libffi                x86_64   3.0.5-1.el5          HDP-UTILS-    25 k
 postgresql93          x86_64   9.3.4-1PGDG.rhel5    pgdg93               1.7 M
 postgresql93-libs     x86_64   9.3.4-1PGDG.rhel5    pgdg93               220 k
 postgresql93-server   x86_64   9.3.4-1PGDG.rhel5    pgdg93               5.6 M
 python26              x86_64   2.6.8-2.el5          HDP-UTILS-   6.5 M
 python26-libs         x86_64   2.6.8-2.el5          HDP-UTILS-   696 k

Transaction Summary
Install       7 Package(s)
Upgrade       0 Package(s)

Total download size: 49 M
Is this ok [y/N]: y
Downloading Packages:
(1/7): libffi-3.0.5-1.el5.x86_64.rpm                     |  25 kB     00:00
(2/7): postgresql93-libs-9.3.4-1PGDG.rhel5.x86_64.rpm    | 220 kB     00:00
(3/7): python26-libs-2.6.8-2.el5.x86_64.rpm              | 696 kB     00:01
(4/7): postgresql93-9.3.4-1PGDG.rhel5.x86_64.rpm         | 1.7 MB     00:02
(5/7): postgresql93-server-9.3.4-1PGDG.rhel5.x86_64.rpm  | 5.6 MB     00:06
(6/7): python26-2.6.8-2.el5.x86_64.rpm                   | 6.5 MB     00:06
(7/7): ambari-server-               |  35 MB     00:37
Total                                           854 kB/s |  49 MB     00:59
warning: rpmts_HdrFromFdno: Header V3 RSA/SHA1 signature: NOKEY, key ID 07513cad
Updates-ambari-1.x/gpgkey                                | 1.6 kB     00:00
Importing GPG key 0x07513CAD "Jenkins (HDP Builds)
<jenkin@hortonworks.com>" from
Is this ok [y/N]: y
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : postgresql93-libs                                        1/7
  Installing     : postgresql93                                             2/7
  Installing     : postgresql93-server                                      3/7
  Installing     : libffi                                                   4/7
  Installing     : python26                                                 5/7
  Installing     : python26-libs                                            6/7
  Installing     : ambari-server                                            7/7

  ambari-server.noarch 0:

Dependency Installed:
  libffi.x86_64 0:3.0.5-1.el5
  postgresql93.x86_64 0:9.3.4-1PGDG.rhel5
  postgresql93-libs.x86_64 0:9.3.4-1PGDG.rhel5
  postgresql93-server.x86_64 0:9.3.4-1PGDG.rhel5
  python26.x86_64 0:2.6.8-2.el5
  python26-libs.x86_64 0:2.6.8-2.el5


Although "yum install ambari-server" completed fine with postgresql93-server, the setup step then failed:

[biadmin@bivm ~]$ sudo /usr/sbin/ambari-server setup
Using python  /usr/bin/python2.6
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)?
Adjusting ambari-server permissions and ownership...
Checking iptables...
Checking JDK...
To download the Oracle JDK you must accept the license terms found at
and not accepting will cancel the Ambari Server setup.
Do you accept the Oracle Binary Code License Agreement [y/n] (y)?
Downloading JDK from
to /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
JDK distribution size is 85581913 bytes
jdk-6u31-linux-x64.bin... 100% (81.6 MB of 81.6 MB)
Successfully downloaded JDK distribution to
Installing JDK to /usr/jdk64
Successfully installed JDK to /usr/jdk64/jdk1.6.0_31
Downloading JCE Policy archive from
http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-6.zip to
Successfully downloaded JCE Policy archive to
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)?
Default properties detected. Using built-in database.
Checking PostgreSQL...
Running initdb: This may take upto a minute.
About to start PostgreSQL
ERROR: Exiting with exit code 1. Reason: Unable to start PostgreSQL
server. Exiting
[biadmin@bivm ~]$

Fix for the PostgreSQL issue: remove postgresql93-server and use an older version. In this case I installed PostgreSQL 8.4 from the pgdg84 repository ("PostgreSQL 8.4 5Server - x86_64").
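
The exact removal commands are not shown in this post. As a sketch, the 9.3 packages pulled in earlier (postgresql93, postgresql93-libs, postgresql93-server) can be picked out of a package listing with a small filter; the function name is made up for illustration:

```sh
# Sketch: read package names on stdin and print only the PostgreSQL 9.3
# (PGDG "postgresql93*") ones. Prints nothing, and still succeeds, when
# none are present.
pg93_packages() {
    grep '^postgresql93' || true
}

# Possible use, as root (assumes GNU xargs, whose -r skips the command
# when the list is empty):
#   rpm -qa | pg93_packages | xargs -r yum -y remove
```

With those packages gone and the pgdg84 repository enabled, "yum install ambari-server" picks up postgresql-server 8.4, as the transcript below shows.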

[biadmin@bivm ~]$ yum repolist
Loaded plugins: rhnplugin, security
*Note* Red Hat Network repositories are not listed below. You must run
this command as root to access RHN repositories.
repo id            repo name                                        status
HDP-UTILS- Hortonworks Data Platform Utils Version - HDP-UT enabled:  65
Updates-ambari-1.x ambari-1.x - Updates                             enabled:  65
ambari-1.x         Ambari 1.x                                       enabled:   5
pgdg84             PostgreSQL 8.4 5Server - x86_64                  enabled: 176
repolist: 311

[biadmin@bivm ~]$ sudo yum install ambari-server
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package ambari-server.noarch 0: set to be updated
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server
--> Running transaction check
---> Package postgresql-server.x86_64 0:8.4.21-1PGDG.rhel5 set to be updated
--> Processing Dependency: postgresql = 8.4.21-1PGDG.rhel5 for
package: postgresql-server
--> Processing Dependency: libpq.so.5()(64bit) for package: postgresql-server
--> Running transaction check
---> Package postgresql.x86_64 0:8.4.21-1PGDG.rhel5 set to be updated
---> Package postgresql-libs.x86_64 0:8.4.21-1PGDG.rhel5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

 Package             Arch     Version                Repository            Size
 ambari-server       noarch             Updates-ambari-1.x    35 M
Installing for dependencies:
 postgresql          x86_64   8.4.21-1PGDG.rhel5     pgdg84               1.7 M
 postgresql-libs     x86_64   8.4.21-1PGDG.rhel5     pgdg84               209 k
 postgresql-server   x86_64   8.4.21-1PGDG.rhel5     pgdg84               5.2 M

Transaction Summary
Install       4 Package(s)
Upgrade       0 Package(s)

Total download size: 42 M
Is this ok [y/N]: y
Downloading Packages:
(1/4): postgresql-libs-8.4.21-1PGDG.rhel5.x86_64.rpm     | 209 kB     00:00
(2/4): postgresql-8.4.21-1PGDG.rhel5.x86_64.rpm          | 1.7 MB     00:02
(3/4): postgresql-server-8.4.21-1PGDG.rhel5.x86_64.rpm   | 5.2 MB     00:09
(4/4): ambari-server-               |  35 MB     00:15
Total                                           1.5 MB/s |  42 MB     00:28
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : postgresql-libs                                          1/4
  Installing     : postgresql                                               2/4
  Installing     : postgresql-server                                        3/4
warning: /var/lib/pgsql/.bash_profile created as
  Installing     : ambari-server                                            4/4

  ambari-server.noarch 0:

Dependency Installed:
  postgresql.x86_64 0:8.4.21-1PGDG.rhel5
  postgresql-libs.x86_64 0:8.4.21-1PGDG.rhel5
  postgresql-server.x86_64 0:8.4.21-1PGDG.rhel5

[biadmin@bivm ~]$

Now that the Ambari bits are installed, we have to set up the Ambari server. This is done as follows:

[biadmin@bivm ~]$ sudo /usr/sbin/ambari-server setup
Using python  /usr/bin/python2.6
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)?
Adjusting ambari-server permissions and ownership...
Checking iptables...
Checking JDK...
JDK already exists, using
Installing JDK to /usr/jdk64
Successfully installed JDK to /usr/jdk64/jdk1.6.0_31
JCE Policy archive already exists, using
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)?
Default properties detected. Using built-in database.
Checking PostgreSQL...
Running initdb: This may take upto a minute.
Initializing database: [  OK  ]

About to start PostgreSQL
Configuring local database...
Connecting to the database. Attempt 1...
Configuring PostgreSQL...
Restarting PostgreSQL
Ambari Server 'setup' completed successfully.

After the Ambari setup completes successfully, we need to start the Ambari server:

[biadmin@bivm ~]$ ls /usr/sbin/ambari-*
/usr/sbin/ambari-agent        /usr/sbin/ambari-server        /usr/sbin/ambari-server.py

[biadmin@bivm ~]$ sudo /usr/sbin/ambari-server start
Using python  /usr/bin/python2.6
Starting ambari-server
Ambari Server running with 'root' privileges.
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Ambari Server 'start' completed successfully.

[biadmin@bivm ~]$ ps -ef | grep Ambari
root       32780       1  0 Mar31 pts/1    00:01:52 /usr/jdk64/jdk1.6.0_31/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp /etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/bin:/bin:/usr/lib/ambari-server/* org.apache.ambari.server.controller.AmbariServer
root       32980       1  0 Mar31 pts/1    00:00:00 /usr/bin/python2.6 /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start
biadmin   114324  110797  0 14:14 pts/4    00:00:00 grep Ambari

Now that we have the Ambari server installed and running on the cluster node, we will use the Ambari web interface
to install, configure, and deploy Hadoop 2.x on the cluster nodes.

I then pointed my browser to http://bivm.example.com:8080
I had some issues here because the default Firefox browser I was using was really old. After installing the latest
version of Firefox, I was able to see the Ambari server username/password screen, and logged in with
admin/admin.

The Welcome to Apache Ambari page asks for a cluster name. I called mine "myhadoop".

Next, it asked me to select the service stack that I wanted to use to install Hadoop. Here I selected
HDP 2.0.6.

Next, in the Install Options screen I entered the Fully Qualified Domain Name (FQDN) of each of my cluster nodes,
one per line. Under Host Registration Information, I selected "Perform manual registration on hosts
and do not use SSH".

In the next screen I confirmed the cluster hosts, and chose the services that I wanted to have installed. The list of services includes HDFS, YARN, MapReduce, Hive, WebHCat, HBase, ZooKeeper, Oozie, Nagios, Ganglia and Misc.

I then assigned bivm.example.com as my Master, and bivn.example.com as the Slave. The NameNode would be running on bivm.example.com, while the DataNode would be running on the slave node.

During the "Review" stage of the Ambari Cluster Install Wizard, my second node, i.e. bivn.example.com, failed. Looking at the error log, I saw that it was missing the net-snmp and net-snmp-utils packages. I then went back and disabled some of the services, like Nagios and Ganglia, to see if the error would go away, but it didn't. The only way to fix this was to do a yum install of net-snmp and net-snmp-utils on node bivn.example.com.

[biadmin@bivn ~]$ sudo rpm -qa | grep snmp

Next, the "Install, Start and Test" page appeared. The install on both nodes went on for some time, and at around 22% it complained that "hadoop-lzo" was missing. Again, "yum install hadoop-lzo" fixed that issue and we moved forward... not for long though.

At around 33% Status, both nodes gave the Message "Install complete (Waiting to start)". On clicking the Message on node bivn.example.com, I saw that all the Tasks, i.e. DataNode install, HBase install, HCat install, HDFS install, Hive Client install, Hive Metastore install, HiveServer2 install, MapReduce2 Client install, NodeManager install, Pig install, ResourceManager install, SNameNode install, WebHCat Server install, YARN Client install and ZooKeeper Client install, had a green check mark and said "executed successfully".

On clicking the Message on node bivm.example.com, it said "No tasks to show".

Fix (or at least what got it moving for me):

On looking at the ambari-agent and ambari-server logs, I noticed a version mismatch: the agentVersion = while the serverVersion =
I upgraded Ambari server to, and this got things moving, but not for long. Now I hit another error.
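
The installed versions can also be compared directly from the RPM database rather than from the logs. A minimal sketch, assuming both packages were installed through rpm/yum as above; the function names are illustrative:

```sh
# Sketch: print the installed version of an RPM package.
# Prints nothing and fails if the package (or rpm itself) is missing.
pkg_version() {
    v=$(rpm -q --qf '%{VERSION}' "$1" 2>/dev/null) || return 1
    printf '%s\n' "$v"
}

# Succeed only when both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}
```

On the server host, "pkg_version ambari-server" gives one side of the comparison; "pkg_version ambari-agent" on each node gives the other. Feeding the two strings to versions_match flags a mismatch before the wizard does.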

In this case most of the services installed fine, i.e. DataNode, HBase client, HBase Region Server, HCat, HDFS client, History server, Hive Client, Hive Metastore, Hive Server2 and MapReduce2 Client.

The failure came from MySQL Server install. Below is the complete error message that I got.

stderr:   /var/lib/ambari-agent/data/errors-314.txt

2014-04-02 13:30:53,601 - Error while executing command 'install':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 95, in execute
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/mysql_server.py", line 30, in install
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/mysql_server.py", line 51, in configure
    mysql_service(daemon_name=params.daemon_name, action='stop')
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/mysql_service.py", line 35, in mysql_service
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 239, in action_run
    raise ex
Fail: Execution of 'service mysqld stop' returned 1. /etc/profile: line 33: id: command not found
Stopping MySQL:  [FAILED]

I ran the test below on the node and it worked fine, so I'm not sure what causes the above error.

[biadmin@bivn ambari-agent]$ sudo rpm -qa | grep mysql
[biadmin@bivn ambari-agent]$ sudo service mysqld stop
Stopping MySQL:                                            [FAILED]
[biadmin@bivn ambari-agent]$ sudo service mysqld start
Starting MySQL:                                            [  OK  ]
[biadmin@bivn ambari-agent]$ sudo service mysqld stop
Stopping MySQL:                                            [  OK  ]
[biadmin@bivn ambari-agent]$
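
One detail in the stderr above may be a lead: "/etc/profile: line 33: id: command not found" suggests that the "id" command could not be found on the PATH in effect when the agent ran "service mysqld stop". That is only a guess, not a confirmed cause. A small sketch for testing whether a command resolves under a given PATH (the function name is made up here):

```sh
# Sketch: succeed if command $1 can be found using only the search path $2.
resolves_in_path() {
    cmd="$1"
    search_path="$2"
    # The lookup runs in a subshell so the caller's PATH is left untouched.
    ( PATH="$search_path"; command -v "$cmd" >/dev/null 2>&1 )
}
```

On Linux, the PATH of a running process can be read with "tr '\0' '\n' < /proc/<pid>/environ | grep '^PATH='", where <pid> is the ambari-agent process id. Passing that value as the second argument, for example resolves_in_path id "<that PATH>", would show whether "id" is reachable from the agent's environment.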

<Not sure what the issue is, work in progress>
