I wanted to play around with Hadoop, but had only a single Red Hat Enterprise Linux (RHEL) 5 box to work with. So I decided to install Cloudera Hadoop CDH4 in pseudo-distributed mode.
In pseudo-distributed mode, all of the HDFS and MapReduce daemons run on a single node; in short, the work of both the NameNode and the DataNode is done by one machine.
What I did here is basically follow the Cloudera Hadoop documentation for the pseudo-distributed installation:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html
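What makes the installation "pseudo-distributed" is just configuration: every daemon points at localhost. The packaged conf.pseudo.mr1 files (listed later with rpm -ql) contain properties along these lines; the snippets below are illustrative, so check the installed files for the exact values:

core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>

mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:8021</value>
</property>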
[root@isvx3 hadoop]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.9 (Tikanga)
Here is what I did after that. First, add the Cloudera public GPG key to your repository by executing the following command:
[root@isvx7 ~]# sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
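This assumes the Cloudera CDH4 yum repository is already configured on the box. If it isn't, a minimal /etc/yum.repos.d/cloudera-cdh4.repo looks something like this (adapted from Cloudera's instructions; verify the baseurl for your architecture and release):

[cloudera-cdh4]
name=Cloudera's Distribution for Apache Hadoop, Version 4
baseurl=http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/4/
gpgkey=http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1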
Install Hadoop in pseudo-distributed mode:
To install Hadoop with MRv1:
[root@isvx7 ~]# sudo yum install hadoop-0.20-conf-pseudo
Loaded plugins: rhnplugin, security
This system is receiving updates from RHN Classic or RHN Satellite.
Setting up Install Process
Resolving Dependencies
There are unfinished transactions remaining. You might consider running
yum-complete-transaction first to finish them. The program
yum-complete-transaction is found in the yum-utils package.
--> Running transaction check
---> Package hadoop-0.20-conf-pseudo.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
--> Processing Dependency: hadoop-hdfs-datanode = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-conf-pseudo
--> Processing Dependency: hadoop-hdfs-namenode = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-conf-pseudo
--> Processing Dependency: hadoop-hdfs-secondarynamenode = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-conf-pseudo
--> Processing Dependency: hadoop = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-conf-pseudo
--> Processing Dependency: hadoop-0.20-mapreduce-tasktracker = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-conf-pseudo
--> Processing Dependency: hadoop-0.20-mapreduce-jobtracker = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-conf-pseudo
--> Running transaction check
---> Package hadoop.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
--> Processing Dependency: bigtop-utils >= 0.6 for package: hadoop
--> Processing Dependency: zookeeper >= 3.4.0 for package: hadoop
---> Package hadoop-0.20-mapreduce-jobtracker.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
--> Processing Dependency: hadoop-0.20-mapreduce = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-0.20-mapreduce-jobtracker
---> Package hadoop-0.20-mapreduce-tasktracker.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
---> Package hadoop-hdfs-datanode.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
--> Processing Dependency: hadoop-hdfs = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-hdfs-datanode
--> Processing Dependency: hadoop-hdfs = 2.0.0+1357-1.cdh4.3.0.p0.21.el5 for package: hadoop-hdfs-datanode
---> Package hadoop-hdfs-namenode.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
---> Package hadoop-hdfs-secondarynamenode.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
--> Running transaction check
---> Package bigtop-utils.noarch 0:0.6.0+73-1.cdh4.3.0.p0.17.el5 set to be updated
---> Package hadoop-0.20-mapreduce.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
---> Package hadoop-hdfs.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5 set to be updated
--> Processing Dependency: bigtop-jsvc for package: hadoop-hdfs
---> Package zookeeper.noarch 0:3.4.5+19-1.cdh4.3.0.p0.14.el5 set to be updated
--> Running transaction check
---> Package bigtop-jsvc.x86_64 0:1.0.10-1.cdh4.3.0.p0.14.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                            Arch    Version                          Repository     Size
================================================================================
Installing:
 hadoop-0.20-conf-pseudo            noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4  7.2 k
Installing for dependencies:
 bigtop-jsvc                        x86_64  1.0.10-1.cdh4.3.0.p0.14.el5      cloudera-cdh4   48 k
 bigtop-utils                       noarch  0.6.0+73-1.cdh4.3.0.p0.17.el5    cloudera-cdh4  7.8 k
 hadoop                             noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4   16 M
 hadoop-0.20-mapreduce              noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4   25 M
 hadoop-0.20-mapreduce-jobtracker   noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4  4.8 k
 hadoop-0.20-mapreduce-tasktracker  noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4  4.8 k
 hadoop-hdfs                        noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4   12 M
 hadoop-hdfs-datanode               noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4  4.7 k
 hadoop-hdfs-namenode               noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4  4.8 k
 hadoop-hdfs-secondarynamenode      noarch  2.0.0+1357-1.cdh4.3.0.p0.21.el5  cloudera-cdh4  4.7 k
 zookeeper                          noarch  3.4.5+19-1.cdh4.3.0.p0.14.el5    cloudera-cdh4  3.9 M

Transaction Summary
================================================================================
Install      12 Package(s)
Upgrade       0 Package(s)

Total download size: 58 M
Is this ok [y/N]: y
Downloading Packages:
(1/12): hadoop-hdfs-datanode-2.0.0+1357-1.cdh4.3.0.p0.21  | 4.7 kB  00:00
(2/12): hadoop-hdfs-secondarynamenode-2.0.0+1357-1.cdh4.  | 4.7 kB  00:00
(3/12): hadoop-0.20-mapreduce-tasktracker-2.0.0+1357-1.c  | 4.8 kB  00:00
(4/12): hadoop-hdfs-namenode-2.0.0+1357-1.cdh4.3.0.p0.21  | 4.8 kB  00:00
(5/12): hadoop-0.20-mapreduce-jobtracker-2.0.0+1357-1.cd  | 4.8 kB  00:00
(6/12): hadoop-0.20-conf-pseudo-2.0.0+1357-1.cdh4.3.0.p0  | 7.2 kB  00:00
(7/12): bigtop-utils-0.6.0+73-1.cdh4.3.0.p0.17.el5.noarc  | 7.8 kB  00:00
(8/12): bigtop-jsvc-1.0.10-1.cdh4.3.0.p0.14.el5.x86_64.r  |  48 kB  00:00
(9/12): zookeeper-3.4.5+19-1.cdh4.3.0.p0.14.el5.noarch.r  | 3.9 MB  00:02
(10/12): hadoop-hdfs-2.0.0+1357-1.cdh4.3.0.p0.21.el5.noa  |  12 MB  00:08
(11/12): hadoop-2.0.0+1357-1.cdh4.3.0.p0.21.el5.noarch.r  |  16 MB  00:11
(12/12): hadoop-0.20-mapreduce-2.0.0+1357-1.cdh4.3.0.p0.  |  25 MB  00:17
--------------------------------------------------------------------------------
Total                                            1.4 MB/s |  58 MB  00:41
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : bigtop-jsvc                        1/12
  Installing : bigtop-utils                       2/12
  Installing : zookeeper                          3/12
  Installing : hadoop                             4/12
  Installing : hadoop-hdfs                        5/12
  Installing : hadoop-0.20-mapreduce              6/12
  Installing : hadoop-0.20-mapreduce-jobtracker   7/12
  Installing : hadoop-0.20-mapreduce-tasktracker  8/12
  Installing : hadoop-hdfs-namenode               9/12
  Installing : hadoop-hdfs-datanode              10/12
  Installing : hadoop-hdfs-secondarynamenode     11/12
  Installing : hadoop-0.20-conf-pseudo           12/12

Installed:
  hadoop-0.20-conf-pseudo.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5

Dependency Installed:
  bigtop-jsvc.x86_64 0:1.0.10-1.cdh4.3.0.p0.14.el5
  bigtop-utils.noarch 0:0.6.0+73-1.cdh4.3.0.p0.17.el5
  hadoop.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-0.20-mapreduce.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-0.20-mapreduce-jobtracker.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-0.20-mapreduce-tasktracker.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-hdfs.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-hdfs-datanode.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-hdfs-namenode.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  hadoop-hdfs-secondarynamenode.noarch 0:2.0.0+1357-1.cdh4.3.0.p0.21.el5
  zookeeper.noarch 0:3.4.5+19-1.cdh4.3.0.p0.14.el5

Complete!
[root@isvx7 ~]#
Starting Hadoop and Verifying it is Working Properly:
For MRv1, a pseudo-distributed Hadoop installation consists of one node running all five Hadoop daemons: namenode, jobtracker, secondarynamenode, datanode, and tasktracker.
Next, verify that the hadoop-0.20-conf-pseudo package was installed on your system.
To view the files on Red Hat or SLES systems:
[root@isvx7 ~]# rpm -ql hadoop-0.20-conf-pseudo
/etc/hadoop/conf.pseudo.mr1
/etc/hadoop/conf.pseudo.mr1/README
/etc/hadoop/conf.pseudo.mr1/core-site.xml
/etc/hadoop/conf.pseudo.mr1/hadoop-metrics.properties
/etc/hadoop/conf.pseudo.mr1/hdfs-site.xml
/etc/hadoop/conf.pseudo.mr1/log4j.properties
/etc/hadoop/conf.pseudo.mr1/mapred-site.xml
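CDH wires this directory in as /etc/hadoop/conf through the Linux alternatives system. If you want to confirm which configuration directory is active, alternatives can show you (standard alternatives usage; the hadoop-conf name comes from Cloudera's packaging):

[root@isvx7 ~]# alternatives --display hadoop-conf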
To start Hadoop, proceed as follows.
Step 1: Format the NameNode.
Before starting the NameNode for the first time, you must format the file system. Make sure you format the NameNode as the user hdfs; you can do this as part of the command string, using sudo -u hdfs as in the command below.
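Here is the format command from the CDH4 quick start:

[root@isvx7 ~]# sudo -u hdfs hdfs namenode -format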
Step 2: Start HDFS
[root@isvx7 ~]# for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
Starting Hadoop datanode: [ OK ]
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-isvx7.storage.tucson.ibm.com.out
Starting Hadoop namenode: [ OK ]
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-isvx7.storage.tucson.ibm.com.out
Starting Hadoop secondarynamenode: [ OK ]
starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-isvx7.storage.tucson.ibm.com.out
I then verified that the services started fine by checking the web console. The NameNode provides a web console at http://localhost:50070/ for viewing your Distributed File System (DFS) capacity, the number of DataNodes, and logs. In this pseudo-distributed configuration, you should see one live DataNode.
Clicking on the Browse the file system link in the console showed the following:
The NameNode logs showed the following:
Clicking on the Live Nodes link showed:
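If you would rather verify from the command line than the web console, dfsadmin prints the same capacity and live-DataNode summary (standard HDFS tooling, not a screenshot from this install):

[root@isvx7 ~]# sudo -u hdfs hdfs dfsadmin -report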
Step 3: Create the /tmp Directory
Create the /tmp directory in HDFS and set its permissions to 1777 (the leading 1 is the sticky bit, so users can delete only their own files, just like a local /tmp):
[root@isvx7 ~]# sudo -u hdfs hadoop fs -mkdir /tmp
[root@isvx7 ~]# sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
[root@isvx7 ~]#
Step 4: Create the MapReduce system directories:
[root@isvx7 ~]# sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
[root@isvx7 ~]# sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
[root@isvx7 ~]# sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
Step 5: Verify the HDFS File Structure
[root@isvx7 ~]# sudo -u hdfs hadoop fs -ls -R /
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /tmp
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/$
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/-R
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/-chmod
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/-u
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/1777
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/fs
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/hadoop
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/hdfs
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:44 /user/hdfs/sudo
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:45 /var
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:45 /var/lib
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:45 /var/lib/hadoop-hdfs
drwxr-xr-x   - hdfs   supergroup          0 2013-08-14 22:45 /var/lib/hadoop-hdfs/cache
drwxr-xr-x   - mapred supergroup          0 2013-08-14 22:45 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x   - mapred supergroup          0 2013-08-14 22:45 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred supergroup          0 2013-08-14 22:45 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
[root@isvx7 ~]#
Note the stray /user/hdfs/$, /user/hdfs/-R, and similar entries, and that /tmp is still drwxr-xr-x rather than drwxrwxrwt: my first run of Step 3 pasted both commands onto one line (complete with the documentation's $ prompt), so everything after /tmp was passed to mkdir as directory names and the chmod never ran. Re-running the chmod from Step 3 and removing the stray directories with hadoop fs -rm -r cleans this up.
Step 6: Start MapReduce
[root@isvx7 ~]# for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x start ; done
Starting Hadoop jobtracker: [ OK ]
starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-isvx7.storage.tucson.ibm.com.out
Starting Hadoop tasktracker: [ OK ]
starting tasktracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-isvx7.storage.tucson.ibm.com.out
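At this point all five daemons should be running. The same init.d loop used to start them can confirm it (a quick status check, not part of the original walkthrough):

[root@isvx7 ~]# for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x status ; done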
Step 7: Create User Directories
Create a home directory for the MapReduce user joe. It is best to do this on the NameNode; for example:
[root@isvx7 ~]# sudo -u hdfs hadoop fs -mkdir /user/joe
[root@isvx7 ~]# sudo -u hdfs hadoop fs -chown joe /user/joe
[root@isvx7 ~]#
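With the home directory in place, a reasonable smoke test is to run one of the example jobs shipped with the MRv1 packages as joe, along the lines of the CDH4 quick start (the jar path below matches CDH4's MRv1 layout; treat it as an assumption and adjust if your layout differs):

[joe@isvx7 ~]$ hadoop fs -mkdir input
[joe@isvx7 ~]$ hadoop fs -put /etc/hadoop/conf/*.xml input
[joe@isvx7 ~]$ hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+'
[joe@isvx7 ~]$ hadoop fs -ls output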