Install Hadoop single node cluster in Pseudo Distributed Mode

We will install HDFS (NameNode and DataNode), YARN, and MapReduce on a single-node cluster in pseudo-distributed mode, which simulates a distributed cluster on a single machine, here an Ubuntu 18.04 server. Each Hadoop daemon (NameNode, DataNode, ResourceManager, NodeManager, and so on) runs as a separate Java process.

  • Create user hadoop
adduser hadoop
  • Download the Java JDK (I am using 8u201) and extract it under /opt
tar -xzvf jdk-8u201-linux-x64.tar.gz -C /opt
  • Set this JDK as the default JVM
update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_201/bin/java 100
update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_201/bin/javac 100
update-alternatives --display java
update-alternatives --display javac
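Before moving on, it can help to confirm that the unpacked JDK directory really contains the binaries the alternatives entries point at. The following is only a minimal sketch; the function name check_jdk_dir is made up for this post and is not part of the JDK or of update-alternatives.

```shell
# check_jdk_dir is a hypothetical helper, not part of the JDK or Hadoop.
# It reports whether a directory looks like an unpacked JDK by checking
# that bin/java and bin/javac exist and are executable.
check_jdk_dir() {
    jdk_dir="$1"
    for tool in java javac; do
        if [ ! -x "$jdk_dir/bin/$tool" ]; then
            echo "missing or not executable: $jdk_dir/bin/$tool" >&2
            return 1
        fi
    done
    echo "JDK layout looks fine: $jdk_dir"
}
```

With the paths used in this post, it would be called as: check_jdk_dir /opt/jdk1.8.0_201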
  • Configure passwordless SSH
sudo apt-get install openssh-server openssh-client
sudo su
su hadoop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
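Running the cat command above more than once appends the same key again each time. A possible way to make that step repeatable is sketched below; add_authorized_key is a hypothetical helper name, not an OpenSSH command, and the permission values are the usual conservative choice rather than something the steps above require.

```shell
# add_authorized_key is a hypothetical helper (not an OpenSSH command).
# It appends the public key to authorized_keys only if that exact line is
# not already there, and restricts permissions on the directory and file.
add_authorized_key() {
    pub_key_file="$1"
    auth_keys_file="$2"
    ssh_dir=$(dirname "$auth_keys_file")

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_keys_file"
    chmod 600 "$auth_keys_file"

    key=$(cat "$pub_key_file")
    # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string
    if ! grep -qxF -- "$key" "$auth_keys_file"; then
        printf '%s\n' "$key" >> "$auth_keys_file"
    fi
}
```

As the hadoop user this would be called as: add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys. Afterwards, ssh -o BatchMode=yes localhost true should return without asking for a password if the key setup works.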
  • Download Hadoop as the hadoop user
cd ~
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xzvf hadoop-2.8.5.tar.gz
  • Set up the environment variables in ~/.bashrc (edit the file and add these lines)
export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
  • Source the .bashrc
source ~/.bashrc
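A quick way to check that the new variables took effect in the current shell is sketched here. The helper name check_hadoop_env is hypothetical; it only looks at HADOOP_HOME and PATH, which is an assumption about what matters most for the later steps.

```shell
# check_hadoop_env is a hypothetical helper. It verifies that HADOOP_HOME
# is set, that its bin and sbin directories exist, and that both are on PATH.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    for sub in bin sbin; do
        if [ ! -d "$HADOOP_HOME/$sub" ]; then
            echo "missing directory: $HADOOP_HOME/$sub" >&2
            return 1
        fi
        # Surround PATH with colons so every entry can be matched the same way.
        case ":$PATH:" in
            *":$HADOOP_HOME/$sub:"*) ;;
            *) echo "not on PATH: $HADOOP_HOME/$sub" >&2; return 1 ;;
        esac
    done
    echo "Hadoop environment looks fine"
}
```

Calling check_hadoop_env right after sourcing .bashrc should print the success line; any other message names the variable or directory to look at.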
  • Edit hadoop-env.sh and set the following two lines
cd ~/hadoop-2.8.5/etc/hadoop/
nano hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_201
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/hadoop-2.8.5/etc/hadoop"}
  • Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadooptmpdata</value>
</property>
</configuration>
  • Create the hadooptmpdata directory
mkdir /home/hadoop/hadooptmpdata
  • Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>
</configuration>
  • Create the NameNode and DataNode directories
cd ~
mkdir -p hdfs/namenode
mkdir -p hdfs/datanode
  • Edit mapred-site.xml
cd ~/hadoop-2.8.5/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
  • Edit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
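Since a typo in a property name is silently ignored by Hadoop, it may be worth reading the values back out of the *-site.xml files before starting anything. The sketch below uses a made-up helper, get_hadoop_property, and assumes the one-tag-per-line layout used in this post; it is not a real XML parser.

```shell
# get_hadoop_property is a hypothetical helper. It prints the <value> that
# follows a given <name> in a Hadoop *-site.xml file. It assumes each tag
# sits on its own line, as in this post, and is not a general XML parser.
get_hadoop_property() {
    prop_name="$1"
    xml_file="$2"
    awk -v want="$prop_name" '
        # Remember that the wanted <name> line has been seen.
        index($0, "<name>" want "</name>") { found = 1; next }
        # Print the first <value> after it without the tags, then stop.
        found && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            exit
        }
    ' "$xml_file"
}
```

For example, get_hadoop_property fs.defaultFS ~/hadoop-2.8.5/etc/hadoop/core-site.xml should print the hdfs://localhost:9000 value set earlier. Once the environment is set up, hdfs getconf -confKey fs.defaultFS asks Hadoop itself for the effective value.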
  • Start Hadoop (format the NameNode only on the first setup; formatting again erases the HDFS metadata)
cd ~
hdfs namenode -format
start-dfs.sh
start-yarn.sh
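Once the daemons are up, a small write-and-read round trip is one way to confirm that HDFS actually accepts data. This is only a sketch: hdfs_smoke_test is a hypothetical function name, and the /tmp/smoke-test-... path inside HDFS is an arbitrary choice for illustration.

```shell
# hdfs_smoke_test is a hypothetical helper. It writes a small file into
# HDFS, reads it back, removes it again, and returns 0 only if the content
# that came back matches what was written.
hdfs_smoke_test() {
    test_dir="/tmp/smoke-test-$$"      # arbitrary HDFS path, an assumption
    local_file=$(mktemp)
    read_back=""
    echo "hello hdfs" > "$local_file"

    hdfs dfs -mkdir -p "$test_dir" &&
        hdfs dfs -put -f "$local_file" "$test_dir/hello.txt" &&
        read_back=$(hdfs dfs -cat "$test_dir/hello.txt")
    status=$?

    # Clean up in HDFS and locally regardless of the outcome.
    hdfs dfs -rm -r -f "$test_dir" > /dev/null 2>&1
    rm -f "$local_file"

    [ "$status" -eq 0 ] && [ "$read_back" = "hello hdfs" ]
}
```

A typical call would be: hdfs_smoke_test && echo "HDFS round trip OK"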
  • Verify that the daemons are running
/opt/jdk1.8.0_201/bin/jps
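After start-dfs.sh and start-yarn.sh, jps would normally list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. Rather than comparing the list by eye, the check can be scripted along these lines; missing_daemons is a hypothetical helper name chosen for this post.

```shell
# missing_daemons is a hypothetical helper. It reads jps output on stdin
# and prints the name of every expected Hadoop daemon that is absent.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare against the whole second field
        # so that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$daemon"
        fi
    done
}
```

Usage would be /opt/jdk1.8.0_201/bin/jps | missing_daemons, where empty output means all five daemons were found. If a name is printed, the matching log file under the logs directory of the Hadoop installation is the place to look.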
  • Check the version
hadoop@zu-hadoop:~$ hadoop version
Hadoop 2.8.5
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0b8464d75227fcee2c6e7f2410377b3d53d3d5f8
Compiled by jdu on 2018-09-10T03:32Z
Compiled with protoc 2.5.0
From source with checksum 9942ca5c745417c14e318835f420733
This command was run using /home/hadoop/hadoop-2.8.5/share/hadoop/common/hadoop-common-2.8.5.jar
hadoop@zu-hadoop:~$ hdfs version
Hadoop 2.8.5
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0b8464d75227fcee2c6e7f2410377b3d53d3d5f8
Compiled by jdu on 2018-09-10T03:32Z
Compiled with protoc 2.5.0
From source with checksum 9942ca5c745417c14e318835f420733
This command was run using /home/hadoop/hadoop-2.8.5/share/hadoop/common/hadoop-common-2.8.5.jar
