DEV Community

Ashwin Telmore

How to Install Hadoop on Ubuntu 18.04 or 20.04

Prerequisites

  • Access to a terminal window/command line
  • Sudo or root privileges on local/remote machines

Use the following command to update your system before initiating a new installation:

sudo apt update

Type the following command in your terminal to install OpenJDK 8:

sudo apt install openjdk-8-jdk -y

Once the installation process is complete, verify the current Java version:

java -version; javac -version

Install OpenSSH on Ubuntu
Install the OpenSSH server and client using the following command:

sudo apt install openssh-server openssh-client -y

Create Hadoop User

Utilize the adduser command to create a new Hadoop user:

sudo adduser ashwin

Switch to the newly created user and enter the corresponding password:

su - ashwin

Generate an SSH key pair and define the location it is to be stored in:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Use the cat command to store the public key as authorized_keys in the .ssh directory:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Set the permissions for your user with the chmod command:

chmod 0600 ~/.ssh/authorized_keys

Verify that the new user can SSH to localhost:

ssh localhost

Download and Install Hadoop on Ubuntu
Visit the official Apache Hadoop project page and select the version of Hadoop you want to implement. Once the archive is downloaded, extract it:

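For example, the 3.2.1 release used throughout this tutorial can be fetched from the Apache release archive with wget (the URL assumes version 3.2.1; adjust it for the version you selected):

```shell
# Download the Hadoop 3.2.1 binary archive (adjust the version as needed)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
```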
tar xzf hadoop-3.2.1.tar.gz

Single Node Hadoop Deployment (Pseudo-Distributed Mode)

Edit the .bashrc shell configuration file using a text editor of your choice (we will be using nano):

nano .bashrc

Define the Hadoop environment variables by adding the following content to the end of the file:

#Hadoop Related Options
export HADOOP_HOME=/home/ashwin/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

It is vital to apply the changes to the current running environment by using the following command:

source ~/.bashrc
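To confirm the variables are now active in the current shell, you can print one of them (the expected value assumes the /home/ashwin/hadoop-3.2.1 location used above):

```shell
# Check that the Hadoop variables are set in this shell session
echo "$HADOOP_HOME"
# Expected (with the paths used above): /home/ashwin/hadoop-3.2.1
```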

Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:

sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

If you have installed the same version as presented in the first part of this tutorial, add the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

If you are unsure of the correct path, check the value currently set in your environment:

echo $JAVA_HOME

Open the core-site.xml file in a text editor:

sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following configuration to override the default values for the temporary directory and add your HDFS URL to replace the default local file system setting:

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/ashwin/tmpdata</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>

Use the following command to open the hdfs-site.xml file for editing:

sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following configuration to the file and, if needed, adjust the NameNode and DataNode directories to your custom locations:

<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/home/ashwin/dfsdata/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/ashwin/dfsdata/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
</configuration>
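The directories referenced in core-site.xml and hdfs-site.xml are not created automatically, so it is safest to create them up front (paths assume the /home/ashwin locations used in the configuration above):

```shell
# Create the temporary, NameNode, and DataNode directories ahead of time
mkdir -p /home/ashwin/tmpdata
mkdir -p /home/ashwin/dfsdata/namenode /home/ashwin/dfsdata/datanode
```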

Use the following command to access the mapred-site.xml file and define MapReduce values:

sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following configuration to change the default MapReduce framework name value to yarn:

<configuration> 
<property> 
  <name>mapreduce.framework.name</name> 
  <value>yarn</value> 
</property> 
</configuration>

Open the yarn-site.xml file in a text editor:

sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Append the following configuration to the file:

<configuration>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>0</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>   
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>

It is important to format the NameNode before starting Hadoop services for the first time:

hdfs namenode -format

Navigate to the hadoop-3.2.1/sbin directory and execute the following command to start all the Hadoop daemons (HDFS and YARN):

./start-all.sh


Type this simple command to check if all the daemons are active and running as Java processes:

jps
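On a healthy single-node deployment, jps typically lists all five Hadoop daemons plus the jps process itself (the PIDs below are illustrative placeholders, not values you should expect to match):

```shell
jps
# Typical output for this setup (PIDs will differ):
# 11856 NameNode
# 12001 DataNode
# 12245 SecondaryNameNode
# 12480 ResourceManager
# 12630 NodeManager
# 12902 Jps
```

If any of these daemons is missing, check the corresponding log file under $HADOOP_HOME/logs for the cause.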

Done! 🙂
