Video version of this article: https://www.youtube.com/watch?v=Slbi-uzPtnw
Credits: @codewitharjun
(The video uses a CLI text editor to edit the config files; this tutorial uses a normal graphical text editor.)
First, install OpenJDK 8:
sudo apt install openjdk-8-jdk
(Optional) To check that it is there:
cd /usr/lib/jvm
(the directory should contain java-8-openjdk-amd64; run ls to see it)
Now make sure you are in your home directory; if not, run:
cd ~
Open the .bashrc file:
sudo gedit .bashrc
Paste in the following block:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin
export HADOOP_HOME=~/hadoop-3.3.6/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh
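Changes to .bashrc only take effect in shells started afterwards; to pick them up in the current session, source the file. In your setup you would simply run `source ~/.bashrc` — the snippet below demonstrates the mechanism against a scratch file (the /tmp path and filename are only for the demo), so the effect is easy to see in isolation:

```shell
# scratch copy of two of the exports added to ~/.bashrc above
cat > /tmp/hadoop-env-demo.sh <<'EOF'
export HADOOP_HOME=~/hadoop-3.3.6/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

# sourcing the file loads the variables into the current shell
source /tmp/hadoop-env-demo.sh
echo "$HADOOP_HOME"
```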
Install SSH (Hadoop uses it to manage its daemons):
sudo apt-get install ssh
The commands below include the Hadoop version number; at the time of writing it is 3.3.6, so adjust if your version differs.
Now go to the hadoop.apache.org website and download the tar file.
Once downloaded, extract the tar file:
tar -zxvf ~/Downloads/hadoop-3.3.6.tar.gz
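The flags stand for gunzip (z), extract (x), verbose (v), and read from file (f). If you want to see the mechanics in isolation before touching the real tarball, here is a self-contained round trip with a throwaway archive (the /tmp paths and names are only for the demo):

```shell
# build a tiny gzipped tarball, then extract it the same way as the Hadoop one
mkdir -p /tmp/tar-demo/hadoop-demo
echo "hello" > /tmp/tar-demo/hadoop-demo/README
tar -czf /tmp/tar-demo/demo.tar.gz -C /tmp/tar-demo hadoop-demo
rm -r /tmp/tar-demo/hadoop-demo

# -zxvf: decompress, extract, list files as they come out
tar -zxvf /tmp/tar-demo/demo.tar.gz -C /tmp/tar-demo
cat /tmp/tar-demo/hadoop-demo/README
```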
For all of the configuration below, make sure you are in the hadoop-3.3.6/etc/hadoop directory:
cd hadoop-3.3.6/etc/hadoop
Many of the files may already contain a <configuration> tag, so check before you paste in the new configuration blocks.
Now open hadoop-env.sh:
sudo gedit hadoop-env.sh
Set the path for JAVA_HOME by adding the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
You might not need sudo in the following commands, but to avoid permission issues I have added it everywhere.
Let's configure the other files in the same way:
sudo gedit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.dataflair.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.dataflair.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.server.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.server.groups</name>
    <value>*</value>
  </property>
</configuration>
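fs.defaultFS is the address clients use to reach the NameNode (here, port 9000 on localhost). Once Hadoop is running you can read a property back with `hdfs getconf -confKey fs.defaultFS`; until then, a quick grep/sed sketch against a copy of the file also works (the /tmp path below is just for illustration):

```shell
# demo copy of the property we just configured
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# pull out the value that follows the fs.defaultFS name tag
grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site-demo.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
```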
sudo gedit hdfs-site.xml
Replace <USER> with your Ubuntu username!
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href=''?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/<USER>/pseudo/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/<USER>/pseudo/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
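The two file:// paths above are plain local directories. Hadoop will normally create them when the namenode is formatted, but creating them yourself first can avoid permission surprises — a suggestion, not a required step. Use the same home path you put in hdfs-site.xml:

```shell
# create the local storage directories referenced by dfs.name.dir / dfs.data.dir
mkdir -p ~/pseudo/dfs/name ~/pseudo/dfs/data
ls ~/pseudo/dfs
```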
sudo gedit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
sudo gedit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
Hadoop is now configured.
Next, execute the following commands one by one:
ssh localhost
(accept the host fingerprint with yes if prompted, then type exit to return)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
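The chmod step matters: sshd refuses key-based login if authorized_keys is writable by anyone but its owner. After running the commands above you can confirm the mode with `stat -c '%a' ~/.ssh/authorized_keys` (it should print 600); the snippet below shows the same check against a scratch file so it can be tried anywhere:

```shell
# reproduce the permission step on a throwaway file and verify the octal mode
touch /tmp/authorized_keys_demo
chmod 0600 /tmp/authorized_keys_demo
stat -c '%a' /tmp/authorized_keys_demo
```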
Format the namenode (run this from your home directory):
hadoop-3.3.6/bin/hdfs namenode -format
export PDSH_RCMD_TYPE=ssh
(this is already in .bashrc; exporting it again makes sure the current shell has it)
To start Hadoop:
start-all.sh
To check that Hadoop is running, open http://localhost:9870/ (the NameNode web UI) in your browser.
To stop Hadoop:
stop-all.sh
This is an updated version of this article.