Tomer Ben David

Local Hadoop on a laptop for practice

Introduction

Here is what I learned last week about Hadoop installation:

Hadoop sounds like a really big thing: a complex installation, a cluster, hundreds of machines, terabytes if not petabytes of data. But actually, you can download a simple tarball and run Hadoop with HDFS on your laptop for practice. It's very easy!

Our plan

  1. Set up JAVA_HOME (Hadoop is built on Java).
  2. Download the Hadoop tar.gz.
  3. Extract the Hadoop tar.gz.
  4. Set up the Hadoop configuration.
  5. Format and start HDFS.
  6. Upload files to HDFS.
  7. Run a Hadoop job on the uploaded files.
  8. Get the results back and print them!

Sounds like a plan!

Set up JAVA_HOME

As we said, Hadoop is built on Java, so we need JAVA_HOME set.

➜  hadoop$ ls /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
➜  hadoop$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
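A quick sanity check helps before running any Hadoop command. Here is a small sketch of such a check; the helper name is made up, and on macOS you can use `/usr/libexec/java_home` instead of hardcoding the versioned JDK path:

```shell
# check_java_home: succeeds only when JAVA_HOME is set and is a real directory.
# (Illustrative helper, not part of Hadoop itself.)
check_java_home() {
  [ -n "${JAVA_HOME:-}" ] && [ -d "$JAVA_HOME" ]
}

# On macOS this avoids hardcoding the version-specific path:
#   export JAVA_HOME="$(/usr/libexec/java_home)"

if check_java_home; then
  echo "JAVA_HOME ok: $JAVA_HOME"
else
  echo "JAVA_HOME is unset or not a directory" >&2
fi
```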

Download Hadoop tar.gz

Next we download Hadoop, nice :)

➜  hadoop$ curl http://apache.spd.co.il/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz --output hadoop.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1  310M    1 3581k    0     0   484k      0  0:10:57  0:00:07  0:10:50  580k
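Apache publishes a checksum file alongside each release tarball, so it is worth verifying the download before extracting it. A minimal self-contained sketch of the idea, using a local stand-in file (the real release uses SHA-512 checksum files from the mirror):

```shell
# Stand-in "tarball" so this snippet runs on its own; with the real download
# you would fetch the published .sha512 file from the Apache mirror instead.
printf 'pretend-tarball-bytes' > hadoop.tar.gz
sha256sum hadoop.tar.gz > hadoop.tar.gz.sha256

# Prints "hadoop.tar.gz: OK" when the file matches the recorded checksum.
sha256sum -c hadoop.tar.gz.sha256
```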

Extract the Hadoop tar.gz

Now that we have the tar.gz on our laptop, let's extract it.

➜  hadoop$ tar xvfz ~/Downloads/hadoop-3.1.0.tar.gz

Set up HDFS

Now let's configure HDFS on our laptop:

➜  hadoop$ cd hadoop-3.1.0
➜  hadoop-3.1.0$
➜  hadoop-3.1.0$ vi etc/hadoop/core-site.xml

Configuration should be:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
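It can help to double-check the value you just wrote before starting anything. A small self-contained sketch (it recreates a minimal core-site.xml so the snippet runs on its own; in the real install the file already exists under hadoop-3.1.0/etc/hadoop):

```shell
# Recreate a minimal core-site.xml so this snippet is self-contained.
mkdir -p etc/hadoop
cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

# Print the value line that follows the property name.
grep -A1 '<name>fs.defaultFS</name>' etc/hadoop/core-site.xml
```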

So we have configured the HDFS port. Next, let's configure how many replicas we need; since we are on a laptop, we want only one replica for our data:

➜  hadoop-3.1.0$ vi etc/hadoop/hdfs-site.xml

The hdfs-site.xml file above holds the replication configuration; below is what it should contain (hint: 1):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Enable sshd

Hadoop connects to its nodes over SSH, so let's enable the SSH server on our Mac laptop:

(On macOS, enable Remote Login under System Preferences > Sharing; this starts the built-in SSH server.)

You should be able to SSH in without a password:

➜  hadoop-3.1.0 ssh localhost
Last login: Wed May  9 17:15:28 2018
➜  ~

If you can't, set up key-based authentication:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys

Start HDFS

Next we format and start HDFS on our laptop:

➜  hadoop-3.1.0$ bin/hdfs namenode -format
WARNING: /Users/tomer.bendavid/tmp/hadoop/hadoop-3.1.0/logs does not exist. Creating.
2018-05-10 22:12:02,493 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Tomers-MacBook-Pro.local/192.168.1.104


➜  hadoop-3.1.0$ sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes

Create folders on HDFS

Next we create a sample input folder on HDFS on our laptop:

➜  hadoop-3.1.0$ bin/hdfs dfs -mkdir /user
2018-05-10 22:13:16,982 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0$ bin/hdfs dfs -mkdir /user/tomer
2018-05-10 22:13:22,474 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0$

Upload test data to HDFS

Now that we have HDFS up and running on our laptop, let's upload some files:

➜  hadoop-3.1.0$ bin/hdfs dfs -put etc/hadoop input
2018-05-10 22:14:28,802 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: `input': No such file or directory: `hdfs://localhost:9000/user/tomer.bendavid/input'
➜  hadoop-3.1.0$ bin/hdfs dfs -put etc/hadoop /user/tomer/input
2018-05-10 22:14:37,526 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0$ bin/hdfs dfs -ls /user/tomer/input
2018-05-10 22:16:09,325 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - tomer.bendavid supergroup          0 2018-05-10 22:14 /user/tomer/input/hadoop
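Note why the first put failed: a relative HDFS path such as `input` resolves against the current user's HDFS home directory, `/user/<username>` (here `tomer.bendavid`), which we never created; the absolute path worked. A tiny pure-shell sketch of that resolution rule (the helper name is made up, and no Hadoop is involved):

```shell
# resolve_hdfs_path: mimic how HDFS turns a path argument into an absolute
# path. Absolute paths pass through; relative ones land under /user/<user>.
resolve_hdfs_path() {
  case "$1" in
    /*) echo "$1" ;;
    *)  echo "/user/$2/$1" ;;
  esac
}

resolve_hdfs_path input tomer.bendavid              # /user/tomer.bendavid/input
resolve_hdfs_path /user/tomer/input tomer.bendavid  # /user/tomer/input
```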

Run a Hadoop job

So we have HDFS with files on our laptop; let's run a job on it, what do you think?

➜  hadoop-3.1.0$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /user/tomer/input/hadoop/*.xml /user/tomer/output1 'dfs[a-z.]+'
➜  hadoop-3.1.0$ bin/hdfs dfs -cat /user/tomer/output1/part-r-00000
2018-05-10 22:22:29,118 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1   dfsadmin
1   dfs.replication
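What the example `grep` job does: it scans the input files, counts every match of the regex `dfs[a-z.]+`, and writes count-and-match pairs to the output directory. Plain Unix grep can sketch the same computation locally (the sample input file here is made up):

```shell
# Sample input standing in for the uploaded Hadoop config files.
printf '<name>dfs.replication</name>\ndfsadmin is a command\n' > sample.txt

# -o prints each regex match on its own line; sort | uniq -c counts them,
# which is essentially what the MapReduce grep example computes.
grep -oE 'dfs[a-z.]+' sample.txt | sort | uniq -c
```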

We managed to get a local Hadoop installation with HDFS for tests, and to run a test job! That is so cool!

Summary

We managed to download Hadoop, start up HDFS, upload files to it, run a Hadoop job, and get results back from HDFS, all on our laptop in a single directory! That is cool!

There is nothing new here; I just followed the straightforward guidance in the Hadoop installation docs, with a few minor modifications and some extra explanations so it's clearer for me when I look at it in the future for reference.

If you want to see more of what I learned last week, I'm always at https://tomer-ben-david.github.io
