Install Hadoop on Fedora

Prerequisites

First, ensure you have Java installed since Hadoop requires it:

# Install OpenJDK 11 (Hadoop 3.x also supports OpenJDK 8)
sudo dnf install java-11-openjdk java-11-openjdk-devel

# Verify Java installation
java -version
javac -version
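
On Fedora, /usr/lib/jvm/java-11-openjdk is normally a symlink to the versioned JDK directory, but the exact layout can vary between releases. A quick optional check to confirm the real path before hardcoding JAVA_HOME in a later step:

# Resolve the JDK directory behind the java binary;
# strip the trailing /bin/java from the output to get a JAVA_HOME candidate
readlink -f $(which java)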

Download and Install Hadoop

  1. Download Hadoop:
# Create a directory for Hadoop
sudo mkdir -p /opt/hadoop
cd /tmp

# Download Hadoop (replace with latest stable version)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
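
# (Optional) Verify the tarball against Apache's published SHA-512 checksum;
# the .sha512 URL below follows the standard Apache release layout
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512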

# Extract and move to /opt
sudo tar -xzf hadoop-3.3.6.tar.gz -C /opt/hadoop --strip-components=1

# Change ownership
sudo chown -R $USER:$USER /opt/hadoop
  2. Set up environment variables:
# Edit your ~/.bashrc file
echo 'export HADOOP_HOME=/opt/hadoop' >> ~/.bashrc
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bashrc

# Reload the configuration
source ~/.bashrc
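
To confirm the new variables took effect, run a Hadoop command from the updated PATH:

# Prints the Hadoop version banner if HADOOP_HOME and PATH are set correctly
hadoop version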
  3. Configure Hadoop:

Append JAVA_HOME to Hadoop's own environment file, hadoop-env.sh:

# Set JAVA_HOME in Hadoop's environment
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Basic Configuration (Pseudo-distributed mode)

For a single-node setup:

  1. Configure core-site.xml:
cat > $HADOOP_HOME/etc/hadoop/core-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
  2. Configure hdfs-site.xml:
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///opt/hadoop/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///opt/hadoop/data/datanode</value>
    </property>
</configuration>
EOF
  3. Create data directories:
mkdir -p /opt/hadoop/data/namenode
mkdir -p /opt/hadoop/data/datanode
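  4. (Optional) Configure mapred-site.xml and yarn-site.xml:

If you plan to run MapReduce jobs on YARN (started later with start-yarn.sh), two more files are usually configured in pseudo-distributed mode. This is a minimal sketch mirroring the settings in the official single-node setup guide; skip it if you only need HDFS:

cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
EOF

cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
EOF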

Set up SSH (required for Hadoop)

# Install SSH server if not already installed
sudo dnf install openssh-server

# Generate SSH key pair
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Add key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

# Start SSH service
sudo systemctl start sshd
sudo systemctl enable sshd
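
Before continuing, check that passwordless SSH to localhost works (you may need to accept the host key on the first connection):

# Should complete without asking for a password
ssh localhost exit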

Initialize and Start Hadoop

  1. Format the namenode:
hdfs namenode -format
  2. Start Hadoop services:
# Start HDFS
start-dfs.sh

# Start YARN (optional, for resource management)
start-yarn.sh
  3. Verify installation:
# Check running processes
jps

# Check HDFS status
hdfs dfsadmin -report

# Access web interfaces:
# NameNode: http://localhost:9870
# ResourceManager: http://localhost:8088
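
The URLs above work from the machine itself. To reach the web UIs from another host, you'll likely need to open the ports in Fedora's firewalld first; something along these lines, adjusted to your zone setup:

# Allow remote access to the NameNode and ResourceManager UIs
sudo firewall-cmd --permanent --add-port=9870/tcp
sudo firewall-cmd --permanent --add-port=8088/tcp
sudo firewall-cmd --reload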

Test Hadoop

# Create a directory in HDFS
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/$USER

# Put a file into HDFS
echo "Hello Hadoop" > test.txt
hdfs dfs -put test.txt /user/$USER/

# List files in HDFS
hdfs dfs -ls /user/$USER/

# Get file from HDFS
hdfs dfs -get /user/$USER/test.txt downloaded_test.txt
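
To exercise MapReduce end to end, you can run the wordcount example that ships with Hadoop against the file you just uploaded. This assumes YARN is running and the optional mapred-site.xml/yarn-site.xml step above was done; the jar path matches the 3.3.6 release layout:

# Run the bundled wordcount example (the output directory must not exist yet)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
    wordcount /user/$USER/test.txt /user/$USER/wordcount-output

# View the result
hdfs dfs -cat /user/$USER/wordcount-output/part-r-00000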

Stop Hadoop

When you're done:

stop-yarn.sh
stop-dfs.sh

This setup gives you a working single-node Hadoop cluster on Fedora. For production or multi-node clusters, you'll need additional configuration for security, networking, and resource management.
