Prerequisites
First, ensure you have Java installed since Hadoop requires it:
# Install OpenJDK 11 (Hadoop 3.3.x supports both Java 8 and 11)
sudo dnf install java-11-openjdk java-11-openjdk-devel
# Verify Java installation
java -version
javac -version
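The exact JVM path varies between Fedora releases and architectures, so it is worth confirming the directory you will point JAVA_HOME at later; one quick way to look it up:
# Print the JDK directory that javac resolves to; use this value for JAVA_HOME
dirname "$(dirname "$(readlink -f "$(which javac)")")"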
Download and Install Hadoop
- Download Hadoop:
# Create a directory for Hadoop
sudo mkdir -p /opt/hadoop
cd /tmp
# Download Hadoop (replace with latest stable version)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
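# Optional sanity check before extracting: the .sha512 URL below is assumed to
# follow the Apache archive layout; compare the printed digest with the one in
# the downloaded checksum file
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum hadoop-3.3.6.tar.gz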
# Extract and move to /opt
sudo tar -xzf hadoop-3.3.6.tar.gz -C /opt/hadoop --strip-components=1
# Change ownership
sudo chown -R $USER:$USER /opt/hadoop
- Set up environment variables:
# Edit your ~/.bashrc file
echo 'export HADOOP_HOME=/opt/hadoop' >> ~/.bashrc
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bashrc
# Reload the configuration
source ~/.bashrc
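A quick way to confirm the new variables took effect is to ask Hadoop for its version; this fails if HADOOP_HOME, PATH, or JAVA_HOME is wrong:
echo $HADOOP_HOME
hadoop version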
- Configure Hadoop:
Set JAVA_HOME in Hadoop's own environment file (hadoop-env.sh) so the daemons can find Java regardless of how they are launched:
# Set JAVA_HOME in Hadoop's environment
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Basic Configuration (Pseudo-distributed mode)
For a single-node setup:
- Configure core-site.xml:
cat > $HADOOP_HOME/etc/hadoop/core-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
- Configure hdfs-site.xml:
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop/data/datanode</value>
  </property>
</configuration>
EOF
- Create data directories:
mkdir -p /opt/hadoop/data/namenode
mkdir -p /opt/hadoop/data/datanode
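If you plan to use the optional YARN step later (start-yarn.sh), MapReduce jobs submitted to YARN also expect a minimal mapred-site.xml and yarn-site.xml. The sketch below uses two standard Hadoop 3.x properties; depending on your environment you may additionally need mapreduce.application.classpath and a yarn.nodemanager.env-whitelist entry for HADOOP_MAPRED_HOME (check the Apache single-node guide for your version):
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF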
Set Up Passwordless SSH (required by Hadoop's start scripts)
# Install SSH server if not already installed
sudo dnf install openssh-server
# Generate SSH key pair
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Add key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
# Start SSH service
sudo systemctl start sshd
sudo systemctl enable sshd
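Before running the start scripts, confirm that passwordless SSH to localhost actually works:
# accept-new auto-accepts the host key on first connect (OpenSSH 7.6+)
ssh -o StrictHostKeyChecking=accept-new localhost 'echo SSH to localhost works'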
Initialize and Start Hadoop
- Format the NameNode (first run only; reformatting wipes HDFS metadata):
hdfs namenode -format
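If formatting succeeded, the metadata directory configured in hdfs-site.xml should now contain a current/ subdirectory with a VERSION file and an initial fsimage:
ls /opt/hadoop/data/namenode/current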
- Start Hadoop services:
# Start HDFS
start-dfs.sh
# Start YARN (optional, for resource management; see the mapred/yarn-site sketch in the configuration section)
start-yarn.sh
- Verify installation:
# Check running processes
jps
# Check HDFS status
hdfs dfsadmin -report
# Access web interfaces:
# NameNode: http://localhost:9870
# ResourceManager: http://localhost:8088
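You can also probe the web UIs from the shell; a 200 response means the UI is up (the ports are the Hadoop 3.x defaults listed above):
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
# The ResourceManager check only succeeds if you started YARN
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088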
Test Hadoop
# Create a directory in HDFS
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/$USER
# Put a file into HDFS
echo "Hello Hadoop" > test.txt
hdfs dfs -put test.txt /user/$USER/
# List files in HDFS
hdfs dfs -ls /user/$USER/
# Get file from HDFS
hdfs dfs -get /user/$USER/test.txt downloaded_test.txt
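To exercise MapReduce end to end, you can run the wordcount example that ships in the Hadoop tarball. The jar path below assumes the stock 3.3.6 layout, and /user/$USER/wordcount-out is just an arbitrary output path (it must not exist yet); without the YARN configuration sketched earlier, the job runs in local mode:
# Run the bundled wordcount example against the test file
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/$USER/test.txt /user/$USER/wordcount-out
# Inspect the result
hdfs dfs -cat /user/$USER/wordcount-out/part-r-00000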
Stop Hadoop
When you're done:
stop-yarn.sh
stop-dfs.sh
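Once the HDFS and YARN daemons have exited, a final check should show only the Jps process itself:
jps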
This setup gives you a working single-node Hadoop cluster on Fedora. For production or multi-node clusters, you'll need additional configuration for security, networking, and resource management.