Labby for LabEx

Posted on • Originally published at labex.io

Hadoop FS Shell Expunge: Optimizing HDFS Storage with Ease

Introduction

Welcome to this lab, set on an interstellar base where you play the role of a skilled intergalactic communicator. Your task is to manage HDFS with the FS Shell expunge command: you will enable the trash feature, delete a file, and then permanently remove the trashed data to reclaim storage space.

Enabling and Configuring the HDFS Trash Feature

In this step, you enable the HDFS trash feature and then delete a file to confirm that it is moved to the trash instead of being removed immediately.

  1. Open the terminal and switch to the hadoop user:
   su - hadoop
  2. Modify /home/hadoop/hadoop/etc/hadoop/core-site.xml to enable the Trash feature:
   nano /home/hadoop/hadoop/etc/hadoop/core-site.xml

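Add the following properties between the <configuration> tags. Both values are in minutes: fs.trash.interval sets how long a trash checkpoint is kept before it is deleted (1440 minutes is 24 hours, and 0 disables the trash), and fs.trash.checkpoint.interval sets how often a new checkpoint is created: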

    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>1440</value>
    </property>

Save the file and exit the text editor.
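
fs.trash.interval is expressed in minutes, which is easy to misread. As a quick sanity check, the sketch below formats an interval in days, hours, and minutes and, if the hdfs command is available, prints the value the client configuration resolves via hdfs getconf -confKey. The helper name minutes_to_human is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: format a number of minutes as days, hours, and minutes.
minutes_to_human() {
    total=$1
    days=$((total / 1440))
    hours=$(((total % 1440) / 60))
    mins=$((total % 60))
    echo "${days}d ${hours}h ${mins}m"
}

# Only query Hadoop when the hdfs command is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
    # getconf reads the key from the loaded configuration files.
    interval=$(hdfs getconf -confKey fs.trash.interval 2>/dev/null)
    if [ -n "$interval" ]; then
        echo "fs.trash.interval = ${interval} minutes ($(minutes_to_human "$interval"))"
    fi
fi
```

With the value configured in this lab, the formatted interval should read as one full day.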

  3. Restart the HDFS service to apply the new configuration:

Stop the HDFS service:

   /home/hadoop/hadoop/sbin/stop-dfs.sh

Start the HDFS service:

   /home/hadoop/hadoop/sbin/start-dfs.sh
  4. Create a file in HDFS and delete it:

Create an empty file in HDFS:

   hdfs dfs -touchz /user/hadoop/test.txt

Delete the file:

   hdfs dfs -rm /user/hadoop/test.txt
  5. Confirm that the deleted file was moved to the trash:
   hdfs dfs -ls /user/hadoop/.Trash/Current/user/hadoop/

You should see the file you deleted in the Trash directory.
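
The listing path used above follows from how HDFS lays out the trash: a deleted file keeps its full original path underneath the Current directory of the deleting user's trash. Here is a minimal sketch of that mapping, assuming the default /user/<name> home directory and no encryption zones. trash_path is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# Hypothetical helper: print where a deleted file lands in the user's trash.
# Assumes the default home directory /user/<name> and an absolute HDFS path.
trash_path() {
    user=$1
    original=$2
    echo "/user/${user}/.Trash/Current${original}"
}

trash_path hadoop /user/hadoop/test.txt
# prints /user/hadoop/.Trash/Current/user/hadoop/test.txt
```

Until the trash is expunged, a file can be restored by moving it from that location back to its original path with hdfs dfs -mv.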

Expunge Unnecessary Files

Now, let's proceed to expunge unnecessary files and directories using the FS Shell expunge command.

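  1. Expunge all the trash checkpoints. The -immediate option deletes everything in the current user's trash right away, ignoring fs.trash.interval; without it, expunge only removes checkpoints older than the retention interval: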
   hdfs dfs -expunge -immediate
  2. Verify that the unnecessary files are successfully expunged:
   hdfs dfs -ls /user/hadoop/.Trash

There should be no files or directories listed.
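
The same check can be scripted. The sketch below counts the entries in hdfs dfs -ls output by skipping the "Found N items" header line that the command prints before a non-empty listing. count_ls_entries is a hypothetical helper, and the path assumes the hadoop user from this lab.

```shell
#!/bin/sh
# Hypothetical helper: count entries in `hdfs dfs -ls` output read from stdin.
# The "Found N items" header line is not an entry, so it is filtered out first.
count_ls_entries() {
    grep -v '^Found ' | grep -c .
}

# Only query HDFS when the hdfs command is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
    remaining=$(hdfs dfs -ls /user/hadoop/.Trash 2>/dev/null | count_ls_entries)
    echo "Entries left in trash: ${remaining}"
fi
```

A count of 0 after the expunge step matches the empty listing described above.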

Summary

In this lab, you used the Hadoop FS Shell expunge command to reclaim storage in the Hadoop Distributed File System. You enabled the trash feature in core-site.xml, confirmed that a deleted file is moved to the user's .Trash directory, and permanently removed it with hdfs dfs -expunge -immediate. Practicing these steps will help you keep storage use under control in your own Hadoop environment.


Want to learn more?

Join our Discord or tweet us @WeAreLabEx! 😄