DEV Community

Anuj Vaghani

Hive Installation on WSL

Install and run Hive

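Install Apache Hive on Windows Subsystem for Linux (WSL)
This guide assumes Hadoop is already installed and HDFS is running in WSL, because Hive stores its data in HDFS. To configure Apache Hive, first download and unpack the Hive release, then customize the files and settings described in the following steps.

Step 1

Download Hive
Open the Ubuntu command line and download the compressed Hive files using the wget command followed by the download path: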

wget https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
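Apache typically keeps only current releases on downloads.apache.org and moves older ones to https://archive.apache.org/dist/hive/, so the 3.1.2 link above may stop working over time. A minimal sketch of a hypothetical hive_tarball_url helper (not part of Hive) that builds the tarball URL for any version and base location:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the binary tarball URL for a Hive version.
#   $1 = Hive version, e.g. 3.1.2
#   $2 = optional base URL (defaults to the current-release site)
hive_tarball_url() {
  local version="$1"
  local base="${2:-https://downloads.apache.org}"
  echo "$base/hive/hive-$version/apache-hive-$version-bin.tar.gz"
}
```

For example, wget -c "$(hive_tarball_url 3.1.2 https://archive.apache.org/dist)" fetches 3.1.2 from the archive; the -c flag lets wget resume a partially downloaded file.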
Once the download process is complete, untar the compressed Hive package:

tar xzf apache-hive-3.1.2-bin.tar.gz
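By default tar unpacks into the current directory. A sketch of a hypothetical extract_hive helper that extracts into a chosen parent directory with tar's -C option and confirms the launcher script bin/hive is present. It relies on the release unpacking into a single top-level folder named like the tarball (apache-hive-3.1.2-bin):

```bash
#!/usr/bin/env bash
# Hypothetical helper: unpack a Hive tarball into a parent directory and
# print the resulting Hive directory (the value to use for HIVE_HOME).
extract_hive() {
  local tarball="$1"   # e.g. apache-hive-3.1.2-bin.tar.gz
  local parent="$2"    # e.g. /home/anuj/hadoop
  local hive_dir

  mkdir -p "$parent" || return 1
  # -C makes tar extract into $parent instead of the current directory
  tar xzf "$tarball" -C "$parent" || return 1

  # The release unpacks into one folder named like the tarball minus .tar.gz
  hive_dir="$parent/$(basename "$tarball" .tar.gz)"
  if [ ! -x "$hive_dir/bin/hive" ]; then
    echo "bin/hive not found under $hive_dir" >&2
    return 1
  fi
  echo "$hive_dir"
}
```

Running extract_hive apache-hive-3.1.2-bin.tar.gz /home/anuj/hadoop would print /home/anuj/hadoop/apache-hive-3.1.2-bin, the directory HIVE_HOME should point to.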

Step 2

Configure Hive Environment Variables (~/.bashrc)
The $HIVE_HOME environment variable needs to point the client shell to the apache-hive-3.1.2-bin directory. Edit the .bashrc shell configuration file using a text editor of your choice (this guide uses vim):

vim ~/.bashrc

Append the following Hive environment variables to the .bashrc file:

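# Absolute path to the extracted Hive directory (no space after "=")
export HIVE_HOME="/home/anuj/hadoop/apache-hive-3.1.2-bin"
# Put the Hive launcher scripts on the command search path
export PATH=$PATH:$HIVE_HOME/bin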
Save and exit the .bashrc file once you add the Hive variables. Apply the changes to the current environment with the following command:

source ~/.bashrc
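To confirm the variables took effect, a quick check such as the hypothetical check_hive_env function below (not part of Hive) verifies that HIVE_HOME is set, that it contains an executable bin/hive, and that $HIVE_HOME/bin is on PATH:

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of Hive): is the Hive environment usable?
check_hive_env() {
  # HIVE_HOME must be set and non-empty
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set" >&2
    return 1
  fi
  # It must contain Hive's launcher script
  if [ ! -x "$HIVE_HOME/bin/hive" ]; then
    echo "no executable found at $HIVE_HOME/bin/hive" >&2
    return 1
  fi
  # $HIVE_HOME/bin must be on PATH so that plain `hive` resolves
  case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) echo "Hive environment OK: $HIVE_HOME" ;;
    *) echo "$HIVE_HOME/bin is not on PATH" >&2; return 1 ;;
  esac
}
```

Run check_hive_env right after source ~/.bashrc; a non-zero exit status and the message on stderr point to the setting that still needs fixing.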

Step 3

Edit hive-config.sh file
Apache Hive needs to be able to interact with the Hadoop Distributed File System. Access the hive-config.sh file using the previously created $HIVE_HOME variable:

vim $HIVE_HOME/bin/hive-config.sh
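A common edit here is to export HADOOP_HOME with the full path of your Hadoop installation so Hive can find the Hadoop binaries. A sketch, assuming a hypothetical set_hadoop_home helper and a placeholder Hadoop path you should replace with your own, that appends the line only if it is not already present:

```bash
#!/usr/bin/env bash
# Hypothetical helper: add a HADOOP_HOME export to hive-config.sh once.
set_hadoop_home() {
  local config_file="$1"   # e.g. "$HIVE_HOME/bin/hive-config.sh"
  local hadoop_home="$2"   # placeholder: your Hadoop install directory

  # Skip if a HADOOP_HOME export is already there (safe to re-run)
  if ! grep -q '^export HADOOP_HOME=' "$config_file"; then
    printf 'export HADOOP_HOME=%s\n' "$hadoop_home" >> "$config_file"
  fi
}
```

Usage: set_hadoop_home "$HIVE_HOME/bin/hive-config.sh" /path/to/your/hadoop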

Step 4

Create Hive Directories in HDFS

  • The temporary tmp directory stores the intermediate results of Hive processes.

  • The warehouse directory stores Hive-related tables.

Create tmp Directory
Create a tmp directory within the HDFS storage layer. This directory stores the intermediate data Hive sends to HDFS:

hdfs dfs -mkdir /tmp

Give group members write permission on tmp:

hdfs dfs -chmod g+w /tmp

Check if the permissions were added correctly:

hdfs dfs -ls /

The output confirms that group members now have write permission on /tmp. You can also list the directory with hadoop fs -ls /.
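The group-write bit can also be checked in a script rather than by eye. A sketch with a hypothetical has_group_write function: hdfs dfs -ls -d lists the directory entry itself instead of its contents, and the sixth character of the permission string (the second w in drwxrwxr-x) is the group write flag:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if an HDFS directory has group write set.
has_group_write() {
  local path="$1"
  local perms
  # -d lists the directory itself; keep field 1 of the line starting with "d"
  perms=$(hdfs dfs -ls -d "$path" | awk '$1 ~ /^d/ {print $1}')
  # Character 6 of e.g. "drwxrwxr-x" is the group write flag
  [ "$(printf '%s' "$perms" | cut -c6)" = "w" ]
}
```

For example, has_group_write /tmp && echo "group can write to /tmp".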
Create warehouse Directory
Create the warehouse directory within the /user/hive/ parent directory:

hdfs dfs -mkdir -p /user/hive/warehouse

Give group members write permission on the warehouse directory:

hdfs dfs -chmod g+w /user/hive/warehouse

Check if the permissions were added correctly:

hdfs dfs -ls /user/hive

The output confirms that group members now have write permission on the warehouse directory.
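The four HDFS commands in this step can be wrapped in one re-runnable function. In this sketch (setup_hive_hdfs_dirs is a hypothetical name) both mkdir calls use -p, a choice made here rather than in the steps above, so re-running it does not fail when the directories already exist:

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the two HDFS directories Hive uses and
# give their group write permission.
setup_hive_hdfs_dirs() {
  # -p: create parent directories and do not fail if the path already exists
  hdfs dfs -mkdir -p /tmp || return 1
  hdfs dfs -mkdir -p /user/hive/warehouse || return 1
  # g+w: add write permission for group members
  hdfs dfs -chmod g+w /tmp || return 1
  hdfs dfs -chmod g+w /user/hive/warehouse || return 1
  echo "HDFS directories for Hive are ready"
}
```

HDFS must be running (the hdfs dfs commands contact the NameNode) before calling setup_hive_hdfs_dirs.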


Step 5

Configure hive-site.xml File (Optional)
Apache Hive distributions contain template configuration files by default. The template files are located within the Hive conf directory and outline default Hive settings.

Use the following command to move to the Hive conf directory:

cd $HIVE_HOME/conf

List the files contained in the folder using the ls command.

Use the hive-default.xml.template to create the hive-site.xml file:

cp hive-default.xml.template hive-site.xml

Open the hive-site.xml file in a text editor (vim is used here):

vim hive-site.xml
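Because this step is optional, an alternative to copying and editing the full template is a small hive-site.xml that sets only the properties you care about. A minimal sketch, assuming you want the warehouse location from Step 4 and the embedded Derby metastore stated explicitly (both values match Hive's defaults); write_minimal_hive_site is a hypothetical name, and it overwrites any existing hive-site.xml:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a minimal hive-site.xml (overwrites the file).
write_minimal_hive_site() {
  local conf_dir="$1"   # e.g. "$HIVE_HOME/conf"
  # Quoted 'EOF' keeps the shell from expanding anything in the XML
  cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- HDFS directory for Hive-managed tables (created in Step 4) -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Embedded Derby metastore, created relative to the working directory -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
</configuration>
EOF
}
```

Usage: write_minimal_hive_site "$HIVE_HOME/conf"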

Step 6

Initialize the Derby Database

Apache Hive uses the Derby database to store metadata. Initialize the Derby database from the Hive bin directory using the schematool command:

cd $HIVE_HOME/bin
$HIVE_HOME/bin/schematool -dbType derby -initSchema

The process can take a few moments to complete.

Derby is the default metastore database for Hive. If you plan to use a different database, such as MySQL or PostgreSQL, configure its connection settings in the hive-site.xml file and pass the matching -dbType value (for example, mysql or postgres) to schematool.
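With the default embedded Derby settings, the metastore_db folder is created in whatever directory schematool runs from, and running -initSchema again against an existing metastore_db typically fails. A sketch of a hypothetical init_derby_metastore helper that initializes the schema only once per directory; the SCHEMATOOL variable is a hypothetical override added for testing, defaulting to Hive's own script:

```bash
#!/usr/bin/env bash
# Hypothetical helper: initialize the embedded Derby metastore once.
# The body runs in a subshell ( ... ), so the caller's directory is unchanged.
init_derby_metastore() (
  work_dir="$1"   # directory you will later start `hive` from
  # SCHEMATOOL is a hypothetical override; defaults to Hive's own script
  schematool="${SCHEMATOOL:-$HIVE_HOME/bin/schematool}"

  cd "$work_dir" || exit 1
  # Derby creates metastore_db relative to the current directory
  if [ -d metastore_db ]; then
    echo "metastore_db already exists in $work_dir; skipping initSchema"
    exit 0
  fi
  "$schematool" -dbType derby -initSchema
)
```

Usage: init_derby_metastore "$HIVE_HOME/bin", then start hive from that same directory so it finds the initialized metastore.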

Launch Hive Client Shell on Ubuntu
Start the Hive command-line interface using the following commands:

cd $HIVE_HOME/bin
hive

You can now issue HiveQL (SQL-like) queries against data stored in HDFS.
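To check the installation without opening the interactive shell, Hive's -e option runs a quoted query and exits. A sketch of a hypothetical hive_smoke_test function; it assumes query results arrive on stdout (log messages on stderr are discarded) and should be run from the same directory used for schematool, the Hive bin directory in this guide:

```bash
#!/usr/bin/env bash
# Hypothetical smoke test: run one HiveQL statement non-interactively.
hive_smoke_test() {
  local output
  # -e executes the quoted query and exits; stderr log noise is discarded
  output=$(hive -e 'SHOW DATABASES;' 2>/dev/null) || return 1
  # Every Hive metastore contains the built-in "default" database
  if printf '%s\n' "$output" | grep -qx 'default'; then
    echo "Hive is working"
  else
    echo "Unexpected output: $output" >&2
    return 1
  fi
}
```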

