
Quickly set up a Hive environment on GCP

Marcelo Costa ・2 min read

This quick-start guide is part of a series that shows how to set up databases on Google Cloud Platform for development and testing purposes.

This guide will show you how to create a Hive environment running inside your Google Cloud Project.

Create a Compute Engine VM

Using Cloud Shell:

# Create the Hive GCE instance
gcloud compute instances create hive \
  --zone=us-central1-c \
  --machine-type=n1-standard-1 \
  --image-project=debian-cloud --boot-disk-size=30GB \
  --image=debian-9-stretch-v20190916 \
  --boot-disk-type=pd-standard \
  --boot-disk-device-name=hive \
  --scopes=cloud-platform 

Configure your VM with Hive

Using Cloud Shell:

# Connect to the Hive VM
gcloud compute ssh --zone=us-central1-c hive

# Log in as the super user
sudo -s

# Install Docker
curl -sSL https://get.docker.com/ | sh

# Install Docker Compose (already root from the previous step, so no sudo needed)
curl -L "https://github.com/docker/compose/releases/download/1.18.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose

# Test installation
docker-compose --version

Create the Docker environment

Inside the Hive VM, via SSH:

# Install git if you don't have it
apt-get update && apt-get install -y git

# Clone the GitHub repository
git clone https://github.com/mesmacosta/docker-hive

# Go inside the created directory
cd docker-hive

# Start docker compose
docker-compose up -d

Creating tables (internal, managed by Hive)

Inside the Hive VM, via SSH:

# Connect to the hive-server container
docker-compose exec hive-server bash

# Start the beeline client
/opt/hive/bin/beeline -u jdbc:hive2://localhost:10000

-- Create an internal (managed) table named funds
CREATE TABLE funds (code INT, opt STRING);

-- Load the table with sample data (optional)
LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE funds;

-- Test it
SELECT * FROM funds;
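If you plan to recreate this environment often, typing the statements into beeline by hand gets tedious. A small sketch of an assumed workflow: keep the DDL in a file (the file name create_funds.sql here is my own choice, not from the original setup) and replay it non-interactively with beeline's -f flag.

```shell
# Save the DDL to a file so it can be replayed instead of typed interactively
cat > create_funds.sql <<'EOF'
CREATE TABLE IF NOT EXISTS funds (code INT, opt STRING);
EOF

# Inside the hive-server container you would then run:
#   /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000 -f create_funds.sql
```

IF NOT EXISTS makes the script safe to run more than once.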

Creating tables (external)

Inside the Hive VM, via SSH:

# Connect to the hadoop namenode
docker-compose exec namenode bash

# Create a new file with some data in any directory
echo '1,2,3,4' > csvFile

# Create a directory inside hdfs
hdfs dfs -mkdir -p /test/another_test/one_more_test

# Add the file to the directory
hdfs dfs -put csvFile /test/another_test/one_more_test/csvFile

# Connect to the hive-server container
docker-compose exec hive-server bash

# Start the beeline client
/opt/hive/bin/beeline -u jdbc:hive2://localhost:10000

-- Create the external table pointing at the HDFS directory
CREATE EXTERNAL TABLE IF NOT EXISTS store
(ID INT,
DEPT INT,
CODE INT,
DIGIT INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/test/another_test/one_more_test/';

-- You can then query it inside Hive to see that it worked
SELECT * FROM store;
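To see what FIELDS TERMINATED BY ',' actually does, here is a local sketch (no Hive needed): each line of csvFile is split on commas, and the pieces are mapped onto the ID, DEPT, CODE and DIGIT columns in order. awk performs the same split:

```shell
# Split the sample row on commas, just like Hive maps it onto the columns
echo '1,2,3,4' | awk -F',' '{ printf "ID=%s DEPT=%s CODE=%s DIGIT=%s\n", $1, $2, $3, $4 }'
# prints: ID=1 DEPT=2 CODE=3 DIGIT=4
```

Because the table is EXTERNAL, Hive only tracks the metadata; dropping the table leaves csvFile untouched in HDFS.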

And that's it!

If you run into difficulties, don't hesitate to reach out. I would love to help you!
