DEV Community

Marcelo Costa

Using Python to create Hive tables with random schema

Producing a large amount of test data can take a lot of effort, and to simulate a realistic scenario it's good to have many tables with distinct column types. This script generates random table schemas for Hive.

If you want to set up a Hive environment for dev and test purposes, take a look at: https://dev.to/mesmacosta/quickly-set-up-a-hive-environment-on-gcp-38j8

Environment

Activate your virtualenv:

```shell
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
```

Install the requirements for the metadata generator:

```shell
pip install -r requirements.txt
```

Code

requirements.txt:

```
pyhive
pandas
sasl>=0.2.1
thrift>=0.10.0
git+https://github.com/cloudera/thrift_sasl # Using master branch in order to get Python 3 SASL patches
```

metadata_generator.py:

```python
import argparse
import logging
import random
import sys
import uuid

from pyhive import hive

_DATA_TYPES = ['TINYINT', 'SMALLINT', 'INT', 'BIGINT', 'FLOAT', 'DOUBLE', 'DECIMAL', 'TIMESTAMP', 'DATE',
               'STRING', 'BOOLEAN', 'BINARY', 'ARRAY<STRUCT< key:STRING, value:STRING>>',
               'ARRAY <STRING>', 'ARRAY <STRUCT <spouse: STRING, children: ARRAY <STRING>>>',
               'ARRAY<DOUBLE>', 'MAP<STRING,DOUBLE>', 'STRUCT < employer: STRING, id: BIGINT, address: STRING >',
               'UNIONTYPE<DOUBLE, STRING, ARRAY<string>, STRUCT<a:INT,b:string>>']

_COLUMN_NAMES = ['name', 'address', 'city', 'state', 'date_time', 'paragraph', 'randomdata', 'person', 'credit_card',
                 'size', 'reason', 'school', 'food', 'location', 'house', 'price', 'cpf', 'cnpj', 'passport',
                 'security_number', 'phone_number', 'bank_account_number', 'ip_address', 'stocks']

_DESCRIPTION_VALUES = ['This is a random generated column', 'Description for random generated column']

_TABLE_NAMES = ['school_info', 'personal_info', 'persons', 'employees', 'companies', 'store', 'home']

_DATABASE_NAMES = ['school_warehouse', 'company_warehouse', 'on_prem_warehouse', 'factory_warehouse',
                   'organization_warehouse']


def get_hive_conn(connection_args):
    return hive.connect(host=connection_args['host'],
                        port=connection_args['port'],
                        username=connection_args['user'],
                        database=connection_args['database'],
                        auth=None)


def create_random_hive_data(connection_args):
    conn = get_hive_conn(connection_args)
    cursor = conn.cursor()
    # Create 4 databases with 250 random tables each (1,000 tables total).
    for _ in range(4):
        database_name, database_stmt = build_create_database_statement()
        print('\n' + database_stmt)
        cursor.execute(database_stmt)
        cursor.execute(build_use_database_statement(database_name))
        for _ in range(250):
            table_stmt = build_create_table_statement()
            cursor.execute(table_stmt)
            print('\n' + table_stmt)
    cursor.execute('show databases')
    databases = cursor.fetchall()
    print(databases)
    cursor.close()
    conn.close()


def get_random_data_type():
    return random.choice(_DATA_TYPES)


def get_random_databases_name():
    return random.choice(_DATABASE_NAMES)


def get_random_column_name():
    return random.choice(_COLUMN_NAMES)


def get_random_column_description():
    return random.choice(_DESCRIPTION_VALUES)


def get_random_table_name():
    return random.choice(_TABLE_NAMES)


def build_create_database_statement():
    # Random numeric suffix keeps database names unique across runs.
    database_name = '{}{}'.format(get_random_databases_name(),
                                  str(random.randint(1, 100000)))
    database_stmt = 'CREATE DATABASE {} '.format(database_name)
    return database_name, database_stmt


def build_use_database_statement(database_name):
    return 'USE {} '.format(database_name)


def build_create_table_statement():
    # Random hex suffix keeps table names unique across runs.
    table_stmt = 'CREATE TABLE {}{} ( '.format(
        get_random_table_name(),
        uuid.uuid4().hex[:8]
    )
    table_stmt = '{}{}{} {}'.format(
        table_stmt,
        get_random_column_name(),
        str(random.randint(1, 100000)),
        get_random_data_type()
    )
    # Append between 1 and 100 extra columns with random types and comments.
    for _ in range(random.randint(1, 100)):
        table_stmt += ' , {}{}'.format(get_random_column_name(), str(random.randint(1, 100000))) + \
                      ' {}'.format(get_random_data_type()) + \
                      ' COMMENT "{}"'.format(get_random_column_description())
    table_stmt = '{} )'.format(table_stmt)
    return table_stmt


def parse_args():
    parser = argparse.ArgumentParser(
        description='Command line to generate random metadata into a Hive server')
    parser.add_argument(
        '--hive-host',
        help='Your Hive server host',
        required=True)
    parser.add_argument('--hive-user',
                        help='Your Hive server user')
    parser.add_argument('--hive-database',
                        help='Your Hive server database name')
    parser.add_argument('--hive-port',
                        help='Your Hive server port',
                        type=int,
                        default=10000)  # HiveServer2 default port
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    # Enable logging
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    create_random_hive_data({
        'database': args.hive_database,
        'host': args.hive_host,
        'user': args.hive_user,
        'port': args.hive_port
    })
```
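To preview the kind of DDL the script emits without a live Hive server, the statement-building logic can be exercised on its own. Here is a trimmed sketch that mirrors `build_create_table_statement` above; the type and column pools are shortened for readability, and the table name is fixed to `employees`:

```python
import random
import uuid

# Trimmed pools for illustration; the full script uses much longer lists.
_DATA_TYPES = ['INT', 'STRING', 'DOUBLE']
_COLUMN_NAMES = ['name', 'address', 'price']

def build_create_table_statement():
    # A random hex suffix keeps table names unique across runs.
    stmt = 'CREATE TABLE {}{} ( '.format('employees', uuid.uuid4().hex[:8])
    stmt += '{}{} {}'.format(random.choice(_COLUMN_NAMES),
                             random.randint(1, 100000),
                             random.choice(_DATA_TYPES))
    # Append 1 to 3 extra random columns.
    for _ in range(random.randint(1, 3)):
        stmt += ' , {}{} {}'.format(random.choice(_COLUMN_NAMES),
                                    random.randint(1, 100000),
                                    random.choice(_DATA_TYPES))
    return stmt + ' )'

print(build_create_table_statement())
# e.g. CREATE TABLE employees1a2b3c4d ( price42 DOUBLE , name77 STRING )
```

Each run prints a different statement, which is exactly what makes the generated metadata useful for testing crawlers and catalogs.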

Execution

```shell
export HIVE_SERVER=127.0.0.1
export HIVE_USERNAME=hive
export HIVE_PORT=10000
export HIVE_DATABASE=default

python metadata_generator.py \
  --hive-host=$HIVE_SERVER \
  --hive-user=$HIVE_USERNAME \
  --hive-port=$HIVE_PORT \
  --hive-database=$HIVE_DATABASE
```
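Since the run leaves roughly 1,000 random tables behind, a cleanup step can be handy. Below is a hypothetical sketch (the `build_drop_statements` helper is not part of the original script) that builds `DROP DATABASE ... CASCADE` statements for the generated warehouses; executing them would use the same pyhive cursor shown above:

```python
def build_drop_statements(database_names):
    # CASCADE also drops the randomly generated tables inside each database.
    return ['DROP DATABASE IF EXISTS {} CASCADE'.format(name)
            for name in database_names]

# Generated names follow the '<prefix><random int>' pattern used by the script,
# so these example names are placeholders for whatever your run produced.
stmts = build_drop_statements(['school_warehouse42', 'company_warehouse7'])
print(stmts[0])  # DROP DATABASE IF EXISTS school_warehouse42 CASCADE
```

You can collect the actual names from the `show databases` output the script prints at the end of a run.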

And that's it!

If you have difficulties, don't hesitate to reach out. I would love to help you!
