DEV Community

Marcelo Costa


Using Python to create Hive tables with random schema

Building a large amount of test data can take a lot of effort, and to simulate a more realistic scenario it helps to have a large number of tables with distinct column types. This script generates random table schemas for Hive.
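To give a feel for what generating a random schema can look like, here is a minimal sketch. It is not the script from this post: the helper names (`random_identifier`, `random_create_table`), the `tbl_`/`col_` prefixes, and the list of types are assumptions chosen for illustration.

```python
import random
import string

# A subset of Hive primitive types to sample from (assumed list; extend as needed).
HIVE_TYPES = [
    "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
    "BOOLEAN", "STRING", "TIMESTAMP", "DATE", "BINARY",
]


def random_identifier(rng, prefix, length=8):
    """Return a lowercase identifier made of the prefix plus random letters."""
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{prefix}_{suffix}"


def random_create_table(rng, min_cols=1, max_cols=10):
    """Build one CREATE TABLE statement with a random number of typed columns."""
    table = random_identifier(rng, "tbl")
    n_cols = rng.randint(min_cols, max_cols)

    # Collect names in a set so that column names never repeat within a table.
    names = set()
    while len(names) < n_cols:
        names.add(random_identifier(rng, "col"))

    columns = ",\n  ".join(
        f"`{name}` {rng.choice(HIVE_TYPES)}" for name in sorted(names)
    )
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n  {columns}\n)"


if __name__ == "__main__":
    # A seeded generator makes the output reproducible between runs.
    print(random_create_table(random.Random(42)))
```

Passing a seeded `random.Random` instance instead of using the module-level functions keeps the generated schemas reproducible, which helps when a test run needs to be repeated.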

If you want to set up a Hive environment for dev and test purposes, take a look at:


Activate your virtualenv
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
Install the requirements for the metadata generator
pip install -r requirements.txt



export HIVE_USERNAME=hive
export HIVE_PORT=10000
export HIVE_DATABASE=default

python \
--hive-host=$HIVE_SERVER \
--hive-user=$HIVE_USERNAME \
--hive-port=$HIVE_PORT \
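The flags in the command above could be parsed along these lines. This is a sketch under assumptions, not the post's actual script: the `parse_args` function and the fallback to the `HIVE_*` environment variables are illustrative, and only the three flags shown in the command are covered.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse Hive connection flags, falling back to HIVE_* environment variables."""
    parser = argparse.ArgumentParser(
        description="Create Hive tables with random schemas."
    )
    # Flag names come from the command in the post; the defaults are assumptions.
    parser.add_argument("--hive-host", default=os.environ.get("HIVE_SERVER"))
    parser.add_argument("--hive-user", default=os.environ.get("HIVE_USERNAME"))
    # argparse applies type=int to a string default only when the flag is absent.
    parser.add_argument(
        "--hive-port", type=int, default=os.environ.get("HIVE_PORT", "10000")
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Connecting to {args.hive_host}:{args.hive_port} as {args.hive_user}")
```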

And that's it!

If you run into difficulties, don’t hesitate to reach out. I would love to help you!
