DEV Community

Marcelo Costa
Marcelo Costa

Posted on • Edited on

1 1

Using Python to create PostgreSQL tables with random schema

Having a large amount of test data sometimes take a lot of effort, and to simulate a more realistic scenario, it’s good to have a large number of tables with distinct column types. This script generates random tables schema for PostgreSQL.

If you want to set up a PostgreSQL environment for dev and test purposes, take a look at: https://dev.to/mesmacosta/quickly-set-up-a-postgresql-environment-on-gcp-758

Environment

Activate your virtualenv
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
Enter fullscreen mode Exit fullscreen mode
Install the requirements for the metadata generator
pip install -r requirements.txt
Enter fullscreen mode Exit fullscreen mode

Code

psycopg2-binary
import argparse
import logging
import random
import sys
import uuid
from psycopg2 import connect
_DATA_TYPES = [
'INT', 'SERIAL', 'SMALLINT', 'UUID', 'FLOAT', 'REAL', 'CHAR(5)',
'VARCHAR(25)', 'TEXT', 'JSONB', 'JSON', 'DATE', 'TIME', 'TIMESTAMP'
]
_COLUMN_NAMES = [
'name', 'address', 'city', 'state', 'date_time', 'paragraph', 'randomdata',
'person', 'credit_card', 'size', 'reason', 'school', 'food', 'location',
'house', 'price', 'cpf', 'cnpj', 'passport', 'security_number',
'phone_number', 'bank_account_number', 'ip_address', 'stocks'
]
_TABLE_NAMES = [
'school_info', 'personal_info', 'persons', 'employees', 'companies',
'store', 'home'
]
_SCHEMA_NAMES = [
'school_warehouse', 'company_warehouse', 'on_prem_warehouse',
'factory_warehouse', 'organization_warehouse'
]
def get_conn(connection_args):
return connect(database=connection_args['database'],
host=connection_args['host'],
user=connection_args['user'],
password=connection_args['pass'])
def create_random_metadata(connection_args):
conn = get_conn(connection_args)
cursor = conn.cursor()
for x in range(4):
schema_name, schema_stmt = build_create_schema_statement()
cursor.execute(schema_stmt)
for y in range(250):
query = build_create_table_statement(schema_name)
print('\n' + query)
cursor.execute(query)
conn.commit()
cursor.close()
def get_random_data_type():
return random.choice(_DATA_TYPES)
def get_random_column_name():
return random.choice(_COLUMN_NAMES)
def get_random_table_name():
return random.choice(_TABLE_NAMES)
def get_random_schema_name():
return random.choice(_SCHEMA_NAMES)
def build_create_table_statement(schema_name):
table_stmt = 'CREATE TABLE {}.{}{} ( '.format(schema_name,
get_random_table_name(),
uuid.uuid4().hex[:8])
table_stmt = '{}{}{} {}'.format(table_stmt, get_random_column_name(),
str(random.randint(1, 100000)),
get_random_data_type())
for x in range(random.randint(1, 15)):
table_stmt += ', {}{}'.format(get_random_column_name(),
str(random.randint(1, 100000))) + \
' {}'.format(get_random_data_type())
table_stmt = '{} )'.format(table_stmt)
return table_stmt
def build_create_schema_statement():
schema_name = '{}{}'.format(get_random_schema_name(),
str(random.randint(1, 100000)))
schema_stmt = 'CREATE SCHEMA {} '.format(schema_name)
return schema_name, schema_stmt
def parse_args():
parser = argparse.ArgumentParser(
description='Command line generate random metadata into postgresql')
parser.add_argument(
'--postgresql-host',
help='Your postgresql server host, this is required even'
' for the raw_metadata_csv,'
' so we are able to map the created entries'
' resource with the postgresql host',
required=True)
parser.add_argument('--postgresql-user',
help='Your postgresql credentials user')
parser.add_argument('--postgresql-pass',
help='Your postgresql credentials password')
parser.add_argument('--postgresql-database',
help='Your postgresql database name')
return parser.parse_args()
if __name__ == "__main__":
args = parse_args()
# Enable logging
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
create_random_metadata({
'database': args.postgresql_database,
'host': args.postgresql_host,
'user': args.postgresql_user,
'pass': args.postgresql_pass
})

Execution

export POSTGRESQL_SERVER=127.0.0.1
export POSTGRESQL_USERNAME=postgres
export POSTGRESQL_PASSWORD=postgresql_pwd
export POSTGRESQL_DATABASE=postgres

python metadata_generator.py \
--postgresql-host=$POSTGRESQL_SERVER \
--postgresql-user=$POSTGRESQL_USERNAME \
--postgresql-pass=$POSTGRESQL_PASSWORD \
--postgresql-database=$POSTGRESQL_DATABASE

Enter fullscreen mode Exit fullscreen mode

And that's it!

If you have difficulties, don’t hesitate reaching out. I would love to help you!

API Trace View

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

Struggling with slow API calls? Dan Mindru walks through how he used Sentry's new Trace View feature to shave off 22.3 seconds from an API call.

Get a practical walkthrough of how to identify bottlenecks, split tasks into multiple parallel tasks, identify slow AI model calls, and more.

Read more →

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay