Marcelo Costa

Quickly set up a Greenplum environment on GCP

This quick-start guide is part of a series that shows how to set up relational databases on Google Cloud Platform for development and testing purposes.

This guide shows you how to create a Greenplum environment running inside your Google Cloud project.

Create a Compute Engine VM

Using Cloud Shell:

# Create the Greenplum GCE instance
gcloud compute instances create greenplum \
  --zone=us-central1-c \
  --machine-type=n1-standard-1 \
  --image-project=debian-cloud --boot-disk-size=10GB \
  --image=debian-9-stretch-v20190916 \
  --boot-disk-type=pd-standard \
  --boot-disk-device-name=greenplum \
  --scopes=cloud-platform
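
Before connecting, it can help to confirm that the VM has finished booting. The snippet below is a minimal sketch and not part of the original setup: the helper name `vm_status` is hypothetical, and the example call assumes the instance name and zone used in the create command above.

```shell
# Print the lifecycle status of a Compute Engine VM (for example RUNNING).
# Arguments: instance name, zone.
# vm_status is a hypothetical helper name used for illustration.
vm_status() {
  gcloud compute instances describe "$1" \
    --zone="$2" \
    --format='value(status)'
}

# Example call, using the values from this guide:
#   vm_status greenplum us-central1-c
```

Once the status reads `RUNNING`, the SSH step in the next section should succeed.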

Configure your VM with Greenplum

Using Cloud Shell:

# Connect to the greenplum VM
gcloud compute ssh --zone=us-central1-c greenplum

# Switch to the superuser
sudo -s

# Install Docker
curl -sSL https://get.docker.com/ | sh

# Install Git
apt-get install git

# Install the PostgreSQL client
apt-get install postgresql-client

# Clone the official Greenplum repository
git clone https://github.com/greenplum-db/gpdb

# Go to the Docker directory
cd gpdb/src/tools/docker/ubuntu16_ppa-persistent

# Build and run
docker build -t local/gpdb .
mkdir -p /tmp/gpdata/
docker run -d -p 5432:5432 -h dwgpdb -v /tmp/gpdata:/data local/gpdb
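
Greenplum takes a while to initialize after `docker run` returns. As a rough sketch, the hypothetical helper `wait_for_line` below re-runs a command once per second until its output contains a given text or a maximum number of attempts is reached. The function name, the one-second interval, and the attempt limit are assumptions chosen for illustration.

```shell
# wait_for_line PATTERN ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND once per second until its output contains PATTERN
# (matched as a fixed string) or ATTEMPTS runs have been made.
# Returns 0 on a match, 1 otherwise.
# wait_for_line is a hypothetical helper name used for illustration.
wait_for_line() {
  pattern=$1
  attempts=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Merge stderr into stdout so both log streams are searched.
    if "$@" 2>&1 | grep -qF -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_line "GPDB starting process has completed" 600 docker logs "$(docker ps -q --filter ancestor=local/gpdb)"` would poll the container logs up to 600 times, assuming exactly one container was started from the `local/gpdb` image.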

Load your Greenplum database with data

Using Cloud Shell:

# Verify GPDB is installed and started up successfully
docker ps -a
docker logs --follow <GPDB_CONTAINER_ID>

# Wait for this message to appear:
# ===> GPDB starting process has completed, check the result above
# or try to connect
# Then stop following the logs by pressing CTRL + C

# Log into the GPDB container
docker ps
docker exec -it <GPDB_CONTAINER_ID> bash

# Switch to the gpadmin user
su gpadmin

# Verify that the GPDB instance is running
gpstate

# Create a DB called messages_db
createdb messages_db

# Log into the new DB
psql messages_db

-- Create tables and populate them with data (run these statements inside psql)
CREATE TABLE Users (uid INTEGER PRIMARY KEY,
                    name VARCHAR);
INSERT INTO Users
  SELECT generate_series, random()
  FROM generate_series(1, 100000);

CREATE TABLE Messages (mid INTEGER PRIMARY KEY,
 uid INTEGER REFERENCES Users(uid),
 ptime DATE, message VARCHAR);

INSERT INTO Messages
   SELECT generate_series,
          round(random()*100000),
          date(now() - '1 hour'::INTERVAL * round(random()*24*30)),
          random()::text
   FROM generate_series(1, 100000);
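
To check that the load worked, a few read-only queries are enough. The sketch below is an illustration, not part of the original steps: `sanity_sql` is a hypothetical helper that only prints the queries, so they can be piped into `psql` from the container shell after leaving the interactive session with `\q`.

```shell
# Print a few read-only sanity queries for the sample data.
# sanity_sql is a hypothetical helper name used for illustration.
sanity_sql() {
  cat <<'SQL'
-- Both tables were loaded from generate_series(1, 100000).
SELECT count(*) AS user_rows FROM Users;
SELECT count(*) AS message_rows FROM Messages;
-- Messages whose uid has no matching user (uid 0 can be generated).
SELECT count(*) AS orphan_messages
FROM Messages m LEFT JOIN Users u ON u.uid = m.uid
WHERE u.uid IS NULL;
SQL
}

# Inside the container, as gpadmin:
#   sanity_sql | psql messages_db
```

Both counts should come back as 100000. The last query may report a stray row or two, because `round(random()*100000)` can produce 0 and Greenplum's documentation describes foreign key constraints as accepted but not enforced.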

And that's it!

If you run into difficulties, don’t hesitate to reach out. I would love to help you!
