DEV Community

Nkwam Philip
Nkwam Philip

Posted on

Loading Data Into Google Cloud SQL

Cloud SQL is Google Cloud fully-managed database service that allows one setup, maintain, manage and administer other relational databases. Could be MySQL, PostgreSQL or Microsoft SQL Server.

Objectives:

  • Create Cloud SQL instance
  • Create a Cloud SQL database
  • Import text data into Cloud SQL
  • Check the data for integrity

The first thing required is to activate Cloud Shell at the top right corner of the GCP Home Page, list the active account name with the command

gcloud auth list

List the Project id with

gcloud config list project

Create environmental variables and the storage bucket that will contain the data

export PROJECT_ID=$(gcloud info --format='value(config.project)')
export BUCKET=${PROJECT_ID}-ml

Create a Cloud SQL Instance named taxi

gcloud sql instances create taxi \
--tier=db-n1-standard-1 --activation-policy=ALWAYS

Set root Password for the Cloud SQL Instance

gcloud sql users set-password root --host % --instance taxi \
--password Passw0rd

Now create an environment variable with the IP address of the Cloud Shell

export ADDRESS=$(wget -qO - http://ipecho.net/plain)/32

export ADDRESS=$(wget -qO - http://ipecho.net/plain)/32

gcloud sql instances patch taxi --authorized-networks $ADDRESS

When prompted press Y to accept the change.

Get the IP address of your Cloud SQL instance by running:

MYSQLIP=$(gcloud sql instances describe \
taxi --format="value(ipAddresses.ipAddress)")

Check the variable MYSQLIP, you should get the IP Address as an output:

echo $MYSQLIP

Create the taxi trips table by logging into the mysql command line interface and enter password whe prompted

mysql --host=$MYSQLIP --user=root \
--password --verbose

Create a Schema for the trips

create database if not exists bts;
use bts;
drop table if exists trips;
create table trips (
vendor_id VARCHAR(16),

pickup_datetime DATETIME,
dropoff_datetime DATETIME,
passenger_count INT,
trip_distance FLOAT,
rate_code VARCHAR(16),
store_and_fwd_flag VARCHAR(16),
payment_type VARCHAR(16),
fare_amount FLOAT,
extra FLOAT,
mta_tax FLOAT,
tip_amount FLOAT,
tolls_amount FLOAT,
imp_surcharge FLOAT,
total_amount FLOAT,
pickup_location_id VARCHAR(16),
dropoff_location_id VARCHAR(16)
);

In the mysql command line interface check the import and query the trips table

describe trips;
select distinct(pickup_location_id) from trips;

This will return an empty set as there is no data in the database yet, then Exit

exit

**

Add data to Cloud SQL instance

**

run the following - the database is pulled from google public datasets

gsutil cp gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv trips.csv-1
gsutil cp gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_2.csv trips.csv-2

Connect to the mysql interactive console to load local infile data

mysql --host=$MYSQLIP --user=root --password --local-infile

In the mysql interactive console select the database

use bts;

Load the local CSV file data using local-infile

LOAD DATA LOCAL INFILE 'trips.csv-1' INTO TABLE trips
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,imp_surcharge,total_amount,pickup_location_id,dropoff_location_id);

LOAD DATA LOCAL INFILE 'trips.csv-2' INTO TABLE trips
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,imp_surcharge,total_amount,pickup_location_id,dropoff_location_id);

Checking For Data Integrity

select distinct(pickup_location_id) from trips;
select
max(trip_distance),
min(trip_distance)
from
trips;
select count() from trips where trip_distance = 0;
select count(
) from trips where fare_amount < 0;
select
payment_type,
count(*)
from
trips
group by
payment_type;

If all these are confirmed, you have your db. then exit

exit

Top comments (0)