Cloud SQL is Google Cloud fully-managed database service that allows one setup, maintain, manage and administer other relational databases. Could be MySQL, PostgreSQL or Microsoft SQL Server.
Objectives:
- Create Cloud SQL instance
- Create a Cloud SQL database
- Import text data into Cloud SQL
- Check the data for integrity
The first thing required is to activate Cloud Shell at the top right corner of the GCP Home Page, list the active account name with the command
gcloud auth list
List the Project id with
gcloud config list project
Create environmental variables and the storage bucket that will contain the data
export PROJECT_ID=$(gcloud info --format='value(config.project)')
export BUCKET=${PROJECT_ID}-ml
Create a Cloud SQL Instance named taxi
gcloud sql instances create taxi \
--tier=db-n1-standard-1 --activation-policy=ALWAYS
Set root Password for the Cloud SQL Instance
gcloud sql users set-password root --host % --instance taxi \
--password Passw0rd
Now create an environment variable with the IP address of the Cloud Shell
export ADDRESS=$(wget -qO - http://ipecho.net/plain)/32
export ADDRESS=$(wget -qO - http://ipecho.net/plain)/32
gcloud sql instances patch taxi --authorized-networks $ADDRESS
When prompted press Y to accept the change.
Get the IP address of your Cloud SQL instance by running:
MYSQLIP=$(gcloud sql instances describe \
taxi --format="value(ipAddresses.ipAddress)")
Check the variable MYSQLIP, you should get the IP Address as an output:
echo $MYSQLIP
Create the taxi trips table by logging into the mysql command line interface and enter password whe prompted
mysql --host=$MYSQLIP --user=root \
--password --verbose
Create a Schema for the trips
create database if not exists bts;
use bts;
drop table if exists trips;
create table trips (
vendor_id VARCHAR(16),
pickup_datetime DATETIME,
dropoff_datetime DATETIME,
passenger_count INT,
trip_distance FLOAT,
rate_code VARCHAR(16),
store_and_fwd_flag VARCHAR(16),
payment_type VARCHAR(16),
fare_amount FLOAT,
extra FLOAT,
mta_tax FLOAT,
tip_amount FLOAT,
tolls_amount FLOAT,
imp_surcharge FLOAT,
total_amount FLOAT,
pickup_location_id VARCHAR(16),
dropoff_location_id VARCHAR(16)
);
In the mysql command line interface check the import and query the trips table
describe trips;
select distinct(pickup_location_id) from trips;
This will return an empty set as there is no data in the database yet, then Exit
exit
**
Add data to Cloud SQL instance
**
run the following - the database is pulled from google public datasets
gsutil cp gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv trips.csv-1
gsutil cp gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_2.csv trips.csv-2
Connect to the mysql interactive console to load local infile data
mysql --host=$MYSQLIP --user=root --password --local-infile
In the mysql interactive console select the database
use bts;
Load the local CSV file data using local-infile
LOAD DATA LOCAL INFILE 'trips.csv-1' INTO TABLE trips
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,imp_surcharge,total_amount,pickup_location_id,dropoff_location_id);LOAD DATA LOCAL INFILE 'trips.csv-2' INTO TABLE trips
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,imp_surcharge,total_amount,pickup_location_id,dropoff_location_id);
Checking For Data Integrity
select distinct(pickup_location_id) from trips;
select
max(trip_distance),
min(trip_distance)
from
trips;
select count() from trips where trip_distance = 0;
select count() from trips where fare_amount < 0;
select
payment_type,
count(*)
from
trips
group by
payment_type;
If all these are confirmed, you have your db. then exit
exit
Top comments (0)