DEV Community

Nkwam Philip
Nkwam Philip

Posted on

Loading Data Into BigQuery Using The CLI/Console

BigQuery is Google's fully managed, NoOps, low cost analytics database.
With BigQuery you can query terabytes in seconds and petabytes in minutes of data without having any infrastructure to manage or needing a database administrator.
BigQuery uses SQL and can take advantage of the pay-as-you-go model. BigQuery allows you to focus on analysing data to find meaningful insights. Scales up resources(compute & storage) based on needs.
BigQuery maximizes flexibility by separating the compute engine that analyzes your data from your storage choices. You can store and analyze your data within BigQuery or use BigQuery to assess your data where it lives.

In this session, we will be loading data into BigQuery using the CLI/Console

Start with creating a dataset under your Project ID. Check the view actions and click on "Create Dataset" and name it anything you want.
Then ingest a new dataset from a CSV file. You can use this https://storage.googleapis.com/cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv

In the BigQuery console, click on the dataset created and create a table.

Specify the below table options:

Source:

Create table from: Upload
Choose File: select the file you downloaded locally earlier
File format: CSV
Destination: your local storage

Table name: 2018trips Leave all other setting at default.

Schema:
Check Auto Detect (tip: Not seeing the checkbox? Ensure the file format is CSV and not Avro)
Advanced Options

Leave at default values
Click Create Table.

Select Preview and confirm all columns have been loaded:
You have successfully loaded in a CSV file into a new BigQuery table.

Ingest a new Dataset from Google Cloud Storage
Now, lets try load another subset of the same 2018 trip data that is available on Cloud Storage. And this time, let's use the CLI tool to do it.

In your Cloud Shell, run the following command :
bq load \
--source_format=CSV \
--autodetect \
--noreplace \
nyctaxi.2018trips \
gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_2.csv
Copied!

Note: With the above load job, you are specifying that this subset is to be appended to the existing 2018trips table that you created above.

When the load job is complete, you will get a confirmation on the screen.

Back on your BigQuery console, select the 2018trips table and view details. Confirm that the row count has now almost doubled.

You may want to run the same query like earlier to see if the top 5 most expensive trips have changed.

Create tables from other tables with DDL
The 2018trips table now has trips from throughout the year. What if you were only interested in January trips? For the purpose of this lab, we will keep it simple and focus only on pickup date and time. Let's use DDL to extract this data and store it in another table

In the Query Editor, run the following CREATE TABLE command :

standardSQL

CREATE TABLE
nyctaxi.january_trips AS
SELECT
*
FROM
nyctaxi.2018trips
WHERE
EXTRACT(Month
FROM
pickup_datetime)=1;

Now run the below query in your Query Editor find the longest distance traveled in the month of January:

standardSQL

SELECT
*
FROM
nyctaxi.january_trips
ORDER BY
trip_distance DESC
LIMIT
1

Top comments (0)