DEV Community

Uri Shaked
Uri Shaked

Posted on

1

Exporting Google Cloud Storage File List to BigQuery (CSV)

Sometimes you need to run complex queries against the list of files storage in your Google Cloud Storage. In my case, I wanted find out which files were no longer referenced by my database and clean them up.

Here is how I did it:

Exporting the File List to a CSV file

This simple 1-line script will allow you to the entire file list from Google Cloud Storage to a simple CSV format that can be then loaded into Google BigQuery:

gsutil ls -l gs://bucket-name/** | head -n-1 | awk 'BEGIN{print "Size,Modified,Path"}{print $1","$2",\""$3"\""}' > filelist.csv

Make sure to replace bucket-name with your actual GCS bucket name.

Loading the CSV file to BigQuery

Before you can load the CSV File to Google's BigQuery, you first need to upload it to Google Cloud storage. You can do it by running:

gsutil cp filelist.csv gs://some-bucket/filelist.csv

Then, you can load it right into bigquery:

bq load --autodetect --source_format=CSV mydataset.mytable gs://some-bucket/filelist.csv

Make sure you change some-bucket to your actual bucket name, and mydataset and mytable to the target BigQuery dataset and table, respectively.

I tested this process for a bucket containing about 350,000 files and it worked flawlessly.

Hostinger image

Get n8n VPS hosting 3x cheaper than a cloud solution

Get fast, easy, secure n8n VPS hosting from $4.99/mo at Hostinger. Automate any workflow using a pre-installed n8n application and no-code customization.

Start now

Top comments (0)

Qodo Takeover

Introducing Qodo Gen 1.0: Transform Your Workflow with Agentic AI

Rather than just generating snippets, our agents understand your entire project context, can make decisions, use tools, and carry out tasks autonomously.

Read full post