DEV Community

Cris Crawford

Configuring Google Cloud Storage for Mage

In this post I'll talk about how I configured Google Cloud Storage so that Mage could transfer data to a Google Cloud Storage bucket. First I created the bucket. I went to the Cloud Console (console.cloud.google.com) and, in the hamburger menu in the top left, selected Cloud Storage->Buckets.

To create a bucket, I clicked the CREATE button. I had to give my bucket a globally unique name; I named it mage-zoomcamp-cris-crawford, which nobody else had used. I hit return. You can click "CONTINUE" to step through the default settings, which are fine for this bucket. A popup appeared titled "Public access will be prevented". I clicked "CONFIRM".
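If you prefer the command line, the same bucket can be created with the gcloud CLI. This is a sketch, not what I actually did: the bucket name is mine and the location is an assumption — substitute your own.

```shell
# Create a globally uniquely named bucket; public access prevention
# matches the console default confirmed in the popup.
gcloud storage buckets create gs://mage-zoomcamp-cris-crawford \
    --location=us-central1 \
    --public-access-prevention
```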

Next, you need to create a service account. I'd already done this for Terraform, but I'll describe the steps anyway. Go to the hamburger menu and choose IAM & Admin->Service Accounts, then click "CREATE SERVICE ACCOUNT". Choose a name for your service account; this name doesn't have to be unique. Click "CREATE AND CONTINUE". Now set roles for the permissions. For now we'll be generous in setting permissions: choose the role "Owner". Click "CONTINUE" and "DONE".
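The same service account setup can be sketched with the gcloud CLI. The account name and project ID here are placeholders, not values from my setup.

```shell
# Create the service account (the display name doesn't have to be unique)
gcloud iam service-accounts create mage-zoomcamp-sa \
    --display-name="mage-zoomcamp-sa"

# Grant the (deliberately generous) Owner role at the project level
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
    --member="serviceAccount:mage-zoomcamp-sa@MY_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/owner"
```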

Now we need a key for authorization. Click on the KEYS tab, click "ADD KEY", and select "Create new key". Choose the JSON key type in the popup window (the default). This downloads a file to your computer. You now want to copy these credentials to your Mage project, mage-zoomcamp, which for me is on my VM instance. I already had my key in a directory on my VM instance called ~/.gc, but I needed to copy the JSON file into mage-zoomcamp in order for docker-compose to make it available to the container. You can use sftp from your computer to copy the key to your VM instance.
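As a sketch of those two steps on the command line — the key filename, service account name, project ID, and VM hostname are all placeholders:

```shell
# Create and download a JSON key for the service account
gcloud iam service-accounts keys create keys.json \
    --iam-account=mage-zoomcamp-sa@MY_PROJECT_ID.iam.gserviceaccount.com

# From your local machine: copy the key into the Mage project directory on the VM
scp keys.json my-vm:~/data-engineering-zoomcamp/02-workflow-orchestration/mage-zoomcamp/
```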

Now that Mage had my service account key, I could use Mage to copy files to my Google Cloud Storage bucket. I opened Mage on localhost:6789. (I had this port mapped in my port settings in VSCode, which should be connected to the VM instance.) I navigated to "Files" in the left side menu and opened io_config.yaml. There are two ways to set up Google Cloud credentials. The first is to paste the contents of the key file directly into io_config.yaml; the second, which I used, is to give the path of the key file. I deleted the first block of settings ("GOOGLE_SERVICE_ACC_KEY" and everything nested under it) and entered the path to my key under "GOOGLE_SERVICE_ACC_KEY_FILEPATH". I typed "/home/src/keys.json". It's necessary to use /home/src/filename.json because that's where the project files are mounted in the Mage Docker container.
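After that edit, the relevant part of io_config.yaml ends up looking roughly like this (keys.json is whatever you named your key file; the version number and optional location are from Mage's shipped template and may differ in yours):

```yaml
version: 0.1.1
default:
  # Path as seen from inside the Mage Docker container, not from the host
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/home/src/keys.json"
  GOOGLE_LOCATION: US  # Optional
```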

Now Mage will use this key any time it needs to read or write data in Google Cloud Storage.

I used the pipeline test_config from before to test this. I selected "pipelines" from the left side menu, chose "test_config", and edited it. I changed the connection to BigQuery and the profile to "default", and ran it. I could see a message that BigQuery was reached, which means Mage was able to access Google Cloud using the service account key.

To test Google Cloud Storage, I used "example_pipeline". I opened it in the editor and ran all the blocks. This put a "clean" version of the Titanic dataset into my home directory. Now I had to upload titanic_clean.csv into my bucket on Google Cloud. I could not drag and drop the way the instructor did in the video, so I had to ask ChatGPT how to do this. I used the Google Cloud SDK and typed gcloud auth login (which may not have been necessary) from the VM instance. After following all the instructions, I logged into gcloud and ran gsutil cp titanic_clean.csv gs://[MY_BUCKET_NAME]/, and the file appeared in my bucket.
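If you'd rather stay in Python than use gsutil, the google-cloud-storage client library can do the same upload. This is a sketch under my setup's assumptions — the key path and bucket name are mine, and it assumes the google-cloud-storage package is installed:

```python
from google.cloud import storage

# Authenticate with the same service account key that Mage uses
client = storage.Client.from_service_account_json("/home/src/keys.json")

# Upload titanic_clean.csv from the current directory to the bucket
bucket = client.bucket("mage-zoomcamp-cris-crawford")
blob = bucket.blob("titanic_clean.csv")
blob.upload_from_filename("titanic_clean.csv")
```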

I went back to the pipeline "test_config" and deleted the Load data block. I added another Load data block, choosing Python->Google Cloud Storage as the template, and called it test_gcs. I edited the template to set bucket_name to my bucket name and object_key to the .csv file name. When I ran the block, I saw the data appear, fetched from the Google Cloud Storage bucket.
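For reference, the edited loader block looks roughly like this. The imports and decorator come from Mage's generated template (which may differ slightly between Mage versions); only bucket_name and object_key are the values I filled in.

```python
from os import path

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.google_cloud_storage import GoogleCloudStorage


@data_loader
def load_from_google_cloud_storage(*args, **kwargs):
    # Credentials come from io_config.yaml (the 'default' profile we set up)
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    bucket_name = 'mage-zoomcamp-cris-crawford'
    object_key = 'titanic_clean.csv'

    return GoogleCloudStorage.with_config(
        ConfigFileLoader(config_path, config_profile)
    ).load(bucket_name, object_key)
```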

Top comments (3)

twissi84

Hi, I am basically stuck exactly at that point where I need to authenticate myself, and I always get a connection error that the file can't be found.
I also have my service account JSON file saved in "/.gc". Actually I have two service accounts now. Anyway, I want to point to the newly created mage-serviceaccount, but it seems not possible, as I am not able to write down the right directory in Mage in io_config.yaml. It seems impossible to point to the right JSON file.

I also don't understand why you use "/home/src/keys.json" if you have the key saved in ".gc/". I tried the same, but it can't find the JSON file either. Any idea what I might be doing wrong when writing down where my key file is?

Cris Crawford

Hi! I found it easier to just copy my key file to the directory where he has it. I copied it to the directory "mage-zoomcamp". That is, to ~/data-engineering-zoomcamp/02-workflow-orchestration/mage-zoomcamp. Then you have to edit the io_config.yaml and add the path "/home/src/keyname.json" for GOOGLE_SERVICE_ACC_KEY_FILEPATH. This was not intuitive because I have no directory named /home/src. But that's what he used, that's what I did, and it worked.

twissi84

Thank you for the insights. I did it as well. Sadly I didn't really get how this path links to it, but it works :)