How I used Google Cloud Platform to start investing in stocks

Muhammad Syuqri ・7 min read

I got interested in investing after attending a short talk recommended by a friend of mine. I decided to do some research and started reading The Little Book That Still Beats The Market by Joel Greenblatt. From the book, I found some formulas that could be useful to me when making decisions on whether or not to invest in the stocks of companies in Singapore. This post isn't to promote the book or its investment strategies, but more to showcase the following and how I did it:

  1. Interacting with Firestore through Python
  2. Running a Python script at specific time intervals on the Compute Engine
  3. Using Cloud Functions to retrieve data from Firestore

At first, I created a Python script for populating a Google Sheet with the financial details and self-calculated ratios from companies listed on the Singapore Exchange website. I found this a hassle as I had to run the Python script every day to get the updated prices of the stocks. I then decided to move this daily process to the Google Cloud Platform so that I no longer have to do it myself, and can leave it to the cloud to do it for me :D

The following explains how I did it, in the hope of helping anyone else out there who might want to use the Google Cloud Platform in a similar fashion.

Prerequisites

Before proceeding any further, I would like to note that the following have to be done first. I am not covering them here to keep this post short and simple, but I have included links to get you started.

  1. Creating a Google Cloud Platform project
  2. Retrieving service account key
  3. Creating a Compute Engine VM Instance
  4. Setting Up Firebase Cloud Functions

Overview

From the above diagram, the only thing I have to do is to make a GET request through the Cloud Functions HTTP API which will return all the already calculated formulas and values stored in the Firestore. Essentially, steps 1, 2 and 3 involve the Python script I have created. Steps 1 and 2 are done simply by using the Requests library.

Interacting with Firestore through Python

Firestore uses the concept of collections, documents and fields to store the data you want it to. So for instance, using the analogy of a book library, if you have a shelf of books, that is a collection in Firestore's viewpoint. The books themselves are documents, and each page in the book is a field on its own. Each document can have its own collection as well, but I will not get into that.

shelf [collection]
|--book1 [document]
  |-- page1 [field]
  |-- page2 [field]
|--book2 [document]
  |-- page1 [field]
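
One way to picture the same hierarchy in Python is as nested dictionaries. This is only an illustration of the analogy, not how the Firestore client stores data; the names and values are placeholders taken from the tree above, and both helper functions are hypothetical.

```python
# Illustration only: the library analogy as nested dictionaries.
# Collection -> documents -> fields (placeholder names and values).
shelf = {
    "book1": {"page1": "text of page 1", "page2": "text of page 2"},
    "book2": {"page1": "text of page 1"},
}


def document_path(collection, document):
    """Return the slash-separated path used to address a document."""
    return f"{collection}/{document}"


def field_names(collection_data, document):
    """List the field names held by one document."""
    return sorted(collection_data[document])


print(document_path("shelf", "book1"))  # shelf/book1
print(field_names(shelf, "book1"))      # ['page1', 'page2']
```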

To interact and update data on the Cloud Firestore from your Python script, you first have to install the Google Cloud Firestore library via pip install google-cloud-firestore. The following is the code snippet to initialize Firestore with your service account key that you have previously retrieved.

from google.cloud import firestore
db = firestore.Client.from_service_account_json('/path/to/service/key')

Well that is it actually! To write data to Firestore, simply do the following:

doc_ref = db.collection(u'name_of_collection').document(u'name_of_document')
doc_ref.set(data_to_update)

data_to_update is a Python dictionary which holds the keys and respective values you want the Firestore document to hold. By default, .set() replaces the whole document with this dictionary. If you only want to update or insert some fields while keeping the existing ones, pass merge=True, as in doc_ref.set(data_to_update, merge=True). For myself, I was putting the company name, stock prices, financial ratios and other fields here.

A point to note here is that even if the document or collection does not exist yet, the .set() function automatically creates the collection and document for you and populates the document with the fields as mentioned before.
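
The post does not show how data_to_update is built, so the sketch below is an assumption about what that could look like. The two ratios are the ones commonly associated with Greenblatt's book (earnings yield and return on capital). The field names, the "companies" collection, the sample figures and both helper functions are hypothetical. db is expected to be the firestore.Client created earlier.

```python
def build_company_document(name, price, ebit, enterprise_value, capital_employed):
    """Build the dictionary that will become one Firestore document.

    Assumed ratios (not taken from the post):
      earnings yield    = EBIT / enterprise value
      return on capital = EBIT / capital employed
    where capital employed = net working capital + net fixed assets.
    """
    return {
        "name": name,
        "price": price,
        "earnings_yield": round(ebit / enterprise_value, 4),
        "return_on_capital": round(ebit / capital_employed, 4),
    }


def save_company(db, collection, ticker, data):
    """Write one company; merge=True keeps fields that are not in `data`."""
    db.collection(collection).document(ticker).set(data, merge=True)


# Made-up numbers, for illustration only.
doc = build_company_document("Example Ltd", 1.25, ebit=50.0,
                             enterprise_value=400.0, capital_employed=200.0)
print(doc["earnings_yield"], doc["return_on_capital"])
# save_company(db, "companies", "EXAMPLE", doc)  # needs a real firestore.Client
```

Keeping the dictionary-building step separate from the write makes it possible to check the ratios locally before anything is sent to Firestore.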

Running a Python script on Compute Engine

There are a few ways of pushing your Python script to your VM Instance. I did it by creating a repository in my Google Cloud project and pushing the script there. I created the repository because I still wanted some form of version control: knowing myself, I like to make changes and explore different ways to do things in my code, and I end up confusing myself. Even though it is a small project, I felt it was good practice for me personally. I then remotely accessed the VM Instance via SSH and cloned the repository into the instance.

Now for the scheduling of the Python script. Initially, I thought calling the Python script every 30 minutes was a good idea. However, after some consideration, I felt scheduling the script to run at 6pm (GMT +0800) was the ideal case because the Singapore Exchange opens at 9am and closes at 5pm, and I really only have time to view the stock prices after work anyway.

To schedule your Python script to run either at certain time intervals or at specific timings, you can use Cron jobs as I did. In the SSH session of your VM Instance, edit your user's Crontab using the crontab -e command. At the end of the file, add your schedules in the following format:

# m h  dom mon dow   command
0 10 * * 1-5 cd /path/to/python/folder && python main.py

The above snippet runs the Python script at 10am UTC (aka 6pm SGT) every weekday, Monday to Friday, as indicated by the 1-5 segment. If you would like your script to run at a regular time interval, you can do the following instead:
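
The conversion from 6pm SGT to the 10 in the hour field can be checked with a few lines of Python. This is a small sketch that assumes the VM clock is set to UTC, as in the example above; cron_line is a hypothetical helper and the date used inside it is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Singapore is UTC+8 all year round (no daylight saving time).
SGT = timezone(timedelta(hours=8), "SGT")


def cron_line(local_hour, local_minute, command, days="1-5", tz=SGT):
    """Build a crontab entry for a wall-clock time in `tz`, expressed in UTC."""
    local = datetime(2018, 10, 1, local_hour, local_minute, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * {days} {command}"


print(cron_line(18, 0, "cd /path/to/python/folder && python main.py"))
# 0 10 * * 1-5 cd /path/to/python/folder && python main.py
```

The sketch does not adjust the day-of-week field, so it is only safe when the converted time stays on the same calendar day, which is the case for 6pm SGT.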

# Runs the command every hour at the 0th minute
0 */1 * * * <some command>

# Runs the command once a day, at 00:00
0 0 */1 * * <some command>

Note: A mistake that I made during my first few times using Crontab in the VM Instance is the following:

# Runs the command every minute of every hour
* */1 * * * <some command>

My intention was to run it every hour, but I missed the 0 at the minute mark of the cron job, so it was running the script EVERY MINUTE OF EVERY HOUR. My script was taking around 3 minutes to run each time it was called. I did not mind the relatively long run time. However, since the script was being started every minute, and each run takes 3 minutes to complete... Well, you can do the math. And silly me was trying to figure out why the CPU usage on my VM Instance was constantly at 150-200% and I could not even access it via SSH. That was a funny lesson :P
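
For anyone who would rather not do the math by hand, here is a rough sketch of it. It is idealised: it assumes every run takes exactly 3 minutes and ignores the slowdown caused by the runs competing for the CPU. running_at is a hypothetical helper.

```python
def running_at(minute, interval=1, duration=3):
    """Count the runs still in progress at `minute`.

    A run starts every `interval` minutes from minute 0 and is assumed to
    take exactly `duration` minutes (no slowdown from contention).
    """
    starts = range(0, minute + 1, interval)
    return sum(1 for start in starts if start <= minute < start + duration)


for minute in (0, 1, 2, 10):
    print(minute, running_at(minute))
# 0 1
# 1 2
# 2 3
# 10 3
```

With the intended hourly schedule (interval=60) there is never more than one run at a time. With the every-minute schedule there are always three, and in practice overlapping runs probably take longer than 3 minutes each, so the backlog can keep growing.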

Using Cloud Functions to retrieve data from Firestore

For this step, I linked the Google Cloud project to Firebase. The reason I did this was for possible future versions in which I could host a website on Firebase Hosting, which taps on the data from the Cloud Firestore, allowing anyone to view the financial details at a glance. Another reason is also because I am much more familiar with Firebase and the requirements for Cloud Functions there.

I installed Express.js into my Cloud Functions folder via npm install --save express. Express.js allows me to easily create web APIs as I needed multiple end-points for retrieving various company information from the Firestore I have.

const functions = require("firebase-functions");
const admin = require("firebase-admin");
const express = require("express");

// Initialize the Admin SDK with the project's default credentials
admin.initializeApp();
var db = admin.firestore();

const app = express();

app.get('/:nameOfDocument', (req, res) => {
    const nameOfDocument = req.params.nameOfDocument;
    var firestoreRef = db.collection("name_of_collection").doc(nameOfDocument);
    res.setHeader('Content-Type', 'application/json');
    firestoreRef.get().then((snapshot) => {
        if (snapshot.exists) {
            // Return every field of the document as JSON
            var returnObj = snapshot.data();
            return res.status(200).json(returnObj);
        }
        else {
            return res.status(422).json({error: "Invalid document name"});
        }
    }).catch(errorObject => {
        return res.status(500).json({error: "Internal Server Error"});
    });
});
exports.api = functions.https.onRequest(app);

Here is a step-by-step explanation of what is happening in the snippet above. Firstly, access to Firestore is initialized by var db = admin.firestore();.

app.get('/:nameOfDocument', (req, res) => {
...
})

The above tells Express to handle GET requests at the '/:nameOfDocument' end-point, where :nameOfDocument is a parameter in the URL. req and res are the request object that is received and the response object that is going to be sent, respectively. Both are used in the handler: req supplies the URL parameter and res sends back the reply.

const nameOfDocument = req.params.nameOfDocument;

This line takes the parameter from the URL, that is :nameOfDocument in this case, and stores it as a variable called nameOfDocument, which will be used in the next line.

var firestoreRef = db.collection("name_of_collection").doc(nameOfDocument);

This line essentially creates a reference to the document nameOfDocument. The collection name is currently not a variable. You can also include the name of the collection as a parameter, as such:

app.get('/:nameOfCollection/:nameOfDocument', (req, res) => {
    const nameOfDocument = req.params.nameOfDocument;
    const nameOfCollection = req.params.nameOfCollection;
    var firestoreRef = db.collection(nameOfCollection).doc(nameOfDocument);
    ...
})

This way, you can specify it in the URL without having to alter the code.

firestoreRef.get().then((snapshot) => {
    if (snapshot.exists) {
        var returnObj = snapshot.data();
        return res.status(200).json(returnObj);
    }
    ...
})

The above segment takes the reference mentioned earlier, fetches the document and checks if it exists. This is essential as a user might accidentally type a wrong document or collection name, and we would want to return the appropriate response. snapshot.data() retrieves all the field key-value pairs and puts them in the object called returnObj. We then return this as a JSON object with a status code of 200.
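
For comparison, the same exists-check can be written against the Python client used earlier in this post. This is a sketch rather than part of the deployed function: fetch_document is a hypothetical helper, db is assumed to be the firestore.Client from the first section, and the status codes simply mirror the ones in the Express handler.

```python
def fetch_document(db, collection, name):
    """Return (status_code, body), mirroring the Express handler's responses."""
    try:
        snapshot = db.collection(collection).document(name).get()
    except Exception:  # broad on purpose: any backend failure maps to 500
        return 500, {"error": "Internal Server Error"}
    if snapshot.exists:
        # to_dict() is the Python counterpart of snapshot.data()
        return 200, snapshot.to_dict()
    return 422, {"error": "Invalid document name"}


# status, body = fetch_document(db, "name_of_collection", "name_of_document")
```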

exports.api = functions.https.onRequest(app);

This line tells Cloud Functions that any request made to <cloudfunctions.net url>/api should be passed to the Express object called app and handled according to the end-points specified in the app object itself.

And that is it! You can now call your Cloud Functions from the link provided on the Firebase Cloud Functions page which will retrieve the relevant data that you want to work on from your Firestore.
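
As a rough idea of what calling the function from Python could look like, here is a minimal sketch. It uses the standard library's urllib instead of Requests so that it has no dependencies. The base URL is a placeholder and both helper functions are hypothetical, so substitute the link shown on your own Firebase Cloud Functions page.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_document_url(base_url, document_name):
    """Join the function's base URL and a percent-encoded document name."""
    return base_url.rstrip("/") + "/" + urllib.parse.quote(document_name, safe="")


def get_document(base_url, document_name, timeout=10):
    """GET one document from the HTTP function and decode the JSON body."""
    url = build_document_url(base_url, document_name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # The handler in this post answers 422 for an unknown document
        # and 500 for server-side failures.
        raise RuntimeError(f"request failed with HTTP {err.code}") from err


# Placeholder URL, for illustration only.
print(build_document_url("https://example.invalid/api", "name of document"))
# https://example.invalid/api/name%20of%20document
```

Percent-encoding the document name matters when names contain spaces or slashes, which would otherwise change the path that Express sees.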

P.S. This is my first tutorial/personal experience post. Kindly do let me know what can be improved and how I can be a better programmer as well. All constructive feedback is welcome. Thank you for reading through my post! :D

Posted on Oct 6 '18 by:

Muhammad Syuqri

@dansyuqri

/shook-ree/ I love solving problems through code

Discussion


That's some real cool stuff!

I'm half-skeptical about the Greenblatt formula's effectiveness on larger exchanges compared to a reliable passive ETF such as the venerable SPY, but on the smaller ones it seems good enough to weed out the garbage tickers.

Now I want to try it out on the B3 here in Brazil!

 

It does seem to weed out garbage tickers, but only time will tell if the decisions are the 'right' ones or not. But it's a good idea to diversify into various securities and not be tied to just stocks.

Here's to a fruitful learning journey in investing :D

Would like to see the outcome from your strategies too in the near future!

 

Successful in receiving gains in my stocks? Not yet as I expect the investment to be long-term, meaning 1-3 years, before seeing any profits from them. I am still new to the concept of investing. Still learning to invest both my money, and time, so let's see where it goes from here :D

 
 
 

Hey, the clear explanation of the structure is wonderful. What's the cost structure?

 

The cost structure of running on GCP? For this use case, it is $0.00. I have deliberately checked the costs for running this on AWS and GCP and GCP provides free usage for these components forever (till there are changes in their use agreements). AWS only provides free usage for 12 months for their EC2 servers. In the long run, I found that keeping things free would be good as ultimately, my aim would be to minimize cost, so that I can invest more :)

 

Nice post! Thank you for sharing.