Can you imagine trying to buy something during a major sale and not being able to complete the purchase because the site is not responding? You click again and again, but nothing happens. That is a frustrating experience. So, what could be the reason? Sometimes it is a frontend or middleware problem, but it can also be the database behind the application, where CPU skyrockets to 100% because of the increased demand.
In some cases, when you know when the load starts to grow, you can proactively scale up the database instance and scale it down when the activity subsides. But what do you do when the increased demand is not so predictable? In such a case, one possible solution is automation that scales the instance up and down based on defined criteria, CPU load for example. Let me show you how to build such automation for AlloyDB instances based on monitoring metrics.
Here is the main workflow for the process. We create monitoring alerts for high or low CPU utilization, which are sent to a Pub/Sub topic. Then a Cloud Run function subscribed to the topic scales the instance up or down. Sounds simple, right? Let me show you how to do it step by step.
Let’s start with the Pub/Sub topic. To create a topic, go to Pub/Sub -> Topics and push the “Create Topic” button at the top. It opens a dialog where you can create a brand new topic.
The topic will serve as the connection point where our Cloud Run function picks up the message and, depending on the information in it, scales an AlloyDB instance up or down.
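If you prefer the command line, the same topic can be created with a single gcloud command; I use the name alloydb-scaling, which the IAM commands later in this post refer to:
gcloud pubsub topics create alloydb-scaling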
When the topic is created we can create an alert. To create an alert, go to the Google Cloud console and choose Monitoring -> Alerting.
Then create a policy.
To build a policy based on CPU load, the “Mean CPU Utilization” metric is probably the best choice, since it allows us to generate an alert on the average CPU utilization over a sliding window, preventing false alarms from accidental short CPU spikes.
For the metric I would use at least a 15-minute sliding window to avoid cyclical scaling up and down. When your instance is resized, you might see a short spike in CPU utilization while all the services start up and the buffers warm up. You will need to test and tune the exact size of your sliding window, and maybe other filters and options for the metric.
Then we define the threshold for the alert. I put 85% as the value there. Depending on your usage pattern, you can be more conservative or more aggressive with the threshold.
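If you would rather script the policy than click through the console, here is a rough sketch of the file-based form. The metric ID and its unit (percent vs. a 0-1 fraction) are my assumptions, so verify them in Metrics Explorer before using this, and note that the notification channel we create below still has to be attached to the policy.
{
  "displayName": "alloydb-scale-up",
  "combiner": "OR",
  "conditions": [{
    "displayName": "Mean CPU utilization above 85%",
    "conditionThreshold": {
      "filter": "resource.type = \"alloydb.googleapis.com/Instance\" AND metric.type = \"alloydb.googleapis.com/instance/cpu/average_utilization\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 85,
      "duration": "900s",
      "aggregations": [{
        "alignmentPeriod": "300s",
        "perSeriesAligner": "ALIGN_MEAN"
      }]
    }
  }]
}
gcloud alpha monitoring policies create --policy-from-file=alloydb-scale-up.json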
And now we come to the notification channels. Remember the Pub/Sub topic we created? Go to the input form for the notification channels and push the “Manage Notification Channels” button, then choose “Add new” for Pub/Sub.
In another tab, open the Pub/Sub topic we created and copy the topic name.
Then we can paste that topic name in the first tab, where we add the new channel and give the notification channel a name.
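As a command-line alternative (again a sketch; check the flags available in your gcloud version), the channel can be created like this:
PROJECT_ID=$(gcloud config get-value project)
gcloud beta monitoring channels create \
--display-name="alloydb-scaling-channel" \
--type=pubsub \
--channel-labels=topic=projects/$PROJECT_ID/topics/alloydb-scaling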
And we can now choose the created notification channel for our alerting policy.
The final step is to name the alert policy, since the name will be our marker to identify what to do when we get the alert; I call this one “alloydb-scale-up”.
The alert has been created, but to make it possible to publish messages to the alloydb-scaling topic we need to grant the pubsub.publisher role to the internal service account for notifications and alerts. Here is how to do it using the gcloud SDK:
PROJECT_ID=$(gcloud config get-value project)
gcloud pubsub topics add-iam-policy-binding alloydb-scaling \
--member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-monitoring-notification.iam.gserviceaccount.com" \
--role="roles/pubsub.publisher"
Before going forward and creating a Cloud Run function, we need to add a service account and grant some roles to it. I am going to create an account called alloydb-scale-sa and use that name in the following commands. This is how I do it in a Cloud Shell session.
PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create alloydb-scale-sa --project $PROJECT_ID
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:alloydb-scale-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/alloydb.admin"
Then we can go forward and create the function. In the console, switch to Cloud Run Functions and click “Create Function”.
There we need to provide the function name, choose Pub/Sub as the trigger, and select our topic. Then we need to expand the runtime settings.
In the runtime settings we need to select the service account alloydb-scale-sa we created earlier.
On the next screen we put in our code, which is responsible for scaling the AlloyDB instance up and down. The cluster name, instance name and location are in the alert message, and the name of the alert policy defines what kind of action to execute. I’ve put sample code below. This is a simplified example; in production you will probably need to define your own parameters, conditions and error handling.
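Here is a minimal sketch of what such a function can look like in Python. The Python runtime, the google-cloud-alloydb client library, and the resource label keys are my assumptions; inspect a real alert payload from your topic to confirm the exact field names.
import base64
import json

import functions_framework
from google.cloud import alloydb_v1
from google.protobuf import field_mask_pb2

# vCPU sizes the automation is allowed to move between (AlloyDB machine sizes).
CPU_STEPS = [2, 4, 8, 16, 32, 64, 96, 128]

@functions_framework.cloud_event
def scale_alloydb(cloud_event):
    # The Pub/Sub message body is the base64-encoded JSON alert notification.
    payload = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))
    incident = payload["incident"]
    policy = incident["policy_name"]          # e.g. "alloydb-scale-up"
    labels = incident["resource"]["labels"]   # assumed label keys, verify with a test alert
    name = (f"projects/{labels['resource_container']}"
            f"/locations/{labels['location']}"
            f"/clusters/{labels['cluster_id']}"
            f"/instances/{labels['instance_id']}")

    client = alloydb_v1.AlloyDBAdminClient()
    instance = client.get_instance(name=name)
    idx = CPU_STEPS.index(instance.machine_config.cpu_count)

    if "scale-up" in policy and idx < len(CPU_STEPS) - 1:
        instance.machine_config.cpu_count = CPU_STEPS[idx + 1]
    elif "scale-down" in policy and idx > 0:
        instance.machine_config.cpu_count = CPU_STEPS[idx - 1]
    else:
        return  # already at the edge of the range, or an unknown policy

    # update_instance returns a long-running operation; we do not wait for it here.
    client.update_instance(request=alloydb_v1.UpdateInstanceRequest(
        instance=instance,
        update_mask=field_mask_pb2.FieldMask(paths=["machine_config.cpu_count"]),
    ))
The requirements.txt for this sketch would list functions-framework and google-cloud-alloydb.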
After pushing the Deploy button we need to wait until the function is completely built and deployed.
When the function is deployed we need to go back to the Pub/Sub topic and click on the Eventarc subscription.
We need to edit the subscription and change the service account to our alloydb-scale-sa account.
Save it by clicking the Update button at the bottom.
In theory you can use separate accounts for the function and for invocation, but I’ve used the same one here for simplicity.
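The same change can be made from the command line; the subscription name below is a placeholder, since Eventarc generates it for you:
PROJECT_ID=$(gcloud config get-value project)
gcloud pubsub subscriptions update EVENTARC_SUBSCRIPTION_NAME \
--push-auth-service-account="alloydb-scale-sa@$PROJECT_ID.iam.gserviceaccount.com"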
We also need to enable that service account, or any other account assigned to the Pub/Sub subscription, to invoke the function. Here is how you can do it for the alloydb-scale-sa account.
PROJECT_ID=$(gcloud config get-value project)
gcloud functions add-invoker-policy-binding alloydb-scale-fnc \
--region=us-central1 \
--member="serviceAccount:alloydb-scale-sa@$PROJECT_ID.iam.gserviceaccount.com"
We now have the alert, the notification channel, the subscription, and the function that parses the notification and scales up our instance if the mean CPU load exceeds 85%. To test it out, I built a client VM with pgbench and ran a TPC-B-like benchmark against my AlloyDB instance. You can read how to do that in detail in the public guide.
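For reference, the load generation looks roughly like this; the scale factor, client count, and duration are illustrative and should be tuned to your instance size:
pgbench -i -s 100 -h <alloydb-ip> -U postgres postgres
pgbench -c 50 -j 8 -T 1800 -h <alloydb-ip> -U postgres postgres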
When I run pgbench it creates a CPU load of about 87–89%, which is sufficient to trigger the alert. After 15 minutes it creates an incident, which you can see in the picture.
The alert is published to the Pub/Sub topic, and the function parses the message, gets the information about the instance, and scales it up.
And we can see the AlloyDB instance is getting updated.
Scaling up works now, but how can we scale down automatically? The next step is to create another alert, named “alloydb-scale-down”, for CPU utilization lower than 10% to scale the instance down. The steps are exactly the same as for the previous alert, and the only difference is the trigger condition
And the policy name.
The policy creates a request to scale down the instance if the mean CPU utilization for the last 15 minutes is lower than 10%.
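In the file-based sketch from earlier, the scale-down policy would differ only in the display name, the comparison, and the threshold, under the same metric assumptions:
"displayName": "alloydb-scale-down",
...
"comparison": "COMPARISON_LT",
"thresholdValue": 10,
"duration": "900s",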
We have created policies to scale our instance up and down based on average CPU utilization. But there are a lot of different options you can use along with this approach: filters and conditions on the alert, different triggers based on other monitored values, and different strategies for how to increase or reduce the number of CPUs on the instances. Try it and let me know how it works.