Wilson Rivera

Posted on Jan 25, 2022

Auto Scaling Azure SignalR Units

#signalr #azure #serverless #architecture

Context

Hi! I'm a programmer and a software architect. I want to tell you how I reduced costs by autoscaling Azure SignalR units based on the demand on our platform.

If you need more information about Azure SignalR, please go to the official website.

What's the problem with Azure SignalR Service?

OK, the short answer is that it's not possible to use connection units on demand.

The long answer, or my explanation, is that Azure SignalR Service is a great managed tool that lets you build real-time applications and, of course, offers an underlying infrastructure that guarantees high availability. It's perfect, right? :) No, it's not.

Azure SignalR Service manages its connection pool through units. One unit represents 1000 concurrent connections, so if you know the number of concurrent connections of your application, you can choose the units you need. For example, if you have 4000 concurrent users, you must contract 5 units with Azure. Why 5 units instead of 4? Because Azure offers only a limited set of unit options for scaling: https://docs.microsoft.com/en-us/azure/azure-signalr/signalr-howto-scale-signalr

So, if you have 4000 concurrent users, you must set up 5 units. But if your users later increase to 6000, you must manually change the configuration to 10 units. Yes, manually! There is no automatic mechanism to do it.
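To make the arithmetic concrete, here is a minimal sketch of how the unit count follows from the number of connections. The list of allowed unit counts mirrors the ranges used in the function later in this post and may have changed since; the name `pickUnits` is only illustrative.

```javascript
// Unit counts accepted by Azure SignalR Service, taken from the ranges
// used later in this post (assumption: the list may have changed since).
const allowedUnits = [1, 2, 5, 10, 20, 50, 100];
const connectionsPerUnit = 1000;

// Returns the smallest allowed unit count that can hold the given
// number of concurrent connections.
function pickUnits(concurrentConnections) {
    const needed = Math.ceil(concurrentConnections / connectionsPerUnit);
    const match = allowedUnits.find((units) => units >= needed);
    // Above the largest option, fall back to the maximum.
    return match !== undefined ? match : allowedUnits[allowedUnits.length - 1];
}

console.log(pickUnits(4000)); // 5, because 4 is not an available option
console.log(pickUnits(6000)); // 10
```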

Someone built a similar solution

Stafford Williams built a similar solution, which served as a guide for me. Do you want to know what he did? Please go to his blog to read about it.

Hey Azure!! Here is my solution

Design

This is my idea...

The Azure SignalR Service setup page in the Azure Portal offers a monitoring section that contains an Alerts option. I decided to create a set of alerts based on the maximum connection count metric, which lets me know when the connection count exceeds 1k, 2k, 5k, 10k, 20k or 50k. Each alert can trigger an action group, which in turn can call an Azure Function.

The action group I created is named ag-call-azure-function, and it invokes an Azure Function named fnscaleunits, which was created beforehand.
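Those thresholds line up with the unit options: each alert fires when the connection count passes the capacity of one option. A small sketch of that relationship, assuming the unit list used later in this post (the variable names are illustrative):

```javascript
// Unit options, taken from the ranges used later in this post.
const allowedUnits = [1, 2, 5, 10, 20, 50, 100];
const connectionsPerUnit = 1000;

// One alert threshold per option, except the largest one,
// because there is nothing to scale to beyond it.
const alertThresholds = allowedUnits
    .slice(0, -1)
    .map((units) => units * connectionsPerUnit);

console.log(alertThresholds.join(', ')); // 1000, 2000, 5000, 10000, 20000, 50000
```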

Why an Azure Function?

OK, the Azure Function acts as my connection count manager. The idea is to use the Azure REST API to get the maximum connection count at a given moment and to calculate the ideal number of units for the Azure SignalR instance. Once the function has the number of units, it sends a request to the Azure REST API to change the unit value of the SignalR instance. This process must also be able to scale both up and down. Easy, right? OK, let's continue with more technical details.
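The up-or-down decision can be pictured as a simple comparison between the current capacity and the ideal one. This is only a sketch of the idea and not part of the function shown later, which always sends the update; the name `decideScaling` and the returned object shape are hypothetical.

```javascript
// Compares the current capacity with the ideal one and describes
// what should happen (hypothetical helper, for illustration only).
function decideScaling(currentUnits, idealUnits) {
    if (idealUnits > currentUnits) {
        return { action: 'scale-up', units: idealUnits };
    }
    if (idealUnits < currentUnits) {
        return { action: 'scale-down', units: idealUnits };
    }
    // Nothing to change, so the update request could be skipped.
    return { action: 'none', units: currentUnits };
}

console.log(decideScaling(5, 10).action); // scale-up
console.log(decideScaling(10, 5).action); // scale-down
```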

Implementation

oAuth2 Authentication

First, the Azure REST API requires authentication with an oAuth2 token. We need to register a new application in Azure Active Directory. With this application, you can use the oAuth2 endpoint to get a token. The endpoint looks like this:

https://login.microsoftonline.com/{your-tenant-id}/oauth2/token

Important: the Active Directory application must be granted access to the SignalR instance. To do this, go to the SignalR instance setup page, open the Access control (IAM) option, and in its Role assignments tab add a role assignment for the application.

Considerations

Also, we will use the following endpoints:

Get an oAuth token: POST https://login.microsoftonline.com/{your-tenant-id}/oauth2/token
Get the metrics of the SignalR instance: GET https://management.azure.com/subscriptions/${subscription}/resourceGroups/${resourceGroup}/providers/Microsoft.SignalRService/SignalR/${nameSignalRInstance}/providers/microsoft.insights/metrics?api-version=2018-01-01&$filter=Endpoint eq 'server' or Endpoint eq 'client'
Update the number of units: PUT https://management.azure.com/subscriptions/${subscription}/resourceGroups/${resourceGroup}/providers/Microsoft.SignalRService/signalR/${nameSignalRInstance}?api-version=2020-05-01
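The two management URLs share most of their path, so they can be built from the same three values. The helper below is a sketch, and `buildEndpoints` is a hypothetical name. It also URL-encodes the `$filter` value because it contains spaces. That encoding is a precaution I am assuming is harmless, and the code in this post sends the filter unencoded.

```javascript
// Builds the metrics and update URLs for one SignalR instance.
// Hypothetical helper; the paths and api-versions are the ones listed above.
function buildEndpoints({ subscription, resourceGroup, nameSignalRInstance }) {
    const base = 'https://management.azure.com' +
        `/subscriptions/${subscription}/resourceGroups/${resourceGroup}` +
        '/providers/Microsoft.SignalRService';
    // The filter contains spaces, so encode it before putting it in the query string.
    const filter = encodeURIComponent("Endpoint eq 'server' or Endpoint eq 'client'");
    return {
        metrics: `${base}/SignalR/${nameSignalRInstance}` +
            `/providers/microsoft.insights/metrics?api-version=2018-01-01&$filter=${filter}`,
        update: `${base}/signalR/${nameSignalRInstance}?api-version=2020-05-01`
    };
}

// Placeholder values, for illustration only.
const endpoints = buildEndpoints({
    subscription: 'my-subscription-id',
    resourceGroup: 'my-resource-group',
    nameSignalRInstance: 'my-signalr'
});
console.log(endpoints.update);
```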

A best practice is to load the secrets into the function app from an Azure Key Vault. Please search online for how to do this.

Let's introduce the Azure Function.

First, I must show you a sequence diagram as a first view of the business logic.

An alert in the SignalR instance is activated.
This alert executes the action group, which calls the Azure Function fnscaleunits.
The Azure Function sends a request to the oAuth2 endpoint to get a token.
With the Active Directory token, the function calls the Azure REST API to get the metrics of the SignalR instance.
The function calculates the ideal number of units.
The function sends a request to the Azure REST API to update the number of units of the instance.
It works!!!

And finally, the code....

const fetch = require('node-fetch');

//SignalR service data from a key vault through an env variable.
const subscription = process.env["subscription"];
const resourceGroup = process.env["resourceGroup"];
const nameSignalRInstance = process.env["nameSignalRInstance"];

//Service principal data from a key vault through an env variable.
const clientId = process.env["key-vault-clientId"];
const clientSecret = process.env["key-vault-secret"];
const oAuthTokenEndpoint = process.env["key-vault-oAuthUrl"];

const numberConnectionsPerUnit = 1000;

//This Array represents the ranges allowed by Azure for a SignalR Instance
const signalRUnitRanges = [
    {
        initialUnit: 0,
        finalUnit: 1
    },
    {
        initialUnit: 1,
        finalUnit: 2
    },
    {
        initialUnit: 2,
        finalUnit: 5
    },
    {
        initialUnit: 5,
        finalUnit: 10
    },
    {
        initialUnit: 10,
        finalUnit: 20
    },
    {
        initialUnit: 20,
        finalUnit: 50
    },
    {
        initialUnit: 50,
        finalUnit: 100
    }
];

let token = '';
let metrics = []; 
let contextGlobal;

//Main function
module.exports = async function (context, req) {
    context.log('Start the evaluation of units');
    contextGlobal = context;
    try {
        token = await GetToken();
        metrics = await GetMetrics();
        const idealUnit = GetIdealUnit();
        context.log.info(`ideal Unit ${idealUnit}`);
        //Wait for the update so that a failed request is caught by the catch block below.
        await UpdateUnits(idealUnit);
        context.log.info(`The units were updated successfully`);
    } catch(error) {
        context.log.error( error );
    }
    context.res = {
        status: 200
    };
}

//updates the capacity of the SignalR Service.
async function UpdateUnits( unit ) {

    var details = {
        'sku': {
           'name': 'Standard_S1',
           'tier': 'Standard',
           'capacity': unit
        },
        'location': 'eastus2'
    };

    return await fetch(`https://management.azure.com/subscriptions/${subscription}/resourceGroups/${resourceGroup}/providers/Microsoft.SignalRService/signalR/${nameSignalRInstance}?api-version=2020-05-01`,
    {
        method: 'PUT',
        headers: {
            'Authorization': `Bearer ${token}`,
            'Content-Type': 'application/json'
        },
        body:  JSON.stringify(details)
    }).then( (response ) => {
        if(response.ok) {
            return response.json();
        } else {
            throw new Error(`HttpStatus: ${response.status} Message: UpdateUnits Failed: reason: ${response.statusText}`);
        }
    });
}

//Calculates the ideal unit value based on the most recent number of connections.
function GetIdealUnit() {
    let clientMetrics = metrics[0];
    let serverMetrics = metrics[1];
    let currentMaxClientConnections = clientMetrics[clientMetrics.length - 1].maximum;
    let currentMaxServerConnections = serverMetrics[serverMetrics.length - 1].maximum;
    let currentTotalConnections = currentMaxClientConnections + currentMaxServerConnections ;

    contextGlobal.log.info(`currentTotalConnections: ${currentTotalConnections}`);
    let unitCalculated = currentTotalConnections / numberConnectionsPerUnit;
    contextGlobal.log.info(`unitCalculated: ${unitCalculated}`);

    for( let index = 0; index < signalRUnitRanges.length ; index++) {

        let currentUnitRange = signalRUnitRanges[index];

        if( unitCalculated >= currentUnitRange.initialUnit && unitCalculated < currentUnitRange.finalUnit) {
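            contextGlobal.log.info(`${unitCalculated} is between ${ currentUnitRange.initialUnit } and ${currentUnitRange.finalUnit} units`);
            return currentUnitRange.finalUnit;
        }
    }
    //Above the largest range: fall back to the maximum capacity instead of returning undefined.
    return signalRUnitRanges[signalRUnitRanges.length - 1].finalUnit;
}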

//Allows obtaining a valid active directory token of Azure.
async function GetToken() {
    var details = {
        'grant_type': 'client_credentials',
        'client_id': clientId,
        'client_secret': clientSecret,
        'resource': 'https://management.azure.com/'
    };

    var formBody = [];
    for (var property in details) {
        var encodedKey = encodeURIComponent(property);
        var encodedValue = encodeURIComponent(details[property]);
        formBody.push(encodedKey + "=" + encodedValue);
    }
    formBody = formBody.join("&");

    return await fetch( oAuthTokenEndpoint, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
        },
        body: formBody
    })
    .then( (response) => {
        if(response.ok) {
            return response.json();
        } else {
            throw new Error(`HttpStatus: ${response.status} Message: GetToken Failed: reason: ${response.statusText}`);
        }
    })
    .then( data => data.access_token);

}

//Obtains the time series of the SignalR connection metrics.
async function GetMetrics() {
    return await fetch(`https://management.azure.com/subscriptions/${subscription}/resourceGroups/${resourceGroup}/providers/Microsoft.SignalRService/SignalR/${nameSignalRInstance}/providers/microsoft.insights/metrics?api-version=2018-01-01&$filter=Endpoint eq 'server' or Endpoint eq 'client'`, {
        method: 'GET',
        headers: {
            'Authorization': `Bearer ${token}`
        }
    }).then((response) => {
            if (response.ok) {
                return response.json();
            } else {
                console.log(response);
                throw new Error('HttpStatus: ' + response.status + ' Message:' + response.statusText);
            }
        })
        .then(data => {
            let clientMetrics = data.value[0].timeseries[0].data;
            let serverMetrics = data.value[0].timeseries[1].data;
            return [ clientMetrics, serverMetrics];
        });
}

Wait, you are thinking: this function has too many responsibilities. I know, and it can be better, but it works for our use case.
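One possible refinement, as a sketch: the function above reads the client and server series by position and takes the last data point. The version below looks each series up by its Endpoint dimension and skips trailing points that have no maximum yet. The `metadatavalues` shape is my assumption about the metrics response, and all three function names are hypothetical.

```javascript
// Finds the time series whose "Endpoint" dimension matches the given value.
// Assumes each series carries metadatavalues like
// [{ name: { value: 'endpoint' }, value: 'client' }].
function findSeriesByEndpoint(timeseries, endpoint) {
    return timeseries.find((series) =>
        (series.metadatavalues || []).some((meta) =>
            meta.name &&
            String(meta.name.value).toLowerCase() === 'endpoint' &&
            String(meta.value).toLowerCase() === endpoint));
}

// Returns the most recent data point that actually has a maximum,
// or 0 when the series is missing or has no usable point.
function latestMaximum(series) {
    if (!series) {
        return 0;
    }
    for (let i = series.data.length - 1; i >= 0; i--) {
        if (typeof series.data[i].maximum === 'number') {
            return series.data[i].maximum;
        }
    }
    return 0;
}

// Adds the latest client and server connection counts.
function totalConnections(metricsResponse) {
    const timeseries = metricsResponse.value[0].timeseries;
    return latestMaximum(findSeriesByEndpoint(timeseries, 'client')) +
        latestMaximum(findSeriesByEndpoint(timeseries, 'server'));
}
```

Matching by dimension avoids depending on the order in which the API returns the series.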

Results

This solution lets you scale up and down automatically and save money on your subscription.

In my case, we can reach up to 35k concurrent connections during a normal working day, and in the evenings it falls to about 1k. That is the behavior I see on our platform.

Thank you, and I hope that this implementation works for you.

DEV Community