Imagine a situation where you need to download data from external sources and then perform complex calculations and aggregations on it. You don't know in advance the amount of data you will get or how the flow will be spread during the day; moreover, the calculation and aggregation processes are single-threaded.
On-premises, to deal with this kind of situation, you may need to dedicate several computers to perform the calculations and aggregations, one or more computers to collect the data, and at least one orchestrator computer to coordinate the flow between all the servers.
In Azure, you have several options. You can use Azure Batch or Azure Service Fabric, but you can also use Azure Functions to orchestrate this kind of flow.
Take a look at this setup:
- One Azure Function to collect the data and send it to blob containers
- A pool of VMs to perform the calculations
- One Azure Function to start a VM and send it a message telling it which file to process
- One Azure Function to collect the result
- An Azure Storage queue for communication between the functions and the computers in the pool
One of the goals of this setup is to avoid having computers online when there is no work to do, and to build a reliable solution.
Four parts are necessary.
The Data Collection part
To collect the data and put them in an Azure Storage blob container.
The VM Selector part
To create a message in a storage queue and check for a stopped VM to start. If all the VMs in the pool are already working, simply address the message to the least loaded VM.
The Data Processing part
When the machine starts, it checks the queue for a message, takes the first message, copies the file from the blob container to a local path, removes the message from the queue and starts working with the data.
If another action is needed, it can send the transformed file back to the same blob container so it re-enters the cycle.
After all the actions, the VM sends a signal to indicate that processing of the data is finished and that it can be stopped.
The Result Processing part
Send the result to a Storage Table and Stop the VM.
Note: this is a simplified version of a proof of concept I made.
What do we need?
- A timer-triggered Azure Function to collect and tag data and move the files into a blob container: funcTimerData
- A blob-triggered Azure Function that responds to every new blob, assigns a VM, starts it and puts a message in the queue with the action name, the VM name and the file name: funcBlobDataProcessing
- An HTTP-triggered Azure Function to respond to VM requests: funcHttpVMStop. It must stop the VM, write the name of the result into a table and finally delete the original file.
Having a naming convention for function names, as well as using modules, is part of the best practices. You may run into situations where the same code needs to be triggered by more than one event. You need to identify the function clearly (put the trigger name in the function name) and share the code (with a module).
For the storage we need:
- A blob container to store the data files: data
- A table to store the logs: tbllog
- A table to store the list of VMs: tblvm
- A table to store the list of result files: tblresult
- A queue for the messages between funcBlobDataProcessing and the VMs: queueprocessing
We also need:
- A bunch of VMs, named vm01 to vm10 for example.
- A script or a piece of software on each VM that collects messages from the queue, copies the file and consumes the data. If there is no data left in the local folder and no message in the queue, the VM needs to request a shutdown.
As we need to share some code between functions, we need a module. This module will hold the business logic, and the Azure Functions will act as an interface.
Modules in an Azure Functions app are automatically loaded and available to all functions. You just need to put the modules inside a modules folder in the function app.
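For example, a shared module could look roughly like the sketch below; the module name, BusinessLogic, and the function Invoke-DataAggregation are hypothetical placeholders for your own logic:

```powershell
# modules/BusinessLogic/BusinessLogic.psm1 -- hypothetical shared module
function Invoke-DataAggregation {
    param(
        [Parameter(Mandatory)]
        [string] $InputJson
    )
    # Placeholder for the real calculation/aggregation logic shared by the functions
    $data = $InputJson | ConvertFrom-Json
    return ($data | Measure-Object -Property Value -Sum).Sum
}
Export-ModuleMember -Function Invoke-DataAggregation
```

Any function in the app can then call Invoke-DataAggregation directly from its run.ps1 without an explicit Import-Module.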
Let's start by creating the storage tables, the queue and the blob container.
```powershell
# Get the Storage account name from the Azure Functions App
$RessourceGroup = "devtoServerless201909"
$FunctionAppName = "devtoServerless201909"
$functionapp = Get-AzWebApp -ResourceGroupName $RessourceGroup -Name $FunctionAppName
$FunctionStorageConfigString = ($functionapp.SiteConfig.AppSettings | Where-Object name -eq "AzureWebJobsStorage").Value
$FunctionStorageConfigHash = ConvertFrom-StringData -StringData $FunctionStorageConfigString.Replace(";", "`r`n")

# Get the Storage account object
$storageAccountObject = Get-AzStorageAccount -ResourceGroupName $RessourceGroup -Name $FunctionStorageConfigHash.AccountName

# Create the container, the queue and the tables
New-AzStorageContainer -Context $storageAccountObject.Context -Permission Off -Name "data"
New-AzStorageQueue -Context $storageAccountObject.Context -Name "queueprocessing"
"tbllog tblvm tblresult".Split() | New-AzStorageTable -Context $storageAccountObject.Context
```
Now let's start with the first function, funcTimerData. It's a timer-triggered function. It only needs one output binding to create the file with the data collected from the APIs.
```json
{
  "bindings": [
    {
      "name": "Timer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 * */10 * * *"
    },
    {
      "type": "blob",
      "name": "DataOut",
      "path": "data/{rand-guid}.json",
      "connection": "AzureWebJobsStorage",
      "direction": "out"
    }
  ],
  "disabled": false
}
```
To write data to the blob container, we only need to use Push-OutputBinding:

```powershell
Push-OutputBinding -Name DataOut -Value $SomeJsonValue
```
As you can see in the binding configuration, the function will create the file with a random name in the data container.
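A minimal run.ps1 for funcTimerData could look like the sketch below; the API endpoint is a placeholder, and the real collection and tagging logic depends on your data sources:

```powershell
# funcTimerData/run.ps1 -- sketch, the API URL below is hypothetical
param($Timer)

# Collect the raw data from an external source
$rawData = Invoke-RestMethod -Uri "https://api.example.com/measurements" -Method Get

# Tag the data and serialize it; the output binding writes it to data/{rand-guid}.json
$payload = [PSCustomObject]@{
    CollectedAt = (Get-Date).ToUniversalTime().ToString("o")
    Data        = $rawData
} | ConvertTo-Json -Depth 10

Push-OutputBinding -Name DataOut -Value $payload
```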
The second function is more complex. We need to retrieve the name of the newly created file in the blob container, get the list of VMs from the tblvm table, query the VM statuses to start one and, finally, create a message in the queueprocessing queue.
The function.json file looks like this:
```json
{
  "bindings": [
    {
      "name": "Dataforprocessing",
      "type": "blobTrigger",
      "direction": "in",
      "path": "data/{name}",
      "connection": "AzureWebJobsStorage"
    },
    {
      "type": "table",
      "name": "vmtable",
      "tableName": "tblvm",
      "partitionKey": "vmlist",
      "take": 50,
      "connection": "AzureWebJobsStorage",
      "direction": "in"
    },
    {
      "type": "queue",
      "name": "OutputMessageToVm",
      "queueName": "queueprocessing",
      "connection": "AzureWebJobsStorage",
      "direction": "out"
    },
    {
      "type": "table",
      "name": "logtable",
      "tableName": "tbllog",
      "connection": "AzureWebJobsStorage",
      "direction": "out"
    }
  ],
  "disabled": false
}
```
At the function level, we need the name of the file, but we do not need to read the data. To get the file name, we need to use the $TriggerMetadata parameter.
To get access to the VM table, we need to reference the table binding in the param section.
We also need to add the name of the trigger binding as one of the parameters. Without it, the function will generate an error.
You can note that I use the default byte array type for the trigger data. As I know the type I expect, this is not mandatory: it's possible to use a string instead, and the runtime will pass the content as the value of the variable. But if you plan to handle binary or mixed-type data, you need to use a byte array.
```powershell
param([byte[]] $Dataforprocessing, $TriggerMetadata, $vmtable)
```
To get the file name:

```powershell
$DataFileName = $TriggerMetadata.Name
```
The name of the table binding is referenced as a parameter of the function. The $vmtable variable contains an array of hashtables.
As we know the structure of the table, we can get the properties by name like this:
```powershell
foreach ($vm in $vmtable) {
    $vm.vmName
}
```
As we need to interact with other Azure resources, we need an identity. Azure Functions provides a mechanism for that: Managed System Identity. By activating it, either during deployment or afterwards, a special type of service principal (a managed identity) is created in Azure AD, and it can be granted RBAC roles on resources.
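In a PowerShell Functions app, the generated profile.ps1 signs in with this identity when it is enabled, roughly like this:

```powershell
# profile.ps1 -- runs on cold start; MSI_SECRET is set when a managed identity is enabled
if ($env:MSI_SECRET) {
    Disable-AzContextAutosave -Scope Process | Out-Null
    Connect-AzAccount -Identity
}
```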
For the VMs, you can use the Virtual Machine Contributor role to deallocate and start them (in a production environment you need to apply least privilege: Microsoft.Compute/virtualMachines/read, Microsoft.Compute/virtualMachines/restart/action and Microsoft.Compute/virtualMachines/deallocate/action).
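As an illustration, the role assignment can be scripted once the identity exists. The sketch below reuses the variables from the storage script above and assumes the VM pool lives in the same resource group; adjust the scope (or the individual actions) for least privilege:

```powershell
# Grant the function app's managed identity the Virtual Machine Contributor role on the resource group
$principalId = (Get-AzWebApp -ResourceGroupName $RessourceGroup -Name $FunctionAppName).Identity.PrincipalId
New-AzRoleAssignment -ObjectId $principalId `
    -RoleDefinitionName "Virtual Machine Contributor" `
    -Scope (Get-AzResourceGroup -Name $RessourceGroup).ResourceId
```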
To start the VM, we can use Start-AzVM with the -NoWait switch.
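A possible sketch of the VM selection in funcBlobDataProcessing (the resource group name is an assumption, and the "least loaded" fallback is simplified to picking the first VM in the list):

```powershell
# Pick the first deallocated VM from the pool and start it without waiting
$poolResourceGroup = "devtoServerless201909"   # assumption: the VMs share one resource group
$SelectedVM = $null
foreach ($vm in $vmtable) {
    $powerState = (Get-AzVM -ResourceGroupName $poolResourceGroup -Name $vm.vmName -Status).Statuses |
        Where-Object Code -like "PowerState/*"
    if ($powerState.Code -eq "PowerState/deallocated") {
        $SelectedVM = $vm.vmName
        Start-AzVM -ResourceGroupName $poolResourceGroup -Name $SelectedVM -NoWait | Out-Null
        break
    }
}
if (-not $SelectedVM) {
    # All VMs are already running: fall back to a VM from the list (simplified "least loaded" rule)
    $SelectedVM = $vmtable[0].vmName
}
```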
We also need to create a message in the queue for the VM. The message needs to contain the name of the selected VM and the name of the file to be processed.
```powershell
$QueueMessage = @{ "vmName" = $SelectedVM; "FileName" = $TriggerMetadata.Name } | ConvertTo-Json
Push-OutputBinding -Name OutputMessageToVm -Value $QueueMessage
```
Now, how can the VM get its messages? There is only one queue for all the VMs. How can we deal with this single queue?
First, we need to create an AzureStorageQueue object by using Get-AzStorageQueue.
This object contains a property named CloudQueue that we need to use to access queue messages; it exposes several methods to deal with them.
The main problem is that when you read a message from a queue with the GetMessage method, the message is removed from the queue. Fortunately, there is a method to read messages without deleting them, GetMessageAsync. This method takes a TimeSpan object as a parameter: the amount of time the message stays invisible to other readers. After that, the message can be read again.
With this method, we can read the messages and search for the VM name.
```powershell
# $queueObject is the object returned by Get-AzStorageQueue for the queueprocessing queue
$invisibleTimeout = [System.TimeSpan]::FromSeconds(2)
while ($true) {
    # Read the next message; it stays invisible to other readers for 2 seconds
    $message = $queueObject.CloudQueue.GetMessageAsync($invisibleTimeout, $null, $null)
    $qmsgHash = $message.Result.AsString | ConvertFrom-Json -ErrorAction SilentlyContinue
    if ($qmsgHash.vmName -eq "vm02") {
        # The message is addressed to this VM: use it and remove it from the queue
        Write-Host $qmsgHash.FileName
        $queueObject.CloudQueue.DeleteMessageAsync($message.Result.Id, $message.Result.PopReceipt)
        break
    }
    elseif ($null -eq $message.Result) {
        # No more messages in the queue
        break
    }
}
```
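On the VM side, once the local folder is empty and no message is left in the queue, the shutdown request is a simple HTTP GET to funcHttpVMStop. A sketch, where the function URL, the function key and the $processedFile variable are placeholders:

```powershell
# Ask funcHttpVMStop to deallocate this VM (URL, key and file name are placeholders)
$functionUrl   = "https://devtoserverless201909.azurewebsites.net/api/funcHttpVMStop"
$functionKey   = "<function key>"
$processedFile = $qmsgHash.FileName   # name of the last file this VM processed
Invoke-RestMethod -Method Get -Uri "${functionUrl}?code=$functionKey&vmName=vm02&FileName=$processedFile"
```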
The last thing for funcBlobDataProcessing: we need to implement a logging system. We have already set up a table inside the Functions App storage account, tbllog.
Azure Storage tables are not like the relational tables you find in Oracle, SQL Server or MySQL. They are simple NoSQL tables.
To create a new row in our Azure Storage table, a hashtable is needed. It must contain two elements, PartitionKey and RowKey, along with the data.
```powershell
$LogEntry = @{
    partitionKey = 'Vmselector'
    rowKey       = (New-Guid).Guid
    VmName       = $SelectedVM
    FileName     = $TriggerMetadata.Name
}
Push-OutputBinding -Name logtable -Value $LogEntry
```
Notice that you can have multiple Push-OutputBinding calls in a function. But if you add a return statement, the function exits at that point and nothing after the return is executed.
The funcHttpVMStop function is the final function. The VMs need to send a signal when they have finished processing a file. The file name will be one of the parameters of the function.
An HTTP function is open to everyone by default: it's a public URI. We need to take care of security; after all, the purpose of the function is to shut down and deallocate a VM. We need to make sure the request is legitimate and comes from our system.
To meet this goal, we can test whether the IP address of the HTTP client is the IP of one of the VMs in our pool. So we need to add another parameter to the function, the VM name.
We can also check that the VM name is present in our pool by checking the VM list; to do so, we need to import the VM list table.
Finally, we need to restrict the HTTP verbs to only allow GET.
```json
{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "Request",
      "methods": [
        "get"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "Response"
    },
    {
      "type": "table",
      "name": "vmtable",
      "tableName": "tblvm",
      "partitionKey": "vmlist",
      "take": 50,
      "connection": "AzureWebJobsStorage",
      "direction": "in"
    },
    {
      "type": "table",
      "name": "resulttable",
      "tableName": "tblresult",
      "connection": "AzureWebJobsStorage",
      "direction": "out"
    }
  ],
  "disabled": false
}
```
We need to import the System.Net namespace:

```powershell
using namespace System.Net
```
The param section needs to reference the HTTP and table bindings:

```powershell
param($Request, $vmtable)
```
Now, how do we test whether the VM is in our list?
```powershell
if ($Request.Query.vmName -in $vmtable.vmName) {
    # Retrieve the public IP resource attached to the VM's network interface
    $RessourcePib = Get-AzResource -ResourceId (Get-AzNetworkInterface -ResourceId (Get-AzVM -Name $Request.Query.vmName).NetworkProfile.NetworkInterfaces.Id).IpConfigurations.PublicIpAddress.Id
    $VmSupposedIp = (Get-AzPublicIpAddress -Name $RessourcePib.Name -ResourceGroupName $RessourcePib.ResourceGroupName).IpAddress
    if ($Request.Headers.'client-ip'.Split(":")[0] -eq $VmSupposedIp) {
        $status = [HttpStatusCode]::OK
        $body = $Request
    }
    else {
        $status = [HttpStatusCode]::InternalServerError
        $body = "not valid IP"
    }
}
else {
    $status = [HttpStatusCode]::InternalServerError
    $body = "not valid Vm name"
}
```
Note that I needed to use $Request.Headers.'client-ip'.Split(":")[0] and not the raw header value $Request.Headers.'client-ip'. This is because Azure Functions reports the client IP address together with the client TCP port.
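For example (the address below is only an illustration):

```powershell
$Request.Headers.'client-ip'                # e.g. "203.0.113.25:49723" -- address plus TCP port
$Request.Headers.'client-ip'.Split(":")[0]  # "203.0.113.25" -- the part compared with the VM's public IP
```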
To return the response, we need to create an HttpResponseContext object for the HTTP output binding. This object needs a StatusCode (an HttpStatusCode value) and a Body (optional if it's not an HTTP 200 OK code).
```powershell
[HttpResponseContext] $ReturnObject = [HttpResponseContext] @{
    StatusCode = $status
    Body       = $body
}
Push-OutputBinding -Name Response -Value $ReturnObject
```
We also need to log the result. The VM sends the HTTP GET request to the function with VmName and FileName as parameters.
To log the file name, we need to use $Request.Query.FileName. We also add the date and time of the action and the VM name to the record.
We construct a hashtable with a partitionKey ('result') and a rowKey, as we did in the second function.
```powershell
$ResultEntry = @{
    partitionKey = 'result'
    rowKey       = (New-Guid).Guid
    VmName       = $Request.Query.vmName
    FileName     = $Request.Query.FileName
    DateTime     = Get-Date -Format FileDateTime
}
Push-OutputBinding -Name resulttable -Value $ResultEntry
```
This setup is just an illustration of what you can do to orchestrate complex workflows in Azure. The real solution is far more complex, but you can find here some of the best practices I have applied to my projects.
One of the most prominent: always establish a naming convention for function names. Another key piece of advice: use modules, as they are easier to test and to share across the functions of a project. And finally, make functions as simple as possible and do not hesitate to split complex operations into several small functions.
You can follow me on GitHub; I made a module to automate some tasks on Azure Functions.