<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stefano d'Antonio</title>
    <description>The latest articles on DEV Community by Stefano d'Antonio (@unosd).</description>
    <link>https://dev.to/unosd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F511486%2F882cb795-933c-44f3-9c14-4dfc338c577e.jpeg</url>
      <title>DEV Community: Stefano d'Antonio</title>
      <link>https://dev.to/unosd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/unosd"/>
    <language>en</language>
    <item>
      <title>Business continuity and disaster recovery blueprints for enterprises</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Thu, 25 Nov 2021 17:17:12 +0000</pubDate>
      <link>https://dev.to/unosd/business-continuity-and-disaster-recovery-blueprints-for-enterprises-4a0o</link>
      <guid>https://dev.to/unosd/business-continuity-and-disaster-recovery-blueprints-for-enterprises-4a0o</guid>
      <description>&lt;p&gt;Planning for disasters and having recovery processes in place is critical to any business; whether your domain is an e-commerce platform or a financial institution, a disruption of IT services means loss in revenue and reputation for your enterprise.&lt;/p&gt;

&lt;p&gt;What changes is the tolerance to those outages.&lt;/p&gt;

&lt;p&gt;If your system is down for 10 minutes it can either mean that the users will happily come back and play your online game later or that you will lose millions in critical transaction fees and your clients will go somewhere else where the system is more reliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do I need to account for downtimes?
&lt;/h3&gt;

&lt;p&gt;Hardware failures are a fact of life; a rack of servers can suffer from faulty power supplies, failing network switches, overheating and so on, despite all the prevention measures in place. Software can also fail, and bugs or attacks can disrupt the service.&lt;/p&gt;

&lt;p&gt;If a component develops a fault, your workload on that hardware/software can be disrupted and data can also be lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terminology
&lt;/h3&gt;

&lt;p&gt;Before we dive deeper into the different options, let's clarify a few terms that may be unfamiliar if you do not deal regularly with system reliability:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmzjiuleuei6n1jgli4t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmzjiuleuei6n1jgli4t.png" alt="RTO/RPO"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Recovery Time Objective (RTO)
&lt;/h4&gt;

&lt;p&gt;This is what the business defines as the maximum acceptable time it takes to get the system back up in case of an outage. It can be informally or contractually agreed with consumers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.G.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your system is deployed in the &lt;strong&gt;West Europe Azure region&lt;/strong&gt; on &lt;strong&gt;Virtual Machines&lt;/strong&gt; within a single &lt;strong&gt;Availability Zone&lt;/strong&gt; and a single &lt;strong&gt;fault/update domain&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The rack of servers that hosts your &lt;strong&gt;VM&lt;/strong&gt; fails and users can no longer reach your website.&lt;/p&gt;

&lt;p&gt;How long do you deem acceptable to wait until the system is back up in a different rack/datacenter? That's &lt;strong&gt;RTO&lt;/strong&gt; in a nutshell.&lt;/p&gt;

&lt;h4&gt;
  
  
  Recovery Point Objective (RPO)
&lt;/h4&gt;

&lt;p&gt;How much data can you afford to lose? &lt;strong&gt;RPO&lt;/strong&gt; is measured as time going back from the outage: the maximum acceptable age of the most recent data you can restore.&lt;/p&gt;

&lt;p&gt;This is all about data; it has nothing to do with your system coming back to life.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.G.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The rack of servers from the previous example develops a hard drive fault and data is lost.&lt;/p&gt;

&lt;p&gt;Assuming you back up your data regularly, how old can your last backup be? That's the &lt;strong&gt;RPO&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you back up the data at 9:00PM every day, the worst-case scenario is that the outage happens at 8:59PM: your backup will be almost a day old and you will have lost ~24 hours of data. 24h is your &lt;strong&gt;RPO&lt;/strong&gt;. You could be lucky and the outage may happen at 9:01PM, right after a completed backup, in which case you have lost 1 minute of data, but you need to account for the worst-case scenario.&lt;/p&gt;
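
&lt;p&gt;The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the dates and the &lt;code&gt;data_lost&lt;/code&gt; helper are hypothetical, and it assumes backups complete instantly at 9:00PM.&lt;/p&gt;

```python
from datetime import datetime, timedelta

# Assumption: one backup per day at 9:00PM, as in the example above.
BACKUP_INTERVAL = timedelta(hours=24)


def data_lost(last_backup, outage):
    """Time span of data lost when restoring the last backup after an outage."""
    return outage - last_backup


last_backup = datetime(2021, 11, 24, 21, 0)  # hypothetical date

# Unlucky: the outage hits one minute before the next backup.
unlucky = data_lost(last_backup, datetime(2021, 11, 25, 20, 59))
# Lucky: the outage hits one minute after the next backup completed.
lucky = data_lost(last_backup + BACKUP_INTERVAL, datetime(2021, 11, 25, 21, 1))

print(unlucky)  # 23:59:00
print(lucky)    # 0:01:00
```

&lt;p&gt;Whatever the luck on the day, the loss never exceeds the backup interval, which is why the interval is the figure to compare against your RPO.&lt;/p&gt;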

&lt;h4&gt;
  
  
  Service-level agreement (SLA)
&lt;/h4&gt;

&lt;p&gt;This is probably the most familiar term; likely you have heard of SLA as it is all over the place for cloud resources.&lt;/p&gt;

&lt;p&gt;This is the maximum contractual downtime for a service over the year. &lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Azure&lt;/strong&gt;, if a service is down for more than that time, you can ask for &lt;a href="https://docs.microsoft.com/en-us/partner-center/request-credit?tabs=workspaces-view#service-outages-service-level-agreement-issues-credit" rel="noopener noreferrer"&gt;credits&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is to indicate the confidence level of the provider in the availability of the service; things could still go wrong and an outage could last longer, but it is extremely unlikely as this figure comes out of careful Microsoft &lt;strong&gt;BCDR&lt;/strong&gt; planning internally for each service.&lt;/p&gt;

&lt;p&gt;You often hear "three nines, four nines, ..."; this refers to the number of nines in the percentage. E.G. 99.9 -&amp;gt; three nines, 99.9999 -&amp;gt; six nines...&lt;/p&gt;

&lt;p&gt;How do we translate that into time? Let's consider a 99.99% SLA: the service may be unavailable for up to 0.01% of the year. What is that in actual downtime?&lt;/p&gt;

&lt;p&gt;Daily: 8s&lt;br&gt;
Weekly: 1m 0s&lt;br&gt;
Monthly: 4m 22s&lt;br&gt;
Quarterly: 13m 8s&lt;br&gt;
Yearly: 52m 35s&lt;/p&gt;
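
&lt;p&gt;The figures above can be reproduced with a short calculation. The sketch below assumes an average year of 365.2425 days and truncates fractions of a second; the function names are made up for this example.&lt;/p&gt;

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.2425  # assumption: average Gregorian year length

PERIODS_IN_DAYS = {
    "Daily": 1,
    "Weekly": 7,
    "Monthly": DAYS_PER_YEAR / 12,
    "Quarterly": DAYS_PER_YEAR / 4,
    "Yearly": DAYS_PER_YEAR,
}


def allowed_downtime_seconds(sla_percent, period_days):
    """Maximum downtime, in seconds, that still meets the SLA over the period."""
    return (100 - sla_percent) / 100 * period_days * SECONDS_PER_DAY


def as_minutes_seconds(seconds):
    """Format a duration as 'Xm Ys', truncating fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"


for name, days in PERIODS_IN_DAYS.items():
    print(f"{name}: {as_minutes_seconds(allowed_downtime_seconds(99.99, days))}")
```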

&lt;p&gt;You could very well have a server unreachable for 8 seconds every day of the year, or as a one-off for 52 minutes. Having your system down for almost an hour could have a massive impact in certain domains, and yet 99.99% is quite a good number.&lt;/p&gt;

&lt;h4&gt;
  
  
  Composite SLA
&lt;/h4&gt;

&lt;p&gt;For a single service, 99.99% is a common figure, but your system is unlikely to be that simple; it will usually be composed of multiple chained components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.G.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One &lt;strong&gt;web app&lt;/strong&gt; talking to a &lt;strong&gt;back-end API&lt;/strong&gt; talking to a &lt;strong&gt;database server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ibx7z7un8en0p42lfrj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ibx7z7un8en0p42lfrj.png" alt="Composite SLA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even if each component has a 99.99% &lt;strong&gt;SLA&lt;/strong&gt;, in the worst-case scenario they could all go down sequentially: the &lt;strong&gt;web app&lt;/strong&gt; is down for 52m and then comes back up; when the &lt;strong&gt;web app&lt;/strong&gt; is back, the &lt;strong&gt;API&lt;/strong&gt; is down for 52m and then comes back up; when the &lt;strong&gt;API&lt;/strong&gt; is back up, the &lt;strong&gt;DB&lt;/strong&gt; is down for 52m... 2 hours and 36 minutes in total. OK, that syzygy is quite unlikely to happen, but it is nonetheless possible and the service provider bears no responsibility if it does.&lt;/p&gt;

&lt;p&gt;The provider would have respected their contractual &lt;strong&gt;SLAs&lt;/strong&gt; for each component, so no credits for you, but your system would have still been down for hours.&lt;/p&gt;

&lt;p&gt;You can use this calculator to convert percentage into time over the year: &lt;a href="https://uptime.is/99.99" rel="noopener noreferrer"&gt;Calculate time from SLA&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are the Azure service &lt;strong&gt;SLAs&lt;/strong&gt; in a nice map: &lt;a href="https://azurecharts.com/sla" rel="noopener noreferrer"&gt;Azure charts SLA&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As discussed, a given system is usually composed of different parts; you can calculate the composite &lt;strong&gt;SLA&lt;/strong&gt; with the formulas here: &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/framework/resiliency/business-metrics#composite-slas" rel="noopener noreferrer"&gt;Composite SLA&lt;/a&gt;&lt;/p&gt;
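
&lt;p&gt;As a rough sketch of how such formulas work, assuming component failures are independent: components that all need to be up multiply their availabilities, while independent fallbacks only fail when all of them fail at once. The function names below are made up for this example.&lt;/p&gt;

```python
from math import prod


def serial_availability(*availabilities):
    """Components that all must be up: multiply their availabilities."""
    return prod(availabilities)


def redundant_availability(*availabilities):
    """Independent fallbacks: the system is down only if every one is down."""
    return 1 - prod(1 - a for a in availabilities)


# Web app, back-end API and database server, each with a 99.99% SLA.
chain = serial_availability(0.9999, 0.9999, 0.9999)
print(f"{chain:.6%}")  # 99.970003%
```

&lt;p&gt;Under these assumptions, a chain of three 99.99% components lands at roughly 99.97%, which is noticeably lower than any single component.&lt;/p&gt;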

&lt;p&gt;I have built a tool that does this for you: you add components and define their dependencies. It's on GitHub: &lt;a href="https://github.com/UnoSD/SlaCalculator"&gt;https://github.com/UnoSD/SlaCalculator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fktl1mvepz7k122lo3t8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fktl1mvepz7k122lo3t8v.png" alt="SLA calculator"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2022-02-03 update: I have finally published a graphical web app version of the &lt;strong&gt;SlaCalculator&lt;/strong&gt;, please find it here: &lt;a href="http://wiki.unosd.com/slacalculator/" rel="noopener noreferrer"&gt;http://wiki.unosd.com/slacalculator/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrae607p5a3j1hmgpv0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrae607p5a3j1hmgpv0w.png" alt="SlaCalculator"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article is more for business guidance so I will not dive deeper into the technical aspects, but in &lt;strong&gt;Azure&lt;/strong&gt; you can distribute your workload within a datacenter (&lt;strong&gt;Availability sets&lt;/strong&gt;), across datacenters (&lt;strong&gt;Availability zones&lt;/strong&gt;) and across regions (&lt;strong&gt;Region pairs&lt;/strong&gt;) to maximise the &lt;strong&gt;SLA&lt;/strong&gt; for your solution; see the picture below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo766scus3zkub5otyhg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo766scus3zkub5otyhg8.png" alt="Compute SLA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Picture from: &lt;a href="https://azure.microsoft.com/en-gb/resources/azure-resiliency-infographic/" rel="noopener noreferrer"&gt;Azure resiliency infographic&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  High availability (HA)
&lt;/h4&gt;

&lt;p&gt;This is quite a generic term and, frankly, you will find it mostly in marketing papers. It means a system that is resilient to failures, but it carries no unit of measure: what I consider highly available could be something with a 70% &lt;strong&gt;SLA&lt;/strong&gt; and a manual failover, while someone else would reserve &lt;strong&gt;HA&lt;/strong&gt; for a 99.999% &lt;strong&gt;SLA&lt;/strong&gt;. If you would like a less "abrupt" explanation, have a look at the comprehensive &lt;a href="https://en.wikipedia.org/wiki/High_availability" rel="noopener noreferrer"&gt;Wikipedia page&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Disaster recovery (DR)
&lt;/h4&gt;

&lt;p&gt;By now, if you have gone through the whole article, you should know this is our main focus here. A dictionary definition could be: a set of policies, tools and processes to recover data/compute from an unforeseen disaster. All the options available to implement this are in the next section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Different levels of DR
&lt;/h3&gt;

&lt;p&gt;Bear in mind that different levels can apply to data and compute. If your &lt;strong&gt;web portal&lt;/strong&gt; is inaccessible, that's still a disaster, but it is not usually as bad as data loss; as long as you have a recent backup of your data, you are in a much better position: you can tolerate the system being out of service for a while and then bring it back up in the same state you left it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq4ketgjxzzde755lqy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq4ketgjxzzde755lqy5.png" alt="Levels"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  No DR
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs36r23f5eulz6dgx57vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs36r23f5eulz6dgx57vj.png" alt="No DR"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the most self-explanatory option; no plan, no resources and no costs.&lt;/p&gt;

&lt;p&gt;This can still be a valid option in certain scenarios where your service is not critical and users can wait days/months to access your system again.&lt;/p&gt;

&lt;p&gt;If my blog were down for a month, I would be disappointed, but I could just start all over again on a new platform from scratch. There are also real businesses that could tolerate this, but it is quite rare. It could apply to some internal employee systems in certain companies.&lt;/p&gt;

&lt;p&gt;You could still have a backup of your articles (like I do) on your laptop and you could restore that on a different platform if &lt;strong&gt;dev.to&lt;/strong&gt; is unavailable; if I have a process to back that up on every change, to me that would count as having a &lt;strong&gt;manual DR plan&lt;/strong&gt; for &lt;strong&gt;data&lt;/strong&gt;, but &lt;strong&gt;no DR&lt;/strong&gt; for &lt;strong&gt;compute&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Manual
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltqdv71zi6qxngznr0dl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltqdv71zi6qxngznr0dl.png" alt="Manual"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This option has some overlap with the previous one. When I refer to "manual", I do not mean someone noticing the system is down and trying to bring it back up by clicking around cloud portals and uploading the site somewhere else; I classify that as "no DR" for the purpose of this article.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual&lt;/strong&gt; means a well-documented, step-by-step procedure for reacting to outages.&lt;/p&gt;

&lt;p&gt;A cloud operations team of system administrators will get a notification of service disruption and will be able to respond accordingly and start the failover process.&lt;/p&gt;

&lt;p&gt;The process could be going into the &lt;strong&gt;Azure&lt;/strong&gt; portal, creating a new &lt;strong&gt;virtual machine&lt;/strong&gt;, uploading a &lt;strong&gt;web application&lt;/strong&gt;, switching the &lt;strong&gt;DNS servers&lt;/strong&gt; configuration to point to the new server. All documented and the environment could be back up in a matter of hours.&lt;/p&gt;

&lt;p&gt;This is how many organisations approach &lt;strong&gt;DR&lt;/strong&gt;, but this approach is significantly slow (can take from hours to days). Problems can happen outside office hours and you need to have a team on-call during the night, make sure they are skilled, there is no single point of failure (people on holidays, sick leave, leaving the company and so on...) and despite all of this, &lt;strong&gt;human error&lt;/strong&gt; is still a significant factor.&lt;/p&gt;

&lt;p&gt;This approach may sound appealing as there is no additional cost for redundant infrastructure, but the &lt;strong&gt;TCO&lt;/strong&gt; (total cost of ownership) of the solution must take into account people, training and errors.&lt;/p&gt;

&lt;h4&gt;
  
  
  Infrastructure as code
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcd6jseudbzfxw179dix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcd6jseudbzfxw179dix.png" alt="Infrastructure as code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the first automated approach. It takes human error out of the equation and makes the RTO much more predictable; you can test it regularly and measure reaction times.&lt;/p&gt;

&lt;p&gt;You need to make sure your development teams understand and build scripted environments (or your DevOps engineers, or sysadmins). This is good practice in any case, but not always the reality in the IT world. Most of the legacy applications are still deployed manually on bespoke infrastructure.&lt;/p&gt;

&lt;p&gt;We think of "the cloud" as virtually unlimited scalability, but, in reality, the cloud is just yet another datacentre with its own physical and virtual limitations. There is the chance that, when you have an outage and try to redeploy your entire infrastructure in a different &lt;strong&gt;zone/region&lt;/strong&gt;, you can hit capacity limits. This could be a disaster from which you cannot recover if you are not prepared.&lt;/p&gt;

&lt;p&gt;A solution in the &lt;strong&gt;Azure&lt;/strong&gt; world would be to purchase &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview" rel="noopener noreferrer"&gt;&lt;strong&gt;capacity reservations&lt;/strong&gt;&lt;/a&gt;; you pay for the guarantee that you will have that capacity when you need to fail over. This option costs significantly more than just keeping a &lt;strong&gt;repository&lt;/strong&gt; with your scripts, but can save the day during an emergency.&lt;/p&gt;

&lt;p&gt;What you will save is the &lt;strong&gt;cost of management&lt;/strong&gt; of the inactive resources: no OS patching, no upgrading applications, no security alerts and so on.&lt;/p&gt;

&lt;p&gt;The whole deployment process should be ideally automated with &lt;strong&gt;CD pipelines&lt;/strong&gt; to create environments and deploy application workloads with minimal to no configuration effort in case of a disaster.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cold environment
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkes90xccr0br3oh6f26y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkes90xccr0br3oh6f26y.png" alt="Cold environment"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now definitions start to become more woolly and more technology specific.&lt;/p&gt;

&lt;p&gt;A cold environment is an environment that is already deployed, but stopped.&lt;/p&gt;

&lt;p&gt;Cost savings may vary depending on the resources.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Azure&lt;/strong&gt; this could mean: &lt;strong&gt;VM&lt;/strong&gt; in a &lt;strong&gt;deallocated&lt;/strong&gt; state, &lt;strong&gt;Azure Firewall&lt;/strong&gt; and &lt;strong&gt;Application Gateway&lt;/strong&gt; &lt;strong&gt;stopped&lt;/strong&gt; et cetera.&lt;/p&gt;

&lt;p&gt;When a &lt;strong&gt;VM&lt;/strong&gt; is &lt;strong&gt;deallocated&lt;/strong&gt;, you don't pay for it; you just pay for the &lt;strong&gt;storage disks&lt;/strong&gt; which preserve the state (a tiny fraction of the cost). This means that you can start it up in half the time and you don't have to install software on it afterwards, as everything is ready on your disk. The same goes for other resources and for more auto-scaling cloud-native services (&lt;strong&gt;Azure SQL serverless&lt;/strong&gt;, &lt;strong&gt;Cosmos DB&lt;/strong&gt; et cetera).&lt;/p&gt;

&lt;p&gt;You can still have capacity problems and you can solve them in the same way as with the previous approach (&lt;strong&gt;capacity reservations&lt;/strong&gt;). The main difference is that this approach will be faster: you only have to start your environment, not provision and prepare it, which can take 50% or more of the recovery time.&lt;/p&gt;

&lt;p&gt;This approach seems better than the &lt;strong&gt;IaC/CD&lt;/strong&gt; one, but has a significant downside:&lt;/p&gt;

&lt;p&gt;You still have to keep your environment up to date!&lt;/p&gt;

&lt;p&gt;Your applications and the &lt;strong&gt;OSs&lt;/strong&gt; are already deployed into the &lt;strong&gt;storage disks&lt;/strong&gt; or hosting platforms; you need to have a frequent (possibly automated) process for updating also your inactive environment in case it is needed.&lt;/p&gt;

&lt;p&gt;Imagine you install &lt;strong&gt;Windows Server 2016&lt;/strong&gt; with the 2017/11 security patches and on 2017/12/14 a dangerous new security bug surfaces. You do not upgrade the system as it is in hibernation; then you need to fail over in 2018/01 and you end up with a vulnerable environment that could be hacked during the fail-over period.&lt;/p&gt;

&lt;h4&gt;
  
  
  Warm standby environment
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcvw6y5jfxmgoz7l4lyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcvw6y5jfxmgoz7l4lyb.png" alt="Warm standby"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This strategy bears similar/equal costs to a fully replicated production environment.&lt;/p&gt;

&lt;p&gt;You have a second production environment, you keep it up to date with applications, configuration and &lt;strong&gt;OS&lt;/strong&gt; patches, and you treat it as if it were production; the difference?&lt;/p&gt;

&lt;p&gt;This environment does not process any data, does not actively run any task or interact with any user. It is just there, burning money.&lt;/p&gt;

&lt;p&gt;Why would you do that? To have an almost-instantaneous fail-over. If the primary system has an outage, you can automate a &lt;strong&gt;DNS&lt;/strong&gt; change or &lt;strong&gt;load balancer&lt;/strong&gt; to immediately direct the traffic to the secondary environment. In &lt;strong&gt;Azure&lt;/strong&gt;, all the &lt;strong&gt;load balancers&lt;/strong&gt; (DNS/Layer 7/Layer 4) include &lt;strong&gt;health probes&lt;/strong&gt; to automatically fail-over to a secondary if a primary environment does not provide the expected response.&lt;/p&gt;
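
&lt;p&gt;Conceptually, probe-based fail-over boils down to "send traffic to the first environment that answers its health probe". The sketch below illustrates only that idea; it is not how any specific Azure load balancer is implemented, and the environment names and probe results are hypothetical.&lt;/p&gt;

```python
def first_healthy(environments, probe):
    """Return the first environment, in priority order, whose probe succeeds."""
    for environment in environments:
        if probe(environment):
            return environment
    raise RuntimeError("no healthy environment available")


# Hypothetical probe results; a real probe would issue an HTTP request
# and check for the expected response.
status = {"primary-west-europe": False, "standby-north-europe": True}

target = first_healthy(["primary-west-europe", "standby-north-europe"], status.get)
print(target)  # standby-north-europe
```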

&lt;p&gt;To save on costs, you could have a "smaller scale" &lt;strong&gt;warm standby environment&lt;/strong&gt; (fewer &lt;strong&gt;CPUs&lt;/strong&gt;, less &lt;strong&gt;RAM&lt;/strong&gt;, cheaper &lt;strong&gt;SKUs&lt;/strong&gt; for services et cetera); you will need your users to tolerate a slower experience in the "rare" event of outages.&lt;/p&gt;

&lt;h4&gt;
  
  
  Active/active
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qfu8g90gccty67sx7mw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qfu8g90gccty67sx7mw.png" alt="Active/active"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we look at the "state of the art" of &lt;strong&gt;BCDR&lt;/strong&gt; strategies.&lt;/p&gt;

&lt;p&gt;This is the same as the &lt;strong&gt;warm standby&lt;/strong&gt;, but with a small difference that has a massive impact: both environments are active production environments.&lt;/p&gt;

&lt;p&gt;This usually involves a &lt;strong&gt;load balancer&lt;/strong&gt; at the ingress of the system that distributes the load to both environments, usually in a &lt;strong&gt;round robin&lt;/strong&gt; fashion, but can also be more sophisticated and distribute load based on geographic latency and, in case of an outage in one region, direct all the traffic to the only working environment.&lt;/p&gt;

&lt;p&gt;The massive difference between this and the previous option is that you can now make the most of both environments. Theoretically you could reduce performance (and costs) of both environments to 50%, but also send half of the users to one and the other half to the other environment keeping response times consistent. Most of the cloud-native services also offer no-downtime scaling, so you could scale up one environment in case of failure of the other one.&lt;/p&gt;

&lt;p&gt;Why do people still use &lt;strong&gt;warm standby&lt;/strong&gt; (&lt;strong&gt;active/passive&lt;/strong&gt;) when they could do &lt;strong&gt;active/active&lt;/strong&gt;? In reality, applications (in particular legacy ones) are often &lt;strong&gt;stateful&lt;/strong&gt;: you cannot just handle one request on a server and the next one on a different server, as this can break existing applications or not-so-well-designed new ones. For this reason &lt;strong&gt;warm standby&lt;/strong&gt; is quite popular, and it often still requires careful planning, as switching over could be complicated and could corrupt the state; it often requires connection draining or remediation if the application does not support handling traffic on multiple hosts.&lt;/p&gt;

&lt;p&gt;This is truly what should be the goal for migrating legacy systems and the design for all new systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqhqa6n2g8c6yf98nee9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqhqa6n2g8c6yf98nee9.png" alt="Cartesian"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Active/active strategy in practice
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Active/active&lt;/strong&gt; is "easy" in theory; in practice, as discussed, it can be hard for stateful applications and single-tenanted solutions (where each user/group needs to have dedicated infrastructure).&lt;/p&gt;

&lt;p&gt;Start small: these ideas can apply to a whole solution or individually to parts of it.&lt;/p&gt;

&lt;p&gt;You could apply this strategy to part of the system or to new/rewritten components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.G.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You need to add a new background service to your application.&lt;/p&gt;

&lt;p&gt;Instead of running this on the same infrastructure, consider building a separate stateless microservice,&lt;/p&gt;

&lt;p&gt;add a &lt;strong&gt;load balancer&lt;/strong&gt; to distribute the tasks,&lt;/p&gt;

&lt;p&gt;think about concurrency when storing the results to a data store,&lt;/p&gt;

&lt;p&gt;use asynchronous message queues to send requests,&lt;/p&gt;

&lt;p&gt;create two queues one per region and add retries with fail-over to your application and distributed reads on the microservice,&lt;/p&gt;

&lt;p&gt;avoid strict ordering requirements et cetera.&lt;/p&gt;

&lt;p&gt;This is just a simple example and we could go on forever with best practices to build a resilient highly available solution.&lt;/p&gt;
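
&lt;p&gt;One piece of the example above, two regional queues with retries and fail-over, could be sketched as follows. This is a minimal illustration under stated assumptions: &lt;code&gt;RegionQueue&lt;/code&gt; is a hypothetical in-memory stand-in for a real message queue client, and the region names are placeholders.&lt;/p&gt;

```python
class RegionQueue:
    """Hypothetical in-memory stand-in for a regional message queue."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.messages = []

    def send(self, message):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.messages.append(message)


def send_with_failover(queues, message, retries_per_queue=2):
    """Try each regional queue in order, retrying before failing over."""
    for queue in queues:
        for _attempt in range(retries_per_queue):
            try:
                queue.send(message)
                return queue.name
            except ConnectionError:
                continue  # retry, then move on to the next region
    raise RuntimeError("message could not be delivered to any region")


primary = RegionQueue("west-europe", available=False)  # simulated outage
secondary = RegionQueue("north-europe")

used = send_with_failover([primary, secondary], "resize-image-42")
print(used)  # north-europe
```

&lt;p&gt;Because a message can end up in either queue, the consumer has to read from both and should not rely on strict ordering, which is why the list above recommends avoiding ordering requirements.&lt;/p&gt;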

&lt;h4&gt;
  
  
  Education
&lt;/h4&gt;

&lt;p&gt;Education is key to innovation; it is a culture that needs to be encouraged by leadership and built from the ground up in every single line of code.&lt;/p&gt;

&lt;p&gt;With strong guidance and a good enterprise skilling program, you can educate developers to build resilient systems in each piece of code and that would enable a good modern architecture for the whole system.&lt;/p&gt;

&lt;p&gt;Stateful applications cannot be made stateless with infrastructure and architecture, that needs to start at code level.&lt;/p&gt;

&lt;p&gt;How can you drive this? Hire talent with a growth mindset and nurture it by providing opportunities for learning in person, virtually and on demand, and by promoting cloud adoption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Learn&lt;/strong&gt;, &lt;strong&gt;LinkedIn learning&lt;/strong&gt;, &lt;strong&gt;Pluralsight&lt;/strong&gt; and so on; there are plenty of platforms with excellent material on stateless, cloud-native, modern architectures.&lt;/p&gt;

&lt;h6&gt;
  
  
  RTO/RPO image: "Graphic representation of RPO and RTO in case of an incident" by &lt;a href="https://commons.wikimedia.org/wiki/File:RPO_RTO_example_converted.png" rel="noopener noreferrer"&gt;Own work&lt;/a&gt; is licensed under CC BY-SA 4.0.
&lt;/h6&gt;

</description>
    </item>
    <item>
      <title>API Management and Azure Functions secured without secret keys leveraging managed identities (RSS of TV series' new episodes)</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Sat, 20 Nov 2021 11:57:20 +0000</pubDate>
      <link>https://dev.to/unosd/api-management-and-azure-functions-secured-without-secret-keys-leveraging-managed-identities-rss-of-tv-series-new-episodes-2gmn</link>
      <guid>https://dev.to/unosd/api-management-and-azure-functions-secured-without-secret-keys-leveraging-managed-identities-rss-of-tv-series-new-episodes-2gmn</guid>
      <description>&lt;p&gt;I wanted to keep up with new episodes of my favourite TV shows as they aired; being a big &lt;strong&gt;RSS feed&lt;/strong&gt; fan, I wanted to centralise all the news in one place and I built a simple &lt;strong&gt;API&lt;/strong&gt; that checks for new episodes and turns them into &lt;strong&gt;RSS XML&lt;/strong&gt; for my feed reader and hosted it on cheap &lt;strong&gt;Azure Functions&lt;/strong&gt; on &lt;strong&gt;consumption plan&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xYlrlad1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uerkhgu18j1pz58dfasd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xYlrlad1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uerkhgu18j1pz58dfasd.png" alt="Feed reader" width="880" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Link to the GitHub repository: &lt;a href="https://github.com/UnoSD/TvShowRss"&gt;TvShowRss&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WYEuCdP_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3ge1r9rieou3uxaawid8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WYEuCdP_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3ge1r9rieou3uxaawid8.png" alt="Architecture" width="811" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Securing with function keys
&lt;/h3&gt;

&lt;p&gt;Had I made the &lt;strong&gt;Azure Function&lt;/strong&gt; public, I would have opened my solution to abuse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.G.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Someone who knows the &lt;strong&gt;URL&lt;/strong&gt; could invoke my functions on a massive scale; given that &lt;strong&gt;Azure Functions&lt;/strong&gt; scales seamlessly with virtually no limit, I could end up with a colossal bill on my &lt;strong&gt;Azure subscription&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So I decided to add an &lt;strong&gt;API Management&lt;/strong&gt; instance on the consumption plan in front of the functions &lt;strong&gt;API&lt;/strong&gt; and set quotas, so only a certain number of executions would be allowed per minute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Q2a-EOS0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dmq35h1u2ekxnzjppnci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q2a-EOS0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dmq35h1u2ekxnzjppnci.png" alt="Rate limit" width="880" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also wanted to restrict access to my functions to the &lt;strong&gt;outbound IP ranges&lt;/strong&gt; of my &lt;strong&gt;APIM&lt;/strong&gt; instance, but the consumption plan does not have a specific outbound range, so I had to secure it only with function authorisation (for now).&lt;/p&gt;

&lt;p&gt;To secure the &lt;strong&gt;back-end&lt;/strong&gt; I initially restricted the functions with &lt;strong&gt;function keys&lt;/strong&gt;, so only people in possession of that key could invoke my functions.&lt;/p&gt;

&lt;p&gt;This well-known pattern carries a big burden: to make sure your solution is secure, you need to have a key rotation system in place, in case someone manages to steal that secret value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication policies
&lt;/h3&gt;

&lt;p&gt;Sometime in 2021, a new feature was introduced in &lt;strong&gt;API Management&lt;/strong&gt;, the authentication policies: &lt;a href="https://docs.microsoft.com/en-us/azure/api-management/api-management-authentication-policies"&gt;https://docs.microsoft.com/en-us/azure/api-management/api-management-authentication-policies&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this feature, I could leverage the &lt;strong&gt;managed identity&lt;/strong&gt; of the &lt;strong&gt;APIM&lt;/strong&gt; instance and get rid of the keys completely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Easy Auth
&lt;/h3&gt;

&lt;p&gt;Another amazing feature of &lt;strong&gt;App Services/Functions&lt;/strong&gt; is required: &lt;strong&gt;Easy Auth&lt;/strong&gt;. The name is not well known, and in the portal the feature is just called &lt;strong&gt;Authentication/Authorization&lt;/strong&gt;, which is quite a generic term and makes it harder to find in search engines. This is the link: &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/configure-authentication-provider-aad"&gt;https://docs.microsoft.com/en-us/azure/app-service/configure-authentication-provider-aad&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This feature takes the burden of auth away from your application code and implements it at &lt;strong&gt;PaaS&lt;/strong&gt; level.&lt;/p&gt;

&lt;p&gt;You just configure your app to block non-authenticated traffic and it does the rest for you (creates the &lt;strong&gt;AAD App Registration&lt;/strong&gt;, configures it, blocks anonymous access and checks the authentication token on calls):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KQRmeSvd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gugwsvr6qp0chul1qu3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KQRmeSvd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gugwsvr6qp0chul1qu3y.png" alt="Easy Auth" width="728" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6QAMy_yK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0anirh4pxcdv6m1yebdk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6QAMy_yK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0anirh4pxcdv6m1yebdk.png" alt="Easy Auth set up" width="880" height="894"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can set up your &lt;strong&gt;Functions&lt;/strong&gt; as the back-end of &lt;strong&gt;API Management&lt;/strong&gt; and add the &lt;code&gt;authentication-managed-identity&lt;/code&gt; policy (with its &lt;code&gt;resource&lt;/code&gt; attribute) to the inbound section; it will automatically add the &lt;code&gt;Authorization&lt;/code&gt; header to the requests and authenticate to your functions seamlessly, without stored secrets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9J5-F-me--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6k6lu2jugtlbwgbyruvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9J5-F-me--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6k6lu2jugtlbwgbyruvm.png" alt="Policy" width="880" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Unexpected problem
&lt;/h3&gt;

&lt;p&gt;I was confident that it would take only a few minutes to enable this and remove all the references to the old function key, but when I ran my test I got the following error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IDX10214: Audience validation failed. Audiences: ‘[PII is hidden]’. Did not match: validationParameters.ValidAudience: ‘[PII is hidden]’ or validationParameters.ValidAudiences: ‘[PII is hidden]’.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Helpful, isn't it?&lt;/p&gt;

&lt;p&gt;I kept looking at my audience in the &lt;strong&gt;JWT&lt;/strong&gt; token from the &lt;strong&gt;APIM&lt;/strong&gt; trace and at the allowed audiences in &lt;strong&gt;Easy Auth&lt;/strong&gt; and they were exactly the same!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;APIM trace&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D56Squ7G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yq3gflwwwu1mbqj4g6xe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D56Squ7G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yq3gflwwwu1mbqj4g6xe.png" alt="APIM trace" width="880" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JWT.MS inspection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c-Cj-Oul--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/97rz8f7b8x3xfy9ozdyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c-Cj-Oul--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/97rz8f7b8x3xfy9ozdyz.png" alt="Token" width="880" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Easy Auth config&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MBZ8WSNP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s9wykebbkp6f5bpgue61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MBZ8WSNP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s9wykebbkp6f5bpgue61.png" alt="Easy Auth" width="880" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OK, the data is obfuscated, but you can trust me on this one: the rest of the &lt;code&gt;27c65941-&lt;/code&gt; GUID matched everywhere.&lt;/p&gt;

&lt;p&gt;The hidden PII did not help, as I could not see the difference in the error message. If you look up how to allow PII to be shown in debug environments, you will find that it can be done by setting a property in code, but the &lt;strong&gt;Easy Auth&lt;/strong&gt; code is not something we can change.&lt;/p&gt;

&lt;p&gt;Then I stumbled across &lt;a href="http://www.mattruma.com/adventures-with-azure-doh-built-in-app-service-identity-provider-not-working/"&gt;this&lt;/a&gt; article from a fellow CSA and I figured it out!&lt;/p&gt;

&lt;p&gt;Despite the error message talking about the audience, the problem was with the issuer! If you look at the &lt;strong&gt;Easy Auth&lt;/strong&gt; configuration, it expects a &lt;code&gt;v2.0&lt;/code&gt; issuer, whereas the auto-generated &lt;strong&gt;AAD application&lt;/strong&gt; issues &lt;code&gt;v1.0&lt;/code&gt; tokens by default.&lt;/p&gt;

&lt;p&gt;So I changed that in the app manifest (setting &lt;code&gt;accessTokenAcceptedVersion&lt;/code&gt; to &lt;code&gt;2&lt;/code&gt;) et voilà... authentication worked and I was able to get rid of the secrets.&lt;/p&gt;
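
&lt;p&gt;When the error message hides the values, decoding the token yourself is a quick way to compare issuer and audience. The following is a minimal Python sketch, not part of the original solution: the function names are hypothetical, the signature is deliberately not verified (so it is for debugging only), and it assumes the usual Azure AD convention that v2.0 issuer URLs end in &lt;code&gt;/v2.0&lt;/code&gt; while v1.0 tokens come from &lt;code&gt;sts.windows.net&lt;/code&gt;.&lt;/p&gt;

```python
import base64
import json


def decode_jwt_claims(token):
    """Return the claims of a JWT as a dict, WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def issuer_version(claims):
    """Read the token version from 'ver', falling back to the issuer URL shape."""
    if "ver" in claims:
        return claims["ver"]
    issuer = claims.get("iss", "").rstrip("/")
    # Assumption: v2.0 issuers end in /v2.0, anything else is treated as v1.0.
    return "2.0" if issuer.endswith("/v2.0") else "1.0"
```

&lt;p&gt;Paste the token from the &lt;strong&gt;APIM&lt;/strong&gt; trace into &lt;code&gt;decode_jwt_claims&lt;/code&gt; and compare &lt;code&gt;iss&lt;/code&gt; and &lt;code&gt;aud&lt;/code&gt; with the issuer URL and allowed audiences configured in &lt;strong&gt;Easy Auth&lt;/strong&gt;; a version mismatch shows up immediately.&lt;/p&gt;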

&lt;h3&gt;
  
  
  Infrastructure as code
&lt;/h3&gt;

&lt;p&gt;OK, that's sorted, but it was time to script the changes and make sure my &lt;strong&gt;Pulumi&lt;/strong&gt; deployment created the &lt;strong&gt;app registration&lt;/strong&gt; with the correct configuration.&lt;/p&gt;

&lt;p&gt;I started typing in &lt;strong&gt;Visual Studio&lt;/strong&gt; in &lt;strong&gt;C#&lt;/strong&gt;, under the &lt;strong&gt;Application&lt;/strong&gt; definition: &lt;code&gt;AccessTokenAccep&amp;lt;ctrl+space&amp;gt;&amp;lt;ctrl+space&amp;gt;&amp;lt;ctrl+space&amp;gt;&amp;lt;ctrl+space&amp;gt;&lt;/code&gt; but no joy from IntelliSense...&lt;/p&gt;

&lt;p&gt;This property wasn't in the &lt;strong&gt;Pulumi&lt;/strong&gt; class! Which is odd, considering &lt;strong&gt;Pulumi&lt;/strong&gt; auto-generates code from the &lt;strong&gt;REST API&lt;/strong&gt; (I presume also for the &lt;strong&gt;Azure AD&lt;/strong&gt; module now).&lt;/p&gt;

&lt;p&gt;So I looked it up in the &lt;strong&gt;Azure AD application REST API&lt;/strong&gt; docs and... nothing. It turns out (see &lt;a href="https://stackoverflow.com/questions/57826919/azure-ad-how-to-set-app-manifest-properties-programatically-accesstokenaccept"&gt;here&lt;/a&gt;) that it was only available on the beta API.&lt;/p&gt;

&lt;p&gt;So I had to include the &lt;strong&gt;REST API&lt;/strong&gt; call in &lt;strong&gt;Pulumi&lt;/strong&gt; after the creation of the &lt;strong&gt;app registration&lt;/strong&gt;, using a workaround to force the call; the workaround is in my code, but I will not discuss it in this article (see code &lt;a href="https://github.com/UnoSD/TvShowRss/blob/a1bad97ba1d5defa205cfeb985472a32d8a18c34/Infrastructure/Program.cs#L221"&gt;here&lt;/a&gt;).&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Leveraging Logic Apps to prevent over-provisioning owner access to subscriptions</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Thu, 11 Nov 2021 08:06:51 +0000</pubDate>
      <link>https://dev.to/unosd/leveraging-logic-apps-to-prevent-over-provisioning-owner-access-to-subscriptions-36aa</link>
      <guid>https://dev.to/unosd/leveraging-logic-apps-to-prevent-over-provisioning-owner-access-to-subscriptions-36aa</guid>
      <description>&lt;p&gt;It often happens that agility and freedom conflict with security.&lt;/p&gt;

&lt;h4&gt;
  
  
  (Aaron Paul voice) Has this ever happened to you?
&lt;/h4&gt;

&lt;p&gt;Have you ever had developer teams request ownership of a full &lt;strong&gt;subscription&lt;/strong&gt; to be able to freely experiment?&lt;/p&gt;

&lt;p&gt;You still want to keep isolation, segregate responsibilities and permissions.&lt;/p&gt;

&lt;p&gt;Ability to experiment freely is paramount to innovation, but uncontrolled proliferation of &lt;strong&gt;subscriptions&lt;/strong&gt; can bear a significant management overhead.&lt;/p&gt;

&lt;p&gt;Can we have the best of both worlds? The short answer is: yes.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;resource group&lt;/strong&gt; can be an effective boundary, as it still allows its &lt;a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#contributor"&gt;contributors&lt;/a&gt; to create any resource while restricting the scope of access within a &lt;strong&gt;subscription&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can also enforce &lt;strong&gt;tags&lt;/strong&gt; and &lt;strong&gt;Azure policies&lt;/strong&gt; to control costs and enforce security.&lt;/p&gt;

&lt;p&gt;But now a team is restricted to creating resources within a single &lt;strong&gt;resource group&lt;/strong&gt;, which can get messy quite quickly and, permission-wise, is not very granular within teams.&lt;/p&gt;

&lt;p&gt;What if we could allow teams to create their own &lt;strong&gt;resource groups&lt;/strong&gt; within a subscription with &lt;strong&gt;contributor&lt;/strong&gt; access, without being able to read/write other &lt;strong&gt;resource groups&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;We can quickly set up a &lt;strong&gt;Logic App&lt;/strong&gt; to enable this. Orchestrating the creation of &lt;strong&gt;resource groups&lt;/strong&gt; and their role assignments within a single workflow enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invoking the &lt;strong&gt;Logic App&lt;/strong&gt; manually through a &lt;strong&gt;REST API&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Invoking from a &lt;strong&gt;DevOps&lt;/strong&gt; pipeline to create resources as part of a dev/test automated environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;Logic App&lt;/strong&gt; can create a &lt;strong&gt;resource group&lt;/strong&gt; and assign a certain &lt;strong&gt;contributor&lt;/strong&gt; based on the input payload from the &lt;strong&gt;HTTP trigger&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Setting up this workflow could lead to a security loophole: What if someone uses the name of an existing &lt;strong&gt;resource group&lt;/strong&gt; so the workflow grants access to other teams' resources? We need to make sure we address this concern when building our &lt;strong&gt;Logic App&lt;/strong&gt; as, usually, &lt;strong&gt;Azure management&lt;/strong&gt; operations are idempotent and the &lt;strong&gt;Logic App&lt;/strong&gt; won't fail if we pass the name of an existing &lt;strong&gt;resource group&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Logic App
&lt;/h3&gt;

&lt;p&gt;Let's have a look at the flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dSL5p2uf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uwdlqd640foct4sczm9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dSL5p2uf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uwdlqd640foct4sczm9t.png" alt="Image description" width="880" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The logic is pretty simple and most of the operations we require have a native &lt;strong&gt;connector&lt;/strong&gt;; the only missing one is "Create role assignment", but we can easily perform the operation by invoking the &lt;strong&gt;Azure REST API&lt;/strong&gt; and we do not have to worry about authentication as the &lt;strong&gt;managed identity&lt;/strong&gt; will do this for us.&lt;/p&gt;

&lt;p&gt;Now, you may have noticed that there is no condition stating: "If the group already exists, interrupt the flow". If you look at the picture above, though, you will see a red dotted line between two operations; this is because we changed the &lt;code&gt;Run after&lt;/code&gt; settings of our create operation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vu6ccGxX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vfdhcsqm4yriknx97a8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vu6ccGxX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vfdhcsqm4yriknx97a8r.png" alt="Image description" width="880" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XEuMOT0Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/st0paar2hit118lzziex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XEuMOT0Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/st0paar2hit118lzziex.png" alt="Image description" width="880" height="935"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the create operation (and therefore the role assignment after it) will only run if &lt;code&gt;Read resource group&lt;/code&gt; failed, meaning the group does not exist; this closes the loophole described above.&lt;/p&gt;
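
&lt;p&gt;The effect of that &lt;code&gt;Run after&lt;/code&gt; setting can be sketched in a few lines of plain Python. This is only an illustration of the control flow, not the Logic App definition: the function name is hypothetical and the dictionary is a stand-in for the subscription, not an Azure API.&lt;/p&gt;

```python
def provision_resource_group(existing_groups, name, principal_id):
    """Create the group and assign a contributor only when the name is unused.

    existing_groups is a plain dict (group name to list of contributor IDs)
    that stands in for the subscription in this sketch.
    """
    if name in existing_groups:
        # "Read a resource group" succeeded, so every later action is skipped.
        return "skipped"
    # "Read a resource group" failed: it is safe to create and assign.
    existing_groups[name] = [principal_id]
    return "created"


subscription = {"rg-team-a": ["alice-object-id"]}
first = provision_resource_group(subscription, "rg-test2", "bob-object-id")
second = provision_resource_group(subscription, "rg-test2", "mallory-object-id")
hijack = provision_resource_group(subscription, "rg-team-a", "mallory-object-id")
```

&lt;p&gt;The third call is the loophole attempt: because the group already exists, nothing is created and no new contributor is added. Keep in mind that a read can also fail for reasons other than a missing group (a transient error, for instance), so it is worth checking how the create operation behaves in that case.&lt;/p&gt;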

&lt;p&gt;Now, we can prevent people from having broad access, but our &lt;strong&gt;Logic App's managed identity&lt;/strong&gt; still requires &lt;strong&gt;owner&lt;/strong&gt; permissions on the &lt;strong&gt;subscription&lt;/strong&gt;; more secure, but we can do even better. Let's create a &lt;strong&gt;custom role&lt;/strong&gt; that has only enough permissions to read/create a &lt;strong&gt;resource group&lt;/strong&gt; and assign permissions to it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BUcgV2ms--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1fyeboby4irz9kfksngc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BUcgV2ms--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1fyeboby4irz9kfksngc.png" alt="Image description" width="880" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can now use the UI to create a &lt;strong&gt;custom role&lt;/strong&gt;, but we may also want to script it and define it in JSON format:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--177X8rOP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8uawl707bx43x38m6728.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--177X8rOP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8uawl707bx43x38m6728.png" alt="Image description" width="880" height="549"&gt;&lt;/a&gt;&lt;/p&gt;
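
&lt;p&gt;For readers who cannot see the screenshot, here is a hedged Python sketch that builds a comparable role definition and serialises it to JSON. The role name, the description and the exact list of actions are assumptions for illustration; the definition in the picture may differ.&lt;/p&gt;

```python
import json


def resource_group_provisioner_role(subscription_id):
    """Return a custom role definition limited to what the workflow needs."""
    return {
        "Name": "Resource Group Provisioner",  # hypothetical role name
        "IsCustom": True,
        "Description": "Read and create resource groups and assign roles on them.",
        "Actions": [
            "Microsoft.Resources/subscriptions/resourceGroups/read",
            "Microsoft.Resources/subscriptions/resourceGroups/write",
            "Microsoft.Authorization/roleAssignments/write",
        ],
        "NotActions": [],
        "AssignableScopes": ["/subscriptions/" + subscription_id],
    }


# Placeholder subscription ID; replace it with your own.
role = resource_group_provisioner_role("00000000-0000-0000-0000-000000000000")
role_json = json.dumps(role, indent=4)
```

&lt;p&gt;The resulting JSON can be saved to a file and, for example, passed to &lt;code&gt;az role definition create --role-definition&lt;/code&gt;.&lt;/p&gt;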

&lt;p&gt;Let's now assign that to the &lt;strong&gt;managed identity&lt;/strong&gt; of our &lt;strong&gt;Logic App&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6_UJ3mvu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qa5aslbmf8h4fedpdtlt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6_UJ3mvu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qa5aslbmf8h4fedpdtlt.png" alt="Image description" width="880" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we should have all permissions in place. If you also want to use a &lt;strong&gt;security group&lt;/strong&gt;, your &lt;strong&gt;Logic App&lt;/strong&gt; identity may also require &lt;code&gt;Directory.Read.All&lt;/code&gt; permissions on your &lt;strong&gt;Azure AD&lt;/strong&gt; instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating role assignment
&lt;/h3&gt;

&lt;p&gt;I mentioned above that all the other actions can be performed with native &lt;strong&gt;Logic App connectors&lt;/strong&gt;, but the &lt;strong&gt;role assignment&lt;/strong&gt;, at the time of writing, requires the &lt;strong&gt;HTTP&lt;/strong&gt; connector to invoke the &lt;strong&gt;Azure REST API&lt;/strong&gt;; let's have a look at that:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ntsw49Ff--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ntr7n35qcj9uv9oe65ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ntsw49Ff--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ntr7n35qcj9uv9oe65ki.png" alt="Image description" width="880" height="905"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even if we cannot do it with an idiomatic &lt;strong&gt;Logic App&lt;/strong&gt; connector, it is still pretty simple.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;API&lt;/strong&gt; documentation: &lt;a href="https://docs.microsoft.com/en-us/rest/api/authorization/role-assignments/create"&gt;https://docs.microsoft.com/en-us/rest/api/authorization/role-assignments/create&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We just need to set the right method (&lt;code&gt;PUT&lt;/code&gt;) and the correct &lt;strong&gt;URL&lt;/strong&gt;, using the &lt;strong&gt;resource group ID&lt;/strong&gt; from the output of the previous &lt;strong&gt;connector&lt;/strong&gt; as scope; we can auto-generate a random &lt;strong&gt;GUID&lt;/strong&gt; as the name of the assignment using &lt;strong&gt;Logic App&lt;/strong&gt; expressions (&lt;code&gt;guid()&lt;/code&gt;). The body must contain the &lt;code&gt;role definition ID&lt;/code&gt;, which needs to be the built-in &lt;code&gt;contributor&lt;/code&gt; &lt;strong&gt;GUID&lt;/strong&gt; under our &lt;strong&gt;subscription ID&lt;/strong&gt;; I have used a variable for that to improve clarity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cj2eu-w1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z827fnwr7rq3c2cel1xh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cj2eu-w1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z827fnwr7rq3c2cel1xh.png" alt="Image description" width="880" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;subscription ID&lt;/strong&gt; is one of the inputs of our workflow and the hardcoded &lt;strong&gt;GUID&lt;/strong&gt; can be found here: &lt;a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#contributor"&gt;Contributor&lt;/a&gt;&lt;/p&gt;
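
&lt;p&gt;To make the shape of that request concrete, here is a minimal Python sketch that assembles the method, URL and body without sending anything. The function name and the &lt;code&gt;api-version&lt;/code&gt; value are assumptions (check the API documentation for the version to use), and the scope is rebuilt from the subscription ID and group name rather than taken from a previous connector's output.&lt;/p&gt;

```python
import uuid

# Built-in Contributor role definition GUID, from the Azure built-in roles page.
CONTRIBUTOR_ROLE_GUID = "b24988ac-6180-42a0-ab88-20f7382dd24c"
# Assumption: use whichever api-version the role assignments docs currently list.
API_VERSION = "2022-04-01"


def build_role_assignment_request(subscription_id, resource_group_name, principal_id):
    """Return (method, url, body) for the role assignment call; nothing is sent."""
    scope = "/subscriptions/" + subscription_id + "/resourceGroups/" + resource_group_name
    # Equivalent of the guid() Logic App expression: a fresh name per assignment.
    assignment_name = str(uuid.uuid4())
    url = (
        "https://management.azure.com" + scope
        + "/providers/Microsoft.Authorization/roleAssignments/" + assignment_name
        + "?api-version=" + API_VERSION
    )
    body = {
        "properties": {
            "roleDefinitionId": (
                "/subscriptions/" + subscription_id
                + "/providers/Microsoft.Authorization/roleDefinitions/"
                + CONTRIBUTOR_ROLE_GUID
            ),
            "principalId": principal_id,
        }
    }
    return "PUT", url, body


# Placeholder IDs for illustration only.
method, url, body = build_role_assignment_request(
    "00000000-0000-0000-0000-000000000000",
    "rg-test2",
    "11111111-1111-1111-1111-111111111111",
)
```

&lt;p&gt;In the &lt;strong&gt;Logic App&lt;/strong&gt; itself the &lt;strong&gt;managed identity&lt;/strong&gt; takes care of the &lt;code&gt;Authorization&lt;/code&gt; header, which is why the sketch does not deal with tokens.&lt;/p&gt;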

&lt;p&gt;To get values as input from the workflow invocation, we need to set the input &lt;strong&gt;JSON schema&lt;/strong&gt; in the &lt;strong&gt;HTTP trigger&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OOjWsN59--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y561ryxxpqdaxcepd2b8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OOjWsN59--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y561ryxxpqdaxcepd2b8.png" alt="Image description" width="880" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;principalId&lt;/code&gt;: the object ID of the assignee of the &lt;strong&gt;contributor&lt;/strong&gt; role; this can be found by looking up the user in &lt;strong&gt;Azure AD&lt;/strong&gt; from the portal&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;resourceGroupLocation&lt;/code&gt;, &lt;code&gt;resourceGroupName&lt;/code&gt;, &lt;code&gt;subscriptionId&lt;/code&gt;: self-explanatory arguments
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"principalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"resourceGroupLocation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"resourceGroupName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"subscriptionId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After adding the schema above to the trigger, those values will be available as variables in the rest of the workflow.&lt;/p&gt;
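
&lt;p&gt;Callers may want to check a payload before invoking the workflow. The sketch below is a hypothetical client-side helper in Python, not part of the &lt;strong&gt;Logic App&lt;/strong&gt;; note that it is stricter than the schema above, which declares the property types but does not mark any of them as required.&lt;/p&gt;

```python
REQUIRED_FIELDS = (
    "principalId",
    "resourceGroupLocation",
    "resourceGroupName",
    "subscriptionId",
)


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in payload:
            problems.append("missing: " + field)
        elif not isinstance(payload[field], str):
            problems.append("not a string: " + field)
    return problems


# Placeholder values for illustration only.
good = {
    "subscriptionId": "00000000-0000-0000-0000-000000000000",
    "resourceGroupName": "rg-test2",
    "resourceGroupLocation": "West Europe",
    "principalId": "11111111-1111-1111-1111-111111111111",
}
bad = {"resourceGroupName": 42}
```

&lt;p&gt;&lt;code&gt;validate_payload(good)&lt;/code&gt; returns an empty list, while &lt;code&gt;validate_payload(bad)&lt;/code&gt; reports the three missing fields and the non-string group name.&lt;/p&gt;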

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;All that is left now is to test; let's run our workflow with the following input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subscriptionId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;Target subscription ID to create groups&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"resourceGroupName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rg-test2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"resourceGroupLocation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"West Europe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"principalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;Object ID GUID of your user from AAD&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is what happens, assuming &lt;code&gt;rg-test2&lt;/code&gt; does not exist:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5HI7lALA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1qr20p3alwm0dfu1o94c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5HI7lALA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1qr20p3alwm0dfu1o94c.png" alt="Image description" width="880" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looks good, all the steps we wanted to run were successful.&lt;/p&gt;

&lt;p&gt;Now, let's try and run this again with the same inputs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bd7gIVRS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tcqqsn78yxylzomr6y16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bd7gIVRS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tcqqsn78yxylzomr6y16.png" alt="Image description" width="880" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OK, as you can see from the grey circles next to the actions below &lt;code&gt;Read a resource group&lt;/code&gt;, none of the other operations were performed, exactly as expected.&lt;/p&gt;

&lt;p&gt;Now let's have a look at our newly created &lt;strong&gt;resource group IAM blade&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3neqNb13--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4blcaco3h5f91gdrqitf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3neqNb13--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4blcaco3h5f91gdrqitf.png" alt="Image description" width="880" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Exactly what we wanted: a new &lt;strong&gt;resource group&lt;/strong&gt; on which I am a &lt;strong&gt;contributor&lt;/strong&gt;, without requiring any permission on the &lt;strong&gt;subscription&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The full template for the &lt;strong&gt;Logic App&lt;/strong&gt; is available on my &lt;a href="https://github.com/UnoSD/ResourceGroupsLogicApp"&gt;GitHub&lt;/a&gt;, so you can skip the 5 minutes it took me to create it and enable this security feature for your teams straight away.&lt;/p&gt;

&lt;p&gt;You can also enhance the security of the &lt;strong&gt;Logic App&lt;/strong&gt; and prevent unauthorised users from calling it by fronting it with &lt;strong&gt;API Management&lt;/strong&gt;, by using &lt;strong&gt;Azure Active Directory Authorization Policies&lt;/strong&gt; on the &lt;strong&gt;Logic App&lt;/strong&gt; itself, or by combining the two.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>logicapps</category>
      <category>security</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>No-rewrite free TLS offloading, WAF and more for legacy web applications with Azure Front Door</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Fri, 05 Nov 2021 15:27:29 +0000</pubDate>
      <link>https://dev.to/unosd/no-rewrite-free-tls-offloading-waf-and-more-for-legacy-web-applications-with-azure-front-door-21lc</link>
      <guid>https://dev.to/unosd/no-rewrite-free-tls-offloading-waf-and-more-for-legacy-web-applications-with-azure-front-door-21lc</guid>
      <description>&lt;p&gt;You have one or more legacy web applications running on virtual machines and no time or resources to rewrite them and implement security; you can get TLS offloading, a WAF and more purely at the infrastructure level by leveraging &lt;strong&gt;Azure Front Door&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There are two flavours we are going to explore:&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;Front Door&lt;/strong&gt; forwarding requests to your VM on its public IP&lt;br&gt;
2) &lt;strong&gt;Front Door Premium&lt;/strong&gt; (preview at the time of writing) with private origins forwarding requests directly into your &lt;strong&gt;VNET&lt;/strong&gt; without public internet exposure of the machine&lt;/p&gt;
&lt;h3&gt;
  
  
  Azure Front Door forwarding requests to your VM's public IP
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipgst3wuvz0t3hdv0itj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipgst3wuvz0t3hdv0itj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this scenario you point the &lt;strong&gt;backend&lt;/strong&gt; of &lt;strong&gt;Azure Front Door&lt;/strong&gt; directly at the &lt;strong&gt;public IP&lt;/strong&gt; attached to the &lt;strong&gt;VM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This has the following advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with &lt;strong&gt;Azure Front Door&lt;/strong&gt; classic (GA)&lt;/li&gt;
&lt;li&gt;Simpler infrastructure and configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach has the downside of being &lt;strong&gt;less&lt;/strong&gt; secure. Using an &lt;strong&gt;NSG&lt;/strong&gt; on the &lt;strong&gt;VM&lt;/strong&gt; (at &lt;strong&gt;subnet&lt;/strong&gt; or &lt;strong&gt;NIC&lt;/strong&gt; level), you can still ensure that traffic originating from the internet never hits your &lt;strong&gt;VM&lt;/strong&gt; by allowing only traffic from the &lt;strong&gt;Azure Front Door service tag&lt;/strong&gt;; any other source is blocked. Two pitfalls remain:&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;DDoS&lt;/strong&gt; attacks could still block access to the &lt;strong&gt;VM&lt;/strong&gt; (although they can be mitigated by adding standard &lt;strong&gt;DDoS&lt;/strong&gt; protection for &lt;strong&gt;VNETs&lt;/strong&gt;)&lt;br&gt;
2) A sophisticated and knowledgeable attacker (perhaps an internal agent) who manages to find the &lt;strong&gt;public IP&lt;/strong&gt; of the &lt;strong&gt;VM&lt;/strong&gt; could spin up their own &lt;strong&gt;Front Door&lt;/strong&gt; instance and point it to your &lt;strong&gt;public IP&lt;/strong&gt;, bypassing the security of your existing &lt;strong&gt;Front Door&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those threats can still be mitigated; we will not explore the details in this article, but at a high level:&lt;/p&gt;

&lt;p&gt;You can filter incoming traffic by checking the &lt;code&gt;X-Azure-FDID&lt;/code&gt; header that &lt;strong&gt;Front Door&lt;/strong&gt; adds to the forwarded requests with its unique ID; using that and the &lt;strong&gt;service tag&lt;/strong&gt; will ensure that the traffic is coming only from your &lt;strong&gt;Front Door&lt;/strong&gt;.&lt;/p&gt;
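
&lt;p&gt;As a rough illustration of that header check, here is a minimal Python sketch. It assumes the check runs somewhere you control in front of the application (a reverse proxy or a small middleware); the function name &lt;code&gt;is_from_my_front_door&lt;/code&gt; and the placeholder ID are hypothetical, and the real ID must be copied from your own &lt;strong&gt;Front Door&lt;/strong&gt; instance.&lt;/p&gt;

```python
import hmac

# Hypothetical placeholder: copy the real ID from your own Front Door instance.
EXPECTED_FRONT_DOOR_ID = "00000000-0000-0000-0000-000000000000"


def is_from_my_front_door(headers, expected_id=EXPECTED_FRONT_DOOR_ID):
    """Return True only when X-Azure-FDID matches the expected Front Door ID."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    received = normalised.get("x-azure-fdid", "").strip().lower()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(received.encode(), expected_id.lower().encode())
```

&lt;p&gt;A request without the header, or with the ID of someone else's &lt;strong&gt;Front Door&lt;/strong&gt;, is rejected; the &lt;strong&gt;service tag&lt;/strong&gt; rule on the &lt;strong&gt;NSG&lt;/strong&gt; is still needed, because the header alone can be forged by anyone who can reach the &lt;strong&gt;VM&lt;/strong&gt; directly.&lt;/p&gt;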

&lt;p&gt;If you still do not want to make any application changes, you can add an instance of &lt;strong&gt;Application Gateway&lt;/strong&gt; and let it do this filtering for you before forwarding the traffic to your &lt;strong&gt;VM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you go through a threat modelling exercise and decide this risk still needs mitigation, you can go for the second option below.&lt;/p&gt;
&lt;h3&gt;
  
  
  Azure Front Door Premium forwarding requests to your VM's internal IP via Private Link
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkwd8a9pgdlo5gr5h850.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkwd8a9pgdlo5gr5h850.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach significantly improves your security posture, but comes at a higher cost (&lt;strong&gt;Front Door Premium&lt;/strong&gt; is required for private origins support) and with a slightly more complex architecture and configuration (it requires extra components such as a &lt;strong&gt;Standard Load Balancer&lt;/strong&gt;, which also adds to the costs, and &lt;strong&gt;Private Link&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;Using this approach, you can get rid of the &lt;strong&gt;public IP&lt;/strong&gt; completely and project your &lt;strong&gt;Front Door&lt;/strong&gt; instance directly into your &lt;strong&gt;VNET&lt;/strong&gt;, eliminating the attack vectors of &lt;strong&gt;DDoS&lt;/strong&gt; and &lt;strong&gt;Front Door hijacking&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Without the IP, an attacker has no endpoint to use besides &lt;strong&gt;Front Door&lt;/strong&gt;, which carries enhanced security and can optionally be set up with a &lt;strong&gt;WAF&lt;/strong&gt; (Web Application Firewall), mitigating the most common attacks with several security rules.&lt;/p&gt;

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No public endpoint for the &lt;strong&gt;VM&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No &lt;strong&gt;DDoS&lt;/strong&gt; risk&lt;/li&gt;
&lt;li&gt;No &lt;strong&gt;Front Door hijacking&lt;/strong&gt; attack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both options enable your application to leverage advanced security without application changes. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free &lt;strong&gt;TLS&lt;/strong&gt; certificates with &lt;strong&gt;Azure Front Door&lt;/strong&gt; also on your &lt;strong&gt;custom domains&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAF&lt;/strong&gt; capabilities (protect against &lt;strong&gt;OWASP&lt;/strong&gt; common attacks and more)&lt;/li&gt;
&lt;li&gt;Dynamic site acceleration&lt;/li&gt;
&lt;li&gt;Enables &lt;strong&gt;HTTP/2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Global &lt;strong&gt;HA&lt;/strong&gt; with load balancing if you have multiple instances of your application&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Multiple instances with path-based routing
&lt;/h3&gt;

&lt;p&gt;Now, let's pretend you love this, but have multiple apps to secure behind &lt;strong&gt;Azure Front Door&lt;/strong&gt; and want to use a single domain in front of those applications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mydomain.com/webapp1
mydomain.com/webapp2
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can easily set up routing rules to forward the traffic to the right &lt;strong&gt;backend&lt;/strong&gt; depending on the first &lt;strong&gt;URL&lt;/strong&gt; path segment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozeutd37bx4uuq2stxwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozeutd37bx4uuq2stxwz.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You just set up multiple &lt;strong&gt;backend pools&lt;/strong&gt; in &lt;strong&gt;Front Door&lt;/strong&gt; and create &lt;strong&gt;routing rules&lt;/strong&gt; to forward the traffic whose path starts with &lt;code&gt;/webapp1/&lt;/code&gt; to backend 1, and so on for the other applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xpw221hjs3xm27d7z1d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xpw221hjs3xm27d7z1d.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming again that you do not want to make any changes to your web application, you can also use the rules to rewrite the &lt;strong&gt;URL&lt;/strong&gt; to avoid forwarding the &lt;code&gt;/webapp1/&lt;/code&gt; and just keep the rest of the &lt;strong&gt;URL&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz98ffk3na5476jrmana.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz98ffk3na5476jrmana.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So your application does not need to be aware of &lt;strong&gt;Front Door&lt;/strong&gt; at all.&lt;/p&gt;
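
&lt;p&gt;To make the rewrite concrete, the sketch below mimics in Python what such a rule does to the request path. It only illustrates the behaviour and is not &lt;strong&gt;Front Door&lt;/strong&gt; configuration; the function name &lt;code&gt;rewrite_path&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def rewrite_path(path, prefix="/webapp1"):
    """Strip the routing prefix so the backend sees its original URLs."""
    if path == prefix:
        # A bare /webapp1 maps to the root of the backend application.
        return "/"
    if path.startswith(prefix + "/"):
        # Keep everything after the prefix, including the leading slash.
        return path[len(prefix):]
    # Paths that do not carry the prefix are left untouched.
    return path
```

&lt;p&gt;With this logic, &lt;code&gt;/webapp1/orders/42&lt;/code&gt; reaches the backend as &lt;code&gt;/orders/42&lt;/code&gt;, while an unrelated path such as &lt;code&gt;/webapp10/x&lt;/code&gt; is not affected.&lt;/p&gt;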

&lt;p&gt;What if your application, unaware of the prefix, then redirects the user to a root-relative &lt;code&gt;/*&lt;/code&gt; endpoint?&lt;/p&gt;

&lt;p&gt;Imagine that your application calls &lt;code&gt;/api/dosomething&lt;/code&gt; or &lt;code&gt;/login&lt;/code&gt;: those requests do not preserve the &lt;code&gt;/webapp1&lt;/code&gt; prefix, hence &lt;strong&gt;Front Door&lt;/strong&gt; will not know where to direct the traffic.&lt;/p&gt;

&lt;p&gt;Once again, we can have a pure infrastructure solution, without application changes.&lt;/p&gt;

&lt;p&gt;The idea is this: on the first call to &lt;code&gt;/webapp1&lt;/code&gt;, &lt;strong&gt;Azure Front Door&lt;/strong&gt; alters the response and adds a &lt;code&gt;Set-Cookie&lt;/code&gt; header with a value unique to the &lt;code&gt;web app 1&lt;/code&gt; backend. On subsequent calls, the browser includes that &lt;strong&gt;cookie&lt;/strong&gt;, and the &lt;strong&gt;Front Door rules engine&lt;/strong&gt; can override the routing configuration based on the content of the &lt;code&gt;Cookie&lt;/code&gt; request header, forwarding &lt;code&gt;/*&lt;/code&gt; traffic to the backend behind &lt;code&gt;/webappX/*&lt;/code&gt;. See the configuration below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyddbnhan0jpajpy9pz2l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyddbnhan0jpajpy9pz2l.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sets the &lt;strong&gt;cookie&lt;/strong&gt; identifying the correct &lt;strong&gt;backend&lt;/strong&gt; on the initial call; the full header value is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;BACKEND=backendIdentified; Domain=mydomain.com; Path=/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Path&lt;/code&gt; attribute must be specified: if it is omitted, the browser scopes the &lt;strong&gt;cookie&lt;/strong&gt; to the path of the request that set it, &lt;code&gt;mydomain.com/webapp1/*&lt;/code&gt;, and does not send it on requests to &lt;code&gt;mydomain.com/*&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now the rule to override the behaviour on calls to &lt;code&gt;mydomain.com/*&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cah9x5eq779ssxh22w7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cah9x5eq779ssxh22w7.png" alt="Image description"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Finally, associate the &lt;strong&gt;rules engine&lt;/strong&gt; rules with your routing configuration and you are done.&lt;/p&gt;

&lt;p&gt;You can optionally set up an error page &lt;strong&gt;backend&lt;/strong&gt; for when the &lt;strong&gt;cookie&lt;/strong&gt; is not present and someone requests &lt;code&gt;mydomain.com&lt;/code&gt; directly.&lt;/p&gt;
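
&lt;p&gt;The whole decision can be summarised with a small Python simulation. This is only a sketch of the logic, not &lt;strong&gt;Front Door&lt;/strong&gt; configuration: the backend names, the &lt;code&gt;error-page&lt;/code&gt; fallback and the &lt;code&gt;route&lt;/code&gt; function are hypothetical, while the cookie name and attributes follow the header value shown earlier.&lt;/p&gt;

```python
from http.cookies import SimpleCookie

# Hypothetical mapping of URL prefixes to backend pool names.
PREFIX_TO_BACKEND = {"/webapp1": "backend1", "/webapp2": "backend2"}
COOKIE_NAME = "BACKEND"


def route(path, cookie_header=""):
    """Return (backend, set_cookie); set_cookie is None unless a prefix matched."""
    for prefix, backend in PREFIX_TO_BACKEND.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Prefixed call: pin the browser to this backend for later /* requests.
            set_cookie = f"{COOKIE_NAME}={backend}; Domain=mydomain.com; Path=/"
            return backend, set_cookie
    # No prefix: fall back to the cookie the browser sends back.
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(COOKIE_NAME)
    if morsel is not None and morsel.value in PREFIX_TO_BACKEND.values():
        return morsel.value, None
    # No prefix and no usable cookie: serve the optional error page backend.
    return "error-page", None
```

&lt;p&gt;A first request to &lt;code&gt;/webapp1/home&lt;/code&gt; selects the first backend and returns the &lt;code&gt;Set-Cookie&lt;/code&gt; value; a later request to &lt;code&gt;/login&lt;/code&gt; carrying that cookie lands on the same backend, and a request with neither prefix nor cookie falls through to the error page.&lt;/p&gt;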

&lt;p&gt;Et voilà, with a few simple infrastructure changes, you have seamlessly added free &lt;strong&gt;TLS&lt;/strong&gt;, a &lt;strong&gt;WAF&lt;/strong&gt; and more to your legacy web applications without changing a single line of code.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>frontdoor</category>
      <category>tls</category>
      <category>waf</category>
    </item>
    <item>
      <title>Virtual machine scale sets flexible orchestration mode (and benefits over regular VMs)</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Wed, 11 Aug 2021 09:25:56 +0000</pubDate>
      <link>https://dev.to/unosd/virtual-machine-scale-sets-flexible-orchestration-mode-and-benefits-over-regular-vms-266b</link>
      <guid>https://dev.to/unosd/virtual-machine-scale-sets-flexible-orchestration-mode-and-benefits-over-regular-vms-266b</guid>
      <description>&lt;h1&gt;
  
  
  Introduction and VMSS benefits
&lt;/h1&gt;

&lt;p&gt;When we close our eyes and we try and picture "the cloud", two quintessential &lt;strong&gt;IaaS&lt;/strong&gt; services come to mind: &lt;strong&gt;Virtual machines&lt;/strong&gt; and &lt;strong&gt;Virtual machine scale sets&lt;/strong&gt; (in &lt;strong&gt;Azure&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;Historically, &lt;strong&gt;VMs&lt;/strong&gt; and &lt;strong&gt;VMSSs&lt;/strong&gt; are the first iteration of our cloud migration journey; a &lt;strong&gt;VM&lt;/strong&gt; facilitates the &lt;strong&gt;lift-and-shift&lt;/strong&gt; pattern and a &lt;strong&gt;VMSS&lt;/strong&gt; is our first step towards leveraging cloud scaling for our workload.&lt;/p&gt;

&lt;p&gt;A traditional &lt;strong&gt;VMSS&lt;/strong&gt; sacrifices control in favour of simpler deployment, simpler management, faster recovery and faster horizontal scaling.&lt;/p&gt;

&lt;p&gt;You define an image and the service stamps out instances of that blueprint on demand; it can scale automatically based on usage and replace instances on failure or on updates, all on your behalf.&lt;/p&gt;

&lt;h1&gt;
  
  
  Downsides of VMSS
&lt;/h1&gt;

&lt;p&gt;Those glorious benefits have some drawbacks:&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;VMSS API&lt;/strong&gt; diverges from the standard &lt;strong&gt;VM API&lt;/strong&gt; for individual instances&lt;br&gt;
2) Lack of &lt;strong&gt;RBAC&lt;/strong&gt; granular permissions per &lt;strong&gt;VM&lt;/strong&gt;&lt;br&gt;
3) Lack of &lt;strong&gt;Azure Site Recovery&lt;/strong&gt; and &lt;strong&gt;Azure Backup&lt;/strong&gt; support&lt;/p&gt;

&lt;p&gt;In addition, there are two different patterns for handling &lt;strong&gt;VM&lt;/strong&gt; availability, which makes the experience inconsistent between &lt;strong&gt;VMSSs&lt;/strong&gt; and &lt;strong&gt;VMs&lt;/strong&gt;: for datacenter redundancy, individual &lt;strong&gt;VMs&lt;/strong&gt; must use availability sets, whereas &lt;strong&gt;VMSSs&lt;/strong&gt; have native support for distribution across fault and update domains.&lt;/p&gt;

&lt;h1&gt;
  
  
  Enter VMSS flexible
&lt;/h1&gt;

&lt;p&gt;A new option for &lt;strong&gt;VMSS&lt;/strong&gt; has been created: the &lt;strong&gt;"orchestration mode"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The classic experience has been re-branded as &lt;strong&gt;uniform orchestration&lt;/strong&gt;, but is no different from the traditional &lt;strong&gt;VMSS&lt;/strong&gt; experience.&lt;/p&gt;

&lt;p&gt;The new option, still in preview at the time of writing, is called &lt;strong&gt;flexible orchestration&lt;/strong&gt; and it promises to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unify the experience: no more &lt;strong&gt;availability sets&lt;/strong&gt;, availability is handled in the same way for individual &lt;strong&gt;VMs&lt;/strong&gt; and &lt;strong&gt;VMSSs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Individual instances have full control with the same &lt;strong&gt;VM&lt;/strong&gt; API as regular &lt;strong&gt;VMs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VMs&lt;/strong&gt; can be added to &lt;strong&gt;VMSS flex&lt;/strong&gt; after creation&lt;/li&gt;
&lt;li&gt;Custom instance naming&lt;/li&gt;
&lt;li&gt;Target individual &lt;strong&gt;VMs&lt;/strong&gt; with extensions&lt;/li&gt;
&lt;li&gt;Assign machines to specific fault domains&lt;/li&gt;
&lt;li&gt;In-guest OS security patching (without re-imaging)&lt;/li&gt;
&lt;li&gt;Mix Windows and Linux in the same set&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why should I use that instead of regular VMs?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;VMSS flex&lt;/strong&gt; shines when managing a substantial number of &lt;strong&gt;VMs&lt;/strong&gt; (30+).&lt;/p&gt;

&lt;p&gt;It provides a single control plane for distribution of the machines across a datacenter with automatic and optimized spread.&lt;/p&gt;

&lt;p&gt;When dealing with many machines, managing availability by hand can be a significant overhead.&lt;/p&gt;

&lt;p&gt;It also supports template-based scaling like a &lt;strong&gt;uniform&lt;/strong&gt; scale set, should you need that, but I believe its main purpose is to unify the experience between &lt;strong&gt;VMs&lt;/strong&gt; and &lt;strong&gt;VMSSs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It supports up to 1000 instances spread across &lt;strong&gt;fault domains&lt;/strong&gt;, whereas &lt;strong&gt;uniform&lt;/strong&gt; supports only up to 100. &lt;strong&gt;Fault domains&lt;/strong&gt; are also treated the same as &lt;strong&gt;update domains&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There are obvious advantages (and some cons) over a &lt;strong&gt;uniform&lt;/strong&gt; mode, including having different machine sizes and other features listed above in the article; in addition, there is a comprehensive table in the official &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#a-comparison-of-flexible-uniform-and-availability-sets"&gt;Microsoft documentation&lt;/a&gt; that would be pointless to copy and paste here.&lt;/p&gt;

&lt;h1&gt;
  
  
  Current limitations
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;VMSS flex&lt;/strong&gt; does not currently support &lt;strong&gt;availability zones&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It also does not support single placement groups.&lt;/p&gt;

&lt;h1&gt;
  
  
  How to get started
&lt;/h1&gt;

&lt;p&gt;To try &lt;strong&gt;VMSS flex&lt;/strong&gt; you need to register the feature in your subscription, see the full guide here: &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#register-for-flexible-orchestration-mode"&gt;https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#register-for-flexible-orchestration-mode&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quickstart &lt;strong&gt;ARM template&lt;/strong&gt; to deploy &lt;strong&gt;VMSS flex&lt;/strong&gt;: &lt;a href="https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.compute/vm-vmss-flexible-orchestration-mode"&gt;https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.compute/vm-vmss-flexible-orchestration-mode&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>vmss</category>
      <category>compute</category>
      <category>iaas</category>
    </item>
    <item>
      <title>Infrastructure as code in 2021</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Fri, 16 Apr 2021 19:12:55 +0000</pubDate>
      <link>https://dev.to/unosd/infrastructure-as-code-in-2021-3l40</link>
      <guid>https://dev.to/unosd/infrastructure-as-code-in-2021-3l40</guid>
      <description>&lt;p&gt;There are numerous advantages to using IaC; here are a few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt; Scripted deployments leave less room for human error and are faster to execute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility&lt;/strong&gt; Being able to recreate the same environment every time, or multiple identical environments based on the same templates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; Permission to production environments can be granted only to deployment service accounts, reducing risks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Costs&lt;/strong&gt; Being able to destroy and recreate environments quickly can enable fast de-provisioning of expensive resources when they are not used&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  What are the options?
&lt;/h1&gt;

&lt;p&gt;This article is &lt;strong&gt;Azure-centric&lt;/strong&gt;, but most of what is discussed also applies to &lt;strong&gt;AWS&lt;/strong&gt;, &lt;strong&gt;GCP&lt;/strong&gt; and other clouds and targets. Cloud-specific systems such as &lt;strong&gt;ARM templates/Bicep&lt;/strong&gt; are comparable to &lt;strong&gt;AWS CloudFormation&lt;/strong&gt; and &lt;strong&gt;GCP Cloud Deployment Manager&lt;/strong&gt;, and to &lt;strong&gt;K8s YAML&lt;/strong&gt; for on-premises targets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The code examples below all deploy exactly the same resources&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full working examples on &lt;strong&gt;GitHub&lt;/strong&gt; &lt;a href="https://github.com/UnoSD/IaC" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ARM templates
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure-native&lt;/strong&gt; way of scripting resources; the language used is &lt;strong&gt;JSON&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I have worked extensively with &lt;strong&gt;ARM templates&lt;/strong&gt;, but I still get a headache when I open one for the first time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt; &lt;em&gt;is&lt;/em&gt; human readable, but not intuitive.&lt;/p&gt;

&lt;p&gt;It requires a great deal of boilerplate and, &lt;strong&gt;JSON&lt;/strong&gt; being &lt;strong&gt;JSON&lt;/strong&gt;, naturally requires an abundance of curly braces, double quotes and other symbols that represent a substantial distraction from the actual meaningful content and hence easily lead to &lt;a href="https://www.teachingenglish.org.uk/article/cognitive-overload" rel="noopener noreferrer"&gt;cognitive overload&lt;/a&gt;. Many people I spoke to dislike it for this precise reason.&lt;/p&gt;

&lt;p&gt;The nesting doesn't help and it handles modules poorly.&lt;/p&gt;

&lt;p&gt;Enough being negative, what's great about it? It's &lt;strong&gt;Azure&lt;/strong&gt;'s mother-tongue. It supports all &lt;strong&gt;Azure&lt;/strong&gt; resources as soon as they are available.&lt;/p&gt;

&lt;p&gt;In addition, if you use the &lt;strong&gt;Azure Resource Explorer&lt;/strong&gt;, you will find exactly what has been deployed in your subscription, expressed in &lt;strong&gt;JSON&lt;/strong&gt; that is compatible with your &lt;strong&gt;ARM templates&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you create something in the portal, you can easily export its current configuration as an &lt;strong&gt;ARM template&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is a template that creates a &lt;strong&gt;resource group&lt;/strong&gt; and a &lt;strong&gt;storage account&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contentVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rgName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"defaultValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rg-arm"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rgLocation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"defaultValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"West Europe"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Microsoft.Resources/resourceGroups"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2018-05-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[parameters('rgLocation')]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[parameters('rgName')]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Microsoft.Resources/deployments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2017-05-10"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"storageDeployment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"resourceGroup"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[parameters('rgName')]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"dependsOn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"[resourceId('Microsoft.Resources/resourceGroups/', parameters('rgName'))]"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Incremental"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"template"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"contentVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"variables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Microsoft.Storage/storageAccounts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"apiVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2017-10-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[concat('sa', uniquestring(subscription().id))]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"West Europe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StorageV2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"sku"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Standard_LRS"&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"supportsHttpsTrafficOnly"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Native deployment history tracking&lt;/li&gt;
&lt;li&gt;Always up to date with new resources&lt;/li&gt;
&lt;li&gt;Supported by &lt;strong&gt;Microsoft&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Unfriendly language&lt;/li&gt;
&lt;li&gt;Hard to manage modules&lt;/li&gt;
&lt;li&gt;Complexity increases exponentially for large environments&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Bicep
&lt;/h2&gt;

&lt;p&gt;OK, now take &lt;strong&gt;ARM templates&lt;/strong&gt;, remove most of the downsides and here you have &lt;strong&gt;Bicep&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Bicep template&lt;/strong&gt; is pretty much the same as an &lt;strong&gt;ARM template&lt;/strong&gt;; in fact, it transpiles to &lt;strong&gt;ARM JSON&lt;/strong&gt; and uses the same underlying deployment system. However, &lt;strong&gt;Bicep&lt;/strong&gt; addresses the two main cons of &lt;strong&gt;ARM&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;The first massive advantage over &lt;strong&gt;ARM&lt;/strong&gt; is the language; it's a bespoke DSL, easy to write and understand.&lt;/p&gt;

&lt;p&gt;Modularisation is also easier than in &lt;strong&gt;ARM&lt;/strong&gt;, as &lt;strong&gt;Bicep&lt;/strong&gt; lets you reference other templates in the same container (or in the same directory locally).&lt;/p&gt;

&lt;p&gt;Being the same as &lt;strong&gt;ARM&lt;/strong&gt; also means that it is &lt;strong&gt;Azure-only&lt;/strong&gt;, which is currently the main drawback.&lt;/p&gt;

&lt;p&gt;Here are the same resources as in the &lt;strong&gt;ARM template&lt;/strong&gt; above, but in &lt;strong&gt;Bicep&lt;/strong&gt; this time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;targetScope = 'subscription'

resource rg 'Microsoft.Resources/resourceGroups@2020-01-01' = {
  name: 'rg-bicep'
  location: 'West Europe'
  scope: subscription()
}

module stgModule './storageAccount.bicep' = {
  name: 'storageDeploy'
  scope: rg
  params: {
    location: rg.location
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;param location string

resource stg 'Microsoft.Storage/storageAccounts@2019-06-01' = {
  name: 'sa${uniqueString(resourceGroup().id)}'
  location: location
  kind: 'StorageV2'
  properties: {
    supportsHttpsTrafficOnly: true
  }  
  sku: {
    name: 'Standard_LRS'
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even without syntax highlighting from the blog engine, this immediately looks awesome: far more readable and succinct. In my opinion, it is a great option if you are &lt;strong&gt;Azure-only&lt;/strong&gt; and want to avoid the burden of state management. You will miss out on the clean-up of resources that &lt;strong&gt;Pulumi&lt;/strong&gt; and &lt;strong&gt;Terraform&lt;/strong&gt; achieve with an external state, but it is worth evaluating.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Same pros as &lt;strong&gt;ARM&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Expressive (concise) language&lt;/li&gt;
&lt;li&gt;Good modularisation support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Azure-only&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;No resources clean-up&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Terraform
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Terraform&lt;/strong&gt; is an exceptional (and, I believe, the first) attempt at a multi-cloud, human-readable infrastructure-as-code tool, and a successful one: it is quickly becoming the &lt;strong&gt;industry standard&lt;/strong&gt; for cloud deployments.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;HashiCorp&lt;/strong&gt; tool uses external state storage, which records all the resources deployed; this lets it destroy resources that have been removed from the templates and clean up an entire environment on demand.&lt;/p&gt;
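
&lt;p&gt;To make the role of the state concrete, here is a minimal &lt;strong&gt;Python&lt;/strong&gt; sketch of the idea. It is a toy model, not &lt;strong&gt;Terraform&lt;/strong&gt;'s actual algorithm or state format; the &lt;code&gt;plan&lt;/code&gt; function and the resource names are made up for illustration. The point is that anything recorded in the state but no longer present in the templates can be detected and scheduled for deletion:&lt;/p&gt;

```python
# Toy model of state-based clean-up; not Terraform's real algorithm or state format.


def plan(desired: dict, state: dict) -> dict:
    """Compare the desired resources (from templates) with the recorded state."""
    to_create = sorted(name for name in desired if name not in state)
    to_delete = sorted(name for name in state if name not in desired)
    to_update = sorted(
        name for name in desired
        if name in state and desired[name] != state[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical resource names and settings, for illustration only.
state = {
    "rg-demo": {"location": "West Europe"},
    "sa-old": {"sku": "Standard_LRS"},
}
desired = {
    "rg-demo": {"location": "West Europe"},
    "sa-new": {"sku": "Standard_LRS"},
}

result = plan(desired, state)
print(result)
```

&lt;p&gt;Without a recorded state, a tool has no list of what it previously deployed to compare against, so a resource that simply disappears from the templates goes unnoticed and stays in place.&lt;/p&gt;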

&lt;p&gt;It uses a custom DSL called &lt;strong&gt;HCL&lt;/strong&gt;, which is quite friendly and understandable by anyone at a glance without prior training.&lt;/p&gt;

&lt;p&gt;One major downside is that &lt;strong&gt;HCL&lt;/strong&gt; is a limited language; it covers basic conditionals, loops and variables (poorly, in my opinion).&lt;/p&gt;

&lt;p&gt;It's a beautiful solution for simple deployments, but it gets pretty frustrating when attempting to be more clever with the logic.&lt;/p&gt;

&lt;p&gt;The example here defines the same resources as above, in &lt;strong&gt;Terraform&lt;/strong&gt; this time. I am using &lt;strong&gt;Azure&lt;/strong&gt; blob storage for the state, but there are several other options, including the local file system.&lt;/p&gt;

&lt;p&gt;Another big downside is that you often find yourself in a situation where a new resource or a new feature of a resource comes up in &lt;strong&gt;Azure&lt;/strong&gt; (and I presume the same applies to other providers), but you have to wait for the &lt;strong&gt;Terraform&lt;/strong&gt; team to implement it before you can use it in idiomatic &lt;strong&gt;Terraform&lt;/strong&gt;; you can always deploy &lt;strong&gt;ARM templates&lt;/strong&gt; from within &lt;strong&gt;Terraform&lt;/strong&gt;, but it is ugly and you miss out on a richer diff experience.&lt;/p&gt;

&lt;p&gt;Module support in &lt;strong&gt;Terraform&lt;/strong&gt; is also great; it allows you to reference modules directly from external &lt;strong&gt;Git&lt;/strong&gt; repositories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"azurerm"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;resource_group_name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rg-iac-demo"&lt;/span&gt;
    &lt;span class="nx"&gt;storage_account_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"saiacdemo"&lt;/span&gt;
    &lt;span class="nx"&gt;container_name&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"demo.tfstate"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"subscription_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"client_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"client_secret"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"tenant_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"azurerm"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;features&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="nx"&gt;subscription_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subscription_id&lt;/span&gt;
  &lt;span class="nx"&gt;client_id&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_id&lt;/span&gt;
  &lt;span class="nx"&gt;client_secret&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_secret&lt;/span&gt;
  &lt;span class="nx"&gt;tenant_id&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tenant_id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"random_id"&lt;/span&gt; &lt;span class="s2"&gt;"storage_account"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;byte_length&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_resource_group"&lt;/span&gt; &lt;span class="s2"&gt;"example"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rg-terraform"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"West Europe"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"azurerm_storage_account"&lt;/span&gt; &lt;span class="s2"&gt;"example"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sa&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;random_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;storage_account&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;resource_group_name&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_resource_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;
  &lt;span class="nx"&gt;account_tier&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;
  &lt;span class="nx"&gt;account_replication_type&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"LRS"&lt;/span&gt;
  &lt;span class="nx"&gt;enable_https_traffic_only&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Unofficial &lt;strong&gt;industry standard&lt;/strong&gt; as of 2021 (endorsed by several organisations)&lt;/li&gt;
&lt;li&gt;Language and CLI are incredibly easy to understand and to use&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HCL&lt;/strong&gt; has its limits and simple logic can turn into complex templates (and can be frustrating to code)&lt;/li&gt;
&lt;li&gt;The state is stored in plain text, including secrets; responsibility for securing it lies with the user&lt;/li&gt;
&lt;li&gt;Tooling and code completion are not always great and lack "compile"-time checks&lt;/li&gt;
&lt;li&gt;Being an open-source tool, it is supported only by the community unless you pay for &lt;strong&gt;Terraform Enterprise&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Resource support is delayed, sometimes quite heavily; bugs can stay unfixed for years&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Pulumi
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://en.wiktionary.org/wiki/dulcis_in_fundo" rel="noopener noreferrer"&gt;Dulcis in fundo...&lt;/a&gt; my favourite &lt;strong&gt;IaC&lt;/strong&gt; tool as of today.&lt;/p&gt;

&lt;p&gt;The people at &lt;strong&gt;Pulumi&lt;/strong&gt; had a great intuition:&lt;/p&gt;

&lt;p&gt;Using &lt;strong&gt;general purpose&lt;/strong&gt; programming languages to define infrastructure.&lt;/p&gt;

&lt;p&gt;It was such a good idea that a few months later, &lt;strong&gt;Terraform&lt;/strong&gt; published a preview of its &lt;strong&gt;CDK&lt;/strong&gt; to write &lt;strong&gt;Terraform&lt;/strong&gt; in &lt;strong&gt;TypeScript&lt;/strong&gt; (and I believe they now also support other languages).&lt;/p&gt;

&lt;p&gt;With all the pressure for a &lt;strong&gt;DevOps&lt;/strong&gt; culture, this fits really well, as it enables developers to use a familiar language to define infrastructure too (it also allows interop between application and infrastructure code).&lt;/p&gt;

&lt;p&gt;It supports &lt;strong&gt;.NET&lt;/strong&gt; languages (&lt;strong&gt;C#/F#/VB.NET&lt;/strong&gt;/...), &lt;strong&gt;Go&lt;/strong&gt;, &lt;strong&gt;TypeScript&lt;/strong&gt;, &lt;strong&gt;Python&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The ability to use the &lt;strong&gt;Azure management SDKs&lt;/strong&gt; is not new; we could always do that in those languages, but &lt;strong&gt;Pulumi&lt;/strong&gt; manages all the resources and dependencies for us.&lt;/p&gt;

&lt;p&gt;You can specify the resources to create in a declarative way; you just build a list of stuff to create and &lt;strong&gt;Pulumi&lt;/strong&gt; works out dependencies, changes and everything else for you.&lt;/p&gt;

&lt;p&gt;It also features an &lt;strong&gt;encryption&lt;/strong&gt; capability for secrets in the state; you can use external providers to encrypt the content, which moves the responsibility for securing the state more towards the tool.&lt;/p&gt;

&lt;p&gt;The only downside may be that, for system administrators, picking up a programming language may involve a steeper learning curve than learning &lt;strong&gt;HCL&lt;/strong&gt; or &lt;strong&gt;Bicep&lt;/strong&gt;, and you are more likely to find ops talent on the job market with &lt;strong&gt;Terraform&lt;/strong&gt; experience than with &lt;strong&gt;C#&lt;/strong&gt; experience.&lt;/p&gt;

&lt;p&gt;The support story is similar to &lt;strong&gt;Terraform&lt;/strong&gt;. &lt;strong&gt;Pulumi&lt;/strong&gt; offers a paid plan for storage, but the actual tool is open-source and community-supported.&lt;/p&gt;

&lt;p&gt;One feature that &lt;strong&gt;Terraform&lt;/strong&gt; has but &lt;strong&gt;Pulumi&lt;/strong&gt; is missing is the ability to save the planned changes to an output file and apply that file later; this eliminates the risk of race conditions when you plan and deploy independently, and I find it really useful in CD pipelines (I have already created a GitHub issue to request the feature).&lt;/p&gt;
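
&lt;p&gt;In &lt;strong&gt;Terraform&lt;/strong&gt; this is the &lt;code&gt;terraform plan -out=FILE&lt;/code&gt; followed by &lt;code&gt;terraform apply FILE&lt;/code&gt; workflow. The &lt;strong&gt;Python&lt;/strong&gt; sketch below is only a toy model of why a saved plan helps, under the assumption that the plan records a fingerprint of the state it was computed from; the function names and the plan layout are hypothetical and do not reflect either tool's real file format:&lt;/p&gt;

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """Stable hash of the state, used to detect changes between plan and apply."""
    canonical = json.dumps(state, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_plan(desired: dict, state: dict) -> dict:
    """Record the intended changes together with the state they were computed from."""
    return {
        "based_on": fingerprint(state),
        "create": {n: desired[n] for n in sorted(set(desired) - set(state))},
        "delete": sorted(set(state) - set(desired)),
    }


def apply_plan(saved_plan: dict, state: dict) -> dict:
    """Apply a saved plan, refusing to run if the state changed after planning."""
    if saved_plan["based_on"] != fingerprint(state):
        raise RuntimeError("stale plan: the state changed since the plan was created")
    new_state = {k: v for k, v in state.items() if k not in saved_plan["delete"]}
    new_state.update(saved_plan["create"])
    return new_state


# Hypothetical resources, for illustration only.
state = {"rg-demo": {"location": "West Europe"}}
desired = {
    "rg-demo": {"location": "West Europe"},
    "sa-demo": {"sku": "Standard_LRS"},
}

# The serialised plan is what a pipeline would keep as an artifact between stages.
plan_text = json.dumps(make_plan(desired, state))
new_state = apply_plan(json.loads(plan_text), state)
print(sorted(new_state))
```

&lt;p&gt;The changes that get applied are exactly the ones that were reviewed, and if someone else modifies the environment between the two stages, the apply step fails instead of silently doing something different.&lt;/p&gt;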

&lt;p&gt;There is no roadmap for it at the moment, but if they support &lt;strong&gt;PowerShell&lt;/strong&gt; as a language in the future, that may remove the need for sysadmins to learn a new language, and I believe it could ramp up adoption.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Pulumi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Pulumi.AzureNative.Resources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Pulumi.AzureNative.Storage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Pulumi.AzureNative.Storage.Inputs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyStack&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Stack&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;MyStack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;resourceGroup&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ResourceGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"rg-pulumi"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StorageAccount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sa"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;StorageAccountArgs&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ResourceGroupName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resourceGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Sku&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;SkuArgs&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SkuName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Standard_LRS&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;Kind&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StorageV2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;EnableHttpsTrafficOnly&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;AzureNative&lt;/strong&gt; provider is always up-to-date with new &lt;strong&gt;Azure&lt;/strong&gt; resources and features&lt;/li&gt;
&lt;li&gt;If you are a developer, you do not need to learn a new language&lt;/li&gt;
&lt;li&gt;It gives you the full power of a real programming language, whatever you can do in &lt;strong&gt;C#&lt;/strong&gt; (or &lt;strong&gt;Python&lt;/strong&gt; etc...) you can do in a &lt;strong&gt;Pulumi&lt;/strong&gt; project&lt;/li&gt;
&lt;li&gt;Great CLI, quiet in the output by default&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Niche: experts are harder to find and the documentation is poor&lt;/li&gt;
&lt;li&gt;Harder for sysadmins to pick up than &lt;strong&gt;Terraform&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No way to save the plan to a file and apply it later&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Pulumi.FSharp.Extensions
&lt;/h2&gt;

&lt;p&gt;I wanted to take &lt;strong&gt;Pulumi&lt;/strong&gt; a step further. I love the technology, but I still do not like the verbosity of &lt;strong&gt;C#&lt;/strong&gt;, and using &lt;strong&gt;Pulumi&lt;/strong&gt; in &lt;strong&gt;F#&lt;/strong&gt; was ugly: the API is not designed for it, and you end up with code that looks like this to mimic property initialisation in &lt;strong&gt;C#&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight fsharp"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;infra&lt;/span&gt; &lt;span class="bp"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;resourceGroup&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ResourceGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"rg-pulumi"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nc"&gt;StorageAccount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"sa"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StorageAccountArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ResourceGroupName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resourceGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Sku&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkuArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;SkuName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Standard_LRS&lt;/span&gt;
        &lt;span class="o"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;Kind&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StorageV2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;EnableHttpsTrafficOnly&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;true&lt;/span&gt;
    &lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So I decided to write an extension that uses &lt;strong&gt;Pulumi&lt;/strong&gt; from &lt;strong&gt;F#&lt;/strong&gt; through &lt;strong&gt;computation expressions&lt;/strong&gt;, making it look even better and simpler than &lt;strong&gt;Terraform&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight fsharp"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;rg&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;resourceGroup&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;                   &lt;span class="s2"&gt;"rg-pulumi"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;sa&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;storageAccount&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;                   &lt;span class="s2"&gt;"sa"&lt;/span&gt;
        &lt;span class="n"&gt;resourceGroup&lt;/span&gt;          &lt;span class="n"&gt;rg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Name&lt;/span&gt;
        &lt;span class="n"&gt;accountReplicationType&lt;/span&gt; &lt;span class="nn"&gt;SkuName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Standard_LRS&lt;/span&gt;
        &lt;span class="n"&gt;accountTier&lt;/span&gt;            &lt;span class="nn"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StorageV2&lt;/span&gt;
        &lt;span class="n"&gt;enableHttpsTrafficOnly&lt;/span&gt; &lt;span class="bp"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Link to the GitHub repo &lt;a href="https://github.com/UnoSD/Pulumi.FSharp.Extensions" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bird's eye view and lines of code
&lt;/h3&gt;

&lt;p&gt;Purely looking at the core of the template, &lt;strong&gt;Pulumi&lt;/strong&gt; in &lt;strong&gt;C#&lt;/strong&gt; is probably the shortest, but, to be fair, it also needs an external &lt;strong&gt;Pulumi.yaml&lt;/strong&gt; file with project and state configuration (which is included in my &lt;strong&gt;Terraform&lt;/strong&gt; example) and language-specific files such as project files (csproj), solution files and so on. &lt;strong&gt;Terraform&lt;/strong&gt; and &lt;strong&gt;Bicep&lt;/strong&gt; are also quite short. The &lt;strong&gt;ARM template&lt;/strong&gt; is the most verbose, as anticipated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Other options
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PSArm
&lt;/h3&gt;

&lt;p&gt;A new option that recently came up (announced a few days before this blog post) is: &lt;strong&gt;PSArm&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have not had a chance to play with it and I will update this article later, but &lt;strong&gt;PSArm&lt;/strong&gt; seems to answer the prayers of many sysadmins fed up with learning new languages.&lt;/p&gt;

&lt;p&gt;It is a way of writing &lt;strong&gt;ARM templates&lt;/strong&gt; using idiomatic &lt;strong&gt;PowerShell&lt;/strong&gt;, a language familiar to ops.&lt;/p&gt;

&lt;p&gt;It sounds quite appealing to those who want to reuse existing skills to embrace &lt;strong&gt;IaC&lt;/strong&gt;, and &lt;strong&gt;PowerShell&lt;/strong&gt; is almost as flexible as a programming language, which may help overcome some limitations of &lt;strong&gt;Terraform&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Farmer
&lt;/h3&gt;

&lt;p&gt;An honourable mention goes to &lt;strong&gt;Farmer&lt;/strong&gt;, which lets you write nice-looking &lt;strong&gt;F#&lt;/strong&gt; that generates &lt;strong&gt;ARM templates&lt;/strong&gt;; I have not played with it much, but I will try to see if there is any advantage in using it over &lt;strong&gt;Pulumi&lt;/strong&gt; (and &lt;strong&gt;Pulumi.FSharp.Extensions&lt;/strong&gt; if you want it to look pretty). Bear in mind that, since it generates &lt;strong&gt;ARM&lt;/strong&gt;, it works only on &lt;strong&gt;Azure&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Azure CLI (Bash/PS) and PowerShell (Az module)
&lt;/h3&gt;

&lt;p&gt;In my opinion, a less honourable mention. Many infrastructure engineers adopted this method because they did not know any better (in the early days, when &lt;strong&gt;ARM/Terraform/etc&lt;/strong&gt; were not so popular or did not exist at all). I personally see no benefit in using this approach nowadays, as it moves a massive burden onto the engineer:&lt;/p&gt;

&lt;p&gt;You have to worry about dependencies&lt;br&gt;
You have to worry about error handling&lt;br&gt;
You have to make sure it is idempotent (and cmdlets are not always idempotent)&lt;br&gt;
...&lt;/p&gt;
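
&lt;p&gt;As a rough illustration of that burden, here is a &lt;strong&gt;Python&lt;/strong&gt; sketch of what a hand-written deployment script ends up doing. Everything in it is hypothetical: &lt;code&gt;cloud&lt;/code&gt; is an in-memory stand-in for the platform, &lt;code&gt;flaky_create&lt;/code&gt; simulates a call that fails once, and &lt;code&gt;ensure&lt;/code&gt; is a helper you would have to write and maintain yourself:&lt;/p&gt;

```python
# In-memory stand-in for the cloud platform; all names here are hypothetical.
cloud = {}
failures = {"remaining": 1}


def flaky_create(name: str, settings: dict) -> None:
    """Simulated API call that fails once with a transient error."""
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise ConnectionError("transient failure")
    cloud[name] = dict(settings)


def ensure(name: str, settings: dict, existing: dict, create, retries: int = 3) -> str:
    """Hand-rolled idempotency and retry logic around a single resource."""
    if existing.get(name) == settings:
        return "unchanged"              # idempotency: a re-run must not redo the work
    last_error = None
    for _ in range(retries):
        try:
            create(name, settings)
            return "applied"
        except ConnectionError as error:
            last_error = error          # error handling: retry transient failures
    raise last_error


# Dependencies: the order is the script author's responsibility (group before account).
steps = [
    ("rg-script", {"location": "West Europe"}),
    ("sascript", {"sku": "Standard_LRS"}),
]

first_run = [ensure(name, settings, cloud, flaky_create) for name, settings in steps]
second_run = [ensure(name, settings, cloud, flaky_create) for name, settings in steps]
print(first_run, second_run)
```

&lt;p&gt;Even this tiny example needs more plumbing than resource definitions, and it still does not handle deletions, partial failures or parallelism.&lt;/p&gt;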

&lt;p&gt;I would not recommend this option or the &lt;strong&gt;ARM templates route&lt;/strong&gt; as of today; I believe that there is no compelling reason to write &lt;strong&gt;IaC&lt;/strong&gt; in this way. Please let me know in the comments if you have a good use case and I will happily update the article to include it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5k6jhs8aqclbqbs7rh8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5k6jhs8aqclbqbs7rh8b.png" alt="Comparison table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a comparison table where I evaluate features for each tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Declarative
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxto2uqy9iokwjxarkcxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxto2uqy9iokwjxarkcxl.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is the difference between walking into a shop and asking for a "&lt;em&gt;chocolate cake with cream filling and 30 candles&lt;/em&gt;" and telling the baker: &lt;em&gt;OK, now beat the eggs, mix with sugar, add milk etc...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Writing &lt;strong&gt;declarative&lt;/strong&gt; code means telling the system &lt;strong&gt;what&lt;/strong&gt; you want, not &lt;strong&gt;how&lt;/strong&gt; to do it. You lose (unnecessary) control in favour of feature-rich simplicity. It also results in less verbose code.&lt;/p&gt;

&lt;p&gt;Most of the options are &lt;strong&gt;declarative&lt;/strong&gt;: you can define resources in whichever order you prefer and the tools will work out dependencies and parallelisation for you. They will also manage retries, error handling and so on without you having to explicitly code for them.&lt;/p&gt;
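
&lt;p&gt;To give a feel for what "working out dependencies and parallelisation" involves, here is a minimal Python sketch. It is not how any of the tools above is actually implemented; the resource names and the dependency map are made up for illustration, and the ordering relies on the standard library graphlib module (Python 3.9 or later).&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical resources, declared in no particular order.
# Each key maps to the set of resources it depends on.
declared = {
    "web_app": {"app_service_plan", "storage_account"},
    "storage_account": {"resource_group"},
    "app_service_plan": {"resource_group"},
    "resource_group": set(),
}


def deployment_batches(resources):
    """Group resources into ordered batches.

    Every resource in a batch has all its dependencies satisfied by
    earlier batches, so a batch could be created in parallel.
    """
    sorter = TopologicalSorter(resources)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = deployment_batches(declared)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

&lt;p&gt;The declaration order of the dictionary does not matter; the resource group comes out first, the plan and the storage account can be created together, and the web app comes last.&lt;/p&gt;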

&lt;p&gt;The only imperative options are &lt;strong&gt;Azure CLI&lt;/strong&gt; and &lt;strong&gt;PowerShell&lt;/strong&gt; (or directly using the &lt;strong&gt;REST API/SDK&lt;/strong&gt;) to create the resources. I would not recommend this to anyone. It is worth upskilling (if you are a sysadmin) to understand &lt;strong&gt;Pulumi&lt;/strong&gt; or &lt;strong&gt;Terraform&lt;/strong&gt; and avoid &lt;strong&gt;PowerShell&lt;/strong&gt; (or potentially try &lt;strong&gt;PSArm&lt;/strong&gt; or wait for &lt;strong&gt;Pulumi&lt;/strong&gt; to support &lt;strong&gt;PowerShell&lt;/strong&gt;).&lt;/p&gt;




&lt;h3&gt;
  
  
  Idempotency
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxl6icw0nz8iwqq4mmqxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxl6icw0nz8iwqq4mmqxx.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All the options &lt;em&gt;can&lt;/em&gt; be &lt;strong&gt;idempotent&lt;/strong&gt;; &lt;strong&gt;idempotency&lt;/strong&gt; means that you can rerun the same deployment as many times as you like and, as long as the resources are unchanged, it will do nothing.&lt;/p&gt;

&lt;p&gt;If a resource has drifted away from the configuration or does not exist, the deployment will pick it up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Azure CLI&lt;/strong&gt; and &lt;strong&gt;PowerShell&lt;/strong&gt; are in yellow as you can still achieve this, but you have to code for it in certain cases; many &lt;strong&gt;cmdlets&lt;/strong&gt; and &lt;strong&gt;AzCLI&lt;/strong&gt; commands will be idempotent, but there is no guarantee.&lt;/p&gt;
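
&lt;p&gt;As a rough illustration of what "coding for it" means in an imperative script, the Python sketch below compares the current state with the desired state before acting. The dictionary is only a stand-in for the cloud and the function name is hypothetical; a real script would query and update resources through the CLI or an SDK.&lt;/p&gt;

```python
# A dictionary stands in for the cloud; a real script would call the CLI or an SDK.
cloud = {}


def ensure_resource(name, desired):
    """Create or update a resource only when its state differs from the desired one.

    Returns "created", "updated" or "unchanged" so the caller can see what happened.
    """
    current = cloud.get(name)
    if current == desired:
        return "unchanged"  # rerunning the deployment does nothing
    action = "created" if current is None else "updated"
    cloud[name] = dict(desired)  # copy, so later edits to `desired` do not leak in
    return action


first = ensure_resource("storage", {"sku": "Standard_LRS"})
second = ensure_resource("storage", {"sku": "Standard_LRS"})
cloud["storage"]["sku"] = "Premium_LRS"  # simulate drift made outside the script
third = ensure_resource("storage", {"sku": "Standard_LRS"})
print(first, second, third)
```

&lt;p&gt;The second run is a no-op and the third run corrects the drift, which is the behaviour the declarative tools give you without this extra code.&lt;/p&gt;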

&lt;h3&gt;
  
  
  Fallback mechanism
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgor3vq8t8k9alt4vztc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgor3vq8t8k9alt4vztc3.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both &lt;strong&gt;Terraform&lt;/strong&gt; and &lt;strong&gt;Pulumi&lt;/strong&gt; can include &lt;strong&gt;ARM templates&lt;/strong&gt; in their code.&lt;/p&gt;

&lt;p&gt;If a resource is not supported yet, you can temporarily use an &lt;strong&gt;ARM template&lt;/strong&gt; and then replace it later when the provider gets updated. &lt;strong&gt;Pulumi&lt;/strong&gt; is unlikely to be out of date as it is auto-generated from the &lt;strong&gt;Azure REST API&lt;/strong&gt;; the folks there just need to kick off another build and in a matter of minutes a new &lt;strong&gt;Pulumi&lt;/strong&gt; library is ready with the new &lt;strong&gt;Azure&lt;/strong&gt; resources supported.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARM/Bicep&lt;/strong&gt; are updated immediately, &lt;strong&gt;AzCLI/PS&lt;/strong&gt; almost immediately. I have never seen a resource available only in the &lt;strong&gt;REST API&lt;/strong&gt; and not in all of those.&lt;/p&gt;

&lt;h3&gt;
  
  
  Modularisation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs5g8u99ijk61wd1sy6o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs5g8u99ijk61wd1sy6o.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARM&lt;/strong&gt; has an awful way of modularising templates; &lt;strong&gt;Bicep&lt;/strong&gt; improves that significantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform&lt;/strong&gt; works nicely with modules and also supports modules directly from a &lt;strong&gt;Git&lt;/strong&gt; repository. &lt;strong&gt;Pulumi&lt;/strong&gt; is as good as the language you pick (which is very good for all the languages); you can use &lt;strong&gt;NuGet&lt;/strong&gt; packages in &lt;strong&gt;.NET&lt;/strong&gt;, &lt;strong&gt;npm&lt;/strong&gt; packages with &lt;strong&gt;TypeScript&lt;/strong&gt;, I presume &lt;strong&gt;pip&lt;/strong&gt; with &lt;strong&gt;Python&lt;/strong&gt;, and I am sure that also applies to &lt;strong&gt;Go&lt;/strong&gt; with its own package system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Legacy deployments
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzp2imc0jf804x7pqyp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzp2imc0jf804x7pqyp5.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARM&lt;/strong&gt; is not that flexible, but it won't matter at all as most of the time you will not need to use any legacy code with it; I consider &lt;strong&gt;ARM&lt;/strong&gt; itself the "legacy".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bicep&lt;/strong&gt; has an automated tool to convert from &lt;strong&gt;ARM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;From &lt;strong&gt;Terraform&lt;/strong&gt; you can invoke commands locally (including &lt;strong&gt;PowerShell/Bash&lt;/strong&gt; scripts).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pulumi&lt;/strong&gt;, again, can do whatever &lt;strong&gt;TS/C#/Python/Go&lt;/strong&gt; can do, which is pretty much everything your computer can do, including invoking &lt;strong&gt;Terraform&lt;/strong&gt;, calling &lt;strong&gt;REST APIs&lt;/strong&gt;, buying a &lt;strong&gt;pizza&lt;/strong&gt; on every deployment using your favourite pizza place APIs, feeding your cat in your smart home or playing a fanfare when a resource is created...&lt;/p&gt;

&lt;p&gt;It is also worth noting that &lt;strong&gt;Pulumi&lt;/strong&gt; has &lt;strong&gt;tf2pulumi&lt;/strong&gt;, a tool that converts your &lt;strong&gt;Terraform&lt;/strong&gt; to &lt;strong&gt;Pulumi&lt;/strong&gt; in your chosen flavour of language. I got an exception the first time I tried using it; I will persevere before judging it too harshly, but it did not seem mature enough at the time I tried it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supportability
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumg3fz7nj3lexenl371b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumg3fz7nj3lexenl371b.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARM/Bicep&lt;/strong&gt; are supported by &lt;strong&gt;Microsoft&lt;/strong&gt;; there is not much else to say, it is a massive plus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform&lt;/strong&gt; and &lt;strong&gt;Pulumi&lt;/strong&gt; are supported by the community, but you can get &lt;strong&gt;paid&lt;/strong&gt; support plans if you use their storage; even so, that may only mean that you are covered for the storage, and if the tooling has a bug you may have to wait in line like any other mere mortal.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;AzCLI/PS&lt;/strong&gt; you get the obvious support for the tools, but if your custom code goes wrong, you're on your own; and it will mostly be your custom code that fails, as there is so much more to write to achieve what the tools above achieve naturally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error handling/Plan/Clean up
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfko4h955wre4h2cnwil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfko4h955wre4h2cnwil.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is all managed for you by all the tools, except &lt;strong&gt;CLI/PS&lt;/strong&gt;, where you have to look after this yourself: write conditional code and specify retries and what to do if it all goes bad.&lt;/p&gt;
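
&lt;p&gt;For a sense of the boilerplate involved, here is a minimal Python sketch of a retry wrapper with exponential backoff, the kind of helper you end up writing around imperative commands. The function names and the simulated failure are invented for the example; it also retries on every exception, whereas a real script should only retry errors it knows to be transient.&lt;/p&gt;

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying on failure with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:  # assumption: every error is worth retrying
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical command that fails twice before succeeding.
calls = []


def flaky_create():
    calls.append("call")
    if len(calls) in (1, 2):
        raise RuntimeError("transient failure")
    return "created"


# base_delay=0 keeps the example instant; use a real delay in practice.
result = with_retries(flaky_create, attempts=3, base_delay=0)
print(result, len(calls))
```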

&lt;p&gt;&lt;strong&gt;I will soon publish a repository on GitHub with working examples in each language&lt;/strong&gt;&lt;/p&gt;

&lt;h6&gt;
  
  
  This is a live article; I will try to keep it up to date with new developments and to complete the missing bits. If you want to suggest a change, please submit a pull request to &lt;a href="https://github.com/UnoSD/Blog/blob/master/Infrastructure%20as%20code%20in%202021/article.md" rel="noopener noreferrer"&gt;this&lt;/a&gt; repository.
&lt;/h6&gt;

&lt;h6&gt;
  
  
  Cover image: "A visual representation of the DevOps workflow" by &lt;a href="https://commons.wikimedia.org/wiki/User:Kharnagy" rel="noopener noreferrer"&gt;Kharnagy&lt;/a&gt; (edited) is licensed under CC BY-SA 4.0.
&lt;/h6&gt;

</description>
      <category>azure</category>
      <category>iac</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>High availability for Event Hubs processors</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Tue, 10 Nov 2020 16:23:04 +0000</pubDate>
      <link>https://dev.to/unosd/high-availability-for-event-hubs-processors-3da4</link>
      <guid>https://dev.to/unosd/high-availability-for-event-hubs-processors-3da4</guid>
      <description>&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt; &lt;strong&gt;Event Hubs&lt;/strong&gt; with &lt;strong&gt;Stream Analytics&lt;/strong&gt; is a powerful combination for quasi real time high throughput data processing.&lt;/p&gt;

&lt;p&gt;It’s a great solution if you want fast reporting on live data and save your architecture from extra complexity.&lt;/p&gt;

&lt;p&gt;Use case: your service receives real-time, high-volume user data, e.g. website tracking, application telemetry or IoT.&lt;/p&gt;

&lt;p&gt;The traditional process for generating reports would involve slow and painful steps such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sensors/source sending data to a queue&lt;/li&gt;
&lt;li&gt;Service A constantly fetching messages from the queue&lt;/li&gt;
&lt;li&gt;Service A storing data&lt;/li&gt;
&lt;li&gt;Service B querying the data persistence layer&lt;/li&gt;
&lt;li&gt;Service B processing/aggregating the data&lt;/li&gt;
&lt;li&gt;Service B storing the result of the processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And, of course, the process involves development resources for the implementation and several machine resources.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;The solution: Event Hub, Stream Analytics, Service Bus&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--obpEVnkJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/1.png" class="article-body-image-wrapper"&gt;&lt;img class=" size-full wp-image-38 aligncenter" src="https://res.cloudinary.com/practicaldev/image/fetch/s--obpEVnkJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/1.png" alt="1" width="124" height="780"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can be simplified by letting &lt;strong&gt;Azure&lt;/strong&gt; take care of most of the steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sensors sending data to &lt;strong&gt;Event Hub&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream Analytics&lt;/strong&gt; aggregating/processing the data in time windows and storing the result or handing it over to a service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the diagram the result is sent to a &lt;strong&gt;Service Bus queue&lt;/strong&gt; to be stored/processed by a service later, but &lt;strong&gt;Stream Analytics&lt;/strong&gt; has the capability of storing directly to many data layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event Hub&lt;/strong&gt; is chronological stream storage: there is no concept of message locking or deletion, and it stores data in partitions, allowing parallel access for multiple consumers. We don’t have to worry about removing messages once processed, as our progress can be stored in &lt;strong&gt;checkpoints&lt;/strong&gt;, which record the point at which we stopped processing so that we can resume later.&lt;/p&gt;

&lt;p&gt;Data is stored there safely and resiliently, and it will only be deleted once it has passed the expiration set in the retention option; there is no manual/accidental deletion.&lt;/p&gt;

&lt;p&gt;The architecture described in the diagram is the cheapest and simplest but it doesn’t take into account &lt;strong&gt;high availability&lt;/strong&gt;; if the &lt;strong&gt;Azure Event Hub&lt;/strong&gt; in our region has an outage we will lose all the messages during that period.&lt;/p&gt;

&lt;p&gt;We can set up a failover stream in a different region to accept messages in case the primary hub is unavailable:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ViGHJHTQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/2.png" class="article-body-image-wrapper"&gt;&lt;img class=" size-full wp-image-39 aligncenter" src="https://res.cloudinary.com/practicaldev/image/fetch/s--ViGHJHTQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/2.png" alt="2" width="523" height="708"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this example our web application has to implement failover logic: it tries to send messages to the primary and fails over to the secondary during downtimes (i.e. in case of exceptions on sends).&lt;/p&gt;

&lt;p&gt;We now have a good solution to prevent data loss from the web application side, as the event will find its way into one hub or the other and, assuming we set up the secondary in a different &lt;strong&gt;Azure&lt;/strong&gt; region, we have achieved geo-redundancy. We also qualify for the &lt;strong&gt;Azure&lt;/strong&gt; 99.9% SLA, making our messages resilient to alien invasions (I do not take responsibility if they target your two &lt;strong&gt;Azure&lt;/strong&gt; clusters specifically).&lt;/p&gt;
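
&lt;p&gt;The failover logic can be sketched in a few lines of Python. This is only an illustration: the two sender functions are hypothetical stand-ins for real Event Hubs clients, and treating any exception as "primary unavailable" is a simplifying assumption.&lt;/p&gt;

```python
def send_with_failover(event, send_primary, send_secondary):
    """Send to the primary hub; on any exception, send to the secondary instead.

    Returns the name of the hub that accepted the event. If the secondary
    also fails, its exception propagates to the caller.
    """
    try:
        send_primary(event)
        return "primary"
    except Exception:
        # Assumption: any exception on send means the primary is unavailable.
        send_secondary(event)
        return "secondary"


# Hypothetical senders; lists stand in for the two hubs.
state = {"primary_up": True}
primary_hub, secondary_hub = [], []


def send_primary(event):
    if not state["primary_up"]:
        raise ConnectionError("primary hub unavailable")
    primary_hub.append(event)


def send_secondary(event):
    secondary_hub.append(event)


first = send_with_failover({"id": 1}, send_primary, send_secondary)
state["primary_up"] = False  # simulate an outage in the primary region
second = send_with_failover({"id": 2}, send_primary, send_secondary)
print(first, second)
```

&lt;p&gt;Each send pays the cost of a failed attempt while the primary is down; a production version would probably remember the outage for a short while instead of retrying the primary on every call.&lt;/p&gt;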

&lt;p&gt;Is it over? Are we happy our customers will never contact our support again? Of course not...&lt;/p&gt;

&lt;p&gt;What happens then if the &lt;strong&gt;Stream Analytics&lt;/strong&gt; service has an outage? Messages will be safe and sound but their journey will be delayed for the duration of the outage.&lt;/p&gt;

&lt;p&gt;It is true that, looking at the history of outages, there has never been one involving &lt;strong&gt;Stream Analytics&lt;/strong&gt;, but we cannot rely on hope when it comes to the danger of pissing off customers.&lt;/p&gt;

&lt;p&gt;Let’s try and solve this, too. On a Microsoft blog, the recommended solution is similar to the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dYU2PL4a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/3.png" class="article-body-image-wrapper"&gt;&lt;img class=" size-full wp-image-40 aligncenter" src="https://res.cloudinary.com/practicaldev/image/fetch/s--dYU2PL4a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/3.png" alt="3" width="566" height="658"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we have geo-redundancy for all the resources, but what happens when only one &lt;strong&gt;Stream Analytics&lt;/strong&gt; has an outage? It’s unlikely that the whole region will be down, and we have no guarantee the services will be in the same fault/update domain. That might only happen if a bomb lands on the whole data centre set, but in that eventuality we have bigger fish to fry, so let's go back to it later.&lt;/p&gt;

&lt;p&gt;If only &lt;strong&gt;Stream Analytics&lt;/strong&gt; is down, the primary &lt;strong&gt;Event Hub&lt;/strong&gt; and our web application will be unaware and will continue to store messages in the stream that will not be processed further down the chain; we will be unaware until we hear customers shouting.&lt;/p&gt;

&lt;p&gt;We need to make sure a second &lt;strong&gt;Stream Analytics&lt;/strong&gt; works as failover, and we could do so by checking the status of the service before sending messages... but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is no public &lt;strong&gt;Azure&lt;/strong&gt; API to check the status of the service&lt;/li&gt;
&lt;li&gt;Our &lt;strong&gt;high throughput and performance critical&lt;/strong&gt; application receiving millions of requests would have to delay the completion to make a new check request to a service on every call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, even if we had the API, it would not be the best way to go. Maybe a separate background service could check the status, but we would still risk losing messages in the delay between the real outage occurring and our service reporting it.&lt;/p&gt;

&lt;p&gt;So let's try something else:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EF_wl1AO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/x.png" class="article-body-image-wrapper"&gt;&lt;img class=" size-full wp-image-42 aligncenter" src="https://res.cloudinary.com/practicaldev/image/fetch/s--EF_wl1AO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/x.png" alt="X.png" width="514" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But now both &lt;strong&gt;Stream Analytics&lt;/strong&gt; instances will process the same messages and generate outputs that are likely to be similar, but extremely unlikely to be exactly the same (resource contention, unaligned time windows, et cetera).&lt;/p&gt;

&lt;p&gt;This is still a good solution if you do not care about having potential duplicates in the data.&lt;/p&gt;

&lt;p&gt;But if this is not your case, how do we know what’s in the primary and what’s in the secondary? How do we avoid duplication?&lt;/p&gt;

&lt;p&gt;At the time I’m writing this there is no solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event Hubs&lt;/strong&gt; works with checkpoints and &lt;strong&gt;Stream Analytics&lt;/strong&gt; stores its checkpoint somewhere; a good solution would be to share checkpoints across the two &lt;strong&gt;Stream Analytics&lt;/strong&gt; so that they work in synergy and never overlap each other, but &lt;strong&gt;Azure&lt;/strong&gt; does not support this feature at the moment.&lt;/p&gt;

&lt;p&gt;So, if we care about the precision of our data, we go back to the original solution and have to hope the weak link (&lt;strong&gt;Stream Analytics&lt;/strong&gt;) will not fail (which is unlikely, as it is also based on &lt;strong&gt;Service Fabric&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;I will stress the fact that we never lose data in this configuration so we might only delay the processing for the duration of the outage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ViGHJHTQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/2.png" class="article-body-image-wrapper"&gt;&lt;img class=" size-full wp-image-39 aligncenter" src="https://res.cloudinary.com/practicaldev/image/fetch/s--ViGHJHTQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/2.png" alt="2" width="523" height="708"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yet we have another problem… What happens when the output &lt;strong&gt;Service Bus&lt;/strong&gt; is down? I verified empirically that &lt;strong&gt;Stream Analytics&lt;/strong&gt; has an internal cache for messages, but there is no guarantee the message will eventually land, and the retention can be days, minutes or seconds (source: Microsoft). So we can send identical copies of each message to two geo-redundant &lt;strong&gt;Service Bus&lt;/strong&gt; queues to prevent loss.&lt;/p&gt;

&lt;p&gt;We would also like our services not to process the same message twice though, so what can we do?&lt;/p&gt;

&lt;p&gt;My first idea was to add a unique identifier to the messages so our service can cache it and make sure it doesn’t process it twice (in a distributed cache/database if we want to make the service scalable over multiple instances). Great, so let’s ask &lt;strong&gt;Stream Analytics&lt;/strong&gt; to generate a &lt;strong&gt;GUID&lt;/strong&gt; for us and attach it to the two messages… failed. &lt;strong&gt;ASA&lt;/strong&gt; cannot generate &lt;strong&gt;GUIDs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;OK, not a problem, I will just get the current date and time which should be unique enough if we have time windows… failed. &lt;strong&gt;ASA&lt;/strong&gt; doesn’t have a &lt;strong&gt;GETDATE()&lt;/strong&gt; or any time function so we need to rely on some data in our message to generate a sort of “hash” or unique identifier for the message.&lt;/p&gt;

&lt;p&gt;I chose to use the combination of the first event date and the last event date, as together they precisely define our window.&lt;/p&gt;
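
&lt;p&gt;On the consuming side, the idea could look roughly like the Python sketch below. The message field names are hypothetical, and an in-memory set stands in for the distributed cache/database you would need with multiple service instances.&lt;/p&gt;

```python
def window_key(first_event_time, last_event_time):
    """Build a deduplication key from the first and last event dates of a window."""
    return f"{first_event_time}|{last_event_time}"


class Deduplicator:
    """Remembers keys already handled; the set stands in for a distributed cache."""

    def __init__(self):
        self._seen = set()

    def process_once(self, message, handler):
        """Call `handler` only the first time a window is seen; return True if handled."""
        key = window_key(message["first_event"], message["last_event"])
        if key in self._seen:
            return False  # duplicate copy coming from the other queue
        self._seen.add(key)
        handler(message)
        return True


# Hypothetical aggregated message, delivered once by each Service Bus queue.
message = {
    "first_event": "2016-12-01T10:00:00Z",
    "last_event": "2016-12-01T10:04:59Z",
    "count": 42,
}
handled = []
dedup = Deduplicator()
from_queue_a = dedup.process_once(message, handled.append)
from_queue_b = dedup.process_once(dict(message), handled.append)
print(from_queue_a, from_queue_b, len(handled))
```

&lt;p&gt;Note that the check and the insert are not atomic here; with several service instances the cache would need an atomic "add if absent" operation to be safe.&lt;/p&gt;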

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P1Q3kOQg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/4.png" class="article-body-image-wrapper"&gt;&lt;img class=" size-full wp-image-41 aligncenter" src="https://res.cloudinary.com/practicaldev/image/fetch/s--P1Q3kOQg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://rocketdotsciencedotblog.files.wordpress.com/2016/12/4.png" alt="4" width="529" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now our architecture is as highly available as possible (still with a chance of delaying the results in case of an &lt;strong&gt;ASA&lt;/strong&gt; outage), so we should be able to sleep reasonably well knowing our service is (almost) always available (bombs/aliens apart).&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Migrating my blog from WordPress.com</title>
      <dc:creator>Stefano d'Antonio</dc:creator>
      <pubDate>Tue, 10 Nov 2020 16:20:29 +0000</pubDate>
      <link>https://dev.to/unosd/migrating-my-blog-from-wordpress-269d</link>
      <guid>https://dev.to/unosd/migrating-my-blog-from-wordpress-269d</guid>
      <description>&lt;p&gt;Moving my old &lt;a href="https://rocket.science.blog/"&gt;https://rocket.science.blog/&lt;/a&gt; here, so I can carry on ghosting a different blog.&lt;/p&gt;

&lt;p&gt;I'll copy the only blog post I have in the hope that new ones will follow.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
